Parallel Sysplex Operational Scenarios
Frank Kyne
Peter Cottrell
Christian Deligny
Gavin Foster
Robert Hain
Roger Lowe
Charles MacNiven
Feroni Suhood
ibm.com/redbooks
SG24-2079-01
Note: Before using this information and the product it supports, read the information in "Notices" on
page xiii.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
The team that wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Chapter 1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction to the sysplex environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 What is a sysplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Functions needed for a shared-everything environment. . . . . . . . . . . . . . . . . . . . . 4
1.2.2 What is a Coupling Facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Sysplex types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Parallel Sysplex test configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Chapter 2. Parallel Sysplex operator commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1 Overview of Parallel Sysplex operator commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 XCF and CF commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Determining how many systems are in a Parallel Sysplex . . . . . . . . . . . . . . . . . . 14
2.2.2 Determining whether systems are active . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.3 Determining what the CFs are called . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.4 Obtaining more information about CF paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.5 Obtaining information about structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.6 Determining which structures are in the CF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.7 Determining which Couple Data Sets are in use. . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.8 Determining which XCF signalling paths are defined and available . . . . . . . . . . . 24
2.2.9 Determining whether Automatic Restart Manager is active . . . . . . . . . . . . . . . . . 25
2.3 JES2 commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.1 Determining JES2 checkpoint definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.2 Releasing a locked JES2 checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.3 JES2 checkpoint reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Controlling consoles in a sysplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.1 Determining how many consoles are defined in a sysplex . . . . . . . . . . . . . . . . . . 27
2.4.2 Managing console messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5 GRS commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.1 Determining which systems are in a GRS complex . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.2 Determining whether any jobs are reserving a device . . . . . . . . . . . . . . . . . . . . . 29
2.5.3 Determining whether there is resource contention in a sysplex . . . . . . . . . . . . . . 30
2.5.4 Obtaining contention information about a specific data set. . . . . . . . . . . . . . . . . . 30
2.6 Commands associated with External Timer References. . . . . . . . . . . . . . . . . . . . . . . . 31
2.6.1 Obtaining Sysplex Timer status information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.7 Miscellaneous commands and displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.7.1 Determining the command prefixes in your sysplex . . . . . . . . . . . . . . . . . . . . . . . 33
2.7.2 Determining when the last IPL occurred . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.7.3 Determining which IODF data set is being used . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.8 Routing commands through the sysplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.9 System symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.10 Monitoring the sysplex through TSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Other publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
How to get Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX
AS/400
CICSPlex
CICS
DB2
IBM
IMS/ESA
Language Environment
NetView
OMEGAMON
OS/390
Parallel Sysplex
PR/SM
RACF
Redbooks
Redbooks (logo)
Sysplex Timer
System z10
System z
Tivoli
VTAM
WebSphere
z/OS
z/VM
zSeries
Preface
This IBM Redbooks publication is a major update to the Parallel Sysplex Operational
Scenarios book, originally published in 1997.
The book is intended for operators and system programmers, and provides an
understanding of Parallel Sysplex operations. This understanding, together with the
examples provided in this book, will help you effectively manage a Parallel Sysplex and
maximize its availability and effectiveness.
The book has been updated to reflect the latest sysplex technologies and current
recommendations, based on the experiences of many sysplex customers over the last 10
years.
It is our hope that readers will find this to be a useful handbook for day-to-day sysplex
operation, providing you with the understanding and confidence to expand your exploitation
of the many capabilities of a Parallel Sysplex.
Knowledge of single-system z/OS operations is assumed. This book does not go into
detailed recovery scenarios for IBM subsystem components, such as CICS Transaction
Server, DB2 or IMS. These are covered in great depth in other Redbooks publications.
Robert Hain is an IMS Systems Programmer in IBM Australia, based in Melbourne. He has
23 years of experience in the mainframe operating systems field, specializing for the past 20
in IMS. His areas of expertise include the implementation, configuration, management, and
support of IMS systems. He is also a member of the IMS worldwide advocate team, part of
the IMS development labs in San Jose, California. Robert coauthored a number of IBM
Redbooks publications about IMS, as well as the IBM Press publication An Introduction to
IMS.
Roger Lowe is a Senior Technical Consultant in the Professional Services division of
Independent Systems Integrators, an IBM Large Systems Business Partner in Australia. He
has 23 years of experience in the operating systems and mainframe field. His areas of
expertise include the implementation and configuration of the z/OS operating system and
Parallel Sysplex. Roger coauthored the IBM Redbooks publication Merging Systems into a
Sysplex, SG24-6818.
Charles MacNiven is a z/OS System Programmer in IBM Australia. Charles has more than
21 years of experience with working with customers in large mainframe environments in
Europe and Australia. His areas of expertise include the implementation, configuration, and
support of the z/OS operating system, DB2, and CICS.
Feroni Suhood is a Senior Performance Analyst in IBM Australia. He has 25 years of
experience in the mainframe operating systems field. His areas of expertise include Parallel
Sysplex, performance, and hardware evaluation. Feroni coauthored the IBM Redbooks
publication Merging Systems into a Sysplex, SG24-6818.
Thanks also to those responsible for the original version of this book:
David Clitherow
IBM UK
Fatima Cavichione
IBM Brazil
Howard Charter
IBM UK
Jim Ground
IBM US
Brad Habbershaw
IBM Canada
Thomas Hauge
DMData, Denmark
Simon Kemp
IBM UK
Marcos Roberto de Lara
IBM Portugal
Wee Heong Ng
IBM Singapore
Vicente Ranieri Junior
IBM Brazil
Thanks to the following people for their invaluable contributions and support to this project:
Bob Haimowitz
International Technical Support Organization, Poughkeepsie Center
Carol Woodhouse
Australian Development Lab, Gold Coast Center
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an e-mail to:
[email protected]
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Chapter 1. Introduction
This chapter explains the structure of this book and introduces the concepts and principles of
a sysplex environment. It highlights the main components in a Parallel Sysplex environment
and touches on the following topics:
The difference between a base sysplex and a Parallel Sysplex
The functions of the hardware and software components that operators encounter in a
sysplex environment
The test Parallel Sysplex used for the examples in this document
The only way to do this is to have at least two copies of all the components that deliver the
application service; that is, two z/OS systems, two database manager instances (both being
able to update the same database), two sets of CICS regions that run the same applications,
and so on. Parallel Sysplex provides the infrastructure to deliver this capability by letting you
share databases across systems, and enabling you to automatically route work to the most
appropriate system. Figure 1-1 shows the major components of a sysplex that contains two
systems.
Figure 1-1 Major components of a sysplex with two systems (diagram not reproduced: each z/OS system runs XCF, XES, WLM, System Logger, SFM, and ARM, plus CICS TORs and AORs, DB2, and IMS; VTAM/TCP connects users from the network; channels, switches, and DWDM connect the systems to two Coupling Facilities (CF1 and CF2), a duplexed Sysplex Timer, and primary and alternate Sysplex and CFRM Couple Data Sets)
Having multiple copies (known as clones) of your production environment allows your
applications to continue to run on other systems if you should experience a planned or
unplanned outage of one of the systems, thereby masking the outage from the application
users. Also, you have the ability to restart the impacted subsystems on another system in the
sysplex, pending the recovery of the failed system. When this failure and restart management
is called for, it can be initiated automatically, based on policies you define for the sysplex.
Being able to run multiple instances of a subsystem using the same data across multiple
z/OS systems also makes it possible to process more transactions than would be possible
with a single-system approach (except, of course, in the unlikely case where all instances
need to update exactly the same records at the same time). The transaction programs do not
need to be rewritten, because it is the database managers that transparently provide the data
sharing capability.
There are also value-for-money advantages that you can realize from exploiting the sysplex
capabilities. Imagine you have two processors, and one has 75 MIPS of unused capacity and
the other has 50 MIPS. Also imagine that you want to add a new application that requires
100 MIPS.
If the application supports data sharing, you can divide it up and run some transactions on
one system and some on the other, thereby fully exploiting the unused capacity. On the other
hand, if the workload does not support data sharing, you must run all 100 MIPS of work in the
same system, meaning that you must purchase an upgrade for one of the two processors.
Additionally, if your work can run on any system in the sysplex, and you need more capacity,
you have the flexibility to add capacity to any of the current processors, or even to add
another processor to the sysplex, whichever is the most cost-effective option.
It may also be possible to break up large database queries into smaller parts and run those
parts in parallel across the members of the sysplex, resulting in significantly reduced elapsed
times for these transactions.
Common time
The first thing you will need is an ability to have every system use exactly the same time. Why
is this needed? Consider what happens when a database manager updates a database. For
every update, a log record is written containing a copy of the record before the update (so
failed updates can be backed out) and a copy of the record after the update (so updates can
be reapplied if the database needs to be recovered from a backup).
If there is only a single database manager updating the database, all the log records will be
created in the correct sequence, and the time stamps in the log records will be consistent with
each other. So, if you need to recover a database, you would restore the backup, then apply
all the updates using the log records from the time of the backup through to the time of the
failure.
But what happens if two or more database managers are updating the database? If you need
to recover the database, you would again restore it from the backup, then merge the log files
(in time sequence) and apply the log records again. Because the log records contain the after
image for each update, it is vital that the updates are applied in the correct sequence. This
means that both database managers must have their clocks synchronized, to ensure that the
time stamps in each log record are consistent, regardless of which database manager
instance created them.
In a sysplex environment, the need to have a consistent time across all the members of the
sysplex is addressed by attaching all the processors in the sysplex to a Sysplex Timer, or by
its replacement, Server Time Protocol (STP). Note that the objective of having a common
time source is not to have a more accurate time, but rather to have the same time across all
members of the sysplex. For more information about Sysplex Timers and STP, see 2.6,
"Commands associated with External Timer References" on page 31.
Buffer coherency
Probably the most common way to improve the performance of a database manager is to
give it more buffers. Having more buffers means that it can keep copies of more data records
in processor storage, thus avoiding the delay associated with having to read the data from
disk.
In a data sharing environment you will have multiple database managers, each with its own
set of buffers. It is likely that some data records will be contained in the buffers of more than
one database manager instance. This does not cause any issues as long as all the database
managers are only reading the data. But, what happens if database manager DB2A updates
data that is currently in the buffers of database manager DB2B? If there is no mechanism for
telling DB2B that its copy is outdated, then that old record could be passed to a transaction
which treats that data as current.
Therefore, when you have multiple database manager instances, all with update access to a
shared database, you need some mechanism that the database managers can use to
determine whether a record in their buffer has been updated elsewhere. One way to address
this would be for every instance to tell all the other instances every time it adds or removes a
record to its buffers. But this would generate tremendous overhead, especially as the number
of instances in the sysplex increases.
The solution that is implemented in a Parallel Sysplex is for each database manager to tell
the Coupling Facility (CF) every time it adds a record to its local buffer. The CF then knows
which instances have a copy of any given piece of data. Each instance also tells the CF every
time it updates one of those records. Because the CF knows who has a copy of each record,
it also knows who it has to tell when a given record is updated. This process is called
Cross-Invalidation, and it is handled automatically by the database managers and the CF.
Serialization
Because you can have multiple database manager instances all able to update any data in
the database, you may be wondering how to avoid having two instances make two different
updates to the same piece of data at the same time.
Again, one way to achieve this could be for every instance to talk to all the other instances to
ensure that no one else is updating a piece of data that it is about to update. However, this
would be quite inefficient, especially if there are many instances with access to the shared
database.
In a Parallel Sysplex, this requirement for serializing data access is achieved by using a lock
structure in the CF. Basically, every time a database manager instance wants to work with a
record (either to read it or to update it), it sends a lock request to the CF, identifying the record
in question and the type of access requested. Because the CF has knowledge of all the lock
requests, it knows what types of accesses are in progress for that record.
If the request is for shared access, and no one else has exclusive access, the CF grants the
request. Or if the request is for exclusive access, and no one else is accessing the record at
this time, the request is granted.
But if the type of serialized access needed by this request is not compatible with an instance
that is already accessing the record, the CF denies the request and identifies the current
owner of that data (who is doing the exclusive access).1 When the current update completes,
access will be granted to the next database manager in the queue, allowing it to make its
update.
Monitoring
If you are going to be able to run your work on any of the systems in the sysplex, then you will
probably want some way for products that provide a service to be aware of the status of their
peers on other systems. Of course, you could do this by having all the peers constantly
talking to each other to ensure they are still alive. But this would waste a lot of resource, with
all these programs talking back and forth to each other all the time, and only discovering a
failure a tiny fraction of the time.
A more efficient alternative is for the products to register their presence with the system, and
ask the system to inform them if one of the peer instances disappears. Because the system is
aware any time an address space is started or ends, it automatically knows if any of the peers
stop. As a result, it is much more efficient to have the system monitor for the availability of the
peer members, and to inform the remaining address spaces should one of them go away.
The system component that provides this service is called Cross-System Coupling Facility
(XCF).
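As a quick illustration of this registration, the groups that have registered with XCF, and their members, can be listed from the console (the command is standard; the group names you see will depend on your installation):

D XCF,GROUP

Adding a group name, for example D XCF,GROUP,SYSGRS, lists the members of that specific group.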
Building on top of this concept, you also have the ability to monitor complete systems. Every
few seconds, every system updates a data set called the Sysplex Couple Data Set with its
current time stamp. At the same time, it checks the time stamp of all the other members of the
sysplex. If it finds that a system has not updated its time stamp in a certain interval (known as
the Failure Detection Interval), it can inform the operator that the system in question appears
to have failed. It can even automatically remove that failed system from the sysplex using a
function known as Sysplex Failure Management.
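The values that control this monitoring can be checked from the console; a minimal example (the full output is shown with the D XCF,COUPLE example in Chapter 2):

D XCF,COUPLE

In the output, INTERVAL is the Failure Detection Interval, and OPNOTIFY is the time after which the operator is notified about an apparently failed system.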
Workload distribution
We have now discussed how you have the ability to run work (transactions and batch jobs) on
more than one system in the sysplex, all accessing the same data. And the programs that
provide services are able to communicate with each other using XCF. This means that any
work you want to run can potentially run anywhere in the sysplex. And if one of the systems is
unavailable for some reason, the work can be processed on one of the other systems.
1 This description is not strictly accurate, but it is sufficient in the context of this discussion.
However, to derive the maximum benefit from this, you need two other things:
The ability to present a single system image to the user, so that if one system is down, the
user can still log on in the normal way, completely unaware of the fact that one of the
systems is down.
The ability to send incoming work requests to whichever system is best able to service
that work. The decision about which is the best system might be based on the response
times being delivered by the different systems in the sysplex, or on which system has the
most spare capacity.
Both of these capabilities are provided in a sysplex. Both VTAM and TCP provide the ability
for multiple transaction manager instances (CICS or IMS, for example) to use the same name
and have work routed to one of those instances. For example, you might have four CICS
Terminal Owning Regions that call themselves CICSPROD. When the users want to use this
service, they would logon to CICSPROD. Even if three of the four regions were down, the
user would still be able to logon to the fourth region, unaware that the other three regions are
currently down.
This capability to have multiple work managers use the same name can then be combined
with support in a system component called the Workload Manager (WLM). WLM is
responsible for assigning sysplex resources to work items in accordance with
installation-specified objectives. WLM works together with VTAM and TCP and the
transaction managers to decide which is the most appropriate system for each piece of work
to run on. This achieves the objectives of helping the work achieve its performance targets,
masking planned or unplanned outages from users, and also making full use of capacity
wherever it might be available in the sysplex.
System Logger
If you can run work anywhere in the sysplex, what other services would be useful? Many
system services create logs; syslog is probably the one you are most familiar with. z/OS
contains a system component called System Logger that provides a single syslog which
combines the information from all the systems. This avoids you having to look at multiple logs
and merge the information yourself. The exploiters you are probably most familiar with are
OPERLOG (for sysplex-wide syslog) and LOGREC (for sysplex-wide error information).
Other users of System Logger are CICS (for its log files), IMS (when using Shared Message
Queue), RRS, z/OS Health Checker, and others.
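As a simple check of this component, the status of System Logger and of a single log stream can be displayed from the console; a minimal sketch, assuming the conventional OPERLOG log stream name SYSPLEX.OPERLOG (your installation may use a different name):

D LOGGER,STATUS
D LOGGER,LOGSTREAM,LSNAME=SYSPLEX.OPERLOG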
The Automatic Restart Manager is discussed in more detail in Chapter 6, "Automatic Restart
Manager" on page 83.
CICSPlex
There are a number of aspects to a CICS service. For example, there is the function that
manages interactions with the users terminal. There is the part that runs the actual
application code (such as reading a database and obtaining account balances). And there
may be other specialized functions, like providing access to in-storage databases. In the past,
it was possible that an error in application code could crash a CICS region, impacting all the
users logged on to that region.
To provide a more resilient environment, CICS provides the ability to run each of these
functions in a different region. For example, the code that manages interactions with the
users terminal tends to be very stable, so setting up a CICS region that only provides this
function (known as a Terminal Owning Region, or TOR) results in a very reliable service. And
by providing many regions that run the application code (called an Application Owning
Region, or AOR), if one region abends (or is stopped to make a change), other regions are
still available to process subsequent transactions. Running your CICS regions like this is
known as multiregion operation (MRO). MRO can be used in either a single-system
environment or in a sysplex.
When used in a sysplex, MRO is often combined with a CICS component called CICSPlex
System Manager. CICSPlex System Manager provides a single point of control for all the
CICS regions in the sysplex. It also provides the ability to control which AOR a given
transaction should be routed to.
Storage in the CF is assigned to entities called structures. The type of services that can be
provided in association with a given structure is dependent on the structure type. In normal
operation, you do not need to know what type a given structure is; this is all handled
automatically by whatever product is using the CF services. However, understanding that the
CF provides different types of services is useful when you are managing a CF, or if there is a
failure of a CF.
A CF has the unique ability to be stopped without impacting the users of its services. For
example, a CF containing a DB2 Lock structure could be shut down, upgraded, and brought
back online without impacting the DB2 subsystems that are using the structure. In fact, the
CF could even fail unexpectedly, and the users of its services could continue operating. This
capability relies on a combination of services provided by the CF and support in the products
that use its services that enable the contents of one CF to be dynamically moved to another
CF. For this reason, we recommend that every sysplex have at least two Coupling Facilities.
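To illustrate this capability, the operator would typically empty a CF before shutting it down by rebuilding its structures into the other CF; a hedged sketch of the usual command (some structures have product-specific procedures, as discussed in later chapters):

SETXCF START,REBUILD,CFNAME=FACIL01,LOC=OTHER

This asks the system to rebuild every structure currently allocated in FACIL01 into another CF from each structure's preference list.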
CFRM policy
The names and attributes of the structures that can reside in your CFs are described in a file
called the CFRM policy, which is stored in the CFRM Couple Data Set. The CFRM policy
would normally be created and maintained by the Systems Programmer. The contents of the
active CFRM policy can be displayed using a display command from the console. Some
structures have a fixed name (ISGLOCK, the GRS structure, for example). Other structures
(the JES2 checkpoint, for example) have a completely flexible name. Some of the information
that is included in the policy includes:
Information about your Coupling Facilities (LPAR name, serial number, and so on)
The name and sizes (minimum, initial, and maximum amounts) of each structure
Which CF each structure may be allocated in
Whether the system should monitor the use of storage within the structure and the
threshold at which the system should automatically adjust the structure's size
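For example, the active CFRM policy can be displayed from the console with:

D XCF,POLICY,TYPE=CFRM

The attributes of the individual structures it defines appear in the output of the D XCF,STR displays described in Chapter 2.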
The CF runs in an LPAR on any System z or zSeries processor. The code that is executed
in the LPAR is called Coupling Facility Control Code. This code is stored on the Service
Element of the processor and is automatically loaded when the CF LPAR is activated.
Unlike a z/OS system, a CF is not connected to disks or tapes or any of the normal peripheral
devices. Instead, the CF is connected to the z/OS systems that use its services by special
channels called CF Links. The only other device connected to the CF is the HMC, through
which a small set of commands can be issued to the CF. The links used to connect z/OS to
the CF are shown when you display information about the CF on the MVS console.
GoldPlex
PlatinumPlex
This document uses these terms to refer to the different types of sysplex. For more
information about this topic, refer to the IBM Redbooks publication Merging Systems into a
Sysplex, SG24-6818.
Figure 1-2 The Test Parallel Sysplex configuration (diagram not reproduced: three systems, #@$1 on z/OS 1.7, #@$2 and #@$3 on z/OS 1.8, each running CICS TS 3.1, DB2 V8, IMS V9, and MQ V6, connected to two CFLEVEL 14 Coupling Facilities, FACIL01 and FACIL02)
This is a three-way, data sharing Parallel Sysplex with two Coupling Facilities. Each system
contains DB2, IMS, CICS, and MQ. All are set up to use the Coupling Facility to enable data
sharing, queue sharing, and dynamic workload balancing.
This sysplex is actually based on an offering known as the Parallel Sysplex Training
Environment, which is sold through IBM Technology and Education Services. The offering
consists of a full volume dump of the environment, a set of workloads to generate activity in
the sysplex, and an Exercise Guide. The offering can be installed in native LPARs or under
z/VM. We find that z/VM provides an excellent test environment because nearly everything
works exactly as it would in a native environment, but you have more control over the scope
of things that can be touched from the test environment. The use of z/VM also makes it very
easy to add more systems or Coupling Facilities, to add or remove CTCs, and so on.
Note: The unusual sysplex and system names (and subsystem names, as you will see
later in this book) were deliberately selected to minimize the chance of this sysplex having
the same names as any customer environment.
The three-way sysplex allows you to test recovery from a CF link failure, a CF failure, and a
system failure. Having workloads running at the same time makes the reaction of the system
and subsystems to these failures more closely resemble what happens in a production
environment.
Chapter 2. Parallel Sysplex operator commands

2.2.1 Determining how many systems are in a Parallel Sysplex
#@$2
#@$3
The output displays the name of your sysplex (in this example, #@$#PLEX).
Note: Be aware that, although system names are shown in this display, it does not
necessarily mean they are currently active. They may be in the process of being
partitioned out of the sysplex, for example.
2.2.2 Determining whether systems are active
To display the status of each system in the sysplex, issue the D XCF,S,ALL command:

D XCF,S,ALL
IXC335I 18.53.10 DISPLAY XCF 491
SYSTEM  TYPE  SERIAL  LPAR  STATUS TIME          SYSTEM STATUS
#@$3    2084  6A3A    N/A   06/21/2007 18:53:10  ACTIVE  TM=SIMETR
#@$2    2084  6A3A    N/A   06/21/2007 18:53:06  ACTIVE  TM=SIMETR
#@$1    2084  6A3A    N/A   06/21/2007 18:53:07  ACTIVE  TM=SIMETR

2.2.3 Determining what the CFs are called
The D XCF,CF command lists the CFs that are defined in the CFRM policy:

D XCF,CF
CFNAME    COUPLING FACILITY             SITE
FACIL01   SIMDEV.IBM.EN.0000000CFCC1    N/A
          PARTITION: 00 CPCID: 00
FACIL02   SIMDEV.IBM.EN.0000000CFCC2    N/A
          PARTITION: 00 CPCID: 00
For more information about the contents of each CF, as well as information about which
systems are connected to each CF, use the D XCF,CFNM=ALL command. This is discussed
further in 2.2.6, "Determining which structures are in the CF" on page 22.
For detailed physical information about each Coupling Facility, issue the D CF command as
shown in Figure 2-4 on page 16. The display is repeated for each CF defined in the CFRM
policy that is currently available to the z/OS system where the command was issued.
For example, some installations define their Disaster Recovery Coupling Facilities in their
CFRM policy. These CFs would be shown in the output from the D XCF,CF command.
However, they would not show in the output from the D CF command because they are not
online to that system.
The information displayed in Figure 2-4 on page 16 contains the following details:
1 The name and physical information about the CF
2 Space utilization and the CF Level and service level
3 CF information (type and status)
4 Subchannel status
5 Information about remote CFs (used for System Managed Duplexing)
CF Level 15 provides additional information:
The number of dedicated and shared PUs in the CF
Whether Dynamic CF Dispatching is enabled on this CF
For more detailed information about the Coupling Facility, refer to Chapter 7, "Coupling
Facility considerations in a Parallel Sysplex" on page 101.
D CF
IXL150I 19.02.15 DISPLAY CF 516
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC1
                  PARTITION: 00 CPCID: 00
                  CONTROL UNIT ID: 0309
                  NAMED FACIL01 1
COUPLING FACILITY SPACE UTILIZATION
 ALLOCATED SPACE             DUMP SPACE UTILIZATION
  STRUCTURES:     108544 K    STRUCTURE DUMP TABLES:         0 K
  DUMP SPACE:       2048 K     TABLE COUNT:                  0
 FREE SPACE:      612864 K    FREE DUMP SPACE:            2048 K
 TOTAL SPACE:     723456 K    TOTAL DUMP SPACE:           2048 K
                              MAX REQUESTED DUMP SPACE:      0 K
 VOLATILE:           YES      STORAGE INCREMENT SIZE:      256 K
 CFLEVEL:            14
 CFCC RELEASE 14.00, SERVICE LEVEL 00.29
 BUILT ON 03/26/2007 AT 17:58:00
 COUPLING FACILITY HAS ONLY SHARED PROCESSORS
COUPLING FACILITY SPACE CONFIGURATION 2
                     IN USE       FREE      TOTAL
 CONTROL SPACE:     110592 K   612864 K   723456 K
 NON-CONTROL SPACE:      0 K        0 K        0 K
SENDER PATH 3    PHYSICAL   LOGICAL   CHANNEL TYPE
 09              ONLINE     ONLINE    ICP
 0E              ONLINE     ONLINE    ICP
...
Figure 2-4 D CF display
M=CHP display). When the zSeries range of processors was announced, an enhanced link
type (peer mode links) was introduced. Because CFR and CFS links are not strategic, this
document only discusses peer mode links.
Previously, on zSeries processors, three types of CF links were supported:
Internal Coupling (IC)
Integrated Cluster Bus (ICB)
InterSystem Channel (ISC)
System z10 introduced a new type of CF link known as Parallel Sysplex over InfiniBand
(PSIFB) or Coupling over InfiniBand (CIB). These also use fiber connections, and at the time
of writing support a maximum distance of 150 meters between the processors.
There are two ways you can obtain information about the CF links. The first way is to issue a
D M=CHP command; Figure 2-5 on page 18 shows an example of the use of this command.
Results of the display that are irrelevant to this exercise have been omitted and replaced with
an ellipsis (...).
D M=CHP
IEE174I 20.02.38 DISPLAY M 635
CHANNEL PATH STATUS
  0 1 2 3 4 5 6 7 8 9 A B C D E F
0 + + + + + + + + + + + + + + + +
...
F + + + + + + + + + + + + + + + +
************************ SYMBOL EXPLANATIONS ********************
+ ONLINE
@ PATH NOT VALIDATED - OFFLINE
. DOES NOT EXIST
* MANAGED AND ONLINE
# MANAGED AND OFFLINE
CHANNEL PATH TYPE STATUS
  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0 11 11 11 11 11 11 11 14 11 23 14 14 11 14 23 23 1
...
A 1B 1D 00 00 00 00 00 00 00 00 00 00 00 00 00 00
B 00 00 00 00 00 00 00 00 00 00 00 00 00 00 22 22 2
C 21 21 21 21 21 21 21 21 17 17 21 21 00 00 00 00
D 00 00 00 00 23 23 23 23 23 23 23 23 23 23 00 00
E 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
F 24 24 24 24 00 00 00 00 00 00 00 00 24 24 24 24
************************ SYMBOL EXPLANATIONS ******
00 UNKNOWN                            UNDEF
01 PARALLEL BLOCK MULTIPLEX           BLOCK
02 PARALLEL BYTE MULTIPLEX            BYTE
03 ESCON POINT TO POINT               CNC_P
04 ESCON SWITCHED OR POINT TO POINT   CNC_?
05 ESCON SWITCHED POINT TO POINT      CNC_S
06 ESCON PATH TO A BLOCK CONVERTER    CVC
07 NATIVE INTERFACE                   NTV
08 CTC POINT TO POINT                 CTC_P
09 CTC SWITCHED POINT TO POINT        CTC_S
0A CTC SWITCHED OR POINT TO POINT     CTC_?
0B COUPLING FACILITY SENDER           CFS
0C COUPLING FACILITY RECEIVER         CFR
0D UNKNOWN                            UNDEF
0E UNKNOWN                            UNDEF
0F ESCON PATH TO A BYTE CONVERTER     CBY
...
1A FICON POINT TO POINT               FC
1B FICON SWITCHED                     FC_S
1C FICON TO ESCON BRIDGE              FCV
1D FICON INCOMPLETE                   FC_?
1E DIRECT SYSTEM DEVICE               DSD
1F EMULATED I/O                       EIO
20 RESERVED                           UNDEF
21 INTEGRATED CLUSTER BUS PEER        CBP
22 COUPLING FACILITY PEER             CFP 3
23 INTERNAL COUPLING PEER             ICP 4
24 INTERNAL QUEUED DIRECT COMM        IQD
25 FCP CHANNEL                        FCP
NA INFORMATION NOT AVAILABLE
Figure 2-5 Display all CHPs
You can also display specific CHPIDs to learn their details, as shown in Figure 2-6.
D M=CHP(BE)
IEE593I CHANNEL PATH BE HAS NO OWNERS
IEE174I 20.17.48 DISPLAY M 660
CHPID BE: TYPE=22, DESC=COUPLING FACILITY PEER, ONLINE
Figure 2-6 Display CFP-type channel
Figure 2-6 indicates that even though the channel is online, it is not in use. (We would not
really expect it to be in use in this configuration, because all of the systems are on the same CEC.)
Figure 2-7 shows the ICP type channel 0E, which was established previously. This is the type
that we would expect to be in use in this exercise.
D M=CHP(0E)
IEE174I 20.20.05 DISPLAY M 926
CHPID 0E: TYPE=23, DESC=INTERNAL COUPLING PEER, ONLINE
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC1
                  PARTITION: 00 CPCID: 00
                  NAMED FACIL01
                  CONTROL UNIT ID: 0309
SENDER PATH    PHYSICAL   LOGICAL   CHANNEL TYPE
 0E            ONLINE     ONLINE    ICP
Figure 2-7 Display ICP channel 0E
The second way to obtain information is to issue a D CF command for the CF you want to
know about, as shown in Figure 2-8. The CF names are shown in Figure 2-4 on page 16.
D CF,CFNAME=FACIL01
...
SENDER PATH    PHYSICAL   LOGICAL   CHANNEL TYPE
 09            ONLINE     ONLINE    ICP
 0E            ONLINE     ONLINE    ICP
Figure 2-8 Display sender paths for FACIL01
For a display of only the structures that are currently allocated, use the
D XCF,STR,STAT=ALLOC command.
D XCF,STR
IXC359I 19.09.14 DISPLAY XCF 536
STRNAME             ALLOCATION TIME      STATUS                     TYPE
CIC_DFHLOG_001      06/21/2007 01:47:54  ALLOCATED                  LIST
CIC_DFHSHUNT_001    06/21/2007 01:47:56  ALLOCATED                  LIST
CIC_GENERAL_001     --                   NOT ALLOCATED
D#$#_GBP1           06/20/2007 04:11:05  ALLOCATED (NEW)            CACHE
                                         DUPLEXING REBUILD
                                         METHOD: USER-MANAGED
                                         PHASE: DUPLEX ESTABLISHED
D#$#_GBP1           06/20/2007 04:11:01  ALLOCATED (OLD)            CACHE
                                         DUPLEXING REBUILD
D#$#_GBP32K1        --                   NOT ALLOCATED
D#$#_LOCK1          06/20/2007 03:32:17  ALLOCATED (NEW)            LOCK
                                         DUPLEXING REBUILD
                                         METHOD: SYSTEM-MANAGED
                                         PHASE: DUPLEX ESTABLISHED
D#$#_LOCK1          06/20/2007 03:32:15  ALLOCATED (OLD)            LOCK
                                         METHOD: SYSTEM-MANAGED
                                         PHASE: DUPLEX ESTABLISHED
D#$#_SCA            06/20/2007 03:32:12  ALLOCATED (NEW)            LIST
                                         DUPLEXING REBUILD
                                         METHOD: SYSTEM-MANAGED
                                         PHASE: DUPLEX ESTABLISHED
D#$#_SCA            06/20/2007 03:32:10  ALLOCATED (OLD)            LIST
                                         DUPLEXING REBUILD
DFHCFLS_#@$CFDT1    06/21/2007 01:47:27  ALLOCATED                  LIST
I#$#EMHQ            --                   NOT ALLOCATED
I#$#LOCK1           --                   NOT ALLOCATED
IGWCACHE1           --                   NOT ALLOCATED
IGWLOCK00           06/16/2007 06:36:16  ALLOCATED                  LOCK
IRRXCF00_B001       06/18/2007 03:43:29  ALLOCATED                  CACHE
ISGLOCK             06/18/2007 03:43:12  ALLOCATED                  LOCK
ISTGENERIC          06/16/2007 06:36:26  ALLOCATED                  SLIST
IXC_DEFAULT_1       06/18/2007 03:43:00  ALLOCATED                  LIST
JES2CKPT_1          --                   NOT ALLOCATED
LOG_FORWARD_001     --                   NOT ALLOCATED
LOG_SA390_MISC      --                   NOT ALLOCATED
. . .
Figure 2-9 Display XCF structures
If you need more detail about a specific structure, for example the ISGLOCK structure, issue
the D XCF,STR,STRNAME=name command as shown in Figure 2-10.
D XCF,STR,STRNAME=ISGLOCK
IXC360I 02.32.08 DISPLAY XCF 493
STRNAME: ISGLOCK
STATUS: ALLOCATED
EVENT MANAGEMENT: POLICY-BASED
TYPE: LOCK
POLICY INFORMATION:
POLICY SIZE
: 8704 K
POLICY INITSIZE: 8704 K
POLICY MINSIZE : 0 K
FULLTHRESHOLD : 80
ALLOWAUTOALT : NO
REBUILD PERCENT: 1
DUPLEX
: DISABLED
ALLOWREALLOCATE: YES
PREFERENCE LIST: FACIL02 FACIL01
ENFORCEORDER : NO
EXCLUSION LIST IS EMPTY
ACTIVE STRUCTURE
---------------ALLOCATION TIME: 06/18/2007 03:43:12
CFNAME
: FACIL02
COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC2
PARTITION: 00
CPCID: 00
ACTUAL SIZE
: 8704 K
STORAGE INCREMENT SIZE: 256 K
LOCKS:
TOTAL:
1048576
PHYSICAL VERSION: C0C39A21 7B9444C5
LOGICAL VERSION: C0C39A21 7B9444C5
SYSTEM-MANAGED PROCESS LEVEL: 8
XCF GRPNAME
: IXCLO007
DISPOSITION
: DELETE
1
ACCESS TIME
: 0
MAX CONNECTIONS: 32
# CONNECTIONS : 3
CONNECTION NAME
---------------ISGLOCK##@$1
ISGLOCK##@$2
ISGLOCK##@$3
ID
-03
02
01
VERSION
-------00030067
00020060
0001008D
SYSNAME
-------#@$1
#@$2
#@$3
JOBNAME
-------GRS
GRS
GRS
ASID
---0007
0007
0007
STATE
-------ACTIVE
ACTIVE
ACTIVE
The information displayed following the ACTIVE STRUCTURE line represents the actual
structure. This shows which CF the structure is currently allocated in, the structure size,
which address spaces are connected to it, and so on.
In the command output, the disposition 1 of DELETE is of particular interest. This specifies
that when the final user of this structure shuts down cleanly, the structure will be deleted. The
next time an address space that uses this structure tries to connect to it, the structure will be
allocated again, using information from the CFRM policy in most cases.
The extended version of this command, D XCF,STR,STRNAME=strname,CONNAME=ALL, provides
all the information shown in Figure 2-10 on page 21, as well as information unique to each
connector to the structure. This information can help you determine whether the structure
connectors support functions such as User-Managed Duplexing, System-Managed Rebuild,
System-Managed Duplexing, and so on.
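For example, a typical invocation for the ISGLOCK structure shown in Figure 2-10 would be:

D XCF,STR,STRNAME=ISGLOCK,CONNAME=ALL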
2.2.6 Determining which structures are in the CF
The output of the D XCF,CFNM=ALL command includes the structures currently allocated in
each CF, for example:

D#$#_GBP0(NEW)
D#$#_SCA(OLD)
DFHXQLS_#@$STOR1
SYSTEM_OPERLOG
D#$#_GBP1(NEW)
DFHCFLS_#@$CFDT1
IRRXCF00_P001
This command also shows information about which systems are connected to this CF.
Note: In case of a CF failure, the information in the output from this command represents
the CF contents at the time of the failure. Normally, structures will automatically rebuild
from a failed CF to an alternate.
If you issue this command before the failed CF is brought online again, you will see that
some structures are listed as being in both CFs. After the failed CF comes online, it
communicates with z/OS to verify which structures are still in the CF (normally, the CF
would be empty at this point), and this information will be updated at that time.
2.2.7 Determining which Couple Data Sets are in use
The D XCF,COUPLE command displays XCF and Couple Data Set information. The first part of
the output shows the XCF values in effect on the system where the command was issued:

D XCF,COUPLE
IXC357I 02.41.07 DISPLAY XCF 510
SYSTEM #@$3 DATA
 INTERVAL 1   OPNOTIFY   MAXMSG   CLEANUP 2   RETRY   CLASSLEN
       85           88     2000         15       10        956
 SSUM ACTION 3   SSUM INTERVAL   WEIGHT   MEMSTALLTIME
         N/A               N/A      N/A            N/A
2.2.8 Determining which XCF signalling paths are defined and available
For the XCF function on the different members of the sysplex to be able to communicate with
each other, some method of connecting the systems must be defined. These communication
paths are known as XCF signalling resources.
The D XCF,PATHIN/PATHOUT commands provide information for only those devices and
structures that are defined to the system where the commands are entered (in this example,
#@$3).
To obtain information about the inbound paths, enter D XCF,PI as shown in Figure 2-13.
D XCF,PI
IXC355I 03.07.43 DISPLAY XCF 546
PATHIN FROM SYSNAME: #@$1
   STRNAME: IXC_DEFAULT_1  IXC_DEFAULT_2
PATHIN FROM SYSNAME: 1 ???????? - PATHS NOT CONNECTED TO OTHER SYSTEMS
   STRNAME: IXC_DEFAULT_1  IXC_DEFAULT_2
Figure 2-13 Display inbound signalling paths
To obtain information about the outbound paths, enter D XCF,PO as shown in Figure 2-14
on page 25.
D XCF,PO
IXC355I 03.09.40 DISPLAY XCF 550
PATHOUT TO SYSNAME: #@$1
   STRNAME: IXC_DEFAULT_1  IXC_DEFAULT_2
PATHOUT TO SYSNAME: 1 ???????? - PATHS NOT CONNECTED TO OTHER SYSTEMS
   STRNAME: IXC_DEFAULT_1  IXC_DEFAULT_2
Figure 2-14 Display outbound signalling paths
As shown, there is one path not connected to another system. A likely reason for this is that
the target system (#@$2) may not be active at the time the display was done.
For a more detailed display, issue either D XCF,PI,DEV=ALL or D XCF,PO,DEV=ALL.
$DCKPTDEF
$HASP829 CKPTDEF 489
$HASP829 CKPTDEF CKPT1=(STRNAME=JES2CKPT_1,INUSE=YES, 1
$HASP829         VOLATILE=YES),CKPT2=(DSNAME=SYS1.JES2.CKPT2, 2
$HASP829         VOLSER=#@$#M1,INUSE=YES,VOLATILE=NO),
$HASP829         NEWCKPT1=(DSNAME=,VOLSER=),NEWCKPT2=(DSNAME=,
$HASP829         VOLSER=),MODE=DUPLEX,DUPLEX=ON,LOGSIZE=7,
$HASP829         VERSIONS=(STATUS=ACTIVE,NUMBER=2,WARN=80,
$HASP829         MAXFAIL=0,NUMFAIL=0,VERSFREE=2,MAXUSED=1),
$HASP829         RECONFIG=NO,VOLATILE=(ONECKPT=WTOR,
$HASP829         ALLCKPT=WTOR),OPVERIFY=NO
Figure 2-16 Display JES2 checkpoint definitions
As shown in Figure 2-18, the recovery for this is to first issue a $D MASDEF command. This is
best done with the RO *ALL option, for a display of all members.
$D MASDEF
$HASP843 MASDEF 604
$HASP843 MASDEF OWNMEMB=#@$1,AUTOEMEM=ON 1,CKPTLOCK=ACTION,
$HASP843        COLDTIME=(2006.164,19:53:14),COLDVRSN=z/OS 1.4,
$HASP843        DORMANCY=(0,100),HOLD=0,LOCKOUT=1000,
$HASP843        RESTART=YES 2,SHARED=CHECK,SYNCTOL=120,
$HASP843        WARMTIME=(2007.192,03:11:03),XCFGRPNM=XCFJES2A,
$HASP843        QREBUILD=0
Figure 2-18 Display JES2 MASDEF
As shown, the AUTOEMEM parm is set to ON 1 and RESTART 2 is set to YES. In this case,
it should auto-recover.
If AUTOEMEM were OFF (or if it were set to ON but the RESTART parm was set to NO), then
the operator should issue the command $E CKPTLOCK,HELDBY=sysname, an example of which
is shown in Figure 2-19.
$E CKPTLOCK,HELDBY=#@$1
Figure 2-19 Release JES2 checkpoint lock
This removes the lock on the checkpoint data set held by the identified system, #@$1.
Important: Do not confuse this message:
$HASP263 WAITING FOR ACCESS TO JES2 CHECKPOINT
with this message:
$HASP264 WAITING FOR RELEASE OF JES2 CKPT LOCK BY sysname
D C,A,CA
IEE890I 03.54.03
NAME     ID  STATUS
#@$1M01  13  ACTIVE
#@$2M01  11  ACTIVE
#@$3M01  01  ACTIVE
Note: Starting with z/OS 1.8, it is no longer necessary (or possible) to have a single
sysplex Master console, although you can still have multiple consoles that have master
authority.
D GRS
ISG343I 19.40.20 GRS STATUS 015
SYSTEM   STATE        SYSTEM   STATE
#@$1     CONNECTED    #@$2     CONNECTED
#@$3     CONNECTED
GRS STAR MODE INFORMATION 1
LOCK STRUCTURE (ISGLOCK) CONTAINS 1048576 LOCKS.
THE CONTENTION NOTIFYING SYSTEM IS #@$3
SYNCHRES: YES
Figure 2-21 Display GRS information
This information 1 indicates that GRS is operating in Star mode. The GRS lock structure
(which must be called ISGLOCK) contains some number of lock entries. Note that the number
of lock entries in a GRS structure must always be a power of 2, meaning that if you want to
increase the size of the structure, the size must be doubled each time. This lock information
only appears when GRS is in Star mode.
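As an illustration of such a size change, one hedged sequence (the policy name NEWPOL is hypothetical) is for the systems programmer to define a CFRM policy with the doubled size, after which the operator activates the policy and rebuilds the structure:

SETXCF START,POLICY,TYPE=CFRM,POLNAME=NEWPOL
SETXCF START,REBUILD,STRNAME=ISGLOCK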
If the device is reserved by another system, message IOS431I might follow; it identifies the
system holding the reserve.
For information about a specific device enter D GRS,DEV=devno, as shown in Figure 2-23.
Using this information, you can see which job is causing the reserve. You can decide if that
job should be allowed to continue, or if it is experiencing problems and should be cancelled.
D GRS,DEV=1D06
DEVICE:1D06 VOLUME:#@$#X1 RESERVED BY SYSTEM #@$3
S=SYSTEMS MVSRECVY ES3090.RNAME1
SYSNAME  JOBNAME   ASID  TCBADDR   EXC/SHR    STATUS
#@$3     RESERVE   001A  007E4B58  EXCLUSIVE  OWN
Figure 2-23 Display GRS information for a specific device
You may also see that a device has a reserve against it if you issue a DEVSERV command for
that device. Figure 2-24 on page 30 shows an example where the DEVSERV command has
been issued for a device that currently has a reserve.
DS P,1D00
IEE459I 20.23.09 DEVSERV PATHS 060
UNIT DTYPE  M CNT VOLSER  CHPID=PATH STATUS
     RTYPE  SSID CFW TC DFW PIN DC-STATE CCA DDC ALT
1D00,33903 ,A,023,#@$#M1,5A=R 5B=R 5C=R 5D=R
     2105   8981 Y   YY. YY. N   SIMPLEX  32  32
************************ SYMBOL DEFINITIONS *****************
A = ALLOCATED               R = PATH AVAILABLE AND RES
Figure 2-24 Devserv on paths of reserved device
The reserve also shows up with the D U command shown in Figure 2-25.
D U,,,1D00,1
IEE457I 20.21.59 UNIT STATUS 047
UNIT  TYPE  STATUS  VOLSER  VOLSTATE
1D00  3390  A  -R   #@$#M1  PRIV/RSDNT
Figure 2-25 Display unit status of reserved device
Even if a device has a reserve against it, that is not necessarily a problem. Be aware,
however, that no other system will be able to update a data set on a volume that has a
reserve against it, so reserves that impact another job for a long time should be investigated.
D GRS,RES=(*,EXAMPLE1.XX)
ISG343I 00.12.25 GRS STATUS 334
S=SYSTEMS SYSDSN EXAMPLE1.XX
SYSNAME  JOBNAME    ASID  TCBADDR   EXC/SHR    STATUS
#@$3     SAMPJOB1   001A  007FF290  EXCLUSIVE  OWN 1
#@$3     SAMPJOB2   001F  007FF290  SHARE      WAIT 2
Another option is to use a variation of the D GRS,ANALYZE command, which will provide
information about the root cause of any contention you may be encountering. We recommend
that you become familiar with the use of this command, so that you can quickly use it to
diagnose any contention problems that may arise.
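For example, the following variations identify the top blockers, the units of work waiting longest, and the dependency chains between them:

D GRS,ANALYZE,BLOCKER
D GRS,ANALYZE,WAITER
D GRS,ANALYZE,DEPENDENCY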
ETR mode
The 9037s provide the setting and synchronization for the TOD clocks of the CEC or multiple
CECs. The IPLing system determines the timing mode from the CEC. Each CEC should be
connected to two 9037s, thus providing the ability to continue operating even if one 9037 fails.
Figure 2-29 shows the status of the two ETR ports on the CEC. In this display, the ETR NET
ID of both 9037s is the same; only the port numbers and ETR ID differ. The display shows
which 9037 is currently being used for the time synchronization signals. If that 9037 or the
connection to it were to fail, the CEC will automatically switch to the backup.
D ETR
IEA282I 23.38.48 TIMING STATUS 550
SYNCHRONIZATION MODE = ETR
CPC PORT 0 <== ACTIVE      CPC PORT 1
OPERATIONAL                OPERATIONAL
ENABLED                    ENABLED
ETR NET ID=01              ETR NET ID=01
ETR PORT=01                ETR PORT=02
ETR ID=00                  ETR ID=01
Figure 2-29 Display Sysplex Timer ETR
STP mode
Server Time Protocol is the logical replacement for 9037s. STP is a message-based protocol
in which timekeeping information is passed over data links between CECs.
STP must run in a Coordinated Timing Network (CTN). As with the 9037, the same network ID
must be used by all systems that are to have synchronized times.
This network can be configured as STP-only, where all CECs use only STP, or the network
can be configured as Mixed. A Mixed network uses both STP and 9037s. In a Mixed CTN, the
9037 still controls the time for the whole sysplex.
Figure 2-30 shows the response from a system in STP mode.
D ETR
SYNCHRONIZATION MODE = STP
THIS SERVER IS A STRATUM 1
CTN ID = ISCTEST
THE STRATUM 1 NODE ID = 002084.C24.IBM.02.000000046875
THIS IS THE PREFERRED TIME SERVER
THIS STP NETWORK HAS NO SERVER TO ACT AS ARBITER
Figure 2-30 Display Sysplex Timer with STP
Using D XCF,S,ALL
Another way to see the timing mode of each system in the sysplex is to issue the D XCF,S,ALL
command. This will show TM=ETR, TM=SIMETR, or TM=STP.
For more information about STP, refer to Server Time Protocol Planning Guide, SG24-7280.
To route commands to multiple systems in your sysplex, there are several options available:
Two system names can be enclosed in parentheses ( ), as shown in Figure 2-35 on
page 35. This command was issued on system #@$3, and shows the response from system
#@$1 1 and system #@$2 2.
A group name can be defined by the system programmer in IEEGSYS in SYS1.SAMPLIB
with a combination of any desired system names.
A combination of group and system names can be used, enclosed in parentheses.
RO (#@$1,#@$2),D TS,L
IEE421I RO (LIST),D TS,L 807
#@$1 1 RESPONSES -----------------------------------------------
IEE114I 21.59.40 2007.175 ACTIVITY 107
JOBS   M/S    TS USERS  SYSAS  INITS  ACTIVE/MAX VTAM  OAS
00004  00017  00002     00032  00016  00002/00030      00004
SMITH  OWT
JONES  OWT
#@$2 2 RESPONSES -----------------------------------------------
IEE114I 21.59.40 2007.175 ACTIVITY 623
JOBS   M/S    TS USERS  SYSAS  INITS  ACTIVE/MAX VTAM  OAS
00001  00010  00001     00032  00016  00002/00030      00004
SOUTH  OWT
NORTH  OWT
Figure 2-35 Route to a group of systems
To route commands to all other systems in your sysplex, use the *OTHER parm as shown in
Figure 2-36. This command was issued on #@$1, so the response shows systems #@$2 and #@$3.
RO *OTHER,D TS,L
#@$2 RESPONSES -------------------------------------------------
IEE114I 21.49.13 2007.175 ACTIVITY 135
JOBS   M/S    TS USERS  SYSAS  INITS  ACTIVE/MAX VTAM  OAS
00001  00011  00001     00032  00016  00002/00030      00004
SOUTH  OWT
NORTH  OWT
#@$3 RESPONSES -------------------------------------------------
IEE114I 21.49.13 2007.175 ACTIVITY 287
JOBS   M/S    TS USERS  SYSAS  INITS  ACTIVE/MAX VTAM  OAS
00001  00014  00004     00032  00021  00002/00030      00005
EAST   OWT
WEST   OWT
Figure 2-36 Route to all other systems
There is also a shortcut. If IEECMDPF (an IBM-supplied sample program in
SYS1.SAMPLIB) was run at IPL time, it defines the system name as a command prefix that
can substitute for the ROUTE command on each system.
For example, the following commands have the same effect on each system in the sysplex:
ROute #@$1,command
#@$1command
Multi-Access Spool
This SDSF Multi-Access Spool (MAS) panel displays the members of the Multi-Access Spool,
as shown in Figure 2-39.
SDSF MAS DISPLAY #@$1 XCFJES2A 79% SPOOL
COMMAND INPUT ===>
PREFIX=* DEST=(ALL) OWNER=* SYSNAME=*
NP  NAME  Status  SID  PrevCkpt  Hold  Dormancy  ActDorm  SyncT
    #@$1  ACTIVE    1      0.58     0   (0,100)      100      1
    #@$2  ACTIVE    2      0.63     0   (0,100)      100      1
    #@$3  ACTIVE    3      0.76     0   (0,100)      101      1
Figure 2-39 MAS display
Job classes
The SDSF JC command displays the JES-managed and WLM-managed job classes, as
shown in Figure 2-40.
JOB CLASS DISPLAY ALL CLASSES
COMMAND INPUT ===>
PREFIX=* DEST=(ALL) OWNER=* SYSNAME=*
NP  CLASS  Status   Mode  Wait-Cnt  Xeq-Cnt  QHld  Hold
    A      NOTHELD  JES                       NO    NO
    B      NOTHELD  JES                       NO    NO
    C      NOTHELD  JES                       NO    NO
    D      NOTHELD  WLM                       NO    NO
    E      NOTHELD  WLM                       NO    NO
Figure 2-40 Job class display
WLM resources
The SDSF RES command displays the WLM-managed resources, as shown in Figure 2-41.
SDSF RESOURCE DISPLAY MAS SYSTEMS
COMMAND INPUT ===>
PREFIX=* DEST=(ALL) OWNER=* SYSNAME=*
NP  RESOURCE     #@$1   #@$2   #@$3
    CB390ELEM    RESET  RESET  RESET
    DB2_PROD     RESET  RESET  RESET
    PRIME_SHIFT  RESET  RESET  RESET
Figure 2-41 WLM resources display
Chapter 3. IPLing systems in a Parallel Sysplex
The descriptions in this chapter follow the system to the point where the z/OS initialization is
completed and the system is ready for the restart of subsystems and their workload. For
details of these stages, refer to this book's chapters on specific subsystems.
Before IPLing:
To avoid receiving messages IXC404I and IXC405D (indicating that other systems are
already active in the sysplex), the first system IPLed back into the sysplex should
preferably also have been the last one removed from the sysplex.
We do not recommend that you IPL additional systems at the same time as the first
system IPL. Wait until GRS initializes, as indicated by message ISG188I (Ring mode)
or message ISG300I (Star mode).
We also recommend that you try to avoid IPLing multiple systems from the same
physical sysres at the same time, to avoid possible sysres contention.
After the load has been performed, z/OS runs through four stages during initialization. These
stages are:
Nucleus Initialization Program (NIP)
Acquiring the Time Of Day (TOD) from the CEC TOD clock, and verifying that the CEC is
in the time mode indicated in the CLOCKxx member
z/OS joining the Parallel Sysplex
XCF initialization
Coupling Facility (CF) connection
Global Resource Serialization (GRS) initialization
Console initialization
The z/OS part of the IPL is complete when message IEE389I informs you that z/OS
command processing is available. At this point, the operator can concentrate on the restart of
the subsystems such as JES2, IMS, DB2, and CICS. It is preferable to use an automated
operations package to perform some of this activity.
3.3 IPLing the first system image (the last one out)
This section describes how the first image, in our example #@$1, is IPLed into a Parallel
Sysplex called #@$#PLEX. This means that no other system images are active in the Parallel
Sysplex prior to this IPL taking place. Prior to the IPL, all systems were stopped in an orderly
manner, with #@$1 being the last system to be stopped.
Note: As previously noted, IPLing the first system in a Parallel Sysplex should not be done
concurrently with other systems.
The description follows the sequence of events from the processor load through to the
completion of z/OS initialization.
The messages seen will depend on the second-to-last character of the load parameter specified
during IPL. This character is the Initial Message Suppression Indicator (IMSI). It can be coded to
suppress most informational messages and to skip prompting for system parameters.
Figure 3-1 shows the NIP messages related to system #@$1 being IPLed.
IEA371I SYS0.IPLPARM ON DEVICE 1D00 SELECTED FOR IPL PARAMETERS
IEA246I LOAD ID SS SELECTED
IEA246I NUCLST ID $$ SELECTED
IEA519I IODF DSN = IODF.IODF59 1
IEA520I CONFIGURATION ID = TRAINER . IODF DEVICE NUMBER = 1D00
IEA528I IPL IODF NAME DOES NOT MATCH IODF NAME IN HARDWARE TOKEN
SYS6.IODF07
IEA091I NUCLEUS 1 SELECTED
IEA093I MODULE IEANUC01 CONTAINS UNRESOLVED WEAK EXTERNAL REFERENCE
IECTATEN
IEA370I MASTER CATALOG SELECTED IS MCAT.V#@$#M1
IST1096I CP-CP SESSIONS WITH USIBMSC.#@$1M ACTIVATED
IEE252I MEMBER IEASYMFK FOUND IN SYS1.PARMLIB
IEA008I SYSTEM PARMS FOLLOW FOR z/OS 01.07.00 HBB7720 013 2
IEASYSFK
IEE252I MEMBER IEASYS00 FOUND IN SYS1.PARMLIB
IEE252I MEMBER IEASYSFK FOUND IN SYS1.PARMLIB
IEA007I STATIC SYSTEM SYMBOL VALUES 018 3
&SYSALVL. = "2"
&SYSCLONE. = "$1"
&SYSNAME. = "#@$1"
&SYSPLEX. = "#@$#PLEX"
&SYSR1. = "#@$#R1"
&BPXPARM. = "FS"
&CICLVL. = "V31LVL1"
&CLOCK. = "VM"
&COMMND. = "00"
&LNKLST. = "C0,C1"
&LPALST. = "00,L"
&MQSLVL1. = "V60LVL1"
&OSREL. = "ZOSR17"
&SMFPARM. = "00"
&SSNPARM. = "00"
&SYSID1. = "1"
&SYSNAM. = "#@$1"
&SYSR2. = "#@$#R2"
&VATLST. = "00"
&VTAMAP. = "$1"
IFB086I LOGREC DATA SET IS SYS1.#@$1.LOGREC 045
IEE252I MEMBER GRSCNF00 FOUND IN SYS1.PARMLIB
IEE252I MEMBER GRSRNL02 FOUND IN SYS1.PARMLIB
IEA940I THE FOLLOWING PAGE DATA SETS ARE IN USE:
PLPA ........... - PAGE.#@$1.PLPA
COMMON ......... - PAGE.#@$1.COMMON
LOCAL .......... - PAGE.#@$1.LOCAL1 .....
Figure 3-1 IPL NIP phase display
The messages in Figure 3-1 on page 42 include information that is valuable to the operator:
1 The IODF data set name (and, below it, the IODF device number)
2 The version of z/OS that is being IPLed
3 The system symbols and their values
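If you need to review the symbol values after the IPL has completed, the DISPLAY SYMBOLS command produces the same IEA007I display as shown in the figure; a quick sketch:
D SYMBOLS
IEA007I STATIC SYSTEM SYMBOL VALUES
&SYSCLONE. = "$1"
&SYSNAME. = "#@$1"
&SYSPLEX. = "#@$#PLEX"
...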
The usual z/OS library loading and concatenation messages then follow; Figure 3-2 shows an
example of the messages displayed during this phase. The usual pause in the IPL message
flow follows at this point, while the LPA is built.
IEE252I MEMBER LPALST00 FOUND IN SYS1.PARMLIB
IEA713I LPALST LIBRARY CONCATENATION
SYS1.LPALIB
Figure 3-2 IPL LPA library concatenation
When XCF initialization is complete, message IXC418I is issued, as shown in Figure 3-5.
This indicates that the system is now part of the named sysplex.
IXC418I SYSTEM #@$1 IS NOW ACTIVE IN SYSPLEX #@$#PLEX
Figure 3-5 System active in sysplex
Note: The IXC418I message is easy to miss because it occurs around the same time as
the PATHIN and PATHOUT activity, which can generate a large number of messages.
When the other systems are IPLed and start their end of the paths, communication will be
possible with other systems in the Parallel Sysplex.
The Couple Data Set initialization messages show each data set being ADDED AS THE
PRIMARY or ADDED AS THE ALTERNATE.
When the SFM CDS is added, if the SFM policy is active, it generates messages IXC602I,
IXC609I, and IXC601I. These indicate which policy is loaded and which attributes are used.
These messages are shown in Figure 3-8.
IXC602I SFM POLICY SFM01 INDICATES FOR SYSTEM #@$1 A STATUS 060
UPDATE MISSING ACTION OF ISOLATE AND AN INTERVAL OF 0 SECONDS.
THE ACTION WAS SPECIFIED FOR THIS SYSTEM.
IXC609I SFM POLICY SFM01 INDICATES FOR SYSTEM #@$1 A SYSTEM WEIGHT OF
80 SPECIFIED BY SPECIFIC POLICY ENTRY
IXC601I SFM POLICY SFM01 HAS BEEN MADE CURRENT ON SYSTEM #@$1
Figure 3-8 Adding SFM policy
During the process of starting XCF, XCF detects that you are IPLing this system as the first in
the sysplex, and verifies with each CF that the CF contains the same structures that the
CFRM indicates are present in the CF. This process is known as reconciliation. This activity,
shown in Figure 3-9, generates messages IXC504I, IXC505I, IXC506I and IXC507I, as
appropriate.
This happens regardless of whether the CFs were restarted. As additional systems are IPLed
into the sysplex, they detect that there are active systems and bypass this step.
IXC504I INCONSISTENCIES BETWEEN COUPLING FACILITY NAMED FACIL01 451
AND THE CFRM ACTIVE POLICY WERE FOUND.
THEY HAVE BEEN RESOLVED.
...
IXC505I STRUCTURE JES2CKPT_1 IN 442
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
PARTITION: 00
CPCID: 00
NOT FOUND IN COUPLING FACILITY. CFRM ACTIVE POLICY CLEARED.
...
TRACE THREAD: 00022E92.
IXC507I CLEANUP FOR 452
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
PARTITION: 00
CPCID: 00
HAS COMPLETED.
TRACE THREAD: 00022E92.
Figure 3-9 CFRM initialization
When the reconciliation process completes, the system is able to use each of the CFs, as
confirmed by message IXC517I shown in Figure 3-10.
IXC517I SYSTEM #@$1 ABLE TO USE 123
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC1
PARTITION: 00
CPCID: 00
NAMED FACIL01
Figure 3-10 CF connection confirmation
While the Couple Data Sets are added, any requested allocations of structures will also take
place. This is indicated with the IXL014I and IXL015I messages, as shown in Figure 3-11 on
page 47.
These messages may occur for z/OS components such as XCFAS, ALLOCAS, and
IXGLOGR, depending on which functions your installation is exploiting. The messages tell
you in which CF the structure was allocated and why.
CONSOLE activation
The CONSOLxx member is processed next. Using the information obtained in the member,
Multiple Console Support (MCS) is activated. The system initializes the system console as an
extended console. See Figure 3-13 for the type of messages you can expect to see during
this processing.
Note: Starting with z/OS 1.8, the sysplex master console is no longer supported or
required.
Figure 3-13 Console initialization messages (IEE252I, IEA630I, IEE828E, and IEA549I,
showing LU=#@$1, LU=*SYSLG$1, and LU=ROUTEALL)
For more information about consoles, refer to Chapter 14, Managing consoles in a Parallel
Sysplex on page 283.
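At any point after initialization, console status can be checked from a console. A sketch of two useful commands (both are standard, each with optional filtering operands not shown here); D C displays MCS console status, and D EMCS summarizes the extended MCS consoles:
D C
D EMCS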
RACF initialization
The next step in the IPL process occurs as RACF starts using its databases. The system
programmers should have customized the RACF data set name table module to indicate
whether RACF sysplex communication is to be enabled. If sysplex communication is enabled,
RACF will automatically use the RACF structures in the CF to share the database between
the systems that are sharing that RACF database.
The RACF data-sharing operation mode in the Parallel Sysplex is indicated by the messages
shown in Figure 3-14. Note that some installations may use a security product other than
RACF.
ICH559I MEMBER #@$1 ENABLED FOR SYSPLEX COMMUNICATIONS
Figure 3-14 RACF initialization
3.4 IPLing the first system image (not the last one out)
This section describes the IPL process when the first system to be IPLed in the sysplex is not
the last one to be stopped when the sysplex was shut down. This means that no other system
images are active in the Parallel Sysplex prior to this IPL taking place.
During initialization, XCF checks the CDS for residual entries for systems other than the one
being IPLed. In this situation, it finds them.
For example, if #@$1 was the last to leave the sysplex, and you IPL system #@$2 first, XCF
checks for systems other than the one being IPLed. If it finds #@$1 or #@$3 (which it does,
in this scenario), then it issues IXC404I, which lists the system names in question, and follows
it with IXC405D.
The procedure will depend on what happened before the IPL, and how the system or systems
left the sysplex.
If there was, for instance, an unplanned power outage and all systems failed at the same
time, then upon the first IPL of any system, IXC404I and IXC405D are issued.
Note: As previously mentioned, IPLing the first system in a Parallel Sysplex should not be
done concurrently with other systems. The cleanup of the sysplex CDSs and CF structures
is disruptive to other systems. Only the IPL of additional systems into the sysplex can run
concurrently, and we do not recommend having them in NIP at the same time.
Use the I option to request that sysplex initialization continue because none of the
systems identified in message IXC404I are in fact participating in an operating
sysplex; that is, they are all residual systems. This system will perform cleanup of old
sysplex data, initialize the Couple Data Set, and start a new sysplex. If any of the
systems identified in message IXC404I are currently active in the sysplex, they will be
placed into a disabled wait state.
Use the J option to request that this system join the already active sysplex. Choose
this reply if this system belongs in the sysplex with the systems identified in message
IXC404I, despite the fact that some of those systems appear to have out-of-date
system status update times. The initialization of this system will continue.
Use the R option to request that XCF be reinitialized on this system. XCF will stop
using the current Couple Data Sets and issue message IXC207A to prompt the
operator for a new COUPLExx parmlib member.
Choose R also to change the sysplex name and reinitialize XCF to remove any
residual data for this system from the Couple Data Set. The system prompts the
operator for a new COUPLExx parmlib member.
Consult your support staff if necessary. If no other systems are in fact active, you can answer
I to initialize the sysplex. The alternative options (J or R) are only valid with an active sysplex
or to make changes to the XCF parameters.
IXC404I SYSTEM(S) ACTIVE OR IPLING: #@$1
IXC405D REPLY I TO INITIALIZE THE SYSPLEX, J TO JOIN SYSPLEX #@$#PLEX,
OR R TO REINITIALIZE XCF
IEE600I REPLY TO 00 IS;I
Figure 3-16 Initialize the sysplex
If you reply I, the IPL continues and message IXC418I is issued, indicating that this system is
now part of a sysplex; see Figure 3-17.
IXC418I SYSTEM #@$2 IS NOW ACTIVE IN SYSPLEX #@$#PLEX
Figure 3-17 System now in sysplex
The IXC418I message is easy to miss because it occurs at the same time as PATHIN and
PATHOUT activity, which generates a large number of messages.
If you reply J in the preceding scenario, the system being IPLed would join the active
sysplex. However, after it is running, if the time stamps of the other systems do not become
updated, this system may consider them to be in status update missing (SUM) condition,
and may start partitioning the inactive systems out of the sysplex.
Note: Under certain circumstances, the sysplex becomes locked and message IXC420D
might be issued instead of IXC405D. Those circumstances include a disaster
recovery where the CF name has changed; CDS specifications for the IPLing system that do
not appear to match what the current sysplex is using; or Sysplex Timers that do not
appear to be the same.
In these cases, using J to join is not an option. The only choices offered are I to initialize
the sysplex, or R to specify a new COUPLExx.
If the system was removed from XCF since it was stopped or shut down
If the system was removed, then there is no residual presence of this system in the sysplex,
and it will join without incident.
004 IXC102A XCF IS WAITING FOR SYSTEM #@$1 DEACTIVATION. REPLY DOWN
WHEN MVS ON #@$1 HAS BEEN SYSTEM RESET
If the WTOR remains outstanding during this attempted IPL, it means that neither the sysplex
partitioning nor the cleanup of the system that was brought down has been performed. This
partitioning and cleanup will have to be performed before the system can rejoin the sysplex at
IPL.
As shown in Figure 3-19, #@$3 has been varied offline, but the DOWN reply to IXC102A was
not given. When the system is re-IPLed, the IPLing system issues the following IXC203I
message as it tries to join the sysplex (referring to its previous incarnation).
IXC203I #@$3 IS CURRENTLY ACTIVE IN THE SYSPLEX
IXC218I SYSTEM STATUS FOR SYSPLEX #@$#PLEX AT 06/27/2007 01:27:13:
#@$2
01:27:12 ACTIVE
#@$3
01:23:43 BEING REMOVED
IXC214I COUPLE00 IS THE CURRENT COUPLE PARMLIB MEMBER
IXC240I IF XCF-LOCAL MODE INITIALIZATION IS DESIRED, RE-IPL WITH
"PLEXCFG=XCFLOCAL" AND "COUPLE=**"
IXC207A XCF INITIALIZATION IS RESTARTED. RESPECIFY COUPLE SYSTEM
PARAMETER, REPLY COUPLE=XX.
Figure 3-19 Trying to join when already active
The outstanding DOWN reply must be responded to on another system. After replying, wait
for the cleanup. The cleanup activity is highlighted in the IXC105I message text on the
remaining systems, shown in Figure 3-20.
IXC105I SYSPLEX PARTITIONING HAS COMPLETED FOR #@$3
- PRIMARY REASON: OPERATOR VARY REQUEST
- REASON FLAGS: 000004
Figure 3-20 System cleanup after DOWN reply
When IXC105I is issued, reply with your correct COUPLExx member and the IPL will
continue. You can also choose to reinitiate the IPL.
Enter the D XCF,COUPLE command to identify the maximum number of systems specified in
the sysplex. In the information displayed about the primary sysplex CDS, check the
MAXSYSTEM parameter 1 in Figure 3-24.
D XCF,COUPLE
IXC357I 23.30.06 DISPLAY XCF 893
SYSTEM #@$1 DATA
...
SYSPLEX COUPLE DATA SETS
PRIMARY   DSN: SYS1.XCF.CDS01
          VOLSER: #@$#X1   DEVN: 1D06
          FORMAT TOD           MAXSYSTEM  MAXGROUP(PEAK)  MAXMEMBER(PEAK)
          11/20/2002 16:27:24      3 1        100 (52)        203 (18)
Figure 3-24 MAXSYSTEM value
Enter the command D XCF,S,ALL to obtain the number and status of systems in the sysplex.
See Figure 3-25.
D XCF,S,ALL
IXC335I 23.30.28 DISPLAY XCF 898
SYSTEM  TYPE  SERIAL  LPAR  STATUS TIME           SYSTEM STATUS
#@$1    2084  6A3A    N/A   07/01/2007 23:30:27   ACTIVE  TM=SIMETR
#@$2    2084  6A3A    N/A   07/01/2007 23:30:24   ACTIVE  TM=SIMETR
#@$3    2084  6A3A    N/A   07/01/2007 23:30:23   ACTIVE  TM=SIMETR
Figure 3-25 Number and status of systems in the sysplex
The action you take will depend on the cause of the problem, as described in Table 3-1.
Table 3-1 MAXSYSTEM suggested actions
Cause                                    Suggested action
The text should specify the nature of the error, in some cases specifying the column it
occurred in. If the COUPLExx member is the correct one for the IPL, the syntax errors must
be corrected before the IPL can complete.
If the IPLing system is using the same COUPLExx member as the existing system or
systems, then the COUPLExx must have been changed since they were IPLed. It could,
however, be using a different COUPLExx member.
If all systems in the sysplex are sharing their parmlib definitions, the systems programmer
should be able to log on to one of the active systems and correct the definitions from there.
When the definitions have been corrected, respond to the IXC201A WTOR with COUPLE=xx,
where xx is the suffix of the corrected COUPLExx member in the PARMLIB. You may choose
to start the IPL again.
If the problems cannot be corrected from another system, you must IPL the failing system in
XCF-local mode. Before you attempt this, check with the systems programmer. After the
system has completed the IPL, the systems programmer will be able to analyze and correct
the problem.
Note: To IPL in XCF-local mode, we recommend that an installation maintains an alternate
COUPLExx member in the PARMLIB containing the definition COUPLE
SYSPLEX(LOCAL).
When the definitions are corrected, respond to the IXC201A WTOR with COUPLE=xx, where
xx is the suffix of the corrected COUPLExx member in the PARMLIB. You may choose to
start the IPL again.
The error was resolved by the system and the current values were used. The IPL continued
successfully.
IXC406I THIS SYSTEM IS CONNECTED TO ETR NET ID=01. THE OTHER ACTIVE SYS
IN THE SYSPLEX ARE USING ETR NET ID=00.
IXC404I SYSTEM(S) ACTIVE OR IPLING: #@$1
#@$2
IXC419I SYSTEM(S) NOT SYNCHRONIZED: #@$1
#@$2
IXC420D REPLY I TO INITIALIZE SYSPLEX #@$#PLEX, OR R TO REINITIALIZE XC
REPLYING I WILL IMPACT OTHER ACTIVE SYSTEMS.
Figure 3-29 Mismatching SIMETRID
We used SIMETRID in this scenario because all three systems were on the same CEC. We
then commented out the SIMETRID parameter, and message IEA261I was issued, as seen in
Figure 3-30. Under other circumstances, different error messages may be issued.
IEA261I ...
IEA598I ...
IEA888A ...
Figure 3-30 Messages issued with SIMETRID commented out
System initialization stops. The operator and the systems programmer should check the
following areas to establish the cause of the problem.
Has any system in the sysplex issued message IXC451I, indicating invalid signalling
paths?
In the COUPLExx member in PARMLIB:
Are all systems using the same CDS?
Are there any incorrect or missing CF signalling structure definitions?
Do the signalling path definitions in the IPLing system match their corresponding
PATHINs and PATHOUTs in the other systems in the sysplex configuration?
Are the signalling path definitions consistent with the hardware configuration?
Are the CF signalling structures able to allocate the storage they require (check for
IXL013I messages)?
Are there any hardware failures?
The action taken will depend on the cause of the problem.
The exact cause is not obvious. The initial message text states that the sysplex data sets cannot
be used; the reason given is a sysplex name mismatch, saying the name does not match the
one in use. "The one in use" does not refer to the name of the currently running sysplex; it
refers to the name specified by the system attempting to IPL.
The resolution is also not obvious. Correcting the sysplex name and replying to IXC207A with
the COUPLExx member is not effective, because the IPL is past the stage where it picks up the
sysplex name. The IPL has to be re-initiated after correction of the sysplex name. That may
involve, for example, specifying a new name or specifying a new loadparm.
The result is the same if the scenario is reversed; that is, if a STAR tries to join a RING. The
IPL stops and a non-restartable 0A3 wait state is loaded. Correct the parms and re-IPL.
Chapter 4.
6. If IXC102A is issued, perform a hardware system reset on the system being removed from
the sysplex.
7. Reply DOWN to IXC102A.
8. IXC105I will be displayed when system removal is complete.
Assuming each stage completes successfully, the system is now removed from the Parallel
Sysplex. If the shutdown is on the last system in the Parallel Sysplex, the Parallel Sysplex is
shut down completely. However, one or more of the Coupling Facilities may still be active.
This means that SFM will not take part in the shutdown of any of the systems in the sysplex,
and all eight steps in the shutdown overview will be used.
SFM Couple Data Sets and the SFM policy need to be configured by your system programmers.
When SFM needs to be started, issue the SETXCF START,POLICY,TYPE=SFM command.
In the example shown in Figure 4-2, SFM is active. However, this does not tell you which SFM
settings are in effect.
D XCF,POL,TYPE=SFM
IXC364I 20.27.32 DISPLAY XCF 125
TYPE: SFM
POLNAME:      SFM01
STARTED:      07/02/2007 20:21:59
LAST UPDATED: 05/28/2004 13:44:52
SYSPLEX FAILURE MANAGEMENT IS ACTIVE
Figure 4-2 Display with SFM active
SSUM INTERVAL  CLEANUP 2  RETRY  WEIGHT  CLASSLEN  MEMSTALLTIME
0              15         10     1       956       NO
...
Figure 4-3 Display SFM settings
D XCF,S,ALL
IXC335I 19.00.10 DISPLAY XCF 491
SYSTEM  TYPE  SERIAL  LPAR  STATUS TIME           SYSTEM STATUS
#@$3    2084  6A3A    N/A   06/21/2007 19:00:10   ACTIVE  TM=SIMETR
#@$2    2084  6A3A    N/A   06/21/2007 19:00:06   ACTIVE  TM=SIMETR
#@$1    2084  6A3A    N/A   06/21/2007 19:00:07   ACTIVE  TM=SIMETR
z/OS closure
When all the subsystems and applications have been shut down, the response to the D A,L
command should be similar to the one shown in Figure 4-5. However, this may not be the
case if there were problems during subsystem closure, or due to your site's configuration.
D A,L
IEE114I 19.11.42 2007.177 ACTIVITY 412
JOBS   M/S    TS USERS  SYSAS  INITS  ACTIVE/MAX VTAM  OAS
00000  00000  00000     00032  00015  00000/00030      00001
Figure 4-5 D A,L response with no active work
To close z/OS cleanly, the operator should issue the end-of-day command, Z EOD, as shown
in Figure 4-6, prior to removing the system from the Parallel Sysplex. This writes a logrec
data set error record, and closes the current SMF data set to preserve statistical data.
Z EOD
IEE334I HALT EOD SUCCESSFUL
Figure 4-6 Z EOD command
Sysplex partitioning
Use the V XCF,sysname,OFFLINE command to remove the closing system from the Parallel
Sysplex. This is shown in Figure 4-7 on page 64. It can be issued on any system in the
sysplex, including on the system that is being removed.
Note: The VARY command (and sysname) should still be used when removing the last or
only system from the sysplex, because there is still some cleanup to be done. However,
you do not receive message IXC102A, because there is no active system to issue it.
V XCF,#@$1,OFFLINE
*018 IXC371D CONFIRM REQUEST TO VARY SYSTEM #@$1
* OFFLINE. REPLY SYSNAME=#@$1 TO REMOVE #@$1 OR C TO CANCEL.
Figure 4-7 Initiate sysplex partitioning
The system on which the command is entered issues message IXC371D. This message
requests confirmation of the removal, also shown in Figure 4-7. To confirm removal, this
WTOR must be replied to, as shown in Figure 4-8.
R 18,SYSNAME=#@$1
IEE600I REPLY TO 018 IS;SYSNAME=#@$1
Figure 4-8 Confirm removal system name
We recommend that this not be performed prior to a SAD. If an XCF component was
causing the problem that necessitated the SAD, diagnostic data would be lost.
Note: If the confirmation is entered with a sysname that is different from the one requested
(#@$1), then the following message IXC208I is issued, and IXC371D is repeated:
R 18,SYSNAME=#@$2
IXC208I THE RESPONSE TO MESSAGE IXC371D IS INCORRECT: #@$2 IS NOT ONE OF THE
SPECIFIED SYSTEMS
From this point on (having received a valid response to IXC371D), the CLEANUP interval
(seen in Figure 4-3 on page 62) starts, sysplex partitioning (also known as fencing) begins,
and the message seen in Figure 4-9 is issued on a randomly chosen system, which monitors
the shutdown.
IXC101I SYSPLEX PARTITIONING IN PROGRESS FOR #@$1 REQUESTED BY
*MASTER*. REASON: OPERATOR VARY REQUEST
Figure 4-9 IXC101I Sysplex partitioning initiated by operator
During this step, XCF group members are given a chance to be removed.
The next step varies, depending on whether or not there is an active SFM policy with system
isolation in effect. The scenario without SFM is described first.
When this stage of the cleanup is complete (or if the CLEANUP interval expires), the system
being removed is loaded with a non-restartable WAIT STATE 0A2.
Wait for this state before performing the system reset. Do not reply DOWN yet.
Do not reply to IXC102A until SYSTEM RESET is done: Before replying DOWN to
IXC102A, you must perform a hardware SYSTEM RESET (or equivalent) on the system
being removed. This is necessary to ensure that this system can no longer perform any I/O
operations, and that it releases any outstanding I/O reserves. The SYSTEM RESET
therefore ensures data integrity on I/O devices.
SYSTEM RESET refers to an action on the processor that bars z/OS from doing I/O. The
following are all valid actions for a SYSTEM RESET: Stand Alone Dump (SAD), System
Reset Normal, System Reset Clear, Load Normal, Load Clear, Deactivating the Logical
Partition, Resetting the Logical Partition, Power-on Reset (POR), Processor IML, or
Powering off the CPC.
When the SYSTEM RESET (or its equivalent) is complete, the operator should reply DOWN to
the IXC102A WTOR; see Figure 4-11.
R 22,DOWN
IEE600I REPLY TO 022 IS;DOWN
Figure 4-11 Reply DOWN after system reset
After DOWN has been entered, XCF performs a cleanup of the remaining resources relating
to the system being removed from the sysplex, as seen in Figure 4-12.
Finally, removal of the system from the sysplex completes, and the following IXC105I
message is issued, as shown in Figure 4-13. The reason flags may vary at your site.
IXC105I SYSPLEX PARTITIONING HAS COMPLETED FOR #@$1 236
- PRIMARY REASON: OPERATOR VARY REQUEST
- REASON FLAGS: 000004
Figure 4-13 Sysplex partitioning completed
Check for the IXC102A message and reply DOWN to it. When IXC105I is issued, then reply with
your correct COUPLExx member and the IPL will continue. You can also choose to start the
IPL again.
*IXC220W XCF IS UNABLE TO CONTINUE: WAIT STATE CODE: 0A2 REASON CODE: 004,
AN OPERATOR REQUESTED PARTITIONING WITH THE VARY XCF COMMAND
Figure 4-15 0A2 non-restartable wait state
Important: Wait for the WAIT STATE before performing SYSTEM RESET.
Sysplex cleanup
With any closure of a system in a Parallel Sysplex, whether controlled or not, the remaining
systems clean up the XCF connections. This activity occurs when the CLEANUP interval (as
shown in Figure 4-3 on page 62) expires.
The default XCF CLEANUP time is sixty seconds. However, thirty seconds is recommended.
Tip: Set XCF CLEANUP interval to 30 seconds.
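If the current value differs, the interval can be changed dynamically with SETXCF; a sketch (coordinate with your systems programmer, and make the matching COUPLExx change so the value survives the next IPL):
SETXCF COUPLE,CLEANUP=30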
As shown in Figure 4-16, GRS and system partitioning take place, and these are indicated by
many IXC467I, IXC307I, and IXC302I messages, which may not be seen at the console.
IXC467I STOPPING PATH STRUCTURE IXC_DEFAULT_2 217
RSN: SYSPLEX PARTITIONING OF LOCAL SYSTEM
IXC467I STOPPING PATHOUT STRUCTURE IXC_DEFAULT_2 LIST 8 218
USED TO COMMUNICATE WITH SYSTEM #@$2
RSN: SYSPLEX PARTITIONING OF LOCAL SYSTEM
...
IXC307I STOP PATHOUT REQUEST FOR STRUCTURE IXC_DEFAULT_2 226
LIST 10 TO COMMUNICATE WITH SYSTEM #@$3 COMPLETED
SUCCESSFULLY: SYSPLEX PARTITIONING OF LOCAL SYSTEM
IXC307I STOP PATHOUT REQUEST FOR STRUCTURE IXC_DEFAULT_2 224
LIST 8 TO COMMUNICATE WITH SYSTEM #@$2 COMPLETED
SUCCESSFULLY: SYSPLEX PARTITIONING OF LOCAL SYSTEM
...
IXC302I STOP PATHIN REQUEST FOR STRUCTURE IXC_DEFAULT_1 116
LIST 0 TO COMMUNICATE WITH SYSTEM #@$1 REJECTED:
UNKNOWN PATH
DIAG037=18 DIAG074=08710000 RC,RSN=00000008 081A0004
IXC302I STOP PATHIN REQUEST FOR STRUCTURE IXC_DEFAULT_2 117
LIST 0 TO COMMUNICATE WITH SYSTEM #@$1 REJECTED:
UNKNOWN PATH
DIAG037=18 DIAG074=08710000 RC,RSN=00000008 081A0004
Figure 4-16 PATHIN and PATHOUT cleanup
More commonly, the first indication of system failure is when a Status Update Missing (SUM)
condition is registered by one of the other systems in the sysplex. This occurs when a system
has not issued a status update (heartbeat) to the XCF couple dataset within the INTERVAL
since the last update. This value is defined in the COUPLExx member of PARMLIB, and
shown in Figure 4-3 on page 62. When a SUM condition occurs, the system detecting the
SUM notifies all other systems and issues the IXC101I message with the text as shown in
Figure 4-18. The SUM reason text differs from the OPERATOR VARY REQUEST in the
previous section. There are a dozen different reasons that can cause this alert to appear, but
only two are considered here.
IXC101I SYSPLEX PARTITIONING IN PROGRESS FOR #@$1 REQUESTED BY XCFAS.
REASON: SFM STARTED DUE TO STATUS UPDATE MISSING
Figure 4-18 IXC101 sysplex partitioning initiated by XCF
Do not reply to IXC102A until SYSTEM RESET is done: Before replying DOWN to
IXC102A, you must perform a hardware SYSTEM RESET (or equivalent) on the system
being removed. This is necessary to ensure that this system can no longer perform any I/O
operations, and that it releases any outstanding I/O reserves. The SYSTEM RESET,
therefore, ensures data integrity on I/O devices.
SYSTEM RESET refers to an action on the processor that bars z/OS from performing I/O.
The following are all valid actions for a SYSTEM RESET: Stand-alone Dump (SAD),
System Reset Normal, System Reset Clear, Load Normal, Load Clear, Deactivating the
Logical Partition, Resetting the Logical Partition, Power-on Reset (POR), Processor IML,
or Powering off the CEC.
SFM is inactive
When a sysplex without an active SFM policy becomes aware of the system failure, message
IXC402D is issued, which alerts the operator that a system (#@$1 in this exercise) is not
operative, as shown in Figure 4-20.
006 IXC402D #@$1 LAST OPERATIVE AT hh.mm.ss. REPLY DOWN AFTER SYSTEM
RESET, OR INTERVAL=SSSSS TO SET A REPROMPT TIME.
Figure 4-20 IXC402D message
With the IXC402D WTOR, the operator is requested to reply DOWN when the system has
been reset. The INTERVAL option allows the operator to specify a period of time for system
operation recovery.
If the system has not recovered within this period, message IXC402D is issued again. The
INTERVAL reply can be in the range 0 to 86400 seconds (24 hours).
After the SYSTEM RESET has been performed, you can reply DOWN to IXC402D, as shown in
Figure 4-21. It is only this reply that starts the partitioning.
R 06,DOWN
IEE600I REPLY TO 006 IS;DOWN
Figure 4-21 DOWN reply after IXC402
Important: As for an IXC102A message, with IXC402D do not reply DOWN until a
SYSTEM RESET has been performed.
System cleanup
No matter how it is arrived at, after the DOWN reply, XCF performs a cleanup of resources
relating to the system being removed. This activity starts and ends with the IEA257I and
IEA258I messages interspersed with IEE501I messages, as shown in Figure 4-22.
IEA257I CONSOLE PARTITION CLEANUP IN PROGRESS FOR SYSTEM #@$1.
CNZ4200I CONSOLE #@$1M01 HAS FAILED. REASON=SYSFAIL
IEA258I CONSOLE PARTITION CLEANUP COMPLETE FOR SYSTEM #@$1
IEE501I CONSOLE #@$1M01 FAILED, REASON=SFAIL . ALL ALTERNATES
UNAVAILABLE, CONSOLE IS NOT SWITCHED
Figure 4-22 Console cleanup
When system removal completes, the IXC105I message is issued and the system is now out
of the sysplex, as shown in Figure 4-23.
Note: When you receive message IXC105I, the RSN text may be different.
Sysplex cleanup
With any closure of a system in a Parallel Sysplex (except the last or only one), whether controlled
or not, the remaining systems clean up the XCF connections. This activity occurs at the same
time as GRS and system partitioning take place. This is indicated by many IXC302I 1,
IXC307I 2, and IXC467I 3 messages, which may not be seen at the console; see Figure 4-24
on page 71.
3. Issue the VARY XCF,sysname,OFFLINE command from another system in the sysplex (if
message IXC402D or IXC102A is not already present).
4. Reply DOWN to message IXC402D or IXC102A, without performing a SYSTEM RESET,
because a reset has already effectively taken place through the IPL of the SADMP
program; performing another SYSTEM RESET now would cause the SADMP to fail.
Note the following points:
If this is the last or only system in the sysplex, then step 1 and step 2 apply.
If the system has already been removed from the sysplex, then only step 2 applies.
You do not need to wait for the SAD to complete before continuing with step 3.
Performing steps 3 and 4 immediately after IPLing the SAD will speed up sysplex
recovery, allowing resources held by the IPLing system to be released quickly.
If there is a delay between steps 2 and 3, then messages IXC402D or IXC102A may be
issued by another system detecting the loss of connectivity with the IPLing system.
After the SAD program is IPLed, IXC402D or IXC102A will be issued, even if an active
SFM policy is in effect. This happens because z/OS is unable to automatically partition the
failing system using SFM.
4.4.2 SAD required during unplanned removal of a system with SFM active
If SFM is active and system isolation is in effect, then it will detect the system failure and start
sysplex partitioning. In this case, follow this procedure:
1. Perform a hardware STOP function to place the failing system's CPUs into a stopped
state (this is not strictly required, but is good practice).
2. IPL the stand-alone dump program on the failing system.
3. If message IXC102A is present, reply DOWN without performing a SYSTEM RESET.
Chapter 5.
If any z/OS system's heartbeat time stamp is older than the current time minus that system's
INTERVAL value from the COUPLExx parmlib member, that system is considered to have
failed in some way. When this occurs, the failed system is considered to be in a Status
Update Missing (SUM) condition. All systems are notified when a SUM condition occurs. The
recovery actions that are taken when a SUM condition occurs depend on the recovery
parameters that you specify in your SFM policy. They could be:
Prompt the operator to perform the recovery actions manually.
Remove the system from the sysplex without operator intervention by:
Using the Coupling Facility fencing services to isolate the system.
System resetting the failing system's LPAR.
Deactivating the failing system's LPAR.
If the structure supports rebuild, you can influence when it should be rebuilt by using the
REBUILDPERCENT parameter in the structure's definition in the Coupling Facility Resource
Management (CFRM) policy:
The structure is rebuilt if the weight of the system that lost connectivity is equal to or
greater than the REBUILDPERCENT value you specified.
The structure is not rebuilt if the weight of the system that lost connectivity is less than the
REBUILDPERCENT value you specified. In this case, the affected system will go into
error handling to recover from the connectivity failure.
If the structure supports user-managed rebuild and you used the default value of 1% for
REBUILDPERCENT, the structure rebuilds when a loss of connectivity occurs.
During the rebuild, to ensure that the rebuilt structure has better connectivity to the systems in
the sysplex than the old structure, the CF selection process will factor in the SFM system
weights and the connectivity that each system has to the CF. However, if there is no SFM
policy active, all the systems are treated as having equal weights when determining the
suitability of a CF for the new structure allocation.
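To show where these values are coded, the following is a minimal sketch of a CFRM policy definition as run through the IXCMIAPU administrative data utility. The policy and structure names are hypothetical, and the CF definitions that a complete policy also requires are omitted:
//CFRMPOL  EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(CFRM) REPORT(YES)
  DEFINE POLICY NAME(CFRM01) REPLACE(YES)
    STRUCTURE NAME(SAMPLE_STR)
      SIZE(32768)
      REBUILDPERCENT(1)
      PREFLIST(FACIL01,FACIL02)
/*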
INTERVAL 1   OPNOTIFY 2   MAXMSG
85           88           2000
SSUM ACTION   SSUM INTERVAL 5   CLEANUP 3   RETRY   WEIGHT 6   CLASSLEN   MEMSTALLTIME
ISOLATE       0                 15          10      19         956        NO
The INTERVAL 1 is otherwise known as the failure detection interval. This specifies
when the failing system is considered to have entered a SUM condition.
The OPNOTIFY 2 specifies when SFM notifies the operator that a system has not updated
its status. The timer for both INTERVAL and OPNOTIFY start at the same time and the
value for OPNOTIFY must be greater than or equal to the value specified for INTERVAL.
The CLEANUP 3 interval specifies how long XCF group members can perform clean-up
for the z/OS system being removed from the sysplex. The intention of the cleanup interval
is to give XCF group members on the system being removed a chance to exit gracefully
from the system. The XCF CLEANUP interval only applies to planned system shutdowns,
when the VARY XCF command is used to remove a z/OS system from a sysplex.
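The SSUM action and system weight come from the SFM policy itself, which the system programmer defines with the IXCMIAPU utility (INTERVAL, OPNOTIFY, and CLEANUP come from the COUPLExx parmlib member). A minimal sketch of such a policy, with illustrative values rather than our running policy:
//SFMPOL   EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(SFM) REPORT(YES)
  DEFINE POLICY NAME(SFM01) REPLACE(YES)
    SYSTEM NAME(*)
      ISOLATETIME(0)
      WEIGHT(5)
/*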
An example of the response to this command when SFM is active is shown in Figure 5-4.
D XCF,POL,TYPE=SFM
IXC364I 20.22.30 DISPLAY XCF 844
TYPE: SFM
POLNAME:      SFM01 1
STARTED:      07/02/2007 20:21:59
LAST UPDATED: 05/28/2004 13:44:52
SYSPLEX FAILURE MANAGEMENT IS ACTIVE
Figure 5-4 SFM policy display when SFM is active
An example of the response to this command when SFM is not active is shown in Figure 5-5.
The last line of the output shows that SFM is not active 3.
D XCF,POLICY,TYPE=SFM
IXC364I 19.07.44 DISPLAY XCF 727
TYPE: SFM
POLICY NOT STARTED 3
Figure 5-5 SFM policy display when SFM is inactive
You can also use the following command to determine if SFM is active in the sysplex:
D XCF,COUPLE
An example of the response to this command when SFM is active and when SFM is not
active is shown in Figure 5-6. When SFM is active, the SSUM ACTION, SSUM INTERVAL,
WEIGHT, and MEMSTALLTIME fields 1 are populated with values from the SFM policy.
D XCF,COUPLE
IXC357I 02.54.26 DISPLAY XCF 893
SYSTEM #@$2 DATA
INTERVAL   OPNOTIFY   MAXMSG
85         88         2000
. . .
SSUM ACTION   SSUM INTERVAL   CLEANUP   RETRY   CLASSLEN   WEIGHT   MEMSTALLTIME
ISOLATE 1     0               15        10      956        19       NO
Figure 5-6 D XCF,COUPLE display when SFM is active
When SFM is not active, these fields contain N/A 2, as shown in Figure 5-7.
D XCF,COUPLE
IXC357I 02.39.19 DISPLAY XCF 900
SYSTEM #@$2 DATA
INTERVAL   OPNOTIFY   MAXMSG
85         88         2000
. . .
SSUM ACTION   SSUM INTERVAL   CLEANUP   RETRY   CLASSLEN   WEIGHT   MEMSTALLTIME
N/A 2         N/A             15        10      956        N/A      N/A
Figure 5-7 D XCF,COUPLE display when SFM is not active
If your system programmer asks you to stop the current SFM policy, use the following
command:
SETXCF STOP,POLICY,TYPE=SFM
An example of the system response to this command is shown in Figure 5-9. This command
stops the SFM policy on all systems in the sysplex. After the SFM policy is stopped, its status
in the sysplex changes to POLICY NOT STARTED, as shown in Figure 5-5 on page 80.
SETXCF STOP,POLICY,TYPE=SFM
IXC607I SFM POLICY HAS BEEN STOPPED BY SYSTEM #@$2
Figure 5-9 Console messages when stopping SFM policy
Chapter 6.
Available     Available-to   The element was restarted and has registered, but has
                             not indicated it is ready to work. After a time-out
                             period has expired, ARM will consider it available.
Failed        Restarting     The element failed, and ARM is restarting it. The
                             element may be executing and has yet to register again
                             with ARM, or job scheduling factors may be delaying
                             its start.
WaitPred      Recovering     The element has been restarted by ARM and has
                             registered, but has not indicated it is ready for work.
Note: A batch job or started task registered with Automatic Restart Manager can only be
restarted within the same JES XCF group. That is, it can only be restarted in the same
JES2 MAS or the same JES3 complex.
A system is considered ARM-enabled if it is connected to an ARM Couple Data Set. During
an IPL of an ARM-enabled system, the system indicates which ARM datasets it has
connected to, as shown in Figure 6-3 on page 86.
The D XCF,POLICY command displays the currently active ARM policy. Figure 6-3 displays a
system with the ARM policy ARMPOL01 active.
D XCF,POLICY,TYPE=ARM
IXC364I 18.43.22 DISPLAY XCF 330
TYPE: ARM
POLNAME:      ARMPOL01
STARTED:      06/22/2007 03:26:23
LAST UPDATED: 06/22/2007 03:25:58
Figure 6-3 Display of the active ARM policy
D XCF,COUPLE,TYPE=ARM
IXC358I 18.34.46 DISPLAY XCF 319
ARM COUPLE DATA SETS
PRIMARY   DSN: SYS1.XCF.ARM01
          VOLSER: #@$#X1   DEVN: 1D06
          FORMAT TOD           MAXSYSTEM
          11/20/2002 15:08:01  4
          ADDITIONAL INFORMATION:
            FORMAT DATA
            VERSION 1, HBB7707 SYMBOL TABLE SUPPORT
            POLICY(20) MAXELEM(200) TOTELEM(200)
ALTERNATE DSN: SYS1.XCF.ARM02
          VOLSER: #@$#X2   DEVN: 1D07
          FORMAT TOD           MAXSYSTEM
          11/20/2002 15:08:04  4
          ADDITIONAL INFORMATION:
            FORMAT DATA
            VERSION 1, HBB7707 SYMBOL TABLE SUPPORT
            POLICY(20) MAXELEM(200) TOTELEM(200)
ARM IN USE BY ALL SYSTEMS
Figure 6-6 on page 88 displays an extract from the ARM policy report. The report shows
items such as a RESTART_GROUP 1, an ELEMENT 2 and a TERMTYPE 3.
A restart_group is a logically connected group of elements that need to be restarted
together if the system they are running on fails. Not all the elements in a restart_group
need to be running on the same system, nor do they all need to be running.
We recommend that you set up a default group (RESTART_GROUP(DEFAULT)) with
RESTART_ATTEMPTS(0), so that any elements that are not defined as part of another
restart group are not restarted. All elements that do not fall into a specific restart group in
the policy are in the DEFAULT restart group.
The figure displays three restart groups: CICS#@$1, DB2DS1, and the default group.
RESTART_GROUP(CICS#@$1) 1
  ELEMENT(SYSCICS_#@$CCC$1) 2
  TERMTYPE(ELEMTERM) 3
. . .
RESTART_GROUP(DB2DS1)
  ELEMENT(D#$#D#$1)
  ELEMENT(DR$#IRLMDR$1001)
. . .
/* NO OTHER ARM ELEMENTS WILL BE RESTARTED */
RESTART_GROUP(DEFAULT)
  ELEMENT(*)
  RESTART_ATTEMPTS(0)
  RESTART_TIMEOUT(120)
  TERMTYPE(ALLTERM)
  RESTART_METHOD(BOTH,PERSIST)
Figure 6-6 Sample job output displaying the active ARM policy
An element specifies a batch job or started task that can register as an element of
Automatic Restart Manager. The element name can use wild card characters of ? and * as
well as two system symbols, &SYSELEM. and &SYSSUF.
The termtype has two options:
ALLTERM - indicates restart if either the system or the element fails.
ELEMTERM - indicates restart only if the element fails. If the system fails, do not
restart.
For more information about the ARM policy, refer to MVS Setting Up a Sysplex, SA22-7625.
4 The total number of batch jobs and started tasks that are currently registered as elements
of ARM that are in RESTARTING state.
5 The total number of batch jobs and started tasks that are currently registered as elements
of ARM that are in RECOVERING state.
6 The total number of batch jobs and started tasks that are currently registered as elements
of ARM.
7 The maximum number of elements that can register. This information is determined by the
TOTELEM value when the ARM couple data set was formatted.
To see more detail, enter the command D XCF,ARMS,DETAIL. A significant amount of useful
detail can be displayed, as shown in Figure 6-8.
D XCF,ARMS,DETAIL
IXC392I 21.13.37 DISPLAY XCF 547
ARM RESTARTS ARE ENABLED
-------------- ELEMENT STATE SUMMARY --------------  -TOTAL-  -MAX-
STARTING  AVAILABLE  FAILED  RESTARTING  RECOVERING
       0         11       0           0           0       11    200
RESTART GROUP:DEFAULT   PACING :    0   FREECSA:        0          1A
ELEMENT NAME :EZA$1TCPIP   JOBNAME :TCPIP      STATE   :AVAILABLE  2A
CURR SYS :#@$1             JOBTYPE :STC        ASID    :0023
INIT SYS :#@$1             JESGROUP:XCFJES2A   TERMTYPE:ELEMTERM
EVENTEXIT:*NONE*           ELEMTYPE:SYSTCPIP   LEVEL   :       1
TOTAL RESTARTS :       0   INITIAL START:06/22/2007 22:49:47      3A
RESTART THRESH :  0 OF 0   FIRST RESTART:*NONE*
RESTART TIMEOUT:     300   LAST RESTART:*NONE*
RESTART GROUP:DEFAULT   PACING :    0   FREECSA:        0          1B
ELEMENT NAME :EZA$2TCPIP   JOBNAME :TCPIP      STATE   :AVAILABLE  2B
CURR SYS :#@$2             JOBTYPE :STC        ASID    :0023
INIT SYS :#@$2             JESGROUP:XCFJES2A   TERMTYPE:ELEMTERM
EVENTEXIT:*NONE*           ELEMTYPE:SYSTCPIP   LEVEL   :       1
TOTAL RESTARTS :       0   INITIAL START:06/22/2007 22:04:21      3B
RESTART THRESH :  0 OF 0   FIRST RESTART:*NONE*
RESTART TIMEOUT:     300   LAST RESTART:*NONE*
RESTART GROUP:DEFAULT   PACING :    0   FREECSA:        0          1C
ELEMENT NAME :EZA$3TCPIP   JOBNAME :TCPIP      STATE   :AVAILABLE  2C
CURR SYS :#@$3             JOBTYPE :STC        ASID    :0023
INIT SYS :#@$3             JESGROUP:XCFJES2A   TERMTYPE:ELEMTERM
EVENTEXIT:*NONE*           ELEMTYPE:SYSTCPIP   LEVEL   :       1
TOTAL RESTARTS :       0   INITIAL START:06/22/2007 22:00:23      3C
RESTART THRESH :  0 OF 0   FIRST RESTART:*NONE*
RESTART TIMEOUT:     300   LAST RESTART:*NONE*
. . .
Figure 6-8 D XCF,ARMS,DETAIL output (restricted to TCPIP)
This portion of the display, which was restricted to TCPIP, shows the following details:
TCPIP belongs to the default ARM group 1A, 1B, 1C
TCPIP has an STC on each LPAR 2A, 2B, 2C
TCPIP has never been restarted by ARM 3A, 3B, 3C
To start or change an ARM policy at the request of a system programmer, issue the SETXCF
command. Figure 6-10 shows the ARM policy changed to ARMPOL02.
SETXCF START,POLICY,TYPE=ARM,POLNAME=ARMPOL02
IXC805I ARM POLICY HAS BEEN STARTED BY SYSTEM #@$2.
POLICY NAMED ARMPOL02 IS NOW IN EFFECT.
Figure 6-10 Starting or changing an ARM policy
If SETXCF START is issued without the POLNAME parameter, the ARM defaults are used.
You can find the default values in MVS Setting Up a Sysplex, SA22-7625. Figure 6-11
displays an example of starting an ARM policy without specifying a POLNAME.
SETXCF START,POLICY,TYPE=ARM
IXC805I ARM POLICY HAS BEEN STARTED BY SYSTEM #@$2.
POLICY DEFAULTS ARE NOW IN EFFECT.
Figure 6-11 Starting the default ARM policy
To stop an ARM policy without activating another one, issue the command SETXCF
STOP,POLICY,TYPE=ARM as shown in Figure 6-12. This would be done at the request of a
system programmer, usually during a maintenance window or disaster recovery exercise.
SETXCF STOP,POLICY,TYPE=ARM
IXC806I ARM POLICY HAS BEEN STOPPED BY SYSTEM #@$2
Figure 6-12 Stopping the ARM policy
RESTART_GROUP(CICS#@$1)
  ELEMENT(SYSCICS_#@$CCC$1)
  TERMTYPE(ELEMTERM)
  ELEMENT(SYSCICS_#@$CCM$1)
  TERMTYPE(ELEMTERM)
. . .
/* NO OTHER ARM ELEMENTS WILL BE RESTARTED */
RESTART_GROUP(DEFAULT)
  ELEMENT(*)
  RESTART_ATTEMPTS(0)
  RESTART_TIMEOUT(120)
  TERMTYPE(ALLTERM)
There was no RESTART_GROUP for SDSF, which means that the command
C SDSF,ARMRESTART resulted in ARM not restarting SDSF, as shown in 2 in Figure 6-15.
C SDSF,ARMRESTART
IEA989I SLIP TRAP ID=X222 MATCHED. JOBNAME=SDSF
. . .
$HASP395 SDSF
ENDED
IEA989I SLIP TRAP ID=X33E MATCHED. JOBNAME=*UNAVAIL, ASID=0024.
IXC804I JOBNAME SDSF, ELEMENT ISFSDSF@$2 WAS NOT RESTARTED. 025
THE RESTART ATTEMPTS THRESHOLD HAS BEEN REACHED.
To create an ARM policy for SDSF, an updated ARM policy had to be created. The changes
made are shown in Figure 6-16.
/* SDSF */ 1
RESTART_GROUP(SDSF) 2
  ELEMENT(ISFSDSF*) 3
  RESTART_METHOD(ELEMTERM,STC,'S SDSF') 4
  RESTART_ATTEMPTS(3,60) 5
Figure 6-16 ARM policy changes for SDSF
3 The element here is ISFSDSF*, where * is the standard wildcard matching 0 or more
characters. The element ID must match the registration ID. SDSF documentation states that
the registration ID for SDSF is ISFserver-name@&sysclone. Thus, on system #@$2, where
&SYSCLONE = $2, the registration ID is ISFSDSF$2. We could have defined three different
groups, one per system, but it is cleaner to create a wildcard entry that matches each system.
4 In this example we specify ELEMTERM, indicating that ARM is only to attempt a restart
when the element fails, that is, if the STC SDSF fails. The alternatives are to specify
SYSTERM, which means the restart only applies when the system fails, or to specify BOTH,
which means the restart method applies if either a system or an element fails. The second
part of this parameter says SDSF is a STC to be restarted via the S SDSF command.
5 In this example ARM will attempt to restart SDSF three times in 60 seconds. After the third
attempt, ARM will produce a message and not try to restart it. Automation should be set up to
trap on the IXC804I message; Figure 6-22 on page 95 shows this situation.
When SDSF is started using the defaults, it registers itself to ARM as shown at 1 in
Figure 6-19 on page 94. Even if ARM is inactive when an STC or job such as SDSF starts, it
still successfully registers to ARM. Notice that nothing in Figure 6-18 indicates whether ARM
is active or inactive. Instead, it is the state of ARM and the ARM policy when the STC or job
fails that determines what happens.
S SDSF
ISF724I SDSF level HQX7730 initialization complete for server SDSF.
ISF726I SDSF parameter processing started.
ISF170I Server SDSF ARM registration complete for element type SYSSDSF,
element name ISFSDSF@$2 1
ISF739I SDSF parameters being read from member ISFPRM00 of data set
SYS1.PARMLIB
ISF728I SDSF parameters have been activated
Figure 6-18 Starting SDSF
Figure 6-20 C SDSF without SDSF defined correctly to the ARM policy
If you define an ARM policy with incorrect elements, such as ELEMENT(SDSF) and
ELEMENT(ISFSDSF#@$2), then issuing a C SDSF,ARMRESTART command produces the same
results as shown in Figure 6-20.
C SDSF,ARMRESTART
. . .
$HASP395 SDSF     ENDED
S SDSF
IXC812I JOBNAME SDSF, ELEMENT ISFSDSF@$2 FAILED.
THE ELEMENT WAS RESTARTED WITH OVERRIDE START TEXT.
IXC813I JOBNAME SDSF, ELEMENT ISFSDSF@$2
WAS RESTARTED WITH THE FOLLOWING START TEXT:
S SDSF
THE RESTART METHOD USED WAS DETERMINED BY THE ACTIVE POLICY.
$HASP100 SDSF     ON STCINRDR
//SLEEPY  PROC
//SLEEPY  EXEC PGM=SLEEPY
To make use of the ARMWRAP facility, two steps must be added to the proc. These steps
can be seen in Figure 6-24.
//SLEEPY   PROC
//*
//* Register element 'SLEEPY' element type 'APPLTYPE' with ARM
//* Requires access to SAF FACILITY IXCARM.APPLTYPE.SLEEPY
//*
//ARMREG   EXEC PGM=ARMWRAP,                                       1
//         PARM=('REQUEST=REGISTER,READYBYMSG=N,',                 2
//         'TERMTYPE=ALLTERM,ELEMENT=SLEEPY,','ELEMTYPE=APPLTYPE')
//*
//SLEEPY   EXEC PGM=SLEEPY                                         3
//*
//* For normal termination, deregister from ARM
//*
//ARMDREG  EXEC PGM=ARMWRAP,PARM=('REQUEST=DEREGISTER')            4
//SYSABEND DD SYSOUT=*
Figure 6-24 SLEEPY proc with ARMWRAP steps added
1 This is the first step in the new PROC that runs the program ARMWRAP.
2 ARMWRAP takes parameters to register and define the ARM values such as ELEMTYPE,
ELEMENT and TERMTYPE.
3 The step (or steps) that form the proc are left as they were before.
4 When the proc finishes normally, it needs to deregister.
After SLEEPY is configured to work with ARMWRAP, it must be added to the ARM policy. In
our case, because we wanted SLEEPY to move to a different system in the event of a system
failure, we added the lines seen in Figure 6-25 on page 97 to the current policy.
It is not sufficient to add ARMWRAP to the STC or batch job. Instead, you must also define it
in the ARM policy. Figure 6-25 shows that we added a RESTART_GROUP for SLEEPY.
/* Sleepy */
RESTART_GROUP(SLEEPY)
  TARGET_SYSTEM(#@$2,#@$3)
  ELEMENT(SLEEPY) 1
  RESTART_METHOD(BOTH,STC,'S SLEEPY')
  RESTART_ATTEMPTS(3,60)
Figure 6-25 ARM policy additions for SLEEPY
1 As shown in Figure 6-25, the element defined in the ARM policy must match the element
defined in the ARMWRAP parameters. We could have coded a generic element in the ARM
policy such as ELEMENT(SL*).
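Once SLEEPY is running, you could verify that the registration succeeded by displaying its ARM status (JOBNAME filtering is a standard operand of this command):
D XCF,ARMSTATUS,JOBNAME=SLEEPY,DETAIL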
$HASP373 SLEEPY   STARTED
+ARMWRAP IXCARM REGISTER RC = 000C RSN = 0168 1
-JOBNAME  STEPNAME PROCSTEP    RC   EXCP    CPU
-SLEEPY   STARTING ARMREG      12      1    .00
Figure 6-26 Starting SLEEPY without the RACF profile defined
When an attempt was made to start SLEEPY without defining the appropriate RACF profile,
the startup messages shown in Figure 6-26 were received. This attempt failed with an error 1.
z/OS V1R10.0 MVS Programming: Sysplex Services Reference, SA22-7618, identifies
IXCARM RC=12, RSN=168 as a security error, as shown in Figure 6-27.
Equate Symbol: IXCARMSAFNOTDEFINED
Meaning: Environmental error. Problem state and problem key users cannot use
IXCARM without having a security profile.
Action: Ensure that the proper IXCARM.elemtype.elemname resource profile for the
unauthorized application is defined to RACF or another security product.
Figure 6-27 IXCARM RC=12 RSN=168
The RACF commands used to protect this resource are shown in Figure 6-28.
RDEFINE FACILITY IXCARM.APPLTYPE.SLEEPY
PE
IXCARM.APPLTYPE.SLEEPY ID(SLEEPY) AC(UPDATE) CLASS(FACILITY)
SETROPTS RACLIST(FACILITY) REFRESH
Figure 6-28 RACF commands to protect ARM - SLEEPY
When this is done, an attempt to start SLEEPY works, as shown in Figure 6-29 on page 98.
RESTART GROUP:SDSF      PACING :    0   FREECSA:        0
ELEMENT NAME :ISFSDSF@$2   JOBNAME :SDSF       STATE   :RESTARTING
CURR SYS :#@$2             JOBTYPE :STC        ASID    :004A
INIT SYS :#@$2             JESGROUP:XCFJES2A   TERMTYPE:ELEMTERM
EVENTEXIT:*NONE*           ELEMTYPE:SYSSDSF    LEVEL   :       2
TOTAL RESTARTS :       1   INITIAL START:06/25/2007 03:36:28
RESTART THRESH :  0 OF 3   FIRST RESTART:06/25/2007 03:39:16
RESTART TIMEOUT:     300   LAST RESTART:06/25/2007 03:39:16
After the element is started, the ARM status changes to AVAILABLE 1, as shown in
Figure 6-32. It will also increment the Restart Thresh count 2.
RESTART GROUP:SDSF      PACING :    0   FREECSA:        0
ELEMENT NAME :ISFSDSF@$1   JOBNAME :SDSF       STATE   :AVAILABLE 1
CURR SYS :#@$1             JOBTYPE :STC        ASID    :0022
INIT SYS :#@$1             JESGROUP:XCFJES2A   TERMTYPE:ELEMTERM
EVENTEXIT:*NONE*           ELEMTYPE:SYSSDSF    LEVEL   :       2
TOTAL RESTARTS :       2   INITIAL START:06/22/2007 22:49:17
RESTART THRESH :  1 OF 3 2 FIRST RESTART:06/25/2007 03:49:16
RESTART TIMEOUT:     300   LAST RESTART:06/25/2007 03:49:16
Figure 6-32 Element restarted and available again
Note: ARM does not restart elements in the following instances. Therefore the operator, or
an automation product, must manually intervene.
Canceling a job without the ARMRESTART parameter.
*F J=jobname,C (JES3 cancel command without the ARMRESTART parameter).
Batch jobs in a JES3 DJC net.
During a system shutdown and the policy for the element has
TERMTYPE(ELEMTERM). A TERMTYPE(ALLTERM) is required for system failure
restarts.
During a system shutdown when there is only one target system defined in the policy
for that element.
The initial start, first restart, and last restart fields in the display show whether ARM has
restarted an element. Figure 6-33 on page 100 shows that, for system #@$1, the
fields FIRST RESTART: and LAST RESTART: have the value *NONE*, which indicates that ARM
has not restarted SDSF on this system. In contrast, system #@$2 has times for these fields.
Likewise, note that the total number of restarts is 0 for #@$1 and 2 for #@$2.
System #@$1
RESTART GROUP:SDSF      PACING :    0   FREECSA:        0
ELEMENT NAME :ISFSDSF@$1   JOBNAME :SDSF       STATE   :AVAILABLE
CURR SYS :#@$1             JOBTYPE :STC        ASID    :0022
INIT SYS :#@$1             JESGROUP:XCFJES2A   TERMTYPE:ELEMTERM
EVENTEXIT:*NONE*           ELEMTYPE:SYSSDSF    LEVEL   :       2
TOTAL RESTARTS :       0   INITIAL START:06/22/2007 22:49:17
RESTART THRESH :  0 OF 3   FIRST RESTART:*NONE* 1
RESTART TIMEOUT:     300   LAST RESTART:*NONE*
System #@$2
RESTART GROUP:SDSF      PACING :    0   FREECSA:        0
ELEMENT NAME :ISFSDSF@$2   JOBNAME :SDSF       STATE   :AVAILABLE
CURR SYS :#@$2             JOBTYPE :STC        ASID    :004B
INIT SYS :#@$2             JESGROUP:XCFJES2A   TERMTYPE:ELEMTERM
EVENTEXIT:*NONE*           ELEMTYPE:SYSSDSF    LEVEL   :       2
TOTAL RESTARTS :       2   INITIAL START:06/25/2007 03:36:28
RESTART THRESH :  0 OF 3   FIRST RESTART:06/25/2007 03:39:16 2
RESTART TIMEOUT:     300   LAST RESTART:06/25/2007 03:41:05
Figure 6-33 Restart fields for SDSF on #@$1 and #@$2
Chapter 7.
Lock: For serialization of data with high granularity. Global Resource Serialization
(GRS) is an example of a Lock structure exploiter.
List: For shared queues and shared status information. System Logger is an
example of a List structure exploiter.
Cache: For storing data and maintaining local buffer pool coherency information.
RACF database sharing is an example of a Cache structure exploiter.
For a current list of CF structure names and exploiters, refer to Appendix B, List of
structures on page 499.
Hardware
A CF consists of the following hardware components:
Processor
Channels (links) and subchannels
Storage
The Coupling Facility can be configured either stand-alone or in an LPAR on a CEC
alongside operating systems such as z/OS and z/VM. The Coupling Facility does not have
any connected I/O devices, and the only console interface to it is through the HMC.
Connectivity to the CF is with CF links, which can be a combination of the following:
Inter-System Channel (ISC)
Integrated Cluster Bus (ICB)
Internal Coupling Channel (IC)
A description of the various System z channel and CHPID types can be found at:
http://www.redbooks.ibm.com/abstracts/tips0086.html?Open
A Coupling Facility possesses unique attributes:
You can shut it down, upgrade it, and bring it online again without impacting application
availability.
You can potentially lose a CF without impacting the availability of the applications that are
using that CF.
The amount of real storage in a CF depends on several factors.
Figure 7-1 Displaying the CFs defined in the sysplex (the SITE column shows N/A for each CF)
1 Name of the CF
To display the logical view of one of the CFs identified in Figure 7-1, issue the command D
XCF,CF,CFNAME=cfname as seen in Figure 7-2 on page 104.
D XCF,CF,CFNAME=FACIL01
IXC362I 19.19.36 DISPLAY XCF 550
CFNAME: FACIL01 1
COUPLING FACILITY      : SIMDEV.IBM.EN.0000000CFCC1 2
                         PARTITION: 00 3  CPCID: 00 4
SITE                   : N/A
POLICY DUMP SPACE SIZE :     2000 K 5
ACTUAL DUMP SPACE SIZE :     2048 K 6
STORAGE INCREMENT SIZE :      256 K 7
CONNECTED SYSTEMS: 8
   #@$1   #@$2   #@$3
STRUCTURES: 9
   D#$#_LOCK1(OLD)    DFHNCLS_#@$CNCS1   IXC_DEFAULT_2
   D#$#_SCA(OLD)      DFHXQLS_#@$STOR1   SYSTEM_OPERLOG
   DFHCFLS_#@$CFDT1   IRRXCF00_P001
Figure 7-2 Logical view of CF FACIL01
D CF
. . .
SENDER PATH    PHYSICAL    LOGICAL    CHANNEL TYPE
   09          ONLINE      ONLINE     ICP 1
   0E          ONLINE      ONLINE     ICP
NOT USABLE:
   CHPID   TYPE
   E0      ICP
. . .
Figure 7-3 Physical view of a CF
If the name of the CF is known, you can expand on the command in Figure 7-3 by issuing D
CF,CFNAME=FACIL01.
The output from the D CF command in Figure 7-3 displays information that includes the CF
link channel type in use.
1 CF channel type in use on FACIL01 is an Internal Coupling Channel Peer mode.
2 FACIL01 CF is connected to another CF named FACIL02.
3 The CF CHPIDs on FACIL01 are connected to CF FACIL02 using Internal Coupling
Channel Peer mode links.
D XCF,STR
IXC359I 20.05.03 DISPLAY XCF 643
STRNAME               ALLOCATION TIME      STATUS                     TYPE
CIC_DFHLOG_001           --                NOT ALLOCATED
CIC_DFHSHUNT_001         --                NOT ALLOCATED
CIC_GENERAL_001          --                NOT ALLOCATED
D#$#_GBP0                --                NOT ALLOCATED
D#$#_GBP1                --                NOT ALLOCATED
D#$#_GBP32K              --                NOT ALLOCATED
D#$#_GBP32K1             --                NOT ALLOCATED
D#$#_LOCK1            06/20/2007 03:32:17  ALLOCATED (NEW) 3          LOCK
                                           DUPLEXING REBUILD
                                           METHOD: SYSTEM-MANAGED
                                           PHASE: DUPLEX ESTABLISHED
D#$#_LOCK1            06/20/2007 03:32:15  ALLOCATED (OLD)            LOCK
                                           DUPLEXING REBUILD
D#$#_SCA              06/20/2007 03:32:12  ALLOCATED (NEW)            LIST
                                           DUPLEXING REBUILD
                                           METHOD: SYSTEM-MANAGED
                                           PHASE: DUPLEX ESTABLISHED
D#$#_SCA              06/20/2007 03:32:10  ALLOCATED (OLD)            LIST
                                           DUPLEXING REBUILD
DFHCFLS_#@$CFDT1      06/21/2007 01:47:27  ALLOCATED 1                LIST
DFHNCLS_#@$CNCS1      06/21/2007 01:47:24  ALLOCATED                  LIST
DFHXQLS_#@$STOR1      06/21/2007 01:47:22  ALLOCATED                  LIST
IRRXCF00_B001         06/22/2007 21:59:18  ALLOCATED 2                CACHE
IRRXCF00_P001         06/22/2007 21:59:17  ALLOCATED                  CACHE
. . .
Figure 7-4 Display of all CF structures
D XCF,CF,CFNAME=FACIL01
IXC362I 02.49.29 DISPLAY XCF 101
CFNAME: FACIL01
COUPLING FACILITY      : SIMDEV.IBM.EN.0000000CFCC1
                         PARTITION: 00  CPCID: 00
SITE                   : N/A
POLICY DUMP SPACE SIZE :     2000 K
ACTUAL DUMP SPACE SIZE :     2048 K
STORAGE INCREMENT SIZE :      256 K
CONNECTED SYSTEMS: 1
   #@$1   #@$2   #@$3
STRUCTURES: 2
   CIC_DFHSHUNT_001   D#$#_LOCK1(OLD)   DFHNCLS_#@$CNCS1
   IXC_DEFAULT_2      D#$#_GBP0(NEW)    D#$#_SCA(OLD)
   DFHXQLS_#@$STOR1   SYSTEM_OPERLOG    D#$#_GBP1(NEW)
   DFHCFLS_#@$CFDT1   IRRXCF00_P001
Figure 7-5 Structures allocated in FACIL01
In Figure 7-5, the display shows a number of CF structures that are present in the CF named
FACIL01.
1 Systems #@$1, #@$2 and #@$3 are currently connected to this CF.
2 List of structures that are currently residing in this CF.
D XCF,STR,STRNAME=SYSTEM_OPERLOG 1
IXC360I 02.57.17 DISPLAY XCF 137
STRNAME: SYSTEM_OPERLOG
STATUS: ALLOCATED 2
EVENT MANAGEMENT: POLICY-BASED
TYPE: LIST 3
POLICY INFORMATION:
 POLICY SIZE    : 16384 K
 POLICY INITSIZE: 9000 K
 POLICY MINSIZE : 0 K
 FULLTHRESHOLD  : 0
 ALLOWAUTOALT   : NO
 REBUILD PERCENT: N/A
 DUPLEX         : DISABLED
 ALLOWREALLOCATE: YES
 PREFERENCE LIST: FACIL01 FACIL02 4
 ENFORCEORDER   : NO
 EXCLUSION LIST IS EMPTY
ACTIVE STRUCTURE
----------------
 ALLOCATION TIME: 06/18/2007 03:43:48
 CFNAME         : FACIL01
 COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC1
         PARTITION: 00  CPCID: 00
 ACTUAL SIZE    : 9216 K
 STORAGE INCREMENT SIZE: 256 K
 ENTRIES:  IN-USE:  6118  TOTAL:  6118, 100% FULL
 ELEMENTS: IN-USE: 12197  TOTAL: 12341,  98% FULL
 PHYSICAL VERSION: C0C39A43 CCB4260C
 LOGICAL VERSION : C0C39A43 CCB4260C
 SYSTEM-MANAGED PROCESS LEVEL: 8
 DISPOSITION    : DELETE 5
 ACCESS TIME    : 0
 MAX CONNECTIONS: 32
 # CONNECTIONS  : 1 6
7
 CONNECTION NAME  ID  VERSION   SYSNAME  JOBNAME  ASID  STATE
 ---------------- --  --------  -------- -------- ----  -----------------
 IXGLOGR_#@$1     01  000100CC  #@$1     IXGLOGR  0016  FAILED-PERSISTENT
 DIAGNOSTIC INFORMATION:
Figure 7-6 Detailed display of the SYSTEM_OPERLOG structure
1 The name of the structure that detailed information is being gathered for. In this example,
detailed information for the SYSTEM_OPERLOG structure is being requested.
2 Identifies whether the structure is allocated or not.
3 The structure type. In this example, it is a List structure.
4 The Preference List as defined in the active Coupling Facility Resource Management (CFRM) policy. It displays the preferred order of CFs in which the structure should normally be allocated.
5 The disposition of the structure.
6 The number of systems that are connected to this structure.
7 The connection names, system names, jobnames and states of the connection.
Note: The D XCF,STR,STRNM=ALL command displays all defined structures in detail for all
CFs.
Structure disposition
There are two disposition types for structures:
DELETE   The structure is deallocated when the last connector disconnects from it.
KEEP     The structure remains allocated (persistent) even after all of its connectors have disconnected.
Attention: Use the SETXCF FORCE command with caution. Inform support staff before
proceeding.
A connection to a structure can be in one of the following states:
ACTIVE
FAILED-PERSISTENT
DISCONNECTING or FAILING
At connection time, another parameter indicates the disposition of the connection. The state of the connection depends on the disposition of the connection.
A connection with a disposition of keep is placed in a failed-persistent state if it terminates
abnormally, or if the owner of the structure has defined it this way (for example IMS). When in
the failed-persistent state, a connection becomes active again as soon as the connectivity to
the structure is recovered. The failed-persistent state can be thought of as a placeholder for
the connection to be recovered. Note that in some special cases, a connection with a
disposition of keep may be left in an undefined state even after an abnormal termination.
D XCF,STR,STRNM=SYSTEM_OPERLOG
IXC360I 21.10.43 DISPLAY XCF 741
STRNAME: SYSTEM_OPERLOG
STATUS: ALLOCATED
EVENT MANAGEMENT: POLICY-BASED
TYPE: LIST
POLICY INFORMATION:
 POLICY SIZE    : 16384 K
 POLICY INITSIZE: 9000 K
 POLICY MINSIZE : 0 K
 FULLTHRESHOLD  : 0
 ALLOWAUTOALT   : NO
 REBUILD PERCENT: N/A
 DUPLEX         : DISABLED
 ALLOWREALLOCATE: YES
 PREFERENCE LIST: FACIL01 FACIL02
 ENFORCEORDER   : NO
 EXCLUSION LIST IS EMPTY
ACTIVE STRUCTURE
----------------
 ALLOCATION TIME: 06/18/2007 03:43:48
 CFNAME         : FACIL01
 COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC1
         PARTITION: 00  CPCID: 00
 ACTUAL SIZE    : 9216 K
 STORAGE INCREMENT SIZE: 256 K
 ENTRIES:  IN-USE:  3246  TOTAL:  6118,  53% FULL
 ELEMENTS: IN-USE:  6846  TOTAL: 12341,  55% FULL
 PHYSICAL VERSION: C0C39A43 CCB4260C
 LOGICAL VERSION : C0C39A43 CCB4260C
 SYSTEM-MANAGED PROCESS LEVEL: 8
 DISPOSITION    : DELETE
 ACCESS TIME    : 0
 MAX CONNECTIONS: 32
 # CONNECTIONS  : 3
1 CONNECTION NAME  ID  VERSION   SYSNAME  JOBNAME  ASID  STATE
  . . .
Figure 7-7 Displaying the connections to the SYSTEM_OPERLOG structure
1 The output displayed in Figure 7-7 identifies all of the connection names from the various
systems.
Now that the connection names have been identified for the SYSTEM_OPERLOG structure,
we can display the individual connection attributes for the structure. In this example, we
display the ConnectionName from our system #@$3.
D XCF,STR,STRNM=SYSTEM_OPERLOG,CONNM=IXGLOGR_#@$3
...
CONNECTION NAME : IXGLOGR_#@$3
 ID             : 02
 VERSION        : 00020065 1
 CONNECT DATA   : 00000001 00000000
 SYSNAME        : #@$3
 JOBNAME        : IXGLOGR 2
 ASID           : 0016
 STATE          : ACTIVE 3
                  FAILURE ISOLATED FROM CF
 CONNECT LEVEL  : 00000000 00000000
 INFO LEVEL     : 01
 CFLEVEL REQ    : 00000001
 NONVOLATILE REQ: YES
 CONDISP        : KEEP 4
 ALLOW REBUILD  : YES 5
 ALLOW DUPREBUILD: NO
 ALLOW AUTO     : YES
 SUSPEND        : YES
 ALLOW ALTER    : YES 6
 USER ALLOW RATIO: YES
 USER MINENTRY  : 10
 USER MINELEMENT: 10
 USER MINEMC    : 25
 DIAGNOSTIC INFORMATION: STRNUM: 0000000D STRSEQ: 00000000
                         MANAGER SYSTEM ID: 00000000
EVENT MANAGEMENT: POLICY-BASED
Figure 7-8 Displaying connection name details for a particular CF structure
1 The version of this connection. The version distinguishes this connection from other connections with the same name, on the same system, for the same jobname: for example, after a connection failure that was subsequently recovered.
2 The connection was made by the jobname IXGLOGR.
3 The connection is active.
4 The connection IXGLOGR_#@$3 has a connection disposition of KEEP.
5 The connection supports REBUILD.
6 The connection supports ALTER.
The flow of a duplexed request is as follows: the exploiter's request comes in to z/OS XES (1), which splits the request and sends it to both CF1 and CF2; each CF executes the request (4); the two CFs exchange Ready to Complete signals (5a and 5b); each CF returns its response (6a and 6b); and XES merges the two responses and passes a single response back to the exploiter (7).
User-managed rebuild:
It involves complex programming to ensure that structure connectors communicate with each other and with XES to move the structure contents from one CF to another.
It requires that 12 specific events be catered for, as well as handling error and out-of-sequence situations.
The level of complex programming can be time-consuming, expensive, and error-prone.
The structure can only be moved to another CF if there is still one connector active.
Each exploiter of a CF structure must design and code its own solution. Therefore, some
exploiters do not provide rebuild capability (for example, JES2).
It leads to complex and differing operational procedures to handle planned and unplanned
CF outages.
System-managed rebuild:
It removes most of the complexity from the applications.
The actual movement of structure contents is handled by XES. This means that every
structure that supports system-managed rebuild is handled consistently.
Failure and out-of-sequence events are handled by XES.
Rebuild support is easier for CF exploiters.
It provides consistent operational procedures.
It can rebuild a structure when there are no active connectors.
It provides support for planned CF reconfigurations.
It is not for recovery scenarios.
Activate a new CFRM policy with DUPLEX(ALLOWED) keyword for the structure.
This method allows the structures to be duplexed; however, the duplexing must be
initiated by command because z/OS will not automatically duplex the structure.
Duplexing can then be initiated using the SETXCF START,REBUILD,DUPLEX command or
programmatically via the IXLREBLD STARTDUPLEX programming interface.
Duplexing can be manually stopped by using the SETXCF STOP,REBUILD,DUPLEX command or
programmatically via the IXLREBLD STOPDUPLEX programming interface. When you need
to stop duplexing structures, you must first decide which is to remain as the surviving simplex
structure.
You can also stop duplexing of a structure in a particular CF by issuing the command SETXCF STOP,REBUILD,DUPLEX,CFNAME=cfname.
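As an illustration, assuming a structure named D#$#_GBP0 whose CFRM policy specifies DUPLEX(ALLOWED), a plausible operator sequence to start duplexing and later revert to simplex mode is:

SETXCF START,REBUILD,DUPLEX,STRNAME=D#$#_GBP0
SETXCF STOP,REBUILD,DUPLEX,STRNAME=D#$#_GBP0,KEEP=OLD

The KEEP keyword on the STOP command names the surviving simplex instance: KEEP=OLD keeps the original structure, KEEP=NEW keeps the duplexed copy.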
For further information about each system command related to CF duplexing, including SETXCF, refer to z/OS MVS System Commands, SA22-7627.
D XCF,STR,STRNM=ALL
IXC360I 19.01.23 DISPLAY XCF 013
STRNAME: CIC_DFHLOG_001 1
 STATUS: NOT ALLOCATED
 POLICY INFORMATION:
  POLICY SIZE    : 32756 K
  POLICY INITSIZE: 16384 K
  POLICY MINSIZE : 0 K
  FULLTHRESHOLD  : 80
  ALLOWAUTOALT   : NO
  REBUILD PERCENT: N/A
  DUPLEX         : DISABLED 2
  ALLOWREALLOCATE: YES
  PREFERENCE LIST: FACIL02 FACIL01
  ENFORCEORDER   : NO
  EXCLUSION LIST IS EMPTY
STRNAME: D#$#_GBP0 3
 STATUS: NOT ALLOCATED
 POLICY INFORMATION:
  POLICY SIZE    : 8192 K
  POLICY INITSIZE: 4096 K
  POLICY MINSIZE : 3072 K
  FULLTHRESHOLD  : 80
  ALLOWAUTOALT   : YES
  REBUILD PERCENT: N/A
  DUPLEX         : ENABLED 4
  ALLOWREALLOCATE: YES
  PREFERENCE LIST: FACIL02 FACIL01
  ENFORCEORDER   : NO
  EXCLUSION LIST IS EMPTY
STRNAME: LOG_FORWARD_001 5
 STATUS: NOT ALLOCATED
 POLICY INFORMATION:
  POLICY SIZE    : 16384 K
  POLICY INITSIZE: 9000 K
  POLICY MINSIZE : 0 K
  FULLTHRESHOLD  : 0
  ALLOWAUTOALT   : NO
  REBUILD PERCENT: N/A
  DUPLEX         : ALLOWED 6
  ALLOWREALLOCATE: YES
  PREFERENCE LIST: FACIL01 FACIL02
  ENFORCEORDER   : NO
  EXCLUSION LIST IS EMPTY
...
Figure 7-10 Displaying all CF structures
As shown in Figure 7-10 on page 117, the DUPLEX field can have a value of DISABLED, ENABLED, or ALLOWED, as explained here:
DISABLED   The structure cannot be duplexed.
ENABLED    Duplexing of the structure is started automatically by the system.
ALLOWED    The structure can be duplexed, but duplexing must be initiated by operator command or through the programming interface.
To obtain detailed duplexing information about a particular structure, you can use the output
from Figure 7-10 on page 117 and use the MVS command D XCF,STR,STRNAME=strname.
D XCF,STR,STRNAME=D#$#_GBP0
IXC360I 19.53.13 DISPLAY XCF 154
STRNAME: D#$#_GBP0
STATUS: REASON SPECIFIED WITH REBUILD START:
        POLICY-INITIATED
      DUPLEXING REBUILD
      METHOD: USER-MANAGED
      PHASE: DUPLEX ESTABLISHED 1
...
DUPLEXING REBUILD NEW STRUCTURE
-------------------------------
 ALLOCATION TIME: 06/26/2007 19:37:34
 CFNAME         : FACIL01 2
 COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC1
         PARTITION: 00  CPCID: 00
...
DUPLEXING REBUILD OLD STRUCTURE
-------------------------------
 ALLOCATION TIME: 06/26/2007 19:37:31
 CFNAME         : FACIL02 3
 COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC2
         PARTITION: 00  CPCID: 00
...
4 CONNECTION NAME  ID  VERSION   SYSNAME  JOBNAME   ASID  STATE
  ---------------- --  --------  -------- --------  ----  --------------
  DB2_D#$1         02  00020045  #@$1     D#$1DBM1  004B  ACTIVE NEW,OLD
  DB2_D#$2         03  0003003E  #@$2     D#$2DBM1  004C  ACTIVE NEW,OLD
  DB2_D#$3         01  00010048  #@$3     D#$3DBM1  0024  ACTIVE NEW,OLD
...
For each structure in the CFRM policy, the percent full threshold can be specified by the
installation to be any percent value between 0 and 100. Specifying a threshold value of zero
(0) means that no structure full monitoring will take place. If no threshold value is specified,
then the default value of 80% is used as the full threshold percent value.
When the utilization of all monitored structures falls below the structure full threshold,
message IXC586I, as shown in Figure 7-13, will be issued to the console and to the system
message logs to indicate that the full condition was relieved. Message IXC585E will be
deleted.
IXC586I STRUCTURE IXC_DEFAULT1 IN COUPLING FACILITY FACIL01, 295
PHYSICAL STRUCTURE VERSION C0D17D82 29104C88,
IS NOW BELOW STRUCTURE FULL MONITORING THRESHOLD.
Figure 7-13 IXC586I message
For more information about Coupling Facility recovery, refer to z/OS System z Parallel
Sysplex Recovery, which is available at:
http://www.ibm.com/servers/eserver/zseries/zos/integtst/library.html
1 This is the active policy name from the CFRM Couple Data Set.
If the relevant RACF authority has been granted, you can execute the IXCMIAPU utility to list
the CFRM Policies contained within the CFRM Couple Data Set. Sample JCL can be found in
SYS1.SAMPLIB (IXCCFRMP). An example is shown in Figure 7-15 on page 121.
//STEP1    EXEC PGM=IXCMIAPU
//SYSPRINT DD  SYSOUT=*
//SYSOUT   DD  SYSOUT=*
//SYSABEND DD  SYSOUT=*
//SYSIN    DD  *
  DATA TYPE(CFRM) REPORT(YES)
Figure 7-15 Sample JCL for the CFRM Administrative utility to report on the CFRM policies
After the IXCMIAPU utility has been executed successfully, output that is similar to
Figure 7-16 will be displayed.
DEFINE POLICY NAME(CFRM02 )
  /* Defined: 06/12/2007 17:22:42.862596 User: ROBI     */
  /* 55 Structures defined in this policy               */
  /* 2 Coupling Facilities defined in this policy       */
Figure 7-16 Partial output of the IXCMIAPU report
To see sample output of the command used to display the logical view of the CF, refer to the
D XCF,CF command output shown in Figure 7-2 on page 104.
If you receive the response shown in Figure 7-17, then it means that your new CF is not
defined in the active CFRM policy.
D XCF,CF,CFNAME=CFT1
IXC362I 23.40.55 DISPLAY XCF 024
NO COUPLING FACILITIES MATCH THE SPECIFIED CRITERIA
Figure 7-17 Display a new CF
Contact your system programmer to define and activate a new CFRM policy that includes the
new CF.
After the CFRM Policy has been updated and activated, issue the command to display the
logical view of the new CF, as shown in Figure 7-18 on page 122.
D XCF,CF,CFNM=CFT1
IXC362I 01.28.23 DISPLAY XCF 692
CFNAME: CFT1
COUPLING FACILITY      : SIMDEV.IBM.EN.0000000CFCC3
                         PARTITION: 00  CPCID: 00
SITE                   : N/A
POLICY DUMP SPACE SIZE :     2000 K
ACTUAL DUMP SPACE SIZE :     N/A
STORAGE INCREMENT SIZE :     N/A
NO SYSTEMS ARE CONNECTED TO THIS COUPLING FACILITY
NO STRUCTURES ARE IN USE BY THIS SYSPLEX IN THIS COUPLING FACILITY
Figure 7-18 Display the logical view of the new CF
Single-click Change Options, as shown in Figure 7-20 on page 123, to display the list of
profiles.
Select the image profile you want to view and single-click View to display the image profile.
Ensure that the mode Coupling facility is selected on the General view panel. Single-click
Processor on the left of the screen to display the processor view, as shown in Figure 7-22 on
page 124.
Verify these processor settings with your system programmer. If this is a production CF, you will normally dedicate one or more processors. It is possible to share one or more processors, but then you must also assign a processor weight. Unlike z/OS, the CFCC runs in a continuous loop; if its processor resources are shared with other images and you do not assign the correct processor weight value, performance of the other images can degrade.
Tip: We recommend that you do not enable capping for a CF image.
Single-click Storage, on the left of the screen, to display the storage view, as shown in Figure 7-23 (HMC Image Profile - Storage).
Important: Do not activate an already active CEC if there are multiple images
defined on the CPC. This will reload all the images and they might contain active
operating systems.
If the CEC is already activated but the CF image is deactivated, you can activate the CF
image to load the CFCC, as described here:
Drag and drop the CF image you want to activate to the activate task in the task area.
Click yes on the confirmation panel to start the activation process. This will load the
CFCC code.
As soon as z/OS detects connectivity to the CF, you will see the messages as shown in
Figure 7-24 for all the images in the sysplex that are connected to this CF.
IXL157I PATH 09 IS NOW OPERATIONAL TO CUID: 0309 095
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC1
PARTITION: 00 CPCID: 00
IXL157I PATH 0E IS NOW OPERATIONAL TO CUID: 0309 096
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC1
PARTITION: 00 CPCID: 00
IXL157I PATH 0F IS NOW OPERATIONAL TO CUID: 030F 097
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
PARTITION: 00 CPCID: 00
IXL157I PATH 10 IS NOW OPERATIONAL TO CUID: 030F 098
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
PARTITION: 00 CPCID: 00
...
IXC517I SYSTEM #@$3 ABLE TO USE 125
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC1
PARTITION: 00 CPCID: 00
NAMED FACIL01
IXC517I SYSTEM #@$3 ABLE TO USE 126
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
PARTITION: 00 CPCID: 00
NAMED FACIL02
Figure 7-24 z/OS connectivity messages to CF
After you receive these messages, your Coupling Facility is ready to be used.
Before removing a CF, take the following precautions:
Resolve failed-persistent and no-connector structure conditions before shutting down the CF.
Ensure all systems in the sysplex that are currently using the structures in the CF you
want to remove have connectivity to the alternate Coupling Facility.
To allow structures to be rebuilt on an alternate CF, ensure that enough capacity, such as
storage and CPU cycles, exists on the alternate CF.
D CF
IXL150I 21.14.10 DISPLAY CF 308
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC1
                  PARTITION: 00 CPCID: 00
                  CONTROL UNIT ID: 0309
                  NAMED FACIL01
COUPLING FACILITY SPACE UTILIZATION
 ALLOCATED SPACE                 DUMP SPACE UTILIZATION
  STRUCTURES:    209920 K 1       STRUCTURE DUMP TABLES:       0 K
  DUMP SPACE:      2048 K              TABLE COUNT:            0
  FREE SPACE:    511488 K         FREE DUMP SPACE:          2048 K
  TOTAL SPACE:   723456 K         TOTAL DUMP SPACE:         2048 K
                                  MAX REQUESTED DUMP SPACE:    0 K
 VOLATILE:  YES                   STORAGE INCREMENT SIZE:    256 K
 CFLEVEL:   14
 CFCC RELEASE 14.00, SERVICE LEVEL 00.29
 BUILT ON 03/26/2007 AT 17:58:00
 COUPLING FACILITY HAS ONLY SHARED PROCESSORS
. . .
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
                  NAMED FACIL02
 . . .
  FREE SPACE:    599040 K 2
  TOTAL SPACE:   723456 K
. . .
Figure 7-25 CF space utilization
For example, in our case, we want to move all the structures from Coupling Facility FACIL01 to FACIL02. There is 209920 KB 1 of storage used by structures on FACIL01, and there is 599040 KB 2 of free storage on FACIL02. There is therefore sufficient storage available on FACIL02 to move all the structures from FACIL01 to FACIL02.
You can use the output of the command shown in Figure 7-25 on page 127 to determine
whether there is enough storage available on the alternate CF. If there is more than one
alternate CF available, the sum of the free storage for these CFs must be enough to
accommodate all the structures you want to move.
If you do not have enough free storage available on your CF, you can disable certain
subsystem functions from using the Coupling Facility. Refer to Removing a Coupling Facility
when only one Coupling Facility exists on page 131 for more information.
The command shown in Figure 7-26 displays the logical view of the CF. When you want to
remove a CF, use this command to determine which structures are allocated in the CF.
D XCF,CF,CFNAME=FACIL02
IXC362I 20.01.28 DISPLAY XCF 169
CFNAME: FACIL02
COUPLING FACILITY      : SIMDEV.IBM.EN.0000000CFCC2
                         PARTITION: 00  CPCID: 00
SITE                   : N/A
POLICY DUMP SPACE SIZE :     2000 K
ACTUAL DUMP SPACE SIZE :     2048 K
STORAGE INCREMENT SIZE :      256 K
CONNECTED SYSTEMS:
   #@$1   #@$2   #@$3
STRUCTURES: 1
   D#$#_GBP0(OLD)   D#$#_GBP1(OLD)   I#$#RM
   I#$#VSAM         IRRXCF00_B001    ISGLOCK
   IXC_DEFAULT_1    I#$#LOCK1        IGWLOCK00
   ISTGENERIC
Figure 7-26 Logical view of the CF to be removed
Issue the command shown in Figure 7-28 on page 129 to activate the new CFRM policy.
SETXCF START,POLICY,TYPE=CFRM,POLNAME=new_policy_name
Figure 7-28 Activate a new CFRM Policy
Use a name for the new CFRM policy that is different from the name of the original CFRM
policy so that when CF maintenance is complete, the original policy can be restored.
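As a sketch of this flow, assuming the original policy is named CFRM02 and the maintenance copy is named CFRM03 (the policy names used elsewhere in this chapter), the maintenance policy is activated with:

SETXCF START,POLICY,TYPE=CFRM,POLNAME=CFRM03

When CF maintenance is complete, the original policy is restored the same way:

SETXCF START,POLICY,TYPE=CFRM,POLNAME=CFRM02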
After the command shown in Figure 7-28 has been issued, you may see the messages
displayed in Figure 7-29.
IXC511I START ADMINISTRATIVE POLICY CFRM03 FOR CFRM ACCEPTED
IXC512I POLICY CHANGE IN PROGRESS FOR CFRM 876
TO MAKE CFRM03 POLICY ACTIVE.
2 POLICY CHANGE(S) PENDING.
Figure 7-29 Policy change pending messages
If you receive the messages shown in Figure 7-29, issue the command in Figure 7-30 to
identify structures that are in a policy change pending state.
Note: When you start a new CFRM policy, the allocated structures that are affected by the
new policy change enter a policy change pending state. Structures that enter a policy
change pending state remain in that state until the structure is deallocated and reallocated
through a rebuild. Structures that reside on CFs that are not being removed might remain
in a policy change pending state until the original policy is restored.
D XCF,STR,STATUS=POLICYCHANGE
IXC359I 00.00.09 DISPLAY XCF 884
STRNAME          ALLOCATION TIME      STATUS       TYPE
IXC_DEFAULT_1    06/26/2007 22:25:22  ALLOCATED    LIST
                 POLICY CHANGE PENDING - CHANGE
IXC_DEFAULT_2    06/26/2007 22:25:13  ALLOCATED    LIST
                 POLICY CHANGE PENDING - CHANGE
EVENT MANAGEMENT: POLICY-BASED
Figure 7-30 Identify structures in a policy change pending state
Other structures must be moved by the owning subsystem (for example, the JES2 checkpoint
structure). See 7.8, Managing CF structures on page 147, for information about moving
these structures and deallocating persistent structures.
For a list of structures that either support or do not support rebuild, refer to Appendix B, List
of structures on page 499.
To verify that no structures remain in the CF that is being removed, issue the command in
Figure 7-32.
D XCF,CF,CFNM=FACIL02
IXC362I 19.43.13 DISPLAY XCF
CFNAME: FACIL02
...
1 NO STRUCTURES ARE IN USE BY THIS SYSPLEX IN THIS COUPLING FACILITY
Figure 7-32 Displaying information about the CF to be removed
If there are no structures allocated to this 1 CF, you can continue to configure the sender
paths offline and to deactivate the CF.
Issue the command shown in Figure 7-34, where nn is one of the sender paths you need to
configure offline.
CONFIG CHP(nn),OFFLINE,UNCOND
Figure 7-34 Configure command for a CF chpid
Note: The UNCOND parameter is only needed if it is the last sender path that is connected
to the CF.
CONFIG CHP(10),OFFLINE,UNCOND
IEE712I CONFIG PROCESSING COMPLETE
Figure 7-35 Configure a CF chpid offline
To ensure that the sender paths were taken offline, issue the command in Figure 7-36.
D CF
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
                  PARTITION: 00 CPCID: 00
                  CONTROL UNIT ID: 030F
                  NAMED FACIL02
...
SENDER PATH    PHYSICAL            LOGICAL    CHANNEL TYPE
   0F          ONLINE              ONLINE     ICP
   10          NOT OPERATIONAL 1   ONLINE     ICP
...
NO SUBCHANNELS AVAILABLE 2
Figure 7-36 Verifying that the sender paths are offline
The output of the command in Figure 7-36 indicates that the physical status of the sender
path is 1 NOT OPERATIONAL and that there are 2 NO SUBCHANNELS AVAILABLE as a result of the
configure offline command.
D XCF,CF,CFNAME=FACIL02
IXC362I 21.56.25 DISPLAY XCF 294
CFNAME: FACIL02
COUPLING FACILITY      : SIMDEV.IBM.EN.0000000CFCC2
                         PARTITION: 00  CPCID: 00
SITE                   : N/A
POLICY DUMP SPACE SIZE :     2000 K
ACTUAL DUMP SPACE SIZE :     2048 K
STORAGE INCREMENT SIZE :      256 K
CONNECTED SYSTEMS:
   #@$1   #@$2   #@$3
NO STRUCTURES ARE IN USE BY THIS SYSPLEX IN THIS COUPLING FACILITY
Figure 7-37 Identify structures allocated in CF
Figure 7-39 displays the output of the command to stop the CFRM policy.
SETXCF STOP,POLICY,TYPE=CFRM
IXC510I STOP POLICY FOR CFRM ACCEPTED
IXC512I POLICY CHANGE IN PROGRESS FOR CFRM
TO MAKE NULL POLICY ACTIVE.
11 POLICY CHANGE(S) PENDING. 1
Figure 7-39 Stopping an active CFRM policy
Note: When you stop the active CFRM policy, allocated structures will enter a 1 policy
change pending state.
The following examples explain how to remove CF structure exploiters.
Note: Each CF structure exploiter may have an explicit way of removing its use of a
particular structure. Therefore, you may need to reference the relevant documentation to
obtain further specific information about the process required.
Use the SETXCF FORCE command to delete the persistent JES2 structure after the
reconfiguration is completed successfully.
See Chapter 14, Managing consoles in a Parallel Sysplex on page 283 for more
information on managing OPERLOG.
Issue the command shown in Figure 7-42 to stop log stream recording of LOGREC and
revert to the disk based data set, if a disk version exists.
SETLOGRC DATASET
Figure 7-42 LOGREC medium reverted back to disk data set
If a disk-based data set is not available, you can request that recording of logrec error and environmental records be disabled by issuing the command shown in Figure 7-43.
SETLOGRC IGNORE
Figure 7-43 Disable LOGREC recording
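To verify the result, the current LOGREC recording medium can be displayed with the DISPLAY LOGREC command (a sketch; the exact message text varies by release):

D LOGREC

The response indicates whether the current medium is a log stream, a data set, or IGNORE.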
If entering the command using the console, you will need to first identify the correct
Command Recognition Character (CRC) in use by the RACF subsystem. Issue the command
in Figure 7-46 to display the various CRCs that are in use.
D OPDATA
IEE603I 23.35.51 OPDATA DISPLAY 534
PREFIX   OWNER   SYSTEM   SCOPE     REMOVE   FAILDSP
...
%        RACF    #@$1     SYSTEM    NO       PURGE
%        RACF    #@$2     SYSTEM    NO       PURGE
% 1      RACF    #@$3 2   SYSTEM 3  NO       PURGE
Figure 7-46 Displaying the Command Recognition Characters in use
Note: Regardless of which method you use to disable RACF data sharing mode, you will
be required to enter the RVARY password to authorize this command.
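For example, on system #@$3, whose CRC is % (see Figure 7-46), RACF data sharing mode could be switched off by entering the RACF RVARY command through the subsystem interface (a sketch; RVARY then prompts for the password mentioned in the note above):

%RVARY NODATASHARE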
D CF,CFNAME=FACIL02
IXL150I 23.48.44 DISPLAY CF 560
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
                  PARTITION: 00 CPCID: 00
                  CONTROL UNIT ID: 030F
                  NAMED FACIL02
...
SENDER PATH    PHYSICAL   LOGICAL    CHANNEL TYPE
   10 1        ONLINE     ONLINE     ICP
COUPLING FACILITY SUBCHANNEL STATUS
TOTAL:  6   IN USE:  6   NOT USING:  0   NOT USABLE:  0
DEVICE   SUBCHANNEL   STATUS
 5030      000A       OPERATIONAL
 5031      000B       OPERATIONAL
 5032      000C       OPERATIONAL
 5033      000D       OPERATIONAL
 5034      000E       OPERATIONAL
 5035      000F       OPERATIONAL
Issue the command in Figure 7-50, where nn is the 1 sender path that you need to configure
offline.
CONFIG CHP(nn),OFFLINE
Figure 7-50 Configure the CF Sender Path offline
If all structures are removed from the CF, you can use the UNCOND parameter.
Note: The FORCE and UNCOND parameters are only necessary if it is the last sender
path that is connected to the CF.
CF CHP(10),OFF,FORCE
1 IXL126I CONFIG WILL FORCE OFFLINE LAST CHP(10) TO COUPLING FACILITY FACIL02
06 2 IXL127A REPLY CANCEL OR CONTINUE
R 06,CONTINUE
IEE600I REPLY TO 06 IS;CONTINUE
IEE503I CHP(10),OFFLINE
IEE712I CONFIG PROCESSING COMPLETE
3 IXC518I SYSTEM #@$3 NOT USING
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
PARTITION: 00 CPCID: 00
CONTROL UNIT ID: 030F
NAMED FACIL02
REASON: CONNECTIVITY LOST.
REASON FLAG: 13300002.
Figure 7-51 Configuring a CHPID offline
You will receive message 1 IXL126I when you specify the FORCE keyword. Reply
CONTINUE on message 2 IXL127A to configure the sender path offline. You will receive
message 3 IXC518I as soon as all the sender paths are configured offline. To ensure that the
sender paths were taken offline, issue the command as shown in Figure 7-52.
D CF,CFNAME=FACIL02
IXL150I 23.48.44 DISPLAY CF 560
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
                  PARTITION: 00 CPCID: 00
                  CONTROL UNIT ID: 030F
                  NAMED FACIL02
...
SENDER PATH    PHYSICAL            LOGICAL    CHANNEL TYPE
   10          NOT OPERATIONAL 1   ONLINE     ICP
Figure 7-52 Verifying that the sender path is offline
The output of the command in Figure 7-52 indicates that the physical status of the sender path is 1 NOT OPERATIONAL and that there is 2 NO SUBCHANNEL AVAILABLE as a result of the configure offline command.
Figure 7-54 shows the messages that are issued when the new CFRM policy is activated.
SETXCF START,POLICY,TYPE=CFRM,POLNAME=CFRM03
IXC511I START ADMINISTRATIVE POLICY CFRM03 FOR CFRM ACCEPTED
IXC513I COMPLETED POLICY CHANGE FOR CFRM.
CFRM03 POLICY IS ACTIVE.
Figure 7-54 CFRM policy messages
The LOC=OTHER parameter may be needed, depending on the CFRM policy structure
preference list.
For more information about the rebuild command, refer to 7.8.1, Rebuilding structures that
support rebuild on page 147.
For more information and a list of structures that do not support rebuild, refer to Appendix B,
List of structures on page 499.
Restart any other subsystems that have been stopped during the CF shutdown procedure.
Note: After the CF and CFRM policy are restored, some functions might reconnect to the
CF automatically, depending on the method used to remove the structures.
The following commands are used to display CF resource information. These commands can
only be issued to the CF using the Operating System Messages interface of the HMC.
When the CF is initially activated, you see the messages displayed in Figure 7-56.
The output of this command displays the paths currently configured online to the CF.
The volatility mode of the CF can either be NONVOLATILE or VOLATILE, as explained here:
NONVOLATILE   The CF runs in nonvolatile mode. Use this mode if a UPS is available for the processor complex on which the CF is running.
VOLATILE      The CF runs in this mode regardless of the actual volatility state of the CF. Coupling Facility storage contents are lost if a power failure occurs or the CF is turned off. This is the preferred mode for CF operation without a UPS backup.
Display Dyndisp
Issue the Display Dyndisp command to display whether Dynamic Coupling Facility Dispatching is turned on or off for the CF, as shown in Figure 7-63.
Display CPs
Issue the Display CPs command to display the online and standby central processors
assigned to the CF partition, as shown in Figure 7-64 on page 143.
Display Help
Issue the Display Help command to display CF command syntax for the command you enter,
as shown in Figure 7-65.
Dynamic CF dispatching
You can enable dynamic CF dispatching for a CF image in order to use it as a backup CF if
the primary CF fails. Issue the command shown in Figure 7-66 to enable or disable dynamic
CF dispatching.
DYNDISP ON|OFF
Figure 7-66 Enabling/disabling dynamic CF dispatching
The message shown in Figure 7-67 will be displayed if you attempt to enable dynamic CF
dispatching on dedicated CF processors.
DYNDISP ON
CF0505I DYNDISP command cancelled
Command has no effect with dedicated CP's
Figure 7-67 Message when enabling dynamic CF dispatching on dedicated CF processors
POWERSAVE
This specifies that the CF runs in POWERSAVE mode and that CF storage contents are
nonvolatile if the battery backup feature is installed and its battery is online and charged.
Issue the command shown in Figure 7-68 to enable POWERSAVE.
MODE POWERSAVE
CF0102I MODE is POWER SAVE. Current status is NONVOLATILE.
Power-Save resources are available.
Figure 7-68 Enabling POWERSAVE mode
If the volatility state of the CF remains nonvolatile, running in POWERSAVE mode assures
the following:
If a utility power failure occurs and utility power does not return before a rideout interval
completes, the CF will cease operation and save its storage contents across the utility
power failure. When power is restored in the CF, CF storage contents will be intact and do
not have to be rebuilt.
If a utility power failure occurs and utility power returns before a rideout interval completes,
CF operation continues. You specify the length of the rideout interval using the RIDEOUT
command. POWERSAVE is the default volatility mode.
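As an illustration, and assuming the operand is the interval in seconds (the operand format here is an assumption; check the HELP RIDEOUT output on your CF for the exact syntax), the rideout interval could be set from the CF console with:

RIDEOUT 10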
VOLATILE
This specifies that the CF runs in volatile mode regardless of the actual volatility state of the
CF. CF storage contents are lost if a power failure occurs or if CF power is turned off. This is
the preferred mode for CF operation without a UPS backup or internal battery feature (IBF).
Issue the command shown in Figure 7-69 to change the CF to volatile mode.
MODE VOLATILE
CF0100I MODE is VOLATILE
Figure 7-69 Changing mode to VOLATILE
NONVOLATILE
This specifies that the CF runs in nonvolatile mode. This should be used if a UPS is available
for the processor complex that the CF is running on. The CF does not monitor the installation
or availability of a UPS, but maintains a nonvolatile status for the CF. Issue the command
shown in Figure 7-70 to change the CF to nonvolatile mode.
MODE NONVOLATILE
CF0100I MODE is NONVOLATILE
Figure 7-70 Changing mode to NONVOLATILE
Shutdown CFCC
This ends CF operation and puts all CF logical central processors (CPs) into a disabled wait
state. Issue the command shown in Figure 7-72 to shut down the CF.
SHUTDOWN
CF0082A If SHUTDOWN is confirmed, shared data will be lost;
CF0090A Do you really want to shut down the Coupling Facility? (YES/NO)
YES
Figure 7-72 Shutdown command
Attention: By responding YES to the prompt in Figure 7-72, any shared data that remains
in the CF will be lost.
Enter YES to confirm the shutdown.
General Help
Issue the command shown in Figure 7-73 to obtain a list of available CFCC commands.
Specific Help
Issue the command shown in Figure 7-74 to obtain help regarding a specific CFCC
command.
HELP command
Figure 7-74 Specific HELP for a CFCC command
As an example, the output shown in Figure 7-75 displays help information about the
CONFIGURE command.
HELP CONFIGURE
CF0403I Configure command formats:
CONfigure xx ONline
xx OFFline
Where xx is a hex CHPID number.
Example:
configure 11 offline
Figure 7-75 Requesting specific HELP for the CONFIGURE command
If required, the structure rebuild process can be initiated using operator commands including
the command shown in Figure 7-77 on page 148.
SETXCF START,REBUILD,STRNAME=strname
Figure 7-77 SETXCF START REBUILD command for a structure
There are several reasons why a structure may have to be rebuilt, including performance,
structure size changes, or maintenance.
ID  VERSION   SYSNAME  JOBNAME  ASID  STATE
--  --------  -------- -------- ----  ------
03  00030069  #@$1     GRS      0007  ACTIVE
01  00010090  #@$2     GRS      0007  ACTIVE
02  00020063  #@$3     GRS      0007  ACTIVE

D#$#_GBP1(NEW)   I#$#RM      IRRXCF00_B001
IXC_DEFAULT_1    I#$#LOCK1   I#$#VSAM
ISGLOCK
Refer to the output in Figure 7-80 and determine if any of the structures that you want to
rebuild has a number of connections of zero (0) (no-connector condition exists), or if the
number of connections is not zero (0) and all connectors are failed persistent.
D XCF,STR,STRNM=SYSTEM_OPERLOG
IXC360I 00.39.12 DISPLAY XCF 692
STRNAME: SYSTEM_OPERLOG
. . .
ACTIVE STRUCTURE
----------------
 ALLOCATION TIME: 07/01/2007 19:33:55
 CFNAME         : FACIL01
 COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC1
         PARTITION: 00  CPCID: 00
 ACTUAL SIZE    : 9216 K
 STORAGE INCREMENT SIZE: 256 K
 ENTRIES:  IN-USE:   704  TOTAL:  6118,  11% FULL
 ELEMENTS: IN-USE:  1780  TOTAL: 12341,  14% FULL
 PHYSICAL VERSION: C0D4C6E1 1ADD1885
 LOGICAL VERSION : C0D4C6E1 1ADD1885
 SYSTEM-MANAGED PROCESS LEVEL: 8
 DISPOSITION    : DELETE
 ACCESS TIME    : 0
 MAX CONNECTIONS: 32
 # CONNECTIONS  : 3 1
. . .
Figure 7-80 Checking the number of connections to a structure
The command shown in Figure 7-81 illustrates how to display a structure with a
failed-persistent connection.
D XCF,STR,STRNM=IGWLOCK00
IXC360I 14.39.56 DISPLAY XCF 458
STRNAME: IGWLOCK00
STATUS: ALLOCATED
 POLICY SIZE    : 160000 K
 POLICY INITSIZE: 80000 K
 REBUILD PERCENT: 1
 DUPLEX         : DISABLED
 PREFERENCE LIST: FACIL01 FACIL02
 EXCLUSION LIST IS EMPTY
...
 # CONNECTIONS  : 1
 CONNECTION NAME  ID  VERSION   SYSNAME  JOBNAME  ASID  STATE
 ---------------- --  --------  -------- -------- ----  -----------------
 ZZZZZZZZ#@$3     01  00010091  #@$3     SMSVSAM  000A  FAILED-PERSISTENT
Figure 7-81 Displaying a structure with a failed-persistent connection
Enter SETXCF START,REBUILD,STRNM=strname to rebuild the structure into the CF that is first
displayed on the PREFERENCE LIST, as defined in the CFRM policy.
An example of the messages from the REBUILD is shown in Figure 7-82 on page 151.
SETXCF START,REBUILD,STRNAME=IGWLOCK00
IXC521I REBUILD FOR STRUCTURE IGWLOCK00 209
HAS BEEN STARTED
IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE 210
IGWLOCK00 WAS ACCEPTED.
IGW457I DFSMS REBUILD PROCESSING HAS BEEN 181
INVOKED FOR LOCK STRUCTURE IGWLOCK00
PROCESSING EVENT: REBUILD QUIESCE
. . .
IXC526I STRUCTURE IGWLOCK00 IS REBUILDING FROM 183
COUPLING FACILITY FACIL02 TO COUPLING FACILITY FACIL01.
REBUILD START REASON: OPERATOR INITIATED
INFO108: 00000003 00000003.
IXC582I STRUCTURE IGWLOCK00 ALLOCATED BY COUNTS. 184
PHYSICAL STRUCTURE VERSION: C0D4E45C 2FCFBC4E
STRUCTURE TYPE:   LOCK
CFNAME:           FACIL01
ALLOCATION SIZE:  14336 K
POLICY SIZE:      20480 K
POLICY INITSIZE:  14336 K
POLICY MINSIZE:   0 K
IXLCONN STRSIZE:  0 K
ENTRY COUNT:      36507
LOCKS:            2097152
ALLOCATION SIZE IS WITHIN CFRM POLICY DEFINITIONS
IXL014I IXLCONN REBUILD REQUEST FOR STRUCTURE IGWLOCK00 185
WAS SUCCESSFUL. JOBNAME: SMSVSAM ASID: 000A
CONNECTOR NAME: ZZZZZZZZ#@$2 CFNAME: FACIL01
IXL015I REBUILD NEW STRUCTURE ALLOCATION INFORMATION FOR 186
STRUCTURE IGWLOCK00, CONNECTOR NAME ZZZZZZZZ#@$2
CFNAME    ALLOCATION STATUS/FAILURE REASON
--------  --------------------------------
FACIL02   RESTRICTED BY REBUILD OTHER
FACIL01   STRUCTURE ALLOCATED AC001800
...
IGW457I DFSMS REBUILD PROCESSING HAS BEEN 218
INVOKED FOR LOCK STRUCTURE IGWLOCK00
PROCESSING EVENT: REBUILD PROCESS COMPLETE
IXC579I PENDING DEALLOCATION FOR STRUCTURE IGWLOCK00 IN 463
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC2
PARTITION: 00  CPCID: 00
HAS BEEN COMPLETED.
PHYSICAL STRUCTURE VERSION: C0D006C8 BE46ECC5
INFO116: 13088068 01 6A00 00000003
TRACE THREAD: 00002B57.
IGW457I DFSMS REBUILD PROCESSING HAS BEEN 464
INVOKED FOR LOCK STRUCTURE IGWLOCK00
PROCESSING EVENT: REBUILD PROCESS COMPLETE
Figure 7-82 Messages from REBUILD command
Note: The structure was moved from FACIL02 to FACIL01 in this example because FACIL01 was the first CF in the preference list.
Issue the D XCF,STR,STRNM=strname command, as shown in Figure 7-84 on page 153. Check
whether the structure is active in another Coupling Facility.
D XCF,STR,STRNM=IGWLOCK00
IXC360I 02.28.37 DISPLAY XCF 897
STRNAME: IGWLOCK00
STATUS: ALLOCATED
EVENT MANAGEMENT: POLICY-BASED
TYPE: LOCK
POLICY INFORMATION:
 POLICY SIZE    : 20480 K
 POLICY INITSIZE: 14336 K
 POLICY MINSIZE : 0 K
 FULLTHRESHOLD  : 80
 ALLOWAUTOALT   : NO
 REBUILD PERCENT: N/A
 DUPLEX         : DISABLED
 ALLOWREALLOCATE: YES
 PREFERENCE LIST: FACIL02 FACIL01 1
 ENFORCEORDER   : NO
 EXCLUSION LIST IS EMPTY
ACTIVE STRUCTURE
----------------
 ALLOCATION TIME: 07/01/2007 22:02:55
 CFNAME         : FACIL01 2
 COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC1
         PARTITION: 00  CPCID: 00
 ACTUAL SIZE    : 14336 K
 STORAGE INCREMENT SIZE: 256 K
 ENTRIES: IN-USE: 0  TOTAL: 36507,  0% FULL
 LOCKS:              TOTAL: 2097152
 PHYSICAL VERSION: C0D4E82E BDEF7B05
 LOGICAL VERSION : C0D4E82E BDEF7B05
 SYSTEM-MANAGED PROCESS LEVEL: 8
 XCF GRPNAME    : IXCLO000
 DISPOSITION    : KEEP
 ACCESS TIME    : 0
 NUMBER OF RECORD DATA LISTS PER CONNECTION: 16
 MAX CONNECTIONS: 4
 # CONNECTIONS  : 3
 CONNECTION NAME  ID  VERSION   SYSNAME  JOBNAME  ASID  STATE
 ---------------- --  --------  -------- -------- ----  ------
 ZZZZZZZZ#@$1     03  00030077  #@$1     SMSVSAM  000A  ACTIVE
 ZZZZZZZZ#@$2     02  0002006C  #@$2     SMSVSAM  000A  ACTIVE
 ZZZZZZZZ#@$3     01  00010091  #@$3     SMSVSAM  000A  ACTIVE
 DIAGNOSTIC INFORMATION:
Figure 7-84 Verifying that the structure is allocated in another CF
...
IXC544I REALLOCATE PROCESSING FOR STRUCTURE CIC_DFHSHUNT_001 883
WAS NOT ATTEMPTED BECAUSE
STRUCTURE IS ALLOCATED IN PREFERRED CF
IXC574I EVALUATION INFORMATION FOR REALLOCATE PROCESSING 884
OF STRUCTURE DFHXQLS_#@$STOR1
SIMPLEX STRUCTURE ALLOCATED IN COUPLING FACILITY: FACIL01
ACTIVE POLICY INFORMATION USED.
CFNAME    STATUS/FAILURE REASON
--------  ---------------------
FACIL01   PREFERRED CF 1
          INFO110: 00000003 AC007800 0000000E
FACIL02   PREFERRED CF ALREADY SELECTED
          INFO110: 00000003 AC007800 0000000E
Figure 7-85 IXC544I message when CF structure is not selected for reallocation
When the entire REALLOCATE process completes for all structures, the processing issues
message IXC545I and a report summarizing the actions that were taken as a whole. See
Figure 7-86 for an example of the messages issued.
...
IXC545I REALLOCATE PROCESSING RESULTED IN THE FOLLOWING: 904
0 STRUCTURE(S) REALLOCATED - SIMPLEX
2 STRUCTURE(S) REALLOCATED - DUPLEXED
0 STRUCTURE(S) POLICY CHANGE MADE - SIMPLEX
0 STRUCTURE(S) POLICY CHANGE MADE - DUPLEXED
28 STRUCTURE(S) ALREADY ALLOCATED IN PREFERRED CF - SIMPLEX
0 STRUCTURE(S) ALREADY ALLOCATED IN PREFERRED CF - DUPLEXED
0 STRUCTURE(S) NOT PROCESSED
25 STRUCTURE(S) NOT ALLOCATED
145 STRUCTURE(S) NOT DEFINED
--------
200 TOTAL
0 ERROR(S) ENCOUNTERED DURING PROCESSING
IXC543I THE REQUESTED START,REALLOCATE WAS COMPLETED. 905
Figure 7-86 IXC545I message issued after completion of the REALLOCATE command
The SETXCF START,REALLOCATE command can be used to:
Move structures out of a Coupling Facility following a CFRM policy change that deletes or
changes that Coupling Facility (for example, following a Coupling Facility upgrade or add).
Move structures back into a Coupling Facility following a CFRM policy change that adds or
restores the Coupling Facility (for example, following a Coupling Facility upgrade or add).
Clean up pending CFRM policy changes that may have accumulated for whatever reason,
even in the absence of any need for structure relocation.
Clean up simplex or duplexed structures that were allocated in or moved into the wrong
Coupling Facilities (for example, if the right Coupling Facility was not accessible at the
time of allocation).
Clean up duplexed structures that have primary and secondary reversed because of a
prior condition which resulted in having duplexing stopped.
You can also use the REBUILD POPULATECF command to move structures between CFs.
SETXCF START,REBUILD,POPULATECF=cfname
Figure 7-87 POPULATECF command
This rebuilds all structures defined in the current CFRM policy that are not in their preferred
CF. Sample output is shown in Figure 7-88 after the REBUILD POPULATECF command was
issued.
SETXCF START,REBUILD,POPCF=FACIL02
IXC521I REBUILD FOR STRUCTURE IGWLOCK00 459
HAS BEEN STARTED
IXC540I POPULATECF REBUILD FOR FACIL02 REQUEST ACCEPTED. 460
THE FOLLOWING STRUCTURES ARE PENDING REBUILD:
IGWLOCK00
ISGLOCK
IXC_DEFAULT_1
ISTGENERIC
I#$#RM
I#$#LOCK1
I#$#VSAM
I#$#OSAM
IRRXCF00_B001
Figure 7-88 Messages from REBUILD POPULATECF
The method of emptying a CF using the SETXCF START,REBUILD command has some
disadvantages:
All rebuilds are started at the same time, resulting in contention for the CFRM couple data set. This contention elongates the rebuild process for all affected structures, thus making the rebuilds more disruptive to ongoing work.
The IXC* (XCF signalling) structures do not participate in that process; they must instead be rebuilt separately via manual commands on a structure-by-structure basis (see the example following this list).
A duplexed structure cannot be rebuilt out of the target CF, so a separate step is needed
to explicitly unduplex it so that it can be removed from the target CF.
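A minimal sketch of such a manual, structure-by-structure rebuild, assuming the XCF signalling structure IXC_DEFAULT_1 resides in the CF being emptied:

SETXCF START,REBUILD,STRNAME=IXC_DEFAULT_1,LOC=OTHER

The LOC=OTHER keyword requests that the structure be rebuilt into a CF other than the one it currently occupies.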
Figure 7-89 on page 157 illustrates the disadvantages of using the SETXCF START,REBUILD
command. In the figure, our CF named 1 FACIL02 has 2 DB2 Group Buffer Pool duplexed
structures and an 3 IXC* XCF signalling structure located in it.
D XCF,CF,CFNAME=FACIL02
IXC362I 18.47.58 DISPLAY XCF 014
CFNAME: FACIL02 1
COUPLING FACILITY      : SIMDEV.IBM.EN.0000000CFCC2
                         PARTITION: 00  CPCID: 00
SITE                   : N/A
POLICY DUMP SPACE SIZE :     2000 K
ACTUAL DUMP SPACE SIZE :     2048 K
STORAGE INCREMENT SIZE :      256 K
CONNECTED SYSTEMS:
   #@$1   #@$2   #@$3
STRUCTURES:
   CIC_DFHLOG_001   D#$#_GBP0(OLD) 2   D#$#_GBP1(OLD) 2
   I#$#LOCK1        I#$#OSAM           I#$#RM
   I#$#VSAM         IGWLOCK00          IRRXCF00_B001
   ISGLOCK          ISTGENERIC         IXC_DEFAULT_1 3
Figure 7-89 Structures allocated in FACIL02
Maintenance mode
z/OS V1.9 includes support for placing Coupling Facilities into a new state called
maintenance mode. When a CF is in maintenance mode, it is logically ineligible for CF
structure allocation purposes, as if it had been removed from the CFRM Policy entirely
(although no CFRM Policy updates are required to accomplish this).
Subsequent rebuild or REALLOCATE processing will also tend to remove any CF structure
instances that were already allocated in that CF at the time it was placed into maintenance
mode.
In conjunction with the REALLOCATE command, the new maintenance mode support can greatly simplify operational procedures related to taking a CF down for maintenance or upgrade in a Parallel Sysplex. In particular, it avoids the need to laboriously update and maintain several alternate copies of the CFRM Policy that omit a particular CF to be removed for maintenance.
Here we illustrate the maintenance mode command. In Figure 7-91, a display of the
ISTGENERIC structure shows that it is currently allocated in 1 CF2 and the CFRM Policy has
a preference list of 2 CF2 and then CF1.
D XCF,STR,STRNAME=ISTGENERIC
IXC360I 20.21.49 DISPLAY XCF 756
STRNAME: ISTGENERIC
STATUS: ALLOCATED
...
 ALLOWREALLOCATE: YES
 PREFERENCE LIST: CF2 CF1 2
 ENFORCEORDER   : NO
 EXCLUSION LIST IS EMPTY
ACTIVE STRUCTURE
----------------
 ALLOCATION TIME: 06/08/2007 15:39:47
 CFNAME         : CF2 1
 COUPLING FACILITY: 002094.IBM.02.00000002991E
         PARTITION: 1D  CPCID: 00
 ACTUAL SIZE    : 16384 K
 STORAGE INCREMENT SIZE: 512 K
...
Figure 7-91 Display of structure prior to invoking maintenance mode
In Figure 7-92, the SETXCF command is issued to rebuild ISTGENERIC from CF2 to CF1.
SETXCF START,REBUILD,STRNAME=ISTGENERIC,LOC=OTHER
IXC521I REBUILD FOR STRUCTURE ISTGENERIC
HAS BEEN STARTED
IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE
ISTGENERIC WAS ACCEPTED.
IXC526I STRUCTURE ISTGENERIC IS REBUILDING FROM
COUPLING FACILITY CF2 TO COUPLING FACILITY CF1.
REBUILD START REASON: OPERATOR INITIATED
INFO108: 00000028 00000028.
IXC521I REBUILD FOR STRUCTURE ISTGENERIC
HAS BEEN COMPLETED
Figure 7-92 Rebuild ISTGENERIC structure to CF1
On completion of the rebuild, a display of the ISTGENERIC structure shows that it has been
reallocated into 1 CF1, as shown in Figure 7-93 on page 159.
D XCF,STR,STRNAME=ISTGENERIC
IXC360I 20.26.48 DISPLAY XCF 767
STRNAME: ISTGENERIC
STATUS: ALLOCATED
...
 PREFERENCE LIST: CF2 CF1
...
ACTIVE STRUCTURE
----------------
 ALLOCATION TIME: 07/18/2007 20:26:24
 CFNAME         : CF1 1
 COUPLING FACILITY: 002094.IBM.02.00000002991E
         PARTITION: 0F  CPCID: 00
. . .
Figure 7-93 Structure after being reallocated to CF1
Issuing the SETXCF command shown in Figure 7-94 places the CF named CF2 into maintenance mode.
SETXCF START,MAINTMODE,CFNAME=CF2
IXC369I THE SETXCF START MAINTMODE REQUEST FOR COUPLING FACILITY
CF2 WAS SUCCESSFUL.
Figure 7-94 Invoking maintenance mode for CF2
A display of CF2 at this point shows the connected systems (SC63 through SC70) and structures such as IXC_DEFAULT_4, SYSTEM_OPERLOG(OLD), SYSARC_PLEX0_RCL, and SYSZWLM_991E2094 still allocated in it.
As shown in Figure 7-96 on page 160, attempting to rebuild a structure back into a CF that is still in maintenance mode does not succeed, and messages are issued explaining why the rebuild was stopped.
SETXCF START,REBUILD,STRNAME=ISTGENERIC,LOC=OTHER
IXC521I REBUILD FOR STRUCTURE ISTGENERIC
HAS BEEN STARTED
IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE
ISTGENERIC WAS ACCEPTED.
IXC522I REBUILD FOR STRUCTURE
ISTGENERIC IS BEING STOPPED
TO FALL BACK TO THE OLD STRUCTURE DUE TO
NO COUPLING FACILITY PROVIDING BETTER OR EQUIVALENT CONNECTIVITY
IXC521I REBUILD FOR STRUCTURE ISTGENERIC
HAS BEEN STOPPED
Figure 7-96 Attempting to allocate structure into CF while still in maintenance mode
With the CF in maintenance mode and structures still located in the CF, issue the SETXCF
START,REALLOCATE command to relocate these structures into an alternative CF, as shown in
Figure 7-97.
SETXCF START,REALLOCATE
IXC543I THE REQUESTED START,REALLOCATE WAS ACCEPTED.
. . .
IXC521I REBUILD FOR STRUCTURE IXC_DEFAULT_2
HAS BEEN STARTED
IXC526I STRUCTURE IXC_DEFAULT_2 IS REBUILDING FROM
COUPLING FACILITY CF2 TO COUPLING FACILITY CF1.
REBUILD START REASON: OPERATOR INITIATED
INFO108: 00000028 00000028.
IXC521I REBUILD FOR STRUCTURE IXC_DEFAULT_2
HAS BEEN COMPLETED
. . .
Figure 7-97 REALLOCATE while in maintenance mode
Figure 7-98 shows that after the REALLOCATE command is completed, the CF will have no 1
structures located in it and still be in 2 maintenance mode.
D XCF,CF,CFNAME=CF2
IXC362I 20.40.21 DISPLAY XCF 915
CFNAME: CF2
COUPLING FACILITY      : 002094.IBM.02.00000002991E
                         PARTITION: 1D  CPCID: 00
SITE                   : N/A
POLICY DUMP SPACE SIZE :     2048 K
ACTUAL DUMP SPACE SIZE :     2048 K
STORAGE INCREMENT SIZE :      512 K
ALLOCATION NOT PERMITTED
MAINTENANCE MODE 2
CONNECTED SYSTEMS:
   SC63   SC64   SC65   SC70
NO STRUCTURES ARE IN USE BY THIS SYSPLEX IN THIS COUPLING FACILITY 1
Figure 7-98 CF2 with no structures allocated and still in maintenance mode
To remove maintenance mode from the CF, issue the SETXCF command as shown in
Figure 7-99.
SETXCF STOP,MAINTMODE,CFNAME=CF2
IXC369I THE SETXCF STOP MAINTMODE REQUEST FOR COUPLING FACILITY
CF2 WAS SUCCESSFUL.
Figure 7-99 Turn off maintenance mode
D XCF,STR,STRNM=IGWLOCK00
IXC360I 19.30.05 DISPLAY XCF 538
STRNAME: IGWLOCK00
STATUS: ALLOCATED
EVENT MANAGEMENT: POLICY-BASED
TYPE: LOCK
POLICY INFORMATION:
 POLICY SIZE    : 20480 K
 POLICY INITSIZE: 14336 K
 POLICY MINSIZE : 0 K
 FULLTHRESHOLD  : 80
 ALLOWAUTOALT   : NO
 REBUILD PERCENT: N/A
 DUPLEX         : DISABLED
 ALLOWREALLOCATE: YES
 PREFERENCE LIST: FACIL02 FACIL01
 ENFORCEORDER   : NO
 EXCLUSION LIST IS EMPTY
ACTIVE STRUCTURE
----------------
 ALLOCATION TIME: 07/02/2007 18:49:15
 CFNAME         : FACIL01
 COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC1
         PARTITION: 00  CPCID: 00
 ACTUAL SIZE    : 14336 K
 STORAGE INCREMENT SIZE: 256 K
 ENTRIES: IN-USE: 0  TOTAL: 36507,  0% FULL
 LOCKS:              TOTAL: 2097152
 PHYSICAL VERSION: C0D5FEC2 D6736F02
 LOGICAL VERSION : C0D5FEC2 D6736F02
 SYSTEM-MANAGED PROCESS LEVEL: 8
 XCF GRPNAME    : IXCLO000
 DISPOSITION    : KEEP
 ACCESS TIME    : 0
 NUMBER OF RECORD DATA LISTS PER CONNECTION: 16
 MAX CONNECTIONS: 4
 # CONNECTIONS  : 3
SETXCF FORCE,CON,CONNM=ZZZZZZZZ#@$1,STRNAME=IGWLOCK00
IXC354I THE SETXCF FORCE REQUEST FOR CONNECTION 638
ZZZZZZZZ#@$1 IN STRUCTURE IGWLOCK00 WAS REJECTED:
FORCE CONNECTION NOT ALLOWED FOR PERSISTENT LOCK OR SERIALIZED LIST
Figure 7-102 Removing a connection
The message in Figure 7-102 was received because the IGWLOCK00 is a LOCK type
structure, so removing a connection to it could result in undetected data loss.
When the structure enters this state, it means that the structure is a target for deletion.
Depending on the application that owns the structure, you may need to restart the application
for the structure to become allocated in the alternate CF. If this situation has been created by
a CF failure, then when the failed CF is eventually restored, XES resolves this condition and
this information is no longer displayed.
Another way to remove this condition is by removing the failed CF from the active CFRM
policy. IPLing the z/OS images does not clear this condition. The number of connectors in
message IXC360I (as shown in Displaying a Structure with FAILED-PERSISTENT
Connections in Figure 7-81 on page 150) must be zero (0) before proceeding. If ACTIVE
connectors exist, invoke recovery procedures for the connector, or CANCEL the connector's
address space to make the connector disconnect from a structure. Issue SETXCF FORCE to
delete the structure.
SETXCF FORCE,STR,STRNM=IGWLOCK00
IXC353I THE SETXCF FORCE REQUEST FOR STRUCTURE IGWLOCK00 WAS ACCEPTED:
REQUEST WILL BE PROCESSED ASYNCHRONOUSLY
Figure 7-104 Forcing a structure
Note: If ACTIVE connectors exist, a message similar to the one shown in Figure 7-105 will
be received.
If the structure remains, check the owner of the application and inform the appropriate
support personnel.
Chapter 8. Couple Data Set management

There are seven types of Couple Data Sets (CDSs) in a sysplex:
Sysplex
ARM
BPXMCDS
CFRM
LOGR
SFM
WLM
Notice that the Couple Data Sets are named after the system components that use them. Not
all of these components must be active in a Parallel Sysplex, however. The following list
identifies which Couple Data Sets are mandatory and which ones are optional:
In a Parallel Sysplex, the sysplex CDS and the CFRM CDS are mandatory because they
describe the Parallel Sysplex environment you are running.
Although the WLM CDS is not mandatory for a sysplex, it has been a part of the z/OS
Operating System since z/OS v1.4. Most sites are now running in WLM Goal mode, so the
WLM CDS will be active in most sites.
Use of the remaining four Couple Data Sets is optional. Their use in a sysplex may vary
from site to site and will depend on which functions have been enabled in your sysplex.
Couple Data Sets contain a policy, which is a set of rules and actions that systems in the
sysplex follow. For example, the WLM policy describes the performance goals and the
importance of the different workloads running in the sysplex.
Most Couple Data Sets contain multiple policies. Only one of these policies may be active at
a time. However, a new policy can be activated dynamically by using the SETXCF command.
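As a sketch, activating a policy named SFM01 (a hypothetical policy name) from the SFM CDS would look like this:

SETXCF START,POLICY,TYPE=SFM,POLNAME=SFM01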
The seven Couple Data Sets and their contents are described briefly in Table 8-1.
Table 8-1 CDS type and description
CDS        Description
Sysplex
This is the most important CDS in the sysplex. It contains the active XCF policy, which describes the
Couple Data Set and signal connectivity configuration of the sysplex and failure-related timeout values,
such as the interval after which a system is considered to have failed.
It also holds control information about the sysplex, such as:
The system status information for every system in the sysplex.
Information about XCF groups and the members of those groups.
Information about the other Couple Data Sets defined to the sysplex.
ARM
This CDS contains the active ARM policy. This policy describes how ARM registered started tasks and
batch jobs should be restarted if they abend.
BPXMCDS
This CDS contains information that is used to support the shared HFS and zFS facility in the sysplex.
This CDS does not contain a policy.
CFRM
This CDS contains the active CFRM policy and status information about the CFs. The CFRM policy
describes the CFs that are used by the sysplex and the attributes of the CF structures that can be
allocated in them.
LOGR
This CDS contains one LOGR policy. The LOGR policy describes the structures and logstreams that
you can define. It also contains information about the Logger staging data sets and offload data sets.
You could say that this CDS is like a catalog for Logger offload data sets.
SFM
This CDS contains the SFM policy. The SFM policy describes how the systems in the sysplex will
manage a system failure, a signalling connectivity failure or a CF connectivity failure.
WLM
This CDS contains the WLM policy. The WLM policy describes the performance goals and the
importance of the different workloads running in the sysplex.
If a system loses access to all the SFM CDSs, SFM will be disabled across the entire
sysplex.
To avoid an outage to the sysplex, it is good practice to run with a primary and an alternate CDS. The primary CDS is used for all read and write operations; the alternate CDS is used only for write operations, which ensures the currency of the alternate CDS's contents. If the primary CDS fails, the sysplex will automatically switch to the alternate CDS and drop the primary CDS from its configuration. This leaves the sysplex running on a single CDS. If you have a spare CDS defined, you can add it to the sysplex configuration dynamically to ensure that your sysplex continues to run with two CDSs.
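A sketch of adding a spare in this situation, assuming the spare sysplex CDS is SYS1.XCF.CDS03 on volume #@$#X3 (hypothetical names that follow the naming conventions used in this chapter):

SETXCF COUPLE,TYPE=SYSPLEX,ACOUPLE=(SYS1.XCF.CDS03,#@$#X3)

The ACOUPLE keyword names the data set (and optionally the volume) to bring in as the new alternate CDS.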
To avoid contention on the Couple Data Sets during recovery processing, place the primary
sysplex CDS and primary CFRM CDS on separate volumes. Normally, these Couple Data
Sets are not busy. However, during recovery processing, they can both become very busy.
We recommend the following Couple Data Set configuration:
Define three Couple Data Sets for each component: a primary CDS, an alternate CDS,
and a spare CDS.
Run with a primary and an alternate CDS.
Place the primary sysplex CDS and primary CFRM CDS on separate volumes.
Follow the recommended CDS layout listed in Table 8-2 for a single site sysplex.
Table 8-2 CDS layout

Volume 1           Volume 2           Volume 3
Primary sysplex    Alternate sysplex  Spare sysplex
Alternate CFRM     Spare CFRM         Primary CFRM
Spare LOGR         Primary LOGR       Alternate LOGR
Primary SFM        Alternate SFM      Spare SFM
Primary ARM        Alternate ARM      Spare ARM
Alternate WLM      Spare WLM          Primary WLM
Spare BPXMCDS      Primary BPXMCDS    Alternate BPXMCDS
parmlib member, this information is not deleted from the sysplex CDS. Instead, this
information remains in the sysplex CDS until it is replaced by a new definition.
After the systems are active in a sysplex, it is possible to change the Couple Data Set
configuration dynamically by using the SETXCF COUPLE command.
D XCF,COUPLE,TYPE=ALL
IXC358I 00.44.28 DISPLAY XCF 877
SYSPLEX COUPLE DATA SETS
PRIMARY   DSN: SYS1.XCF.CDS01
          VOLSER: #@$#X1      DEVN: 1D06
          FORMAT TOD            MAXSYSTEM   MAXGROUP(PEAK)   MAXMEMBER(PEAK)
          11/20/2002 16:27:24       4         100  (52)          203  (18)
          ADDITIONAL INFORMATION:
           ALL TYPES OF COUPLE DATA SETS ARE SUPPORTED
           GRS STAR MODE IS SUPPORTED
ALTERNATE DSN: SYS1.XCF.CDS02
          VOLSER: #@$#X2      DEVN: 1D07
          FORMAT TOD            MAXSYSTEM   MAXGROUP   MAXMEMBER
          11/20/2002 16:27:28       4         100         203
          ADDITIONAL INFORMATION:
           ALL TYPES OF COUPLE DATA SETS ARE SUPPORTED
           GRS STAR MODE IS SUPPORTED
ARM COUPLE DATA SETS
PRIMARY   DSN: SYS1.XCF.ARM01
          VOLSER: #@$#X1      DEVN: 1D06
          FORMAT TOD            MAXSYSTEM
          11/20/2002 15:08:01       4
          ADDITIONAL INFORMATION:
           NOT PROVIDED
ALTERNATE DSN: SYS1.XCF.ARM02
          VOLSER: #@$#X2      DEVN: 1D07
          FORMAT TOD            MAXSYSTEM
          11/20/2002 15:08:04       4
          ADDITIONAL INFORMATION:
           NOT PROVIDED
ARM IN USE BY ALL SYSTEMS
BPXMCDS COUPLE DATA SETS
PRIMARY   DSN: SYS1.XCF.OMVS01
          VOLSER: #@$#X1      DEVN: 1D06
          FORMAT TOD            MAXSYSTEM
          11/20/2002 15:24:18       4
          ADDITIONAL INFORMATION:
. . .
Figure 8-1 Displaying CDS information
If you want to display a specific CDS, use the command D XCF,COUPLE,TYPE=xxxx, where xxxx
is the component name. For a list of components, refer to Table 8-1 on page 167.
An example of the response to the D XCF,POLICY,TYPE=SFM command when SFM is not active is
shown in Figure 8-3. The last line of the output shows that SFM is not active 3.
D XCF,POLICY,TYPE=SFM
IXC364I 19.07.44 DISPLAY XCF 727
TYPE: SFM
POLICY NOT STARTED 3
Figure 8-3 SFM policy display when SFM is inactive
If your system programmer asks you to stop a policy, use the following command. In this
example we are stopping an SFM policy:
SETXCF STOP,POLICY,TYPE=SFM
An example of the system response to this command is shown in Figure 8-5 on page 172.
SETXCF STOP,POLICY,TYPE=SFM
IXC607I SFM POLICY HAS BEEN STOPPED BY SYSTEM #@$2
Figure 8-5 Console messages when stopping SFM policy
D XCF,COUPLE,TYPE=SFM
IXC358I 02.46.14 DISPLAY XCF 785
SFM COUPLE DATA SETS
PRIMARY   DSN: SYS1.XCF.SFM01 1
          VOLSER: #@$#X1      DEVN: 1D06
          FORMAT TOD            MAXSYSTEM
          11/20/2002 16:08:53       4
          ADDITIONAL INFORMATION:
           FORMAT DATA
           POLICY(9) SYSTEM(16) RECONFIG(4)
ALTERNATE DSN: SYS1.XCF.SFM02 2
          VOLSER: #@$#X2      DEVN: 1D07
          FORMAT TOD            MAXSYSTEM
          11/20/2002 16:08:53       4
          ADDITIONAL INFORMATION:
           FORMAT DATA
           POLICY(9) SYSTEM(16) RECONFIG(4)
SFM IN USE BY ALL SYSTEMS 3
Figure 8-6 Current SFM CDS configuration
The first step is to replace the existing alternate CDS with the replacement primary CDS. To
do this, issue the following command:
SETXCF COUPLE,ACOUPLE=SYS1.XCF.SFM03,TYPE=SFM
Figure 8-7 shows the resulting messages that are issued.
SETXCF COUPLE,ACOUPLE=SYS1.XCF.SFM03,TYPE=SFM
IXC309I SETXCF COUPLE,ACOUPLE REQUEST FOR SFM WAS ACCEPTED
IXC260I ALTERNATE COUPLE DATA SET REQUEST FROM SYSTEM 792
#@$2 FOR SFM IS NOW BEING PROCESSED.
IXC253I ALTERNATE COUPLE DATA SET 794
SYS1.XCF.SFM02 FOR SFM
IS BEING REMOVED BECAUSE OF A SETXCF COUPLE,ACOUPLE OPERATOR COMMAND
DETECTED BY SYSTEM #@$2
IXC263I REMOVAL OF THE ALTERNATE COUPLE DATA SET 797
SYS1.XCF.SFM02 FOR SFM IS COMPLETE
IXC251I NEW ALTERNATE DATA SET 798
SYS1.XCF.SFM03
FOR SFM HAS BEEN MADE AVAILABLE
Figure 8-7 Replacing the alternate Couple Data Set
In Figure 8-8 on page 174, you can see the WTOR 1 that you may receive when you use the
SETXCF COUPLE,ACOUPLE command to add a new alternate CDS. This WTOR asks you to
confirm that the alternate CDS can be used. It is issued because the CDS has been used
before in the sysplex.
Attention: If this WTOR is issued, consult your system programmer before replying.
SETXCF COUPLE,ACOUPLE=SYS1.XCF.SFM03,TYPE=SFM
IXC309I SETXCF COUPLE,ACOUPLE REQUEST FOR SFM WAS ACCEPTED
IXC260I ALTERNATE COUPLE DATA SET REQUEST FROM SYSTEM 817
#@$2 FOR SFM IS NOW BEING PROCESSED.
IXC248E COUPLE DATA SET 819
SYS1.XCF.SFM03 ON VOLSER #@$#X2
FOR SFM MAY BE IN USE BY ANOTHER SYSPLEX.
013 IXC247D REPLY U TO ACCEPT USE OR D TO DENY USE OF THE COUPLE DATA
SET FOR SFM. 1
R 13,U
IEE600I REPLY TO 013 IS;U
IXC251I NEW ALTERNATE DATA SET 824
SYS1.XCF.SFM03
FOR SFM HAS BEEN MADE AVAILABLE
Figure 8-8 Accept or deny Couple Data Set WTOR
Display the Couple Data Set configuration again by issuing the following command:
D XCF,COUPLE,TYPE=SFM
An example of the response from this command in Figure 8-9 shows the current SFM CDS
configuration. You'll notice the alternate CDS 1 is the replacement primary CDS that was
added with the previous command.
D XCF,COUPLE,TYPE=SFM
IXC358I 02.48.16 DISPLAY XCF 801
SFM COUPLE DATA SETS
PRIMARY   DSN: SYS1.XCF.SFM01
          VOLSER: #@$#X1      DEVN: 1D06
          FORMAT TOD            MAXSYSTEM
          11/20/2002 16:08:53       4
          ADDITIONAL INFORMATION:
           FORMAT DATA
           POLICY(9) SYSTEM(16) RECONFIG(4)
ALTERNATE DSN: SYS1.XCF.SFM03 1
          VOLSER: #@$#X1      DEVN: 1D06
          FORMAT TOD            MAXSYSTEM
          06/27/2007 02:39:06       4
          ADDITIONAL INFORMATION:
           FORMAT DATA
           POLICY(9) SYSTEM(16) RECONFIG(4)
SFM IN USE BY ALL SYSTEMS
Figure 8-9 Current SFM CDS configuration
The second step is to remove the existing primary CDS and replace it with the replacement
primary CDS by issuing the following command:
SETXCF COUPLE,PSWITCH,TYPE=SFM
An example of the response from this command in Figure 8-10 on page 175 shows the
messages that are issued. These messages contain a warning to indicate you are processing
without an alternate CDS 1.
SETXCF COUPLE,PSWITCH,TYPE=SFM
IXC309I SETXCF COUPLE,PSWITCH REQUEST FOR SFM WAS ACCEPTED
IXC257I PRIMARY COUPLE DATA SET 805
SYS1.XCF.SFM01 FOR SFM
IS BEING REPLACED BY
SYS1.XCF.SFM03 DUE TO OPERATOR REQUEST
IXC263I REMOVAL OF THE PRIMARY COUPLE DATA SET 808
SYS1.XCF.SFM01 FOR SFM IS COMPLETE
IXC267E PROCESSING WITHOUT AN ALTERNATE 809
COUPLE DATA SET FOR SFM.
ISSUE SETXCF COMMAND TO ACTIVATE A NEW ALTERNATE 1.
Figure 8-10 Replacing the primary Couple Data Set
Display the Couple Data Set configuration again by issuing the following command:
D XCF,COUPLE,TYPE=SFM
An example of the response from this command in Figure 8-11 shows the current SFM CDS
configuration. Notice that there is no alternate CDS in the configuration.
D XCF,COUPLE,TYPE=SFM
IXC358I 02.49.30 DISPLAY XCF 811
SFM COUPLE DATA SETS
PRIMARY   DSN: SYS1.XCF.SFM03
          VOLSER: #@$#X1      DEVN: 1D06
          FORMAT TOD            MAXSYSTEM
          06/27/2007 02:39:06       4
          ADDITIONAL INFORMATION:
           FORMAT DATA
           POLICY(9) SYSTEM(16) RECONFIG(4)
SFM IN USE BY ALL SYSTEMS
Figure 8-11 Current SFM CDS configuration
The final step is to add the replacement alternate CDS by issuing the following command:
SETXCF COUPLE,ACOUPLE=SYS1.XCF.SFM04,TYPE=SFM
An example of the response from this command in Figure 8-12 shows the messages that are
issued.
SETXCF COUPLE,ACOUPLE=SYS1.XCF.SFM04,TYPE=SFM
IXC309I SETXCF COUPLE,ACOUPLE REQUEST FOR SFM WAS ACCEPTED
IXC260I ALTERNATE COUPLE DATA SET REQUEST FROM SYSTEM 183
#@$2 FOR SFM IS NOW BEING PROCESSED.
IXC251I NEW ALTERNATE DATA SET 185
SYS1.XCF.SFM04
FOR SFM HAS BEEN MADE AVAILABLE
Figure 8-12 Replacing the alternate Couple Data Set
Display the Couple Data Set configuration again by issuing the following command:
D XCF,COUPLE,TYPE=SFM
An example of the response from this command in Figure 8-13 shows the current SFM CDS
configuration.
D XCF,COUPLE,TYPE=SFM
IXC358I 02.51.48 DISPLAY XCF 826
SFM COUPLE DATA SETS
PRIMARY   DSN: SYS1.XCF.SFM03
          VOLSER: #@$#X1      DEVN: 1D06
          FORMAT TOD            MAXSYSTEM
          06/27/2007 02:39:06       4
          ADDITIONAL INFORMATION:
           FORMAT DATA
           POLICY(9) SYSTEM(16) RECONFIG(4)
ALTERNATE DSN: SYS1.XCF.SFM04
          VOLSER: #@$#X2      DEVN: 1D07
          FORMAT TOD            MAXSYSTEM
          06/27/2007 02:39:06       4
          ADDITIONAL INFORMATION:
           FORMAT DATA
           POLICY(9) SYSTEM(16) RECONFIG(4)
SFM IN USE BY ALL SYSTEMS
Figure 8-13 Current SFM CDS configuration
After you have completed changing the CDS configuration, the COUPLExx parmlib member
must be updated to reflect the new configuration.
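As a sketch, the relevant COUPLExx statements might look like the following after the SFM replacement described above. The sysplex name PLEX1 and the sysplex CDS names are illustrative; the SFM data set names are the ones used in this example:
COUPLE SYSPLEX(PLEX1)
       PCOUPLE(SYS1.XCF.CDS01)
       ACOUPLE(SYS1.XCF.CDS02)
DATA   TYPE(SFM)
       PCOUPLE(SYS1.XCF.SFM03)
       ACOUPLE(SYS1.XCF.SFM04)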
Examples of the messages you may see in this scenario are displayed in
Figure 8-15.
Messages such as 1, 2, and 3 are issued periodically to indicate that *MASTER* and
XCFAS are incurring I/O delays during the 5-minute timeout.
Messages such as 4 are issued periodically to indicate which CDS is experiencing I/O
delays and for how long.
A message such as 5 is issued to indicate that the CDS has been removed because of an
I/O error.
A message such as 6 is issued to warn you that there is no alternate for this CDS.
IOS071I 1D06,**,*MASTER*, START PENDING
. . .
Examples of the messages you may see in this scenario are shown in Figure 8-16.
These messages are almost identical to the messages you see when you lose access to a
primary CDS.
Messages such as 1, 2, and 3 are issued periodically to indicate that *MASTER* and
XCFAS are incurring I/O delays during the 5-minute timeout.
Messages such as 4 are issued periodically to indicate which CDS is experiencing I/O
delays and for how long.
A message such as 5 is issued to indicate that the CDS has been removed because of an
I/O error.
A message such as 6 is issued to warn you that there is no alternate for this CDS.
IOS071I 1D06,**,*MASTER*, START PENDING
. . .
Sysplex CDS
If a system loses access to both the primary and alternate sysplex CDSs, that system would
be unable to update its system status and as a result it would be partitioned out of the
sysplex.
ARM CDS
If a system loses access to both the primary and alternate ARM CDSs, ARM services on that
system are disabled until a primary CDS is assigned.
BPXMCDS CDS
If a system loses access to both the primary and alternate BPXMCDS CDSs, that system loses
the ability to share UNIX System Services file systems with other systems in the sysplex.
CFRM CDS
If a system loses access to both the primary and alternate CFRM CDSs, that system is
placed in a non-restartable disabled wait state X'0A2' with reason code X'9C'.
LOGR CDS
If a system loses access to both the primary and alternate LOGR CDSs, the logger loses
connectivity to its inventory data set. The logger address space on that system terminates
itself.
SFM CDS
If a system loses access to both the primary and alternate SFM CDSs, SFM is disabled
across the entire sysplex.
WLM CDS
If a system loses access to both the primary and alternate WLM CDSs, then Workload
Manager continues to run, using the policy information that was in effect at the time of the
failure. WLM is described as being in independent mode, operating only on local data, and
does not transmit data to other members of the sysplex.
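To see how WLM is running on the members of the sysplex, a display command can help. This is a minimal sketch; the exact output varies by release:
D WLM,SYSTEMS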
Chapter 9.
XCF management
This chapter describes the Cross-System Coupling Facility (XCF) and operational aspects of
XCF including:
XCF signalling
XCF groups
XCF system monitoring
(Figure: four systems - SYSA, SYSB, SYSC, and SYSD - connected by CTC signalling paths.)
PATHIN and PATHOUT device numbers are defined in the COUPLExx parmlib member.
An example of CTC addressing standards is given here.
All PATHINs begin with 40xy, where:
x is the system where the communication is being sent from.
y is the device number 0-7 on one CHPID and 8-F on another CHPID.
All PATHOUTs begin with 50xy, where:
x is the system where the communication is being sent to.
y is the device number 0-7 on one CHPID and 8-F on another CHPID.
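Following this convention, the COUPLExx path definitions might look like the following sketch; the device numbers match the examples used later in this chapter:
PATHIN  DEVICE(4020,4028)
PATHOUT DEVICE(5010,5018)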
(Figure: systems SYSA, SYSB, SYSC, and SYSD signalling to each other through structures in the CF.)
The COUPLExx parmlib member defines the PATHOUT and PATHIN with a structure
name of IXC_xxxxx (the structure name must begin with IXC).
Multiple signalling structures can be specified in the COUPLExx member.
The structure name must be in the active CFRM policy.
During IPL, z/OS will establish a signalling path to every other image using the CF.
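A sketch of the corresponding COUPLExx statements, using the structure names from this configuration:
PATHIN  STRNAME(IXC_DEFAULT_1,IXC_DEFAULT_2)
PATHOUT STRNAME(IXC_DEFAULT_1,IXC_DEFAULT_2)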
D XCF,PI
IXC355I 15.08.29 DISPLAY XCF 422
PATHIN FROM SYSNAME: ???????? - PATHS NOT CONNECTED TO OTHER SYSTEMS
   DEVICE (LOCAL/REMOTE): 4010/???? 4018/????        1
PATHIN FROM SYSNAME: SC64
   DEVICE (LOCAL/REMOTE): 4020/5010 4028/5018        2
   STRNAME: IXC_DEFAULT_1
            IXC_DEFAULT_2                            3
...
Figure 9-3 D XCF,PATHIN command
Figure 9-3 shows output from a D XCF,PI command that was entered on system SC63:
1 The question marks (?) after PATHIN devices 4010 and 4018 indicate that they are not
connected to a PATHOUT device.
2 The PATHIN CTC device number 4020 on a system named SC63 (local) is connected to
PATHOUT CTC device number 5010 on the system named SC64 (remote).
3 IXC_DEFAULT_1 and IXC_DEFAULT_2 are CF structures that are being used by SC63 for
PATHIN and PATHOUT signalling to the other systems in the sysplex.
Both CTCs and CF structures are being used for signalling. We recommend that you allocate
each signalling structure in separate CFs.
D XCF,PI,DEV=ALL
IXC356I 12.00.52 DISPLAY XCF 501
LOCAL    REMOTE    PATHIN        REMOTE                          LAST MXFER
PATHIN   SYSTEM    STATUS        PATHOUT  RETRY  MAXMSG  RECVD   TIME
4010     ????????  INOPERATIVE   ????     100    750             1
4018     ????????  INOPERATIVE   ????     100    750             1
4020     SC64      WORKING       5010     100    750     65393   274
4028     SC64      WORKING       5018     100    750     39713   591
...
                                 REMOTE                          LAST MXFER
                                 PATHIN   RETRY  MAXMSG  RECVD   TIME
                                 ????     100    750             1
                                 ????     100    750             1
                                 4010     100    750     65393   274
                                 4018     100    750     39713   591
A signalling path can be in one of the following states: Working, Starting, Linking,
Restarting, Inoperative, or Stopping. Inoperative means that the signalling path is defined
to XCF but is not usable until hardware or definition problems are resolved.
D XCF,PI,STRNM=ALL
IXC356I 00.34.17 DISPLAY XCF 058
STRNAME           REMOTE    PATHIN
                  SYSTEM    STATUS
IXC_DEFAULT_1               WORKING
                  #@$1      WORKING
                  #@$2      WORKING
IXC_DEFAULT_2               WORKING
                  #@$1      WORKING
                  #@$2      WORKING

STRNAME         PATHIN  UNUSED                          LAST MXFER
                LIST    PATHS 1  RETRY  MAXMSG  RECVD   TIME
IXC_DEFAULT_1           6        10     2000
                9                               18369   1911
                11                              55620   2332
IXC_DEFAULT_2                    10     2000
                9                               66492   1936
                11                              74970   2116

STRNAME         REMOTE  PATHIN   DELIVRY  BUFFER  MSGBUF  SIGNL
                SYSTEM  STATUS   PENDING  LENGTH  IN USE  NUMBR  NOBUF
IXC_DEFAULT_1   #@$1    WORKING  0        956     36      18369  0
                #@$2    WORKING  0        956     0       55620  0
IXC_DEFAULT_2   #@$1    WORKING  0        956     50      66492  0
                #@$2    WORKING  0        956     12      74970  0
1 Unused paths - the values shown in this column indicate the number of lists in the list
structure that are available for use as signalling paths.
D XCF,PO,STRNAME=ALL
IXC356I 02.42.58 DISPLAY XCF 418
STRNAME           REMOTE    PATHOUT
                  SYSTEM    STATUS
IXC_DEFAULT_1               WORKING
                  #@$1      WORKING
                  #@$2      WORKING
IXC_DEFAULT_2               WORKING
                  #@$1      WORKING
                  #@$2      WORKING

STRNAME         LIST  REMOTE  PATHOUT  UNUSED                 TRANSPORT  TRANSFR  BUFFER
                      SYSTEM  STATUS   PATHS  RETRY  MAXMSG   CLASS      PENDING  LENGTH
IXC_DEFAULT_1   8     #@$1    WORKING  6      10     2000     DEFAULT    0        956     24  62772  1480
                10    #@$2    WORKING                                    0        956     18  62749  2189
IXC_DEFAULT_2   8     #@$1    WORKING         10     2000     DEFAULT    0        956     16  21848   783
                10    #@$2    WORKING                                    0        956     28  70706  4394
SETXCF STOP,
  {PATHOUT,{DEVICE=([/]outdevnum[,[/]outdevnum]...)} }
  {         STRNAME=(strname[,strname]...)           }
  [,UNCOND=NO|YES]
Figure 9-9 Syntax of SETXCF STOP command
The syntax for the SETXCF START command is the same as for the SETXCF STOP command shown
in Figure 9-9.
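For example, to restart the inbound structure path that is stopped in the following example, you could issue:
SETXCF START,PATHIN,STRNAME=IXC_DEFAULT_2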
An example of the output of a SETXCF STOP command of an inbound signalling structure path
is shown in Figure 9-10 on page 191. The command was issued on system name #@$1.
SETXCF STOP,PI,STRNAME=IXC_DEFAULT_2
IXC467I STOPPING PATHIN STRUCTURE IXC_DEFAULT_2 032
RSN: OPERATOR REQUEST
IXC467I STOPPING PATHIN STRUCTURE IXC_DEFAULT_2 LIST 9 033
USED TO COMMUNICATE WITH SYSTEM #@$2
RSN: PROPAGATING STOP OF STRUCTURE
DIAG073: 08690003 0E011000 01000000 00000000 00000000
IXC467I STOPPING PATHIN STRUCTURE IXC_DEFAULT_2 LIST 11 034
USED TO COMMUNICATE WITH SYSTEM #@$3
RSN: PROPAGATING STOP OF STRUCTURE
DIAG073: 08690003 0E011000 01000000 00000000 00000000
...
IXC467I STOPPING PATHOUT STRUCTURE IXC_DEFAULT_2 LIST 9 756
USED TO COMMUNICATE WITH SYSTEM #@$1
RSN: OTHER SYSTEM STOPPING ITS SIDE OF PATH
...
IXC307I STOP PATHIN REQUEST FOR STRUCTURE IXC_DEFAULT_2 035
LIST 9 TO COMMUNICATE WITH SYSTEM #@$2 COMPLETED
SUCCESSFULLY: PROPAGATING STOP OF STRUCTURE
IXC307I STOP PATHOUT REQUEST FOR STRUCTURE IXC_DEFAULT_2 757
LIST 9 TO COMMUNICATE WITH SYSTEM #@$1 COMPLETED
SUCCESSFULLY: OTHER SYSTEM STOPPING ITS SIDE OF PATH
...
1 The command output shows the signalling structure (IXC_DEFAULT_2), which was being
used as an inbound path (PATHIN) to system #@$1, being stopped.
2 It further shows the outbound paths (PATHOUT), which were being used on the other
systems to communicate with the inbound path (PATHIN) on #@$1, also being stopped. As a
result, the other systems in the sysplex can no longer communicate with #@$1 via the
IXC_DEFAULT_2 signalling structure.
After stopping the inbound (PATHIN) path from the IXC_DEFAULT_2 signalling structure on
#@$1, the PATHIN and PATHOUTs on #@$1 and PATHOUTs on both #@$2 and #@$3
were displayed, with the output shown in Figure 9-11 on page 192.
D XCF,PI
IXC355I 19.43.39 DISPLAY XCF 040
PATHIN FROM SYSNAME: #@$2
   STRNAME: IXC_DEFAULT_1                       1
PATHIN FROM SYSNAME: #@$3
   STRNAME: IXC_DEFAULT_1
D XCF,PO
IXC355I 19.47.54 DISPLAY XCF 066
PATHOUT TO SYSNAME: #@$2
   STRNAME: IXC_DEFAULT_1  IXC_DEFAULT_2        2
PATHOUT TO SYSNAME: #@$3
   STRNAME: IXC_DEFAULT_1  IXC_DEFAULT_2
RO #@$2,D XCF,PO
RESPONSE=#@$2
IXC355I 19.52.41 DISPLAY XCF 796
PATHOUT TO SYSNAME: #@$1
   STRNAME: IXC_DEFAULT_1                       3
PATHOUT TO SYSNAME: #@$3
   STRNAME: IXC_DEFAULT_1  IXC_DEFAULT_2
RO #@$3,D XCF,PO
RESPONSE=#@$3
IXC355I 21.41.52 DISPLAY XCF 368
PATHOUT TO SYSNAME: #@$1
   STRNAME: IXC_DEFAULT_1                       4
PATHOUT TO SYSNAME: #@$2
   STRNAME: IXC_DEFAULT_1  IXC_DEFAULT_2
Figure 9-11 Signalling paths after stopping the PATHIN structure IXC_DEFAULT_2 on #@$1
1 The display of the PATHIN on #@$1 shows the inbound path is only using the
IXC_DEFAULT_1 structure.
2 The display of the PATHOUT on #@$1 shows the outbound path using both structures,
IXC_DEFAULT_1 and IXC_DEFAULT_2.
3 The display of the PATHOUT on #@$2 shows the outbound path is only using the
IXC_DEFAULT_1 structure.
4 The display of the PATHOUT on #@$3 shows the outbound path is only using the
IXC_DEFAULT_1 structure.
throughout the sysplex. The command returns information regarding the size of messages
being sent through the transport class to all members of the sysplex, and it identifies current
buffer usage needed to support the load.
D XCF,CD,CLASS=ALL
IXC344I 01.53.35 DISPLAY XCF 399
TRANSPORT        CLASS    DEFAULT   ASSIGNED
CLASS            LENGTH   MAXMSG    GROUPS
DEFAULT 2        956 1    2000      UNDESIG
. . . 3
Figure 9-12 Displaying transport classes
With this information you can determine how the transport classes are being used and,
potentially, whether additional transport classes are needed to help signalling performance. In
Figure 9-12, most of the messages being sent fit into the DEFAULT buffer 2, which is the only
transport class defined (956 bytes) 1. There is other traffic that could use a larger buffer
(8 KB), and that would benefit from having an additional transport class defined 3.
For more detailed information about transport classes and their performance considerations,
refer to the white paper Parallel Sysplex Performance: XCF Performance Considerations
authored by Joan Kelley and Kathy Walsh, which is available at:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100743/
Problem diagnosis and recovery should be performed on the failing device. After the problem
has been resolved and the device has been successfully varied online, XCF will start using it
again, as shown in Figure 9-14.
V 4020,ONLINE,UNCOND
IEE302I 4020 ONLINE
IXC306I START PATHIN REQUEST FOR DEVICE 4020 COMPLETED 339
SUCCESSFULLY: DEVICE CAME ONLINE
IXC466I INBOUND SIGNAL CONNECTIVITY ESTABLISHED WITH SYSTEM SC64 340
VIA DEVICE 4020 WHICH IS CONNECTED TO DEVICE 5010
Figure 9-14 CTC signalling path recovery
For examples of signalling problems during IPLs, refer to 3.6.6, Unable to establish XCF
connectivity on page 56.
COFVLFNO(3)   IDAVQUI0(3)   CSQGPSMG(3)   IGWXSGIS(6)
SYSRMF(3)     SYSXCF(2)     SYSTTRC(3)    XCFJES2A(3)
Group names are important because they are one way of segregating XCF message traffic.
Table 9-1 on page 195 lists common IBM XCF groups. Other vendor products may have other
XCF groups, which are not listed in the table.
Table 9-1 Common IBM XCF groups

Exploiter           Group Name
APPC, ASCH          SYSATBxx
CICS MRO            DFHIR000
CICSVR              DWWCVRCM
Console Services    SYSMCS, SYSMCS2
DAE                 SYSDAE
DB2                 SYSIGW00, SYSIGW01
ENF                 SYSENF
ESCM
GRS                 SYSGRS, SYSGRS2
HSM                 ARCxxxxxx
IMS
IOS                 SYSIOSxx
IRLM                DXRxxx
JES2 MAS
JES3 complex
OMVS                SYSBPX
RACF                IRRXCF00
RMF                 SYSRMF
RRS                 ATRRRS
SA for z/OS         INGXSGxx
TCP/IP              EZBTCPCS
Trace               SYSTTRC
TSO Broadcast       SYSIKJBC
VLF                 COFVLFNO
VSAM/RLS            IDAVQUI0, IGWXSGIS, SYSIGW01, SYSIGW02, SYSIGW03
VTAM, TCP/IP        ISTXCF, ISTCFS01
WebSphere MQ        CSQGxxxx
WLM                 SYSWLM
XES                 IXCLOxxx
zFS                 IOEZFS
You can use the D XCF,G,<groupname> command, as shown in Figure 9-16, to find out what
members are in a given group.
D XCF,G,ISTXCF
IXC332I 02.16.52 DISPLAY XCF
GROUP ISTXCF:   #@$2M$$$USIBMSC
Figure 9-16 D XCF,G,ISTXCF command
D XCF,G,ISTXCF,ALL
IXC333I 02.38.41 DISPLAY XCF 723
INFORMATION FOR GROUP ISTXCF
MEMBER NAME:      SYSTEM:  JOB ID:  STATUS:        1
#@$1M$$$USIBMSC   #@$1     NET      ACTIVE
#@$2M$$$USIBMSC   #@$2     NET      ACTIVE
#@$3M$$$USIBMSC   #@$3     NET      ACTIVE
. . .                                              2
SIGNALLING SERVICE                                 3
MSGI RECEIVED:  2153     PENDINGQ:        0
MSGI XFER CNT:  2259     XFERTIME:     2408
MSGI PENDINGQ:     0     SYMPATHY SICK:   0
. . .
1 Summary information at the top of the output of the command shows the member names,
the system name and the job name of the task associated with the member, and the status of
the member.
2 The system (#@$3) was unable to obtain the most current data for system #@$1 from the
sysplex Couple Data Set. To obtain the latest information, the command would need to be
issued from the #@$1 system.
3 The signalling service data describes the use of the XCF Signalling Service by the member.
One line will appear for each different signal size used by the member.
There are two ways to address stalled members and the problems they can cause:
Let SFM automatically address the situation for you. This requires a recent enhancement
to SFM related to sympathy sickness, as explained in Chapter 5, Sysplex Failure
Management on page 73.
Take some manual action to identify the culprit and resolve the situation before other
users start being impacted. We will discuss this further in this section.
To identify XCF members that are not collecting their messages quickly enough, XCF notes
the time every time it schedules a member's message exit. If the exit has not completed
processing in four minutes, XCF issues a Stalled Member message (IXC431I), as shown in
Figure 9-18.
10:59:09.06 IXC431I GROUP B0000002 MEMBER M1 JOB MAINASID ASID 0023
STALLED AT 02/06/2007 10:53:57.823698 ID: 0.2
LAST MSGX: 02/06/2007 10:58:13.112304 12 STALLED 0 PENDINGQ
LAST GRPX: 02/06/2007 10:53:53.922204 0 STALLED 0 PENDINGQ
11:00:17.23 *IXC430E SYSTEM SC04 HAS STALLED XCF GROUP MEMBERS
Figure 9-18 IXC431I stalled member message
The drawback of relying on these messages is that a member can be stalled for around four
minutes before IXC431I is issued. If you suspect that you have a stalled member, you can
issue the D XCF,GROUP command to look for stalled members, as shown in Figure 9-19.
D XCF,G
IXC331I 11.00.31 DISPLAY XCF
GROUPS(SIZE):  *B0000002(3) 1   COFVLFNO(3)   CTTXGRP(3)
               ISTCFS01(3)      SYSDAE(4)     SYSENF(3)
               SYSGRS(3)        SYSIEFTS(3)   SYSIGW00(3)
               SYSIGW01(3)      SYSIKJBC(3)   SYSIOS01(1)
               SYSIOS02(1)      SYSIOS03(1)   SYSJES(3)
. . .
Figure 9-19 D XCF,G command output showing a stalled member
The D XCF,G command will indicate the group with the stalled member after 30 seconds. An
asterisk (*) indicates the stall. However (prior to z/OS 1.8 or APAR OA09194), you must issue
the command on every system. You only receive the indication if the stalled member is on the
system where the command was issued.
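One way to avoid logging on to each system is to route the display to all systems in the sysplex; a minimal sketch:
RO *ALL,D XCF,G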
Issue the D XCF,G,<groupname> command to learn which members are in a given group, as
shown in Figure 9-20.
D XCF,G,B0000002
IXC332I 02.16.52 DISPLAY XCF
GROUP B0000002:   1
Figure 9-20 Members of group B0000002
1 Member names connected to the B0000002 group, with an asterisk (*) indicating the stalled
member.
Issue the D XCF,G,<groupname>,<member> command to display additional information for a
given member, as shown in Figure 9-21.
D XCF,G,B0000002,M1
IXC333I 11.05.31 DISPLAY XCF 926
INFORMATION FOR GROUP SYSMCS2
MEMBER NAME:  SYSTEM:  JOB ID:  STATUS:
#@$1          #@$1     TEST01   ACTIVE
. . .
SIGNALLING SERVICE
MSGO ACCEPTED: 2401   NOBUFFER: 0
MSGO XFER CNT: 0      LCL CNT: 2401   BUFF LEN: 956
MSGI RECEIVED: 844    PENDINGQ: 0
MSGI XFER CNT: 4001   XFERTIME: N/A
EXIT 01FB9300:  02/06/2007 10:57:15.939863 ME 00:00:00.001107
*EXIT 01FB9500: 02/06/2007 10:53:58.181009 ME RUNNING
. . .
Figure 9-21 XCF member list with stall
D XCF,S,ALL
IXC335I 21.03.32 DISPLAY XCF 769
SYSTEM  TYPE  SERIAL  LPAR  STATUS TIME           SYSTEM STATUS
#@$2    2084  6A3A    N/A   06/27/2007 21:03:28   ACTIVE  TM=SIMETR
#@$3    2084  6A3A    N/A   06/27/2007 21:03:32   ACTIVE  TM=SIMETR
#@$1    2084  6A3A    N/A   06/27/2007 21:03:29   ACTIVE  TM=SIMETR
XCF also checks the time stamp for all the other ACTIVE members. If any system's time
stamp is older than the current time, minus that system's INTERVAL value from the
COUPLExx parmlib member, then XCF suspects that the system may be dead and flags the
status of the system with a non-active status, as shown in Figure 9-23.
D XCF,S,ALL
IXC335I 21.19.46 DISPLAY XCF 870
SYSTEM  TYPE  SERIAL  LPAR  STATUS TIME           SYSTEM STATUS
#@$2    2084  6A3A    N/A   06/27/2007 21:15:08   MONITOR-DETECTED STOP 1
#@$3    2084  6A3A    N/A   06/27/2007 21:19:45   ACTIVE  TM=SIMETR
#@$1    2084  6A3A    N/A   06/27/2007 21:19:44   ACTIVE  TM=SIMETR
Figure 9-23 System with MONITOR-DETECTED STOP status
1 MONITOR-DETECTED STOP means the system has not updated its status on the CDS within the
time interval specified in that system's COUPLExx parmlib member. This can mean that:
Chapter 10.
(Figure: three JES2 configurations - a JESplex made up of systems 1, 2, 3, and 4; a JESplex made up of systems A, B, and C; and stand-alone JES2 systems.)
JESXCF is a system address space that contains functions and data areas used by JES2 to
send messages to other members in the MAS. It provides a common set of messaging
services that guarantee delivery and first-in first-out (FIFO) queuing of messages between
JES2 subsystems. When a JES2 member fails or a participating z/OS system fails, other
JES2 members are notified through JESXCF.
The JESXCF address space is created as soon as JES2 first starts on any system in the
Parallel Sysplex. Only the first system up in the Parallel Sysplex creates structures. The other
systems will find these on connection to the CF and use them.
JES2 has two assisting address spaces, JES2MON and JES2AUX. These address spaces
provide support services for JES2. The name is derived from the subsystem definition in
PARMLIB(IEFSSNxx), as is the JES2 PROC. Thus, if the entry SUBSYS SUBNAME(JES2)
is replaced with SUBSYS SUBNAME(FRED), then the system would have started tasks
FRED, FREDMON and FREDAUX.
JES2AUX is an auxiliary address space used by JES2 to hold spool buffers. These are user
address space buffers going to the spool.
JES2MON is the JES2 health monitor that is intended to address situations where JES2 is
not responding to commands and where it is not possible to determine the issue. Operations
can communicate directly with JES2MON.
Resources such as those listed are monitored across the MAS:
JNUM   Job Numbers
JQEs   Job Queue Elements
JOEs   Job Output Elements
TGs    Spool Space and Track Groups
If you want to cancel a job, restart a job, or send a message, you can do so from any member
of the MAS. It is not necessary to know which member the job is running on. In addition, the
$C command can also be used to cancel an active time-sharing user (TSO user ID) from any
member of the MAS.
JES2 thresholds
The first member of the MAS that detects a JES2 threshold issues a message to the console.
There are three ways in which threshold values can be set:
In the JES2 PARMLIB, by the system programmer
Modified by a JES2 command, such as $TSPOOLDEF,TGSPACE=WARN=85
Not set, in which case the default value is used
Other members still issue the messages, but rather than sending them to the console, they
are written to the hardcopy log. This means that you must have an effective method for
monitoring consoles on a sysplex-wide basis to avoid missing critical messages. This could
be done by having an automation tool that issues an alert to bring the problem to the attention
of an operator.
For more information about placement of the checkpoint, see JES2 Initialization and Tuning
Guide, SA22-7532.
This section illustrates suspending a checkpoint and then reactivating it. The starting
configuration can be seen in Figure 10-2.
$D CKPTDEF
$HASP829 CKPTDEF CKPT1=(STRNAME=JES2CKPT_1,INUSE=YES,
$HASP829 VOLATILE=YES),CKPT2=(DSNAME=SYS1.JES2.CKPT2,
$HASP829 VOLSER=#@$#M1,INUSE=YES,VOLATILE=NO),
$HASP829 NEWCKPT1=(DSNAME=,VOLSER=),NEWCKPT2=(DSNAME=,
$HASP829 VOLSER=),MODE=DUPLEX,DUPLEX=ON,LOGSIZE=7,
$HASP829 VERSIONS=(STATUS=ACTIVE,NUMBER=2,WARN=80,
$HASP829 MAXFAIL=0,NUMFAIL=0,VERSFREE=2,MAXUSED=1),
$HASP829 RECONFIG=NO,VOLATILE=(ONECKPT=WTOR,
$HASP829 ALLCKPT=WTOR),OPVERIFY=NO
Figure 10-2 $D CKPTDEF - starting configuration
$T CKPTDEF,RECONFIG=YES
$HASP285 JES2 CHECKPOINT RECONFIGURATION STARTING
$HASP233 REASON FOR JES2 CHECKPOINT RECONFIGURATION IS OPERATOR REQUEST
$HASP285 JES2 CHECKPOINT RECONFIGURATION STARTED - DRIVEN BY MEMBER #@$2
*$HASP271 CHECKPOINT RECONFIGURATION OPTIONS
*         VALID RESPONSES ARE:
*         '1'      - FORWARD CKPT1 TO NEWCKPT1
*         '2'      - FORWARD CKPT2 TO NEWCKPT2
*         '5'      - SUSPEND THE USE OF CKPT1
*         '6'      - SUSPEND THE USE OF CKPT2
*         'CANCEL' - EXIT FROM RECONFIGURATION
*         CKPTDEF (NO OPERANDS)   - DISPLAY MODIFIABLE SPECIFICATIONS
*         CKPTDEF (WITH OPERANDS) - ALTER MODIFIABLE SPECIFICATIONS
*189 $HASP272 ENTER RESPONSE (ISSUE D R,MSG=$HASP271 FOR RELATED MSG)
R 189,6
IEE600I REPLY TO 189 IS;6
$HASP280 JES2 CKPT2 DATA SET (SYS1.JES2.CKPT2 ON #@$#M1)
         IS NO LONGER IN USE
$HASP255 JES2 CHECKPOINT RECONFIGURATION COMPLETE
CKPT1=(STRNAME=JES2CKPT_1,INUSE=YES,
VOLATILE=YES),CKPT2=(DSNAME=SYS1.JES2.CKPT2,
VOLSER=#@$#M1,INUSE=NO),NEWCKPT1=(DSNAME=,
VOLSER=),NEWCKPT2=(DSNAME=,VOLSER=),
MODE=DUPLEX,DUPLEX=ON,LOGSIZE=7,
VERSIONS=(STATUS=ACTIVE,NUMBER=2,WARN=80,
MAXFAIL=0,NUMFAIL=0,VERSFREE=2,MAXUSED=1),
RECONFIG=NO,VOLATILE=(ONECKPT=WTOR,
ALLCKPT=WTOR),OPVERIFY=NO
Figure 10-3 Suspending the use of CKPT2
Now we can move the volume to the new DASD subsystem and resume the use of CKPT2,
as shown in Figure 10-4 on page 206.
$T CKPTDEF,RECONFIG=YES
*$HASP285 JES2 CHECKPOINT RECONFIGURATION STARTING
*$HASP233 REASON FOR JES2 CHECKPOINT RECONFIGURATION IS OPERATOR REQUEST
*$HASP285 JES2 CHECKPOINT RECONFIGURATION STARTED - DRIVEN BY MEMBER #@$2
*$HASP271 CHECKPOINT RECONFIGURATION OPTIONS
*         VALID RESPONSES ARE:
*         '1'      - FORWARD CKPT1 TO NEWCKPT1
*         '8'      - UPDATE AND START USING CKPT2
*         'CANCEL' - EXIT FROM RECONFIGURATION
*         CKPTDEF (NO OPERANDS)   - DISPLAY MODIFIABLE SPECIFICATIONS
*         CKPTDEF (WITH OPERANDS) - ALTER MODIFIABLE SPECIFICATIONS
*190 $HASP272 ENTER RESPONSE (ISSUE D R,MSG=$HASP271 FOR RELATED MSG)
R 190,8
IEE600I REPLY TO 190 IS;8
*$HASP273 JES2 CKPT2 DATA SET WILL BE ASSIGNED TO
*         SYS1.JES2.CKPT2 ON #@$#M1
*         VALID RESPONSES ARE:
*         'CONT'   - PROCEED WITH ASSIGNMENT
*         'CANCEL' - EXIT FROM RECONFIGURATION
*         CKPTDEF (NO OPERANDS)   - DISPLAY MODIFIABLE SPECIFICATIONS
*         CKPTDEF (WITH OPERANDS) - ALTER MODIFIABLE SPECIFICATIONS
*191 $HASP272 ENTER RESPONSE (ISSUE D R,MSG=$HASP273 FOR RELATED MSG)
R 191,CONT
IEE600I REPLY TO 191 IS;CONT
$HASP280 JES2 CKPT2 DATA SET (SYS1.JES2.CKPT2 ON #@$#M1) IS NOW IN USE
$HASP255 JES2 CHECKPOINT RECONFIGURATION COMPLETE
Figure 10-4 Resuming the use of CKPT2
CKPT1=(STRNAME=JES2CKPT_1,INUSE=YES,
VOLATILE=YES),CKPT2=(DSNAME=SYS1.JES2.CKPT2,
VOLSER=#@$#M1,INUSE=YES,VOLATILE=NO),
NEWCKPT1=(DSNAME=SYS1.JES2.CKPT1,
VOLSER=#@$#J1),NEWCKPT2=(DSNAME=,VOLSER=),
MODE=DUPLEX,DUPLEX=ON,LOGSIZE=7,
VERSIONS=(STATUS=ACTIVE,NUMBER=2,WARN=80,
MAXFAIL=0,NUMFAIL=0,VERSFREE=2,MAXUSED=1),
RECONFIG=NO,VOLATILE=(ONECKPT=WTOR,
ALLCKPT=WTOR),OPVERIFY=NO
$D CKPTDEF
$HASP829 CKPTDEF CKPT1=(STRNAME=JES2CKPT_1,INUSE=YES,
$HASP829 VOLATILE=YES),CKPT2=(DSNAME=SYS1.JES2.CKPT2,
$HASP829 VOLSER=#@$#M1,INUSE=YES,VOLATILE=NO),
$HASP829 NEWCKPT1=(DSNAME=SYS1.JES2.CKPT1,
$HASP829 VOLSER=#@$#J1),NEWCKPT2=(DSNAME=,VOLSER=),
$HASP829 MODE=DUPLEX,DUPLEX=ON,LOGSIZE=7,
$HASP829 VERSIONS=(STATUS=ACTIVE,NUMBER=2,WARN=80,
$HASP829 MAXFAIL=0,NUMFAIL=0,VERSFREE=2,MAXUSED=1),
$HASP829 RECONFIG=NO,VOLATILE=(ONECKPT=WTOR,
$HASP829 ALLCKPT=WTOR),OPVERIFY=YES
CKPT1=(STRNAME=JES2CKPT_1,INUSE=YES,
VOLATILE=YES),CKPT2=(DSNAME=SYS1.JES2.CKPT2,
VOLSER=#@$#M1,INUSE=YES,VOLATILE=NO),
NEWCKPT1=(DSNAME=,VOLSER=),NEWCKPT2=(DSNAME=,
VOLSER=),MODE=DUPLEX,DUPLEX=ON,LOGSIZE=7,
VERSIONS=(STATUS=ACTIVE,NUMBER=2,WARN=80,
MAXFAIL=11,NUMFAIL=11,VERSFREE=2,MAXUSED=2),
RECONFIG=NO,VOLATILE=(ONECKPT=WTOR,
ALLCKPT=WTOR),OPVERIFY=NO
The messages generated in a Parallel Sysplex differ, depending upon whether the restart is
on the first JES2 MAS member or on an additional JES2.
The following examples are discussed in this chapter:
The configuration used is the recommended Parallel Sysplex configuration with two duplexed
checkpoints: CKPT1 located in the CF, and CKPT2 on DASD.
Note: The message flow assumes NOREQ is specified in the options. Otherwise, a
$HASP400 message is also issued, requiring a $S to start JES2 processing.
S JES2
IEF677I WARNING MESSAGE(S) FOR JOB JES2 ISSUE
017 $HASP426 SPECIFY OPTIONS - JES2 z/OS 1.8 SSNAME=JES2
. . .
17COLD,NOREQ
. . .
IXZ0001I CONNECTION TO JESXCF COMPONENT
ESTABLISHED GROUP XCFJES2A MEMBER N1$#@$2
$HASP9084 JES2 MONITOR ADDRESS SPACE STARTED FOR JES2
$HASP537 THE CURRENT CHECKPOINT USES 2946 4K RECORDS
IXL014I IXLCONN REQUEST FOR STRUCTURE JES2CKPT_#@$1_1 080
WAS SUCCESSFUL. JOBNAME: JES2 ASID: 001
CONNECTOR NAME: JES2_#@$2 CFNAME: FACIL01
IXL015I STRUCTURE ALLOCATION INFORMATION FOR
STRUCTURE JES2CKPT_#@$1_1, CONNECTOR NAME JES2_#@$2
CFNAME   ALLOCATION STATUS/FAILURE REASON
-------- --------------------------------
FACIL01  STRUCTURE ALLOCATED AC001800
FACIL02  PREFERRED CF ALREADY SELECTED AC001800
$HASP436 CONFIRM COLD START ON 083
CKPT1 - STRNAME=JES2CKPT_#@$1_1
CKPT2 - VOLSER=#@$#Q1 DSN=SYS1.#@$2.CKPT2
SPOOL - PREFIX=#@$#Q DSN=SYS1.#@$2.HASPACE
018 $HASP441 REPLY 'Y' TO CONTINUE INITIALIZATION OR 'N' TO TERMINATE
IN RESPONSE TO MESSAGE HASP436
R 018,Y
. . .
$HASP478 INITIAL CHECKPOINT READ IS FROM CKPT1 792
(SYS1.#@$2.CKPT1 ON #@$#Q1)
$HASP405 JES2 IS UNABLE TO DETERMINE IF OTHER MEMBERS ARE ACTIVE
019 $HASP420 REPLY 'Y' IF ALL MEMBERS ARE DOWN (IPL REQUIRED), 'N' IF NOT
REPLY 19,Y
$HASP266 JES2 CKPT2 DATA SET IS BEING FORMATTED
$HASP267 JES2 CKPT2 DATA SET HAS BEEN SUCCESSFULLY FORMATTED
$HASP266 JES2 CKPT1 DATA SET IS BEING FORMATTED
$HASP267 JES2 CKPT1 DATA SET HAS BEEN SUCCESSFULLY FORMATTED
$HASP492 JES2 COLD START HAS COMPLETED
This is the first JES2 system started and the checkpoint structure has been deleted; for
example, the CF has been powered off.
There is an existing active JES2 system. In a MAS configuration, this would be the normal
JES2 start configuration.
S JES2,PARM='WARM,NOREQ'
$HASP9084 JES2 MONITOR ADDRESS SPACE STARTED FOR JES2
. . .
IXZ0001I CONNECTION TO JESXCF COMPONENT ESTABLISHED, 294
GROUP XCFJES2A MEMBER N1$#@$1
. . .
IXL014I IXLCONN REQUEST FOR STRUCTURE JES2CKPT_#@$1_1 315
WAS SUCCESSFUL. JOBNAME: JES2 ASID: 001C
CONNECTOR NAME: JES2_#@$1 CFNAME: FACIL01
$HASP478 INITIAL CHECKPOINT READ IS FROM CKPT1 317
(STRNAME JES2CKPT_#@$1_1)
LAST WRITTEN FRIDAY, 6 JUL 2007 AT 00:54:10 (GMT)
$HASP493 JES2 MEMBER-#@$1 QUICK START IS IN PROGRESS
$HASP537 THE CURRENT CHECKPOINT USES 2946 4K RECORDS
IEF196I IEF237I 1D0B ALLOCATED TO $#@$#Q1
$HASP492 JES2 MEMBER-#@$1 QUICK START HAS COMPLETED
S JES2,PARM='WARM,NOREQ'
IXZ0001I CONNECTION TO JESXCF COMPONENT ESTABLISHED, 560
GROUP XCFJES2A MEMBER N1$#@$2
$HASP9084 JES2 MONITOR ADDRESS SPACE STARTED FOR JES2
. . .
< IXC582I Messages indicating a new structure has been allocated >
. . .
$HASP290 MEMBER #@$2 -- JES2 CKPT1 IXLLIST READ_LIST REQUEST FAILURE
*** CHECKPOINT DATA SET NOT DAMAGED BY THIS MEMBER ***
RETURN CODE = 00000008
REASON CODE = 0C1C0825
RECORD      = CHECK
$HASP460 UNABLE TO CONFIRM THAT CKPT1 IS A VALID CHECKPOINT 585
DATA SET DUE TO AN I/O ERROR READING THE LOCK RECORD.
VERIFY THE SPECIFIED CHECKPOINT DATA SETS ARE CORRECT:
S JES2,PARM='WARM,NOREQ'
IXZ0001I CONNECTION TO JESXCF COMPONENT ESTABLISHED, 440
GROUP XCFJES2A MEMBER TRAINER$#@$2
IXL014I IXLCONN REQUEST FOR STRUCTURE JES2CKPT_1 442
WAS SUCCESSFUL. JOBNAME: JES2 ASID: 001C
CONNECTOR NAME: JES2_#@$2 CFNAME: FACIL01
$HASP478 INITIAL CHECKPOINT READ IS FROM CKPT1 444
(STRNAME JES2CKPT_1)
LAST WRITTEN FRIDAY, 6 JUL 2007 AT 05:22:39 (GMT)
$HASP493 JES2 MEMBER-#@$2 HOT START IS IN PROGRESS
$HASP537 THE CURRENT CHECKPOINT USES 2952 4K RECORDS
With z/OS 1.8 and later, however, IBM added an enhancement that can be seen in
Figure 10-20 on page 221.
The improved display shows which address spaces are holding out JES2. The expected
behavior of JES2 for a clean shutdown can be seen in Figure 10-21.
$PJES2
$HASP608 $PJES2 906
$HASP608 ACTIVE ADDRESS SPACES
$HASP608 ASID  JOBNAME  JOBID
$HASP608 ----  -------- --------
$HASP608 0028  ZFS      STC10221
$HASP623 MEMBER DRAINING
$HASP607 JES2 NOT DORMANT -- MEMBER DRAINING, RC=10 ACTIVE ADDRESS SPACES
Figure 10-21 $PJES2 showing active address spaces
04.17.25 #@$2  $PJES2
04.17.25 #@$2  $HASP608 $PJES2 COMMAND ACCEPTED
04.17.25 #@$2  *CNZ4201E SYSLOG HAS FAILED
04.17.25 #@$2  IEE043I A SYSTEM LOG DATA SET HAS BEEN QUEUED TO
               SYSOUT CLASS L
04.17.25 #@$2  *IEE037D LOG NOT ACTIVE
04.17.26 #@$2  IXZ0002I CONNECTION TO JESXCF COMPONENT DISABLED,
               GROUP XCFJES2A MEMBER TRAINER$#@$2
04.17.26 #@$2  $HASP9085 JES2 MONITOR ADDRESS SPACE STOPPED FOR JES2
04.17.31 #@$2  $HASP085 JES2 TERMINATION COMPLETE
Figure 10-22 $PJES2, last system in the jesplex
01.20.17 #@$2  $PJES2,ABEND
*01.20.17 #@$2 *$HASP095 JES2 CATASTROPHIC ERROR. CODE = $PJ2
01.20.18 #@$2  $HASP088 JES2 ABEND ANALYSIS
$HASP088 ------------------------------------------------------------
$HASP088 FMID = HJE7730   LOAD MODULE = HASJES20
$HASP088 SUBSYS = JES2 z/OS 1.8
$HASP088 DATE = 2007.186   TIME = 1.20.18
$HASP088 DESC = OPERATOR ISSUED $PJES2,ABEND
$HASP088 MODULE   MODULE          OFFSET   SERVICE  ROUTINE  EXIT
$HASP088 NAME     BASE            OF CALL  LEVEL    CALLED   ##
$HASP088 -------- --------------  -------  -------  -------  ----
$HASP088 HASPCOMM 000127D8 + 0081C8        OA18916  *ERROR   $PJ2
$HASP088 PCE = COMM (0C9B55E0)
$HASP088 R0  = 0001A642 00C7E518 00006F16 00016FC8
$HASP088 R4  = 00000000 0C9B5D84 00000004 0C9B5D88
$HASP088 R8  = 0001A642 00000000 00000000 00007000
$HASP088 R12 = 00012828 0C9B55E0 0C9750B8 0002D380
$HASP088 ------------------------------------------------------------
*01.20.18 #@$2 *$HASP198 REPLY TO $HASP098 WITH ONE OF THE FOLLOWING:
* END            - STANDARD ABNORMAL END
* END,DUMP       - END JES2 WITH A DUMP (WITH AN OPTIONAL TITLE)
* END,NOHOTSTART - ABBREVIATED ABNORMAL END (HOT-START IS AT RISK)
* SNAP           - RE-DISPLAY $HASP088
* DUMP           - REQUEST SYSTEM DUMP (WITH AN OPTIONAL TITLE)
*01.20.18 #@$2 *009 $HASP098 ENTER TERMINATION OPTION
R 9,END
IST314I END
01.21.30 #@$2  IEE600I REPLY TO 009 IS;END
01.21.30 #@$2  $HASP085 JES2 TERMINATION COMPLETE
01.21.31 #@$2  IEF450I JES2 JES2 - ABEND=S02D U0000 REASON=D7D1F240
01.21.31 #@$2  IXZ0003I CONNECTION TO JESXCF COMPONENT BROKEN
               GROUP XCFJES2A MEMBER TRAINER$#@$2
Figure 10-23 JES2 abend - shut down any system in the sysplex
For a job to start execution, there must be an initiator available that has been started and
can accept work in that class. The initiator does not need to be on the system where the
job was first read onto the JES queue. Furthermore, if more than one system has initiators
started for that class, you cannot control which system will execute the job through the use
of class alone.
System affinity
Again, either through explicit coding in JCL or through a JES exit, it is possible to assign a
specific system affinity to a job. Additionally, it is possible to assign an affinity to the JES2
internal reader. This affinity can be altered by using the command $TINTRDR,SYSAFF=. For
example, when we issued $TINTRDR,SYSAFF=#@$2 on system #@$3, all jobs submitted
on #@$3 ran on system #@$2.
The same technique can also be applied to local readers by using the
$T RDR(nn),SYSAFF= command. The affinity will ensure that the job will only be executed
on a specific system. This does not guarantee, however, that an initiator is available to
process the job in its assigned class.
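As a sketch, an affinity can also be assigned directly in the job's JECL; the system name here is one of the systems in this test sysplex:
/*JOBPARM SYSAFF=#@$2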
Through the controlled use of classes and system affinity, you can determine where a job will
be executed. You can let JES2 manage where the job will run by having all initiator classes
started on all members of the MAS. The scheduling environment, which is discussed in
11.11, Using the SCHEDULING ENVIRONMENT (SE) command on page 248, can also be
used to control where jobs run.
If you want to cancel a job, restart a job, or send a message, you can do so from any member
of the MAS. It is not necessary to know which member the job is running on. We see this in
Figure 10-24.
#@$3  -$CJ(12645)
#@$2  $HASP890 JOB(TESTJOB1) STATUS=(EXECUTING/#@$2),CLASS=A,
      $HASP890 PRIORITY=9,SYSAFF=(#@$2),HOLD=(NONE),
      $HASP890 CANCEL=YES
#@$2  CANCEL TESTJOB1,A=002B
#@$2  IEE301I TESTJOB1 CANCEL COMMAND ACCEPTED
#@$2  IEA989I SLIP TRAP ID=X222 MATCHED. JOBNAME=TESTJOB1, ASID=002B
#@$2  IEF450I TESTJOB1 STEP3 - ABEND=S222 U0000 REASON=00000000
Figure 10-24 Cancelling a job from another member of the MAS
The $C command for a TSO user is converted into a C U=xxxx command. It does not matter
which system the TSO user is logged onto; the cancel command is routed to the appropriate
system. JES2 cannot cancel an STC. As a result, it still does not matter which system the
STC is running on, because any attempt to issue a JES2 cancel will fail.
For more information about batch management, refer to Getting the Most Out of a Parallel
Sysplex, SG24-2073. For more information about JES2 commands, refer to z/OS JES2
Commands, SA22-7526.
You can switch initiators from one mode to another by using the $TJOBCLASS command with
the MODE= parameter. However, ensure that all jobs with the same service class are
managed by the same type of initiator. For example, assume that job classes A and B are
assigned to the HOTBATCH service class. If JOBCLASS(A) is controlled by WLM, and
JOBCLASS(B) is controlled by JES2, then WLM will find it difficult to manage the
HOTBATCH goals without managing class B jobs.
Unlike JES2 initiators, WLM initiators do not share classes. Also, the number of WLM
initiators is not limited. Using $TJOBCLASS, however, you can limit the number of concurrent
WLM jobs in a particular class.
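As a sketch, the following commands switch job class A to WLM mode and then cap the number of class A jobs that can execute concurrently; the class and limit are illustrative:
$TJOBCLASS(A),MODE=WLM
$TJOBCLASS(A),XEQCOUNT=(MAXIMUM=5)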
MODE=JES,QHELD=NO,SCHENV=,
XEQCOUNT=(MAXIMUM=*,CURRENT=0)

MODE=WLM,QHELD=NO,SCHENV=,
XEQCOUNT=(MAXIMUM=*,CURRENT=5)
MODE=WLM,QHELD=NO,SCHENV=,
XEQCOUNT=(MAXIMUM=5,CURRENT=0)

$PXEQ
$HASP000 OK
$HASP222 XEQ DRAINING
$HASP000 OK
$S XEQ
$HASP000 OK
Figure 10-27 $PXEQ and $SXEQ
$DJ 12789
$HASP890 JOB(TE$TJOB1) STATUS=(AWAITING EXECUTION),CLASS=L,
$HASP890              PRIORITY=9,SYSAFF=(ANY),HOLD=(JOB)
. . .
$TJ(12789),S=#@$1
$HASP890 JOB(TE$TJOB1) STATUS=(AWAITING EXECUTION),CLASS=L,
$HASP890              PRIORITY=9,SYSAFF=(#@$1),HOLD=(JOB)
. . .
$DJ 12789
$HASP890 JOB(TE$TJOB1) STATUS=(AWAITING EXECUTION),CLASS=L,
$HASP890              PRIORITY=9,SYSAFF=(#@$1),HOLD=(JOB)
The JES2 health monitor provides commands such as:
$JDJES
$JDMONITOR
$JDDETAILS
$JDHISTORY
$JSTOP
$JDSTATUS
$HASP9120 D STATUS 1
$HASP9121 NO OUTSTANDING ALERTS 2
$HASP9150 NO JES2 NOTICES 3
1 Issue a command to the JES2 Monitor to display the information about JES2.
2 No outstanding JES2 alerts.
3 No outstanding JES2 notices.
$JDJES
$HASP9120 D JES
$HASP9121 NO OUTSTANDING ALERTS
$HASP9122 NO INCIDENTS BEING TRACKED
$HASP9150 NO JES2 NOTICES
$JDDETAILS
$HASP9103 D DETAIL
$HASP9104 JES2 RESOURCE USAGE SINCE 2007.183 23:00:01
RESOURCE    LIMIT    USAGE      LOW     HIGH  AVERAGE
--------  -------  -------  -------  -------  -------
BERT        65620      341      341      345      342
BSCB            0        0  7483647        0        0
BUFX           89        0        0        2        0
CKVR            2        0        0        1        0
CMBS          201        0        0        0        0
CMDS          200        0        0        0        0
ICES           33        0        0        0        0
JNUM        32760     2025     2016     2025     2020
JOES        20000     2179     2166     2179     2172
JQES        32760     2025     2016     2025     2020
LBUF           23        0        0        0        0
NHBS           53        0        0        0        0
SMFB           52        0        0        0        0
TBUF          104        0        0        0        0
TGS          9911     6062     6052     6063     6056
TTAB            3        0        0        0        0
VTMB           10        0        0        0        0
$HASP9105 JES2 SAMPLING STATISTICS SINCE 2007.183 23:00:01
TYPE              COUNT  PERCENT
----------------  -----  -------
ACTIVE              493     0.83
IDLE              58740    99.16
LOCAL LOCK            0     0.00
NON-DISPATCHABLE      0     0.00
PAGING                0     0.00
OTHER WAITS           0     0.00
TOTAL SAMPLES     59233
. . .
Figure 10-33 $JDDETAILS
The $JDDETAILS command causes the monitor to display all the JES2 resources and their
limits. This is similar to the information seen with SDSF using the RM feature. Refer to 11.7,
Resource monitor (RM) command on page 246, for more detailed information.
$JDHISTORY
$HASP9130 D HISTORY
$HASP9131 JES2 BERT USAGE HISTORY
DATE      TIME      LIMIT    USAGE    LOW    HIGH   AVERAGE
2007.184   0:00:00  65620      342    341     342       341
2007.183  23:00:01  65620      341    341     345       342
2007.183  22:00:00  65620      343    343     344       343
. . .
Figure 10-34 $JDHISTORY
The $JDHISTORY command displays a history for all JES2 control blocks, that is, BERTs,
BSCBs, and so on.
$JSTOP
$HASP9101 MONITOR STOPPING 753
IEA989I SLIP TRAP ID=X13E MATCHED. JOBNAME=JES2MON , ASID=0020.
$HASP9085 JES2 MONITOR ADDRESS SPACE STOPPED FOR JES2
. . .
IEF196I IEF375I JOB/JES2MON /START 2007182.1936
IEF196I IEF376I JOB/JES2MON /STOP  2007184.0035 CPU 0MIN 47.69SEC
IEF196I                                         SRB 0MIN 33.21SEC
IEA989I SLIP TRAP ID=X33E MATCHED. JOBNAME=*UNAVAIL, ASID=0020.
IRR812I PROFILE ** (G) IN THE STARTED CLASS WAS USED 771
        TO START JES2MON WITH JOBNAME JES2MON.
. . .
Figure 10-35 $JSTOP - JES2MON restarts
Chapter 11.
The SDSF panels and commands include:

DA    Active users
I     Input queue
O     Output queue
H     Held output queue
ST    Status of jobs
LOG   System log
SR    System requests
MAS   Members in the MAS
JC    Job classes
SE    Scheduling environments
RES   WLM resources
ENC   Enclaves
PS    Processes
INIT  Initiators
PR    Printers
PUN   Punches
RDR   Reader
LINE  Lines
NODE  Nodes
SO    Spool offload
SP    Spool volumes
RM    Resource monitor
CK    Health checker
ULOG  User log
END   Exit SDSF
If syslog is not being regularly offloaded, or if a runaway task creates excessive syslog
messages, then SDSF may not be able to view the log. This will be indicated by the error
message ISF002I MASTER SDSF SYSLOG INDEX FULL when the SDSF LOG command is issued.
1 Extended console name. This will be NOT ACTIVE if the console was turned off with the ULOG
CLOSE command, or you are not authorized to use it.
2 System name on which the command was issued, or from which the response originated.
3 Date when the message was logged.
4 Job ID applying to the message, if available.
5 Command text or message response. If it is echoed by SDSF, it is preceded by a
hyphen (-).
Use the PRINT command to save the ULOG data. You can route it to a printer or save it in a
data set. You will need to save the ULOG data before exiting SDSF. The PRINT command is
described in 11.5, Printing and saving output in SDSF on page 239.
The ULOG command creates an EMCS console, and the default name for the console is your
user ID. Each EMCS console in a sysplex requires a unique name. Thus, if you attempt to
open a second ULOG screen, for example by having two SDSF sessions within a single ISPF
session, you receive message ISF031I, as seen at 2 in Figure 11-6.
Display Filter View Print Options Help
-------------------------------------------------------------------------------
SDSF ULOG CONSOLE COTTRELC (SHARED) 1            LINE 0      COLUMNS 42- 121
COMMAND INPUT ===>                                           SCROLL ===> CSR
********************************* TOP OF DATA *********************************
ISF031I CONSOLE COTTRELC ACTIVATED (SHARED)      2
******************************** BOTTOM OF DATA *******************************
Figure 11-6 Second ULOG session started
When this happens, SDSF will share an EMCS console 1. However, responses to commands
are always sent to the first SDSF console, not to the second (shared) console.
Figure 11-7 on page 237 shows a command issued. However, the output is displayed in the
primary SDSF ULOG panel, as seen in Figure 11-8 on page 237.
1 The response from the command is returned to this screen, even though the command was
not issued on this screen.
IBM has removed the restriction against having the same user ID logged onto two different
JES2 systems, although other restrictions remain, such as TSO enqueue and ISPF issues. If
you log onto two systems and attempt to use the same EMCS console ID for both, then
the attempt to open the second ULOG will fail, as seen at 1 in Figure 11-9.
Display Filter View Print Options Help
-------------------------------------------------------------------------------
SDSF ULOG CONSOLE NOT ACTIVE                     LINE 0      COLUMNS 42- 121
COMMAND INPUT ===>                                           SCROLL ===> CSR
********************************* TOP OF DATA *********************************
ISF032I CONSOLE COTTRELC ACTIVATE FAILED, RETURN CODE 0004, REASON CODE 0000 1
******************************** BOTTOM OF DATA *******************************
Figure 11-9 ULOG on a second system
Commands can be issued from the second system but the output is only visible in the system
log. That is, you can view the results using the LOG command but not the ULOG command.
There are many different columns available and the actual display can be modified using the
ARRANGE command. Some of the more useful fields are:
1 System ID of system you are logged on to.
2 Systems displayed (z/OS value or SYSNAME value).
3 Total demand paging rate.
4 Percent of time that the CPU is busy (z/OS/LPAR/zAAP Views).
5 Where JES2 commands such as C are issued.
6 Jobname.
7 CPU% usage by each job.
8 Total CPU used by each job.
9 Current I/O Rate for each job.
10 Total IOs performed by each job.
11 Real memory used by each job.
12 Paging rate for each job.
13 System where this job is running.
Use the SYSNAME command to restrict the display to a particular system or to view all the
systems. SYSNAME <sysid> will restrict the display to one system. SYSNAME ALL will display the
active tasks on all systems.
To restrict the display to only batch jobs, use DA OJOB. To restrict the display to only STCs, use
DA OSTC. To restrict the display to only TSO user IDs, use DA OTSU.
The ARRANGE command allows you to reorder the columns displayed. Thus, to move the SIO and
EXCP-Cnt columns before the CPU% and CPU-TOTAL columns, issue the commands seen in
Figure 11-11 on page 239 and Figure 11-12 on page 239.
(SDSF LOG screen image; the visible messages include IEF196I allocation messages and RMF termination messages such as ERB102I and ERB451I.)
You can use the different options to print to a data set, to a pre-allocated DD statement, and
then specify which lines are to be printed.
1 The row of the first line of the syslog shown on the screen.
In Figure 11-16 we print 7500 lines, line numbers 10703 through 18203, to the currently open
print output data set. We choose line 10703 because we want to start the print at the
time 19:00.
Display Filter View Print Options Help
-------------------------------------------------------------------------------
SDSF SYSLOG 6972.101 #@$1 #@$2 06/26/2007 0W         10693      PRINT OPENED
COMMAND INPUT ===> print 10703 18203                            SCROLL ===> CSR
N 0004000 #@$1 2007177 18:59:50.37 STC07173 00000290 -JOBNAME STEPNAME PRO...
N 0004000 #@$1 2007177 18:59:50.37 STC07173 00000290 -D#$1IRLM STARTING...
N 0004000 #@$1 2007177 18:59:50.37 STC07173 00000290 -D#$1IRLM ENDED. NAME...
N 4020000 #@$1 2007177 18:59:50.38 STC07173 00000090 IEF352I ADDRESS SPACE...
N 4000000 #@$1 2007177 18:59:50.38 STC07173 00000090 $HASP395 D#$1IRLM ENDE...
N 0000000 #@$1 2007177 18:59:50.41          00000280 IEA989I SLIP TRAP ID=X...
N 4000000 #@$1 2007177 19:00:02.24 STC07000 00000090 ERB101I ZZ : REPORT AV...
NC0000000 #@$1 2007177 19:00:12.06 #@$1M01  00000290 K S
NC0000000 #@$1 2007177 19:00:13.87 #@$1M01  00000290 K S,DEL=R,SEG=28,CON=N
NC0000000 #@$1 2007177 19:00:18.49 #@$1M01  00000290 D A,L
Figure 11-16 Print range
Finally, as shown in Figure 11-17 on page 242, we close the output data set. Therefore, the
print range is from line 10703 through 18203.
Note: In Figure 11-17 on page 242, 1, the top line of the screen has moved to line 10703,
which is the first line written to the output data set.
Add the C option to any of the X action characters to close the print file when printing is
complete. For example, XDC displays a panel for opening a print data set; when the data set
information is provided, SDSF prints to the data set and then closes the print data set.
Use the following steps to have XDC print one or more JES2-managed DD statements for a
job.
1. Expand the job number JOB06301 into its separate output data sets
by using the ? command.
2. Use the XDC command against the SYSPRINT DD statement.
3. Supply the attributes of the data set that you want the output copied to.
4. Finally, XDC automatically closes the print output file.
These steps are illustrated in the following figures.
Figure 11-18 on page 243 shows using ? to expand the output available from JOB06301.
Figure 11-19 shows using the XDC command 1 to print the output of the SYSPRINT DD
to a data set.
Display Filter View Print Options Help
-------------------------------------------------------------------------------
SDSF DATA SET DISPLAY - JOB COTTREL# (JOB06301)              LINE 1-4 (4)
COMMAND INPUT ===>                                           SCROLL ===> CSR
NP   DDNAME   StepName ProcStep  DSID Owner    C Dest     Rec-Cnt Page
     JESMSGLG JES2                  2 COTTREL  T LOCAL         18
     JESJCL   JES2                  3 COTTREL  T LOCAL         45
     JESYSMSG JES2                  4 COTTREL  T LOCAL         13
xdc  SYSPRINT STEP1               102 COTTREL  T LOCAL        149
Figure 11-19 Using XDC against the SYSPRINT DD statement
Figure 11-20 shows defining the data set to be printed to. In this example, we create a new
PDS with a member.
(Figure 11-20: the ISFPNO41 print data set entry panel. In this example a new PDS with a member is created; the allocation values entered are space units CYLS, quantities 1, 1, and 10, record format VBA, record length 240, and block size 3120.)
Finally, in Figure 11-21 on page 244, notice that the data set has been closed automatically 1.
For more information about the ST command, refer to SDSF Operation and Customization,
SA22-7670.
stand-alone systems. In this configuration, SDSF running on system 1 could only manage the
JES2 MAS for the systems sharing that MAS, that is, systems 1, 2, 3, and 4.
For more detailed information about JES2 MAS, refer to 10.2, JES2 multi-access spool
support on page 202.
Note: By using the Operlog facility, when it is configured appropriately, you can view all
of the syslog data for the sysplex from any of the JESplexes.
(Figure: the same three JES2 configurations shown earlier - a JESplex of systems 1, 2, 3, and 4; a JESplex of systems A, B, and C; and stand-alone JES2 systems.)
There are many fields displayed on the MAS panel. Some of the more useful values are:
1 System logged onto.
2 Spool utilization.
For more information about the JC command, refer to SDSF Operation and Customization,
SA22-7670.
Figure 11-29 shows a DB2 datasharing region DBP0 on systems MVSA and MVSB. JCL
such as shown in Figure 11-30 will only run on systems for which the condition 1
SCHENV=DB2DBP0 is true, such as systems MVSA or MVSB and when DBP0 is active.
WLM manages DB2DBP0.
//DB2LOAD JOB (C003,6363),'DB2LOAD',
//             REGION=0M,
//             CLASS=A,
//             SCHENV=DB2DBP0,      1
//             MSGCLASS=O
. . .
Figure 11-30 SCHENV=DB2DBP0 - this JCL will not run on systems where this resource is not available
SDSF allows authorized users to display the scheduling environment by using the
SE command. If an authorized user enters SE on a command line, the SCHEDULING
ENVIRONMENT panel is displayed, as shown in Figure 11-31 on page 250.
To display resources for a scheduling environment, access the panel with the R action
character, as seen in Figure 11-32.
Display Filter View Print Options Help
-------------------------------------------------------------------------------
SDSF SCHEDULING ENVIRONMENT DISPLAY MAS SYSTEMS              LINE 1-13 (13)
COMMAND INPUT ===>                                           SCROLL ===> CSR
NP  SCHEDULING-ENV   Description                      Systems
R   BATCHUPDATESE    off shift batch updates to DB
    CB390SE          S/390 Component Broker SE
    DB_REORGSE       reorganization of DB timeframe
. . .
Figure 11-32 SDSF select scheduling resource
Enter the R action character against a scheduling environment to see the resources that must
be available before work assigned to that environment can run.
Figure 11-33 on page 251 shows that for work scheduled as BATCHUPDATES to run, two
resources need to be resolved: DB2_PROD has to be ON and PRIME_SHIFT has to be
OFF. A resource can have three values: ON, OFF, and RESET.
For more detailed information about the topic of resources, refer to 11.12, Using the
RESOURCE (RES) command on page 251.
Using SDSF, an authorized user can reset the state of these values by overtyping the field, as
shown in Figure 11-35 on page 252. When the SDSF panel is used in this manner to issue an
MVS or JES2 command for you, the command can be seen by viewing the LOG or ULOG
panels.
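Besides overtyping on the SDSF panel, an authorized operator can set a resource state
directly with the MODIFY WLM command. A minimal sketch, using the resource names from
Figure 11-33 in our configuration:

F WLM,RESOURCE=DB2_PROD,ON
F WLM,RESOURCE=PRIME_SHIFT,OFF

The command accepts ON, OFF, and RESET as settings.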
For more information about the RES command, refer to SDSF Operation and Customization,
SA22-7670.
For more information about the system Health Checker, refer to Chapter 12, IBM z/OS Health
Checker on page 257.
Display Filter View Print Options Help
-------------------------------------------------------------------------------
SDSF HEALTH CHECKER DISPLAY #@$2                           LINE 1-38 (81)
COMMAND INPUT ===>                                         SCROLL ===> CSR
NP  NAME                             CheckOwner  State            Status
    ASM_LOCAL_SLOT_USAGE             IBMASM      ACTIVE(ENABLED)  EXCEPT
    ASM_NUMBER_LOCAL_DATASETS        IBMASM      ACTIVE(ENABLED)  EXCEPT
    ASM_PAGE_ADD                     IBMASM      ACTIVE(ENABLED)  EXCEPT
    ASM_PLPA_COMMON_SIZE             IBMASM      ACTIVE(ENABLED)  EXCEPT
    ASM_PLPA_COMMON_USAGE            IBMASM      ACTIVE(ENABLED)  SUCCES
    CNZ_AMRF_EVENTUAL_ACTION_MSGS    IBMCNZ      ACTIVE(ENABLED)  SUCCES
    CNZ_CONSOLE_MASTERAUTH_CMDSYS    IBMCNZ      ACTIVE(ENABLED)  SUCCES
    CNZ_CONSOLE_MSCOPE_AND_ROUTCODE  IBMCNZ      ACTIVE(ENABLED)  EXCEPT
    CNZ_CONSOLE_ROUTCODE_11          IBMCNZ      ACTIVE(ENABLED)  EXCEPT
    CNZ_EMCS_HARDCOPY_MSCOPE         IBMCNZ      ACTIVE(ENABLED)  SUCCES
    CNZ_EMCS_INACTIVE_CONSOLES       IBMCNZ      ACTIVE(ENABLED)  SUCCES
    CNZ_SYSCONS_MSCOPE               IBMCNZ      ACTIVE(ENABLED)  SUCCES
. . .
Figure 11-37 SDSF CK panel
11.15 Enclaves
An enclave is a piece of work that can span multiple dispatchable units (SRBs and tasks) in
one or more address spaces, and is reported on and managed as a unit. It is managed
separately from the address space it runs in. The CPU and I/O resources associated with
processing the transaction are managed according to the transaction's performance goal
and are reported against the transaction.
A classic example of enclave usage is DB2 work. The DB2 work tasks run under the DB2
STCs, not under the jobs that make the DB2 SQL calls; however, the CPU is reported against
the enclave. This allows the performance team to assign different priorities to different DB2
work. Without enclaves, all the DB2 work would run in the DB2 STCs with the same priority.
The SDSF ENC command allows authorized personnel to view the current active work and
which enclave the work is active in; Figure 11-38 on page 254 shows SDSF ENC output.
1 When some work is placed in an enclave, a token is created for WLM to manage this piece
of work. Each piece of work has a unique token.
2 The work can be of various types; DDF means it is from a remote system (for example,
AIX). JES means it is from within the sysplex.
3 ACTIVE means doing work. INACTIVE means waiting for a resource.
4 Subsys indicates where the work came from.
5 OWNER indicates which job or STC made the DB2 call. D#$2DIST means it was a distributed
call, which can be from outside of the sysplex.
Display Filter View Print Options Help
-------------------------------------------------------------------------------
SDSF ENCLAVE DISPLAY #@$1  ALL                             LINE 1-33 (33)
COMMAND INPUT ===>                                         SCROLL ===> CSR
PREFIX=COTTREL*  DEST=(ALL)  OWNER=*  SYSNAME=*
NP  TOKEN 1      zAAP-Time 2  zACP-Time 3  zIIP-Time 4  zICP-Time 5
    60003D4328          0.00         0.00         0.00         0.00
    200005DD1A1         0.00         0.00         0.00         0.00
    2B4005CD227         0.00         0.00         0.00         0.00
. . .
    2FC005DD217         0.00         0.00         0.00         0.00
    2DC005DE4AC         0.00         0.00         0.00         0.00
Figure 11-39 SDSF ENC <scroll right>
Figure 11-39 shows the same tokens 1. Notice the zAAP (2 and 3) and zIIP (4 and 5)
resources that have been used by these processes. However, our test system has neither of
these specialty engines, so the values are 0.
Figure 11-40 on page 255 shows the SDSF DA panel with two columns, ECPU% and
ECPU-TIME. These columns display the CPU enclave usage for different jobs. A typical user of
enclave work (and thus, ECPU-TIME) is DB2 DDF. This is DB2 distributed work where the
DB2 query has come from another DB2 system. This can be from a different MVS image or
another platform, such as AIX.
Note: zIIP and zAAP are specialty processors provided by IBM. Contact IBM, or see the
IBM Web site at http://www.ibm.com, for further information about these processors.
1 CPU-Time that has been consumed by this address space and charged to this address
space.
2 ECPU-Time that has been consumed by this address space. This includes CPU time for
work performed on behalf of another address space. The difference between ECPU-TIME
and CPU-TIME is work that has run in this address space but was scheduled by another
task; this extra time is charged to the requesting address space.
3 Current interval percentage of CPU-TIME.
4 Current interval percentage of ECPU-TIME.
Chapter 12. IBM z/OS Health Checker
Tip: The task of starting HZSPROC should be included in any automation package or in
the parmlib COMMNDxx member, so that it is automatically started after a system restart.
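As an illustration, a COMMNDxx entry that starts the Health Checker procedure at IPL might
look like the following; HZSPROC is the default procedure name, and your installation may
use a different one:

COM='S HZSPROC'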
Each check has a set of predefined values, including:
How often the check will run
The severity of the check, which influences how the check output is issued
The routing and descriptor codes for the check
Some check values can be overridden by using SDSF, statements in the HZSPRMxx
member, or the MODIFY command; overrides are usually performed when some check values
are not suitable for your environment or configuration.
Note: Before changing any check values, consult your system programmer.
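As an example of operating on an individual check, the MODIFY command can run a check
on demand. A sketch, assuming the HZSPROC procedure name and a check name shown
later in this chapter:

F HZSPROC,RUN,CHECK=(IBMASM,ASM_LOCAL_SLOT_USAGE)

Similar MODIFY syntax with the UPDATE parameter changes check values; such operator
overrides last only until the check or the system is restarted, so persistent overrides belong
in HZSPRMxx.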
The HZSPROC started task reads parameters, if coded, from parmlib member HZSPRMxx.
After HZSPROC is active on your z/OS system images, you can invoke the Health Checker
application using option 1 CK from the SDSF primary option menu, as shown in Figure 12-2
on page 259.
DA    Active users
I     Input queue
O     Output queue
H     Held output queue
ST    Status of jobs
LOG   System log
SR    System requests
MAS   Members in the MAS
JC    Job classes
SE    Scheduling environments
RES   WLM resources
INIT  Initiators
PR    Printers
PUN   Punches
RDR   Readers
LINE  Lines
NODE  Nodes
SO    Spool offload
SP    Spool volumes
RM    Resource monitor
CK    Health checker 1
Figure 12-2 SDSF primary option menu
Check owner                 Check name
IBMASM (ASM)                ASM_LOCAL_SLOT_USAGE
                            ASM_NUMBER_LOCAL_DATASETS
                            ASM_PAGE_ADD
                            ASM_PLPA_COMMON_SIZE
                            ASM_PLPA_COMMON_USAGE
IBMCNZ (Consoles)           CNZ_AMRF_EVENTUAL_ACTION_MSGS
                            CNZ_CONSOLE_MASTERAUTH_CMDSYS
                            CNZ_CONSOLE_MSCOPE_AND_ROUTCODE
                            CNZ_CONSOLE_ROUTCODE_11
                            CNZ_EMCS_HARDCOPY_MSCOPE
                            CNZ_EMCS_INACTIVE_CONSOLES
                            CNZ_SYSCONS_MSCOPE
                            CNZ_SYSCONS_PD_MODE
                            CNZ_SYSCONS_ROUTCODE
                            CNZ_TASK_TABLE
                            CNZ_SYSCONS_MASTER (z/OS V1R4-V1R7 only)
IBMCSV (Contents            CSV_APF_EXISTS
Supervision)                CSV_LNKLST_SPACE
                            CSV_LNKLST_NEWEXTENTS
IBMCS (Communications       CSTCP_SYSTCPIP_CTRACE_TCPIPstackname
Server)                     CSTCP_TCPMAXRCVBUFRSIZE_TCPIPstackname
                            CSVTAM_CSM_STG_LIMIT
IBMGRS (GRS)                GRS_CONVERT_RESERVES
                            GRS_EXIT_PERFORMANCE
                            GRS_MODE
                            GRS_SYNCHRES
                            GRS_GRSQ_SETTING
                            GRS_RNL_IGNORED_CONV
IBMIXGLOGR (System logger)  IXGLOGR_ENTRYTHRESHOLD
                            IXGLOGR_STAGINGDSFULL
                            IXGLOGR_STRUCTUREFULL
IBMRACF (RACF)              RACF_SENSITIVE_RESOURCES
                            RACF_GRS_RNL
                            RACF_IBMUSER_REVOKED
                            RACF_TEMPDSN_ACTIVE
                            RACF_FACILITY_ACTIVE
                            RACF_OPERCMDS_ACTIVE
                            RACF_TAPEVOL_ACTIVE
                            RACF_TSOAUTH_ACTIVE
                            RACF_UNIXPRIV_ACTIVE
IBMRRS (RRS)                RRS_DUROFFLOADSIZE
                            RRS_MUROFFLOADSIZE
                            RRS_RMDATALOGDUPLEXMODE
                            RRS_RMDOFFLOADSIZE
                            RRS_RSTOFFLOADSIZE
                            RRS_ARCHIVECFSTRUCTURE
IBMRSM (RSM)                RSM_AFQ
                            RSM_HVSHARE
                            RSM_MAXCADS
                            RSM_MEMLIMIT
                            RSM_REAL
                            RSM_RSU
IBMSDUMP (SDUMP)            SDUMP_AUTO_ALLOCATION
                            SDUMP_AVAILABLE
IBMUSS (z/OS UNIX)          USS_AUTOMOUNT_DELAY
                            USS_FILESYS_CONFIG
                            USS_MAXSOCKETS_MAXFILEPROC
IBMVSAM (VSAM)              VSAM_SINGLE_POINT_FAILURE
                            VSAMRLS_DIAG_CONTENTION
                            VSAM_INDEX_TRAP
IBMVSM (VSM)                VSM_CSA_CHANGE
                            VSM_CSA_LIMIT
                            VSM_CSA_THRESHOLD
                            VSM_PVT_LIMIT
                            VSM_SQA_LIMIT
                            VSM_SQA_THRESHOLD
                            VSM_ALLOWUSERKEYCSA
IBMXCF (XCF)                XCF_CDS_SEPARATION
                            XCF_CF_CONNECTIVITY
                            XCF_CF_STR_EXCLLIST
                            XCF_CF_STR_PREFLIST
                            XCF_CLEANUP_VALUE
                            XCF_DEFAULT_MAXMSG
                            XCF_FDI
                            XCF_MAXMSG_NUMBUF_RATIO
                            XCF_SFM_ACTIVE
                            XCF_SIG_CONNECTIVITY
                            XCF_SIG_PATH_SEPARATION
                            XCF_SIG_STR_SIZE
                            XCF_SYSPLEX_CDS_CAPACITY
                            XCF_TCLASS_CLASSLEN
                            XCF_TCLASS_CONNECTIVITY
                            XCF_TCLASS_HAS_UNDESIG
You can view complete check output messages in the message buffer using the following (a
minimal HZSPRINT job sketch follows this list):
HZSPRINT utility
SDSF
Health Checker log stream (in our example, HZS.HEALTH.CHECKER.HISTORY) for
historical data, using the HZSPRINT utility
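A minimal HZSPRINT job sketch, using a check name from this chapter; see
SYS1.SAMPLIB(HZSPRINT) for the complete IBM-supplied sample, which is the
authoritative version:

//HZSPRNT  JOB (0,0),'PRINT CHECK',CLASS=A,MSGCLASS=X
//PRTCHK  EXEC PGM=HZSPRNT,PARM='CHECK(IBMASM,ASM_LOCAL_SLOT_USAGE)'
//SYSOUT    DD SYSOUT=*

To report from the log stream instead, the PARM names the log stream, for example
PARM='LOGSTREAM(HZS.HEALTH.CHECKER.HISTORY)'.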
When a check exception is detected, a WTO is issued to the syslog. Figure 12-3 is a sample
of a check message.
HZS0002E CHECK(IBMASM,ASM_LOCAL_SLOT_USAGE): 535
ILRH0107E Page data set slot usage threshold met or exceeded
Figure 12-3 WTO message issued by check ASM_LOCAL_SLOT_USAGE
Any check exception messages are issued both as WTOs and to the message buffer. The
WTO version contains only the message text in Figure 12-3. The exception message in the
message buffer, shown in Figure 12-4 on page 263, includes both the text and an explanation
of the potential problem, including severity. It also displays information about what actions
might fix the potential problem.
Tip: To obtain the best results from IBM Health Checker for z/OS, let it run continuously on
your system so that you will know when your system has changed. When you get an
exception, resolve it using the information in the check exception message or overriding
check values, so that you do not receive the same exceptions over and over.
Also consider configuring your automation software to trigger on some of the WTOs that
are issued.
CHECK(IBMASM,ASM_LOCAL_SLOT_USAGE)
START TIME: 07/09/2007 00:25:51.131746
CHECK DATE: 20041006  CHECK SEVERITY: MEDIUM
CHECK PARM: THRESHOLD(30%)

Dataset Name                 Status  Usage
------------                 ------  -----
PAGE.#@$3.LOCAL1             OK      44%
. . .
STATUS: EXCEPTION-MED
Figure 12-4 Check exception message in the message buffer (extract)
1 SYS1.SAMPLIB(HZSPRINT) provides sample JCL and parameters that can be used for the
HZSPRINT utility. Sample output generated from the HZSPRINT utility can be seen in
Figure 12-6.
************************************************************************
* Start: CHECK(IBMRSM,RSM_MEMLIMIT)                                    *
************************************************************************
CHECK(IBMRSM,RSM_MEMLIMIT)
START TIME: 07/08/2007 20:25:51.393858
CHECK DATE: 20041006  CHECK SEVERITY: LOW

* Low Severity Exception *

IARH109E MEMLIMIT is zero

Explanation: Currently, the MEMLIMIT has not been specified.
Setting MEMLIMIT too low may cause jobs ... storage to fail.
Setting MEMLIMIT too ... of real storage resources and lead to
... system loss.
Figure 12-6 Sample output from the HZSPRINT utility (extract)
By using the CK option from the SDSF main menu, you can display the various z/OS Health
Checks available and the status of the checks. Figure 12-7 on page 265 shows a sample of
the checks available on our z/OS #@$3 system.
CHECK(IBMCNZ,CNZ_CONSOLE_MSCOPE_AND_ROUTCODE)
START TIME: 07/08/2007 20:25:51.078025
CHECK DATE: 20040816  CHECK SEVERITY: LOW

* Low Severity Exception *

There is a total of 10 consoles (1 active, 9 inactive) that are
configured with a combination of message scope and routing code values
that are not reasonable.

Console  Console   Active
Type     Name      System      MSCOPE
MCS      #@$3M01   #@$3        *ALL
MCS      #@$3M02   (Inactive)  *ALL
SMCS     CON1      (Inactive)  *ALL
SMCS     CON2      (Inactive)  *ALL
. . .                          n/a
STATUS: EXCEPTION-LOW
Using the HZSPRINT utility with the LOGSTREAM keyword, check reports can be generated
from the log stream; see Figure 12-9 on page 267. In our Parallel Sysplex environment, the
log stream is HZS.HEALTH.CHECKER.HISTORY.
IGVH100I The current allocation of CSA storage is 756K (15% of the total
size of 4760K). The highest allocation during this IPL is 16%. Ensuring
an appropriate amount of storage is available is critical to the long
term operation of the system. An exception will be issued when the
allocated size of CSA is greater than the owner specified threshold of
80%.

* High Severity Exception *

IGVH100E ECSA utilization has exceeded 80% and is now 89%

Explanation: The current allocation of ECSA storage is 89% of 72288K.
7916K (11%) is still available. The highest allocation during this IPL
is 89%. This allocation exceeds the owner threshold.

System Action: The system continues processing. However, eventual
action may need to be taken to prevent a critical depletion of
virtual storage resources.

Operator Response: Notify the system programmer.
...
F HZSPROC,DISPLAY
HZS0203I 01.17.08 HZS INFORMATION 358
POLICY(*NONE*)
OUTSTANDING EXCEPTIONS: 25
(SEVERITY NONE: 0  LOW: 9  MEDIUM: 13  HIGH: 3)
ELIGIBLE CHECKS: 80 (CURRENTLY RUNNING: 0)
INELIGIBLE CHECKS: 1
DELETED CHECKS: 0
ASID: 0061
LOG STREAM: HZS.HEALTH.CHECKER.HISTORY - CONNECTED
HZSPDATA DSN: SYS1.#@$3.HZSPDATA
PARMLIB: 00
Figure 12-11 Display the overall status of z/OS Health Checker
(Check summary display: each check is listed with STATE AE - active, enabled - and a
STATUS of SUCCESSFUL, EXCEPTION-LOW, or EXCEPTION-MED.)
Figure 12-13 on page 268 illustrates how to display a specific check when the owner is
unknown.
F HZSPROC,DISPLAY,CHECKS,CHECK=(* 1,USS_FILESYS_CONFIG)
HZS0200I 01.22.27 CHECK SUMMARY 391
CHECK OWNER  CHECK NAME           STATE  STATUS
IBMUSS       USS_FILESYS_CONFIG   AE     SUCCESSFUL
A - ACTIVE           I - INACTIVE
E - ENABLED          D - DISABLED
G - GLOBAL CHECK
+ - CHECK ERROR MESSAGES ISSUED
Figure 12-13 Display a specific check when the owner of the check is not known
1 When the owner of the check is not known, use an asterisk (*) as a wildcard. This will
display the appropriate information for that particular check.
Figure 12-14 illustrates how to display a specific check when the check owner and check
name are known.
F HZSPROC,DISPLAY,CHECKS,CHECK=(IBMRACF,RACF_TSOAUTH_ACTIVE)
HZS0200I 02.26.08 CHECK SUMMARY 986
CHECK OWNER  CHECK NAME            STATE  STATUS
IBMRACF      RACF_TSOAUTH_ACTIVE   AE     SUCCESSFUL
A - ACTIVE           I - INACTIVE
E - ENABLED          D - DISABLED
G - GLOBAL CHECK
+ - CHECK ERROR MESSAGES ISSUED
Figure 12-14 Display a specific check when the check owner and check name are known
Figure 12-15 illustrates how to display all checks relating to a specific check owner.
F HZSPROC,DISPLAY,CHECKS,CHECK=(IBMGRS,* 1)
HZS0200I 02.28.11 CHECK SUMMARY 002
CHECK OWNER  CHECK NAME            STATE  STATUS
IBMGRS       GRS_RNL_IGNORED_CONV  AEG    SUCCESSFUL
IBMGRS       GRS_GRSQ_SETTING      AE     SUCCESSFUL
IBMGRS       GRS_EXIT_PERFORMANCE  AE     SUCCESSFUL
IBMGRS       GRS_CONVERT_RESERVES  AEG    EXCEPTION-LOW
IBMGRS       GRS_SYNCHRES          AE     SUCCESSFUL
IBMGRS       GRS_MODE              AEG    SUCCESSFUL
A - ACTIVE           I - INACTIVE
E - ENABLED          D - DISABLED
G - GLOBAL CHECK
+ - CHECK ERROR MESSAGES ISSUED
Figure 12-15 Display all checks for a specific check owner
1 When the check owner is known but the check name is not known, use an asterisk (*) as a
wildcard. This will display all checks that have that particular check owner.
Figure 12-16 on page 270 illustrates how to display detailed information about a particular
check.
F HZSPROC,DISPLAY,CHECKS,CHECK=(IBMGRS,GRS_CONVERT_RESERVES),DETAIL
HZS0201I 02.44.14 CHECK DETAIL
096
CHECK(IBMGRS,GRS_CONVERT_RESERVES)
STATE: ACTIVE(ENABLED)
GLOBAL STATUS: EXCEPTION-LOW
EXITRTN: ISGHCADC
LAST RAN: 07/08/2007 23:13
NEXT SCHEDULED: (NOT SCHEDULED)
INTERVAL: ONETIME
EXCEPTION INTERVAL: SYSTEM
SEVERITY: LOW
WTOTYPE: INFORMATIONAL
SYSTEM DESCCODE: 12
THERE ARE NO PARAMETERS FOR THIS CHECK
REASON FOR CHECK: When in STAR mode, converting RESERVEs can
help improve performance and avoid deadlock.
MODIFIED BY: N/A
DEFAULT DATE: 20050105
ORIGIN: HZSADDCK
LOCALE: HZSPROC
DEBUG MODE: OFF VERBOSE MODE: NO
Figure 12-16 Display detailed information for a particular check
For additional information about z/OS Health Checker, refer to IBM Health Checker for z/OS
User's Guide, SA22-7994.
Chapter 13.
Postscan processing
The first two phases can occur in either the JES3 address space on the global processor
or in the C/I functional subsystem address space on either the local or the global
processor.
3. Job resource management
The next phase of JES3 job processing is called job resource management. The job
resource management function provides for the effective use of system resources. The JES3
main device scheduler (MDS) function, also known as setup, ensures the effective use of
non-sharable mountable volumes, eliminates operator intervention during job execution,
and performs data set serialization. It oversees specific types of pre-execution job setup
and generally prepares all necessary resources to process the job. The main device
scheduler routines use resource tables and allocation algorithms to satisfy a job's
requirements through the allocation of volumes and devices and, if necessary, the
serialization of data sets.
4. Generalized main scheduling
After a job is set up, it enters JES3 job scheduling. JES3 job scheduling is the group of
services that govern where and when z/OS execution of a JES3 job occurs. Job
scheduling controls the order and execution of jobs running within the JES3 complex.
5. Job execution
Jobs are scheduled to the waiting initiators on the JES3 main processors. In the sysplex
environment, the use of Workload Manager (WLM) allows resources to be optimized across
address spaces, using goals defined for the various kinds of work in a WLM policy.
6. Output processing
The final part of JES3 job processing is called job output and termination. Output service
routines operate in various phases to process SYSOUT data sets destined for print or
punch devices (local, RJP, or NJE), TSO users, internal readers, external writers, and
writer functional subsystems.
7. Purge processing
Purge processing represents the last JES3 processing step for any job. It releases the
resources used during the job.
Cold start
Warm start
Warm start with analysis
Warm start to replace a spool data set
Warm start with analysis to replace a spool data set
Hot start with refresh
Hot start with refresh and analysis
Hot start
Hot start with analysis
You must use a cold start when starting JES3 for the first time. JES3 initialization statements
are read as part of cold start, warm start, and hot start with refresh processing.
If JES3 detects any error in the initialization statements, it prints an appropriate diagnostic
message on the console or in the JES3OUT data set. JES3 ends processing if it cannot
recover from the error.
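As an illustration of how a start type comes into play, JES3 is typically started with a z/OS
START command, and the start type is then supplied in response to the JES3 initialization
prompt. The following is only a sketch; the exact message and valid replies vary by release
and installation:

S JES3
(JES3 issues message IAT3011 asking for the start type)
R nn,H        (reply H for a hot start, for example)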
Note: The NETSRV address space is a JES component used by both JES2 and JES3.
The NETSRV address space is a started task that communicates with JES to spool and
de-spool jobs and data sets. In Figure 13-6, sample JES3 node definitions for nodes BOSTON
and NEW YORK are shown.
The networking flow between the nodes Boston and New York is illustrated in Figure 13-7 on
page 279.
Table 13-1 on page 280 lists useful commands for TCP/IP NJE and provides a brief
description of their purpose.
Command            Purpose
*I,SOCKET=ALL      Tell me what sockets I have and what the current
                   connections are.
*I,NETSERV=ALL
*I,SOCKET=name
*I,NETSERV=name
*I,NJE,NAME=node
The following figures display typical output from these commands as issued on a z/OS 1.9
system. Figure 13-10 illustrates how to inquire about sockets and connections.
*I,SOCKET=ALL
IAT8709 SOCKET INQUIRY RESPONSE 836
INFORMATION FOR SOCKET WTSCNET
NETSERV=JES3NS1, HOST=WTSCNET.ITSO.IBM.COM, PORT= 0, STACK=TCPIP,
NODE=WTSCNET, JTRACE=NO, VTRACE=NO, ITRACE=NO, ACTIVE=NO,
SERVER=NO
INFORMATION FOR SOCKET @0000001
NETSERV=JES3NS1, HOST=, PORT= 0, STACK=TCPIP, NODE=WTSCNET,
JTRACE=NO, VTRACE=NO, ITRACE=NO, ACTIVE=YES, SERVER=YES
END OF SOCKET INQUIRY RESPONSE
Figure 13-10 Sockets and current connections
Action                  JES3 command                 JES2 command              z/OS command
                        *Start or *S                 $S
                        *Cancel or *C                $P
Restart a process
or device               *Restart or *R               $R
                        (see Cancel)                 $Z
Cancel a process
or device               *Cancel or *C                $C
                        *Modify or *F                $T
                        *Modify or *F                $H
                        *Modify or *F                $O
Repeat                  *Restart or *R               $N
Inquire or Display      *Inquire or *I               $D                        DISPLAY
Device Online/Offline   *Vary or *V                  $S, $P                    VARY
                        *Return                      $PJES2
Send Command to
Remote Node             *Send or *T                  $M or $N                  ROUTE
Send Message to
Console(s)              *Message or *Z               $D M                      SEND
                        *I D D/uuu                   $D U,PRTS or
                                                     $D U,PRTnnn
                        *S dev,P                     $L Jnnnn,ALL or
                                                     $L STCnnn,ALL
Start printer           *S devname                   $S PRTnnn
                        *F,D=devname or              $T PRTnnn or
                        *S uuu,WC=class(es)          $T PRTnnn,Q=class(es)
                                                     $I PRTnnn
                                                     $Z PRTnnn
Restart a printer       *R,devname                   $E PRTnnn
                        *R devname,G                 $N PRTnnn
                        *R devname,J                 $E PRTnnn
                        *R devname,RSCD              $E PRTnnn
                        *R devname,C or              $B PRTnnn,pppp or
                        N,R=-ppppP                   $B PRTnnn,D (to the start
                                                     of the data set)
                        *R devname,R=+ppppP          $F PRTnnn,pppp or
                                                     $F PRTnnn,D (to the end
                                                     of the data set)
Cancel printer          *C devname                   $C PRTnnn
                        *V devname,OFF or ON                                   V ddd,OFFLINE
                                                                               or ONLINE
Start a FSS             *MODIFY,F,FSS=fssname,ST=Y   (see note 1)
Stop a FSS              (see note 2)

1 The FSS is started by the z/OS command S fssprocname and then a JES2 command $S PRTnnn.
2 The FSS stops if specified in the parameters.
Chapter 14.
MCS
MCS consoles are display devices that are attached to a z/OS system to
provide communication between operators and z/OS. MCS consoles are
defined to a local non-SNA control unit (for example an OSA Integrated
Console Controller, or 2074). Currently you can define a maximum of 99
MCS consoles for the entire Parallel Sysplex. In a future release of z/OS,
IBM plans to increase the maximum number of MCS and SNA MCS
(SMCS) consoles that can be defined and active in a configuration from
99 per sysplex to 99 per system in the sysplex.
SMCS
SNA MCS (SMCS) consoles are display devices that communicate with z/OS
through z/OS Communications Server (VTAM) sessions rather than through a
local non-SNA control unit.
EMCS
Extended MCS (EMCS) consoles are consoles established by authorized
programs, such as SDSF, TSO/E, and NetView, rather than being defined as
devices.
Hardware
In this context, the term hardware (or system) consoles refers to the
interface provided by the Hardware Management Console (HMC) on an
IBM System z processor. It is referred to as SYSCONS. See 14.4,
Operating z/OS from the HMC on page 291 for more details.
1 OPERLOG is active.
2 Name of the console as defined in the CONSOLxx member.
3 Device address of the console.
4 The system where the console is defined.
5 The status of the console. In this case, it is A for Active.
6 The console has master command authority.
7 The system that the command is directed to, if no command prefix is entered.
8 Messages are received at this console from these systems. *ALL indicates messages from
all active systems in the sysplex will be received on this console.
(D C display extract)
11  COND=M 1  AUTH=MASTER  NBUF=N/A  AREA=Z  MFORM=T,S,J,X
    DEL=R  RTME=1/4  RNUM=20  SEG=38  CON=N
In z/OS V1.8, the single master console was eliminated, which removed a single point of
failure. The functions associated with the master console, including master command
authority and the ability to receive messages delivered via the INTERNAL or INSTREAM
message attribute, can be assigned to any console in the configuration, including EMCS
consoles. The console switch function has also been removed, which removed another
potential point of failure because you are now able to define more than two consoles with
master console authority.
The display master console command (D C,M) no longer identifies a console as an M on the
COND= field. In Figure 14-3, the three systems in the sysplex have the COND field that is not
M 1 (in this case A for Active, but there are other possible conditions) and the AUTH field as
MASTER 2, which means the console is authorized to enter any operator command. All three
consoles have master console authority in the sysplex, so there is no longer a requirement to
switch a console if one is deactivated or fails.
D C,M
. . .
#@$3M01  01  COND=A 1  AUTH=MASTER 2  NBUF=0    AREA=Z  MFORM=T,S,J,X
08E0         DEL=R  RTME=1/4  RNUM=20  SEG=38  CON=N
#@$3
. . .
#@$2M01  11  COND=A 1  AUTH=MASTER 2  NBUF=N/A  AREA=Z  MFORM=T,S,J,X
08E0         DEL=R  RTME=1/4  RNUM=20  SEG=14  CON=N
#@$2
. . .
#@$1M01  13  COND=A 1  AUTH=MASTER 2  NBUF=N/A  AREA=Z  MFORM=T,S,J,X
08E0         DEL=R  RTME=1/4  RNUM=20  SEG=38  CON=N
#@$1
. . .
Figure 14-3 D C,M display - consoles with master authority
Products such as SDSF and NetView utilize EMCS functions. To display information related
to extended MCS (EMCS) consoles, use the DISPLAY EMCS command.
There are a number of parameters you can use when displaying the EMCS console
configuration to obtain specific information. Review MVS System Commands, SA22-7627, for
more detailed information about this topic. Some of the command parameter options are
discussed in this chapter. An example of a response from the DISPLAY EMCS command is
shown in Figure 14-4. The command entered used the S parameter to display a summary of
EMCS consoles, which includes the number and names for the consoles that meet the
criteria.
D EMCS,S
IEE129I 21.50.06 DISPLAY EMCS 878
DISPLAY EMCS,S
NUMBER OF CONSOLES MATCHING CRITERIA: 18
*DICNS$1 *DICNS$2 *DICNS$3 COTTRELC HAIN
FOSTER
#@$1
*ROUTE$1 #@$2
*ROUTE$2 #@$3
*ROUTE$3 *SYSLG$1 *OPLOG01
*SYSLG$2 *OPLOG02 *SYSLG$3 *OPLOG03
Figure 14-4 EMCS console summary
To obtain more information about the EMCS consoles or a specific console, you can use the
I (info) or F (full) parameter. Figure 14-5 shows output from the D EMCS,F command for a
specific EMCS console, CN=*SYSLG$3. The output is similar to the D C command.
D EMCS,F,CN=*SYSLG$3
CNZ4101I 21.53.05 DISPLAY EMCS 887
DISPLAY EMCS,F,CN=*SYSLG$3
NUMBER OF CONSOLES MATCHING CRITERIA: 1
CN=*SYSLG$3 1  STATUS=A 2  CNID=03000005  KEY=SYSLOG
SYS=#@$3  ASID=000B  JOBNAME=--------  JOBID=--------
HC=N  AUTO=N  DOM=NONE  TERMNAME=*SYSLG$3  MONITOR=--------
CMDSYS=#@$3 3  LEVEL=ALL  AUTH=MASTER 4
MSCOPE=#@$3 5  ROUTCDE=NONE  INTIDS=N UNKNIDS=N
ALERTPCT=100  QUEUED=0  QLIMIT=50000
SIZEUSED=5184K  MAXSIZE=2097152K
Figure 14-5 EMCS console detail
z/OS Communications Server is active and have the appropriate VTAM and console
definitions set up.
The SMCS consoles are defined in the CONSOLxx member, and can be defined with the
same configuration as MCS consoles, including master command authority. Using the D C,L
command, as seen in Figure 14-6, the SMCS console is identified by the COND field A,SM 1
which means the console is an active SMCS console. In this example, the console also has
the AUTH field as MASTER 2, which means the console has master command authority and is
authorized to enter any operator command.
D C,L
. . .
CON1  03  COND=A,SM 1  AUTH=MASTER 2  NBUF=N/A  AREA=Z
          MFORM=T,S,J,X  DEL=R  RTME=1/4  RNUM=28  SEG=28  CON=N
          USE=FC  LEVEL=ALL  PFKTAB=PFKTAB1  ROUTCDE=ALL
          LOGON=REQUIRED  CMDSYS=*  MSCOPE=*ALL
          INTIDS=N UNKNIDS=N
. . .
Figure 14-6 Display SMCS consoles
A D C,SMCS command can also be used to display the status and APPLID of the SMCS VTAM
configuration on each system in the sysplex, as shown in Figure 14-7.
D C,SMCS
IEE047I 23.47.27 CONSOLE DISPLAY 521
GENERIC=SCSMCS$$
SYSTEM  APPLID    SMCS STATUS
#@$2    SCSMCS$2  ACTIVE
#@$1    SCSMCS$1  ACTIVE
#@$3    SCSMCS$3  ACTIVE
Figure 14-7 Display SMCS VTAM information
Value                  Meaning
*ALL                   All systems in the sysplex
#@$1                   A specific system
(#@$1,#@$2,...)        A list of systems
#@$1M01                A specific console
(#@$1M01,#@$2M01,...)  A list of consoles
For example, Figure 14-8 shows a display of an EMCS console named TEST01 which shows
an MSCOPE setting of *ALL 1.
D EMCS,I,CN=TEST01
CNZ4101I 01.08.11 DISPLAY EMCS 614
DISPLAY EMCS,I,CN=TEST01
NUMBER OF CONSOLES MATCHING CRITERIA: 1
CN=TEST01
STATUS=A
CNID=01000004 KEY=SDSF
. . .
MSCOPE=*ALL 1
ROUTCDE=NONE
INTIDS=N UNKNIDS=N
Figure 14-8 Display MSCOPE information before change
In Figure 14-9, a V CN command is issued from console TEST01 to change its MSCOPE from
*ALL to #@$3.
V CN(*),MSCOPE=#@$3
IEE712I VARY CN PROCESSING COMPLETE
Figure 14-9 Changing MSCOPE of a console
In Figure 14-10, a subsequent display of the EMCS console named TEST01 shows the
MSCOPE setting has changed to #@$3 1.
D EMCS,I,CN=TEST01
CNZ4101I 01.08.43 DISPLAY EMCS 618
DISPLAY EMCS,I,CN=TEST01
NUMBER OF CONSOLES MATCHING CRITERIA: 1
CN=TEST01
STATUS=A
CNID=01000004 KEY=SDSF
. . .
MSCOPE=#@$3 1
ROUTCDE=NONE
INTIDS=N UNKNIDS=N
Figure 14-10 Display MSCOPE information after change
You can monitor your sysplex from a single console in the sysplex if its MSCOPE value is set
to *ALL on that console. Keep in mind that all consoles with an MSCOPE value of *ALL will
receive many more messages than consoles defined with an MSCOPE of a single system.
Thus, there is more chance of running into a console buffer shortage. This is discussed in
14.5, Console buffer shortages on page 295.
There are various scenarios where the use of the HMC as the SYSCONS may be required.
One of these scenarios is at IPL time, if no other consoles are available.
Normally a locally attached console is used when a system is IPLed. The console is defined
as a Nucleus Initialization Program (NIP) console in the operating system configuration
(IODF). If none of the consoles specified as a NIP console are available, or if there are none
specified for the system, then the system will IPL using the SYSCONS console as the NIP
console. If there is no working HMC available, then the Support Element on the processor will
be used instead. When the SYSCONS console is used during IPL, or for receiving other
messages that may be sent to it from the operating system, it is important that the operator
knows how to use the console for this purpose.
To use the SYSCONS console on the HMC, you must select the Operating System
Messages (OSM) task and the appropriate system on the HMC. The HMC will open a window
which will be the SYSCONS console for the system. During an IPL process, the messages
are automatically displayed on the SYSCONS console. If there are any replies required
during the NIP portion of the IPL, the operator can reply using the Respond button on the
window, as shown in Figure 14-13. If you need to use the SYSCONS console for command
processing, you can use the Send button to send a command to z/OS. You must first enter the
VARY CN(*),ACTIVATE command, as shown in the command line of Figure 14-13, to allow the
SYSCONS console to send commands and receive messages.
This command can only be entered at the SYSCONS console. If you try to enter any other
z/OS command prior to this command, you receive a reply stating that you must enter the
VARY CONSOLE command to enable system console communications, as shown in
Figure 14-14 on page 293.
As a result of entering this command, and if z/OS is able to establish communication with the
SYSCONS console, there is a response to indicate that the vary processing is complete, as
shown in Figure 14-15.
Messages are now displayed from systems as specified in the MSCOPE parameter for the
SYSCONS. Also, almost any z/OS command can now be entered, with a few restrictions. If
there is no response to the command, it may indicate that the system is not active, or the
interface between z/OS and the Support Element (SE) is not working. There is also the
possibility that the ROUTCDE setting on the SYSCONS is set to NONE. You can check this
by using the D C,M command; if the ROUTCDE parameter is not set to ALL and you want to
see all the messages for the system or sysplex, then enter:
V CN(*),ROUT=ALL
The SYSCONS console would normally only be used when IPLing systems with no local
attached consoles to complete the IPL, or when messages are sent to it from z/OS in
recovery situations.
Although the SYSCONS console for a system may be accessed on multiple HMCs, you do
not have to issue the VARY CONSOLE command on each HMC. It only needs to be entered once
for the system. It remains active for the duration of the IPL, or until the VARY CN,DEACT
command (to deactivate the system console) is entered.
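For reference, the system console activation and deactivation commands, as entered at the
SYSCONS:

V CN(*),ACTIVATE
V CN(*),DEACT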
To display the SYSCONS console status for all systems in the sysplex, use the DISPLAY
CONSOLE command as shown Figure 14-16.
D C,M
IEE889I 14.19.39 CONSOLE DISPLAY 638
MSG: CURR=0  LIM=1500  RPLY:CURR=3  LIM=20  SYS=AAIL  PFK=00
CONSOLE   ID  --------------- SPECIFICATIONS ---------------
. . .
AAILSYSC      COND=A,PD     AUTH=MASTER
SYSCONS       MFORM=M       LEVEL=ALL,NB
AAIL          ROUTCDE=NONE  CMDSYS=AAIL   MSCOPE=*ALL
              AUTOACT=--------  INTIDS=N UNKNIDS=N
. . .
Figure 14-16 Display the SYSCONS console
For each system that is active or has been active in the sysplex since the sysplex was
initialized, there is a SYSCONS console status displayed, along with all the other consoles in
the sysplex. The COND status for the SYSCONS has three possible values:
A     The system has been IPLed, but no VARY CONSOLE command has been issued from
the SYSCONS for the system, so the SYSCONS is not yet available for command
processing.
A,PD  The system is IPLed, and a VARY CONSOLE command has been issued from the
SYSCONS for the system.
N     The system associated with the SYSCONS is not active in the sysplex. The console
appears in the list because the system had been active in the sysplex. It could also indicate
that the interface between z/OS and the SE is not working, although this would be rare.
There is an MSCOPE parameter for this console, and it should be set appropriately. See
14.2.5, MSCOPE implications on page 289 for further details.
In the event of a WTO buffer shortage, you can use the DISPLAY CONSOLE,BACKLOG command
to determine the console buffer conditions, as shown in Figure 14-18. The command will
display details about the affected console and will also display any jobs using more than 1000
console buffers. This can help to determine the most appropriate corrective action.
D C,B
IEE889I 02.16.21 CONSOLE DISPLAY 665
MSG: CURR=**** 1  LIM=3000  RPLY:CURR=3  LIM=999  SYS=#@$3  PFK=00
CONSOLE   ID  --------------- SPECIFICATIONS ---------------
#@$3M01 2 13  COND=A        AUTH=MASTER   NBUF=3037 3
08E0          AREA=Z        MFORM=T,S,J,X
#@$3          DEL=N  RTME=1/4  RNUM=20  SEG=20  CON=N
              USE=FC  LEVEL=ALL  PFKTAB=PFKTAB1  ROUTCDE=ALL
              LOGON=OPTIONAL  CMDSYS=#@$3  MSCOPE=*ALL
              INTIDS=N UNKNIDS=N
WTO BUFFERS IN CONSOLE BACKUP STORAGE = 18821 4
ADDRESS SPACE WTO BUFFER USAGE
ASID = 002F  JOBNAME = TSTWTO2  NBUF = 7432 5
ASID = 002E  JOBNAME = TSTWTO1  NBUF = 7231
ASID = 0030  JOBNAME = TSTWTO3  NBUF = 7145
MESSAGES COMING FROM OTHER SYSTEMS - WTO BUFFER USAGE
SYSTEM = #@$2  NBUF = 20 6
SYSTEM = #@$1  NBUF = 20
Figure 14-18 Console buffer display - buffer shortage
1 The number of write-to-operator (WTO) message buffers in use by the system at this time. If
the number is greater than 9999, asterisks (*) will appear.
2 The name of the console experiencing a buffer shortage.
3 The number of WTO message buffers currently queued to this console. If the number is
greater than 9999, asterisks (*) will appear.
4 All WTO buffers are in use, and the communications task (COMMTASK) is holding WTO
requests until the WTO buffer shortage is relieved. The number shown is the number of WTO
requests that are being held.
5 This shows the address space that is using more than 33% of the available WTO buffers.
The NBUF shows the number of WTO buffers in use by the specified ASID and job. In this
case, it may be appropriate to cancel the jobs sending the large number of messages to the
console.
6 Messages coming from other systems in the sysplex are using WTO message buffers. This
shows each system that has incoming messages in WTO buffers. The system name and the
number of buffers being used for messages from that system is shown.
There are a number of actions that can be attempted to try to relieve a WTO buffer shortage
condition. The console with the buffer shortage may not be local to an operations area, so
physically resolving some issues may not be an option.
Here are suggested actions to help relieve a buffer shortage:
Respond to any WTOR requesting an operator action.
Re-route the messages to another console by entering:
K Q,R=consname1,L=consname2
Here consname1 is the name of the console to receive the re-routed messages, and
consname2 is the name of the console whose messages are being rerouted. This only
reroutes messages already in the queue for the console.
It may be appropriate to cancel any jobs, identified using the D C,B command, that are
using a large number of buffers. The job or jobs may be flooding the console with
messages, and cancelling the job may help relieve the shortage.
Determine if there are outstanding action messages by using the DISPLAY REPLIES
command, as shown in Figure 14-19.
D R,L,CN=(ALL)
IEE112I 11.11.45 PENDING REQUESTS 894
RM=3  IM=36  CEM=18  EM=0  RU=0  IR=0  AMRF
ID: R/K  T TIME     SYSNAME JOB ID  MESSAGE TEXT
. . .
27065    C 09.19.14 SC55            *ATB052E LOGICAL UNIT SC55HMT
                                    FOR TRANSACTION SCHEDULER ASCH
                                    NOT ACTIVATED IN THE APPC
                                    CONFIGURATION. REASON CODE = 5A.
27063    C 09.18.08 SC54            *ATB052E LOGICAL UNIT SC54HMT
                                    FOR TRANSACTION SCHEDULER ASCH
                                    NOT ACTIVATED IN THE APPC
                                    CONFIGURATION. REASON CODE = 5A.
. . .
Figure 14-19 Display outstanding messages
You can then use the K C (control console) command to delete the outstanding action
messages that the action message retention facility (AMRF) has retained. An example of
this is shown in Figure 14-20.
K C,A,27063-27065
IEE146I K COMMAND ENDED-2 MESSAGE(S) DELETED
Figure 14-20 Deleting outstanding action messages
You can also use the K M command to change the limits of each buffer type. For example,
Figure 14-22 shows the K M command being used to change the WTO message limit.
K M,MLIM=8000
IEE712I CONTROL PROCESSING COMPLETE
Increasing the limits specified may require the use of more private storage in the console
address space (for MLIM) and ECSA for RLIM and LOGLIM, which may create other
system performance concerns. The maximum values of each type of buffer are listed in
Table 14-1.
Table 14-1 Maximum console buffer values
Type   Parameter  Maximum
WTO    MLIM       9999
WTOR   RLIM       9999
WTL    LOGLIM     999999
Deleting the message queue by using a K Q command is also an option. Use this
command to delete messages that are queued to an MCS or SMCS console (not EMCS).
This action affects only messages currently on the console's queue. Subsequent
messages are queued as usual, so the command may need to be issued a number of
times. Remember that the messages deleted from the queue will be lost. Use the following
command:
K Q[,L=consname]
Here L=consname is the name of the console whose message queue is to be deleted, or
blank defaults to the console where the command is issued.
IEE612I CN=#@$3M01  DEVNUM=08E0  SYS=#@$3  CMDSYS=#@$3
If you are using an EMCS console, you must use the DISPLAY CONSOLE command to
determine the CMDSYS setting. To alter the CMDSYS value for your console, use the
CONTROL VARY command. For example, to change the CMDSYS value for console #@$3M01
to #@$1, enter:
K V,CMDSYS=#@$1,L=#@$3M01
There is no response to this command on the console; the IEE612I message is updated with
the new CMDSYS value. The L= parameter is only required if the change is to be made for a
console other than the one where the command is entered.
IEE612I CN=#@$3M01  DEVNUM=08E0  SYS=#@$3  CMDSYS=#@$1
This change will remain in effect until the next IPL, or until another CONTROL command is
entered. If you want this change to be permanent, the system programmer would need to
change the CONSOLxx parmlib member.
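For illustration, the corresponding CONSOLxx fragment might look like the following sketch;
the exact statement coding is installation-specific and should be left to the system
programmer:

CONSOLE DEVNUM(08E0)
        NAME(#@$3M01)
        AUTH(MASTER)
        CMDSYS(#@$1)
        MSCOPE(*ALL)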
You are not limited to just one system name when using the ROUTE command. An example of
executing a command on more than one system in the sysplex is shown in Figure 14-26.
#@$3  RO (#@$1,#@$2),D T
#@$3  IEE421I RO (LIST),D T 471
SYSNAME RESPONSES ---------------------------------------------------
#@$1  IEE136I LOCAL: TIME=19.57.18 DATE=2007.197 UTC: TIME=23.57.18 DATE=2007.197
#@$2  IEE136I LOCAL: TIME=19.57.18 DATE=2007.197 UTC: TIME=23.57.18 DATE=2007.197
Figure 14-26 ROUTE command to multiple systems
To execute a command on all systems in the sysplex without listing all the system names in
the ROUTE command, you can enter:
RO *ALL,D T
This will result in a response from each system active in the sysplex.
To execute a command on all other systems in the sysplex except the one specified in the
SYS field in the IEE612I message, you can enter:
RO *OTHER,D T
System grouping
Another way of using the ROUTE command is to use a system group name. The system group
names can be set up by the system programmer using the IEEGSYS sample program and
procedure in SYS1.SAMPLIB.
In our examples, we have defined system groups as shown in Figure 14-27.
GROUP(TEST)  NAMES(#@$1)
GROUP(DEVL)  NAMES(#@$1,#@$2)
GROUP(PROD)  NAMES(#@$2,#@$3)
Figure 14-27 System group definitions
All active systems included in the system group will execute the command; see Figure 14-28.
These group names can be used to route commands to the active systems in the group.
#@$3  RO PROD,D T
#@$3  IEE421I RO PROD,D T 161
SYSNAME RESPONSES ---------------------------------------------------
#@$2  IEE136I LOCAL: TIME=23.17.23 DATE=2007.197 UTC: TIME=03.17.23 DATE=2007.198
#@$3  IEE136I LOCAL: TIME=23.17.23 DATE=2007.197 UTC: ...
Figure 14-28 System group command
You can include more than one system group name, or include both system group names
and system names when using the ROUTE command. Be sure to put multiple names in
brackets with a comma (,) separating each name, for example:
RO (DEVL,#@$3),D T
Here DEVL is a system group and #@$3 is a system name that is outside the defined system
group, but is still in the sysplex.
D OPDATA
IEE603I 23.30.59 OPDATA DISPLAY 189
PREFIX  OWNER     SYSTEM  SCOPE    REMOVE  FAILDSP
$       JES2      #@$2    SYSTEM   NO      SYSPURGE
$       JES2      #@$1    SYSTEM   NO      SYSPURGE
$       JES2      #@$3    SYSTEM   NO      SYSPURGE
%       RACF      #@$2    SYSTEM   NO      PURGE
%       RACF      #@$1    SYSTEM   NO      PURGE
%       RACF      #@$3    SYSTEM   NO      PURGE
#@$1    IEECMDPF  #@$1    SYSPLEX  YES     SYSPURGE
#@$2    IEECMDPF  #@$2    SYSPLEX  YES     SYSPURGE
#@$3    IEECMDPF  #@$3    SYSPLEX  YES     SYSPURGE
When a command prefix has been defined, a command can be prefixed with the appropriate
system prefix to have a command executed on that system, without having to prefix the
command with the ROUTE command. For example, in Figure 14-30 we used the prefix #@$1 to
send a command to system #@$1 from a console on #@$3.
#@$3  #@$1 D T
#@$1  IEE136I LOCAL: TIME=23.40.36 DATE=2007.197 UTC: TIME=03.40.36 DATE=2007.198
Figure 14-30 Using a command prefix to route a command
There is no requirement to put a space between the prefix and the beginning of the
command.
Multiple levels of policy specification allow criteria and actions to be applied to message
types, jobs, or even individual message IDs. The actions that may be taken during a message
flood include:
Preventing the flood messages from being displayed on a console.
Preventing the flood messages from being logged in the SYSLOG or OPERLOG.
Preventing the flood messages from being queued for automation.
Preventing the flood messages from propagating to other systems in a sysplex (if the
message is not displayed, logged or queued for automation).
Preventing the flood messages from being queued to the Action Message Retention
Facility (AMRF) if the message is an action message.
Taking action against the address space issuing the flood messages, by issuing a
command (typically a CANCEL command).
The PARAMETERS keyword can be specified to display the current values of all of the
parameters for all of the msgtypes. The msgtypes are either regular, action, or specific, as
shown in Figure 14-32.
D MSGFLD,PARAMETERS
MSGF901I Message Flood Automation parameters
Message type  REGULAR  ACTION  SPECIFIC
MSGCOUNT  =         5      22         8
MSGTHRESH =        30      30        10
JOBTHRESH =        30      30
INTVLTIME =         1       1         1
JOBIMTIME =         2       2
SYSIMTIME =         2       2         5
NUMJOBS   =        10      10
Figure 14-32 Display Message Flood Automation parameters
The DEFAULTS keyword can be specified to display the current default actions to be taken for
all of the msgtypes, as shown in Figure 14-33 on page 303.
D MSGFLD,DEFAULTS
MSGF904I Message Flood Automation DEFAULTS
Message type  REGULAR  ACTION  SPECIFIC
LOG      =    Y        Y       N
AUTO     =    Y        Y       N
DISPLAY  =    N        N       N
CMD      =    N        N
RETAIN   =    N        N
Figure 14-33 Display Message Flood Automation defaults
The JOBS keyword can be specified to display the current default actions to be taken for all of
the jobs that have been defined in the active MSGFLDxx parmlib member, as shown in
Figure 14-34.
D MSGFLD,JOBS
MSGF905I Message Flood Automation JOB actions
REGULAR messages   LOG AUTO DISPLAY CMD RETAIN
JOB D1%%MSTR       Y   N    N       N
ACTION messages    LOG AUTO DISPLAY CMD RETAIN
JOB D2%%MSTR       Y   N    N       N   N
Figure 14-34 Display Message Flood Automation jobs
The MSGS keyword can be specified to display the current default actions to be taken for all of
the messages that have been defined in the active MSGFLDxx parmlib member, as shown in
Figure 14-35.
D MSGFLD,MSGS
MSGF906I Message Flood Automation MSG actions
SPECIFIC messages  LOG AUTO DISPLAY CMD RETAIN
MSG IOS000I        N   N    N       N
MSG IOS002A        N   N    N       N
MSG IOS291I        N   N    N       N
MSG IEA476E        N   N    N       N
MSG IEA491E        N   N    N       N
MSG IEA494I        N   N    N       N
MSG IEA497I        N   N    N       N
MSG IOS251I        N   N    N       N
MSG IOS444I        N   N    N       N
MSG IOS450E        N   N    N       N
. . .
Figure 14-35 Display Message Flood Automation messages
The MODE keyword can be specified to display the current intensive mode states for the three
message types, as shown in Figure 14-36.
D MSGFLD,MODE
MSGF040I Intensive modes: REGULAR-OFF  ACTION-OFF  SPECIFIC-OFF
Figure 14-36 Display Message Flood Automation intensive modes
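Message Flood Automation takes its policy from the active MSGFLDxx parmlib member. A
sketch of switching to a different member and then enabling the function, assuming member
suffix 01 and the SET and SETMF operator interfaces described in the z/OS documentation:

SET MSGFLD=01
SETMF ON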
Workspace
A workspace is the work area of the Tivoli Enterprise Portal application window. It is
comprised of one or more views. A view is a pane in the workspace (typically a chart, graph,
or table) showing data collected by a monitoring agent. As you select items in the Navigator,
each workspace presents views relevant to your selection. Every workspace has at least one
view, and every view has a set of properties associated with it. You can customize the
workspace by working in the Properties Editor to change the style and content of each view.
You can also change, add, and delete views on a workspace.
An example of the sysplex information available from the z/OS Management Console is the
Coupling Facility Systems Data for Sysplex workspace report shown in Figure 14-37. This
report displays status and storage information about the Coupling Facilities defined to the
sysplex. This workspace contains views such as:
The Dump Table Storage bar chart, which shows each Coupling Facility; the number of
4 K pages of storage reserved for dump tables; the number of pages currently holding
dumps; and the percentage of allocated storage currently being used.
The Coupling Facility Systems Information table displays basic status and storage
statistics for each Coupling Facility. From this table, you can link to the other workspaces
for the selected Coupling Facility.
Another example of the information available in the z/OS Management Console is the z/OS
Health Checker information. An example of this is the Health Monitor Checks workspace,
which provides a summary of information about each health check. This workspace displays
data provided by the Health Checker Checks attribute group, as seen in Figure 14-38.
Chapter 15. z/OS system logger considerations
Figure 15-1 Logical and physical views of system logger-maintained log data
There are basically two types of users of system logger. One type of exploiter uses the
system logger as an archival facility for log data; for example, OPERLOG or LOGREC. The
second type of exploiter typically uses the data more actively, and explicitly deletes it when it
is no longer required; for example, the CICS DFHLOG. CICS stores information in DFHLOG
about running transactions, and deletes the records as the transactions complete. These are
called active exploiters.
IXGLOGRS is the command processor to start the system logger address space. IXGLOGRS
only starts the system logger address space (IXGLOGR) and then it immediately ends.
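If the system logger address space ever has to be started manually, the start command
names the IXGLOGRS procedure described above:

S IXGLOGRS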
While an application is connected to a log stream, the supporting instance of the z/OS system
logger might fail independently of the exploiting application. When the z/OS system logger
address space fails, connections to log streams are automatically disconnected by the
system logger. All requests to connect are rejected. When the recovery processing
completes, the system logger is restarted and an Event Notification Facility (ENF) is
broadcast. On receipt of the ENF, applications may connect to log streams and resume
processing. During startup, system logger runs through a series of operations for all CF
structure-based log streams to attempt to recover and clean up any failed connections, and to
ensure that all data is valid.
Note: An asterisk (*) can be used as a wildcard character with the DISPLAY LOGGER
command. Specify an asterisk as the search argument, or specify an asterisk as the last
character of a search argument. If used, the wildcard must be the last character in the
search argument, or the only character.
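For example, to list every log stream whose name begins with SYSPLEX:

D LOGGER,L,LSN=SYSPLEX.*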
To display the current operational status of the system logger, use the D LOGGER,ST
command, as shown in Figure 15-4.
D LOGGER,ST
IXG601I  18.54.39  LOGGER DISPLAY 088
SYSTEM LOGGER STATUS
SYSTEM   SYSTEM LOGGER STATUS
------   --------------------
#@$3     ACTIVE
Figure 15-4 Display Logger status
To check the state of a log stream and the number of systems connected to the log stream,
use the D LOGGER,LOGSTREAM command. The amount of output displayed will depend on the
number of log streams defined in the LOGR policy. See Figure 15-5 for an example of the
output.
D LOGGER,LOGSTREAM
IXG601I  19.17.49  LOGGER DISPLAY 472
INVENTORY INFORMATION BY LOGSTREAM
LOGSTREAM 1                  STRUCTURE 2         #CONN 3  STATUS
---------                    ---------           -------  ------
#@$C.#@$CCM$1.DFHLOG2        CIC_DFHLOG_001      000000   AVAILABLE 4
#@$C.#@$CCM$1.DFHSHUN2       CIC_DFHSHUNT_001    000000   AVAILABLE
#@$C.#@$CCM$2.DFHLOG2        CIC_DFHLOG_001      000000   AVAILABLE
#@$C.#@$CCM$2.DFHSHUN2       CIC_DFHSHUNT_001    000000   AVAILABLE
. . .
#@$3.DFHLOG2.MODEL           CIC_DFHLOG_001      000000   *MODEL* 5
#@$3.DFHSHUN2.MODEL          CIC_DFHSHUNT_001    000000   *MODEL*
ATR.#@$#PLEX.DELAYED.UR      RRS_DELAYEDUR_1     000003   IN USE 6
  SYSNAME: #@$1  DUPLEXING: LOCAL BUFFERS 7
  SYSNAME: #@$2  DUPLEXING: LOCAL BUFFERS
  SYSNAME: #@$3  DUPLEXING: LOCAL BUFFERS
  GROUP: PRODUCTION
. . .
IGWTV003.IGWSHUNT.SHUNTLOG   LOG_IGWSHUNT_001    000000   AVAILABLE
IGWTV999.IGWLOG.SYSLOG       *DASDONLY*          000000   AVAILABLE
ING.HEALTH.CHECKER.HISTORY   LOG_SA390_MISC      000000   AVAILABLE
SYSPLEX.LOGREC.ALLRECS       SYSTEM_LOGREC       000000   LOSS OF DATA 8
SYSPLEX.OPERLOG              SYSTEM_OPERLOG      000003   IN USE
. . .
Figure 15-5 Display Logger log stream inventory
1 Logstream name.
2 Structure name defined in the CFRM policy or *DASDONLY* when a DASD only configured
log stream is displayed.
3 The number of active connections from this system to the log stream, and the log stream
status.
Some examples of the status are:
4 Available: The log stream is available for connects.
5 Model: The log stream is a model and is exclusively for use with the LIKE parameter to set
up general characteristics for other log stream definitions.
6 In use: The log stream is available and has a current connection.
7 DUPLEXING: LOCAL BUFFERS indicates that the duplex copy of the log stream resides in
the system logger's data space.
8 Loss of data: There is a loss of data condition present in the log stream.
To display all defined log streams that have a DASD-only configuration, use the D
LOGGER,L,DASDONLY command. See Figure 15-6 for an example of the output.
D LOGGER,L,DASDONLY
IXG601I  20.26.46  LOGGER DISPLAY 840
INVENTORY INFORMATION BY LOGSTREAM
LOGSTREAM                  STRUCTURE    #CONN   STATUS
---------                  ---------    ------  ------
BDG.LOG.STREAM             *DASDONLY*   000000  AVAILABLE
IGWTV999.IGWLOG.SYSLOG     *DASDONLY*   000000  AVAILABLE
Figure 15-6 Display DASD-only log streams
To check the number of connections to the log streams, use the D LOGGER,CONN command.
This command displays only log streams that have connectors on the system where the
command has been issued. See Figure 15-7 for an example of the output.
D LOGGER,CONN
IXG601I  19.50.58  LOGGER DISPLAY 695
CONNECTION INFORMATION BY LOGSTREAM FOR SYSTEM #@$3
LOGSTREAM                  STRUCTURE         #CONN   STATUS
---------                  ---------         ------  ------
ATR.#@$#PLEX.RM.DATA       RRS_RMDATA_1      000001  IN USE
ATR.#@$#PLEX.MAIN.UR       RRS_MAINUR_1      000001  IN USE
SYSPLEX.OPERLOG            SYSTEM_OPERLOG    000002  IN USE
ATR.#@$#PLEX.DELAYED.UR    RRS_DELAYEDUR_1   000001  IN USE
ATR.#@$#PLEX.RESTART       RRS_RESTART_1     000001  IN USE
#@$#.SQ.MSGQ.LOG           I#$#LOGMSGQ       000001  IN USE
#@$#.SQ.EMHQ.LOG           I#$#LOGEMHQ       000001  IN USE
Figure 15-7 Display Logger connections
To display which jobnames are connected to the log stream, you can use the D
LOGGER,CONN,LSN=<logstream>,DETAIL command. This command displays only those log
streams that have connectors on the system where the command has been issued. See
Figure 15-8 for an example of the output using the sysplex OPERLOG as the log stream
example.
D LOGGER,C,LSN=SYSPLEX.OPERLOG,DETAIL
IXG601I  20.12.45  LOGGER DISPLAY 800
CONNECTION INFORMATION BY LOGSTREAM FOR SYSTEM #@$3
LOGSTREAM                  STRUCTURE         #CONN   STATUS
---------                  ---------         ------  ------
SYSPLEX.OPERLOG            SYSTEM_OPERLOG    000002  IN USE
  DUPLEXING: STAGING DATA SET
  STGDSN: IXGLOGR.SYSPLEX.OPERLOG.#@$3
  VOLUME=#@$#W1  SIZE=004140 (IN 4K)  % IN-USE=001
  GROUP: PRODUCTION
  JOBNAME: CONSOLE   ASID: 000B
  R/W CONN: 000000 / 000001
  RES MGR./CONNECTED: *NONE* / NO
  IMPORT CONNECT: NO
Figure 15-8 Display Logger R/W connections
To display which log streams are allocated to a particular structure, use the D LOGGER,STR
command. The display shows whether a log stream is defined to the structure and whether it
is connected. See Figure 15-9 for an example of the output.
D LOGGER,STR
IXG601I  20.22.19  LOGGER DISPLAY 825
INVENTORY INFORMATION BY STRUCTURE
STRUCTURE          GROUP        LOGSTREAM                 CONNECTED
---------          -----        ---------                 ---------
CIC_DFHLOG_001     PRODUCTION   #@$C.#@$CCM$1.DFHLOG2     YES
                                #@$C.#@$CCM$2.DFHLOG2     NO
                                #@$C.#@$CCM$3.DFHLOG2     NO
                                #@$C.#@$CWC2A.DFHLOG2     NO
. . .
LOG_TEST_001                    *NO LOGSTREAMS DEFINED*   N/A
RRS_ARCHIVE_2                   *NO LOGSTREAMS DEFINED*   N/A
RRS_DELAYEDUR_1    PRODUCTION   ATR.#@$#PLEX.DELAYED.UR   YES
RRS_MAINUR_1       PRODUCTION   ATR.#@$#PLEX.MAIN.UR      YES
RRS_RESTART_1      PRODUCTION   ATR.#@$#PLEX.RESTART      YES
RRS_RMDATA_1       PRODUCTION   ATR.#@$#PLEX.RM.DATA      YES
SYSTEM_LOGREC      PRODUCTION   SYSPLEX.LOGREC.ALLRECS    NO
SYSTEM_OPERLOG     PRODUCTION   SYSPLEX.OPERLOG           YES
Figure 15-9 Display Logger structures
the CF structure-based log streams. Without the DETAIL(YES) keyword, only the log stream
definitions are reported in the sysout.
//LOGRLIST JOB (0,0),'LIST LOGR POL',CLASS=A,REGION=4M,
//         MSGCLASS=X,NOTIFY=&SYSUID
//STEP1    EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSABEND DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(LOGR) REPORT(NO)
  LIST LOGSTREAM NAME(SYSPLEX.LOGREC.ALLRECS) DETAIL(YES)
/*
Figure 15-10 JCL to list log stream data using IXCMIAPU
The output of the LOGR list report is shown in Figure 15-11 on page 315. The report shown is
of the LOGREC log stream which has possible loss of data.
The listing shows the following:
1 Log stream information about how the log stream was defined.
2 Timing of the possible loss of data.
3 Log stream connection information, which shows the systems connected to the log stream
and their connection status.
4 Offload data set name prefix. The data set is a linear VSAM data set.
5 Information about the offload data sets in the log stream, the sequence numbers, and the
date and time of the oldest record in the data set.
User Data:
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000

3  CON  STRUCTURE         CONNECTION  CONNECTION
   ID   VERSION           VERSION     STATE
   ---  ----------------  ----------  ----------
   01   C0D636D2B8F5CDCC  00010027    Active
   03   C0D636D2B8F5CDCC  0003001B    Active
   02   C0D636D2B8F5CDCC  00020021    Active

4 5
   <SEQ#>    Lowest Blockid    Highest GMT        Highest Local      Status
   --------  ----------------  -----------------  -----------------  -------
   A0000001  000000000043777C  05/25/07 13:46:36  05/25/07 09:46:36
   A0000002  000000000086F62E  06/12/07 06:06:21  06/12/07 02:06:21
   A0000003  0000000200A5D115  07/03/07 04:56:38  07/03/07 00:56:38  CURRENT
Figure 15-11 LOGR list report for a log stream with possible loss of data (extract)
For both CF structure-based and DASD only log streams, system logger marks a log stream
as permanently damaged when it cannot recover log data from either DASD staging data sets
or the local buffers after a system, sysplex, or Coupling Facility failure. Applications are
notified of the damage via system logger services and reason codes. Recovery actions are
necessary only if warranted for the application. Notify your system programmer if any loss of
data is identified.
Important: Never delete offload data sets (except orphaned ones) manually. This will
cause an unrecoverable loss of data.
There are several actions you can take to determine what could be inhibiting an offload:
Verify that there are no outstanding WTORs.
Determine whether there are inhibitors to offload processing by issuing commands such
as:
D LOGGER,C,LSN=logstreamname
D LOGGER,L,LSN=logstreamname
D XCF,STRUCTURE,STRNAME=structurename
D GRS,C
If message IXG115A is displayed, as shown in Figure 15-13, reply only after you have
attempted to remedy any delayed offloads by responding to the related IXG312E
messages. As a last resort, if you reply TASK=END to an IXG115A message, then system
logger will terminate all the log stream connections in the structure named in the message
on this system.
Review the complete description of messages IXG311I, IXG312E, IXG114I, and IXG115I
in z/OS MVS System Messages Volume 10 (IXC - IZP), SA22-7640, before responding to
any of these messages.
IXG115A CORRECT THE OFFLOAD CONDITION ON sysname FOR strname OR REPLY
TASK=END TO END THE STRUCTURE TASK.
Figure 15-13 Offload error condition - IXG115A
The LOGR Couple Data Set, using the log data set directory extent, keeps track of all the
data sets in the log stream. Deleting them (with IDCAMS, for example) will not update the
LOGR Couple Data Set, and system logger will still think that the data set exists. It will report
missing data if an attempt is then made to access the data mapped in those offload data sets.
To resolve this situation, you can either try to understand which log stream is generating the
high number of offload data sets, or you can enlarge the DSEXTENT portion of the LOGR
couple data set.
To determine which log stream is using all the directory entries, you can run the IXCMIAPU
utility, as seen in Figure 15-10 on page 314. Then take the corrective action against the log
stream, as described by the IXG261E or IXG262A messages.
If this does not solve the situation, here is a list of actions that may help the system
programmer resolve it:
Run the IXCMIAPU utility with the LIST option against the log streams to verify which log
streams are generating the high amount of offload data sets that are using all the directory
entries. Check if there is any anomaly in the definition of these log streams. A wrong
parameter may be the cause of the elevated number of offload data sets being created.
For example, a small value for LS_SIZE results in very small offload data sets; if the log
stream is generating a large amount of data, this can cause many offload data sets to be
created, using all the available directory entries.
Define a new LOGR CDS with a bigger DSEXTENT value to allow new offload data sets
to be allocated and make this new LOGR Couple Data Set the active data set in the
sysplex.
Before allocating the new data set, you can display the current allocation with the
command D XCF,COUPLE,TYPE=LOGR, or you can run the IXCMIAPU and look for the
DSEXTENT field in the output display 1. This tells you how many extents are allocated in
the current LOGR Couple Data Set.
D XCF,COUPLE,TYPE=LOGR
IXC358I 01.47.02 DISPLAY XCF 709
LOGR COUPLE DATA SETS
PRIMARY   DSN: SYS1.XCF.LOGR01
          VOLSER: #@$#X1   DEVN: 1D06
          FORMAT TOD           MAXSYSTEM
          12/11/2002 22:43:54          4
          ADDITIONAL INFORMATION:
            LOGR COUPLE DATA SET FORMAT LEVEL: HBB7705
            LSR(200) LSTRR(120) DSEXTENT(10) 1
            SMDUPLEX(1)
ALTERNATE DSN: SYS1.XCF.LOGR02
          VOLSER: #@$#X2   DEVN: 1D07
          FORMAT TOD           MAXSYSTEM
          12/11/2002 22:43:58          4
          ADDITIONAL INFORMATION:
            LOGR COUPLE DATA SET FORMAT LEVEL: HBB7705
            LSR(200) LSTRR(120) DSEXTENT(10) 1
            SMDUPLEX(1)
LOGR IN USE BY ALL SYSTEMS
Figure 15-15 Display system logger Couple Data Set
The systems programmer can then use the IXCL1DSU utility to format a new LOGR Couple
Data Set, making sure the new Couple Data Set format has the appropriate number of LSRs,
LSTRRs, and a larger DSEXTENT.
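As a sketch of such a format job, the JCL might look like the following. The data set name,
volume, and sysplex name are illustrative; the LSR and LSTRR values mirror the current CDS
shown in Figure 15-15, with DSEXTENT raised from 10 to 20:
//FMTLOGR  JOB (0),'FORMAT LOGR CDS',CLASS=A
//STEP1    EXEC PGM=IXCL1DSU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINEDS SYSPLEX(PLEX1)
    DSN(SYS1.XCF.LOGR03) VOLSER(XCFVOL)
    DATA TYPE(LOGR)
      ITEM NAME(LSR) NUMBER(200)
      ITEM NAME(LSTRR) NUMBER(120)
      ITEM NAME(DSEXTENT) NUMBER(20)
      ITEM NAME(SMDUPLEX) NUMBER(1)
/*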
After the new LOGR Couple Data Set is allocated, you can make it the alternate LOGR
Couple Data Set in your installation by issuing the command:
SETXCF COUPLE,ACOUPLE=(new_dsname),TYPE=LOGR
If the addition of this Couple Data Set is successful, then you can proceed and issue the
following command to switch control from the current primary to the new alternate Couple
Data Set:
SETXCF COUPLE,TYPE=LOGR,PSWITCH
A new alternate LOGR Couple Data Set will also need to be defined with the larger
DSEXTENT and allocated as the alternate Couple Data Set by issuing the command:
SETXCF COUPLE,ACOUPLE=(new_alternate_dsname),TYPE=LOGR
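Putting the sequence together with hypothetical data set names, the commands would be
issued in this order:
SETXCF COUPLE,ACOUPLE=(SYS1.XCF.LOGR03),TYPE=LOGR
SETXCF COUPLE,TYPE=LOGR,PSWITCH
SETXCF COUPLE,ACOUPLE=(SYS1.XCF.LOGR04),TYPE=LOGR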
ACTIVE STRUCTURE
----------------
ALLOCATION TIME: 07/03/2007 07:09:23
CFNAME         : FACIL01 2
COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC1
                   PARTITION: 00  CPCID: 00
ACTUAL SIZE    : 19200 K
STORAGE INCREMENT SIZE: 256 K
. . .
# CONNECTIONS : 3
CONNECTION NAME  ID VERSION  SYSNAME JOBNAME  ASID STATE
---------------- -- -------- -------- -------- ---- ------
IXGLOGR_#@$1     03 00030039 #@$1     IXGLOGR  0016 ACTIVE
IXGLOGR_#@$2     01 00010055 #@$2     IXGLOGR  0016 ACTIVE
IXGLOGR_#@$3     02 0002003B #@$3     IXGLOGR  0016 ACTIVE
. . .
To initiate a structure rebuild to an alternate Coupling Facility in the preference list, use this
command:
SETXCF START,REBUILD,STRNAME=structure_name
SETXCF START,REBUILD,STRNAME=CIC_DFHLOG_001
IXC521I REBUILD FOR STRUCTURE CIC_DFHLOG_001
HAS BEEN STARTED
IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE
CIC_DFHLOG_001 WAS ACCEPTED.
IXC526I STRUCTURE CIC_DFHLOG_001 IS REBUILDING FROM
COUPLING FACILITY FACIL01 TO COUPLING FACILITY FACIL02.
REBUILD START REASON: OPERATOR INITIATED
INFO108: 00000013 00000013.
IXC521I REBUILD FOR STRUCTURE CIC_DFHLOG_001
HAS BEEN COMPLETED
Figure 15-17 System Logger structure rebuild
The SETLOGRC command accepts a target medium of LOGSTREAM, DATASET, or IGNORE.
D LOGREC
IFB090I 00.23.01 LOGREC DISPLAY 845
CURRENT MEDIUM = DATASET
MEDIUM NAME = SYS1.#@$3.LOGREC
SETLOGRC LOGSTREAM
IFB097I LOGREC RECORDING MEDIUM CHANGED FROM DATASET TO LOGSTREAM
D LOGREC
IFB090I 00.26.41 LOGREC DISPLAY 856
CURRENT MEDIUM = LOGSTREAM
MEDIUM NAME = SYSPLEX.LOGREC.ALLRECS
STATUS = CONNECTED
SETLOGRC DATASET
IFB097I LOGREC RECORDING MEDIUM CHANGED FROM LOGSTREAM TO DATASET
D LOGREC
IFB090I 00.33.55 LOGREC DISPLAY 868
CURRENT MEDIUM = DATASET
MEDIUM NAME = SYS1.#@$3.LOGREC
Figure 15-19 SETLOGRC command output
Chapter 16. Network considerations in a Parallel Sysplex
This chapter provides details of operational considerations to keep in mind for the network
environment in a Parallel Sysplex. It includes:
Virtual Telecommunications Access Method (VTAM) and its use of Generic Resources
(GR)
TCP/IP
Sysplex Distributor
Load Balancing Advisor (LBA)
IMS Connect
However, with the introduction of data sharing, it became necessary to have multiple SNA
application instances that could all access the same data. Taking CICS as an example, you
may have four CICS regions that all run the same applications and can access the same
data. To provide improved workload balancing and better availability, VTAM introduced a
function known as Generic Resources.
Generic Resources allows an SNA application to effectively have two APPLIDs. One ID is
unique to that application instance. The other ID is shared with other SNA application
instances that share the same data or support the same business applications. The one that
is shared is called the generic resource name. Now, when an application connects to VTAM, it
can specify its APPLID and also request to join a particular generic resource group with the
appropriate generic resource name.
Note: VTAM Generic Resource can only be used by SNA applications, not by TCP/IP
applications.
There can be a number of generic resource groups. For example, there might be one for the
TSO IDs on every system, another for all the banking CICS regions, another for all the test
CICS regions, and so forth. When someone wants to log on to one of the banking CICS
regions, they can now specify the generic resource name, rather than the name of one
specific CICS region. As a result, if one of the CICS regions is down, the user will still get
logged on, and is not even aware of the fact that one of the regions is unavailable. This also
provides workload balancing advantages because VTAM, together with WLM, will now
ensure that the user sessions are spread across all the regions in the group.
VTAM uses a list structure in the Coupling Facility (CF) to hold the information about all the
generic resources in the Parallel Sysplex. In the structure, it keeps a list of all the active
generic resource groups, the APPLIDs of all the SNA applications that are connected to each
of those groups, a list of LUs that are in session with each APPLID, and counts of how many
sessions there are with each instance within each group. This information is updated
automatically each time a session is established or terminated. The default name for this
structure is ISTGENERIC, but you can override the default name by specifying a different
structure name on the VTAM STRGR start option (however, it must still begin with IST*).
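For example, to override the default, you might code a start option like the following in the
ATCSTRxx member of VTAMLST (the name ISTGRPLX1 is hypothetical; it must begin with IST
and match the structure name defined in the CFRM policy):
STRGR=ISTGRPLX1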
For VTAM to use the CF, there must be an active CFRM policy defined for the Parallel
Sysplex, and the structure must be defined in that policy. All the VTAMs in the Parallel
Sysplex that are part of the same generic resource configuration must be connected to the
CF containing the structure, as well as all the other CFs indicated by the preference list for the
structure. When VTAM in a Parallel Sysplex is started, it automatically attempts to connect to
the CF structure, after first checking that the CFRM policy is active. When the first VTAM
becomes active in the Parallel Sysplex, XES will allocate the storage for the CF structure.
The structure disposition is specified as DELETE, which means when the last connector
disconnects from the structure, it is deallocated from CF storage.
The connection disposition is specified as KEEP, which means the connection is placed in a
failed-persistent state if it terminates. If the connection is failed-persistent, that usually means
that the VTAM that disconnected still has data out in the CF.
When one of the VTAMs in the sysplex disconnects from the structure, the remaining VTAMs
will normally clean up after that VTAM and remove the connection. If they detect any data that
was not deleted, they will leave the connection in a failed-persistent state. In that case, when
you issue the VARY NET,CFS command to get VTAM to disconnect from the structure, the other
VTAMs detect that the VTAM that disconnected is still active, and therefore do not actually
clean up any information relating to that VTAM, so the connection stays in failed-persistent
state.
On the other hand, when you actually stop VTAM, the other VTAMs know that it is not active
and clean up the entries related to that VTAM. As long as there are no persistent affinities,
they will delete the failing VTAM's connection.
The local copy of the generic resource information contained in the VTAM nodes is needed to
rebuild the VTAM structure.
When you stop VTAM normally, the connection to the structure will be deleted, unless it is the
last connector in the sysplex. If it is the last connector in the sysplex, it will go into a
failed-persistent state. This is because there might be persistent information in the structure
about affinities between certain applications and generic resources, so VTAM protects that
data by keeping the structure.
Because it impacts the availability of applications that use generic resources, you should be
aware that VTAM records affinities between application instances and sessions with those
instances for any application that has been using a generic resource name. The reason for
this is that, if the user initially logs on using a generic resource name, and is routed to CICSA,
any subsequent logon attempts should be routed to the same application instance (CICSA).
To be able to do this, VTAM sets a flag for any LU that is using an application that has
registered a generic resource name: on any subsequent logon attempt, VTAM checks that
flag to see if the logon should be routed to a particular instance. If the VTAM GR function
should become unavailable for some reason, VTAM is no longer able to check this
information. As a result, it will refuse to set up new sessions with those applications. This is
why it is important that you understand how VTAM GR works and how to manage it.
. . . ON 07/03/07
CDRDYN   = YES        NETID    = USIBMSC
CDSERVR  = ***NA***   NNSPREF  = NONE
CINDXSIZ = 8176       NODETYPE = EN
CMPVTAM  = 0          NSRTSIZE = *BLANKS*
CNNRTMSG = ***NA***   OSIEVENT = PATTERNS
CONFIG   = $3 1       STRGR    = ISTGENERIC 2
CPCDRSC  = NO         SUPP     = NOSUP
                      TCPNAME  = *BLANKS*
When VTAM is restarted on your system, you see the message shown in Figure 16-4.
IXL014I IXLCONN REQUEST FOR STRUCTURE ISTGENERIC 461
WAS SUCCESSFUL. JOBNAME: NET ASID: 001B
CONNECTOR NAME: USIBMSC_#@$1M CFNAME: FACIL01
IST1370I USIBMSC.#@$1M IS CONNECTED TO STRUCTURE ISTGENERIC
Figure 16-4 VTAM connecting to the ISTGENERIC structure
Display information about VTAM generic resource groups (Figure 16-7 on page 329).
D NET,RSCLIST,IDTYPE=GENERIC,ID=*
IST097I DISPLAY ACCEPTED
IST350I DISPLAY TYPE = RSCLIST
IST1417I NETID    NAME     STATUS  TYPE              MAJNODE
IST1418I USIBMSC  TSO$$    ACT/S   GENERIC RESOURCE  **NA**
IST1418I USIBMSC  SCSMCS$$ ACTIV   GENERIC RESOURCE  **NA**
IST1418I USIBMSC  #@$C1TOR ACTIV   GENERIC RESOURCE  **NA**
IST1418I USIBMSC  ITSOI#$# INACT   GENERIC RESOURCE  **NA**
IST1418I USIBMSC  #@$C1TOR *NA*    GENERIC USERVAR   **NA**
IST1418I USIBMSC  ITSOI#$# *NA*    GENERIC USERVAR   **NA**
IST1418I USIBMSC  TSO$$    *NA*    GENERIC USERVAR   **NA**
IST1418I USIBMSC  SCSMCS$$ *NA*    GENERIC USERVAR   **NA**
IST1454I 8 RESOURCE(S) DISPLAYED FOR ID=*
IST314I END
Figure 16-7 Display information about VTAM generic resource groups
Display who is using a specific VTAM generic resource group (Figure 16-8).
D NET,ID=TSO$$,E
IST097I DISPLAY ACCEPTED
IST075I NAME = TSO$$, TYPE = GENERIC RESOURCE
IST1359I MEMBER NAME     OWNING CP  SELECTABLE  APPC
IST1360I USIBMSC.SC$2TS  #@$2M      YES         NO
IST1360I USIBMSC.SC$3TS  #@$3M      YES         NO
IST1360I USIBMSC.SC$1TS  #@$1M      YES         NO
IST314I END
Figure 16-8 Display who is using a specific VTAM generic resource group
Note: When you have issued the command to disconnect from the structure and then
display the status of the structure, the connection will be in failed-persistent state, as
displayed in Figure 16-11.
D XCF,STR,STRNM=ISTGENERIC
IXC360I 00.56.41 DISPLAY XCF 581
STRNAME: ISTGENERIC
STATUS: ALLOCATED
...
CONNECTION NAME  ID VERSION  SYSNAME JOBNAME ASID STATE
---------------- -- -------- -------- ------- ---- -------------------
USIBMSC_#@$1M    03 00030095 #@$1     NET     001B ACTIVE
USIBMSC_#@$2M    01 000100C3 #@$2     NET     001B ACTIVE
USIBMSC_#@$3M    02 0002008F #@$3     NET     001B FAILED-PERSISTENT 1
...
Figure 16-11 Failed-persistent state for structure
D NET,STATS,TYPE=CFS,STRNAME=ISTGENERIC
Figure 16-13 Command to display whether VTAM GR is active
Figure 16-14 illustrates sample output of the D NET,STATS command, displaying the status of
our GR structure.
D NET,STATS,TYPE=CFS,STRNAME=ISTGENERIC
IST097I DISPLAY ACCEPTED
IST350I DISPLAY TYPE = STATS,TYPE=CFS
IST1370I USIBMSC.#@$3M IS CONNECTED TO STRUCTURE ISTGENERIC
IST1797I STRUCTURE TYPE = LIST
IST1517I LIST HEADERS = 4 - LOCK HEADERS = 4
IST1373I STORAGE ELEMENT SIZE = 1024
IST924I -------------------------------------------------------------
IST1374I                   CURRENT  MAXIMUM  PERCENT 1
IST1375I STRUCTURE SIZE      2560K    4096K     *NA*
IST1376I STORAGE ELEMENTS        4       77        5
IST1377I LIST ENTRIES          17     4265        0
IST314I END
Figure 16-14 Output from D NET,STATS command
1 If any one of the entries is utilized more than 80%, contact your systems programmer to
review and possibly alter the size of the structure. This type of monitoring used by the system
in a Parallel Sysplex environment is called structure full monitoring.
Structure full monitoring adds support for the monitoring of objects within a Coupling Facility
structure. Its objective is to determine the level of usage for objects that are monitored within
a CF and to issue a warning message to the console if a structure full condition is imminent.
The default value for the monitoring threshold is 80%.
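The threshold comes from the FULLTHRESHOLD keyword on the structure definition in the
CFRM policy. A minimal sketch of such a definition in the IXCMIAPU input, using the sizes
shown in Figure 16-15 (the policy name and preference list order are illustrative):
DATA TYPE(CFRM)
DEFINE POLICY NAME(POLICY1) REPLACE(YES)
  STRUCTURE NAME(ISTGENERIC)
            SIZE(4096)
            INITSIZE(2560)
            FULLTHRESHOLD(80)
            PREFLIST(FACIL01,FACIL02)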
You can also issue the command in Figure 16-15 to display the structure full monitoring
threshold for the particular structure.
D XCF,STR,STRNAME=ISTGENERIC
IXC360I 02.03.32 DISPLAY XCF 788
STRNAME: ISTGENERIC
STATUS: ALLOCATED
EVENT MANAGEMENT: POLICY-BASED
TYPE: SERIALIZED LIST
POLICY INFORMATION:
  POLICY SIZE    : 4096 K
  POLICY INITSIZE: 2560 K
  POLICY MINSIZE : 0 K
  FULLTHRESHOLD  : 80 1
. . .
Figure 16-15 Display structure full threshold value
D NET,RSCLIST,IDTYPE=GENERIC,ID=*
IST097I DISPLAY ACCEPTED
IST350I DISPLAY TYPE = RSCLIST
IST1417I NETID    NAME     STATUS  TYPE 1            MAJNODE
IST1418I USIBMSC  ITSOI#$# INACT   GENERIC RESOURCE  **NA**
IST1418I USIBMSC  TSO$$    ACT/S   GENERIC RESOURCE  **NA**
IST1418I USIBMSC  SCSMCS$$ ACTIV   GENERIC RESOURCE  **NA**
IST1418I USIBMSC  #@$C1TOR ACTIV   GENERIC RESOURCE  **NA**
IST1418I USIBMSC  #@$C1TOR *NA*    GENERIC USERVAR   **NA**
IST1418I USIBMSC  ITSOI#$# *NA*    GENERIC USERVAR   **NA**
IST1418I USIBMSC  TSO$$    *NA*    GENERIC USERVAR   **NA**
IST1418I USIBMSC  SCSMCS$$ *NA*    GENERIC USERVAR   **NA**
IST1454I 8 RESOURCE(S) DISPLAYED FOR ID=*
IST314I END
To display a specific generic resource name for CICS, issue the command as displayed in
Figure 16-18.
D NET,SESSIONS,LU1=#@$C1TOR
IST097I DISPLAY ACCEPTED
IST350I DISPLAY TYPE = SESSIONS
IST1364I #@$C1TOR IS A GENERIC RESOURCE NAME FOR:
IST988I #@$C1T3A #@$C1T2A
IST924I ----------------------------------------------------
IST878I NUMBER OF PENDING SESSIONS = 0
IST878I NUMBER OF ACTIVE  SESSIONS = 1 1
IST878I NUMBER OF QUEUED  SESSIONS = 0
IST878I NUMBER OF TOTAL   SESSIONS = 1
IST314I END
Figure 16-18 Displaying a specific generic resource name for CICS
1 There is 1 active session to the CICS Terminal Owning Region (TOR) named #@$C1TOR.
1 Openstatus returns the value indicating the communication status between CICS and
VTAM.
2 Grstatus returns one of the following, indicating the status of Generic Resource registration.
Blanks are returned if the Generic Resource function is disabled for the CICS system.
Deregerror
Deregistered
Notapplic
Regerror       Registration was attempted but was unsuccessful, and there has
               been no attempt to deregister.
Registered
Unavailable
Unregistered
3 Grname returns the Generic Resource under which this CICS system requests registration to
VTAM. Blanks are returned if the Generic Resource function is not enabled for the CICS
system.
Issue the following command to deregister the CICS TOR, where stcname is the name of the
CICS TOR.
F stcname,CEMT SET VTAM DEREGISTER
Figure 16-20 Remove CICS from using a generic resource group
Figure 16-21 illustrates sample output of the deregister command CEMT SET VTAM
DEREGISTER.
F #@$C1T3A,CEMT SET VTAM DEREGISTER
+Vtam
   Openstatus( Open )
   Psdinterval( 000000 )
   Grstatus( Deregistered ) 1
   Grname(#@$C1TOR)
NORMAL
RESPONSE: NORMAL TIME: 20.06.44 DATE: 07.04.07
SYSID=1T3A APPLID=#@$C1T3A
Figure 16-21 Sample output from the deregister command
1 The CICS TOR has been successfully deregistered from the generic resource group.
Refer to Figure 16-22 for sample output of the D NET,SESSIONS command. Notice that the
CICS TOR has been 1 removed from the generic resource group.
D NET,SESSIONS,LU1=#@$C1TOR
IST097I DISPLAY ACCEPTED
IST350I DISPLAY TYPE = SESSIONS
IST1364I #@$C1TOR IS A GENERIC RESOURCE NAME FOR:
IST988I #@$C1T2A 1
IST924I ---------------------------------------------
IST172I NO SESSIONS EXIST
IST314I END
Figure 16-22 Sample output from the D NET,SESSIONS command
Note: After you deregister the CICS TOR from the generic resource group, you must
restart the CICS TOR to register it to the generic resource group again.
You can remove TSO/VTAM on one system from the generic resource group while allowing
existing TSO/VTAM users on that system to continue working. To accomplish this, issue the
following command, where stcname is the TSO started task name.
F stcname,USERMAX=0
Figure 16-23 Command to set TSO to user max of zero (0)
Figure 16-24 shows the output after the command was issued to system #@$2.
RO #@$2,F TSO,USERMAX=0
IKT033I TCAS USERMAX VALUE SET TO 0
IKT008I TCAS NOT ACCEPTING LOGONS
Figure 16-24 TSO usermax set to zero (0) on system #@$2
This causes the TSO/VTAM on the image (#@$2) to deregister and to stop accepting new
TSO logons.
Figure 16-25 shows the output of the D NET,SESSIONS command prior to TSO on system
#@$2 being removed from the generic resource group.
D NET,SESSIONS,LU1=TSO$$
IST097I DISPLAY ACCEPTED
IST350I DISPLAY TYPE = SESSIONS
IST1364I TSO$$ IS A GENERIC RESOURCE NAME FOR:
IST988I SC$2TS   SC$3TS   SC$1TS 1
IST924I ------------------------------------------
IST172I NO SESSIONS EXIST
IST314I END
Figure 16-25 D NET,SESSIONS prior to removing TSO on system #@$2
1 TSO on z/OS systems #@$1, #@$2, and #@$3 are all connected to the TSO generic
resource group.
Figure 16-26 shows the output of the D NET,SESSIONS command after we removed TSO on
system #@$2 from using the TSO generic resource group.
D NET,SESSIONS,LU1=TSO$$
IST097I DISPLAY ACCEPTED
IST350I DISPLAY TYPE = SESSIONS
IST1364I TSO$$ IS A GENERIC RESOURCE NAME FOR:
IST988I SC$3TS   SC$1TS 1
IST924I --------------------------------------------
IST172I NO SESSIONS EXIST
IST314I END
Figure 16-26 D NET,SESSIONS after removing TSO on system #@$2
1 TSO on z/OS systems #@$1 and #@$3 are the only ones connected to the TSO generic
resource group.
After you deregister TSO from the generic resource group, you can issue the F
stcname,USERMAX=nn command where nn is greater than zero (0), or you can restart the TSO
started procedure, to register it to the generic resource group again.
In Figure 16-27, we issue MVS commands to reset the 1 TSO usermax value to 30 on z/OS
system #@$2, and then 2 display the usermax setting to verify that the 3 value has been set
correctly.
RO #@$2,F TSO,USERMAX=30 1
RO #@$2,D TS,L 2
IEE114I 20.55.29 2007.185 ACTIVITY 512
 JOBS     M/S    TS USERS    SYSAS    INITS   ACTIVE/MAX VTAM     OAS
00001    00040    00001      00033    00016    00001/00030 3     00011
 HAIN     OWT
Figure 16-27 Resetting the TSO usermax value on system #@$2
We issued the D NET,SESSIONS command shown in Figure 16-28 to verify that TSO on
system #@$2 has successfully been added back to the generic resource group. Notice that
1 TSO (SC$2TS) has been added to the generic resource group.
D NET,SESSIONS,LU1=TSO$$
IST097I DISPLAY ACCEPTED
IST350I DISPLAY TYPE = SESSIONS
IST1364I TSO$$ IS A GENERIC RESOURCE NAME FOR:
IST988I SC$2TS 1  SC$3TS   SC$1TS
IST924I -----------------------------------------------
IST172I NO SESSIONS EXIST
IST314I END
Figure 16-28 System #@$2 added to generic resource group
The TCP/IP started task is the engine that drives all IP-based activity on z/OS. The TCP/IP
profile data set controls the configuration of the TCP/IP environment.
Figure 16-30 is a sample of the TCP/IP started task. The DD statements PROFILE and SYSTCPD
refer to data sets that contain various configuration information that is used by TCP/IP.
//TCPIP    PROC PARMS='CTRACE(CTIEZB00)'
//*
//TCPIP    EXEC PGM=EZBTCPIP,
//         PARM='&PARMS',
//         REGION=0M,TIME=1440
//SYSPRINT DD SYSOUT=*,DCB=(RECFM=FB,LRECL=137,BLKSIZE=137)
//ALGPRINT DD SYSOUT=*,DCB=(RECFM=FB,LRECL=137,BLKSIZE=137)
//SYSOUT   DD SYSOUT=*,DCB=(RECFM=FB,LRECL=137,BLKSIZE=137)
//CEEDUMP  DD SYSOUT=*,DCB=(RECFM=FB,LRECL=137,BLKSIZE=137)
//SYSERROR DD SYSOUT=*
//PROFILE  DD DISP=SHR,DSN=SYS1.TCPPARMS(TCPPRF&SYSCLONE.)
//SYSTCPD  DD DSN=SYS1.TCPPARMS(TCPDATA),DISP=SHR
Figure 16-30 TCP/IP started task JCL
The TCP/IP profile member referred to by the PROFILE DD statement is read by TCP/IP
when it is started. If a change needs to be made to the TCP/IP configuration after it has been
started, TCP/IP can be made to reread the profile dynamically (or read a new profile
altogether) using the V TCPIP command. Additional information about the V TCPIP command
can be found in z/OS Communications Server: IP System Administration Commands,
SC31-8781.
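For example, to make TCP/IP reread a changed profile dynamically, you might issue a
command of this general form (the data set and member names here are illustrative):
V TCPIP,,OBEYFILE,DSN=SYS1.TCPPARMS(TCPOBEY)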
D TCPIP,,HELP
EZZ0371I D...(NETSTAT|TELNET|HELP|DISPLAY|VARY|OMPROUTE|
EZZ0371I SYSPLEX|STOR)
Figure 16-31 D TCPIP command
Figure 16-32 shows the command for Help on the NETSTAT command.
D TCPIP,,HELP,NETSTAT
EZZ0372I D...NETSTAT(,ACCESS|ALLCONN|ARP|BYTEINFO|CACHINFO|
EZZ0372I CONFIG|CONN|DEVLINKS|HOME|IDS|ND|PORTLIST|ROUTE|
EZZ0372I SOCKETS|SRCIP|STATS|TTLS|VCRT|VDPT|VIPADCFG|VIPADYN)
Figure 16-32 Help on NETSTAT command
Figure 16-33 shows the command for Help on the TELNET command.
D TCPIP,,HELP,TELNET
EZZ0373I D...TELNET(,CLIENTID|CONNECTION|OBJECT|
EZZ0373I INACTLUS|PROFILE|WHEREUSED|WLM)
EZZ0373I V...TELNET(,ACT|INACT|QUIESCE|RESUME|STOP)
Figure 16-33 Help on TELNET command
Figure 16-34 on page 339 shows the command for Help on the VARY command.
338
D TCPIP,,HELP,VARY
EZZ0358I V...(,DATTRACE|DROP|OBEYFILE|OSAENTA|PKTTRACE|
EZZ0358I PURGECACHE|START|STOP|SYSPLEX|TELNET)
Figure 16-34 Help on VARY command
Figure 16-35 shows the command for Help on the OMPROUTE command.
D TCPIP,,HELP,OMPROUTE
EZZ0626I D...OMPROUTE(,GENERIC|GENERIC6|IPV6OSPF|IPV6RIP|
EZZ0626I OSPF|RIP|RTTABLE|RT6TABLE)
Figure 16-35 Help on OMPROUTE command
Figure 16-36 shows the command for Help on the SYSPLEX command.
D TCPIP,,HELP,SYSPLEX
EZZ0637I D...SYSPLEX,(GROUP|VIPADYN)
EZZ0637I V...SYSPLEX,(LEAVEGROUP|JOINGROUP|DEACTIVATE|REACTIVATE
EZZ0637I |QUIESCE|RESUME)
Figure 16-36 Help on SYSPLEX command
Figure 16-37 shows the command for Help on the STOR command.
D TCPIP,,HELP,STOR
EZZ0654I D...STOR<,MODULE=XMODID>
Figure 16-37 Help on STOR command
Note: Although these commands are display only, some of the options returned have the
potential to impact your TCP/IP configuration. If you are unsure about the outcome, consult
your support staff.
In Figure 16-40, the load balancer is configured with a list of systems and applications that it
will balance. The load balancer tells the Load Balancing Advisor about the applications by
specifying an IP address, port, and protocol, or about the systems by specifying an IP
address. Note the following:
The advisor is configured with a list of authorized load balancers and a list of load
balancing agents with which it can gather data, and with a poll interval at which the agents
update the advisor's data.
Each agent gathers data on its own z/OS system about the TCP/IP stacks and
applications running on that system. The agent is configured with the information it needs
to contact the advisor.
The advisor consolidates the data from all its agents, and returns the data to the load
balancer to advise the load balancer about the status of the systems and applications.
D TCPIP,,NETSTAT,ALLCONN
EZZ2500I NETSTAT CS V1R8 TCPIP 304
USER ID  CONN     LOCAL SOCKET     FOREIGN SOCKET  STATE
#@$CWE3A 00000033 0.0.0.0..4445    0.0.0.0..0      LISTEN
BPXOINIT 00000012 0.0.0.0..10007   0.0.0.0..0      LISTEN
D#$3DIST 0000008B 0.0.0.0..33366   0.0.0.0..0      LISTEN
D#$3DIST 0000008C 0.0.0.0..33367   0.0.0.0..0      LISTEN
FTPD1    00000011 0.0.0.0..21      0.0.0.0..0      LISTEN
I#$3CON  000000AC 0.0.0.0..7302 1  0.0.0.0..0      LISTEN
I#$3CON  000000AB 0.0.0.0..7301 2  0.0.0.0..0      LISTEN
I#$3CON  000000AD 0.0.0.0..7303 3  0.0.0.0..0      LISTEN
1, 2, and 3 display ports 7301, 7302, and 7303 as being used by IMS Connect.
Refer to Chapter 19, IMS operational considerations in a Parallel Sysplex on page 397, for
more information about IMS.
Chapter 17. CICS operational considerations in a Parallel Sysplex
to talk to each other, overriding the access method specified on the connection resource
definition.
Figure 17-2 shows a terminal connected to one CICS system running with a user transaction
in another CICS system. Communication between the terminal and the user transaction is
handled by a CICS-supplied transaction called the relay transaction.
The CICS system that owns the terminal is called the terminal-owning region (TOR). The
CICS system that owns the transaction is called the application-owning region (AOR). These
terms are not meant to imply that one system owns all the terminals and the other system all
the transactions, although this is a possible configuration.
The terminal-owning region and the application-owning region must be connected by MRO or
APPC links.
For CICS logging, you can use Coupling Facility-based logstreams, DASD only logstreams,
or a combination of both. Remember that all connections to DASD only logstreams must
come from the same z/OS image, which means you cannot use a DASD only logstream for a
user journal that is accessed by CICS regions executing on different z/OS images. For
CF-based logstreams, only place logstreams with similar characteristics (such as frequency
and size of data written to the logstream) in the same structure.
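As a sketch of how a CF-based CICS logstream and its structure might be defined with the
IXCMIAPU utility (the structure name follows the CIC_DFHLOG_001 example used later in this
chapter; the logstream name, sizes, and high-level qualifier are illustrative):
//DEFLOGS  EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(LOGR) REPORT(NO)
  DEFINE STRUCTURE NAME(CIC_DFHLOG_001)
         LOGSNUM(10)
         AVGBUFSIZE(500)
         MAXBUFSIZE(64000)
  DEFINE LOGSTREAM NAME(#@$C1A2A.DFHLOG)
         STRUCTNAME(CIC_DFHLOG_001)
         LS_SIZE(1024)
         HLQ(IXGLOGR)
/*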
17.4.1 DFHLOG
The DFHLOG logstream is the primary CICS log, often referred to as the CICS System Log.
DFHLOG contains transient data relating to an in-progress unit of work. The data contained
within the logstream is used for dynamic transaction backout (or backward recovery) and
emergency restart. CICS access to data in DFHLOG is provided by system logger.
When CICS is active, it writes information about its transactions to DFHLOG. Periodically
CICS tells the system logger to delete the DFHLOG records related to transactions that have
completed. If the log structure has been defined with enough space, it is unusual for data from
the DFHLOG logstream to be offloaded to a logger offload data set.
DFHLOG logstreams are used exclusively by a single CICS region. You have one DFHLOG
logstream per CICS region.
The log streams are accessible from any system that has connectivity to the CF containing
the logger structure that references the logstreams. If a z/OS system fails, it is possible to
restart the affected CICS regions on another z/OS system, which would still be able to access
their DFHLOG data in the CF.
DFHLOG is required for the integrity of CICS transactions. Failed transactions cannot be
backed out if no backout information is available in DFHLOG. CICS will stop working if it
cannot access the data in the DFHLOG logstream.
17.4.2 DFHSHUNT
The DFHSHUNT logstream is the secondary CICS log, which is also referred to as the CICS
SHUNT Log. It contains ongoing data relating to incomplete units of work; the data is used for
resynchronizing in-doubt UOWs (even on a cold start). Information or data about long-running
transactions is moved (or shunted) from DFHLOG to DFHSHUNT.
The status of a UOW defines whether or not it is removed (shunted) from DFHLOG to the
secondary system log, DFHSHUNT. If the status of a unit of work that has failed is at one of
the following points, it will be shunted from DFHLOG to DFHSHUNT pending recovery from
the failure:
While in doubt during a two-phase commit process.
While attempting to commit changes to resources at the end of the UOW.
While attempting to back out the UOW.
When the failure that caused the data to be shunted is fixed, the shunted UOW is
resolved. This means the data is no longer needed and is discarded.
DFHSHUNT logstreams are used exclusively by a single CICS region; you have one
DFHSHUNT logstream for EACH CICS region.
17.4.3 USRJRNL
The USERJRNL logstream contains recovery data for user journals where block writes are
not forced. A block write is several writes (each being a block) to a logstream that may get
grouped together and written as a group rather than being written immediately (block write
forced). The USERJRNL structure is optional and was designed primarily to be customized
and used by customers to manipulate their own data for other purposes.
17.4.4 General
The GENERAL logstream is another, more basic log that contains recovery data for forward
recovery, auto-journaling, and user journals.
Check the structure's size, location, and connectors by issuing the z/OS command:
D XCF,STR,STRNAME=DFHXQLS_*
DFHXQ0304I STOP command is waiting for connections to be closed.
           Number of active connections = 1. 1
DFHXQ0351I Connection: Job #@$C1A2A Appl #@$C1A2A Idle 00:00:04 2
DFHXQ0352I Queue pool #@$STOR1 total active connections: 1.
DFHXQ0303I DISPLAY command has been processed.
DFHXQ0307I CANCEL command has been processed. Number of active connections = 1. 3
DFHXQ0111I Shared TS queue server for pool #@$STOR1 is terminating.
AXMSC0061I Server DFHXQ.#@$STOR1 is now disabled for connections.
DFHXQ0461I Disconnected from CF structure DFHXQLS_#@$STOR1.
DFHXQ0112I Shared TS queue server has terminated, return code 8, reason code 307.
Recovery would have been seamless if System-Managed Duplexing had been used. See 7.4,
Structure duplexing on page 112 for more details about that topic.
Data table pools are CF list structures that are defined in the CFRM policy. There must be a
definition statement in the CFRM policy for each list structure. These statements define
parameters such as maximum size of structure and initial size and preference list for the CFs.
Pool names are defined in the CFDT server parameters and must be in the form
DFHCFLS_poolname, where poolname is the user-given pool name.
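A minimal sketch of such a CFRM structure definition, using the pool name #@$CFDT1 that
appears later in this chapter (the sizes and preference list order are illustrative):
STRUCTURE NAME(DFHCFLS_#@$CFDT1)
          SIZE(8192)
          INITSIZE(4096)
          PREFLIST(FACIL01,FACIL02)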
Each z/OS image must have a CFDT server for each CFDT pool. A CFDT is created
automatically when the first file that names it is opened.
Typical uses might include sharing scratchpad data between CICS regions across a sysplex,
or sharing of files for which changes do not have to be permanently saved. Coupling Facility
data tables are particularly useful for grouping data into different tables, where the items can
be identified and retrieved by their keys. You could use a record in a Coupling Facility data
table to:
Maintain the next free order number for use by an order processing application
Look up tables of telephone numbers
Store data extracted from a larger file or database for further processing
5. Check the structure's size, location, and connectors by issuing the z/OS command:
D XCF,STR,STRNAME=DFHCFLS_*
The server is then restarted, where it will attempt to connect to the original structure. If this should fail, it will
allocate a new structure in an alternate CF.
DFHCF0424 Connectivity has been lost to CF structure 445
DFHCFLS_#@$CFDT1. The CF data table server cannot continue.
DFHCF0307I CANCEL RESTART=YES command has been processed. Number of
active connections = 0.
DFHCF0111I CF data table server for pool #@$CFDT1 is terminating.
AXMSC0061I Server DFHCF.#@$CFDT1 is now disabled for connections.
Figure 17-16 Loss of CF connectivity to the CFDT structure
Recovery would have been seamless if System-Managed Duplexing had been used. See 7.4,
Structure duplexing on page 112 for more details about that topic.
1. Check the structure's size, location, and connectors by issuing the z/OS command:
D XCF,STR,STRNAME=DFHNCLS_*
Recovery would have been seamless if System-Managed Duplexing had been used. See 7.4,
Structure duplexing on page 112 for more details about that topic.
ISPF interface
An ISPF interface is available to carry out operational and administrative tasks.
Workload Manager
The Workload Manager (WLM) component of CPSM provides for dynamic workload
balancing. WLM routes transactions to regions based upon predefined performance criteria. If
one region reaches a performance threshold, either through volume of work or because of
some problem, WLM stops routing work to it until the workload has reduced. WLM, therefore,
ensures optimum capacity usage and throughput, and guards against any system in its
cluster becoming a single point of failure.
Note that with WLM, work is not balanced in a round-robin fashion. WLM selects the system
most likely to meet specified criteria by using either the QUEUE algorithm or GOAL algorithm.
QUEUE algorithm
The QUEUE algorithm uses the following selection criteria:
Selects the system with shortest queue of work relative to system MAXTASKS
The system least likely to be affected by Short on Storage, SYSDUMP, and TRANDUMP
conditions
The system least likely to cause the transaction to abend
Standardizes response times across a CICSPlex
Accommodates differences in processor power and MAXTASK values, asymmetric region
configuration and unpredictable workloads.
GOAL algorithm
The GOAL algorithm uses the following selection criteria:
Selects system least likely to be affected by SOS, SYSDUMP, and TRANDUMP
conditions
The system least likely to cause the transaction to abend
Most likely to meet average MVS WLM response time goals
Chapter 18. DB2 operational considerations in a Parallel Sysplex
Group attachment
Each DB2 member in a data sharing group must have a unique subsystem name. To facilitate
this, the Group Attachment Name was created. This is a common name that can be used by
batch jobs, utilities, IMS BMPs, and CICS TS to connect to any DB2 subsystem within the
data sharing group. The Group Attachment Name is specified in the IEFSSNxx member of
PARMLIB, or created dynamically via the SETSSI command.
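As a sketch for the data sharing group used in this chapter (group D#$# with members D#$1,
D#$2, and D#$3, command prefixes -D#$1 and so on, and scope S), the IEFSSNxx entries
might look like the following; the group attachment name is the last INITPARM positional
parameter:
SUBSYS SUBNAME(D#$1) INITRTN(DSN3INI) INITPARM('DSN3EPX,-D#$1,S,D#$#')
SUBSYS SUBNAME(D#$2) INITRTN(DSN3INI) INITPARM('DSN3EPX,-D#$2,S,D#$#')
SUBSYS SUBNAME(D#$3) INITRTN(DSN3INI) INITPARM('DSN3EPX,-D#$3,S,D#$#')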
Lock structure
There is one lock structure per data sharing group. This is used by IRLM to serialize the
resources used by the associated data sharing group. The naming convention for the lock
structure is DB2-data-sharing-groupname_LOCK1.
List structure
There is one list structure per data sharing group used as a Shared Communication Area
(SCA) for the members of the group. The SCA contains all database exception status
conditions, copies of the Boot Strap Data Sets (BSDS), and other information. The naming
convention for the SCA structure is DB2-data-sharing-groupname_SCA.
Cache structure
Group Buffer Pools (GBPs) are used to cache data in the CF and to maintain the consistency
of data across the buffer pools of members of the group by using a cross-invalidating
mechanism. Cross Invalidation is used to notify a member when its local buffer pool contains
an out-of-date copy of the data. The next time the DB2 member tries to use that data, it must
get the current data from either the GBP or DASD.
One GBP is used for all local buffer pools of the same name in the DB2 group that contain
shared data. For example, each DB2 must have a local buffer pool named BP0 to contain the
catalog and directory table spaces. Therefore, you must define a GBP0 in a CF that maps to
local buffer pool BP0.
A disposition of DELETE implies that the structure will be deallocated when all the connectors
are gone.
Assumptions
In this chapter the following environment is used for our examples:
DB2 Version 8.1
We will assume a DB2-data-sharing-groupname of D#$#
3 x DB2 Subsystem names of D#$1, D#$2, and D#$3 in our DB2 data sharing group
2 x Coupling Facilities, FACIL01 and FACIL02
Determine if any GBP structures are allocated using the z/OS XCF command:
D XCF,STR
IXC359I 20.35.20 DISPLAY XCF 418
STRNAME        ALLOCATION TIME      STATUS          TYPE
D#$#_GBP0      --                   NOT ALLOCATED
D#$#_GBP1      --                   NOT ALLOCATED
D#$#_GBP32K    --                   NOT ALLOCATED
D#$#_GBP32K1   --                   NOT ALLOCATED
D#$#_LOCK1     06/20/2007 03:32:15  ALLOCATED       LOCK 1
D#$#_SCA       06/20/2007 03:32:10  ALLOCATED       LIST 2
This option would normally be used when you have finished testing duplexing and have
decided that you want to use it on a permanent basis.
When considering duplexing, note the following to avoid confusion:
The primary structure is referred to as the OLD structure in many of the displays and the
more recently allocated secondary structure is called the NEW structure.
When moving a structure to another Coupling Facility or implementing a duplex structure,
it is important to ensure that enough storage capacity is available to create every other
structure that could be allocated in that CF. The z/OS system programmer is responsible
for ensuring that there is sufficient storage capacity available in the Coupling Facility to
allow every structure that needs to be allocated in a failure scenario.
Be aware that if a structure is manually moved or duplexed, this has to be included in the
calculation. Otherwise, another structure allocation may fail or a duplex recovery may fail
due to CF storage not being available.
D XCF,STR,STRNM=D#$#_GBP1
IXC360I 15.13.14 DISPLAY XCF 022
STRNAME: D#$#_GBP1
STATUS: ALLOCATED 1
TYPE: CACHE
POLICY INFORMATION:
POLICY SIZE : 8192 K
POLICY INITSIZE: 4096 K
POLICY MINSIZE : 0 K
FULLTHRESHOLD : 80
ALLOWAUTOALT : NO
REBUILD PERCENT: N/A
DUPLEX : ALLOWED 2
ALLOWREALLOCATE: YES
PREFERENCE LIST: FACIL02 FACIL01 3
ENFORCEORDER : NO
EXCLUSION LIST IS EMPTY
ACTIVE STRUCTURE
----------------
ALLOCATION TIME: 06/29/2007 01:07:41 4
CFNAME : FACIL02 5
COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC2
PARTITION: 00 CPCID: 00
ACTUAL SIZE : 4096 K 6
STORAGE INCREMENT SIZE: 512 K
ENTRIES: IN-USE: 18 TOTAL: 2534, 0% FULL
ELEMENTS: IN-USE: 18 TOTAL: 506, 3% FULL
PHYSICAL VERSION: C0932A63 30E5DD94
LOGICAL VERSION: C0932A63 30E5DD94
SYSTEM-MANAGED PROCESS LEVEL: 14
DISPOSITION : DELETE 7
ACCESS TIME : 0
MAX CONNECTIONS: 32
# CONNECTIONS : 3 8
CONNECTION NAME ID VERSION SYSNAME JOBNAME ASID STATE
---------------- -- -------- -------- -------- ---- ------
DB2_D#$1         02 00020021 #@$1     D#$1DBM1 0044 ACTIVE 8
DB2_D#$2 01 0001001D #@$2 D#$2DBM1 003A ACTIVE
DB2_D#$3 03 00030019 #@$3 D#$3DBM1 003C ACTIVE
Figure 18-4 Output of the XCF view of the GBP structure
Figure 18-5 D CF display showing CF storage: 511488 K free of 723456 K total
From the output in Figure 18-6, note that the new structure has been allocated in 1 CF
FACIL01 while the OLD structure still exists in 2 CF FACIL02. During the duplexing process,
changed pages are copied from the primary to the secondary structure. The message 3
DSNB332I shows that nothing in the primary structure has been copied to the secondary
structure. This could be because the duplexing process was started before the jobs got a
chance to write any data into the GBP.
You may find that one of the systems will copy all the pages while the other DB2 group
members do not copy anything. This is because the castout owner is the one that is
responsible for copying pages for the page sets that it owns from the primary structure to the
secondary one.
The castout owner is generally the DB2 subsystem that first updates a given page set or
partition. In this instance, DB2 must cast out any data that will not fit in the secondary
structure. If this happens, you will get a non-zero value in the PAGES CAST OUT FROM
ORIGINAL STRUCTURE field in the DSNB332I message, and this should be treated as an
indicator of a possible problem.
If the secondary structure is smaller than the primary one, DB2 will treat both structures as
though they were the same size as the smaller of the two, resulting in wasted CF storage and
degraded performance.
Figure 18-7 Output of the XCF view of the GBP structure after user-initiated duplexing
Figure 18-7 displays the status after the user-initiated duplexing of the GBP structure:
The information in the STATUS field tells you that a rebuild has been initiated. It also
indicates that the rebuild was a duplexing rebuild 1 and that the duplex pair has been
established 2.
The structure INITSIZE as defined in the CFRM policy 3. This value is used when
allocating the secondary structure. If the size of the primary structure was altered prior to
the start of the duplexing, then the primary structure will be a different size than the
secondary structure. Check to see that the POLICY INITSIZE is the same as the ACTUAL
SIZE for each of the structure instances 8. If it is not, then inform your DB2 systems
programmer as soon as possible.
The DUPLEX option defined in the CFRM policy is ALLOWED 4. This means that
duplexing must be started by the operator instead of the system starting it automatically,
as would happen if DUPLEX was set to ENABLED.
The information relating to the secondary (NEW) structure 5 indicates that, because this is
a duplexed structure, the primary (OLD) structure will not be deleted. For the secondary
structure, notice that the ALLOCATION TIME 6 is later than the ALLOCATION TIME 9 for
the primary structure.
The CF that this structure is allocated in 7 must be a different CF from the one containing
the primary structure 10.
The same information is provided for the primary (OLD) structure 11.
Looking at the list of connections, notice that three lines are still provided, one for each of
the DB2 subsystems, with new information in the STATE field 12. For a simplex structure,
this usually says ACTIVE or possibly FAILED-PERSISTENT. Now, however, it displays
the status ACTIVE and tells which structure instance each DB2 has a connection to. In our
example, all three DB2s have an ACTIVE connection to both the OLD and NEW structure
instances.
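As a sketch, the CFRM policy statement behind the display in Figure 18-7 might look like the
following (the keyword values are taken from the display; the exact policy coding is your
system programmer's responsibility). With DUPLEX(ALLOWED), an operator starts duplexing
manually with the command SETXCF START,REBUILD,DUPLEX,STRNAME=D#$#_GBP1.
STRUCTURE NAME(D#$#_GBP1)
          SIZE(8192)
          INITSIZE(4096)
          DUPLEX(ALLOWED)
          PREFLIST(FACIL02,FACIL01)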
Figure 18-9 shows the structure D#$#_GBP1 2 is in both FACIL01 and FACIL02. The process
would be exactly the same for any of the other GBP structures. Note that the OLD structure is
the Primary and the NEW structure is the secondary. Updates are made to both OLD and
NEW structures, but reads are only done from the OLD (primary) structure.
1 identifies systems connected to the Coupling Facility.
The following example demonstrates how to keep the primary GBP as the surviving one. The
process is almost the same regardless of which instance you are keeping.
Stopping duplexing
Issue the command to stop duplexing, specifying that you want to keep the primary (OLD)
instance of the structure:
SETXCF STOP,REBUILD,DUPLEX,STRNAME=D#$#_GBP1,KEEP=OLD
Specifying KEEP=OLD tells XCF that you want to stop duplexing and retain the primary
structure. To continue using the secondary structure, and delete the primary, you would
specify KEEP=NEW. You may want to do this if you had to remove the CF containing the
primary structure, but be aware of the performance impact.
Figure 18-10 on page 382 displays the messages issued while duplexing is being stopped.
SETXCF STOP,REBUILD,DUPLEX,STRNAME=D#$#_GBP1,KEEP=OLD
IXC522I REBUILD FOR STRUCTURE D#$#_GBP1
IS BEING STOPPED TO FALL BACK TO THE OLD STRUCTURE DUE TO
REQUEST FROM AN OPERATOR 1
IXC367I THE SETXCF STOP REBUILD REQUEST FOR STRUCTURE
D#$#_GBP1 WAS ACCEPTED.
DSNB742I -D#$2 DSNB1GBR DUPLEXING HAS BEEN
SUCCESSFULLY ESTABLISHED FOR
GROUP BUFFER POOL GBP1
DSNB743I -D#$1 DSNB1GBR DUPLEXING IS BEING STOPPED
FOR GROUP BUFFER POOL GBP1
FALLING BACK TO PRIMARY
REASON = OPERATOR 2
DB2 REASON CODE = 00000000
IXC579I NORMAL DEALLOCATION FOR STRUCTURE D#$#_GBP1 IN
COUPLING FACILITY SIMDEV.IBM.EN.0000000CFCC1
PARTITION: 0 CPCID: 00
HAS BEEN COMPLETED.
PHYSICAL STRUCTURE VERSION: B48FCF37 01F2E844
INFO116: 13089180 01 2800 00000004
TRACE THREAD: 00001A1B.
DSNB743I -D#$2 DSNB1GBR DUPLEXING IS BEING STOPPED
FOR GROUP BUFFER POOL GBP1
FALLING BACK TO PRIMARY
REASON = OPERATOR
DB2 REASON CODE = 00000000
IXC521I REBUILD FOR STRUCTURE D#$#_GBP1
HAS BEEN STOPPED
DSNB745I -D#$1 DSNB1GBR THE TRANSITION BACK TO
SIMPLEX MODE HAS COMPLETED FOR
GROUP BUFFER POOL GBP1 3
Figure 18-10 Stopping structure duplexing
Two messages confirm that the secondary GBP structure is being deallocated from CF
FACIL01 and that duplexing is stopped.
The first message, IXC522I, is from XCF; it states that the rebuild has stopped and is falling
back to the OLD structure 1, due to the operator-requested change.
Message DSNB743I is from DB2 (you will get one of these from every DB2 in the data
sharing group). It advises that it is falling back to the primary structure, and that the change
was requested by an operator 2.
After the rebuild has stopped, you receive message IXC579I indicating that the structure has
been deleted from FACIL01. The last message, DSNB745I, indicates that the processing
related to switching back to simplex mode has completed 3.
Check for successful completion by confirming that GBP duplexing has stopped:
D XCF,STR,STRNAME=D#$#_GBP1
The following should be true:
The GBP structure should only be located in one Coupling Facility.
In SIMPLEX mode, there is only one structure, which should have a state of ACTIVE. This
was the primary structure previously called the OLD structure when duplexed.
1 POLICY SIZE is the largest size that the structure can be increased to, without updating the
CFRM policy.
2 POLICY INITSIZE is the initial size of the structure, as defined in the CFRM policy.
3 ACTUAL SIZE is the size of the structure at this time.
4 CFNAME: FACIL02 details where the structure currently resides.
2. Check that there is sufficient free space in the current CF.
D CF,CFNAME=FACIL02
The field FREE SPACE: displays the amount of space available.
3. Extend the structure size with the ALTER command.
SETXCF START,ALTER,STRNM=D#$#_GBP1,SIZE=8192
SETXCF START,ALTER,STRNAME=D#$#_GBP1,SIZE=8192
IXC530I SETXCF START ALTER REQUEST FOR STRUCTURE D#$#_GBP1 ACCEPTED.
IXC533I SETXCF REQUEST TO ALTER STRUCTURE D#$#_GBP1 064
COMPLETED. TARGET ATTAINED.
CURRENT SIZE: 8192 K TARGET: 8192 K
IXC534I SETXCF REQUEST TO ALTER STRUCTURE D#$#_GBP1 065
COMPLETED. TARGET ATTAINED.
CURRENT SIZE: 8192 K TARGET: 8192 K
CURRENT ENTRY COUNT: 6142 TARGET: 6142
CURRENT ELEMENT COUNT: 1228 TARGET: 1228
CURRENT EMC COUNT: 0 TARGET: 0
Figure 18-13 Output from SETXCF ALTER
The DSNB758I message shows that the new size has been allocated.
To make these changes permanent, modify the structure size definition in the CFRM policy
and then rebuild the structure.
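A sketch of that sequence, assuming the system programmer has formatted and is ready to
activate an updated CFRM policy (the policy name is illustrative):
SETXCF START,POLICY,TYPE=CFRM,POLNAME=POLICY1
SETXCF START,REBUILD,STRNAME=D#$#_GBP1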
Note that you cannot rebuild a duplexed structure. A duplexed structure already has a copy in
both CFs. If you want to free up one CF, you would revert to simplex mode, deleting the
structure that is in the CF that you wish to free up.
If you have a number of duplexed structures in a CF that you want to empty out, you can
issue the command SETXCF STOP,REBUILD,DUPLEX,CFNAME=Target CF to revert all the
structures to simplex mode, and to delete whichever structure instance (primary or
secondary) might be in the named CF.
To move a GBP structure to another CF using REBUILD, follow these steps.
1. Check the current GBP1 structure size, location, and connectors and the preference list:
D XCF,STR,STRNAME=D#$#_GBP1
2. Check that enough free space is available in the new location:
D CF,CFNAME=Target CF name
3. The structure must be allocated in the alternate CF; this will be the next CF in the
preference list. Perform the rebuild:
SETXCF START,RB,STRNM=D#$#_GBP1,LOC=OTHER
4. All structure data is copied from the old structure to the new structure.
5. All the connections are moved from the original structure to the new structure.
6. Activity is resumed.
7. The original structure is deleted.
8. Check the current GBP1 structure size, location and connectors:
D XCF,STR,STRNAME=D#$#_GBP1
The GBP structure should now be allocated in the target Coupling Facility and all DB2
systems should still have ACTIVE connections to the structure.
This information is available to all members of the data sharing group. Other DB2s have the
ability to recover information from a failure during a group-wide restart if one of the DB2s is
not restarted. This is known as a Peer Restart of the Current Status Rebuild phase of DB2
restart.
The DB2 member performing the Peer Restart needs to know the log information about the
non-starting member. It finds this out by reading the SCA structure. The first connector to the
structure is responsible for building the structure if it does not exist.
The SCA structure supports REBUILDPERCENT and automatic rebuild, if the SFM policy is
defined with CONNFAIL(YES) specified. Manual rebuild is supported using the SETXCF
START,REBUILD command without stopping the data sharing group members.
Note: We do not recommend deleting the SCA structure (even though it is possible to do
so). If there is a catastrophic failure and all DB2s in the data sharing group come down
without resource cleanup, then the recovery information needed to perform a peer
recovery is still available. When a DB2 restarts, it will have the names of the logs of the
other DB2s, what the current location in the logs are, where the oldest unit of recovery is,
and other recovery information in the SCA.
To remove the structure from a Coupling Facility to perform maintenance on it, use the SETXCF
START,REBUILD command to move it to the other CF. Although not recommended, it is
possible to remove the SCA structure using the following command, where groupname is the
DB2 data sharing group name:
SETXCF FORCE,STR,STRNAME=groupname_SCA
Structure full monitoring produces highlighted warning messages on the MCS console when
the structure reaches its threshold. The default threshold is 80% full for the SCA structure.
If the SCA is too small, DB2 may crash.
Even though the DB2 system remains operational, processes that require access to the SCA
structures are queued while DB2 is rebuilding the SCA. This is the same for Lock and GBP
structures.
When you add new applications or workload to your DB2 data sharing group, you must take
into account the size of your lock structure because the number of locks may increase. If the
lock structure is too small, IRLM warns you that it is running out of space by issuing the
DXR170I message when storage begins to fill.
It is extremely important to avoid having the structure reach 100% full. This condition can
cause DB2 locks to fail and lead to severe performance degradation due to false contention.
DB2 will continue, but transactions might begin failing with resource unavailable errors and
SQL CODE -904. Use of the lock structure should be monitored by using a performance
monitor such as the RMF Structure Activity Report.
Perform the following steps to alter the size of a DB2 lock structure using the appropriate
z/OS system commands.
1. Check the structure's size and location:
D XCF,STR,STRNAME=D#$#_LOCK1
2. Check that there is sufficient free space in the current CF:
D CF,CFNAME=current CF name
3. Modify the structure size with the ALTER command:
SETXCF START,ALTER,STRNM=D#$#_LOCK1,SIZE=new size
4. Verify the results:
D XCF,STR,STRNAME=D#$#_LOCK1
Check the ACTUAL SIZE value.
A group restart is distinguished from a normal restart by the act of recovering from the logs of
all members that were lost from the lock structure. A group restart does not necessarily mean
that all DB2s in the group start up again, but information from all non-starting DB2s must be
used to rebuild the lock structure.
Note that a data sharing group member started with the Light option is not registered with the
Automatic Restart Manager (ARM). Therefore, ARM will not automatically restart a
member that has been started with LIGHT(YES).
You are able to restart any DB2 on any system in the DB2 data sharing group by using the
DB2 command START DB2,LIGHT(YES), provided the command prefix scope is specified as
S or X in the system definition statements of the IEFSSNxx member.
This same DB2 can later be restarted on the original failing system as soon as it is
available.
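For example, assuming member D#$1 failed on system #@$1 and you want to restart it in light
mode on system #@$3, you could route the command there (this follows the RO routing style
used earlier in this book):
RO #@$3,-D#$1 START DB2,LIGHT(YES)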
Routing commands
You can control operations on an individual member of a data sharing group from any z/OS
console by entering commands prefixed with the appropriate command prefix.
For example, assuming that you chose -D#$1 as the command prefix for member D#$1, you
can start a DB2 statistics trace on that member by entering this command at any z/OS
console in the Parallel Sysplex:
-D#$1 START TRACE (STAT)
Command routing requires that the command prefix scope is registered as S or X on the
IEFSSNxx PARMLIB member. You can also control operations on certain objects by using
commands or command options that affect an entire group. These can also be entered from
any z/OS console.
For example, assuming that D#$1 is active, you can start database XYZ by entering this
command at any z/OS console in the Parallel Sysplex:
-D#$1 START DATABASE (XYZ)
Command scope
The breadth of a command's impact is called the scope of that command.
Many commands that are used in a data sharing environment affect only the member for
which they are issued. For example, a STOP DB2 command stops only the member identified
by the command prefix. Such commands have member scope.
Other commands have group scope because they affect an object in such a way that all
members of the group are affected. For example, a STOP DATABASE command, issued from
any member of the group, stops that database for all members of the group.
Chapter 19. IMS operational considerations in a Parallel Sysplex
DBCTL
DBCTL is an environment where only IMS DB is implemented, and CICS is used as the only
transaction manager. In this model, the application may also access DB2 data. An example of
this type of configuration is shown in Figure 19-1 on page 399.
Figure 19-1 IMS DBCTL configuration
DCCTL
DCCTL is an environment where only IMS TM is implemented and DB2 is used as the only
database manager.
Note: The DC in DCCTL refers to Data Communications. There may still be IMS
documentation that refers to IMS/DC, but this has been replaced by IMS/TM.
An example of this type of configuration is shown in Figure 19-2 on page 400.
Figure 19-2 IMS DCCTL configuration
IMS DB/DC
IMS DB/DC is an environment where both IMS DB and IMS TM are implemented. Here, IMS
can process transactions submitted by users logged on to terminals connected to IMS, and
trigger application programs running in IMS that access IMS databases.
In this model, CICS can coexist as another transaction manager and also access the IMS
data, as per the DBCTL model. This configuration also supports access to DB2 databases by
IMS applications.
An example of this type of configuration is shown in Figure 19-3 on page 401.
Figure 19-3 IMS DB/DC configuration
In addition, there are several communication components of IMS that can be incorporated
into an IMSplex environment. They are:
VTAM Generic Resources
Rapid Network Reconnect (RNR), which comes in two varieties:
SNPS: Single Node Persistent Sessions
MNPS: Multi Node Persistent Sessions
For a brief discussion of these components, refer to 19.4.2, VTAM Generic Resources on
page 411, and 19.4.3, Rapid Network Reconnect on page 412.
Figure 19-4 IMS components (control region with the WADS, OLDS, and RECON data sets, the
DBRC and DLISAS address spaces, the MPP, IFP, JMP, BMP, and JBP dependent regions, the
message queues, and the full function and Fast Path databases)
The following sections briefly describe the various components of IMS that are shown in
Figure 19-4.
The IMS control region provides the interface to z/OS for controlling the operation of the IMS subsystem. It also controls and
dispatches the application programs running in the various dependent regions.
The control region provides all logging, restart, and recovery functions for the IMS
subsystems. The terminals, message queues, and logs are all attached to this region. If Fast
Path is used, the Fast Path database data sets are also allocated by the control region
address space.
DLISAS (IMSDLI)
The DLI Separate Address Space (DLISAS) has all the full function IMS database data sets
allocated to it, and it handles most of the data set access functions. It contains some of the
control blocks associated with database access and the database buffers used for accessing
the full function databases. Although it is not required to use the DLISAS address space, its
use is recommended. If you specify that you wish to use DLISAS, this address space is
automatically started when IMS is started.
DBRC (IMSDBRC)
The DataBase Recovery and Control (DBRC) address space contains the code for the DBRC
component of IMS. It processes all access to the DBRC recovery control data sets (RECON).
It also performs all generation of batch jobs for DBRC; for example, for archiving the online
IMS log. All IMS control regions have a corresponding DBRC address space because it is
needed, at a minimum, for managing the IMS logs. This address space is automatically
started when IMS is started.
Message queues
All messages and transactions that come into IMS are placed on the IMS message queue,
and are then scheduled to be processed by an online dependent region (for example, an
MPR). In a non-sysplex environment, the message queue is actually kept in storage buffers in
the IMS control region. In a sysplex environment, you have the option to place the message
queue in the Coupling Facility. This is known as IMS Shared Queues.
19.2.1 Terminology
This section defines terminology used later in this chapter.
IMSplex
An IMSplex is one or more IMS subsystems that work together as a unit. Typically (but not
always), these address spaces:
Share either databases or resources or message queues (or a combination of these)
Run in a z/OS Parallel Sysplex environment
Include an IMS Common Service Layer
The address spaces that can participate in the IMSplex are:
Control region address spaces
IMS manager address spaces (Operations Manager, Resource Manager, Structured Call
Interface)
IMS server address spaces (Common Queue Server (CQS))
An IMSplex allows you to manage multiple IMS systems as though they were one system (a
single-system perspective). An IMSplex can exist in a non-sysplex environment, or it can
consist of multiple IMS subsystems (in data or queue sharing groups) in a sysplex
environment.
Shared queues
IMS provides the option for multiple IMS systems in a sysplex to share a single set of
message queues. The set of systems that share the message queue is known as an IMS
Queue Sharing Group.
VSAM databases
Virtual Storage Access Method (VSAM) is used by many IMS and non-IMS applications,
and comes in two varieties:
Entry Sequenced Data Sets (ESDS) for the primary data sets
Key Sequenced Data Sets (KSDS) for index databases
These data sets are defined using the IDCAMS utility program.
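As an illustration only, a minimal IDCAMS sketch for defining an ESDS follows; the data set name and space values are hypothetical, and the actual attributes would be chosen by your DBA:
  DEFINE CLUSTER (NAME(IMS.DB01.ESDS) -  /* hypothetical data set name */
         NONINDEXED -                    /* NONINDEXED defines an ESDS */
         CYL(10 5) -
         SHAREOPTIONS(3 3))
A KSDS for an index database would be defined similarly, but with the INDEXED and KEYS parameters instead of NONINDEXED.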
OSAM databases
The Overflow Sequential Access Method (OSAM) is unique to IMS. It is delivered as part of
the IMS product. It consists of a series of channel programs that IMS executes to use the
standard operating system channel I/O interface. The data sets are defined using JCL
statements. As far as the operating system is concerned, an OSAM data set looks like a
physical sequential data set (DSORG=PS).
DEDBs
The Data Entry Database (DEDB) was designed to support particularly intensive IMS
database requirements, primarily in the banking industry, for larger databases, high
transaction workloads, improved availability, and reduced I/O.
MSDBs
The Fast Path database access method, Main Storage Database (MSDB), has functionality
that has been superseded by the Virtual Storage Option (VSO) of the DEDB, so it is not
described in this book, and you are advised not to use it.
Figure 19-5 Simple IMS 2-way local data sharing in a single z/OS system
IRLM
Internal Resource Lock Manager (IRLM) is required by IMS when running block-level data
sharing. It is used to externalize all the database locks to enable data sharing. When the IMS
subsystems are all within the same z/OS system, then IRLM maintains the database locks
within its own address space.
IRLM was originally known as the IMS Resource Lock Manager, and you may find it referred
to by this name in older publications. It is now also used by DB2.
Important: IRLM is provided and shipped along with IMS as well as with DB2, but you
cannot share IRLM between IMS and DB2. Ensure that the IRLM instance running to
support IMS is managed along with the IMS product, and that the IRLM instance running to
support DB2 is managed along with the DB2 product.
An example of this can be found in Figure 19-6, which shows a two-way IMSplex across two
systems. This can be extended to many more IMS systems across many more systems (up to
32 in total), all using the same shared RECONs, databases, and Coupling Facility structures.
Note that in this scenario, the IMS message queues are still unique to each IMS system and
not shared.
Figure 19-6 Simple two-way global IMS data sharing environment across two systems
IRLM
Internal Resource Lock Manager (IRLM) is required by IMS when running in block-level data sharing (BLDS) mode, and is still used to externalize all the database locks to enable data sharing, as explained in the preceding example.
In this case, because the IMS systems are running on different z/OS systems, each system requires an IRLM address space to be active, and IRLM uses the Coupling Facility to store and share its lock information and database buffers.
Figure 19-7 IMSplex across two systems with shared queues (CQS, SCI, OM, RM, and FDBR address spaces on each system; message queue, lock, database, and Resource structures in the Coupling Facility; shared RECON and databases)
The additional address spaces and structures not previously described are listed here:
Resource Manager
The Resource Manager (RM) address space maintains global resource information for clients, using a Resource structure in the Coupling Facility.
It can contain IMSplex global and local member information, resource names and types,
terminal and user status, global process status, and resource management services. It also
handles sysplex terminal management and global online change.
One or more RM address spaces are required per IMSplex in IMS V8. IMS V9 allows for zero
RMs in the IMSplex.
Operations Manager
The Operations Manager (OM) address space provides an API allowing a single point of command entry into the IMSplex. It is the focal point for operations management and automation; command responses from multiple IMS systems are consolidated.
One or more OM address spaces are required per IMSplex.
                                       First system     Second system    Third system
System Name                            #@$1 (System1)   #@$2 (System2)   #@$3 (System3)
IMS ID                                 I#$1 (IMS1)      I#$2 (IMS2)      I#$3 (IMS3)
IMS CTL (IMS control region)           I#$1CTL          I#$2CTL          I#$3CTL
DLI SAS (DLI Separate Address Space)   I#$1DLI          I#$2DLI          I#$3DLI
DBRC (Database Recovery & Control)     I#$1DBRC         I#$2DBRC         I#$3DBRC
CQS (Common Queue Server)              I#$1CQS          I#$2CQS          I#$3CQS
IRLM (Internal Resource Lock Manager)  I#$#IRLM         I#$#IRLM         I#$#IRLM
SCI (Structured Call Interface)        I#$#SCI          I#$#SCI          I#$#SCI
OM (Operations Manager)                I#$#OM           I#$#OM           I#$#OM
RM (Resource Manager)                  I#$#RM           I#$#RM           I#$#RM
IMS Connect (TCP/IP Communication)     I#$1CON          I#$2CON          I#$3CON
FDBR (Fast Database Recovery)          I#$3FDR          I#$1FDR          I#$2FDR
The following parameter values are also relevant to this configuration:
The name of the XCF group that will be used by OTMA.
The name of the VTAM Generic Resource group that this IMS subsystem will use.
The name of the z/OS subsystem that IRLM will use.
HWS       (ID=I#$1,RACF=N)
TCPIP     (HOSTNAME=TCPIP,PORTID=(7101,7102,7103),ECB=Y,MAXSOC=1000,
           EXIT=(HWSCSLO0,HWSCSLO1))
DATASTORE (ID=I#$1,GROUP=I#$#XCF,MEMBER=I#$1TCP1,TMEMBER=I#$1OTMA)
DATASTORE (ID=I#$2,GROUP=I#$#XCF,MEMBER=I#$1TCP2,TMEMBER=I#$2OTMA)
DATASTORE (ID=I#$3,GROUP=I#$#XCF,MEMBER=I#$1TCP3,TMEMBER=I#$3OTMA)
IMSPLEX   (MEMBER=I#$1CON,TMEMBER=I#$#)
Figure 19-8 IMS Connect configuration for an IMSplex
Other connections
There are no DB2 systems, MQ systems, or CICS DBCTL systems connected to IMS, and
APPC is not enabled in this test environment.
Structure name   Type of structure
I#$#LOCK1        LOCK structure
I#$#VSAM         CACHE structure
I#$#OSAM         CACHE structure
I#$#VSO1DB1      CACHE structure
I#$#VSO1DB2      CACHE structure
I#$#VSO2DB1      CACHE structure
I#$#VSO2DB2      CACHE structure
I#$#MSGQ         LIST structure
I#$#MSGQOFLW     LIST structure
I#$#EMHQ         LIST structure
I#$#EMHQOFLW     LIST structure
I#$#RM           LIST structure (Resource structure)
Until it has been written to DASD, the updated data is only available in the cache structure. If that structure is lost for some reason, the updates must be recovered by reading back through the IMS logs.
Resource structure
The Resource structure is a CF list structure that contains information about uniquely-named
resources that are managed by the resource manager address space. This structure takes on a greater role with the introduction of Dynamic Resource Definition in IMS Version 10.
System-managed duplexing
One option is to tell the operating system (actually the XES component of the operating
system) that you want the structure to be duplexed by the system. In this case, XES creates
two copies of the selected structures (with identical names) and ensures that all updates to
the primary structure are also applied to the secondary one.
To enable this function for a structure, you must add the DUPLEX keyword to the structure
definition in the CFRM policy. For more information about system-managed duplexing, refer
to 7.4.4, Enabling system-managed CF structure duplexing on page 115.
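As a sketch, the structure definition in the CFRM policy (maintained with the IXCMIAPU utility) might look like the following; the policy name and SIZE value are hypothetical:
DATA TYPE(CFRM) REPORT(YES)
DEFINE POLICY NAME(POLICY1) REPLACE(YES)
  STRUCTURE NAME(I#$#MSGQ)
            SIZE(32768)
            DUPLEX(ENABLED)
            PREFLIST(FACIL01,FACIL02)
(POLICY1 and SIZE(32768) are placeholders; DUPLEX(ENABLED) requests system-managed duplexing, and PREFLIST names the candidate CFs.)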
User-managed duplexing
Another option provided by XES is what is known as user-managed duplexing. However, IMS
does not support user-managed duplexing for its CF structures.
(A D XCF,STR display here listed the IMS structures, showing each with a status of ALLOCATED or NOT ALLOCATED.)
More detailed information about an individual structure can be obtained using the D XCF,STR,STRNAME=structure_name command. A subset of the response from this command is shown in Figure 19-10. For a complete description, refer to Appendix B, List of structures on page 499.
-D XCF,STR,STRNAME=I#$#RM
IXC360I 01.49.30 DISPLAY XCF 793
STRNAME: I#$#RM
 STATUS: ALLOCATED
 TYPE: SERIALIZED LIST
 POLICY INFORMATION:
 ...
 DUPLEX         : DISABLED
 ALLOWREALLOCATE: YES
 PREFERENCE LIST: FACIL02 FACIL01
 ENFORCEORDER   : NO
 EXCLUSION LIST IS EMPTY
 ...
 ENTRIES:  IN-USE:        110  TOTAL:       5465,  2% FULL
 ELEMENTS: IN-USE:         12  TOTAL:       5555,  0% FULL
 LOCKS:    TOTAL:          256
 ...
 MAX CONNECTIONS: 32
 # CONNECTIONS  : 3
 CONNECTION NAME  ID VERSION  SYSNAME  JOBNAME  ASID STATE
 ---------------- -- -------- -------- -------- ---- ------
 CQSS#$1CQS       03 0003000F #@$1     I#$1CQS  0024 ACTIVE
 CQSS#$2CQS       01 00010019 #@$2     I#$2CQS  0047 ACTIVE
 CQSS#$3CQS       02 00020010 #@$3     I#$3CQS  0043 ACTIVE
Figure 19-10 Partial response from the D XCF,STR,STRNAME=I#$#RM command
If the IRLM lock structure, or the Coupling Facility that contains it, is lost, then:
IMS batch data sharing jobs end abnormally with a U3303 abend code on the system with
the loss of connectivity. Backout is required for updaters. All the batch data sharing jobs
must be restarted later.
Although the online system continues operating, data sharing quiesces, and transactions
making lock requests are suspended until the lock structure is automatically rebuilt. Each
IRLM participating in the data sharing group is active in the automatic rebuild of the IRLM
lock structure to the alternate CF.
When the rebuilding is complete, transactions that were suspended have their lock
requests processed.
To invoke automated recovery, a second Coupling Facility is required and the CFRM policy
must specify an alternate CF in the preference list.
The target CF structure is repopulated with active locks from the IRLMs. Given that IMS and IRLM will rebuild the structure(1), using system-managed duplexing for the lock structure will not provide any additional recovery, but it may speed up the recovery.

(1) This assumes that only the lock structure was impacted by the failure. If the lock structure and one or more connected IRLMs are impacted, all IRLMs in the data sharing group will abend and need to be restarted.
The response to the command shows 1 that the structure was rebuilt to the alternate Coupling
Facility.
Another reason for rebuilding a structure might be to implement a change to the maximum
size for the structure. This is known as a rebuild-in-place because the structure is not
moving to a different CF. An example is shown in Figure 19-12.
-SETXCF START,REBUILD,STRNAME=I#$#VSAM
IXC521I REBUILD FOR STRUCTURE I#$#VSAM HAS BEEN STARTED
IXC367I THE SETXCF START REBUILD REQUEST FOR STRUCTURE
I#$#VSAM WAS ACCEPTED.
IXC526I STRUCTURE I#$#VSAM IS REBUILDING FROM
COUPLING FACILITY FACIL02 TO COUPLING FACILITY FACIL02.
 REBUILD START REASON: OPERATOR INITIATED
 INFO108: 00000003 00000003.
IXC521I REBUILD FOR STRUCTURE I#$#VSAM HAS BEEN COMPLETED
Figure 19-12 Rebuild-in-place of the I#$#VSAM structure
Most structures can be rebuilt without impacting the users of those structures. However, in the
case of a rebuild of an IMS OSAM structure, if batch DL/1 jobs using shared OSAM
databases were running at the time of the rebuild, those jobs will abend and will need to be
restarted. To avoid this, we recommend only rebuilding these structures at a time when no
IMS batch DL/1 jobs are running.
Table 19-3 ARM Element names for IMS CSL address spaces

CSL address space   ARM element name
OM                  CSL + omname + OM
RM                  CSL + rmname + RM
SCI                 CSL + sciname + SC
Given that the CSL address space names are the same on each system, there is no need to have ARM active for them, so ARMRST=N has been coded; they are already started on the other systems. As a result, the ARM policies for these address spaces do not show up in any displays. If they were active, they would be displayed by commands such as D XCF,ARMS,DETAIL,ELEMENT=CSLOM1OM.
D XCF,ARMS,DETAIL,ELEMENT=I#$1
IXC392I 03.27.19 DISPLAY XCF 347
ARM RESTARTS ARE ENABLED
-------------- ELEMENT STATE SUMMARY -------------- -TOTAL- -MAX-
 STARTING AVAILABLE FAILED RESTARTING RECOVERING
        0         1      0          0          0         1   200
RESTART GROUP:IMS          PACING :    0  FREECSA:    0    0
 ELEMENT NAME :I#$1        JOBNAME :I#$1CTL   STATE   :AVAILABLE  1
  CURR SYS :#@$1           JOBTYPE :STC       ASID    :0056
  INIT SYS :#@$1           JESGROUP:XCFJES2A  TERMTYPE:ALLTERM
  EVENTEXIT:*NONE*         ELEMTYPE:SYSIMS    LEVEL   :       1
  TOTAL RESTARTS :   0     INITIAL START:07/03/2007 01:49:56
  RESTART THRESH :  0 OF 3 FIRST RESTART:*NONE*
  RESTART TIMEOUT:  300    LAST RESTART:*NONE*
Figure 19-14 Output from the command D XCF,ARMS,DETAIL,ELEMENT=I#$1
1 Shows the element name (I#$1) and the job name (I#$1CTL) of the IMS control region.
When ARM is used for IMS, an installation can choose whether or not to use ARM for FDBR as well. If FDBR is not registered with ARM, it cannot tell ARM not to restart IMS; a failure of IMS then causes both FDBR to perform its processing and ARM to restart IMS. This can be advantageous: FDBR processing would be expected to complete before the ARM restart of IMS completes, so locks are released quickly by FDBR while IMS is still restarted automatically by ARM.
Address space   ARM element name
IRLM            I#$#IR#Innn (n is expanded by z/OS to 3 characters; thus 1 becomes 001)
SCI             CSLCSInSC
OM              CSLOMnOM
RM              CSLRMnRM
CTL             I#$n
CQS             CQSS#$nCQS
FDBR            F#$n
Note that n = 1, 2, or 3.
Because the SCI, OM, and RM address spaces are identical on each of the three systems in our example, there is no point in having them registered to ARM; the address spaces are already active on each system.
IMS SPOC is available with the Control Center for IMS, which is provided as part of the DB2 9 for Linux, UNIX, and Windows Control Center. It is available via the IMS Web page:
http://www.ibm.com/software/data/ims/imscc/
These commands can also be entered from a REXX program. Details about the sample
REXX program can be found in IMS Common Service Layer Guide and Reference V9,
SC18-7816.
Type 2 commands available with IMS Version 9 are:
DELETE - Used to delete Language Environment options previously altered with the
UPDATE command.
INITIATE - Used for online change and online reorg management across the sysplex.
QUERY - Used to query the status of IMS components (similar to the /DISPLAY command), as well as IMSplex or Coupling Facility structure status.
TERMINATE - Used for online change and online reorg management across the sysplex.
UPDATE - Used to update the status of various IMS definitions, similarly to the /ASS or /CHA
command.
Each new IMS release adds more functionality to the Type 2 command set.
Type 1 commands
These commands would not normally cover sysplex-related functions. Typical uses include:
Displaying the active regions.
Displaying connections to other subsystems (that is, DB2, MQ, OTMA, and so on).
Displaying the status of a particular resource.
Type 2 commands
These commands are designed for sysplex-related functions. Typical uses include:
Displaying the status of a particular resource.
Displaying the status of the different components of an IMSplex.
Displaying the status of the Coupling Facility structures.
CQS queries
The CQS structures can be queried by using the /CQQ IMS command. As shown in
Figure 19-16, the command displays:
1 LEALLOC - the list entries allocated
2 LEINUSE - the list entries in use
3 ELMALLOC - the elements allocated
4 ELMINUSE - the elements in use
STRUCTURE NAME
I#$#MSGQ
I#$#MSGQOFLW
I#$#EMHQ
I#$#EMHQOFLW
Figure 19-16 IMS response to the command /CQQ STATISTICS STRUCTURE ALL
CQS checkpoints
CQS Checkpoints for individual Coupling Facility list structures can be triggered by the IMS
command /CQCHKPT or /CQC. An example is shown in Figure 19-17.
DFS058I 21:15:15 CQCHKPT COMMAND IN PROGRESS I#$1
DFS1972I CQCHKPT SHAREDQ COMMAND COMPLETE FOR STRUCTURE=I#$#MSGQ
DFS1972I CQCHKPT SHAREDQ COMMAND COMPLETE FOR STRUCTURE=I#$#EMHQ
Figure 19-17 IMS response to the command /CQC SHAREDQ STRUCTURE ALL
STATUS option
The command F irlmproc,STATUS displays the status, work units in progress, and detailed lock information for each DBMS identified to this instance of IRLM (irlmproc is the procedure name for the IRLM address space). Figure 19-18 is an example showing an IMS control region and an FDR region on each system.
DXR101I IR#I001 STATUS SCOPE=GLOBAL 662
DEADLOCK: 0500
SUBSYSTEMS IDENTIFIED
NAME     T/OUT  STATUS  UNITS  HELD  WAITING  RET_LKS
FDRI#$3  0300   UP-RO       0     0        0        0
I#$1     0300   UP          1     2        0        0
DXR101I End of display
Figure 19-18 Response from the command F irlmproc,STATUS
STATUS,ALLD option
The ALLD option shows all the subsystems connected to all the IRLMs in the data sharing
group that IRLM belongs to. The RET_LKS field is very important: it shows how many database records are retained by a failing IRLM and are therefore unavailable to any other IMS subsystem. See Figure 19-19 for an example.
DXR102I IR#I001 STATUS 731
SUBSYSTEMS IDENTIFIED
NAME     STATUS  RET_LKS  IRLMID  IRLM_NAME  IRLM_LEVL
FDRI#$1  UP-RO         0  002     IR#I       1.009  1
FDRI#$2  UP-RO         0  003     IR#I       1.009
FDRI#$3  UP-RO         0  001     IR#I       1.009
I#$1     UP            0  001     IR#I       1.009  2
I#$2     UP            0  002     IR#I       1.009
I#$3     UP            0  003     IR#I       1.009
DXR102I End of display
Figure 19-19 Response from an IRLM STATUS,ALLD command
1 This shows all the FDBR regions active in read-only mode, and which IRLM each is connected to.
2 This shows the IMS systems, together with which IRLM each is connected to.
STATUS,ALLI option
The ALLI option shows the names and status of all IRLMs in the data sharing group, as
shown in Figure 19-20.
DXR103I IR#I001 STATUS 752
IRLMS PARTICIPATING IN DATA SHARING GROUP FUNCTION LEVEL=2.025
IRLM_NAME  IRLMID  STATUS  LEVEL  SERVICE  MIN_LEVEL  MIN_SERVICE
IR#I       003     UP      2.025  PK05211  1.022      PQ52360
IR#I       002     UP      2.025  PK05211  1.022      PQ52360
IR#I*      001     UP      2.025  PK05211  1.022      PQ52360
DXR103I End of display
Figure 19-20 Response from an IRLM STATUS command F I#$#IRLM,STATUS,ALLI
STATUS,MAINT option
The MAINT option lists the PTF levels of all the modules active in IRLM. The command is:
F irlmproc,STATUS,MAINT
ABEND option
This option causes IRLM to terminate abnormally. IRLM informs all DBMSs linked to it,
through their status exits, that it is about to terminate. The command is:
F irlmproc,ABEND
RECONNECT option
This option causes IMS to reconnect to the IRLM specified in the IRLMNM parameter in the
IMS control region JCL. This is necessary after an IRLM is restarted following an abnormal
termination and IMS was not taken down. The command is:
F irlmproc,RECONNECT
PURGE option
Warning: Use the PURGE option with extreme caution. It is included in this section for
completeness only.
The PURGE option causes IRLM to release any retained locks it holds for IMS. This
command must be used with care in these situations:
The RECON reflects that database backout was done, but IRLM was not up at time of the
backout.
A decision is made not to recover, or to defer recovery, but the data is required to be
available to other IMS subsystems.
The command is:
F irlmproc,PURGE,IMSname
The PURGE ALL option of this command is even more hazardous. It allows you to release all
retained locks of all IMS subsystems held by a specific IRLM.
Attention: When entering the /ere command, the period after ere is required. Otherwise,
the WTOR message DFS972A *IMS AWAITING MORE INPUT will be displayed. If this does
occur, simply reply with a period.
1 This shows that FDBR has recognized that it needs to do a recovery, and the reason why.
2 This shows the FDBR recovery has completed.
ARM will then restart IMS on the same system, which will automatically perform the
emergency restart.
The IRLM, SCI, OM, RM, and CQS address spaces are unaffected by this.
IMS Connect behaves the same as described in 19.9.1, Single IMS abend without ARM and
without FDR on page 431.
All the address spaces on system1 simply fail along with system1. FDR running on system2
produces the messages as shown in Figure 19-25, and then ARM will automatically restart
IMS on any valid system within the sysplex, based on the ARM policy.
DFS4165W FDR FOR (I#$1) XCF DETECTED TIMEOUT ON ACTIVE IMS
SYSTEM,REASON=SYSTEM,DIAGINFO=0C030384 F#$1
DFS4164W FDR FOR (I#$1) TIMEOUT DETECTED DURING LOG AND XCF SURVEILLANCE F#$1
DFS3257I ONLINE LOG CLOSED ON DFSOLP03 F#$1
DFS3257I ONLINE LOG CLOSED ON DFSOLS03 F#$1
DFS4166I FDR FOR (I#$1) DB RECOVERY PROCESS STARTED. REASON = XCF NOTIFICATION
DFS3257I ONLINE LOG NOW OPENED ON DFSOLP04 F#$1
DFS3257I ONLINE LOG NOW OPENED ON DFSOLS04 F#$1
DFS3261I WRITE AHEAD DATA SET NOW ON DFSWADS0 F#$1
DFS3261I WRITE AHEAD DATA SET NOW ON DFSWADS1 F#$1
DFS4168I FDR FOR (I#$1) DATABASE RECOVERY COMPLETED
$HASP100 ARCHI#$1 ON INTRDR  CONWAY  FROM STC09096 I#$1FDR
IRR010I  USERID I#$1 IS ASSIGNED TO THIS JOB.
DFS2484I JOBNAME=ARCHI#$1 GENERATED BY LOG AUTOMATIC ARCHIVING F#$1
DFS3257I ONLINE LOG CLOSED ON DFSOLP04 F#$1
DFS3257I ONLINE LOG CLOSED ON DFSOLS04 F#$1
$HASP100 ARCHI#$1 ON INTRDR  CONWAY  FROM STC09096 I#$1FDR
IRR010I  USERID I#$1 IS ASSIGNED TO THIS JOB.
$HASP100 ARCHI#$1 ON INTRDR  CONWAY  FROM STC09096 I#$1FDR
IRR010I  USERID I#$1 IS ASSIGNED TO THIS JOB.
DFS2484I JOBNAME=ARCHI#$1 GENERATED BY LOG AUTOMATIC ARCHIVING F#$1
DFS092I  IMS LOG TERMINATED F#$1
DFS627I  IMS RTM CLEANUP ( EOT ) COMPLETE FOR JS I#$1FDR .I#$1FDR .IEFPROC ,RC=00
Figure 19-25 FDR address space recovering IMS following a system failure
IMS Connect will behave as described in 19.9.5, Single system abend with ARM but without
FDR on page 433.
(Figure 19-26 is summarized here: the messages displayed in the IMS control region when the Coupling Facility fails included DFS3306A, DFS3705I, DFS2500I, DFS2823I, DFS2574I, DFS0488I, and DFS4450I; the full message texts are not reproduced.)
Figure 19-26 Coupling Facility failure messages in the IMS control region
As a result of this, the VSO database area DFSIVD3B, which was in use at the time, is now stopped and marked with an EEQE status. Because this is a write error, the copy of the CI in the cache structure (or structures) is deleted, and the other sharing systems no longer have access to the CI.
If any of the other IMS systems try to access this CI, they will receive messages indicating
that another system has this CI as a retained lock, as shown in Figure 19-27.
DFS3304I IRLM LOCK REQUEST REJECTED. PSB=DFSIVP8 DBD=IVPDB3 JOBNAME=IV3H212J
DFS0535A RGN=
1, HSSP CONN PROCESS ATTEMPTED AREA DFSIVD3A PCB LABEL HSSP
DFS0535I RC=03, AREA LOCK FAILED. I#$3
+STATUS FH, DLI CALL = ISRT
Figure 19-27 Messages indicating IRLM retained locks
To resolve this, issue the /VUN AREA DFSIVD3B command to take the area out of VSO, thus ensuring that all updates are reflected on DASD. Next, issue the /STA AREA DFSIVD3B command; the area is then reloaded into VSO without any errors.
IMS DLISAS
Figure 19-28 on page 436 shows the messages displayed by DLISAS when a Coupling
Facility fails.
IRLM
Figure 19-29 on page 437 shows the messages displayed by IRLM when a Coupling Facility
fails.
CQS
Figure 19-30 on page 438 shows the messages displayed by CQS when a Coupling Facility
fails.
Resource Manager
Figure 19-31 shows the messages displayed by RM when a Coupling Facility fails.
CSL2040I RM RM1RM IS QUIESCED; STRUCTURE I#$#RM IS UNAVAILABLE
Figure 19-31 RM messages when the preferred Coupling Facility fails
CQS abend
IMS itself will not complete initialization as it waits for the CQS address space, which abends
because it is unable to connect to the I#$#EMHQ and I#$#MSGQ structures, as shown in
Figure 19-32. The RM address space will probably also abend as a result. The important
piece is message CQS0350W.
Restart the RM address space, which will then also wait for CQS to restart.
(Figure 19-32 is summarized here: three CQS0350W messages - one per structure - followed by a CQS0001E abend message and a series of BPE0006I termination messages, with the sequence repeated as CQS terminates; the full message texts are not reproduced.)
Figure 19-32 CQS abend messages
-SETXCF FORCE,CONNECTION,STRNAME=I#$#EMHQ,CONNAME=ALL
IXC363I THE SETXCF FORCE FOR ALL CONNECTIONS FOR STRUCTURE
I#$#EMHQ WAS REJECTED:
FORCE CONNECTION NOT ALLOWED FOR PERSISTENT LOCK OR SERIALIZED LIST
Figure 19-33 Attempting to disconnect CQS from the CF structures using the SETXCF command
Instead, delete the structures themselves using the SETXCF FORCE,STRUCTURE command:
SETXCF FORCE,STRUCTURE,STRNAME=I#$#MSGQ
SETXCF FORCE,STRUCTURE,STRNAME=I#$#EMHQ
SETXCF FORCE,STRUCTURE,STRNAME=I#$#MSGQOFLW
SETXCF FORCE,STRUCTURE,STRNAME=I#$#EMHQOFLW
The resulting output is shown in Figure 19-35, which also shows that the connections were deleted.
-SETXCF FORCE,STRUCTURE,STRNAME=I#$#EMHQ
IXC353I THE SETXCF FORCE REQUEST FOR STRUCTURE
I#$#EMHQ WAS COMPLETED:
STRUCTURE DELETED BUT ALSO RESULTED IN DELETED CONNECTION(S)
Figure 19-35 Result of deleting a CQS structure using the SETXCF command
Scratch both structure recovery data sets (SRDS 1 and 2) for the structure.
Important: Scratching both SRDS data sets means deleting and redefining them. The term scratch in this context means to delete and redefine the VSAM ESDS used for both SRDS1 and SRDS2. The data sets involved are specified on the SRDSDSN1 and SRDSDSN2 parameters in the CQSSGxxx IMS proclib member for each structure.
To achieve this a simple IDCAMS delete/define of all the SRDS data sets is required. If you do
not have the IDCAMS define statements available, refer to CQS Structure Recovery Data
Sets in IMS Common Queue Server Guide and Reference Version 9, SC18-7815.
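If you do have them, a minimal IDCAMS sketch is all that is needed; the SRDS data set name below is hypothetical and must match the SRDSDSN1/SRDSDSN2 values in your CQSSGxxx member:
//SCRATCH  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* hypothetical SRDS data set name - repeat for each SRDS */
  DELETE IMSU.I#$#.SRDS1
  SET MAXCC = 0
  DEFINE CLUSTER (NAME(IMSU.I#$#.SRDS1) -
         NONINDEXED -
         CYL(10 10) -
         SHAREOPTIONS(3 3))
/*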
//DEL      EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSABEND DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(LOGR) REPORT(YES)
  DELETE LOGSTREAM NAME(#@$#.SQ.EMHQ.LOG)
  DELETE LOGSTREAM NAME(#@$#.SQ.MSGQ.LOG)
  DELETE STRUCTURE NAME(I#$#LOGEMHQ)
  DELETE STRUCTURE NAME(I#$#LOGMSGQ)
Figure 19-36 Sample JCL to delete the CQS log streams
Figure 19-37 shows the sample JCL to define the log streams.
//DEF      EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSABEND DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(LOGR) REPORT(YES)
  DEFINE STRUCTURE NAME(I#$#LOGEMHQ)
         LOGSNUM(1)
         MAXBUFSIZE(65272)
         AVGBUFSIZE(4096)
  DEFINE STRUCTURE NAME(I#$#LOGMSGQ)
         LOGSNUM(1)
         MAXBUFSIZE(65272)
         AVGBUFSIZE(4096)
  DEFINE LOGSTREAM NAME(#@$#.SQ.EMHQ.LOG)
         STRUCTNAME(I#$#LOGEMHQ)
         LS_DATACLAS(LOGR4K)
         HLQ(IMSU#@$#)
         MODEL(NO)
         LS_SIZE(1000)
         LOWOFFLOAD(0)
         HIGHOFFLOAD(80)
         STG_DUPLEX(NO)
         RETPD(0)
         AUTODELETE(NO)
         DASDONLY(NO)
  DEFINE LOGSTREAM NAME(#@$#.SQ.MSGQ.LOG)
         STRUCTNAME(I#$#LOGMSGQ)
         LS_DATACLAS(LOGR4K)
         HLQ(IMSU#@$#)
         MODEL(NO)
         LS_SIZE(1000)
         LOWOFFLOAD(0)
         HIGHOFFLOAD(80)
         STG_DUPLEX(NO)
         RETPD(0)
         AUTODELETE(NO)
         DASDONLY(NO)
Figure 19-37 Sample JCL to define the log streams
Figure 19-39 Messages indicating IMS batch backout has occurred for full function IMS databases
have been updated on the DASD version of the databases. Again, consult your DBA for
validation and advice.
DFS980I  3:08:35 BACKOUT PROCESSING HAS ENDED FOR DFSIVP6 I#$1
DFS2500I DATASET DFSIVD31 SUCCESSFULLY ALLOCATED I#$1
DFS2500I DATASET DFSIVD31 SUCCESSFULLY ALLOCATED I#$1
DFS2500I DATASET DFSIVD32 SUCCESSFULLY ALLOCATED I#$1
DFS2500I DATASET DFSIVD32 SUCCESSFULLY ALLOCATED I#$1
DFS2822I AREA DFSIVD3A CONNECT TO STR: I#$#VSO1DB1 SUCCESSFUL I#$1
IXL014I  IXLCONN REQUEST FOR STRUCTURE I#$#VSO1DB1 981
         WAS SUCCESSFUL. JOBNAME: I#$1CTL ASID: 0047
         CONNECTOR NAME: I#$1 CFNAME: FACIL01
IXL015I  STRUCTURE ALLOCATION INFORMATION FOR 982
         STRUCTURE I#$#VSO1DB1, CONNECTOR NAME I#$1
         CFNAME   ALLOCATION STATUS/FAILURE REASON
         -------- --------------------------------
         FACIL01  STRUCTURE ALLOCATED AC007800
         FACIL02  PREFERRED CF ALREADY SELECTED AC007800
DFS2822I AREA DFSIVD3A CONNECT TO STR: I#$#VSO1DB1 SUCCESSFUL I#$1
DFS2823I AREA DFSIVD3A DISCONNECT FROM STR: I#$#VSO1DB1 SUCCESSFUL I#$1
DFS2823I AREA DFSIVD3A DISCONNECT FROM STR: I#$#VSO1DB1 SUCCESSFUL I#$1
....
Abending IRLM
The messages that the abending IRLM will display as it shuts down are shown in
Figure 19-42. The statistics have been suppressed due to their size, but they are basically a
hex dump of the lock structure at the time IRLM abended.
DXR122E IR#I001 ABEND UNDER IRLM TCB/SRB IN MODULE DXRRL020 ABEND CODE=Sxxx
IXL030I CONNECTOR STATISTICS FOR LOCK STRUCTURE I#$#LOCK1, 579
        CONNECTOR I#$#$$$$$IR#I001:
        ...(statistics suppressed)
IXL031I CONNECTOR CLEANUP FOR LOCK STRUCTURE I#$#LOCK1, 580
        CONNECTOR I#$#$$$$$IR#I001, HAS COMPLETED.
        INFO: 00010032 00000000 00000000 00000000 00000000 00000004
DXR121I IR#I001 END-OF-TASK CLEANUP SUCCESSFUL - HI-CSA 457K - HI-ACCT-CSA
Figure 19-42 Messages of interest from abending IRLM
Other IRLM
The other IRLM address spaces in the IRLM group will receive similar error messages, as
shown in Figure 19-43 on page 445.
IRLM status
The status of other IRLM address spaces following this failure shows IMS1 now in an SFAIL
status, as shown in Figure 19-45 on page 446. This means that the IRLM that IMS is
identified to has been disconnected from the data sharing group. Any modify-type locks held
by IMS have been retained by IRLM.
(The display format is the same as in Figure 19-19, with NAME, STATUS, RET_LKS, IRLMID, IRLM_NAME, and IRLM_LEVL columns; in this case IMS1 appears with a status of SFAIL.)
Figure 19-45 Displaying IRLM status ALLD after the IRLM failure
Restarting IRLM
Restart IRLM normally.
The IRLM address space rejoins the data sharing group, as shown in Figure 19-47.
IXL014I IXLCONN REQUEST FOR STRUCTURE I#$#LOCK1 758
        WAS SUCCESSFUL. JOBNAME: I#$#IRLM ASID: 004D
        CONNECTOR NAME: I#$#$$$$$IR#I001 CFNAME: FACIL01
DXR141I IR#I001 THE LOCK TABLE I#$#LOCK1 WAS ALLOCATED IN A VOLATILE STRUCTURE
DXR132I IR#I001 SUCCESSFULLY JOINED THE DATA SHARING GROUP WITH 2M LOCK TABLE LIST ENTRIES
Figure 19-47 IRLM reconnection messages
S I#$#SCI
S I#$#OM
S I#$#RM
S I#$#IRLM
S I#$1CTL
S I#$1FDR (possibly on a different system)
S I#$1CON
The SCI, RM, OM, and IRLM address spaces are required to be active before IMS will start.
When starting the IMS Control region, it will automatically start the DLISAS, DBRC, and CQS
address spaces.
The following figures display the messages that indicate the various address spaces have
started successfully.
19.10.2 RM startup
The RM address space requires CQS to be active before it completes initialization.
Figure 19-50 shows the messages indicating RM is both waiting for CQS and then active.
CSL0003A RM ...
CSL0020I RM ...
19.10.3 OM startup
The OM address space requires SCI to be active before it completes initialization.
Figure 19-51 shows the messages indicating OM is both waiting for SCI and then active.
CSL0003A OM ...
CSL0020I OM ...
The automated start of the OM and SCI address spaces is controlled by the parameters in the DFSCGxxx member in IMS.PROCLIB. The parameters are as follows:
RMENV=    This parameter specifies whether IMS is running in an environment with a Resource Manager; RMENV=N indicates that IMS does not use RM.
SCIPROC=  This parameter is used to specify the procedure name for the SCI address space, which IMS will automatically start if not already started. This will only occur if RMENV=N is also specified.
OMPROC=   This parameter is used to specify the procedure name for the OM address space, which IMS will automatically start if not already started. As with SCIPROC=, this only occurs if RMENV=N is also specified.
Attention: In Figure 19-53, the region waiting messages for SCI, RM, OM, and CQS are
not highlighted and could be lost with all the other messages produced at IMS startup, but
as soon as these address spaces are active, IMS will continue automatically.
If the DFS039A message waiting on IRLM WTOR appears, then operations or automation
will have to respond RETRY to the message before IMS will continue.
If an IMS automatic (AUTO=Y) restart is done, the IMS WTOR DFS3139I will appear. As soon as it is replied to, the IMS WTOR DFS996I *IMS READY* message will appear.
If an IMS manual (AUTO=N) restart is done, only the IMS WTOR DFS996A message will appear.
The messages in Figure 19-56 indicate that CQS has initialized and connected with the
Shared Queues structures for MSGQ and EMHQ.
If IRLM is not already active on the system where FDR is starting, then the FDR address
space will abend, as shown in Figure 19-58. If this happens, restart IRLM before starting
FDBR.
DFS4179E FDR FOR (I#$1) IRLM IDENT-RO FAILED, RC=08 REASON=4008 F#$1
DFS629I  IMS RST TCB ABEND - IMS 0574 F#$1
Figure 19-58 FDR abend when IRLM is not active

When IMS Connect starts successfully, it issues messages similar to the following (Figure 19-59):
CONNECTED TO IMSPLEX=I#$#
CONNECTED TO DATASTORE=I#$1 ; M=DSC1
CONNECTED TO DATASTORE=I#$3 ; M=DSC1
CONNECTED TO DATASTORE=I#$2 ; M=DSC1
TCPIP COMMUNICATION ON HOSTNAME=TCPIP OPENED; M=
LISTENING ON PORT=7101 STARTED; M=SDOT
LISTENING ON PORT=7102 STARTED; M=SDOT
LISTENING ON PORT=7103 STARTED; M=SDOT
WELCOME TO IMS CONNECT!
Figure 19-59 IMS Connect startup messages
If SCI is not active on the system when IMS Connect starts, it will still connect to the IMS
systems, but will also receive the message shown in Figure 19-60.
HWSI1720W REGISTRATION TO SCI FAILED: MEMBER=I#$1CON
Figure 19-60 SCI failure messages at IMS Connect startup
FDBR shutdown
If the IMS control region is shut down, then FDBR will be shut down. However, if you only
want FDBR to shut down, then use the command F I#$1FDR,TERM.
20
Chapter 20.
WebSphere MQ
This chapter provides an overview of various operational considerations to keep in mind
when WebSphere MQ is implemented in a Parallel Sysplex.
Figure 20-1 Relationship between application, channel initiator and queue managers
If the queue to which the message is sent is not on the same system as the sender
application, another part of MQ is used to transport the message from the local system to the
remote system. The channel initiator is responsible for transporting a message from one
queue manager to another using a transmission protocol such as TCP/IP or SNA.
Channel initiator code runs on a z/OS system as a started task named xxxxCHIN.
Queue manager code runs on a z/OS system as a started task named xxxxMSTR, with xxxx
being the respective subsystem ID.
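For example, assuming a queue manager PSM3 with a command prefix of -PSM3 (the prefix used in the examples later in this chapter), the queue manager and channel initiator could be started with:
-PSM3 START QMGR
-PSM3 START CHINIT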
Figure 20-2 displays two queue managers within a sysplex. Each queue manager has a
channel initiator and a local queue. Messages sent by queue managers on AIX and Windows
are placed on the local queue, from where they are retrieved by an application. Reply
messages are returned via a similar route.
When MQ is running in a Parallel Sysplex, the need may arise to access a queue from more
than one queue manager due to workload management and availability requirements. This is
where shared queues fit in.
A shared queue is a type of local queue. The messages on that queue can be accessed by
one or more queue managers that are in a sysplex. The queue managers that can access the
same set of shared queues form a group called a queue-sharing group.
Any queue manager in the queue-sharing group can access a shared queue. This means that
you can put a message on to a shared queue on one queue manager, and get the same
message from the queue from a different queue manager. This provides a rapid mechanism
for communication within a queue-sharing group that does not require channels to be active
between queue managers.
Figure 20-3 displays three queue managers and a Coupling Facility, which form a
queue-sharing group. All three queue managers can access the shared queue in the
Coupling Facility.
An application can connect to any of the queue managers within the queue-sharing group.
Because all the queue managers in the queue-sharing group can access all the shared
queues, the application does not depend on the availability of a specific queue manager; any
queue manager in the queue-sharing group can service the queue.
At least two CF structures are needed for shared queues. One is the administrative structure.
The administrative structure contains no queues or messages. It only contains internal queue
manager information and has a fixed name of queuesharinggroupCSQ_ADMIN.
Subsequent structures are used for queues and messages. Up to 63 structures can be
defined to contain queues or messages for a particular queue-sharing group. The names of
these structures are elective, but the first four characters must be the queue- sharing group
name.
Queue-sharing groups have a name of up to four characters. The name must be unique in
your network, and be different from any queue manager names.
Figure 20-4 on page 460 illustrates a queue-sharing group that contains two queue
managers. Each queue manager has a channel initiator and its own local page sets and log
data sets. Each member of the queue-sharing group must also connect to a DB2 system.
The DB2 systems must all be in the same DB2 data-sharing group so that the queue managers can access the DB2 shared repository, which contains shared object definitions. A shared object is any type of WebSphere MQ object (for example, a queue or channel) that is defined only once, so that any queue manager in the group can use it.
After a queue manager joins a queue-sharing group, it will have access to the shared objects
defined for that group. You can use that queue manager to define new shared objects within
the group. If shared queues are defined within the group, you can use this queue manager to
put messages to and get messages from those shared queues.
Any queue manager in the group can retrieve the messages held on a shared queue. You
can enter an MQSC command once, and have it executed on all queue managers within the
queue-sharing group as though it had been entered at each queue manager individually. The
command scope attribute is used for this.
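As a sketch, using the SHARED.QUEUE and APPL01 names that appear in the examples later in this chapter, a shared queue is defined once for the whole group, and the CMDSCOPE attribute propagates a command to every queue manager:
DEFINE QLOCAL(SHARED.QUEUE) QSGDISP(SHARED) CFSTRUCT(APPL01)
DISPLAY QLOCAL(SHARED.QUEUE) CMDSCOPE(*)
The first command creates one definition held in the shared repository; the second is executed on every queue manager in the queue-sharing group.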
You can additionally configure multiple queue managers running on different operating
system images in a sysplex to operate as a queue-sharing group, which can take advantage
of shared queues and shared channels for higher availability and workload balancing.
You can connect to a queue manager and define the queue managers on which your
requests should be executed by filling in the following three fields:
Connect name: This is the queue manager to which you actually connect.
Target queue manager: With this parameter, you specify on which queue manager you
want to input your request.
Action queue manager: This is the queue manager where the commands are actually executed.
Row 1 of 35 - Press F11 to display queue status. (Action 4=Manage)

Name                           Type     Disposition
<> *                           QUEUE    ALL     PSM3
CICS01.INITQ                   QLOCAL   QMGR    PSM3
GROUP.QUEUE                    QLOCAL   COPY    PSM3
GROUP.QUEUE                    QLOCAL   GROUP
ISF.CLIENT.SDSF._/%3.REQUESTQ  QALIAS   QMGR    PSM3
ISF.MODEL.QUEUE                QMODEL   QMGR    PSM3
PSM1                           QREMOTE  QMGR    PSM3
PSM1.XMITQ                     QLOCAL   QMGR    PSM3
PSM2                           QREMOTE  QMGR    PSM3
PSM2.XMITQ                     QLOCAL   QMGR    PSM3
PSM3.DEAD.QUEUE                QLOCAL   QMGR    PSM3
PSM3.DEFXMIT.QUEUE             QLOCAL   QMGR    PSM3
PSM3.LOCAL.QUEUE               QLOCAL   QMGR    PSM3
SHARED.QUEUE                   QLOCAL   SHARED
SYSTEM.ADMIN.CHANNEL.EVENT     QLOCAL   QMGR    PSM3
SYSTEM.ADMIN.CONFIG.EVENT      QLOCAL   QMGR    PSM3
SYSTEM.ADMIN.PERFM.EVENT       QLOCAL   QMGR    PSM3
Display the status of all queues; this provides information about each queue and, if the CF structure is filling up unexpectedly, may help to identify an application that is looping. See Figure 20-8 for an example.
DISPLAY QSTATUS(*)
-PSM3 DIS QSTATUS(*)
CSQM293I -PSM3 CSQMDRTC 25 QSTATUS FOUND MATCHING REQUEST CRITERIA
CSQM201I -PSM3 CSQMDRTC DIS QSTATUS DETAILS
QSTATUS(CICS01.INITQ)
TYPE(QUEUE)
QSGDISP(QMGR)
END QSTATUS DETAILS
CSQM201I -PSM3 CSQMDRTC DIS QSTATUS DETAILS
QSTATUS(GROUP.QUEUE)
TYPE(QUEUE)
QSGDISP(COPY)
END QSTATUS DETAILS
CSQM201I -PSM3 CSQMDRTC DIS QSTATUS DETAILS
QSTATUS(PSM1.XMITQ)
TYPE(QUEUE)
QSGDISP(QMGR)
...
CSQ9022I -PSM3 CSQMDRTC ' DIS QSTATUS' NORMAL COMPLETION
Figure 20-8 Display the status of all queues
The CFSTATUS command displays the current status of all structures including the
administrative structure; see Figure 20-9 for an example. You can display three different types
of status information:
SUMMARY: Gives an overview of the status information.
CONNECT: Shows all members connected to the structure and in the case of a
connection failure, failure information.
BACKUP: Shows backup date and time, RBA information, and the queue manager that
did the backup.
DISPLAY CFSTATUS(A*) TYPE(SUMMARY)
CSQM293I -PSM3 CSQMDRTC 1 CFSTATUS FOUND MATCHING REQUEST CRITERIA
CSQM201I -PSM3 CSQMDRTC DISPLAY CFSTATUS DETAILS
CFSTATUS(APPL01)
TYPE(SUMMARY)
CFTYPE(APPL)
STATUS(ACTIVE)
SIZEMAX(10240)
SIZEUSED(1)
ENTSMAX(2217)
ENTSUSED(35)
FAILTIME( )
FAILDATE( )
END CFSTATUS DETAILS
CSQ9022I -PSM3 CSQMDRTC ' DISPLAY CFSTATUS' NORMAL COMPLETION
Figure 20-9 DISPLAY CFSTATUS output
MQ structure rebuilds should normally be performed when there is little or no MQ activity. The
rebuild process is fully supported by MQ, but there is a brief period of time where access to
the shared queues in the structure is denied.
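A rebuild is initiated with the same SETXCF interface used for other structures in this book. A sketch, assuming a hypothetical application structure named PSM#APPL01 (the queue-sharing group prefix would be your own):
SETXCF START,REBUILD,STRNAME=PSM#APPL01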
Only the queue manager should be restarted by ARM. The channel initiator should be
restarted from the CSQINP2 initialization data set. Set up your WebSphere MQ environment so
that the channel initiator and associated listeners are started automatically when the queue
manager is restarted.
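For example, the CSQINP2 data set could include commands along these lines (the port number is hypothetical):
START CHINIT
START LISTENER TRPTYPE(TCP) PORT(1414)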
For more detailed information about ARM, refer to Chapter 6, Automatic Restart Manager
on page 83.
21
Chapter 21.

Resource Recovery Services
Log stream                 Description
ATR.grpname.RM.DATA        RRS resource manager data log
ATR.grpname.MAIN.UR        RRS main UR state log
ATR.grpname.DELAYED.UR     RRS delayed UR state log
ATR.grpname.RESTART        RRS restart log
ATR.grpname.ARCHIVE        RRS archive log
ATR.grpname.METADATA       RRS metadata log
If the ARCHIVE log stream is defined and write activity to it is high, this may impact the throughput of all transactions actively using RRS. This log stream is optional, and is only needed by the installation for post-transaction history investigation. If you choose not to use the ARCHIVE log stream, a warning message is issued at RRS startup time about not being able to connect to the log stream; RRS, however, will continue its initialization process.
Warm start
The normal mode of operation is a warm start. This occurs when valid data is found in the
RM.DATA log stream. For RRS to access data about incomplete transactions, all defined
RRS log streams should be intact. METADATA and ARCHIVE are optional log streams and
do not need to be defined. Figure 21-2 on page 471 displays the messages you would expect
after an RRS warm start.
Figure 21-2 Typical messages produced after a RRS warm start in a Parallel Sysplex
Note the error messages for the 1 ARCHIVE and 2 METADATA logstreams.
Cold start
When RRS finds an empty RM.DATA log stream, it cold starts. RRS flushes any log data
found in the MAIN.UR and DELAYED.UR log streams to the ARCHIVE log, if it exists. An RRS
cold start applies to the entire RRS logging group, which may contain some or all members of
the sysplex. The logstreams are shared across all systems in the sysplex that are in that
logging group. After an RRS cold start, there is no data available to RRS to complete any
work that was in progress. RRS can be cold started by stopping all RRS instances in the
logging group, and deleting and redefining the RM.DATA log stream using the IXCMIAPU
utility.
There is a sample procedure available in SYS1.SAMPLIB(ATRCOLD) which deletes and
then defines the RM.DATA logstream. This forces a cold start when RRS tries to initialize.
RRS should only be deliberately cold-started in very controlled circumstances such as:
The first time that RRS is started
When there is a detected data loss in RM.DATA
For a controlled RRS cold start, all resource managers that require RRS should be stopped
on all systems that are a part of the RRS logging group to be cold started. Use the RRS ISPF
panels to check on resource manager status. Check that no incomplete URs exist for any
resource manager.
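A minimal sketch of such a delete/define job, modelled on the ATRCOLD sample and using the log stream and structure names shown later in this chapter (the LS_SIZE value is hypothetical):
//RRSCOLD  EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(LOGR) REPORT(YES)
  DELETE LOGSTREAM NAME(ATR.#@$#PLEX.RM.DATA)
  DEFINE LOGSTREAM NAME(ATR.#@$#PLEX.RM.DATA)
         STRUCTNAME(RRS_RMDATA_1)
         LS_SIZE(1024)
/*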
ATR602I 23.52.32 RRS RM SUMMARY 148
RM NAME                 STATE   SYSTEM  GNAME
CSQ.RRSATF.IBM.PSM2     Reset   #@$2    #@$#PLEX
DFHRXDM.#@$CWE2A.IBM    Reset   #@$2    #@$#PLEX
DFHRXDM.#@$C1A2A.IBM    Reset   #@$2    #@$#PLEX
DFHRXDM.#@$C1T2A.IBM    Reset   #@$2    #@$#PLEX
DSN.RRSATF.IBM.D#$2     Run     #@$2    #@$#PLEX
DSN.RRSPAS.IBM.D#$2     Run     #@$2    #@$#PLEX
The following system commands may assist in identifying delays within RRS.
D RRS,UR,S (available only with z/OS V1R8 and above)
GNAME     ST   TYPE  COMMENTS
#@$#PLEX  FLT  Unpr
#@$#PLEX  FLT  Unpr
#@$#PLEX  FLT  Unpr
#@$#PLEX  FLT  Unpr
#@$#PLEX  FLT  Unpr
#@$#PLEX  FLT  Unpr
D RRS,UR,DETAILED,URID=C0D8CB4F7E15C0000000000001010000
Figure 21-5 on page 474 displays the output from the D LOGGER,LOGSTREAM,LSN=ATR.*
command.
IXG601I 23.57.03 LOGGER DISPLAY 242
INVENTORY INFORMATION BY LOGSTREAM
LOGSTREAM                   STRUCTURE        #CONN  STATUS
---------                   ---------        ------ ------
ATR.#@$#PLEX.DELAYED.UR     RRS_DELAYEDUR_1  000003 IN USE   1
  SYSNAME: #@$2   DUPLEXING: LOCAL BUFFERS   2
  SYSNAME: #@$1   DUPLEXING: LOCAL BUFFERS
  SYSNAME: #@$3   DUPLEXING: LOCAL BUFFERS
ATR.#@$#PLEX.MAIN.UR        RRS_MAINUR_1     000003 IN USE
  SYSNAME: #@$2   DUPLEXING: LOCAL BUFFERS
  SYSNAME: #@$1   DUPLEXING: LOCAL BUFFERS
  SYSNAME: #@$3   DUPLEXING: LOCAL BUFFERS
ATR.#@$#PLEX.RESTART        RRS_RESTART_1    000003 IN USE
  SYSNAME: #@$2   DUPLEXING: LOCAL BUFFERS
  SYSNAME: #@$1   DUPLEXING: LOCAL BUFFERS
  SYSNAME: #@$3   DUPLEXING: LOCAL BUFFERS
ATR.#@$#PLEX.RM.DATA        RRS_RMDATA_1     000003 IN USE
  SYSNAME: #@$2   DUPLEXING: LOCAL BUFFERS
  SYSNAME: #@$1   DUPLEXING: LOCAL BUFFERS
  SYSNAME: #@$3   DUPLEXING: LOCAL BUFFERS
NUMBER OF LOGSTREAMS: 000004
Figure 21-5 Output from the D LOGGER,LOGSTREAM,LSN=ATR.* command
In the figure, 1 displays the association between logstream and CF structure name, the
number of connections (there are three), and the status. As shown in 2, duplexing is taking
place in the IXGLOGR data space.
(A D XCF,STR display of the RRS structures showed one NOT ALLOCATED and four ALLOCATED.) For an individual structure, the D XCF,STR,STRNAME= display shows:
1 Whether it is duplexed
2 Alternative CFs
3 In which CF the structure is allocated
4 Its connections
ACTIVE STRUCTURE
----------------
 ALLOCATION TIME: 06/27/2007 01:06:57
 CFNAME         : FACIL01                                3
 COUPLING FACILITY: SIMDEV.IBM.EN.0000000CFCC1
                    PARTITION: 00  CPCID: 00
 ACTUAL SIZE    : 9216 K
 STORAGE INCREMENT SIZE: 256 K
 ENTRIES:  IN-USE:        12  TOTAL:      14021,  0% FULL
 ELEMENTS: IN-USE:        39  TOTAL:      14069,  0% FULL
 PHYSICAL VERSION: C0CEC7FD 73B897CC
 LOGICAL VERSION:  C0CEC7FD 73B897CC
 SYSTEM-MANAGED PROCESS LEVEL: 8
 DISPOSITION    : DELETE
 ACCESS TIME    : 0
 MAX CONNECTIONS: 32
 # CONNECTIONS  : 3                                      4
 CONNECTION NAME  ID VERSION  SYSNAME  JOBNAME  ASID STATE
 ---------------- -- -------- -------- -------- ---- ------
 IXGLOGR_#@$1     03 00030046 #@$1     IXGLOGR  0016 ACTIVE
 IXGLOGR_#@$2     01 0001010C #@$2     IXGLOGR  0016 ACTIVE
 IXGLOGR_#@$3     02 00020053 #@$3     IXGLOGR  0016 ACTIVE
Option ===> 1

Select an option and press ENTER. (The RRS primary panel offers options 1 through 6; the option descriptions are not reproduced here.)
After invoking the RRS ISPF primary panel, you are able to display or update the various
logstream types; see Figure 21-9.
RRS Log Stream Browse Selection
Command ===>
Provide selection criteria and press Enter:
UR identifier . . .
RM name . . . . . .
SURID . . . . . . .
The RRS Resource Manager Data log gives details about each resource manager; Figure 21-10 on page 478 displays an example of this output.
READING ATR.#@$#PLEX.RM.DATA LOG STREAM
#@$2  2007/06/27 02:58:49.450178  BLOCKID=00000000000155B9
  RESOURCE MANAGER=CSQ.RRSATF.IBM.PSM2
  LOGGING SYSTEM=#@$2
  RESOURCE MANAGER MAY RESTART ON ANY SYSTEM
  RESOURCE MANAGER WAS LAST ACTIVE WITH RRS ON SYSTEM #@$2
  LOG NAME IS CSQ3.MQ.RRS.IBM.PSM2
  RESTART ANYTIME SUPPORTED
  LOG INSTANCE NUMBER: 2007/06/27 06:58:04.069718
#@$2  2007/06/27 02:58:49.451368  BLOCKID=0000000000015691
  RESOURCE MANAGER=CSQ.RRSATF.IBM.PSM1
  LOGGING SYSTEM=#@$2
  RESOURCE MANAGER MAY RESTART ON ANY SYSTEM
  RESOURCE MANAGER WAS LAST ACTIVE WITH RRS ON SYSTEM #@$1
  LOG NAME IS CSQ3.MQ.RRS.IBM.PSM1
  RESTART ANYTIME SUPPORTED
  LOG INSTANCE NUMBER: 2007/06/27 06:58:03.066048
Figure 21-10 Output from RRS resource manager data log
READING ATR.#@$#PLEX.MAIN.UR LOG STREAM
#@$3  2007/02/25 17:04:21.098521  BLOCKID=0000000000037681
  URID=C03647C67E5DBF140000000001020000 LOGSTREAM=ATR.#@$#PLEX.MAIN.UR
  PARENT URID=00000000000000000000000000000000
  SURID=N/A
  WORK MANAGER NAME=#@$3.Q6G4BRK.00BD
  STATE=InCommit  EXITFLAGS=00840000  FLAGS=20000000
  LUWID=
  TID=          GTID=
  FORMATID=     (decimal)     (hexadecimal)
  GTRID=
  BQUAL=
  RMNAME=DSN.RRSATF.IBM.D81D
  ROLE=Participant  CMITCODE=00000FFF  BACKCODE=00000FFF  PROTOCOL=PresumeAbort
READING ATR.#@$#PLEX.DELAYED.UR LOG STREAM
#@$3  2005/06/30 12:12:05.907016  BLOCKID=0000000000000001
  URID=BD3D78047E62CBA00000001601020000 LOGSTREAM=ATR.#@$#PLEX.DELAYED.UR
  PARENT URID=00000000000000000000000000000000
  SURID=N/A
  WORK MANAGER NAME=#@$3.PMQ2BRK1.015F
  STATE=InPrepare  EXITFLAGS=00040000  FLAGS=20000000
  LUWID=
  TID=          GTID=
  FORMATID=     (decimal)     (hexadecimal)
  GTRID=
  BQUAL=
  RMNAME=DSN.RRSATF.IBM.D61B
  ROLE=Participant  CMITCODE=00000FFF  BACKCODE=00000FFF  PROTOCOL=PresumeNothing
Figure 21-11 ATRBATCH sample report
ATRBDISP
Produce detailed information about every UR known to RRS.
22
Chapter 22.
z/OS UNIX
This chapter discusses the UNIX System Services environment, which is now called z/OS
UNIX. It describes a shared zFS/HFS environment and examines some zFS commands.
For more information about these topics, refer to z/OS UNIX System Services Planning, GA22-7800, or z/OS UNIX System Services Command Reference, SA22-7802.
For more information about zFS, refer to z/OS Distributed File Service zFS Administration, SC24-5989.
22.1 Introduction
z/OS UNIX is a component of z/OS that is a certified UNIX implementation, XPG4 UNIX 95. It
was the first UNIX 95 system not derived from the AT&T source code. It includes a shell
environment, OMVS, which can be accessed from TSO.
z/OS UNIX allows UNIX applications from other platforms to run on IBM z/OS mainframes. In
many cases a recompile is all that is needed. Additional effort may be advisable for enhanced
z/OS integration. Programs using hardcoded ASCII numerical values may need adjustment to
support the EBCDIC character set.
Database access (DB2 using Call Attach) is one example of how z/OS UNIX applications can access services found elsewhere in z/OS. Such programs cannot be ported to other platforms without rewriting. Conversely, a program that adheres to standards such as POSIX and ANSI C is easier to port to the z/OS UNIX environment.
Numerous core System z subsystems (such as TCP/IP) and applications rely on z/OS UNIX.
z/OS 1.9 introduced several new z/OS UNIX features and included improved Single UNIX
Specification Version 3 (UNIX 03) alignment.
(A figure here showed a typical UNIX hierarchical file system: a root directory containing subdirectories, which in turn contain files and further subdirectories.)
There are four different types of UNIX file systems supported by z/OS: HFS, zFS, TFS, and NFS.
//HFSALLOC JOB (SWR,1-1),PETER,CLASS=A,MSGCLASS=T,NOTIFY=&SYSUID
//ALLOC    EXEC PGM=IEFBR14
//SYSPRINT DD SYSOUT=*
//NEWHFS   DD DISP=(,CATLG),DSN=SYSU.LOCAL.BIN,
//            SPACE=(CYL,(5,4,1)),DSNTYPE=HFS          1
1 When an HFS is allocated, it needs to have some directory blocks included. If directory blocks=0 is specified, or if the directory blocks value is omitted, the data set is allocated but is unusable. An error such as Errno=80x No such device exists; Reason=EF096056 occurs when the system tries to use it.
The JCL to define a zFS can be seen in Figure 22-4. The process has two steps. Step 1
creates a linear VSAM data set. Step 2, which requires the zFS STC to be active, formats the
data set so it can be used by z/OS UNIX. In this case, the data set is a multi-volume
SMS-managed data set.
//ZFSDEFN  JOB PETER4,MSGCLASS=S,NOTIFY=&SYSUID,CLASS=A
//*  USER=SWTEST
//DEFINE   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//AMSDUMP  DD SYSOUT=*
//SYSIN    DD *
  DEFINE CLUSTER (NAME(SYSU.LOCAL.BIN)        -
         VOLUME(* * * * * * * * * *)          -
         LINEAR CYL(500 100) SHAREOPTIONS(2))           2
/*
//CREATE   EXEC PGM=IOEAGFMT,REGION=0M,
// PARM=('-aggregate SYSU.LOCAL.BIN -compat')           3
//SYSPRINT DD SYSOUT=*
//STDOUT   DD SYSOUT=*
//STDERR   DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//CEEDUMP  DD SYSOUT=*
//*
Figure 22-4 JCL to define and format a zFS
HFSplex - systems A, B, C    Standalone Systems
Figure 22-5 HFSPLEX can be smaller than a sysplex
(Partial D OMVS,F output is shown here: for each mounted file system it lists the owning system, the automove setting, the mount point, and latch information such as L=33 Q=0.)
1 The D OMVS,F command displays all the file systems in the HFSplex.
2 A file system that is owned by system #@$3 with AUTOMOVE=U. In this case, when
system #@$3 is shut down, the file system is unmounted and becomes unusable.
3 A file system that is owned by system #@$2 with AUTOMOVE=Y. In this case, when
system #@$2 is shut down, the file systems ownership will be taken up by another system in
the sysplex and it will still be available.
In a shared file system environment, instead of each system having its own root file system, there is a single HFSplex-wide root file system. If this root file system is damaged or needs to be moved, then the entire HFSplex needs to be restarted. This file system should therefore be very small and consist of directories and links only.
When a file system is mounted, an attribute called automove is assigned. The automove attribute indicates what is to happen to the file system when the system that owns it is shut down. There are a number of options available, but the result is effectively that the file system is either unmounted and made unavailable, or moved to another system.
When a file system is mounted read-only, the owning system has little significance because every system reads the file system directly. When the file system is mounted read-write, then the
owning system is important. All updates are performed by the owning system. When a
different system wants to update a file or directory in the file system, the update is
communicated to the owning system, using XCF, which then does the actual update.
The I/O on the requesting system is not completed until the owning system indicates it has
completed the I/O. As a consequence, it is possible to have significant extra XCF traffic
caused by a file system being inappropriately owned. For example, consider an HTTP server
running on system MVSA that logs all the HTTP traffic. When the log file is in a file system
488
owned by MVSB, then all the HTTP logging writes will be transferred, using XCF, from MVSA
to MVSB.
As systems are IPLed, the ownership of file systems, especially those that are
automove-enabled, can change. This not only can cause significant extra XCF traffic, but can
also impact the response time. Your z/OS system programmer is responsible for maintaining
the mount configuration, and therefore should know the optimal configuration.
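Ownership can be influenced when the file system is mounted. As a sketch, a BPXPRMxx MOUNT statement might look like the following; the file system name and mount point are hypothetical:
MOUNT FILESYSTEM('OMVS.WEB.LOGS')  /* hypothetical file system  */
      MOUNTPOINT('/web/logs')
      TYPE(ZFS) MODE(RDWR)
      SYSNAME(MVSA)                /* preferred owning system   */
      AUTOMOVE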
FILESYSTYPE TYPE(ZFS) ENTRYPOINT(IOEFSCM)  /*   */
            ASNAME(ZFS)                    /* 1 */
1 The ZFS supporting task STC name; the JCL needs to be in a system proclib, such as
SYS1.PROCLIB.
With z/OS 1.7, the zFS supporting address space was terminated with P ZFS. With z/OS 1.8 and later, it is stopped with the F OMVS,STOPPFS=ZFS command, as shown in Figure 22-8.
F OMVS,STOPPFS=ZFS
014 BPXI078D STOP OF ZFS REQUESTED. REPLY 'Y' TO
PROCEED. ANY OTHER REPLY WILL CANCEL THIS STOP.
14y
IEE600I REPLY TO 014 IS;Y
IOEZ00050I zFS kernel: Stop command received.
IOEZ00048I Detaching aggregate OMVS.ZOSR18.#@$#R3.ROOT
IOEZ00387E System #@$3 has left group IOEZFS, aggregate recovery in progress.
IOEZ00387E System #@$3 has left group IOEZFS, aggregate recovery in progress.
IOEZ00357I Successfully left the sysplex group.
IOEZ00057I zFS kernel program IOEFSCM is ending
IEF352I ADDRESS SPACE UNAVAILABLE
$HASP395 ZFS ENDED
015 BPXF032D FILESYSTYPE ZFS TERMINATED. REPLY
'R' WHEN READY TO RESTART. REPLY 'I' TO IGNORE.
Figure 22-8 Stopping the zFS address space
The zFS address space cannot be started manually; that is, issuing S ZFS does not work. If message BPXF032D is given a reply of I, then the only way to restart the zFS address space
is with the SETOMVS RESET= command (or the SET OMVS= command). We recommend using the
SETOMVS RESET= command, because the alternative can significantly alter the OMVS
configuration.
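For example, assuming the zFS FILESYSTYPE statement is in parmlib member BPXPRMZF (a hypothetical suffix), the restart would be:
SETOMVS RESET=(ZF)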
Unlike an HFS, which will allocate a secondary extent automatically, a zFS needs to be
explicitly grown. This can be done using the z/OS UNIX command zfsadm. If you prefer to do
this in a batch job, then Figure 22-9 shows an example.
//ZFSGRWX  JOB 'GROWS ZFS',CLASS=A,MSGCLASS=S,NOTIFY=&SYSUID
//STEP0    EXEC PGM=IKJEFT01
//SYSPROC  DD DSN=SYS1.SBPXEXEC,DISP=SHR
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
 oshell rm /tmp/zfsgrw_*
//STEP1    EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//SYSUT2   DD PATH='/tmp/zfsgrw_in',
//            PATHDISP=(KEEP),FILEDATA=TEXT,
//            PATHOPTS=(OWRONLY,OCREAT,OEXCL),
//            PATHMODE=(SIRWXG,SIRWXU,SIRWXO)
//SYSUT1   DD *
zfsadm grow -aggregate SYSU.LOCAL.BIN -size 0
//CONFIG   EXEC PGM=BPXBATCH,REGION=0M,PARM='SH /tmp/zfsgrw_in'
//STDERR   DD SYSOUT=*
//STDOUT   DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
Figure 22-9 ZFSADM - batch
Appendix A.
Operator commands
This appendix lists and describes operator commands that can help you to manage your
Parallel Sysplex environment.
V xxxx,AS,ON
V xxxx,AS,OFF
Configuration commands
D IOS,CONFIG
D IOS,GROUP
D M=CPU
D M=CHP(nn)
D M=DEV(nnnn)
D U,IPLVOL
D IPLINFO
D OPDATA
D PARMLIB
D SYMBOLS
D SSI
ACTIVATE IODF=xx
Console commands
D C
D C,A,CA
D C,B
D CNGRP
D EMCS,S
D EMCS,F,CN=consname
V CN(*),ACTIVATE
V CN(*),DEACTIVATE
V CN(console),MSCOPE=(*)
V CN(console),MSCOPE=(*ALL)
V CN(console),MSCOPE=(sys1,sys2,...)
V CN(console),ROUT=(ALL)
V CN(console),ROUT=(rcode1,rcode2,...)
RO sysname,command
RO *ALL,command
RO *OTHER,command
DEVSERV commands
DS P,nnnn
DS QP,nnnn
DS SMS,nnnn
DS QD,nnnn
ETR commands
D ETR
SETETR PORT=n
GRS commands
D GRS,A
D GRS,ANALYZE
D GRS,C
D GRS,RES=(*,dsname)
D GRS,DEV=nnnn
D GRS,DELAY
D GRS,SUSPEND
T GRSRNL=(xx)
JES2 commands
$D MEMBER(*)
$E MEMBER(sysname)
$D JOBQ,SPOOL=(%>n)
$D MASDEF
$D CKPTDEF
$E CKPTLOCK,HELDBY=sysname
$T CKPTDEF,RECON=Y
$T SPOOLDEF,...
$S XEQ
$P XEQ
$T JOBCLASS(*),QHELD=Y|N
Logger commands
D LOGGER
D LOGGER,CONN
D LOGGER,L
D LOGGER,STR
SETLOGR FORCE,DISC,LSN=logstreamname
SETLOGR FORCE,DEL,LSN=logstreamname
LOGREC commands
D LOGREC
SETLOGRC LOGSTREAM
SETLOGRC DATASET
Operlog commands
D C,HC
V OPERLOG,HARDCOPY           Activate Operlog.
V OPERLOG,HARDCOPY,OFF       Deactivate Operlog.
PDSE commands
V SMS,PDSE,MONITOR
V SMS,PDSE,MONITOR,ON|OFF
V SMS,PDSE,ANALYSIS
V SMS,PDSE,FREELATCH
SMF commands
D SMF
T SMF=xx
SETSMF parameter
SMSVSAM commands
D SMS,SMSVSAM,ALL
D SMS,CFLS
D SMS,CFCACHE(structurename|*)
D SMS,CFVOL(volid)
D SMS,CICSVR(ALL)
D SMS,LOG(ALL)
D SMS,DSNAME(dsname)
D SMS,JOB(job)
D SMS,TRANVSAM
V SMSVSAM,ACTIVE             Start SMSVSAM.
V SMSVSAM,TERMINATESERVER    Stop SMSVSAM.
V SMS,CFCACHE(cachename),E|Q
OMVS commands
D OMVS,F
D OMVS,A=ALL
T OMVS=xx
SETOMVS parameter
VTAM commands
D NET,STATS,TYPE=CFS
WLM commands
D WLM
V WLM,POLICY=policyname
V WLM,APPLENV=applenv,RESUME
V WLM,APPLENV=applenv,QUIESCE
F WLM,RESOURCE=resource,ON|OFF|RESET
E task,SRVCLASS=srvclass
XCF commands
D XCF
D XCF,S,ALL
D XCF,COUPLE
D XCF,PI
D XCF,PI,DEV=ALL
D XCF,PO
D XCF,PO,DEV=ALL
D XCF,POL
D XCF,POL,TYPE=type
D XCF,STR
D XCF,STR,STAT=ALLOC
D XCF,STR,STRNAME=strname
D CF
D XCF,CF
D XCF,ARMSTATUS
D XCF,ARMSTATUS,DETAIL
V XCF,sysname,OFFLINE
SETXCF COUPLE,ACOUPLE=dsn,TYPE=type
SETXCF COUPLE,PSWITCH,TYPE=type
SETXCF START,POL,TYPE=type,POLNAME=polname   Start a policy.
SETXCF START,REALLOC
SETXCF START,RB,POPULATECF=cfname
SETXCF START,RB,CFNM=cfname,LOC=OTHER
SETXCF START,RB,STRNAME=strname   Rebuild a CF structure.
SETXCF START,RB,DUPLEX,STRNAME=strname
SETXCF START,RB,DUPLEX,CFNAME=cfname
SETXCF START,ALTER,STRNAME=strname,SIZE=nnnn
SETXCF FORCE,STR,STRNAME=strname
SETXCF FORCE,CON,STRNAME=strname,CONNAME=conname
SETXCF FORCE,STRDUMP,STRNAME=strname
SETXCF START,CLASSDEF,CLASS=class
SETXCF START,MAINTMODE,CFNM=cfname
SETXCF START,PI,DEV=nnnn
SETXCF START,PI,STRNM=(strname)
SETXCF START,PO,DEV=nnnn
SETXCF START,PO,STRNM=(strname)
SETXCF STOP,MAINTMODE,CFNM=cfname
SETXCF STOP,PI,DEV=nnnn
SETXCF STOP,PI,STRNM=(strname)
SETXCF STOP,PO,DEV=nnnn
SETXCF STOP,PO,STRNM=(strname)
SETXCF MODIFY,...
Appendix B.
List of structures
Table B-1 in this appendix lists information about the exploiters of the Coupling Facility,
including structure name, structure type, structure disposition, connection disposition, and
whether the structure supports rebuild.
Table B-1   List of structures

Exploiter                      Structure name              Structure  Structure    Connection   Support
                                                           type       disposition  disposition  rebuild?
CICS DFHLOG                    user defined                List       Delete       Delete       Yes
CICS DFHSHUNT                  user defined                List       Delete
                               user defined                List
                               DFHCFLS_...                 List       Keep         Keep         No
                               DFHNCLS_...                 List       Keep         Keep         No
                               DFHXQLS_...                 List       Delete       Delete       No
CICS/VR                        user defined                Cache      Delete       Delete       Yes
                               IGWLOCK00                   Lock       Keep         Keep         Yes
DB2 SCA                        grpname_SCA                 List       Keep         Delete       Yes
DB2 V8 GBP                     grpname_GBP...              Cache      Delete       Keep         Yes
                               SYSIGGCAS_ECS               Cache      Delete       Delete       Yes
GRS Star                       ISGLOCK                     Lock       Delete       Delete       Yes
                               SYSARC_..._RCL              List
IMS Lock                       user defined                Lock       Keep         Keep
IMS OSAM                       user defined                Cache      Delete       Delete
IMS VSAM                       user defined                Cache      Delete       Delete
user defined                   user defined                List       Keep         Keep
user defined                   user defined                List       Keep         Keep
user defined                   user defined                List       Keep         Keep
                               user defined                List       Delete
                               user defined                List       Keep
IMS VSO                        user defined                Cache      Delete                    Yes
                               SYSZWLM_cpuidcputype        Cache      Delete       Keep         No
IRLM (DB2)                     grpname_LOCK1               Lock       Keep         Keep         Yes
IRLM (IMS)                     user defined                Lock       Keep         Keep         Yes
JES2 Checkpoint                user defined                List       Keep         Delete       No
                               user defined                List       Delete                    Yes
                               mqgrpname                   List       Keep                      No
MQ Shared Queues Applications  mqgrpname                   List       Keep                      No
                               user defined                List       Delete                    Yes
                               IRRXCF00_B00n               Cache      Delete       Delete       Yes
                               IRRXCF00_P00n               Cache      Delete       Delete       Yes
                               user defined                List       Delete
                               user defined                List       Delete
                               user defined                List       Delete
                               user defined                List       Delete
                               user defined                List       Delete
SmartBatch                     SYSASFPnnnn                 List       Delete       Delete       Yes
System Logger                  user defined                List       Delete       Keep         Yes
                               EZBEPORT                    List
                               EZDVIPA                     List
                               HSA_LOG                     List
                               ING_HEALTHCHKLOG            List
VSAM/RLS Cache                 IGWCACHEn                   Cache
VSAM/RLS Lock                  IGWLOCK00                   Lock       Keep
VTAM GR                        ISTGENERIC or user defined  List       Delete       Keep         Yes
VTAM MNPS                      ISTMNPS                     List       Keep         Keep         Yes
XCF                            IXC...                      List       Delete       Delete       Yes
                               HZS...                      List       Delete                    Yes
WebSphere
Appendix C.
Stand-alone dump example
Restriction: At this time, there is no support for the 3584/3592 tape library to be used as a
SAD output medium.
Note: Your installation may have enabled the Lock out disruptive tasks radio button on the
image icon. If that is the case, you must select the No radio button before proceeding; see
Figure C-2.
After double-clicking the CPC Images icon, a list of images is displayed as shown in
Figure C-3 on page 506.
On the HMC, in the CPC Images Work Area, we selected system AAIS by single-clicking it to
highlight it. Then we double-clicked the STOP All icon in the CPC Recovery window, as
shown in Figure C-4.
For this example, we selected Yes to confirm that the STOP All action should continue. The
panel shown in Figure C-6 displays the progress of the STOP All request for system AAIS.
The Load Task Confirmation panel was then displayed, as shown in Figure C-8 on page 509.
We clicked Yes on the Load Task Confirmation panel. This was followed by the Load
Progress panel, as shown in Figure C-9.
We replied with the address of our DASD device (413A) that contained SYS1.SADMP.
Note: If the SAD output data set has been allocated across multiple DASD volumes, specify
the first device address in response to the AMD001A message.
Note: In our SAD example, we received messages AMD091I and AMD092I as seen in
Figure 18-38. These messages were issued because the SAD output data set was
originally created using a data set name of SYS1.AAIS.SADMP. When the SAD program
was generated, we allowed the output SAD data set name to default to SYS1.SADMP.
Here are the high-level steps to perform when taking a stand-alone dump of a z/OS system
that resides in a sysplex. Assume that the z/OS system to be dumped is SYSA.
Procedure A
Important: Follow each step in order.
1. Perform the STOP function to place the SYSA CPUs into the stopped state.
2. IPL the stand-alone dump program.
3. Issue VARY XCF,SYSA,OFFLINE from another active z/OS system in the sysplex if message
IXC402D or IXC102A is not already present.
4. Reply DOWN to message IXC402D or IXC102A.
Notes on Procedure A
You do not have to wait for the stand-alone dump to complete before issuing the VARY
XCF,SYSA,OFFLINE command.
Performing Procedure A steps 3 and 4 immediately after IPLing the stand-alone dump will
expedite sysplex recovery actions for SYSA. This will allow resources held by SYSA to be
cleaned up quickly, and enable other systems in the sysplex to continue processing.
After the stand-alone dump is IPLed, z/OS will be unable to automatically ISOLATE
system SYSA via SFM, so message IXC402D or IXC102A will be issued after the VARY
XCF,SYSA,OFFLINE command or after the XCF failure detection interval expires. You must
reply DOWN to IXC402D/IXC102A before sysplex partitioning can complete.
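As an illustration, the command and reply sequence on another active system might look like
the following; the reply numbers are examples only:

VARY XCF,SYSA,OFFLINE
IXC371D CONFIRM REQUEST TO VARY SYSTEM SYSA OFFLINE. REPLY SYSNAME=SYSA
TO REMOVE SYSA OR C TO CANCEL.
R 23,SYSNAME=SYSA
IXC102A XCF IS WAITING FOR SYSTEM SYSA DEACTIVATION. REPLY DOWN WHEN MVS
ON SYSA HAS BEEN SYSTEM RESET
R 24,DOWN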
Do not perform a SYSTEM RESET in response to IXC402D or IXC102A after IPLing the
stand-alone dump. The SYSTEM RESET is not needed in this case because the IPL of
stand-alone dump causes a SYSTEM RESET to occur. After the stand-alone dump is
IPLed, it is safe to reply DOWN to IXC402D or IXC102A.
If there is a time delay between Procedure A steps 1 and 2, then use Procedure B.
Executing Procedure B will help to expedite the release of resources held by system
SYSA while you are preparing to IPL the stand-alone dump program.
Procedure B
Important: Follow each step in order unless otherwise stated.
1. Execute the STOP function to place the SYSA CPUs into the stopped state.
2. Perform the SYSTEM RESET-NORMAL function on SYSA.
3. Issue VARY XCF,SYSA,OFFLINE from another active z/OS system in the sysplex if message
IXC402D or IXC102A is not already present.
4. Reply DOWN to message IXC402D or IXC102A.
5. IPL the stand-alone dump program. This step can take place any time after step 2.
Notes on Procedure B
Performing Procedure B steps 3 and 4 immediately after doing the SYSTEM RESET will
expedite sysplex recovery actions for SYSA. This will allow resources held by SYSA to be
cleaned up quickly, and enable other systems in the sysplex to continue processing.
After a SYSTEM RESET is performed, z/OS will be unable to automatically ISOLATE
system SYSA via SFM, so message IXC402D or IXC102A will be issued after the VARY
514
XCF,SYSA,OFFLINE command or after the XCF failure detection interval expires. You must
reply DOWN to IXC402D/IXC102A before sysplex partitioning can complete.
Both of these procedures emphasize the expeditious removal of the failing z/OS system
from the sysplex. If the failed z/OS is not partitioned out of the sysplex promptly, some
processing on the surviving z/OS systems might be delayed.
Attention: Do not IPL the stand-alone dump program more than once. Doing so will invalidate
the dump of z/OS. To restart stand-alone dump processing, perform the CPU RESTART
function on the CPU where the stand-alone dump program was IPLed.
For additional information about stand-alone dump procedures, refer to z/OS V1R8.0 MVS
Diagnosis Tool and Service Aids, GA22-7589.
515
516
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
For information about ordering these publications, see How to get Redbooks on page 519.
Note that some of the documents referenced here may be available in softcopy only.
CICS Workload Management Using CICSPlex SM and the MVS/ESA Workload Manager,
GG24-4286
Getting the Most Out of a Parallel Sysplex, SG24-2073
DB2 on the MVS Platform: Data Sharing Recovery, SG24-2218
IMS/ESA Version 6 Guide, SG24-2228
IMS/ESA Data Sharing in a Parallel Sysplex, SG24-4303
Automating CICS/ESA Operations with CICSPlex SM and NetView, SG24-4424
OS/390 MVS Multisystem Consoles Implementing MVS Sysplex Operations, SG24-4626
OS/390 MVS Parallel Sysplex Configuration Cookbook, SG24-4706
CICS and VSAM Record Level Sharing: Recovery Considerations, SG24-4768
JES3 in a Parallel Sysplex, SG24-4776
IMS/ESA Parallel Sysplex Implementation: A Case Study, SG24-4831
IMS/ESA Version 6 Shared Queues, SG24-5088
IMS Primer, SG24-5352
Merging Systems into a Sysplex, SG24-6818
Systems Programmers Guide to: z/OS System Logger, SG24-6898
IMS in the Parallel Sysplex Volume I: Reviewing the IMSplex Technology, SG24-6908
IMS in the Parallel Sysplex Volume II: Planning the IMSplex, SG24-6928
IMS in the Parallel Sysplex Volume III: IMSplex Implementation and Operations,
SG24-6929
ABCs of z/OS System Programming Volume 9, SG24-6989
Server Time Protocol Planning Guide, SG24-7280
Implementing REXX Support in SDSF, SG24-7419
Other publications
These publications are also relevant as further information sources:
OS/390 Parallel Sysplex Recovery, GA22-7286
z/OS V1R8.0 MVS Diagnosis Tool and Service Aids, GA22-7589
Online resources
These Web sites are also relevant as further information sources:
IBM homepage for Parallel Sysplex
http://www.ibm.com/systems/z/advantages/pso/index.html/
Index
Numerics
cache structure 8
Cache structure (GBPs) 369
capacity
unused 4
capping 124
CDS 23
CF 368
CICS 351
data tables
CICS 355
DB2 369
failure 22
CICS recovery 352
IMS 413
locking 390
log stream 309
outages 9
physical information 15
receiver (CFR) 16
role in locking 5
role in maintaining buffer coherency 5
sender (CFS) 16
structure 184
structures 9, 22, 105
VTAM 326
CFRM
CDS allocation during IPL 45
Couple Data Set 9
initialization 46
policies 85, 115
policy 9, 21
Channel-to-Channel adapter (CTC) 6, 184
checkpoint data set (CQS) 415
CICS
APPLID 346
ARM 362
CF 351
CF data tables 355
CICSPlex 364
co-existing with IMS 400
commands for generic resources 333
deregister from a generic resource group 333
interregion communication (IRC) 347
introduction 346
journal 348
log 348
managing generic resources 333
named counter server 359
remove from a generic resource group 333
shared temporary storage 352
transaction routing 348
VTAM 347
with IMS DB 398
A
abnormal stop 68
action message retention facility (AMRF) 297
adding
a system image to a Parallel Sysplex 50
CF 120
alerts 304
applications
critical 84
ARM 252
automation 84
cancelling with ARMRESTART parameter 98
changing ARM policy 90
CICS 362
cross-system restarts 100
defining RRS 481
description 84
operating with ARM 98
policies 90, 425
policy management 90
restrictions 99
same system restarts 98
starting ARM policy 90
ARM (see Automatic Restart Manager) 394
ARM Couple Data Set 7
ARMRESTART 94
ARMWRAP 96
ATRHZS00 479
AUTOEMEM 26
automated monitoring 14
Automatic Restart Manager (ARM) 7, 84, 394
ARM 25
IMS 421
WebSphere MQ 466
automation
ARM 99
automove 488
B
base sysplex
definition 2
Batch Message Processing (BMP) region 403
BronzePlex 9
buffer
coherency 5
Business Application Services (BAS) 365
XCF 346
CICS Multi Region Option 8
CICSPlex 364
Coordinating Address Space (CAS) 365
Environment Services System Services (ESSS) 365
System Manager (CMAS) 365
CICSPlex System Manager 8
CICSPlex System Manager (CPSM) 363
CLEANUP 23
CLEANUP interval 64
clone 3
CMDSYS 298
commands
cancelling with ARMRESTART parameter 98
CF 14
CFCC 137
changing ARM policy 90
display 14
GRS 28
IMS 429
IRLM 429
JES2 25
miscellaneous 32
POPULATECF parameter of REBUILD 154
REBUILD POPULATECF 154
ROUTE 34
starting ARM policy 90
table of 492
V 4020,ONLINE,UNCOND 194
XCF 14
Common Queue Server (CQS) 410
common time 4
need for, in a sysplex 4
connection disposition 109
connection state 109
CONNFAIL 75
console 27
activation 47
buffer shortage 295
CMDSYS parameter 298
EMCS 299
extended MCS 286
group 290
initialization 40
IPL process 292
master 286
message flood automation (MFA) 301
message scope (MSCOPE) 289
messages 28
remove 304
removing 291
ROUTCDE 293
SYSCONS 291
z/OS management 304
console groups 288
consoles 283
sysplex
naming 288
consoles in a sysplex
managing 283
D
D XCF,S,ALL 67
DA command
display active panel example 238
E
element 466
EMCS console 299
ETR
mode 31
Event Notification Facility (ENF) 310
Expedited Message Handler Queue (EMHQ) structure 415
exploiting installed capacity 4
extended MCS (EMCS) consoles 287
extended MCS consoles 286
F
failed-persistent state 326
failure
CF 22
IPL failure scenarios 52
processor 443
Failure Detection Interval 6
failure management
restart management 3
Fast Path databases 406
fencing 64, 75
G
Generic Resources 324, 402
CICS 333
managing 330
TSO 334
Global Resource Serialization (GRS) 8, 28
initialization 40
Global Resource Sharing (GRS)
initialization 47
GoldPlex 10
Group Buffer Pool (GBP) 369
GRS
commands 28
description 8
ring initialization in first IPL 47
H
Hardware Management Console (HMC) 102, 122, 291
image profile 122
reset profile 122
health check 304
Health Checker 258
RRS 479
HFS 485
HFSplex 487
Hierarchical File System (HFS) 485
HMC
SAD 503
HZSPRINT 266
HZSPROC 258
I
IBM Tivoli OMEGAMON z/OS Management Console 304
IEARELCN 304
IEARELEC 304
IEECMDPF program to create command prefixes 35
IEEGSYS 300
IMS
ARM 421
CF failures 417
CF structures 408, 413
Connect 323
data sharing 131, 405-406
data sharing with shared queues 409
J
Java 305
Java Batch Processing (JBP) regions 403
Java Message Processing (JMP) region 403
JES2
checkpoint 204
checkpoint definitions 25
clean shutdown on any JES2 in a MAS 220
cold start 214
on an additional JES2 in a MAS 216
commands 25
hot start 219
in a Parallel Sysplex 201
loss of CF checkpoint reconfiguration 213
monitor 227
Multi-Access Spool (MAS) 26, 202
reconfiguration dialog 132
remove checkpoint structure 132
restart 213
SDSF JC command 248
SDSF MAS panel 247
shutdown 220
thresholds 204
warm start 216
JES2 CF checkpoint reconfiguration 208
JES2AUX 203
JES2MON 203
JES3 271
in a sysplex 273
networking with TCP/IP 277
operator commands 281
JESPLEX 202
JESXCF 203, 273
L
list structure 8, 388
list structure (SCA) 369
Load Balancing Advisor (LBA) 323
introduction 341
lock structure 8, 369
lock structures 391
lock table entry (LTE) 391
locking
DB2 and IRLM 390
locking in a sysplex 5
LOG command 233
log records 4
log token (key) 308
LOGREC 7, 320
disable 133
system logger considerations 307
logstream
management 320
M
master console 286
MCS
extended 286
Message Flood Automation (MFA) 301
Message Processing Regions (MPR) 403
Message Queue (MSGQ) structure 415
message queues 404
message scope (MSCOPE) 289
messages
console 28
IXC207A 52
IXC211A 54
monitoring
automation product 14
monitoring JES2 227
MSCOPE implications 289
Multi-Access Spool (MAS) 26, 36, 202
multiple console support (MCS) 284
multiregion operation (MRO)
CICS 347
multisystem console support 284
N
named counter server 359
NCS structure 360
Network File System (NFS) 485
networking in a Parallel Sysplex
CICS generic resources 333
deregister CICS from a generic resource group 333
deregister TSO from a generic resource group 334
determine status of generic resources 330
managing CICS generic resources 333
managing generic resources 330
remove CICS from a generic resource group 333
NJE 277
Nucleus Initialization Program (NIP) 40
O
OLDS data sets 404
OMEGAMON 304
OMVS 484
Open Systems Adapter (OSA) 324
operator
commands table 492
OPERLOG 7, 133, 233
SDSF OPERLOG panel 235
system logger considerations 307
outage
masking 3
P
Parallel Sysplex
abnormal shutdown 68
activation 45
CFRM initialization 46
checking if SFM is active 61
checking that system is removed 68
CICS 346
consoles 283
Coupling Facility 102
definition 2
description 2
description of CF 8
description of GRS 8
IMS 406, 426
IPL 39-40
of additional system 50
of first system after abnormal shutdown 48
of first system after normal shutdown 41
overview 40
problem scenarios 41
problems in a Parallel Sysplex 52
IPL of additional system 50
IPLing scenarios 41
managing 32
managing JES2 in a Parallel Sysplex 194
MVS closure 63
normal shutdown 63
partitioning 63
remove z/OS systems 59-60
removing system 62
running a standalone dump on a Parallel Sysplex 71
SAD 71
SFM settings 62
shutdown overview 59-60
shutdown with SFM active 66, 69
shutdown with SFM inactive 64, 69
stand-alone dump example 503
sysplex cleanup 67, 70
Test Parallel Sysplex 10
TOD clock setting in Parallel Sysplex 43
wait state
X'0A2' 66
Parallel Sysplex over Infiniband (PSIFB) 17
partitioning 63, 69
PATHIN 44, 51, 185
displaying devices 187-188
displaying structures 188
recommended addressing 185
PATHOUT 44, 51, 185
recommended addressing 185
peer monitoring 6
pending state 129
persistent structures 134
planned shutdown 63
PlatinumPlex 10
policies
ARM 425
ARM policy management 90
policy
CFRM 21
change 129
pending state 129
information 166
POPULATECF parameter of REBUILD command 154
POSIX compliant 484
power outage 48
preface xv
processor
failure 443
R
RACF
data sharing 131, 133
initialization 47
S
SAD (see also stand-alone dump) 64
SCA 388
SCHEDULING ENVIRONMENT (SE) command 248
SDSF
DA command 237
JC command 248
MAS panel 247
allocation 45
cache 369
DB2 368369
duplexing
system-managed 113
list 369
list (SCA) 388
lock 369
NCS 360
rebuilding 420
resource 415
WebSphere MQ 464
structure full monitoring 119
Structure Recovery Data Sets (SRDS) 415
structures 9
as possible cause of IPL problem 56
CF 22, 105
IMS 408
connection state 109
disposition 109
during CFRM initialization 46
generic resources 330
IMS 413
list of 499
managing CF 147
move to alternate CF 129
rebuild failure 161
rebuilding 147
rebuilding in another CF 152
rebuilding in either CF 148
remove JES2 checkpoint structure 132
remove LOGREC 133
remove OPERLOG 133
remove RACF data sharing 133
remove signalling 133
signalling structures during first IPL 44
stopping rebuild 161
that support rebuild, rebuilding of 147
symbol 36
sympathy sickness 76
SYSCONS 291
SYSLOG 133, 233
sysplex 27, 286
console 27
DB2 394
definition 2
environment 2
GoldPlex 10
IMS 406
JES3 273
master console 286
partitioning 63
PlatinumPlex 10
sympathy sickness 76
three-way 11
timer 31
Sysplex Couple Data Set 6
sysplex couple dataset (CDS) 74
Sysplex Distributor 323
introduction 339
T
TCP Sysplex Distributor 7
TCP/IP 277, 323
commands 338
introduction 336
test environment
z/VM 11
three-way sysplex 11
time
common 4
consistency 5
time zone
offset setting 43
Time-Of-Day (TOD)
clock setting 43
Tivoli Enterprise Portal 305
transaction processing (TP) 346
transaction routing 348
transport class 192
troubleshooting
RRS 480
TSO
deregister from a generic resource group 334
monitoring a sysplex 36
two-phase commit 468
U
ULOG command 235
V
value-for-money 4
VARY command
V 4020,ONLINE,UNCOND 194
VARY XCF 74
Virtual IP Address (VIPA), dynamic 339
Virtual Telecommunications Access Method (VTAM) 323
volatility 478
VSAM
cache structures 414
VTAM 347
CF 326
Generic Resources 324
VTAM Generic Resources 7
VTAM in a Parallel Sysplex
CICS generic resources 333
commands for generic resources 330
deregister CICS from a generic resource group 333
deregister TSO from a generic resource group 334
determine status of generic resources 330
managing CICS generic resources 333
managing generic resources 330
remove CICS from a generic resource group 333
W
WADS data sets 404
WebSphere MQ
Automatic Restart Manager (ARM) 466
commands 462
introduction 456
ISPF panels 461
monitoring 461
structure 464
WEIGHT 75
WLM
policies 85
Work Context 468
Workload Manager (WLM) 7, 366
X
XCF 6, 489
CICS 346
commands 14
connectivity, unable to establish 56
initialization 40
signalling 184
signalling paths 24
signalling services 6, 194
stalled member detection 197
starting 46
XCF initialization in IPL 44
XCF initialization restarted in failed IPL 52
XCF/MRO 347
Z
z/OS
system log 233
z/OS Health Checker 258
z/OS Management Console 304
z/OS UNIX
files 487
introduction 484
z/VM
test environment 11
zFS 486
administration 489
Back cover

Understanding Parallel Sysplex
Operations best practices

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization.
Experts from IBM, Customers and Partners from around the world create timely technical
information based on realistic scenarios. Specific recommendations are provided to help you
implement IT solutions more effectively in your environment.

ISBN 0738432687