XRF Guide
Release 1
SC33-1671-01
CICS Transaction Server for VSE/ESA IBM
Note!
Before using this information and the product it supports, be sure to read the general information under “Notices” on page 93.
This edition applies to Release 1 of CICS Transaction Server for VSE/ESA, program number 5648-054, and to all subsequent
versions, releases, and modifications until otherwise indicated in new editions. Make sure you are using the correct edition for the
level of the product.
The CICS for VSE/ESA Version 2.3 edition remains applicable and current for users of CICS for VSE/ESA Version 2.3.
Order publications through your IBM representative or the IBM branch office serving your locality.
At the back of this publication is a page entitled “Sending your comments to IBM”. If you want to make any comments, please use
one of the methods described there.
Copyright International Business Machines Corporation 1988, 1999. All rights reserved.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Preface
Notes on terminology
Determining if a publication is current
Road map
Bibliography
Books from VSE/ESA 2.5 base program libraries
Books from VSE/ESA 2.5 optional program libraries
Notices
Trademarks and service marks
Index
The appendixes provide a checklist of what you do to create an XRF complex, and
also a sample implementation with suitable definitions.
Additional task-specific information about XRF is given in other CICS books, and
this book provides references to those books.
Notes on terminology
There is a glossary of terms of particular relevance to XRF in this book.
There is a general glossary of CICS terms in the CICS Glossary GC33-1649.
Determining if a publication is current
IBM regularly updates its publications with new and changed information. When
first published, both the printed hardcopy and the BookManager softcopy versions
of a publication are in step, but subsequent updates are normally made available in
softcopy before they appear in hardcopy.
For CICS Transaction Server for VSE/ESA Release 1 books, softcopy updates
appear regularly on the Transaction Processing and Data Collection Kit CD-ROM,
SK2T-0730-xx and on the VSE/ESA Collection Kit CD-ROM, SK2T-0060-xx. Each
reissue of the collection kit is indicated by an updated order number suffix (the -xx
part). For example, collection kit SK2T-0730-20 is more up-to-date than
SK2T-0730-19. The collection kit is also clearly dated on the front cover.
For individual books, the suffix number is incremented each time it is updated, so a
publication with order number SC33-0667-02 is more recent than one with order
number SC33-0667-01. Updates in the softcopy are clearly marked by revision
codes (usually a “#” character) to the left of the changes.
Note that book suffix numbers are updated as a product moves from release to
release, as well as for updates within a given release. Also, the date in the edition
notice is not changed until the hardcopy is reissued.
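The suffix rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes that an order number always ends in a hyphen followed by a numeric suffix, as in the examples quoted above.

```python
from typing import Tuple


def parse_order_number(order_number: str) -> Tuple[str, int]:
    """Split an order number such as 'SC33-0667-02' into ('SC33-0667', 2)."""
    base, _, suffix = order_number.rpartition("-")
    return base, int(suffix)


def is_more_recent(a: str, b: str) -> bool:
    """Return True if a is a later level of the same publication than b."""
    base_a, suffix_a = parse_order_number(a)
    base_b, suffix_b = parse_order_number(b)
    if base_a != base_b:
        # Suffixes are only comparable within one publication or kit.
        raise ValueError("order numbers refer to different publications")
    return suffix_a > suffix_b
```

For example, is_more_recent("SK2T-0730-20", "SK2T-0730-19") is true, matching the collection kit example in the text.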
Road map
Table 2. Getting started road map
If you want to... Refer to...
XRF does not eliminate outages. It minimizes the duration of certain kinds of
outage. Even if all unplanned failures, caused by both hardware and software
failures, could be eliminated, there would still be planned downtime for
maintenance, configuration changes, or migration. XRF reduces the impact of both
unplanned and planned outages on the end user, and thus provides a higher level
of availability than a non-XRF CICS system.
CICS with XRF is based on the use of an active CICS system, which supports the
processing requests from the end user, in combination with an alternate CICS
system, which can take over from the active if the active fails or if it is taken out of
service.
The active and alternate systems must be at the same level. For example, you
cannot match a CICS Transaction Server for VSE/ESA Release 1 active system
with a CICS/VSE 2.3 alternate. Also, if the active and alternate CICS systems
are running on separate VSE operating systems, it is advisable to use the same
level of VSE for both.
This partially initialized alternate CICS system lets you provide greater availability to
your end users. It can do this by reacting automatically to problems that cause
interruptions in service. Through the CICS availability manager (CAVM), the
active constantly communicates with the alternate, so that the alternate can record
changes in terminal usage—tracking—and monitor the well-being of the active
system—surveillance. Surveillance and tracking information is passed through the
CAVM data sets—the message data set and the control data set. These data
sets are on shared DASD, accessible to both active and alternate CICS systems.
When the alternate CICS system concludes that the active has failed, or when it is
instructed to act, it has access to all the necessary information and resources to
take over from the active system and reestablish service with the minimum of
interruption.
When the alternate takes over the running of the CICS system, it performs an
emergency restart similar to an emergency restart after the failure of a non-XRF
CICS system. Resources are recovered in the same way as they are in an
emergency restart. However, with XRF, the whole emergency restart process is
faster. This is because:
The alternate is already partially initialized.
The restart is initiated sooner because of the surveillance activity.
Most of your existing emergency restart procedures remain valid for XRF, because
XRF builds on the existing CICS emergency restart facilities.
The alternate CICS is only partially initialized. It cannot complete its initialization
until its active partner has terminated. It cannot do any normal processing until it
has taken over and become the new active system. The alternate takes up very
little resource, so, if you are using two VSE images, the second is largely available
for other work.
Terminal capability
Although XRF is made up of active and alternate CICS systems, it presents a
single-system image to the end user at a VTAM terminal. A terminal only has a
working session with an active CICS system.
When VTAM terminals log on, the alternate tracks them, and after a takeover it
tries to reestablish their sessions.
After a takeover, end users do not normally have to sign on to CICS, because
signon security may be passed from the active to the alternate CICS system. If this
facility is not implemented, end users have to follow their normal procedures for
emergency restart. If there is a task in flight at the time of takeover, that task must
be reentered.
More detailed information about different types of terminals and their XRF
capabilities is given in Chapter 5, “The terminal network” on page 37.
In either case, XRF offers end users increased system availability. There is more
information about the causes of a takeover in Chapter 2, “Types of outage handled
by CICS with XRF” on page 7.
When a failure has occurred and the alternate has become the active system, you
should initialize another alternate, and thus maintain the extended recovery facility.
To make changes to your CICS system, you can initiate a takeover to an alternate
CICS system that has already had software maintenance or its configuration
changed. That alternate becomes the new active, which can then be backed up
with a new alternate.
This book describes the decisions you make about XRF. You decide under which
conditions a takeover occurs, whether to restart failed active systems rather than
have a takeover, whether the operator has to authorize a takeover, and how much
involvement the operator has in the takeover.
Your installation might already have procedures for dealing with some of these
other types of failure—an uninterruptible power supply, perhaps, or strict
programming standards to avoid the risk of recurrent software failures.
The figures in this chapter show a multi-VSE environment. This environment could
be provided by a single CPC or by two separate CPCs. The single CPC may be
partitioned, logically (using the PR/SM feature) or physically, into a multi-VSE
environment. This environment provides cover against VSE, VTAM, and CICS
failures, as described in “XRF environments” on page 2. To guard against a CPC
failure, you require two separate CPCs. XRF running in one VSE image normally
covers only against failures in the CICS address space, and against outages that
would routinely be caused by CICS planned maintenance.
CICS outage
Figure 2 illustrates an XRF system in which the active CICS fails, resulting in the
loss of terminal sessions and the breakdown of information sent to the alternate via
shared DASD.
[Figure 2: active CICS on VSE1 (VSE/ESA) and alternate CICS on VSE2 (VSE/ESA), with shared information between them; the end user connects through a boundary network node and a communication controller.]
XRF provides a rapid restart after the failure of the active CICS.
VTAM outage
Figure 3 illustrates an XRF system in which the VTAM serving the active system
fails, resulting in the loss of terminal sessions and the breakdown of information
sent to the alternate via shared DASD.
[Figure 3: active CICS on VSE 1 (VSE/ESA) and alternate CICS on VSE 2 (VSE/ESA), with shared information between them; the end user connects through a boundary network node and a communication controller.]
A VTAM failure may result in a takeover, or you may restart VTAM and leave the
active running. If VTAM on the active’s side fails, it drives the TPEND exit for the
active CICS, which can then decide whether a takeover is the appropriate action.
You may select beforehand the situations where a takeover is necessary, by coding
a global user exit program for the XXRSTAT exit, or adding code to the overseer
program to cause the takeover or other action. For more information about
XXRSTAT and other global user exits, see the CICS Customization Guide.
If a takeover is not selected (the CICS default action), the active continues, in
degraded mode.
See “Multi-VSE, single-region XRF configuration” on page 25 and “User exit for
VTAM failure” on page 62 for more information.
VSE outage
Figure 4 illustrates an XRF system in which the VSE serving the active system
fails, resulting in the loss of terminal sessions and the breakdown of information
sent to the alternate via shared DASD.
Note: XRF cannot guarantee recovery for every type of VSE outage.
[Figure 4: active CICS on VSE 1 (VSE/ESA) and alternate CICS on VSE 2 (VSE/ESA), with shared information between them; the end user connects through a boundary network node and a communication controller.]
If you have two VSE images, you can run the active CICS on one VSE, and have
the alternate CICS partially initialized on the other VSE. VTAM terminals that you
want to switch automatically from the active to the alternate, without having to log
on to VTAM again, are connected to both CICS systems through a 3745/3725/3720
communication controller.
Without XRF, a VSE (or hardware) failure means that CICS could be unavailable
for a long time. With XRF, when the active can no longer function properly because
of a VSE or hardware failure, the alternate is notified, through the CAVM,
of the active’s failure and initiates a takeover.
CPC outage
To cope with the failure of a CPC, and the other failures detailed previously, the
alternate CICS has to run in a separate CPC. The second CPC could be either in
a physical partition in the same processing system as the active, or in a physically
separate processing system. Running the active and alternate in different 3090s,
for example, provides XRF cover against a failure of the active’s 3090.
For a CPC failure, like a VSE failure, the alternate cannot always be certain of what
has happened to its active counterpart. The operator has to confirm to the
alternate that its active counterpart has failed because of a CPC failure and that a
takeover can go ahead. For more information, see “Checking for termination of the
active” on page 19.
Planned takeover
CICS with XRF gives you improved availability if a failure occurs. It also allows you
to shut down the active system and instruct the alternate to take over to do CICS
software maintenance, or to introduce changes into your CICS system more easily.
In a multi-VSE or two-CPC environment, XRF also helps you to take care of the
maintenance of the CPCs or of other software.
There are some maintenance activities that must be performed concurrently to both
the active and the alternate systems, and so upgrading through a takeover is
impossible. Operation in a single VSE image is also more restrictive, because
some changes cannot be made without an IPL of VSE. This applies, for example,
to maintenance of any CICS software that must reside in the SVA (shared virtual
area).
For more information about the use of XRF takeovers as a maintenance aid, see
Chapter 7, “XRF and other products” on page 67.
XRF gives you the flexibility, through a planned takeover, to choose when you carry
out maintenance. You probably would not want to perform a takeover during a
peak period, while there are many end users on the system, unless there is a good
reason for it. But you might choose to make changes more frequently, to tables for
which RDO is not available, or to parameters, or to apply PTFs, for example.
To initiate a takeover, your operator can use the CEBT transaction, or an extension
to the CEMT transaction, both described in “Supplied transactions for controlling the
alternate” on page 63.
An XRF sequence
Figure 5 on page 12 shows a possible XRF sequence. The stages in the
sequence are described in the following five sections:
“1. Initialization” on page 13.
“2. Synchronization” on page 15.
“3. Surveillance and tracking” on page 15.
“4. Takeover” on page 16.
“5. After takeover” on page 21.
[Figure 5: timeline of a possible XRF sequence. Each CICS system passes through initialization, synchronization, and surveillance and tracking. Labeled events include an operator-initiated shutdown, an operator-initiated takeover, the alternate taking over resources to become an active with a higher level of maintenance, maintenance applied to CICS1 followed by its restart as an alternate, and periods with no cover by an alternate.]
1. Initialization
[Figure 6: the active CICS is processing and has an active session through the NCP in the communication controller to the boundary network node; the alternate CICS is starting and has no path to the communication controller yet. Each system has its own CAVM, and the alternate is beginning access to the control data set. The system log and shared data sets are also shown.]
Figure 6 shows that you need a pair of CICS systems to use XRF, the active and
the alternate running in a shared POWER environment. You start the active and
the alternate separately, and you can start them concurrently, or in either order.
The startup job streams for active and alternate must be very similar except for
some of the system initialization parameters (probably overrides), and certain data
set definitions.
The active and alternate systems have their own local catalog, dump, and auxiliary
trace data sets. They either share or have their own extrapartition transient data
data sets. The alternate has its own transient data destination, CXRF, which is
dynamically defined and is available to the alternate before takeover. For guidance
information about how to use CXRF, see the description of the DFHCXRF data set
in the CICS System Definition Guide. Apart from such minor differences, the active
and alternate must be compatible, with the same recoverable resource definitions.
The active and alternate sign on to the CICS availability manager (CAVM) at the
start of initialization. The CAVM is the mechanism that allows actives and
alternates to coordinate their processing. The CAVM uses a shared pair of data
sets: a control data set and a message data set. Each active and each alternate
has its own CAVM (in the CICS partition), and the active and alternate pair share
the CAVM data sets.
CAVM rejects a request from a CICS job to sign on as the active if the control data
set shows that an active is already present, or that a takeover is in progress. This
ensures that the integrity of files and databases cannot be lost because of
uncontrolled concurrent updating by two or more actives. When an active or
alternate signs on, it starts to write its own surveillance signals, and to look for its
partner’s surveillance signals.
Once a pair of CAVM data sets has been used by the active and alternate systems
that share a generic applid, those data sets may not subsequently be used by
another active or alternate with a different generic applid.
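The sign-on rules described above can be modeled with a small Python sketch. It is a toy model, not the CAVM implementation: the ControlDataSet class, its fields, and the sign_on function are hypothetical names introduced only to show the two checks (one active at a time, and one generic applid per pair of data sets).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlDataSet:
    """Toy stand-in for the state recorded in the CAVM control data set."""
    generic_applid: Optional[str] = None  # fixed by the first system to sign on
    active_present: bool = False
    takeover_in_progress: bool = False


def sign_on(cds: ControlDataSet, generic_applid: str, role: str) -> bool:
    """Return True if the sign-on is accepted, False if it is rejected."""
    if role not in ("active", "alternate"):
        raise ValueError("role must be 'active' or 'alternate'")
    # Once used, the data sets stay tied to a single generic applid.
    if cds.generic_applid is not None and cds.generic_applid != generic_applid:
        return False
    # Only one active at a time, and none while a takeover is in progress.
    if role == "active" and (cds.active_present or cds.takeover_in_progress):
        return False
    cds.generic_applid = generic_applid
    if role == "active":
        cds.active_present = True
    return True
```

In this model a second job that tries to sign on as the active is refused, which is the property that protects files and databases from concurrent updating by two actives.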
For more guidance information about the CAVM data sets, see the CICS System
Definition Guide.
The active completes its initialization normally. It then begins to provide a service
to its end users.
The alternate cannot be fully initialized because, until it takes over from its active
counterpart, it does not own the resources that can be used by only one system at
a time, such as the system log and user data sets. The alternate is initialized only
to the point at which it can monitor the active. VTAM must be running before the
The alternate cannot perform any active CICS function (for example, users cannot
log on to it), and it takes up very little resource. The only means of external
communication with the alternate is through the VSE console communication
interface or the overseer. The VSE console communication interface command is
limited to a small set of CEBT commands, described in “Supplied transactions for
controlling the alternate” on page 63. The overseer is described in “The overseer”
on page 62. The alternate carries out surveillance and tracking, writing its own
surveillance signals, reading the active’s surveillance signals, and reading
messages describing the status of terminals in the active.
2. Synchronization
When the active is initialized, and it detects that the alternate has signed on to
CAVM, they are both at the synchronization stage. The active uses CAVM
message services to send a stream of messages describing the current state of all
its VTAM terminals via the message data set to the alternate. This is called the
catch-up process, which allows the alternate to build a complete picture of the
active’s terminal resources and the status of those terminals. In this way, the
alternate is aware of the existing terminal network, and can track any VTAM
terminals.
If the alternate stops for any reason, and the active runs by itself for some time
before another alternate is started, the same catch-up process is used for the new
alternate.
Then the active and alternate enter the surveillance and tracking stage.
3. Surveillance and tracking
The active sends out surveillance signals to the alternate, and the alternate
monitors them, checking for any sign of failure in the active. If the active itself
detects a failure that prevents it from continuing to provide a service, it signs off
abnormally from the CAVM to inform the alternate of its failure. A CPC, VSE, or
serious CICS failure causes the active’s surveillance signals to stop.
While running normally, the active uses CAVM message services to inform the
alternate about changes made to the terminals installed in the system. The active
also informs the alternate of changes to the installed, logged-on, and logged-off
state of all VTAM terminals and sessions as they are acquired or released. In this
way, the alternate tracks the installed, logged-on and logged-off state of all VTAM
terminals.
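As an illustration of catch-up and tracking, the following Python sketch shows an alternate building its picture of the terminal network from a stream of messages. The message layout (a terminal identifier paired with a state) and the function name are assumptions made for the example; the real CAVM message format is not described here.

```python
from typing import Dict, Iterable, Tuple

# Each message is (terminal_id, state); the states are those named in the text.
Message = Tuple[str, str]
VALID_STATES = {"installed", "logged-on", "logged-off"}


def apply_messages(picture: Dict[str, str],
                   messages: Iterable[Message]) -> Dict[str, str]:
    """Update the alternate's view of the terminals with each message in turn."""
    for terminal_id, state in messages:
        if state not in VALID_STATES:
            raise ValueError(f"unknown terminal state: {state}")
        picture[terminal_id] = state  # the latest message for a terminal wins
    return picture


# Catch-up: the active describes every terminal it currently knows about.
picture = apply_messages({}, [("T001", "logged-on"), ("T002", "installed")])
# Tracking: later changes arrive as they happen.
apply_messages(picture, [("T002", "logged-on"), ("T001", "logged-off")])
```

The same function serves both stages, which mirrors the point made earlier that a newly started alternate uses the same catch-up process before it begins tracking.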
[Diagram labels: control data set (active state and alternate state, sent at startup; active and alternate surveillance signals, sent by one system and used for surveillance by the other); message data set (messages about resources sent by the active and used by the alternate for tracking).]
Figure 7. Use of the CAVM data sets for surveillance and tracking
4. Takeover
A takeover can be started by several events:
The alternate detects that the active has signed off abnormally from the CAVM.
The alternate detects the disappearance of the active’s surveillance signal.
The operator or an MRO-connected partition that is taking over sends the
alternate a CEBT PERFORM TAKEOVER command.
The operator issues a CEMT PERFORM SHUTDOWN TAKEOVER or a CEMT
PERFORM SHUTDOWN IMMEDIATE command to the active.
The type of event and the TAKEOVR system initialization parameter determine
whether a takeover occurs and also the level of operator involvement in that
Active signs off abnormally from the CAVM: If the active signs off abnormally
from the CAVM, for whatever reason, and TAKEOVR=COMMAND is not specified,
the alternate starts a takeover.
TAKEOVR=AUTO
The alternate initiates a takeover automatically, when the alternate delay
interval (ADI) has elapsed.
TAKEOVR=COMMAND
The alternate does not initiate a takeover.
TAKEOVR=MANUAL
After the ADI interval has elapsed, the alternate sends a message asking the
operator whether it should try to take over, or ignore the apparent failure of the
active. If the operator can repair the active, the alternate can be told to ignore
the loss of the surveillance signal. If the active recovers, the alternate detects
the reappearance of its surveillance signal, cancels the message to the
operator, and continues with its standby role. If the operator cannot repair the
active, the alternate should be told to begin takeover.
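The way the event and the TAKEOVR setting combine can be summarized in a short Python sketch. The event names and return values are hypothetical labels. The sketch assumes that the three TAKEOVR descriptions above apply when the surveillance signal has been missing for the whole alternate delay interval, and that with TAKEOVR=COMMAND an abnormal sign-off leads to no automatic action.

```python
def alternate_action(event: str, takeovr: str) -> str:
    """Return 'takeover', 'ask_operator', or 'none' for an event and setting."""
    takeovr = takeovr.upper()
    if takeovr not in ("AUTO", "MANUAL", "COMMAND"):
        raise ValueError(f"unknown TAKEOVR value: {takeovr}")
    if event == "abnormal_signoff":
        # A takeover starts unless TAKEOVR=COMMAND is specified.
        return "none" if takeovr == "COMMAND" else "takeover"
    if event == "surveillance_lost_adi_elapsed":
        return {
            "AUTO": "takeover",        # automatic once the ADI has elapsed
            "MANUAL": "ask_operator",  # operator decides: take over or ignore
            "COMMAND": "none",         # the alternate does not initiate a takeover
        }[takeovr]
    raise ValueError(f"unknown event: {event}")
```

Operator commands such as CEBT PERFORM TAKEOVER are left out of the sketch because they request a takeover explicitly.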
[Diagram labels: the active CICS is closing down, and its access to the NCP and to the data sets is discontinued; the alternate CICS is taking over through the CAVM (control data set and message data set) and accesses the system log and shared data sets to enable takeover and continued running.]
Figure 8. Takeover
Takeover begins
Once it has been decided that the alternate will try to take over from the active, a
takeover request is passed to the CAVM, as shown in Figure 8. In most cases this
request will be accepted, but may be rejected for any of the following reasons:
The active has already signed off normally.
The active is not the same active as the one that the alternate had been
tracking. The CAVM detects that it is a new active, probably because of a
restart-in-place. Here, the alternate cannot continue its role, and a new
alternate should be started.
The active and alternate are on different VSE images, and the alternate has not
been monitoring the active’s surveillance signals long enough to assess the
difference between the time-of-day clocks on the two VSE images.
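The three rejection cases listed above can be expressed as a simple check. This Python sketch is illustrative only: the parameter names are hypothetical, and the order in which the conditions are tested is an assumption, since the text does not specify one.

```python
from typing import Optional


def takeover_rejection_reason(
    active_signed_off_normally: bool,
    same_active_as_tracked: bool,
    different_vse_images: bool,
    clock_difference_assessed: bool,
) -> Optional[str]:
    """Return why a takeover request is rejected, or None if it is accepted."""
    if active_signed_off_normally:
        return "active has already signed off normally"
    if not same_active_as_tracked:
        # For example after a restart-in-place; a new alternate is needed.
        return "active is a new instance; start a new alternate"
    if different_vse_images and not clock_difference_assessed:
        return "time-of-day clock difference not yet assessed"
    return None
```

A return value of None corresponds to the usual case in which the CAVM accepts the request.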
During takeover, the alternate uses two different mechanisms to try to force the
termination of the active CICS job, as follows:
1. If the active is still signed on to the CAVM, the alternate uses the surveillance
mechanism to try to pass a “takeover-requested” message to the active,
including a “dump” or “no-dump” indicator. If the active receives the message,
it responds by issuing an abend (Abend Code 0206) and eventually signs off
abnormally from the CAVM.
2. If the active job is still executing, the alternate also issues a CANCEL command
(prefixed by a POWER routing command in a multi-VSE configuration). The
CANCEL command is issued if the active is unable to respond to the
alternate’s request to take over.
Next, the alternate starts to process the command list table (CLT). You build your
CLT to describe what will happen at takeover. It provides the authorization to
cancel the active system, and can also contain routing information, VSE system
commands, and messages to the operator. For more information, see “Command
list table (CLT)” on page 54.
If POWER replies that the job is still executing, the alternate continues to check the
status until the interval defined by the XRFTODI system initialization parameter
expires. After that interval, the alternate prompts the operator (with message
DFHXA6561 or DFHXA6562) to investigate why the job has not stopped. There
might be a POWER problem, or an authorization problem in the CLT. The
alternate also offers this prompt if POWER is not running, or does not respond.
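The status check described above amounts to polling with a deadline. The following Python sketch shows that pattern under stated assumptions: query_job_status is a hypothetical callback standing in for the query to POWER, the interval is taken in seconds, and a missing reply leads straight to the operator prompt. None of these details is taken from the CICS implementation.

```python
import time
from typing import Callable, Optional


def wait_for_active_job_end(
    query_job_status: Callable[[], Optional[str]],
    xrftodi_seconds: float,
    poll_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the active job ends or the interval expires.

    query_job_status returns "ended", "executing", or None for no reply.
    The result is "continue_takeover" or "prompt_operator".
    """
    deadline = clock() + xrftodi_seconds
    while True:
        status = query_job_status()
        if status == "ended":
            return "continue_takeover"
        if status is None:
            # POWER is not running or did not respond.
            return "prompt_operator"
        if clock() >= deadline:
            # Still executing after the interval: ask the operator to investigate.
            return "prompt_operator"
        sleep(poll_seconds)
```

Passing the clock and sleep functions as parameters keeps the sketch easy to exercise without real delays.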
When active and alternate are running in different VSE images, POWER might
continue to tell the alternate that the active job is still running even though the
active’s VSE or CPC has failed. Here, the alternate cannot complete its takeover
without operator intervention. Another possibility is that the active job is still
running, and either never received the CANCEL command, or received it but could
not terminate because a system error necessitating a PCANCEL command has
occurred.
If the active’s VSE has not failed, the operator must ensure that the active job really
has terminated before informing the alternate that the active job has ended.
If the active’s VSE has failed, and the operator decides that an IPL is required, the
operator should stop the processors of the failed VSE and IPL the system, after
Here, an internal record is kept that the VSE image, identified by its POWER
SYSID and time and date of IPL, has failed. Other alternates examine this record
while they are taking over, to try to avoid operator intervention.
The alternate cannot complete takeover until the operator replies to its question,
unless either of the following occurs:
The alternate receives a late reply from POWER that the active job has
terminated
A previous reply to another alternate’s message has already confirmed CPC or
VSE failure.
In either case, the operator does not have to reply, and takeover continues.
If the clock on the new active is fast compared to that of the old active, takeover
resumes without waiting.
If you submit the archiving job for execution on the active’s VSE, and that VSE fails
while an archiving job is running, the job has to be resubmitted, and takeover might
be delayed until it finishes. This problem could be avoided by making a practice of
submitting the archiving job for execution on the other VSE.
Failure analysis
Diagnostic information about the failure of the active is provided by the termination
VSE SDUMPS. Taking a dump is a part of the CICS job, and the alternate cannot
complete its takeover until the active job has taken its dump and terminated.
CICS provides an offline dump analyzer, DFHPD410, to interpret and format the
VSE SDUMP, and thereby simplify the task of problem determination. You are
recommended to specify (via the JCL OPTION statement) SYSDUMP as the
If the active is running normally and it is being taken over because of a command
from the operator or from another CICS partition, no dump is taken, unless
requested by the command.
5. After takeover
In a multi-VSE environment, after the takeover, the operator manually switches any
devices that need to be physically connected to the new active: perhaps local
VTAM terminals, or other software outside the control of CICS.
Depending on the options you set, end users of VTAM terminals do not normally
have to sign on again after their terminals have been switched to the new active.
As in an emergency restart, an end user might have to reenter the last transaction,
if that transaction was in flight when the active failed.
In a multi-VSE environment, you must ensure that databases and other shared
information, like the system log, are placed on shared DASD. (Some shared
information, such as user journals, may be on tape.) Data specific to the active or
to the alternate does not have to be on shared DASD. If you want to collect data
across a takeover, you might have to modify utilities to read unique data from the
old active and from the new active.
Clearly, XRF involves new and changed procedures for your installation. By careful
planning and organization, you can minimize this overhead.
Performance
The CICS Performance Guide contains further information about XRF performance.
This section contains some general points.
Takeover performance
Takeover performance may be considered as the time it takes to close down the
active, establish the alternate as the running system, and switch the terminal
network. This performance depends on many factors, including the:
Number of CPCs
Model and characteristics of the CPCs
Use of logical or physical partitioning
Number of related partitions to be taken over
Number of open databases or files
Number of recoverable inflight transactions
Number of active terminals, lines, and NCPs
Recovery mode chosen for terminals
Frequency of activity keypointing
Type of dump, if any, taken by the active
Setting of the alternate delay interval (ADI) parameter
Communication management configuration in use
Time difference between the two time-of-day clocks in a multi-VSE environment
The alternate is potentially the active, so you should normally assign to it the same
priority and performance group that you assign to the active. You should also
consider the real storage isolation of the CICS system.
Neither the single-CPC nor the multiple-CPC environment described above guards
you against CICS failures. If CICS fails in either environment, the CICS XRF
takeover might also fail. (The backup VSE image may not be of sufficient size.)
Restart in place of failing CICS partitions should be performed using the
TAKEOVR=COMMAND system initialization parameter, but this can be automated
using the overseer. See “The overseer” on page 62.
After a takeover, the new active provides the same service as the old. In a
two-CPC environment, if the new active is in a CPC that is already running near
capacity, you should make arrangements to suspend some of the work. This could
be a particular concern if the alternate’s CPC is smaller than the active’s CPC.
You might, for example, have to suspend some batch jobs temporarily.
If there are other subsystems running in the alternate’s CPC, such as SQL/DS,
and they continue to run after takeover, performance will be degraded because the
new active takes up more of the CPC’s resources. A lot depends on how the VSE
tuning parameters have been set. Refer to the VSE/ESA Operation manual, and
the VSE/POWER Administration and Operation manual.
A single 3090, logically or physically partitioned, can run multiple VSE images, making
possible a CICS with XRF system providing cover against VSE, VTAM, and CICS
outages.
The examples that follow begin with multi-VSE configurations. Even if you are not
concerned with multi-VSE configurations, it is best to read them first, because the
information builds up through the examples.
VTAM is a special case. When VTAM fails, you can initiate a takeover, but you
might gain better availability by allowing other, unaffected users to continue to work
without the interruption of a takeover. There are two ways that you can select your
course of action:
1. The XXRSTAT global user exit allows you to decide what to do if VTAM fails.
The exit allows you to abend CICS, which could lead to a takeover, or you
could do nothing and wait for VTAM to restart. For more information about the
XXRSTAT global user exit, see the CICS Customization Guide.
2. The overseer program, introduced more fully on page 31, can be customized to
allow you to initiate a takeover, or to wait for VTAM to recover and then act
appropriately.
[Figure: VSE1 and VSE2, each running VSE/ESA and VTAM; the active CICS on VSE1 and the alternate CICS on VSE2 each have a CAVM, and both connect to a boundary network node through a communication controller.]
More information about the exit and the overseer is given in Chapter 6, “Defining
CICS for XRF” on page 49. In this configuration, a simple exit program is probably
a more suitable tool for deciding whether to take over, rather than the more
complex overseer program.
If you are using XRF primarily to protect against non-CICS failures, for a CICS
failure you might prefer to try to restart the failing CICS region (restart in place)
before taking over, to try to minimize the disruption to the end user. You might
choose to restart in place if many terminals need manual switching, or if (in a
two-CPC configuration) the alternate CPC is heavily loaded at the time of the CICS
failure, or if the time taken by a restart in place compares well with the time taken
by a takeover. There is a further discussion of restarting in place, in an MRO
environment, on page 30.
The end users of most VTAM terminals do not have to log on to VTAM again, and,
depending on the options set, they do not have to sign on to CICS again, because
signon security may be passed from the active to the alternate. A user who is in
the middle of a transaction when the system goes down will have to go through the
same procedures as in a non-XRF emergency restart. You can provide your own
message to tell end users what to do. XRF will certainly shorten the length of the
interruption.
In this multiregion configuration, there are more things to consider about a takeover
than in a single-region configuration. The takeover is across VSE images. If one
alternate region takes over, all the related alternate regions must take over,
because interregion communication does not operate across VSE images. A CPC
or VSE failure clearly should result in a takeover of all the regions.
[Figure: multi-VSE, MRO configuration. VSE1 (VSE/ESA) runs VTAM and the active CICS system; VSE2 (VSE/ESA) runs VTAM and the alternate CICS system. Each system consists of a terminal-owning region, an application-owning region, and a database-owning region, each with its own CAVM. Both images connect through a communication controller (boundary network node).]
In an MRO configuration, you decide how important each region is, and whether
there should be a takeover if a region fails. The alternative to a takeover is to
restart a region in place, rather than involving all the related regions in a takeover.
Hierarchy of regions
To help understand a takeover strategy that handles regions of varying importance,
you might find it useful to think of your regions as forming a hierarchy. A typical
arrangement is shown in Figure 11.
[Figure 11: hierarchy of regions. At takeover, CEBT PERFORM TAKEOVER commands are sent to the alternates. The alternate dependent regions are specified with a system initialization parameter.]
A dependent region differs from a master or coordinator region in that its takeover
system initialization parameter is TAKEOVR=COMMAND. This means that the
failure of a dependent region does not result in its own takeover, nor does it force a
takeover of the entire complex of regions. Instead, the system operator (or perhaps
the XRF overseer) tries a restart in place using existing emergency restart
procedures.
The failure of an active master region results in its takeover by its alternate region.
That alternate master region initiates its own takeover, and issues:
CEBT PERFORM TAKEOVER
commands in its command list table (CLT) to all the other alternate regions,
instructing them to take over from their active counterparts. These other regions
are the dependent regions, probably application-owning or database-owning
regions.
If there is more than one master region, one of them may be made the coordinator
region. If a master region or the coordinator region fails, then only the alternate
In this way, the coordinator is responsible for the takeover of all its MRO-connected
regions. If an alternate coordinator region is called on to start a general takeover,
and that alternate coordinator is not running for some reason, an automatic
takeover is impossible, and the operator must intervene.
You must work out in advance the strategy for each situation. For a speedy restart,
your operations should be automated wherever possible. Operators must
understand clearly what to do when any type of failure occurs. They must also
know what is happening automatically, so that they can take the speediest path to
recovery.
The overseer can be particularly useful in a large installation, where you might have
many XRF regions that are connected by MRO, with a hierarchy of coordinator,
master, and dependent regions.
If you usually run XRF on two VSE images, but one is temporarily unavailable
because of maintenance or because it has other work to do, you might choose a
single-VSE configuration to provide cover against CICS failures during that period.
With this configuration, you are able to cover yourself against CICS outages,
whether they are scheduled, for service or maintenance, or unscheduled, perhaps
because of a program error. There is no protection against outages of the CPC,
VSE, or VTAM, because these parts of the system are not duplicated. But there
are two paths from the network control program through VTAM: one to the active
CICS system, and one to the alternate. If the active fails, or if you require a
planned takeover, the alternate takes over.
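[Figure: single-VSE, single-region configuration. The terminal network connects through a communication controller (boundary network node) to one VSE/ESA image running VTAM, the active CICS with its CAVM, and the alternate CICS with its CAVM.]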
For each active region shown in Figure 13 on page 34—terminal, application, and
database—there is a corresponding alternate region. Each active-alternate pair has
its own CAVM and associated data sets.
Whichever active region fails, its alternate takes over and becomes the new active.
The other active regions are unchanged, and the new active reestablishes MRO
links with them. The effect observed by the end user depends on which region
fails. In this example, failure of the terminal-owning region would result in the
effects already described in “Multi-VSE, MRO XRF configuration” on page 27 (and
more fully in Chapter 5, “The terminal network” on page 37). Failure of other
regions is observable at the terminal only if the user is running a transaction that
uses the failing region. Such an end user would suffer a transaction failure, but
would not lose the session to CICS, nor have to sign on again.
In this sort of configuration, there is no need for the restart in place suggested for
multi-VSE configurations.
[Figure: single-VSE, MRO configuration. A communication controller (boundary network node) connects to one VSE/ESA image running VTAM, the active CICS system, and the alternate CICS system. Each system consists of a terminal-owning region, an application-owning region, and a database-owning region, each with its own CAVM.]
The examples are divided into single- and multi-VSE configurations, but even if you
are able to run XRF on two VSE images, there might be some systems that you
would prefer to run with the active and alternate in the same VSE.
If you have three VSE images available, you could use the third for a new alternate
CICS, if the failure of the first meant that it would be unavailable for an
unacceptably long time.
The examples also make a division into MRO and single regions, but you might find
that you want to use a combination of MRO and non-MRO XRF regions. You can
also have non-XRF regions running with XRF regions in the same VSE image.
In multi-VSE operation, you can place actives and alternates from different CICS
systems in the same VSE image.
If you have applications or databases that are rarely used, or applications that
rarely fail, they could be placed in non-XRF regions. This non-XRF region could be
a CICS Transaction Server for VSE/ESA system defined with XRF=NO as a system
initialization parameter. A failure in a non-XRF region would then be handled by an
emergency restart.
Multiregion operation links can be maintained between the non-XRF region and the
active XRF regions. In a single-VSE operation, if a takeover occurs in one of the
XRF regions, the MRO link between the new active and the non-XRF region is
reestablished. To that non-XRF region, the takeover looks like an emergency
restart.
Any terminal that you currently use with CICS can be used in an XRF environment.
XRF offers benefits to all terminals, because they may experience a faster restart.
This is because the alternate can recognize failure earlier, and because it tracks
the installed, logged-on, or logged-off state of other VTAM terminals and attempts
to reestablish sessions after takeover.
Each terminal can have a working session with only one CICS system. However,
the active CICS system notifies its alternate of all its sessions (except those defined
with RECOVOPTION(NONE)).
Transactions that are in flight at the point of takeover are backed out by CICS and
must be reentered by the end user (or by your normal restart practices). However,
depending on the signon options set, end users do not normally have to sign on to
CICS again.
Before specific terminal types and levels of service are discussed, note that there
are many factors that can affect the performance of a terminal at takeover, as
follows:
The type of terminal and its access method
The total number of terminals connected
What the end user is doing at the time of takeover
Whether the terminal has signon security
The signon options set
The type of failure of the active CICS system
Whether the terminal has to be physically switched to a second VSE image
How the terminal is defined by the systems programmer.
The active and alternate share a common generic applid. In addition, each active
and alternate has a unique specific applid to identify itself to VTAM. The end user
is only aware of the generic applid used at logon. For existing systems that you
convert to XRF, you could retain the applid that is familiar to the end user as the
generic applid, and have two new names, probably based on the generic applid, as
the specific applids.
For more VTAM information, you should consult the VTAM Network Implementation
Guide and the VTAM Operation manual. This is particularly important if you are not
accustomed to multi-VSE network environments.
The generic applid is known in VTAM terms as the USERVAR; the specific applid is
the VTAM application id. The generic applid is used by CICS for many purposes:
The first part of the APPL statement defines to VTAM the specific applids (known to
VTAM as the application ids).
The generic and specific applids have to be defined to CICS using the APPLID
system initialization parameter. See page 49 for more information.
When a terminal logs on, the “logon message”, which refers to the generic applid,
is interpreted as a logon request to the application whose specific applid is
contained in the USERVAR. In this way, the USERVAR table relates the generic
applid (which does not change) to the specific applid of the current active, and
VTAM can identify the CICS system to which the terminal’s active session should
be connected.
Figure 14 on page 40 shows a set of definitions, with CICS1 as the active system
and VTAM1 as the network owner. At startup, the active uses the:
F NET,USERVAR,ID=generic-applid,VALUE=specific-applid
command to set its specific applid (CICS1 in Figure 14) in the VTAM USERVAR
table. The USERVAR table contains an entry like this:
CICS, CICS1
which ensures that logons are directed to the current active. The TYPE=DYNAMIC
parameter (the default) specifies that this USERVAR entry is for an XRF system
that is likely to change its specific applid periodically.
The user’s logon message “CICS” is associated with the correct specific applid by
VTAM’s USERVAR processing.
At the start of a takeover, the alternate changes the setting of the USERVAR to its
own specific applid, so that logons to a failing active are stopped as soon as
possible.
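The USERVAR processing described above can be pictured as a small lookup table. The following Python sketch is only a model of that behavior, not VTAM code; the class and method names (UservarTable, set_uservar, resolve_logon) are hypothetical and exist only for illustration.

```python
class UservarTable:
    """Toy model of a VTAM USERVAR table: generic applid -> specific applid."""

    def __init__(self):
        self._entries = {}

    def set_uservar(self, generic_applid, specific_applid):
        # Models F NET,USERVAR,ID=generic-applid,VALUE=specific-applid
        self._entries[generic_applid] = specific_applid

    def resolve_logon(self, generic_applid):
        # A logon that names the generic applid is routed to the specific
        # applid currently held in the USERVAR entry.
        return self._entries[generic_applid]


table = UservarTable()
table.set_uservar("CICS", "CICS1")    # the active (CICS1) sets the entry at startup
before = table.resolve_logon("CICS")
table.set_uservar("CICS", "CICS2")    # the alternate (CICS2) resets it at takeover
after = table.resolve_logon("CICS")
print(before, after)
```

The end user keeps logging on to the generic applid throughout; only the value held in the table changes.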
Unless you have other, non-XRF, uses for USERVARs that conflict with such
USERVAR processing, it is recommended that you allow VTAM to manage this
propagation of USERVARs. If you leave the operator to propagate the USERVAR,
and there is a delay before the operator issues the command, some new users
cannot log on to CICS during that delay.
There are no XRF-specific changes for the SNA unformatted system services
(USS) tables.
In the example in Figure 15 on page 41, there are the following considerations:
The ownership of the network by the VTAM in VSE1
The cross-domain definitions of the network to VSE2
The local definition of application CICS1 in VSE1
The cross-domain definition of application CICS1 in VSE2
The local definition of application CICS2 in VSE2
The cross-domain definition of application CICS2 in VSE1.
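[Figures: CICS1 issues F NET,USERVAR,ID=CICS,VALUE=CICS1. VTAM1 and VTAM2 connect through the NCP to terminal TE1. CICS1 runs in VSE1 and CICS2 runs in VSE2, both attached through a communication controller (boundary network node).]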
The following partial NCP definition defines VSE1 as the network owner, and the
terminals in that network:
BUILD......,BACKUP=350
GROUP....,LNCTL=SDLC,....,OWNER=VSE1
LINE...
PU...
TE1 LU...
TE2 LU...
TE3 LU...
The following partial definition defines CICS1 on VSE1, with a cross-domain
definition for CICS2:
CICS1 APPL....HAVAIL=YES
VBUILD TYPE=CDRSC
Table 3 describes the two classes of terminals in an XRF environment, how XRF
supports them, and what the user can expect at a takeover.
In this table, the word “terminal” does not just describe a simple terminal device,
but also describes a component of a terminal system, including a programmable
controller and its attached operator terminals, printers, and remote subsystems.
The RECOVOPTION terminal definition keyword and the signon options modify the
service that CICS gives to each terminal, but initially the default values of these
keywords are assumed. The defaults give each terminal the best service that its
characteristics allow. The effects of using alternative settings of the terminal
definition keywords, and of signon security, are discussed under “Defining the
recovery process” on page 44.
After a takeover, the new active tries to establish new sessions for terminals that it
tracked when they were in session with the old active. This reconnection may not
succeed immediately because you may need to transfer the connection of some of
these terminals manually from one to the other. So CICS tries again at intervals of
1, 2, 4, and 8 minutes after the first attempt. The timing of the first attempt
depends on the value set by the AUTCONN system initialization parameter.
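The retry pattern can be worked through with a few lines of Python. This is a sketch under two assumptions that the text does not state outright: that the 1, 2, 4, and 8 minute intervals are all measured from the first attempt, and that the first attempt happens exactly when the AUTCONN delay expires. The function name is hypothetical.

```python
RETRY_OFFSETS_MIN = (1, 2, 4, 8)  # minutes after the first attempt (assumed reading)


def reconnection_schedule(first_attempt_delay_s):
    """Return the attempt times, in seconds after takeover.

    first_attempt_delay_s stands in for the delay set by AUTCONN
    (assumption: the first attempt is made exactly when it expires).
    """
    first = first_attempt_delay_s
    return [first] + [first + minutes * 60 for minutes in RETRY_OFFSETS_MIN]


schedule = reconnection_schedule(120)  # a 2-minute AUTCONN delay
print(schedule)
```

Under these assumptions the last automatic attempt is made 8 minutes after the first one; anything still disconnected after that needs operator intervention or a fresh logon.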
After the reconnection transaction has finished, you either use operator intervention
to reacquire remaining sessions, or the users themselves log on again. This
situation could arise if the VTAM that owns the network has failed, and it takes
more than 8 minutes to restart it. In that case, all terminals that are normally
reconnected will require some sort of intervention.
If the network owner has not failed, end users might experience a short interruption
in service, and the takeover has the appearance of an emergency restart. If the
session is successfully reestablished, end users of such terminals do not have to
log on again, nor, depending on the options set, do they have to sign on to CICS
again. The “good morning” message is displayed. The end user must be aware
that logon or signon might not be necessary. For more information about the
options that control signon, see “Signon after takeover” on page 45.
You must consider how your operations staff will transfer class 2 terminals from
one VSE to another in a multi-VSE environment. In a single-VSE system, this is
not a problem, but you might still need procedures for connecting class 2 terminals
to a new active after a takeover.
Notes:
1. There is a technique that allows local terminals to be reconnected to the new
active, but it requires additional programming. If local terminals are
attached to an IBM 3814 communication controller and a multisystem
configuration manager (MSCM), you can write a program to provide the
physical transfer from the active to the alternate. If you add to the program an
operator interface that could be driven from the CLT, the operator is not
involved in the physical switching. If you already have terminals attached
through a 3814 and MSCM, you might be interested in this form of switching.
Notes:
1. For both UNCONDREL (which means that any session is unbound) and
RELEASESESS (which means that only active sessions are unbound) the
RECOVNOTIFY message or transaction is not run. The “good morning”
message (if defined) is sent instead.
2. If the VTAM network owner fails, any session that is to be unbound and then
rebound will only be unbound. It cannot be rebound until VTAM network
ownership is reestablished.
So, to summarize, there are three levels at which terminals may be forced to sign
off at takeover and end users have to sign on again. This is shown in Figure 16.
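[Figure 16: the three levels at which sign-off at takeover can be forced: the SIT, the TYPETERM definition, and the ESM CICS segment. Their scopes range from all terminals, through a set of terminals, to a single terminal entry.]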
You must consider the effect of the system initialization AUTCONN parameter.
AUTCONN delays the reconnection of terminals (see “Starting the alternate” on
page 51), so you might choose to extend the XRFSTME value to allow these
terminals to be reconnected and remain signed on.
Note: When a CEMT PERFORM SECURITY (REBUILD) command is issued to
the active CICS, it uses the message data set to tell the alternate that the ESM
resource profiles have been rebuilt. ESM definitions must be the same for the
active and alternate. If the active fails at the time of the rebuild, a message warns
the operator if the rebuild has not been successful.
If you have an earlier level of VTAM, the subsystems must first determine which of
the two CICS systems is the active by issuing the INQUIRE USERVAR command
to VTAM. This returns the specific applid that has been set in that user variable.
CICS-to-CICS communication
An active can communicate, using ISC, with:
A CICS/VSE Version 2 system
A CICS/ESA Version 3 system
A CICS/ESA Version 4 system
A CICS Transaction Server for OS/390 Version 1 system
A CICS OS/2 Version 2 system
A CICS for OS/2 Version 2.0.1 system
A CICS/VM system
A CICS 400 system
A CICS/6000 Version 1.0 system
A CICS for Windows NT system
CICS on Open Systems:
– CICS/6000 Version 1.2
– CICS for DEC OSF/1AXP
– CICS for HP 9000.
Bind format
The format of the bind that the active sends to the terminal or secondary logical
unit (SLU) contains the normal primary logical unit (PLU) name field. The contents
of this name field depend on whether the PLU or the SLU initiated the session; that
is, whether the terminal user logged on to CICS, or CICS acquired the terminal.
If the PLU initiated the session, the field contains the PLU name. This will be
the specific applid of the CICS system.
If the SLU issued the INITSELF, the name field contains the uninterpreted
name as carried in that RU. This is the generic applid of the CICS system.
This is no different from what happens in the normal SNA environment, but in an
XRF environment it may become significant if the SLU examines this name field. If
the SLU relies on the host to initiate the session (using the RDO attribute
AUTOCONNECT(YES), for example), the contents of this name field vary according
to which system is the active.
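The two cases above can be summarized in a short Python sketch. It is an illustration of the rule only, not of any SNA or CICS interface; the function name and the applid values are hypothetical.

```python
def bind_name_field(initiator, specific_applid, generic_applid):
    """Return the contents of the PLU name field in the bind."""
    if initiator == "PLU":
        # CICS acquired the terminal: the field carries the PLU name,
        # which is the specific applid of the sending CICS system.
        return specific_applid
    if initiator == "SLU":
        # The terminal sent INITSELF: the field carries the uninterpreted
        # name from that RU, which is the generic applid.
        return generic_applid
    raise ValueError("initiator must be 'PLU' or 'SLU'")


# Host-initiated sessions see a different name after a takeover ...
from_active = bind_name_field("PLU", "CICS1", "CICS")
from_new_active = bind_name_field("PLU", "CICS2", "CICS")
# ... while terminal-initiated sessions always see the generic applid.
from_logon = bind_name_field("SLU", "CICS2", "CICS")
print(from_active, from_new_active, from_logon)
```

An SLU program that checks this field therefore has to accept both specific applids if it relies on the host to start the session.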
APPC architecture has defined the structure of the bind user data fields. One of
these user data fields is reserved for the PLU name, and CICS uses this field to
Programmable terminals
You may have programmable, or “intelligent”, LU0 terminals that examine the bind
parameters they receive from CICS. As discussed above, if such terminals
examine the PLU name in the bind, their programs might need modification to
accept a bind from both the active and the alternate.
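[Figure: session flow between the active (CICS and VTAM in VSE1), the alternate (CICS and VTAM in VSE2), the NCP boundary network node, and the terminal. Before the failure: INITSELF, CINIT, transaction data. After the failure: INIT, CINIT, transaction data.]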
For reference information for tables, see the CICS Resource Definition Guide
manual. For system initialization, see the CICS System Definition Guide. Two
specific sample implementations are given in Appendix B, “Sample XRF
implementations” on page 75.
Advice about terminal operands that can influence the takeover characteristics for
individual terminals is given in Chapter 5, “The terminal network” on page 37.
Most of the system initialization parameter operands are the same as for a system
specified with XRF=NO. When an active is started, operands that are only for an
alternate do not take effect. If that system is subsequently started as an alternate,
those operands then apply. Similarly, when an alternate is started, operands for
actives only take effect if it takes over and becomes the new active. Only operands
affecting XRF are described in this section.
START=AUTO
This gives you a normal cold, warm, or emergency restart.
XRF=YES
The system signs on to CAVM because XRF support is required.
APPLID=(generic-applid,specific-applid)
The generic applid is the applid of this matching active and alternate pair. It is
the applid by which the system is known to the end user. It is also used in
interregion communication.
The specific applid is the applid for the active. It is used by CICS when CICS
opens the VTAM ACB. See “VTAM and NCP considerations for active and
alternate” on page 37 for more information.
PDI=30|decimal-value
decimal-value is the interval (in seconds) before the active tells the operator
that it cannot detect the alternate’s surveillance signal. This value is not
critical. The default value is 30 seconds. No other action is taken; the active
continues to operate as if the alternate were still present.
AIRDELAY=700|hhmmss
hhmmss is the restart delay (in hours, minutes, and seconds) that will elapse
after a takeover before autoinstalled terminal entries are deleted if they are not
in session. The default value is 700, that is, 7 minutes. A zero value means
that the TCTTE of an autoinstalled terminal is not written to the catalog. You
might choose a zero value to improve normal emergency restart times or your
autoinstall performance. For XRF systems, a zero value means that you might
lose some autoinstalled terminal entries if there is a takeover during the
catchup process. This is because the information about an autoinstalled
terminal might not have been passed to the alternate through the message
data set, and the alternate cannot learn about that terminal from the catalog.
The end user of that terminal has to log on again. You should set the same
restart delay value for both the active and the alternate, to maintain the
takeover characteristics for autoinstalled terminals over several takeovers.
XRFSOFF=FORCE|NOFORCE
This operand is used by the active to determine whether it should send signon
information to the alternate.
FORCE specifies that the active ensures that the alternate does not have any
terminals signed on after a takeover.
NOFORCE (the default) allows you to be more selective about the terminals
that are signed off, by using the RDO TYPETERM definition or the ESM CICS
segment.
For more information, see “Signon after takeover” on page 45.
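Operands such as AIRDELAY take their value in hhmmss form, so the default of 700 means 7 minutes, not 700 seconds. The Python sketch below shows one way to read such a value; the function name is hypothetical, and the range check on minutes and seconds is an assumption of this sketch, not a documented CICS rule.

```python
def hhmmss_to_seconds(value):
    """Convert an hhmmss operand value (for example 700) to seconds."""
    text = str(value).zfill(6)          # 700 -> "000700"
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"not an hhmmss value: {value!r}")
    hours = int(text[0:2])
    minutes = int(text[2:4])
    seconds = int(text[4:6])
    # Assumption of this sketch: minutes and seconds must be 00-59.
    if minutes > 59 or seconds > 59:
        raise ValueError(f"minutes or seconds out of range: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


airdelay_default = hhmmss_to_seconds(700)
print(airdelay_default)
```

The same reading applies to any other operand documented with an hhmmss value.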
START=STANDBY
Specifies that the system you are starting is an alternate.
APPLID=(generic-applid,specific-applid)
generic-applid must be the same as that in the SIT of its matching active, but
the alternate has a different specific applid.
CLT=xx
Specifies the command list table to be used if a takeover occurs. xx specifies
that table DFHCLTxx is to be used. The CLT applies only to the alternate.
The CLT is described in “Command list table (CLT)” on page 54.
TAKEOVR=AUTO|MANUAL|COMMAND
AUTO specifies that the takeover is to be automatic, requiring no intervention
by the operator. The alternate requests help from the operator only if it needs
confirmation that the takeover can proceed safely. Possible causes of a
request to the operator are described in “Supplied transactions for controlling
the alternate” on page 63. The operator can always issue a takeover
command to an alternate, whatever takeover system initialization parameter is
specified. So, if you define a system with TAKEOVR=AUTO, you retain the
right to order a takeover. You can also change the takeover operand
dynamically. “Supplied transactions for controlling the alternate” on page 63
tells you about issuing operator commands to the alternate.
COMMAND is the most restrictive type of takeover, whereby the alternate
sends a message to the operator and takes over only when it receives a
command to do so. This command could come from the operator (or the
Note: If the active CICS VSE image fails, the operator must confirm to the
alternate that takeover may proceed.
ADI=30|decimal-value
Defines the delay (in seconds) before the alternate takes action after it has
noted the disappearance of the active’s surveillance signal. If you have coded
TAKEOVR=AUTO, the alternate initiates a takeover. The ADI value here has
to be a compromise, as follows:
A low ADI value means that the alternate does not wait long before it starts
its takeover process. So, a low value could mean a more rapid takeover
after the active fails.
A high ADI value reduces the risk of unnecessary takeovers, which might
otherwise happen, when the active system has not failed, but has been
temporarily prevented from transmitting its surveillance signals.
AUTCONN=0|hhmmss
Delays the reconnection, after a takeover, of tracked terminals in session at the
time of failure. The default is zero.
You might set a delay to allow the operator to do some manual switching of
lines.
AUTCONN also applies to an active start. If you specify a long delay, terminals
at normal start will be affected, unless you specify AUTCONN as an override.
XRFSTME=nn|5
This operand has already been described on page 46. It gives a time limit for
signed-on terminals. When a takeover has not completed by the expiry of the
time limit, terminals that would normally be in a signed-on state after a takeover
are signed off.
XSWITCH=(0-254,progname,{A|B})
This option, described more fully in “Starting the active” on page 50, defines a
programmable terminal switching unit. The unit may be operated, using the
program defined in this option, to switch terminal lines to the alternate's CPC
during takeover.
XRFTODI=30|decimal-value
Defines the interval (in seconds) between takeover initiation and the point at
which the alternate first prompts the system operator to investigate why the
alternate cannot proceed. The alternate asks for this help if POWER is unable
to inform the alternate that the active has stopped. The XRFTODI value might
have to be a compromise, as follows:
If an alternate in one VSE takes over from an active that is one of a set of
MRO-connected regions running in a second VSE, the remaining alternates must
be forced to take over, so that the MRO communication can continue. The CAVM
can achieve this by issuing VSE system commands, which are coded in the CLT,
causing each of the related alternates to take over.
This information is therefore placed in the CLT. Unlike other CICS tables, the CLT
is not loaded permanently when the alternate is initialized. It is loaded temporarily
during initialization of the alternate, and when the alternate detects that an active
job has signed on to the CAVM. This temporary loading is only for validity
checking, after which it is discarded until takeover. (The validity check gives an
opportunity to correct any problems, before the CLT is needed at takeover.)
Loading only at takeover time means that you do not have to stop and
subsequently restart an alternate to provide it with a changed CLT. During
takeover, CICS loads the CLT, and deletes it again after the CAVM has processed
the information.
Usually, each alternate needs a different CLT, but you may combine several of
these CLTs in a single CLT load module. The specific applid of the alternate is
used to select the relevant part of the single CLT when that alternate takes over.
Using a single CLT might make it easier for you to manage your CLTs, especially
in a large installation with many interconnected CICS systems.
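Selecting one section from a combined CLT can be pictured as a lookup keyed on the alternate's specific applid. The Python sketch below models that idea only. The data layout and the function name are hypothetical, as are the applids and job names, and the real table is an assembled DFHCLT load module, not a Python list.

```python
def select_clt_section(sections, alternate_applid):
    """Return the section of a combined CLT that belongs to one alternate."""
    for section in sections:
        applid, _active_job = section["foralt"]
        if applid == alternate_applid:
            return section
    raise LookupError(f"no CLT section for alternate {alternate_applid!r}")


# Each entry models one LISTSTART ... LISTEND group; "foralt" pairs the
# alternate's specific applid with the job name of its active.
combined_clt = [
    {"foralt": ("CICS2", "JOB1"), "messages": ["CICS2 IS TAKING OVER"]},
    {"foralt": ("CICS4", "JOB3"), "messages": ["CICS4 IS TAKING OVER"]},
]

section = select_clt_section(combined_clt, "CICS4")
print(section["foralt"], section["messages"])
```

Keeping every section in one module means that only one table has to be maintained, at the cost of reassembling it whenever any region's entry changes.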
If CICS2 is running as the alternate and it is told of a failure in the active (CICS1),
or the operator instructs CICS2 to take over, DFHCLT02 is used. The FORALT
operand of the DFHCLT macro allows CICS2 to cancel JOB1.
DFHCLT02 DFHCLT TYPE=INITIAL,
               SUFFIX=02
         DFHCLT TYPE=LISTSTART,
               FORALT=(CICS2,JOB1)
         DFHCLT TYPE=WTO,
               WTOL=MESSAGE
MESSAGE  WTO   'CICS2 IS TAKING OVER, PERFORM MANUAL OPS',
               MF=L
         DFHCLT TYPE=LISTEND
         DFHCLT TYPE=FINAL
         END
Putting together the macros described in the CICS Resource Definition Guide
manual, the sample CLT following the figure defines the CICS2 system illustrated in
Figure 18 on page 56.
[Figure 18: an active (SIT: START=AUTO) and an alternate (SIT: START=STANDBY), each with TAKEOVR=AUTO and a CLT suffix (01 or 02). DFHCLT02 holds the specific applid CICS2, the authorization to cancel JOB1, and the messages to the operator.]
The system initialization parameters and the CLT determine the takeover policy for
each active-alternate pair, and for groups where the actives are connected by
MRO. In a hierarchy of communicating XRF regions, you use the CLT and the
TAKEOVR system initialization parameter to structure the regions into dependent,
master, and coordinator regions. The effect of a takeover of each type of region is
as follows:
The failure of an active dependent region does not automatically cause a
takeover. Such a takeover is always initiated by a command from the operator
or from another region. An alternate dependent region does not command
other alternate regions to take over.
The takeover of a failing master region forces the takeover of all
communicating regions to the alternates in the second VSE image.
If there is more than one master region, one of them may be used as a
coordinator to organize the takeovers.
In the next example, shown in Figure 19 on page 58, there are two active regions,
connected by MRO, in a multi-VSE configuration. The master region has
TAKEOVR=AUTO as its system initialization parameter. Its dependent region has
the TAKEOVR=COMMAND system initialization parameter. The alternate master
region’s CLT authorizes the cancellation of the active master job, and the alternate
dependent region’s CLT authorizes the cancellation of the active dependent job.
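[Figure 19: two MRO-connected active regions in VSE 1 (START=AUTO) and their alternates in VSE 2 (START=STANDBY), showing the TAKEOVR and CLT system initialization parameters for each region and the command list table DFHCLT01.]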
In this hierarchy, if the alternate master region takes over from its failing active
counterpart, it sends a command to the alternate dependent region telling it to take
over from the active dependent region; the
MODIFY JOBD2,CEBT PERFORM TAKEOVER
command for the dependent region is coded in the CLT of the master region, and is
shown in the figure. On receipt of this command, the dependent alternate region
initiates a takeover. The CEBT transaction is described in “Supplied transactions
for controlling the alternate” on page 63.
If the dependent region fails, its alternate does not take over because of the
TAKEOVR=COMMAND system initialization parameter. It takes over only on
receipt of a command, and not automatically. Instead, the alternate sends a
message to the operator stating that the active’s surveillance signal is missing or
that the active has signed off abnormally. The operator, or the overseer, might
decide to try to restart the failed region in VSE1. This would avoid the disruption in
the service provided by the master region that would occur on a takeover to VSE2.
If the restart failed, it might be necessary to effect a takeover of both regions by
issuing a CEBT PERFORM TAKEOVER command to the master alternate region.
For restart in place, see “Restarting regions in place” on page 30.
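The way the ADI interval and the TAKEOVR setting combine can be sketched as a small decision function. This Python model is an illustration only: the function name and the returned strings are hypothetical, and TAKEOVR=MANUAL is deliberately not modelled because its behavior is not described here.

```python
def alternate_action(seconds_without_signal, adi_seconds=30, takeovr="AUTO"):
    """Model what the alternate does when the surveillance signal is missing."""
    if seconds_without_signal < adi_seconds:
        # The ADI interval has not yet expired: keep waiting.
        return "wait"
    if takeovr == "AUTO":
        return "initiate takeover"
    if takeovr == "COMMAND":
        # The alternate tells the operator and waits for a takeover command,
        # for example CEBT PERFORM TAKEOVER.
        return "notify operator and wait for command"
    raise ValueError(f"TAKEOVR={takeovr} is not modelled in this sketch")


master = alternate_action(45, adi_seconds=30, takeovr="AUTO")
dependent = alternate_action(45, adi_seconds=30, takeovr="COMMAND")
early = alternate_action(10, adi_seconds=30, takeovr="AUTO")
print(master, "|", dependent, "|", early)
```

A larger ADI value simply widens the window in which the function returns "wait", which is the trade-off between a fast takeover and an unnecessary one.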
With an MRO configuration, you can code a single CLT for all the regions involved.
So, in the configuration discussed here, it could be for both master regions and
both dependents. The FORALT operand indicates the section for a particular
region. In the example CLT following the figure, only the entries for the current
alternates (M2 and D2) are shown, for clarity.
DFHCLT TYPE=COMMAND,
COMMAND='MODIFY JOBD2,CEBT PERFORM TAKEOVER'
DFHCLT TYPE=WTO,
WTOL=MESSAGE
MESSAGE WTO 'TAKEOVER TO NUMBER 2 REGIONS',
MF=L
DFHCLT TYPE=LISTEND
DFHCLT TYPE=LISTEND
DFHCLT TYPE=FINAL
END
You can extend the usefulness of the CLT by adding other commands to the CEBT
commands shown here. The CLT can be used to issue any VSE commands that
are needed to complete the takeover, for example, VTAM VARY NET commands.
In this way, you can reduce the need for the operator to be involved.
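[Figure: takeover flow between the alternate master region, the alternate coordinator region, and the other alternate masters and alternate dependents. The numbered arrows are described in the notes that follow.]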
Notes:
1. When the active master region fails, it triggers the alternate master region.
2. The alternate master region issues a CLT command to the alternate
coordinator region to initiate a takeover.
3. The alternate coordinator region issues CLT commands to alternate dependent
regions to initiate takeovers.
4. The alternate coordinator region sends a redundant command back to the
alternate master region to initiate a takeover. If the coordinator active region
had failed, rather than the master, this command would not be redundant.
If a coordinator region fails, its alternate uses the CLT to issue CEBT PERFORM
TAKEOVER commands to all other alternate regions, master and dependent. If a
master region fails, its alternate will initiate a takeover, and issue a command to the
alternate coordinator region to take over. Then the coordinator will issue its own
commands to all regions, in the way that a single master region would.
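The flow of commands through a coordinator can be traced with a short simulation. This is a sketch under stated assumptions, not a description of CICS internals: the region names C2, M2, and D2 and the dictionary that stands in for the CLT are hypothetical, and the model assumes that every alternate issues its CLT commands whenever it takes over.

```python
from collections import deque

# Hypothetical stand-in for the CLT: for each alternate, the alternates to
# which it sends CEBT PERFORM TAKEOVER when it takes over.
CLT = {
    "C2": ["M2", "D2"],  # the coordinator commands every other alternate
    "M2": ["C2"],        # a master commands only the coordinator
    "D2": [],            # a dependent commands no other region
}


def simulate_takeover(first, clt):
    """Return (takeover order, redundant commands) when `first` starts a takeover."""
    taken_over = [first]
    redundant = []
    queue = deque([first])
    while queue:
        sender = queue.popleft()
        for target in clt[sender]:
            if target in taken_over:
                # The target has already begun its takeover.
                redundant.append((sender, target))
            else:
                taken_over.append(target)
                queue.append(target)
    return taken_over, redundant


if __name__ == "__main__":
    order, redundant = simulate_takeover("M2", CLT)
    print("master fails:", order, "redundant:", redundant)
```

When the master fails, the model takes over M2, then C2, then D2, and records the command from C2 back to M2 as redundant, which matches note 4 above.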
User exit XXRSTAT is called after CICS has been told of a VTAM failure by the
TPEND exit. This occurs just before the update of status information that will
become available to the alternate through the CAVM data sets. In the exit you can
choose what to do following a VTAM failure. You can tell CICS to take any of the
following actions:
Abend CICS and thus force a takeover, or whatever action you have specified
if that region abends. You may specify a dump with the abend. The status
information is not written to the control data set. If you do require a takeover,
you need the TAKEOVR=AUTO or TAKEOVR=MANUAL system initialization
parameter.
Allow the CICS region to continue, after updating the status information to tell
the overseer that VTAM has failed. The overseer then performs the action that
you have specified for this particular combination of circumstances, as
described in the next section.
Suppress the update of the status information, and allow the CICS region to
continue, on the assumption that the VTAM region will be restarted. In this
way, the overseer, if present in the system, is not made aware of the VTAM
failure and does not go through its VTAM failure procedure.
In some configurations, you might prefer to handle VTAM failures in the exit
program (by initiating a takeover or tolerating the VTAM failure) instead of in the
overseer. The exit program is probably quicker and relatively simple to implement.
The overseer is more complex, and could be slower. However, the overseer allows
you to use more complicated logic to deal with the situation.
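The three choices open to the exit program can be tabulated as follows. This Python sketch only restates the text above; it is not the XXRSTAT programming interface, and every name in it (the enumeration, its members, and the functions) is hypothetical. See the CICS Customization Guide for the real interface.

```python
from dataclasses import dataclass
from enum import Enum


class VtamFailureChoice(Enum):
    """Hypothetical names for the three actions described in the text."""
    ABEND = "abend CICS"
    CONTINUE_AND_REPORT = "continue, status information updated"
    CONTINUE_AND_SUPPRESS = "continue, status update suppressed"


@dataclass(frozen=True)
class Effect:
    cics_continues: bool   # does the CICS region keep running?
    status_updated: bool   # is the VTAM failure written to the status information?


EFFECTS = {
    VtamFailureChoice.ABEND: Effect(cics_continues=False, status_updated=False),
    VtamFailureChoice.CONTINUE_AND_REPORT: Effect(cics_continues=True, status_updated=True),
    VtamFailureChoice.CONTINUE_AND_SUPPRESS: Effect(cics_continues=True, status_updated=False),
}


def overseer_sees_vtam_failure(choice):
    """The overseer learns of the VTAM failure only through the status update."""
    return EFFECTS[choice].status_updated


def abend_leads_to_takeover(choice, takeovr):
    """An abend forces a takeover only with TAKEOVR=AUTO or TAKEOVR=MANUAL."""
    return choice is VtamFailureChoice.ABEND and takeovr in ("AUTO", "MANUAL")


if __name__ == "__main__":
    for choice in VtamFailureChoice:
        print(choice.name, EFFECTS[choice], overseer_sees_vtam_failure(choice))
```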
The overseer
The overseer was introduced on page 31. The IBM-supplied sample overseer can
perform two functions. It can display the status of XRF regions, and it can restart a
failed region in place. The overseer sample source is named DFH$AXRO, and is
supplied in the VSE/ESA sublibrary PRD1.BASE. There is also a pregenerated
version ready to use. See the CICS Customization Guide for guidance information
about using the overseer, and for definitive product-sensitive programming
information about the interface for defining actives and alternates to the overseer.
You can write your own overseer program to extend its capabilities. The overseer
can perform non-CICS functions. Here are some examples of what the overseer
can do:
Display its status information in a suitable format at regular intervals.
Examine information about VTAM failure passed by the user exit, and act
accordingly. Information is available to the overseer about the last eight
The sample overseer carries out basic functions, which will be adequate for some
installations. Other installations will accept the added complexity and significant
programmer effort involved, and extend the scope of the overseer.
The CEBT transaction is usable from the time when the alternate is initialized to the
time after takeover when CEMT becomes usable. The operator can use CEBT to
do the following:
Request the alternate CICS to take over.
This is relevant for a failed dependent region, which is taken over only when its
alternate receives specific instructions. The failure of a dependent region
results in a message to the operator, and the operator can then decide what to
do. The first thing to do would probably be to try to restart the failed region;
you can use the overseer to automate that process. If it is impossible to restart
the region, the operator might initiate a general takeover to the other VSE
Data sets that are shared, passively or actively, such as user VSAM data or DL/I
data sets, must be placed on shared volumes or VSAM spaces. For more
information about data sets, see the CICS System Definition Guide.
CICS does not save any of the storage-related system initialization parameters in
the global catalog, including the values for DSALIM and EDSALIM.
This support is not described in this book. For guidance information about DB2 for
VSE/ESA, see the DB2 for VSE/ESA library.
Note that, after a takeover, you can initiate the CIRB transaction automatically
(it is required for the DB2 for VSE/ESA online resource manager) by using CICS
sequential device support. Sequential device support is described in the CICS
Resource Definition Guide and the CICS Application Programming Guide.
DL/I VSE
CICS with XRF supports DL/I VSE.
This support is not described in this book. For guidance information about the DL/I
DOS/VS interface, see the DL/I VSE Release Guide.
NetView
You can use the network management product NetView to add function to XRF.
One possible use of NetView is to propagate changes in the USERVAR value to
remote VTAMs that are in communication with the local VTAM of the XRF complex.
However, you are recommended to leave this propagation to the VTAM automatic
USERVAR facility, described in “VTAM and NCP considerations for active and
alternate” on page 37.
In this section, we give you an overview. For further reading, see the Network
Program Products Planning manual.
When a 37xx or its NCP fails, VTAM issues an error message. You can pass this
message to NetView, which compares the message with its message table. If
there is a match, NetView initiates a CLIST that corresponds to that message.
If you prefer not to automate such a procedure, you can send messages to the
operator, requesting intervention. Alternatively, the CLIST can attempt to reload the
37xx communication controller. If the 37xx communication controller cannot be
reloaded, you can use a further CLIST to prompt the operator to switch to another,
if one is available. You can then use a CLIST to acquire resources from the failed
37xx and activate them for the new one.
Figure 21 illustrates the sequence of events from the failure of the NCP, through
VTAM, NCCF, the message table and a CLIST, to the sending of a message to the
operator.
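[Figure 21: flow from the 37xx NCP through VTAM to NetView, where the message is compared with the message table. The labels shown are Match, No match, CLIST, Recovery action, and Message to operator.]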
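The message-table lookup described in this section can be pictured with the following Python sketch. It is a model only, not NetView code: the message identifier, the CLIST names, and the function name are hypothetical, and the sketch assumes that a message with no matching table entry triggers no automation.

```python
# Hypothetical message table: message identifier -> name of the CLIST to run.
MESSAGE_TABLE = {
    "MSG001I": "RELOAD",  # hypothetical: try to reload the 37xx
}


def route_message(message, table):
    """Return ("clist", name) on a match, or ("none", None) when nothing matches."""
    words = message.split(maxsplit=1)
    if not words:
        return ("none", None)
    clist = table.get(words[0])  # the first word is taken as the message identifier
    if clist is None:
        return ("none", None)
    return ("clist", clist)


if __name__ == "__main__":
    print(route_message("MSG001I CONTROLLER FAILED", MESSAGE_TABLE))
    print(route_message("MSG999I SOMETHING ELSE", MESSAGE_TABLE))
```

Whether the CLIST that is started attempts a reload or simply sends a message to the operator is a choice for your installation, as described above.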
Application programs
Ensure that your existing application programs run in an XRF environment.
You should look at those programs that depend on the specific applid, or that
have unsupported interfaces into CICS code.
DL/I VSE
Ensure that table definitions, shared DASD, and system logging are suitable for
DL/I VSE databases. For more information, see the DL/I VSE Release Guide.
Dump
Determine if you need a dump of a failing active.
Make sure that you initialize CICS with an appropriate ADI system initialization
value, to avoid unnecessary takeovers. See page 53.
NCP
Define NCP for XRF.
Operator instructions
Prepare operator instructions, so that the operators understand the CEBT
transaction, the way an XRF takeover works, and any extra tasks they might
have to perform. There is information about operating XRF throughout this
book. For further guidance information, see the CICS Operations and Utilities
Guide.
POWER jobnames
The POWER jobnames must be unique when running XRF.
Programmable terminals
Ensure that your terminals have any extra code they need to enable them to
connect to whichever system is the active.
Shared DASD
Many data sets for XRF must be on shared DASD, in particular the CAVM data
sets. The CICS System Definition Guide gives advice about the
characteristics of each data set.
Signon options
Ensure that each terminal has the correct characteristics for signon after a
takeover. See “Signon after takeover” on page 45 for more information.
System logging
System logging must be on two disk extents.
Consider using automatic archiving for journal archiving. The CICS Operations
and Utilities Guide describes automatic archiving.
Table definitions
You need to consider the definitions for the:
There is some guidance about definitions in this book. For more details, see
the CICS Resource Definition Guide.
Takeover message
Code a message, or write a transaction, to provide information to terminal users
after takeover, if required.
Time-of-day clock
The setting of the clocks in a multi-VSE environment must be as close as
possible at IPL. If the alternate clock is running later than the active clock
there is a delay at takeover.
User exits
Ensure that the current user exit programs run in an XRF environment. You
should check the function, timing, and use of data of each exit program.
XXRSTAT exit
Create a user exit program for the XXRSTAT exit, if required. For more
information, see “User exit for VTAM failure” on page 62. For the definitive
product-sensitive programming interface information about global user exits,
see the CICS Customization Guide.
Appendix B. Sample XRF implementations
In this appendix there are two sample implementations:
1. A single CICS region with an alternate in a second VSE image
2. An MRO configuration, with a dependent region, a master region, and a
coordinator region, with actives and alternates in separate VSE images.
This appendix gives an overview of the SIT and SIT system initialization overrides,
and CLT definitions. If you need more information about the SIT and CLT, see
Chapter 6, “Defining CICS for XRF” on page 49. The CICS System Definition
Guide contains a sample startup job stream.
In the following examples it is assumed that SIT overrides are entered using
SYSIPT and not the CONSOLE.
In this example, CICS1 is started as the active and CICS2 as the alternate.
DFHSITAA
DFHSIT .....
,SUFFIX=AA
,XRF=YES
,START=STANDBY /* (May be altered by override)
,APPLID=(CICS,CICS1) /* (May be altered by override)
,ADI=40 /* (Alternate only)
,PDI=40 /* (Active only)
,TAKEOVR=MANUAL /* (Alternate only)
,CLT=01 /* (Alternate only)
,XRFTODI=35 /* (Alternate only)
,AUTCONN=0
,AIRDELAY=700 /* (Active only)
,XRFSOFF=NOFORCE /* (Active only)
,XRFSTME=5 /* (Alternate only)
,.....
CICS job JOB1: The SIT overrides in JOB1 required to initialize CICS1 as the
active on VSE1 are as follows. SIT parameters for an alternate are ignored during
an active startup. If you want to start CICS1 as an alternate, remove the
START=AUTO override from the SYSIPT data set, because START=STANDBY
has been coded in the SIT table AA.
* $$ JOB JNM=JOB1,CLASS=2,DISP=L
...
// EXEC DFHSIP,SIZE=DFHSIP,PARM=' ....,SI',OS390
....
,SIT=AA
,START=AUTO /* (Could be COLD or EMER)
,APPLID=(CICS,CICS1) /* (Not strictly necessary, but
,..... /* (compatible with the job for
,..... /* (specific applid CICS2)
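The effect of the START override can be modeled in a few lines. The following Python sketch is illustrative and is not how CICS processes its parameters internally: the dictionary contents are taken from the sample SIT above, while the function names are hypothetical.

```python
# A subset of the values coded in the sample SIT with suffix AA.
SIT_AA = {
    "XRF": "YES",
    "START": "STANDBY",
    "APPLID": ("CICS", "CICS1"),
    "TAKEOVR": "MANUAL",
    "CLT": "01",
}


def effective_parameters(sit, overrides):
    """Apply SYSIPT overrides on top of the values coded in the SIT."""
    merged = dict(sit)
    merged.update(overrides)
    return merged


def startup_role(params):
    """START=STANDBY starts an alternate; any other START value starts an active."""
    return "alternate" if params["START"] == "STANDBY" else "active"


if __name__ == "__main__":
    with_override = effective_parameters(SIT_AA, {"START": "AUTO"})
    without_override = effective_parameters(SIT_AA, {})
    print(startup_role(with_override))     # JOB1 as coded above
    print(startup_role(without_override))  # JOB1 with the START override removed
```

With the START=AUTO override present the model starts CICS1 as the active; with the override removed, the START=STANDBY value in the SIT applies and CICS1 starts as an alternate.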
[Figure: sample single-CICS configuration. Labels: Terminal, BNN, Communication Controller, Generic applid, VTAM (one in each VSE image), POWER1 and POWER2, CICS1 and CICS2.]
Each alternate uses the CLT entries that apply to its specific applid. The FORALT
option indicates that the entries that follow it are for the systems with the specific
applids shown in the FORALT option. Each system using this CLT will have been
initialized with the START=STANDBY and CLT=01 parameters.
The sample CLT demonstrates that a single CLT, with one sequence of commands
and messages, can be used for both CICS jobs. This is possible here because
both jobs execute the same set of commands and messages. If you wanted to
issue different commands or send messages that depend on which job is taking
over, you could still use a single CLT, but you would have a separate LISTSTART
and LISTEND for each of the specific applids.
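The way an alternate picks out its own CLT entries can be sketched as follows. This is a model of the idea only, not DFHCLT syntax: the data layout, the function name, and the entry strings are hypothetical placeholders.

```python
# One section shared by both specific applids, as in the sample CLT.
SHARED_CLT = [
    {"foralt": {"CICS1", "CICS2"}, "entries": ["COMMAND-A", "MESSAGE-A"]},
]

# The alternative: one section for each specific applid, so that the
# commands and messages can differ depending on which job is taking over.
SPLIT_CLT = [
    {"foralt": {"CICS1"}, "entries": ["COMMAND-FOR-CICS1"]},
    {"foralt": {"CICS2"}, "entries": ["COMMAND-FOR-CICS2"]},
]


def entries_for(specific_applid, clt):
    """Return the entries of the first section whose FORALT names this applid."""
    for section in clt:
        if specific_applid in section["foralt"]:
            return section["entries"]
    return []  # no section applies to this alternate


if __name__ == "__main__":
    print(entries_for("CICS2", SHARED_CLT))
    print(entries_for("CICS2", SPLIT_CLT))
```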
The operator can initiate a takeover of all the regions by issuing a CEBT
PERFORM TAKEOVER command to the coordinator region; all regions are then
taken over by their alternates. A CEBT PERFORM TAKEOVER command issued to a
dependent region does not cause a takeover of all the regions. Allowing that
would require additional entries in the dependent portions of the CLT, and there
would be no benefit in having them, because the advantage of issuing the CEBT
command to the coordinator region is that doing so minimizes the flow of
commands from the CLTs.
Note: For this example, only three regions are shown, one of each kind. Adding
more dependent regions to the example would not illustrate anything new, because
the entries for each of them would be basically the same. However, in a real
system with only three regions, you probably would not want the added complexity
of a coordinator because it saves very few CLT commands.
Note: POWER1 and POWER2 share the spool and DASD.
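[Figure: sample MRO configuration. Labels: BNN, Communication Controller, Generic applid, VTAM (one in each VSE image), POWER1 and POWER2, and the region pairs C1 and C2, M1 and M2, D1 and D2.]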
If you want to start JOBC1 as an alternate, you should remove the START=AUTO
override. This applies to all of the jobs that follow that are initially started with
START=AUTO because START=STANDBY is coded in each SIT.
If the alternate coordinator region is taking over, it uses CEBT to force the other
regions to take over. If the master region fails and is being taken over by its
alternate, that alternate forces the alternate coordinator to take over, and the
coordinator instructs the other regions to take over. In this example, the command
to the alternate master region is redundant, because it has already begun its
takeover processing. But in a larger MRO complex, where the addition of a
coordinator is more worthwhile, the number of redundant commands would not
increase with the extra regions.
However, you might not want the added complexity of a coordinator. If there were
no coordinator, the CLT entries for each master region would contain two CEBT
commands, addressed to the other regions in the complex.
General
Master Index SC33-1648
Trace Entries SX33-6108
User’s Handbook SX33-6101
Glossary (softcopy only) GC33-1649
Administration
System Definition Guide SC33-1651
Customization Guide SC33-1652
Resource Definition Guide SC33-1653
Operations and Utilities Guide SC33-1654
CICS-Supplied Transactions SC33-1655
Programming
Application Programming Guide SC33-1657
Application Programming Reference SC33-1658
Sample Applications Guide SC33-1713
Application Migration Aid Guide SC33-1943
System Programming Reference SC33-1659
Distributed Transaction Programming Guide SC33-1661
Front End Programming Interface User’s Guide SC33-1662
Diagnosis
Problem Determination Guide GC33-1663
Messages and Codes Vol 3 (softcopy only) SC33-6799
Diagnosis Reference LY33-6085
Data Areas LY33-6086
Supplementary Data Areas LY33-6087
Communication
Intercommunication Guide SC33-1665
CICS Family: Interproduct Communication SC33-0824
CICS Family: Communicating from CICS on System/390 SC33-1697
Special topics
Recovery and Restart Guide SC33-1666
Performance Guide SC33-1667
Shared Data Tables Guide SC33-1668
Security Guide SC33-1942
External Interfaces Guide SC33-1669
XRF Guide SC33-1671
Report Controller User’s Guide SC34-5688
CICS Clients
CICS Clients: Administration SC33-1792
CICS Universal Clients Version 3 for OS/2: Administration SC34-5450
CICS Universal Clients Version 3 for Windows: Administration SC34-5449
CICS Universal Clients Version 3 for AIX: Administration SC34-5348
CICS Universal Clients Version 3 for Solaris: Administration SC34-5451
CICS Family: OO programming in C++ for CICS Clients SC33-1923
CICS Family: OO programming in BASIC for CICS Clients SC33-1671
CICS Family: Client/Server Programming SC33-1435
CICS Transaction Gateway Version 3: Administration SC34-5448
VSE/ICCF
VSE/POWER
VSE/VSAM
VTAM for VSE/ESA
DL/I VSE
Screen Definition Facility II (SDF II)
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing,
to:
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in
your country or send inquiries, in writing, to:
The following paragraph does not apply in the United Kingdom or any other country where such provisions
are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore this statement
may not apply to you.
This publication could include technical inaccuracies or typographical errors. Changes are periodically made to the
information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without
notice.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of
information between independently created programs and other programs (including this one) and (ii) the mutual use
of the information which has been exchanged, should contact IBM United Kingdom Laboratories, MP151, Hursley
Park, Winchester, Hampshire, England, SO21 2JN. Such information may be available, subject to appropriate terms
and conditions, including in some cases, payment of a fee.
The licensed program described in this document and all licensed material available for it are provided by IBM under
terms of the IBM Customer Agreement, IBM International Programming License Agreement, or any equivalent
agreement between us.
CICS, CICS/ESA, CICS/MVS, CICS/VSE, DB2 for VSE/ESA, DL/I VSE, IBM, NetView,
Processor Resource/Systems Manager, VSE/ESA, VTAM, 3090
Other company, product and service names may be the trademarks or service marks of others.
Sending your comments to IBM
CICS Transaction Server for VSE/ESA
XRF Guide
SC33-1671-01
If you want to send to IBM any comments you have about this book, please use one of the methods
listed below. Feel free to comment on anything you regard as a specific error or omission in the subject
matter, and on the clarity, organization or completeness of the book itself.
To request additional publications, or to ask questions or make comments about the functions of IBM
products or systems, you should talk to your IBM representative or to your IBM authorized remarketer.
When you send comments to IBM, you grant IBM a nonexclusive right to use or distribute your comments
in any way it believes appropriate, without incurring any obligation to you.
You can send your comments to IBM in any of the following ways:
By mail:
IBM UK Laboratories
Information Development
Mail Point 095
Hursley Park
Winchester, SO21 2JN
England
By fax:
– From outside the U.K., after your international access code use 44 1962 870229
– From within the U.K., use 01962 870229
Electronically, use the appropriate network ID:
– IBM Mail Exchange: GBIBM2Q9 at IBMMAIL
– IBMLink: HURSLEY(IDRCF)
– Email: [email protected]
Whichever method you use, ensure that you include:
The publication number and title
The page number or topic to which your comment applies
Your name and address/telephone number/fax number/network ID.