Alvaro Salla
Patrick Oughton
Redbooks
International Technical Support Organization
May 2018
SG24-6990-05
Note: Before using this information and the product it supports, read the information in “Notices” on
page vii.
This edition applies to version 2 release 3 of IBM z/OS (product number 5650-ZOS) and to all
subsequent releases and modifications until otherwise indicated in new editions.
© Copyright International Business Machines Corporation 2017, 2018. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
The team who wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX®, C3®, CICS®, Db2®, DS8000®, ECKD™, FICON®, GDPS®, Geographically Dispersed Parallel Sysplex™, HiperSockets™, IBM®, IBM LinuxONE™, IBM Z®, IBM z13®, IBM z13s®, IBM z14™, Interconnect®, Language Environment®, MVS™, Parallel Sysplex®, PowerVM®, PR/SM™, RACF®, Redbooks®, Redbooks (logo)®, Resource Link®, RMF™, S/390®, VTAM®, z/Architecture®, z/OS®, z/VM®, z/VSE®, z9®, z10™, z13®, z13s®, zEnterprise®
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
The ABCs of IBM® z/OS® System Programming is a 13-volume collection that provides an
introduction to the z/OS operating system and the hardware architecture. Whether you are a
beginner or an experienced system programmer, the ABCs collection provides the
information that you need to start your research into z/OS and related subjects. If you would
like to become more familiar with z/OS in your current environment, or if you are evaluating
platforms to consolidate your e-business applications, the ABCs collection will serve as a
powerful technical tool.
Patrick Oughton joined IBM in 2015 after a career as a Mainframe Systems Programmer for
over 30 years. He has worked for various organizations in New Zealand and Australia. He is
currently an IBM Z Client Technical Specialist for New Zealand and Australian clients.
Thanks also to Lydia Parziale and Robert Haimowitz, International Technical Support
Organization, Poughkeepsie Center, for their support of this project, and the original authors
of this book Lydia Parziale and Patrick Oughton.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
[email protected]
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Chapter 1. Introduction to z/Architecture
In order to understand z/Architecture, you have to be familiar with the basics of ESA/390 and
its predecessors.
One of the key concepts of a computer architecture is the number of addresses a running
program (application or system) may reference to access instructions and operands.
An address space (AS) is defined as the range of addresses available to a computer program
and is like a programmer's map of the virtual storage available for code and data. An address
space provides each programmer with access to all of the addresses available through the
computer architecture. We may say that an AS is the set of all of these addresses.
The number of addresses in an address space depends on the length of the address field used by the processor to hold an address. This length varies with the computer architecture.
This was a major reason for the introduction of the System/370 Extended Architecture (370-XA) in the early 1980s, and its operating system, MVS/XA. Because the effective length of an address field expanded from 24 bits to 31 bits, the number of possible addresses in the AS expanded from 16 MB to 2 GB. The MVS/XA address space is 128 times bigger than the MVS/370 AS.
To maintain compatibility with old application code, MVS/XA provided two addressing modes (AMODE): programs running in AMODE 24 can use only the first 16 MB of addresses, while programs running in AMODE 31 can use the full 2 GB of addresses.
Similarly, when the address field length grew from 31 to 64 bits in z/Architecture, the maximum possible number of addresses in the address space went from 2 GB to 16 exabytes (EB), or 8 billion times larger than the former 31-bit AS. To maintain compatibility with old application code, the IBM z/OS (also known as MVS™) operating system provides three addressing modes1 for programs:
AMODE 24 can use only the first 16 MB of addresses.
AMODE 31 can use only the first 2 GB of addresses.
AMODE 64 can use all the 16 EB of addresses.
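The arithmetic behind these figures is simple. The following Python sketch (our own illustration, not from the book) computes the number of addresses reachable in each AMODE and the growth factors quoted above:

# Illustrative arithmetic: the number of addresses reachable in each AMODE.
SIZES = {24: 2**24, 31: 2**31, 64: 2**64}

print(f"AMODE 24: {SIZES[24]:>26,} addresses (16 MB)")
print(f"AMODE 31: {SIZES[31]:>26,} addresses (2 GB)")
print(f"AMODE 64: {SIZES[64]:>26,} addresses (16 EB)")

# The growth factors quoted in the text:
print(SIZES[31] // SIZES[24])   # 128 (MVS/XA AS vs. MVS/370 AS)
print(SIZES[64] // SIZES[31])   # 8,589,934,592 (about 8 billion)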
z/OS and the IBM Z family of hardware products deliver the 64-bit architecture
(z/Architecture) to provide qualities of service that are critical to the business world. 64-bit
CPC memory support helps eliminate paging. 64-bit CPC memory support may allow you to
consolidate your current systems into fewer logical partitions (LP), or to a single native image.
Here, we use the expression “CPC memory” to mean central storage, real storage, real memory, or random access memory (RAM).
Now, a key question. Should you skip or read this chapter? The answer is simple. Are you
certain that your daily production workload will be processed smoothly, with no problems with
these disciplines: availability, performance, integrity, security, and compatibility? If your
answer is yes, skip this chapter and go directly to Chapter 2, “Introducing the IBM z14” on
page 77.
In this section, you find the basics. These basics matter when you do have problems.
On top of that, the writers of this chapter are not aware of any IBM publication that makes the POP (the z/Architecture Principles of Operation) easier to understand.
Computer architecture
This chapter provides a detailed description of z/Architecture, and it discusses aspects of the computers (CPCs) that run this architecture. These CPCs are now referred to as IBM Z, and they comprise the IBM z800, z890, z900, z990, z9® EC and BC, z10™ EC and BC, z196, z114, zEC12, zBC12, IBM z13®, IBM z13s®, and z14.
1 For more information on AMODE, see section 1.16, “Addressing mode” on page 30
(Figure: computer architecture — the architect defines the interface between software and hardware; that interface is documented in the POP.)
In this chapter, we describe the z/Architecture, which is composed of the following key elements:
– The instruction set (more than 1000 different instructions)
– Program status word (PSW) and interrupts
– Access to CPC memory and registers
– Execution of I/O operations
– The memory protection mechanism
– Cross memory
– Virtual memory
(Figure 1-3: Analogies — Real Life: taxi, street, going to the movies; Data Processing: processor (PU), program (payroll), process.)
To illustrate the process concept, let's draw an analogy using real-life terms.
Each item in the list labeled “Real Life” in Figure 1-3 corresponds to an item in the Data
Processing list: the taxi is the Processor Unit (PU) or CP, the street is the program, and "going
to the movies" is the process.
So, if every time you go to the movies you notice that you are late because you have a slow taxi, it would be useless to call another slow taxi. In the same way, if your process is taking longer because of a slow PU, it is useless to add another PU that has the same speed.
You could also go to the movies by taking a taxi, then get out somewhere, do some shopping
then take another taxi to continue on to the movies, and so on.
Similarly, the same process can start in one PU, be interrupted (for example, by an I/O
interrupt), and later resume in another PU, and so on.
Also, on the same street there may be different cars taking people to different places.
Likewise, the same reentrant program can be executed by different PUs on behalf of different
processes.
Taking someone to the hospital in an ambulance allows you to shortcut the traffic queue, just
as a key process has a higher priority in a queue for getting PU and I/O compared with other
processes.
A process may create another process to execute both in parallel. For example, the parent
process "going to the movies" may attach another daughter process "buy a cake for
Grandma". It implies two taxis taking different people through streets or different processes
executing programs (the same or not) in different PUs.
States of processing
From an operating system perspective, a process has one of three states:
Active: In the Active state, a process program is being executed by one PU.
Ready: In the Ready state, a process is delayed because all available PUs are busy
executing other processes.
Wait: In the Wait state, a process is being delayed for a reason that is not PU-related, for
example, waiting for an event, such as an I/O operation to complete.
Process attributes
In the operating system, processes are usually born depending on the needs of business
transactions introduced in the system through the network.
A process dies normally when its last program completes (i.e., its last instruction gets
executed) without any error. A process dies abnormally when one of its programs (in fact, one
of its instructions) tries to execute something wrong, or forbidden.
The amount of resources consumed (PU cycles, number of I/O operations, amount of
consumed memory) is charged to the process, and not to the program.
Also, when there are queues for accessing resources, the priority to be placed in such queues
depends on how important the process is, and not on the program.
Then, for each non-active process in the system, z/OS must keep status information (such as the PSW and register contents) in process control blocks, such as the task control block (TCB) and the service request block (SRB).
An address space naturally has several programs belonging to several DUs, but initially there is usually just one DU (the step task), which optionally creates the others. The others are naturally in wait state.
The benefits of multiprocessing (several live DUs active in parallel in the same system) can usually be achieved with just the step tasks of several address spaces.
Note: From now on in this publication, we will use the acronym DU instead of
"process".
A system is made up of hardware components, including a processor unit (PU), and software
products, with the primary software being an operating system such as z/OS. Other types of
software (system and user application programs) also run on the system. The processor unit
(PU) is the functional hardware unit that interprets and processes program instructions. The
PU and other system hardware, such as channels and memory, make up a central processor complex (CPC).
(Figure: the PU connects to the channel subsystem, which reaches I/O control units (CUs) either directly or through dynamic switches.)
Because the commercial workload is prone to I/O (I/O bound), a z14 may allow hundreds of channels. There are several types of channels in the z14 CPC, depending mainly on the protocol used in the dialog with the control unit:
IBM FICON Express16S+.
IBM Z High Performance FICON (zHPF), which is similar to FICON but more efficient.
zHyperLink (PCIe-attached), a z14 exclusive that is more modern and has a five times higher I/O rate than zHPF.
CF links, to access the Coupling Facilities. Refer to the Figure on page 76.
Virtual Flash Memory (VFM) is a part of the CPC memory (up to 10 TB) that is not directly accessed by the PU. z/OS keeps in VFM objects such as page data sets, large pages, the overflow area of coupling facility structures, SVC dumps, and so on. On the zEC12 CPC, the equivalent flash memory cards were located in slots of the PCIe I/O drawer.
The IBM Z platform is designed to provide pervasive encryption capabilities to help you
protect data efficiently in the digital enterprise.
IBM z/OS was designed to take advantage of the z14 platform embedding the use of the z14
cryptographic engines within the operating environment. This approach helps to create an
environment where policies can be enforced that govern intrinsic data protection, helping you
build a perimeter around business data.
(Figure: the IBM Z family channel subsystem runs on SAP PUs.)
CP A CP is a general-purpose processor that is able to execute all the possible z14 operating systems, such as z/OS, zLinux, z/VM, IBM z/VSE®, Coupling Facility Control Code (CFCC), and z/TPF. A CP is also known as a CPU. All other PU types are less expensive than CPs.
IFL This type of PU is only able to execute native zLinux and zLinux under IBM z/VM®. IFLs are less expensive than the CPs.
ICF This type of PU is only able to execute the CFCC operating system. The CFCC is loaded in a Coupling Facility LP from a copy in HSA; after this, the LP is activated and IPLed. ICFs are less expensive than the CPs.
zIIP This type of PU runs under z/OS only, for eligible IBM Db2® workloads such as DDF, business intelligence (BI), ERP, CRM, and IPSec (an open networking security protocol).
(Figure: a microcoded instruction — for an ADD, a micro program held in control storage orchestrates the ALU and storage in the PU data flow.)
In a microcoded PU instruction, there is one micro program that tells data control what to do
in order to execute the instruction in the data flow. The micro program has, in a special
language, the time sequence (including parallelism) of orders to be sent by the data flow.
These micro programs are loaded in a special internal memory in the PU, called control
storage, at power-on reset (POR) time, by the Support Element.
Decoding a microcoded instruction consists of finding the address in the control storage of its
pre-loaded micro program. The opposite of microcoding is hardwiring, in which the logic of
data control for each instruction is determined by Boolean hardware components. The
advantage of microcoding is flexibility, where any correction or even a new instruction
(creating a new function) can be implemented by just changing or adding to the existent
microcode. It is also possible that the same PU may switch instantaneously from one
architecture to another (such as from ESA/390 to z/Architecture) by using another set of
microcode to be loaded into a piece of its control memory. The advantage of hardwiring is
performance because there is no data control overhead. In z14, due to these performance
reasons, a large majority of z14 instructions are hardwired.
1.8 PU Registers
PU registers (Figure 1-7 on page 13) are a fast memory within a PU, like the current PSW.
The PU provides registers that are available to programs, but that do not have addressable
representations like bytes in CPC memory. There are several types of registers, as explained
in the following sections: general purpose registers (GPR), floating-point registers, control
registers, access registers, and the prefix register. Each type (except the prefix register) has 16 registers per PU, of 8 bytes each (access registers have only 4 bytes).
(Figure 1-7: general purpose registers — 64-bit registers (bits 0-63) connecting the ALU and CPC memory.)
The instruction operation code determines which type of register is to be used in the
operation associated with this instruction.
GPRs are identified by the numbers 0-15 (register numbers 00 through 0F), and are
designated by a four-bit R field in an instruction. Usually, when a GPR contains data, this data
is in binary integer format, also called fixed point. There are certain PU instructions that are
able to process data loaded and stored in GPRs. Load-type instructions copy data from the
CPC memory to GPRs, and Store-type instructions copy data from GPR to memory.
When an instruction needs only 32 bits, such as Load (L), Add (A), and Store (ST), only bits 32 to 63 of the GPR are used; the high-order bits 0 to 31 are neither used nor changed.
z/Architecture provides instructions that use 64-bit operands to produce a 64-bit binary integer. These instructions include a “G” in the instruction mnemonic, such as Load (LG), Add (AG), and Store (STG), and they handle all 64 bits of the GPR. Instructions that have mixed 64- and 32-bit operands have “GF” in the mnemonic, such as Load (LGF) and Add (AGF).
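The following Python sketch (our own model, not IBM code) illustrates this distinction: a 32-bit load such as L replaces only bits 32-63 of the 64-bit GPR and leaves bits 0-31 unchanged, while the G-form (LG) replaces all 64 bits.

# A sketch of a 64-bit GPR where 32-bit instructions (L, A, ST) touch
# only bits 32-63 (the low-order half in z/Architecture bit numbering).
MASK32 = 0xFFFF_FFFF

def load_32(gpr: int, word: int) -> int:
    """L: replace bits 32-63 of the GPR; bits 0-31 are left as they were."""
    return (gpr & ~MASK32) | (word & MASK32)

def load_64(gpr: int, dword: int) -> int:
    """LG: replace all 64 bits of the GPR."""
    return dword & 0xFFFF_FFFF_FFFF_FFFF

gpr = 0xDEAD_BEEF_0000_0000           # some leftover value in bits 0-31
gpr = load_32(gpr, 0x1234)            # 32-bit load
assert gpr == 0xDEAD_BEEF_0000_1234   # high half untouched
gpr = load_64(gpr, 0x1234)            # 64-bit load
assert gpr == 0x0000_0000_0000_1234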
Summarizing, CRs are registers accessed and modified by z/OS through privileged instructions. All the data contained in the CRs is architected: it carries information set by z/OS and used by hardware functions (such as Crypto, cross memory, virtual memory, and clocks) implemented in the PU. The CRs are a kind of extension of the PSW (covered in “Program status word (PSW)” on page 17). Refer to z/Architecture Reference Summary, SA22-7871, for the complete set of CR contents.
The CRs are identified by the numbers 0-15 and are designated by four-bit R fields in the
instructions LOAD CONTROL and STORE CONTROL. Multiple control registers can be
addressed by these instructions.
When the PU is in a mode called the access-register mode (controlled by bits 16 and 17 in the
PSW), an instruction B field (in addition to its role of specifying a virtual address for a
memory-operand reference) designates an AR that points indirectly to the data space
segment table to be used by DAT to translate this virtual address.
Instructions are provided (such as LAM) for loading and storing the contents of the access registers, and for moving the contents of one access register to another.
As shown in Figure 1-8, pairs of FPRs can be used for extended (128-bit) operands. When
using extended operand instructions, each of the eight pairs is referred to by the number of
the lower-numbered register of the pair.
Then, FPRs are used to keep temporary data (operands) loaded from CPC memory to be
processed. This data must be in HFP, BFP, or DFP format.
(Figure 1-8: floating-point register formats — HFP with sign, characteristic (exponent in excess-64 notation, X'40'), and fraction; BFP in IEEE format with sign, biased exponent, and fraction.)
The quality of an architecture depends very much on how powerful the instruction set is in solving various types of programming problems. z/Architecture is a powerful architecture that runs on z14 PUs and addresses different kinds of problems, mainly those of commercial workloads.
There are more than 1,300 instructions defined in z/Architecture, and the majority of them have their logic described in z/Architecture Principles of Operation, SA22-7832.
The z/Architecture instruction set has many instructions (compared with other platforms) because IBM Z machines are Complex Instruction Set Computing (CISC) machines, as required by commercial data processing. Many of these instructions operate on 64-bit binary integers and also address memory operands with 64-bit addresses. For compatibility reasons, the instructions of the
previous mainframe architectures are carried forward to the z/Architecture. Each new family
of IBM Z PUs adds more complex and more useful instructions into the z/Architecture. The
major objective of these instructions is to decrease the total number of instructions executed
(also known as the Path Length), and as a result, to decrease your PU time. It is
recommended that you recompile your more frequently used code with a modern compiler in
order to exploit these new and more powerful instructions.
Instructions
An instruction is one, two, or three halfwords in length (a halfword is two bytes), as shown in Figure 1-9, and must be located in storage on a halfword boundary. Its size depends on how much information needs to be passed to the PU.
All instructions that process operands in CPC memory need to address that operand, usually
through a virtual address. For that, the PU adds the contents of the following:
Contents of a GPR indicated in the instruction as a base, such as B1 (as shown in
Figure 1-9 on page 17) or an index, such as X2 (as shown in Figure 1-9 on page 17.)
A displacement indicated in the instruction, such as D1 (as shown in Figure 1-9 on page 17). For the RSY, RXY, and SIY types of instruction, this displacement is 20 bits (1 MB range); for the others, it has 12 bits.
(Figure 1-9: instruction formats — RR: op code, R1, R2; RRE: op code, R1, R2; RS: op code, R1, R3, B2, D2; RSE: op code, R1, R3, B2, D2; SS: op code, L, B1, D1, B2, D2; SSE: op code, B1, D1, B2, D2; RSI: op code, R1, R3, I2; S: op code, B2, D2.)
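The base-index-displacement address generation described above can be sketched in a few lines of Python (the helper name and register values are ours, purely for illustration). Note that a zero in the B or X field means "no register", not the contents of GPR 0:

# A sketch of base-index-displacement address generation:
# effective address = (B) + (X) + D, truncated to the addressing mode.
def effective_address(gprs, b: int, x: int, d: int, amode: int = 64) -> int:
    base  = gprs[b] if b != 0 else 0   # field value 0 means "no register"
    index = gprs[x] if x != 0 else 0
    return (base + index + d) % (2 ** amode)

gprs = [0] * 16
gprs[12] = 0x0001_0000           # base register
gprs[3]  = 0x0000_0200           # index register
print(hex(effective_address(gprs, b=12, x=3, d=0x010)))   # 0x10210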
The reason for so many formats is the diversity of the information to be passed to the PU by
the programmer. As an analogy, if you need to go to a nearby grocery store, you might use a
bicycle (a 2-byte RR format of instruction); but if you go to a place that is farther away, you
might drive a car instead (a 6-byte SSE format of instruction) to save time.
For more information and a description of the various instruction formats, see the
z/Architecture Principles of Operation, SA22-7832 which can be found at:
http://www-01.ibm.com/support/docview.wss?uid=isg2b9de5f05a9d57819852571c500428f9a
(Figure 1-10: PSW format — status fields in bits 0-33, including the addressing-mode bits B (31) and A (32); the instruction address occupies bits 64-127.)
The PSW includes the system mask, the PSW key, the problem state bit, the instruction address, the condition code, and other information used to control instruction sequencing and to hold and indicate the status of the PU. In other words, the current PSW governs the program currently being executed.
With simultaneous multithreading (SMT) on the zIIPs and SAPs in z14, several tasks can run concurrently in the same PU (each one using a thread ID). In this case, there is one current PSW per thread ID in the same PU.
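The PSW lends itself to a bit-field view. Below is a minimal sketch (our own helper functions, not IBM code) that extracts some of the fields described in the following sections from a 128-bit PSW value, using z/Architecture bit numbering where bit 0 is the leftmost:

# Extract PSW fields; bit 0 is the leftmost of the 128 bits.
def psw_bits(psw: int, first: int, length: int = 1) -> int:
    return (psw >> (128 - first - length)) & ((1 << length) - 1)

def decode(psw: int) -> dict:
    return {
        "system_mask": psw_bits(psw, 0, 8),    # bits 0-7
        "key":         psw_bits(psw, 8, 4),    # bits 8-11
        "wait":        psw_bits(psw, 14),      # bit 14
        "problem":     psw_bits(psw, 15),      # bit 15 (1 = problem state)
        "cc":          psw_bits(psw, 18, 2),   # bits 18-19
        "amode":       psw_bits(psw, 31, 2),   # bits 31-32
        "ia":          psw_bits(psw, 64, 64),  # instruction address
    }

psw = 1 << 112                     # only bit 15 (problem state) set
assert decode(psw)["problem"] == 1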
Bits 0-7 of the PSW are collectively referred to as the system mask. A summary of the
functions can be found in the z/Architecture Principles of Operation, SA22-7832 which can be
found at:
http://www-01.ibm.com/support/docview.wss?uid=isg2b9de5f05a9d57819852571c500428f9a
PER mask
Shown as an R in Figure 1-10, this is the program event recording mask. Bit 1 (shown as
number 1 in Figure 1-11) controls whether the PU is enabled for interrupts associated with
program-event recording. When the bit is zero, no PER event can cause an interruption.
When the bit is one, interruptions are permitted, subject to the PER-event-mask bits in control
register 9. PER is a hardware z/Architecture function to assist in debugging z/OS code.
I/O mask
Shown as IO in Figure 1-10, Bit 6 controls whether the PU is enabled for I/O interruptions. An
I/O interruption is demanded by a channel when an I/O operation is over.
When the bit is zero, an I/O interruption cannot occur in this PU. When the bit is one, I/O
interruptions are subject to the I/O-interruption subclass-mask bits in control register 6. When
an I/O-interruption subclass-mask bit is zero, an I/O interruption for that I/O-interruption
subclass cannot occur; when the I/O-interruption subclass-mask bit is one, an I/O interruption
for that I/O-interruption subclass can occur. Usually in z/OS, these CR6 bits are on.
At z14 using zHyperLink for 4-KB random read hits, there are no I/O interrupts, because in such a case the I/O operation is very fast (tens of microseconds) and synchronous; that is, the PU loops, testing for the end of the I/O operation.
External mask
Shown as EX in Figure 1-10, Bit 7 controls whether the PU is enabled for interruption by
conditions included in the external type of interrupt. When the bit is zero, an external
interruption cannot occur. When the bit is one, an external interruption is subject to the
corresponding external subclass-mask bits in control register 0; when the subclass-mask bit
is zero, conditions associated with the subclass cannot cause an interruption; when the
subclass-mask bit is one, an interruption in that subclass can occur. Usually in z/OS, these CR0 bits are on.
There are certain z/OS programs that cannot be interrupted, to avoid raising data integrity exposures in CPC memory.
Note: An incoming external interrupt is not lost when all the PUs are disabled for external
interrupts. It is kept in the SAP hardware queue.
PSW key
Indicated by the word Key in Figure 1-10 and positioned in bits 8-11, the PSW key is used by
a hardware mechanism within the PU called memory protection. It guarantees that programs of running DUs do not alter or read areas in CPC memory that belong to other DUs; refer to “Storage protection” on page 39 for more information.
Architecture mode
This bit 12 must be off (0), indicating z/Architecture mode (with its 128-bit PSW), which is the only architecture that can execute on a z14.
Machine-check mask
Indicated by an M in Figure 1-10, bit 13 controls whether the PU is enabled for interruption by
machine-check conditions. When the bit is zero, a machine-check interruption cannot occur.
When the bit is one, machine-check interruptions due to system damage and
instruction-processing damage are permitted, but interruptions due to other
machine-check-subclass conditions are subject to the subclass-mask bits in control register
14.
This bit is off when the z/OS routine, Machine Check Handler (MCH), is trying to recover from
a hardware machine check and recursively receives another machine check interrupt.
Wait state
Indicated by a W in Figure 1-10, when bit 14 is on, the PU is waiting; that is, no instructions
are processed by the PU, but interruptions may take place. When bit 14 is zero, instruction
fetching and execution occur in the normal manner. When in wait state, the only way of getting
out of such state is through an Interruption, which is covered in 1.17, “Interrupts” on page 31,
or by an IPL (a z/OS “boot”).
When z/OS is running under IBM PR/SM™ (as is mandatory in z14) in a shared logical PU,
and there is no ready DU to get this logical PU, bit 14 is set on by z/OS itself. In this case, it
causes a PR/SM intercept that forces the physical PU to leave the logical PU. Later, when
PR/SM cannot find any shared logical PU to get to the physical PU, bit 14 is switched on
again. In this case, the physical PU enters into a real wait state.
Privileged instructions can only be executed by trusted z/OS code. z/OS manages this: when its code is executing, bit 15 is off; when an application program is executing, bit 15 is on. Some instructions are said to be semi-privileged, because they depend on other factors beyond the bit 15 state.
Address-space control
Indicated by an AS in Figure 1-10, bits 16 and 17, along with PSW bit 5, control the
translation mode. If the AS is:
00 - Primary-space mode
01 - Access-register mode
10 - Secondary-space mode
11 - Home-space mode
Condition code
Shown as CC in Figure 1-10, bits 18-19 can be set to 0, 1, 2, or 3, depending on the result obtained in executing the last instruction. Most arithmetic and logical instructions, as well as
some other instructions, set the condition code.
For example, the instruction BRANCH ON CONDITION can specify any selection of the
condition-code values as a criterion for branching. Use the BRANCH ON CONDITION to test the
contents of the CC of the current PSW that were set by the previous instruction.
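The BRANCH ON CONDITION test itself is simple: the instruction carries a 4-bit mask with one bit per condition-code value (8 selects CC 0, 4 selects CC 1, 2 selects CC 2, 1 selects CC 3), and the branch is taken when the bit matching the current CC is one. A short sketch (our own, for illustration):

# BRANCH ON CONDITION mask test.
def branch_taken(mask4: int, cc: int) -> bool:
    return bool(mask4 & (8 >> cc))

assert branch_taken(0b1000, cc=0)       # BC 8,... branches on CC 0
assert not branch_taken(0b0111, cc=0)
assert branch_taken(0b1111, cc=2)       # mask 15 = unconditional branch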
Program Mask
Indicated by Prog Mask in Figure 1-10, bits 20-23 are associated with one of the following
types of program exceptions:
Fixed-point overflow (bit 20)
Decimal overflow (bit 21)
Hexadecimal floating point (HFP) exponent underflow (bit 22)
HFP significance (bit 23)
During the execution of an arithmetic instruction, the PU may find some unusual (or error) condition, such as data overflow, loss of significance, or underflow. In such an instance, if the relevant program mask bit is set to one (1), a program interrupt occurs; refer to “Interrupts” on page 31 for more details.
When this program exception is encountered by z/OS, usually the current task is abnormally
ended (ABEND). However, in certain situations, programmers do not want an ABEND for
certain specific cases. In such a case, the programmer will instruct the program itself to
handle the situation. So by using the unprivileged instruction SET PROGRAM MASK (SPM), the
interrupt can be bypassed by setting relevant program mask bits to off (0).
The active program is informed about these events through the condition code posted by the
instruction, where the events described happened.
The contents of the PU's current PSW can be totally replaced by two events:
Loading a new PSW from CPC memory during an interruption.
Executing the LPSWE instruction, which copies 128 bits from memory to the current PSW.
The combination of bits 31 and 32 identify the addressing mode (AMODE 24, 31 or 64) of the
running program.
Instruction Address
Bits 64 through 127, shown in Figure 1-10, point to the memory virtual address of the next
instruction to be executed by this PU. When an instruction is fetched from CPC memory, its
length is automatically added to this field. The PSW will then point to the next instruction
address (see Figure 1-12). However, there are instructions, such as BRANCH, that may
replace the contents of this PSW field, pointing to the branch target instruction. The address
length contained in this PSW field depends on the program addressing mode attribute
(AMODE) of the executing program. For compatibility reasons, old programs that address
small addresses are still allowed to execute. When in 24- or 31-bit addressing mode, the leftmost bits of this field are filled with zeros; programs in AMODE 64 can use the full 64-bit field.
(Figure 1-12: the instruction address in the PSW (bits 64-127) points to the next sequential instruction; a branch instruction (BR) can replace it with the branch target address.)
The PSW bits 31 and 32 (extended and basic addressing modes) indicate to the PU which
AMODE the running program is using.
The PU has an interrupt capability, which permits it to switch rapidly to another program in
response to exceptional conditions and external stimuli. Interrupts can modify the address of
the next instruction to be executed.
When an interrupt occurs, the PU places a copy of the current PSW in an assigned CPC
memory location, called the old-PSW location, for the particular type of interrupt. Then, the
PU fetches a new PSW copy from a second assigned CPC memory location. This new PSW
(already prepared by z/OS) determines the next z/OS program to be executed (a z/OS
component named the First Level Interrupt Handler - FLIH). When it has finished processing the interrupt, the program handling the interrupt may reload the old PSW, making it the current PSW again (through the LPSWE instruction), so that the interrupted program can continue.
There are six types of interrupt: restart, external, supervisor call, program check, machine
check, and I/O. Each type has a distinct pair of old-PSW and new-PSW locations
permanently assigned in CPC memory.
For more details about the interrupt concept, refer to “Interrupts” on page 31.
By the way, z14 implements two-way simultaneous multithreading, where two program tasks can be executed concurrently in the same PU (zIIP and SAP), each one in a thread ID.
Summarizing, when you add more PUs to the CPC, you add the capability of processing
program instructions simultaneously in these PUs. When all the PUs share CPC memory and
a single z/OS image manages the processing, each task is assigned to a PU that is available
to execute such task. If a PU has a hardware failure, such tasks can be routed to another PU.
This hardware and software organization is called a tightly coupled multiprocessor.
To implement tightly coupled systems, the following items are needed in the z/Architecture:
Shared CPC memory, which allows hundreds of PUs to share the same logical partition
CPC memory.
PU-to-PU interconnection and signaling.
Prefixing, implemented by the Prefix Register and related logic. This is beyond the scope
of this volume.
The following sections discuss data formats that are defined and used with the z/Architecture.
(Figure: short HFP format — sign (bit 0), characteristic (bits 1-7), 6-digit fraction (bits 8-31); short BFP format — sign (bit 0), exponent (bits 1-8), fraction (bits 9-31).)
For signed binary integers, the leftmost bit represents the sign (0 for positive and 1 for
negative), which is followed by the numeric field. Positive numbers are represented in true
binary notation with the sign bit set to zero. Negative numbers are represented in
two's-complement binary notation, with a one in the sign-bit position. The length of such data
can be two bytes (a halfword), four bytes (a fullword) or eight bytes (a doubleword). By the
way, some floating-point instructions use extended operands made of sixteen bytes (a
quadword).
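Python's integer conversions can demonstrate this two's-complement representation directly (an illustrative sketch; z/Architecture is big-endian, so the sign bit is the leftmost bit of the leftmost byte):

# Encode/decode signed binary integers in two's-complement notation,
# for halfword (2), fullword (4), and doubleword (8) operands.
def encode(value: int, nbytes: int) -> bytes:
    return value.to_bytes(nbytes, byteorder="big", signed=True)

def decode(raw: bytes) -> int:
    return int.from_bytes(raw, byteorder="big", signed=True)

assert encode(-1, 2).hex() == "ffff"         # sign bit (leftmost) is one
assert encode(+1, 4).hex() == "00000001"     # true binary, sign bit zero
assert decode(bytes.fromhex("fffffffe")) == -2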
In the zoned format, the rightmost four bits of a byte are called the numeric bits (N), and normally consist of a code representing a decimal digit (from 0 to 9). The leftmost four bits of a byte are called the zone bits (Z), except in the rightmost byte of the field, where they may carry the sign.
Decimal digits in the zoned format may be part of a larger character set (such as EBCDIC), which also includes alphabetic and special characters. The zoned format is, therefore, suitable for input, editing, and output of numeric data in human-readable form (printers or screens). There are no decimal-arithmetic instructions that operate directly on decimal numbers in the zoned format; such numbers must first be converted to the packed decimal format.
In the packed format, each byte contains two decimal digits (D), except for the rightmost byte,
which contains a sign to the right of a decimal digit (Hex C or Hex F for positive, and Hex D or
Hex B for negative (Hex A and Hex E are also positive, but seldom used)). Decimal arithmetic
operation is performed with operands in the packed format, and generates results in the
packed format. The packed-format operands and results of decimal-arithmetic instructions
may be up to 16 bytes (31 digits and sign). The editing instructions can fetch as many as 256
bytes from one or more decimal numbers of variable length, each in packed format. There are
instructions to convert between the numeric data formats.
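The zoned-to-packed conversion (what the PACK instruction does for EBCDIC zoned decimals) can be sketched as follows. This is our own simplified model, not the real instruction, which works right to left with length codes; the unsigned-zone normalization to F is also our choice:

# Zoned to packed: drop the zones, keep the digits, move the sign
# (taken from the zone of the rightmost byte) to the last half byte.
def zoned_to_packed(zoned: bytes) -> bytes:
    digits = [b & 0x0F for b in zoned]            # numeric (N) half bytes
    sign = (zoned[-1] & 0xF0) >> 4                # zone of last byte
    nibbles = digits + [sign if sign in (0xC, 0xD) else 0xF]
    if len(nibbles) % 2:                          # pad to whole bytes
        nibbles.insert(0, 0)
    return bytes(16 * a + b for a, b in zip(nibbles[::2], nibbles[1::2]))

# EBCDIC zoned F1 F2 C3 represents +123
assert zoned_to_packed(bytes.fromhex("f1f2c3")).hex() == "123c"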
The magnitude (an unsigned value) of the number is the product of the significand and the radix raised to the power of the exponent. The number is positive or negative depending on whether the sign bit is zero or one, respectively. The radix values 16 and 2 lead to the terminology “hexadecimal” (HFP, developed by IBM) and “binary” (BFP, developed by IEEE) floating point. The formats are also based on three operand lengths: short (32 bits), long (64 bits), and extended (128 bits). There are instructions that can operate on both types, just as there are instructions specialized in just one of the formats.
The exponent of a BFP number is represented in the number as an unsigned binary integer
called the biased exponent. The biased exponent is obtained by adding a bias to the
exponent value. The number of bit positions containing the biased exponent, the value of the
bias, and the exponent range depend on the number format (short, long, or extended). For more information about floating-point representation, refer to “Floating point registers” on page 15. More recently, z/Architecture introduced the Decimal Floating Point (DFP) format, which is very useful for commercial computing. The z14 also has a hardware accelerator to execute instructions on the DFP format.
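For the short BFP format, the bias is 127 (short BFP is the IEEE 754 single-precision format). The following sketch (our own, for illustration; the PU does this in hardware) decodes a 32-bit word into its sign, unbiased exponent, and fraction:

import struct

# Short BFP: 1 sign bit, 8 biased-exponent bits (bias 127), 23 fraction bits.
def decode_short_bfp(word: int):
    sign     = word >> 31
    biased   = (word >> 23) & 0xFF
    fraction = word & 0x7F_FFFF
    exponent = biased - 127                # remove the bias
    return sign, exponent, fraction

word = struct.unpack(">I", struct.pack(">f", -6.5))[0]
print(decode_short_bfp(word))   # (1, 2, 0x500000) -> -1.101b * 2**2 = -6.5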
If you key the letter B, you have C2 (1100 0010). If you key the number 2, you have F2 (1111 0010). Refer to z/Architecture Reference Summary, SA22-7871, for the complete EBCDIC code set.
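These examples can be checked with Python's built-in EBCDIC codec (code page 037, one common EBCDIC variant):

assert "B".encode("cp037") == b"\xc2"    # 1100 0010
assert "2".encode("cp037") == b"\xf2"    # 1111 0010
print(b"\xc8\x85\x93\x93\x96".decode("cp037"))   # "Hello"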
1.12.6 Unicode
Unicode is an alphanumeric double-byte code with 64 K possible code points, used by ideographic written languages. It is fully supported in z/Architecture by a set of several instructions.
1.12.7 UTF-8
UTF-8 is an alphanumeric code of up to four bytes per character (about 4 G possibilities), also used by ideographic written languages. It is fully supported in z/Architecture by a set of several instructions.
(Figure: CPC memory — a sequence of 8-bit bytes, identified by virtual addresses from 0 up to 16E.)
Byte attributes
Each byte has two attributes:
Location: Location in memory is identified by a unique non-negative integer, known simply
as the byte address. Adjacent byte locations have consecutive addresses, starting with 0
on the left and proceeding in a left-to-right sequence. These addresses are located in
fields that may have: 24, 31, or 64 bits.
Contents: The given value for a byte is the value obtained by considering the bits of the
byte to represent a binary code. Thus, when a byte is said to contain a zero, the value
00000000 binary, or 00 hex, is meant. There are 256 different contents per byte.
In each byte, the bits are numbered in a left-to-right sequence. The leftmost bits are referred
to as the “high-order” bits and the rightmost bits as the “low-order” bits. The bits in a byte are
numbered 0 through 7, from left to right. Bit numbers are not memory addresses, however.
Only bytes can be addressed. To operate on individual bits of a byte in memory, it is
necessary to access the entire byte.
(Figure: addressing growth — in System/360, the PSW instruction address and GPR operand addresses are 24 bits, giving a 16 MB address space; in z/Architecture, they are 64 bits, giving a 16 EB address space.)
An address space (AS) is the set of addresses that this program may potentially reference
during its PU execution. The number of possible addresses in an AS is controlled by the
architecture of the PU which defines the size of the PU field containing such addresses. For
example:
System/360 architecture has 24-bit addresses, which allows an AS of up to 16 MB of addresses.
System/370-XA architecture has 31-bit addresses, which allows an AS of up to 2 GB of addresses.
In order to support the growth in business with large numbers of users and transactions, the
z/Architecture has 64-bit, thus allowing each z/OS address space up to 16 EB of addresses.
When virtual storage is not implemented, the set of addresses in the AS is numerically identical to the set of byte addresses in CPC memory.
1.16.1 AMODE
The addressing mode determines where in virtual storage the operands can reside. The
memory operands for programs running in AMODE 64 can be anywhere in the 16 EB of
addresses of the AS, while a program running in AMODE 24 can use only memory operands
that reside in the first 16 MB addresses of the 16 EB AS.
One assigns an AMODE to indicate which hardware addressing mode is active when the
program executes. Addressing modes are:
24 Indicates that 24-bit addressing must be in effect.
31 Indicates that 31-bit addressing must be in effect.
ANY Indicates that either 24-bit or 31-bit addressing can be in effect.
64 Indicates that 64-bit addressing can be in effect.
Note: Even though ANY is still used, it is restricted to only 24- and 31-bit addressing
modes and does NOT includes 64-bit addressing mode.
Running programs
When a program is loaded into memory, its addressing mode is already determined. There
are non-privileged instructions that are able to change dynamically the addressing mode,
such as BRANCH AND SET MODE (BSM) and BRANCH AND SAVE AND SET MODE
(BASSM).
The PU addressing mode for the running program is described in bits 31 and 32 in the current
PSW, respectively:
00 indicates AMODE 24.
01 indicates AMODE 31.
10 is invalid.
11 indicates AMODE 64.
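A tiny helper (ours, purely illustrative) encoding the mapping above:

# PSW bits 31 (extended) and 32 (basic) select the addressing mode.
def amode(bit31: int, bit32: int) -> int:
    modes = {(0, 0): 24, (0, 1): 31, (1, 1): 64}
    if (bit31, bit32) not in modes:
        raise ValueError("10 is an invalid combination")
    return modes[(bit31, bit32)]

assert amode(0, 0) == 24 and amode(0, 1) == 31 and amode(1, 1) == 64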
(Figure: the three addressing modes — AMODE 24 reaches the first 16 MB, AMODE 31 the first 2 GB, and AMODE 64 all 16 EB of addresses; PSW bits 31-32 select the mode.)
1.16.2 RMODE
The residence mode indicates the location where the load module is stored when it is brought into memory:
24 Indicates that 24-bit residence mode must be in effect; that is, the module is stored below the 16 MB address. This address is also called the line.
ANY Indicates that 24-bit or 31-bit residence mode must be in effect; that is, by preference between the 16 MB and the 2 GB addresses.
64 Indicates that 64-bit residence mode must be in effect; that is, above the 2 GB bar. This residence mode is valid only at z/OS 2.3 under certain conditions.
1.17 Interrupts
An interrupt occurs when the PU detects a specific event or signal that alters the sequence of instructions; interrupts are either planned or unplanned. The flow of these events is shown in Figure 1-17.
(Figure 1-17: interrupt flow — the current PSW is stored in the old PSW slot of the PSA for the interrupt type, the matching new PSW is loaded and points to the FLIH, which passes control to z/OS service routines and then to the Dispatcher, which dispatches a unit of work through LPSW.)
In the Prefixed Storage Area (PSA) there is a pair of quadwords (16 bytes each) for each type of interrupt (1). When an interrupt occurs (2), the end of an I/O operation for example, the current PSW is stored in the I/O old PSW location (one of the quadwords for the I/O interruption). Then the I/O new PSW (3) is loaded into the current PSW. That PSW points to (4) the corresponding First Level Interrupt Handler (FLIH). From the FLIH (5), control is passed to the service routines that handle the specific interrupt. After this is finished, control is passed to the Dispatcher (6), which will look for work (7) and pass control (8) to the unit of work.
During interrupt processing, the PU hardware performs the three following hardware steps:
1. Stores (saves) the current PSW in a specific PSA CPC memory location named old PSW.
2. Stores information identifying the cause of the interrupt in specific PSA CPC memory
location, named interrupt code.
3. Fetches, from a specific PSA CPC memory location named new PSW (already prepared
by z/OS), an image of the PSW and loads it in the current PSW.
After this third step, the interrupt process is over. The PU returns to normal mode, that is,
fetching the next instruction, where its virtual address is located at the current PSW (a new
PSW copy).
Note: The old and new PSWs are just copies of the current PSW contents. Processing
resumes as specified by the new PSW instruction address and status. The old PSW stored
on an interrupt normally contains the status and the address of the instruction that would
have been executed next had the interrupt not occurred, thus permitting later the
resumption of the interrupted program (and task).
For more information about interrupts, see:
https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zconcepts/zconc_interrupts.htm
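The three hardware steps described above amount to a PSW swap. The following toy model (ours, heavily simplified; the field names are hypothetical) shows the mechanism for an I/O interrupt:

# A toy model of the three hardware steps of an interrupt: save the
# current PSW, record the interrupt code, load the new PSW set by z/OS.
psa = {
    "io_old_psw": None,
    "io_new_psw": "PSW(disabled, key 0, IA -> I/O FLIH)",  # set up by z/OS
    "io_interrupt_code": None,
}

def io_interrupt(current_psw: str, code: int) -> str:
    psa["io_old_psw"] = current_psw          # step 1: save status
    psa["io_interrupt_code"] = code          # step 2: identify the cause
    return psa["io_new_psw"]                 # step 3: becomes current PSW

current = io_interrupt("PSW(enabled, key 8, IA -> app)", code=0x0001)
# The PU now fetches its next instruction from the I/O FLIH; the old PSW
# lets the Dispatcher resume the interrupted program later.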
Remember that certain causes for program interrupts can be masked by the SET PROGRAM
MASK (SPM) instruction, as covered in “PSW format” on page 18. However, getting something
wrong during instruction execution does not necessarily indicate an error. For example, a
page-fault program interrupt (interrupt code x’11’) indicates that the virtual memory address
does not correspond to a real address in CPC memory, then the task is not abended. Refer to
“Dynamic address translation (DAT) (I)” on page 45, for more information.
(Figure: SVC interrupt flow — the program issues an SVC instruction (1); the current PSW is stored in the SVC old PSW (2); the SVC new PSW (3) is loaded; control passes to the FLIH (4).)
When executed by the PU, the SVC instruction causes an SVC interrupt (1). The current PSW is stored in the PSA at the SVC old PSW address field (2), the second byte of the instruction is stored in the SVC interruption code in PSA memory, and a new PSW (3) from the PSA new PSW address field is loaded into the current PSW.
The purpose of such an interrupt is a part of the z/Architecture, where an application program
running in problem mode (bit 15 of the current PSW on) may pass control to z/OS asking for
z/OS service. After the interrupt, the PU goes to the current PSW to get the address of the
next instruction. Now, the content of the current PSW is the copy of the supervisor call’s
(SVC) new PSW, which was prepared by z/OS and is in the supervisor state. The FLIH (4), already running in the supervisor state, saves the contents of the state (PSW and registers) in the Task Control Block (TCB) and other related control blocks. It then passes control to the SVC second level interrupt handler (SLIH), which consults the SVC table.
There is an entry in such a table for every possible SVC number. The entry describes the
attributes and the CPC memory address of a z/OS SVC routine that handles a required
function associated with the SVC number. For example, SVC 00 means the z/OS Input
Output Supervisor (IOS) component, the one in charge of starting an I/O operation.
Consulting the proper entry in the SVC table, the SVC SLIH routine branches to the specific
SVC routine. Consequently, the program issuing the SVC instruction needs to know the
relationship between the SVC interrupt code (contents of the second byte of the SVC
instruction) and the z/OS component to be invoked. For example:
00 (EXCP) means that the application program is invoking the IOS component, asking for
the execution by the channel of an I/O operation.
01 (WAIT) means that the active task wants to enter the wait state.
10 (GETMAIN) means that a program in the active task wants to have the right to access a
certain number of virtual addresses.
After the request is processed by the z/OS SVC routine, the interrupted program can regain
control again, by the z/OS Dispatcher component restoring its state (stored at TCB-related
control blocks), that is, registers and the PSW through the LPSWE instruction.
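Conceptually, the SVC table is a lookup keyed by the interrupt code (the second byte of the SVC instruction). A sketch of the idea, with illustrative routine names only:

# The SLIH consults a table mapping SVC numbers to z/OS routines.
SVC_TABLE = {
    0x00: "IOS (EXCP - start an I/O operation)",
    0x01: "WAIT (place the task in wait state)",
    0x0A: "GETMAIN (virtual storage request)",
}

def svc_slih(interrupt_code: int) -> str:
    # branch to the SVC routine described by the table entry
    return SVC_TABLE.get(interrupt_code, "ABEND - invalid SVC number")

print(svc_slih(0x00))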
I/O interrupts
An input/output (I/O) interrupt occurs when the channel subsystem signals a change of
status, such as I/O operation completion, an error occurrence, or an I/O device (such as a
printer) has become available for work.
When an I/O operation is requested by a program in a task, the request is forwarded to the I/O supervisor (IOS) through an SVC 00 instruction (an EXCP). After SVC interrupt processing,
the SVC FLIH passes control to IOS. In z/Architecture, the I/O operation is not handled by the
PU that executes z/OS code. There are less expensive and more specialized processors to
execute the I/O operation, the I/O channels (usually FICON). When IOS issues the privileged
START SUBCHANNEL (SSCH) instruction, the PU (CP or zIIP) delegates it to a SAP PU, that
will find an I/O channel for the execution of the I/O operation. Then, the I/O operation is a
dialog between this I/O channel and an I/O controller, in order to move data between CPC
memory and the I/O device controlled by such a controller. After the execution of the SSCH
instruction, IOS returns control to the program task issuer of the SVC 00. If needed, the
access method (such as VSAM) still running in this task places itself in wait state (SVC 01),
until the end of the I/O operation.
Now, how does IOS and the PU become aware that the I/O operation handled by the I/O
channel is finished? This is handled through an I/O interrupt triggered by a SAP, and
requested by the I/O channel at the end of the I/O operation.
Then, the I/O new PSW points to the IOS code in z/OS (I/O FLIH) and the interrupt code tells
IOS which device has the completed I/O operation. Be aware that there can be many I/O operations running in parallel. The final status of the I/O operation is kept in a z/Architecture-defined hardware control block called the Interruption Response Block (IRB).
The I/O old PSW has the current PSW at the moment of the I/O interrupt, so it can be used to
resume the processing of the interrupted task. This return to the interrupted program task is
done, of course, by the z/OS dispatcher, gathering the status saved at TCB-related control
blocks, and restoring it in the proper hardware registers, that is, executing a context switch.
External interrupt
This type of interrupt has eight different causes, usually not connected with what the active program is doing. The key causes are:
1004 Clock comparator - The contents of the TOD Clock became equal to the Clock
Comparator; refer to “z/Architecture time facilities” on page 55.
1005 PU timer - The contents of the PU Timer became negative; refer to “z/Architecture
time facilities” on page 55.
1200 Malfunction alert - Another PU in the multiprocessing complex is in checkstop state
due to a hardware error. The address of the PU that generated the condition is stored at
PSA locations 132-133.
1201 Emergency signal - Generated by the SIGNAL PROCESSOR instruction when z/OS, running in a PU with a hardware malfunction, decides to stop (wait disabled) that PU. The address of the PU sending the signal is provided with the interrupt code when the interrupt occurs. (Note: The PU receiving such an interrupt is not the one with the defect.)
1202 External call - Generated by the SIGNAL PROCESSOR instruction when a program
wants to communicate synchronously or asynchronously with another program running in
another processor. The address of PU sending the signal is provided with interrupt code
when the interrupt occurs. This function is used by z/OS to take a PU out of a wait state.
1406 ETR - An interrupt request for the External Timer Reference (ETR) is generated
when a port availability change occurs at any port in the current server-port group, or
when an ETR alert occurs; refer to “z/Architecture time facilities” on page 55.
Restart interrupt
The restart interrupt provides a means for the operator (by using the restart function at HMC)
or a program running on another PU (through a SIGNAL PROCESSOR instruction) to invoke
the execution of a specified z/OS component program, that is the Recovery Termination
Manager (RTM). It does an evaluation of the system status, reporting at the console: hangs,
locks, and unusual states in certain key z/OS tasks. It gives the operator a chance to cancel
the offending task. It is maybe the last chance to avoid an IPL.
(Figure: generic interrupt flow — the application program is interrupted (1); the current PSW is stored in the old PSW (2); the new PSW (3) is loaded; control passes to the FLIH (4), then to a second-level interrupt handler (5), and finally to the Dispatcher (6).)
Step 4 Control is then passed to the first-level interrupt handler (FLIH), a z/OS
component. z/OS will use privileged instructions to respond to the
event; the new PSW (prepared by z/OS itself during nucleus
initialization program (NIP) processing) has bit 15 turned off (refer to
“Problem or supervisor state” on page 21). z/OS needs to access
some CPC memory locations to respond to the event and save the
previous status, and the new PSW (prepared by z/OS) has the PSW
key set for those memory locations. Refer to “Storage protection” on
page 39, for more information on PSW key field.
Step 5 After saving the status, control is passed, for each type of interrupt, to
a second-level interrupt handler for further processing of the interrupt.
After servicing the interrupt, the SLIH calls the z/OS dispatcher.
Steps 6 and 7 The Dispatcher (a z/OS component) dispatches (delivers the PU to a
program of the selected task) the interrupted program if there is an
available PU that is ready for new work. It might dispatch only the
interrupted one or any other, depending on the dispatching priority
queue. Recall that FLIH saved the status (PSW and registers) on
control blocks, such as TCBs. The z/OS Dispatcher does this by using
the old PSW copy, which has been saved, and which contains PU
status information necessary for resumption of the previous
interrupted program task. Then, the instruction LOAD PSW
EXTENDED may be used to restore the current PSW to the value of
the old PSW, and return control to the interrupted program at the exact
point.
The function of the PSA is to be a memory communication area between the operating
system (z/OS) and the PU hardware. This communication goes in both directions. For example, the field named old PSW is information from the hardware to z/OS, and the PSA field named new PSW is information from z/OS to the hardware.
Storage protection is defined by the z/Architecture to protect CPC memory. This facility is complemented by the virtual storage mode (PSW bit 5 on).
Storage protection imposes limits: a task is only able to access (for read or write) the CPC memory locations holding its own data and programs or, if specifically allowed, to read areas of other tasks. Any violation of this rule causes the PU to generate a program interrupt X'0004' (protection exception), which causes z/OS to abend that task.
All real addresses manipulated by PUs or I/O channels must go through storage protection verification before being used as an argument to access the contents of CPC memory. The inputs to memory protection are a CPC memory address and the PSW key field; the output is either ‘OK’ or a program interrupt with interruption code X'0004'.
The reference and change bits do not participate in the memory protection algorithm; they are used for the virtual memory implementation (refer to “Dynamic Address Translation (DAT) (II)” on page 46). The PU storage hardware switches on the reference bit when the frame is accessed by a PU or any channel, and switches on the change bit when the frame contents are changed by those components (see Figure 1-22). The reference bit is inspected by z/OS and switched off after inspection.
[Figure 1-22: Storage protection logic — the PSW key is compared with the frame's access-control bits. PSW key zero or an equal key yields OK for any access; a fetch (no data alteration) from a frame whose fetch-protection bit is off is also OK; any other case results in program interrupt X'0004': protection exception.]
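A minimal sketch of the decision in Figure 1-22 follows, assuming a simplified frame key (four access-control bits plus a fetch-protection bit); it is illustrative, not the exact hardware algorithm:

def check_access(psw_key, frame_key, fetch_protected, is_store):
    """Simplified storage protection check, per Figure 1-22."""
    if psw_key == 0 or psw_key == frame_key:
        return "OK"                          # key zero or matching key
    if not is_store and not fetch_protected:
        return "OK"                          # fetch with fetch protection off
    raise RuntimeError("program interrupt X'0004': protection exception")

# An application (PSW key 8) may alter a key-8 frame, but altering a
# key-0 frame would raise the protection exception:
assert check_access(8, 8, False, is_store=True) == "OK"
# check_access(8, 0, True, is_store=True)  -> RuntimeError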
z/OS exploits storage protection by managing the storage key values of frames and by
setting the PSW key field in the current PSW of the running program; for example, several
z/OS routines run with PSW key zero, some z/OS subsystems run with PSW key one, and
application code runs with PSW key eight.
The illusion of virtual storage is created by the z14 hardware PU component named Dynamic
Address Translation (DAT), along with these z/OS components:
The virtual storage manager (VSM)
The real storage manager (RSM)
The auxiliary storage manager (ASM)
Before virtual storage, the number of programs that could run in CPC memory was restricted
by how much CPC memory there was. For example, if you had 100 KB of CPC memory, only
two programs of 50 KB each would fit in memory, and within each 50 KB program, only about
10 KB contained the active portion of the program.
This was not efficient because, by the principle of locality, less than 20% of a program
(load module) is frequently executed (its kernel), so keeping rarely executed code in memory
wastes CPC memory. On top of that, the transaction multiprogramming level (number of
concurrent tasks) is very low due to the limitation in the number of concurrent programs.
Another key consequence is a very low average CPU utilization.
With virtual storage, the code that is not currently active is stored in page data sets, making
room in CPC memory for the active portions of each program to be loaded.
z/OS manages address spaces in units of various sizes. Main memory is partitioned into
relatively small, equal, fixed-size chunks, and each process is also divided into chunks of
the same size. The chunks of a process, known as pages, are assigned to available chunks of
memory, known as frames or page frames. Address spaces are divided into 4 KB units of
virtual storage called pages. A page fault occurs when the page containing the
PU-referenced address is not associated with a memory frame.
Address spaces are also divided into 1-megabyte units called segments. A segment is a
block of sequential virtual addresses spanning 1 MB and beginning at a 1 MB boundary. A
2 GB address space, for example, consists of 2048 segments.
Virtual addresses are the addresses contained in the AS, identifying a specific address in the
AS. When a virtual address is used by the PU for an access to CPC memory, it is translated
by means of DAT to a real address. By the way, DAT is a circuit within each PU. A real address
identifies the location of a byte in CPC memory.
Refer to Figure 1-23 on page 43, where you may see 8 pages from an AS distributed among
frames at CPC memory and slots at page data sets.
PUs and I/O channels in a z14 machine access CPC memory byte contents through Memory
Controller Units (MCU) located at the PU chip that connects to CPC memory boxes (DIMM).
PUs and I/O channels send byte addresses to MCUs and these return byte contents. This
behavior is called non-associative.
[Figure 1-25 (Translating a virtual address up to 2G): in pre-z/Architecture mode, a 31-bit virtual address is divided into an 11-bit segment index (bits 1-11), an 8-bit page index (bits 12-19), and a 12-bit byte index (bits 20-31) within a 4 KB frame/page of central storage; control register 1 (CR1) contains the primary region-table or segment-table origin.]
Figure 1-25 Translating a virtual address up to 2G
To accomplish this task, the DAT translation tables are maintained by the z/OS Real Storage
Manager (RSM) component. There is a set of tables for each z/OS address space (AS). The
type of the first-layer table (region table or segment table) depends on the high-water mark
of virtual storage requested through the GETMAIN macro.
Assuming that the virtual address to be translated is up to 2 GB, the high-level table
corresponding to the active AS is a segment table. This table is pointed to by a special
register named control register 1 (CR1). RSM is in charge of maintaining CR1, and there is
one segment table for each current AS.
The predecessor of today’s z/OS was called Multiple Virtual Storage (MVS), meaning multiple
address spaces. Whenever the ATTACH macro creates a task, z/OS decides whether the
programs of this task will share virtual addresses with already existing programs of other
tasks. If the answer is yes, the newborn programs run in an already existing AS. If not, a new
AS is also created during task creation, with CR1 pointing to a new segment table when it
becomes active.
[Figure 1-26: DAT tables and a worked translation — each segment table entry (S0, S1, ...) points to a page table of 256 entries mapping pages P0-P255 to frames. Translating the virtual address X'4A6C8A26': dividing by 1 MB (X'100000') gives segment number X'4A6' and remainder X'C8A26'; dividing the remainder by 4 KB (X'1000') gives page number X'C8' and byte displacement X'A26'; the page table entry for page X'C8' holds frame location X'8000', so the real address is 8000 + A26 = 8A26 (the location 8000 was taken from the page table entry).]
DAT performs the following tasks for a virtual address below 2 GB:
Receives a virtual address from the PU. It does not matter whether it refers to an operand
(data) or to an instruction.
Splits the virtual address as described in “Segmenting a virtual address” on page 44.
It does that by dividing the virtual address by 1 MB. The quotient is the segment
number (S), and the remainder, if any, is the displacement within the segment (D1).
Finds the corresponding entry (S) in the segment table to obtain the pointer to the
corresponding page table.
Divides the displacement (D1) by 4 KB to obtain the page number (P) and the byte
displacement within the page, finds entry (P) in the page table to obtain the frame
location, and adds the byte displacement to form the real address.
Figure 1-26 on page 46 illustrates the process of translating the address x ‘4A6C8A26’.
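The arithmetic of that example can be sketched as follows; the table contents are the illustrative values from the figure, not real RSM structures:

# Below-2GB DAT sketch: 1 MB segments (2**20) and 4 KB pages (2**12).
def translate31(vaddr, segment_table):
    seg  = vaddr >> 20           # quotient of division by 1 MB: segment number S
    page = (vaddr >> 12) & 0xFF  # page number within the segment
    byte = vaddr & 0xFFF         # byte displacement within the page
    page_table = segment_table[seg]     # segment table entry -> page table
    frame = page_table[page]            # page table entry -> frame location
    return frame + byte                 # real address

# Figure 1-26 example: segment X'4A6', page X'C8', frame at X'8000'.
segment_table = {0x4A6: {0xC8: 0x8000}}
assert translate31(0x4A6C8A26, segment_table) == 0x8A26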
[Figure 1-27: Region table layers and the address ranges they cover — R1T entries lead through R2T, R3T, segment table, and page table to the page: 16 EB = 2^64 (R1T), 8 PB = 2^53 (R2T), 4 TB = 2^42 (R3T), 2 GB = 2^31 (segment table), 1 MB = 2^20 (segment), 4 KB = 2^12 (page).]
Each entry in a region table may point to another region table or to a segment table
(depending on the region table layer), and a segment table always points to page tables.
An AS in z/OS is born with only 2 GB of addresses. When the first GETMAIN above the 2 GB
bar is requested, RSM creates the R3T. (Refer to “A 64-bit Address Space” on page 51
for more details about the bar concept.) The R3T table has 2048 segment table pointers and
provides addressability up to 4 TB of virtual addresses. When a GETMAIN requests a virtual
address greater than 4 TB, an R2T is created. An R2T has 2048 R3T table pointers and
provides addressability to 8 PB of virtual addresses. An R1T is created when virtual storage
greater than 8 PB is allocated. The R1T has 2048 R2T table pointers and provides
addressability to 16 EB of virtual addresses. Figure 1-27 on page 47 shows the table layers
and the respective pointers.
Segment tables and page table formats remain the same as for virtual addresses below the 2
GB bar. When translating a 64-bit virtual address, once DAT has identified the corresponding
2 GB region entry that points to the segment table, the process is the same as that described
previously.
RSM creates the additional layers of region tables only when necessary to back virtual
storage that is mapped by future GETMAINs; they are not built until needed. For example, if
an application requests 60 PB of virtual storage, the necessary R1T, R2T, R3T, segment
tables, and page tables are created only when there is a need to back a referenced page.
Summarizing, up to five translation tables may be needed by DAT to do translation, but the
translation only starts from the table layer that provides translation for the highest getmained
virtual address in the AS.
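As a small sketch of the layering rules just described (illustrative only, with the thresholds taken from the text above), the starting table layer can be derived from the highest getmained virtual address:

def top_table_layer(high_water_mark):
    """Table layer from which DAT starts translation, per the thresholds above."""
    if high_water_mark <= 2**31:
        return "segment table"   # up to 2 GB
    if high_water_mark <= 2**42:
        return "R3T"             # up to 4 TB
    if high_water_mark <= 2**53:
        return "R2T"             # up to 8 PB
    return "R1T"                 # up to 16 EB (2**64)

assert top_table_layer(60 * 2**50) == "R1T"   # a 60 PB high-water mark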
[Figure: a 4 KB page (“abc”) at virtual address 10254000 in user X’s address space, backed by the frame at real address 0014A000.]
Page faults
Following the DAT processing of a virtual address, either a real address is returned or a page
fault occurs. There is a bit in the page table entry called the invalid bit. When the invalid bit is
on, it means that the content of the referenced page is not mapped into a frame in CPC
memory. DAT reacts to this by generating a program interrupt with code X'0011', indicating a
page fault. The page must then be read in from external memory called a page data set,
located on a 3390 volume or in z14 Virtual Flash Memory (VFM).
For an AS to reference the 4 KB page beginning at an address, that page must be mapped
to a 4 KB frame in CPC memory. When referenced, the page could either be in CPC memory
already, or a page fault occurs and it must be read, through an I/O operation, from the page
data set where it resides.
If the virtual address is above the bar in the AS, control register 1 does not point to a
segment table. In this case, it points to the appropriate region table: third region table,
second region table, or first region table; refer to “Translating large virtual
address” on page 47.
[Figure: The 64-bit address space map and its translation tables — the 16 MB line and the 2 GB bar (2^31-2^32); user private areas below and above the bar; 64-bit common (HVCOMMON, default 64 GB, from 2 TB-66 GB up to 2 TB, below 2^41); the shared memory area from 2^50 (512 TB, introduced in z/OS V1R5); the high non-shared user private area and data spaces up to 2^64; each range is translated through the corresponding page table, segment table, R3T, R2T, and R1T layers.]
On the other hand, z/OS supports up to 4 TB of CPC memory for a single LP image on a z14
with up to 32 TB of total CPC memory.
The line
This line exists for compatibility reasons. The line is the 16 MB address separating the set of
virtual addresses below 16 MB and the ones above 16 MB. Very old programs (AMODE24
and RMODE24) are loaded and access only addresses below this line.
Common area
Common area is a set of virtual addresses located in all address spaces, where the contents
are shared by all programs running in all these address spaces and consequently in this
z/OS. There are three common areas:
Below the line, with the components: Nucleus (z/OS code), SQA (z/OS control blocks),
LPA (subsystem and application code), and CSA (mainly subsystem control blocks).
Above the line and below the bar, with the components: Extended Nucleus (z/OS code),
Extended SQA (z/OS control blocks), Extended LPA (subsystem and application code),
and Extended CSA (mainly subsystem control blocks).
Above the bar, with the only component High CSA (mainly subsystem control blocks).
In summary, the Nucleus contains the z/OS kernel (CPC memory resident) programs, the
LPA contains subsystem and reentrant load modules, SQA contains z/OS control blocks, and
the CSA contains subsystem control blocks.
Just for comparison purposes, the common area is a set of virtual addresses located in all
address spaces, where the contents are shared by all programs running in all these address
spaces. Shared area is a set of virtual addresses located in some address spaces, where the
contents are shared by all programs running only in these address spaces. One example of
the use of shared area is the Db2 address spaces.
[Figure 1-30: Cross-memory — a PROGRAM CALL (PC) passes control from the user's address space to the service provider's address space and PROGRAM RETURN (PR) returns it; MVCP and MVCS move data between the two address spaces.]
Cross memory
Cross-memory (XM) is an evolution of virtual storage in z/Architecture. It has two major
objectives:
Pass control synchronously between instructions located in distinct address spaces.
As shown in Figure 1-30, the PROGRAM CALL (PC) instruction is able to do that. To
return to the origin AS, there is the PROGRAM RETURN (PR) instruction.
Synchronous cross-memory communication enables two load modules (executing
programs) located in private areas of different address spaces to communicate
synchronously. For example, cross-memory communication takes place between load
modules located at AS 2, which gets control from AS 1, when the PC instruction is issued.
Usually the PC-called load module (running in AS 2) provides a service requested by
the caller (at AS 1) and then returns control through the PR instruction. The called PC
routine executes under the same task (TCB) as the load module that issues the PC. For
instance, address space 1 may be an IBM CICS® address space that uses a PC to invoke
address space 2, such as a database management system's address space, asking for a
query operation to be performed.
Move data synchronously between virtual addresses located in private areas of distinct
address spaces. Refer to Figure 1-30.
This can be implemented by the use of the SET SECONDARY ASN (SSAR) instruction,
which points to an AS and makes it secondary. A secondary AS has its segment table
pointed to by CR7 instead of CR1. Next, using MOVE CHARACTER TO PRIMARY (MVCP)
or MOVE CHARACTER TO SECONDARY (MVCS), data can be moved between the
primary and the secondary address spaces.
[Figure 1-31: Address spaces and data spaces — an address space spans 0 to 16 EB, with private and common areas below the 16 MB line and extended private and extended common areas above it; a data space spans 0 to 2 GB and contains operands only.]
Figure 1-31 Address spaces and data spaces
Data spaces
Data spaces are data-only virtual address spaces (containing no instruction virtual
addresses) that can hold up to 2 GB of addresses. A data space provides integrity and
isolation for the data pointed to by its virtual addresses. Data spaces are a very flexible
solution to problems related to accessing large amounts of data in CPC memory, such as
Db2 buffer pools. The idea of data spaces comes from research into separating data and
instructions into different memory spaces.
As stressed above, a data space contains only virtual addresses of data on which to perform
operations (operands); it does not contain virtual addresses for instructions. Like an address
space, a data space also has a segment table pointing to page tables. Consequently, when
accessing data spaces, DAT must be smart enough to manage the following translation
activities:
Use the AS segment table to translate virtual addresses for instructions.
Use the data space segment table to translate operand (data) virtual addresses referred
to by those instructions.
In order to exploit the data space concept, a program must be executed with the PU in Access
Register mode (PSW bit 16 off and bit 17 on). This mode is set through the SST
[Figure: The TOD clock (bits 0-103) in the CP and the STORE CLOCK EXTENDED instruction, which stores the clock value into memory.]
TOD clock
The TOD clock is a 104-bit counter register inside the z14 CPC drawer. In a multiprocessing
configuration, a single TOD clock is shared by all PUs. The TOD clock provides a
high-resolution measure of real time suitable for the indication of date and time of day.
The cycle of the clock is approximately 143 years (from all bits zero to all bits zero again).
The TOD clock nominally is incremented by adding a 1 in bit position 51 every microsecond.
In models having a higher or lower resolution, a different bit position is incremented at such
a frequency that the rate of advancing the clock is the same as if a one were added in bit
position 51 every microsecond.
The TOD follows Coordinated Universal Time (UTC), which is derived from International
Atomic Time (TAI, based on the cesium-133 atom) and is adjusted with discrete leap seconds
to keep reasonably close to UT1 (based on the Earth’s rotation).
Incrementing the TOD clock does not depend on whether the PU is in a wait state or whether
the PU is in operating, load, stopped, or checkstop states.
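Because bit 51 is worth one microsecond and the TOD epoch is January 1, 1900, 0:00 UTC, a 64-bit TOD value can be converted to a date as sketched below (a simplification: leap seconds are ignored):

from datetime import datetime, timedelta

def tod_to_datetime(tod64):
    """Shifting a 64-bit TOD value right by 12 bits yields microseconds
    since the TOD epoch (January 1, 1900, 0:00 UTC)."""
    return datetime(1900, 1, 1) + timedelta(microseconds=tod64 >> 12)

# The ~143-year cycle: bit 0 of the 52 high-order bits wraps after
# 2**52 microseconds.
print(2**52 / 1e6 / (365.25 * 24 * 3600))   # about 142.7 years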
Clock Comparator
The clock comparator is a circuit in each PU that provides an external interrupt (X’1004’)
when the TOD clock value exceeds a value specified by the program. Using the clock
comparator, a software application can be alerted when a certain amount of wall clock time
has elapsed, or at a specific hour of the day.
CPU Timer
The CPU timer is a binary counter within a PU, with a format that is the same as that of bits
0-63 of the TOD clock, except that bit 0 is considered a sign. The PU timer nominally is
decremented by subtracting a 1 in bit position 51 every microsecond. In models having a
higher or lower resolution, a different bit position is decremented at a frequency such that the
rate of decrementing the PU timer is the same as if a one were subtracted in bit position 51
every microsecond. The PU timer requests an external interrupt with the interrupt code 1005
hex whenever the PU timer value is negative (bit 0 of the PU timer is one).
The CPU timer is used by z/OS to account for the PU time consumed by all the dispatchable
units (TCBs and SRBs) in the system. When a DU is dispatched, z/OS sets the CPU timer to a
value X. When the DU is interrupted, a value Y from the CPU timer is stored in CPC memory.
X minus Y is the amount of PU time consumed during that dispatch interval.
The CPU timer is accessed by the instructions STORE CPU TIMER and SET CPU TIMER.
The CPU timer stops when the PU is in stop state. This state may be caused by operator
hardware intervention at the Hardware Management Console (HMC), or in PR/SM mode,
when a shared logical PU is ready, meaning not executing in a physical PU.
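A sketch of that accounting follows (illustrative names only; the real dispatcher logic is considerably more involved):

class CPUTimer:
    """Stand-in for the per-PU CPU timer, which decrements while the PU runs."""
    def __init__(self):
        self.value = 0
    def set_timer(self, v):     # SET CPU TIMER
        self.value = v
    def store_timer(self):      # STORE CPU TIMER
        return self.value

def account_interval(du, timer, x, run_du):
    timer.set_timer(x)          # z/OS loads value X at dispatch time
    run_du()                    # the DU runs; the hardware decrements the timer
    y = timer.store_timer()     # value Y captured at interrupt time
    du["cpu_time"] += x - y     # X - Y: PU time consumed in this interval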
The IBM z/Architecture external time reference (ETR) allows the synchronization of the CPC
time-of-day (TOD) clocks to ensure consistent time stamp data across multiple CPCs. ETR
provides a means of synchronizing TOD clocks in different CPCs with a centralized time
reference, which in turn may be set accurately on the basis of an international time standard
(External Time Source). The architecture defines a time-signal protocol and a distribution
network, called the ETR network, that permits accurate setting, maintenance, and
consistency of TOD clocks. It is important to note that TOD synchronization between
CPCs is mandatory for the implementation of a Parallel Sysplex where the participant z/OS
systems are located in distinct CPCs.
ETR time
In defining an architecture to meet z/Architecture time coordination requirements, it was
necessary to introduce a new kind of time, sometimes called ETR time, that reflects the
evolution of international time standards, yet remains consistent with the original TOD
definition. Until the advent of the ETR architecture (September 1990), the server TOD clock
value had been entered manually, and the occurrence of leap seconds had been essentially
ignored. Introduction of the ETR architecture has provided a means whereby TOD clocks can
be set and stepped very accurately, on the basis of an external Coordinated Universal Time
(UTC) time source.
[Figure: An STP Coordinated Timing Network — a z14 Preferred Time Server (Stratum 1) and a z14 Backup Time Server (Stratum 2) connected to an External Time Source (ETS) through the HMC, with z13 CPCs as additional Stratum 2 servers, one of them acting as Arbiter.]
The logic of STP is executed by the SAP PU and flows to other CPCs through the same
coupling links that are used in Parallel Sysplex communication between z/OS and coupling
facilities.
[Figure 1-34: A typical IBM Z data center — DASD farms and tape controllers (including a tape vault) attached through FICON/zHPF directors and zHyperLink; printers on FICON; and network connectivity through LAN hubs, switches, and routers (Token-Ring, FDDI, Fast Ethernet, Gigabit Ethernet, 155 ATM) to WAN equipment (37xx, Cisco, 3174), spanning multiple rooms, floors, and buildings.]
As you can imagine, the mainframe platform cannot be plug-and-play, for security reasons.
HCD provides the capability to make both hardware and software I/O configuration changes
dynamically, with no need for a z/OS IPL and no need for logical partition initialization.
The hardware configuration part of the HCD definition describes these resources and the
connections between these resources. The resources include:
CPCs
Logical Channel Subsystems (LCSS)
Logical partitions
I/O Channels
FICON Directors (switches)
The last four items in the above list are called I/O configuration.
Figure 1-34 shows a typical IBM Z family data center. As you can see, the complex consists of
separate I/O devices and networks connected through high-speed data links to the CPC,
which comprises PUs (CP, zIIP, IFL), CPC memory, and I/O channels. It shows connections
among CPCs, as well. z/Architecture provides up to 1,512 high-speed data buses, called I/O
channels, per CPC. Included in those are the OSA channels used to implement network
connection.
Recently, IBM introduced a product called zDAC (z/OS Discovery and Autoconfiguration).
zDAC is very convenient during partial dynamic changes to a base configuration (such as
adding a new DASD controller): the modifications are discovered and reported to the
installation, to be accepted into the base configuration or not.
[Figure: Multiple LCSSs — logical partitions LP1-LP3 and LP14-LP16, each with MIF IDs 1-3 within its LCSS, sharing CHPIDs 80, 81, 90, and 91 through directors.]
A logical channel subsystem (LCSS) has several definitions, such as a set of:
256 I/O channels, or
15 logical partitions in the same CPC, or
4 subchannel sets.
Many IBM manuals mention channel subsystem only, but mean the same as LCSS. Also,
some manuals include the SAPs together with the channels in the LCSS.
The LCSS design provides functionality in the form of multiple LCSS. Up to 6 LCSS can be
configured within the same z14, delivering a significant increase in I/O performance and
scalability.
The IBM Z family was designed for processing mainly commercial workloads, dominated by a
high number of I/O operations. For example, the z14 has a very high processing capacity,
which increases the burden on the I/O configuration. In support of this, the introduction of
several LCSSs allows significantly more channels, more logical partitions, and more devices
in your IBM Z family I/O configuration.
Previously, before the implementation of multiple LCSSs, a maximum of only 256 I/O channels
per CPC was possible, which is too low for large customers. These channels were identified
by an 8-bit CHPID (256 values). For compatibility reasons, the CHPID could not simply be
made larger; instead, each LCSS provides its own range of 256 CHPIDs.
At the HCD level, logical partitions cannot be added until at least one LCSS has been
previously and hierarchically defined. Logical partitions are defined under a LCSS, not directly
under the CPC. A logical partition is associated with one LCSS only. CHPID numbers are
unique within the logical partitions of a LCSS. However, the same CHPID number can be
reused within all LCSSs.
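To picture the namespace rule (the values are taken from the figure; the structure is illustrative only): a physical channel is identified by the (LCSS, CHPID) pair, so equal CHPID numbers in different LCSSs denote different channels.

# CHPIDs are unique within an LCSS but may repeat across LCSSs.
lcss_config = {
    0: {"lpars": ("LP1", "LP2", "LP3"),    "chpids": (0x80, 0x81, 0x90, 0x91)},
    1: {"lpars": ("LP14", "LP15", "LP16"), "chpids": (0x80, 0x81, 0x90, 0x91)},
}
# (0, 0x80) and (1, 0x80) identify two distinct physical channels.
channels = {(lcss, chpid) for lcss, cfg in lcss_config.items()
            for chpid in cfg["chpids"]}
assert len(channels) == 8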
[Figure: The SSCH instruction designates a subchannel number and points to an ORB; the ORB contents are moved into the UCW, and the channel then executes the CCW chain.]
Figure 1-36 SSCH instruction logic
If the I/O operation's target device has its base UCB not busy (busy bit off), there is no other
ongoing I/O operation running in the device represented by this base UCB. In this case, IOS
issues the SSCH instruction (requesting the I/O operation from the SAP) and switches on the
busy bit in the corresponding base UCB.
This bit is switched off at I/O interruption time, to indicate the end of the I/O operation.
However, due to the implementation of the Parallel Access Volume (PAV) function, the 3390
device accepts concurrent I/Os. In this case, if the base UCB is busy, IOS may use alias
UCBs in order to start another I/O operation towards the same 3390 device.
If there are no available UCB aliases, the I/O request is queued at the base UCB level, also
called the IOS queue.
SSCH logic
The SSCH instruction has two operands:
General register 1 contains the device identification (where the I/O operation will be
executed) using the device subchannel number (refer to “Subchannel and subchannel
numbers” on page 72).
The second-operand address is a pointer to the operation request block (ORB) that
describes the I/O operation. The ORB is an architected control block describing what
to do during the I/O operation; among other fields, it contains the channel program
address in CPC memory. Refer to z/Architecture Reference Summary, SA22-7871, for
the ORB contents.
The SAP is signaled to asynchronously perform the start of the I/O operation for the
associated device, and the execution parameters contained in the designated ORB
are placed in the designated subchannel (UCW), as follows:
SSCH microcode in the PU moves the ORB contents into the dynamic part of the
respective subchannel (UCW) and places this UCW in a specific Hardware System Area
(HSA) queue called the initiative queue. There is one initiative queue per SAP. Refer
to “Subchannel and subchannel numbers” on page 72 for more information about the
subchannel concept.
After the SSCH logic completes, the next instruction after the SSCH is executed, which
later allows the PU to be used by another task. The requesting task is usually placed in
wait state, waiting for the end of this I/O operation.
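The PU side of that handoff can be sketched as follows (the names and structures are illustrative, not the real HSA layout):

from collections import deque

initiative_queue = deque()     # one initiative queue per SAP, kept in the HSA

def ssch(subchannel_number, orb, ucw_table):
    """PU portion of SSCH: copy the ORB into the UCW, queue it for the SAP."""
    ucw = ucw_table[subchannel_number]  # subchannel number indexes the UCW table
    ucw["dynamic"] = dict(orb)          # ORB contents -> dynamic part of the UCW
    initiative_queue.append(ucw)        # SAP starts the I/O asynchronously
    # SSCH ends here; the requesting task typically waits for the I/O interrupt.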
The z/OS system, the SAP, and the channel subsystem (formed by the FICON I/O channels)
all need to know the I/O configuration.
The SAP is also in charge of the STP protocol. Refer to “Server Time Protocol (STP)” on page 58.
SAP logic
The SAP finds the UCW in its initiative queue (as left there by the SSCH PU logic) and tries
to find an available FICON I/O channel path (channel, switch port, I/O controller, and device)
that succeeds in starting the requested I/O operation. These initiative queues are ordered by
I/O priority, as determined by the z/OS Workload Manager (WLM) component. However,
this queue priority ordering must be allowed by the installation at the HMC.
SAP uses the I/O configuration information described in the HCD, now stored in the static part
of the UCW, to determine which FICON I/O channels can reach the target device. Initial
selection may be delayed if:
The controller interface delays the establishment of the requested I/O operation. This delay
is called Command Reply (CMR) delay.
The device is busy, due to a hardware Reserve on a shared DASD device.
During all of these delays, the I/O request is serviced by the SAP without z/OS awareness.
When the I/O operation finishes (device-end status is presented), SAP queues the UCW
(containing all the I/O operation final status) in the I/O interrupt queue, ready to be picked up
by any enabled PU. All the time between the SSCH and the real start of the I/O operation is
timed by SAP and is called pending time.
I/O Channels
I/O channels are components of the z14 channel subsystem (CSS). I/O channels are
processors that are much simpler than a PU, able to communicate in a dialog with I/O
controllers. The different types of channels, including the ones connecting the coupling
facility (CF), are:
FICON Express16S+ on z14, able to use the FICON or zHPF protocols. It uses optical
fiber at an aggregate data rate of 1.6+ GB/sec.
z14 zHyperLink Express is a direct-connect, short-distance IBM I/O feature designed to
work in conjunction with FICON or zHPF. It dramatically reduces latency by
interconnecting the z14 CPC directly to the I/O Bay of the DS8880 DASD controller.
zHyperLink improves application response time, cutting I/O-sensitive workload response
time in half without significant application changes.
Open Systems Adapter (OSA) Express 6S, for network connection, with the following types
of features:
– 10 Gigabit Ethernet (10 GbE)
– Gigabit Ethernet (1 GbE)
– OSA-Express6S 1000BASE-T Ethernet.
IBM Integrated Coupling Adapter (ICA SR) for Short Distance connecting z14 and z13
with z/OS and coupling facilities logical partitions.
Coupling Express Long Reach (CE LR) for Long Distance connecting z14 and z13 with
z/OS and coupling facilities logical partitions.
The implementation of a FICON Director improves your I/O configuration in the following
disciplines: flexibility, by easier I/O reconfiguration; simplicity, by decreasing the number of
I/O elements; and disaster recovery, by allowing a more efficient remote copy operation.
[Figure: I/O interrupt processing — the device presents ending status (CE/DE, possibly with unit check) through the channel path and control unit to the UCW in the HSA; the SAP queues the UCW on an I/O interrupt queue; a PU enabled for the interrupt subclass (per the CR6 I/O interrupt subclass mask) takes the interrupt, and IOS issues TSCH to store the subchannel status (SCSW, extended status, extended control) into the IRB; TPI is used when the PU runs disabled for I/O interrupts; IOS then updates the UCB and processes its IOS queues (IOQs).]
Usually, IOS then schedules a dispatchable unit (SRB) to post the task waiting for the I/O
operation; this is done asynchronously and changes the task state from wait to ready.
Another SSCH may then be executed for an I/O request previously queued on the base UCB
(the IOS queue).
In certain error situations, the I/O interrupt is not generated within an expected time frame.
The z/OS component Missing Interrupt Handler (MIH), a timer-driven routine, alerts IOS
about this condition.
The DASD controller accepts control signals from the FICON I/O channel, which manages the
timing of data transfer over the FICON I/O channel path and provides indications concerning
the status of the device. A typical operation is reading a recording medium and recording
(storing) data. To accomplish its operations, the device needs detailed signal sequences
peculiar to its type of device. The DASD controller decodes the commands received from the
FICON I/O channel (as described in the channel program), interprets them for the particular
type of device, and provides the signal sequence required for the performance of the
operation.
A DASD controller in a sense is a true complex system, with the following aspects:
Millions of software lines of code (known as Licensed Internal Code, or LIC)
Several state-of-the-art RISC processors
Huge real memory for data caching and internal LIC functions
Complex internal fabric connections
DASD space capacity (magnetic and flash) in tens of terabyte units
The major difference between the two controller types (DASD and tape) is the type of media
where data is stored. Tape controllers keep data in cartridges. For example, the 3590
Magstar cartridge has the following properties:
128 or 256 tracks
Mechanical speed of 2 m/sec
Up to theoretical 20 MB/sec
IBMLZ1 data compression (dictionaries)
10 GB (30 GB) or 20 GB (60 GB) of non-compressed data
Same external dimensions as the former 3480 and 3490 cartridges
No interchange with 3480 and 3490 drives
Linear serpentine recording
New colored inserts identify cartridge type
Servo tracks and RAID implementation
Tape technology has been declared dead and buried many times, but it survives. Although it
still has many opponents, tape technology shows no sign of obsolescence. The major
reasons are price and portability (perfect for backups and migrated data sets), and tapes may
help absorb the coming big-data tsunami. Some tape media properties:
Logical records are grouped in a physical block, in order to have fewer gaps (using the
space better) and more efficient I/Os. So, there is no Count/Key/Data as in a 3390 track.
A set of related blocks is a data set, which may be multivolume. When a tape data set is
deleted, the data is still there (no 3390 Erase-on-Scratch). To reclaim this space, the volume
must be recycled. A data set can be read forward or, optionally, backward.
At steady state, a tape data rate is faster than magnetic disk media.
Tape was the first controller type to implement data compression, avoiding doing it at the
PU and thereby saving MSUs.
It is possible with z/OS to have blocks with a size up to 1 MB (Large Tape Blocksize).
Allows tape device sharing among z/OS systems in a Parallel Sysplex.
To minimize the natural setup performance problems of tapes (mount, load, rewind, and
data set scan), some solutions were designed:
– Automatic Robot
– Virtual Tapes
To optimize tape data management, many solutions were developed:
– Cataloging tape data sets in the z/OS catalog
– Using SMS constructs for tape data sets and volumes
– Remote Copy (PPRC), to include the data stored on tapes as a participant in a DR
implementation
To improve the security of data on tape storage:
– Data can be encrypted (after compression)
– WORM (write once, read many) tape data cartridges, whose data cannot be altered,
are supported
Tape Virtualization
By tape virtualization (the IBM VTS product), we understand the use of intermediate DASD
media (named cache) to store tape data sets while remaining totally compatible with existing
software access.
Tape virtualization is thus a hardware-based solution that addresses not only tape cartridge
utilization but also tape device utilization, and hence overall tape subsystem costs. Because
of its transparency to the host software, this solution is readily supported and easy to use.
[Figure 1-39: Device numbers — devices 2000-200F are each represented by a UCW in the HSA and by a UCB in the central storage of each logical partition (LPAR A and LPAR B); the operator command V 200A,ONLINE results in the message IEE302I 200A ONLINE.]
An I/O device is the endpoint in the “conduit” between CPC memory and external data. The
major types of I/O devices attached to Z family CPCs are DASD, tape, and printers.
Each I/O device is represented within IOS by a unit control block (UCB). A UCB holds
information about an I/O device, such as:
State information for the device
Features of the device
Each I/O device is represented within SAP and FICON I/O channels by a Unit Control Word
(UCW) located at HSA (also known as subchannel).
An individual I/O device may be accessible to the channel subsystem by as many as eight
different FICON I/O channels. Each I/O device is identified by three types of addresses:
device number, subchannel number and unit address.
Device number
Each I/O device is identified by a system-unique parameter called the device number. The
device number is a 16-bit value that is assigned at HCD allowing a maximum of 65,536
devices. It is a sort of nickname to identify the I/O device during communication between
people and z/OS. Refer to Figure 1-39.
For example, the device number is entered by the system operator to designate the input
device to be used for initial program loading (IPL), or the operator after the IPL completes can
toggle a device online or offline, as follows:
V 200A, online
The device number is stored in the UCW (in the Hardware System Area - HSA) at Power-on
Reset (POR) coming from the IOCDS. It is also stored at UCB at IPL.
Subchannel
Before covering the concept of the subchannel number, let’s review the subchannel concept.
A subchannel is a z/Architecture entity, represented in the HSA by a control block named
Unit Control Word (UCW). Each I/O device accessible to the channel subsystem is assigned a
dedicated subchannel at installation POR time.
Then a subchannel provides the logical appearance of a device to z/OS, and contains
information required for sustaining a single I/O operation. I/O operations are initiated with a
device by the execution of I/O instructions (such as SSCH) that designate the subchannel
associated with the device.
Each subchannel provides information concerning the associated I/O device and its
attachment to the channel subsystem, and other functions involving the associated I/O
device. The subchannel also provides information concerning a current I/O operation, but only
one.
Then, the subchannel consists of internal memory (UCW) that contains two types of
information:
Static from the HCD process, describing the paths to the device (CHPIDs), device number,
I/O-interrupt-subclass code, as well as information on path availability.
Dynamic, referring to the ongoing I/O operation: functions pending or being performed,
next CCW address, and status indications. There is just one I/O operation per UCW at a time.
Subchannel number
Subchannel numbers are used to identify an I/O device in the communication between IOS
(through up to eight instructions, such as SSCH) and the group formed by the SAP and the
FICON channels.
A subchannel number is a 16-bit (2-byte) value whose valid range is 0000-FFFF and which is
used to address a subchannel. To address the resulting limitation of up to 64 K devices at
large customers, the z14 provides four subchannel sets per LCSS.
In reality, a subchannel number is the UCW index in the subchannel set table. Refer to
Figure 1-40 on page 72, that shows the offset concept of the subchannel number in the UCW
table as assigned during installation. When IOCP generates a basic IOCDS, it creates one
subchannel for each I/O device associated with a logical control unit. Refer to “Unit address
and logical control unit” on page 74.
The subchannel number is thus another way of addressing an I/O device. It is a value
assigned to the subchannel (UCW) representing the device at POR time, and it indicates the
relative position of the UCW in the UCW table. The subchannel number was designed to
speed up the lookup of a UCW during SSCH processing.
There is a maximum of 65,536 subchannels (64 K devices) per subchannel set, there are
4 subchannel sets per LCSS, and there are 6 LCSSs per z14.
About the 4 subchannel sets, we may say that subchannel sets 1, 2, and 3 are used for PAV
alias addresses and for the secondary devices of synchronous remote copy (PPRC).
Subchannel set 0 still reserves 64 K subchannels for z/OS use.
In a z14, the maximum number of UCWs is therefore 64 K times 4 times 6, because there are
four UCW tables (one per subchannel set) per LCSS in the HSA and six LCSSs. The same
subchannel number is duplicated in different LCSSs. However, this does not pose a problem
because one z/OS image runs in just one logical partition, which belongs to just one LCSS.
Note: When you have devices in controllers with the capability of doing parallel I/O
operations (Parallel Access Volume - PAV), as in the IBM DS8000®, you need to define a
UCW for each additional parallel I/O to the same device. You must do this because each
UCW supports only one I/O operation at a time.
[Figure 1-41: A control unit reached through CHPIDs 2C, 32, 24, and 29 — device numbers 7040-7047 correspond to unit addresses 00-07.]
Unit address
The unit address (UA), or device address, is used to reference the device during the
communication between a FICON I/O channel and the controller serving the device. The UA
is two hex digits in the range 00 to FF and is stored in the UCW, as defined in HCD. It is
transmitted by the FICON channel over the I/O interface to the controller.
I/O operation
In Figure 1-41 on page 74, IOS issues the SSCH instruction pointing to the UCW with
subchannel number X'1E38'. One of the FICON I/O channels (CHPID 24, for example) is
selected and uses the unit address to communicate with the controller serving the device.
In summary:
The device number is used in communication between an operator and z/OS.
The subchannel number is used in communication between PU (z/OS) and channel
subsystem.
The unit address is used in communication between the channel and the controller.
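The three identifiers and their audiences can be captured in a small sketch (the values are the illustrative ones used earlier in this chapter; the unit address shown is hypothetical):

# One device, three identifiers, three conversations.
device = {
    "device_number":     0x200A,  # operator <-> z/OS (for example, V 200A,ONLINE)
    "subchannel_number": 0x1E38,  # z/OS (SSCH) <-> channel subsystem (UCW index)
    "unit_address":      0x0A,    # FICON channel <-> controller (illustrative)
}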
DISPLAY M command
Use the DISPLAY M command to display the status of sides, servers, channel paths, devices,
CPC memory, and expanded memory, or to compare the current hardware configuration to
the configuration in a CONFIGxx parmlib member. The system can display the number of
online channel paths to devices, or a single channel path to a single device. In the figure on
page 75, the command output shows the device status as online and, for each of the CHPIDs
(10, B0, 88, and DC), whether the channel path is online, the CHPID is online, and the path is
operational.
[Figure 1-42: Flow of an I/O operation — an application's GET/PUT or READ/WRITE request goes through an access method (or JES, VTAM, ASM drivers) to EXCP and IOS; IOS builds the IOQ, IOSB, and ORB and issues SSCH against the UCB/UCW pair (for example, UCB 200A and UCW 1E38); the channel subsystem executes the CCWs through directors (ESCD) to the control unit and device; completion returns through the I/O interrupt, TSCH, and the IRB, and IOS posts the requester.]
I/O summary
Figure 1-42 shows a summary of the flow of an I/O operation from the request issued by an
application until the operation completes.
The latest member of the IBM Z family, the z14, is built upon a tried and true architecture to
support your digital transformation, create a strong cloud infrastructure, and expose back-end
services through secure APIs.
The z14 can help your organization make consistently optimal decisions, gain operational
data insights so you get the most value from your IT investment, and fully protect your data
with pervasive encryption (in-flight and at-rest), while facilitating regulatory compliance.
Leadership for a trust economy can be built on the z14. It is the premier system for enabling
data as the new security perimeter, and is designed for data serving in a cognitive era. The
z14, more than any other platform, offers a high-value architecture that supports an open and
connected world.
The z14 goes beyond previous designs while continuing to enhance the traditional mainframe
qualities, delivering unprecedented performance and capacity growth. The z14 introduces a
paradigm shift for protecting data and transactions, from selective encryption to pervasive
encryption.
The z14 can be configured with up to 170 processors and up to 32 TB of memory. It offers
hot-pluggable I/O drawers that build on previous IBM Z innovations, and continues the
utilization of advanced technologies such as InfiniBand and RoCE (RDMA over Converged
Ethernet).
The z14 has two cooling options: water and air. You, the customer, choose which
configuration is best suited to your environment.
It continues to run all the mainframe operating systems, such as z/OS, z/VM, z/VSE, Linux
on Z, z/TPF, and CFCC (for the coupling facility).
The z14 provides up to a 30% increase in total system capacity over the previous model (IBM
z13), and has up to 3.2 times the available central storage (up from 10 TB to 32 TB). The
maximum number of processor units (PUs) has increased from 141 to 170.
The A-Frame: houses up to four Processor drawers, central storage, a PCIe I/O drawer
(Peripheral Component Interconnect® Express) and two 1U servers (Support Elements)
The Z-Frame: houses up to four PCIe I/O drawers, and the optional Internal Battery Facility
(IBF)
The PCIe I/O drawers contain I/O cards with I/O processors, known as channels.
Cryptographic functions
The z14 provides two major cryptographic functions: CPACF and Crypto Express6S.
The Central Processor Assist for Cryptographic Function (CPACF), standard on every core,
supports pervasive encryption and provides hardware acceleration for encryption operations.
The Crypto Express6S feature provides for high-performance cryptographic operations and
support for up to 85 domains.
Combined, these two enhancements perform encryption more efficiently on the z14 than on
earlier Z platforms.
CPACF
CPACF is a high performance, low latency co-processor that performs symmetric key
encryption and calculates message digests (hashes) in hardware and is the key to Pervasive
encryption. On-chip encryption rates on the z14 CPACF are up to 6X faster than the z13.
IBM Z pervasive encryption provides the comprehensive data protection that your
organization and customers demand. By placing the security controls on the data itself, the
solution creates an envelope of protection around the data. For example, Z pervasive
encryption helps protect the at-rest and in-flight data that is on your Z infrastructure. Also,
centralized, policy-based data encryption controls significantly reduce the costs that are
associated with data security and regulatory compliance. This leap in performance, together
with having one dedicated cryptographic co-processor per PU (core), is the key to achieving
pervasive encryption.
Crypto Express6S
The tamper-sensing and tamper-responding Crypto Express6S features provide acceleration
for high-performance cryptographic operations and support up to 85 domains. This
specialized hardware performs AES, DES/TDES, RSA, Elliptic Curve (ECC), SHA-1, and
SHA-2, and other cryptographic operations. It supports specialized high-level cryptographic
APIs and functions, including those required in the banking industry.
Crypto Express6S features are designed to meet the FIPS 140-2 Level 4 and PCI HSM
security requirements for hardware security modules.
The z14 offers twice the AES performance as the z13, a True Random Number Generator,
SHA3 support, and RSA/ECC acceleration.
The IFL and zIIP processor units on the z14 server can be configured to run two simultaneous
threads per clock cycle in a single processor (simultaneous multithreading, SMT), increasing
the capacity of these processors by 25% on average over processors running a single thread.
SMT is also enabled by default on SAPs.
z14 Virtualization
The IBM z14 server supports z/Architecture mode only and can be initialized either in LPAR
mode (sometimes known as PR/SM) or in DPM (Dynamic Partition Manager) mode. Only one
mode can be active, and a power-on reset is required to switch between modes.
LPAR supports configuring up to 85 LPARs, each of which has logical processors, memory,
and I/O resources. Resources of these LPARs are assigned from the installed CPC drawers
and features. LPAR configurations can be dynamically adjusted to optimize the LPAR or
servers’ workloads.
For more information about LPAR and PR/SM functions, see Chapter 4, “Virtualization and
Logical Partition (PR/SM) concepts” on page 113.
DPM is intended to simplify virtualization management and is easy to use, especially
designed for those who have little or no experience with IBM Z. It does not require you to
learn complex syntax or command structures.
DPM provides simplified hardware and virtual infrastructure management, including partition
lifecycle and integrated dynamic I/O and PCIe functions management for Linux running in an
LPAR, under KVM on z, and under z/VM 6.4. Using DPM, an environment can be created,
provisioned, modified without disrupting running workloads, and monitored for
troubleshooting.
Enhancements to DPM on the z14 simplify the installation of the Linux operating system,
support additional hardware cards, and enable base cloud provisioning through OpenStack,
including the following enhancements:
Support for auto configuration of devices to simplify Linux Operating System Installation,
where Linux distribution installers exploit function
Secure FTP through HMC for booting and installing an Operating system by using FTP
The system I/O buses take advantage of the Peripheral Component Interconnect Express
(PCIe) technology and the InfiniBand technology, which are also used in coupling links.
The z14 connectivity supports the following I/O or special purpose features:
Storage connectivity:
– Fibre Channel connection (IBM FICON):
FICON Express16S+ 10 km long wavelength (LX) and short wavelength (SX)
FICON Express16S 10 km LX and SX
FICON Express8S 10 km LX and SX
– IBM zHyperLink Express
Network Connectivity:
– Open Systems Adapter (OSA):
OSA-Express6S 10 GbE long reach (LR) and short reach (SR)
OSA-Express6S GbE LX and SX
OSA-Express6S 1000BASE-T Ethernet
OSA-Express5S 10 GbE LR and SR
OSA-Express5S GbE LX and SX
OSA-Express5S 1000BASE-T Ethernet
– IBM HiperSockets
– Shared Memory Communication - Remote Direct Memory Access (SMC-R):
10 GbE RoCE (RDMA over Converged Ethernet) Express2
10 GbE RoCE Express
– Shared Memory Communication - Direct Memory Access (SMC-D) through Internal
Shared Memory (ISM)
Coupling and Server Time Protocol connectivity:
– Internal Coupling (IC) links
– Integrated Coupling Adapter Short Reach (ICA SR)
– Coupling Express Long Reach (CE LR)
– HCA3-O, 12x Parallel Sysplex InfiniBand (IFB) coupling links
– HCA3-O, 1x Parallel Sysplex InfiniBand (IFB) coupling links
All IBM Z servers are designed to enable the highest availability and lowest downtime. A
comprehensive, multi-layered strategy includes:
Error Prevention
Error Detection and Correction
Error Recovery
With a properly configured z14 server, further reduction of outages can be attained through
First Failure Data Capture (FFDC) - designed to reduce service times and avoid subsequent
errors, improved non-disruptive replace, repair, and upgrade functions for memory, drawers,
and I/O adapters. In addition, z14 servers have extended non-disruptive capability to
download and selectively install LIC updates.
The z14 RAS features provide unique high-availability and non-disruptive operational
capabilities that differentiate the Z servers in the marketplace. z14 RAS enhancements are
made on many components of the CPC (processor chip, memory subsystem, I/O, and
service) in areas like error checking, error protection, failure handling, faster repair
capabilities, sparing, and cooling.
The ability to cluster multiple systems in a Parallel Sysplex takes the commercial strengths of
the IBM Z hardware and z/OS Operating System to higher levels of system management,
scalable growth, and continuous availability.
System design
Figure 2-2 shows a four-axis comparison between the most recent IBM Z product lines: z14,
z13, z12, z196 and z10EC. As you can see, there is growth on three of the four axes, which
results in a balanced system design:
PCI (performance capacity index) per PU grew approximately 8%. Using Large Systems
Performance Reference (LSPR) measurements, new measurements with z/OS V2R2, and a
re-evaluation of the calculation for the mixed workload, the z14 uniprocessor PCI is 1832.
The maximum number of processors grew from 141 to 170.
Central storage grew from 10 TB to 32 TB.
The aggregate I/O bandwidth remained constant at 832 GB/sec.
Central storage growth is proportionally larger than the other three resources. One of the
reasons behind this increase is that in heavy commercial data processing environments, and
with the trend in analytics to have large in-memory databases, central storage can really
improve the overall performance by drastically decreasing the I/O rate. In other words, central
storage is good for the overall performance health of the entire system.
Simulation support
z/VM guest virtual machines can create virtual specialty processors on processor models that
support these specialty processors but do not necessarily have them installed. Virtual
specialty processors are dispatched on real CPs. Simulating specialty processors provides a
test platform for z/VM guests to exploit mixed-processor configurations. This allows users to
assess the operational and processor utilization implications of configuring a z/OS system
with zIIP processors without requiring the real specialty processor hardware.
Model M01: one drawer with 41 PUs, 5 standard SAPs, 2 standard spares, and 1 IFP, leaving
33 characterizable PUs.
Model M02: two drawers with 41 PUs each, 10 standard SAPs, 2 standard spares, and 1 IFP,
leaving 69 characterizable PUs.
Model M03: three drawers with 41 PUs each, 15 standard SAPs, 2 standard spares, and
1 IFP, leaving 105 characterizable PUs.
Model M04: four drawers with 41 PUs each, 20 standard SAPs, 2 standard spares, and 1 IFP,
leaving 141 characterizable PUs.
Model M05: four drawers with 49 PUs each, 23 standard SAPs, 2 standard spares, and 1 IFP,
leaving 170 characterizable PUs.
As mentioned, the z14 hardware model names indicate the number of drawers present
(except for the M05 Special build system), and gives no indication of the actual number of
active processors.
Sub-capacity models
Sub-capacity models are servers in which the instruction execution rate is controlled to
increase “cycles per instruction”. It should be emphasized that the sub-capacity CPs are not
actually slower - the clock frequency doesn’t change, but they have a lower capacity for
performing work due to the change in execution rate.
On the z14, there are 170 (which is the number of available PUs) full capacity settings and
there are an additional 99 sub-capacity settings (33 each in the 4xx, 5xx and 6xx range).
There is no affinity between the hardware model and the number of activated CPs. For
example, it is possible to have a two-drawer Model M02 (69 characterizable PUs) with only
13 active processors, so for software billing purposes, the machine might report as a Model
713, 613, 513, or 413, depending on the capacity setting chosen by the customer.
After xx exceeds 33, then all processor engines are full capacity.
Model Capacity settings
M01 701 - 733, 601 - 633, 501 - 533, and 401 - 433
M02 701 - 769, 601 - 633, 501 - 533, and 401 - 433
M03 701 - 7A5, 601 - 633, 501 - 533, and 401 - 433
M04 701 - 7E5, 601 - 633, 501 - 533, and 401 - 433
M05 701 - 7H0, 601 - 633, 501 - 533, and 401 - 433
Note: There is a model capacity identifier 400 which is used for ICF or IFL only models.
There are 270 (170 + 99 + 1 special case) possible combinations of capacity levels and
numbers of processors. These offer considerable overlap in absolute capacity, providing
different ways to achieve the required capacity. For example, a specific capacity (expressed
as MSUs) might be obtained with a single faster CP or with three sub-capacity CPs. The
single CP server might be a better choice for traditional CICS workloads because they are
single task (however, a processor loop can have a significant negative impact), and the
three-way server might be a better choice for a mixed batch/online workload.
http://www.ibm.com/systems/z/resources/swprice/reference/exhibits/hardware.html
z14 frames
The z14 is always a two-frame system. The A Frame and the Z Frame. It can be delivered as
an air-cooled system or as a water-cooled system. The radiator-cooled z14 models support
installation on raised floor and non-raised floor environments. For water-cooled models,
installation is only available on a raised floor.
The A-Frame can house up to four Processor drawers, central storage, a PCIe I/O drawer
(Peripheral Component Interconnect Express), cooling hardware and two 1U servers
(Support Elements).
The Z-Frame can house up to four PCIe I/O drawers, the power supplies and the optional
Internal Battery Facility (IBF).
The PCIe I/O drawers contain I/O cards with I/O processors, known as channels.
In addition, the z14 offers top-exit options for the I/O and power cables. These options give
you more flexibility in planning where the system resides, potentially freeing you from running
cables under a raised floor and increasing air flow over the system.
Figure 2-5 shows an internal, front view of the two frames of an air-cooled z14 system with
the maximum five PCIe I/O drawers, including the top-exit power cable options.
PCIe I/O drawers provide increased I/O granularity and capacity flexibility and can be
concurrently added and removed in the field.
Single chip modules are further described in section 2.10, “Single Chip Module (SCM)” on
page 91.
Memory:
– A minimum of 320 GB and a maximum of 32 TB of memory (excluding the 192 GB HSA)
is available for client use. See Table 2-1 on page 93 for details.
– Either 15, 20, or 25 memory DIMMs are plugged into a CPC drawer.
Fanouts:
Fanouts are interfaces that connect either to the internal PCIe I/O drawers or externally to
other IBM Z CPCs.
The CPC drawer provides up to 10 PCIe Gen3 fanout adapters to connect to the PCIe I/O
drawers and ICA SR coupling links, and up to four InfiniBand fanout adapters for 12x
InfiniBand and 1x InfiniBand coupling links.
DCAs provide power to the CPC drawer. Loss of one DCA leaves enough power to satisfy the
power requirements of the entire drawer. The DCAs can be concurrently maintained.
Two Flexible Support Processors (FSP)
In a system with more than one CPC drawer, all physical memory in the drawer containing the
failing memory is taken offline, which allows you to bring up the system with the remaining
physical memory in the other drawers. In this way, processing can be resumed until
replacement memory is installed.
The SC SCM holds the L4 cache, and the number of active PU cores on each of the PU
SCMs can also range from 7 to 10 for all models.
Each SC SCM includes 672 MB of shared eDRAM cache. The SC SCM is configured to provide
a single 672 MB L4 cache that is shared by all PU cores in the CPC drawer. This amount of
cache provides a total of 2.68 GB of L4 cache if all four CPC drawers are installed, yielding
outstanding SMP scalability for real-world workloads.
PU cache
The on-chip cache for the PU (core) works in this way:
Each PU core has an L1 cache (private) that is divided into a 128 KB cache for
instructions and a 128 KB cache for data.
PU sparing
Hardware fault detection is embedded throughout the design, and is combined with
comprehensive instruction-level retry and dynamic PU sparing. This function provides the
reliability and availability that is required for true mainframe integrity.
Software support
The z14 PUs provide full compatibility with existing software for z/Architecture, and extend the
Instruction Set Architecture (ISA) to enable enhanced functionality and performance. Several
hardware instructions that support more efficient code generation and execution are
introduced in the z14.
PU characterization
PUs are ordered in single increments. The internal system functions, which are based on the
configuration that is ordered, characterize each PU into one of various types during system
initialization, which is often called a power-on reset (POR) operation. Characterizing PUs
dynamically without a POR is possible by using a process called Dynamic Processor Unit
Reassignment. A PU that is not characterized cannot be used. As discussed previously, each
PU can be characterized as follows:
Central processor (CP)
Integrated Facility for Linux (IFL) processor
IBM Z Integrated Information Processor (zIIP)
Internal Coupling Facility (ICF)
System assist processor (SAP)
Integrated firmware processor (IFP)
2.12 Memory
Maximum physical memory size is directly related to the number of CPC drawers in the
system. Typically, a system has more memory installed than was ordered because part of the
installed memory is used to implement the redundant array of independent memory (RAIM)
design. On the z14, this configuration results in up to 8 TB of available memory per CPC
drawer and up to 32 TB for a four-drawer system.
Table 2-1 lists the maximum memory sizes for each z14 model.
Model CPC drawers Maximum memory
M01 1 8 TB
M02 2 16 TB
M03 3 24 TB
M04 4 32 TB
M05 4 32 TB
On z14 systems, the granularity for memory increases is 64 - 512 GB. Table 2-2 shows the
memory increments, depending on installed memory.
Memory increment (GB)   Installed memory (GB)
64                      256 - 384
128                     384 - 896
256                     896 - 2944
512                     2944 - 32576
2.13.1 Power
The system operates with two fully redundant power supplies. One is in the front side of the Z
frame, and the other is in the rear side of the Z frame. Each power supply has either one or
two power cords. The number of power cords that are required depends on the system
configuration. The total loss of one power supply has no impact on system operation.
Systems that specify two power cords can be brought up with one power cord and continue to
run. The larger systems, which have a minimum of four Bulk Power Regulator (BPR) pairs
installed, must have four power cords installed. Systems that specify four power cords can be
brought up with two power cords and continue to run.
Power cords attach to either a three-phase, 50/60 Hz, 200 - 480 V AC power source, or a 380
- 520 V DC power source.
A Balanced Power Plan Ahead feature is available for future growth, helping to ensure
adequate and balanced power for all possible configurations. With this feature, system
downtime for upgrading a server is eliminated, because the maximum power requirements, in
terms of BPRs and power cords, are included in your installation from the start.
For ancillary equipment, such as the Hardware Management Console, its display, and its
switch, extra single-phase outlets are required.
The power requirements depend on the cooling facility that is installed, and on the number of
CPC drawers and I/O units that are installed. For more information on power requirements,
see Installation Manual for Physical Planning, GC28-6965-00a, available on IBM Resource
Link®.
Note: The exact power consumption for your system varies. The object of the Power
Estimation tool, which is available on IBM Resource Link, is to estimate the power
requirements to aid you in planning for your system installation. After your system is
installed, the actual power consumption can be confirmed by using the HMC Monitor
Dashboard task.
2.13.2 Cooling
As discussed previously, the z14 is available as either air cooled or water cooled, depending
on the customers’ requirements. The selection of air-cooled models or water-cooled models
is done when ordering, and the appropriate equipment is factory-installed. An MES
(conversion) from an air-cooled model to a water-cooled model, and vice versa, is not allowed.
In all z14 servers, the CPC drawer, SC SCMs, PCIe I/O drawers, I/O drawers, and power
enclosures are all cooled by forced air with blowers that are controlled by the Motor Drive
Assembly (MDA).
The PU SCMs in the CPC drawers are cooled by water. The internal closed water loop takes
heat away from PU SCMs by circulating water between the radiator heat exchanger and the
cold plate that is mounted on the PU SCMs.
Air-cooled models
Although the PU SCMs are cooled by an internal closed loop water system, the heat is
exhausted into the room from the radiator heat exchanger by forced air with blowers.
Water-cooled models
Unlike the radiator in air-cooled models, a water-cooled IBM z14 has two water loops: an
internal closed water loop (the same as with the air-cooled models) and an external (chilled)
water loop. The
external water loop connects to the client-supplied building’s chilled water. The internal water
loop circulates between the water cooling unit (WCU) heat exchanger and the PU SCMs cold
plates. The loop takes heat away from the PU SCMs and transfers it to the external water loop
in the WCU’s heat exchanger.
In addition to the PU SCMs, the internal water loop also circulates through two heat
exchangers that are in the path of the exhaust air in the rear of the frames. These heat
exchangers remove approximately 60% - 65% of the residual heat from the I/O drawers, PCIe
I/O drawers, the air-cooled logic in the CPC drawers, and the power enclosures. Almost
two-thirds of the total heat that is generated can be removed from the room by the chilled
water.
2.14 Upgrades
Capacity upgrades are possible on all IBM z14 systems and may be required for various
reasons. For example, a business might grow steadily, year after year, or activity spikes can
happen during marketing campaigns.
Permanent upgrades that are ordered through an IBM representative are available for the
following tasks:
Add processor drawers
Add Peripheral Component Interconnect Express (PCIe) drawers and features
Add model capacity
Add specialty engines
Add memory
Activate unassigned model capacity or IFLs
Deactivate activated model capacity or IFLs
Activate channels
Activate cryptographic engines
Change specialty engine (re-characterization)
Permanent upgrades initiated by the client through the CIU facility allow you to add capacity
to fit within your existing hardware, for the following tasks:
Add model capacity
Add specialty engines
Add memory
Activate unassigned model capacity or IFLs
Deactivate activated model capacity or IFLs
Capacity BackUp (CBU) or Capacity for Planned Event (CPE) temporary upgrades can be
ordered by using the CIU application through Resource Link or by calling your IBM marketing
representative.
Billable capacity
To handle a peak workload, you can activate up to double the purchased capacity of any
processor unit (PU) type temporarily. You are charged daily.
Replacement capacity
When a processing capacity is lost in another part of an enterprise, replacement capacity can
be activated. It allows you to activate any PU type up to your authorized limit.
The concurrent capacity growth capabilities that are provided by z14 servers include, but are
not limited to, these benefits:
Enabling the meeting of new business opportunities
Supporting the growth of smart and cloud environments
Managing the risk of volatile, high-growth, and high-volume applications
Supporting 24 x 7 application availability
Enabling capacity growth during lockdown or frozen periods
Enabling planned-downtime changes without affecting availability
The sub-capacity models allow more configuration granularity within the family. The added
granularity is available for models that are configured with up to 33 CPs, and provides 99
extra capacity settings. Sub-capacity models provide for CP capacity increase in two
dimensions that can be used together to deliver configuration granularity. The first dimension
is adding CPs to the configuration. The second is changing the capacity setting of the CPs
currently installed to a higher model capacity identifier.
z14 servers allow the concurrent and non-disruptive addition of processors to a running
logical partition (LPAR). As a result, you can have a flexible infrastructure in which you can
add capacity without pre-planning. This function is supported by z/OS, z/VM, and z/VSE.
There are two ways to accomplish this addition:
With planning ahead for the future need of extra processors. In the LPAR’s profile,
reserved processors can be specified. When the extra processors are installed, the
number of active processors for that LPAR can be increased without the need for a
partition reactivation and initial program load (IPL).
Another (easier) way is to enable the dynamic addition of processors through the z/OS
LOADxx member. Set the DYNCPADD parameter in member LOADxx to ENABLE. z14
servers support dynamic processor addition in the same way that the z13, zEC12, z196,
and z10 support it.
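For illustration, the relevant LOADxx statement is shown below as a minimal sketch; any other statements that your installation's LOADxx member contains remain unchanged:

DYNCPADD ENABLE

With DYNCPADD ENABLE in effect (on servers that support the function, ENABLE is also the default), logical processors can later be added to the running z/OS image without a partition reactivation and IPL; DYNCPADD DISABLE turns the capability off.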
After you place an order through the CIU facility, you receive a notice that the order is ready
for download. You can then download and apply the upgrade by using functions that are
available through the Hardware Management Console (HMC), along with the RSF. After all
the prerequisites are met, the entire process, from ordering to activation of the upgrade, is
performed by the client.
After download, the actual upgrade process is fully automated and does not require any
onsite presence of IBM SSRs.
The CIU facility supports LICCC upgrades only. It does not support I/O upgrades. All
additional capacity that is required for an upgrade must be previously installed.
Permanent upgrades
Permanent upgrades can be ordered by using the CIU facility. Through the CIU facility, you
can generate online permanent upgrade orders to concurrently add processors (CPs, ICFs,
zIIPs, IFLs, and SAPs) and memory, or change the model capacity identifier. You can do so
up to the limits of the installed processor drawers on an existing system.
Temporary upgrades
The base model z14 server describes permanent and dormant capacity by using the capacity
marker and the number of PU features installed on the system. Up to eight temporary
offerings can be present. Each offering has its own policies and controls, and each can be
activated or deactivated independently in any sequence and combination. Although multiple
offerings can be active at the same time, if enough resources are available to fulfill them, only
one On/Off Capacity on Demand (On/Off CoD) offering can be active at any time.
CBU is a quick, temporary activation of processor units (CP, zIIP, IFL, ICF, SAP) in the face of
a loss of processing capacity due to an emergency or disaster/recovery situation. When a z14
has the CBU feature enabled, the client is entitled to a number of tests over a period of time to
validate that the CBU works as expected.
Note: CBU is for disaster/recovery purposes only, and cannot be used for peak load
management of customer workload under the terms and conditions which the client has
signed.
CPE is intended to replace capacity lost within the enterprise due to a planned event such as
a facility upgrade or system relocation. CPE is intended for short-duration events lasting up to
a maximum of three days. Each CPE record, after it is activated, gives the customer access to
all dormant PUs on the server. Processing units can be configured in any combination of CP
or specialty engine types (zIIP, SAP, IFL, ICF). The capacity needed for a given situation is
determined by the customer at the time of CPE activation.
The processors that can be activated by CPE come from the available spare PUs on any
installed book. CPE features can be added to an existing z14 non-disruptively. There is a
one-time fixed fee for each individual CPE event. The base server configuration must have
sufficient memory and channels to accommodate the potential needs of the large
CPE-configured server. It is important to ensure that all required functions and resources are
available on the server where CPE is activated, including CF LEVELs for Coupling Facility
partitions, memory, and cryptographic functions, as well as connectivity capabilities.
The IBM Z family includes the new IBM z14 (z14), as well as the IBM z13s (z13s), IBM z13
(z13), IBM zEnterprise EC12 (zEC12), zEnterprise BC12 (zBC12), zEnterprise 196 (z196),
and zEnterprise 114 (z114).
Unless otherwise stated, the term OSA applies to all OSA-Express features throughout this
book.
For more detail on IBM Z connectivity, see the IBM Z Connectivity Handbook, SG24-5444.
The SAP uses the I/O configuration definitions loaded in the hardware system area (HSA) of
the system to identify the external devices and the protocol they support. The SAP also
monitors the queue of I/O operations passed to the CSS by the operating system.
Using a SAP, the processing units (PUs) are relieved of the task of communicating directly
with the devices, so data processing can proceed concurrently with I/O processing.
Increased system performance demands higher I/O and network bandwidth, speed, and
flexibility, so the CSS evolved along with the advances in scalability of the Z platforms. The
z/Architecture provides functions for scalability in the form of multiple CSSs that can be
configured within the same IBM Z platform.
PCIe I/O drawers allow a higher number of features (four times more than the I/O drawer and
a 14% increase over the I/O cage on previous IBM z13) and increased port granularity. Each
drawer can accommodate up to 32 features in any combination. They are organized in four
hardware domains per drawer, with eight features per domain. The PCIe I/O drawer is
attached to a PCIe fanout in the CPC drawer, with an interconnection speed of 8 GBps with
PCIe Gen2 and 16 GBps with PCIe Gen3. PCIe I/O drawers can be installed and repaired
concurrently in the field.
The Internal Coupling (IC) channel, IBM HiperSockets, and Shared Memory Communications
can be used for communications between logical partitions within the Z platform.
The Open Systems Adapter (OSA) features provide direct, industry-standard Ethernet
connectivity and communications in a networking infrastructure.
The 10GbE RoCE Express2 and 10GbE RoCE Express features provide high-speed,
low-latency networking fabric for IBM z/OS-to-z/OS shared memory communications.
Many features have an integrated processor that handles the adaptation layer functions
required to present the necessary features to the rest of the system in a uniform manner.
Therefore, all the operating systems have the same interface with the I/O subsystem.
The z14, z13, z13s, zEC12, and zBC12 support industry-standard PCIe adapters called
native PCIe adapters. For native PCIe adapter features, there is no adaptation layer, but the
device driver is present in the operating system. The adapter management functions (such as
diagnostics and firmware updates) are provided by Resource Groups.
There are four Resource Groups on z14, and there are two Resource Groups on z13, z13s,
zEC12, and zBC12. The Resource Groups are managed by an integrated firmware processor
that is part of the system’s base configuration.
The following sections briefly describe connectivity options for the I/O features available on
the z14 platforms.
The FICON implementation enables full-duplex data transfer. In addition, multiple concurrent
I/O operations can occur on a single FICON channel. FICON link distances can be extended
by using various solutions.
The FICON features on IBM Z also support full fabric connectivity for the attachment of SCSI
devices by using the Fibre Channel Protocol (FCP). Software support is provided by IBM
z/VM, IBM z/VSE, and Linux on IBM Z operating systems.
The zHyperLink Express feature is a native PCIe adapter and can be shared by multiple
LPARs.
On the z14, the zHyperLink Express feature is installed in the PCIe I/O drawer; on the IBM
DS8880, the fiber optic cable connects to a zHyperLink PCIe interface in the I/O bay.
The zHyperLink Express feature has the same qualities of service as all Z I/O channel
features. zHyperLink Express is not a replacement for FICON; in fact, FICON attachments
to the same DS8880 are a requirement to exploit this feature.
The OSA-Express features bring the strengths of the Z family, such as security, availability,
and enterprise-wide access to data to the LAN environment. OSA-Express provides
connectivity for the following LAN types:
1000BASE-T Ethernet (10/100/1000 Mbps)
1 Gbps Ethernet
10 Gbps Ethernet
3.2.4 HiperSockets
IBM HiperSockets technology provides seamless network connectivity to consolidate servers
in an advanced infrastructure intra-server network. HiperSockets creates multiple
independent, integrated, virtual LANs within a single IBM Z platform.
HiperSockets use is also possible under the IBM z/VM operating system, which enables
establishing internal networks between guest operating systems, such as multiple Linux
servers.
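HiperSockets LANs are defined to the channel subsystem as virtual CHPIDs of type IQD. As a purely illustrative sketch (the CHPID number and partition names are hypothetical, and newer servers may require additional keywords, such as VCHID), an IOCP definition might look like this:

CHPID PATH=(CSS(0),F4),PARTITION=((LP1,LP2)),TYPE=IQD

Every LP named on the PARTITION keyword can then attach to this internal LAN with no physical cabling involved.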
The z/VM virtual switch can transparently bridge a guest virtual machine network connection
on a HiperSockets LAN segment. This bridge allows a single HiperSockets guest virtual
machine network connection to directly communicate with other guest virtual machines on the
virtual switch and with external network hosts through the virtual switch OSA port.
The systems in a Parallel Sysplex configuration are linked and can fully share devices and run
the same applications. This feature enables you to harness the power of multiple IBM Z
platforms as though they are a single logical computing system.
The architecture is centered around the implementation of a coupling facility (CF) that runs
the coupling facility control code (CFCC) and high-speed coupling connections for
intersystem and intra-system communications. The CF provides high-speed data sharing with
data integrity across multiple IBM Z platforms.
Coupling Links connectivity for Parallel Sysplex on z14 use Coupling Express Long Reach
(CE LR), Integrated Coupling Adapter Short Reach (ICA SR), and InfiniBand (IFB)
technology.
The ICA SR supports a link data rate of 8 GBps and a cable length of up to 150 meters, using
industry-standard OM3 50 µm fiber optic cables. The coupling links can be defined as shared
between images within a CSS, or spanned across multiple CSSs in IBM Z.
SMC allows two peers to send and receive data by using system memory buffers that each
peer allocates for its partner’s use. Two types of SMC protocols are available on the IBM Z
platform: SMC-Remote Direct Memory Access (SMC-R) and SMC-Direct Memory Access
(SMC-D). Both use shared memory architectural concepts, eliminating TCP/IP processing in
the data path, yet preserving TCP/IP quality of service (QoS) for connection management
purposes.
The 10GbE RoCE Express2 and 10GbE RoCE Express features provide the RoCE support
needed for LPAR-to-LPAR communication across Z platforms.
Crypto Express6S and Crypto Express5S (and previous generations) are tamper-sensing
and tamper-responding programmable cryptographic features that provide a secure
cryptographic environment. Each adapter contains a tamper-resistant hardware security
module (HSM).
The HSM can be configured as a secure IBM Common Cryptographic Architecture (CCA)
coprocessor, as a secure IBM Enterprise PKCS #11 (EP11) coprocessor, or as an
accelerator.
Each Crypto Express6S and Crypto Express5S feature occupies one I/O slot in the PCIe I/O
drawer. Crypto Express6S is supported on the z14. Crypto Express5S is supported on the
z13 and z13s platforms.
Flash Express implements storage-class memory (SCM) through an internal NAND Flash
solid-state drive (SSD) in a PCIe adapter form factor. Each Flash Express feature is installed
exclusively in the PCIe I/O drawer and occupies one I/O slot.
For availability reasons, the Flash Express feature must be ordered in pairs. A feature pair
provides 1.4 TB of usable storage.
Flash Express storage is allocated to each partition in a manner similar to main memory
allocation. The allocation is specified at the Hardware Management Console (HMC). z/OS
can use the Flash Express feature as SCM for paging store to minimize the supervisor call
(SVC) memory dump duration.
The z/OS paging subsystem supports a mix of Flash Express and external disk storage.
Flash Express can also be used by Coupling Facility images to provide extra capacity for
particular structures (for example, IBM MQ shared queues application structures).
Starting with the release of the z14, IBM has replaced Flash Express with Virtual Flash
Memory (VFM). VFM utilizes HSA type memory instead of a pair of flash cards, and supports
the same use cases as above.
The CSS provides the server communications to external devices through channel
connections. The channels transfer data between main storage and I/O devices or other
servers under the control of a channel program. The CSS allows channel I/O operations to
continue independently of other operations within the central processors (CPs) and Integrated
Facility for Linux processors (IFLs).
An important piece of the I/O z/Architecture is the SAP (System Assist Processor), which is a
PU in charge of guaranteeing the start and end of an I/O operation. It is a unique mainframe
feature that frees many processor cycles during I/O operations.
A single channel subsystem allows the definition of up to 256 channel paths. To overcome
this limit, the multiple channel subsystems concept was introduced. The z14 architecture
provides for six channel subsystems. The structure of the multiple LCSSs provides channel
connectivity to the defined LPARs in a manner that is transparent to subsystems and
application programs. This configuration enables the definition of a balanced configuration for
the processor and I/O capabilities.
The LCSS can have from 1 to 256 channels, and can be configured with 1 to 15 LPARs,
except for CSS 5, which can be configured with only 10 LPARs.
Therefore, the six channel subsystems support a maximum of 85 LPARs. CSSs are
numbered from 0 to 5, which is referred to as the CSS image ID (CSSID 0, 1, 2, 3, 4, or 5).
These CSS are also referred to as logical channel subsystems (LCSS).
CSS elements
The CSS is composed of the following elements:
Subchannels
Channel paths
Channel path identifier
Control units
I/O devices
Multiple subchannel sets, described in “Multiple subchannel sets” on page 109, are available
to increase addressability. Four subchannel sets per CSS are supported on a z14.
Subchannel set 0 can have up to 65280 subchannels, and subchannel sets 1, 2, and 3 can
have up to 65535 subchannels each.
Channel paths
Each CSS can have up to 256 channel paths. A channel path is a single interface between a
server and one or more control units. Commands and data are sent across a channel path to
run I/O requests.
The channel subsystem communicates with I/O devices through channel paths between the
channel subsystem and control units. On IBM Z, a CHPID number is assigned to a physical
location (slot/port) by the client, by using the hardware configuration definition (HCD) tool or
input/output configuration program (IOCP).
Control units
A control unit provides the logical capabilities necessary to operate and control an I/O device.
It adapts the characteristics of each device so that it can respond to the standard form of
control that is provided by the CSS. A control unit can be housed separately, or can be
physically and logically integrated with the I/O device, the channel subsystem, or within the
system itself.
I/O devices
An I/O device provides external storage, a means of communication between data-processing
systems, or a means of communication between a system and its environment. In the
simplest case, an I/O device is attached to one control unit and is accessible through one or
more channel paths.
Subchannel numbers
Subchannel numbers (including their implied path information to a device) are limited to four
hexadecimal digits by the architecture (0x0000 to 0xFFFF). Four hexadecimal digits provide
64 K addresses, which are known as a subchannel set.
IBM has reserved 256 subchannels, leaving 63.75 K subchannels for general use. Again,
addresses, device numbers, and subchannels are often used as synonyms, although this is
not technically accurate. You might hear or read that there is a maximum of 63.75 K
addresses or a maximum of 63.75 K device numbers.
The additional subchannel sets, in effect, add an extra high-order digit (either 0, 1, 2 or 3) to
existing device numbers. For example, you might think of an address as 08000 (subchannel
set 0), 18000 (subchannel set 1), 28000 (subchannel set 2), or 38000 (subchannel set 3).
Adding a digit is not done in system code or in messages because of the architectural
requirement for four-digit addresses (device numbers or subchannels). However, certain
messages contain the subchannel set number. You can mentally use that as a high-order digit
for device numbers. Only a few requirements refer to subchannel sets 1, 2, and 3 because
they are only used for these special devices. JCL, messages, and programs rarely refer
directly to these special devices.
Moving these special devices into an alternate subchannel set creates more space for device
number growth. The appropriate subchannel set number must be included in the input/output
configuration program (IOCP) definitions or in the HCD definitions that produce the
input/output configuration data set (IOCDS). The subchannel set number defaults to zero.
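For example, the SCHSET keyword on the IOCP IODEVICE statement places devices in an alternate subchannel set. This is a sketch only; the device numbers, control unit number, and unit type are hypothetical:

IODEVICE ADDRESS=(9000,032),CUNUMBR=(2000),UNIT=3390A,SCHSET=1

Here, 32 alias devices starting at device number 9000 are defined in subchannel set 1.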
z14
                 CSS 0                    CSS 1
LPAR name        LP1    LP2    LP3        LP14   LP15   LP16
LPAR ID          01     03     05         12     13     15
MIF ID           1      3      5          2      3      5
MSS              SS 0 and SS 1            SS 0 and SS 1
CHPIDs           80 81 90 91              80 81 90 91
Directors        61                       62
Control units and devices: disk LCUs, reached through both directors
Figure 3-1 Logical channel subsystems, LPARs, MIF IDs, subchannel sets, and CHPIDs in a z14
Note: CHPID numbers are arbitrarily selected. For example, we could change CHPID 80
(in either or both CSSs in the illustration) to C3 simply by changing a value in the
IODF/IOCDS.
A CHPID is associated with a PCHID, and PCHID numbers are unique across the server.
Different channels on a single I/O adapter can be used by different LPs (and different
CSS). As shown in Figure 3-1 on page 110, PCHID 0140 is the first channel on the I/O
features and PCHID 0141 is the second channel on the same I/O feature.
Figure 3-1 on page 110 also illustrates the relationship between LPs, CSSs, CHPIDs, and
PCHIDs.
The IOCP statements are typically built using the Hardware Configuration Dialog (HCD). This
interactive dialog is used to generate the input/output definition file (IODF), invoke the IOCP
program, and subsequently build your input/output configuration data set (IOCDS). The
IOCDS is loaded into the Hardware System Area (HSA) and initialized during Power-on
Reset.
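A minimal, purely illustrative IOCP fragment (all partition names, CHPID, PCHID, control unit, and device numbers are hypothetical, and the layout is simplified) shows how the generated statements relate to each other:

RESOURCE PARTITION=((CSS(0),(LP1,1),(LP2,2)))
CHPID    PATH=(CSS(0),80),SHARED,PCHID=140,TYPE=FC
CNTLUNIT CUNUMBR=2000,PATH=((CSS(0),80)),UNIT=2107
IODEVICE ADDRESS=(8000,032),CUNUMBR=(2000),UNIT=3390B

The CHPID statement ties a logical channel path in CSS 0 to a physical location (PCHID); the CNTLUNIT statement describes the control unit reached through that path; and the IODEVICE statement defines the devices, and their device numbers, behind that control unit.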
The following tools are provided for the client to maintain and optimize the I/O configuration of
your environment.
Display CHPID Matrix

ISF031I CONSOLE PATRICK ACTIVATED
-D M=CHP
IEE174I 10.40.41 DISPLAY M 628
CHANNEL PATH STATUS
    0 1 2 3 4 5 6 7 8 9 A B C D E F
2   . . . . . . . . . . . . + + + +
3   . . . . + + . . . . . . + + + +
4   . . . . + + . . + + + + . . . .
5   . . . . + + . . + + + + . . . .
6   . . . . . . . . . . . . + + + +
7   . . . . . . . . . . . . + + + +
A   + + + + + + + + . . . . . . . .
D   . . . . . . . . . . + + . . . .
************** SYMBOL EXPLANATIONS **************
+ ONLINE   @ PATH NOT VALIDATED   - OFFLINE   . DOES NOT EXIST
* MANAGED AND ONLINE   # MANAGED AND OFFLINE
CHANNEL PATH TYPE STATUS
    0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
2   00 00 00 00 00 00 00 00 00 00 00 00 1A 1A 1A 1A
3   00 00 00 00 1A 1A 00 00 00 00 00 00 1A 1A 1A 1A
4   00 00 00 00 1B 1B 00 00 1B 1B 1B 1B 00 00 00 00
5   00 00 00 00 1B 1B 00 00 1B 1B 1B 1B 00 00 00 00
6   00 00 00 00 00 00 00 00 00 00 00 00 1A 1A 1A 1A
7   00 00 00 00 00 00 00 00 00 00 00 00 1A 1A 1A 1A
A   14 11 11 11 11 11 11 10 00 00 00 00 00 00 00 00
D   00 00 00 00 00 00 00 00 00 00 11 11 00 00 00 00
************** SYMBOL EXPLANATIONS **************
00 UNKNOWN                      1D FICON INCOMPLETE
10 OSA EXPRESS                  1E DIRECT SYSTEM DEVICE
11 OSA DIRECT EXPRESS           1F EMULATED I/O
12 OPEN SYSTEMS ADAPTER         20 RESERVED
14 OSA CONSOLE                  21 INTEGRATED CLUSTER BUS PEER
16 CLUSTER BUS SENDER           22 COUPLING FACILITY PEER
17 CLUSTER BUS RECEIVER         23 INTERNAL COUPLING PEER
18 INTERNAL COUPLING SENDER     24 INTERNAL QUEUED DIRECT COMM
19 INTERNAL COUPLING RECEIVER   25 FCP CHANNEL
1A FICON POINT TO POINT         26 COUPLING OVER INFINIBAND
1B FICON SWITCHED               32 UNKNOWN
33 COUPLING OVER PCIE           NA INFORMATION NOT AVAILABLE

(In this display, for example, CHPID 6C has type 1A, a FICON point-to-point channel.)
This z/OS command provides information about the status and type of channels. There are
two parts to the display:
1. The first section displays the channel path status.
The channel path status is relative to the z/OS where the command is issued. That is, a
CHPID may be displayed as being offline, but if this CHPID is shared (MIF) by other logical
partitions, it may not be offline physically.
2. The second section displays the channel path type.
Note: Whereas the first section displays the status of channels available to the z/OS image
only, the second section provides information about all channels installed on the server.
Along with these explanations, other IBM virtualization implementations are briefly described,
such as z/VM, KVM for IBM Z, PowerVM®, and VTS.
In the PR/SM part, we cover the concept of a logical partition (LP), and how to define it in a
z14 CPC. Here, we use the acronym PR/SM to describe the Licensed Internal Code (LIC)
software that partitions the CPC, creating the virtualized LPs. LIC is a type of internal
software code that is not perceived by the customer.
PR/SM allows a system programmer to allocate CPC hardware resources (including PUs,
CPC memory, and I/O channels) among LPs.
Then, in PR/SM mode, the resources of a CPC can be distributed among multiple control
programs (operating systems) that can run on the same CPC simultaneously. Each control
program has the use of resources defined to the LP in which it runs.
Virtualization definitions
Virtualization is an old technique (IBM mainframes have had it since the late 1960s) that
creates virtual resources and “maps” them to real resources. In other words, it separates the
presentation of resources (virtual resources) to users from the actual physical resources. In a
sense, virtualization is a synonym for resource sharing, and it consequently increases
resource utilization. Virtualization is still a key strength of Z platforms, and is useful for cloud
implementation.
A definition: Virtualization is the ability for a computer system to share resources so that
one physical server can act as many virtual servers, or virtual machines (VM). This can
reduce the number of processors and hardware devices needed.
Another virtualization definition can be this one: “Virtualization is the process of presenting
computing resources in ways that users and applications can easily get value out of them,
rather than presenting them in a way dictated by their implementation, geographic
location, or physical packaging.”
Another definition: “Virtualization is a technique that adds a logical layer over the real
hardware, separating the virtual resources from the actual physical resources. This
technique allows better hardware exploitation and better sharing of resources, considerably
reducing the amount of equipment. Virtualization also increases the agility, flexibility, and
scalability of systems, resulting in more productivity, efficiency, and responsiveness to
business requirements.”
Virtualization properties
Creating many VMs consisting of virtualized processors, communications, memory, and I/O
devices, can reduce acquisition costs and the overhead of planning, purchasing, and
installing new hardware to support new workloads.
Through the “magic” of virtualization, software running within the virtual machine is unaware
that the “hardware” layer has been virtualized. It believes it is running on its own hardware
separate from any other operating system.
It is important to note (for completeness of the definition) that splitting a single physical entity
into multiple virtual entities is not the only method of virtualization. For example, combining
multiple physical entities to act as a single, larger entity is also a form of virtualization, and
grid computing is an example of this kind of virtualization. The grid virtualizes heterogeneous
and geographically dispersed resources, thus presenting a simpler view.
Virtualization thus aggregates pools of resources for allocation to users as virtual resources,
showing the way back to a centralized topology. It is a logical solution for collapsing many
physical PUs into fewer, more powerful ones.
Virtual resources
  Proxies for real resources: same interfaces and functions, different attributes
  May be part of one physical resource or span multiple physical resources
Virtualization
  Creates virtual resources and “maps” them to real resources
  Primarily accomplished with software and/or firmware
  Separates the presentation of resources to users from the actual resources
  Aggregates pools of resources for allocation to users as virtual resources
Resources
  Components with architected interfaces and functions
  May be centralized or distributed; usually physical
  Examples: memory, processors, disk drives, networks
Virtualization benefits
Summarizing, there are many benefits gained from virtualization, as follows:
Better hardware utilization (on average, 85% of server capacity is unused daily in a
distributed server farm environment). Without virtualization, a resource available in one
server cannot be migrated to another server where that resource is lacking.
Lower power consumption (50% of total data center energy is spent on air conditioning in
a server farm) and dynamic energy optimization.
Reduced IT management costs. Virtualization can improve staff productivity by reducing
the number of physical resources that must be managed by hiding some of the resource
complexity, simplifying common management tasks through automation, better
information and centralization, and enabling workload management automation.
Consolidation of application and operating system copies. Virtualization enables common
tools to be used across multiple platforms and also enables multiple applications and
operating systems to be supported in one physical system. Virtualization consolidates
servers into virtual machines on either a scale-up or scale-out architecture. It also enables
systems to treat computing resources as a uniform pool that can be allocated to virtual
machines in a controlled manner.
Better SW investment protection.
Simplified Continuous Availability/Disaster Recovery solutions. It also enables physical
resources to be removed, upgraded, or changed without affecting users.
On the other hand, there are a few problems with virtualization, such as:
Increased complexity that may endanger Continuous Availability and problem
determination.
More overhead when using shared resources, due to the added management layer.
Figure 4-2 Examples of physical resources that can be virtualized: SMP servers, blade servers, network hardware, and storage
In a sense, virtual resources are proxies for real resources (as the ones pictured in
Figure 4-2) with the same architecture interfaces and functions, but different attributes.
Hypervisor types
A hypervisor is a sort of operating system. Its major role is to virtualize physical resources
such as processors, CPC memory and some I/O devices to other operating systems running
underneath. There are two types of hypervisors:
Type-1 hypervisors
These hypervisors have the function implemented as part of the operating system software or the hardware
firmware on a given hardware platform. When the hypervisor is implemented as part of the
operating system, this operating system providing the hypervisor function is called the “host”
operating system. The operating system running within a virtual machine (VM) created on the
host system is called the “guest” operating system. Figure 4-3 illustrates a Type-1 hypervisor.
The hypervisor box shown in the center of the figure is either the firmware hypervisor
(physical hypervisor) or an operating system (host operating system) with the hypervisor
function built in. The boxes labeled OS are the virtual machines (guest OS). z/VM and PR/SM
are examples of a Type-1 hypervisor. Some other examples are the XEN Open Source
Hypervisor, VMWare ESX Server, Virtual Iron, and ScaleMP.
Type-2 hypervisors
These hypervisors run on a host operating system as an application. In this case, the host
operating system refers to the operating system on which the hypervisor application is
running. The host operating system runs directly on the hardware. The operating system
running inside the virtual machine (VM) created by using the hypervisor application is the
guest operating system.
Hypervisor technologies
It is useful to discuss the hypervisor technologies currently used.
IBM Hypervisors on Z
The IBM hypervisors are PR/SM, z/VM, PowerVM, and KVM for Z. Starting with the z10, it is
not possible to have a Z CPC without PR/SM being active. Therefore, z/VM must run under
PR/SM, even if it is defined within just one LP.
The various virtualization options in Z platforms allow you to build flexible virtualized
environments to take advantage even of open source software (such as KVM for IBM Z) or
upgrade to new cloud service offerings, such as infrastructure as a service (IaaS) and
platform as a service (PaaS).
z/VM
z/VM provides each user with an individual working environment known as a virtual
machine. The virtual machine simulates the existence of a dedicated real machine, including
processor functions, memory, networking, and input/output (I/O) resources. Operating
systems and application programs can run in virtual machines, as guests. For example, you
can run multiple Linux and z/OS images on the same z/VM hypervisor that is also supporting
various applications and users. As a result, development, testing, and production
environments can share a single physical computer.
A virtual machine uses real hardware resources, but even with dedicated devices (like a tape
drive), the device number of the tape drive may or may not be the same as the real device
number of the tape drive. Therefore, a virtual machine only knows “virtual hardware” that may
or may not exist in the real world. Figure 4-7 shows the layout of the general z/VM
environment.
Through CMS:
z/VM system programmers can manage and control z/VM and its virtual machines.
Programmers can develop application source code.
CP can run an unlimited number of images or virtual machines executing Z operating systems
such as z/OS, z/TPF, Linux, z/VSE, CMS, or z/VM (forming, in this case, a cascade).
CP (like PR/SM) uses the Start Interpretive Execution (SIE) instruction to associate the
physical PU with the virtual PU. The virtual PU is the PU as perceived by the operating
system running in the virtual machine.
The major reason to have z/VM today is to run a practically unlimited number of Linux virtual
machines and to enjoy the user-friendly interfaces of CMS. Observe that the maximum
number of PR/SM logical partitions in a CPC is only 85.
This function uses a hardware interface to access system management z/VM APIs. Some of
the supported operations are:
View z/VM guests
Activate or deactivate z/VM guests
Display guest status configuration
The KVM hypervisor for IBM Z enables you to share real PUs (the IFLs), memory, and I/O
resources through platform virtualization. It can coexist with z/VM virtualization environments,
Linux on IBM Z, z/OS, z/VSE, and z/TPF.
The KVM hypervisor for IBM Z is optimized for scalability, performance, security, and
resiliency, also providing standard Linux and KVM interfaces for simplified operational control.
The KVM hypervisor for Z for the z14 is offered with the following Linux distributions:
SUSE Linux Enterprise Server 12 SP2 (or higher) with service
Canonical Ubuntu 16.04 LTS (or higher) with service.
DPM provides simplified hardware and virtual infrastructure management, including partition
lifecycle and integrated dynamic I/O and PCIe functions management for Linux running under
PR/SM, under KVM for IBM Z, and under z/VM 6.4. Using DPM, an environment can be
created, provisioned, modified without disrupting running workloads, and monitored for
troubleshooting.
Explaining:
The first one is the name of the partition created by the hypervisor.
The second one is the name of the virtualized shared processor.
The third one is the name of the hypervisor itself.
Finally, the name of the mechanism used by the installation to control the amount of
processor capacity delivered to each partition.
(Illustration: a z14 CPC with 1 to 85 logical partitions, running a mix of z/VM, Linux, z/OS, and coupling facility (CFCC) images.)
Usually the terms LPAR and PR/SM are used interchangeably, both meaning Logical
Partitioning.
The majority of the z/OS code does not interact directly with the LPARs that run in the system.
Each logical PU in an LP is mapped into a State Descriptor Control Block (SDCB). SDCB
contains the last saved state (PSW, registers, CP Timer) of such a logical PU. When ready,
the logical PU has its SDCB in the PR/SM ready queue of such a PU pool type (set of logical
PUs with the same personality, such as zIIP or CP). The ready queue is ordered indirectly by
the LP Weight value.
PR/SM’s major function is to dispatch a logical PU on a physical PU of the same logical pool
type (CP, zIIP, IFL, ICF) through the use of the SIE instruction. The z/VM Control Program
uses SIE in the same way to dispatch a virtual PU on a physical PU.
The logic of SIE is to load the state of the logical PU into the physical PU and force the
physical PU to start instruction execution from that state.
A logical CP can only be dispatched on a physical CP, and so on. A shared logical PU has
three states:
Active, when running on the physical PU (also true for a dedicated logical PU).
Ready (or Suspended), when delayed by the lack of an available physical PU. The SDCB
is in the PR/SM ready queue. There is one ready queue per pool of logical PUs. A
dedicated logical PU never assumes this state.
Intercepted, when the association between the logical and physical PU ends; the state of
the logical PU is saved in the SDCB (including the CP Timer). Reasons for an intercept
include:
Voluntary wait (z/OS voluntarily gives up the processor by switching the PSW Wait bit 14
to ON), because there is nothing in a "Ready" state in the dispatchable unit access list
(either a task control block or a service request block).
End of an LPAR time slice, set by PR/SM to avoid a logical PU’s dominance of the
physical PU. PR/SM decides this value dynamically at SIE time.
Figure 4-10 A z14 CPC with 10 physical CPs and five LPs (MVS1 to MVS5): two physical CPs are dedicated to MVS1, and the remaining eight physical CPs are shared
However, with sharing you have the ability to utilize the PU capacity that is not required by
another sharing LP (one that is in a wait state). Normally, when a z/OS system that is using a
shared PU goes into a wait, PR/SM releases the physical PU, which can then be used by
another logical PU in the ready state.
There are a number of parameters that let you control the distribution of shared PUs between
LPs. It is not possible to have a single LP use both shared and dedicated CPs, with the
exception of the coupling facility LP.
One of the significant reasons for the increased number of LPs is server consolidation, where
different workloads spread across many small machines may be consolidated in LPs of a
larger server. PR/SM also continues to be used to run in different environments, such as
system programmer test, development, quality assurance, and production, all in the same
CPC.
Figure 4-10 on page 131 shows a 10-CP configured IBM z14 CPC. We use the term “physical
CP” to refer to the actual CPs that exist on the CPC. We use the term “logical CP” to refer to
the CPs each operating system perceives in order to dispatch work (TCB/SRB). The number
of logical CPs in an LP must be less than or equal to the number of physical CPs.
As shown in that figure, two CPs are dedicated to LP MVS1. The two dedicated CPs are for
the exclusive use of MVS1. For this LP, then, the number of physical CPs is equal to the
number of logical CPs.
The remaining eight physical CPs are shared between the LPs: MVS2, MVS3, MVS4, and
MVS5. Each of these LPs can use any of the shared physical CPs, with a maximum at any
one time equal to the number of online logical CPs in that LP. The number of logical CPs per
LP is:
Six logical CPs in LP MVS2
Three logical CPs in LP MVS3
Two logical CPs in LP MVS4
Two logical CPs in LP MVS5
The number of shared physical CPs can be less than the sum of the logical shared CPs
across all the LPs. An MVS operator can also vary logical CPs online and offline, as if they
were physical CPs, by using the z/OS CONFIG command.
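For example, the following operator commands take a logical CP offline and bring it back online (CF is the abbreviation of CONFIG, and the CPU number is illustrative); D M=CPU displays the resulting logical CP status:

D M=CPU
CF CPU(3),OFFLINE
CF CPU(3),ONLINE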
An LP cannot have more logical CPs online than the number defined for the LPs in the HMC.
All the ideas described here apply also to zIIP, not only to CP in one z/OS LP.
Figure 4-11 Logical CP dispatching: PR/SM takes logical CPs (for example, MVS2 LCP2, MVS4 LCP1, MVS2 LCP4, MVS3 LCP0, and MVS3 LCP2) from the logical CP ready queue and dispatches them on the shared physical CPs CP0 through CP7; the numbered steps 1 to 5 are described in “Logical CP execution”
The logical CP is dispatched on the physical CP by copying the LP’s logical CP status (PSW,
registers, CP Timer) from the specific SDCB to the corresponding actual PU entities. This
causes the z/OS code or application code in that LP to execute on the physical CP, through
the logical CP.
Later, when the logical CP is intercepted (the association of the physical CP with the logical
CP ends temporarily), the logical CP status is saved in the SDCB, and PR/SM code is
automatically dispatched on the physical CP again. This PR/SM code then chooses another
logical CP to dispatch, and the whole process starts all over.
Figure 4-11 illustrates the flow of execution in a CPC, where we have eight shared physical
CPs. Let us examine this now.
Logical CP dispatching
MVS2 has 6 logical CPs, MVS3 has 3 logical CPs, MVS4 has 2 logical CPs, and MVS5 also
has 2 logical CPs. This gives us a total of 13 logical CPs, so PR/SM has up to 13 SDCBs to
manage in our environment.
Logical CP execution
So, how is a logical CP taken from the ready queue to execute on a physical CP? As
mentioned before, PR/SM is responsible for dispatching a logical CP on a physical CP.
PR/SM executes on each physical CP. When a logical CP is found ready, it is dispatched on a
physical CP by PR/SM issuing a SIE instruction, which switches the PR/SM code off the
physical CP and replaces it with the code that was previously running on the logical CP.
The steps that occur in dispatching a logical CP, illustrated in Figure 4-11 on page 133, are as
follows:
1 - The next logical CP to be dispatched is chosen from the logical CP ready queue based on
its LP weight.
2 - PR/SM dispatches the selected logical CP (LCP5 of MVS2 LP) on a physical CP0.
3 - The z/OS dispatchable unit running on that logical CP (LCP5 of MVS2) begins to execute
on physical CP0. It executes until its time slice (generally between 12.5 and 25 milliseconds)
expires causing an intercept, or it enters a voluntary wait (lack of service in MVS2).
4 - As shown, the logical CP keeps running (for example) until it uses all its time slice. At this
point the logical CP5 (of MVS2) state is saved in its SDCB and control is passed back to
PR/SM LIC, which restarts execution on physical CP0 again.
5 - PR/SM determines why the logical CP was intercepted and requeues the logical CP
accordingly. If it is still ready (no wait), it is requeued on the logical CP ready queue and life
goes on, that is, back to step 1. If no more SDCBs are in the ready queue, PR/SM places the
physical CP in wait state.
PR/SM weights
PR/SM weights are used to control the distribution of the shared physical PUs between the
logical PUs of the LPs. Therefore, for LPs with dedicated PUs, PR/SM weights are not
assigned.
An LP may use less than the guaranteed amount if it does not have enough workload to
execute. Similarly, it can use more than its weight if the other LPs are not using their
guaranteed amounts.
PR/SM LIC uses weights and the number of logical CPs to decide the priority of logical PUs
in the logical PU ready queue. The following formulas are used by PR/SM in the process of
controlling the dispatching of logical PUs:
WEIGHT(LPx)% = 100 * WEIGHT(LPx) / SUM_of_ACTIVE_LPs_WEIGHTs
TARGET(LPx) = WEIGHT(LPx)% * (# of NON_DEDICATED_PHYS_PUs)
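As a purely illustrative example, assume two active LPs that share eight physical CPs, with WEIGHT(LPA) = 75 and WEIGHT(LPB) = 25:

WEIGHT(LPA)% = 100 * 75 / (75 + 25) = 75%     TARGET(LPA) = 75% * 8 = 6 CPs
WEIGHT(LPB)% = 100 * 25 / (75 + 25) = 25%     TARGET(LPB) = 25% * 8 = 2 CPs

LPA is therefore guaranteed the capacity of six shared physical CPs and LPB the capacity of two, whenever both LPs demand their full share.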
It is important to remember that all these considerations apply to any PU type in a z/OS LP,
both zIIP and CP. Therefore, in a PR/SM profile, the customer should define the number of
logical zIIPs, together with their respective weights.
PU capping
PU capping is a process that artificially limits the PU consumption rate of a specific set of
dispatchable units (such as TCBs and SRBs), usually associated with a user transaction
workload.
You may cap the PU resource with the granularity of an LP, or of a WLM service class with a
Parallel Sysplex scope.
Types of capping
We have the following types of capping in the mainframe platform:
PR/SM capping, where the full LP is capped. Refer to “PR/SM capping” on page 137. The
intelligence and the capping executor are located in the PR/SM hypervisor:
– The limit for capping is established by the LP weights (for zIIP and CP) in the HMC LPAR
profile.
– The limit of capping is also determined by the number of logical PUs, when this number is
less than the number of non-dedicated (shared) physical PUs from the same PU pool in
the CPC.
WLM Resource Group capping, where just a set of WLM service classes (a set of
transactions sharing the same WLM goal) in the Sysplex are capped. The limit is indicated
through a Resource Group construct in the WLM policy. The intelligence and the executor
are located in WLM.
Defined capacity or soft capping, where the full LP is capped. The limit is indicated in
MSU/h in the HMC PR/SM profile. The intelligence is located in WLM and the executor is
in the PR/SM hypervisor.
PR/SM capping
PR/SM capping is a function used to ensure that an LP’s use of the physical PU cannot
exceed the amount specified in its Target(LPx). PR/SM capping is set at the HMC Image
Profile.
Normally, an LP can get more PU resources than the amount guaranteed by its Target(LPx);
in “PR/SM Weights” on page 135, we discuss how to calculate the Target(LPx) value. Usually,
this is a good thing, because if there is spare PU resource that is not required by another LP,
it makes sense to use it for an LP that needs more PU resource. This can happen when the
other LPs are not using all their share or are not active—remember that when an LP is
deactivated, the Target(LPx) of the remaining active LPs is recalculated.
If you want to prevent an LP from ever being able to use more than its Target(LPx), even if
there is spare PU resource available, you can use the PR/SM capping feature. PR/SM
capping is implemented by the PR/SM hypervisor by observing the PU resource consumed
by a logical PU in a capped LP, and acting if the utilization starts to exceed the logical CP’s
Target(LCPx).
At frequent intervals (every few seconds or so), PR/SM compares the Target(LCPx) of each
logical PU of a capped LP to the amount of PU resource it has actually consumed over the
last interval. Depending on the result of this comparison, PR/SM LIC decides for what
percentage of the time in the next interval that the logical PU should not be given access to a
physical PU.
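As a purely illustrative instance of this mechanism (the real algorithm is more elaborate), suppose a logical PU of a capped LP has a Target(LCPx) equivalent to 20% of a physical PU, but it consumed 25% over the last interval. To bring the consumption back toward the target, PR/SM could withhold access to a physical PU for roughly one fifth of the next interval.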
(Chart: CPC utilization over time intervals 0 to 210 for three LPs, with MVS1 and MVS2 uncapped and MVS3 capped. Each LP’s weight is plotted as a horizontal line, the utilization of each LP as a curve, and the total CPC utilization as a line near the top of the chart.)
The total weight% equals 100%, and therefore each unit of weight is equal to 1% of CPC
resource. For example, a weight of 45 equals 45% of the total CPC.
Each LP’s weight is shown as a separate straight horizontal line in the chart. The total CPC
utilization is shown as a separate line close to the top of the chart.
At one time or another, each LP requires less than its guaranteed share of CP resources. This
spare capacity can be used by either of the uncapped LPs. When an LP uses more than its
weight, its utilization line is above its weight line. This occurs for:
MVS1 from time 60 to 150
MVS2 from time 0 to 30
About WLM RG capping granularity, you can cap a set of service classes within a sysplex.
With PR/SM capping, you are capping the full LP.
A complete explanation of WLM Resource Group capping is beyond the scope of this book.
Refer to ABCs of z/OS System Programming Volume 11, SG24-6327, for more details about
this subject.
Figure 4-14 Defined capacity (soft capping). Three LPARs are shown (LPAR 1: z/OS with CICS, 40 MSUs; LPAR 2: z/OS with CICS, 220 MSUs; LPAR 3: z/OS with DB2, 150 MSUs), together with a chart that plots, against time in hours, the defined cap in MSUs (curve 1), the actual MSUs (curve 2), and the long-term 4-hour rolling average (curve 3).
Being more specific, we may say that MSU/h is a WLM metric calculated from the total CPU
time consumed in an LP. This calculation is redone every five minutes as a 4-hour rolling
average (a 4HRA): every five minutes, the oldest five-minute value in the window is replaced
by the most recent one, and a new 4-hour average is calculated. The software products are
charged using the monthly peak hour of the 4HRA.
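As a simple illustration with assumed numbers: the 4HRA is the average of the last 48 five-minute consumption samples, expressed in MSU/h. If an LP consumed a steady 200 MSU/h during the first three hours of the window and 400 MSU/h during the last hour, the 4HRA would be (3 x 200 + 1 x 400) / 4 = 250 MSU/h, even though the instantaneous consumption is 400 MSU/h.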
An installation can implement soft capping (also called defined capacity) to enforce a limit on
LP consumption, and consequently to save on software fees. For example, if during the
month the 4HRA is 200 MSU/h and for just a few hours is 400 MSU/h, the idea is to cap the
LP at 200 MSU/h level. It is clear that when the LP is capped, you may face performance
problems.
Defined capacity is controlled by WLM and is executed by the PR/SM hypervisor code, as
illustrated in the example in Figure 4-14:
Curve 1 is the defined capacity, set at 40 MSU/h.
Curve 2 is the actual MSU consumption.
Curve 3 is the long-term 4-hour rolling average, which is capped at the defined capacity.
Group Capacity
Group Capacity is an extension of Defined Capacity capping, that adds more flexibility.
Instead of Defined Capacity where each LP is capped individually, an installation caps a
group of LPs containing shared logical processors. WLM and PR/SM management balances
capacity (in MSU/h) among groups of LPs on the same CPC.
An installation defines a capacity group by entering the group name and group limit MSU/h
4-hour rolling average value (in HMC). Group capacity works together with
individually-defined capacity LP capping, and an LP can have both a single capacity and a
group capacity.
It is possible to define multiple groups on a CPC, but an LP can only belong to one group. A
capacity group is independent of a Sysplex. That is, in the same group, you can have z/OS
from different sysplexes, but in the same CPC.
Looking at Figure 4-15, we have a CPC with three LPs, with the following weights: 465, 335,
and 200. This CPC has a total capacity of 400 MSU/h according to IBM tables. The target
MSU/h, which is also the guaranteed consumption, for each LP is as follows:
186 MSU/h for LP A
134 MSU/h for LP B
80 MSU/h for LP C
If any one LP takes all possible processor resource while the other two are idle, it may reach
the following maximum figures:
LP A can use up to 200 MSU/h, limited by the group capping.
LP B can use up to 200 MSU/h, limited by the group capping.
LP C can use up to 30 MSU/h because an individual software capping (Defined Capacity)
is defined.
If all three LPs want to use as much resource as possible at the same time, we may reach the
protection (guarantee) figures:
LP A gets 99 MSU/h.
LP B gets 71 MSU/h.
LP C gets 30 MSU/h.
The distribution of MSU/h between LP A and LP B is proportional to the Weight% of each LP.
However, not all applications support data sharing, and there are many applications that have
not been migrated to data sharing for various reasons. For these applications, IBM has
provided Intelligent Resource Director, which basically gives you the ability to move the
resource to where the workload is.
IRD functions
IRD is not actually a product or a system component. Rather, it is three separate but mutually
supportive functions:
WLM PR/SM CP Management - Workload Manager distributes processor resources
across an LP cluster by dynamically adjusting the LPAR weights in response to changing
workload requirements.
Dynamic Channel Path Management (DCM)
Channel Subsystem I/O Priority Queuing (CSS IOPQ) - see “Channel Subsystem I/O
Priority Queuing” on page 142.
For various reasons, the two first IRD functions are not often used by customers. So at this
"ABCs of z/OS System Programming" introductory level, we only cover the last function,
Channel Subsystem I/O Priority Queuing.
Channel Subsystem I/O Priority Queuing is an extension of the existing concept of I/O priority
queuing. Previously, I/O requests were handled by the SAP on a first-in, first-out basis.
Sometimes this approach caused high-priority work to be delayed behind low-priority work.
This recommended IRD function must be explicitly enabled at the HMC hardware console.
For more information about this topic, refer to z/OS Intelligent Resource Director, SG24-5952.
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this Redbooks publication.
IBM Redbooks
For information on ordering these publications, see “How to get IBM Redbooks” on page 146.
Note that some of the documents referenced here may be available in softcopy only.
IBM Z Connectivity Handbook, SG24-5444
IBM z14 Technical Introduction, SG24-8450
IBM z14 Technical Guide, SG24-8451
z/OS Intelligent Resource Director, SG24-5952
ABCs of z/OS System Programming Volume 11, SG24-6327
Other publications
These publications are also relevant as further information sources:
z/Architecture Reference Summary, SA22-7871
z/Architecture Principles of Operation, SA22-7832
Installation Manual for Physical Planning, GC28-6965-00a
z/OS Distributed File Service zSeries File System Administration, SC24-5989
OSA-Express Customer's Guide and Reference, SA22-7935
z/OS HCD User's Guide, SC34-2669
z/OS HCD Planning, GA32-0907
Online resources
These websites and URLs are also relevant as further information sources:
Fibre Channel standards
http://www.t10.org
http://www.t11.org
zSeries I/O connectivity
http://www.ibm.com/servers/eserver/zseries/connectivity
Parallel Sysplex
http://www.ibm.com/servers/eserver/zseries/pso
zSeries networking
http://www.ibm.com/servers/eserver/zseries/networking
SG24-6990-05
ISBN 0738443107
Printed in U.S.A.
ibm.com/redbooks