SG24-7467
Database Partitioning,
Table Partitioning, and
MDC for DB2 9
Differentiating database partitioning,
table partitioning, and MDC
Examining implementation
examples
Discussing best
practices
Whei-Jen Chen
Alain Fisher
Aman Lalla
Andrew D McLauchlan
Doug Agnew
ibm.com/redbooks
SG24-7467-00
Note: Before using this information and the product it supports, read the information in
Notices on page vii.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
The team that wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Chapter 1. Introduction to partitioning technologies. . . . . . . . . . . . . . . . . . 1
1.1 Databases and partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Database concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Table partitioning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Multi-dimensional clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Chapter 2. Benefits and considerations of database partitioning, table
partitioning, and MDC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1 Database partitioning feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.1 The benefits of using database partitioning feature . . . . . . . . . . . . . 16
2.1.2 Usage considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Table partitioning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.1 Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.2 Usage considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Multi-dimensional clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.1 Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.2 Usage considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Combining usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Chapter 3. Database partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1.1 Supported operating systems and hardware . . . . . . . . . . . . . . . . . . 36
3.1.2 Minimum memory requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2 Planning considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2.1 Deciding on the number of database partitions. . . . . . . . . . . . . . . . . 38
3.2.2 Logical and physical database partitions . . . . . . . . . . . . . . . . . . . . . 39
3.2.3 Partition groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.4 Distribution maps and distribution keys. . . . . . . . . . . . . . . . . . . . . . . 40
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
Redbooks (logo)
eServer
pSeries
zSeries
AIX
DB2 Connect
DB2
IBM
Lotus
POWER
Redbooks
System p
Tivoli
1-2-3
Preface
As organizations strive to do more with less, DB2 Enterprise Server Edition V9
for Linux, UNIX, and Windows contains innovative features for delivering
information on demand and scaling databases to new levels. Table partitioning,
newly introduced in DB2 9, and the database partitioning feature provide
scalability, performance, and flexibility for the data store. Multi-dimensional
clustering (MDC) tables enable rows with similar values across multiple
dimensions to be physically clustered together on disk. This clustering allows
for efficient I/O and provides performance gains for typical analytical queries.
How are these features and functions different? How do you decide which
technique is best for your database needs? Can you use more than one
technique concurrently?
This IBM Redbooks publication addresses these questions and more. Learn
how to set up and administer database partitioning. Explore the table partitioning
function and how you can easily add and remove years of data on your
warehouse. Analyze your data to discern how multi-dimensional clustering can
drastically improve your query performance.
He has worked at IBM for 10 years. His areas of expertise include DB2 problem
determination. He has previously written an IBM Redbooks publication on DB2
Data Links Manager.
Andrew D McLauchlan is a DB2 Database Administrator and Systems
programmer in Australia. He has 29 years of experience in IBM in mainframe
hardware maintenance, manufacturing, Oracle database administration, and
DB2 database administration. He holds a degree in Electrical Engineering from
Swinburne University, Melbourne, Australia. His areas of expertise include DB2,
AIX, Linux, and z/OS. He is a member of the IBM Global Services Australia
DB2 support group.
Doug Agnew is a DB2 Database Administrator in the United States. He has 34
years of experience in applications development, database design and modeling,
and DB2 administration. He holds a degree in Applied Mathematics from the
University of North Carolina - Charlotte. His areas of expertise include database
administration, data modeling, SQL optimization, and AIX administration. He is
an IBM Certified DBA for DB2 9 on Linux, UNIX, and Windows and is a member
of the DBA Service Center.
Acknowledgement
Thanks to the following people for their contributions to this project:
Liping Zhang
Kevin Beck
Paul McInerney
Sherman Lau
Kelly Schlamb
IBM Software Group, Information Management
Emma Jacobs and Sangam Racherla
International Technical Support Organization, San Jose Center
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about
this book or other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an e-mail to:
[email protected]
Chapter 1.
Introduction to partitioning
technologies
As databases and data warehouses become larger, organizations have to be
able to manage an increasing amount of stored data. To handle database
growth, Relational Database Management Systems (RDBMS) have to
demonstrate scalable performance as additional computing resources are
applied.
In this IBM Redbooks publication, we introduce three features available in DB2 9
for Linux, UNIX, and Windows to help manage the growth of your data. In this
chapter, we give an overview of the terminology and concepts that we use
throughout this book. The topics we look at are:
DB2 9 database concepts, including concepts specifically related to the
Database Partitioning Feature (DPF), such as partitions, partition groups,
distribution keys, distribution maps, and the coordinator partition
Table partitioning concepts including table partition, partition key, and roll-in
and roll-out
Multi-dimensional clustering (MDC) concepts, such as block, block index,
dimension, slice, and cell
Although DPF is a licensed feature while table partitioning and MDC are
functions built into the database engine, we refer to all three collectively in this
chapter as partitioning technologies.
Partition
A database partition can be either logical or physical. Logical partitions reside on
the same physical server and can take advantage of symmetric multiprocessor
(SMP) architecture. Having a partitioned database on a single machine with
multiple logical nodes is known as having a shared-everything architecture,
because the partitions use common memory, CPUs, disk controllers, and disks.
Physical partitions consist of two or more physical servers, and the database is
partitioned across these servers. This is known as a shared-nothing architecture,
because each partition has its own memory, CPUs, disk controllers, and disks.
Each partitioned instance is owned by an instance owner and is distinct from
other instances. A DB2 instance is created on any one of the machines in the
configuration, which becomes the primary machine. This primary server is
known as the DB2 instance-owning server, because its disk physically stores the
instance home directory. This instance home directory is exported to the other
servers in the DPF environment. On the other servers, a DB2 instance is
separately created: all using the same characteristics, the same instance name,
the same password, and a shared instance home directory. Each instance can
manage multiple databases; however, a single database can only belong to one
instance. It is possible to have multiple DB2 instances on the same group of
parallel servers.
Figure 1-1 on page 3 shows an environment with four database partitions across
four servers.
[Figure 1-1: four DB2 database partitions, Partition 1 through Partition 4, one on each of Server 1 through Server 4]
Partition groups
A database partition group is a logical layer that allows the grouping of one or
more partitions. Database partition groups allow you to divide table spaces
across all partitions or a subset of partitions. A database partition can belong to
more than one partition group. When a table is created in a multi-partition group
table space, some of its rows are stored in one partition, and other rows are
stored in other partitions. Usually, a single database partition exists on each
physical node, and the processors on each system are used by the database
manager at each database partition to manage its own part of the total data in
the database. Figure 1-2 on page 4 shows four partitions with two partition
groups. Partition group A spans three of the partitions and partition group B
spans a single partition.
[Figure 1-2: four partitions with two partition groups; partition group A spans three partitions and contains table spaces 1 and 3, while partition group B spans a single partition and contains table space 2]
Table space
A table space is the storage area for your database tables. When you issue the
CREATE TABLE command you specify the table space in which the table is
stored. DB2 uses two types of table spaces: System-Managed Space (SMS) and
Database-Managed Space (DMS). SMS table spaces are managed by the
operating system, and DMS table spaces are managed by DB2.
In general, SMS table spaces are better suited for a database that contains many
small tables, because SMS is easier to maintain. With an SMS table space, data,
indexes, and large objects are all stored together in the same table space. DMS
table spaces are better suited for large databases where performance is a key
factor. Containers are added to DMS table spaces in the form of a file or raw
device. DMS table spaces support the separation of data, indexes, and large
objects. In a partitioned database environment, table spaces belong to one
partition group allowing you to specify which partition or partitions the table space
spans.
Container
A container is the physical storage used by the table space. When you define a
table space, you must select the type of table space that you are creating (SMS
or DMS) and the type of storage that you are using (the container). The container
for an SMS table space is defined as a directory and is not pre-allocated;
therefore, data can be added as long as there is enough space available on the
file system. DMS table spaces use either a file or a raw device as a container,
and additional containers can be added to support growth. In addition, DMS
containers can be resized and extended when necessary.
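To make the SMS and DMS container types concrete, here is a brief sketch; the table space names and container paths are illustrative only and are not taken from the book's environment:

CREATE TABLESPACE tbsp_sms
   MANAGED BY SYSTEM
   USING ('/db2/data/sms_cont1')

CREATE TABLESPACE tbsp_dms
   MANAGED BY DATABASE
   USING (FILE '/db2/data/dms_cont1' 20000)

The SMS table space grows within the file system directory as needed, while the DMS table space pre-allocates 20,000 pages in its file container and can later be resized or given additional containers.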
Buffer pools
The buffer pool is an area of memory designed to improve system performance
by allowing DB2 to access data from memory rather than from disk. It is
effectively a cache of data that is contained on disk which means that DB2 does
not have the I/O overhead of reading from disk. Read performance from memory
is far better than from disk.
In a partitioned environment, a buffer pool is created in a partition group so that it
can span all partitions or a single partition, depending on how you have set up your
partition groups.
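As a small illustration (the buffer pool name, partition group name, and sizes are hypothetical), a buffer pool can be created in a specific partition group so that it is allocated only on the partitions in that group:

CREATE BUFFERPOOL bp_data
   DATABASE PARTITION GROUP pg_data
   SIZE 10000
   PAGESIZE 8K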
Prefetching
Prefetching is the process by which index and data pages are fetched from disk
and passed to the buffer pools in advance of the index and data pages being
needed by the application. This can improve I/O performance; however, the most
important factors in prefetching performance are the extent size, prefetch size,
and placement of containers on disk.
Extent
An extent is the basic unit for allocations and is a block of pages that is written to
and read from containers by DB2. If you have specified multiple containers for
your table space, the extent size can determine how much data is written to the
container before the next container is used, in effect, striping the data across
multiple containers.
Page
A page is a unit of storage within a table space, index space, or virtual memory.
Pages can be 4 KB, 8 KB, 16 KB, or 32 KB in size. Table spaces can have a
page with any of these sizes. Index space and virtual memory can have a page
size of 4 KB. The page size can determine the maximum size of a table that can
be stored in the table space.
DB2 applies a hashing algorithm to the distribution key value of each row to generate the partition map index value. The hash value ranges from 0 to 4095.
The distribution map entry for the index provides the database partition number
for the hashed row. DB2 uses a round-robin algorithm to specify the partition
numbers for multiple-partition database partition groups. There is only one entry
in the array for a database with a single partition, because all the rows of a
database reside in one partition database partition group.
Figure 1-3 shows the partition map for database partition group 1 with partitions
1,2, and 4.
Figure 1-3 Distribution map for partition group 1 (positions 0 through 4095, filled round-robin with partitions 1, 2, and 4)
Figure 1-4 on page 7 shows how DB2 determines on which partition a given row
is stored. Here, the partition key EMPNO (value 0000011) is hashed to a value 3,
which is used to index to the partition number 1.
Figure 1-4 DB2 process to identify the partition where the row is to be placed
Coordinator partition
The coordinator partition is the database partition to which a client application or
a user connects. The coordinator partition compiles the query, collects the
results, and then passes the results back to the client, therefore handling the
workload of satisfying client requests. In a DPF environment, any partition can be
the coordinator node.
Catalog partition
The SYSCATSPACE table space contains all the DB2 system catalog
information (metadata) about the database. In a DPF environment,
SYSCATSPACE cannot be partitioned, but must reside in one partition. This is
known as the catalog partition. When creating a database, the partition on which
the CREATE DATABASE command is issued becomes the catalog partition for
the new database. All access to system tables goes through this database
partition.
1.2.1 Concepts
This section introduces some of the table partitioning concepts that we cover in
more depth in this book.
Table partitions
Each partition in your partitioned table is stored in a storage object that is also
referred to as a data partition or a range. Each data partition can be in the same
table space, separate table spaces, or a combination of both. For example, you
can use date for a range, which allows you to group data into partitions by year,
month, or groups of months. Figure 1-5 shows a table partitioned into four data
partition ranges with three months in each range.
[Figure 1-5: a table partitioned into four ranges of three months each: Range1 (Jan-Mar), Range2 (Apr-Jun), Range3 (Jul-Sep), and Range4 (Oct-Dec)]
Partition keys
The value which determines how a partitioned table is divided is the partition key.
A partition key is one or more columns in a table that are used to determine to
which table partitions the rows belong.
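As an illustrative sketch only (the table and column names are hypothetical), a table divided into the three-month ranges of Figure 1-5 can be created with the PARTITION BY RANGE clause, where the date column is the partition key:

CREATE TABLE orders_history (
   order_date  DATE NOT NULL,
   amount      DECIMAL(10,2)
)
PARTITION BY RANGE (order_date) (
   STARTING FROM ('2007-01-01') ENDING AT ('2007-03-31'),
   STARTING FROM ('2007-04-01') ENDING AT ('2007-06-30'),
   STARTING FROM ('2007-07-01') ENDING AT ('2007-09-30'),
   STARTING FROM ('2007-10-01') ENDING AT ('2007-12-31')
)

Each of the four ranges becomes a separate data partition, and the partitions can be placed in the same table space or in separate table spaces.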
1.3.1 Concepts
This section introduces several of the multi-dimensional clustering concepts that
we discuss in further detail in this book.
Block
A block is the smallest allocation unit of an MDC. A block is a consecutive set of
pages on the disk. The block size determines how many pages are in a block. A
block is equivalent to an extent.
Block index
The structure of a block index is almost identical to a regular index. The major
difference is that the leaf pages of a regular index are made up of pointers to
rows, while the leaf pages of a block index contain pointers to extents. Because
each entry of a block index points to an extent, whereas the entry in a regular
index points to a row, a block index is much smaller than a regular index, but it
still points to the same number of rows. In determining access paths for queries,
the optimizer can use block indexes in the same way that it uses regular indexes.
Block indexes can be ANDed and ORed with other block indexes, and also with
regular indexes. Block indexes
can also be used to perform reverse scans. Because the block index contains
pointers to extents, not rows, a block index cannot enforce the uniqueness of
rows. For that, a regular index on the column is necessary. Figure 1-6 shows a
regular table with a clustering index and Figure 1-7 on page 11 shows an MDC.
[Figure 1-6: a regular table with a clustering index on Region and an unclustered index on Year. Figure 1-7: an MDC table with block indexes on the Region (North, South, East, West) and Year (2005-2007) dimensions]
Dimension
A dimension is an axis along which data is organized in an MDC table. A
dimension is an ordered set of one or more columns, which you can think of as
one of the clustering keys of the table. Figure 1-8 on page 12 illustrates a
dimension.
[Figure 1-8: an MDC table organized along the year, country, and color dimensions, with cells such as (2002, Canada, blue), (2002, Mexico, yellow), and (2001, Canada, yellow)]
Slice
A slice is the portion of the table that contains all the rows that have a specific
value for one of the dimensions. Figure 1-9 on page 13 shows the Canada slice
for the country dimension.
Cell
A cell is the portion of the table that contains rows having the same unique set of
dimension values. The cell is the intersection of the slices from each of the
dimensions. The size of each cell is at least one block (extent) and possibly
more. Figure 1-10 on page 14 shows the cell for year 2002, country Canada, and
color yellow.
Figure 1-10 The cell for dimension values (2002, Canada, and yellow); each cell contains one or more blocks
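As a minimal sketch that ties these concepts together (the table and column names are illustrative, not from the book's examples), an MDC table over the year, country, and color dimensions of the preceding figures is declared with the ORGANIZE BY DIMENSIONS clause:

CREATE TABLE sales_mdc (
   sale_date  DATE,
   year       INTEGER,
   country    VARCHAR(20),
   color      VARCHAR(10),
   amount     DECIMAL(10,2)
)
ORGANIZE BY DIMENSIONS (year, country, color)

Each distinct (year, country, color) combination maps to a cell, and every cell occupies one or more blocks (extents), as described above.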
You can read more information about the topics that we discuss in this chapter in
the DB2 9 Information Center at:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp
Chapter 2. Benefits and considerations of database partitioning, table partitioning, and MDC
[Figures: a single SMP machine with multiple CPUs sharing memory and disks in one partition, and a shared-nothing configuration of several machines connected by a high-speed network, each machine with its own CPUs, memory, disks, and partition]
Scalability
Scalability refers to the ability of a database to grow while maintaining operating
and response time characteristics. As a database grows, scalability in a DPF
environment can be achieved by either scaling up or scaling out. Scaling up
refers to growing by adding more resources to a physical machine. Scaling out
refers to growing by adding additional physical machines.
Scale up a DPF-enabled database by adding more physical resources, such as
CPUs, disks, and memory, to the physical machine.
Scale out a DPF-enabled database by adding a physical machine to the
configuration. This implies adding another physical database partition on the
physical machine. The database can then be spread across these new physical
partitions to take advantage of the new configuration.
When planning a DPF environment, consider whether to add logical or physical
database partitions to provide scalability.
Logical database partitions differ from physical database partitions in that logical
database partitions share CPUs, memory, and the disk in an SMP configuration.
A physical database partition on a physical machine does not share CPU, disks,
or memory with other physical database partitions in an MPP configuration.
Consider adding logical database partitions if the machine has multiple CPUs
that can be shared by the logical database partitions. Ensure that there is
sufficient memory and disk space for each logical database partition.
Consider adding physical database partitions to a database environment when a
physical machine does not have the capacity to have a logical database partition
on the same machine. Capacity refers to the number of users and applications
that can concurrently access the database. Capacity is usually determined by the
amount of CPU, memory, and disk available on the machine.
Parallelism
DB2 supports query, utility, and input/output (I/O) parallelism. Parallelism in a
database can dramatically improve performance. DB2 supports intrapartition and
interpartition parallelism.
Query optimization
DB2 9's cost-based query optimizer is DPF-aware. This implies that the query
optimizer uses the system configuration, the database configuration, and the
statistics stored in the system catalogs to generate the most efficient access plan
to satisfy SQL queries across multiple database partitions.
Database administration
In general, administering a multi-partition database is similar to a single partition
database with a few additional considerations.
Disk
When laying out the physical database on disk, the workload of the database
must be taken into consideration. In general, workloads can be divided into two
broad categories: online transaction processing (OLTP) and Decision Support
Systems (DSS). For OLTP-type workloads, the database needs to support a
great deal of concurrent activity, such as inserts, updates, and deletes. Disks
must be laid out to support this concurrent activity. For DSS-type workloads, the
database must support a smaller number of large, complex queries that read
many records and perform mostly sequential I/O operations.
In a DPF-enabled database environment, each database partition has a set of
transaction logs. Our recommendation is to place the transaction log files on
separate physical disks from the data. This is especially true for an OLTP
environment where transaction logging can be intensive. Each database
partition's log files must be managed independently of the other database partitions.
Memory
The main memory allocation in a DB2 database is for the database buffer pool.
In a DPF-enabled database with multiple database partitions, the buffer pools
are allocated per partition. In a DSS environment, most of the memory must be
allocated for buffer pools and sorting. Decisions have to be made as to the
number of buffer pools to create. A buffer pool is required for each page size that
is used in the database. Having properly tuned buffer pools is important for the
performance of the database, because the buffer pools are primarily a cache for
I/O operations.
Database design
When designing a DPF-enabled database, the database administrator needs to
make several additional design decisions, such as choosing distribution keys
and partition groups for the tables.
Recovery
In a DPF-enabled environment, each database partition must be backed up for
recovery purposes. This includes archiving the transaction log files for each
partition if archival logging is used. The granularity of the backups determines the
recovery time. Decisions must be made about the frequency of the backups, the
type of backups (full offline or online, incremental backups, table space backups,
flash copies, and so forth). In large DSS environments, high availability solutions
might need to be considered as well.
Database configuration
In a DPF-enabled environment, each database partition has a database
configuration file. Take this into consideration when you make changes on a
database partition that might need to be made on other database partitions as
well. For example, the LOGPATH needs to be set on each database partition.
System resources
System resources can be broadly divided into CPU, memory, disk, and network
resources. When planning to implement a database with multiple partitions on
the same physical machine, we recommend that the number of database
partitions does not exceed the number of CPUs on the machine. For example, if
the server has four CPUs, there must not be more than four database partitions
on the server. This is to ensure that each database partition has sufficient CPU
resources. Having one or more CPUs per database partition is beneficial, whereas
having fewer CPUs than database partitions can lead to CPU bottlenecks.
Each database partition requires its own memory resources. By default on
DB2 9, shared memory is used for the communication between logical database
partitions on the same physical machine. This is to improve communications
performance between logical database partitions. For multiple database
partitions, sufficient physical memory must be available to prevent memory
bottlenecks, such as the overcommitment of memory.
Designing the physical layout of the database on disk is critical to the I/O
throughput of the database. In general, table space containers need to be placed
across as many physical disks as possible to gain the benefits of parallelism.
Each database partition has its own transaction logs, and these transaction logs
need to be placed on disks that are separate from the data to prevent I/O
bottlenecks. In a DPF configuration, data is not shared between database
partitions.
2.2.1 Benefits
Several of the benefits of table partitioning are:
Query performance
Query performance is improved because the optimizer is aware of data partitions
and therefore scans only partitions relevant to the query. This also presumes that
the partition key is appropriate for the SQL. This is represented in Figure 2-4.
[Figure 2-4: for the query SELECT ... FROM Table1 WHERE DATE >= 2001 AND DATE <= 2002, a non-partitioned table (2000-2008) is scanned in full, whereas a partitioned table is scanned only in the required partitions (2001 and 2002)]
Query simplification
In the past, if a table had grown to its size limit, a view was required across the
original and subsequent tables to allow a full view of all the data. Table
partitioning negates the need for such a UNION ALL view over multiple tables
created because of the limits of standard tables. With the view approach, the view
has to be dropped and recreated each time that a table is added or removed.
A depiction of a view over multiple tables to increase table size is in Figure 2-5 on
page 24. If you modify the view (VIEW1), the view must be dropped and
recreated with tables added or removed as required. During this period of
modification, the view is unavailable. After you have recreated the view,
authorizations must again be made to match the previous authorizations.
[Figure 2-5: a view (View 1) defined over Table 1, Table 2, and Table 3]
You can achieve the same result by using table partitioning as shown in
Figure 2-6 on page 25. In the case of the partitioned table, a partition can be
detached, attached, or added with minimal impact.
[Figure 2-6: a partitioned table (Table 1) with Partition 1, Partition 2, and Partition 3 placed in Table space 1, Table space 2, and Table space 3]
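As a sketch of the minimal-impact operations mentioned above (the table, partition, and staging table names are hypothetical), data is rolled in and out of a partitioned table with the ATTACH and DETACH clauses of ALTER TABLE:

ALTER TABLE sales ATTACH PARTITION q1_2008
   STARTING FROM ('2008-01-01') ENDING AT ('2008-03-31')
   FROM sales_q1_2008_staging

SET INTEGRITY FOR sales IMMEDIATE CHECKED

ALTER TABLE sales DETACH PARTITION q1_2000 INTO sales_q1_2000_archive

ATTACH converts an existing table into a new data partition (SET INTEGRITY then validates the incoming rows), and DETACH turns a data partition into a stand-alone table without moving the data.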
Locking
When attaching or detaching a partition, applications need to acquire appropriate
locks on the partitioned table and the associated table. The application needs to
obtain the table lock first and then acquire data partition locks as dictated by the
data accessed. Access methods and isolation levels might require locking of
data partitions that are not in the result set. When these data partition locks are
acquired, they might be held as long as the table lock. For example, a cursor
stability (CS) scan over an index might keep the locks on previously accessed
data partitions to reduce the costs of reacquiring the data partition lock if that
data partition is referenced in subsequent keys. The data partition lock also
carries the cost of ensuring access to the table spaces. For non-partitioned
tables, table space access is handled by the table lock. Therefore, data partition
locking occurs even if there is an exclusive or share lock at the table level for a
partitioned table.
Finer granularity allows one transaction to have exclusive access to a particular
data partition and avoid row locking while other transactions are able to access
other data partitions. This can be a result of the plan chosen for a mass update or
due to an escalation of locks to the data partition level. The table lock for many
access methods is normally an intent lock, even if the data partitions are locked
in share or exclusive. This allows for increased concurrency. However, if
non-intent locks are required at the data partition level and the plan indicates that
all data partitions can be accessed, a non-intent lock might be chosen at the
table level to prevent deadlocks between data partition locks from concurrent
transactions.
Administration
There is additional administration required above that of a single non-partitioned
table. The range partitioning clauses and the table space options for table
partitions, indexes, or large object placement must be considered when creating
a table. When attaching a data partition, you must manage the content of the
incoming table so that it complies with the range specifications. If data exists in
the incoming table that does not fit within the range boundary, you receive an
error message to that effect.
System resources
A partitioned table usually has more than one partition. We advise you always to
carefully consider the table space usage and file system structures in this
situation. We discuss optimizing the layout of table spaces and the underlying
disk structure in Chapter 4, "Table partitioning" on page 125.
Replication
In DB2 9.1, replication does not yet support source tables that are partitioned by
range (using the PARTITION BY clause of the CREATE TABLE statement).
2.3.1 Benefits
The primary benefit of MDC is query performance. In addition, the benefits
include:
Reduced logging
Reduced table maintenance
Reduced application dependence on clustering index structure
Query performance
MDC tables can show significant performance improvements for queries using
the dimension columns in the WHERE, GROUP BY, and ORDER BY clauses.
The improvement is influenced by a number of factors:
Dimension columns are usually not row-level index columns.
Because dimension columns must be columns with low cardinality, they are
not the columns that are normally placed at the start of a row-level index.
They might not even be included in row-level indexes. This means that the
predicates on these columns are index-SARGable (compared after the index
record has been read but before the data row is read) at best, and probably
only data-SARGable (compared after the data is read). With the dimension
block indexes, however, the dimension columns are resolved as index
start-stop keys, which means the blocks are selected before any
index-SARGable or data-SARGable processing occurs.
Dimension block indexes are smaller than their equivalent row-level indexes.
Row-level index entries contain a key value and the list of individual rows that
have that key value. Dimension block index entries contain a key value and a
list of blocks (extents) where all the rows in the extent contain that key value.
Block index scans provide clustered data access and block prefetching.
Because the data is clustered by the dimension column values, all the rows in
the fetched blocks pass the predicates for the dimension columns. This
means that, compared to a non-clustered table, a higher percentage of the
rows in the fetched blocks are selected. Fewer blocks have to be read to find
all qualifying rows. This means less I/O, which translates into improved query
performance.
Reduced logging
There is no update of the dimension block indexes on a table insert unless there
is no space available in a block with the necessary dimension column values.
This results in fewer log entries than when the table has row-level indexes, which
are updated on every insert.
When performing a mass delete on an MDC table by specifying a WHERE
clause containing only dimension columns (resulting in the deletion of entire
cells), less data is logged than on a non-MDC table because only a few bytes in
the blocks of each deleted cell are updated. Logging individual deleted rows in
this case is unnecessary.
When a dimension column value is updated, the row must be removed from the block where it currently resides and placed in a suitable cell. This might involve creating a
new cell and updating the dimension indexes.
Is sufficient database space available?
An MDC table takes more space than the same table without MDC. In
addition, new (dimension) indexes are created.
Is adequate design time available?
To design an MDC table properly, you must analyze the SQL that is used to
query the table. Improper design leads to tables with large percentages of
wasted space, resulting in much larger space requirements.
[Figures: queries running against database partitions P1, P2, and P3, and against database partitions combined with table partitions for Jan, Feb, and Mar]
Chapter 3.
Database partitioning
In this chapter, we describe the Database Partitioning Feature (DPF) of DB2. We
begin by discussing the hardware, software, and memory requirements. We
follow with a discussion about planning and implementing a database partitioning
environment. Finally, we discuss the administrative aspects of a partitioned
environment and best practices.
In this chapter, we discuss the following subjects:
Requirements
Planning considerations
Implementing DPF on UNIX and Linux
Implementing DPF on Windows
Administration and management
Using Materialized Query Tables to speed up performance in a DPF
environment
Best practices
3.1 Requirements
This section describes the operating systems and hardware supported by DB2 9
together with the minimum memory requirements for running DB2.
3.1.1 Supported operating systems and hardware
DB2 9 is supported on the following operating systems and hardware:
AIX: eServer pSeries and IBM System p
HP-UX 11iv2 (11.23.0505): PA-RISC (PA-8x00)-based HP 9000 Series 700 and
Series 800 systems, and Itanium-based HP Integrity Series systems
Linux: x86, x86_64, IA64, PPC 64 (POWER), and s390x (zSeries)
Solaris: UltraSPARC; on Solaris 10, UltraSPARC or x86-64 (EM64T or AMD64)
Windows: x86 and x86_64 (EM64T or AMD64)
The IBMTEMPGROUP partition group spans all the database partitions and
contains the TEMPSPACE1 table space. The IBMTEMPGROUP partition group
is a system object and cannot be altered or dropped.
In general, when designing the database, place large tables in partition groups
that span all or most of the partitions in order to take advantage of the underlying
hardware and the distribution of the data on each partition. Place small tables in
a partition group that spans one database partition, except when you want to
take advantage of collocation with a larger table, which requires both tables to be
in the same partition group.
Consider separating OLTP-type workload data into partition groups that span
one partition. Place DSS-type workload data across multiple partitions.
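As an illustration (the group name and partition numbers are hypothetical), a partition group that spans a subset of the database partitions is created as follows, and table spaces are then placed in it with the IN DATABASE PARTITION GROUP clause of CREATE TABLESPACE:

CREATE DATABASE PARTITION GROUP pg_dss ON DBPARTITIONNUMS (1, 2, 3)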
Containers store the data from table spaces onto disk. If there are many
available disks on the system, consider creating each table space container on a
single disk; otherwise, spread the containers across all the disks. In general, the
more disks per table space, the better the performance.
If a Database-Managed Space (DMS) table space is defined by using device
containers, it does not use the operating system file system caching. File
containers for DMS table spaces use file system caching. On AIX, in either case,
it can be beneficial to tune down the file system caching to avoid double buffering
of data in the file system cache and the buffer pool.
The general rules to optimize table scans are (a worked example follows this list):
Without RAID:
- Extent size needs to be one of the available sizes: 32, 64, 128, and so on. It
  is best to be at least in the 128 KB or 256 KB range.
- Prefetch size must be the number of containers (that is, the number of
  disks) multiplied by the extent size.
With RAID:
- Extent size must be a multiple of the RAID stripe size.
- Prefetch size needs to be the number of disks multiplied by the RAID stripe
  size and a multiple of the extent size. This can be limiting if the number of
  disks is a prime number. In this case, your only choices are extent size =
  RAID stripe size or extent size = (number of disks x RAID stripe size), which
  can easily be too big.
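As a worked example under assumed values (six containers on six separate disks, no RAID; none of this is from the original text), a 32-page extent size gives a prefetch size of 6 x 32 = 192 pages:

CREATE TABLESPACE tbsp_scan
   MANAGED BY DATABASE
   USING (FILE '/db2/c0/data' 10000, FILE '/db2/c1/data' 10000,
          FILE '/db2/c2/data' 10000, FILE '/db2/c3/data' 10000,
          FILE '/db2/c4/data' 10000, FILE '/db2/c5/data' 10000)
   EXTENTSIZE 32
   PREFETCHSIZE 192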
Broadcast joins
In this join strategy, each row of one table is broadcast to all partitions of the
other table to complete the join. If collocated or directed joins cannot be used
by the query optimizer, broadcast joins are considered by the query optimizer.
[Figure: database partition groups across partitions 1, 2, and 3: IBMCATGROUP (SYSCATSPACE), IBMDEFAULTGROUP (USERSPACE1), IBMTEMPGROUP (TEMPSPACE1), and the user-defined groups pg1 (tbsp1), pg23 (tbsp23), and pg123 (tbsp123)]
We have a single SMP machine with four logical database partitions. We create
six database partition groups:
IBMCATGROUP, IBMDEFAULTGROUP, and IBMTEMPGROUP are
DB2-created partition groups when the database is created.
The partition groups PG1, PG23, and PG123 are user-defined.
Example 3-2 The db2nodes.cfg file for four logical database partitions
0 Clyde 0
1 Clyde 1
2 Clyde 2
3 Clyde 3
In our example in Example 3-2 on page 47, we have defined four partitions on
the same host, which indicates that these are logical database partitions. The
netname and resourcesetname are optional. If we wanted two physical partitions,
the db2nodes.cfg file looks similar to Example 3-3.
Example 3-3 The db2nodes.cfg file for two physical database partitions
0 ServerA 0
1 ServerB 0
Note: After you create a database, do not update the db2nodes.cfg file
manually to add or remove servers. Manually updating this file might result in
unexpected errors. Use the ADD/DROP DBPARTITIONNUM statement
instead.
You can obtain more information about the contents of the db2nodes.cfg file at:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.ud
b.uprun.doc/doc/r0006351.htm
Services file
During the instance creation, a number of ports, which are equal to the number of
logical nodes that the instance is capable of supporting, are reserved in the
services file.
The ports that are reserved in the services file are used by the DB2 Fast
Communication Manager. The reserved ports have the following format:
DB2_InstanceName
DB2_InstanceName_1
DB2_InstanceName_2
DB2_InstanceName_END
The only mandatory entries are the beginning (DB2_InstanceName) and ending
(DB2_InstanceName_END) ports. The other entries are reserved in the services
file so that other applications do not use these ports.
When you install the instance-owning database partition server on the primary
computer, DB2 sets up communication ports. The default range is four ports. The
DB2 Setup wizard attempts to reserve an identical port range when database
partition servers are installed on participating computers. If the port has been
used, the Setup wizard uses the first available port range after 60000. However,
if the port numbers in the participating servers do not match the ports in the
Instance-owning server, you receive communication errors when you start DB2.
On UNIX, the services file is located in the /etc. directory. On Windows, the
services file is located in <drive>:\WINDOWS\system32\drivers\etc directory. An
extract of the services file for our instance in our test environment is shown in
Example 3-4.
Example 3-4 Extract of a services file
DB2_db2inst1        60004/tcp
DB2_db2inst1_1      60005/tcp
DB2_db2inst1_2      60006/tcp
DB2_db2inst1_END    60009/tcp
Hosts file
Another important consideration is the correct definition of the hosts file. The
hosts file is located in the same directory as the services file. In the hosts file, you
must have an entry defining the IP address, server name, and domain name (in a
Windows environment) for each participating server. This ensures the correct
configuration of the additional physical partitions at setup time. Example 3-5
shows a sample hosts file with two entries: one entry corresponds to the
partition-owning server and the other entry corresponds to an additional physical
partition.
Example 3-5 Sample hosts file
# Internet Address   Hostname                        # Comments
192.9.200.1          net0sample                      # ethernet name/address
128.100.0.1          token0sample                    # token ring name/address
10.2.0.2             x25sample                       # x.25 name/address
127.0.0.1            loopback localhost              # loopback (lo0)
9.43.86.56           Clyde.itsosj.sanjose.ibm.com
...
.rhosts files
In a DPF environment, each database partition server must have the authority to
perform remote commands on all the other database partition servers
participating in an instance. You grant this authority by creating and updating the
.rhosts file in the instance owner's home directory. The .rhosts file has the
following format:
hostname instance_owner_user_name
In our test environment we created our .rhosts file as shown in Example 3-6.
Example 3-6 Contents of our .rhosts file
Clyde db2inst1
Database alias                       = TESTDB
Database name                        = TESTDB
Local database directory             = /home/db2inst1
Database release level               = b.00
Comment                              =
Directory entry type                 = Indirect
Catalog database partition number    = 0
Alternate server hostname            =
Alternate server port number         =
The catalog partition can be created on any database partition and does not
have to be database partition 0.
export DB2NODE=2
db2 terminate
DB20000I The TERMINATE command completed successfully.
Example 3-9 demonstrates switching to partition 2 in our test environment. To
verify that we are on partition 2, we use the VALUES statement with the
CURRENT DBPARTITIONNUM special register as shown in Example 3-10.
Example 3-10 Determining the current database partition
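A minimal form of this check (the partition number returned depends on the current DB2NODE setting; partition 2 is shown here only as an illustration) is:

db2 "VALUES (CURRENT DBPARTITIONNUM)"

1
-----------
          2

  1 record(s) selected.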
You can obtain more information about the back-end process and its relationship
to the Command Line Processor (CLP) at:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.ud
b.admin.doc/doc/r0010412.htm
When issuing DB2 commands to the database using the CLP, be careful to
ensure that you issue the command on the intended database partition.
0 Clyde 0
1 Clyde 1
2 Clyde 2
3 Clyde 3
4 Clyde 4
Tablespace ID                        = 1
Name                                 = TEMPSPACE1
Type                                 = System managed space
Contents                             = System Temporary data
State                                = 0x0000
In our example, we want to drop database partition number 4. Take the steps
shown in Example 3-15 to verify if the partition can be removed.
Example 3-15 Verifying if a database partition can be dropped
export DB2NODE=4
db2 TERMINATE
db2 DROP DBPARTITIONNUM VERIFY
SQL6034W Node "4" is not being used by any databases.
We can now proceed to drop the partition by using the db2stop... DROP
DBPARTITIONNUM... command as shown in Example 3-16.
Example 3-16 Dropping a database partition
export DB2NODE=3
db2 terminate
DB20000I  The TERMINATE command completed successfully.
We have to redistribute the data from partition 3 first, before we can drop it.
A database partition can also be dropped from a partition group by using the
ALTER DATABASE PARTITION GROUP statement.
Notice that the IBMTEMPGROUP, which is created by default, does not appear
in the listing.
[Example output: a distribution map in which the 4096 entries cycle round-robin through database partitions 1, 2, and 3]
[Example output: the distribution map again cycles round-robin through database partitions 1, 2, and 3]
You can obtain more information about the db2gpmap utility at:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.ud
b.admin.doc/doc/r0011837.htm
We can verify that database partition 4 has been added to the PG123 partition
group by listing the database partition groups as shown in Example 3-26.
Example 3-26 Viewing database partition groups
After the data has been redistributed successfully, the IN_USE flag shown by the
LIST DATABASE PARTITION GROUPS SHOW DETAIL command for partition
group PG123 on database partition 4 is set to A.
We can now proceed to drop database partition 4 from database partition group
PG123 as shown in Example 3-29.
Example 3-29 Dropping a database partition group
SQL6076W Warning! This command will remove all database files on the
node for this instance. Before continuing, ensure that there is no
user data on this node by running the DROP NODE VERIFY command.
Do you want to continue ? (y/n)y
Tablespace ID                        = 0
Name                                 = SYSCATSPACE
Type                                 = Database managed space
Contents                             = All permanent data. Regular table space.
State                                = 0x0000
  Detailed explanation:
    Normal
Total pages                          = 16384
Useable pages                        = 16380
Used pages                           = 11260
Free pages                           = 5120
High water mark (pages)              = 11260
Page size (bytes)                    = 4096
Extent size (pages)                  = 4
Prefetch size (pages)                = 4
Number of containers                 = 1
Creating tables
Create tables by using the CREATE TABLE statement. You can store table data,
indexes, and long column data in the same table space or separate them by
using the IN, INDEXES IN, and LONG IN options. In a DPF-enabled
environment, you can specify the distribution key for the table by using the
DISTRIBUTE BY clause. One of the tables in our test database is the LINEITEM
table and it was created in table space TBSP123 with L_ORDERKEY as the
distribution key. Example 3-41 shows the DDL that we used to create the
LINEITEM table.
Example 3-41 DDL to create the lineitem table
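A trimmed sketch of a statement of this general form follows; only a few of the LINEITEM columns are shown, and the column definitions are illustrative rather than the book's exact DDL:

CREATE TABLE lineitem (
   l_orderkey   BIGINT NOT NULL,
   l_partkey    INTEGER NOT NULL,
   l_quantity   DECIMAL(15,2),
   l_shipdate   DATE
)
IN tbsp123
DISTRIBUTE BY HASH (l_orderkey)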
Altering tables
You can alter tables by using the ALTER TABLE statement.
Several characteristics of a table that you can alter by using the ALTER TABLE
statement are:
ADD or ALTER columns
ADD, ATTACH, or DETACH table partitions
ALTER foreign keys or check constraints
Renaming tables
You can rename tables by using the RENAME TABLE statement. We can
rename the LINEITEM table in our test database as shown in Example 3-42.
Example 3-42 Renaming a table using the RENAME TABLE statement
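A minimal sketch of such a statement follows; the new name LINEITEM_OLD is purely illustrative:

db2 "RENAME TABLE db2inst1.lineitem TO lineitem_old"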
Dropping tables
You can drop tables by using the DROP TABLE statement. We can drop the
name table in our test database as shown in Example 3-43.
Example 3-43 Dropping a table
Viewing tables
View tables in a database by using the LIST TABLES command. In a
DPF-enabled environment, issue the command from any database partition. In
our test example, we can view all the tables for the db2inst1 schema as shown
in Example 3-44.
Example 3-44 Viewing a table with the LIST TABLES command
Table/View                      Schema          Type  Creation time
------------------------------- --------------- ----- --------------------------
NATION                          DB2INST1        T     2007-04-09-16.14.34.273300
ORDERS                          DB2INST1        T     2007-04-09-16.14.55.457627
PART                            DB2INST1        T     2007-04-09-16.14.45.529928
PARTSUPP                        DB2INST1        T     2007-04-09-16.14.50.492779
REGION                          DB2INST1        T     2007-04-09-16.14.43.237286
REGION_MQTR                     DB2INST1        T     2007-04-18-10.38.51.713419
SUPPLIER                        DB2INST1        T     2007-04-09-16.14.47.736387

8 record(s) selected.
You can obtain more details about the LIST TABLES command at:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.ud
b.admin.doc/doc/r0001967.htm
Substitute the appropriate values for schema.table and the distribution key
column. In our test environment, we can determine the distribution of data in the
LINEITEM table as shown in Example 3-45 on page 70. The first column is the
database partition number and the second column is the number of rows for the
LINEITEM table on a database partition. In our example, the LINEITEM table is
fairly evenly distributed.
Example 3-45 Viewing the current data distribution of a table using SQL
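A typical form of such a query, using the LINEITEM table and its L_ORDERKEY distribution key as the values to substitute, is sketched below; the exact query in the original example can differ:

SELECT DBPARTITIONNUM(l_orderkey) AS partition_number,
       COUNT(*) AS row_count
FROM   db2inst1.lineitem
GROUP  BY DBPARTITIONNUM(l_orderkey)
ORDER  BY partition_number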
[Example output: values listed for database partitions 1, 2, and 3]
2. Run through the installation, selecting options for typical or custom install,
response file creation, and installation path. The wizard prompts you to
specify the user ID and password for the DB2 Administration server (DAS).
The DAS user performs administrative and configuration tasks locally or
remotely on this server and other partitioned database servers. It is important
to have this user under a domain users group to grant access to all
participating database partitions. The account used by the DAS must have
the following advanced user rights:
The DAS account is granted these user rights if the account already exists or
if the installation process creates the account.
Note: You must ensure that you install DB2 on the same drive on each
participating server. For example, do not install DB2 on the C: drive of the
instance-owning database server, on the D: drive of one database partition
server, and on the J: drive of another database partition server. If you install
DB2 on the C: drive of the instance owning database server, install DB2 on
the C: drive of any participating database partition servers.
3. In the Set up a DB2 instance window, specify whether to create a new
default DB2 instance or join an existing partitioned environment. Because this
server is our instance owning partition server, we select Create the default
DB2 instance as shown in Figure 3-4.
4. In the Set up partitioning options for the default DB2 instance window, select
the type of instance to create. Because we are setting up a multi-partition
environment, we select Multiple-partition instance from the options. See
Figure 3-5 on page 74. Here, we can also specify the number of logical
partitions.
5. In the Configure DB2 instances step (Figure 3-6 on page 75), specify the
communications options for the instance. Leaving these as defaults is
sufficient.
6. In the Set user information for the default DB2 instance window (Figure 3-7
on page 76), specify an account that the instance uses to start. You can
define a user before starting the installation, or you can have the DB2 Setup
wizard create a new domain user for you. If you want to create a new domain
user by using the DB2 Setup wizard, the account used to perform the
installation must have the authority to create domain users. The instance user
domain account must belong to the local Administrators group on all the
participating servers and is granted the following user rights:
7. The next two windows give you options for setting up the tools catalog and
notification options for e-mail and pager notifications.
8. Next, we are required to specify the operating system security options. This
allows the DB2 binaries to be secured by using NTFS permissions. We
elected to enable OS security.
After this step is complete, you see a summary and the installation completes.
The installation process updates the services file and the db2nodes.cfg file.
Now that we have our instance owning partition setup, we can proceed to set
up an additional physical partition.
2. The next window allows us to add the new partition to an existing partition
server environment. The first step is to select the button next to the
instance-owning server field. This allows you to select the partition server
environment that you want to join as shown in Figure 3-9 on page 78. Select
the instance owning server from the displayed list of servers in the domain
and select OK. In the Add a new database partition server window, specify
the user account that the new partition will use. It needs to be the same
domain account used when the instance-owning partition server was set up.
Figure 3-10 on page 78 shows a summary of the options that we have
selected.
3. The next window gives us the option of setting up the operating system
security.
4. The final step, when the installation completes, is to start the new partition.
From the instance owning partition, issue a db2stop. You notice that only one
partition is stopped (the partition on the instance owning server). This is
because the new partition is not available until DB2 has been restarted as
shown in Example 3-47.
Example 3-47 Stopping and starting db2
C:\>db2stop
04/05/2007 10:50:13     0   0   SQL1064N  DB2STOP processing was successful.
SQL1064N  DB2STOP processing was successful.

C:\>db2start
04/05/2007 10:50:19     0   0   SQL1063N  DB2START processing was successful.
04/05/2007 10:50:37     1   0   SQL1063N  DB2START processing was successful.
SQL1063N  DB2START processing was successful.
db2ncrt
The db2ncrt command allows you to add new logical or physical partitions. The
command creates the necessary Windows service for the new partition and
updates the db2nodes.cfg file on the instance owning machine. After the db2ncrt
command is issued, you must recycle the instance with the db2stop and
db2start commands in order for the new partition to become active. When you
have issued the db2stop, the system only stops the currently configured instance,
at which point the db2nodes.cfg file on the instance owner is updated with
information for DB2 to communicate with the new partition.
Note: Only use db2ncrt if no databases exist in the instance. If you already
have an instance in which a database has already been created, you must
always use the db2start add dbpartitionnum command to create additional
partitions and redistribute your data.
For example, you have a server (server_a) with a single partition instance
configured (DB2). You want to add an additional physical partition server
(server_b). On server_b, you execute the command in Example 3-49.
Example 3-49 db2ncrt command example

The options of the db2ncrt command include:
/n   The database partition number for the new database partition server.
/u   The domain account and password under which the DB2 service for the
     new partition runs.
/i   The instance name.
/m   The computer name of the Windows server on which the new partition
     resides.
/p   The logical port used for the database partition server. If it is the
     first partition on this server, it must start with port number 0. The
     port number cannot exceed the port range reserved for FCM
     communications in the x:\windows\system32\drivers\etc\services file.
     For example, in our case, a range of 4 ports is reserved for the
     current instance; therefore, the maximum port number is 3.
/o   The computer name of the instance-owning database partition server.
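A representative invocation for this scenario, issued on server_b with hypothetical
domain, account, password, and server names, might look like this:

db2ncrt /n:1 /u:MYDOMAIN\db2admin,password /i:DB2 /m:SERVER_B /p:0 /o:SERVER_A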
db2nchg
The db2nchg command allows you to change the configuration of a database
partition. Options include: selecting a different logical port number or a different
network name for the database partition, changing the TCP/IP host name of the
machine, and moving the database partition from one machine to another. Only
use this command when the database partition is stopped.
db2ndrop
The db2ndrop command drops a partition from an instance that has no
databases. As with the db2nchg command, only use db2ndrop if the database
partition is stopped.
If we want to drop partition 1, we issue the command shown in Example 3-50.
Example 3-50 Using db2ndrop
C:\>db2ndrop /n:1
SQL2808W Node "1" for instance "DB2" has been deleted.
Note: If you use db2ndrop to remove a partition when the database still exists
in the instance, you lose the data on that partition. If you want to drop a
database partition from a partition server environment, we recommend using
the db2stop drop dbpartitionnum command in conjunction with data
redistribution. See 3.3.10, Redistributing partition groups on page 56 for
further details.
db2_all
In a DPF-enabled environment, the db2_all utility is provided to issue
commands remotely to all database partitions in an instance.
When using db2_all, more than one DB2 statement can be issued by separating
each statement with a semicolon (;). For example, to list all the table spaces on
all database partitions, a database connection is needed for the LIST
TABLESPACES command to succeed as shown in Example 3-55.
Example 3-55 Listing table spaces on all partitions

(Output abridged: when LIST TABLESPACES is run without first connecting, each
partition returns SQLSTATE=08003, indicating that no database connection exists.)
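As a sketch of the semicolon technique described above, the following single db2_all
invocation connects and lists the table spaces on every partition (TESTDB is the
database used in our examples):

db2_all "db2 connect to testdb; db2 list tablespaces"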
Database backup
Database backups are taken to protect against the possibility of losing data due
to hardware or software failures or both. A well-rehearsed recovery strategy must
be in place. DB2 provides the backup utility for taking backups.
You can take DB2 backups offline or online. An offline backup is also known as a
cold backup. Offline backups can be taken when there are no users or
applications connected to the database. An offline backup requires that the
database is unavailable to users during the backup. Online backups can be
taken while there is activity against the database and do not require that the
database is unavailable to users and applications.
DB2 backups can be full database backups or table space level backups. These
backups can be taken as incremental or delta backups. The database
transaction logs can also be included in the backup image.
DB2 supports taking a backup to disk, Tivoli Storage Manager (TSM), or an
X/Open Backup Services Application Programmers Interface (XBSA). You can
optionally compress DB2 backups by using a compression library to reduce the
size of the backup images.
In a DPF-enabled environment, database backups are taken at a database
partition level, that is, each database partition has to be backed up individually.
Offline backups require the catalog partition to be backed up first.
In our test environment, we have four database partitions. For a full database
backup, we have to back up four database partitions. We start by taking a backup
of the catalog partition first, because this is an offline backup as shown in
Example 3-57. After this completes successfully, we can back up the rest of the
partitions.
Example 3-57 Backing up a database
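A minimal sketch of this sequence, assuming the backup images are written to a local
/db2backup directory (the path is illustrative), is:

export DB2NODE=0
db2 terminate
db2 "BACKUP DATABASE testdb TO /db2backup"

db2_all "<<-0< db2 BACKUP DATABASE testdb TO /db2backup"

The first three commands back up the catalog partition (partition 0); the db2_all
prefix <<-0< then runs the same backup on every partition except partition 0.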
Monitoring backup
When the DB2 backup utility is executing, you can monitor it by using the
LIST UTILITIES SHOW DETAIL command. In a DPF-enabled environment, this
command only returns information about the database partition where it is
executed. In our test environment, we can monitor the backup utility as shown in
Example 3-59.
Example 3-59 Monitoring backup
export DB2NODE=0
db2 terminate
db2 LIST UTILITIES SHOW DETAIL
ID                               = 5
Type                             = BACKUP
Database Name                    = TESTDB
Partition Number                 = 0
Description                      = offline db
Start Time                       = 04/09/2007 10:41:23.587256
State                            = Executing
Invocation Type                  = User
Throttling:
   Priority                      = Unthrottled
Progress Monitoring:
   Estimated Percentage Complete = 0
   Total Work                    = 47713041 bytes
   Completed Work                = 0 bytes
   Start Time                    = 04/09/2007 10:41:23.598681
export DB2NODE=0
db2 terminate
db2 LIST HISTORY BACKUP ALL FOR testdb
$ db2ckbkp TESTDB.0.db2inst1.NODE0001.CATN0000.20070409104130.001
[1] Buffers processed:
###
RESTORE
Use the DB2 RESTORE utility to recover previously DB2-backed up databases
or table spaces after a problem, such as a media or storage failure, power
interruption, or application failure.
The DB2 RESTORE utility can be used to restore databases, table spaces, the
history file, or database transaction logs that have been stored in the backup
image. DB2 restore also supports rebuilding a database using table space
backup images.
DB2 supports restoring from disk, Tivoli Storage Manager (TSM), or an X/Open
Backup Services Application Programmers Interface (XBSA).
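As an illustration, a partition-level restore of the catalog partition from the
backup image taken earlier (assuming the image resides in /db2backup) might look
like this; the remaining partitions are restored the same way afterward:

export DB2NODE=0
db2 terminate
db2 "RESTORE DATABASE testdb FROM /db2backup TAKEN AT 20070409104130"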
Monitoring restore
When the DB2 RESTORE utility executes, you can monitor it by using the
LIST UTILITIES SHOW DETAIL command. In a DPF-enabled environment, the
LIST UTILITIES command only returns information about the database partition
where it is executed. In our test environment, we can monitor the restore utility as
shown in Example 3-63 on page 88.
ID                               = 11
Type                             = RESTORE
Database Name                    = TESTDB
Partition Number                 = 0
Description                      = db
Start Time                       = 04/09/2007 12:09:03.058149
State                            = Executing
Invocation Type                  = User
Progress Monitoring:
   Completed Work                = 67129344 bytes
   Start Time                    = 04/09/2007 12:09:03.058157
REORG utility
Use the DB2 REORG utility to reorganize a table or an index.
Index reorganization
You can reorganize all indexes defined on a table by rebuilding the index data
into unfragmented, physically contiguous pages. You achieve this by using the
INDEXES ALL FOR TABLE keyword. You can also reorganize a table by a
specific index by using the INDEX keyword. While indexes are reorganized, you
can use the REORG options ALLOW NO ACCESS, ALLOW READ ACCESS, or
ALLOW WRITE ACCESS to the table on which the indexes are reorganized. You
can also use the REORG utility to CONVERT Type 1 indexes to Type 2 indexes.
When the REORG utility is run with the CLEANUP ONLY option, a full
reorganization is not done. The indexes are not rebuilt and any pages freed up
are available for reuse by indexes defined on this table only.
When reorganizing indexes, the amount of sort memory available to sort the
index keys has a significant impact on performance.
In our test environment, we reorganized all the indexes on the LINEITEM table as
shown in Example 3-68 on page 91.
$ db2 "REORG INDEXES ALL FOR TABLE lineitem ALLOW WRITE ACCESS"
DB20000I The REORG command completed successfully.
Table reorganization
When a table is reorganized by using the REORG utility, you can use the INDEX
keyword to reorganize the table according to that index. If the INDEX option is
not specified to the REORG utility and if a clustering index exists on the table, the
data is ordered according to the clustering index.
Table reorganization can be offline or online. When an offline reorganization is
run, the ALLOW NO ACCESS or the ALLOW READ ACCESS keywords can be
specified with the REORG utility. Offline table reorganization is the default and is
the fastest method to reorganize a table. Offline reorganization of a table incurs a
large space requirement, because the entire copy of the table needs to be
rebuilt. An offline reorganization is synchronous.
If there is insufficient space within the table space to hold the table while it is
reorganized, a separate temporary table space must be specified.
In order to run the REORG utility online, you must specify the INPLACE keyword.
The INPLACE option allows user access to the table during a table
reorganization by specifying the ALLOW READ ACCESS or ALLOW WRITE
ACCESS keywords. An INPLACE table reorganization is asynchronous.
In our test environment, we ran an online table reorganization on all the database
partitions as shown in Example 3-69.
Example 3-69 Using the REORG utility on tables
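A minimal sketch of such an online reorganization, assuming the LINEITEM table in
the DB2INST1 schema, is:

db2 connect to testdb
db2 "REORG TABLE db2inst1.lineitem INPLACE ALLOW WRITE ACCESS"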
Monitoring REORG
You can monitor table reorganizations by using the GET SNAPSHOT FOR
TABLES command. In our test environment, we specified the GLOBAL keyword so that
information is returned from all of the database partitions.
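A sketch of the command, assuming the TESTDB database used in our examples, is:

db2 "GET SNAPSHOT FOR TABLES ON testdb GLOBAL"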
Collect the DETAILED index statistics when the table has multiple unclustered
indexes with varying degrees of clustering or the degree of clustering in an index
is nonuniform among the key values.
In our test environment, we collected detailed and distribution statistics for the
LINEITEM table with write access as shown in Example 3-71.
Example 3-71 Collecting detailed statistics
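A minimal sketch of such a RUNSTATS invocation, assuming the DB2INST1 schema, is:

db2 "RUNSTATS ON TABLE db2inst1.lineitem WITH DISTRIBUTION AND DETAILED INDEXES ALL ALLOW WRITE ACCESS"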
Monitoring RUNSTATS
Monitor the RUNSTATS utility by using the LIST UTILITIES command as shown
in Example 3-72.
Example 3-72 Monitoring the RUNSTATS utility
ID                               = 13
Type                             = RUNSTATS
Database Name                    = TESTDB
Partition Number                 = 1
Description                      = DB2INST1.LINEITEM
Start Time                       = 04/11/2007 11:25:29.175498
State                            = Executing
Invocation Type                  = User
Throttling:
   Priority                      = Unthrottled
3.5.2 Monitoring
In this section, we discuss monitoring a partitioned database environment by
using DB2 tools.
Snapshot monitoring
You can use the DB2 snapshot monitor to capture information about the
database and any connected applications at a specific point-in-time. The
snapshot information is stored in internal buffers within DB2. Capture snapshots
by using the GET SNAPSHOT command. The snapshot monitor collects a
snapshot of current activities in the database and does not provide historical
information. Snapshot information that is already collected can be cleared by
using the RESET MONITOR command. Table 3-2 is a summary of the scope,
information provided, and use of snapshot monitoring.
Table 3-2 Scope, information provided, and use of snapshot monitors
(The table lists the snapshot levels: instance, database, table space, table,
buffer pool, lock, statement, and application. For each level it gives the
information provided and its typical use; for example, the application-level
snapshot is used to determine what an application is doing, to find the most
active applications, and to obtain elapsed times of applications.)
Before most snapshot data can be collected at any of the levels, the default
database monitor switches need to be turned on at the instance level. Even
without the monitor switches on, you can collect certain monitor data by default.
Note: The DFT_MON_TIMESTAMP monitor switch is turned on by default.
In our test environment, we turned on all the monitor switches as shown in
Example 3-73.
Example 3-73 Turning on all the monitor switches
db2 UPDATE DBM CFG USING DFT_MON_BUFPOOL ON
db2 UPDATE DBM CFG USING DFT_MON_LOCK ON
db2 UPDATE DBM CFG USING DFT_MON_SORT ON
db2 UPDATE DBM CFG USING DFT_MON_STMT ON
db2 UPDATE DBM CFG USING DFT_MON_UOW ON
db2 UPDATE DBM CFG USING DFT_MON_TABLE ON
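With the switches on, a database manager snapshot reports, among other things, the
FCM activity of a given partition. A sketch of capturing it for partition 3,
following the DB2NODE pattern used earlier, is:

export DB2NODE=3
db2 terminate
db2 "GET SNAPSHOT FOR DATABASE MANAGER"

The following excerpt shows the kind of output returned: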
Node type                                       = Enterprise Server
Instance name                                   = db2inst1
Number of database partitions in DB2 instance   = 4
Database manager status                         = Active

Product name                                    = DB2 v9.1.0.2
Service level                                   = s070210 (U810940)
...
Node FCM information corresponds to             = 3
Free FCM buffers                                = 25060
Free FCM buffers low water mark                 = 25060
Free FCM channels                               = 12518
Free FCM channels low water mark                = 12518
Number of FCM nodes                             = 4
...
Database name                              = TESTDB
Snapshot timestamp                         = 04/11/2007 15:26:40.407832
...
Application handle                         = 63
Application ID                             = *N0.db2inst1.070411200328
Sequence number                            = 00001
Application name                           = db2bp
CONNECT Authorization ID                   = DB2INST1
Application status                         = Connect Completed
Status change time                         = 04/11/2007 15:03:40.932614
Application code page                      = 819
Locks held                                 = 0
Total wait time (ms)                       = 0
Event monitoring
You can use the DB2 event monitor to collect database monitor information on a
continual basis, which differs from snapshots because the snapshots are taken
at a certain point-in-time. DB2 event monitor definitions are stored in the system
catalog tables. Event monitor data can be collected to pipes, files, or tables. Event
monitors collect data when a specific event, for which the event monitor has
been set up, occurs.
Event monitors do not have any database configuration switches that need to be
turned on before the collection of data. You must define and activate event
monitors in order to use them. By default, the DB2DETAILDEADLOCK event
monitor is defined. Event monitors that are defined can be viewed by querying
the SYSCAT.EVENTMONITORS system catalog table.
To create an event monitor, use the CREATE EVENT MONITOR SQL
statement. Event monitors only collect data when they are activated. To activate
or deactivate an event monitor, use the SET EVENT MONITOR STATE SQL
statement. A state of 0 indicates the event monitor is not activated. A state of 1
indicates the event monitor is activated.
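As an illustration, a connection event monitor that writes to the
/home/db2inst1/EVMON path used below can be created and activated like this (the
monitor name conn_evmon is hypothetical):

db2 "CREATE EVENT MONITOR conn_evmon FOR CONNECTIONS WRITE TO FILE '/home/db2inst1/EVMON'"
db2 "SET EVENT MONITOR conn_evmon STATE 1"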
export DB2NODE=2
db2 terminate
db2evmon -path /home/db2inst1/EVMON
Table 3-3 provides a summary of the scope, the information provided, and the
point at which the event data is collected by the event monitor.
Table 3-3 Event monitor scope, information provided, and data collection time
Event monitor      Collected
Database           Database deactivation
Connections        End of connection
Table spaces       Database deactivation
Tables             Database deactivation
Buffer pools       Database deactivation
Deadlocks          Detection of deadlock
Statements         End of statement (single partition); end of subsection (DPF)
Transactions       End of unit of work
Requirements
The requirements to use db2pd are:
You must execute the utility on the same physical machine as the instance.
You can use db2_all, rsh, and so forth to execute remotely.
The user must have SYSADM authority.
The user must be the instance owner (for UNIX and Linux only).
You can execute db2pd to collect data at an instance level or a database level.
Instance scope options report information at the instance level. Database scope
options report information at the database level. Because several databases can
be active in one instance, the -alldbs option reports on all of them. All db2pd options have a full descriptive
word, for example, -applications, which can be shortened to a three letter
minimum, such as -app.
Scope       Options
Instance    -agents, -fcm (FCM statistics, buffer consumption, and the applications
            involved in FCM usage), -dbmcfg, -sysplex, -utilities, -osinfo
Database    -applications, -transactions, -bufferpools, -logs, -locks, -tablespaces,
            -dynamic, -static, -dbcfg, -catalogcache, -tcbstats, -reorgs, -recovery,
            -storagepaths, -activestatements
In our test environment, we submitted the db2pd command to list all the
applications on database partition 1, as shown in Example 3-78.
Example 3-78 Using db2pd to list all applications on database partition 1
(Output abridged: the listing shows one row per application, including columns such
as the coordinator PID and the application status; here CoorPid is 0 and Status is
Unknown.)
We can use the -alldbp option of db2pd to display information about all database
partitions. Example 3-79 shows how to get lock information.
Example 3-79 Using db2pd to list all locks at the database level
TranHdl    Lockname    Dur    HoldCount    Att    Type    ReleaseFlg

(The heading row is repeated for each of the four database partitions; the
individual lock rows are omitted.)
DB2 EXPLAIN
The DB2 EXPLAIN facility allows you to capture information about the
environment and the access plan chosen by the optimizer for static or dynamic
SQL statements. You can then use this information to tune the SQL statements,
as well as the database manager configuration, to improve the performance of
queries.
DB2 EXPLAIN captures:
Sequence of operations to process the query
Cost information
Predicates and selectivity estimates for each predicate
Statistics for all objects referenced in the SQL statement at the time that the
EXPLAIN information is captured
The DB2 EXPLAIN facility provides a number of tools to capture, display, and
analyze information about the access plans that the optimizer chooses for SQL
statements. The access plan listed in the EXPLAIN output is based on the
statistics available at the time of statement compilation. For static SQL, this
corresponds to bind and preparation time and might not match the actual runtime
statistics.
Access path information is stored in EXPLAIN tables, which you can query to
retrieve the information that you want. You can use either the GUI tool Visual
Explain or the text-based db2exfmt tool to examine the contents of the EXPLAIN
tables. We only demonstrate the db2exfmt tool in our examples.
EXPLAIN tables can be created by issuing the db2 -tvf EXPLAIN.DDL command,
or by the DB2 Control Center automatically. The EXPLAIN.DDL file is located in
the $HOME/sqllib/misc directory on UNIX and Linux, where $HOME is the home
directory of the DB2 instance owner and is located in C:\Program
Files\IBM\SQLLIB\misc on Windows.
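A minimal sketch of creating the EXPLAIN tables and formatting the most recently
explained statement on UNIX or Linux, assuming the query text is in a file named
query.sql, is:

cd $HOME/sqllib/misc
db2 connect to testdb
db2 -tvf EXPLAIN.DDL
db2 SET CURRENT EXPLAIN MODE EXPLAIN
db2 -tvf query.sql
db2 SET CURRENT EXPLAIN MODE NO
db2exfmt -d testdb -1 -o query.exfmt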
(The statement being explained contains GROUP BY and ORDER BY clauses.)

Access Plan:
-----------
        Total Cost:             2.99008e+07
        Query Degree:           1

(The db2exfmt plan graph is abridged. Reading from the top, it contains RETURN(1),
GRPBY(2), MDTQ(3), GRPBY(4), TBSCAN(5), SORT(6), and HSJOIN(7). Beneath HSJOIN(7)
are the table queues DTQ(8), DTQ(11), BTQ(16), and BTQ(20), the hash joins
HSJOIN(9), HSJOIN(12), HSJOIN(14), and HSJOIN(18), and table, fetch, and index
scans over the TPCD.LINEITEM, TPCD.ORDERS, TPCD.PART, TPCD.PARTSUPP,
TPCD.SUPPLIER, and TPCD.NATION tables and the TPCD.N_NK index, with the estimated
cardinality shown at each operator.)
The two biggest tables, LINEITEM and ORDERS, are partitioned on ORDERKEY. The
query joins these tables on the ORDERKEY. The join (HSJOIN(12)) is collocated
and this is probably good considering the size of data joined.
The PART table is partitioned on PARTKEY, but the LINEITEM table is partitioned on
ORDERKEY. Even though you can direct the LINEITEM table to the PART table
partitions, the optimizer does not want to move the rows of the LINEITEM table.
Instead, after filtering some of the PART table rows, the optimizer chooses to
broadcast (BTQ(16)) the PART table to be joined to the LINEITEM table. This join is
done before the join to the ORDERS table, because it helps reduce the size of the
LINEITEM table. Note that the partitioning of the result of this join is the same as
that of the LINEITEM table. Note also that the cardinality has been increased from
818664 to 1.30986e+08 after broadcasting the rows to the 160 partitions of the
LINEITEM table.
The join between PARTSUPP and LINEITEM is through the columns PARTKEY and
SUPPKEY. These columns are also the partitioning keys by which the PARTSUPP
table is partitioned. The optimizer chooses a directed table queue (DTQ(11)) to
push each of the result rows containing the LINEITEM data to the corresponding
partitions of the PARTSUPP table.
The NATION table is joined to the SUPPLIER table on the NATIONKEY. Because the
SUPPLIER is partitioned on the SUPPKEY, this join (HSJOIN(18)) uses a broadcast
table queue (BTQ(20)). The resulting partitioning of this join is still the same as
that of SUPPLIER, with SUPPKEY as the partitioning column.
The final join (HSJOIN(7)) is chosen with a directed table queue (DTQ(8)) to
send each row of the result of the join between LINEITEM, PART, ORDERS, and
PARTSUPP to the corresponding partitions of the result of SUPPLIER and NATION.
This is because the join predicate is on the SUPPKEY.
Finally at the top of the plan, after all the joins, we have a partial sort on the
N_NATION and O_YEAR columns on each partition. This also helps the intermediate
aggregation (GRPBY(4)) that collapses the size of the result on each partition.
The final aggregation is done through the GRPBY(2) operator. The order
required by the query is maintained by merging the rows from each partition
through the merging directed table queue (MDTQ(3)) sent to the coordinator
partition before the result is returned to the user.
3.5.3 Rebalancer
To maintain data striping across containers, DB2 might determine that a
rebalance is necessary and kick off the rebalancer process. The rebalancer runs
asynchronously in the background, and data in the table space is still accessible
to applications during this process.
You can add space to a DMS table space by using the ADD, ADD TO STRIPE
SET, EXTEND, and RESIZE options of the ALTER TABLESPACE statement.
BEGIN NEW STRIPE SET is another option for adding space, but because it
does not result in a rebalance, we do not include it in this discussion.
You can remove space from a DMS table space by using the DROP, REDUCE,
and RESIZE options of the ALTER TABLESPACE statement.
(Figure 3-11 shows the three containers as columns of extents, numbered 0 through
12 and laid out across stripes 0 through 4, together with the following table
space map.)

Range  Stripe Set  S.S. Offset  Max. Extent  Max. Page  Start Stripe  End Stripe  Adj.  Containers
0      0           0            8            89         0             2           0     3 (0,1,2)
1      0           0            12           129        3             4           0     2 (0,1)
In Figure 3-11, there are three containers with sizes of 51, 51, and 31 pages
respectively. Each container has a container tag, and by default, this uses a
single page, leaving 50, 50, and 30 pages. The extent size is 10, and therefore,
there are 5, 5, and 3 extents.
Container IDs are assigned in the order in which they are listed in the
CREATE TABLESPACE statement.
In Figure 3-11, the three containers are shown as vertical bars of 5, 5, and 3
boxes (representing the number of extents in them).
When a table space is created, it is placed into this map so that striping starts by
using all of the containers (that is, all start in stripe 0).
You can see by the extent numbers in the boxes how DB2 stripes across these
containers. It starts in the first stripe and uses an extent in each of the containers
within the stripe. After it has striped across one stripe, it moves onto the next
one.
In this example, striping continues across all of the containers until container 2 is
full. When this happens, the striping is done across the remaining containers.
Because stripes 0 - 2 share the same containers, it is considered a range.
Stripes 3 - 4 share the same containers so they are also a range.
There are 9 extents in the first range, (0 - 8), which means that there are 90
pages in the first range, (0 - 89). These maximum values are stored in the map
along with the starting stripe (0), end stripe (2), and the containers (3 of them: 0,
1, and 2).
In the second range, there are 4 extents, 9 - 12 (pages 90 - 129). The starting
stripe is 3, the end stripe is 4, and there are 2 containers (0 and 1).
The adjustment value is an indication of how far into the range that the extents
start. It is non-0 only during a rebalance.
In the first example in Figure 3-12, a container with 3 extents is added (note that
the container is actually 4 extents in size, but one extent is used to hold the
container tag). If the container was placed so that it started in stripe 0, it ends in
stripe 2. Stripe 2 is not the last stripe in the map, so DB2 decides not to place it
this way. Instead, it is placed so that it starts in stripe 2 and ends in the last stripe
(stripe 4).
In the other two examples in Figure 3-12, containers are added with 5 and 6
extents. In these cases, they can be added so that they start in stripe 0 and stop
in the last stripe (4, as in the second example) or beyond it (5, as in the third
example).
(Figure 3-13 shows two examples of extending containers: Extend Example #1 and
Extend Example #2.)
You can extend existing table space containers by using the EXTEND or
RESIZE options on the ALTER TABLESPACE statement.
When new space is added to an existing container, the map is changed so that
the space is just tacked on to the end of the container. In other words, the start
stripe for that particular container does not change but the end stripe for it does.
The end stripe increases by the number of extents added to it. Note that this can
still cause a rebalance to occur.
We provide all of the scenarios and examples in this section only to show you
how DB2 extends containers. The examples do not mean that these are good
configurations (in fact, they are not good configurations and are quite
impractical).
For examples #1 and #2 in Figure 3-13, the table space is created with three file
containers and the extent size is 100. The containers are 600, 600, and 400
pages in size but after removing an extent of pages from each for the tag, this
leaves 500, 500, and 300 pages (5, 5, and 3 extents).
Example #1 in Figure 3-13 increases the size of the first container by two extents
(200 pages). You can do this by using EXTEND and specifying the size
difference, or you can do this by using RESIZE and specifying the new size (800
in this case, 700 pages plus 100 for the tag).
Example #2 in Figure 3-13 on page 110 increases the size of each of the
containers to seven extents. Using EXTEND, each container can be listed along
with the number of pages that it needs to grow. Using RESIZE, each container
can be listed along with the new size (800, which is 700 pages plus 100 for the
tag). Because all of the containers in the table space are resized to the same
size (800 pages), you can use the ALL CONTAINERS clause instead of listing
them individually.
(Figure 3-14 shows two examples of reducing containers: Reduce Example #1 and
Reduce Example #2.)
When space is removed from an existing container, the map is changed so that
the space is removed from the end of the container. In other words, the start
stripe for that particular container does not change, but the end stripe for that
container does. The end stripe decreases by the number of extents removed
from it. Note that this can still cause a rebalance to occur.
We use the scenarios and examples in this section only to show how DB2 reduces
containers. They are not meant to imply that these are good configurations; in
fact, they are not good configurations and are quite impractical.
For examples #1 and #2 in Figure 3-14 on page 111, the table space is created
with three file containers and the extent size is 100. The containers are 600, 600,
and 400 pages in size but after removing an extent of pages from each for the
container tag, this leaves 500, 500, and 300 pages (5, 5, and 3 extents).
Example #1 in Figure 3-14 on page 111 decreases the size of the first two
containers by two extents (200 pages). You can do this by using REDUCE and
specifying the size difference or by using RESIZE and specifying the new size
(400 in this case, 300 pages plus 100 for the tag).
Example #2 in Figure 3-14 on page 111 decreases the size of the first container
by two extents (200 pages) and decreases the size of the second container by
one extent (100). As shown, you can do this by using the REDUCE option,
RESIZE option, or a combination of both.
(The accompanying figure shows two examples of dropping containers: Drop Example #1
and Drop Example #2.)
After containers have been dropped and the reverse rebalance completes, the
remaining containers are renumbered so that their container IDs are contiguous,
starting at 0.
For example, if there are five containers in a table space, they have container IDs
in the range of 0 - 4. If containers 0, 1, and 3 were dropped, this leaves
containers 2 and 4. After the reverse rebalance completes, the actual files and
devices associated with containers 0, 1, and 3 are released, and then containers
2 and 4 are renumbered to 0 and 1.
(The accompanying figure shows the containers of the three stripe sets and the
resulting table space map.)

Range  Stripe Set  S.S. Offset  Max. Extent  Max. Page  Start Stripe  End Stripe  Adj.  Containers
0      0           0            8            44         0             2           0     3 (0,1,2)
1      0           0            12           64         3             4           0     2 (0,1)
2      1           5            16           84         5             6           0     2 (3,4)
3      1           5            17           89         7             7           0     1 (3)
4      2           8            23           119        8             9           0     3 (5,6,7)
When creating a new stripe set using the BEGIN NEW STRIPE SET option of the
ALTER TABLESPACE statement, the new containers are added to the map so
that they do not share the same stripe with any other containers. In this way, data
rebalancing is not necessary and the rebalancer does not start.
In this example, a table space is created with three containers that are 5, 5, and
3 extents respectively (remember that an extent of pages (5 in this example) in
each container is used to hold the container tag).
When the first new stripe set is created (stripe set 1), the containers that are
added (C3 and C4) are positioned in the map so that their stripe start value (5) is
one greater than the stripe end value of the last range in the map (4). After doing
this, the last stripe in the map is stripe 7.
When the second new stripe set is created (stripe set 2), the containers that are
added (C5, C6, and C7) are positioned in the map so that their stripe start value
(8) is one greater than the last stripe in the map (7).
As objects in the table space grow, space is consumed so that stripe 0 is used
first, followed by stripe 1, stripe 2, and so forth. This means that a stripe set is
only used after all of the space in the previous stripe sets has been consumed.
Rebalancing
When you add or remove space from a table space, a rebalance might occur
(depending on the current map and the amount of data in the table space). When
space is added and a rebalance is necessary, a forward rebalance is started
(extents move starting from the beginning of the table space).
When space is removed, a rebalance is necessary, and a reverse rebalance is
started (extents move starting with extents at the end of the table space).
When new space is added to a table space, or existing space is removed, a new
map is created. At this point, all of the extents still reside in physical locations on
the disk determined by the existing current map. It is the job of the rebalancer to
physically move extents from the original location (based on the current map) to
the new location (based on the new map) as shown in Figure 3-17.
(Figure 3-17 shows the rebalancer moving extents, numbered 9 through 17 in the
illustration, from their locations under the current map to their locations under
the new map.)
Forward rebalance
A forward rebalance starts with the first extent in the table space (extent 0),
moving one extent at a time until the extent holding the high-water mark has
been moved.
As each extent gets moved, the current map is updated so that it knows to look
for the extent in the new spot rather than the old one. As a result, the current map
begins to look like the new map.
Because the high-water mark is the stopping point for the rebalancer, by the time
that all of the extents have been moved, the current map and the new map are
identical up to the stripe holding the high-water mark.
Because there are no actual data extents above the high-water mark, nothing
else needs to move and the current map is made to look completely like the new
map.
Data is still accessible while a rebalance is in progress. Other than the
performance impact of the rebalance, objects can be dropped, created,
populated, and queried as if nothing were happening.
Even empty extents are moved during a rebalance, because the rebalancer does
not know whether extents are empty or in use. This is done for a couple of
reasons. First, to know about free and in use extents, Space Map Pages (SMPs)
must be locked in the buffer pool, and changes to SMPs cannot occur while the
SMPs are scanned. This means that objects cannot be created or dropped while
the corresponding SMPs are locked. Second, new space is usually added to a
table space when there is little or no free space in it. Therefore, there tend to be
few free extents, and the overhead of moving them unnecessarily is quite small
relative to the work done for the rest of the rebalance.
For example, a table space has two containers that are each four extents in size.
A third container of the same size is then added. The high-water mark is within
the 6th extent (extent #5) as shown in Figure 3-18 on page 116.
(Figure 3-18 shows the current map, the current positions of the extents with
respect to the current and new containers, and the new map: the current map has a
single range in stripe set 0 with a maximum extent of 7 over two containers (0 and
1), and the new map has a maximum extent of 11 once the third container is
included.)
The current map is the table space map that describes the container
configuration before adding the new container.
The new map is what the current map looks like after the rebalance is complete.
Note the following in this example in Figure 3-18:
The container tag is not mentioned or taken into consideration. The sizes of
the containers are given in usable extents (in other words, the number of
extents that DB2 can use to actually hold data).
The extent size has not been given (because it does not add anything to the
example). Therefore, the Max. Page field is not shown.
Reverse rebalance
If the space that is removed (whether it is full containers dropped or extents
removed as part of a reduction) lies entirely after the high-water mark extent in the table
space map, a reverse rebalance is unnecessary. Because none of the extents
removed contain any data, nothing needs to be moved out of the extents into the
remaining extents.
A reverse rebalance starts with the high-water mark extent, moving one extent at
a time until the first extent in the table space (extent 0) has been moved.
As each extent gets moved, the current map is updated so that it knows to look
for the extent in the new location rather than the old location. As a result, the
current map begins to look more and more similar to the new map.
Data is still accessible while a rebalance is in progress. Other than the
performance impact of the rebalance, objects can be dropped, created,
populated, and queried as if nothing were happening.
Even empty extents are moved during a rebalance, because the rebalancer does
not know whether extents are empty or in use. This is done for the following
reasons. First, to know about free and in use extents, SMP pages must be
locked in the buffer pool and changes to them cannot occur while the SMPs are
scanned. This means that objects cannot be created or dropped while the
corresponding SMPs are locked. Second, new space is usually added to a table
space when there is little or no free space in it. Therefore, there must be few free
extents and the overhead of moving the free extents unnecessarily is quite small
relative to the work done for the rest of the rebalance.
Space availability
If the map is altered in such a way that all of the space comes after the
high-water mark, a rebalance is unnecessary and all of the space is available
immediately for use as shown in Figure 3-19.
(Figure 3-19 shows the current map, the current positions of the extents with
respect to the current and new containers, and the new map for this case: because
all of the added space falls after the high-water mark, the new map simply extends
to a maximum extent of 11 and no rebalance is needed.)
If the map is altered so that part of the space comes after the high-water mark as
shown in Figure 3-20, the space in the stripes above the high-water mark
becomes available for use. The rest of the space is unavailable until the
rebalance is finished.
"X" denotes an extent below the high-water mark, and "H" is the high-water mark.
Figure 3-20 Space after the high-water mark is available for use
By the same logic, if all of the space is added to the map so that the space is in
the high-water stripe or lower, none of the space becomes available until after
the rebalance is complete, as shown in Figure 3-21.
You can use MQTs to replicate tables to other database partitions to enable
collocated joins to occur even though all the tables are not joined on the
partitioned key. In Figure 3-22, REGION is replicated to the other partitions using
the MQT infrastructure in order to enable collocated joins for superior
performance.
Example 3-84 shows how to create a replicated table. Table REGION is stored in a
single-partition table space. In order to facilitate a collocated join with a table in
the multipartition table space TBSP123, we create the replicated MQT
REGION_MQTR. With the REPLICATED clause, we requested that it be duplicated
on all the partitions in the partition group over which table space TBSP123 is
defined.
Example 3-84 Creating a replicated MQT
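A sketch of such a definition, using the REGION table, the REGION_MQTR name, and
the TBSP123 table space from this scenario (the refresh option and clause order
shown here follow the common documented pattern and are assumptions), is:

CREATE TABLE region_mqtr AS (SELECT * FROM region)
   DATA INITIALLY DEFERRED REFRESH IMMEDIATE
   IN tbsp123 REPLICATED;

REFRESH TABLE region_mqtr;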
Coordinator partition
In a DPF environment, any partition can be a coordinator partition. In fact more
than one partition can service requests from your users or applications.
However, we recommend that you dedicate one partition in your environment to
be the coordinator partition and that this partition is only used as the coordinator
partition and does not hold any data. When setting up your application
connection, we recommend that you use the SET CLIENT command and specify
the coordinator node in the command. Example 3-85 shows the usage of the
command to connect to a specific partition.
Example 3-85 SET CLIENT command
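A sketch of connecting through a designated coordinator partition, assuming
partition 0 is the coordinator, is shown below; the connection output that follows
is what our environment returned:

db2 SET CLIENT CONNECT_DBPARTITIONNUM 0
db2 CONNECT TO testdb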
   Database Connection Information

 Database server        = DB2/AIX64 9.1.2
 SQL authorization ID   = DB2INST1
 Local database alias   = TESTDB
Catalog partition
The catalog partition is the database partition that stores the system catalogs for
the partitioned database. We recommend separating the catalog partition from
the data partitions.
Consider that integer columns are more efficient than character columns, which
in turn are more efficient than using decimal columns. The partitioning key must
be a subset of the primary key or unique index.
Remember that after you select a distribution key for a table, you cannot alter the
key unless the table is in a table space that is associated with a single-partition
database partition group.
Note: Creation of a multiple-partition table that contains only long data types
(LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, or DBCLOB) is not
supported.
3.7.3 Collocation
Tables that are collocated are stored in the same database partition group, which
allows the processing of a query within the same logical partition. This avoids
unnecessary movement of data over the partitions in your DPF environment. To
help ensure table collocation, use the join columns as partitioning keys. The
joined tables can then be placed in table spaces that share the same partition
group. The joined tables partitioning keys must have the same number of
columns and corresponding data types. Try to place small tables in
single-partition database partition groups, except when you want to take
advantage of collocation with a larger table. Also, try to avoid extending
medium-sized tables across too many database partitions. Performance might
suffer if a mid-sized table is spread across too many partitions.
Chapter 4.
Table partitioning
This chapter provides information about the implementation of table partitioning.
This applies to UNIX, Linux, and Windows operating systems.
Roll-in strategies
Roll-in strategies include the use of the ADD PARTITION and ATTACH
PARTITION options of the ALTER TABLE statement.
ADD PARTITION
The strategy for using ADD PARTITION is to create a new partition in the
partitioned table and INSERT or LOAD data directly to the partitioned table,
provided that it complies with the overall range of the altered table. Using ADD
PARTITION with LOAD minimizes logging, but table access is limited to, at the
most, read only.
We provide examples of the use of ADD PARTITION later in this chapter.
ATTACH PARTITION
Alternatively, you can use the ATTACH PARTITION parameter of ALTER
TABLE. This allows you to attach an existing table that you have already loaded
with data. This method provides minimal disruption when accessing a table.
When you perform an ATTACH, you must follow with SET INTEGRITY. This
incorporates the data into the indexes on the table. At the same time, it also
validates that the data is all within the boundaries defined for that range. SET
INTEGRITY has been made online in DB2 9 so that this longer running portion of
roll-in can take place while applications continue to read and write the existing
data in the table.
Be aware that the online SET INTEGRITY is a single statement and runs as a
single transaction. If the attached data volume is large, the index maintenance
can potentially need a large amount of log space for recovery. An alternative
roll-in solution is to use ALTER TABLE with ADD PARTITION to add a partition and then use LOAD to move
the data into the partition.
We provide examples of the use of ATTACH PARTITION later in this chapter.
Roll-out strategies
The roll-out strategy is to use the DETACH PARTITION option of the ALTER
TABLE statement.
DETACH PARTITION
Use the DETACH PARTITION option to DETACH a partition to a stand-alone
table. You can then perform actions as required, such as REORG, EXPORT
data, DROP or prune data, and re-ATTACH.
We provide examples of the use of DETACH PARTITION later in this chapter.
Note that the new table resulting from the detached partition resides in the same
table space as the original partition. If exclusive use of that table space by the
partitioned table is required, you have to DROP the new table and re-CREATE it
elsewhere.
You can read the complete details of the syntax in the DB2 Information Center at:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.udb.admin.doc/doc/r0000927.htm
2. Then, we defined the table space parameters for INV_PRE00, which we depict
in Figure 4-3 on page 132, Figure 4-4 on page 132, and Figure 4-5 on
page 133.
Note: The data table spaces must have the same characteristics, such as
regular or large, page size, and extent size.
Figure 4-3 Defining the INV_PRE00 table space: Naming the table space
Figure 4-4 Defining the INV_PRE00 table space: Specify the type of table space
3. This was repeated for the remaining table spaces as shown in the table
definition in Figure 4-1 on page 130.
2. In the Add Column window, we defined the columns as shown in Figure 4-8
on page 135.
Figure 4-8 Creating the INVOICE table: Step two, defining the columns
Figure 4-10 Creating the INVOICE table: Defining the INV_PRE00 partition
5. Next, we assigned the table spaces to the partition for data, long data, and
indexes, which is shown in Figure 4-11 on page 138. You still have the
opportunity at this point to create table spaces, but we suggest that you
predefine all the table spaces that you require.
6. Finally, in Figure 4-12 on page 139, you have the opportunity to review the
SQL actions that are performed when you complete the task.
This completes the setup of the partitioned table by using the Control Center.
Figure 4-12 Creating the INVOICE table: Step six, reviewing the actions
ADD PARTITION
Adding a data partition to a partitioned table by using ADD PARTITION adds an
empty partition to the table. After you add the new partition, you can then insert or
load data directly into the partitioned table.
Example 4-3 on page 142 shows the SQL statements to add a new partition for
the 2008 year. You can do this in preparation for the change to the new calendar
year.
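A sketch of such a statement, assuming the INVOICE table and an INV_2008 table
space named consistently with the other table spaces in this chapter, is:

ALTER TABLE invoice
   ADD PARTITION inv_2008
   STARTING FROM ('01/01/2008') INCLUSIVE
   ENDING AT ('12/31/2008') INCLUSIVE
   IN inv_2008;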
ATTACH PARTITION
Alternatively, you can add a new partition to a data partitioned table by using the
ATTACH PARTITION option. This is an efficient way to roll-in table data to a data
partitioned table.
The process of attaching a partition is:
Create a non-partitioned compatible table.
Load the data into the newly created table.
Attach the new table to the partitioned table.
Note that the table to be attached must be compatible with the target partitioned
table. Create the new table by using the same DDL of the source table. If the
target table is an existing table, make sure the table definitions match. There is
no data movement during ATTACH. Because the new partition is essentially the
same physical data object as the stand-alone table, the new partition inherits the
table space usage from the original stand-alone table. So, create the stand-alone
table in the correct table space before attaching.
Example 4-4 on page 143 illustrates attaching table INVOICE_DATE_2008Q1 to
INVOICE_DATE.
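A sketch of the attach and the follow-on SET INTEGRITY, assuming the first quarter
of 2008 as the new range (the partition name is illustrative), is:

ALTER TABLE invoice_date
   ATTACH PARTITION inv_2008q1
   STARTING FROM ('01/01/2008') INCLUSIVE
   ENDING AT ('03/31/2008') INCLUSIVE
   FROM invoice_date_2008q1;

SET INTEGRITY FOR invoice_date ALLOW WRITE ACCESS IMMEDIATE CHECKED;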
DETACH PARTITION
If you decide to remove all of the pre-2000 data from this table to store it in a
separate table, you use the DETACH PARTITION parameter of the ALTER TABLE
statement. To demonstrate this, we have detached the INV_PRE00 partitions into
a table called INV_PRE00. The statements that we used are in Example 4-5.
Example 4-5 Detaching partition INV_PRE00 into table INV_PRE00
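A sketch of the statement, consistent with the partition and table names above, is:

ALTER TABLE invoice DETACH PARTITION inv_pre00 INTO inv_pre00;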
After a DETACH, asynchronous index cleanup (AIC) removes the index entries
corresponding to data that has been rolled-out. Meanwhile, the entries are
present, but invisible to normal queries. The features that make AIC unobtrusive
are:
Periodically checks for waiters and releases locks.
AIC does not activate a database partition.
AIC does not keep a database partition active if all applications disconnect.
The detaching process provides higher table availability than the bulk data
deletion. Figure 4-13 shows the table availability for DETACH for non-MQT
tables. When detaching a partition, there might be slight contention during index
cleaning if AIC performs the index cleaning.
(Figure 4-13 contrasts the old DELETE method, where the table is available, then
in contention while the DELETE runs, then available again, with DETACH followed by
asynchronous index cleanup, where the table remains available throughout with only
slight contention during the index cleanup.)
Figure 4-14 on page 145 shows the table availability for DETACH for refresh
immediate MQTs.
(Figure 4-14 charts the availability of the table and of its refresh immediate
MQTs under both approaches: with DETACH and asynchronous index cleanup, the table
stays available with only slight contention while the MQTs remain offline until
the REFRESH MQT step completes; with the old DELETE method, both the table and the
MQTs go through a period of contention or offline time.)
Figure 4-14 Table Availability for DETACH (with refresh Immediate MQTs)
implement the partitioned table. After you select the range, it cannot be changed
dynamically.
The RANGE option of the CREATE TABLE command defines the ranges of the
partitions and, in the case of the automatically generated range, defines the
number of partitions.
The RANGE specification is a combination of the RANGE partition expression
and the RANGE partition element:
PARTITION BY RANGE (partition-expression) (partition-element)
For example,
PARTITION BY RANGE (sales_region)
The NULLS LAST and NULLS FIRST specify that the null values compare high
or low. Not all the data types supported by the partitioned table can be the
partitioning key column data type. For the details of the supported data types,
refer to the DB2 Information Center at:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp
Each data partition has to have a unique name. If the partition name is not
specified, the name defaults to PARTx where x is an integer generated to keep
the names unique. The IN tablespace-name clause defines the table space
where the rows of the table in the range are stored.
The RANGE granularity can vary by partition as in Example 4-8 on page 149. In
this example, we have accumulated the historical (pre-2007) data into two
partitions, because we are more concerned with current 2007 data. The current
2007 data resides in its own partition, and the 2008 data will also have its own
partition. The current and future data benefit from improved scan performance
because their partitions are in separate table spaces.
Example 4-8 Range granularity
EXCLUSIVE indicates that all values equal to the specified value are not
included in the data partition containing this boundary.
The NULL clause specifies whether null values are sorted high or low when
considering data partition placement.
By default, null values are sorted high. Null values in the table partitioning key
columns are treated as positive infinity and are placed in a range ending at
MAXVALUE. If no data partition is defined, null values are considered
out-of-range values. Use the NOT NULL constraint if you want to exclude null
values from table partitioning key columns. LAST specifies that null values
appear last in a sorted list of values. FIRST specifies that null values appear
first in a sorted list of values.
When using the long form of the syntax, each data partition must have at least
one boundary specified.
IN indicates the table space where the partition is to be located.
This table space can be a table space that is used for other partitions or an
individual table space for a particular partition.
If you choose a range granularity that is inappropriate, you cannot change the
range definitions in place.
Your options to change range definitions are:
Export the data, drop the table, create a table, import data, or load data.
Detach partitions, add partitions, or import data.
Detach partitions, manipulate data into new tables that match the required
range specifications, and attach the resulting tables as the new partitions.
CREATE TABLE
invoice_auto
(custno BIGINT NOT NULL,
transaction_date DATE NOT NULL,
amount DECIMAL (15, 2) NOT NULL ,
custname CHARACTER (10) NOT NULL)
PARTITION BY RANGE (transaction_date NULLS LAST)
(STARTING FROM ('01/01/2000') INCLUSIVE
ENDING AT ('12/31/2007') INCLUSIVE
EVERY (1 YEARS)
) IN inv_all ;
Restrictions when using the automatically generated ranges include:
MINVALUE and MAXVALUE are not supported in the automatically
generated form of the syntax.
You can only specify one column for the range in the automatically generated
form of the syntax.
Tables that you create by using the automatically generated form of the
syntax (containing the EVERY clause) are constrained to use a numeric,
date, or time type for the table partitioning key.
(A CREATE TABLE statement, abridged here, defines one data partition per year,
each with an INCLUSIVE ending boundary and each placed IN its own table space:
inv_2000, inv_2001, inv_2002, INV_2003, INV_2004, inv_2006, and inv_2007.)
Example 4-11 shows the use of multiple columns in the range definition.
Example 4-11 Range specification with two columns: year and month
CREATE TABLE invoice
(custno INTEGER NOT NULL ,
transaction_date DATE NOT NULL ,
amount INTEGER NOT NULL ,
cust_name CHARACTER (10) NOT NULL,
inv_month INT NOT NULL GENERATED ALWAYS AS (MONTH(transaction_date)),
inv_year INT NOT NULL GENERATED ALWAYS AS (YEAR(transaction_date)))
PARTITION BY RANGE (inv_year, inv_month)
(PARTITION prt2004_1 STARTING FROM (2004,1) INCLUSIVE ENDING AT (2004,3) INCLUSIVE
IN inv_tsd20041,
PARTITION prt2004_2 STARTING FROM (2004,4) INCLUSIVE ENDING AT (2004,6) INCLUSIVE
IN inv_tsd20042,
PARTITION PRT2004_3 STARTING FROM (2004,7) INCLUSIVE ENDING AT (2004,9) INCLUSIVE
IN INV_TSD20043,
PARTITION PRT2004_4 STARTING FROM (2004,10) INCLUSIVE ENDING AT (2004,12) INCLUSIVE
IN inv_tsd20044,
PARTITION prt2005_1 STARTING FROM (2005,1) INCLUSIVE ENDING AT (2005,3) INCLUSIVE
IN inv_tsd20051,
PARTITION PRT2005_2 STARTING FROM (2005,4) INCLUSIVE ENDING AT (2005,6) INCLUSIVE
IN INV_TSD20052,
PARTITION prt2005_3 STARTING FROM (2005,7) INCLUSIVE ENDING AT (2005,9) INCLUSIVE
IN inv_tsd20053,
PARTITION prt2005_4 STARTING FROM (2005,10) INCLUSIVE ENDING AT (2005,12) INCLUSIVE
IN inv_tsd20054);
Example 4-12 shows the use of multiple columns of different data types. Here,
the range specification is year and region number.
Example 4-12 Range specification with multiple columns: year and region
CREATE TABLE invoice
(custno INTEGER NOT NULL ,
transaction_date DATE NOT NULL ,
amount INTEGER NOT NULL ,
cust_name CHARACTER (10) NOT NULL,
inv_year INT NOT NULL GENERATED ALWAYS AS (YEAR(transaction_date)),
region INTEGER)
PARTITION BY RANGE (inv_year, region)
(PARTITION prt2004_1 STARTING FROM (2004,1) INCLUSIVE ENDING AT (2004,3) INCLUSIVE
IN inv_ts2004r1,
PARTITION prt2004_2 STARTING FROM (2004,4) INCLUSIVE ENDING AT (2004,6) INCLUSIVE
IN inv_ts2004r4,
PARTITION prt2004_3 STARTING FROM (2004,7) INCLUSIVE ENDING AT (2004,8) INCLUSIVE
IN inv_ts2004r7,
PARTITION prt2004_4 STARTING FROM (2004,9) INCLUSIVE ENDING AT (2004,10) INCLUSIVE
IN inv_ts2004r9,
PARTITION prt2005_1 STARTING FROM (2005,1) INCLUSIVE ENDING AT (2005,3) INCLUSIVE
IN inv_ts20051,
PARTITION prt2005_2 STARTING FROM (2005,4) INCLUSIVE ENDING AT (2005,6) INCLUSIVE
IN inv_ts20054,
PARTITION prt2005_3 STARTING FROM (2005,7) INCLUSIVE ENDING AT (2005,8) INCLUSIVE
IN inv_ts20057,
PARTITION prt2005_4 STARTING FROM (2005,9) INCLUSIVE ENDING AT (2005,10) INCLUSIVE
IN inv_ts20059);
Note: When multiple columns are used as the table partitioning key, they are
treated as a composite key, which is similar to a composite key in an index in
that the trailing columns are dependent on the leading columns.
This is demonstrated in Example 4-13 on page 154 where the statement
depicted produces the error message shown below:
DB21034E The command was processed as an SQL statement because it was
not a valid Command Line Processor command. During SQL processing it
returned:
SQL0636N Range specified for data partition "PRT2004_1" is not valid.
Reason code = "10". SQLSTATE=56016
Reason code 10 states:
The range overlaps with another partition. Each data partition must
have a well defined starting and ending boundary and each data value
must go into one and only one data partition. Also, if the same value
In the next example, Example 4-15, there is a long table space, INV_TBSPL99,
allocated for the partitions that have no long table space specified.
Example 4-15 Specifying a collective long table space
CREATE TABLE testtab(
custno BIGINT NOT NULL ,
transaction_date DATE NOT NULL ,
amount DECIMAL (15, 2) NOT NULL ,
custname CHARACTER (10) NOT NULL ,
time TIME NOT NULL,
region INT NOT NULL)
LONG IN inv_tbspl99
PARTITION BY RANGE (TRANSACTION_DATE NULLS LAST)(
PARTITION INV_0 STARTING FROM (MINVALUE) INCLUSIVE ENDING AT
('12/31/2001') EXCLUSIVE IN inv_tbsp00 LONG IN inv_tbspl0,
PARTITION INV_1 STARTING FROM ('01/01/2002') INCLUSIVE ENDING AT
('12/31/2003') EXCLUSIVE IN inv_tbsp01 LONG IN inv_tbspl1,
PARTITION INV_2 STARTING FROM ('01/01/2004') INCLUSIVE ENDING AT
('12/31/2005') EXCLUSIVE IN inv_tbsp02,
PARTITION INV_3 STARTING FROM ('01/01/2006') INCLUSIVE ENDING AT
('12/31/2007') EXCLUSIVE IN inv_tbsp03);
Note: You cannot store long data remotely for certain data partitions and store
long data locally for other partitions.
(Figure 4-16 depicts the INVOICE table with its partitions, INV_PRE00 and INV_2000
through INV_2007, each stored in its own table space, and each table space made up
of its own containers, Cont1 through Cont6.)
Figure 4-16 Relationships of partitions, table spaces, and containers
(A figure here illustrates partition elimination: a table partitioned on column a
into ranges such as a >= 21 and a < 40, a >= 41 and a < 60, and a >= 61 and a < 80,
where partitions whose range cannot satisfy the query predicate are skipped.)
Partition elimination also applies for index scans. For example, many plans are
possible for the query shown in Figure 4-18 on page 159. Without partitioning,
one likely plan is to use index ANDing, which scans each index, ANDs the qualifying RIDs together, and then fetches the resulting rows.
With partitioning, each RID in the index contains the datapartID. Instead of
reading from the l_shipdate index, the optimizer looks at the datapartID to
discover if the row might be in the desired date range. It saves half the I/O in
indexes. Index ANDing passes RIDs back up to the runtime routine, ands them,
and then goes back to the kernel to fetch them. In contrast, partition elimination
skips irrelevant RIDs without ever returning them to run time, thus, improving
performance.
Figure 4-18 Index ANDing compared with partition elimination (l_shipdate and l_partkey indexes)
4.3.1 Utilities
These are the DB2 utilities affected by table partitioning.
LOAD or IMPORT
LOAD or IMPORT can be used to load a partitioned table. The utility regards
the table as a standard table for the purposes of loading data, with the
following restrictions:
Consistency points are not supported.
Loading data into a subset of data partitions while the remaining data
partitions remain fully online is not supported.
The exception table used by a load operation cannot be partitioned.
A unique index cannot be rebuilt when the LOAD utility is running in insert
mode or restart mode and the load target table has any detached
dependents.
Exact ordering of input data records is not preserved when loading partitioned
tables. Ordering is only maintained within the data partition.
The LOAD utility does not access any detached or attached data partitions.
Data is inserted into visible data partitions only (visible data partitions are
neither attached nor detached).
A load replace operation does not truncate detached or attached data
partitions.
Because the LOAD utility acquires locks on the catalog system tables, the
LOAD utility waits for any uncommitted ALTER TABLE transactions.
REORG
There are restrictions when using REORG TABLE for partitioned tables:
You cannot use REORG on a partitioned table in a DMS table space during
an online backup of any table space in which the table resides (including
LOBs and indexes). You get an SQL2048 error. This does not occur when the
table spaces are SMS.
REORG is supported at the table level. You can reorganize an individual data
partition by detaching the data partition, reorganizing the resulting
non-partitioned table, and then reattaching the data partition. The table must
have an ACCESS_MODE in SYSCAT.TABLES of Full Access.
Reorganization skips data partitions that are in a restricted state due to an
ATTACH or DETACH operation.
If an error occurs, the non-partitioned indexes of the table are marked as bad
indexes and are rebuilt on the next access to the table.
If a reorganization operation fails, certain data partitions might be in a
reorganized state and others might not. When the REORG TABLE command
is reissued, all the data partitions are reorganized regardless of the data
partitions reorganization state.
When reorganizing indexes on partitioned tables, we recommend that you
perform a RUNSTATS operation after asynchronous index cleanup is
complete in order to generate accurate index statistics in the presence of
detached data partitions. To determine whether there are detached data
partitions in the table, you can check the STATUS field in the
SYSCAT.DATAPARTITIONS catalog view.
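Assuming the explain tables have already been created (for example, from the EXPLAIN.DDL file shipped with DB2), one way to produce a formatted access plan like the listing that follows is to capture the statement in explain mode and then format it with db2exfmt. This is only a sketch; the database name and output file name are illustrative:
db2 SET CURRENT EXPLAIN MODE EXPLAIN
db2 "SELECT * FROM db2admin.invoice WHERE transaction_date > '01/01/2002' AND transaction_date < '03/01/2002'"
db2 SET CURRENT EXPLAIN MODE NO
db2exfmt -d testdb -1 -o invoice_plan.txt
In the formatted output, the DP Elim Predicates section shows the start and stop predicates that the optimizer uses to eliminate data partitions from the scan.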
---------------- STATEMENT 1  SECTION 65 ----------------
QUERYNO: 1
QUERYTAG:
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Original Statement:
------------------
SELECT *
FROM DB2ADMIN.INVOICE
where transaction_date > '01/01/2002' and transaction_date < '03/01/2002'

Optimized Statement:
-------------------
SELECT Q1.CUSTNO AS "CUSTNO", Q1.TRANSACTION_DATE AS "TRANSACTION_DATE",
Q1.AMOUNT AS "AMOUNT", Q1.CUSTNAME AS "CUSTNAME"
FROM DB2ADMIN.INVOICE AS Q1

Plan Details:
-------------

Predicates:
----------
2) Sargable Predicate
Comparison Operator: Less Than (<)
Subquery Input Required: No
Filter Factor: 0.661248

Predicate Text:
--------------
(Q1.TRANSACTION_DATE < '2002-03-01')

3) Sargable Predicate
Comparison Operator: Less Than (<)
Subquery Input Required: No
Filter Factor: 0.349343

Predicate Text:
--------------
('2002-01-01' < Q1.TRANSACTION_DATE)

DP Elim Predicates:
------------------
Range 1)
Stop Predicate: (Q1.TRANSACTION_DATE < '2002-03-01')
Start Predicate: ('2002-01-01' < Q1.TRANSACTION_DATE)

Input Streams:
-------------
1) From Object DB2ADMIN.INVOICE
Estimated number of rows: 3e+006
Number of columns: 5
Subquery predicate ID: Not Applicable

Column Names:
------------
+Q1.$RID$+Q1.CUSTNAME+Q1.AMOUNT+Q1.CUSTNO
+Q1.TRANSACTION_DATE

Output Streams:
--------------
2) To Operator #1
Estimated number of rows: 31773
Number of columns: 4
Subquery predicate ID: Not Applicable

Column Names:
------------
+Q2.CUSTNAME+Q2.AMOUNT+Q2.TRANSACTION_DATE
+Q2.CUSTNO
might keep the locks on previously accessed data partitions to reduce the costs
of reacquiring the data partition lock if that data partition is referenced in
subsequent keys. The data partition lock also carries the cost of ensuring access
to the table spaces. For non-partitioned tables, table space access is handled by
the table lock. Therefore, data partition locking occurs even if there is an
exclusive or share lock at the table level for a partitioned table.
Finer granularity allows one transaction to have exclusive access to a given data
partition and avoid row locking while other transactions are able to access other
data partitions. This can be a result of the plan chosen for a mass update or due
to escalation of locks to the data partition level. The table lock for many access
methods is normally an intent lock, even if the data partitions are locked in share
or exclusive. This allows for increased concurrency. However, if non-intent locks
are required at the data partition level and the plan indicates that all data
partitions can be accessed, a non-intent lock might be chosen at the table level
to prevent deadlocks between data partition locks from concurrent transactions.
You can obtain more information from the DB2 Information Center:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.udb.admin.doc/doc/c0021606.htm
4.3.4 Troubleshooting
You might encounter problems when creating tables, adding partitions, attaching
partitions, or detaching partitions. Reasons for these problems can be:
An ATTACH fails if the source table data does not conform to the range
specified for the new partition in the target table.
The source table must be an existing non-partitioned table or a partitioned
table with only a single data partition.
The table definition for a source table must match the target table. The
number, type, and ordering of columns must match for the source and target
tables.
The source table must not be a hierarchical table (typed table).
The source table must not be a range-clustered table (RCT).
The page size of table spaces must match. If they do not match, the result is
message SQL1860.
A DETACH fails if the target table that you are detaching into already exists.
The DB2 Information Center has more in-depth information about problems that
you might encounter, restrictions, and usage guidelines and can be accessed by
using this link:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.udb.admin.doc/doc/c0022680.htm
-- Load the data from TABLE1 into TABLE2 using LOAD FROM ... CURSOR
DECLARE c1 CURSOR FOR SELECT * FROM table1;
LOAD FROM c1 OF CURSOR INSERT INTO table2;
-- Or load the data from TABLE1 into TABLE2 using a two-step LOAD
EXPORT TO TABLE1.DEL OF DEL SELECT * FROM table1;
LOAD FROM TABLE1.DEL OF DEL INSERT INTO table2;
If any of the data is outside of the partition boundary, the load completes but
with this message:
SQL0327N The row cannot be inserted into table because it is outside
the bounds of the defined data partition ranges. SQLSTATE=22525
Define ranges
When defining ranges for your partitioned table, consider these
recommendations:
Partition on date.
Fast roll-out by using DETACH is one of the greatest benefits of table
partitioning. You need to partition on a date column to get this benefit. Side
benefits include:
You receive performance benefits for common business intelligence
(BI)-type queries.
Many BI queries are date-oriented; therefore, you often get a performance
boost from partition elimination.
Partitioning on a date column allows the separation of active and static
data for reduced backup volume.
You can tailor your backups to back up active data more frequently and
static data less often.
Fast roll-out by using DETACH reduces the need for REORG TABLE.
One of the key reasons for reorganizing a table is to reclaim space after
bulk deletes. If DETACH is used instead of bulk delete, this reason goes
away.
Choose the size of ranges to match the roll-out.
The range size needs to match your roll-out strategy. Ranges typically are by
month or quarter, which yields a manageable number of ranges. You can
have ranges smaller than the required roll-out amount and roll out several
ranges at a time. For example, define the range by month and roll out three
months at a time each quarter.
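A minimal sketch of a quarterly roll-out that detaches three monthly ranges (the partition and target table names are illustrative, not taken from the earlier examples):
ALTER TABLE db2admin.invoice DETACH PARTITION prt2006_01 INTO inv_2006_01_hist;
ALTER TABLE db2admin.invoice DETACH PARTITION prt2006_02 INTO inv_2006_02_hist;
ALTER TABLE db2admin.invoice DETACH PARTITION prt2006_03 INTO inv_2006_03_hist;
COMMIT WORK;
Each detached range becomes an ordinary table that can be archived or dropped, so no bulk delete runs against the partitioned table.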
Spreading data
Separating indexes and table data is not required, but we recommend that you
separate them to simplify space planning and your backup strategy. Advanced
performance tuning often results in putting indexes in separate table spaces.
Index placement
For partitioned tables, indexes are separate objects. Different indexes of the
same table can be placed in different table spaces. Individual indexes can then
be reorganized independently. When placing the index, consider these actions:
Separate indexes and table data to simplify size planning:
The size of each month's worth of table data is fairly predictable.
For space planning, keeping the indexes and the table data together is
more complicated than separating them into different table spaces.
Separating indexes is necessary for certain performance tuning.
Specify the index placement at CREATE INDEX time (a sketch follows this list).
If you do not specify the table space when you issue CREATE INDEX, DB2 uses
what you specified for INDEX IN when you created the table. If you did not
specify anything for INDEX IN, the index is placed in the table space that held
the first range at the time the table was created. Because indexes tend to be
much larger than one range of the table, this often causes you to run out of
space unexpectedly.
Default placement is undesirable, because all indexes end up together with
one range.
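A sketch of explicit index placement on the INVOICE table from the earlier examples (the index and table space names are illustrative):
CREATE INDEX inv_date_ix ON db2admin.invoice (transaction_date) IN inv_ix_tbsp1;
CREATE INDEX inv_cust_ix ON db2admin.invoice (custno) IN inv_ix_tbsp2;
Because each index of a partitioned table is a separate object, the two indexes can later be reorganized independently and their table spaces can be backed up separately from the data.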
Smoother roll-in
To facilitate a smoother roll-in process, consider these actions:
Issue COMMIT WORK after ATTACH and SET INTEGRITY:
ATTACH locks the whole table until committed.
New data is invisible after SET INTEGRITY is issued until committed.
SET LOCK TIMEOUT WAIT:
Prevents SET INTEGRITY from failing on a lock conflict at the end.
Plan for query draining by ATTACH:
ATTACH does not complete until it drains existing queries for the table.
Meanwhile, no new queries can start.
Use a single SET INTEGRITY statement:
Include all refresh immediate MQTs and the base table in the same SET
INTEGRITY statement.
MQTs that are not refreshed in the first pass go offline.
Multiple SET INTEGRITY statements can mean more passes through the
data.
Specify ALLOW WRITE ACCESS with SET INTEGRITY. A sketch of the full roll-in sequence follows.
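A sketch of the sequence, assuming a staging table that already matches the new range (all object names are illustrative):
SET CURRENT LOCK TIMEOUT WAIT;
ALTER TABLE db2admin.invoice
   ATTACH PARTITION prt2008_01
   STARTING FROM ('01/01/2008') INCLUSIVE ENDING AT ('01/31/2008') INCLUSIVE
   FROM inv_2008_01_stage;
COMMIT WORK;
SET INTEGRITY FOR db2admin.invoice ALLOW WRITE ACCESS IMMEDIATE CHECKED;
COMMIT WORK;
The newly attached data becomes visible to queries only after the SET INTEGRITY statement is committed, which is why each step is followed by COMMIT WORK.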
Chapter 5.
Multi-dimensional clustering
This chapter provides information for planning, implementing, and administering
multi-dimensional clustered (MDC) tables. In addition, we illustrate application
programming techniques that leverage the components of the MDC technology.
SELECT
    AVG(RowsPerCell) AS RowsPerCell,
    MIN(RowsPerCell) AS MinRowsPerCell,
    MAX(RowsPerCell) AS maxRowsPerCell,
    STDDEV(RowsPerCell) AS sdRowsPerCell
FROM
    (SELECT
        col1, col2, ... , colN, COUNT( * ) RowsPerCell
    FROM
        table
    GROUP BY
        col1, col2, ... , colN
    ) cell_table
Compute the maximum rows per page and extent size for various combinations.
A table similar to Table 5-1 on page 183 might be useful. The formulas for the
columns in the table are:
Rows in a page = (page size - 68) / (average row size + 10) rounded
down to an integer
Rows in extent = rows in a page x extent size
Number of extents needed for a cell = rows in cell / rows in extent
The probability of a cell smaller than one extent is calculated by assuming a
normal distribution of rows in a cell, using the average and standard deviation
from the SQL in Example 5-4, and evaluating the cumulative distribution
function at the rows-in-extent value. We use the Lotus 1-2-3 spreadsheet
function to calculate the probability:
@NORMAL(rows in extent;average rows in cell;stddev of rows in cell;0)
Stop the calculation for any entry that has a rows in extent greater than the
average number of rows in a cell.
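As a worked example (the average row size of about 139 bytes is an assumption chosen only to match the first row of Table 5-1): with a 4 KB page, rows in a page = (4096 - 68) / (139 + 10) = 27, rounded down; with an extent size of 4 pages, rows in extent = 27 x 4 = 108; and if the average cell holds about 686 rows (the value implied by the table), the cell needs 686 / 108 = 6.35 extents.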
Table 5-1 Page size and extent size combinations
Page size   Rows in   Extent size   Rows in   Extents needed   Probability of a cell
(in bytes)  a page    (in pages)    extent    for a cell       smaller than one extent
4096        27        4             108       6.35             <0.005%
4096        27        8             216       3.18             0.02%
4096        27        12            324       2.12             0.29%
4096        27        16            432       1.59             2.63%
4096        27        20            540       1.27             13.25%
4096        27        24            648       1.06             38.59%
4096        27        28            756       0.91             70.35%
8192        54        4             216       3.18             0.02%
8192        54        8             432       1.59             2.63%
8192        54        12            648       1.06             38.59%
16384       109       4             436       1.57             2.82%
16384       109       8             872       0.79             92.22%
32768       219       4             876       0.78             92.65%
extent size of four pages. These selections offer a fairly small amount of I/O to
read a cell combined with a very small probability of encountering cells
occupying less than one extent.
If you are converting an existing table, compare your selections to the table's
actual values. If they do not match, consider creating a table space with
appropriate values and moving the table when you convert it. In this example, an
existing table space with a 4K page size and four pages in an extent is probably
left unchanged.
If you are working with an existing table, this is a drastic step, because it
requires dropping and recreating the table space.
Reduce the page size.
The only effect of reducing page size is to reduce the size of the wasted
space in each cell, but it means more pages (and possibly more blocks) are
required to store the data, increasing the I/O activity required to read the
entire cell.
If you are working with an existing table, this is a drastic step, because it
requires dropping and recreating the table space.
Reconsider the entire list of columns.
If at first you do not succeed, try again.
CALL SYSPROC.ALTOBJ('APPLY_CONTINUE_ON_ERROR','
CREATE TABLE mdc_samp (
sales_amount DECIMAL(10,2) NOT NULL,
date_of_sale DATE NOT NULL,
salesperson CHAR(10) NOT NULL,
store_nbr INTEGER,
year_and_month GENERATED AS (INTEGER(date_of_sale)/100) )
ORGANIZE BY DIMENSIONS (store_nbr, year_and_month)
',-1,?)
SYSPROC.ALTOBJ captures the definition of all dependent objects, drops them,
renames the table, creates the new table, loads the new table from the renamed
old table, creates the dependent objects, and issues any necessary grants.
Figure 5-1 shows the sample output from the procedure.
After the table has been altered, execute SYSPROC.ALTOBJ one more time to
clean up the renamed old table. Using the ALTER_ID parameter value from the
output of the previous step (in our example, 5), execute the SYSPROC.ALTOBJ
call as shown in Example 5-9.
Example 5-9 Sample cleanup SYSPROC.ALTOBJ SQL
CALL SYSPROC.ALTOBJ('FINISH', '', 5, ?)
After completion of these steps, be sure to execute RUNSTATS on the
restructured table.
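A minimal sketch of that RUNSTATS call (the schema name is illustrative):
db2 RUNSTATS ON TABLE db2admin.mdc_samp WITH DISTRIBUTION AND DETAILED INDEXES ALL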
5.3.1 Utilities
Although most utilities are unaffected by MDC tables, there are certain
considerations and specific output changes.
LOAD
The SAVECOUNT option and TOTALFREESPACE file-type modifier are not
supported with MDC tables. The ANYORDER option is required; if it is not
specified, it is turned on automatically.
To improve performance, consider increasing the database configuration
parameter UTIL_HEAP_SIZE (utility heap size) in the database configuration. In
addition, set database configuration parameters SORTHEAP and
SHEAPTHRES_SHR to high values, because the load always includes a build
phase to create the required dimension indexes.
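A sketch of raising these parameters from the command line follows; the database name and the values are illustrative only and must be sized for your system:
db2 UPDATE DATABASE CONFIGURATION FOR testdb USING UTIL_HEAP_SIZE 50000
db2 UPDATE DATABASE CONFIGURATION FOR testdb USING SORTHEAP 20000 SHEAPTHRES_SHR 200000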
You need to sort the data by the dimension columns before loading. Because of
the need to build blocks and block indexes, the load is very sensitive to whether
the data is grouped by the dimension columns. In one test, we discovered that
the rows per second achieved with sorted data was about seven times greater
than with unsorted data.
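For example, a load of input that has been pre-sorted by the dimension columns might look like the following sketch (the file name and target table name are illustrative; ANYORDER is shown explicitly even though it is turned on automatically for MDC tables):
db2 LOAD FROM sales_sorted.del OF DEL MODIFIED BY ANYORDER INSERT INTO mdc_samp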
IMPORT
You cannot use the CREATE or REPLACE_CREATE options of the IMPORT
command to create an MDC table.
REORG
You cannot use the ALLOW WRITE ACCESS option on the reorganization of an
MDC table, unless you also specify the CLEANUP ONLY option.
Reorganization of MDC tables should be a rare event, unless there is
significant delete activity resulting in sparsely occupied blocks. Unlike a table
with a clustered index, there is no degradation in the data clustering, because
every row is always stored in a block with rows that have the same dimension
column values.
5.3.3 Explain
MDC tables and block indexes add new types of data access to queries.
Example 5-10 on page 191 shows the output from db2expln of a select against a
non-MDC table. Compare it with Example 5-11 on page 192, which shows the
output from db2expln of the same query against an MDC table.
Statement:
select *
from tpcd.customer
where c_mktsegment='FURNITURE'
Estimated timerons
Remember, timerons are an artificial measure of execution cost used by the
optimizer to select between access paths. A timeron cannot be used to
compare results between systems. However, within the same system in the
same operational environment, a timeron can be a valid form of comparison
between queries.
Prefetch status
Prefetch status here is None, which means that the optimizer does not see an
advantage to prefetching several blocks of data at a time. In the non-MDC
table, prefetch was Eligible, because the server was planning to scan the
entire table. Because of the clustering caused by the blocks, there is no
guarantee that prefetching is useful. Every block contains rows to be selected
compared to a percentage of the extents in the non-MDC table.
Locks
A new level of locking, block level, is introduced with MDC tables.
both processing time and log space requirements when the number of affected
rows is large.
Applications must not make massive updates to an MDC dimension column.
Updating an MDC dimension column causes the following actions to occur (at a
minimum):
The row is deleted from its current page.
Because the row's dimension columns no longer match the values for the cell
in which it resides, it must be moved to a different cell.
The row is placed on another page.
In the worst case, this includes creating a new cell. In that case, the
dimension block indexes and consolidated block index are updated. At best,
this process is the same as a new row insertion into an existing cell with room
in a block.
Any row-level indexes are updated to reflect the new position of the row.
The row's row ID changes, because it now resides on a different page. All
row-level indexes for the table contain the old row ID of the row.
A new table, MDC.CUSTOMER, was created by using the DDL in Example 5-16
and the data copied from TPCD.CUSTOMER.
Example 5-16 Creating the customer table with MDC
                 TPCD.CUSTOMER    MDC.CUSTOMER
CARD             300,000          300,000
NPAGES           13,610           13,667
FPAGES           13,611           16,032
ALTBLK           N/A              500
TSIZE            53,400,000       53,400,000
Table 5-3 Comparison of indexes for non-MDC and MDC customer tables
REORGCHK data
TPCD.CUSTOMER
MDC.CUSTOMER
NDXCUST clustering
index
INDCARD
300,000
500
LEAF
679
LVLS
KEYS
125
125
C_NATIONKEY
dimension index
INDCARD
500
LEAF
LVLS
KEYS
25
C_MKTSEGMENT
dimension index
INDCARD
500
LEAF
LVLS
KEYS
After creating the tables, the SQL shown in Example 5-17 on page 199 was
executed 50 times using db2batch to compare performance.
--#BGBLK 50
SELECT
* FROM tpcd.customer WHERE c_nationkey = 0;
SELECT
* FROM tpcd.customer WHERE c_mktsegment = 'FURNITURE';
SELECT
* FROM tpcd.customer
WHERE
c_mktsegment = 'FURNITURE'
AND c_nationkey = 0;
SELECT
* FROM mdc.customer WHERE c_nationkey = 0;
SELECT
* FROM mdc.customer WHERE c_mktsegment = 'FURNITURE';
SELECT
* FROM mdc.customer
WHERE
c_mktsegment = 'FURNITURE'
AND c_nationkey = 0;
--#EOBLK
Again, we remind you that this database was in a lab environment. Do not use
the results that we received to predict the performance in your environment. We
also ran explains on the SQL. Timings and information gleaned from the explains
are shown in Table 5-4 on page 200.
Note that the non-MDC table is in pristine condition where the clustering index is
concerned. The data was loaded in cluster sequence and no updates had taken
place. As the table is updated, the effectiveness of the clustering decreases until
the table is reorganized. In the MDC table, the data is kept clustered, because
each cell only contains one combination of clustering column values. So,
reorganizing the table to re-cluster the data is unnecessary.
Table 5-4 Query times and access methods for the non-MDC and MDC customer tables
(times are proportional to the non-MDC average for each query)

Where clause: c_nationkey = 0
  Statistic      Non-MDC                     MDC
  Average time   1.00                        0.44
  Minimum time   0.42                        0.46
  Maximum time   1.87                        0.53
  Access         Relational scan of table    Uses dimension block index for c_nationkey

Where clause: c_mktsegment = 'FURNITURE'
  Statistic      Non-MDC                     MDC
  Average time   1.00                        0.51
  Minimum time   0.79                        0.48
  Maximum time   2.27                        0.58
  Access         Uses clustering index       Uses dimension block index for c_mktsegment

Where clause: c_mktsegment = 'FURNITURE' and c_nationkey = 0
  Statistic      Non-MDC                     MDC
  Average time   1.00                        0.23
  Minimum time   0.97                        0.21
  Maximum time   3.83                        0.26
  Access         Uses clustering index       Uses consolidated block index
DB2 Design Advisor was executed against the table using the workload, which is
shown in Example 5-19, and the following command:
db2advis -d doug -i tpcd.workload.advis.sql -m C -o tpcd.advis.results
The resulting output is shown in Example 5-20.
Example 5-19 Workload for DB2 Design Advisor
Table 0: LINEITEM,
number of pages 415425,
block size 32
There are 1 candidate tables considered for Multi-dimensional
Clustering conversion
Searching the multi-dimensional space for solutions for LINEITEM...
Percentage of search points visited...
100
2 clustering dimensions in current solution
[620904.0000] timerons (without any recommendations)
[194247.0744] timerons (with current solution)
[68.72%] improvement
---- LIST OF MODIFIED CREATE-TABLE STATEMENTS WITH RECOMMENDED
PARTITIONING KEYS AND TABLESPACES AND/OR RECOMMENDED MULTI-DIMENSIONAL
CLUSTERINGS
-- ===========================
-- CREATE TABLE "TPCD "."LINEITEM" ( "L_ORDERKEY" INTEGER NOT NULL ,
-"L_PARTKEY" INTEGER NOT NULL ,
-"L_SUPPKEY" INTEGER NOT NULL ,
-"L_LINENUMBER" INTEGER NOT NULL ,
-"L_QUANTITY" DECIMAL(15,2) NOT NULL ,
-"L_EXTENDEDPRICE" DECIMAL(15,2) NOT NULL ,
-"L_DISCOUNT" DECIMAL(15,2) NOT NULL ,
-"L_TAX" DECIMAL(15,2) NOT NULL ,
-"L_RETURNFLAG" CHAR(1) NOT NULL ,
-"L_LINESTATUS" CHAR(1) NOT NULL ,
-"L_SHIPDATE" DATE NOT NULL ,
-"L_COMMITDATE" DATE NOT NULL ,
-"L_RECEIPTDATE" DATE NOT NULL ,
-"L_SHIPINSTRUCT" CHAR(25) NOT NULL ,
-"L_SHIPMODE" CHAR(10) NOT NULL ,
-"L_COMMENT" VARCHAR(44) NOT NULL
,
-MDC704121438030000 GENERATED ALWAYS AS (
((INT(L_SHIPDATE))/16)) )
-- ORGANIZE BY (
-MDC704121438030000,
-L_SHIPMODE )
-- ;
-- COMMIT WORK ;
6. You can import SQL from various sources, including a workload file, recently
explained statements (in the EXPLAIN tables of the database), or recently
executed SQL (from a dynamic SQL snapshot). In our example, we chose
Workload file, entered our file name, and clicked Load file. See Figure 5-5
on page 209.
7. The statements in the workload file are displayed. We selected all the
statements and clicked OK.
8. Back on the workload panel, we clicked Next, because we did not want to
change any of the statements or their relative execution frequency.
9. We do not need to calculate statistics on the tables in the workload, because
all of our statistics are current. Click Next to proceed.
10. Choose any options needed on the Options panel of the Design Advisor. We
do not want to limit the disk space utilized for MDC, so we do not select any
options and just click Next.
11. On the Calculation panel, select Now for when to execute analysis and click
Next.
12. On the recommendations panel (Figure 5-6 on page 210), a list of the tables
to alter is displayed. Click the selection box to select which table changes
are accepted, and click Show DDL to see the DDL generated by the Design
Advisor. You can also click Show workload detail to see the estimated
savings (in timerons) for each statement in the workload. Click Next to
proceed to the next step.
13. The Unused Objects panel shows the list of database objects (usually
indexes) that were not referenced in the workload. If there are items listed
that can be removed, you can mark them here by clicking the box in the
Accept column for the object. Be certain that other SQL statements are not
impacted by the removal. Click Next to continue.
14. The Schedule panel (Figure 5-7 on page 211) is where you can choose to
save the DDL necessary to convert the table, and either execute the DDL
immediately or schedule it for later execution. Click Cancel if you do not want
to execute the DDL. Otherwise, make a scheduling selection and click Next
for a final summary panel, or Finish to exit the Design Advisor and proceed to
executing the DDL as scheduled.
bytes on each page that were updated. In addition, because the delete performs
at the page level instead of the row level and updates only a few bytes per page,
the processing time is significantly reduced.
dimensions, because that implies that for any one customer, period, and product,
there is enough data to fill a block. However, by adding a few computed columns
to the fact table that are computed by dividing the customer and product keys by
a certain value, we might be able to arrive at a dimension structure that does
have enough data to fill each block. These computed columns add length to each
row of the fact table, but the improved performance of the MDC dimensions and
the reduced size of the MDC block indexes compared to the row-level indexes
might offset that cost.
Based on this exercise, we strongly recommend that you use the DB2 Design
Advisor to help you determine a reasonable solution.
Chapter 6.
In general, there is no conflict between DPF and MDC. The distribution key might
or might not be used as a dimension key. If it is used as a dimension key, there is
no impact on cell size in the partitions. If the distribution key is not a dimension
key, you must estimate the average blocks per cell by database partition.
Database partitioning might reduce what was a reasonable average cell size to a
value less than one block, which means wasted space. This does not mean that
you must make the distribution key an MDC dimension. The keys serve different
purposes and have different standards for column selection. The columns must
be selected based on how well they fulfill those purposes. Choose the distribution
key to provide the maximum collocation with other tables in the database with
which the table is often joined. Choose the dimension keys for query
performance.
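A sketch of a fact table in which the distribution key and the dimension keys are chosen independently (all names and the choice of columns are illustrative):
CREATE TABLE sales.fact (
   cust_id   INTEGER NOT NULL,
   prod_id   INTEGER NOT NULL,
   region    INTEGER NOT NULL,
   sale_ym   INTEGER NOT NULL,
   amount    DECIMAL(15,2) NOT NULL)
   DISTRIBUTE BY HASH (cust_id)
   ORGANIZE BY DIMENSIONS (region, sale_ym);
Here cust_id is chosen for collocation with other tables that join on the customer key, while region and sale_ym (a year-and-month value) are chosen as dimensions for query performance.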
SELECT
    partition,
    COUNT( * ) cells,
    AVG(RowsPerCell) AS RowsPerCell,
    MIN(RowsPerCell) AS MinRowsPerCell,
    MAX(RowsPerCell) AS maxRowsPerCell,
    STDDEV(RowsPerCell) AS sdRowsPerCell
FROM
    (
    SELECT
        dbpartitionnum(col1) partition,
        col1,
        col2,
        ... ,
        colN,
        COUNT( * ) RowsPerCell
    FROM
        table1
    GROUP BY
        dbpartitionnum(col1),
        col1,
        col2,
        ... ,
        colN
    ) cell_table
GROUP BY
    ROLLUP(partition)
ORDER BY
    1
If the data is not evenly distributed, look for partitions that have an average rows
in a cell that is significantly lower than the overall average. This might indicate
more wasted space in that partition, which translates into more space required to
store the cells in the partition. Because the partitioned database allocates the
table space equally across partitions, this means that the amount of additional
space is multiplied by the number of partitions. For example, if a table requires
five extents in most partitions, but one of the partitions requires seven extents,
each partition has seven extents allocated for the table. This is a function of DPF.
MDC might simply exacerbate the condition.
PARTITION    CELLS    ROWSPERCELL    MINROWS    MAXROWS    SD
---------    -----    -----------    -------    -------    -----------------------
        1    17445            228          1        304    +4.46416060975245E+001
        2    17440            229          1        311    +4.45760485138464E+001
        3    17445            229          1        308    +4.45651991378273E+001
             52330            229          1        311    +4.45962812765570E+001
We use the methodology that is described in "Determine candidate page size
and extent size combinations" on page 182 to choose the page size and extent
size. If we use 228 and 46 as the average and standard deviation (taking the
smallest average and the highest deviation gives a worst-case estimate of the
probability of a small cell), we get Table 6-1 on page 219. In this table, it is
obvious that the only good choice for page size and extent size is 4096 bytes for
the page and 4 pages in an extent.
Table 6-1 Computing rows in a page and extent for database partition and MDC
Page size   Rows in   Extent size   Rows in   Extents in   Probability of a cell
(in bytes)  a page    (in pages)    extent    a cell       smaller than one extent
4096        27        4             108       2.11         0.45%
4096        27        8             216       1.06         39.71%
4096        27        12            324       0.70         98.16%
8192        54        4             216       1.06         39.71%
16384       109       4             436       0.52         100.00%
32768       219       4             876       0.26         100.00%
combination of database partitioning and table partitioning now means that you
can distribute data across multiple database partitions, multiple table spaces
(due to the table partitioning), and multiple containers. This provides significant
scalability for a single table. It can also significantly boost the query performance
by taking advantage of the massively parallel processing (MPP) plus symmetric
multiprocessing (SMP) with table partition elimination.
Figure 6-1 A database partition group (DB Partitions 1 to 5) with table spaces 1 to 4, each having one container (C1 to C5) per partition
Note the shortened version of the range partitioning syntax where the ENDING
AT parameter is implied.
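A sketch of that shortened form, in which each range states only its STARTING value and the ENDING AT boundary is implied by the start of the next range (the table, column, and table space names are illustrative):
CREATE TABLE txn.history (
   trans_date DATE NOT NULL,
   amount     DECIMAL(15,2) NOT NULL)
   PARTITION BY RANGE (trans_date)
     (STARTING FROM ('01/01/2004') IN tbsp2004,
      STARTING FROM ('01/01/2005') IN tbsp2005,
      STARTING FROM ('01/01/2006') ENDING AT ('12/31/2006') INCLUSIVE IN tbsp2006);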
We inserted data into the table using the statement in Example 6-5. As you can
see, there was considerable randomization of the data. However, the insert was
very quick due to the partitioned database resources and the table partitioning
that allow the distribution of the containers over a large number of physical
drives.
Example 6-5 Insert command for random data
Figure 6-2 Multiple containers per table space in a partitioned table in a partitioned database
than individual column values, using the column as a dimension might provide a
query performance benefit.
For example, consider a table that contains a column of the form YYYYMM, which
contains a financial reporting period, where the table is to contain several years
of data with the oldest year rolled out annually. Partitioning the table on this
column so that each partition contains one year of information is reasonable for
roll-out purposes, while using the column as a dimension (which has monthly
granularity) can improve queries that report by month or quarter.
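A sketch of such a definition (the table, column, and table space names are illustrative):
CREATE TABLE fin.results (
   acct_period INTEGER NOT NULL,
   acct_id     INTEGER NOT NULL,
   amount      DECIMAL(15,2) NOT NULL)
   PARTITION BY RANGE (acct_period)
     (STARTING FROM (200401) ENDING AT (200412) INCLUSIVE IN fin_ts2004,
      STARTING FROM (200501) ENDING AT (200512) INCLUSIVE IN fin_ts2005,
      STARTING FROM (200601) ENDING AT (200612) INCLUSIVE IN fin_ts2006)
   ORGANIZE BY DIMENSIONS (acct_period);
Rolling out the oldest year is then a DETACH of one range, while queries that filter on specific YYYYMM values benefit from the monthly granularity of the acct_period dimension.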
When determining the page size and extent size for the table, table partitioning
affects the average rows in a cell the same way that database partitioning does.
Instead of working with the overall average rows in a cell, compute the average
rows in a cell for each range. If working with a new table, the simplest estimation
is to divide the estimated overall rows in a cell by the number of ranges. For an
existing table, use an SQL query similar to Example 6-7 to compute the average
rows in a cell and standard deviation for each range. Analyze the results as
shown in 5.1.4, Estimate space requirements on page 180.
Example 6-7 Computing average rows per cell across table ranges
When you detach a partition from a data partitioned MDC table, the new table is
an MDC table with the same dimensions as the original table. The new table
cannot be referenced until you commit the detach. The block indexes for the new
table are built when you first access the new table. Consequently, the first
access experiences reduced performance. Similarly, when you want to attach a
table to the partitioned MDC table, it must be dimensioned identically to the
partitioned table.
Example 6-8 shows a simple table definition that includes both table partitioning
and MDC.
Example 6-8 ORDERS table with table partitioning and MDC
the casting of the values in the RANGELIST table to match the data type of the
column used for range partitioning.
Example 6-9 SQL to compute average rows in a cell by range for the ORDERS table
A table such as that defined in 5.5.4, Using MDC to provide roll-out functionality
on page 211 might meet all these requirements:
With 1,500,000,000 rows per year, the table contains a large volume of data.
Use database partitioning to distribute this data across several database
partitions.
Only the most recent three or four years of data is kept.
Use the ACCT_YR column to partition the table in each database partition by
year. Use the DETACH command to remove years of data as they are no
longer needed.
The columns ending in APSK are foreign keys to dimension tables, so they
are good candidates for dimensions.
Create dimension indexes on the foreign keys. You might have to create
computed columns from the APSK columns to get the average rows per block
high enough to be useful.
Computation of page sizes and extent sizes follows the patterns already
established. For a new table, divide the estimated rows in a cell by the number of
ranges and then divide that result by the number of database partitions to get an
average number of rows in a cell within a range within a database partition. For
an existing non-distributed table, use the SQL query in Example 6-7 on page 229
to compute the average rows in a cell by range and divide that result by the
number of database partitions. Also, divide the standard deviation by the number
of database partitions. For an existing distributed table, use the SQL in
Example 6-10 to compute the average rows in a cell by range and database
partition and the standard deviation.
Example 6-10 Computing average rows in cells across table ranges
   ON
      table1.rangekey BETWEEN range_low AND range_high
   GROUP BY
      DBPARTITIONNUM(col1), range, col1, col2,
      ... ,
      colN) cell_table
GROUP BY
   ROLLUP(partition, range)
ORDER BY
   1
Appendix A.
$ ssh-keygen -t rsa
Or you can generate a DSA key pair, see Example A-2.
Example: A-2 Generating a DSA-encrypted key pair
$ ssh-keygen -t dsa
You are prompted for input but just accept the default. You then are prompted
to enter a passphrase. In our environment, we do not want a passphrase so
we press Enter twice. If you enter a passphrase, ssh challenges every
authentication attempt. DB2 does not allow rsh to prompt for additional
verification.
Two new files are generated in the ~/.ssh directory, id_rsa (the private key)
and id_rsa.pub (the public key), for RSA encryption. For DSA encryption, the
corresponding files, id_dsa and id_dsa.pub, are generated.
2. Enable the key pair.
Example A-3 on page 237 shows the commands to enable the RSA key pair.
Example: A-3 Enabling the RSA key pair
$ cd ~/.ssh
$ mv id_rsa identity
$ chmod 600 identity
$ cat id_rsa.pub >> authorized_keys
$ chmod 644 authorized_keys
$ rm id_rsa.pub
Example: A-4 Enabling the DSA key pair
$ cd ~/.ssh
$ mv id_dsa identity
$ chmod 600 identity
$ cat id_dsa.pub >> authorized_keys
$ chmod 644 authorized_keys
$ rm id_dsa.pub
# vi sshd_config
Then edit or change the line:
#HostbasedAuthentication no
to:
HostbasedAuthentication yes
2. Edit the shosts.equiv file.
The shosts.equiv file can be found in /etc/ssh on AIX and Linux, in /etc on
Solaris, and in /opt/ssh/etc on HP-UX. This file might not exist; therefore, you
must create one and ensure that it is owned by the root user and only allows
user read and write access and group/other read access. Each host must be
able to communicate with every other host in the DPF environment, so you
must set up the shosts.equiv file in such a way that it can be reused on all
hosts or partition servers. Edit the file as shown in Example A-8.
Example: A-8 Edit the shosts.equiv file
serverA
serverA.domain.com
serverB
serverB.domain.com
3. Edit the ssh_known_hosts file.
The ssh server host system needs to have access to the ssh client host's
public key and, for host-based authentication, the trust mechanism looks for
public keys in the ssh_known_hosts file. The ssh_known_hosts file is found in
the /etc/ssh directory on AIX, Linux, and Solaris and in /opt/ssh/etc on HP-UX.
As with the shosts.equiv file, you might need to create the ssh_known_hosts
file if it does not exist. Ensure that it is owned by the root user and only allows
user read and write access and group/other read access.
Add the client machine's unqualified host name, fully qualified host name, and
IP address to the ssh_known_hosts file. You can use the ssh-keyscan utility
to populate this file. The command for this is shown in Example A-9 for RSA
and Example A-10 for DSA. You need to change the directory (cd) to /etc/ssh
on AIX, Linux, and Solaris and to /opt/ssh/etc on HP-UX before running this
command.
Example: A-9 Updating the ssh_known_hosts file for RSA encryption
do this. This only affects future ssh sessions, not the sessions currently
running.
First, we find the pid for ssh daemon (sshd) as shown in Example A-11.
Example: A-11 Finding the sshd pid
0 15:18:48          0:00 /usr/sbin/sshd
0 16:08:55  pts/3   0:00 grep sshd
This tells us the pid is 1470502. We can now tell the sshd process to reread
the sshd_config file with the command in Example A-12.
Example: A-12 Restart the ssh daemon
#HostbasedAuthentication no
to:
HostbasedAuthentication yes
You then need to add a line to the ssh_config file to tell the ssh client to use the
ssh-keysign utility to read the host's private key, as shown in Example A-14.
Example: A-14 Enable the EnableSSHKeysign parameter
EnableSSHKeysign yes
db2set DB2RSHCMD=/usr/bin/ssh
Appendix B.
Additional material
This book refers to additional material that can be downloaded from the Internet
as described below.
Hard disk space:    30 KB minimum
Operating System:   Windows or Linux
Processor:          486 or higher
Memory:             256 MB
Related publications
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this book.
Other publications
These publications are also relevant as further information sources:
IBM - DB2 9
What's New, SC10-4253
Administration Guide: Implementation, SC10-4221
Administration Guide: Planning, SC10-4223
Administrative API Reference, SC10-4231
Administrative SQL Routines and Views, SC10-4293
Administration Guide for Federated Systems, SC19-1020
Call Level Interface Guide and Reference, Volume 1, SC10-4224
Call Level Interface Guide and Reference, Volume 2, SC10-4225
Command Reference, SC10-4226
Data Movement Utilities Guide and Reference, SC10-4227
Data Recovery and High Availability Guide and Reference, SC10-4228
Developing ADO.NET and OLE DB Applications, SC10-4230
Developing Embedded SQL Applications, SC10-4232
Developing Java Applications, SC10-4233
Online resources
These Web sites are also relevant as further information sources:
DB2 Information Center
http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp
Database and Information Management home page
http://www.ibm.com/software/data/
Index
Symbols
.rhosts 50
.ssh 236
A
access method 166
access plan 92, 120
adjustment value 108
administrative view 166
advantage 18
aggregate snapshot data 96
AIC 143
architecture 16
archival logging 20
array 5
asynchronous index cleanup 143
authentication 236
B
binary 98
block 5, 9, 27
block index 9, 178
block prefetching 27
bottlenecks 21
broadcast joins 45
buffer pool 5, 20
bulk data deletion 144
Business Intelligence 22
C
cache 20
cardinality 178, 180
catalog partition 7, 50
catalog system table 160
catalog table 43
cell 13, 178
cell sizes 183
client application 7
clustered data 27
clustering 9
clustering keys 11
collocated join 121
collocated Joins 44
collocated tables 44
collocation 124
column 5
column length 180
concurrent transaction 167
configuration 18
consistency point 159
container 4
container tag 107
contention 144
coordinator node 7
coordinator partition 7, 44
CS 25
cursor stability 25, 166
D
data partition 8, 160
data partition elimination 9
database configuration parameter
DFT_QUERYOPT 120
LOGPATH 21
SHEAPTHRES_SHR 188
SORTHEAP 188
UTIL_HEAP_SIZE 188
database configuration switches 97
database managed space 4, 129
database partition 2, 16
db2_all 87, 99
db2_install 46
db2ckbkp 86
db2evmon 98
db2exfmt 103, 161
db2gpmap 57
db2icrt 79
db2nchg 80
db2ncrt 79
db2ndrop 81
db2nodes.cfg 47
db2pd 99, 166
db2rcmd.exe 81
deadlock 167
Decision Support Systems 20
E
encryption 235–236
equijoin column 41
escalation 26
event monitor 97, 166
exclusive access 167
extent 5, 9, 27
extent size 5, 180
F
Fast Communication Manager 48
file 5
file system cache 41
fixed length 180
forward rebalance 114
FPAGES 197
I
I/O 5, 18
I/O parallelism 19
IBMCATGROUP 39
IBMDEFAULTGROUP 39
IBMTEMPGROUP 39
id_rsa 236
id_rsa.pub 236
index 4
index maintenance 127
index reorganization 23
index scan 161
index statistics 92
index-SARGable 27
instance level 96
instance owner 2
instance owning partition server 236
instance-owning server 2
intent lock 167
internal buffer 94
inter-partition 81
interpartition 18
Interpartition parallelism 19
intrapartition 18
Intrapartition parallelism 19
IP address 49, 239
isolation level 120, 166
K
key pair 236
keyword 97
granularity 126
H
hash value 6
hierarchical table 167
high speed network 16
high-water mark 114, 117
history file 85
home directory 47, 236
host name 239
host trust relationship 237
M
mass update 167
Massively Parallel Processing 16
MDC 9, 177
memory 2, 16
metadata 7
monitor switch 95
MPP 16
MQT 119
multi-column clustering index 28
multi-partition database 20
multi-partition group 3
physical server 2
physical storage 4
point in time 94
pointer 128
predicates 179
prefetch size 5
prefetching 5
prerequisite 178
primary key 124
primary machine 2
Public key-based authentication 236
Q
N
named pipe 90
netname 48
non-distributed table 232
non-intent locks 26
non-partitioned table 25
normal distribution 183
NPAGES 197
O
OLTP 20
online index creation 129
Online Transaction Processing 20
optimization level 120
option 128
P
page 5
page size 5, 180
parallel 2
parallelism 18
partition attach 8
partition group 3, 5, 20, 39
partition key 8
partitioned environment 5
partitioned instance 2
partitioning key 124
partition-owning server 49
passphrase 236
password 236
performance 5, 22
physical database partition 18
physical node 3
physical objects 128
query 7
query optimization 120
query optimizer 45
query parallelism 19
query performance 22
R
RAID 41
range 8, 106
range specifications 26
range-clustered table 167
raw device 5
RDBMS 1
rebalance 106
rebalancer 115
record 20
recovery 20, 127
Redbooks Web site 248
Contact us xi
region 128
register variables
CURRENT MAINTAINED TABLE TYPES FOR
OPTIMIZATION 120
CURRENT REFRESH AGE 120
regular index 9
remote shell 235
resourcesetname 48
response time 18
reverse rebalance 112, 114, 116
roll-in 8, 22, 126
roll-out 8, 22, 126
round-robin algorithm 6
row count 42
row length 42
row-level index 27, 178
RSA 235
rsh 235
S
scalability 2, 18
scale out 18
scale up 18
Secure shell 235
SET INTEGRITY 127, 174
Setup wizard 49
shared disk 17
shared nothing 16
shared nothing architecture 18
shared-everything architecture 2
shosts.equiv 238
SIGHUP 239
single partition 5
single partition database 20
skew data 56
slice 12
SMP 2, 16
SMS 4, 129, 178
snapshot 94
sorting 20
Space Map Pages 115
space requirements 180
space utilization 23
ssh 235
ssh server configuration file 238
ssh_known_hosts 239
sshd_config 238
ssh-keyscan 237
standard deviation 232
state 97
statistics 180
storing long data remotely 154
strategy 127
stripe 106
stripe set 106
stripe size 41
Symmetric Multiprocessor 16
system catalog 7, 19
System Managed Space 4
table partitioning 8, 21
table size 23
table sizes 42
table space 4
tape 90
target table 167
terminology 161
text editor 47
throughput 21
timerons 193
transaction 20
transaction log 20
trust mechanism 239
TSM 86
typed table 167
U
uncommitted transaction 160
uniform distribution 56
unique index 124, 160
usage consideration 16
utility parallelism 19
W
workloads 20
X
X/Open Backup Services Application Programmers
Interface 86
XBSA 86
T
table function 166
table lock 25
table partition elimination 9
Back cover

Database Partitioning, Table Partitioning, and MDC for DB2 9

Differentiating database partitioning, table partitioning, and MDC
Examining implementation examples
Discussing best practices

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

ISBN 0738489220