IBM DB2 RUNSTATS Utility and Real-Time Statistics
Bryan F. Smith IBM Tuesday, August 14, 2007 Tuesday, 6 November 2007 11:45 am 12:45 pm Session 1316
Platform: DB2 for z/OS
Abstract
This presentation reviews the basics of the RUNSTATS utility (What it does; Why you need to run it; How DB2 uses the information), and explores new statistics collected on data and indexes, including: partition level information on Data Partitioned Secondary Indexes; non-uniform distribution statistics on non-indexed columns; and historical statistics. The real-time statistics are also reviewed. Upon completion of this session, the attendee, whose skill level may range from low to high, will be able to understand how to get the most out of DB2's statistics and operate at optimal efficiency.
Topics
Why RUNSTATS?
Invoking RUNSTATS
Commonly asked questions (about the stats)
Real-time Statistics
Rebinding considerations
Reorg recommendations
When is RUNSTATS needed?
New/changed data statistics
New/changed index statistics
Handling part level statistics for DPSIs
Distribution Statistics
Enhanced HISTORY statistics changes
Flushing the dynamic statement cache
What statistics should I gather?
Why RUNSTATS?
The RUNSTATS utility computes statistics on a specified table space or index and updates the DB2 catalog
Two types of statistics
Access path statistics
Those used by BIND/PREPARE in its process of optimization to determine access path (some can also be used to help determine when to reorg)
Space
Those used by the DBA to monitor space usage; to assist in capacity planning; to help determine when to reorg; etc.
SYSIBM.SYSTABLESPACE
NACTIVE/F AVGROWLEN SPACEF
Legend: table in DSNDB06.SYSHIST; table in DSNDB06.SYSSTATS; collected from table space scan; collected from index scan
SYSIBM.SYSINDEXPART_HIST
AVGKEYLEN CARDF DSNUM EXTENTS FAROFFPOSF LEAFNEAR LEAFFAR NEAROFFPOS LEAFDIST PSEUDO_DEL_ENTRIES SPACEF PQTY SECQTYI
aggregates SYSIBM.SYSTABSTATS_HIST
CARD/F NPAGES PCTPAGES NACTIVE PCTROWCOMP
aggregates SYSIBM.SYSTABLEPART_HIST
AVGROWLEN CARD/F DSNUM EXTENTS NEARINDREF FARINDREF PAGESAVE PERCACTIVE PERCDROP SPACE/F PQTY SQTY SECQTYI
SYSIBM.SYSINDEXSTATS_HIST
FIRSTKEYCARD/F FULLKEYCARD/F NLEAF NLEVELS IOFACTOR PREFETCHFACTOR KEYCOUNT/F CLUSTERRATIO/F FULLKEYCARDDATA
Invoking RUNSTATS
Scans the tablespace
Affects the collection of column-statistics from the table space scan (expensive)
colgroup-spec
KEYCARD (Recommended)
Collects all of the distinct values in all of the 1 to n key column combinations for the specified indexes, where n is the number of columns in the index. For example, suppose that you have an index defined on three columns: A, B, and C. If you specify KEYCARD, RUNSTATS collects cardinality statistics for column A, column set A and B, and column set A, B, and C. These are cardinality statistics across column sets. Suppose a 3-column index had these values:
Col1 Col2 Col3
A    B    C
A    B    D
A    B    E
A    B    E
A    C    A
A    C    A
A    D    A
B    B    B
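The cardinalities that KEYCARD would report for these eight index entries can be worked out by hand or with a short script. The following is a minimal Python sketch, not anything DB2 itself runs: the function name `keycard_cardinalities` is hypothetical, and the rows assume the flattened Col1/Col2/Col3 values above are read row by row.

```python
# Index entries from the example, one tuple per row (Col1, Col2, Col3).
# Assumption: the column values listed in the slide pair up row by row.
rows = [
    ("A", "B", "C"),
    ("A", "B", "D"),
    ("A", "B", "E"),
    ("A", "B", "E"),
    ("A", "C", "A"),
    ("A", "C", "A"),
    ("A", "D", "A"),
    ("B", "B", "B"),
]


def keycard_cardinalities(index_rows):
    """Return the number of distinct values for each leading column prefix."""
    n = len(index_rows[0])
    return [len({row[:k] for row in index_rows}) for k in range(1, n + 1)]


cards = keycard_cardinalities(rows)
for k, card in enumerate(cards, start=1):
    print(f"leading {k} column(s): {card} distinct value(s)")
```

Under that reading, Col1 has 2 distinct values, the pair (Col1, Col2) has 4, and the full key has 6.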
If there were more gaps than active pages, LEAFDIST would be larger
FREEPAGE on an index can certainly affect the calculation of LEAFDIST
We used to use this value to determine when to reorg an index, but now we have better stats to determine this (LEAFFAR/LEAFNEAR)
SYSIBM.SYSINDEXES.CLUSTERRATIO
An access path statistic that can also help in determining when to reorg
Percentage of the rows that are in cluster order
Rows are counted as being clustered if they are on a page number greater than or equal to that of the previous row
This is a statistic that describes the data in the table (space), even though it is reported in SYSINDEXES
REORG INDEX will never affect this statistic
CLUSTERRATIO
[Figure: cluster count example. Rows A B D E F K H I C J L G on data pages 1 to 4; the cluster count runs from 1 to 8.]
SYSIBM.SYSINDEXES_HIST
CLUSTERRATIO/F
SYSIBM.SYSINDEXPART_HIST
FAROFFPOSF NEAROFFPOS
[Figure: cluster count example with PCTFREE free space on each page. Rows A B D, E F C, H I K and J L G on data pages 1 to 4.]
Cluster count: 8. Optimal would be 11.
Example where INDREF is not correlated with Cluster Count -> CLUSTERRATIO
[Figure: INDREF example. Data pages 1 to 4, each with PCTFREE free space, hold rows A B C, D E F, G H I and J K L.]
Index entries as key, page (< marks a row counted toward the cluster count): A, 1; <B, 1; <C, 1; <D, 2; <E, 2; <F, 2; <G, 3; <H, 3; <I, 3; <J, 4; <K, 4; <L, 4
Cluster count: 11. Optimal!
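[Figure: OFFPOS example. Rows A B D E F K H I C J L G on data pages 1 to 4.]
Index entries as key, page (< marks a row counted toward the cluster count): A, 1; <B, 1; <C, 3; D, 1; <E, 2; <F, 2; <G, 4; H, 2; <I, 3; <J, 4; K, 2; <L, 4
Cluster count: 8. Optimal would be 11.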
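The counting rule used in these examples (a row counts as clustered when its page number is greater than or equal to the page of the previous row in key order) can be sketched in a few lines of Python. This illustrates only the simplified rule from the slides and is not DB2's actual CLUSTERRATIO formula; the function name `cluster_count` and the list names are hypothetical, and the page sequences are taken from the key, page pairs in the OFFPOS and INDREF examples.

```python
def cluster_count(pages):
    """Count rows whose page is >= the page of the previous row in key order."""
    return sum(1 for prev, cur in zip(pages, pages[1:]) if cur >= prev)


# Data page of each row, listed in index key order A through L.
offpos_example = [1, 1, 3, 1, 2, 2, 4, 2, 3, 4, 2, 4]
indref_example = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

offpos_count = cluster_count(offpos_example)
indref_count = cluster_count(indref_example)
optimal = len(offpos_example) - 1  # every row after the first is in order

print(offpos_count, indref_count, optimal)
```

With these inputs the sketch gives 8 for the OFFPOS example and 11 for the INDREF example, against an optimum of 11 for twelve rows.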
We just saw an example where *OFFPOS is correlated to the cluster count (which is used to compute CLUSTERRATIO). Can an example be created showing non-correlation between these two metrics?
What is the semantic difference between RUNSTATS TABLESPACE and RUNSTATS TABLESPACE TABLE (ALL)?
The TABLE keyword triggers collection of column statistics
Extra credit
Is there any difference between running
RUNSTATS TABLESPACE DB1.TS1 TABLE (ALL) INDEX (ALL)
vs.
RUNSTATS TABLESPACE DB1.TS1 TABLE (ALL)
RUNSTATS INDEX(ALL) TABLESPACE DB1.TS1
??
Real-time Statistics
Introduced in V7
Contain space and some access path statistics in user-defined tables:
SYSIBM.TABLESPACESTATS (one row per partition)
SYSIBM.INDEXSPACESTATS (one row per partition)
In DB2 9, these are moved into the DB2 Catalog (DSNDB06.SYSRTSTS) as
SYSIBM.SYSTABLESPACESTATS
SYSIBM.SYSINDEXSPACESTATS
Intended to eliminate the need to run RUNSTATS just to decide which utilities to run by exception
Access path selection doesn't use RTS in V7, V8 or V9
RTS
SYSTABLESPACESTATS
Global statistics: NACTIVE, NPAGES, EXTENTS, SPACE, TOTALROWS, DATASIZE, UNCOMPRESSEDDATASIZE, UPDATESTATSTIME
Incremental statistics
REORG statistics: LASTTIME, INSERTS, UPDATES, DELETES, DISORGLOB, UNCLUSTINS, MASSDELETE, NEARINDREF, FARINDREF
COPY statistics: LASTTIME, UPDATEDPAGES, CHANGES, UPDATELRSN, UPDATETIME
RUNSTATS statistics: LASTTIME, INSERTS, UPDATES, DELETES, MASSDELETE
SYSINDEXSPACESTATS
Global statistics: NACTIVE, NLEVELS, NPAGES, NLEAF, EXTENTS, SPACE, TOTALENTRIES, LASTUSED, UPDATESTATSTIME
Incremental statistics
REORG statistics: REBUILDLASTTIME, LASTTIME, INSERTS, UPDATES, DELETES, APPENDINSERT, PSEUDODELETES, MASSDELETE, LEAFNEAR, LEAFFAR, NUMLEVELS
COPY statistics: LASTTIME, UPDATEDPAGES, CHANGES, UPDATELRSN, UPDATETIME
RUNSTATS statistics: LASTTIME, INSERTS, DELETES, MASSDELETE
START DB2
Implicitly enables real-time statistics if
DSNRTSDB is not STOPPED and the DB2 Catalog is accessible
STOP DATABASE(DSNRTSDB)
Flushes all in-memory statistics
In V9, RTS are a part of the catalog and are always enabled
In a data sharing system, statistics are collected by each member
In-memory statistics are always collected even if RTS is not enabled
STOP DB2 MODE(QUIESCE)
A utility operation (e.g. LOAD, REORG, RUNSTATS, COPY, REBUILD, RECOVER)
Order active statistics blocks in clustering order
Insert/update rows in the RTS tables via the clustering index
No RTS rows are externalized for objects that are accessed read-only (LASTUSED is still updated for read-only indexes)
Insert
Increment Inserts, Totalrows, Copy Changes counters
May update Nactive, Space, Extents, Uncluster_Inserts, Distinct Updated Pages, Update LRSN, Update Timestamp, Datasize
Update
Increment Updates, Copy Changes counters
May update NearIndRef/FarIndRef, Nactive, Space, Extents for VARCHAR Tables, Distinct Updated Pages, Update LRSN, Update Timestamp, Datasize
Delete
Increment Deletes, Copy Changes counters, Datasize
Rollback
Insert
Increment Deletes counter
Delete
Increment Inserts counter
Update
Increment Update counter
Statistics counters will not be updated during DB2 Restart
Triggers may cause statistics to be updated for other tables
Insert
Increment Inserts, TotalEntries counters
May update Append_Inserts, LeafNear, LeafFar, ReorgNumLevels, Nactive, Space, Extents, Nleaf
Delete
Increment Deletes counter
May update Pseudo Deletes, ReorgNumLevels
RUNSTATS
Set Last_RUNSTATS_Timestamp
Reset RUNSTATS related statistics
COPY
Set Last_COPY_Timestamp
Reset COPY related statistics
LOAD REPLACE
Set Last_Load_Replace_Timestamp
Reset REORG related statistics
RECOVER TORBA/TOCOPY
Set Last_REORG, Last_RUNSTATS, Last_COPY, Last_Load_Replace, Last_Rebuild_Index to NULL
Reset REORG, RUNSTATS, COPY statistics to NULL
REBUILD INDEX
Set Last_Rebuild_Index_Timestamp
Reset REORG related statistics
All in-memory statistics are lost when DB2 crashes or is stopped with STOP DB2 MODE(FORCE)
Statistics cannot be externalized when DSNRTSDB is stopped or the statistics tables are unavailable
Need to run REORG, RUNSTATS, COPY to establish a reference point
Statistics could be inaccurate if vendor utilities are run without flushing the in-memory statistics
Only physical space statistics (i.e. Nactive, Space, Extents) are maintained for DSNDB07 and the TEMP databases
Don't mix RTS objects with other user objects in a utility list operation
If mixed, RTS statistics will not be reset for all objects in the list
What is DSNACCOR?
A DB2 stored procedure that accesses the RTS tables and makes IFI calls to obtain -DISPLAY status on DB2 objects
Historical RTS
Rebinding considerations
Consider the following guidelines regarding when to rebind
CLUSTERRATIOF crosses 80% (a value of 0.80) in either direction
NLEAF changes more than 20% from the previous value
NLEVELS changes
NPAGES changes more than 20% from the previous value
NACTIVEF changes more than 20% from the previous value
The range of HIGH2KEY to LOW2KEY changes more than 20% from the range previously recorded
Cardinality changes more than 20% from the previous value
Distribution statistics change for the majority of the frequent column values
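Several of these guidelines are simple comparisons between two snapshots of catalog statistics, so they can be checked mechanically. The Python sketch below is one possible way to do that and is not an IBM-supplied tool: the function names, the dictionary layout, and the sample numbers are all hypothetical, CARDF is assumed to stand in for "cardinality", and the HIGH2KEY/LOW2KEY and distribution statistics guidelines are left out.

```python
def relative_change(old, new):
    """Fractional change from old to new; a change away from zero is unbounded."""
    if old == 0:
        return 0.0 if new == 0 else float("inf")
    return abs(new - old) / abs(old)


def rebind_reasons(old, new, threshold=0.20, cluster_line=0.80):
    """List which rebind guidelines are triggered between two statistics snapshots."""
    reasons = []
    # CLUSTERRATIOF moved from one side of 80% to the other.
    if (old["CLUSTERRATIOF"] < cluster_line) != (new["CLUSTERRATIOF"] < cluster_line):
        reasons.append("CLUSTERRATIOF crossed 80%")
    if old["NLEVELS"] != new["NLEVELS"]:
        reasons.append("NLEVELS changed")
    for col in ("NLEAF", "NPAGES", "NACTIVEF", "CARDF"):
        if relative_change(old[col], new[col]) > threshold:
            reasons.append(f"{col} changed more than {threshold:.0%}")
    return reasons


# Made-up before and after values for illustration only.
before = {"CLUSTERRATIOF": 0.95, "NLEVELS": 3, "NLEAF": 1000,
          "NPAGES": 5000, "NACTIVEF": 5200, "CARDF": 100000}
after = {"CLUSTERRATIOF": 0.78, "NLEVELS": 3, "NLEAF": 1100,
         "NPAGES": 6500, "NACTIVEF": 5300, "CARDF": 110000}

reasons = rebind_reasons(before, after)
print(reasons)
```

In this made-up case the cluster ratio drops below 80% and NPAGES grows by 30%, so two guidelines fire. The old values could come from the HISTORY tables if history collection is enabled.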
Reorg recommendations
These are generic and do not apply in all cases
There is no absolutely reliable statistic as to when reorganization of table spaces or indexes should occur; however, understanding the rules of thumb will help in understanding data disorganization
If you reorg for performance, then track performance over time
DSNACCOR (V7/8) / DSNACCOX (V9) usage
(REORGNEARINDREF + REORGFARINDREF (number of overflow rows since the last REORG)) / TOTALROWS > 5% in data sharing, > 10% in non-data sharing
REORGINSERTS (number of records inserted since the last REORG) / TOTALROWS > 25%
REORGDELETES (number of records deleted since the last REORG) / TOTALROWS > 25%
EXTENTS (number of extents) > 254
REORGDISORGLOB (number of LOBs inserted since the last REORG that are not perfectly chunked) / TOTALROWS > 50%
SPACE > 2 * (DATASIZE / 1024) (when free space is more than used space)
REORGMASSDELETE > 0 (mass deletes on segmented table spaces and DROP on multi-table table spaces)
RUNSTATS
PERCDROP > 10%
SYSIBM.SYSLOBSTATS.ORGRATIO < 50% (changed to a value 0-100 in PQ96460 on V7/V8)
(NEARINDREF + FARINDREF) / CARDF > 10% non-data-sharing, > 5% if data sharing
FAROFFPOSF / CARDF > 10%
Or, if index is a clustering index, CLUSTERRATIOF < 90% (irrelevant if predominantly random access)
Other
Table space is in advisory REORG-pending status (AREO*) as the result of an ALTER TABLE statement
Index on the table space is in advisory REBUILD-pending status (ARBDP) as the result of an ALTER statement
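The real-time statistics thresholds above lend themselves to a small exception check. The following Python sketch applies them to one row of table space statistics. It is only an illustration of the rules of thumb, not a replacement for DSNACCOR: the function name, the dictionary keys (modelled on the RTS column names) and the sample values are assumptions.

```python
def tablespace_reorg_reasons(rts, data_sharing=False):
    """Return the names of the RTS rules of thumb that one table space row exceeds."""
    reasons = []
    total = rts["TOTALROWS"]
    indref_limit = 0.05 if data_sharing else 0.10
    if total > 0:
        if (rts["REORGNEARINDREF"] + rts["REORGFARINDREF"]) / total > indref_limit:
            reasons.append("INDREF")
        if rts["REORGINSERTS"] / total > 0.25:
            reasons.append("REORGINSERTS")
        if rts["REORGDELETES"] / total > 0.25:
            reasons.append("REORGDELETES")
        if rts["REORGDISORGLOB"] / total > 0.50:
            reasons.append("REORGDISORGLOB")
    if rts["EXTENTS"] > 254:
        reasons.append("EXTENTS")
    # The slide's formula divides DATASIZE by 1024 before comparing with SPACE.
    if rts["SPACE"] > 2 * (rts["DATASIZE"] / 1024):
        reasons.append("SPACE")
    if rts["REORGMASSDELETE"] > 0:
        reasons.append("REORGMASSDELETE")
    return reasons


# Made-up statistics for one partition.
sample = {
    "TOTALROWS": 100_000, "REORGNEARINDREF": 4_000, "REORGFARINDREF": 3_000,
    "REORGINSERTS": 30_000, "REORGDELETES": 1_000, "REORGDISORGLOB": 0,
    "EXTENTS": 12, "SPACE": 150_000, "DATASIZE": 100_000_000,
    "REORGMASSDELETE": 0,
}

non_sharing = tablespace_reorg_reasons(sample)
sharing = tablespace_reorg_reasons(sample, data_sharing=True)
print(non_sharing, sharing)
```

The same 7% overflow ratio passes the 10% non-data-sharing limit but exceeds the 5% data sharing limit, which is why the two calls differ.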
Don't use RUNSTATS statistics as a trigger to consider running REORG
RUNSTATS
LEAFFAR / NLEAF > 10% (NLEAF is a column in SYSIBM.SYSINDEXES and SYSIBM.SYSINDEXPART)
PSEUDO_DEL_ENTRIES / CARDF > 10% for non-data sharing and > 5% for data sharing
Other
The index is in advisory REORG-pending status (AREO*) or advisory REBUILD-pending status (ARBDP) as the result of an ALTER statement
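The two RUNSTATS-based index ratios can be checked the same way. This is a minimal Python sketch under the same caveats: the function name, dictionary keys and sample numbers are hypothetical, and the pending-status conditions are not modelled.

```python
def index_reorg_reasons(stats, data_sharing=False):
    """Return which RUNSTATS-based index rules of thumb are exceeded."""
    reasons = []
    if stats["NLEAF"] > 0 and stats["LEAFFAR"] / stats["NLEAF"] > 0.10:
        reasons.append("LEAFFAR")
    pseudo_limit = 0.05 if data_sharing else 0.10
    if stats["CARDF"] > 0 and stats["PSEUDO_DEL_ENTRIES"] / stats["CARDF"] > pseudo_limit:
        reasons.append("PSEUDO_DEL_ENTRIES")
    return reasons


# Made-up index statistics: 7.5% far leaf pages, 6% pseudo-deleted entries.
index_stats = {"NLEAF": 2_000, "LEAFFAR": 150,
               "CARDF": 500_000, "PSEUDO_DEL_ENTRIES": 30_000}

index_non_sharing = index_reorg_reasons(index_stats)
index_sharing = index_reorg_reasons(index_stats, data_sharing=True)
print(index_non_sharing, index_sharing)
```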
HIGHKEY/HIGH2KEY/LOWKEY/LOW2KEY expanded
From CHAR(8) to VARCHAR(2000)
8 bytes not adequate for multi-byte character representations especially with Unicode
Optimizer has better information to estimate filter factors and determine access paths
If not specified then SORT will use sort product defaults
Can also use FORCEROLLUP to aggregate partition level statistics when not all partitions have statistics
Data skew becomes more important
Problem with skewed data and regular statistics
Optimizer assumes inaccurate distribution of values
Less efficient join sequence could be chosen
Less efficient method of accessing individual tables
DSTATS program could be downloaded to collect statistical data for non-indexed columns
Great improvement in access path selection, however
Run separately from RUNSTATS
Slow, with big impact on the DB2 work file database
As part of this, the previous limit of 10 names in the COLUMN parameter has been removed.
City: San Jose, San Jose, San Jose, San Jose, Riverside, Riverside, Glendora, Austin
Colvalue: CA, San Jose, 95123; CA, San Jose, 95110; CA, San Jose, 95141; CA, Riverside, 92504; CA, Glendora, 91741; TX, Austin, 78732
Not currently collected via in-line statistics from LOAD and REORG
For example,
In V7, UPDATE NONE HISTORY OPTIMIZER was prohibited.
In V8, UPDATE NONE HISTORY OPTIMIZER is allowed, and you can monitor statistics changes over time without concern that access paths may change.
Do you want to collect statistics on every column and every permutation of column combinations?
No way!
Summary
Why RUNSTATS?
Invoking RUNSTATS
Commonly asked questions (about the stats)
Real-time Statistics
Rebinding considerations
Reorg recommendations
When is RUNSTATS needed?
New/changed data statistics
New/changed index statistics
Handling part level statistics for DPSIs
Distribution Statistics
Enhanced HISTORY statistics changes
Flushing the dynamic statement cache
What statistics should I gather?
Bryan F. Smith
IBM