IBM DB2 - Query Plans and Explain
IBM - Query Plans and Explain
One of the main components of a Database Management System (DBMS) is the optimizer. He is the "brain" responsible for generating the Execution Plans, which are the strategies for accessing the data contained in the tables. Execution plans are generated for several SQL commands, including SELECT , UPDATE , DELETE and INSERT commands . The plans are generated taking into account a number of factors:
- the physical structure of the table: type of table (normal, range partitioned, MDC - Multidimensional clustering tables, etc).
- statistics about the data: number of records, column cardinality, etc.
- the physical characteristics of the database server: memory, processors, number of storage devices and server
There are several factors that contribute to ensuring the good performance of an SQL command:
- a good structure of the tables and their indexes.
- the use of good practices in writing SQL commands.
- constant maintenance of data statistics.
- the generation and analysis of the SQL Execution Plans.
In order for us to detect performance problems in SQL commands as early as possible, it is essential that test environments have similar characteristics and proportional to production environments. For this it is necessary:
- create and maintain tables and indexes with structures similar to those of production. The only acceptable difference between environments is that when changes are being made to the test environments. Indexes created emergency in production, for example, should be created in test environments as soon as possible.
- populate tables with a significant volume of records. As a general rule of thumb, consider that any table that will have thousands / millions of records in production should have at least 30,000 records in the test environments.
- the records must be created with the characteristics of variety and cardinality that the attributes of the table will have in production.
- the table and index statistics Data must be kept up to date.
Test environments that do not have volume or variety proportional to production environments are not suitable for performance tests, since the Execution Plans and the execution statistics (average number of blocks / records read) will be very different from those of the production environments.
Remember that it is the responsibility of the analysts, developers and homologators of the test environments:
- create the masses of data with volume and variety.
- keep the statistics of tables and indexes up to date.
- analyze Execution Plans and execution statistics for SQL commands.
The IBM DB2 EXPLAIN command
The IBM DB2 EXPLAIN Command generates the Execution Plan for an SQL command, and stores that plan in IBM DB2 tables. These tables are usually created in the "SYSTOOLS" schema . We will see that it is essential that the generation of the plan is followed by a detailed analysis, explained in the section "Making an even better and mandatory analysis of the Execution Plan" .
The EXPLAIN Command and Development Tools
In the DBVisualizer tool, we generate the Execution Plan in two ways:
- accessing the menu option "SQL Commander -> Execute Explain Plan"
- using the "CTRL-ALT-ENTER" key set
But in order for us to see the details of the execution plan we will have to proceed as follows:
- execute the command: EXPLAIN PLAN FOR ....
EXPLAIN PLAN FOR SELECT * FROM PAYMENTS;
- execute the command listed below in the "SQL to analyze the Execution Plan".
SQL command development tools such as DBVisualizer or DBeaver have different methods of viewing Execution Plans. One of the most important information contained in an Execution Plan, which is how columns are used in indexes and tables, is shown in DBeaver when going through the plan's nodes in its visualization tool. In DBVisualizer, this information is not yet available visually (the supplier has already been asked to implement this functionality), and we have to perform the above procedure to obtain this information.Practical examples of SQLs in tables, with various indexing scenarios and execution plans
Practical examples of SQLs in tables, with various indexing scenarios and execution plans
Now let's start with a series of practical examples of generating Execution plans for various scenarios.
Creating a table
CREATE TABLE DEPOSITS (ID INTEGER GENERATED ALWAYS AS IDENTITY, ACCOUNTNUMBER INTEGER, DATEOFDEPOSIT DATE, VALUE INTEGER, INCASH CHAR, CONSTRAINT PK_DEPOSITS PRIMARY KEY (ID), CONSTRAINT CK_INCASH CHECK (INCASH IN ('Y','N')));
Populating with data (30,000 records)
INSERT INTO DEPOSITS (ACCOUNTNUMBER,DATEOFDEPOSIT,VALUE,INCASH) WITH TEMP (QTD,DUMMY,ACCOUNTNUMBER,DATEOFDEPOSIT,VALUE,INCASH) AS (VALUES (0,GENERATE_UNIQUE(),1,CURRENT DATE,100,0) UNION ALL SELECT QTD + 1, GENERATE_UNIQUE(), MOD(INT(RAND() * 100), 100)+1, CURRENT DATE + MOD(INT(RAND() * 100), 100)+1, MOD(INT(RAND() * 100), 100)+150, MOD(INT(RAND() * 100), 2) FROM TEMP WHERE QTD < 29999) SELECT ACCOUNTNUMBER, DATEOFDEPOSIT, VALUE, CASE WHEN INCASH = 0 THEN 'N' ELSE 'Y' END FROM TEMP;
Checking how many records were created
SELECT COUNT (*) QTY FROM DEPOSITS; QTY ----- 30000
Updating table statistics
CALL ADMIN_CMD ('RUNSTATS ON TABLE DEPOSITS WITH DISTRIBUTION AND DETAILED INDEXES ALL');
Remember that the database does not have information about the statistics of the tables, until the RUNSTATS command is executed.
Getting some information from the IBM DB2 dictionary about the table
How many records (CARD) does the table have and what was the last update date for the statistics?]
SELECT CARD,STATS_TIME FROM SYSCAT.TABLES WHERE TABNAME = 'DEPOSITS'; CARD STATS_TIME ----- ------------------- 30000 2020-05-15 10:20:54
What are the indexes in the table, what are the columns of those indexes, and what was the last update date for your statistics
SELECT INDNAME, COLNAMES, STATS_TIME FROM SYSCAT.INDEXES WHERE TABNAME = 'DEPOSITS'; INDNAME COLNAMES STATS_TIME ------------ -------- ------------------- PK_DEPOSITS +ID 2020-05-15 10:22:18
The table has only one index, which is the index automatically created by IBM DB2 when a Primary Key or Unique Key is defined in a table. The index column is the ID attribute : the + sign means that the column is in ascending order in the index and when the sign is - it means that the column is ordered in descending order.
Think of the table as being a big book as an encyclopedia, and an index as that part at the end of the encyclopedia that says on which pages some terms are found. Indexes are structures maintained by the Database, and the correct creation of indexes for each of the methods of accessing the tables is a fundamental part of the future performance of the application. Absence or poor indexation are one of the main causes of slow systems. On the other hand, we must highlight that unused indexes are a stumbling block since the Database spends time and resources maintaining it.
What are the columns in the table and what is their cardinality (the number of distinct values for a column in the table)?
SELECT COLNO, COLNAME, COLCARD FROM SYSCAT.COLUMNS WHERE TABNAME = 'DEPOSITS' ORDER BY COLNO; COLNO COLNAME COLCARD ----- ------------ ------- 0 ID 30000 1 ACCOUNTNUMBER 100 2 DATEOFDEPOSIT 101 3 VALUE 101 4 INCASH 2
Notice the INCASH attribute . It has a value of 2 to COLCARD, meaning it has only two values, if: S and N . The ID attribute has a value of 30,000 for its number of distinct values, which is the expected value since this attribute is the Primary Key of the table. The ACCOUNTNUMBER Attribute has a quantity of 101, which means that there are 101 separate accounts, and so on. Attributes that have high cardinality are natural candidates for creating indexes, provided they are used in WHERE clauses or in table joins.
What is the statistical distribution of the "INCASH" column in the table?
SELECT COLNAME, SEQNO, COLVALUE, VALCOUNT FROM SYSCAT.COLDIST WHERE TABNAME = 'DEPOSITS' AND COLNAME = 'INCASH' AND TYPE = 'F' AND COLVALUE IS NOT NULL; COLNAME SEQNO COLVALUE VALCOUNT ------- ----- -------- -------- INCASH 1 'Y' 15330 INCASH 2 'N' 14670
This information is important to identify attributes with low cardinality (few distinct values), but whose distribution is uneven. This happens, for example, in the case of a table that stores records to process. Over time, most records will be in the "Processed" situation , and only a few will be in the "Processed" situation . In this case, it is important to create an index by this column, since normally in tables like this we always want to return the records to be processed. The same does not happen with an attribute of the Gender type: its cardinality is 2, and the distribution is practically 50% for each value. In that case the creation of an index will not be worth it: the use of an index, roughly speaking, is only worthwhile if the number of records returned is less than 5-7% of the records in a table. More than that, it is often better for IBM DB2 to do a TBSCAN (a full scan of the table).
This also implies that in the construction of SQLs, we must explicitly expose the attribute value in the WHERE clause, not using the Bind Variables technique: That way, IBM DB2 can, knowing the past value, choose the best Execution Plan (do a TBSCAN on the table or use an index). In a hypothetical table called BATCH with a BATCHSITUATION column :
Right
SELECT * FROM BATCH WHERE BATCHSITUATION = 0; -- 0 = "processing"
Wrong
SELECT * FROM BATCH WHERE BATCHSITUATION =?;
In another hypothetical table, PERSON, with the SSN column that has high cardinality (the SSN is unique in this case) we would have:
Right
SELECT * FROM PERSON WHERE SSN =?;
Wrong
SELECT * FROM PERSON WHERE SSN = '22223232323';
In Java, we use PreparedStatments and the functions SetInt , SetString and etc to pass these variables to the Database.
As is already known, this causes the database to reuse cache execution plans.
Executing several Execution Plan in DBVisualizer, with CTRL-ALT-ENTER
SQL-1
SELECT * FROM DEPOSITS;
Execution plan
Operation -------------------------------- RETURN USRCAT.DEPOSITS TBSCAN
The Plan shows that the table has been completely traversed, a TBSCAN . We did not enter any search criteria, so this is expected. Notice the Stream Count : there it indicates that the table has 30,000 records. This information was obtained when the RUNSTATS command was executed on the table, and it is static information, that is, it is up to you, in the approval banks, to run RUNSTATS whenever there are major changes in the volume of records in the table, or if the structure is modified.
SQL-2
SELECT * FROM DEPOSITS WHERE ACCOUNTNUMBER = ?
Execution plan
Operation ------------------------------ RETURN USRCAT.DEPOSITS TBSCAN
The Plan shows that the table continues to be fully covered, a TBSCAN . This is because, although we entered a search criterion, WHERE ACCOUNTNUMBER =? , the bank does not have any indexes for this attribute. Now let's create an index by ACCOUNTNUMBER :
CREATE INDEX IDX_DEPOSITS_01 ON DEPOSITS (ACCOUNTNUMBER) ALLOW REVERSE SCANS COLLECT DETAILED STATISTICS;
And we will generate the Execution Plan again (CTRL-ALT-ENTER)
Execution plan
Operation Stream Count --------------------------------- ------------ RETURN USRCAT.DEPOSITS FETCH USRCAT.IDX_DEPOSITS_01 IXSCAN 300
Now the Plan has changed. IBM DB2 used our new index, IDX_DEPOSITS_01 : it estimated that 300 records ( Stream Count ) will be found in that index for a given value of ACCOUNTNUMBER . This is an average estimate , and is obtained by dividing the value of the CARD column of the SYSCAT.TABLES table (30,000) by the value of the COLCARD column of the SYSCAT.COLUMNS table for the ACCOUNTNUMBER (101) column .
For each entry found in the index, DB2 uses an internal pointer called RID (Record Identified) to access the table ( FETCH operator ): that is why DB2 needs to search for the other attributes that SQL is requesting, DATE-DEPOSIT, VALUE and INCASH (for * cause in SQL). These attributes are not in the index.
Now let's modify the SQL and see what happens if I only search for attributes that are in the index
SQL-3
SELECT ACCOUNTNUMBER FROM DEPOSITS WHERE ACCOUNTNUMBER =?;
Execution plan
Operation ------------------------------ RETURN USRCAT.IDX_DEPOSITS_01 IXSCAN
We can see now, that the bank did not need to access the table. All the necessary attributes were in the index, in this case, the ACCOUNTNUMBER attribute .
Let's suppose, that a certain research is extremely executed, and that we want to avoid the cost of the bank having to access the table. We want all the desired information to be in the index (avoiding the FETCH to the table). First, let's run SQL without creating a new index:
SQL-4
SELECT ACCOUNTNUMBER, INCASH FROM DEPOSITS WHERE ACCOUNTNUMBER =?;
Execution plan
Operation --------------------------------- RETURN USRCAT.DEPOSITS FETCH RIDSCN SORT USRCAT.IDX_DEPOSITS_01 IXSCAN
As expected, the Bank had to access the index first, and then access the table, since the INCASH attribute is not in the index.
Let's create a new index that we can consider to be a coverage index (an index that has all the information for a given SQL).
CREATE UNIQUE INDEX IDX_DEPOSITS_02 ON DEPOSITS (ACCOUNTNUMBER,ID) INCLUDE (INCASH) ALLOW REVERSE SCANS COLLECT DETAILED STATISTICS;
This index is a special index: it includes an INCLUDE clause . The INCLUDE clause is used to put in the index attributes that are used in SQL, but that are not used to access the table, that is, they are used in the SELECT clause , or in JOINs of this table with another table, or in the ORDER BY / GROUP BY / HAVING COUNT and etc. Columns that are part of INCLUDE are not included in the index nodes ( https://i0.wp.com/datageek.blog/wp-content/uploads/2013/09/b-tree_2.jpg?w=654&ssl=1 ), being included in the index sheets , together with the RID .
Note: We only use these indexes if there are query SQLs that are very executed and the table does not suffer too many inclusions, exclusions and changes.
In IBM DB2, in order for us to use this clause, the index must be of the UNIQUE type , meaning that the attributes of the index key (attributes that are not in INCLUDE must form a unique set for the table. That is why in In the case of the IDX_DEPOSITS_02 index, the ID attribute was included at the end, which would not be necessary otherwise, this is an IBM DB2 rule.
Let's see how the SQL SQL-4 execution plan will look after creating this index
Operation --------------------------------------- RETURN USRCAT.IDX_DEPOSITS_02 IXSCAN
As expected, DB2 used our new index, IDX_DEPOSITS_02 , and did not need to access the DEPOSITS table '. All the necessary attributes, ACCOUNTNUMBER and INCASH are present in the index.
Now let's execute an SQL with two conditions in the WHERE clause
SQL-5
SELECT * FROM DEPOSITS WHERE ACCOUNTNUMBER = ? AND DATEOFDEPOSIT = ?;
Execution plan
Operation --------------------------------- RETURN USRCAT.DEPOSITS FETCH RIDSCN SORT USRCAT.IDX_DEPOSITS_01 IXSCAN
DB2 used the IDX_DEPOSITS_01 index , as expected, and it will find 3000 records in the index on average. But, as the DATEOFDEPOSIT attribute is not in the index, DB2 will have to make 300 accesses to the DEPOSITS table to retrieve the DATEOFDEPOSIT value and then, return the desired records. This is clearly not the most efficient way, since we saw that the value of the column 'COLCARD' of the SYSCAT.COLUMNS table for the column DATEOFDEPOSIT has 101 different dates. So let's create another index, a composite index, and see how the Execution Plan will look:
CREATE INDEX IDX_DEPOSITS_03 ON DEPOSITS (ACCOUNTNUMBER,DATEOFDEPOSIT) ALLOW REVERSE SCANS COLLECT DETAILED STATISTICS;
Execution plan
Operation ------------------------------- RETURN USRCAT.DEPOSITS FETCH USRCAT.IDX_DEPOSITS_03 IXSCAN
We see that DB2 has started using our new index. Note that the Stream Count has now returned a value equal to 3.16, which is more or less the average we expect:
SELECT AVG(QTD) "Average" FROM (SELECT ACCOUNTNUMBER, DATEOFDEPOSIT, COUNT(*) QTD FROM DEPOSITS GROUP BY ACCOUNTNUMBER,DATEOFDEPOSIT) Average ------- 3
The creation of this index greatly improved the performance of this SQL: verify that the cost dropped from 127 to 18 !
A note: The columns of an index are traversed from left to right. As we now have two indexes that have the ACCOUNTNUMBER attribute as their starting part, IDX_DEPOSITS_01 , for ACCOUNTNUMBER and IDX_DEPOSITS_03 , for ACCOUNTNUMBER + DATEOFDEPOSIT , we no longer need the IDX_DEPOSITS_01 index . The IDX_DEPOSITS_03 index is a "superset of the IDX_DEPOSITS_01 index . It can be deleted.
INDNAME COLNAMES ---------------- ---------------------- IDX_DEPOSITS_01 +ACCOUNTNUMBER IDX_DEPOSITS_03 +ACCOUNTNUMBER+DATEOFDEPOSIT DROP INDEX IDX_DEPOSITS_01;
Making an even better and mandatory analysis of the Execution Plan
Ok, the bank is using the IDX_DEPOSITS_03 index , which is what we were expecting, since in the WHERE clause we use WHERE ACCOUNTNUMBER =? AND DATEOFDEPOSIT =? . But, we would like to know how the columns were used in the index. In order for us to see this in DBVisualizer, we will first have to execute the command EXPLAIN PLAN FOR on SQL, and then execute the SQL that is described in the SQL session to analyze the Execution Plan . To make it easier, I will save this file in the folder "c:\temp\IBM DB2 - LAST EXPLAINED.sql", and use the command "@run" from DBVisualizer that allows to execute an external file. So here we go:
Run the EXPLAIN PLAN FOR command
SQL-6
EXPLAIN PLAN FOR SELECT * FROM DEPOSITS WHERE ACCOUNTNUMBER = ? AND DATEOFDEPOSIT = ?;
In DBVisualizer, use "@run"
@run "c:\temp\IBM DB2 - LAST EXPLAINED.sql";
Expanded Execution Plan
EXPLAINPLAN ------------------------------------------------------------------------------------------- ID | Operation | Rows | Cost | PCost 1 | RETURN | | 18 | 1000 2 | FETCH DEPOSITS | 3 of 3 (100.00%) | 18 | 1000 3 | IXSCAN IDX_DEPOSITS_03 | 3 of 30000 ( .01%) | 7 | 1000 Predicate Information 3 - START (Q1.ACCOUNTNUMBER = ?) START (Q1.DATEOFDEPOSIT = ?) STOP (Q1.ACCOUNTNUMBER = ?) STOP (Q1.DATEOFDEPOSIT = ?)
We see that the IDX_DEPOSITS_03 index is being used . Next to it is the number 3 in the ID column . In the part Predicate Information , in number 3 , it is described how the columns were used in this access. The important thing to highlight here are the START and / or STOP clauses . When an index column is listed using the start (START) or stop (STOP) operators, it means that the index was accessed on the node part of the index, and not on the leaf part ( SARG access )). This means that the bank was able to make efficient access. The type of index used by DB2 is of the BTree type ( https://i0.wp.com/datageek.blog/wp-content/uploads/2013/09/b-tree_2.jpg?w=654&ssl=1 ). It is like an inverted tree. You start at the top, the root, and work your way down the tree, filtering until you reach the index leaves, where are the RIDs , which are the pointers to the records in the table. Remember that indexes and tables in IBM DB2 are different objects and stored in physically separate locations.
Let's make a variation for this SQL
SQL-7
EXPLAIN PLAN FOR SELECT * FROM DEPOSITS WHERE ACCOUNTNUMBER = ? AND DATEOFDEPOSIT > ?;
In DBVisualizer, use "@run"
@run "c:\temp\IBM DB2 - LAST EXPLAINED.sql";
Expanded Execution Plan
EXPLAINPLAN ------------------------------------------------------------------------------------------- ID | Operation | Rows | Cost | PCost 1 | RETURN | | 39 | 1000 2 | FETCH DEPOSITS | 53 of 53 (100.00%) | 39 | 1000 3 | RIDSCN | 53 of 53 (100.00%) | 7 | 1000 4 | SORT (UNIQUE) | 53 of 53 (100.00%) | 7 | 1000 5 | IXSCAN IDX_DEPOSITS_03 | 53 of 30000 ( .18%) | 7 | 1000 Predicate Information 2 - SARG (? < Q1.DATEOFDEPOSIT) SARG (Q1.ACCOUNTNUMBER = ?) 5 - START (Q1.ACCOUNTNUMBER = ?) START (? < Q1.DATEOFDEPOSIT) STOP (Q1.ACCOUNTNUMBER = ?)
Note now that when accessing the IDX_DEPOSITS_03 index , the ACCOUNTNUMBER attribute appears as START and STOP , because in SQL the sign = (WHERE ACCOUNTNUMBER =?) Is being used . The attribute DATEOFDEPOSIT , on the other hand , is only with START, since it is being searched with the > operator .
Now, let's do something forbidden, which is to use a "Non-sargable" operator. These operators prevent an index from being traversed efficiently, using the index nodes . Examples of Sargable and Non-sargable operators :
Operators called "Sargable" are: =,>, <,> =, <=, BETWEEN, IN, LIKE without including the "%" operator at the beginning.
Operators called "Non-sargable" are: <>, OR, NOT IN, NOT LIKE, LIKE with the "%" operator at the beginning. We can also consider an operation with non-sargable operators when we apply functions or operators to indexed fields, which we use to access the data. Examples:
SELECT * FROM TABELA WHERE SUBSTRING(SSN,1,10) = ? SELECT * FROM TABELA WHERE UPPER(NAME) = ? SELECT * FROM TABELA WHERE BIRTHDATE + 1 DAY = ?
Let's look at some more examples of the consequences of using Non-sargable operators .
SQL-8
EXPLAIN PLAN FOR SELECT * FROM DEPOSITS WHERE SUBSTRING(ACCOUNTNUMBER,1,5) = ? AND DATEOFDEPOSIT > ?;
In DBVisualizer, use "@run"
@run "c:\temp\IBM DB2 - LAST EXPLAINED.sql";
Expanded Execution Plan
EXPLAINPLAN ------------------------------------------------------------------------------------------ ID | Operation | Rows | Cost | PCost 1 | RETURN | | 246 | 1000 2 | TBSCAN DEPOSITS | 12 of 30000 ( .04%) | 246 | 1000 Predicate Information 2 - SARG (Q1.DATEOFDEPOSIT = ?) SARG ( SUBSTRING(VARCHAR(Q1.ACCOUNTNUMBER), 1, CODEUNITS32) = ?)
Notice that DB2 has to not use any index. This is because the SUBSTRING function was used in an indexed attribute, which is used, in this SQL, to access the table.
SQL-9
EXPLAIN PLAN FOR SELECT * FROM DEPOSITS WHERE ACCOUNTNUMBER = ? AND DATEOFDEPOSIT + 1 DAY = ?;
In DBVisualizer, use "@run"
@run "c:\temp\IBM DB2 - LAST EXPLAINED.sql";
Expanded Execution Plan
EXPLAINPLAN ------------------------------------------------------------------------------------------- ID | Operation | Rows | Cost | PCost 1 | RETURN | | 95 | 1000 2 | FETCH DEPOSITS | 12 of 12 (100.00%) | 95 | 1000 3 | IXSCAN IDX_DEPOSITS_03 | 12 of 30000 ( .04%) | 7 | 1000 Predicate Information 3 - START (Q1.ACCOUNTNUMBER = ?) STOP (Q1.ACCOUNTNUMBER = ?) SARG ((Q1.DATEOFDEPOSIT + 1 DAYS) = ?)
Notice that DB2 is using the IDX_DEPOSITS_03 index , but it is efficiently accessing (START / STOP) only through the ACCOUNTNUMBER column . The column DATEOFDEPOSIT started to be accessed in SARG mode (because an operator "+ 1 DAY" was applied in the indexed attribute DATEOFDEPOSIT ), which means that DB2, had to go through the index, all the records for that determined ACCOUNTNUMBER , checking which entries in the index meet the DATADEPOSIT criterion . We can say that DB2 failed to "jump"directly to the deposit dates for that account. This is quite harmful, and could be corrected by changing the SQL to:
SELECT * FROM DEPOSITS WHERE ACCOUNTNUMBER = ? AND DATEOFDEPOSIT = ? - 1 DAY;
In this way, the operator would be applied to the parameter, and not to the indexed field!
Remember the application of Non-Sargable operators is only harmful when applied to an indexed attribute that is used to access the table. After the table has been accessed, there is nothing to prevent the use of Non-Sargable operators . What has to be guaranteed is that when accessing a table, either by the WHERE clause or by a JOIN, the indexed columns used to make that access do not suffer the application of Non-Sargable operations . The SQL below, for example, is using good practices:
SELECT * FROM DEPOSITS WHERE ACCOUNTNUMBER = ? AND DATEOFDEPOSIT = ? AND VALUE <> 1000;
The access to the table has already been done by the attributes ACCOUNTNUMBER and DATEOFDEPOSIT , the application of the <> operator in the VALUE attribute will not affect the performance of the query. This is also true for JOINs between tables. If we have already arrived at a table, and we need to apply a function or operator to an attribute that will be used to access records in another table, we can do that. We just shouldn't, I repeat, apply the function / operator to the destination attribute in the table that will be accessed.
One more SQL.
SQL-10
EXPLAIN PLAN FOR SELECT * FROM DEPOSITS WHERE DATEOFDEPOSIT BETWEEN ? AND ?;
In DBVisualizer, use "@run"
@run "c:\temp\IBM DB2 - LAST EXPLAINED.sql";
Expanded Execution Plan
EXPLAINPLAN ------------------------------------------------------------------------------------------- ID | Operation | Rows | Cost | PCost 1 | RETURN | | 246 | 1000 2 | NLJOIN | 1592 of 1 | 246 | 1000 3 | TBSCAN GENROW | 1 of 1 (100.00%) | 0 | 1000 4 | TBSCAN DEPOSITS | 1592 of 30000 ( 5.31%) | 246 | 1000 Predicate Information 3 - RESID (? <= ? SELECTIVITY 1.000000) 4 - SARG (Q3.DATEOFDEPOSIT <= ?) SARG (? <= Q3.DATEOFDEPOSIT)
Repair TBSCAN now. But the question remains, why? Didn't we create an index that includes the DATADEPOSIT attribute ? Let's see:
SELECT INDNAME, COLNAMES, FIRSTKEYCARD, FIRST2KEYCARD, FIRST3KEYCARD, FIRST4KEYCARD, FULLKEYCARD FROM SYSCAT.INDEXES WHERE TABNAME = 'DEPOSITS'; INDNAME COLNAMES FIRSTKEYCARD FIRST2KEYCARD FIRST3KEYCARD FIRST4KEYCARD FULLKEYCARD ---------------- ---------------------- ------------ ------------- ------------- ------------- ----------- PK_DEPOSITS +ID 30000 -1 -1 -1 30000 IDX_DEPOSITS_02 +ACCOUNTNUMBER+ID+INCASH 100 30000 30000 -1 30000 IDX_DEPOSITS_03 +ACCOUNTNUMBER+DATEOFDEPOSIT 100 9485 -1 -1 9485
But notice that in the only index that has the column DATEOFDEPOSIT , the index IDX_DEPOSITS_03 , the column DATEOFDEPOSIT is not at the beginning of the index !!! . In terms of index, we should not "skip" columns. In case, to make things Ok , we must create a new index:
CREATE INDEX IDX_DEPOSITS_04 ON DEPOSITS (DATEOFDEPOSIT) ALLOW REVERSE SCANS COLLECT DETAILED STATISTICS;
Now let's see how the expanded execution plan got
EXPLAINPLAN ------------------------------------------------------------------------------------------- ID | Operation | Rows | Cost | PCost 1 | RETURN | | 342 | 1000 2 | NLJOIN | 1592 of 1 | 342 | 1000 3 | TBSCAN GENROW | 1 of 1 (100.00%) | 0 | 1000 4 | FETCH DEPOSITS | 1592 of 1592 (100.00%) | 171 | 1000 5 | RIDSCN | 1592 of 1592 (100.00%) | 27 | 1000 6 | SORT (UNIQUE) | 1592 of 1592 (100.00%) | 27 | 1000 7 | IXSCAN IDX_DEPOSITS_04 | 1592 of 30000 ( 5.31%) | 27 | 1000 Predicate Information 3 - RESID (? <= ? SELECTIVITY 1.000000) 4 - SARG (Q3.DATEOFDEPOSIT <= ?) SARG (? <= Q3.DATEOFDEPOSIT) 7 - START (? <= Q3.DATEOFDEPOSIT) STOP (Q3.DATEOFDEPOSIT <= ?)
Notice now that DB2 used our new index IDX_DEPOSITS_04 , and that there is our dear START / STOP !
A note: Whenever an index column is consulted using the operators > or < or BETWEEN , and that index has more than one column, we must create the index by placing that column at the end of the index: If our table has the following queries:
SELECT * FROM DEPOSITS WHERE ACCOUNTNUMBER = ?;
and
SELECT * FROM DEPOSITS WHERE ACCOUNTNUMBER = ? AND DATEOFDEPOSIT = ?;
and
SELECT * FROM DEPOSITS WHERE ACCOUNTNUMBER = ? AND DATEOFDEPOSIT BETWEEN ? AND ?;
and
SELECT * FROM DEPOSITS WHERE DATEOFDEPOSIT =?;
and
SELECT * FROM DEPOSITS WHERE DATEOFDEPOSIT BETWEEN ? AND ?;
The only required indexes are:
1)
CREATE INDEX IDX_DEPOSITS_01 ON DEPOSITS (ACCOUNTNUMBER,DATEOFDEPOSIT) ALLOW REVERSE SCANS COLLECT DETAILED STATISTICS;
this index meets:
WHERE ACCOUNTNUMBER = ? WHERE ACCOUNTNUMBER = ? AND DATEOFDEPOSIT (>,<,BETWEEN) ...
2)
CREATE INDEX IDX_DEPOSITS_02 ON DEPOSITS (DATEOFDEPOSIT) ALLOW REVERSE SCANS COLLECT DETAILED STATISTICS;
this index meets:
WHERE DATEOFDEPOSIT = ? WHERE DATEOFDEPOSIT (>,<,BETWEEN) ...
SQL to analyze the Execution Plan(Source: https://use-the-index-luke.com/sql/explain-plan/db2/getting-an-execution-plan)
"IBM DB2 - LAST EXPLAINED.sql"
WITH tree(operator_ID, level, path, explain_time, cycle) AS ( SELECT 1 operator_id , 0 level , CAST('001' AS VARCHAR(2000)) path , max(explain_time) explain_time , 0 FROM SYSTOOLS.EXPLAIN_OPERATOR O WHERE explain_time = (SELECT MAX(explain_time) FROM SYSTOOLS.EXPLAIN_OPERATOR) UNION ALL SELECT s.source_id , level + 1 , tree.path || '/' || LPAD(CAST(s.source_id AS VARCHAR(3)), 3, '0') path , tree.explain_time , CASE WHEN (POSITION(path IN '%/' || LPAD(CAST(s.source_id AS VARCHAR(3)), 3, '0') || '/%' USING OCTETS) >0) THEN 1 ELSE 0 END FROM tree , SYSTOOLS.EXPLAIN_STREAM S WHERE s.target_id = tree.operator_id AND s.explain_time = tree.explain_time AND S.Object_Name IS NULL AND tree.cycle = 0 AND level < 100) SELECT * FROM ( SELECT EXPLAINPLAN FROM ( SELECT CAST( LPAD(id, MAX(LENGTH(id)) OVER(), ' ') || ' | ' || RPAD(operation, MAX(LENGTH(operation)) OVER(), ' ') || ' | ' || LPAD(rows, MAX(LENGTH(rows)) OVER(), ' ') || ' | ' || CASE WHEN COUNT(ActualRows) OVER () > 1 THEN LPAD(ActualRows, MAX(LENGTH(ActualRows)) OVER(), ' ') || ' | ' ELSE '' END || LPAD(cost, MAX(LENGTH(cost)) OVER(), ' ') || ' | ' || LPAD(pcost, MAX(LENGTH(pcost)) OVER(), ' ') AS VARCHAR(100)) EXPLAINPLAN , path FROM ( SELECT 'ID' ID , 'Operation' Operation , 'Rows' Rows , 'ActualRows' ActualRows , 'Cost' Cost , 'PCost' PCost , '0' Path FROM SYSIBM.SYSDUMMY1 UNION SELECT CAST(tree.operator_id as VARCHAR(254)) ID , CAST(LPAD(' ', tree.level, ' ') || CASE WHEN tree.cycle = 1 THEN '(cycle) ' ELSE '' END || COALESCE ( TRIM(O.Operator_Type) || COALESCE(' (' || argument || ')', '') || ' ' || COALESCE(S.Object_Name,'') , '' ) AS VARCHAR(254)) AS OPERATION , COALESCE(CAST(rows AS VARCHAR(254)), '') Rows , CAST(ActualRows as VARCHAR(254)) ActualRows , COALESCE(CAST(CAST(O.Total_Cost AS BIGINT) AS VARCHAR(254)), '') Cost , CAST(1000 as VARCHAR(254)) pcost -- , CASE WHEN O.Operator_Type <> 'RETURN' THEN COALESCE(CAST(CAST(O.Total_Cost*100/sum(case when O.Operator_Type <> 'RETURN' then O.Total_Cost else 0 end) over (partition by o.explain_time) AS BIGINT) AS VARCHAR(254)), '') ELSE ' ' END pCost , path FROM tree LEFT JOIN ( SELECT i.source_id , i.target_id , CAST(CAST(ROUND(o.stream_count) AS INTEGER) AS VARCHAR(12)) || ' of ' || CAST (total_rows AS VARCHAR(12)) || CASE WHEN total_rows > 0 AND ROUND(o.stream_count) <= total_rows THEN ' (' || LPAD(CAST (ROUND(ROUND(o.stream_count)/total_rows*100,2) AS NUMERIC(5,2)), 6, ' ') || '%)' ELSE '' END rows , CASE WHEN act.actual_value is not null then CAST(CAST(ROUND(act.actual_value) AS INTEGER) AS VARCHAR(12)) || ' of ' || CAST (total_rows AS VARCHAR(12)) || CASE WHEN total_rows > 0 THEN ' (' || LPAD(CAST (ROUND(ROUND(act.actual_value)/total_rows*100,2) AS NUMERIC(5,2)), 6, ' ') || '%)' ELSE NULL END END ActualRows , i.object_name , i.explain_time FROM (SELECT MAX(source_id) source_id , target_id , MIN(CAST(ROUND(stream_count,0) AS BIGINT)) total_rows , CAST(LISTAGG(object_name) AS VARCHAR(50)) object_name , explain_time FROM SYSTOOLS.EXPLAIN_STREAM WHERE explain_time = (SELECT MAX(explain_time) FROM SYSTOOLS.EXPLAIN_OPERATOR ) GROUP BY target_id, explain_time ) I LEFT JOIN SYSTOOLS.EXPLAIN_STREAM O ON ( I.target_id=o.source_id AND I.explain_time = o.explain_time ) LEFT JOIN SYSTOOLS.EXPLAIN_ACTUALS act ON ( act.operator_id = i.target_id AND act.explain_time = i.explain_time AND act.ACTUAL_TYPE like 'CARDINALITY%' ) ) s ON ( s.target_id = tree.operator_id AND s.explain_time = tree.explain_time ) LEFT JOIN SYSTOOLS.EXPLAIN_OPERATOR O ON ( o.operator_id = tree.operator_id AND o.explain_time = tree.explain_time ) --AQUI LEFT JOIN TABLE (SELECT LISTAGG (CASE argument_type WHEN 'UNIQUE' THEN CASE WHEN argument_value = 'TRUE' THEN 'UNIQUE' ELSE NULL END WHEN 'TRUNCSRT' THEN CASE WHEN argument_value = 'TRUE' THEN 'TOP-N' ELSE NULL END WHEN 'SCANDIR' THEN CASE WHEN argument_value != 'FORWARD' THEN argument_value ELSE NULL END ELSE argument_value END , ' ') argument , operator_id , explain_time FROM SYSTOOLS.EXPLAIN_ARGUMENT EA --SELECT * FROM SYSCAT.INDEXES WHERE TABNAME = 'EXPLAIN_ARGUMENT' WHERE argument_type IN ('AGGMODE' ,'UNIQUE', 'TRUNCSRT' ,'SCANDIR' ,'OUTERJN' ,'OPT_PROF') AND ea.operator_id = tree.operator_id AND ea.explain_time = tree.explain_time GROUP BY explain_time, operator_id ) A ON ( a.operator_id = tree.operator_id AND a.explain_time = tree.explain_time ) ) O UNION ALL SELECT '', 'A' FROM SYSIBM.SYSDUMMY1 UNION ALL SELECT '', 'Y' FROM SYSIBM.SYSDUMMY1 UNION ALL SELECT '', 'Q' FROM SYSIBM.SYSDUMMY1 UNION ALL SELECT 'Partition information','R' FROM SYSIBM.SYSDUMMY1 UNION ALL SELECT CASE argument_type WHEN 'DPESTFLG' THEN ' DPNUMPRT value is based on an estimate? '||argument_value WHEN 'DPLSTPRT' THEN ' Accessed data partitions: '||argument_value WHEN 'DPNUMPRT' THEN ' Actual or estimated number of data partitions accessed: '||argument_value ELSE argument_value END argument,'S' FROM (SELECT DISTINCT EXPLAIN_TIME FROM tree) a inner join SYSTOOLS.EXPLAIN_ARGUMENT b on a.explain_time = b.explain_time WHERE argument_type IN ('DPESTFLG' ,'DPLSTPRT' ,'DPNUMPRT') UNION ALL SELECT 'Predicate Information', 'AA' FROM SYSIBM.SYSDUMMY1 UNION ALL SELECT CAST (LPAD(CASE WHEN operator_id = LAG (operator_id) OVER (PARTITION BY operator_id ORDER BY pred_order) THEN '' ELSE operator_id || ' - ' END , MAX(LENGTH(operator_id )+4) OVER() , ' ') || how_applied || ' ' || predicate_text || CASE WHEN how_applied like '%DPSTART%' or how_applied like '%DPSTOP%' THEN 'Parti��o: '||TO_CHAR(RANGE_NUM) ELSE '' END AS VARCHAR(200)) PredicateInformation , 'P' || LPAD(id_order, 5, '0') || pred_order path FROM (SELECT CAST(operator_id AS VARCHAR(254)) operator_id , LPAD(trim(how_applied) , MAX (LENGTH(TRIM(how_applied))) OVER (PARTITION BY operator_id) , ' ' ) how_applied , CAST(substr(predicate_text, 1, 80) AS VARCHAR(80)) predicate_text , CASE how_applied WHEN 'START' THEN '1' WHEN 'STOP' THEN '2' WHEN 'SARG' THEN '3' ELSE '9' END pred_order , operator_id id_order , RANGE_NUM FROM systools.explain_predicate p WHERE explain_time = (SELECT MAX(explain_time) FROM systools.explain_operator) ) ) ORDER BY path);