Chapter 2 - Query Processing and Optimization


Chapter Outline

• Introduction to Query Processing.
• Translating SQL Queries into Relational Algebra.
• Algorithms for External Sorting.
• Basic Algorithms for Executing Query Operations.
• Using Heuristics in Query Optimization.
• Using Selectivity and Cost Estimates in Query Optimization.
• Semantic Query Optimization.

Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide 15- 1 Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe 2

Introduction (1)

• This chapter discusses the techniques used internally by a DBMS to process, optimize, and execute high-level queries.
• A query expressed in a high-level query language such as SQL must first be scanned, parsed, and validated.
• The scanner identifies the query tokens, such as SQL keywords, attribute names, and relation names, that appear in the text of the query.
• The parser checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language.
• The query must also be validated by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried.

Introduction (2)

• Then an internal representation of the query is created, usually as a tree data structure called a query tree.
• It is also possible to represent the query using a graph data structure called a query graph. The DBMS must then devise an execution strategy or query plan for retrieving the results of the query from the database files.
• A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.
• Query processing: the process of translating high-level queries (SQL) into low-level expressions.
• Query-processing steps: parsing, translation, optimization, and execution of the query.
Introduction (3)

• The query optimizer module has the task of producing a good execution plan, and the code generator generates the code to execute that plan.
• Figure 2.1 shows the different steps of processing a high-level query.

Introduction (4)

[Figure 2.1: steps of processing a high-level query; figure not reproduced.]


Introduction (5)

Schema diagram for the company relational database (figure not reproduced).

Translating SQL Queries into Relational Algebra (1)

• Query decomposition: before translation, the SQL query is decomposed into query blocks.
• Query block: the basic unit that can be translated into the algebraic operators and then optimized.
• A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block.
• Nested queries within a query are identified as separate query blocks.
• Because SQL includes aggregate operators such as MAX, MIN, SUM, and COUNT, these operators must also be included in the extended algebra, as the example that follows shows.

Translating SQL Queries into Relational Algebra (2)

• Consider the following SQL query on the EMPLOYEE relation in the figure above:

SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5);

• This query retrieves the names of employees (from any department in the company) who earn a salary that is greater than the highest salary in department 5. The query includes a nested subquery and hence would be decomposed into two blocks.

Translating SQL Queries into Relational Algebra (3)

Query Blocks Example
• The inner block is: SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5
• This retrieves the highest salary in department 5.
• The outer block is: SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > C
• Here C represents the result returned from the inner block.
• The blocks are then translated into relational algebra.

Translating SQL Queries into Relational Algebra (4)

The complete query:

SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5);

Outer block:

SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > C

π LNAME, FNAME (σ SALARY > C (EMPLOYEE))

Inner block:

SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5

ℱ MAX SALARY (σ DNO = 5 (EMPLOYEE))

• Then the query optimizer would choose an execution plan for each query block.

Notation for Query Trees (1)

• After that, internally, the relational algebra expression can be represented as either a query tree or a query graph data structure and then optimized.
• Query trees and query graphs are two internal representations of a query.
• Query tree: a tree data structure that corresponds to a relational algebra expression.
• It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes.
• An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.
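Returning to the two query blocks of the department 5 example, the following is a minimal Python sketch of how the inner block can be evaluated first and its result C passed to the outer block. The rows in EMPLOYEE and the function names inner_block and outer_block are made up for illustration; they are not part of the slides.

```python
# Toy EMPLOYEE relation; the rows are invented illustration data.
EMPLOYEE = [
    {"LNAME": "Smith", "FNAME": "John", "SALARY": 30000, "DNO": 5},
    {"LNAME": "Wong", "FNAME": "Franklin", "SALARY": 40000, "DNO": 5},
    {"LNAME": "Borg", "FNAME": "James", "SALARY": 55000, "DNO": 1},
    {"LNAME": "Zelaya", "FNAME": "Alicia", "SALARY": 25000, "DNO": 4},
]


def inner_block(employee):
    """Inner block: aggregate MAX SALARY over the selection DNO = 5."""
    return max(row["SALARY"] for row in employee if row["DNO"] == 5)


def outer_block(employee, c):
    """Outer block: project LNAME, FNAME over the selection SALARY > C."""
    return [(row["LNAME"], row["FNAME"]) for row in employee if row["SALARY"] > c]


# The inner block runs once; its scalar result plays the role of C.
c = inner_block(EMPLOYEE)
result = outer_block(EMPLOYEE, c)
print(c, result)
```

Because the subquery does not depend on the outer query, it only needs to be evaluated once, which is what makes the two blocks independently optimizable.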

Notation for Query Trees (2)

• The order of execution of operations starts at the leaf nodes, which represent the input database relations for the query, and ends at the root node, which represents the final operation of the query.
• The execution terminates when the root node operation is executed and produces the result relation for the query.
• Figure (a) below shows a query tree for the query "For every project located in ‘Stafford’, retrieve the project number, the controlling department number, and the department manager’s last name, address, and birthdate." The tree corresponds to the relational algebra expression for that query.

Notation for Query Trees (3)

[Figure (a): query tree for the ‘Stafford’ query; figure not reproduced.]
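The leaf-to-root execution order can be sketched in Python as a recursive evaluation of a small tree. This is only an illustration under assumptions: the Node class, the execute function, the attribute names (PNUMBER, PLOCATION, DNUM, DNUMBER, MGRSSN, SSN) and the sample rows are not given in the slides, and the tree shape (select on PROJECT, then two joins, then a projection) is one plausible form of the ‘Stafford’ query.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    op: str                          # "relation", "select", "join" or "project"
    children: tuple = ()
    rows: Optional[list] = None      # only used by leaf ("relation") nodes
    pred: Optional[Callable] = None  # selection or join condition
    attrs: tuple = ()                # projection list


def execute(node):
    """Evaluate operands first, then the node itself (leaves before root)."""
    if node.op == "relation":
        return node.rows
    inputs = [execute(child) for child in node.children]
    if node.op == "select":
        return [r for r in inputs[0] if node.pred(r)]
    if node.op == "join":
        left, right = inputs
        return [{**l, **r} for l in left for r in right if node.pred(l, r)]
    if node.op == "project":
        return [{a: r[a] for a in node.attrs} for r in inputs[0]]
    raise ValueError(f"unknown operation: {node.op}")


# Leaf nodes P, D, E with invented sample rows.
P = Node("relation", rows=[
    {"PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNUMBER": 1, "PLOCATION": "Bellaire", "DNUM": 5},
])
D = Node("relation", rows=[
    {"DNUMBER": 4, "MGRSSN": "987654321"},
    {"DNUMBER": 5, "MGRSSN": "333445555"},
])
E = Node("relation", rows=[
    {"SSN": "987654321", "LNAME": "Wallace", "ADDRESS": "291 Berry", "BDATE": "1941-06-20"},
    {"SSN": "333445555", "LNAME": "Wong", "ADDRESS": "638 Voss", "BDATE": "1955-12-08"},
])

n1 = Node("select", (P,), pred=lambda r: r["PLOCATION"] == "Stafford")
n2 = Node("join", (n1, D), pred=lambda l, r: l["DNUM"] == r["DNUMBER"])
n3 = Node("join", (n2, E), pred=lambda l, r: l["MGRSSN"] == r["SSN"])
root = Node("project", (n3,), attrs=("PNUMBER", "DNUM", "LNAME", "ADDRESS", "BDATE"))

result = execute(root)
print(result)
```

Each internal node is effectively replaced by the relation it produces, which is the execution model the slide describes.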


Notation for Query Trees (4)

• In Figure (a) above, the leaf nodes P, D, and E represent the three relations PROJECT, DEPARTMENT, and EMPLOYEE, respectively, and the internal tree nodes represent the relational algebra operations of the expression.
• When this query tree is executed, the node marked (1) in Figure (a) must begin execution before node (2) because some resulting tuples of operation (1) must be available before we can begin executing operation (2). Similarly, node (2) must begin executing and producing results before node (3) can start execution, and so on. A query tree therefore represents a specific order of operations.

Notation for Query Graphs (1)

• Another data structure for representing a query is the query graph.
• Relations in the query graph are represented by single circles. Constant values are represented by constant nodes, which are displayed as double circles or ovals.
• Selection and join conditions are represented by the graph edges, as shown in Figure (c) below.
• Finally, the attributes to be retrieved from each relation are displayed in square brackets above each relation.

Notation for Query Graphs (2)

• Disadvantage: the query graph representation does not indicate the order in which operations are to be performed.
• There is only a single graph corresponding to each query.
• Although some optimization techniques were based on query graphs, it is now generally accepted that query trees are preferable because, in practice, the query optimizer needs to show the order of operations for query execution, which is not possible in query graphs.

Algorithms for External Sorting (1)

• External sorting: sorting is one of the primary algorithms used in query processing.
• External sorting refers to sorting algorithms that are suitable for large files of records stored on disk that do not fit entirely in main memory, such as most database files.
• For example, whenever an SQL query specifies an ORDER BY clause, the query result must be sorted.

Algorithms for External Sorting (2)

• The sorting algorithm requires buffer space in main memory, where the actual sorting and merging of the runs is performed.
• It uses the sort-merge strategy.
• Sort-merge strategy: the algorithm starts by sorting small subfiles (runs) of the main file and then merges the sorted runs, creating larger sorted subfiles that are merged in turn.
• Sorting phase: nR = ⌈b / nB⌉
• Merging phase: dM = min(nB − 1, nR); nP = ⌈log_dM(nR)⌉
• nR: number of initial runs.
• b: number of file blocks.
• nB: available buffer space (in blocks).
• dM: degree of merging.
• nP: number of merge passes.

Algorithms for External Sorting (3)

• Sorting phase, for example: if the number of available main memory buffers is nB = 5 disk blocks and the size of the file is b = 1024 disk blocks, then nR = ⌈b / nB⌉ = 205 initial runs, each of size 5 blocks (except the last, which holds the remaining 4 blocks). Hence, after the sorting phase, 205 sorted runs (or 205 sorted subfiles of the original file) are stored as temporary subfiles on disk.
• In the merging phase, the sorted runs are merged during one or more merge passes.
• Merging phase: the degree of merging (dM) is the number of sorted subfiles that can be merged in each merge step.
• Hence, dM is the smaller of (nB − 1) and nR, and the number of merge passes is ⌈log_dM(nR)⌉.
• In our example where nB = 5, dM = 4 (four-way merging), so the 205 initial sorted runs would be merged 4 at a time in each step, giving 52 larger sorted subfiles at the end of the first merge pass.
Algorithms for External Sorting (4)

• These 52 sorted files are then merged 4 at a time into 13 sorted files, which are then merged into 4 sorted files, and then finally into 1 fully sorted file, which means that four passes are needed.
• The performance of the sort-merge algorithm can be measured in the number of disk block reads and writes (between the disk and main memory) before the sorting of the whole file is completed. See the example.

Algorithms for External Sorting (5) to (9)

[Example slides; figures not reproduced.]
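The run counts in the example above can be reproduced with a few lines of Python. This is a sketch, not the slides' own code: the function names are hypothetical, and the block-access estimate assumes that the sorting phase and every merge pass each read and write all b blocks once, which is an assumption added here rather than a formula stated on the slides.

```python
import math


def sort_merge_plan(b, n_b):
    """Return (nR, dM, nP, runs after each pass) for a file of b blocks
    sorted with n_b buffer blocks."""
    if n_b < 3:
        raise ValueError("merging needs at least 3 buffers (2 inputs, 1 output)")
    n_r = math.ceil(b / n_b)          # initial sorted runs
    d_m = min(n_b - 1, n_r)           # degree of merging
    runs = [n_r]
    # Count passes by repeated ceiling division instead of a floating-point log.
    while runs[-1] > 1:
        runs.append(math.ceil(runs[-1] / d_m))
    n_p = len(runs) - 1               # number of merge passes
    return n_r, d_m, n_p, runs


def estimated_block_accesses(b, n_p):
    """Assumed cost model: 2*b for the sorting phase plus 2*b per merge pass."""
    return 2 * b + 2 * b * n_p


n_r, d_m, n_p, runs = sort_merge_plan(1024, 5)
print(n_r, d_m, n_p, runs, estimated_block_accesses(1024, n_p))
```

For b = 1024 and nB = 5 the run sequence is 205, 52, 13, 4, 1, matching the four merge passes described in the text.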

Implementing the SELECT Operation (1)

• Examples: sample selection operations such as OP1, referenced below.

1. Search Methods for Simple Selection:
• S1 Linear search (brute force): retrieve every record in the file, and test whether its attribute values satisfy the selection condition.

Implementing the SELECT Operation (2)

• S2 Binary search: if the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search (which is more efficient than linear search) can be used. E.g., SSN is the ordering attribute of EMPLOYEE: σ SSN=‘12345’ (EMPLOYEE) in OP1.
• S3 Using a primary index: if the selection condition involves an equality comparison on a key attribute with a primary index, for example SSN = ‘12345’ in the previous example, use the primary index to retrieve the record. This condition retrieves only a single record.
• S4 Using a primary index to retrieve multiple records: if the comparison condition is >, <, ≥, or ≤ on a key field with a primary index, for example DNO < 5, use the index to find the record satisfying the corresponding equality condition (DNO = 5), then retrieve all the records that precede it in the ordered file. E.g., DNO is the primary attribute of EMPLOYEE, so the index is used: σ DNO < 5 (EMPLOYEE).
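Methods S1, S2 and the range retrieval of S4 can be contrasted with a small Python sketch. An in-memory sorted list stands in for a file ordered on the search attribute, and Python's standard bisect module stands in for the binary search or primary index lookup; the sample records and function names are invented for illustration.

```python
from bisect import bisect_left

# File ordered on SSN (first field); invented sample records.
by_ssn = [("11111", "Borg"), ("12345", "Smith"), ("33344", "Wong"), ("99988", "Zelaya")]
ssn_keys = [r[0] for r in by_ssn]

# File ordered on DNO (first field); invented sample records.
by_dno = [(1, "Headquarters"), (4, "Administration"), (5, "Research"), (7, "Sales")]
dno_keys = [r[0] for r in by_dno]


def linear_search(records, value):
    """S1: test every record; works on any file, ordered or not."""
    return [r for r in records if r[0] == value]


def binary_search(records, keys, value):
    """S2: equality on the ordering key; O(log n) probes instead of n."""
    i = bisect_left(keys, value)
    return records[i] if i < len(keys) and keys[i] == value else None


def records_below(records, keys, bound):
    """S4-style range: locate the boundary, then take all preceding records."""
    return records[:bisect_left(keys, bound)]


print(linear_search(by_ssn, "12345"))
print(binary_search(by_ssn, ssn_keys, "12345"))
print(records_below(by_dno, dno_keys, 5))
```

The binary and range variants only work because the list is sorted on the search attribute, which mirrors the requirement that the file be ordered on that attribute.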

Implementing the SELECT Operation (3)

Search Methods for Simple Selection (contd.):
• S5 Using a clustering index to retrieve multiple records: if the selection condition involves an equality comparison on a non-key attribute with a clustering index, for example Mark = 85, use the index to retrieve all the records that satisfy the selection condition: σ Mark = 85 (MARKS).
• Method S1 (linear search) applies to any file, but all the other methods depend on having the appropriate access path on the attribute used in the selection condition. Method S2 (binary search) requires the file to be sorted on the search attribute.
• The methods that use an index (S3, S4, and S5) are generally referred to as index searches, and they require the appropriate index to exist on the search attribute. Method S4 can be used to retrieve records in a certain range, for example 3000 <= Salary <= 3500. Queries involving such conditions are called range queries.

Implementing the SELECT Operation (4)

2. Search Methods for Complex Selection:
• S6 Conjunctive selection using a composite index: if two or more attributes are involved in equality conditions in the conjunctive condition and a composite index exists on the combined fields, for example if an index has been created on the composite key (Essn, Pno) of the WORKS_ON file, we can use the index directly.

Implementing the SELECT Operation (5)

• Whenever a single condition specifies the selection, such as OP1, OP2, or OP3, the DBMS can only check whether or not an access path exists on the attribute involved in that condition.
• If an access path (such as an index, a hash key, or a sorted file) exists, the method corresponding to that access path is used; otherwise, the brute force, linear search approach of method S1 can be used.
• Query optimization for a SELECT operation is needed mostly for conjunctive select conditions whenever more than one of the attributes involved in the conditions has an access path. The optimizer should choose the access path that retrieves the fewest records in the most efficient way by estimating the different costs and choosing the method with the least estimated cost.

Implementing the SELECT Operation (6)

• Disjunctive selection conditions: compared to a conjunctive selection condition, a disjunctive condition (where simple conditions are connected by the OR logical connective rather than by AND) is much harder to process and optimize. For example, consider a selection in which several simple conditions are connected by OR.
• With such a condition, little optimization can be done, because the records satisfying the disjunctive condition are the union of the records satisfying the individual conditions. Hence, if any one of the conditions does not have an access path, we are forced to use the brute force, linear search approach.

Implementing the JOIN Operation (1)

• The JOIN operation is one of the most time-consuming operations in query processing.
• The algorithms we discuss next are for a join operation of the form R ⋈ A=B S,
• where A and B are the join attributes, which should be domain-compatible attributes of R and S, respectively. The methods we discuss can be extended to more general forms of join. We illustrate three of the most common techniques for performing a join.

Implementing the JOIN Operation (2)

• J1 Nested-loop join: for each record t in R (outer loop), retrieve every record s from S (inner loop) and test whether the two records satisfy the join condition t[A] = s[B].
• J2 Single-loop join: if an index exists for one of the two join attributes, say attribute B of file S, retrieve each record t in R (loop over file R), and then use the access structure (such as an index or a hash key) to retrieve directly all matching records s from S that satisfy s[B] = t[A].
• J3 Sort-merge join: if the records of R and S are physically sorted by value of the join attributes A and B, respectively, we can implement the join in the most efficient way possible.

Implementing the JOIN Operation (3)

• Both files are scanned concurrently in order of the join attributes, matching the records that have the same values for A and B.
• If the files are not sorted, they may be sorted first by using external sorting.
• In this method, pairs of file blocks are copied into memory buffers in order, and the records of each file are scanned only once each for matching with the other file, unless both A and B are non-key attributes, in which case the method needs to be modified slightly.
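The three join methods J1, J2 and J3 can be compared side by side in a short Python sketch. Records are plain tuples, a and b are the positions of the join attributes A and B, and a dictionary stands in for the index on S in the single-loop join; all names and sample rows are illustrative assumptions. The sort-merge version handles duplicate join values on both sides, which is the "modified slightly" case mentioned above.

```python
from collections import defaultdict


def nested_loop_join(R, S, a, b):
    """J1: compare every record t of R with every record s of S."""
    return [(t, s) for t in R for s in S if t[a] == s[b]]


def single_loop_join(R, S, a, b):
    """J2: loop over R once and probe an access structure on S[b]."""
    index = defaultdict(list)        # stand-in for an existing index on S
    for s in S:
        index[s[b]].append(s)
    return [(t, s) for t in R for s in index.get(t[a], [])]


def sort_merge_join(R, S, a, b):
    """J3: sort both inputs on the join attribute, then scan them together."""
    R = sorted(R, key=lambda t: t[a])
    S = sorted(S, key=lambda s: s[b])
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][a] < S[j][b]:
            i += 1
        elif R[i][a] > S[j][b]:
            j += 1
        else:
            value = R[i][a]
            j_end = j
            while j_end < len(S) and S[j_end][b] == value:
                j_end += 1           # group of S records with this value
            while i < len(R) and R[i][a] == value:
                out.extend((R[i], s) for s in S[j:j_end])
                i += 1
            j = j_end
    return out


# Invented sample data: R = (LNAME, DNO), S = (DNUMBER, DNAME).
R = [("Smith", 5), ("Wong", 5), ("Borg", 1), ("Zelaya", 4), ("Hill", 9)]
S = [(1, "Headquarters"), (4, "Administration"), (5, "Research")]

j1 = nested_loop_join(R, S, 1, 0)
j2 = single_loop_join(R, S, 1, 0)
j3 = sort_merge_join(R, S, 1, 0)
print(sorted(j1) == sorted(j2) == sorted(j3), len(j1))
```

All three produce the same set of matching pairs; they differ only in how many record comparisons, and on disk how many block accesses, are needed to find them.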

Query Optimization

• Query optimization: the process of examining multiple query execution strategies and choosing a good execution plan that can be executed efficiently.
• It is the process of choosing a suitable execution strategy for processing a query.
• The decomposed query block of SQL is translated into an equivalent extended relational algebra expression and then optimized.
• There are two main techniques for implementing query optimization:
• The first technique is based on heuristic rules for ordering the operations in a query execution strategy. The second technique involves systematically estimating the cost of different execution strategies and choosing the execution plan with the lowest cost.
• Semantic query optimization is used in combination with the heuristic query transformation rules.
• It uses constraints specified on the database schema, such as unique attributes and other more complex constraints, in order to modify one query into another query that is more efficient to execute.

1. Heuristic Optimization of Query Trees (1)

1. Heuristic Optimization of Query Trees (2)

• In general, many different relational algebra expressions, and hence many different query trees, can be equivalent; that is, they can represent the same query.
• The query parser will generate an initial query tree to correspond to an SQL query, without doing any optimization.
• For example: for every project located in ‘Stafford’, retrieve the project number, the controlling department number, and the department manager’s last name, address, and birthdate.
• The corresponding relational algebra expression was shown as a figure (not reproduced).

1. Heuristic Optimization of Query Trees (3)

• The corresponding SQL query and the initial canonical query tree were shown as figures (not reproduced).


1. Heuristic Optimization of Query Trees (4)

• The CARTESIAN PRODUCT of the relations specified in the FROM clause is first applied. Then the selection and join conditions of the WHERE clause are applied, followed by the projection on the SELECT clause attributes.
• Such a canonical query tree represents a relational algebra expression that is very inefficient if executed directly, because of the CARTESIAN PRODUCT (×) operations.
• For example, if the PROJECT, DEPARTMENT, and EMPLOYEE relations had record sizes of 100, 50, and 150 bytes and contained 100, 20, and 5,000 tuples, respectively, the result of the CARTESIAN PRODUCT would contain 10 million tuples of record size 300 bytes each.
• Therefore, it is never executed. The heuristic query optimizer will transform this initial query tree into an equivalent final query tree that is efficient to execute.

1. Heuristic Optimization of Query Trees (5)

Example of transforming a query:
• Consider the following query Q on the database: find the last names of employees born after 1957 who work on a project named ‘Aquarius’. This query can be specified in SQL (the SQL text was shown as a figure and is not reproduced).
• The initial query tree for Q is shown in Figure (a) below. Executing this tree directly first creates a very large file containing the CARTESIAN PRODUCT of the entire EMPLOYEE, WORKS_ON, and PROJECT files.
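The size estimate for the CARTESIAN PRODUCT of PROJECT, DEPARTMENT, and EMPLOYEE given above is simple arithmetic, reproduced here in Python as a check. The dictionary layout and variable names are just one way to write it down.

```python
import math

# Record size in bytes and tuple count for each relation, as stated in the text.
relations = {
    "PROJECT": {"record_bytes": 100, "tuples": 100},
    "DEPARTMENT": {"record_bytes": 50, "tuples": 20},
    "EMPLOYEE": {"record_bytes": 150, "tuples": 5000},
}

# A Cartesian product pairs every tuple with every other, so counts multiply.
product_tuples = math.prod(r["tuples"] for r in relations.values())
# Each result tuple concatenates one record from each relation, so sizes add.
product_record_bytes = sum(r["record_bytes"] for r in relations.values())
product_total_bytes = product_tuples * product_record_bytes

print(product_tuples, product_record_bytes, product_total_bytes)
```

Ten million tuples of 300 bytes come to 3,000,000,000 bytes of intermediate data, produced from base relations that together hold only 761,000 bytes.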
1. Heuristic Optimization of Query Trees (6)

• That is why the initial query tree is never executed, but is transformed into another equivalent tree that is efficient to execute, using the following heuristic rules.

1. Heuristic Optimization of Query Trees (7)


1. Heuristic Optimization of Query Trees (12)

• This particular query needs only one record from the PROJECT relation, for the ‘Aquarius’ project, and only the EMPLOYEE records for those whose date of birth is after ‘1957-12-31’. An improved query tree first applies the SELECT operations to reduce the number of tuples that appear in the CARTESIAN PRODUCT.

1. Heuristic Optimization of Query Trees (13)

• A further improvement is achieved by switching the positions of the EMPLOYEE and PROJECT relations in the tree.
• This uses the information that Pnumber is a key attribute of the PROJECT relation, and hence the SELECT operation on the PROJECT relation will retrieve a single record only.
• We can further improve the query tree by replacing any CARTESIAN PRODUCT operation that is followed by a join condition with a JOIN operation, as shown.
• Another improvement is to keep only the attributes needed by subsequent operations in the intermediate relations, by including PROJECT (π) operations as early as possible in the query tree.
• This reduces the attributes (columns) of the intermediate relations, whereas the SELECT operations reduce the number of tuples (records).
• As the preceding example demonstrates, a query tree can be transformed step by step into an equivalent query tree that is more efficient to execute. However, we must make sure that the transformation steps always lead to an equivalent query tree.
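The effect of applying SELECT operations before the product can be illustrated with a small Python sketch of query Q. The relation contents, the tuple layouts and the function names are invented for illustration; the point is only that both strategies return the same answer while the second builds a much smaller intermediate result.

```python
from itertools import product

# Invented sample data.
EMPLOYEE = [("111", "Smith", "1965-01-09"),      # (SSN, LNAME, BDATE)
            ("222", "Wong", "1955-12-08"),
            ("333", "English", "1972-07-31")]
WORKS_ON = [("111", 1), ("222", 2), ("333", 1), ("333", 2)]   # (ESSN, PNO)
PROJECT = [(1, "Aquarius"), (2, "ProductX")]                  # (PNUMBER, PNAME)


def canonical_plan():
    """Cartesian product of all three relations first, then all conditions."""
    cart = list(product(EMPLOYEE, WORKS_ON, PROJECT))
    rows = [e[1] for e, w, p in cart
            if p[1] == "Aquarius" and p[0] == w[1] and w[0] == e[0]
            and e[2] > "1957-12-31"]       # ISO dates compare correctly as text
    return rows, len(cart)


def improved_plan():
    """Selections first, PROJECT side first, products replaced by joins."""
    proj = [p for p in PROJECT if p[1] == "Aquarius"]
    emp = [e for e in EMPLOYEE if e[2] > "1957-12-31"]
    pw = [(w, p) for p in proj for w in WORKS_ON if w[1] == p[0]]
    rows = [e[1] for w, p in pw for e in emp if e[0] == w[0]]
    return rows, len(pw)


canonical_rows, canonical_size = canonical_plan()
improved_rows, improved_size = improved_plan()
print(sorted(canonical_rows), canonical_size, improved_size)
```

With this toy data the canonical plan materializes 24 intermediate tuples and the improved plan 2, and the gap widens rapidly as the relations grow.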
2. Using Selectivity and Cost Estimates in Query Optimization (1)

• A query optimizer does not depend solely on heuristic rules; it also estimates and compares the costs of executing a query using different execution strategies and algorithms, and it then chooses the strategy with the lowest cost estimate.
• The method of optimizing a query by choosing the strategy that results in the minimum cost is called cost-based query optimization. Cost-based query optimization uses formulas that estimate the cost of a number of options and selects the one with the lowest cost, which is the most efficient to execute.
• The cost functions used in query optimization are estimates and not exact cost functions. The cost of an operation is heavily dependent on its selectivity.
• In general, different algorithms are suitable for low- or high-selectivity queries. In order for the query optimizer to choose a suitable algorithm for an operation, an estimate of the cost of executing that algorithm must be provided.

2. Using Selectivity and Cost Estimates in Query Optimization (2)

• Cost Components of Query Execution

2. Using Selectivity and Cost Estimates in Query Optimization (3)

1. Access Cost to Secondary Storage:
• This is the cost of transferring (reading and writing) data blocks between secondary disk storage and main memory buffers. The cost of searching for tuples in the database relations depends on the type of access structures on that relation, such as ordering, hashing, and primary or secondary indexes.

2. Storage Cost:
• The storage cost is the cost of storing any intermediate relations that are generated by the execution strategy for the query.

3. Computation Cost:
• Computation cost is the cost of performing in-memory operations on the data buffers during query execution. Such operations include searching for and sorting records, merging records for a join, and performing computations on field values. This is also known as CPU cost.

2. Using Selectivity and Cost Estimates in Query Optimization (4)

4. Memory Usage Cost: the cost relating to the number of memory buffers needed during query execution.
5. Communication Cost: the cost of transferring the query and its results from the database site to the site or terminal where the query originated.
2. Using Selectivity and Cost Estimates in Query Optimization (5)

• Example of cost estimation for the SELECT operation:
• Let us consider the relation EMPLOYEE having the following attributes:
EMPLOYEE (EMP-ID, DEPT-ID, POSITION, SALARY)
• Let us consider the following assumptions:
• There is a hash index on the primary key attribute EMP-ID.
• There is a clustering index on the foreign key attribute DEPT-ID.
• Let us also assume that the EMPLOYEE relation has the following statistics in the system catalog:

2. Using Selectivity and Cost Estimates in Query Optimization (6)

nTuples(EMPLOYEE) = 6,000
bFactor(EMPLOYEE) = 60
nBlocks(EMPLOYEE) = nTuples(EMPLOYEE) / bFactor(EMPLOYEE) = 6,000 / 60 = 100
nDistinct DEPT-ID (EMPLOYEE) = 1,000
SC DEPT-ID (EMPLOYEE) = nTuples(EMPLOYEE) / nDistinct DEPT-ID (EMPLOYEE) = 6,000 / 1,000 = 6
nDistinct POSITION (EMPLOYEE) = 20
SC POSITION (EMPLOYEE) = nTuples(EMPLOYEE) / nDistinct POSITION (EMPLOYEE) = 6,000 / 20 = 300

2. Using Selectivity and Cost Estimates in Query Optimization (7)

• Let us consider the following SELECT operations, Selection-1 to Selection-5 (the selection expressions were shown as figures and are not reproduced).

2. Using Selectivity and Cost Estimates in Query Optimization (8)

Selection-1: The selection operation contains an equality condition on the primary key EMP-ID of the relation EMPLOYEE. Therefore, as the attribute EMP-ID is hashed, we can use strategy 3 to estimate the cost as 1 block. The estimated cardinality of the result relation is SC EMP-ID (EMPLOYEE) = 1.

Selection-2: The attribute in the predicate is a non-key, non-indexed attribute. Therefore, we cannot improve on the linear search method, giving an estimated cost of 100 blocks. The estimated cardinality of the result relation is SC POSITION (EMPLOYEE) = 300.

Selection-3: The attribute in the predicate is a foreign key with a clustering index. Therefore, we can use strategy 7 to estimate the cost as 2 + ⌈6/30⌉ = 3 blocks. The estimated cardinality of the result relation is SC DEPT-ID (EMPLOYEE) = 6.
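The catalog statistics and the first three cost estimates can be recomputed with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, selection cardinality is taken as nTuples divided by the number of distinct values (uniform distribution), and the constants 2 and 30 in the Selection-3 cost are copied from the slide's formula rather than derived here.

```python
import math

n_tuples = 6000
b_factor = 60
n_blocks = math.ceil(n_tuples / b_factor)


def selection_cardinality(n_tuples, n_distinct):
    """Average number of tuples matching one value, assuming uniform spread."""
    return math.ceil(n_tuples / n_distinct)


sc_dept_id = selection_cardinality(n_tuples, 1000)
sc_position = selection_cardinality(n_tuples, 20)

# Selection-1: equality on the hashed primary key, one block access.
cost_selection_1 = 1
# Selection-2: no usable access path, so scan every block of the file.
cost_selection_2 = n_blocks
# Selection-3: clustering index; the 2 and the divisor 30 are the slide's figures.
cost_selection_3 = 2 + math.ceil(sc_dept_id / 30)

print(n_blocks, sc_dept_id, sc_position,
      cost_selection_1, cost_selection_2, cost_selection_3)
```

The comparison the optimizer makes is between such per-strategy estimates, with the full scan of 100 blocks serving as the baseline any index-based strategy has to beat.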

2. Using Selectivity and Cost Estimates in Query Optimization (9)

Selection-4: The predicate here involves a range search on the SALARY attribute, which has a B+-tree index. Therefore, we can use strategy 6 to estimate the cost as 2 + (50/2) + (6,000/2) = 3,027 blocks. Because this is far more than the 100 blocks of a full scan, the linear search strategy is used in this case. The estimated cardinality of the result relation is SC SALARY (EMPLOYEE) = [6,000 × (8,000 − 2,000 × 2) / (8,000 − 2,000)] = 4,000.

Selection-5: While we are retrieving each tuple using the clustering index, we can check whether it satisfies the first condition (POSITION = ‘MANAGER’). We know that the estimated cardinality of the second condition is SC DEPT-ID (EMPLOYEE) = 6. Let us call this intermediate relation S. Then the number of distinct values of POSITION in S can be estimated as ⌈(6 + 20)/3⌉ = 9. Let us now apply the second condition using the clustering index on DEPT-ID, which has an estimated cost of 3 blocks. Thus, the estimated cardinality of the result relation will be SC POSITION (S) = ⌈6/9⌉ = 1, which would be correct if there is one manager for each branch.

Semantic Query Optimization

• Semantic query optimization: uses constraints specified on the database schema in order to modify one query into another query that is more efficient to execute.
• Consider the following SQL query:

SELECT E.LNAME, M.LNAME
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.SUPERSSN=M.SSN AND E.SALARY>M.SALARY

• Explanation: suppose that we had a constraint on the database schema that stated that no employee can earn more than his or her direct supervisor. If the semantic query optimizer checks for the existence of this constraint, it need not execute the query at all because it knows that the result of the query will be empty.
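The supervisor salary example can be sketched in Python as a check that runs before execution. This is a deliberately simplified illustration: predicates and constraints are represented as (left, operator, right) triples, the constraint is assumed to apply to the same E and M pairing as the query's join, and the names NEGATION, contradicts and run_query are hypothetical. A real semantic optimizer would need far more general reasoning.

```python
# Each operator mapped to the operator that expresses its logical negation.
NEGATION = {"<=": ">", ">": "<=", "<": ">=", ">=": "<", "=": "<>", "<>": "="}


def contradicts(predicate, constraint):
    """True if the predicate is exactly the negation of a schema constraint."""
    left, op, right = constraint
    return predicate == (left, NEGATION[op], right)


def run_query(predicates, constraints, execute):
    """Skip execution when a predicate can never hold under the constraints."""
    for p in predicates:
        for c in constraints:
            if contradicts(p, c):
                return []            # result is provably empty
    return execute()


# Constraint: no employee earns more than his or her direct supervisor.
constraints = [("E.SALARY", "<=", "M.SALARY")]
predicates = [("E.SUPERSSN", "=", "M.SSN"), ("E.SALARY", ">", "M.SALARY")]

calls = []


def execute():
    """Stand-in for the real query execution; records that it was called."""
    calls.append("executed")
    return [("Smith", "Wong")]


result = run_query(predicates, constraints, execute)
print(result, calls)
```

Here the second predicate contradicts the constraint, so the function returns an empty result without ever calling the stand-in executor.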

Query Evaluation Plan

• Query evaluation plan: used to specify how to evaluate a query.
• A given query can have different execution plans.
• The responsibility of the query optimizer is to generate the least-cost plan.
• Finally, the best query evaluation plan is submitted to the query evaluation engine for actual execution.
