DBMS Module-2-Notes - Normalization


DataBase Management System

Module – II
Normalization (Part-3)

• Functional Dependencies
• Normal Forms Based on Primary Keys
• General Definitions of 2nd & 3rd Normal Forms
• Boyce-Codd Normal Form

Functional Dependencies

➢ A functional dependency is a constraint between two sets of attributes from the database.

Definition

➢ A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are
subsets of R specifies a constraint on the possible tuples for every state r of relation R.
➢ The constraint is that any two tuples t1 and t2 in r that have t1[X] = t2[X] must also have
t1[Y] = t2[Y].
➢ We say that there is a functional dependency from X to Y; equivalently, the value of Y is
determined by the value of X, X uniquely determines the value of Y, or Y is functionally dependent on X.
➢ The abbreviation for functional dependency is FD or f.d.
➢ The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand
side of the FD.
➢ If a constraint on R states that there cannot be more than one tuple with a given X value in
any relation instance r(R) (that is, X is a superkey of R, as every candidate key is), this implies
that X → Y is true for any subset of attributes Y of R.
➢ If X → Y holds in R, this does not say whether or not Y → X holds in R.
➢ A functional dependency is a property of the relation schema R, not of a particular state r of R.
➢ Thus a functional dependency must hold in all possible states of the relation.
➢ Relation extensions that satisfy the functional dependencies are called legal relation states
(or legal extensions).
➢ An FD cannot be inferred automatically from a given relation state; it must be defined
for the relation schema by someone who knows the meaning of the attributes.
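
The tuple-pair condition in the definition can be checked mechanically against one relation state. The following is a minimal Python sketch; the function name satisfies_fd and the sample rows are illustrative assumptions, not part of these notes. A state that violates X → Y shows that the FD does not hold, whereas a state that satisfies it proves nothing about the schema, which is why an FD cannot be inferred from a single state.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if no two tuples in rows agree on lhs but differ on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X-value, different Y-value: FD violated
    return True


# Hypothetical relation state, invented for illustration only.
state = [
    {"SSN": "111", "ENAME": "Smith", "DNUMBER": 5},
    {"SSN": "222", "ENAME": "Wong", "DNUMBER": 5},
    {"SSN": "333", "ENAME": "Smith", "DNUMBER": 4},
]

print(satisfies_fd(state, ["SSN"], ["ENAME"]))      # True
print(satisfies_fd(state, ["ENAME"], ["DNUMBER"]))  # False
# ENAME -> SSN happens to hold in the first two tuples only by coincidence.
print(satisfies_fd(state[:2], ["ENAME"], ["SSN"]))  # True
print(satisfies_fd(state, ["ENAME"], ["SSN"]))      # False
```

The last two calls show the point made above: the smaller state satisfies ENAME → SSN by accident, and a later tuple refutes it.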

Dr. Aparna K, Assoc. Prof, Dept. of MCA, BMSIT&M Page 1



Functional Dependencies: Example

Inference Rules for Functional Dependencies

➢ The designer specifies the functional dependencies that are semantically obvious.
➢ Formally, the set of all dependencies that includes F together with every dependency that can be
inferred from F (the given set of dependencies) is called the closure of F, denoted by F+.

F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}

➢ We can infer the following additional functional dependencies from F:

SSN → {DNAME, DMGRSSN}
SSN → SSN
DNUMBER → DNAME

➢ A set of inference rules can be applied to infer new dependencies from a given set of
dependencies.
➢ F ⊨ X → Y denotes that the functional dependency X → Y is inferred from the set of given
functional dependencies F.
➢ Armstrong's inference rules are a sound and complete set of inference rules and can be applied
to obtain the closure of a set of functional dependencies.

Armstrong's inference rules:

IR1. (Reflexive rule) If Y is a subset of X, then X → Y


IR2. (Augmentation rule) If X → Y, then XZ → YZ, or X → Y ⊨ XZ → YZ

IR3. (Transitive rule) If X → Y and Y → Z, then X → Z, or {X → Y, Y → Z} ⊨ X → Z

IR1, IR2, and IR3 form a sound and complete set of inference rules.

➢ The reflexive rule (IR1) states that a set of attributes always determines itself or any of its
subsets.
➢ Dependencies generated by IR1 are always true; they are known as trivial dependencies.
➢ The augmentation rule (IR2) states that adding the same set of attributes to both the left-
and right-hand sides of a dependency results in another valid dependency.
➢ The transitive rule (IR3) states that functional dependencies are transitive.

Some additional useful inference rules:

IR4. (Decomposition) If X → YZ, then X → Y and X → Z
IR5. (Union) If X → Y and X → Z, then X → YZ
IR6. (Pseudotransitivity) If X → Y and WY → Z, then WX → Z

➢ The last three inference rules, as well as any other inference rules, can be deduced from
IR1, IR2, and IR3 (completeness property).
➢ The decomposition rule (IR4) says that attributes can be removed from the right-hand side;
applying the rule repeatedly decomposes an FD into a set of FDs with single-attribute right-hand sides.
➢ The union rule (IR5) is the opposite of IR4: a set of FDs with the same left-hand side can be
combined into a single FD.
➢ Pseudotransitivity (IR6) is similar to the transitive rule, with the extra attributes W carried
along on the left-hand side.

Derivation of IR4 (Decomposition)

1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 & 2)

X → Z follows in the same way, using YZ → Z in step 2.

Derivation of IR5 (Union)

1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1, augmenting with X)
4. XY → YZ (using IR2 on 2, augmenting with Y)
5. X → YZ (using IR3 on 3 & 4)

Derivation of IR6 (Pseudotransitivity)

1. X → Y (given)
2. WY → Z (given)
3. WX → WY (using IR2 on 1, augmenting with W)
4. WX → Z (using IR3 on 3 & 2)

Inference Rules for Functional Dependencies (Example)

Contracts (contractid, supplierid, projectid, deptid, partid, qty, value)

The schema for Contracts is denoted CSJDPQV. The meaning of a tuple in this relation is that the
contract with contractid C is an agreement that supplier S (supplierid) will supply Q items of part
P (partid) to project J (projectid) associated with department D (deptid); the value V of this
contract is equal to value.

The following FDs are known to hold:


➢ The contract id C is a key: C → CSJDPQV.
➢ A project purchases a given part using a single contract: JP → C.
➢ A department purchases at most one part from a supplier: SD → P.

Several additional FDs hold in the closure of the set of given FDs:

➢ From JP → C, C → CSJDPQV and transitivity, we infer JP → CSJDPQV.
➢ From SD → P and augmentation, we infer SDJ → JP.
➢ From SDJ → JP, JP → CSJDPQV and transitivity, we infer SDJ → CSJDPQV.
➢ We can infer several additional FDs that are in the closure by using augmentation or
decomposition.
➢ For example, from C → CSJDPQV, using decomposition we can infer C → C, C → S, C → J,
C → D, etc.
➢ Finally, we have a number of trivial FDs from the reflexivity rule.
➢ The closure of a set F of FDs is the set F+ that includes F as well as all dependencies that can be
inferred from F.
➢ The closure of a set of attributes X with respect to F is the set X+ of all attributes that are
functionally determined by X.
➢ X+ can be calculated by repeatedly applying IR1, IR2, and IR3 using the FDs in F.


Given set of functional dependencies

➢ SSN → ENAME
➢ PNUMBER → {PNAME, PLOCATION}
➢ {SSN, PNUMBER} → HOURS

Closure sets with respect to F

➢ {SSN}+ = {SSN, ENAME}
➢ {PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}
➢ {SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}

Closure of a Set of Attributes

➢ Consider the relation schema R(A, B, C, D) with functional dependencies {A} → {C} and
{B} → {D}.
➢ {A}+ = {A, C}
➢ {B}+ = {B, D}
➢ {C}+ = {C}
➢ {D}+ = {D}
➢ {A, B}+ = {A, B, C, D}
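
The closures listed above can be reproduced with the usual fixed-point computation: start from X and keep adding the right-hand side of every FD whose left-hand side is already contained in the result, until nothing changes. This is a minimal Python sketch; the names closure and implies, and the representation of an FD as a pair of sets, are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return X+ for the attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left-hand side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs can be inferred from fds (rhs is inside lhs+)."""
    return set(rhs) <= closure(lhs, fds)


# FDs of the first example above.
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
# FDs of R(A, B, C, D).
G = [({"A"}, {"C"}), ({"B"}, {"D"})]

print(sorted(closure({"SSN"}, F)))     # ['ENAME', 'SSN']
print(sorted(closure({"A", "B"}, G)))  # ['A', 'B', 'C', 'D']
print(implies(F, {"SSN", "PNUMBER"}, {"PNAME"}))  # True
```

The helper implies relies on the fact that X → Y is in F+ exactly when Y is a subset of X+, so F+ itself never has to be enumerated.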

Normal Forms based on Primary Keys

➢ Assume that a set of functional dependencies is given for each relation and that each relation
has a designated primary key.
➢ This information, combined with the tests for normal forms, drives the normalization process for
relational schema design.
➢ For relational design, two approaches are followed:
▪ First perform a conceptual design (ER or EER model), then map it to a set of relations.
▪ Design the relations based on external knowledge (existing implementations,
reports, forms, etc.).
➢ In either case, the relations must then be evaluated for goodness and decomposed further
as needed to achieve higher normal forms using normalization theory.
➢ Normalization is carried out in practice so that the resulting designs are of high quality and
meet the desirable properties.
➢ The normalization process (proposed by Codd, 1972) takes a relation schema through a
series of tests to certify whether it satisfies a certain normal form.


Introduction to Normalization

Codd (1972) proposed three normal forms. Normalization of data can be looked upon
as a process of analyzing the given relation schemas based on their FDs and primary keys to
achieve the following desirable properties:

➢ Minimizing redundancy
➢ Minimizing the insertion, deletion, and update anomalies
➢ Unsatisfactory relation schemas that do not meet certain conditions (the normal form tests)
are decomposed into smaller relation schemas that meet the tests and hence possess
the desirable properties.
➢ Thus, the normalization procedure provides database designers with the following:
▪ A formal framework for analyzing relation schemas based on their keys and on the
functional dependencies among their attributes.
▪ A series of normal form tests that can be carried out on individual relation schemas so that
the relational database can be normalized to any desired degree.
➢ The normal form of a relation refers to the highest normal form condition that it meets, and
hence indicates the degree to which it has been normalized.
➢ Normal forms alone, however, do not guarantee a good database design.
➢ The process of normalization through decomposition must also confirm the existence of
two additional properties that the relational schemas, taken together, should possess:
▪ The lossless join or nonadditive join property: It guarantees that the spurious
tuple generation problem does not occur with respect to the relation schemas
created after decomposition. It is extremely critical and must be achieved at any
cost.
▪ The dependency preservation property: It ensures that each functional
dependency is represented in some individual relation resulting after
decomposition. It is sometimes sacrificed for higher performance.

Practical Use of Normal Forms

➢ Database design in industry today pays particular attention to normalization only up to 3NF,
BCNF, or 4NF.
➢ Sometimes relations may be left in a lower normalization status, such as 2NF, for
performance reasons.
➢ The process of storing the join of higher normal form relations as a base relation, which is
in a lower normal form, is known as denormalization.


Definitions of keys and Attributes participating in keys

Superkey:

➢ A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the
property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

Key (minimal superkey):

➢ A key K is a superkey with the additional property that removal of any attribute from K will
cause K not to be a superkey any more.

Candidate key:

➢ If a relation schema has more than one key, each is called a candidate key.
➢ One of the candidate keys is designated as the primary key, and the others are called secondary
keys.
➢ Each relation schema must have a primary key.
➢ An attribute of relation schema R is called a prime attribute of R if it is a member of some
candidate key of R.
➢ An attribute is called nonprime if it is not a member of any candidate key.
➢ For example:
▪ SSN and PNUMBER are the prime attributes of the relation WORKS_ON, whereas its other
attributes are nonprime.
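
Candidate keys and prime attributes can be found by brute force from attribute closures. The sketch below is illustrative only: the function names are hypothetical, the attribute list is an assumed schema carrying the FDs SSN → ENAME, PNUMBER → {PNAME, PLOCATION}, and {SSN, PNUMBER} → HOURS used earlier in these notes, and the search is exponential in the number of attributes, so it suits only small textbook schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Enumerate minimal superkeys by trying attribute subsets in increasing size."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is a superkey but not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Assumed schema for illustration; the FDs are the ones listed earlier in the notes.
R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

keys = candidate_keys(R, F)
prime = set().union(*keys)   # attributes that belong to some candidate key
nonprime = R - prime

print([sorted(k) for k in keys])  # [['PNUMBER', 'SSN']]
print(sorted(prime))              # ['PNUMBER', 'SSN']
```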

First Normal Form


➢ It states that the domain of an attribute must include only atomic (simple, indivisible)
values.
➢ The value of any attribute in a tuple must be a single value from the domain of that
attribute.
➢ Hence, 1NF disallows having a set of values, a tuple of values, or a combination of both as
an attribute value for a single tuple.
➢ 1NF disallows "relations within relations".


A relation schema which is not in 1 NF

First Normal Form (first technique)

1. Remove the attribute DLOCATIONS that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key DNUMBER of DEPARTMENT.
➢ The primary key of relation DEPT_LOCATIONS is the combination {DNUMBER, DLOCATION}.
➢ This technique is generally considered the best of the three.
First Normal Form (first technique: best approach)
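
As a rough illustration of the first technique, the following Python sketch moves a multivalued attribute into its own relation. The helper name split_multivalued and the sample department rows are assumptions made for this example; relations are modelled simply as lists of dictionaries.

```python
def split_multivalued(rows, key, multi_attr, new_attr):
    """Split rows into a base relation and a (key, value) relation for multi_attr."""
    base, detail = [], []
    for row in rows:
        # Base relation: every attribute except the multivalued one.
        base.append({a: v for a, v in row.items() if a != multi_attr})
        # Detail relation: one tuple per value, carrying the primary key along.
        for value in row[multi_attr]:
            detail.append({key: row[key], new_attr: value})
    return base, detail


# Illustrative, made-up state of DEPARTMENT with a multivalued DLOCATIONS attribute.
department = [
    {"DNAME": "Research", "DNUMBER": 5, "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNAME": "Administration", "DNUMBER": 4, "DLOCATIONS": ["Stafford"]},
]

department_1nf, dept_locations = split_multivalued(
    department, "DNUMBER", "DLOCATIONS", "DLOCATION"
)

for row in dept_locations:
    print(row)  # one {DNUMBER, DLOCATION} tuple per department location
```

Every value in both resulting relations is atomic, and {DNUMBER, DLOCATION} identifies each tuple of dept_locations.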

First Normal Form (second technique)

2. Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each
location of a department.
➢ The primary key then becomes {DNUMBER, DLOCATION}, and redundancy is introduced.


1NF version of same relation with redundancy

First Normal Form (third technique)

3. If a maximum number of values is known for the attribute, replace the DLOCATIONS attribute by
that many atomic attributes.
▪ For example, if it is known that at most two locations can exist for a department, replace
DLOCATIONS by DLOCATION1 and DLOCATION2.
➢ This technique introduces NULL values for departments that have fewer locations.

1NF version of same relation with more nulls

First Normal Form

➢ The first technique is generally considered to be the best.
➢ First normal form also disallows multivalued attributes that are themselves composite,
which are called nested relations because each tuple can have a relation within it.
➢ To normalize a relation into 1NF, the nested relation attributes are moved into a new relation,
and the primary key of the original relation is propagated into it.

Normalizing nested relations into 1NF


Second Normal Form

➢ Second normal form (2NF) is based on the concept of full functional dependency.
➢ Definition: A relation schema R is in 2NF if every nonprime attribute A in R is fully
functionally dependent on the primary key.
➢ A functional dependency X → Y is a full functional dependency if removal of any attribute
A from X means that the dependency does not hold any more; that is, for any attribute A ∈ X,
(X − {A}) does not functionally determine Y.
➢ X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the
dependency still holds; that is, (X − {A}) → Y for some A ∈ X.


➢ {SSN, PNUMBER} → HOURS is a full FD, since neither SSN → HOURS nor PNUMBER → HOURS holds.
➢ {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency), since
SSN → ENAME holds.
➢ {SSN, PNUMBER} → {PNAME, PLOCATION} is not a full FD (it is called a partial dependency),
since PNUMBER → {PNAME, PLOCATION} holds.
➢ If the primary key contains a single attribute, the test need not be applied.
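
The 2NF test above amounts to checking whether any proper subset of the key already determines a nonprime attribute. A minimal Python sketch follows, under the assumption that the key and the nonprime attributes are supplied by the caller; the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """Return (subset, attribute) pairs where a proper subset of key determines attribute."""
    found = []
    key = sorted(key)
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            determined = closure(subset, fds)
            for attr in sorted(nonprime):
                if attr in determined:
                    found.append((set(subset), attr))
    return found


F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
key = {"SSN", "PNUMBER"}
nonprime = {"HOURS", "ENAME", "PNAME", "PLOCATION"}

for subset, attr in partial_dependencies(key, nonprime, F):
    print(sorted(subset), "->", attr)
```

HOURS never appears in the result, which matches the statement that {SSN, PNUMBER} → HOURS is a full FD; a single-attribute key has no proper non-empty subset, so the loop body never runs for it.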

Third normal form (3NF)

➢ Third normal form (3NF) is based on the concept of transitive dependency.
➢ A functional dependency X → Y in a relation schema R is a transitive dependency if there is a
set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z
and Z → Y hold.
➢ Definition: According to Codd's original definition, a relation schema R is in 3NF if it satisfies
2NF and no nonprime attribute of R is transitively dependent on the primary key.
➢ SSN → DMGRSSN is a transitive FD, since SSN → DNUMBER and DNUMBER → DMGRSSN hold
and DNUMBER is neither a key nor a subset of a key.
➢ SSN → ENAME is non-transitive, since no such intermediate set of attributes Z exists.


General Definitions of Second and Third Normal Forms

➢ The previous definitions consider the primary key only.
➢ The following more general definitions take into account relations with multiple candidate
keys.

General Definition of 2NF:


➢ A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is
fully functionally dependent on every key of R.
➢ Equivalently, R is in 2NF if no nonprime attribute A in R is partially dependent on any key of R.

Decomposing a relation into 2NF relations (Example)

General definition of Third Normal Form

➢ A superkey of relation schema R is a set of attributes S of R that contains a key of R.
➢ General Definition of 3NF: A relation schema R is in third normal form (3NF) if, whenever a
nontrivial functional dependency X → A holds in R, either
o X is a superkey of R, or
o A is a prime attribute of R
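
The general 3NF condition can be tested one FD at a time. The sketch below is a minimal Python illustration, not a definitive implementation: the function names are hypothetical, the candidate keys are supplied by the caller rather than computed, and only the FDs given in F are examined (with right-hand sides split into single attributes). The attribute list is an assumed schema for the set F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}} used earlier in these notes.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, fds, candidate_keys):
    """Return (X, A) pairs with X -> A nontrivial, X not a superkey and A nonprime."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attributes
        for attr in sorted(rhs - lhs):  # rhs - lhs drops the trivial part of the FD
            if not is_superkey and attr not in prime:
                violations.append((lhs, attr))
    return violations


attributes = {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

for lhs, attr in third_nf_violations(attributes, F, [{"SSN"}]):
    print(sorted(lhs), "->", attr, "violates 3NF")
```

Both reported violations have DNUMBER on the left-hand side, which is the intermediate attribute of the transitive dependency SSN → DNUMBER → DMGRSSN.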

Decomposing a relation into 3NF relations (Example)


Boyce-Codd Normal Form

➢ BCNF was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF.
➢ Every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF.
➢ A relation schema R is in BCNF if, whenever a nontrivial functional dependency X → A holds
in R, X is a superkey of R.

Decomposing a relation into BCNF relations (Example)


A relation TEACH that is in 3NF but not in BCNF

➢ Two FDs exist in the relation TEACH:
o fd1: {student, course} → instructor
o fd2: instructor → course
➢ {student, course} is a candidate key for this relation.
➢ This relation is in 3NF but not in BCNF: in fd2 the left-hand side instructor is not a superkey
(so BCNF fails), but the right-hand side course is a prime attribute (so 3NF holds).
➢ A relation that is not in BCNF should be decomposed so as to meet this property, while possibly
forgoing the preservation of all functional dependencies in the decomposed relations.
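
The BCNF condition for TEACH can be checked with attribute closures. This is a minimal Python sketch; the function names are hypothetical, and the set of prime attributes is taken from the candidate key {student, course} named above rather than computed.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs in fds whose left-hand side is not a superkey."""
    attributes = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


TEACH = {"student", "course", "instructor"}
F = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]

violations = bcnf_violations(TEACH, F)
print(violations)  # [({'instructor'}, {'course'})]

# 3NF still holds: every violating FD has only prime attributes on its right-hand side.
prime = {"student", "course"}  # members of the candidate key given in the notes
print(all(rhs - lhs <= prime for lhs, rhs in violations))  # True
```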


➢ Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

➢ All three decompositions lose fd1.
➢ We have to settle for sacrificing functional dependency preservation, but we cannot
sacrifice the nonadditive join property after decomposition.
➢ Out of the above three, only the 3rd decomposition will not generate spurious tuples after a
join (and hence has the nonadditive join property).
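
The claim about the three decompositions can be checked with the standard test for binary decompositions, which these notes do not state explicitly: a decomposition of R into R1 and R2 is nonadditive if the common attributes R1 ∩ R2 functionally determine either R1 − R2 or R2 − R1. The following Python sketch assumes that test; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary nonadditive-join test: (R1 & R2) determines R1 - R2 or R2 - R1."""
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


F = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

for number, (r1, r2) in enumerate(decompositions, start=1):
    print(number, is_lossless_binary(r1, r2, F))
# 1 False
# 2 False
# 3 True
```

Only the third decomposition passes, because its common attribute instructor determines course through fd2.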


Question Bank – Module –II

1. Define the following and give examples for each: a) Entity b) Relationship c) Role names
d) Recursive Relationship.
2. What is an attribute? Explain the different types of attributes, with suitable examples.
3. E – R Diagrams
4. Explain the following : a) Degree of relationship b) Multi valued attributes c) Derived
attributes d) Weak Entity
5. With a neat diagram, explain the main phases of the database design process.
6. Explain the following terms: a) Cardinality ratio b) Participation constraint
7. Explain Ternary relationship in detail with example.
8. Briefly explain the different notations used in an ER diagram.
9. Define the following with examples: a) Primary key b) Candidate key c) Composite key
d) Data Dictionary e) Schema f) Super Key g) Minimal Super key
10. Explain the different types of constraints in the Relational model with examples.
11. Explain the different update operations dealing with constraint violations.
12. Explain all the relational algebra operators along with their purpose, syntax and
examples of using them.
13. Explain the concept of Cartesian product with example
14. Explain the following integrity constraints: 1) key constraints 2) Entity integrity
constraints 3) Referential integrity constraints.
15. Write a note on different types of joins with examples.
16. Explain division operation with examples.
17. Explain the different aggregate and grouping functions along with “Script F” operator
with examples.
18. Explain the seven-step algorithm to convert the basic ER-model constructs into
relations, using suitable examples.
19. Questions on SQL queries / Relational algebra, given a relation database.
20. Define the term Functional Dependency, using an example.
21. List all Armstrong’s Inference axioms and prove them.


22. What is Normalization?


23. What are the two important properties that the relation schemas should possess after
normalization by decomposition? Explain.
24. Discuss the anomalies encountered in an unnormalized database, with examples.
25. Explain 1NF with suitable examples.
26. Explain 2NF with suitable examples
27. Explain 3NF with suitable examples.
28. Differentiate between 3NF and BCNF with suitable examples.
