NORMALIZATION


Relational Database Design

Relational database design: The grouping of attributes to form "good" relation schemas
Two levels of relation schemas:
The logical "user view" level
The storage "base relation" level
Criteria for "good" base relations:
Discuss informal guidelines for good relational design
Discuss formal concepts of functional dependencies and normal forms (1NF, 2NF, 3NF, BCNF)

There are two popular approaches to designing a database:
- Top-down design
- Bottom-up design
The ER modeling technique is called the top-down approach. It involves:
i) Identifying entities and their attributes
ii) Identifying the relationships between entities
iii) Drawing the ER diagram
iv) Mapping the diagram to tables

Normalization is the bottom-up approach. It is the step-by-step decomposition of complex records into simple records.
Normalization controls redundancy and removes inconsistencies and update anomalies.
Normalization is based on functional dependencies and primary keys.
Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
Normal form: A condition, stated in terms of the keys and FDs of a relation, that certifies whether a relation schema is in a particular normal form

Informal design guidelines for relation schemas


1) Semantics of the relation attributes
2) Reducing the redundant values in tuples
3) Reducing the null values in tuples
4) Disallowing the possibility of generating spurious tuples

Semantics of the Relation Attributes


GUIDELINE 1: Informally, each tuple in a relation
should represent one entity or relationship instance.
Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation
Only foreign keys should be used to refer to other
entities
Entity and relationship attributes should be kept
apart as much as possible.

Redundant Information in Tuples and Update Anomalies
GUIDELINE 2:
Mixing attributes of multiple entities may cause problems:
Information is stored redundantly, wasting storage
Problems with update anomalies
Insertion anomalies
Deletion anomalies
Modification anomalies

Insert Anomaly: Cannot insert a project unless an employee is assigned to it.
Conversely, cannot insert an employee unless he/she is assigned to a project.
Delete Anomaly: When a project is deleted, all the employees who work on that project are deleted as well. Alternatively, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.

Update Anomaly: Changing the name of project number P1 from "Billing" to "CustomerAccounting" may require this update to be made in the tuples of all 100 employees working on project P1.

GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, or update anomalies. If any are present, note them so that applications can be made to take them into account.

If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is next to impossible.
Update anomalies: If data items are scattered and not properly linked to each other, strange situations can arise. For example, when we try to update one data item whose copies are scattered over several places, a few instances get updated properly while others are left with old values. Such instances leave the database in an inconsistent state.
Deletion anomalies: We try to delete a record, but parts of it are left undeleted because, unknown to us, the data is also saved somewhere else.
Insert anomalies: We try to insert data into a record that does not exist at all.

Null Values in Tuples


GUIDELINE 3: Relations should be designed such
that their tuples will have as few NULL values as
possible
Reasons for nulls:
attribute not applicable or invalid
attribute value unknown (may exist)
value known to exist, but unavailable

Spurious Tuples
GUIDELINE 4: The relations should be designed to
satisfy the lossless join condition. No spurious tuples
should be generated by doing a natural-join of any
relations.
There are two important properties of decompositions:
(a) the non-additive (lossless) join property of the corresponding relations
(b) preservation of the functional dependencies.
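For a decomposition of R into two relations, property (a) can be tested with a standard rule: {R1, R2} is lossless if the common attributes R1 ∩ R2 functionally determine R1 − R2 or R2 − R1. The following is a minimal Python sketch, assuming FDs are given as (lhs, rhs) pairs of attribute sets; `closure` and `is_lossless_binary` are illustrative names, not from any library. It reuses the employee/department FDs from the 3NF example.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply the FD when its LHS is already derived and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


# FDs: eno -> ename, address, dno and dno -> dname
F = [({"eno"}, {"ename", "address", "dno"}), ({"dno"}, {"dname"})]
print(is_lossless_binary({"eno", "ename", "address", "dno"}, {"dno", "dname"}, F))  # True
print(is_lossless_binary({"eno", "ename"}, {"ename", "dno", "dname"}, F))  # False
```

In the second split the only shared attribute is ename, which determines nothing else, so joining the two pieces back can produce spurious tuples.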

Functional dependency
It is a constraint between two sets of attributes from the database.
An FD, denoted X -> Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R.
The constraint is that, for any two tuples t1 and t2 in r,
if t1[X] = t2[X], then t1[Y] = t2[Y].

R(X, Y)
      X     Y
t1    10    d1
t2    10    d1

There is an FD from X to Y, or Y is functionally dependent on X.
FD => functional dependency (f.d.)
X => L.H.S. (left-hand side)
Y => R.H.S. (right-hand side)
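The constraint above can be checked mechanically on a relation instance. The following is a minimal Python sketch, assuming each tuple is a dict from attribute name to value; `fd_holds` is a hypothetical helper. An instance can only show that an FD is violated. Whether an FD holds is a property of the schema's meaning, not of one particular state.

```python
def fd_holds(rows, lhs, rhs):
    """Return False as soon as two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in lhs)
        y_val = tuple(row[a] for a in rhs)
        # setdefault stores the first Y seen for this X and returns the stored one.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


r = [{"X": 10, "Y": "d1"}, {"X": 10, "Y": "d1"}]
print(fd_holds(r, ["X"], ["Y"]))  # True
r.append({"X": 10, "Y": "d2"})
print(fd_holds(r, ["X"], ["Y"]))  # False
```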

Types of functional dependency:
Full functional dependency
e.g., {Eno, Pno} -> Hours
Partial functional dependency
e.g., {Eno, Pno} -> Ename (Ename depends on Eno alone)
Transitive dependency
e.g., Eno -> Dno and Dno -> Dname,
so Eno -> Dname

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1. (Reflexive) If Y subset-of X, then X -> Y
IR2. (Augmentation) If X -> Y, then XZ -> YZ
(Notation: XZ stands for X U Z)
IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
Some additional inference rules that are useful:
IR4. (Decomposition) If X -> YZ, then X -> Y and X -> Z
IR5. (Union) If X -> Y and X -> Z, then X -> YZ
IR6. (Pseudotransitivity) If X -> Y and WY -> Z, then
WX -> Z
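In practice, these rules are applied through the attribute closure X+, the set of all attributes derivable from X. X -> Y follows from F exactly when Y is a subset of X+. The following is a minimal Python sketch, assuming single-letter attribute names so that a string such as "AB" stands for the set {A, B}; `closure` and `implies` are illustrative names.

```python
def closure(attrs, fds):
    """X+: every attribute derivable from attrs using the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # result -> lhs by reflexivity and lhs -> rhs, so result gains rhs (transitivity).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """X -> Y is in F+ exactly when Y is a subset of X+."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("CB", "D")]
print(sorted(closure("AC", F)))  # ['A', 'B', 'C', 'D']
print(implies(F, "AC", "D"))     # True: IR6 with X=A, Y=B, W=C, Z=D
print(implies(F, "A", "D"))      # False
```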

Trivial and Non-trivial FDs

Trivial: If a functional dependency (FD) X -> Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.
Non-trivial: If an FD X -> Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
Completely non-trivial: If an FD X -> Y holds, where X ∩ Y = ∅, it is said to be a completely non-trivial FD.
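The three cases depend only on how X and Y overlap. The following is a minimal Python sketch, assuming single-letter attribute names; `classify_fd` is a hypothetical helper. It returns the most specific label, because every completely non-trivial FD is also non-trivial.

```python
def classify_fd(lhs, rhs):
    """Label X -> Y as trivial, non-trivial, or completely non-trivial."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                 # Y is a subset of X
    if x & y:
        return "non-trivial"             # Y is not a subset of X, but they overlap
    return "completely non-trivial"      # X and Y are disjoint


print(classify_fd("AB", "A"))   # trivial
print(classify_fd("AB", "BC"))  # non-trivial
print(classify_fd("A", "B"))    # completely non-trivial
```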

Candidate key
If a relation schema has more than one key, each is called
a candidate key. One of the candidate keys is arbitrarily
designated to be the primary key, and the others are called
secondary keys.
Prime and Nonprime attributes
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
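For small schemas, candidate keys can be found by brute force. The approach tests attribute subsets in increasing size, keeps those whose closure covers the whole schema, and skips supersets of keys already found, because those are superkeys but not minimal. The following is a minimal Python sketch. The schema R(A, B, C) with A -> BC and B -> A is a hypothetical example, and `candidate_keys` is an illustrative name. The search is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key: a superkey, not a candidate key
            if closure(subset, fds) >= set(attrs):
                keys.append(subset)
    return keys


R = ["A", "B", "C"]
F = [({"A"}, {"B", "C"}), ({"B"}, {"A"})]
keys = candidate_keys(R, F)
prime = set().union(*keys)
print(keys)           # [{'A'}, {'B'}]
print(sorted(prime))  # ['A', 'B'], so C is nonprime
```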

Normalization of data is a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of
1) Minimizing redundancy
2) Minimizing the insertion, deletion and modification anomalies
Normal forms
1NF, 2NF, 3NF, BCNF (Boyce-Codd Normal Form), 4NF and 5NF

1NF: It is based on the primary key and atomic values. There must be no composite attributes, no multivalued attributes, and no relation within a relation (nested relation).

Composite attribute
(Eno, Address, Ename{Fname, Lname}): Ename is composed of Fname and Lname
In 1NF: (Eno, Address, Fname, Lname)

Multivalued attribute
(Dno, Dname, Dlocation): Dlocation is a multivalued attribute
In 1NF: (Dno, Dname) and (Dno, Dlocation)

Relation within relation
(Eno, Ename, Addr) containing a nested relation (Pno, Pname)
In 1NF: (Eno, Ename, Addr) and (Eno, Pno, Pname)
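The multivalued-attribute case can be sketched in code. Each (Dno, Dlocation) value pair becomes its own tuple, and the atomic attributes stay in the base relation. The following is a minimal Python sketch, assuming rows are dicts and the multivalued attribute is stored as a list. The department data and the name `flatten_multivalued` are hypothetical.

```python
def flatten_multivalued(rows, key, mv_attr):
    """Split rows with a list-valued attribute into two 1NF relations."""
    base, values = [], []
    for row in rows:
        # Keep every atomic attribute in the base relation.
        base.append({a: v for a, v in row.items() if a != mv_attr})
        # One tuple per (key, value) pair for the multivalued attribute.
        for v in row[mv_attr]:
            values.append({key: row[key], mv_attr: v})
    return base, values


dept = [
    {"Dno": 5, "Dname": "Research", "Dlocation": ["Bellaire", "Houston"]},
    {"Dno": 1, "Dname": "Headquarters", "Dlocation": ["Houston"]},
]
dept_base, dept_locations = flatten_multivalued(dept, "Dno", "Dlocation")
print(dept_base)  # [{'Dno': 5, 'Dname': 'Research'}, {'Dno': 1, 'Dname': 'Headquarters'}]
print(len(dept_locations))  # 3
```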

2NF: There is no partial dependency.
It is based on the concept of full functional dependency: every nonkey attribute should be fully dependent on the key.
An FD X -> Y is a full functional dependency if removing any attribute from X means the dependency no longer holds.
Def: A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.

Eg.
R={eno, pno, hours, ename, pname, plocation}
Given functional dependency
FD = {{eno,pno}-> hours,
eno->ename
pno->pname, plocation}
R1={eno,pno,hours}
R2 = {eno,ename}
R3={pno,pname,plocation}
Now, in each of R1, R2 and R3, every nonprime attribute is fully functionally dependent on the primary key, so all three relations are in 2NF.
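The partial dependencies in this example can be found mechanically. For each proper subset of the primary key, compute its closure and collect any nonprime attributes it determines. The following is a minimal Python sketch. It assumes the primary key is the only candidate key, so every other attribute is nonprime, and `partial_dependencies` is an illustrative name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(primary_key, attrs, fds):
    """(subset, A) pairs where a proper subset of the key determines nonprime A."""
    key = set(primary_key)
    nonprime = set(attrs) - key  # assumes the primary key is the only candidate key
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets of the key
        for part in combinations(sorted(key), size):
            for a in sorted(closure(part, fds) & nonprime):
                found.append((part, a))
    return found


R = ["eno", "pno", "hours", "ename", "pname", "plocation"]
F = [({"eno", "pno"}, {"hours"}), ({"eno"}, {"ename"}), ({"pno"}, {"pname", "plocation"})]
print(partial_dependencies(["eno", "pno"], R, F))
# [(('eno',), 'ename'), (('pno',), 'plocation'), (('pno',), 'pname')]
```

After the decomposition, R1 = {eno, pno, hours} with only {eno, pno} -> hours yields an empty list.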

3NF: It is based on the concept of transitive dependency.
Def: A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key.
Def: A relation schema R is in 3NF if, whenever a non-trivial FD X -> A holds in R, either
a) X is a superkey of R, or
b) A is a prime attribute of R

Eg:
R={eno, ename, address,dno,dname}
Given functional dependency
F = {eno -> ename,address,dno
dno -> dname}
R1={eno,ename,address,dno} R2 = {dno,dname}
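The second 3NF definition can be checked by brute force on a small schema. For every attribute set X that is not a superkey, any nonprime attribute in X+ − X is a violation. The following is a minimal Python sketch. The names are illustrative, and the approach is exponential, so it is intended only for small examples like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure covers attrs."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def violations_3nf(attrs, fds):
    """(X, A) with X -> A non-trivial, X not a superkey, and A nonprime."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x_plus = closure(combo, fds) & attrs
            if x_plus >= attrs:
                continue  # X is a superkey, so condition (a) holds
            for a in sorted(x_plus - set(combo) - prime):
                bad.append((combo, a))
    return bad


R = {"eno", "ename", "address", "dno", "dname"}
F = [({"eno"}, {"ename", "address", "dno"}), ({"dno"}, {"dname"})]
print((("dno",), "dname") in violations_3nf(R, F))  # True
R1, F1 = {"eno", "ename", "address", "dno"}, [({"eno"}, {"ename", "address", "dno"})]
print(violations_3nf(R1, F1))  # []
```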

BCNF (Boyce-Codd Normal Form)

Def:
A relation schema R is in BCNF if, whenever a non-trivial FD X -> A holds in R, X is a superkey of R.
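BCNF drops the "A is prime" escape clause of 3NF. Any attribute set that determines something outside itself without being a superkey is therefore a violation. The following is a minimal Python sketch on a hypothetical TEACH(student, course, instructor) schema that is not taken from this document. The function name is illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Every X that non-trivially determines something but is not a superkey."""
    attrs = set(attrs)
    bad = []
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x_plus = closure(combo, fds) & attrs
            if x_plus != set(combo) and not x_plus >= attrs:
                bad.append(combo)
    return bad


R = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(bcnf_violations(R, F))  # [('instructor',)]
```

Here, instructor -> course is allowed by 3NF because course is prime (it belongs to the candidate key {student, course}), but BCNF rejects it because instructor is not a superkey.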

Closure of a Set of Functional Dependencies

Closure of a Set of Attributes

Redundancy of FDs

Canonical Cover

Example of Computing a Canonical Cover

Finding Keys

3. Multivalued Dependencies and Fourth Normal Form (1)
Figure: (a) The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME. (b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.

Multivalued Dependencies and Fourth Normal Form (2)
Definition:
A multivalued dependency (MVD) X ->> Y specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties, where we use Z to denote R − (X ∪ Y):
t3[X] = t4[X] = t1[X] = t2[X].
t3[Y] = t1[Y] and t4[Y] = t2[Y].
t3[Z] = t2[Z] and t4[Z] = t1[Z].
An MVD X ->> Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R.
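The t3/t4 condition can be checked directly on an instance. For every ordered pair of tuples that agree on X, the tuple taking X and Y from the first and Z from the second must be present. Looping over both orders covers t4. The following is a minimal Python sketch. The EMP rows are illustrative data, and `mvd_holds` is a hypothetical helper.

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on a relation instance, with Z = R - (X ∪ Y)."""
    z = [a for a in attrs if a not in x and a not in y]

    def proj(row, cols):
        return tuple(row[c] for c in cols)

    present = {(proj(t, x), proj(t, y), proj(t, z)) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if proj(t1, x) == proj(t2, x):
                # t3 takes X and Y from t1 and Z from t2; the swapped pair gives t4.
                if (proj(t1, x), proj(t1, y), proj(t2, z)) not in present:
                    return False
    return True


attrs = ["ENAME", "PNAME", "DNAME"]
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(mvd_holds(emp, attrs, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(emp[:3], attrs, ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) missing
```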

Multivalued Dependencies and Fourth Normal Form (4)
Definition:
A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X ->> Y in F+, X is a superkey for R.

Multivalued Dependencies and Fourth Normal Form (5)
Figure: Decomposing a relation state of EMP that is not in 4NF. (a) EMP relation with additional tuples. (b) Two corresponding 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.

4. Join Dependencies and Fifth Normal Form (1)
Definition:
A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a non-additive join decomposition into R1, R2, ..., Rn; that is, for every such r we have
*(πR1(r), πR2(r), ..., πRn(r)) = r
where * denotes the natural join and πRi(r) is the projection of r onto the attributes of Ri.

A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.
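A JD can be tested on an instance by projecting r onto each Ri, natural-joining the projections, and comparing the result with r. The following is a minimal Python sketch. It assumes tuples are dicts and represents each relation as a set of sorted (attribute, value) tuples. The SUPPLY rows are an assumed illustrative instance, not data read from the figure, and the helper names are hypothetical.

```python
def project(rows, cols):
    """Projection onto cols, as a set of sorted (attribute, value) tuples."""
    return {tuple(sorted((c, row[c]) for c in cols)) for row in rows}


def natural_join(left, right):
    """Natural join of two relations in the representation used by project()."""
    out = set()
    for left_t in left:
        lrow = dict(left_t)
        for right_t in right:
            rrow = dict(right_t)
            if all(lrow[a] == rrow[a] for a in lrow.keys() & rrow.keys()):
                out.add(tuple(sorted({**lrow, **rrow}.items())))
    return out


def jd_holds(rows, components):
    """JD(R1, ..., Rn) holds in r iff the join of the projections equals r."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == project(rows, rows[0].keys())


supply = [dict(zip(("Sname", "PartName", "ProjName"), t)) for t in [
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
]]
R1, R2, R3 = ("Sname", "PartName"), ("Sname", "ProjName"), ("PartName", "ProjName")
print(jd_holds(supply, [R1, R2, R3]))       # True
print(jd_holds(supply, [R1, R2]))           # False: the binary join adds spurious tuples
print(jd_holds(supply[:-1], [R1, R2, R3]))  # False
```

The last call shows why the JD is a constraint on the state. Dropping (Smith, Bolt, ProjY) leaves all three projections unchanged, so the three-way join still produces that tuple.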

Join Dependencies and Fifth Normal Form (2)


Definition:
A relation schema R is in fifth normal form (5NF) (or
Project-Join Normal Form (PJNF)) with respect to a
set F of functional, multivalued, and join dependencies
if, for every nontrivial join dependency JD(R1, R2, ...,
Rn) in F+ (that is, implied by F), every Ri is a superkey
of R.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
Figure: (c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). (d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.

Steps to find Minimal Cover
1) Singleton attributes in RHS
2) Identify extraneous attributes and remove them
3) Remove redundant dependencies
Singleton attributes in RHS
AB -> CD
The above functional dependency should be decomposed so that each RHS has a single attribute, as below.
AB -> C and
AB -> D
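This step is the decomposition rule (IR4) applied to every FD. The following is a minimal Python sketch, assuming single-letter attribute names with FDs written as (lhs, rhs) strings. `split_rhs` is an illustrative name.

```python
def split_rhs(fds):
    """Rewrite every X -> A1A2...Ak as X -> A1, ..., X -> Ak (IR4, decomposition)."""
    return [(lhs, a) for lhs, rhs in fds for a in rhs]


print(split_rhs([("AB", "CD")]))  # [('AB', 'C'), ('AB', 'D')]
```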

Identify extraneous attributes and remove them
If an attribute adds nothing to a functional dependency (the dependency still holds without it), we call it extraneous and remove it.
Consider the functional dependencies
A -> B
AB -> C
D -> AC
D -> E
If the LHS has more than one attribute, check whether it contains an extraneous (extra/unwanted) attribute; if so, remove it. The only LHS with two attributes is in AB -> C.
A+ = ABC
B+ = B [Reflexivity]
Since A+ contains C, A -> C already follows from the FDs, so B is extraneous in AB -> C. Since B+ does not contain C, A is not extraneous.
So AB -> C reduces to A -> C.
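The closure test above can be written as a loop over each LHS. An attribute B is extraneous in XB -> A when A is already in X+. The following is a minimal Python sketch, assuming single-letter attribute names so that `str.replace` can drop one attribute from the LHS string. `remove_extraneous` is an illustrative name.

```python
def closure(attrs, fds):
    """Attribute closure; single-letter attributes, so "AB" means {A, B}."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def remove_extraneous(fds):
    """Drop LHS attributes whose removal keeps the RHS inside the closure."""
    result = list(fds)
    for i, (lhs, rhs) in enumerate(fds):
        reduced = lhs
        for attr in lhs:
            candidate = reduced.replace(attr, "")
            if candidate and set(rhs) <= closure(candidate, result):
                reduced = candidate  # attr is extraneous
        result[i] = (reduced, rhs)
    return result


F = [("A", "B"), ("AB", "C"), ("D", "AC"), ("D", "E")]
print(remove_extraneous(F))  # [('A', 'B'), ('A', 'C'), ('D', 'AC'), ('D', 'E')]
```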

Finding Redundant Dependency

Consider the functional dependencies
A -> B
A -> C
D -> AC
D -> E
Step 1: Apply singleton to RHS
A -> B
A -> C
D -> A
D -> C
D -> E
Step 2: In the LHS there is no extraneous attribute.
Step 3: Remove redundant dependencies.
1. Remove A -> B and find the attribute closure for A.
A+ = AC [without A -> B, B cannot be found in A+, so A -> B is not a redundant dependency.]
2. Remove A -> C and find the attribute closure for A.
A+ = AB [without A -> C, C cannot be found in A+, so A -> C is not a redundant dependency.]
3. Remove D -> A and find the attribute closure for D.
D+ = DCE [without D -> A, A cannot be found in D+, so D -> A is not a redundant dependency.]
4. Remove D -> C and find the attribute closure for D.
D+ = DAEBC [without D -> C, C can still be found in D+ (through D -> A and A -> C), so D -> C is a redundant dependency and should be removed. The FDs are then A -> B, A -> C, D -> A, D -> E.]
5. Remove D -> E and find the attribute closure for D.
D+ = DABC [without D -> E, E cannot be found in D+, so D -> E is not a redundant dependency.]
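The five checks above follow one pattern: drop an FD, recompute the closure of its LHS from the remaining FDs, and discard the FD if its RHS is still reached. The following is a minimal Python sketch, assuming single-letter attributes and no duplicate FDs. `remove_redundant` is an illustrative name. The result can depend on the order in which FDs are tried, which is one reason minimal covers are not unique.

```python
def closure(attrs, fds):
    """Attribute closure; single-letter attributes, so "AB" means {A, B}."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def remove_redundant(fds):
    """Drop X -> A when A is still in X+ computed without that FD."""
    result = list(fds)
    for fd in fds:
        rest = [f for f in result if f != fd]
        lhs, rhs = fd
        if set(rhs) <= closure(lhs, rest):
            result = rest  # fd follows from the others, so it is redundant
    return result


F = [("A", "B"), ("A", "C"), ("D", "A"), ("D", "C"), ("D", "E")]
print(remove_redundant(F))  # [('A', 'B'), ('A', 'C'), ('D', 'A'), ('D', 'E')]
```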

So, the minimal cover is obtained after removing
a) Extraneous attributes
b) Redundant dependencies

Minimal Functional Dependencies are


A-> B
A-> C
D-> A
D->E

Find a Minimal Cover

R(A B C D E)
F = {A->D, BC->AD, C->B, E->A, E->D}
Steps:
1) Singleton attributes in RHS
2) Identify extraneous attributes and remove them
3) Remove redundant dependencies

Singleton attributes in RHS
F = {A->D, BC->A, BC->D, C->B, E->A, E->D}

Identify extraneous attributes and remove them
C+ = ABCD (C->B gives B, then BC->A gives A and BC->D gives D), so B is extraneous in both BC->A and BC->D:
F = {A->D, C->A, C->D, C->B, E->A, E->D}

Remove redundant FDs
C->D is redundant (C->A and A->D give D), and E->D is redundant (E->A and A->D give D):
F = {A->D, C->A, C->B, E->A}
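The three steps can be chained into one function and applied to this example. The following is a minimal Python sketch, assuming single-letter attribute names and FDs written as (lhs, rhs) strings. `minimal_cover` is an illustrative name, and other removal orders can give a different but equally valid minimal cover.

```python
def closure(attrs, fds):
    """Attribute closure; single-letter attributes, so "AB" means {A, B}."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: singleton attributes in the RHS.
    cover = [(lhs, a) for lhs, rhs in fds for a in rhs]
    # Step 2: remove extraneous LHS attributes.
    for i, (lhs, rhs) in enumerate(list(cover)):
        reduced = lhs
        for attr in lhs:
            candidate = reduced.replace(attr, "")
            if candidate and rhs in closure(candidate, cover):
                reduced = candidate
        cover[i] = (reduced, rhs)
    cover = list(dict.fromkeys(cover))  # drop duplicates that step 2 may create
    # Step 3: remove redundant FDs.
    for fd in list(cover):
        rest = [f for f in cover if f != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover


F = [("A", "D"), ("BC", "AD"), ("C", "B"), ("E", "A"), ("E", "D")]
print(minimal_cover(F))  # [('A', 'D'), ('C', 'A'), ('C', 'B'), ('E', 'A')]
```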

Equivalence of sets of FDs

Consider two sets of FDs, E and F.
F is said to cover E if every FD in E is also in F+, the closure of F.
E and F are equivalent if E covers F and F covers E, that is, if E+ = F+.
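Membership of an FD in F+ is tested with the attribute closure, so "F covers E" means every X -> Y in E has Y ⊆ X+ computed under F. The following is a minimal Python sketch, assuming single-letter attribute names. The sets E and F below are an illustrative example, and `covers`/`equivalent` are hypothetical names.

```python
def closure(attrs, fds):
    """Attribute closure; single-letter attributes, so "AB" means {A, B}."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, e):
    """F covers E when every X -> Y in E satisfies Y ⊆ X+ under F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent when each covers the other (E+ = F+)."""
    return covers(f, e) and covers(e, f)


E = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
F = [("A", "CD"), ("E", "AH")]
print(equivalent(E, F))                       # True
print(equivalent([("A", "B")], [("B", "A")]))  # False
```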
