DBMS Module-2-Notes - Normalization
Module – II
Normalization (Part-3)
Functional Dependencies
Normal Forms Based on Primary Keys
General Definition of 2nd & 3rd Normal forms
Boyce-Codd Normal Form
Functional Dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
Definition
A set of inference rules can be applied to infer new dependencies from a given set of
dependencies
F ╞ X → Y denotes that the functional dependency X → Y is inferred from the set of given
functional dependencies F.
Armstrong's inference rules are a complete set of inference rules and can be applied to get
closure of functional dependencies.
IR1, IR2, IR3 form a sound and complete set of inference rules
The reflexive rule (IR1) states that a set of attributes always determines itself or any of its
subsets.
Dependencies generated by IR1 are always true and are known as trivial dependencies.
The augmentation rule (IR2) states that adding the same set of attributes to both the left-
and right-hand sides of a dependency results in another valid dependency.
The transitive rule (IR3) states that functional dependencies are transitive.
The remaining three inference rules (IR4, IR5, and IR6), as well as any other inference rules,
can be deduced from IR1, IR2, and IR3 (completeness property).
The decomposition rule (IR4) says that attributes can be removed from the right-hand side of a
dependency; applying the rule repeatedly decomposes an FD into a set of FDs.
The union rule (IR5) is the opposite of IR4, i.e. a set of FDs with the same left-hand side can
be combined into a single FD.
Pseudotransitivity (IR6) is similar to the transitive rule.
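Proof of IR6 (if X → Y and WY → Z, then WX → Z):
1. X → Y (given)
2. WY → Z (given)
3. WX → WY (using IR2 on 1, augmenting with W)
4. WX → Z (using IR3 on 3 & 2)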
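For reference, the six rules described in words above can be written symbolically. This is the standard textbook formulation, where X, Y, Z, and W stand for arbitrary sets of attributes and XZ denotes the union of X and Z:

```latex
\begin{align*}
\text{IR1 (reflexive):}        &\quad Y \subseteq X \;\Rightarrow\; X \to Y \\
\text{IR2 (augmentation):}     &\quad X \to Y \;\Rightarrow\; XZ \to YZ \\
\text{IR3 (transitive):}       &\quad X \to Y,\; Y \to Z \;\Rightarrow\; X \to Z \\
\text{IR4 (decomposition):}    &\quad X \to YZ \;\Rightarrow\; X \to Y \\
\text{IR5 (union):}            &\quad X \to Y,\; X \to Z \;\Rightarrow\; X \to YZ \\
\text{IR6 (pseudotransitive):} &\quad X \to Y,\; WY \to Z \;\Rightarrow\; WX \to Z
\end{align*}
```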
Schema is denoted for Contracts as CSJDPQV. The meaning of a tuple in this relation is that the
contract with contractid C is an agreement that supplier S (supplierid) will supply Q items of part
P (partid) to project J (projectid) associated with department D (deptid); the value V of this
contract is equal to value.
Several additional FDs hold in the closure of the set of given FDs:
SSN → ENAME
PNUMBER → {PNAME, PLOCATION}
{SSN, PNUMBER} → HOURS
Consider the relation schema R(A,B,C,D) with functional dependencies {A} → {C} and
{B} → {D}.
{A}+ = {A,C}
{B}+ = {B,D}
{C}+ = {C}
{D}+ = {D}
{A,B}+ = {A,B,C,D}
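The closures listed above can be reproduced mechanically. The following is a minimal sketch of the usual attribute-closure loop; the function name `closure` and the representation of an FD as a pair of sets are choices made for this illustration only.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already in the closure,
            # the right-hand side can be added as well.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of R(A, B, C, D) from the example: {A} -> {C} and {B} -> {D}.
FDS = [({"A"}, {"C"}), ({"B"}, {"D"})]

print(sorted(closure({"A"}, FDS)))       # ['A', 'C']
print(sorted(closure({"A", "B"}, FDS)))  # ['A', 'B', 'C', 'D']
```

Since {A,B}+ contains every attribute of R and neither {A}+ nor {B}+ does, {A,B} is a key of R.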
Assume that a set of functional dependencies is given for each relation and that each relation
has a designated primary key.
This information, together with the tests for normal forms, drives the normalization process
for relational schema design.
For relational design two approaches are followed:
First perform conceptual design (ER or EER model) then map to set of relations.
Design the relations based on external knowledge (existing implementations,
reports, forms etc.)
Then we have to evaluate the relations for goodness and decompose them further and
further as needed to achieve higher normal forms using normalization theory.
Normalization is carried out in practice so that the resulting designs are of high quality and
meet the desirable properties.
The normalization process (proposed by Codd, 1972) takes a relation schema through a
series of tests to certify whether it satisfies a certain normal form.
Introduction to Normalization
Three normal forms were proposed by Codd (1972). Normalization of data can be looked upon
as a process of analyzing the given relation schemas based on their FDs and primary keys to
achieve the desirable properties which are as follows:
Minimizing redundancy
Minimizing the insertion, deletion, and update anomalies
Unsatisfactory relation schemas that do not meet certain conditions, i.e. the normal form tests,
are decomposed into smaller relation schemas that meet the tests and hence possess
the desirable properties.
Thus, the normalization procedure provides database designers with the following:
A formal framework for analyzing relation schemas based on their keys and on the
functional dependencies among their attributes.
A series of normal form tests that can be carried out on individual relation schemas so that
relational database can be normalized to any desired degree
The normal form of a relation refers to the highest normal form condition that it meets and
hence indicates the degree to which it has been normalized.
But normalization cannot be considered in isolation for a good database design.
The process of normalization through decomposition must also confirm the existence of
two additional properties that the relational schemas (together) should possess.
The lossless join or nonadditive join property: It guarantees that the spurious
tuple generation problem does not occur with respect to the relation schemas
created after decomposition. It is extremely critical and must be achieved at any
cost.
The dependency preservation property: It ensures that each functional
dependency is represented in some individual relation resulting after
decomposition. Sometimes sacrificed for higher performance.
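For a decomposition into exactly two relations, the lossless join property can be tested with a standard result: the decomposition of R into R1 and R2 is lossless if the common attributes R1 ∩ R2 functionally determine either R1 − R2 or R2 − R1. The sketch below applies this test; the function names are illustrative, and the relation is assumed to consist of the attributes SSN, PNUMBER, HOURS, ENAME, PNAME, and PLOCATION with the three FDs listed earlier in these notes. Decompositions into more than two relations need a more general test that is not shown here.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) must determine R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

# Splitting off (SSN, ENAME): the common attribute SSN determines ENAME.
print(is_lossless_binary(
    {"SSN", "ENAME"},
    {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"},
    FDS,
))  # True

# Splitting off (ENAME, PLOCATION): PLOCATION determines neither side.
print(is_lossless_binary(
    {"ENAME", "PLOCATION"},
    {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"},
    FDS,
))  # False
```

The second decomposition is the kind that produces spurious tuples when the two relations are joined back on PLOCATION.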
Database design in industry today pays particular attention to normalization only up to 3NF,
BCNF, or 4NF.
Sometimes relations may be left in a lower normalization status, such as 2NF, for
performance reasons.
The process of storing the join of higher normal form relations as a base relation – which is
in a lower normal form – is known as denormalization.
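Superkey:
A superkey of a relation schema R is a set of attributes S of R such that no two tuples in any
legal relation state of R have the same value for S.
A key K is a superkey with the additional property that removal of any attribute from K will
cause K not to be a superkey.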
Candidate key:
If a relation schema has more than one key, each is called a candidate key.
One of the candidate keys is designated as the primary key, and the others are called secondary
keys.
Each relation schema must have a primary key.
An attribute of relation schema R is called a prime attribute of R if it is a member of some
candidate key of R.
An attribute is called nonprime, if it is not a member of any candidate key.
For example:
SSN and PNUMBER are the prime attributes of the relation WORKS_ON whereas other
attributes are non-prime.
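The definitions of candidate key, prime attribute, and nonprime attribute can be checked by brute force on small schemas. The sketch below assumes that WORKS_ON has the attributes SSN, PNUMBER, and HOURS and the single FD {SSN, PNUMBER} → HOURS; the function names are illustrative. It tries attribute subsets in order of increasing size, so it is only practical for a handful of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal superkeys, each as a sorted tuple of attribute names."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            # A superset of an already found key is a superkey but not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys


WORKS_ON = {"SSN", "PNUMBER", "HOURS"}          # assumed attribute list
FDS = [({"SSN", "PNUMBER"}, {"HOURS"})]

keys = candidate_keys(WORKS_ON, FDS)
prime = {attr for key in keys for attr in key}  # member of some candidate key
nonprime = WORKS_ON - prime

print(keys)              # [('PNUMBER', 'SSN')]
print(sorted(prime))     # ['PNUMBER', 'SSN']
print(sorted(nonprime))  # ['HOURS']
```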
First Normal Form
1. Remove the attribute DLOCATIONS that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key DNUMBER of DEPARTMENT.
The primary key of the relation DEPT_LOCATIONS is the combination {DNUMBER, DLOCATION}.
This first technique is the best approach.
2. Expand the key so that there is a separate tuple in the original DEPARTMENT relation for
each location of a DEPARTMENT.
The primary key then becomes {DNUMBER, DLOCATION}, and redundancy is introduced.
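The first technique (moving the multivalued attribute into its own relation) can be sketched on in-memory rows. Everything in the sample data is hypothetical: the DNAME attribute, the department names, and the city names are placeholders chosen for this illustration, and `split_multivalued` is an illustrative helper name.

```python
def split_multivalued(rows, key, multivalued, single):
    """Move a multivalued attribute into a separate relation.

    Returns (base_rows, value_rows): the original rows without the
    multivalued attribute, and one (key, single value) row per value.
    """
    base_rows, value_rows = [], []
    for row in rows:
        base_rows.append({k: v for k, v in row.items() if k != multivalued})
        for value in row[multivalued]:
            value_rows.append({key: row[key], single: value})
    return base_rows, value_rows


# Hypothetical DEPARTMENT tuples; DLOCATIONS holds several values, violating 1NF.
department = [
    {"DNUMBER": 1, "DNAME": "Dept A", "DLOCATIONS": ["City X"]},
    {"DNUMBER": 2, "DNAME": "Dept B", "DLOCATIONS": ["City X", "City Y", "City Z"]},
]

dept, dept_locations = split_multivalued(
    department, key="DNUMBER", multivalued="DLOCATIONS", single="DLOCATION"
)

for row in dept_locations:
    print(row)
```

Each row of `dept_locations` is identified by the pair (DNUMBER, DLOCATION), which matches the composite primary key named for DEPT_LOCATIONS.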
Second normal form (2NF) is based on the concept of full functional dependency.
Definition: A relation schema R is in 2NF if every nonprime attribute A in R is fully
functionally dependent on the primary key.
{SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency) since
SSN → ENAME holds.
{SSN, PNUMBER} → {PNAME, PLOCATION} is not a full FD (it is called a partial dependency)
since PNUMBER → {PNAME, PLOCATION} holds.
If the primary key contains a single attribute, the test need not be applied.
A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is not
partially dependent on any key of R.
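The 2NF test can be sketched as a search for partial dependencies: a nonprime attribute violates 2NF if it is determined by a proper subset of a key. The code below checks one key at a time, so under the general definition it would be called once per candidate key. It assumes a relation over SSN, PNUMBER, HOURS, ENAME, PNAME, and PLOCATION with key {SSN, PNUMBER} and the FDs given earlier; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """List (key_subset, attribute) pairs where a proper, nonempty subset
    of `key` already determines a nonprime attribute."""
    found = []
    ordered_key = sorted(key)
    # Sizes 1 .. len(key) - 1: proper subsets only. A single-attribute key
    # has no such subsets, so the loop body never runs for it.
    for size in range(1, len(ordered_key)):
        for subset in combinations(ordered_key, size):
            determined = closure(subset, fds)
            for attr in sorted(nonprime):
                if attr in determined:
                    found.append((subset, attr))
    return found


FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
KEY = {"SSN", "PNUMBER"}
NONPRIME = {"HOURS", "ENAME", "PNAME", "PLOCATION"}

violations = partial_dependencies(KEY, NONPRIME, FDS)
for subset, attr in violations:
    print(subset, "->", attr)
```

HOURS does not appear in the result because it depends on the full key only; an empty result means the relation is in 2NF with respect to the key that was checked.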
BCNF was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF.
Every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF.
A relation schema R is in BCNF if whenever a nontrivial functional dependency X → A holds
in R, then X is a superkey of R.
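The BCNF condition translates directly into a check: for every nontrivial FD, the closure of its left-hand side must contain all attributes of R. The sketch below checks only the FDs that are explicitly given, which is adequate for testing the original relation against its own FD set; testing the relations produced by a decomposition needs more care and is not covered here. The function names are illustrative, and the WORKS_ON attribute list is an assumption.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the given FDs whose left-hand side is not a superkey."""
    attributes = set(attributes)
    violations = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != attributes:
            violations.append((tuple(sorted(lhs)), tuple(sorted(rhs))))
    return violations


# R(A, B, C, D) with {A} -> {C} and {B} -> {D}: the only key is {A, B},
# so neither left-hand side is a superkey.
R = {"A", "B", "C", "D"}
R_FDS = [({"A"}, {"C"}), ({"B"}, {"D"})]
print(bcnf_violations(R, R_FDS))  # [(('A',), ('C',)), (('B',), ('D',))]

# WORKS_ON with attributes SSN, PNUMBER, HOURS (assumed) and its one FD.
WORKS_ON = {"SSN", "PNUMBER", "HOURS"}
WORKS_ON_FDS = [({"SSN", "PNUMBER"}, {"HOURS"})]
print(bcnf_violations(WORKS_ON, WORKS_ON_FDS))  # []
```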
1. Define the following and give examples for each: a) Entity b) Relationship c) Role names
d) Recursive Relationship.
2. What is an attribute? Explain the different types of attributes, with suitable examples.
3. E – R Diagrams
4. Explain the following : a) Degree of relationship b) Multi valued attributes c) Derived
attributes d) Weak Entity
5. With a neat diagram, explain the main phases of the database design process.
6. Explain the following terms: a) Cardinality ratio b) Participation constraint
7. Explain Ternary relationship in detail with example.
8. Briefly explain the different notations used in an ER diagram.
9. Define the following with examples: a) Primary key b) Candidate key c) Composite key
d) Data Dictionary e) Schema f) Super Key g) Minimal Super key
10. Explain the different types of constraints in the Relational model with examples.
11. Explain the different update operations dealing with constraint violations.
12. Explain all the relational algebra operators along with their purpose, syntax and
examples of using them.
13. Explain the concept of Cartesian product with example
14. Explain the following integrity constraints: 1) key constraints 2) Entity integrity
constraints 3) Referential integrity constraints.
15. Write a note on different types of joins with examples.
16. Explain division operation with examples.
17. Explain the different aggregate and grouping functions along with “Script F” operator
with examples.
18. Explain the seven-step algorithm to convert the basic ER-model constructs into
relations, using suitable examples.
19. Questions on SQL queries / Relational algebra, given a relation database.
20. Define the term Functional Dependency, using an example.
21. List all Armstrong’s Inference axioms and prove them.