Normalization and Functional Dependency
• A table that is not normalized is prone to three kinds of anomalies: insertion, update and deletion anomalies. Let’s take an example to understand them.
Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing the
department details in which the employee works. At some point of time the table looks like this:
The above table is not normalized. We will see the problems that we face when a table is not
normalized.
Update anomaly: In the above table we have two rows for employee Rick, as he belongs to
two departments of the company. If we want to update Rick’s address, we have to update it
in both rows. If the address is updated in one row but not in the other, then as per the
database Rick has two different addresses, and the data becomes inconsistent.
Insert anomaly: Suppose a new employee joins the company who is under training and not yet
assigned to any department. We would not be able to insert this employee into the table if
the emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose that at some point the company closes department D890. Deleting
the rows that have emp_dept as D890 would also delete the information of employee Maggie,
since she is assigned only to this department.
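The three anomalies can be reproduced with a minimal sketch using Python's built-in sqlite3 module. Only Rick (two departments) and Maggie (department D890) come from the example; the ids, addresses, the other department codes and the trainee row are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical rows: ids, addresses and codes other than D890 are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER, emp_name TEXT, emp_address TEXT, emp_dept TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Delhi", "D001"),
        (101, "Rick", "Delhi", "D002"),
        (123, "Maggie", "Agra", "D890"),
    ],
)

# Update anomaly: changing only one of Rick's rows leaves two addresses.
conn.execute(
    "UPDATE employee SET emp_address = 'Mumbai' "
    "WHERE emp_id = 101 AND emp_dept = 'D001'"
)
rick_addresses = {
    row[0]
    for row in conn.execute("SELECT emp_address FROM employee WHERE emp_id = 101")
}

# Insert anomaly: a trainee without a department is rejected by NOT NULL.
try:
    conn.execute("INSERT INTO employee VALUES (200, 'Trainee', 'Pune', NULL)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete anomaly: closing D890 also removes every trace of Maggie.
conn.execute("DELETE FROM employee WHERE emp_dept = 'D890'")
maggie_rows = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE emp_name = 'Maggie'"
).fetchone()[0]
```

After the partial update, rick_addresses holds two different values, which is exactly the inconsistency described above.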
Ques. Explain the different normal forms with examples. Also explain how they are
achieved.
As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values.
Example: Suppose a company wants to store the names and contact details of its employees.
It creates a table that looks like this:
Two employees (Jon & Lester) have two mobile numbers each, so the company stored both
numbers in the same field, as you can see in the table above.
This table is not in 1NF: the rule says “each attribute of a table must have atomic (single)
values”, and the emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF we should have the data like this:
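As a minimal sketch of that conversion in Python: the helper to_1nf is a hypothetical name, and the rows are assumptions (only the names Jon and Lester come from the example; the ids, the third employee and all phone numbers are invented).

```python
# Hypothetical rows: ids, "Ron" and every phone number are invented.
unnormalized = [
    {"emp_id": 102, "emp_name": "Jon", "emp_mobile": "8812121212, 9900012222"},
    {"emp_id": 103, "emp_name": "Ron", "emp_mobile": "7778881212"},
    {"emp_id": 104, "emp_name": "Lester", "emp_mobile": "9990000123, 8123450987"},
]


def to_1nf(rows, column, sep=","):
    """Return rows in which `column` holds exactly one value per row."""
    result = []
    for row in rows:
        for value in row[column].split(sep):
            atomic = dict(row)          # copy the other attributes unchanged
            atomic[column] = value.strip()
            result.append(atomic)
    return result


normalized = to_1nf(unnormalized, "emp_mobile")
```

Each employee with two numbers now occupies two rows, so every emp_mobile value is atomic.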
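• A table is in second normal form (2NF) if it is in 1NF and no non-prime attribute is dependent on a proper subset of any candidate key.
• An attribute that is not part of any candidate key is known as a non-prime attribute.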
Example: Suppose a school wants to store the data of teachers and the subjects they teach.
They create a table that looks like this:
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Since a teacher can teach more than one subject, the table can have multiple rows for the
same teacher.
• The candidate key is {teacher_id, subject} and the non-prime attribute is teacher_age. The
table is in 1NF because each attribute holds atomic values. However, it is not in 2NF because
the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset
of the candidate key. This violates the rule for 2NF, which says “no non-prime attribute is
dependent on a proper subset of any candidate key of the table”.
To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).
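The effect of this decomposition can be checked with a short Python sketch. The rows are copied from the two tables above; the variable names are only illustrative.

```python
# Rows copied from the teacher_details and teacher_subject tables above.
teacher_details = {111: 38, 222: 38, 333: 40}   # teacher_id -> teacher_age
teacher_subject = [
    (111, "Maths"), (111, "Physics"), (222, "Biology"),
    (333, "Physics"), (333, "Chemistry"),
]

# Natural join on teacher_id gives the single-table form used before 2NF.
joined = [(tid, subject, teacher_details[tid]) for tid, subject in teacher_subject]

# In the joined form, collect every age stored for each teacher_id.
ages_by_teacher = {}
for tid, _subject, age in joined:
    ages_by_teacher.setdefault(tid, set()).add(age)

# One age per teacher_id means teacher_id -> teacher_age holds in this data,
# i.e. the age depends on only part of the key {teacher_id, subject}.
partial_dependency = all(len(ages) == 1 for ages in ages_by_teacher.values())

# The joined form repeats the age on every row; teacher_details stores it once.
ages_stored_joined = len(joined)
ages_stored_decomposed = len(teacher_details)
```

The join loses nothing, while the decomposed form stores each teacher's age once instead of once per subject.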
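A table is in third normal form (3NF) if it is in 2NF and no non-prime attribute is transitively
dependent on a super key.
(An attribute that is not part of any candidate key is known as a non-prime attribute.)
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each
functional dependency X -> Y, at least one of the following conditions holds:
• X is a super key of the table.
• Y is a prime attribute of the table.
An attribute that is a part of one of the candidate keys is known as a prime attribute.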
Example: Suppose a company wants to store the complete address of each employee. It
creates a table named employee_details that looks like this:
Non-prime attributes: all attributes except emp_id are non-prime, as they are not part of any
candidate key.
Here emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent
on emp_id. That makes the non-prime attributes (emp_state, emp_city & emp_district)
transitively dependent on the super key (emp_id), which violates the rule of 3NF.
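The two 3NF conditions (X is a super key, or Y is prime) can be tested mechanically. The sketch below is a minimal illustration: the helper names closure and violates_3nf are hypothetical, and emp_id -> {emp_name, emp_zip} is written out on the assumption that emp_id is the only candidate key, as stated above.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


attributes = {"emp_id", "emp_name", "emp_zip", "emp_state", "emp_city", "emp_district"}
fds = [
    ({"emp_id"}, {"emp_name", "emp_zip"}),
    ({"emp_zip"}, {"emp_state", "emp_city", "emp_district"}),
]
prime = {"emp_id"}  # attributes that belong to a candidate key


def violates_3nf(lhs, rhs):
    """An FD violates 3NF if lhs is not a super key and rhs is not prime."""
    is_super_key = closure(lhs, fds) == attributes
    rhs_is_prime = (set(rhs) - set(lhs)) <= prime
    return not (is_super_key or rhs_is_prime)


violations = [(lhs, rhs) for lhs, rhs in fds if violates_3nf(lhs, rhs)]
```

Only the dependency whose left side is emp_zip is reported, which is the one the decomposition has to move into its own table.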
To make this table comply with 3NF we have to break it into two tables to remove the
transitive dependency:
employee table:
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999
employee_zip table:
• A table is said to be in BCNF if it is in 3NF and, for every non-trivial functional dependency
X->Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:
The table is not in BCNF because neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF we can break it into three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
Candidate keys:
This is now in BCNF because in both functional dependencies the left-hand side is a key.
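A rough way to check the BCNF condition is to test whether the left side of each dependency is a super key of its table. The sketch below is restricted to the attributes visible in the two tables shown above (emp_id, emp_nationality, emp_dept) and to the single dependency emp_id -> emp_nationality; the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attributes, fds):
    """True if every non-trivial FD X -> Y has a super key X."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if not closure(lhs, fds) >= set(attributes):
            return False
    return True


emp_id_fd = ({"emp_id"}, {"emp_nationality"})

# Single table: emp_id does not determine emp_dept, so it is not a super key.
original_ok = is_bcnf({"emp_id", "emp_nationality", "emp_dept"}, [emp_id_fd])

# Decomposed tables: emp_id is the key of emp_nationality, and the
# (emp_id, emp_dept) table has no non-trivial dependency at all.
nationality_ok = is_bcnf({"emp_id", "emp_nationality"}, [emp_id_fd])
dept_ok = is_bcnf({"emp_id", "emp_dept"}, [])
```

The single table fails the check and both decomposed tables pass it.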
Suppose we have a student table with attributes Stu_Id, Stu_Name and Stu_Age. Here the
Stu_Id attribute uniquely identifies the Stu_Name attribute of the student table, because
if we know the student id we can tell the student name associated with it.
Formally:
If column A of a table uniquely identifies column B of the same table, then this can be
represented as A->B (attribute B is functionally dependent on attribute A). Here A is called
the determinant and B is called the dependent attribute.
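The definition can be turned into a simple test on sample rows. This is a sketch under assumptions: fd_holds is a hypothetical helper and the student rows are invented. A functional dependency is a rule about all valid data, so sample rows can refute it but cannot prove it.

```python
def fd_holds(rows, determinant, dependent):
    """True if rows that agree on `determinant` always agree on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # Remember the first value seen for each key; any later mismatch
        # means the determinant does not uniquely identify the dependent.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows for the student table.
students = [
    {"Stu_Id": 1, "Stu_Name": "Asha", "Stu_Age": 20},
    {"Stu_Id": 2, "Stu_Name": "Ravi", "Stu_Age": 21},
    {"Stu_Id": 3, "Stu_Name": "Asha", "Stu_Age": 22},
]

id_determines_name = fd_holds(students, ["Stu_Id"], ["Stu_Name"])
name_determines_id = fd_holds(students, ["Stu_Name"], ["Stu_Id"])
```

Here Stu_Id -> Stu_Name is consistent with the rows, while Stu_Name -> Stu_Id is refuted because two students share a name.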
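A functional dependency X->Y is called trivial if Y is a subset of X.
For example: Consider a table with two columns Student_Id and Student_Name.
{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency, because
Student_Id is a subset of {Student_Id, Student_Name}. That makes sense because if we know
the values of Student_Id and Student_Name then the value of Student_Id can be uniquely
determined.
Also, Student_Id -> Student_Id & Student_Name -> Student_Name are trivial
dependencies too.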
If a functional dependency X->Y holds true where Y is not a subset of X, then this
dependency is called a non-trivial functional dependency.
For example, in the student table above, Stu_Id -> Stu_Name is a non-trivial dependency
because Stu_Name is not a subset of Stu_Id.
Completely non-trivial FD: If an FD X->Y holds true where the intersection of X and Y is
empty, then this dependency is said to be a completely non-trivial functional dependency.
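The three cases differ only in how X and Y overlap, which a few set operations make concrete. The function name classify_fd is hypothetical and used only for this sketch.

```python
def classify_fd(x, y):
    """Classify the dependency X -> Y by the overlap between X and Y."""
    x, y = set(x), set(y)
    if y <= x:
        return "trivial"                 # Y is a subset of X
    if x & y:
        return "non-trivial"             # Y is not a subset, but they overlap
    return "completely non-trivial"      # X and Y share no attribute


case_trivial = classify_fd({"Student_Id", "Student_Name"}, {"Student_Id"})
case_overlap = classify_fd({"Stu_Id", "Stu_Name"}, {"Stu_Name", "Stu_Age"})
case_disjoint = classify_fd({"Stu_Id"}, {"Stu_Name"})
```

Note that every completely non-trivial dependency is also non-trivial; the function simply reports the most specific label.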
A multivalued dependency exists when a value in one column is associated with a set of
values in another column of the same table, independently of the remaining columns.
For example: Consider a bike manufacturing company which produces two colors
(black and red) of each model every year.
In this case the two columns holding the manufacturing year and the color are said to be
multivalued dependent on bike_model.
Such dependencies are written with a double arrow, X ->> Y.
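A multivalued dependency X ->> Y can be tested on sample rows: within each group of rows sharing the same X value, every Y value must appear with every value of the remaining columns. In this sketch the column names manuf_year and color, the model codes and the rows are all assumptions, and mvd_holds is a hypothetical helper.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y on rows given as dicts that share the same keys."""
    z = [a for a in rows[0] if a not in x and a not in y]  # remaining columns
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        z_val = tuple(row[a] for a in z)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        ys.add(y_val)
        zs.add(z_val)
        pairs.add((y_val, z_val))
    # X ->> Y holds if each group contains every (Y, Z) combination.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


# Hypothetical rows: each model comes in both colors in every year listed.
bikes = [
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Black"},
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Red"},
    {"bike_model": "M1001", "manuf_year": 2008, "color": "Black"},
    {"bike_model": "M1001", "manuf_year": 2008, "color": "Red"},
    {"bike_model": "M2012", "manuf_year": 2008, "color": "Black"},
    {"bike_model": "M2012", "manuf_year": 2008, "color": "Red"},
]

color_mvd = mvd_holds(bikes, ["bike_model"], ["color"])
year_mvd = mvd_holds(bikes, ["bike_model"], ["manuf_year"])
```

Dropping any one of the M1001 rows breaks both dependencies, because a year and color combination would then be missing.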
Transitive dependency
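A functional dependency X -> Z is transitive if it is formed indirectly by two functional
dependencies X -> Y and Y -> Z, where Y does not determine X.
Note: A transitive dependency can only occur in a relation of three or more attributes.
This dependency helps us in normalizing the database to 3NF (3rd Normal Form).
For example:
{Book} -> {Author} (if we know the book, we know the author’s name)
{Author} -> {Author_age} (if we know the author, we know the author’s age)
Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should
hold. That makes sense because if we know the book name we can know the author’s
age.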
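Treating each dependency as a lookup table shows why the transitive one follows: chaining the two lookups gives a third. The titles, author names and ages below are invented for illustration.

```python
# Hypothetical data: titles, names and ages are invented.
book_author = {"Book A": "Author X", "Book B": "Author X", "Book C": "Author Y"}
author_age = {"Author X": 52, "Author Y": 41}

# Chaining Book -> Author and Author -> Author_age yields Book -> Author_age.
book_author_age = {
    book: author_age[author] for book, author in book_author.items()
}

# The reverse direction Author -> Book does not hold here:
# one author can have written several books.
books_by_author = {}
for book, author in book_author.items():
    books_by_author.setdefault(author, []).append(book)
author_determines_book = all(len(books) == 1 for books in books_by_author.values())
```

The derived mapping gives exactly one age per book, while an author does not identify a single book.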
Properties of Decomposition
1. Lossless Decomposition
2. Dependency Preservation
3. Lack of Data Redundancy
1. Lossless Decomposition
• Decomposition must be lossless. It means that the information should not get lost
from the relation that is decomposed.
• It guarantees that joining the decomposed relations gives back the same relation that
was decomposed.
Example:
Let 'E' be a relational schema with instance 'e', decomposed into E1, E2, E3, . . . . En
with instances e1, e2, e3, . . . . en. If e1 ⋈ e2 ⋈ e3 . . . . ⋈ en = e,
then the decomposition is called a 'Lossless Join Decomposition'.
• In other words, if the natural join of all the decomposed relations gives
the original relation, the decomposition is said to be a lossless join decomposition.
Example: <Employee_Department> Table
Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Pune 20000 D001 Finance
E002 PQR 30 Pune 30000 D002 Production
E003 LMN 25 Mumbai 5000 D003 Sales
E004 XYZ 24 Mumbai 4000 D004 Marketing
E005 STU 32 Bangalore 25000 D005 Human Resource
• Decompose the above relation into two relations to check whether the decomposition
is lossless or lossy.
• Suppose we decompose it into two relations, Employee and Department.
Relation 1 : <Employee> Table
• If the <Employee> table contains (Eid, Ename, Age, City, Salary) and the
<Department> table contains (Deptid and DeptName), then the two relations cannot
be joined back into the original relation, because there is no common column between
them. The decomposition is therefore a Lossy Join Decomposition.
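The difference can be seen by actually joining the pieces. The sketch below uses the rows of the <Employee_Department> table above; project, natural_join and as_set are hypothetical helpers, and the second decomposition, which keeps Deptid in the Employee relation, is an assumption added for contrast.

```python
COLUMNS = ["Eid", "Ename", "Age", "City", "Salary", "Deptid", "DeptName"]
ROWS = [
    ("E001", "ABC", 29, "Pune", 20000, "D001", "Finance"),
    ("E002", "PQR", 30, "Pune", 30000, "D002", "Production"),
    ("E003", "LMN", 25, "Mumbai", 5000, "D003", "Sales"),
    ("E004", "XYZ", 24, "Mumbai", 4000, "D004", "Marketing"),
    ("E005", "STU", 32, "Bangalore", 25000, "D005", "Human Resource"),
]
original = [dict(zip(COLUMNS, row)) for row in ROWS]


def project(rows, cols):
    """Projection onto `cols` with duplicate rows removed."""
    unique = {tuple(row[c] for c in cols) for row in rows}
    return [dict(zip(cols, values)) for values in unique]


def natural_join(r1, r2):
    """Natural join; with no common column it becomes a cross product."""
    common = set(r1[0]) & set(r2[0])
    return [
        {**a, **b}
        for a in r1
        for b in r2
        if all(a[c] == b[c] for c in common)
    ]


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


department = project(original, ["Deptid", "DeptName"])

# No common column: every employee pairs with every department.
lossy = natural_join(project(original, COLUMNS[:5]), department)

# Assumed alternative: Employee keeps Deptid, so the join can match rows.
lossless = natural_join(project(original, COLUMNS[:6]), department)
```

Without a shared column the join produces spurious rows, so the original relation cannot be recovered; with Deptid kept on both sides the join returns exactly the original rows.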
2. Dependency Preservation
Query Optimization
The goal of query optimization is to reduce the system resources required to fulfill a query, and
ultimately provide the user with the correct result set faster.
• First, it provides the user with faster results, which makes the application seem faster to
the user.
• Secondly, it allows the system to service more queries in the same amount of time,
because each request takes less time than unoptimized queries.
• Thirdly, query optimization ultimately reduces the amount of wear on the hardware (e.g.
disk drives), and allows the server to run more efficiently (e.g. lower power consumption,
less memory usage).
2. The second step is query optimization. The query optimizer transforms the query into
equivalent expressions that are more efficient to execute.
3. The third step is query evaluation. It executes the query execution plan produced above and
returns the result.
Example
• A sequence of primitive operations that can be used to evaluate a query is a Query Execution
Plan or Query Evaluation Plan.
• The above diagram indicates that the query execution engine takes a query execution plan and
returns the answers to the query.
• The query optimizer chooses the query execution plan that minimizes the estimated cost of
query evaluation.
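Most database systems let you inspect the plan the optimizer picked. As a minimal sketch, assuming SQLite through Python's sqlite3 module (a choice made here for illustration, not a system named in these notes), EXPLAIN QUERY PLAN returns the plan without running the query. The table, index name and rows are hypothetical, and the wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_dept TEXT)"
)
conn.execute("CREATE INDEX idx_employee_dept ON employee (emp_dept)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "A", "D001"), (2, "B", "D002"), (3, "C", "D001")],  # hypothetical rows
)

query = "SELECT emp_name FROM employee WHERE emp_dept = ? ORDER BY emp_id"

# Optimization: ask for the chosen plan without executing the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("D001",)).fetchall()
plan_steps = [row[-1] for row in plan]  # last column is the step description

# Evaluation: the engine runs that plan and returns the answers.
result = [name for (name,) in conn.execute(query, ("D001",))]
```

Printing plan_steps shows the primitive operations the engine will perform, such as whether it scans the table or searches an index.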