Redundancy Dependency Loss of Information


Unit 3: Normalization

Definition: Normalization is a database design technique that organizes tables in a manner that reduces redundancy and dependency of data.

It is also useful for minimizing the use of null values and preventing loss of information.

More specifically, a relation is normalized (well-formed) when rows can be inserted, deleted, or modified without creating anomalies.

• Normalization allows the database designer to understand the current data structures in
an organization.

• Furthermore, it aids any future changes and enhancements to the system.

• Edgar Codd, the inventor of the relational model, proposed the theory of normalization with the introduction of First Normal Form, and he continued to extend the theory with Second and Third Normal Form.
Normalization Principles
- Relational design principles for normalized relations:

- To be a well-formed relation, every determinant must be a candidate key.

- Any relation that is not well-formed should be broken down into two or more well-formed relations.

• TIP: as a general rule, a well-formed relation will not encompass more than one
business concept!

Aims of Normalization
• To reduce the physical space needed to store data.

• To ensure that the database is structured in the best possible way (data becomes better organized).

• To achieve control over data redundancy: there should be no unnecessary duplication of data in different tables.

• To ensure data consistency.

• To ensure tables have a flexible structure, e.g. the number of classes taken or books borrowed should not be limited.

• To allow data in different tables to be used in complex queries.

• To minimize deletion, insertion and update anomalies.

Anomalies
• Anomalies are inconvenient or error-prone situations that arise when we process the tables.

There are three types of anomalies:

i. Update anomalies: an update anomaly exists when one or more, but not all, instances of duplicated data are updated.

• For example, if Jones moves to a new address, you need to update every instance of Jones's address.

StudentNum  CourseNum  StudentName  Address     Course
S21         9201       Jones        Edinburgh   Accounts
S21         9267       Jones        Edinburgh   Accounts
S24         9267       Smith        Glasgow     Physics
S30         9201       Richards     Manchester  Computing
S30         9322       Richards     Manchester  Maths

ii. Delete anomalies: a delete anomaly exists when certain attributes are lost because of the deletion of other attributes.

Consider the table above: what happens if student S30, the only student enrolled on course 9322, leaves that course?

Answer: all information about course 9322 is lost.

iii. Insert anomalies: an insert anomaly occurs when certain attributes cannot be inserted into the database without the presence of other attributes.

- This is the converse of the delete anomaly: for example, we can't add a new course unless at least one student is enrolled on that course.

StudentNum  CourseNum  StudentName  Address     Course
S21         9201       Jones        Edinburgh   Accounts
S21         9267       Jones        Edinburgh   Accounts
S24         9267       Smith        Glasgow     Physics
S30         9201       Richards     Manchester  Computing
S30         9322       Richards     Manchester  Maths
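
The update and delete anomalies described above can be reproduced on this table with a few lines of Python. This is only an illustrative sketch: the rows are copied from the table, while the new address "London" and the helper name `addresses_by_student` are assumptions made for the example.

```python
# Rows of the enrolment table:
# (StudentNum, CourseNum, StudentName, Address, Course)
rows = [
    ("S21", 9201, "Jones", "Edinburgh", "Accounts"),
    ("S21", 9267, "Jones", "Edinburgh", "Accounts"),
    ("S24", 9267, "Smith", "Glasgow", "Physics"),
    ("S30", 9201, "Richards", "Manchester", "Computing"),
    ("S30", 9322, "Richards", "Manchester", "Maths"),
]


def addresses_by_student(table):
    """Map each StudentNum to the set of addresses recorded for it."""
    result = {}
    for student_num, _course_num, _name, address, _course in table:
        result.setdefault(student_num, set()).add(address)
    return result


# Update anomaly: change Jones's address in only one of his two rows.
# "London" is a made-up new address used purely for illustration.
partially_updated = list(rows)
partially_updated[0] = ("S21", 9201, "Jones", "London", "Accounts")
inconsistent = {
    student
    for student, addresses in addresses_by_student(partially_updated).items()
    if len(addresses) > 1
}

# Delete anomaly: removing the only enrolment on course 9322
# also removes every trace of that course.
after_delete = [r for r in rows if not (r[0] == "S30" and r[1] == 9322)]
lost_courses = {r[1] for r in rows} - {r[1] for r in after_delete}

print(inconsistent)   # students whose address is now recorded inconsistently
print(lost_courses)   # courses that vanished together with the enrolment
```

The insert anomaly is the mirror image: a row for a new course cannot be built at all without some StudentNum to put in it.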

Stages of Normalization
• It involves applying a series of tests to a relation to determine whether it satisfies or violates the requirements of a given normal form.

• When a test fails, the relation is decomposed into simpler relations that individually meet
the normalization tests.

• The higher the normal form the less vulnerable to update anomalies the relation
becomes.

• Three normal forms, 1NF, 2NF and 3NF, were initially proposed by Codd.

• All these normal forms are based on the functional dependencies among the attributes of
a relation.

• First Normal Form (1NF)

• Second Normal Form (2NF) and

• Third Normal Form (3NF)

• Normalization follows a staged process that obeys a set of rules. The steps of
normalization are:

• Step 1: Select the data source and convert it into an unnormalized table (UNF)

• Step 2: Transform the unnormalized data into first normal form (1NF)

• Step 3: Transform data in first normal form (1NF) into second normal form (2NF)

• Step 4: Transform data in second normal form (2NF) into third normal form (3NF)

First Normal Form (1NF)
• A table is said to be in 1NF if it has no multi-valued attributes. In 1NF:

- Create a separate table for each set of related data.
- All records must be identified uniquely with a primary key: a minimal set of attributes that uniquely identifies a record.
- The values in each column of a table are atomic, i.e. single-valued (multi-valued attributes are not allowed).
- There are no repeating groups: two columns do not store similar information in the same table.
- All values in each field must be of the same data type.
- Each record needs to be unique.
• By default, all relations are in 1NF (because a relation should not contain multi-valued attributes).

• If repeating values exist, create a new table and move the repeating groups out of the original table.

Which table is in 1NF?

Why?
Because the table on the right-hand side doesn't contain repeating groups (no multi-valued attributes).

• Do you think that the above table is a valid table?

• No, because the existence of a multi-valued attribute prohibits the definition of a primary key.
• There are two mechanisms to overcome this problem. These are:

a. Defining a composite key as shown.

• This way of eliminating a multi-valued attribute has serious drawbacks of its own, such as update anomalies.

b. Decomposing the table into two tables.

A table containing a multi-valued attribute can also be converted into a 1NF table by changing the attribute into atomic attributes.

N.B.: this approach involves
- a structural change to the table, and
- wasted storage space.

• STUDENT: Unnormalized table

• To convert the above table from unnormalized form to 1NF, simply make any repeated attribute part of the candidate key:

• STUDENT (Number, Name, Classes), with key Number

• STUDENT (Number, Name, Classes), with key (Number, Classes)

• STUDENT: First Normal Form table

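
As a rough sketch of this conversion, the snippet below flattens a multi-valued Classes attribute into one atomic value per row. The sample rows and the function name `to_first_normal_form` are hypothetical; only the attribute names Number, Name and Classes come from the STUDENT relation above.

```python
# Hypothetical unnormalized STUDENT rows: Classes holds several values per row.
unnormalized = [
    {"Number": 1, "Name": "Jones", "Classes": ["Maths", "Physics"]},
    {"Number": 2, "Name": "Smith", "Classes": ["Maths"]},
]


def to_first_normal_form(rows, multivalued):
    """Emit one row per value of the multi-valued attribute, so every field is atomic."""
    flat = []
    for row in rows:
        for value in row[multivalued]:
            new_row = dict(row)          # copy the single-valued attributes
            new_row[multivalued] = value  # replace the list with one atomic value
            flat.append(new_row)
    return flat


student_1nf = to_first_normal_form(unnormalized, "Classes")

# In the flattened table, Number alone no longer identifies a row;
# the pair (Number, Classes) does.
keys = [(row["Number"], row["Classes"]) for row in student_1nf]

for row in student_1nf:
    print(row)
```

Note that Name is now repeated for every class a student takes, which is exactly the redundancy that the later normal forms remove.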
Second Normal Form (2NF)
• A table is said to be in 2NF if both the following conditions hold:

- The table is in 1NF.
- No non-prime attribute is dependent on a proper subset of any candidate key of the table.

Any attribute that is not part of any candidate key is known as a non-prime attribute.

Functional Dependency
• The concept of functional dependency is central to normalization and, in particular,
strongly related to 2NF

• Functional dependency is the relationship that describes how the value of one attribute
may be used to find the value of another attribute.

• Determinant

• It is an attribute that can be used to find the value of another attribute in the relation.

Example: If X is a set of attributes within a relation, then we say that A (an attribute or set of attributes) is functionally dependent on X if, for each value of X, there is only one corresponding value of A.

• We write this as: X -> A (i.e. X is a determinant)

• For example, the values of the attributes Name and City can be determined by knowing the value of Reg #.
• Reg # -> Name, City

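
The definition of X -> A can be tested against sample data: group the rows by their X value and check that each group has a single A value. The sketch below assumes rows stored as dictionaries; the function name `holds` and the sample rows are hypothetical, and only the attribute names Reg, Name and City follow the example above.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in this set of rows."""
    seen = {}
    for row in rows:
        x_value = tuple(row[attr] for attr in determinant)
        a_value = tuple(row[attr] for attr in dependent)
        # The first time an X value is seen, remember its A value;
        # any later row with the same X must carry the same A.
        if seen.setdefault(x_value, a_value) != a_value:
            return False
    return True


# Hypothetical sample rows.
students = [
    {"Reg": 1, "Name": "Jones", "City": "Edinburgh"},
    {"Reg": 2, "Name": "Smith", "City": "Glasgow"},
    {"Reg": 3, "Name": "Jones", "City": "Manchester"},
]

print(holds(students, ["Reg"], ["Name", "City"]))  # Reg -> Name, City
print(holds(students, ["Name"], ["City"]))         # Name -> City
```

A check like this can only refute a dependency on the data at hand; whether a dependency truly holds is a statement about the business rules, not about one sample of rows.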

Types of Functional Dependency

• Partial dependency

• Full dependency and

• Transitive dependency

• Partial dependency: a dependency where non-key attributes functionally depend on only part of a composite key.

• It can exist only when the key is composite.


• In the above table, knowing the value of the attribute Emp_ID is enough to determine the value of the non-key attribute Name (graphically: Emp_ID -> Name).

• Hence the non-key attribute Name is partially dependent on the composite key.

• Similarly, knowing the value of SW-ID determines the value of the non-key attribute SW-Title (graphically: SW-ID -> SW-Title).

• I.e. SW-Title is partially dependent on the composite key.

• Full dependency: a dependency where non-key attributes are functionally dependent on the complete key.

• For example, the value of Hrs-Worked can be determined only by knowing the values of both parts of the composite key (Emp-ID and SW-ID).

• Therefore the attribute Hrs-Worked is fully functionally dependent on Emp-ID and SW-ID (graphically: Emp-ID, SW-ID -> Hrs-Worked).

• Transitive dependency: a dependency where a non-key attribute is the determinant of other non-key attributes.

• For example, the non-key attribute Date-Completion can be determined by the other non-key attribute Project-ID (graphically: Project-ID -> Date-Completion).

Second Normal Form (2NF)
• A table is said to be in 2NF, if it is in 1NF and no column that is not part of the primary
key is dependent only on a portion of the primary key.

• Or it is in 1NF and every non-key attribute is fully functionally dependent on the entire primary key (i.e. every non-key attribute must be defined by the entire key, not by only part of the key: no partial dependency).

• A relational table containing a full dependency along with partial dependencies can be decomposed into separate tables.

• The determinant of each partial dependency becomes the primary key of the corresponding table.
• For example, the following table satisfies the rules of 1NF (no multi-valued attributes) but not 2NF.

• All three anomalies exist unless the table is decomposed further.

• New software cannot be added unless an employee is assigned to it.
• Similarly, a new employee cannot be added unless software is assigned to that employee. In both cases we are unable to add the row.
• Therefore there is an insert anomaly.

• In addition, multiple updates are needed because the employee name and software title are recorded redundantly: an update anomaly.
• If we delete the last row, the information associated with it is also deleted, such as the entry for Visual Basic, which only a single employee is working with: a delete anomaly.

• The solution is to apply the partial and full dependency rules.

• The partial dependency of the Name attribute is converted into an Employee table (Emp_ID -> Name), and similarly the partial dependency of the software title is converted into a Software table (SW-ID -> SW-Title).
• The full dependency of Hrs-Worked is converted into a Work table (Emp_ID, SW-ID -> Hrs-Worked).

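
The decomposition into Employee, Software and Work tables amounts to projecting the original table onto three attribute sets and dropping duplicate rows. The sketch below uses hypothetical sample rows and a hypothetical helper named `project`; the attribute names follow the text, with underscores in place of hyphens so that they are uniform.

```python
def project(rows, attributes):
    """Project rows onto the given attributes and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {attr: row[attr] for attr in attributes}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical 1NF table with composite key (Emp_ID, SW_ID).
assignments = [
    {"Emp_ID": "E1", "Name": "Jones", "SW_ID": "S1", "SW_Title": "Payroll", "Hrs_Worked": 10},
    {"Emp_ID": "E1", "Name": "Jones", "SW_ID": "S2", "SW_Title": "Visual Basic", "Hrs_Worked": 5},
    {"Emp_ID": "E2", "Name": "Smith", "SW_ID": "S1", "SW_Title": "Payroll", "Hrs_Worked": 7},
]

employee = project(assignments, ["Emp_ID", "Name"])            # Emp_ID -> Name
software = project(assignments, ["SW_ID", "SW_Title"])         # SW_ID -> SW_Title
work = project(assignments, ["Emp_ID", "SW_ID", "Hrs_Worked"])  # Emp_ID, SW_ID -> Hrs_Worked

print(employee)
print(software)
print(work)
```

After the split, each employee name and each software title is stored once, so renaming either one touches a single row.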

• Tips:

• Move any non-key attributes that depend on only part of the table key (partial dependencies) to a new table.

• What has to be determined is: “Is field A dependent upon field B, or vice versa?”

• This means: “Given a value for A, do we then have only one possible value for B, and vice versa?”

• If the answer is yes, A and B should be put into a new relation with A becoming the primary key.

• A should be left in the original relation and marked as a foreign key.

The process is as follows:

• Take each non-key attribute in turn and ask the question: is this attribute dependent on one part of the key?

• If yes, move the attribute to a new table together with a copy of the part of the key it depends on. That part of the key becomes the key in the new table. Underline the key in this new table.

• If no, check against the other part of the key and repeat the above process.

• If still no, i.e. the attribute is not dependent on either part of the key, keep the attribute in the current table.

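
The question-by-question process above can be sketched in code when the functional dependencies are written down explicitly. This is one possible formulation, not the only one: it uses the standard attribute-closure computation, and the function names `closure` and `partial_dependencies` as well as the dependency list are assumptions based on the employee and software example used earlier.

```python
from itertools import combinations


def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_key, fds):
    """Map each non-key attribute to the smallest proper part of the key that determines it."""
    found = {}
    for size in range(1, len(key)):          # proper subsets only
        for part in combinations(key, size):
            determined = closure(part, fds)
            for attr in non_key:
                if attr not in found and attr in determined:
                    found[attr] = part
    return found


# Dependencies assumed from the employee/software example.
fds = [
    (("Emp_ID",), ("Name",)),
    (("SW_ID",), ("SW_Title",)),
    (("Emp_ID", "SW_ID"), ("Hrs_Worked",)),
]
key = ("Emp_ID", "SW_ID")
non_key = ["Name", "SW_Title", "Hrs_Worked"]

print(partial_dependencies(key, non_key, fds))
```

Attributes that appear in the result move to a new table keyed by the listed part of the key; attributes that do not appear stay in the current table.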
Functional Dependency
• It is clear that:
• RefNo -> Name, Address. Or, more precisely,
• AccNo, RefNo -> Name, Address, Status

The table is not in Second Normal Form.

Table 1: RefNo -> Name, Address
Table 2: RefNo, AccNo -> Status

Third Normal Form
A table is in 3NF:

If it is in 2NF and no non-prime attribute is transitively dependent on any super key.

Or, put another way, no non-key column may depend on another non-key column that could not itself act as a primary key (transitive dependency).

• Solution: a non-key determinant with a transitive dependency goes into a new table; the non-key determinant becomes the primary key in the new table and remains as a foreign key in the old table.


• A transitive dependency can arise in two ways:

a. The determinant is a single attribute, and

b. The determinant consists of multiple attributes.

a. The determinant is a single attribute

• If a relational table has a single-attribute transitive dependency, the process is as follows:
• Move the dependent attribute, together with a copy of the non-key attribute upon which it is dependent, to a new table.

• Make the non-key attribute upon which it is dependent the key in the new table. Underline the key in this new table.

• Leave the non-key attribute upon which it is dependent in the original table, along with the remaining non-key attributes, and mark it as a foreign key.


• The table is in 1NF and 2NF because:

• There is no multi-valued attribute, and there is no partial dependency (a partial dependency cannot exist because there is no composite key).
• The table is in 2NF but still has all three (insert, update and delete) anomalies, as well as a transitive functional dependency
• (i.e. the attribute Project ID determines Date-Completion); therefore the table is not in 3NF.

• Now let us resolve this problem by decomposing the table into a Project table (Project ID -> Date-Completion) and an Employee table (Emp-ID -> Name).

• In the above 3NF tables we can easily observe that there is no anomaly at all.

• We can add a new project without any employee existing.

• We can also add a new employee without any project existing.

• No multiple updates are needed because there is no redundancy, and

• Deleting an employee from the Employee table has no effect on the Project table.

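
A small sketch of the single-attribute case: the Project ID -> Date-Completion dependency is moved into its own table, and Project ID stays behind as a foreign key. All sample values and variable names are hypothetical; only the attribute names come from the text, with underscores in place of spaces and hyphens.

```python
# Hypothetical 2NF table with key Emp_ID and a transitive dependency
# Emp_ID -> Project_ID -> Date_Completion.
employees_2nf = [
    {"Emp_ID": "E1", "Name": "Jones", "Project_ID": "P1", "Date_Completion": "2024-06-30"},
    {"Emp_ID": "E2", "Name": "Smith", "Project_ID": "P1", "Date_Completion": "2024-06-30"},
    {"Emp_ID": "E3", "Name": "Richards", "Project_ID": "P2", "Date_Completion": "2024-12-31"},
]

# Project table: Project_ID is the key, one entry per project.
project_table = {row["Project_ID"]: row["Date_Completion"] for row in employees_2nf}

# Employee table: Project_ID remains as a foreign key.
employee_table = [
    {"Emp_ID": row["Emp_ID"], "Name": row["Name"], "Project_ID": row["Project_ID"]}
    for row in employees_2nf
]

# Joining on the foreign key recovers the original rows, so nothing was lost.
rejoined = [
    dict(emp, Date_Completion=project_table[emp["Project_ID"]])
    for emp in employee_table
]

# A new project can now be recorded before any employee is assigned to it.
project_table["P3"] = "2025-03-31"

print(project_table)
print(employee_table)
```

The completion date of P1 is now stored once instead of once per employee, which removes the update anomaly for that attribute.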
b. Multi-attribute determinant

The table is in 2NF, as there is no partial dependency, but there is still a problem of anomalies because of the transitive dependency (Origin, Destination -> Distance).

Exercise
Does it satisfy 2NF?
No. Why?
- Here the primary key is (Studio, Movie), and City depends only on Studio, not on the whole key.
- So the table is not in 2NF.

Exercise

Which normal form does it satisfy, and why?

Solution: 3NF

Does this table satisfy 3NF?

No.

Why?

Because book_id determines generic_id and generic_id determines generic type. Therefore book_id determines generic type via generic_id, and we have a transitive functional dependency.

Once generic type is moved to a separate table keyed by generic_id, the design is in 3NF.
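
The reasoning in this exercise, where a key determines a non-key attribute that in turn determines another attribute, can be checked mechanically. The sketch below handles only single-attribute determinants; the function name `transitive_dependencies` is hypothetical, and `generic_type` stands in for the attribute written as "generic type" in the exercise.

```python
def transitive_dependencies(key, fds):
    """List (key, via, target) chains where key -> via and via -> target."""
    direct = {}
    for lhs, rhs in fds:
        direct.setdefault(lhs, set()).update(rhs)

    chains = []
    for via in sorted(direct.get(key, set())):
        if via == key:
            continue
        for target in sorted(direct.get(via, set())):
            if target not in (key, via):
                chains.append((key, via, target))
    return chains


# Dependencies stated in the exercise.
fds = [
    ("book_id", ["generic_id"]),
    ("generic_id", ["generic_type"]),
]

chains = transitive_dependencies("book_id", fds)
print(chains)
```

Each chain reported this way names the attribute (the middle element) that should become the key of a new table.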
