DBMS Tutorial PDF
BY CHAITANYA SINGH | FILED UNDER: DBMS
DBMS stands for Database Management System. We can break it down as DBMS =
Database + Management System: a database is a collection of data, and a management
system is a set of programs to store and retrieve that data. Based on this, we can define a
DBMS as a collection of inter-related data together with a set of programs to store and
access that data in an easy and effective manner. Here are the DBMS notes to help you
learn database systems in a systematic manner. Happy learning!
Introduction to DBMS
DBMS Applications
Advantages of DBMS over file processing system
DBMS Architecture
Three level DBMS Architecture
View of Data
Data Abstraction
Instances and Schemas
Data Models in DBMS
E-R Model in DBMS
DBMS Generalization
DBMS Specialization
DBMS Aggregation
Relational Model in DBMS
RDBMS concepts
Hierarchical data Model in DBMS
Network Model in DBMS
Database languages
Relational Algebra
Relational Calculus
Keys in DBMS
Primary key
Super key
Candidate key
Alternate key
Composite key
Foreign key
Constraints in DBMS
Domain constraints
Mapping constraints
Cardinality in DBMS
Functional dependencies in DBMS
Trivial functional dependency
non-trivial functional dependency
Multivalued dependency
Transitive dependency
Normalization in DBMS – This covers all the normal forms: First Normal Form (1NF),
Second Normal Form (2NF), Third Normal Form (3NF) and Boyce–Codd Normal
Form (BCNF)
Transaction Management in DBMS
ACID Properties
Transaction States
DBMS Schedules
Serializability
DBMS Conflict Serializability
DBMS View Serializability
Deadlock
Concurrency Control
Introduction to DBMS
Storage: According to the principles of database systems, data is stored in a way that
occupies far less space, because redundant (duplicate) data is removed before storage.
Let's take a simple example to understand this:
In a banking system, suppose a customer has two accounts: a saving account and a
salary account. Say the bank stores the saving account data in one place (these places
are called tables; we will learn about them later) and the salary account data in another.
If customer information such as name and address is stored in both places, that is a
waste of storage (redundancy/duplication of data). To organize the data in a better way,
the customer information should be stored in one place and both accounts should be
linked to it. This is exactly what a DBMS achieves.
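The banking example above can be sketched with a couple of linked tables. This is a minimal illustration using SQLite through Python's sqlite3 module; the table and column names are invented for the example:

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Customer details are stored exactly once...
cur.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
# ...and each account links back to the customer instead of repeating name/address.
cur.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, acc_type TEXT, "
            "cust_id INTEGER REFERENCES customer(cust_id))")

cur.execute("INSERT INTO customer VALUES (1, 'Steve', 'Delhi')")
cur.execute("INSERT INTO account VALUES (1001, 'saving', 1)")
cur.execute("INSERT INTO account VALUES (1002, 'salary', 1)")

# Both accounts resolve to the same single copy of the customer's details.
rows = cur.execute(
    "SELECT a.acc_no, a.acc_type, c.name, c.address "
    "FROM account a JOIN customer c ON a.cust_id = c.cust_id "
    "ORDER BY a.acc_no"
).fetchall()
```

Updating the customer's address now touches one row in `customer`, not one row per account.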
Fast Retrieval of data: Along with storing the data in an optimized and systematic
manner, it is also important that we retrieve the data quickly when needed. Database
systems ensure that the data is retrieved as quickly as possible.
Telecom: A database keeps track of information regarding calls made, network usage,
customer details etc. Without database systems it would be hard to maintain the huge
amount of data that is updated every millisecond.
Industry: Whether it is a manufacturing unit, a warehouse or a distribution centre, each
one needs a database to keep records of ins and outs. For example, a distribution
centre should keep track of the product units supplied into the centre as well as the
products delivered out of it each day; this is where a DBMS comes into the picture.
Banking System: For storing customer info, tracking day-to-day credit and debit
transactions, generating bank statements etc. All of this work is done with the help of
database management systems.
Sales: To store customer information, production information and invoice details.
Airlines: To travel by air we make reservations in advance; this reservation information
along with the flight schedule is stored in a database.
Education sector: Database systems are frequently used in schools and colleges to
store and retrieve data regarding student details, staff details, course details, exam
details, payroll data, attendance details, fee details etc. There is a huge amount of
inter-related data that needs to be stored and retrieved in an efficient manner.
Online shopping: You must be aware of online shopping websites such as Amazon,
Flipkart etc. These sites store product information, your addresses and preferences,
and credit details, and provide you a relevant list of products based on your query. All
of this involves a database management system.
I have mentioned only a few applications here; the list would never end if we started
mentioning all the DBMS applications.
Data Isolation: Because data is scattered across various files, and files may be in
different formats, writing new application programs to retrieve the appropriate data is
difficult.
Duplication of data – Redundant data.
Dependency on application programs – Changing the files would require changing the
application programs.
DBMS Architecture
In the previous tutorials, we learned the basics of DBMS. In this guide, we will see the
DBMS architecture. Understanding the architecture of a database management system
helps us understand the components of a database system and the relations among them.
The architecture of a DBMS depends on the computer system on which it runs. For
example, in a client–server DBMS architecture, the database system at the server
machine can serve several requests made by client machines. We will understand this
communication with the help of diagrams.
1. One tier architecture
For example, let's say you want to fetch employee records from a database that is
available on your own computer system: the request to fetch the employee details is
made by your computer, and the records are fetched from the database by your computer
as well. This type of system is generally referred to as a local database system.
2. Two tier architecture
In two-tier architecture, the database system is present at the server machine and the
DBMS application is present at the client machine; these two machines are connected to
each other through a reliable network, as shown in the above diagram.
Whenever the client machine makes a request to access the database present at the
server using a query language such as SQL, the server performs the request on the
database and returns the result to the client. Application connection interfaces such as
JDBC and ODBC are used for the interaction between server and client.
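The client-side half of this exchange follows a common pattern: open a connection, send an SQL request, read the result. The sketch below uses Python's DB-API with sqlite3 standing in for the server database (a real two-tier setup would open a network connection through a driver instead); the table and data are illustrative:

```python
import sqlite3

# sqlite3 is an embedded engine, used here only as a stand-in for a
# server-side database reachable over a network connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_id INTEGER, emp_name TEXT)")
cur.execute("INSERT INTO employee VALUES (1, 'Rick')")

# The client sends an SQL request; the server executes it and returns rows.
result = cur.execute("SELECT emp_name FROM employee WHERE emp_id = 1").fetchone()
```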
3. Three tier architecture
In three-tier architecture, another layer is present between the client machine and the
server machine. In this architecture, the client application doesn't communicate directly
with the database system present at the server machine; rather, the client application
communicates with a server application, which in turn communicates with the database
system present at the server.
Three level DBMS Architecture
In the previous tutorial we saw the DBMS architectures – one-tier, two-tier and three-tier.
In this guide, we will discuss the three level DBMS architecture in detail.
DBMS Three Level Architecture Diagram
1. External level
It is also called the view level. This level is called "view" because several users can view
their desired data from it; the data is fetched internally from the database with the help of
the conceptual and internal level mappings.
The user doesn't need to know database schema details such as data structures, table
definitions etc. The user is concerned only with the data, which is returned to the view
level after being fetched from the database (present at the internal level).
External level is the “top level” of the Three Level DBMS Architecture.
2. Conceptual level
It is also called the logical level. The whole design of the database, such as the
relationships among the data and the schema of the data, is described at this level.
Database constraints and security are also implemented at this level of the architecture.
This level is maintained by the DBA (database administrator).
3. Internal level
This level is also known as the physical level. It describes how the data is actually stored
on the storage devices and is also responsible for allocating space to the data. This is the
lowest level of the architecture.
View of Data
Abstraction is one of the main features of database systems. Hiding irrelevant details
from users and providing an abstract view of the data helps in easy and efficient
user–database interaction. In the previous tutorial, we discussed the three level DBMS
architecture; the top level of that architecture is the view level. The view level provides the
"view of data" to users and hides irrelevant details such as data relationships, database
schema, constraints, security etc. from them.
To fully understand the view of data, you should have a basic knowledge of data
abstraction and of instances & schemas. Refer to these two tutorials to learn them in detail.
1. Data abstraction
2. Instance and schema
Data abstraction in DBMS
Database systems are made up of complex data structures. To ease user interaction with
the database, developers hide irrelevant internal details from users. This process of hiding
irrelevant details from users is called data abstraction.
We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how the data is
actually stored in the database. The details of the complex data structures are available at
this level.
Logical level: This is the middle level of the 3-level data abstraction architecture. It
describes what data is stored in the database.
View level: This is the highest level of data abstraction. It describes the user's interaction
with the database system.
Example: Let's say we are storing customer information in a customer table. At the
physical level these records are described as blocks of storage allocated on disk; these
details are often hidden from programmers.
At the logical level these records are described as fields and attributes along with their
data types, and the relationships among them can be logically implemented. Programmers
generally work at this level, because they are aware of these aspects of the database
system.
At the view level, users just interact with the system with the help of a GUI and enter
details on the screen; they are not aware of how or what data is stored, as such details are
hidden from them.
Instance and schema in DBMS
DBMS Schema
Definition of schema: The design of a database is called the schema. Schema is of three
types: physical schema, logical schema and view schema.
For example: In the following diagram, we have a schema that shows the relationship
between three tables: Course, Student and Section. The diagram only shows the design of
the database; it doesn't show the data present in those tables. A schema is only a
structural view (design) of a database, as shown in the diagram below.
The design of a database at the physical level is called the physical schema; how the data
is stored in blocks of storage is described at this level.
The design of a database at the logical level is called the logical schema; programmers
and database administrators work at this level. At this level, data is described as certain
types of records stored in data structures; internal details such as the implementation of
the data structures are hidden here (they are available at the physical level).
The design of a database at the view level is called the view schema. It generally
describes the end user's interaction with the database system.
To learn more about these schemas, refer to the 3 level data abstraction architecture.
DBMS Instance
Definition of instance: The data stored in the database at a particular moment of time is
called an instance of the database. The database schema defines the variable
declarations in the tables belonging to a particular database; the values of these variables
at a moment of time are called the instance of that database.
For example, let's say we have a single table, student, in the database, and today the
table has 100 records; so today the instance of the database has 100 records. Say we
add another 100 records to this table by tomorrow; the instance of the database tomorrow
will then have 200 records in the table. In short, the data stored in the database at a
particular moment is called the instance; it changes over time as we add or delete data
from the database.
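The schema/instance distinction can be made concrete in a few lines. This is a minimal sketch using SQLite via Python's sqlite3 module; the table and names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The schema (the design) is fixed when the table is created...
cur.execute("CREATE TABLE student (stu_id INTEGER, stu_name TEXT)")

# ...while the instance (the data at a moment in time) changes as rows come and go.
count_today = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
cur.executemany("INSERT INTO student VALUES (?, ?)", [(1, 'Steve'), (2, 'John')])
count_tomorrow = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The schema stayed the same throughout; only the instance changed from 0 rows to 2 rows.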
DBMS languages
Database languages are used to read, update and store data in a database. There are
several such languages that can be used for this purpose; one of them is SQL (Structured
Query Language).
Types of DBMS languages:
DDL commands such as CREATE, ALTER and DROP either define or update the
database schema; that's why they come under Data Definition Language.
In practice, the data definition language, data manipulation language and data control
language are not separate languages; rather, they are parts of a single database
language such as SQL.
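The DDL/DML split within a single language can be seen in a short session. This sketch runs standard SQL statements through SQLite via Python's sqlite3 module; the table and data are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: defines or changes the schema.
cur.execute("CREATE TABLE course (course_id INTEGER, course_name TEXT)")
cur.execute("ALTER TABLE course ADD COLUMN credits INTEGER")

# DML: reads and writes the data held inside that schema.
cur.execute("INSERT INTO course VALUES (1, 'DBMS', 4)")
name = cur.execute("SELECT course_name FROM course WHERE course_id = 1").fetchone()[0]
```

Both kinds of statement are plain SQL; only their purpose (schema vs. data) differs.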
Object based logical Models – Describe data at the conceptual and view levels.
1. E-R Model
2. Object oriented Model
Record based logical Models – Like object based models, these also describe data at the
conceptual and view levels. They specify the logical structure of the database with
records, fields and attributes.
1. Relational Model
2. Hierarchical Model
3. Network Model – The network model is the same as the hierarchical model except that
it has a graph-like structure rather than a tree-based structure. Unlike the hierarchical
model, this model allows each record to have more than one parent record.
Physical Data Models – These models describe data at the lowest level of abstraction.
E-R model in DBMS
An entity–relationship model (ER model) describes the structure of a database with the
help of a diagram, which is known as an Entity Relationship Diagram (ER Diagram). An ER
model is a design or blueprint of a database that can later be implemented as a database.
The main components of the E-R model are the entity set and the relationship set.
A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their relationship.
The relationship between Student and College is many-to-one, as a college can have
many students but a student cannot study in multiple colleges at the same time. The
Student entity has attributes such as Stu_Id, Stu_Name & Stu_Addr, and the College
entity has attributes such as Col_ID & Col_Name.
Here are the geometric shapes and their meanings in an E-R diagram. We will discuss
these terms in detail in the next section (Components of an ER Diagram) of this guide, so
don't worry too much about them now; just go through them once.
Components of an ER Diagram
1. Entity
An entity is an object or component of data. An entity is represented as a rectangle in an
ER diagram.
For example: In the following ER diagram we have two entities, Student and College, and
these two entities have a many-to-one relationship, as many students study in a single
college. We will read more about relationships later; for now, focus on the entities.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on a
relationship with another entity is called a weak entity. A weak entity is represented by a
double rectangle. For example, a bank account cannot be uniquely identified without
knowing the bank to which the account belongs, so a bank account is a weak entity.
2. Attribute
An attribute describes a property of an entity. An attribute is represented as an oval in an
ER diagram. There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity from an entity set. For example, a student
roll number can uniquely identify a student from a set of students. A key attribute is
represented by an oval like the other attributes, but its text is underlined.
2. Composite attribute:
An attribute that is a combination of other attributes is known as a composite attribute.
For example, in the Student entity, the student address is a composite attribute, as an
address is composed of other attributes such as pin code, state and country.
3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is
represented with double ovals in an ER diagram. For example, a person can have more
than one phone number, so the phone number attribute is multivalued.
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is
represented by a dashed oval in an ER diagram. For example, a person's age is a derived
attribute, as it changes over time and can be derived from another attribute (date of birth).
E-R diagram with multivalued and derived attributes:
3. Relationship
A relationship is represented by a diamond shape in an ER diagram; it shows the
relationship among entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
DBMS Generalization
Generalization is a process in which the common attributes of two or more entities form a
new entity. This newly formed entity is called a generalized entity.
Generalization Example
Let's say we have two entities, Student and Teacher.
Attributes of entity Student: Name, Address & Grade
Attributes of entity Teacher: Name, Address & Salary
These two entities have two attributes in common, Name and Address; we can make a
generalized entity from these common attributes. Let's have a look at the ER model after
generalization.
Note:
1. Generalization uses a bottom-up approach, in which two or more lower level entities
combine to form a new higher level entity.
2. The new generalized entity can combine further with other lower level entities to create
a still higher level generalized entity.
DBMS Specialization
Specialization is a process in which an entity is divided into sub-entities. You can think of
it as the reverse of generalization: in generalization, two entities combine to form a new
higher level entity. Specialization is a top-down process.
The idea behind specialization is to find subsets of entities that have a few distinguishing
attributes. For example, consider an entity Employee, which can be further classified into
the sub-entities Technician, Engineer & Accountant, because these sub-entities have
some distinguishing attributes.
Specialization Example
In the above diagram, we can see that the higher level entity "Employee" has been
divided into the sub-entities "Technician", "Engineer" & "Accountant". All of these are
employees of a company, but their roles are completely different and they have a few
different attributes. Just for the example, I have shown that the Technician handles service
requests, the Engineer works on a project and the Accountant handles the credit & debit
details. All three employee types have a few attributes in common, such as name &
salary, which we have left associated with the parent entity "Employee", as shown in the
above diagram.
DBMS Aggregation
Aggregation is a process used when a single entity alone does not make sense in a
relationship, so the relationship of two entities acts as one entity. It may sound confusing,
but the following example will clear up the doubts.
Aggregation Example
In the real world, we know that a manager not only manages the employees working
under them but also has to manage the project. In such a scenario, if the entity "Manager"
makes a "manages" relationship with either the "Employee" or the "Project" entity alone, it
will not make sense, because the manager has to manage both. In such cases the
relationship of two entities acts as one entity. In our example, the relationship "Works-On"
between "Employee" & "Project" acts as one entity, which has a "Manages" relationship
with the entity "Manager".
Relational model in DBMS
In the relational model, data and relationships are represented by a collection of
inter-related tables. Each table is a group of columns and rows, where the columns
represent the attributes of an entity and the rows represent records.
Sample relational model: a Student table with 3 columns and four records.
Table: Student
Stu_Id Stu_Name Stu_Age
111 Ashish 23
123 Saurav 22
169 Lester 24
234 Lou 26
Table: Course
Here Stu_Id, Stu_Name & Stu_Age are attributes of the table Student, and Stu_Id,
Course_Id & Course_Name are attributes of the table Course. The rows with values are
the records (commonly known as tuples).
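The Student relation above can be built and queried directly. This is a minimal sketch using SQLite through Python's sqlite3 module, with the rows taken from the table in the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Columns are the attributes of the relation; rows are its records (tuples).
cur.execute("CREATE TABLE Student (Stu_Id INTEGER, Stu_Name TEXT, Stu_Age INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                [(111, 'Ashish', 23), (123, 'Saurav', 22),
                 (169, 'Lester', 24), (234, 'Lou', 26)])

# Each row returned here is one record (tuple) of the relation.
tuples = cur.execute("SELECT * FROM Student ORDER BY Stu_Id").fetchall()
```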
Hierarchical model in DBMS
In the hierarchical model, data is organized into a tree-like structure where each record
has one parent record and many children. The main drawback of this model is that it can
have only one-to-many relationships between nodes.
Constraints in DBMS
Constraints enforce limits on the data, or the type of data, that can be inserted into,
updated in, or deleted from a table. The whole purpose of constraints is to maintain data
integrity during updates, deletes and inserts on a table. In this tutorial we will learn about
the several types of constraints that can be created in an RDBMS.
Types of constraints
NOT NULL
UNIQUE
DEFAULT
CHECK
Key Constraints – PRIMARY KEY, FOREIGN KEY
Domain constraints
Mapping constraints
NOT NULL:
The NOT NULL constraint makes sure that a column does not hold a NULL value. When
we don't provide a value for a particular column while inserting a record into a table, it
takes the value NULL by default. By specifying a NOT NULL constraint on a column, we
make sure that the column cannot have NULL values.
Example:
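A minimal sketch of NOT NULL in action, run through SQLite via Python's sqlite3 module (the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# roll_no is declared NOT NULL, so the engine refuses NULL values in it.
cur.execute("CREATE TABLE student (roll_no INTEGER NOT NULL, name TEXT)")

try:
    cur.execute("INSERT INTO student (roll_no, name) VALUES (NULL, 'Steve')")
    violated = False
except sqlite3.IntegrityError:
    # The insert is rejected: NOT NULL constraint failed.
    violated = True
```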
UNIQUE:
A UNIQUE constraint enforces a column, or set of columns, to have unique values. If a
column has a unique constraint, that particular column cannot have duplicate values in
the table.
DEFAULT:
The DEFAULT constraint provides a default value to a column when there is no value
provided while inserting a record into a table.
Key constraints:
PRIMARY KEY:
A primary key uniquely identifies each record in a table. It must have unique values and
cannot contain nulls. In the example below, the ROLL_NO field is marked as the primary
key, which means the ROLL_NO field cannot have duplicate or null values.
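A short sketch of the primary key constraint, using SQLite through Python's sqlite3 module (the table name and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (ROLL_NO INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO student VALUES (101, 'Steve')")

# A second row with the same ROLL_NO violates the primary key constraint.
try:
    cur.execute("INSERT INTO student VALUES (101, 'John')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```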
Domain constraints:
Each table has a certain set of columns, and each column allows data of the same type,
based on its data type; the column does not accept values of any other data type.
Domain constraints are user-defined data types, and we can define them like this:
Domain constraint = data type + constraints (NOT NULL / UNIQUE / PRIMARY KEY /
FOREIGN KEY / CHECK / DEFAULT)
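The "data type + constraint" idea can be approximated with a typed column plus a CHECK clause. A minimal sketch via SQLite through Python's sqlite3 module (note that SQLite is loose about column types, so the CHECK clause does the enforcing here; the names and range are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# The column's domain is "integers between 18 and 65".
cur.execute("CREATE TABLE employee (age INTEGER CHECK (age BETWEEN 18 AND 65))")

try:
    cur.execute("INSERT INTO employee VALUES (17)")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```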
Mapping constraints:
Read about Mapping constraint here.
Cardinality in DBMS
In DBMS you may hear the term cardinality in two different places, and it has two different
meanings as well.
One to One – A single row of the first table associates with a single row of the second
table. For example, the relationship between a Person table and a Passport table is
one-to-one, because a person can have only one passport and a passport can be
assigned to only one person.
One to Many – A single row of the first table associates with more than one row of the
second table. For example, the relationship between a Customer table and an Order table
is one-to-many, because a customer can place many orders but an order can be placed
by only a single customer.
Many to One – Many rows of the first table associate with a single row of the second
table. For example, the relationship between Student and University is many-to-one,
because a university can have many students but a student can study in only a single
university at a time.
Many to Many – Many rows of the first table associate with many rows of the second
table. For example, the relationship between a Student table and a Course table is
many-to-many, because a student can take many courses at a time and a course can be
assigned to many students.
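In a relational database a many-to-many relationship is usually realized through a third, linking (junction) table. A minimal sketch via SQLite through Python's sqlite3 module; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")
# The junction table turns the many-to-many relationship into rows of pairs.
cur.execute("CREATE TABLE enrollment (stu_id INTEGER, course_id INTEGER)")

cur.executemany("INSERT INTO student VALUES (?, ?)", [(1, 'Steve'), (2, 'John')])
cur.executemany("INSERT INTO course VALUES (?, ?)", [(10, 'DBMS'), (20, 'Networks')])
# Steve takes both courses; DBMS is taken by both students.
cur.executemany("INSERT INTO enrollment VALUES (?, ?)",
                [(1, 10), (1, 20), (2, 10)])

steves_courses = cur.execute(
    "SELECT COUNT(*) FROM enrollment WHERE stu_id = 1").fetchone()[0]
dbms_students = cur.execute(
    "SELECT COUNT(*) FROM enrollment WHERE course_id = 10").fetchone()[0]
```

One student maps to many courses and one course to many students, without duplicating either table's data.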
RDBMS Concepts
RDBMS stands for relational database management system. A relational model can be
represented as a table of rows and columns. A relational database has the following
major components:
1. Table
2. Record or Tuple
3. Field or Column name or Attribute
4. Domain
5. Instance
6. Schema
7. Keys
1. Table
A table is a collection of data represented in rows and columns. Each table has a name in
the database. For example, the following table "STUDENT" stores the information of
students in the database.
Table: STUDENT
2. Record or Tuple
Each row of a table is known as a record; it is also known as a tuple. For example, the
following row is a record taken from the above table.
4. Domain
An attribute cannot accept values that are outside its domain. For example, in the above
table "STUDENT", the Student_Id field has an integer domain, so that field cannot accept
values that are not integers; for example, Student_Id cannot have values like "First" or
10.11.
7. Keys
Keys are our next topic; I have covered them in detail in separate tutorials. You can refer
to the keys index here.
Keys in DBMS
Keys play an important role in a relational database; they are used for identifying unique
rows in a table and for establishing relationships among tables.
Primary Key – A primary key is a column, or set of columns, in a table that uniquely
identifies tuples (rows) in that table.
Super Key – A super key is a set of one or more columns (attributes) that can uniquely
identify rows in a table.
Candidate Key – A super key with no redundant attributes is known as a candidate key.
Alternate Key – Out of all candidate keys, only one gets selected as the primary key; the
remaining keys are known as alternate or secondary keys.
Composite Key – A key that consists of more than one attribute to uniquely identify rows
(also known as records or tuples) in a table is called a composite key.
Foreign Key – Foreign keys are columns of a table that point to the primary key of
another table. They act as a cross-reference between tables.
Definition: A primary key is a minimal set of attributes (columns) in a table that uniquely
identifies tuples (rows) in that table.
The attribute Stu_Name alone cannot be a primary key, as more than one student can
have the same name.
The attribute Stu_Age alone cannot be a primary key, as more than one student can have
the same age.
The attribute Stu_Id alone is a primary key, as each student has a unique id that can
identify the student record in the table.
Note: In some cases an attribute alone cannot uniquely identify a record in a table; in that
case we try to find a set of attributes that can uniquely identify a row. We will see an
example of this after the current one.
Stu_Id Stu_Name Stu_Age
101 Steve 23
102 John 24
103 Robert 28
104 Steve 29
105 Carl 29
Customer_ID alone cannot be a primary key, as a single customer can place more than
one order, so more than one row can have the same Customer_ID value. As we see in the
following example, customer id 1011 has placed two orders, for product ids 9023 and
9111.
Product_ID alone cannot be a primary key, as more than one customer can place an
order for the same product, so more than one row can have the same product id. In the
following table, customer ids 1011 & 1122 placed an order for the same product (product
id 9023).
Order_Quantity alone cannot be a primary key, as more than one customer can place an
order for the same quantity.
Since none of the attributes alone was able to become a primary key, let's try to make a
set of attributes that plays that role.
{Customer_ID, Product_ID} together can identify the rows of the table uniquely, so this set
is the primary key for this table.
Table Name: ORDER
Customer_ID Product_ID Order_Quantity
1011 9023 10
1122 9023 15
1099 9031 20
1177 9031 18
1011 9111 50
Note: While choosing a set of attributes for a primary key, we always choose a minimal
set, i.e. one with the minimum number of attributes. For example, if there are two sets that
can identify rows in the table, the set with the fewer attributes should be chosen as the
primary key.
Let's say we want to create the table discussed above, with the customer id and product
id set working as the primary key. We can do that in SQL like this:
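A sketch of the composite primary key, run through SQLite via Python's sqlite3 module. ORDER is a reserved word in SQL, so the table name is quoted here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# The PRIMARY KEY clause names both columns, forming a composite key.
cur.execute('CREATE TABLE "ORDER" ('
            '  Customer_ID INTEGER,'
            '  Product_ID INTEGER,'
            '  Order_Quantity INTEGER,'
            '  PRIMARY KEY (Customer_ID, Product_ID))')

cur.execute('INSERT INTO "ORDER" VALUES (1011, 9023, 10)')
cur.execute('INSERT INTO "ORDER" VALUES (1011, 9111, 50)')  # same customer, new product: OK

try:
    cur.execute('INSERT INTO "ORDER" VALUES (1011, 9023, 99)')  # repeats the pair
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True
```

Repeating either column on its own is allowed; only repeating the (Customer_ID, Product_ID) pair is rejected.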
Definition of Super Key in DBMS: A super key is a set of one or more attributes
(columns) which can uniquely identify a row in a table. DBMS beginners often confuse
super keys with candidate keys, so we will also discuss candidate keys and their relation
to super keys in this article.
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
Candidate Keys: As mentioned in the beginning, a candidate key is a minimal super key
with no redundant attributes. The following two sets of super keys are chosen from the
above sets, as they contain no redundant attributes.
{Emp_SSN}
{Emp_Number}
Only these two sets are candidate keys; all the other sets contain redundant attributes
that are not necessary for unique identification.
Primary key:
A primary key is selected from the set of candidate keys. This is done by the database
administrator or database designer. Here, either {Emp_SSN} or {Emp_Number} can be
chosen as the primary key for the table Employee.
Definition of Candidate Key in DBMS: A super key with no redundant attributes is known
as a candidate key. Candidate keys are selected from the set of super keys; the only thing
we take care of while selecting a candidate key is that it should not have any redundant
attributes. That is also the reason they are termed minimal super keys.
Lets select the candidate keys from the above set of super keys.
Note: A primary key is selected from the set of candidate keys. That means we can have
either Emp_SSN or Emp_Number as the primary key; the decision is made by the DBA
(database administrator).
Definition: Foreign keys are columns of a table that point to the primary key of another
table. They act as a cross-reference between tables.
For example:
In the example below, the Stu_Id column in the Course_enrollment table is a foreign key,
as it points to the primary key of the Student table.
Course_enrollment table:
Course_Id Stu_Id
C01 101
C02 102
C03 101
C05 102
C06 103
C07 102
Student table:
Stu_Id Stu_Name Stu_Age
101 Chaitanya 22
102 Arya 26
103 Bran 25
104 Jon 21
Note: Practically, the foreign key has nothing to do with the primary key tag of the other
table; if it points to a unique column (not necessarily the primary key) of another table, it is
still a foreign key. So a more precise definition would be: foreign keys are columns of a
table that point to a candidate key of another table.
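The Course_enrollment/Student example above can be sketched with an enforced foreign key, using SQLite through Python's sqlite3 module (SQLite needs the `foreign_keys` pragma turned on for enforcement; table and column names follow the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK enforcement by default
cur.execute("CREATE TABLE student (Stu_Id INTEGER PRIMARY KEY, Stu_Name TEXT)")
# Stu_Id in course_enrollment points back at the Student table.
cur.execute("CREATE TABLE course_enrollment ("
            "  Course_Id TEXT,"
            "  Stu_Id INTEGER REFERENCES student(Stu_Id))")

cur.execute("INSERT INTO student VALUES (101, 'Chaitanya')")
cur.execute("INSERT INTO course_enrollment VALUES ('C01', 101)")  # 101 exists: OK

try:
    cur.execute("INSERT INTO course_enrollment VALUES ('C02', 999)")  # no such student
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```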
Definition of Composite key: A key that has more than one attribute is known as a
composite key. It is also known as a compound key.
Note: Any key (super key, primary key, candidate key etc.) can be called a composite key
if it has more than one attribute.
Table – Sales
The column cust_Id alone cannot be a key, as the same customer can place multiple
orders and can thus have multiple entries.
The column order_Id alone cannot be a primary key, as the same order can contain
orders for multiple products, so the same order_Id can be present multiple times.
The column product_code alone cannot be a primary key, as more than one customer
can place an order for the same product.
The column product_count alone cannot be a primary key, because two orders can be
placed for the same product count.
Based on this, it is safe to conclude that the key should consist of more than one attribute:
Key in the above table: {cust_id, product_code}
As we saw in the candidate key guide, a table can have multiple candidate keys. Among
these candidate keys, only one gets selected as the primary key; the remaining keys are
known as alternate or secondary keys.
Table: Employee/strong>
DBA (Database administrator) can choose any of the above key as primary key. Lets say
Emp_Id is chosen as primary key.
Since we have selected Emp_Id as primary key, the remaining key Emp_Number would be
called alternative or secondary key.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These
are – Insertion, update and deletion anomaly. Let’s take an example to understand this.
The above table is not normalized. We will see the problems that we face when a table is
not normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs
to two departments of the company. If we want to update Rick's address, we have to
update it in both rows or the data will become inconsistent. If the correct address gets
updated in one department but not in the other, then as per the database Rick would have
two different addresses, which is not correct and leads to inconsistent data.
Insert anomaly: Suppose a new employee joins the company who is under training and
currently not assigned to any department. We would not be able to insert his data into
the table if the emp_dept field doesn't allow nulls.
Delete anomaly: Suppose at some point the company closes the department D890. Deleting
the rows that have emp_dept as D890 would also delete the information of employee
Maggie, since she is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will
discuss normalization.
Normalization
Here are the most commonly used normal forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Example: Suppose a company wants to store the names and contact details of its
employees. It creates a table that looks like this:
(table with columns emp_id, emp_name and emp_mobile; the emp_mobile field holds
values such as 8812121212, 9900012222 and 9990000123, with Jon and Lester each
storing two numbers in a single field)
Two employees (Jon & Lester) have two mobile numbers each, so the company stored
them in the same field, as you can see in the table above.
This table is not in 1NF as the rule says “each attribute of a table must have atomic
(single) values”; the emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF we should store the data like this:
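A small Python sketch of this 1NF fix: the multi-valued emp_mobile field is split so each row holds exactly one phone number. The employee ids and the exact pairing of numbers to names are assumptions for illustration (the source only tells us Jon and Lester each stored two numbers in one field).

```python
# Un-normalized rows: emp_mobile holds several numbers in a single field (violates 1NF).
# Ids and Lester's second number are hypothetical.
raw_rows = [
    (101, "Jon",    "8812121212, 9900012222"),
    (102, "Lester", "9990000123, 7778881212"),
]

atomic_rows = []
for emp_id, name, mobiles in raw_rows:
    for mobile in mobiles.split(", "):        # one row per phone number -> atomic values
        atomic_rows.append((emp_id, name, mobile))

for row in atomic_rows:
    print(row)
```

After the split, every attribute value is atomic, which is exactly what the 1NF rule demands.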
An attribute that is not part of any candidate key is known as non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they
teach. They create a table that looks like this. Since a teacher can teach more than one
subject, the table can have multiple rows for the same teacher.
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a
proper subset of the candidate key {teacher_id, subject}. This violates the rule for 2NF:
“no non-prime attribute is dependent on a proper subset of any candidate key of the table”.
To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
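This decomposition loses no information: joining the two tables back together reproduces the original table. A sketch with sqlite3 (the PRIMARY KEY constraints are assumptions; the data is exactly the decomposition shown above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE teacher_details (teacher_id INTEGER PRIMARY KEY, teacher_age INTEGER)")
con.execute("CREATE TABLE teacher_subject (teacher_id INTEGER, subject TEXT, "
            "PRIMARY KEY (teacher_id, subject))")

con.executemany("INSERT INTO teacher_details VALUES (?, ?)",
                [(111, 38), (222, 38), (333, 40)])
con.executemany("INSERT INTO teacher_subject VALUES (?, ?)",
                [(111, "Maths"), (111, "Physics"), (222, "Biology"),
                 (333, "Physics"), (333, "Chemistry")])

# Joining the decomposed tables rebuilds the original (pre-2NF) table,
# but teacher_age is now stored exactly once per teacher.
joined = con.execute("""SELECT s.teacher_id, s.subject, d.teacher_age
                        FROM teacher_subject s JOIN teacher_details d
                          ON s.teacher_id = d.teacher_id
                        ORDER BY s.teacher_id, s.subject""").fetchall()
for row in joined:
    print(row)
```

Updating a teacher's age now touches a single row in teacher_details, which removes the update anomaly the original table had.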
An attribute that is not part of any candidate key is known as a non-prime attribute, while
an attribute that is part of some candidate key is known as a prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and for each
functional dependency X -> Y at least one of the following conditions holds:
1. X is a super key of the table
2. Y is a prime attribute of the table
Example: Suppose a company wants to store the complete address of each employee,
they create a table named employee_details that looks like this:
employee table:
employee_zip table:
Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:
To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
1001 stores
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF, as in both of the functional dependencies the left side is a key.
You may hear this term often when dealing with Relational Database Management
Systems (RDBMS). In RDBMS, a table organizes data in rows and columns. The columns
are known as attributes whereas the rows are known as records.
Example: A school maintains the data of students in a table named “student”. Suppose the
data they store is student id, student name & student age. To do this they created
three columns in the table: student_id, student_age, student_name. The table looks
like this:
student_id student_age student_name
101 12 Jon
102 13 Arya
103 12 Sansa
In the previous tutorial we have seen the DBMS architecture – one-tier, two-tier and three-
tier. In this guide, we will discuss the three level DBMS architecture in detail.
DBMS Three Level Architecture Diagram
1. External level
It is also called the view level. The reason this level is called “view” is that several users
can view their desired data from this level; the data is internally fetched from the database
with the help of the conceptual and internal level mappings.
The user doesn’t need to know database schema details such as data structures, table
definitions etc. The user is only concerned with the data, which is returned to the view
level after being fetched from the database (present at the internal level).
External level is the “top level” of the Three Level DBMS Architecture.
2. Conceptual level
It is also called logical level. The whole design of the database such as relationship
among data, schema of data etc. are described in this level.
Database constraints and security are also implemented in this level of architecture. This
level is maintained by DBA (database administrator).
3. Internal level
This level is also known as physical level. This level describes how the data is actually
stored in the storage devices. This level is also responsible for allocating space to the data.
This is the lowest level of the architecture.
Abstraction is one of the main features of database systems. Hiding irrelevant details from
users and providing an abstract view of the data helps in easy and efficient user-database
interaction. In the previous tutorial we discussed the three levels of DBMS architecture.
The top level of that architecture is the “view level”, which provides the “view of data” to
users and hides irrelevant details such as data relationships, the database schema,
constraints, security etc. from the user.
To fully understand the view of data, you must have a basic knowledge of data abstraction
and instances & schemas. Refer to these two tutorials to learn them in detail.
1. Data abstraction
2. Instance and schema
The attributes of a table are said to be dependent on each other when one attribute of the
table uniquely identifies another attribute of the same table.
For example: Suppose we have a student table with attributes: Stu_Id, Stu_Name,
Stu_Age. Here Stu_Id attribute uniquely identifies the Stu_Name attribute of student table
because if we know the student id we can tell the student name associated with it. This is
known as functional dependency and can be written as Stu_Id->Stu_Name or in words we
can say Stu_Name is functionally dependent on Stu_Id.
Formally:
If column A of a table uniquely identifies column B of the same table then it can be
represented as A -> B (attribute B is functionally dependent on attribute A)
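The definition translates directly into a check: A -> B holds exactly when every value of A maps to a single value of B. A small sketch (the sample rows extend the student example above; the duplicate name is an assumption to show a violated dependency):

```python
def holds(rows, a, b):
    """Return True if the functional dependency a -> b holds in rows (list of dicts)."""
    seen = {}
    for row in rows:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False          # same A value mapped to two B values: FD violated
        seen[row[a]] = row[b]
    return True

students = [
    {"Stu_Id": 1, "Stu_Name": "Chaitanya", "Stu_Age": 22},
    {"Stu_Id": 2, "Stu_Name": "Arya",      "Stu_Age": 26},
    {"Stu_Id": 3, "Stu_Name": "Arya",      "Stu_Age": 25},  # two students can share a name
]

print(holds(students, "Stu_Id", "Stu_Name"))   # True:  Stu_Id -> Stu_Name
print(holds(students, "Stu_Name", "Stu_Id"))   # False: 'Arya' maps to ids 2 and 3
```

Note the asymmetry: Stu_Id determines Stu_Name, but Stu_Name does not determine Stu_Id, which is why the dependency arrow has a direction.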
Types of Functional Dependencies
Trivial functional dependency
non-trivial functional dependency
Multivalued dependency
Transitive dependency
A transaction is a set of logically related operations. For example, when you transfer
money from your bank account to your friend’s account, the set of operations would be like
this:
1. R(A);
2. A = A - 10000;
3. W(A);
4. R(B);
5. B = B + 10000;
6. W(B);
In the above transaction R refers to the read operation and W refers to the write operation.
This whole set of operations can be called a transaction. Although the example shows only
read, write and update operations, a transaction can contain read, write, insert, update and
delete operations.
The main problem that can occur during a transaction is that the transaction can fail
before finishing all the operations in the set. This can happen due to a power failure,
system crash etc. This is a serious problem that can leave the database in an inconsistent
state. Assume that the transaction fails after the third operation (see the example above):
the amount would be deducted from your account but your friend would not receive it.
Commit: If all the operations in a transaction are completed successfully then commit
those changes to the database permanently.
Rollback: If any of the operations fails then roll back all the changes done by the previous
operations.
Even though these operations can help us avoid several issues that may arise during a
transaction, they are not sufficient when two transactions are running concurrently. To
handle those problems we need to understand the database ACID properties.
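Commit and rollback can be sketched with sqlite3. This is an illustration of the bank-transfer example above, not a production implementation; the account table and the simulated failure are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 50000), ("B", 20000)])
con.commit()

def transfer(con, src, dst, amount):
    """Run the debit and the credit as one transaction: commit both or roll back both."""
    try:
        con.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        if con.execute("SELECT 1 FROM account WHERE name = ?", (dst,)).fetchone() is None:
            raise ValueError("no such account: " + dst)   # simulated mid-transaction failure
        con.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        con.commit()     # both operations succeeded: make the changes permanent
    except Exception:
        con.rollback()   # undo the debit as well, so no money disappears
        raise
```

Calling transfer(con, "A", "B", 10000) commits both updates; calling transfer(con, "A", "Z", 5000) fails midway and the rollback restores A's balance, which is exactly the behaviour the Commit/Rollback operations describe.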
Atomicity: This property ensures that either all the operations of a transaction reflect
in database or none. Let’s take an example of banking system to understand this:
Suppose Account A has a balance of 400$ & B has 700$. Account A is transferring
100$ to Account B. This is a transaction that has two operations: a) debiting 100$
from A’s balance, b) crediting 100$ to B’s balance. Let’s say the first operation passes
successfully while the second fails; in this case A’s balance would be 300$ while B
would have 700$ instead of 800$. This is unacceptable in a banking system.
Either the transaction should fail without executing any of the operations or it should
process both operations. The Atomicity property ensures that.
Consistency: To preserve the consistency of the database, the execution of a transaction
should take place in isolation (that means no other transaction should run
concurrently when there is a transaction already running). For example, account A
has a balance of 400$ and it is transferring 100$ to accounts B & C. So we
have two transactions here. Let’s say these transactions run concurrently and both
read the 400$ balance; in that case the final balance of A would be 300$
instead of 200$. This is wrong. If the transactions ran in isolation then the
second transaction would have read the correct balance of 300$ once the first
transaction completed successfully.
Isolation: For every pair of transactions, one transaction should start execution only
when the other has finished execution. I have already discussed an example of isolation
in the Consistency property above.
Durability: Once a transaction completes successfully, the changes it has made into
the database should be permanent even if there is a system failure. The recovery-
management component of database systems ensures the durability of transaction.
DBMS Transaction States
In this guide, we will discuss the states of a transaction in DBMS. A transaction in DBMS
can be in one of the following states.
Active State
As discussed in the DBMS transaction introduction, a transaction is a sequence of
operations. If a transaction is in execution then it is said to be in the active state. It
doesn’t matter which step is being executed; as long as the transaction is executing, it
remains in the active state.
Failed State
If a transaction is executing and a failure occurs, either a hardware failure or a software
failure then the transaction goes into failed state from the active state.
Partially Committed State
As we can see in the above diagram, a transaction goes into the “partially committed” state
from the active state once all of its operations have been executed.
A transaction contains a number of read and write operations. Once the whole transaction
has executed successfully, the transaction goes into the partially committed state, where
all the read and write operations have been performed on main memory (local buffer)
instead of the actual database.
The reason we have this state is that a transaction can fail during execution, so if we
were making the changes in the actual database instead of local memory, the database
could be left in an inconsistent state in the event of a failure. This state allows us to roll
back the changes in case of a failure during execution.
Committed State
If a transaction completes the execution successfully then all the changes made in the
local memory during partially committed state are permanently stored in the database.
You can also see in the above diagram that a transaction goes from partially committed
state to committed state when everything is successful.
Aborted State
As we have seen above, if a transaction fails during execution then it goes into the failed
state. The changes made in local memory (or buffer) are rolled back and the transaction
goes into the aborted state from the failed state. Refer to the diagram to see the
interaction between the failed and aborted states.
We know that transactions are sets of instructions that perform operations on the
database. When multiple transactions run concurrently there needs to be a sequence in
which the operations are performed, because at a time only one operation can be
performed on the database. This sequence of operations is known as a Schedule.
T1        T2
----      ----
R(X)
W(X)
R(Y)
          R(Y)
          R(X)
          W(Y)
Serial Schedule
In a Serial schedule, a transaction is executed completely before the execution of another
transaction starts. In other words, in a serial schedule a transaction does not start
execution until the currently running transaction has finished. This type of transaction
execution is also known as non-interleaved execution. The example we have seen above is
a serial schedule.
Strict Schedule
In a Strict schedule, if the write operation of a transaction precedes a conflicting operation
(read or write) of another transaction, then the commit of the first transaction should also
precede that conflicting operation.
Ta        Tb
-----     -----
R(X)
          R(X)
W(X)
commit
          W(X)
          R(X)
          commit
Here the write operation W(X) of Ta precedes the conflicting operations (read and write)
of Tb, so the conflicting operations of Tb have to wait for the commit of Ta.
Cascadeless Schedule
In a Cascadeless schedule, if a transaction is going to perform a read operation on a value,
it has to wait until the transaction that is performing the write on that value commits.
Ta        Tb
-----     -----
R(X)
W(X)
          W(X)
commit
          R(X)
          W(X)
          commit
Recoverable Schedule
In a Recoverable schedule, if a transaction reads a value that has been updated by another
transaction, then it can commit only after the commit of the transaction that updated the
value.
Ta        Tb
-----     -----
R(X)
W(X)
          R(X)
          W(X)
R(X)
commit
          commit
DBMS Serializability
When multiple transactions are running concurrently then there is a possibility that the
database may be left in an inconsistent state. Serializability is a concept that helps us
check which schedules are serializable. A serializable schedule is one that always leaves
the database in a consistent state.
Types of Serializability
There are two types of Serializability.
1. Conflict Serializability
2. View Serializability
In the DBMS Schedules guide, we learned that there are two types of schedules – Serial &
Non-Serial. A Serial schedule doesn’t support concurrent execution of transactions while a
non-serial schedule supports concurrency. We also learned in the Serializability tutorial
that a non-serial schedule may leave the database in an inconsistent state, so we need to
check these non-serial schedules for serializability.
Conflict Serializability is one of the types of serializability, and it can be used to check
whether a non-serial schedule is conflict serializable or not.
Conflicting operations
Two operations are said to be in conflict if they satisfy all the following three conditions:
1. Both the operations belong to different transactions
2. Both the operations work on the same data item
3. At least one of the operations is a write operation
T1        T2
-----     ------
R(A)
R(B)
          R(A)
          R(B)
          W(B)
W(A)
To convert this schedule into a serial schedule we would have to swap the R(A) operation
of transaction T2 with the W(A) operation of transaction T1. However, we cannot swap
these two operations because they are conflicting operations; thus we can say that this
schedule is not conflict serializable.
T1        T2
-----     ------
          R(A)
R(A)
          R(B)
          W(B)
R(B)
W(A)
Lets swap non-conflicting operations:
T1        T2
-----     ------
          R(A)
R(A)
          R(B)
          W(B)
R(B)
W(A)
After swapping R(A) of T1 and R(B) of T2 we get:
T1        T2
-----     ------
          R(A)
          R(B)
R(A)
          W(B)
R(B)
W(A)
After swapping R(A) of T1 and W(B) of T2 we get:
T1        T2
-----     ------
          R(A)
          R(B)
          W(B)
R(A)
R(B)
W(A)
We finally got a serial schedule (T2 followed by T1) after swapping all the non-conflicting
operations, so we can say that the given schedule is conflict serializable.
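The swapping argument above can also be carried out with a precedence graph, the standard test for conflict serializability: add an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj; the schedule is conflict serializable iff the graph has no cycle. A sketch (the assignment of each operation to T1 or T2 is inferred from the swap discussion above):

```python
def conflict_serializable(schedule):
    """schedule is a list of (transaction, op, item) tuples with op in {'R', 'W'}."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and (op_i == "W" or op_j == "W"):
                edges.add((ti, tj))           # conflicting pair: Ti must precede Tj

    # the schedule is serializable iff no transaction can reach itself (no cycle)
    nodes = {t for t, _, _ in schedule}
    def reachable(start, goal, seen):
        for a, b in edges:
            if a == start and b not in seen:
                if b == goal or reachable(b, goal, seen | {b}):
                    return True
        return False
    return not any(reachable(t, t, set()) for t in nodes)

# The second schedule analysed above: serializable, equivalent to T2 then T1
serializable_s = [("T2", "R", "A"), ("T1", "R", "A"), ("T2", "R", "B"),
                  ("T2", "W", "B"), ("T1", "R", "B"), ("T1", "W", "A")]
# The first schedule analysed above: the R(A)/W(A) conflict blocks every swap
non_serializable_s = [("T1", "R", "A"), ("T1", "R", "B"), ("T2", "R", "A"),
                      ("T2", "R", "B"), ("T2", "W", "B"), ("T1", "W", "A")]

print(conflict_serializable(serializable_s))      # True
print(conflict_serializable(non_serializable_s))  # False
```

The cycle in the second graph (T1 -> T2 via the B conflict and T2 -> T1 via the A conflict) is exactly why no sequence of non-conflicting swaps can produce a serial schedule.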
In the last tutorial, we learned Conflict Serializability. In this article, we will discuss another
type of serializability which is known as View Serializability.
To check whether a given schedule is view serializable, we need to check whether the
given schedule is View Equivalent to its serial schedule. Lets take an example to
understand what I mean by that.
Given Schedule:
T1        T2
-----     ------
R(X)
W(X)
          R(X)
          W(X)
R(Y)
W(Y)
          R(Y)
          W(Y)
Serial Schedule of the above given schedule:
As we know that in Serial schedule a transaction only starts when the current running
transaction is finished. So the serial schedule of the above given schedule would look like
this:
T1        T2
-----     ------
R(X)
W(X)
R(Y)
W(Y)
          R(X)
          W(X)
          R(Y)
          W(Y)
If we can prove that the given schedule is View Equivalent to its serial schedule then the
given schedule is called view Serializable.
You may be wondering: instead of checking whether a non-serial schedule is serializable,
can’t we use serial schedules all the time? The answer is no, because concurrent execution
of transactions fully utilizes the system resources and is considerably faster than serial
execution.
View Equivalent
Lets learn how to check whether the two schedules are view equivalent.
Two schedules S1 and S2 are said to be view equivalent if they satisfy all the following
conditions:
1. Initial Read: Initial read of each data item in transactions must match in both schedules.
For example, if transaction T1 reads a data item X before transaction T2 in schedule S1
then in schedule S2, T1 should read X before T2.
Read vs Initial Read: You may be confused by the term initial read. Here, initial read
means the first read operation on a data item; for example, a data item X can be read
multiple times in a schedule, but the first read operation on X is called the initial read.
This will become clearer in the example later in this article.
2. Final Write: Final write operations on each data item must match in both the schedules.
For example, a data item X is last written by Transaction T1 in schedule S1 then in S2, the
last write operation on X should be performed by the transaction T1.
3. Update Read: If in schedule S1, the transaction T1 is reading a data item updated by
T2 then in schedule S2, T1 should read the value after the write operation of T2 on same
data item. For example, In schedule S1, T1 performs a read operation on X after the write
operation on X by T2 then in S2, T1 should read the X after T2 performs write on X.
View Serializable
If a schedule is view equivalent to its serial schedule then the given schedule is said to be
View Serializable. Lets take an example.
View Serializable Example
Initial Read
In schedule S1, transaction T1 first reads the data item X. In S2 also transaction T1 first
reads the data item X.
Lets check for Y. In schedule S1, transaction T1 first reads the data item Y. In S2 also the
first read operation on Y is performed by T1.
We checked for both data items X & Y and the initial read condition is satisfied in S1 &
S2.
Final Write
In schedule S1, the final write operation on X is done by transaction T2. In S2 also
transaction T2 performs the final write on X.
Lets check for Y. In schedule S1, the final write operation on Y is done by transaction T2.
In schedule S2, final write on Y is done by T2.
We checked for both data items X & Y and the final write condition is satisfied in S1 & S2.
Update Read
In S1, transaction T2 reads the value of X, written by T1. In S2, the same transaction T2
reads the X after it is written by T1.
In S1, transaction T2 reads the value of Y, written by T1. In S2, the same transaction T2
reads the value of Y after it is updated by T1.
The update read condition is also satisfied for both the schedules.
Result: All three conditions that check whether two schedules are view equivalent are
satisfied in this example, which means S1 and S2 are view equivalent. Also, as we know
that schedule S2 is the serial schedule of S1, we can say that schedule S1 is a view
serializable schedule.