Chapter 4 - Data Modeling Using ER Model
Chapter Four
Data Modeling Using the Entity-Relationship (ER) Model
4.1 Database Design
Database design is the process of producing the different kinds of specification for the data to be stored
in the database. It is one of the middle phases of information systems development when the system
follows a database approach. During design we describe how the data should be perceived at different
levels and, finally, how it is going to be stored in a computer system.
The ability to design databases and associated applications is critical to the success of the modern
enterprise. Database design requires understanding both the operational and business requirements of an
organization as well as the ability to model and realize those requirements using a database.
Database and information systems are developed by following a development lifecycle, which
consists of a series of steps. Because the database is one component of most information system
development tasks, designing a database system also follows several steps.
Developing an information system with a database application consists of several tasks, which include:
Planning of Information systems Design
Requirements Analysis
Design (Conceptual, Logical and Physical Design)
Tuning
Implementation
Operation and Support
The requirements gathering and specification phase gives you a high-level understanding of the
organization, its data, and the processes that you must model in the database. Database design involves
constructing a suitable model of this information. Since the design process is complicated, especially for
large databases, database design is mainly divided into these three phases:
1. Conceptual Design
2. Logical Design, and
3. Physical Design
In general, one has to go back and forth between these tasks to refine a database design, and decisions in
one task can influence the choices in another task.
(b) Attributes
Attributes are properties used to describe each entity or real-world object.
They are used to store pieces of information about entities.
Attributes give rise to recorded items of data in the database.
For example, the STUDENT entity includes, among many others, the attributes STU_LNAME,
STU_FNAME, and STU_INITIAL.
In the original Chen notation, attributes are represented by ovals and are connected to the entity
rectangle with a line.
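As a minimal sketch of how attributes become recorded data items, the STUDENT entity above can be written as a Python dataclass. The field types, the optional middle initial, and the sample values are assumptions made for illustration; only the three attribute names come from the text.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    """One occurrence of the STUDENT entity; each field is one attribute."""
    stu_lname: str
    stu_fname: str
    stu_initial: str = ""  # assumed optional here; the text does not say


# Hypothetical occurrence; the values are invented for illustration.
student = Student(stu_lname="Smith", stu_fname="Anne", stu_initial="K")

# The attribute names of the entity type, in declaration order.
attribute_names = [f.name.upper() for f in fields(Student)]
print(attribute_names)
```

Each field corresponds to one oval in a Chen-style diagram, and each instance of the class corresponds to one entity occurrence.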
(c) Relationships
Relationships describe associations among data (exist between entities).
Most relationships describe associations between two entities.
Relationship (relationship type) is a meaningful association among entity types.
Generally, a relationship is represented as a connection between (or among) entities.
The standard ER model uses a diamond shape to connect the participating entities.
The relationship name is an active or passive verb; for example, a STUDENT takes a CLASS,
a PROFESSOR teaches a CLASS, a DEPARTMENT employs a PROFESSOR, a
DIVISION is managed by an EMPLOYEE.
The entities that participate in a relationship are also known as participants, and each
relationship is identified by a name that describes the relationship.
When the basic data model components were introduced, three types of relationships among data were
illustrated:
One-to-Many (1:M)
Many-to-Many (M:N), and
One-to-One (1:1)
The ER model uses the term connectivity to label the relationship types.
The name of the relationship is usually an active or passive verb.
For example, a PAINTER paints many PAINTINGs; an EMPLOYEE learns many SKILLs;
an EMPLOYEE manages a STORE.
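The three connectivities can be illustrated by looking at sample relationship occurrences. The sketch below is only an illustration: the function name `connectivity` and the sample identifiers are hypothetical, and in a real design the connectivity comes from the business rules, not from whatever data happens to exist.

```python
from collections import defaultdict


def connectivity(pairs):
    """Classify a binary relationship from its (left, right) occurrence pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # Does some left occurrence relate to several right occurrences?
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    # Does some right occurrence relate to several left occurrences?
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())

    if left_has_many and right_has_many:
        return "M:N"
    if left_has_many:
        return "1:M"
    if right_has_many:
        return "M:1"
    return "1:1"


# Invented sample occurrences for the three examples in the text.
paints = [("Painter1", "Painting1"), ("Painter1", "Painting2"), ("Painter2", "Painting3")]
learns = [("Emp1", "Skill1"), ("Emp1", "Skill2"), ("Emp2", "Skill1")]
manages = [("Emp1", "Store1"), ("Emp2", "Store2")]

print(connectivity(paints), connectivity(learns), connectivity(manages))
```

With these samples, PAINTER paints PAINTING comes out as 1:M, EMPLOYEE learns SKILL as M:N, and EMPLOYEE manages STORE as 1:1.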
Before working on the conceptual design of the database, one has to know and answer the following
basic questions.
• What are the entities and relationships in the enterprise?
• What information about these entities and relationships should we store in the database?
• What are the integrity constraints that hold? These are the constraints on each data item with
respect to update, retrieval, and storage.
• Represent this information pictorially in ER diagrams, then map the ER diagram into a relational
schema.
Total participation:
Every tuple in the entity or relation participates in at least one occurrence of the relationship by taking a
role. This means that every tuple in the relation is associated with at least one tuple on the other side of
the relationship. An entity with total participation in a relationship is connected to the relationship with
a double line. A mandatory relationship means that the minimum cardinality is at least 1 for the
mandatory entity.
Let’s examine a few more scenarios. Suppose that Tiny College employs some professors who
conduct research without teaching classes.
If you examine the “PROFESSOR teaches CLASS” relationship, it is quite possible for a
PROFESSOR not to teach a CLASS. Therefore, CLASS is optional to PROFESSOR. On the
other hand, a CLASS must be taught by a PROFESSOR. Therefore, PROFESSOR is mandatory
to CLASS.
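The difference between mandatory and optional participation can be checked on sample data. This is a minimal sketch: the function `participation` and all identifiers such as `Prof3` are hypothetical, chosen to mirror the Tiny College scenario in which one professor only does research.

```python
def participation(entity_occurrences, participating):
    """Return 'total' if every occurrence takes part in the relationship, else 'partial'."""
    return "total" if set(entity_occurrences) <= set(participating) else "partial"


# Invented sample data: Prof3 conducts research without teaching.
professors = {"Prof1", "Prof2", "Prof3"}
classes = {"Class1", "Class2"}
teaches = [("Prof1", "Class1"), ("Prof2", "Class2")]

# Participation of each entity in the "teaches" relationship.
professor_side = participation(professors, {prof for prof, _ in teaches})
class_side = participation(classes, {cls for _, cls in teaches})

print(professor_side, class_side)
```

Here PROFESSOR participates partially (minimum cardinality 0) while CLASS participates totally (minimum cardinality 1), which matches the statement that PROFESSOR is mandatory to CLASS.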
Problem: Which car (Car1, Car3, or Car5) is used by Employee 6 (Emp6), who works in Branch 1 (Bra1)?
From this ER model one cannot tell which car is used by which employee, since a branch can have more
than one car and is also staffed by more than one employee. We therefore need to restructure the
model to avoid the connection trap.
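The ambiguity can be seen by following the only available path, EMPLOYEE to BRANCH to CAR, on sample data. The dictionaries and the function `candidate_cars` below are hypothetical; `Emp7` is an invented second employee added only to show that the branch has more than one.

```python
# Each employee works in one branch (Emp7 is an invented extra employee).
branch_of_employee = {"Emp6": "Bra1", "Emp7": "Bra1"}

# Each branch has several cars.
cars_of_branch = {"Bra1": ["Car1", "Car3", "Car5"]}


def candidate_cars(employee):
    """Follow EMPLOYEE -> BRANCH -> CAR, the only path the model offers."""
    branch = branch_of_employee[employee]
    return cars_of_branch.get(branch, [])


# The path fans out at the branch, so the answer is a set of candidates,
# not the single car the employee actually uses.
print(candidate_cars("Emp6"))
```

The query returns every car of the branch for every employee of the branch, so the model cannot answer the question for Emp6.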
To avoid the Fan Trap problem we can restructure the E-R model. This results in the
following E-R model.
If we have a set of projects that are not currently active, we cannot assign a project manager to
them. So there are projects with no project manager, which gives the participation a minimum
value of zero.
Problem:
How can we identify which BRANCH is responsible for which PROJECT? We know that, whether or not a
PROJECT is active, there is a responsible BRANCH. But which branch is the question to be
answered, and since the minimum participation between EMPLOYEE and PROJECT is zero, we
cannot identify the BRANCH responsible for each PROJECT.
The solution to this Chasm Trap problem is to add another relationship between the extreme entities
(BRANCH and PROJECT).
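A small sketch of the chasm trap and its fix, using invented sample data. The names `branch_via_manager` and `branch_of_project`, and all identifiers such as `Proj2`, are hypothetical.

```python
# BRANCH has EMPLOYEE: each employee belongs to one branch.
branch_of_employee = {"Emp1": "Bra1", "Emp2": "Bra2"}

# EMPLOYEE manages PROJECT: Proj2 is inactive, so it has no manager.
manager_of_project = {"Proj1": "Emp1", "Proj2": None}


def branch_via_manager(project):
    """Follow PROJECT -> EMPLOYEE -> BRANCH; fails when the project has no manager."""
    manager = manager_of_project.get(project)
    if manager is None:
        return None  # the path is broken: this is the chasm
    return branch_of_employee.get(manager)


# The fix: a direct BRANCH-PROJECT relationship, recorded for every project.
branch_of_project = {"Proj1": "Bra1", "Proj2": "Bra2"}

print(branch_via_manager("Proj2"), branch_of_project["Proj2"])
```

The indirect path gives no answer for the inactive project, while the added direct relationship does.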
Example:
The company is organized into departments. Each department has a unique name, a unique number, and
a particular employee who manages the department. We keep track of the start date when that employee
began managing the department. A department may have several locations. A department controls a
number of projects, each of which has a unique name, a unique number, and a single location.
We store each employee’s name, Social Security number, address, salary, sex (gender), and birth date.
An employee is assigned to one department, but may work on several projects, which are not necessarily
controlled by the same department. We keep track of the current number of hours per week that an
employee works on each project. We also keep track of the direct supervisor of each employee (who is
another employee). We want to keep track of the dependents of each employee for insurance purposes.
We keep each dependent’s first name, sex, birth date, and relationship to the employee.
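As a first, tentative reading of these requirements, the entity types and their attributes can be sketched as Python dataclasses. The field names, the types, and the use of numbers and Social Security numbers as references are assumptions made for illustration; the sketch deliberately leaves out the employee-project work assignments and hours.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Department:
    name: str                 # unique
    number: int               # unique
    manager_ssn: str          # the employee who manages the department
    manager_start_date: date  # when that employee began managing
    locations: List[str] = field(default_factory=list)  # multivalued attribute


@dataclass
class Project:
    name: str                     # unique
    number: int                   # unique
    location: str                 # single location
    controlling_department: int   # number of the controlling department


@dataclass
class Employee:
    name: str
    ssn: str
    address: str
    salary: float
    sex: str
    birth_date: date
    department: int                       # assigned to exactly one department
    supervisor_ssn: Optional[str] = None  # another employee, if any


@dataclass
class Dependent:
    employee_ssn: str   # the employee this dependent belongs to
    first_name: str
    sex: str
    birth_date: date
    relationship: str   # relationship to the employee


# Hypothetical occurrence; all values are invented.
research = Department("Research", 5, "111", date(2020, 1, 1), ["Site A", "Site B"])
print(research.locations)
```

Writing the attributes down this way makes it easy to spot which ones reference another entity type (for example `manager_ssn`, `controlling_department`, `supervisor_ssn`) and are therefore candidates for relationships in the ER diagram.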
So far, we have not represented the fact that an employee can work on several projects, nor have we
represented the number of hours per week an employee works on each project. This characteristic is
listed as part of the third requirement and it can be represented by a multivalued composite attribute of
EMPLOYEE called Works_on with the simple components (Project, Hours). Alternatively, it can be
represented as a multivalued composite attribute of PROJECT called Workers with the simple
Exercises
1. Consider the following set of requirements for a UNIVERSITY database that is used to keep track of
students’ transcripts.
a) The university keeps track of each student’s name, student number, Social Security number,
current address and phone number, permanent address and phone number, birth date, sex,
class (freshman, sophomore, ..., graduate), major department, minor department (if any), and
degree program (B.A., B.S., ..., Ph.D.). Some user applications need to refer to the city, state,
and ZIP Code of the student’s permanent address and to the student’s last name. Both Social
Security number and student number have unique values for each student.
b) Each department is described by a name, department code, office number, office phone
number, and college. Both name and code have unique values for each department.
c) Each course has a course name, description, course number, number of semester hours, level,
and offering department. The value of the course number is unique for each course.
2. Design an ER schema for keeping track of information about votes taken in the U.S. House of
Representatives during the current two-year congressional session. The database needs to keep track
of each U.S. STATE’s Name (e.g., ‘Texas’, ‘New York’, ‘California’) and include the Region of
the state (whose domain is {‘Northeast’, ‘Midwest’, ‘Southeast’, ‘Southwest’, ‘West’}). Each
CONGRESS_PERSON in the House of Representatives is described by his or her Name, plus the
District represented, the Start_date when the congress person was first elected, and the political
Party to which he or she belongs (whose domain is {‘Republican’, ‘Democrat’, ‘Independent’,
‘Other’}). The database keeps track of each BILL (i.e., proposed law), including the Bill_name, the
Date_of_vote on the bill, whether the bill Passed_or_failed (whose domain is {‘Yes’, ‘No’}), and the
Sponsor (the congressperson(s) who sponsored—that is, proposed—the bill). The database also
keeps track of how each congressperson voted on each bill (domain of Vote attribute is {‘Yes’, ‘No’,
‘Abstain’, ‘Absent’}). Draw an ER schema diagram for this application. State clearly any
assumptions you make.
3. A database is being constructed to keep track of the teams and games of a sports league. A team has
a number of players, not all of whom participate in each game. It is desired to keep track of the
players participating in each game for each team, the positions they played in that game, and the
result of the game. Design an ER schema diagram for this application, stating any assumptions you
make. Choose your favorite sport (e.g., soccer, baseball, football).
4. Consider an entity type SECTION in a UNIVERSITY database, which describes the section
offerings of courses. The attributes of SECTION are Section_number, Semester, Year,
Course_number, Instructor, Room_no (where section is taught), Building (where section is taught),
Weekdays (domain is the possible combinations of weekdays in which a section can be offered
{‘MWF’, ‘MW’, ‘TT’, and so on}), and Hours (domain is all possible time periods during which
sections are offered {‘9–9:50 A.M.’, ‘10–10:50 A.M.’, ..., ‘3:30–4:50 P.M.’, ‘5:30–6:20 P.M.’,
and so on}). Assume that Section_number is unique for each course within a particular
semester/year combination (that is, if a course is offered multiple times during a particular semester,
its section offerings are numbered 1, 2, 3, and so on). There are several composite keys for section,
and some attributes are components of more than one key. Identify three composite keys, and show
how they can be represented in an ER schema diagram.
Superclass/Supertype Entity
• Is the generalized entity
• An entity type whose tuples share common attributes. Attributes that are shared by all entity
occurrences (including the identifier) are associated with the supertype.
Subclass/Subtype Entity
• An entity type whose tuples have attributes that distinguish its members from tuples of the
generalized or Superclass entities.
• When one generalized Superclass has various subgroups with distinguishing features and these
subgroups are represented by specialized form, the groups are called subclasses.
• Subclasses can be either mutually exclusive (disjoint) or overlapping (inclusive).
• A single subclass may inherit attributes from two distinct superclasses.
• A mutually exclusive category/subclass is when an entity instance can be in only one of the
subclasses.
E.g.: An EMPLOYEE can either be SALARIED or PART-TIMER but not both.
• An overlapping category/subclass is when an entity instance may be in two or more subclasses.
E.g.: A PERSON who works for a university can be both an EMPLOYEE and a
STUDENT at the same time.
Consider the EMPLOYEE supertype entity shown above. This entity can have several different
subtype entities (for example: HOURLY and SALARIED), each with distinct properties not
shared by other subtypes. But whether the employee is HOURLY or SALARIED, the same
attributes (EmployeeId, Name, and DateHired) are shared.
The supertype EMPLOYEE stores all properties that the subclasses have in common.
HOURLY employees have the unique attribute Wage (hourly wage rate), while SALARIED
employees have two unique attributes, StockOption and Salary.
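Attribute inheritance between a supertype and its subtypes maps naturally onto class inheritance. The sketch below is an illustration only: the attribute names follow the text, but the Python types (for example a boolean for StockOption) and the sample values are assumptions.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Employee:
    """Supertype: attributes shared by every employee."""
    employee_id: int
    name: str
    date_hired: date


@dataclass
class Hourly(Employee):
    """Subtype with one distinguishing attribute."""
    wage: float  # hourly wage rate


@dataclass
class Salaried(Employee):
    """Subtype with two distinguishing attributes."""
    stock_option: bool  # type assumed; the text gives only the name
    salary: float


# Hypothetical occurrences; all values are invented.
h = Hourly(1, "Abel", date(2021, 3, 1), wage=15.0)
s = Salaried(2, "Sara", date(2019, 7, 15), stock_option=True, salary=4000.0)

# Both subtypes carry the supertype's attributes plus their own.
print([f.name for f in fields(Hourly)])
print([f.name for f in fields(Salaried)])
```

Every `Hourly` or `Salaried` occurrence is also an `Employee` occurrence, which is exactly the "is a" relationship that a supertype/subtype hierarchy expresses.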
Completeness Constraint.
• The Completeness Constraint addresses the issue of whether or not an occurrence of a
superclass must also have a corresponding subclass occurrence.
• The completeness constraint requires that all instances of the subtype be represented in the
supertype.
• The Total Specialization Rule specifies that every entity occurrence must be a member of at least
one of the subclasses. Total participation of superclass instances in subclasses is diagrammed
with a double line from the supertype to the circle.
E.g.: If we have EXTENSION and REGULAR as subclasses of a superclass STUDENT,
then it is mandatory that each student be either an EXTENSION or a REGULAR student.
Thus the participation of instances of STUDENT in the EXTENSION and REGULAR
subclasses will be total.
• The Partial Specialization Rule specifies that it is not necessary for all entity occurrences in the
superclass to be a member of one of the subclasses. Here participation in the specialization is
optional. Partial participation of superclass instances in subclasses is diagrammed with
a single line from the supertype to the circle.
E.g.: If we have MANAGER and SECRETARY as subclasses of a superclass EMPLOYEE,
then it is not the case that all employees are either managers or secretaries. Thus the
participation of instances of EMPLOYEE in the MANAGER and SECRETARY subclasses
will be partial.
The two types of constraints on generalization and specialization (Disjointness and Completeness
constraints) are independent of one another. That is, being disjoint or overlapping does not determine
whether the tuples in the superclass have Total or Partial participation in that specific specialization.
Combining the two types of constraints gives four possible constraints:
• Disjoint AND Total
• Overlapping AND Total
• Disjoint AND Partial
• Overlapping AND Partial
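Because the two constraints are independent, each can be checked separately on a set of occurrences. The sketch below is illustrative: the function `classify_specialization` and the sample identifiers are hypothetical, and a real design states these constraints as rules rather than inferring them from data.

```python
def classify_specialization(superclass, subclasses):
    """Describe a specialization as '<Disjoint|Overlapping> AND <Total|Partial>'."""
    members = [set(s) for s in subclasses]
    union = set().union(*members)

    # Disjoint: no occurrence appears in more than one subclass.
    total_memberships = sum(len(s) for s in members)
    disjointness = "Disjoint" if total_memberships == len(union) else "Overlapping"

    # Total: every superclass occurrence appears in at least one subclass.
    completeness = "Total" if set(superclass) <= union else "Partial"

    return f"{disjointness} AND {completeness}"


# Invented occurrences mirroring the examples in the text.
students = {"S1", "S2", "S3"}
extension, regular = {"S1"}, {"S2", "S3"}

employees = {"E1", "E2", "E3"}
managers, secretaries = {"E1"}, {"E2"}  # E3 is neither

persons = {"P1", "P2"}
univ_employees, univ_students = {"P1", "P2"}, {"P2"}  # P2 is both

print(classify_specialization(students, [extension, regular]))
print(classify_specialization(employees, [managers, secretaries]))
print(classify_specialization(persons, [univ_employees, univ_students]))
```

The STUDENT example is disjoint and total, the EMPLOYEE example is disjoint and partial, and the PERSON example is overlapping and total.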