DBMS_UNIT-1 (1)


UNIT-I

Course Name: Database Management System


Course Outcome (CO): At the
end of this course, students will
have:
CO1: Awareness of database
management basics and different
models that we use for database.
CO2: Design and architecture of
relational model, relational
algebra and SQL queries.
CO3: Implement different forms of
normalization.
CO4: Logical representation of
internet database.
CO5: Analysis and concepts of
transaction, concurrency and
recovery systems

04/18/22 Department of Computer Science & Engineering
Outline
• Introduction
• Database vs file system
• View of data
• Data Models
• Database language
• Database Users and Administrators
• Transaction Management
• Components of DBMS
• ER Model
– Basic
– Constraints, keys, Design issues
– ER diagram
Database Management System
• DBMS contains information about a particular enterprise
– Collection of interrelated data
– Set of programs to access the data
– An environment that is both convenient and efficient to use
• Database Applications:
– Banking: transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Online retailers: order tracking, customized recommendations
– Manufacturing: production, inventory, orders, supply chain
– Human resources: employee records, salaries, tax deductions
• Databases can be very large.
• Databases touch all aspects of our lives
University Database Example
• Application program examples
– Add new students, instructors, and courses
– Register students for courses, and generate class
rosters
– Assign grades to students, compute grade point
averages (GPA) and generate transcripts
• In the early days, database applications
were built directly on top of file systems
Drawbacks of using file systems to store data

• Data redundancy and inconsistency
– Multiple file formats, duplication of information in different files
• Difficulty in accessing data
– Need to write a new program to carry out each new task
• Data isolation
– Multiple files and formats
• Integrity problems
– Integrity constraints (e.g., account balance > 0) become
“buried” in program code rather than being stated explicitly
– Hard to add new constraints or change existing ones
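As a rough illustration of stating a constraint explicitly instead of burying it in program code, the sketch below declares the balance rule once, as a CHECK constraint in the schema. It uses Python's built-in sqlite3 module; the account table, its columns, and the helper try_insert are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "balance > 0" is declared once, in the schema itself.
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " balance NUMERIC NOT NULL CHECK (balance > 0))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")

def try_insert(account_number, balance):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?)", (account_number, balance))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("A-102", 250)   # satisfies the constraint
rejected = try_insert("A-103", -40)   # violates balance > 0
```

Changing the rule then means changing one line of the table definition rather than every program that writes to the file.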
Drawbacks of using file systems to store data (Cont.)
• Atomicity of updates
– Failures may leave database in an inconsistent state with partial updates
carried out
– Example: Transfer of funds from one account to another should either
complete or not happen at all
• Concurrent access by multiple users
– Concurrent access needed for performance
– Uncontrolled concurrent accesses can lead to inconsistencies
• Example: Two people reading a balance (say 100) and updating it by
withdrawing money (say 50 each) at the same time
• Security problems
– Hard to provide user access to some, but not all, data

Database systems offer solutions to all the above problems
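The concurrent-withdrawal example above (balance 100, two withdrawals of 50) can be worked through in a few lines. This is a plain-Python simulation of the interleaving, not real concurrent code; both function names are made up for the illustration.

```python
def withdraw_interleaved(balance, amounts):
    """Every user reads the same starting balance, then each writes back."""
    reads = [balance for _ in amounts]        # all users read before anyone writes
    for read_value, amount in zip(reads, amounts):
        balance = read_value - amount         # each write overwrites the previous one
    return balance

def withdraw_serial(balance, amounts):
    """Each withdrawal reads the balance left by the previous one."""
    for amount in amounts:
        balance = balance - amount
    return balance

lost_update_result = withdraw_interleaved(100, [50, 50])  # 50: one withdrawal is lost
serial_result = withdraw_serial(100, [50, 50])            # 0: both withdrawals applied
```

With uncontrolled interleaving the final balance is 50 even though 100 was paid out, which is the kind of inconsistency the slide refers to.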
Levels of Abstraction
• Physical level: describes how a record (e.g., instructor) is stored.
• Logical level: describes data stored in database, and the relationships
among the data.
type instructor = record
ID : string;
name : string;
dept_name : string;
salary : integer;
end;
• View level: application programs hide details of data types. Views can
also hide information (such as an employee’s salary) for security
purposes.
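A minimal sketch of the view level, assuming SQLite through Python's sqlite3 module: a view exposes the instructor record without the salary column. The view name instructor_public and the sample row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    " ID CHAR(5), name VARCHAR(20), dept_name VARCHAR(20), salary NUMERIC(8,2))"
)
# Sample row; the department and salary values are made up.
conn.execute("INSERT INTO instructor VALUES ('22222', 'Einstein', 'Physics', 95000)")

# The view exposes every column except salary.
conn.execute(
    "CREATE VIEW instructor_public AS"
    " SELECT ID, name, dept_name FROM instructor"
)

cursor = conn.execute("SELECT * FROM instructor_public")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```

A program that is only given access to the view sees three columns and cannot read salary through it.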
View of Data
An architecture for a database system
Instances and Schemas
• Similar to types and variables in programming languages
• Logical Schema – the overall logical structure of the database
– Example: The database consists of information about a set of customers and
accounts in a bank and the relationship between them
• Analogous to type information of a variable in a program
• Physical schema– the overall physical structure of the database
• Instance – the actual content of the database at a particular point in time
– Analogous to the value of a variable
• Physical Data Independence – the ability to modify the physical schema
without changing the logical schema
– Applications depend on the logical schema
– In general, the interfaces between the various levels and components should
be well defined so that changes in some parts do not seriously influence
others.
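Physical data independence can be illustrated, under the assumption that adding an index counts as a physical-schema change, with the sketch below. It uses Python's sqlite3 module; the table contents and the index name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary NUMERIC)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("1", "Ada", "Physics", 100), ("2", "Bob", "Music", 80), ("3", "Cy", "Physics", 90)],
)

query = "SELECT name FROM instructor WHERE dept_name = 'Physics' ORDER BY name"
before = conn.execute(query).fetchall()

# Physical-level change: add an index. The logical schema (table, columns) is untouched.
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")
after = conn.execute(query).fetchall()
```

The query text and its result are the same before and after; only the way the system may locate the rows has changed.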
Data Models
• A collection of tools for describing
– Data
– Data relationships
– Data semantics
– Data constraints
• Relational model
• Entity-Relationship data model (mainly for database
design)
• Object-based data models (Object-oriented and Object-
relational)
• Semistructured data model (XML)
• Other older models:
– Network model
– Hierarchical model
Relational Model
• All the data is stored in various tables.
• Example of tabular data in the relational model (figure: a table with its columns and rows labeled)
A Sample Relational Database
Data Definition Language (DDL)
• Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))

• DDL compiler generates a set of table templates stored in a data dictionary
• Data dictionary contains metadata (i.e., data about data)
– Database schema
– Integrity constraints
• Primary key (ID uniquely identifies instructors)
– Authorization
• Who can access what
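To make "data about data" concrete, the sketch below creates the instructor table and then reads its description back from the system catalog. It assumes SQLite, whose catalog is the sqlite_master table and the table_info pragma; other systems keep their data dictionary under different names. The PRIMARY KEY clause is added here to match the integrity-constraint bullet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    " ID CHAR(5) PRIMARY KEY,"
    " name VARCHAR(20),"
    " dept_name VARCHAR(20),"
    " salary NUMERIC(8,2))"
)

# sqlite_master is SQLite's catalog table: one row per stored schema object.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column:
# index 1 is the column name, index 5 is nonzero for primary-key columns.
columns = conn.execute("PRAGMA table_info(instructor)").fetchall()
column_names = [row[1] for row in columns]
primary_key = [row[1] for row in columns if row[5] > 0]
```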
Data Manipulation Language (DML)
• Language for accessing and manipulating the data organized
by the appropriate data model
– DML also known as query language
• Two classes of languages
– Pure – used for proving properties about computational power
and for optimization
• Relational Algebra
• Tuple relational calculus
• Domain relational calculus
– Commercial – used in commercial systems
• SQL is the most widely used commercial language
SQL

• The most widely used commercial language


• SQL is NOT a Turing machine equivalent language
• To be able to compute complex functions SQL is usually
embedded in some higher-level language
• Application programs generally access databases through one of
– Language extensions to allow embedded SQL
– Application program interface (e.g., ODBC/JDBC) that allows SQL
queries to be sent to a database
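The API route can be sketched with Python's sqlite3 module, which plays a role comparable to ODBC/JDBC here (an analogy, not the same interface). The host program sends a parameterized query and then continues in ordinary Python; the function instructors_in and the sample rows are hypothetical.

```python
import sqlite3

def instructors_in(conn, dept_name):
    """Send a parameterized SQL query through the API and return plain Python values."""
    rows = conn.execute(
        "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name",
        (dept_name,),
    )
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("1", "Ada", "Physics"), ("2", "Bob", "Music"), ("3", "Cy", "Physics")],
)

physics = instructors_in(conn, "Physics")
# Further processing happens in the host language, outside SQL.
initials = "".join(name[0] for name in physics)
```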
Database Design
The process of designing the general structure of the database:
• Logical Design – Deciding on the database schema.
Database design requires that we find a “good” collection
of relation schemas.
– Business decision – What attributes should we record in the
database?
– Computer Science decision – What relation schemas should
we have and how should the attributes be distributed among
the various relation schemas?
• Physical Design – Deciding on the physical layout of the
database
Database Design (Cont.)
• Is there any problem with this relation?
Design Approaches
• Need to come up with a methodology to
ensure that each of the relations in the
database is “good”
• Two ways of doing so:
– Entity Relationship Model (Chapter 7)
• Models an enterprise as a collection of entities
and relationships
• Represented diagrammatically by an entity-
relationship diagram:
– Normalization Theory (Chapter 8)
• Formalize what designs are bad, and test for
them
Object-Relational Data Models
• Relational model: flat, “atomic” values
• Object Relational Data Models
– Extend the relational data model by including object orientation
and constructs to deal with added data types.
– Allow attributes of tuples to have complex types, including non-
atomic values such as nested relations.
– Preserve relational foundations, in particular the declarative
access to data, while extending modeling power.
– Provide upward compatibility with existing relational languages.
Database Engine
• Storage manager
• Query processing
• Transaction manager
Storage Management
• Storage manager is a program module that provides the
interface between the low-level data stored in the database and
the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
– Interaction with the OS file manager
– Efficient storing, retrieving and updating of data
• Issues:
– Storage access
– File organization
– Indexing and hashing
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Query Processing (Cont.)
• Alternative ways of evaluating a given query
– Equivalent expressions
– Different algorithms for each operation
• Cost difference between a good and a bad way of
evaluating a query can be enormous
• Need to estimate the cost of operations
– Depends critically on statistical information about relations
which the database must maintain
– Need to estimate statistics for intermediate results to compute
cost of complex expressions
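"Equivalent expressions" can be illustrated with two queries that are written differently but return the same rows. The sketch uses Python's sqlite3 module with made-up tables and values; it only demonstrates logical equivalence, and SQLite's own optimizer may well execute both forms the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT);
    CREATE TABLE department (dept_name TEXT, building TEXT);
    INSERT INTO instructor VALUES ('1', 'Ada', 'Physics'), ('2', 'Bob', 'Music');
    INSERT INTO department VALUES ('Physics', 'Watson'), ('Music', 'Packard');
""")

# Form 1: join first, then filter on building.
join_then_select = conn.execute("""
    SELECT i.name, d.building
    FROM instructor AS i JOIN department AS d ON i.dept_name = d.dept_name
    WHERE d.building = 'Watson'
""").fetchall()

# Form 2: filter department first, then join the smaller intermediate result.
select_then_join = conn.execute("""
    SELECT i.name, d.building
    FROM instructor AS i
    JOIN (SELECT * FROM department WHERE building = 'Watson') AS d
      ON i.dept_name = d.dept_name
""").fetchall()
```

On large relations the second form joins far fewer rows, which is why an optimizer needs size estimates for intermediate results.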
Transaction Management
• What if the system fails?
• What if more than one user is concurrently updating
the same data?
• A transaction is a collection of operations that
performs a single logical function in a database
application
• Transaction-management component ensures that
the database remains in a consistent (correct) state
despite system failures (e.g., power failures and
operating system crashes) and transaction failures.
• Concurrency-control manager controls the
interaction among the concurrent transactions, to
ensure the consistency of the database.
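A funds transfer is the usual example of a single logical function made of several operations. The sketch below, using Python's sqlite3 module, treats the credit and the debit as one transaction so that a failure undoes both. The account table, the balance >= 0 rule, and the helper names are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move amount from source to target; commit both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, target)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer(conn, "A", "B", 30)        # both updates are applied
failed = transfer(conn, "A", "B", 500)   # debit violates the CHECK, so the credit is undone
final = balances(conn)
```

After the failed transfer the balances are exactly what the first transfer left behind; no partial update survives.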
Database Users and Administrators

Database System Internals
Database Architecture
The architecture of a database system is
greatly influenced by
the underlying computer system on which
the database is running:
• Centralized
• Client-server
• Parallel (multi-processor)
• Distributed
History of Database Systems
• 1950s and early 1960s:
– Data processing using magnetic tapes for storage
• Tapes provided only sequential access
– Punched cards for input
• Late 1960s and 1970s:
– Hard disks allowed direct access to data
– Network and hierarchical data models in widespread use
– Ted Codd defines the relational data model
• Would win the ACM Turing Award for this work
• IBM Research begins System R prototype
• UC Berkeley begins Ingres prototype
– High-performance (for the era) transaction processing
History (cont.)
• 1980s:
– Research relational prototypes evolve into commercial systems
• SQL becomes the industry standard
– Parallel and distributed database systems
– Object-oriented database systems
• 1990s:
– Large decision support and data-mining applications
– Large multi-terabyte data warehouses
– Emergence of Web commerce
• Early 2000s:
– XML and XQuery standards
– Automated database administration
• Later 2000s:
– Giant data storage systems
• Google Bigtable, Yahoo PNUTS, Amazon, ...
ER model -- Database Modeling
• The ER data model was developed to facilitate database design by
allowing specification of an enterprise schema that represents the
overall logical structure of a database.
• The ER model is very useful in mapping the meanings and
interactions of real-world enterprises onto a conceptual schema.
Because of this usefulness, many database-design tools draw on
concepts from the ER model.
• The ER data model employs three basic concepts:
– entity sets,
– relationship sets,
– attributes.
• The ER model also has an associated diagrammatic representation,
the ER diagram, which can express the overall logical structure of
a database graphically.
Entity Sets
• An entity is an object that exists and is distinguishable from
other objects.
– Example: specific person, company, event, plant
• An entity set is a set of entities of the same type that share
the same properties.
– Example: set of all persons, companies, trees, holidays
• An entity is represented by a set of attributes; i.e.,
descriptive properties possessed by all members of an entity
set.
– Example:
instructor = (ID, name, street, city, salary )
course= (course_id, title, credits)
• A subset of the attributes forms a primary key of the entity
set; i.e., it uniquely identifies each member of the set.
Entity Sets -- instructor and student
(Figure: instructor entities listed by instructor_ID and instructor_name; student entities listed by student_ID and student_name)
Relationship Sets
• A relationship is an association among several entities
Example:
44553 (Peltier)    advisor             22222 (Einstein)
student entity     relationship set    instructor entity
• A relationship set is a mathematical relation among n ≥ 2
entities, each taken from entity sets
{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
where (e1, e2, …, en) is a relationship
– Example: (44553, 22222) ∈ advisor
Relationship Set advisor
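The set definition can be checked directly in code: a relationship set is a subset of the Cartesian product of its entity sets. In this sketch entities are reduced to their IDs; 44553 and 22222 come from the example above, and the remaining IDs and the function name are made up.

```python
from itertools import product

# Entity sets, represented by ID only.
student = {"44553", "98988"}
instructor = {"22222", "45565"}

# A relationship set is a subset of the Cartesian product student x instructor.
advisor = {("44553", "22222"), ("98988", "45565")}

def is_relationship_set(relationship, *entity_sets):
    """True if every tuple takes its i-th component from the i-th entity set."""
    return relationship <= set(product(*entity_sets))

valid = is_relationship_set(advisor, student, instructor)
invalid = is_relationship_set({("44553", "00000")}, student, instructor)
```

The second check fails because 00000 is not a member of the instructor entity set.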
Relationship Sets (Cont.)
• An attribute can also be associated with a relationship
set.
• For instance, the advisor relationship set between
entity sets instructor and student may have the
attribute date which tracks when the student started
being associated with the advisor
Degree of a Relationship Set
• Binary relationship
– Involves two entity sets (degree two).
– Most relationship sets in a database system are binary.
• Relationships among more than two entity sets are
rare. (More on this later.)
– Example: students work on research projects under the
guidance of an instructor.
– The relationship proj_guide is a ternary relationship among
instructor, student, and project
Mapping Cardinality Constraints
• Express the number of entities to which another entity
can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality
must be one of the following types:
– One to one
– One to many
– Many to one
– Many to many
Mapping Cardinalities
(Figure panels: One to one; One to many)
Note: Some elements in A and B may not be mapped to any
elements in the other set
Mapping Cardinalities
(Figure panels: Many to one; Many to many)
Note: Some elements in A and B may not be mapped to any
elements in the other set
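The four mapping types can be told apart mechanically for a given set of pairs. The sketch below classifies one instance of a binary relationship between sets A and B; the function name is made up. A real cardinality constraint is a rule on all possible instances, so an instance can only show the tightest type consistent with its data.

```python
from collections import defaultdict

def mapping_cardinality(pairs):
    """Classify a binary relationship given as (a, b) pairs from sets A and B."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    # Does some element of A map to more than one element of B, and vice versa?
    a_maps_to_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_maps_to_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_maps_to_many and b_maps_to_many:
        return "many to many"
    if a_maps_to_many:
        return "one to many"
    if b_maps_to_many:
        return "many to one"
    return "one to one"

one_to_many = mapping_cardinality([("a1", "b1"), ("a1", "b2"), ("a2", "b3")])
many_to_one = mapping_cardinality([("a1", "b1"), ("a2", "b1")])
```

Elements that appear in no pair are simply absent from the input, matching the note that some elements may not be mapped at all.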
Complex Attributes
• Attribute types:
– Simple and composite attributes.
– Single-valued and multivalued attributes
• Example: multivalued attribute: phone_numbers
– Derived attributes
• Can be computed from other attributes
• Example: age, given date_of_birth
• Domain – the set of permitted values for
each attribute
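A derived attribute is computed rather than stored. As a small sketch of the age example, the function below derives age in whole years from date_of_birth; the function name and the explicit "today" parameter are choices made for this illustration.

```python
from datetime import date

def age(date_of_birth, today):
    """Derive age in whole years from date_of_birth as of the given day."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

before_birthday = age(date(2000, 6, 15), date(2022, 4, 18))
on_birthday = age(date(2000, 4, 18), date(2022, 4, 18))
```

Storing only date_of_birth avoids an age value that would go stale every year.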
Composite Attributes
Redundant Attributes
• Suppose we have entity sets:
– instructor, with attributes: ID, name, dept_name, salary
– department, with attributes: dept_name, building, budget
• We model the fact that each instructor has an associated
department using a relationship set inst_dept
• The attribute dept_name appears in both entity sets. Since
it is the primary key for the entity set department, it
replicates information present in the relationship and is
therefore redundant in the entity set instructor and needs to
be removed.
• BUT: when converting back to tables, in some cases the
attribute gets reintroduced, as we will see later.
Weak Entity Sets
• Consider a section entity, which is uniquely identified by a
course_id, semester, year, and sec_id.
• Clearly, section entities are related to course entities. Suppose we
create a relationship set sec_course between entity sets section and
course.
• Note that the information in sec_course is redundant, since section
already has an attribute course_id, which identifies the course with
which the section is related.
• One option to deal with this redundancy is to get rid of the
relationship sec_course; however, by doing so the relationship
between section and course becomes implicit in an attribute, which
is not desirable.
Weak Entity Sets (Cont.)
• An alternative way to deal with this redundancy is to not store the
attribute course_id in the section entity and to only store the
remaining attributes sec_id, year, and semester. However, the
entity set section then does not have enough attributes to identify a
particular section entity uniquely; although each section entity is
distinct, sections for different courses may share the same
sec_id, year, and semester.
• To deal with this problem, we treat the relationship sec_course as a
special relationship that provides extra information, in this case, the
course_id, required to identify section entities uniquely.
• The notion of weak entity set formalizes the above intuition. A
weak entity set is one whose existence is dependent on another
entity, called its identifying entity; instead of associating a primary
key with a weak entity, we use the identifying entity, along with
extra attributes called discriminator to uniquely identify a weak
entity. An entity set that is not a weak entity set is termed a strong
entity set.
Weak Entity Sets (Cont.)
• Every weak entity must be associated with an
identifying entity; that is, the weak entity set is
said to be existence dependent on the identifying
entity set. The identifying entity set is said to own
the weak entity set that it identifies. The
relationship associating the weak entity set with
the identifying entity set is called the identifying
relationship.
• Note that the relational schema we eventually
create from the entity set section does have the
attribute course_id, for reasons that will become
clear later, even though we have dropped the
attribute course_id from the entity set section.
E-R Diagrams
Entity Sets
• Entities can be represented graphically as follows:
• Rectangles represent entity sets.
• Attributes listed inside entity rectangle
• Underline indicates primary key attributes
Relationship Sets
• Diamonds represent relationship sets.
Relationship Sets with Attributes
Roles
• Entity sets of a relationship need not be distinct
– Each occurrence of an entity set plays a “role” in the
relationship
• The labels “course_id” and “prereq_id” are called
roles.
Cardinality Constraints
• We express cardinality constraints by drawing either a directed line
(→), signifying “one,” or an undirected line (—), signifying
“many,” between the relationship set and the entity set.
• One-to-one relationship between an instructor and a student:
– An instructor is associated with at most one student via the relationship
advisor
– A student is associated with at most one instructor via the relationship
advisor
– A student is associated with at most one department via stud_dept
One-to-Many Relationship
• One-to-many relationship between an instructor and a
student
– An instructor is associated with several (including 0) students
via advisor
– A student is associated with at most one instructor via advisor
Many-to-One Relationships
• In a many-to-one relationship between an instructor
and a student,
– an instructor is associated with at most one student via
advisor,
– and a student is associated with several (including 0)
instructors via advisor
Many-to-Many Relationship
• An instructor is associated with several (possibly 0)
students via advisor
• A student is associated with several (possibly 0)
instructors via advisor
Total and Partial Participation

• Total participation (indicated by double line): every entity in the entity set
participates in at least one relationship in the relationship set
– Example: participation of student in the advisor relationship set is total;
every student must have an associated instructor
• Partial participation: some entities may not participate in any relationship in
the relationship set
– Example: participation of instructor in advisor is partial
Notation for Expressing More Complex Constraints

• A line may have an associated minimum and maximum cardinality,
shown in the form l..h, where l is the minimum and h the maximum
cardinality
– A minimum value of 1 indicates total participation.
– A maximum value of 1 indicates that the entity participates in
at most one relationship
– A maximum value of * indicates no limit.
• Example: an instructor can advise 0 or more students. A student must have
exactly 1 advisor and cannot have multiple advisors
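The l..h notation can be read as a count check on each entity: how many relationships does it take part in? The sketch below tests an instance of advisor against 0..* for instructor and 1..1 for student. The function satisfies, the use of None for "*", and all IDs other than 22222 and 44553 are assumptions of this example.

```python
from collections import Counter

def satisfies(pairs, entities, position, low, high=None):
    """Check an l..h constraint for the entity set at the given tuple position.

    high=None stands for '*', i.e. no upper limit.
    """
    counts = Counter(pair[position] for pair in pairs)
    for entity in entities:
        n = counts[entity]  # Counter returns 0 for entities in no relationship
        if n < low or (high is not None and n > high):
            return False
    return True

instructors = {"22222", "45565"}
students = {"44553", "98988"}
advisor = {("22222", "44553"), ("22222", "98988")}  # (instructor, student)

instructor_ok = satisfies(advisor, instructors, 0, 0)       # 0..* on instructor
student_ok = satisfies(advisor, students, 1, 1, 1)          # 1..1 on student
unadvised = satisfies({("22222", "44553")}, students, 1, 1, 1)
```

The last check fails because student 98988 has no advisor, which violates the minimum of 1 (total participation).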
Notation to Express Entity with Complex Attributes
Expressing Weak Entity Sets

• In E-R diagrams, a weak entity set is depicted via a
double rectangle.
• We underline the discriminator of a weak entity set with
a dashed line.
• The relationship set connecting the weak entity set to
the identifying strong entity set is depicted by a double
diamond.
• Primary key for section – (course_id, sec_id, semester,
year)
E-R Diagram for a University
Enterprise
Reduction to Relation Schemas
• Entity sets and relationship sets can be expressed
uniformly as relation schemas that represent the contents
of the database.
• A database which conforms to an E-R diagram can be
represented by a collection of schemas.
• For each entity set and relationship set there is a unique
schema that is assigned the name of the corresponding
entity set or relationship set.
• Each schema has a number of columns (generally
corresponding to attributes), which have unique names.
Representing Entity Sets
• A strong entity set reduces to a schema with the same
attributes
student(ID, name, tot_cred)

• A weak entity set becomes a table that includes a
column for the primary key of the identifying strong
entity set
section ( course_id, sec_id, sem, year )
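One way to realize the section schema is a table whose primary key combines the identifying key (course_id) with the discriminator (sec_id, sem, year), plus a foreign key back to course. The sketch assumes SQLite through Python's sqlite3 module; column types, sample rows, and the helper insert_section are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT,
        credits   INTEGER
    );
    CREATE TABLE section (
        course_id TEXT REFERENCES course (course_id),
        sec_id    TEXT,
        sem       TEXT,
        year      INTEGER,
        PRIMARY KEY (course_id, sec_id, sem, year)
    );
    INSERT INTO course VALUES ('CS-101', 'Intro', 4), ('BIO-101', 'Biology', 4);
    INSERT INTO section VALUES ('CS-101', '1', 'Fall', 2021);
    INSERT INTO section VALUES ('BIO-101', '1', 'Fall', 2021);
""")

def insert_section(row):
    """Return True if the section row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO section VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Two sections share the same discriminator values but belong to different courses.
same_discriminator = conn.execute(
    "SELECT COUNT(*) FROM section WHERE sec_id = '1' AND sem = 'Fall' AND year = 2021"
).fetchone()[0]
orphan = insert_section(("XX-999", "1", "Fall", 2021))     # no such course
duplicate = insert_section(("CS-101", "1", "Fall", 2021))  # same full key twice
```

The foreign key expresses existence dependence: a section row cannot exist without its course.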
Representing Relationship Sets
• A many-to-many relationship set is represented as a
schema with attributes for the primary keys of the two
participating entity sets, and any descriptive attributes
of the relationship set.
• Example: schema for relationship set advisor
advisor = (s_id, i_id)
Representation of Entity Sets with Composite Attributes

• Composite attributes are flattened out by creating a
separate attribute for each component attribute
– Example: given entity set instructor with composite
attribute name with component attributes first_name
and last_name the schema corresponding to the entity
set has two attributes name_first_name and
name_last_name
• Prefix omitted if there is no ambiguity (name_first_name
could be first_name)
• Ignoring multivalued attributes, extended instructor
schema is
– instructor(ID,
first_name, middle_initial, last_name,
street_number, street_name,
apt_number, city, state, zip_code,
date_of_birth)
Representation of Entity Sets with Multivalued Attributes

• A multivalued attribute M of an entity E is represented
by a separate schema EM
• Schema EM has attributes corresponding to the primary
key of E and an attribute corresponding to multivalued
attribute M
• Example: Multivalued attribute phone_number of
instructor is represented by a schema:
inst_phone= ( ID, phone_number)
• Each value of the multivalued attribute maps to a
separate tuple of the relation on schema EM
– For example, an instructor entity with primary key 22222 and
phone numbers 456-7890 and 123-4567 maps to two tuples:
(22222, 456-7890) and (22222, 123-4567)
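The mapping from a multivalued attribute to tuples of schema EM can be sketched as a small flattening step. The function name and the dictionary representation of entities are assumptions; the instructor 22222 and its two phone numbers come from the example above, and the second instructor is made up.

```python
def flatten_multivalued(entities, key, attribute):
    """Return one (key value, attribute value) tuple per value of a multivalued attribute."""
    return [
        (entity[key], value)
        for entity in entities
        for value in entity[attribute]
    ]

instructors = [
    {"ID": "22222", "phone_number": ["456-7890", "123-4567"]},
    {"ID": "45565", "phone_number": []},  # made-up instructor with no phone numbers
]

inst_phone = flatten_multivalued(instructors, "ID", "phone_number")
```

An instructor with no phone numbers contributes no tuples to inst_phone, so no null values are needed.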

Redundancy of Schemas
• Many-to-one and one-to-many relationship sets that are total on the
many-side can be represented by adding an extra attribute to the
“many” side, containing the primary key of the “one” side
– Example: Instead of creating a schema for relationship set inst_dept,
add an attribute dept_name to the schema arising from entity set
instructor
Redundancy of Schemas (Cont.)
• For one-to-one relationship sets,
either side can be chosen to act as the
“many” side
– That is, an extra attribute can be added
to either of the tables corresponding to
the two entity sets
• If participation is partial on the
“many” side, replacing a schema by
an extra attribute in the schema
corresponding to the “many” side
could result in null values
Redundancy of Schemas (Cont.)
• The schema corresponding to a relationship set linking
a weak entity set to its identifying strong entity set is
redundant.

• Example: The section schema already contains the
attributes that would appear in the sec_course schema
Binary Vs. Non-Binary Relationships
• Although it is possible to replace any non-binary (n-ary,
for n > 2) relationship set by a number of distinct
binary relationship sets, an n-ary relationship set shows
more clearly that several entities participate in a single
relationship.
• Some relationships that appear to be non-binary may
be better represented using binary relationships
– For example, a ternary relationship parents, relating a child
to his/her father and mother, is best replaced by two binary
relationships, father and mother
• Using two binary relationships allows partial information (e.g.,
only mother being known)
– But there are some relationships that are naturally non-binary
• Example: proj_guide
Converting Non-Binary Relationships to Binary Form

• In general, any non-binary relationship can be represented
using binary relationships by creating an artificial entity
set.
– Replace R between entity sets A, B and C by an entity set E, and
three relationship sets:
1. RA, relating E and A
2. RB, relating E and B
3. RC, relating E and C
– Create an identifying attribute for E and add any attributes of R to
E
– For each relationship (ai, bi, ci) in R, create
1. a new entity ei in the entity set E
2. add (ei, ai) to RA
3. add (ei, bi) to RB
4. add (ei, ci) to RC
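The conversion steps above can be sketched as a pair of functions: one that builds E, RA, RB and RC from a ternary relationship set, and one that joins them back to recover R. Names such as to_binary and the generated identifiers e1, e2, … are illustrative choices, and the sample tuples are made up.

```python
def to_binary(r):
    """Replace ternary relationship set R by entity set E and binary sets RA, RB, RC."""
    e, ra, rb, rc = [], [], [], []
    for i, (a, b, c) in enumerate(sorted(r), start=1):
        ei = f"e{i}"  # identifying attribute created for the new entity
        e.append(ei)
        ra.append((ei, a))
        rb.append((ei, b))
        rc.append((ei, c))
    return e, ra, rb, rc

def to_ternary(ra, rb, rc):
    """Rebuild R by joining RA, RB and RC on the artificial entity."""
    b_of, c_of = dict(rb), dict(rc)
    return {(a, b_of[ei], c_of[ei]) for ei, a in ra}

# (instructor, student, project) tuples of a proj_guide-style relationship.
proj_guide = {("i1", "s1", "p1"), ("i1", "s2", "p1")}
e, ra, rb, rc = to_binary(proj_guide)
round_trip = to_ternary(ra, rb, rc)
```

The round trip works here because every new entity is linked to exactly one entity in each of A, B and C; nothing in the binary schemas themselves enforces that.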
Converting Non-Binary Relationships (Cont.)

• Also need to translate constraints
– Translating all constraints may not be possible
– There may be instances in the translated schema that
cannot correspond to any instance of R
• Exercise: add constraints to the relationships RA, RB and
RC to ensure that a newly created entity corresponds to
exactly one entity in each of entity sets A, B and C
– We can avoid creating an identifying attribute by
making E a weak entity set (described earlier)
identified by the three relationship sets
E-R Design Decisions
• The use of an attribute or entity set to represent an
object.
• Whether a real-world concept is best expressed by an
entity set or a relationship set.
• The use of a ternary relationship versus a pair of binary
relationships.
• The use of a strong or weak entity set.
• The use of specialization/generalization – contributes to
modularity in the design.
• The use of aggregation – can treat the aggregate entity
set as a single unit without concern for the details of its
internal structure.
Summary of Symbols Used in E-R Notation
Symbols Used in E-R Notation
