Database Management System (DBMS)
Data
The term data refers to groups of information that represent the qualitative or
quantitative attributes of a variable or set of variables. Data (plural of "datum",
which is seldom used) are typically the results of measurements and can be the basis
of graphs, images, or observations of a set of variables. Data are often viewed as the
lowest level of abstraction from which information and knowledge are derived. Raw
data refers to a collection of unprocessed numbers, characters, images, or other
outputs from devices that convert physical quantities into symbols.
Database servers are computers that hold the actual databases and run only the
DBMS and related software. Database servers are usually multiprocessor computers,
with generous memory and RAID disk arrays used for stable storage. Hardware
database accelerators, connected to one or more servers via a high-speed channel,
are also used in large volume transaction processing environments. DBMSs are found
at the heart of most database applications. DBMSs may be built around a custom
multitasking kernel with built-in networking support, but modern DBMSs typically
rely on a standard operating system to provide these functions.
[email protected]
Components of DBMS
DBMS Engine accepts logical requests from the various other DBMS
subsystems, converts them into physical equivalents, and actually accesses
the database and data dictionary as they exist on a storage device.
Data Definition Subsystem helps users to create and maintain the data
dictionary and define the structure of the files in a database.
Data Manipulation Subsystem helps users to add, change, and delete
information in a database and query it for valuable information. Software
tools within the data manipulation subsystem are most often the primary
interface between users and the information contained in a database. It allows
users to specify their logical information requirements.
Application Generation Subsystem contains facilities to help users
develop transaction-intensive applications, which usually require that the user
perform a detailed series of tasks to process a transaction. It provides
easy-to-use data entry screens, programming languages, and interfaces.
Data Administration Subsystem helps users to manage the overall
database environment by providing facilities for backup and recovery, security
management, query optimization, concurrency control, and change
management.
The internal level: The internal level has an internal schema, which
describes the physical storage structure of the database.
The conceptual level: The conceptual level has a conceptual schema, which
describes the entities, data types, relationships, user operations, and
constraints.
The external level or view level: The external or view level includes a
number of external schemas or user views. Each describes the part of the
database that a particular user group is interested in and hides the rest of the
database from that user group.
The internal level has an internal schema, which describes the physical storage
structure of the database. The internal schema uses a physical data model, which
describes the complete details of data storage, access paths for the database, and
how data are retrieved from or inserted into the database. A data model is a collection
of conceptual tools for describing the data, data relationship, data semantics and
consistency constraints.
The conceptual level has a conceptual schema that describes the whole database for
different users who access the database. The conceptual schema hides the details of
the physical storage structures and concentrates basically on entities, relationships,
and constraints. The external or view level includes a number of user views. Each
external schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group.
An implementation data model is used at this level. Each user group refers to its own
external schema. Hence the DBMS must be capable of transforming a request
specified against an external schema into a request against the conceptual schema.
The processes of transforming requests and results between levels are called mappings.
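The relationship between an external schema and the conceptual schema can be sketched with an SQL view. The following is a minimal illustration using Python's built-in sqlite3 module; the table `employee`, the view `employee_directory`, and the sample rows are hypothetical and do not come from the text above.

```python
import sqlite3

# Conceptual level: a hypothetical table holding the full employee record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", 5200.0), (2, "Raj", 4100.0)],
)

# External level: a view for a user group that should not see salaries.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT emp_no, name FROM employee"
)

# A query against the view is mapped by the DBMS onto the underlying table.
cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_no")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
```

The user group that works with `employee_directory` never names the `salary` column, which is one way the rest of the database stays hidden from it.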
Database model
A database model or database schema is the structure or format of a database,
described in a formal language supported by the database management system. In
other words, a "database model" is the application of a data model when used in
conjunction with a database management system.
Flat model
The flat (or table) model consists of a single, two-dimensional array of data
elements, where all members of a given column are assumed to be similar values,
and all members of a row are assumed to be related to one another. For instance,
a system security database might have columns for name and password, and each row
would hold the password associated with an individual user. Columns of the table
often have a type associated with them, defining them as
character data, date or time information, integers, or floating point numbers. This
may not strictly qualify as a data model, as defined above.
Hierarchical model
Parent–child relationship: Child may only have one parent but a parent can have
multiple children. Parents and children are tied together by links called "pointers". A
parent will have a list of pointers to each of their children.
Network model
The network model (defined by the CODASYL specification) organizes data using two
fundamental constructs, called records and sets. Records contain fields (which may
be organized hierarchically, as in the programming language COBOL). Sets (not to
be confused with mathematical sets) define one-to-many relationships between
records: one owner, many members. A record may be an owner in any number of
sets, and a member in any number of sets.
The network model is a variation on the hierarchical model, to the extent that it is
built on the concept of multiple branches (lower-level structures) emanating from
one or more nodes (higher-level structures), while the model differs from the
hierarchical model in that branches can be connected to multiple nodes. The network
model is able to represent redundancy in data more efficiently than in the
hierarchical model.
The operations of the network model are navigational in style: a program maintains
a current position, and navigates from one record to another by following the
relationships in which the record participates. Records can also be located by
supplying key values.
Most object databases use the navigational concept to provide fast navigation across
networks of objects, generally using object identifiers as "smart" pointers to related
objects. Objectivity/DB, for instance, implements named 1:1, 1:many, many:1 and
many:many relationships that can cross databases. Many object databases
also support SQL, combining the strengths of both models.
Relational model
The relational model was introduced by E.F. Codd in 1970 as a way to make
database management systems more independent of any particular application. It is
a mathematical model defined in terms of predicate logic and set theory.
The products that are generally referred to as relational databases in fact implement
a model that is only an approximation to the mathematical model defined by Codd.
Three key terms are used extensively in relational database models: relations,
attributes, and domains. A relation is a table with columns and rows. The named
columns of the relation are called attributes, and the domain is the set of values the
attributes are allowed to take.
The basic data structure of the relational model is the table, where information about
a particular entity (say, an employee) is represented in rows (also called tuples) and
columns. Thus, the "relation" in "relational database" refers to the various tables in
the database; a relation is a set of tuples. The columns enumerate the various
attributes of the entity (the employee's name, address or phone number, for
example), and a row is an actual instance of the entity (a specific employee) that is
represented by the relation. As a result, each tuple of the employee table represents
various attributes of a single employee.
All relations (and, thus, tables) in a relational database have to adhere to some basic
rules to qualify as relations. First, the ordering of columns is immaterial in a table.
Second, there can't be identical tuples or rows in a table. And third, each tuple will
contain a single value for each of its attributes.
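The three rules above can be illustrated with a small sketch that models a relation as a Python set of tuples, where each tuple is a frozenset of (attribute, value) pairs. The helper `make_tuple` and the sample data are hypothetical and serve only as illustration.

```python
def make_tuple(**attributes):
    """Build one tuple of a relation as an unordered set of (attribute, value) pairs."""
    return frozenset(attributes.items())


# A relation is a set of tuples, so duplicates collapse automatically.
employee = {
    make_tuple(name="Ann", phone="555-0101"),
    make_tuple(phone="555-0101", name="Ann"),  # same tuple, columns given in another order
    make_tuple(name="Raj", phone="555-0102"),
}

# Each tuple holds exactly one value per attribute.
attribute_counts = {len(dict(t)) for t in employee}
```

Because column order plays no part in a frozenset, the first two entries are the same tuple and the relation holds two tuples, not three.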
A relational database contains multiple tables, each similar to the one in the "flat"
database model. One of the strengths of the relational model is that, in principle, any
value occurring in two different records (belonging to the same table or to different
tables), implies a relationship among those two records. Yet, in order to enforce
explicit integrity constraints, relationships between records in tables can also be
defined explicitly, by identifying or non-identifying parent-child relationships
characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also have a
designated single attribute or a set of attributes that can act as a "key", which can
be used to uniquely identify each tuple in the table.
A key that can be used to uniquely identify a row in a table is called a primary key.
Keys are commonly used to join or combine data from two or more tables. For
example, an Employee table may contain a column named Location which contains a
value that matches the key of a Location table. Keys are also critical in the creation
of indexes, which facilitate fast retrieval of data from large tables. Any column can
be a key, or multiple columns can be grouped together into a compound key. It is
not necessary to define all the keys in advance; a column can be used as a key even
if it was not originally intended to be one.
A key that has an external, real-world meaning (such as a person's name, a book's
ISBN, or a car's serial number) is sometimes called a "natural" key. If no natural key
is suitable (think of the many people named Brown), an arbitrary or surrogate key
can be assigned (such as by giving employees ID numbers). In practice, most
databases have both generated and natural keys, because generated keys can be
used internally to create links between rows that cannot break, while natural keys
can be used, less reliably, for searches and for integration with other databases. (For
example, records in two independently developed databases could be matched up by
social security number, except when the social security numbers are incorrect,
missing, or have changed.)
A variety of ways have been tried for storing objects in a database. Some
products have approached the problem from the application programming end, by
making the objects manipulated by the program persistent. This also typically
requires the addition of some kind of query language, since conventional
programming languages do not have the ability to find objects based on their
information content. Others have attacked the problem from the database end, by
defining an object-oriented data model for the database, and defining a database
programming language that allows full programming capabilities as well as traditional
query facilities.
Hierarchical Model
The hierarchical data model organizes data in a tree structure. There is a hierarchy
of parent and child data segments. This structure implies that a record can have
repeating information, generally in the child data segments. Data is stored in a series of
records, each of which has a set of field values attached to it. The model collects all the
instances of a specific record together as a record type. These record types are the
equivalent of tables in the relational model, with the individual records being the
equivalent of rows. To create links between these record types, the hierarchical model
uses parent-child relationships, which are 1:N mappings between record types. These
mappings are represented as trees, a structure "borrowed" from mathematics in the same
way that the relational model borrows set theory. For
example, an organization might store information about an employee, such as name,
employee number, department, salary. The organization might also store information
about an employee's children, such as name and date of birth. The employee and
children data forms a hierarchy, where the employee data represents the parent
segment and the children data represents the child segment. If an employee has
three children, then there would be three child segments associated with one
employee segment. In a hierarchical database the parent-child relationship is one to
many. This restricts a child segment to having only one parent segment. Hierarchical
DBMSs were popular from the late 1960s, with the introduction of IBM's Information
Management System (IMS) DBMS, through the 1970s.
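The employee and children example can be sketched as a small tree in which each segment keeps a list of pointers to its children and at most one pointer to its parent. The class `Segment` and the sample names are hypothetical; this is an illustration of the one-to-many restriction, not how IMS stores data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Segment:
    """One record (segment) in a hierarchical database sketch."""
    record_type: str
    fields: dict
    parent: Optional["Segment"] = None
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, child: "Segment") -> None:
        # A child segment may have only one parent segment.
        if child.parent is not None:
            raise ValueError("child segment already has a parent")
        child.parent = self
        self.children.append(child)


# Hypothetical data: one employee segment with three child segments.
employee = Segment("EMPLOYEE", {"name": "Ann", "employee_number": 7})
for child_name in ("Ben", "Cal", "Dee"):
    employee.add_child(Segment("CHILD", {"name": child_name}))

child_names = [child.fields["name"] for child in employee.children]
```

Attaching one of these child segments to a second employee raises `ValueError`, which mirrors the rule that a child segment has exactly one parent.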
Network Model
The popularity of the network data model coincided with the popularity of the
hierarchical data model. Some data were more naturally modeled with more than
one parent per child. So, the network model permitted the modeling of many-to-
many relationships in data. In 1971, the Conference on Data Systems Languages
(CODASYL) formally defined the network model. The basic data modeling construct
in the network model is the set construct. A set consists of an owner record type, a
set name, and a member record type. A member record type can have that role in
more than one set, hence the multiparent concept is supported. An owner record
type can also be a member or owner in another set. The data model is a simple
network, and link and intersection record types (called junction records by IDMS)
may exist, as well as sets between them. Thus, the complete network of
relationships is represented by several pairwise sets; in each set one record
type is owner (at the tail of the relationship arrow) and one or more record types are
members (at the head of the relationship arrow). Usually, a set defines a 1:M
relationship, although 1:1 is permitted. The CODASYL network model is based on
mathematical set theory.
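The set construct and the multiparent idea can be sketched as follows. The class `SetType`, the set names SUPPLIES and USES, and the record labels are hypothetical; a real CODASYL DBMS stores these links as pointers between records.

```python
from collections import defaultdict


class SetType:
    """One named set type: each owner record is linked to its member records (1:M)."""

    def __init__(self, name):
        self.name = name
        self.members_of = defaultdict(list)  # owner record -> list of member records
        self.owner_of = {}                   # member record -> its single owner

    def connect(self, owner, member):
        # Within one set type, a member record has exactly one owner.
        if member in self.owner_of:
            raise ValueError(f"{member!r} already has an owner in set {self.name!r}")
        self.owner_of[member] = owner
        self.members_of[owner].append(member)


# Hypothetical link record "shipment:17" is a member of two different sets,
# so it has two "parents": one supplier and one part.
supplies = SetType("SUPPLIES")
uses = SetType("USES")
supplies.connect("supplier:Acme", "shipment:17")
uses.connect("part:bolt", "shipment:17")

owners_of_shipment = {s.name: s.owner_of["shipment:17"] for s in (supplies, uses)}
```

The shipment record plays the role of a link (intersection) record: two 1:M sets that share it as a member together express a many-to-many relationship between suppliers and parts.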
Relational Model
Certain fields may be designated as keys, which means that searches for specific
values of that field will use indexing to speed them up. Where fields in two different
tables take values from the same set, a join operation can be performed to select
related records in the two tables by matching values in those fields. Often, but not
always, the fields will have the same name in both tables. For example, an "orders"
table might contain (customer-ID, product-code) pairs and a "products" table might
contain (product-code, price) pairs. To calculate a given customer's bill, you would
join the two tables on their product-code fields and sum the prices of all products
ordered by that customer. This can be extended to joining multiple tables on
multiple fields. Because these relationships are only specified at retrieval time,
relational databases are classed as dynamic database management systems. The
relational database model is based on relational algebra.
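The bill calculation described above can be sketched with Python's built-in sqlite3 module. The table layout follows the (customer-ID, product-code) and (product-code, price) pairs from the text; the sample rows and the function name `customer_bill` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL NOT NULL);
CREATE TABLE orders   (customer_id INTEGER NOT NULL, product_code TEXT NOT NULL);
INSERT INTO products VALUES ('P1', 10.0), ('P2', 2.5);
INSERT INTO orders   VALUES (1, 'P1'), (1, 'P2'), (1, 'P2'), (2, 'P1');
""")


def customer_bill(customer_id):
    """Sum the prices of everything a customer ordered by joining on product_code."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON o.product_code = p.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]
```

The relationship between the two tables is stated only in the query's `ON` clause, which is what "specified at retrieval time" means here.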
Object/Relational Model
new object-management possibilities. Query and procedural languages and call
interfaces in ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC,
JDBC, and proprietary call interfaces are all extensions of RDBMS languages and
interfaces. And the leading vendors are, of course, quite well known: IBM, Informix,
and Oracle.
Object-Oriented Model
What is the difference between a weak entity set and a strong entity set?
An entity set that does not possess sufficient attributes to form a primary key is
called a weak entity set. One that does have a primary key is called a strong entity
set.
For example, the entity set transaction has attributes transaction-number, date and
amount.
Different transactions on different accounts could share the same number.
These are not sufficient to form a primary key (uniquely identify a transaction).
Thus transaction is a weak entity set. For a weak entity set to be meaningful, it must
be part of a one-to-many relationship set. This relationship set should have no
descriptive attributes.
The idea of strong and weak entity sets is related to the existence dependencies
seen earlier.
A member of a strong entity set is a dominant entity.
A member of a weak entity set is a subordinate entity.
A weak entity set does not have a primary key, but we need a means of
distinguishing among the entities.
The discriminator of a weak entity set is a set of attributes that allows this distinction
to be made.
The primary key of a weak entity set is formed by taking the primary key of the
strong entity set on which its existence depends (see Mapping Constraints) plus its
discriminator.
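The way a weak entity set borrows part of its key can be sketched with sqlite3. Here the transaction number acts as the discriminator and the account number comes from the strong entity set. The table names `account` and `txn`, the column names, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE account (account_number TEXT PRIMARY KEY);
CREATE TABLE txn (
    account_number     TEXT    NOT NULL REFERENCES account(account_number),
    transaction_number INTEGER NOT NULL,   -- discriminator of the weak entity set
    txn_date           TEXT,
    amount             REAL,
    PRIMARY KEY (account_number, transaction_number)
);
INSERT INTO account VALUES ('A-101'), ('A-102');
""")


def record_transaction(account_number, transaction_number, amount):
    """Return True if the row was stored, False if the composite key rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO txn VALUES (?, ?, '2024-01-01', ?)",
                (account_number, transaction_number, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Two accounts may each have a transaction numbered 1, but the same account cannot hold two transactions with the same number.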
Primary keys
A Primary Key is a Column that uniquely identifies a particular Row in a Table. For
example, a person entity may have a Column for SSN. If in your data model each
person has a unique SSN, then it may be a candidate for a Primary Key. (Primary
Keys can consist of two or more Columns, but this is not covered here.)
Primary Keys are also the means by which Foreign Keys work. Because of this, SSN
may actually not be a good choice as a Primary Key. In practice, Rows often have a
unique numeric identifier (often called an identity or sequence value) that uniquely
identifies a particular Row. These kinds of values are often used as Primary Keys.
It should be noted that RDBMSes often use a Table's Primary Key column(s) to
automatically create a clustered index on that Table. A clustered index is an index
that physically orders the data to match the index. This is done to improve query
performance, but can actually hurt performance if the wrong column(s) are used as
the Primary Key.
Foreign keys
A Foreign Key is a way to further constrain the allowable values of a Column to data
that exists in another Table. For example, if you have to process orders in your
system, you may create a Table called OrderInfo to store order information. An
order has to be associated with a customer, so you may have a Column in the
OrderInfo Table called CustomerID that somehow connects to an associated Row
in the Customer Table.
Most likely you do not want to be able to create orders for customers that do not
exist, and you would not want to delete a customer that is associated with any
orders. Doing so would break the Referential Integrity of the data. A Foreign Key
relationship ensures that these two rules are enforced.
Usually, the table with the foreign key constraint refers to another table by that
table's primary key attribute(s). In a many-to-one relationship, for instance where
Orders is the "many" side and Customer is the "one" side, there are many Order rows
per Customer row, so the foreign key resides on the Order table. Customarily, the
foreign key field has the same name as the primary key field of the table being
referred to, so it is a good idea to name the primary key of each table with a
redundant "TABLENAME_ID" pattern, e.g. Customer_ID.
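The two referential integrity rules can be seen in a short sqlite3 sketch. The tables OrderInfo and Customer follow the example in the text; the column `Name`, the sample rows, and the helper `violates_integrity` are hypothetical. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; other DBMSs enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (
    Customer_ID INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL
);
CREATE TABLE OrderInfo (
    Order_ID    INTEGER PRIMARY KEY,
    Customer_ID INTEGER NOT NULL REFERENCES Customer(Customer_ID)
);
INSERT INTO Customer  VALUES (1, 'Ann');
INSERT INTO OrderInfo VALUES (100, 1);
""")


def violates_integrity(sql):
    """Run one statement; report whether the DBMS rejected it on integrity grounds."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

With this setup, `violates_integrity("INSERT INTO OrderInfo VALUES (101, 99)")` reports a rejection because customer 99 does not exist, and deleting customer 1 is rejected while order 100 still refers to it.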
Other Constraints
It is arguable that the most important constraints are foreign key and primary key
constraints, because the process of normalization (see below) pushes most of the
data integrity checking onto primary keys and joins (retrieving rows using a
foreign key in one table and a primary key in another table).
Some DBMSs provide a logical CHECK constraint, where the body of the CHECK
is a condition on one or more fields.
NOT NULL and UNIQUE are constraints applied to individual fields in the data
declaration statement:
CREATE TABLE table_name ( f1 type1 PRIMARY KEY, f2 type2 UNIQUE, ... CHECK (...) )
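A concrete version of such a declaration, again as a sketch with sqlite3, shows each constraint rejecting a bad row. The table `staff`, its columns, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE staff (
    staff_id INTEGER PRIMARY KEY,
    email    TEXT NOT NULL UNIQUE,
    salary   REAL CHECK (salary >= 0)
)
""")


def rejected(staff_id, email, salary):
    """Try to insert one row; return True if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO staff VALUES (?, ?, ?)", (staff_id, email, salary)
            )
        return False
    except sqlite3.IntegrityError:
        return True
```

Each constraint is checked by the DBMS on every insert or update, so the application does not have to repeat these checks itself.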
Relational algebra
Relational algebra, an offshoot of first-order logic (and of algebra of sets), deals
with a set of finitary relations (see also relation (database)) which is closed under
certain operators. These operators operate on one or more relations to yield a
relation. Relational algebra is a part of computer science.
Relational algebras received little attention until the publication of E.F. Codd's
relational model of data in 1970. Codd proposed such algebra as a basis for database
query languages.
first-order predicate calculus apart from the restrictions he proposed. In practice the
restrictions have no adverse effect on the applicability of his relational algebra for
database purposes.
Primitive operations
Set operators
Projection (π)
Selection (σ)
Rename (ρ)
As in any algebra, some operators are primitive and the others, being definable in
terms of the primitive ones, are derived. It is useful if the choice of primitive
operators parallels the usual choice of primitive logical operators. Although it is well
known that the usual choice in logic of AND, OR and NOT is somewhat arbitrary,
Codd made a similar arbitrary choice for his algebra.
The six primitive operators of Codd's algebra are the selection, the projection, the
Cartesian product (also called the cross product or cross join), the set union, the set
difference, and the rename. (Actually, Codd omitted the rename, but the compelling
case for its inclusion was shown by the inventors of ISBL.) These six operators are
fundamental in the sense that none of them can be omitted without losing expressive
power. Many other operators have been defined in terms of these six. Among the
most important are set intersection, division, and the natural join. In fact ISBL made
a compelling case for replacing the Cartesian product with the natural join, of which
the Cartesian product is a degenerate case.
Set operators
Although three of the six basic operators are taken from set theory, there are
additional constraints that are present in their relational algebra counterparts: For
set union and set difference, the two relations involved must be union-compatible—
that is, the two relations must have the same set of attributes. As set intersection
can be defined in terms of set difference, the two relations involved in set
intersection must also be union-compatible.
The Cartesian product is defined differently from the one defined in set theory in the
sense that tuples are considered to be 'shallow' for the purposes of the operation.
That is, unlike in set theory, where the Cartesian product of a n-tuple by an m-tuple
is a set of 2-tuples, the Cartesian product in relational algebra has the 2-tuple
"flattened" into an n+m-tuple. More formally, R × S is defined as follows:
R × S = {r ∪ s | r ∈ R, s ∈ S}
In addition, for the Cartesian product to be defined, the two relations involved must
have disjoint headers — that is, they must not have a common attribute name.
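The flattened Cartesian product and its disjoint-header requirement can be sketched in Python, with a relation represented as a list of dictionaries (one dictionary per tuple). The function name and sample relations are hypothetical.

```python
def cartesian_product(r, s):
    """R × S for relations given as lists of dicts with disjoint attribute names."""
    r_attrs = set().union(*(row.keys() for row in r))
    s_attrs = set().union(*(row.keys() for row in s))
    if r_attrs & s_attrs:
        raise ValueError(f"headers are not disjoint: {sorted(r_attrs & s_attrs)}")
    # Each result tuple is the union r ∪ s, i.e. one flat tuple with n + m attributes.
    return [{**a, **b} for a in r for b in s]


# Hypothetical relations with 2 tuples each.
colours = [{"colour": "red"}, {"colour": "blue"}]
sizes = [{"size": "S"}, {"size": "L"}]
product = cartesian_product(colours, sizes)
```

The result has 2 × 2 = 4 tuples, and each tuple carries both attributes side by side instead of being a nested pair.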
Projection (π)
A projection is a unary operation written as πa1,...,an(R) where a1,...,an is a set of attribute
names. The result of such a projection is defined as the set that is obtained when all
tuples in R are restricted to the set {a1,...,an}.
Selection (σ)
A generalized selection is a unary operation written as σφ(R) where φ is a propositional
formula that consists of atoms as allowed in the normal selection and the logical
operators ∧ (and), ∨ (or) and ¬ (negation). This selection selects all those tuples in R for
which φ holds.
Rename (ρ)
A rename is a unary operation written as ρa / b(R) where the result is identical to R
except that the b field in all tuples is renamed to an a field. This is simply used to
rename the attribute of a relation or the relation itself.
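The three unary operators described above (projection, selection, rename) can be sketched in Python with a relation represented as a list of dictionaries. The function names and the sample relation are hypothetical.

```python
def project(relation, attrs):
    """π: restrict every tuple to attrs and drop duplicates that result."""
    seen, result = set(), []
    for row in relation:
        restricted = tuple((a, row[a]) for a in attrs)
        if restricted not in seen:
            seen.add(restricted)
            result.append(dict(restricted))
    return result


def select(relation, predicate):
    """σ: keep the tuples for which the propositional formula holds."""
    return [row for row in relation if predicate(row)]


def rename(relation, new, old):
    """ρ new/old: rename attribute old to new in every tuple."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in relation]


# Hypothetical relation.
employees = [
    {"name": "Ann", "dept": "Sales", "salary": 5200},
    {"name": "Raj", "dept": "Sales", "salary": 4100},
    {"name": "Eve", "dept": "IT", "salary": 6100},
]
depts = project(employees, ["dept"])
well_paid_non_it = select(employees, lambda t: t["salary"] > 5000 and not t["dept"] == "IT")
departments = rename(depts, "department", "dept")
```

Projection removes the duplicate "Sales" value because the result must again be a relation, and the selection predicate combines atoms with `and` and `not` in the way the generalized selection allows.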
Natural join (⋈) is a binary operator that is written as (R⋈S) where R and S are
relations.[1] The result of the natural join is the set of all combinations of tuples in R
and S that are equal on their common attribute names. For an example consider the
tables Employee and Dept and their natural join:
This can also be used to define composition of relations. In category theory, the join
is precisely the fiber product.
The natural join is arguably one of the most important operators since it is the
relational counterpart of logical AND. Note carefully that if the same variable appears
in each of two predicates that are connected by AND, then that variable stands for
the same thing and both appearances must always be substituted by the same
value. In particular, natural join allows the combination of relations that are
associated by a foreign key. For example, in the above example a foreign key
probably holds from Employee.DeptName to Dept.DeptName and then the natural
join of Employee and Dept combines all employees with their departments. Note that
this works because the foreign key holds between attributes with the same name. If
this is not the case such as in the foreign key from Dept.manager to Emp.emp-
number then we have to rename these columns before we take the natural join.
Such a join is sometimes also referred to as an equijoin.
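A natural join can be sketched in Python with a relation represented as a list of dictionaries. The sample Employee and Dept rows below are hypothetical stand-ins, not the tables of the original example.

```python
def natural_join(r, s):
    """R ⋈ S: combine tuples that agree on all attribute names they share."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                merged = {**a, **b}
                if merged not in result:  # a relation holds no duplicate tuples
                    result.append(merged)
    return result


# Hypothetical data: DeptName is the common attribute.
employee = [
    {"Name": "Ann", "DeptName": "Sales"},
    {"Name": "Raj", "DeptName": "Finance"},
    {"Name": "Eve", "DeptName": "Production"},
]
dept = [
    {"DeptName": "Sales", "Manager": "Kim"},
    {"DeptName": "Finance", "Manager": "Lee"},
]
joined = natural_join(employee, dept)
```

Eve does not appear in the result because no Dept tuple has DeptName "Production". If the two relations share no attribute name, the condition is trivially true and the function returns the Cartesian product.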
θ-join and equijoin
Consider tables Car and Boat which list models of cars and boats and their respective
prices. Suppose a customer wants to buy a car and a boat, but she doesn't want to
spend more money for the boat than for the car. The θ-join on the relation CarPrice
≥ BoatPrice produces a table with all the possible options.
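The Car and Boat example can be sketched as a θ-join in Python. The attribute names CarPrice and BoatPrice come from the text; the model names and prices are hypothetical.

```python
def theta_join(r, s, theta):
    """Keep every combination of a tuple from r and a tuple from s that satisfies theta."""
    return [{**a, **b} for a in r for b in s if theta(a, b)]


# Hypothetical price lists.
cars = [
    {"CarModel": "CarA", "CarPrice": 20000},
    {"CarModel": "CarB", "CarPrice": 30000},
]
boats = [
    {"BoatModel": "Boat1", "BoatPrice": 10000},
    {"BoatModel": "Boat2", "BoatPrice": 25000},
]

options = theta_join(cars, boats, lambda car, boat: car["CarPrice"] >= boat["BoatPrice"])
option_pairs = [(o["CarModel"], o["BoatModel"]) for o in options]
```

When the condition uses only equality between attributes, the θ-join is an equijoin.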
Semijoin (⋉)(⋊)
The semijoin is a join similar to the natural join and is written as R⋉S where R and S
are relations. The result of the semijoin is only the set of all tuples in R for which
there is a tuple in S that is equal on their common attribute names. For an example
consider the tables Employee and Dept and their semi join:
Antijoin (►)
The antijoin, written as R►S where R and S are relations, is similar to the natural
join, but the result of an antijoin is only those tuples in R for which there is NOT a
tuple in S that is equal on their common attribute names.
For an example consider the tables Employee and Dept and their antijoin:
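The semijoin and antijoin can be sketched together in Python, since they split R into the tuples that have a match in S and those that do not. The sample Employee and Dept rows are hypothetical stand-ins, not the tables of the original example.

```python
def matches(a, b):
    """True if tuples a and b agree on every attribute name they share."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def semijoin(r, s):
    """Tuples of r that have at least one matching tuple in s."""
    return [a for a in r if any(matches(a, b) for b in s)]


def antijoin(r, s):
    """Tuples of r that have no matching tuple in s."""
    return [a for a in r if not any(matches(a, b) for b in s)]


# Hypothetical data.
employee = [
    {"Name": "Ann", "DeptName": "Sales"},
    {"Name": "Raj", "DeptName": "Finance"},
    {"Name": "Eve", "DeptName": "Production"},
]
dept = [
    {"DeptName": "Sales", "Manager": "Kim"},
    {"DeptName": "Finance", "Manager": "Lee"},
]
with_dept = [t["Name"] for t in semijoin(employee, dept)]
without_dept = [t["Name"] for t in antijoin(employee, dept)]
```

Unlike the natural join, both results keep only the attributes of R; the Manager column never appears.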
Division (÷)
The division is a binary operation that is written as R ÷ S. The result consists of the
restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R
but not in the header of S, for which it holds that all their combinations with tuples in
S are present in R. For an example see the tables Completed, DBProject and their
division:
Completed
Student   Task
Fred      Database1
Fred      Database2
Fred      Compiler1
Eugene    Database1
Eugene    Compiler1
Sara      Database1
Sara      Database2

DBProject
Task
Database1
Database2

Completed ÷ DBProject
Student
Fred
Sara
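The division can be checked with a short Python sketch that uses the Completed and DBProject data from the tables above. The function name `divide` and the representation of Completed as a set of (student, task) pairs are choices made for this illustration.

```python
completed = {
    ("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
    ("Eugene", "Database1"), ("Eugene", "Compiler1"),
    ("Sara", "Database1"), ("Sara", "Database2"),
}
db_project = {"Database1", "Database2"}


def divide(r, s):
    """R ÷ S for a binary relation r of (student, task) pairs and a set s of tasks.

    A student is in the result only if every task in s appears with that student in r.
    """
    students = {student for student, _ in r}
    return {student for student in students if all((student, task) in r for task in s)}


finished_db_project = divide(completed, db_project)
```

Eugene is excluded because the pair (Eugene, Database2) is missing from Completed.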
14. Assists with impact analysis of any changes made to the database objects.
15. Troubleshoots problems regarding the databases, applications and
development tools.
16. Creates new database users as required.
17. Manage sharing of resources amongst applications.
18. The DBA has ultimate responsibility for the physical database design.