Database Management System (DBMS)

Data
The term data refers to groups of information that represent the qualitative or
quantitative attributes of a variable or set of variables. Data (plural of "datum",
which is seldom used) are typically the results of measurements and can be the basis
of graphs, images, or observations of a set of variables. Data are often viewed as the
lowest level of abstraction from which information and knowledge are derived. Raw
data refers to unprocessed numbers, characters, images, or other outputs from
devices that collect information by converting physical quantities into symbols.

A Database Management System (DBMS) is a set of computer programs that controls
the creation, maintenance, and use of a database. It allows organizations to place
control of database development in the hands of database administrators (DBAs) and
other specialists. A DBMS is a system software package that facilitates the use of an
integrated collection of data records and files known as a database. It allows
different user application programs to easily access the same database. DBMSs may
use any of a variety of database models, such as the network model or the relational
model. In large systems, a DBMS allows users and other software to store and
retrieve data in a structured way. Instead of having to write computer programs to
extract information, users can ask simple questions in a query language. Thus, many
DBMS packages provide fourth-generation programming languages (4GLs) and other
application development features. A DBMS helps to specify the logical organization
of a database and to access and use the information within it. It provides facilities
for controlling data access, enforcing data integrity, managing concurrency, and
restoring the database from backups. A DBMS also provides the ability to present
database information to users logically.

A DBMS is a set of software programs that controls the organization, storage,
management, and retrieval of data in a database. DBMSs are categorized according
to their data structures or types. The DBMS accepts requests for data from an
application program and instructs the operating system to transfer the appropriate
data. Queries and responses must be submitted and received in a format that
conforms to one or more applicable protocols. When a DBMS is used, information
systems can be changed much more easily as the organization's information
requirements change. New categories of data can be added to the database without
disruption to the existing system.
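As a minimal sketch of the last two points, the following Python snippet uses the standard-library sqlite3 module as a stand-in DBMS; the customer table and its columns are hypothetical. The application only submits SQL requests, and a new column can be added later without disturbing a query that names the columns it uses.

```python
import sqlite3

# In-memory SQLite database standing in for "the DBMS" (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer (name) VALUES (?)", [("Ada",), ("Grace",)])

def list_customer_names(connection):
    """An 'existing application' request that names the columns it needs."""
    return [row[0] for row in connection.execute("SELECT name FROM customer ORDER BY id")]

before = list_customer_names(conn)

# Later, a new category of data is added to the schema.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
conn.execute("UPDATE customer SET email = 'ada@example.com' WHERE name = 'Ada'")

after = list_customer_names(conn)
print(before == after)  # True: the existing request is unaffected
```

A query written as SELECT * would now also receive the new column, which is one reason applications often name their columns explicitly.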

Database servers are computers that hold the actual databases and run only the
DBMS and related software. Database servers are usually multiprocessor computers,
with generous memory and RAID disk arrays used for stable storage. Hardware
database accelerators, connected to one or more servers via a high-speed channel,
are also used in large volume transaction processing environments. DBMSs are found
at the heart of most database applications. DBMSs may be built around a custom
multitasking kernel with built-in networking support, but modern DBMSs typically
rely on a standard operating system to provide these functions.

DBMS building blocks

A DBMS includes four main parts: modeling language, data structure, database
query language, and transaction mechanisms.
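To illustrate what a transaction mechanism provides, here is a sketch (using Python's sqlite3 module and a hypothetical account table) of a transfer that applies both updates or neither. The connection's context manager commits on success and rolls back if an exception escapes; the CHECK constraint is what makes the second transfer fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(connection, src, dst, amount):
    """Move an amount between accounts as one atomic transaction."""
    try:
        with connection:  # commit on success, roll back if an exception escapes
            connection.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                               (amount, dst))
            connection.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                               (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed, so the earlier credit was undone too

ok = transfer(conn, "A", "B", 30)       # succeeds
failed = transfer(conn, "B", "A", 500)  # would overdraw B, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```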

Components of DBMS

- DBMS Engine accepts logical requests from the various other DBMS
  subsystems, converts them into physical equivalents, and actually accesses
  the database and data dictionary as they exist on a storage device.
- Data Definition Subsystem helps users create and maintain the data
  dictionary and define the structure of the files in a database.
- Data Manipulation Subsystem helps users add, change, and delete
  information in a database and query it for valuable information. Software
  tools within the data manipulation subsystem are most often the primary
  interface between users and the information contained in a database. It
  allows users to specify their logical information requirements.
- Application Generation Subsystem contains facilities to help users
  develop transaction-intensive applications. Such applications usually require
  the user to perform a detailed series of tasks to process a transaction. The
  subsystem provides easy-to-use data entry screens, programming languages,
  and interfaces.
- Data Administration Subsystem helps users manage the overall database
  environment by providing facilities for backup and recovery, security
  management, query optimization, concurrency control, and change
  management.
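A rough mapping of the data definition and data manipulation subsystems onto SQL, sketched with Python's sqlite3 module: CREATE TABLE defines structure, SQLite's built-in sqlite_master catalog and PRAGMA table_info play the role of a data dictionary here, and INSERT, UPDATE, DELETE, and SELECT manipulate the data. The product table is hypothetical, and other DBMSs expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe the structure of a "file" (table).
conn.execute("CREATE TABLE product (code TEXT PRIMARY KEY, name TEXT NOT NULL, price REAL)")

# The catalog acts as a data dictionary: the DBMS records the definition itself.
catalog = conn.execute("SELECT type, name FROM sqlite_master WHERE type = 'table'").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(product)")]

# Data manipulation: add, change, delete, and query information.
conn.execute("INSERT INTO product VALUES ('P1', 'Widget', 2.50)")
conn.execute("INSERT INTO product VALUES ('P2', 'Gadget', 4.00)")
conn.execute("UPDATE product SET price = 3.00 WHERE code = 'P1'")
conn.execute("DELETE FROM product WHERE code = 'P2'")
remaining = conn.execute("SELECT code, name, price FROM product").fetchall()

print(catalog)    # [('table', 'product')]
print(columns)    # ['code', 'name', 'price']
print(remaining)  # [('P1', 'Widget', 3.0)]
```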

The goal of the three-schema architecture is to separate user applications from the
physical database. In this architecture, schemas can be defined at the following three
levels:

- The internal level: The internal level has an internal schema, which
  describes the physical storage structure of the database.
- The conceptual level: The conceptual level has a conceptual schema, which
  describes the entities, data types, relationships, user operations, and
  constraints.
- The external level or view level: The external or view level includes a
  number of external schemas or user views. Each one describes the part of the
  database that a particular user group is interested in and hides the rest of the
  database from that user group.

The internal level has an internal schema, which describes the physical storage
structure of the database. The internal schema uses a physical data model, which
describes the complete details of data storage, the access paths for the database, and
how data are retrieved or inserted. (A data model is a collection of conceptual tools
for describing data, data relationships, data semantics, and consistency constraints.)

The conceptual level has a conceptual schema that describes the whole database for
the different users who access it. The conceptual schema hides the details of the
physical storage structures and concentrates on entities, relationships, and
constraints. The external or view level includes a number of user views. Each
external schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group. An
implementation data model is used at this level. Each user group refers to its own
external schema, so the DBMS must be capable of transforming a request specified
in an external schema into a request against the conceptual schema. The processes
of transforming requests and results between levels are called mappings.
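The following sketch shows a view acting as an external schema over a conceptual-level table, using Python's sqlite3 module. The employee table, the staff_directory view, and the data are hypothetical; the point is that the user group queries only the view, and the DBMS maps that request onto the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full employee relation.
conn.execute("""CREATE TABLE employee (
    emp_no INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Ada", "R&D", 9000), (2, "Grace", "Sales", 7000)])

# External level: a view for a directory user group that hides salaries.
conn.execute("CREATE VIEW staff_directory AS SELECT name, department FROM employee")

# The user group queries only its external schema; the DBMS maps the
# request onto the conceptual schema (the employee table).
rows = conn.execute("SELECT name, department FROM staff_directory ORDER BY name").fetchall()
view_columns = [d[0] for d in conn.execute("SELECT * FROM staff_directory").description]
print(rows)          # [('Ada', 'R&D'), ('Grace', 'Sales')]
print(view_columns)  # ['name', 'department']
```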

The three-schema architecture explains the concept of data independence, which is
defined as the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level. The three-schema
architecture makes it easier to achieve true data independence. There are two types
of data independence. Logical data independence is the capacity to change the
conceptual schema without having to change the external schemas or application
programs; in a DBMS that supports logical data independence, only the view
definitions and the mappings need to be changed. Physical data independence is the
capacity to change the internal schema without having to change the conceptual
schema, and therefore without changing the external schemas either.
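A sketch of both kinds of data independence, using Python's sqlite3 module with hypothetical tables. The application query APP_QUERY is never edited: first the conceptual schema is restructured (departments move into their own table) and only the view definition, i.e. the mapping, is rewritten; then an index is added at the internal level.

```python
import sqlite3

APP_QUERY = "SELECT name, department FROM staff_directory ORDER BY name"  # never edited

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ada", "R&D"), (2, "Grace", "Sales")])
conn.execute("CREATE VIEW staff_directory AS SELECT name, department FROM employee")
before = conn.execute(APP_QUERY).fetchall()

# Logical change: departments move into their own table.
conn.executescript("""
    DROP VIEW staff_directory;
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT UNIQUE);
    INSERT INTO department (dept_name) SELECT DISTINCT department FROM employee;
    CREATE TABLE employee2 (emp_no INTEGER PRIMARY KEY, name TEXT,
                            dept_id INTEGER REFERENCES department(dept_id));
    INSERT INTO employee2
        SELECT e.emp_no, e.name, d.dept_id
        FROM employee AS e JOIN department AS d ON d.dept_name = e.department;
    DROP TABLE employee;
    ALTER TABLE employee2 RENAME TO employee;
    -- Only the view definition (the mapping) is rewritten.
    CREATE VIEW staff_directory AS
        SELECT e.name AS name, d.dept_name AS department
        FROM employee AS e JOIN department AS d ON d.dept_id = e.dept_id;
""")
after_logical = conn.execute(APP_QUERY).fetchall()

# Physical change: add an access path; the query text stays the same.
conn.execute("CREATE INDEX idx_employee_name ON employee(name)")
after_physical = conn.execute(APP_QUERY).fetchall()

print(before == after_logical == after_physical)  # True
```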

Data independence is achieved because, when the schema at one level is changed,
the schema at the next higher level remains unchanged; only the mapping between the
two levels is changed. A view is also called a "virtual table" because it does not
contain physically stored records and occupies no storage for data (only its
definition is stored). A multi-user database whose users have a variety of
applications must provide facilities for defining multiple views. The three-schema
architecture helps provide data security among the different users accessing the
database, ensures data integrity, and avoids duplication of data in the database. It
also helps establish and maintain relationships among the data in the database.

Database model
A database model or database schema is the structure or format of a database,
described in a formal language supported by the database management system. In
other words, a "database model" is the application of a data model when used in
conjunction with a database management system.

[Figure: Collage of five types of database models.]

Schemas are generally stored in a data dictionary. Although a schema is defined in a
textual database language, the term is often used to refer to a graphical depiction of
the database structure.

Various techniques are used to model data structures. Most database systems are
built around one particular data model, although it is increasingly common for
products to offer support for more than one model. For any one logical model, various
physical implementations may be possible, and most products offer the user
some level of control in tuning the physical implementation, since the choices that
are made have a significant effect on performance. An example of this is the
relational model: all serious implementations of the relational model allow the
creation of indexes, which provide fast access to rows in a table if the values of
certain columns are known.
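To see an index providing a fast access path, the sketch below asks SQLite (through Python's sqlite3 module) for its query plan before and after creating an index on a hypothetical last_name column. The exact wording of EXPLAIN QUERY PLAN output varies between SQLite versions; typically the plan changes from a full scan of the table to a search using the named index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, last_name TEXT, city TEXT)")
conn.executemany("INSERT INTO employee (last_name, city) VALUES (?, ?)",
                 [(f"name{i}", "Paris" if i % 2 else "Oslo") for i in range(1000)])

query = "SELECT * FROM employee WHERE last_name = ?"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("name42",))]

before = plan(query)  # typically a SCAN of the whole table
conn.execute("CREATE INDEX idx_employee_last_name ON employee(last_name)")
after = plan(query)   # typically a SEARCH using idx_employee_last_name
print(before)
print(after)
```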

Flat model

The flat (or table) model consists of a single, two-dimensional array of data
elements, where all members of a given column are assumed to be similar values,
and all members of a row are assumed to be related to one another. For instance, a
system security database might use columns for name and password, with each row
holding the specific password associated with an individual user. Columns of the
table often have a type associated with them, defining them as character data, date
or time information, integers, or floating point numbers. This may not strictly
qualify as a data model, as defined above.

Hierarchical model

In a hierarchical model, data is organized into a tree-like structure, implying a single
upward link in each record to describe the nesting, and a sort field to keep the
records in a particular order in each same-level list. Hierarchical structures were
widely used in early mainframe database management systems, such as IBM's
Information Management System (IMS), and now describe the structure of
XML documents. This structure allows one-to-many (1:N) relationships between two
types of data. It is very efficient for describing many real-world relationships:
recipes, tables of contents, the ordering of paragraphs or verses, and any nested and
sorted information. However, the hierarchical structure is inefficient for certain
database operations when a full path (as opposed to an upward link and sort field) is
not also included for each record.

Parent–child relationship: A child may have only one parent, but a parent can have
multiple children. Parents and children are tied together by links called "pointers". A
parent has a list of pointers to each of its children.
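The parent-child pointer structure can be sketched in Python with a small class in which each segment holds one upward link and a list of pointers to its children. The Segment class and the sample data are hypothetical illustrations, not the interface of any hierarchical DBMS such as IMS.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Segment:
    """A record in a hierarchy: one upward link, a list of pointers to children."""
    data: dict
    parent: Optional[Segment] = field(default=None, repr=False, compare=False)
    children: list[Segment] = field(default_factory=list)

    def add_child(self, data: dict) -> Segment:
        child = Segment(data, parent=self)
        self.children.append(child)  # the parent keeps a pointer to each child
        return child

    def path(self) -> list[dict]:
        """Follow the single upward link from this segment to the root."""
        node, result = self, []
        while node is not None:
            result.append(node.data)
            node = node.parent
        return list(reversed(result))

# Hypothetical data: an employee segment with two child (dependent) segments.
employee = Segment({"name": "Ada", "emp_no": 1})
first = employee.add_child({"child": "Sam"})
second = employee.add_child({"child": "Kim"})
print([c.data for c in employee.children])  # [{'child': 'Sam'}, {'child': 'Kim'}]
print(second.path())  # [{'name': 'Ada', 'emp_no': 1}, {'child': 'Kim'}]
```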

Network model

The network model (defined by the CODASYL specification) organizes data using two
fundamental constructs, called records and sets. Records contain fields (which may
be organized hierarchically, as in the programming language COBOL). Sets (not to
be confused with mathematical sets) define one-to-many relationships between
records: one owner, many members. A record may be an owner in any number of
sets, and a member in any number of sets.

The network model is a variation on the hierarchical model, to the extent that it is
built on the concept of multiple branches (lower-level structures) emanating from
one or more nodes (higher-level structures), while the model differs from the
hierarchical model in that branches can be connected to multiple nodes. The network
model is able to represent redundancy in data more efficiently than in the
hierarchical model.

The operations of the network model are navigational in style: a program maintains
a current position, and navigates from one record to another by following the
relationships in which the record participates. Records can also be located by
supplying key values.
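The records-and-sets structure and the navigational style of access can be sketched in Python as follows. The Record, SetType, and Navigator classes, their method names, and the supplier/part/shipment data are hypothetical; the methods are only loosely inspired by navigational DML commands and do not reproduce CODASYL syntax. Note that one shipment record is a member of two sets, which a strict hierarchy would not allow.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # records compare by identity, like addresses on disk
class Record:
    record_type: str
    fields: dict

@dataclass
class SetType:
    """A CODASYL-style set occurrence: one owner record, an ordered list of members."""
    name: str
    owner: Record
    members: list = field(default_factory=list)

class Navigator:
    """Navigational access: keep a current position and move along set links."""

    def __init__(self, sets):
        self.sets = sets
        self.current = None

    def go_to(self, record):
        self.current = record  # e.g. after locating a record by its key value
        return record

    def first_member(self, set_name):
        s = next(s for s in self.sets if s.name == set_name and s.owner is self.current)
        self.current = s.members[0] if s.members else None
        return self.current

    def owner_in(self, set_name):
        s = next(s for s in self.sets if s.name == set_name and self.current in s.members)
        self.current = s.owner
        return self.current

supplier = Record("SUPPLIER", {"name": "Acme"})
part = Record("PART", {"code": "P1"})
shipment1 = Record("SHIPMENT", {"qty": 10})
shipment2 = Record("SHIPMENT", {"qty": 5})

# shipment1 is a member of two sets, i.e. it has two "parents".
supplies = SetType("SUPPLIES", supplier, [shipment1, shipment2])
shipped_as = SetType("SHIPPED_AS", part, [shipment1])

nav = Navigator([supplies, shipped_as])
nav.go_to(supplier)
shipment = nav.first_member("SUPPLIES")  # move down to Acme's first shipment
owner_part = nav.owner_in("SHIPPED_AS")  # move up through a different set
print(shipment.fields, owner_part.fields)  # {'qty': 10} {'code': 'P1'}
```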

Although it is not an essential feature of the model, network databases generally
implement the set relationships by means of pointers that directly address the
location of a record on disk. This gives excellent retrieval performance, at the
expense of operations such as database loading and reorganization.

Most object databases use the navigational concept to provide fast navigation across
networks of objects, generally using object identifiers as "smart" pointers to related
objects. Objectivity/DB, for instance, implements named 1:1, 1:many, many:1, and
many:many relationships that can cross databases. Many object databases
also support SQL, combining the strengths of both models.

Relational model

The relational model was introduced by E.F. Codd in 1970 as a way to make
database management systems more independent of any particular application. It is
a mathematical model defined in terms of predicate logic and set theory.

The products that are generally referred to as relational databases in fact implement
a model that is only an approximation to the mathematical model defined by Codd.
Three key terms are used extensively in relational database models: relations,
attributes, and domains. A relation is a table with columns and rows. The named
columns of the relation are called attributes, and the domain is the set of values the
attributes are allowed to take.

The basic data structure of the relational model is the table, where information about
a particular entity (say, an employee) is represented in rows (also called tuples) and
columns. Thus, the "relation" in "relational database" refers to the various tables in
the database; a relation is a set of tuples. The columns enumerate the various
attributes of the entity (the employee's name, address or phone number, for
example), and a row is an actual instance of the entity (a specific employee) that is
represented by the relation. As a result, each tuple of the employee table represents
various attributes of a single employee.

All relations (and, thus, tables) in a relational database have to adhere to some basic
rules to qualify as relations. First, the ordering of columns is immaterial in a table.
Second, there can't be identical tuples or rows in a table. And third, each tuple will
contain a single value for each of its attributes.
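The three rules can be made concrete by modeling a relation as a set of tuples. In this Python sketch (a simplification for illustration, not how any DBMS stores data), each tuple is a frozenset of (attribute, value) pairs, so column order does not matter; the body is a set, so duplicate rows collapse; and each attribute is paired with a single value.

```python
def relation(attributes, rows):
    """Return (heading, body); each tuple in the body is a set of (attribute, value) pairs."""
    heading = frozenset(attributes)
    body = frozenset(frozenset(zip(attributes, row)) for row in rows)
    return heading, body

# Hypothetical data: the same employees, once with a duplicate row.
r1 = relation(["emp_no", "name"], [(1, "Ada"), (2, "Grace"), (1, "Ada")])
r2 = relation(["name", "emp_no"], [("Grace", 2), ("Ada", 1)])  # columns and rows reordered

print(len(r1[1]))  # 2: the duplicate tuple collapses into one
print(r1 == r2)    # True: column order and row order are immaterial
```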

A relational database contains multiple tables, each similar to the one in the "flat"
database model. One of the strengths of the relational model is that, in principle, any
value occurring in two different records (belonging to the same table or to different
tables) implies a relationship between those two records. Yet, in order to enforce
explicit integrity constraints, relationships between records in tables can also be
defined explicitly, by identifying or non-identifying parent-child relationships
characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also have a
designated single attribute or a set of attributes that can act as a "key", which can
be used to uniquely identify each tuple in the table.

A key that can be used to uniquely identify a row in a table is called a primary key.
Keys are commonly used to join or combine data from two or more tables. For
example, an Employee table may contain a column named Location which contains a
value that matches the key of a Location table. Keys are also critical in the creation
of indexes, which facilitate fast retrieval of data from large tables. Any column can
be a key, or multiple columns can be grouped together into a compound key. It is
not necessary to define all the keys in advance; a column can be used as a key even
if it was not originally intended to be one.
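A minimal sketch of the Employee/Location example, using Python's sqlite3 module with hypothetical column names and data: the location column of employee holds a value that matches the key of the location table, and a join combines the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, location INTEGER);
    INSERT INTO location VALUES (10, 'Oslo'), (20, 'Lima');
    INSERT INTO employee VALUES (1, 'Ada', 10), (2, 'Grace', 20), (3, 'Alan', 10);
""")

# Combine the tables by matching employee.location against the key of location.
rows = conn.execute("""
    SELECT e.name, l.city
    FROM employee AS e
    JOIN location AS l ON e.location = l.location_id
    ORDER BY e.emp_no
""").fetchall()
print(rows)  # [('Ada', 'Oslo'), ('Grace', 'Lima'), ('Alan', 'Oslo')]
```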

A key that has an external, real-world meaning (such as a person's name, a book's
ISBN, or a car's serial number) is sometimes called a "natural" key. If no natural key
is suitable (think of the many people named Brown), an arbitrary or surrogate key
can be assigned (such as by giving employees ID numbers). In practice, most
databases have both generated and natural keys, because generated keys can be
used internally to create links between rows that cannot break, while natural keys
can be used, less reliably, for searches and for integration with other databases. (For
example, records in two independently developed databases could be matched up by
social security number, except when the social security numbers are incorrect,
missing, or have changed.)
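The following sketch (Python's sqlite3 module, hypothetical book and loan tables, placeholder strings instead of real ISBNs) keeps a generated surrogate key for links between rows and a natural key that is still declared UNIQUE for searches. Correcting the natural key does not break the link, because the loan row refers to the surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (
        book_id INTEGER PRIMARY KEY,   -- surrogate key generated by the DBMS
        isbn    TEXT NOT NULL UNIQUE,  -- natural key, kept unique for searches
        title   TEXT
    );
    CREATE TABLE loan (
        loan_id INTEGER PRIMARY KEY,
        book_id INTEGER REFERENCES book(book_id)  -- rows are linked by the surrogate key
    );
""")

cur = conn.execute("INSERT INTO book (isbn, title) VALUES (?, ?)",
                   ("ISBN-PLACEHOLDER-1", "Example"))
book_id = cur.lastrowid
conn.execute("INSERT INTO loan (book_id) VALUES (?)", (book_id,))

# Correcting a wrongly entered natural key does not break the link.
conn.execute("UPDATE book SET isbn = ? WHERE book_id = ?", ("ISBN-PLACEHOLDER-2", book_id))
linked_title = conn.execute(
    "SELECT b.title FROM loan AS l JOIN book AS b ON b.book_id = l.book_id").fetchone()[0]

# The natural key still rejects a second book with the same ISBN.
try:
    conn.execute("INSERT INTO book (isbn, title) VALUES (?, ?)", ("ISBN-PLACEHOLDER-2", "Copy"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(book_id, linked_title, duplicate_rejected)  # 1 Example True
```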

Object-relational database models

In recent years, the object-oriented paradigm has been applied to database
technology, creating a new programming model known as object databases. These
databases attempt to bring the database world and the application programming
world closer together, in particular by ensuring that the database uses the same type
system as the application program. This aims to avoid the overhead (sometimes
referred to as the impedance mismatch) of converting information between its
representation in the database (for example as rows in tables) and its representation
in the application program (typically as objects). At the same time, object databases
attempt to introduce the key ideas of object programming, such as encapsulation
and polymorphism, into the world of databases.
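To make the impedance mismatch concrete, this Python sketch maps a hypothetical Employee object with a list-valued attribute to and from rows in two sqlite3 tables. The hand-written save and load functions are the kind of conversion code that object databases aim to eliminate.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Employee:
    emp_no: int
    name: str
    skills: list  # a multi-valued attribute that a flat row cannot hold directly

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee_skill (emp_no INTEGER, skill TEXT);
""")

def save(e: Employee) -> None:
    # Object -> rows: the list attribute is flattened into a second table.
    with conn:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (e.emp_no, e.name))
        conn.executemany("INSERT INTO employee_skill VALUES (?, ?)",
                         [(e.emp_no, s) for s in e.skills])

def load(emp_no: int) -> Employee:
    # Rows -> object: the flattened pieces are joined back together.
    (name,) = conn.execute("SELECT name FROM employee WHERE emp_no = ?", (emp_no,)).fetchone()
    skills = [s for (s,) in conn.execute(
        "SELECT skill FROM employee_skill WHERE emp_no = ? ORDER BY rowid", (emp_no,))]
    return Employee(emp_no, name, skills)

original = Employee(1, "Ada", ["SQL", "COBOL"])
save(original)
print(load(1) == original)  # True
```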

A variety of approaches have been tried for storing objects in a database. Some
products have approached the problem from the application programming end, by
making the objects manipulated by the program persistent. This also typically
requires the addition of some kind of query language, since conventional
programming languages do not have the ability to find objects based on their
information content. Others have attacked the problem from the database end, by
defining an object-oriented data model for the database, and defining a database
programming language that allows full programming capabilities as well as traditional
query facilities.

Object databases suffered because of a lack of standardization: although standards
were defined by ODMG, they were never implemented well enough to ensure
interoperability between products. Nevertheless, object databases have been used
successfully in many applications: usually specialized applications such as
engineering databases or molecular biology databases rather than mainstream
commercial data processing. However, object database ideas were picked up by the
relational vendors and influenced extensions made to these products and indeed to
the SQL language.

Hierarchical Model

The hierarchical data model organizes data in a tree structure. There is a hierarchy
of parent and child data segments. This structure implies that a record can have
repeating information, generally in the child data segments. Data are stored in a
series of records, each of which has a set of field values attached to it. The model
collects all the instances of a specific record together as a record type. These record
types are the equivalent of tables in the relational model, and the individual records
are the equivalent of rows. To create links between these record types, the
hierarchical model uses parent-child relationships. These are 1:N mappings between
record types. This is done by using trees, "borrowed" from mathematics much as the
relational model borrows set theory. For example, an organization might store
information about an employee, such as name, employee number, department, and
salary. The organization might also store information about an employee's children,
such as name and date of birth. The employee and children data form a hierarchy,
where the employee data represents the parent segment and the children data
represents the child segment. If an employee has three children, then there would be
three child segments associated with one employee segment. In a hierarchical
database the parent-child relationship is one-to-many. This restricts a child segment
to having only one parent segment. Hierarchical DBMSs were popular from the late
1960s, with the introduction of IBM's Information Management System (IMS) DBMS,
through the 1970s.

Network Model

The popularity of the network data model coincided with the popularity of the
hierarchical data model. Some data were more naturally modeled with more than
one parent per child, so the network model permitted the modeling of many-to-many
relationships in data. In 1971, the Conference on Data Systems Languages
(CODASYL) formally defined the network model. The basic data modeling construct
in the network model is the set construct. A set consists of an owner record type, a
set name, and a member record type. A member record type can have that role in
more than one set, hence the multiparent concept is supported. An owner record
type can also be a member or owner in another set. The data model is a simple
network, and link and intersection record types (called junction records by IDMS)
may exist, as well as sets between them. Thus, the complete network of
relationships is represented by several pairwise sets; in each set, one record type
is the owner (at the tail of the network arrow) and one or more record types are
members (at the head of the relationship arrow). Usually, a set defines a 1:M
relationship, although 1:1 is permitted. The CODASYL network model is based on
mathematical set theory.

Relational Model

An RDBMS (relational database management system) is a database system based on the
relational model developed by E.F. Codd. A relational database allows the definition
of data structures, storage and retrieval operations, and integrity constraints. In such
a database the data and the relations between them are organized in tables. A table
is a collection of records, and each record in a table contains the same fields.

Properties of Relational Tables:

- Values Are Atomic
- Each Row is Unique
- Column Values Are of the Same Kind
- The Sequence of Columns is Insignificant
- The Sequence of Rows is Insignificant
- Each Column Has a Unique Name

Certain fields may be designated as keys, which means that searches for specific
values of those fields will use indexing to speed them up. Where fields in two different
tables take values from the same set, a join operation can be performed to select
related records in the two tables by matching values in those fields. Often, but not
always, the fields will have the same name in both tables. For example, an "orders"
table might contain (customer-ID, product-code) pairs and a "products" table might
contain (product-code, price) pairs, so to calculate a given customer's bill you would
sum the prices of all products ordered by that customer by joining on the
product-code fields of the two tables. This can be extended to joining multiple tables
on multiple fields. Because these relationships are only specified at retrieval time,
relational databases are classed as dynamic database management systems. The
relational database model is based on relational algebra.
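The orders/products example can be sketched with Python's sqlite3 module as below. The table layout follows the pairs described above; the column names and data are hypothetical. The relationship between the tables exists only in the join condition written at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (customer_id INTEGER, product_code TEXT);
    INSERT INTO products VALUES ('P1', 2.50), ('P2', 4.00), ('P3', 10.00);
    INSERT INTO orders VALUES (7, 'P1'), (7, 'P2'), (8, 'P3'), (8, 'P1');
""")

def customer_bill(customer_id):
    # The relationship is stated only here, by joining on product_code.
    (total,) = conn.execute("""
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
    """, (customer_id,)).fetchone()
    return total

print(customer_bill(7))  # 6.5
print(customer_bill(8))  # 12.5
```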

Object/Relational Model

Object/relational database management systems (ORDBMSs) add new object storage
capabilities to the relational systems at the core of modern information systems.
These new facilities integrate management of traditional fielded data, complex
objects such as time-series and geospatial data, and diverse binary media such as
audio, video, images, and applets. By encapsulating methods with data structures,
an ORDBMS server can execute complex analytical and data manipulation
operations to search and transform multimedia and other complex objects.

As an evolutionary technology, the object/relational (OR) approach has inherited the
robust transaction- and performance-management features of its relational ancestor
and the flexibility of its object-oriented cousin. Database designers can work with
familiar tabular structures and data definition languages (DDLs) while assimilating
new object-management possibilities. Query and procedural languages and call
interfaces in ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC,
JDBC, and proprietary call interfaces are all extensions of RDBMS languages and
interfaces. And the leading vendors are, of course, quite well known: IBM, Informix,
and Oracle.

Object-Oriented Model

Object DBMSs add database functionality to object programming languages. They
bring much more than persistent storage of programming language objects. Object
DBMSs extend the semantics of the C++, Smalltalk and Java object programming
languages to provide full-featured database programming capability, while retaining
native language compatibility. A major benefit of this approach is the unification of
the application and database development into a seamless data model and language
environment. As a result, applications require less code, use more natural data
modeling, and code bases are easier to maintain. Object developers can write
complete database applications with a modest amount of additional effort.

According to Rao (1994), "The object-oriented database (OODB) paradigm is the
combination of object-oriented programming language (OOPL) systems and
persistent systems. The power of the OODB comes from the seamless treatment of
both persistent data, as found in databases, and transient data, as found in
executing programs."

In contrast to a relational DBMS, where a complex data structure must be flattened
out to fit into tables or joined together from those tables to form the in-memory
structure, object DBMSs have no performance overhead to store or retrieve a web or
hierarchy of interrelated objects. This one-to-one mapping of object programming
language objects to database objects has two benefits over other storage
approaches: it provides higher performance management of objects, and it enables
better management of the complex interrelationships between objects. This makes
object DBMSs better suited to support applications such as financial portfolio risk
analysis systems, telecommunications service applications, World Wide Web
document structures, design and manufacturing systems, and hospital patient record
systems, which have complex relationships between data.

The entity-relationship model (ERM) is an abstract and conceptual representation of
data. Entity-relationship modeling is a database modeling method used to produce a
type of conceptual schema or semantic data model of a system, often a relational
database, and its requirements in a top-down fashion. Diagrams created by this
process are called entity-relationship diagrams, ER diagrams, or ERDs.

Definition: An entity-relationship (ER) diagram is a specialized graphic that
illustrates the interrelationships between entities in a database. ER diagrams often
use symbols to represent three different types of information. Boxes are commonly
used to represent entities. Diamonds are normally used to represent relationships,
and ovals are used to represent attributes.

What is the difference between weak entity set and strong entity?

An entity set that does not possess sufficient attributes to form a primary key is
called a weak entity set. One that does have a primary key is called a strong entity
set.

For example, the entity set transaction has attributes transaction-number, date, and
amount. Different transactions on different accounts could share the same number,
so these attributes are not sufficient to form a primary key (uniquely identify a
transaction). Thus transaction is a weak entity set. For a weak entity set to be
meaningful, it must be part of a one-to-many relationship set. This relationship set
should have no descriptive attributes.

The idea of strong and weak entity sets is related to the existence dependencies
seen earlier. A member of a strong entity set is a dominant entity; a member of a
weak entity set is a subordinate entity.

A weak entity set does not have a primary key, but we need a means of
distinguishing among its entities. The discriminator of a weak entity set is a set of
attributes that allows this distinction to be made. The primary key of a weak entity
set is formed by taking the primary key of the strong entity set on which its
existence depends (see Mapping Constraints) plus its discriminator.
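One common way to implement the transaction example in SQL is a composite primary key made of the owning account's key plus the discriminator. This sketch uses Python's sqlite3 module; the table is named txn because TRANSACTION is an SQL keyword, and the account numbers, dates, and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE account (account_number TEXT PRIMARY KEY);  -- strong entity set
    CREATE TABLE txn (                                       -- weak entity set
        account_number     TEXT NOT NULL REFERENCES account(account_number),
        transaction_number INTEGER NOT NULL,                 -- discriminator
        date               TEXT,
        amount             REAL,
        PRIMARY KEY (account_number, transaction_number)
    );
    INSERT INTO account VALUES ('A-101'), ('A-202');
    INSERT INTO txn VALUES ('A-101', 1, '2024-01-05', 50.0);
    INSERT INTO txn VALUES ('A-202', 1, '2024-01-05', 75.0);  -- same number, other account
""")
count = conn.execute("SELECT COUNT(*) FROM txn").fetchone()[0]

try:  # the same discriminator under the same account is rejected
    conn.execute("INSERT INTO txn VALUES ('A-101', 1, '2024-02-01', 10.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(count, duplicate_rejected)  # 2 True
```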

Relational Database Design/Constraints

Primary keys

A Primary Key is a Column that uniquely identifies a particular Row in a Table. For
example, a person entity may have a Column for SSN. If in your data model each
person has a unique SSN, then it may be a candidate for a Primary Key. (Primary
Keys can consist of two or more Columns, but this is not covered here.)

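Primary Keys are also the means by which Foreign Keys work: other Tables store
copies of the Primary Key value to refer to a Row. Because of this, SSN may actually
not be a good choice as a Primary Key, since a value that can change, be missing,
or be sensitive would then be duplicated across Tables. In practice, Rows often have a
unique numeric identifier (often called an identity or sequence value) that uniquely
identifies a particular Row. These kinds of values are often used as Primary Keys.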

It should be noted that some RDBMSes use a Table's Primary Key column(s) to
automatically create a clustered index on that Table. A clustered index is an index
that physically reorders the data to match the index. This is done to improve query
performance, but it can actually hurt performance if the wrong column(s) are used as
the Primary Key.

In relational theoretical terms, a primary key is a chosen candidate key: a minimal
set of attributes whose combination of values in every row (tuple) is always
unique and identifies the row (tuple). A candidate key is a minimal superkey, a
superkey being any set of attributes (columns) that identifies the row, and the
largest superkey is the entire set of columns of the table (attributes of the relation).
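The superkey and candidate key definitions can be tested mechanically against a sample of rows. This Python sketch (hypothetical function names and data) checks attribute subsets from smallest to largest and keeps only minimal superkeys. A single table instance can rule a key out but cannot prove one; here (name, dept) happens to be unique in the sample even though it would not be a sensible key in general.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal superkeys, found by checking subsets from smallest to largest."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip supersets of keys already found: they are superkeys but not minimal.
            if is_superkey(rows, attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys

rows = [
    {"emp_no": 1, "ssn": "111", "name": "Ada",   "dept": "R&D"},
    {"emp_no": 2, "ssn": "222", "name": "Grace", "dept": "R&D"},
    {"emp_no": 3, "ssn": "333", "name": "Ada",   "dept": "Sales"},
]
print(candidate_keys(rows, ["emp_no", "ssn", "name", "dept"]))
# [('emp_no',), ('ssn',), ('name', 'dept')]
```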

Foreign keys

A Foreign Key is a way to further constrain the allowable values of a Column to data
that exists in another Table. For example, if you have to process orders in your
system, you may create a Table called OrderInfo to store order information. An
order has to be associated with a customer, so you may have a Column in the
OrderInfo Table called CustomerID that somehow connects to an associated Row
in the Customer Table.

Most likely you do not want to be able to create orders for customers that do not
exist, and you would not want to delete a customer that is associated with any
orders. Doing so would break the Referential Integrity of the data. A Foreign Key
relationship ensures that these two rules are enforced.

By creating a Foreign Key relationship between OrderInfo's CustomerID column
and the Primary Key of the Customer Table, the RDBMS will ensure that
CustomerID always refers to a single existing Row in the Customer Table, and will
also prevent you from deleting that associated Row because one or more Rows in
OrderInfo depend on it.
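The two referential integrity rules can be observed with Python's sqlite3 module. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection; the OrderInfo and Customer tables and the CustomerID column follow the text, while the OrderID and Name columns and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE OrderInfo (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID)
    );
    INSERT INTO Customer VALUES (1, 'Ada');
    INSERT INTO OrderInfo VALUES (100, 1);
""")

def rejected(sql):
    """Run one statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

orphan_order = rejected("INSERT INTO OrderInfo VALUES (101, 99)")      # no customer 99
delete_parent = rejected("DELETE FROM Customer WHERE CustomerID = 1")  # order 100 depends on it
print(orphan_order, delete_parent)  # True True
```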

Usually, the table with the foreign key constraint refers to another table by that
table's primary key attribute(s). In a many-to-one relationship, for instance Orders
(many) and Customer (one), there are many Order rows per Customer row, so the
foreign key resides on the Order table. Customarily, the foreign key field names are
the same as the primary key field name of the table being referred to, so it is
probably a good idea to give the primary key of each table a redundant name
like "TABLENAME_ID", e.g. Customer_ID.

Other Constraints

It is arguable that the most important constraints are the foreign key and primary key
constraints, because the process of normalization (see below) pushes most of the
data integrity checking onto primary keys and joins (retrieving rows by matching a
foreign key in one table with the primary key of another table).

Some DBMSs provide a logical CHECK constraint, whose body is a condition on one or
more fields.

NOT NULL and UNIQUE are constraints applied to individual fields in the data
declaration statement, for example:
CREATE TABLE t ( f1 type1 PRIMARY KEY, f2 type2 UNIQUE NOT NULL, ..., CHECK (...) )
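A similar SQLite sketch (again SQLite is only a stand-in RDBMS, and the Product table is hypothetical) shows NOT NULL and UNIQUE declared on individual fields and a CHECK whose condition involves two fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        Product_ID INTEGER PRIMARY KEY,
        Sku        TEXT NOT NULL UNIQUE,   -- per-field constraints
        Price      REAL NOT NULL,
        Discount   REAL NOT NULL DEFAULT 0,
        CHECK (Price > 0 AND Discount BETWEEN 0 AND Price)  -- condition on two fields
    )
""")
conn.execute("INSERT INTO Product VALUES (1, 'A-1', 10.0, 2.0)")


def rejected(sql):
    """Run `sql` and report whether a constraint refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("INSERT INTO Product VALUES (2, 'A-1', 5.0, 0)"))    # duplicate Sku: UNIQUE
print(rejected("INSERT INTO Product VALUES (3, NULL, 5.0, 0)"))     # missing Sku: NOT NULL
print(rejected("INSERT INTO Product VALUES (4, 'B-2', 5.0, 9.0)"))  # Discount > Price: CHECK
print(rejected("INSERT INTO Product VALUES (5, 'C-3', 5.0, 1.0)"))  # satisfies every constraint
```

The first three inserts are refused and the last one is stored.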

Relational algebra
Relational algebra, an offshoot of first-order logic (and of the algebra of sets), deals
with a set of finitary relations that is closed under certain operators. These
operators operate on one or more relations to yield a relation. Relational algebra is
a part of computer science.

Relational algebras received little attention until the publication of E.F. Codd's
relational model of data in 1970. Codd proposed such algebra as a basis for database
query languages.

Relational algebra is essentially equivalent in expressive power to relational calculus
(and thus first-order logic); this result is known as Codd's theorem. Some care,
however, has to be taken to avoid a mismatch that may arise between the two
languages, since negation applied to a formula of the calculus constructs a formula
that may be true on an infinite set of possible tuples, while the difference operator of
relational algebra always returns a finite result. To overcome these difficulties, Codd
restricted the operands of relational algebra to finite relations only and also proposed
restricted support for negation (NOT) and disjunction (OR). Analogous restrictions
are found in many other logic-based computer languages. Codd defined the term
relational completeness to refer to a language that is complete with respect to
first-order predicate calculus apart from the restrictions he proposed. In practice the
restrictions have no adverse effect on the applicability of his relational algebra for
database purposes.

Primitive operations

As in any algebra, some operators are primitive and the others, being definable in
terms of the primitive ones, are derived. It is useful if the choice of primitive
operators parallels the usual choice of primitive logical operators. Although it is well
known that the usual choice in logic of AND, OR and NOT is somewhat arbitrary,
Codd made a similar arbitrary choice for his algebra.

The six primitive operators of Codd's algebra are the selection, the projection, the
Cartesian product (also called the cross product or cross join), the set union, the set
difference, and the rename. (Actually, Codd omitted the rename, but the compelling
case for its inclusion was shown by the inventors of ISBL.) These six operators are
fundamental in the sense that none of them can be omitted without losing expressive
power. Many other operators have been defined in terms of these six. Among the
most important are set intersection, division, and the natural join. In fact ISBL made
a compelling case for replacing the Cartesian product with the natural join, of which
the Cartesian product is a degenerate case.

Altogether, the operators of relational algebra have identical expressive power to
that of domain relational calculus or tuple relational calculus. However, for the
reasons given above, relational algebra has strictly less expressive power than
first-order predicate calculus without function symbols. Relational algebra actually
corresponds to a subset of first-order logic, namely Horn clauses without recursion
and negation.

Set operators
Although three of the six basic operators are taken from set theory, their relational
algebra counterparts carry additional constraints. For set union and set difference,
the two relations involved must be union-compatible, that is, the two relations must
have the same set of attributes. Because set intersection can be defined in terms of
set difference, the two relations involved in set intersection must also be
union-compatible.
The Cartesian product is defined differently from the one in set theory, in the sense
that tuples are considered to be 'shallow' for the purposes of the operation. That is,
unlike in set theory, where the Cartesian product of an n-tuple by an m-tuple is a set
of 2-tuples, the Cartesian product in relational algebra has the 2-tuple "flattened"
into an (n+m)-tuple. More formally, R × S is defined as follows:
$R \times S = \{\, r \cup s \mid r \in R,\ s \in S \,\}$
In addition, for the Cartesian product to be defined, the two relations involved must
have disjoint headers, that is, they must not have a common attribute name.
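The compatibility rules above can be sketched in Python. This is an illustrative model, not a DBMS API: the function names and the representation (a relation as a pair of a header and a set of tuples, with each tuple stored as a frozenset of attribute/value pairs) are assumptions chosen so that tuples are unordered and flat, as the text requires. Intersection is derived as R − (R − S), which is why it inherits the union-compatibility rule.

```python
def relation(header, *rows):
    """Build a relation: (frozenset of attribute names, frozenset of tuples).

    Each tuple is a frozenset of (attribute, value) pairs, so attribute order
    does not matter and duplicate tuples collapse, as in a set.
    """
    body = frozenset(frozenset(row.items()) for row in rows)
    if any({name for name, _ in t} != set(header) for t in body):
        raise ValueError("every tuple must have exactly the header's attributes")
    return frozenset(header), body


def require_union_compatible(r, s):
    if r[0] != s[0]:
        raise ValueError("relations are not union-compatible")


def union(r, s):
    require_union_compatible(r, s)
    return r[0], r[1] | s[1]


def difference(r, s):
    require_union_compatible(r, s)
    return r[0], r[1] - s[1]


def intersection(r, s):
    # Derived operator: R ∩ S = R − (R − S).
    return difference(r, difference(r, s))


def product(r, s):
    if r[0] & s[0]:
        raise ValueError("the Cartesian product needs disjoint headers")
    # Each result tuple is the flat union t ∪ u, an (n+m)-tuple, not a pair (t, u).
    return r[0] | s[0], frozenset(t | u for t in r[1] for u in s[1])


a = relation({"Name"}, {"Name": "Harry"}, {"Name": "Sally"})
b = relation({"Name"}, {"Name": "Sally"}, {"Name": "George"})
d = relation({"DeptName"}, {"DeptName": "Finance"})

print(sorted(dict(t)["Name"] for t in intersection(a, b)[1]))  # ['Sally']
print(len(product(a, d)[1]))  # 2
```

Calling product(a, b) raises ValueError because both headers contain Name, mirroring the disjoint-header rule.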

Projection (π)
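A projection is a unary operation written as $\pi_{a_1,\ldots,a_n}(R)$ where
$a_1,\ldots,a_n$ is a set of attribute names. The result of such a projection is defined
as the set that is obtained when all tuples in R are restricted to the set
$\{a_1,\ldots,a_n\}$; because the result is a set, tuples that become identical after the
restriction appear only once.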
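A minimal Python sketch of projection, assuming a relation is modelled as a list of dicts with no duplicate rows (the function name project is hypothetical). The first-seen order of the result is kept only to make the output easy to read; a relation itself is unordered.

```python
def project(rows, attrs):
    """π_attrs(R): restrict each tuple to `attrs`, keeping one copy of each result."""
    result = []
    for row in rows:
        restricted = {a: row[a] for a in attrs}
        if restricted not in result:  # the result is a set, so duplicates are dropped
            result.append(restricted)
    return result


# The Employee table from the natural join example in this section.
employee = [
    {"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"},
]

print(project(employee, ["DeptName"]))  # [{'DeptName': 'Finance'}, {'DeptName': 'Sales'}]
```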

Selection (σ)
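A generalized selection is a unary operation written as $\sigma_{\varphi}(R)$ where
$\varphi$ is a propositional formula that consists of atoms as allowed in the normal
selection (comparisons of the form $a \theta b$ or $a \theta v$, where $a$ and $b$ are
attribute names, $v$ is a constant and $\theta$ is a comparison operator such as $=$,
$<$ or $\geq$) and the logical operators $\wedge$ (and), $\vee$ (or) and $\neg$
(negation). This selection selects all those tuples in R for which $\varphi$ holds.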
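Under the same list-of-dicts assumption, a generalized selection can take the formula φ as a Python predicate. The function select and the formula phi below are hypothetical illustrations; phi combines three atoms with ∧, ∨ and ¬.

```python
def select(rows, phi):
    """σ_φ(R): keep exactly the tuples for which the formula φ holds."""
    return [row for row in rows if phi(row)]


def phi(t):
    """φ = (DeptName = 'Sales') ∨ ((DeptName = 'Finance') ∧ ¬(EmpId < 3410))"""
    return t["DeptName"] == "Sales" or (
        t["DeptName"] == "Finance" and not t["EmpId"] < 3410
    )


# The Employee table from the natural join example in this section.
employee = [
    {"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"},
]

print([t["Name"] for t in select(employee, phi)])  # ['Harry', 'Sally', 'Harriet']
```

George is dropped because he is in Finance but his EmpId is below 3410.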

Rename (ρ)
A rename is a unary operation written as $\rho_{a/b}(R)$ where the result is identical to
R except that the b field in all tuples is renamed to an a field. This is simply used to
rename an attribute of a relation or the relation itself.
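The rename operator can be sketched under the same list-of-dicts assumption. The hypothetical rename function refuses to overwrite an attribute that already exists, because a relation cannot have two attributes with the same name. Renaming Dept.Manager to Name is the kind of step the natural join discussion calls for when a foreign key links attributes with different names.

```python
def rename(rows, a, b):
    """ρ_{a/b}(R): identical tuples, except that attribute b is renamed to a."""
    result = []
    for row in rows:
        if b not in row or a in row:
            raise ValueError(f"cannot rename {b!r} to {a!r}")
        result.append({(a if name == b else name): value for name, value in row.items()})
    return result


dept = [
    {"DeptName": "Finance", "Manager": "George"},
    {"DeptName": "Sales", "Manager": "Harriet"},
    {"DeptName": "Production", "Manager": "Charles"},
]

print(rename(dept, "Name", "Manager")[0])  # {'DeptName': 'Finance', 'Name': 'George'}
```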

Natural join (⋈)

Natural join (⋈) is a binary operator that is written as (R ⋈ S) where R and S are
relations.[1] The result of the natural join is the set of all combinations of tuples in R
and S that are equal on their common attribute names. As an example, consider the
tables Employee and Dept and their natural join:

Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Sales

Dept
DeptName    Manager
Finance     George
Sales       Harriet
Production  Charles

Employee ⋈ Dept
Name     EmpId  DeptName  Manager
Harry    3415   Finance   George
Sally    2241   Sales     Harriet
George   3401   Finance   George
Harriet  2202   Sales     Harriet

This can also be used to define composition of relations. In category theory, the join
is precisely the fiber product.

The natural join is arguably one of the most important operators since it is the
relational counterpart of logical AND. Note carefully that if the same variable appears
in each of two predicates that are connected by AND, then that variable stands for
the same thing and both appearances must always be substituted by the same
value. In particular, natural join allows the combination of relations that are
associated by a foreign key. For example, in the above example a foreign key
probably holds from Employee.DeptName to Dept.DeptName, and then the natural
join of Employee and Dept combines all employees with their departments. Note that
this works because the foreign key holds between attributes with the same name. If
this is not the case, such as in the foreign key from Dept.Manager to Employee.Name,
then we have to rename these columns before we take the natural join.
Such a join is sometimes also referred to as an equijoin.
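The natural join can be sketched in Python over lists of dicts (an assumed representation; natural_join is a hypothetical name). It reproduces the Employee ⋈ Dept table above. When the two relations share no attribute names, the matching condition is vacuously true and the result is the Cartesian product, the degenerate case mentioned earlier.

```python
def natural_join(r, s):
    """R ⋈ S: merge every pair of tuples that agree on all shared attribute names."""
    result = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                result.append({**t, **u})
    return result


employee = [
    {"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"},
]
dept = [
    {"DeptName": "Finance", "Manager": "George"},
    {"DeptName": "Sales", "Manager": "Harriet"},
    {"DeptName": "Production", "Manager": "Charles"},
]

for row in natural_join(employee, dept):
    print(row["Name"], row["EmpId"], row["DeptName"], row["Manager"])
```

The Production row of Dept matches no employee, so it does not appear in the result.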

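θ-join and equijoin

When tuples should be combined on a condition other than equality of shared
attributes, a more general join is used: the θ-join $R \bowtie_{a \theta b} S$ keeps every
combination of a tuple from R and a tuple from S for which $a \theta b$ holds, where
$\theta$ is a comparison operator such as $<$, $\leq$, $=$, $\geq$ or $>$. When $\theta$ is
the equality operator, the θ-join is also called an equijoin.

Consider tables Car and Boat which list models of cars and boats and their respective
prices. Suppose a customer wants to buy a car and a boat, but she doesn't want to
spend more money on the boat than on the car. The θ-join on the condition CarPrice
≥ BoatPrice produces a table with all the possible options.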

Car
CarModel  CarPrice
CarA      20'000
CarB      30'000
CarC      50'000

Boat
BoatModel  BoatPrice
Boat1      10'000
Boat2      40'000
Boat3      60'000

Car ⋈ Boat (θ-join on CarPrice ≥ BoatPrice)
CarModel  CarPrice  BoatModel  BoatPrice
CarA      20'000    Boat1      10'000
CarB      30'000    Boat1      10'000
CarC      50'000    Boat1      10'000
CarC      50'000    Boat2      40'000
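A θ-join sketch under the same list-of-dicts assumption. The hypothetical theta_join takes the comparison θ as a function from Python's standard operator module; operator.ge reproduces the CarPrice ≥ BoatPrice table above. The headers are assumed to be disjoint, as for the Cartesian product.

```python
import operator


def theta_join(r, s, left, theta, right):
    """R ⋈_{left θ right} S: combine tuple pairs for which theta(t[left], u[right]) holds.

    Assumes R and S have disjoint headers, as for the Cartesian product.
    """
    return [{**t, **u} for t in r for u in s if theta(t[left], u[right])]


car = [
    {"CarModel": "CarA", "CarPrice": 20_000},
    {"CarModel": "CarB", "CarPrice": 30_000},
    {"CarModel": "CarC", "CarPrice": 50_000},
]
boat = [
    {"BoatModel": "Boat1", "BoatPrice": 10_000},
    {"BoatModel": "Boat2", "BoatPrice": 40_000},
    {"BoatModel": "Boat3", "BoatPrice": 60_000},
]

for row in theta_join(car, boat, "CarPrice", operator.ge, "BoatPrice"):
    print(row["CarModel"], row["BoatModel"])
```

Passing operator.eq instead gives an equijoin; with this data it is empty, since no car costs exactly as much as a boat.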

Semijoin (⋉)(⋊)

The semijoin is similar to the natural join and is written as R ⋉ S where R and S
are relations. The result of the semijoin is only the set of all tuples in R for which
there is a tuple in S that is equal on their common attribute names. As an example,
consider the tables Employee and Dept and their semijoin:

Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Production

Dept
DeptName    Manager
Sales       Harriet
Production  Charles

Employee ⋉ Dept
Name     EmpId  DeptName
Sally    2241   Sales
Harriet  2202   Production
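A sketch of the semijoin under the same list-of-dicts assumption (semijoin is a hypothetical name). Only R's tuples are returned, unchanged, so the Manager attribute of Dept never appears in the result.

```python
def semijoin(r, s):
    """R ⋉ S: the tuples of R that agree with at least one tuple of S on shared attributes."""
    return [
        t for t in r
        if any(all(t[a] == u[a] for a in t.keys() & u.keys()) for u in s)
    ]


employee = [
    {"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Production"},
]
dept = [
    {"DeptName": "Sales", "Manager": "Harriet"},
    {"DeptName": "Production", "Manager": "Charles"},
]

print([t["Name"] for t in semijoin(employee, dept)])  # ['Sally', 'Harriet']
```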

Antijoin (▷)
The antijoin, written as R ▷ S where R and S are relations, is similar to the natural
join, but the result of an antijoin is only those tuples in R for which there is NOT a
tuple in S that is equal on their common attribute names.
As an example, consider the tables Employee and Dept and their antijoin:

Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Production

Dept
DeptName    Manager
Sales       Harriet
Production  Charles

Employee ▷ Dept
Name     EmpId  DeptName
Harry    3415   Finance
George   3401   Finance
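The antijoin is the complement of the semijoin within R. A sketch under the same list-of-dicts assumption keeps the tuples of R that match no tuple of S; the helper matches and the function antijoin are hypothetical names.

```python
def matches(t, u):
    """True if tuples t and u agree on every attribute name they share."""
    return all(t[a] == u[a] for a in t.keys() & u.keys())


def antijoin(r, s):
    """R ▷ S: the tuples of R that agree with no tuple of S on shared attributes."""
    return [t for t in r if not any(matches(t, u) for u in s)]


employee = [
    {"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Production"},
]
dept = [
    {"DeptName": "Sales", "Manager": "Harriet"},
    {"DeptName": "Production", "Manager": "Charles"},
]

print([t["Name"] for t in antijoin(employee, dept)])  # ['Harry', 'George']
```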

Division (÷)
The division is a binary operation that is written as R ÷ S. The result consists of the
restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R
but not in the header of S, for which it holds that all their combinations with tuples in
S are present in R. As an example, see the tables Completed, DBProject and their
division:

Completed
Student  Task
Fred     Database1
Fred     Database2
Fred     Compiler1
Eugene   Database1
Eugene   Compiler1
Sara     Database1
Sara     Database2

DBProject
Task
Database1
Database2

Completed ÷ DBProject
Student
Fred
Sara
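A sketch of division under the same list-of-dicts assumption; divide is a hypothetical name. S's attribute names are passed explicitly as s_header so that the operation stays defined when S has no rows (every candidate then qualifies vacuously). R is assumed to contain all of S's attributes.

```python
def divide(r, s, s_header):
    """R ÷ S: tuples over R's attributes not in S whose combination with
    every tuple of S appears in R. `s_header` lists S's attribute names."""
    r_tuples = {frozenset(t.items()) for t in r}
    rest = [a for a in r[0] if a not in s_header] if r else []
    result = []
    for t in r:
        candidate = {a: t[a] for a in rest}
        if candidate in result:
            continue
        # Keep the candidate only if pairing it with every S tuple gives a tuple of R.
        if all(frozenset({**candidate, **u}.items()) in r_tuples for u in s):
            result.append(candidate)
    return result


completed = [
    {"Student": "Fred", "Task": "Database1"},
    {"Student": "Fred", "Task": "Database2"},
    {"Student": "Fred", "Task": "Compiler1"},
    {"Student": "Eugene", "Task": "Database1"},
    {"Student": "Eugene", "Task": "Compiler1"},
    {"Student": "Sara", "Task": "Database1"},
    {"Student": "Sara", "Task": "Database2"},
]
db_project = [{"Task": "Database1"}, {"Task": "Database2"}]

print(divide(completed, db_project, ["Task"]))  # [{'Student': 'Fred'}, {'Student': 'Sara'}]
```

Eugene is excluded because he has not completed Database2.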

A Database Administrator (DBA) is a person responsible for the design,
implementation, maintenance and repair of an organization's database. The role is
also known by the titles Database Coordinator or Database Programmer, and is
closely related to the Database Analyst, Database Modeler, Programmer Analyst, and
Systems Manager roles. It includes the development and design of database
strategies, monitoring and improving database performance and capacity, and
planning for future expansion requirements. A DBA may also plan, co-ordinate and
implement security measures to safeguard the database.

Oracle DBA Responsibilities


1. Creates and maintains all databases required for development, testing,
education and production usage.
2. Performs the capacity planning required to create and maintain the
databases. The DBA works closely with system administration staff because
computers often have applications or tools on them in addition to the Oracle
Databases.
3. Performs ongoing tuning of the database instances.
4. Installs new versions of the Oracle RDBMS and its tools and any other tools
that access the Oracle database.
5. Plans and implements backup and recovery of the Oracle database.
6. Controls migrations of programs, database changes, reference data changes
and menu changes through the development life cycle.
7. Implements and enforces security for all of the Oracle Databases.
8. Performs database re-organizations as required to improve performance and
ensure maximum uptime of the database.
9. Puts standards in place to ensure that all application design and code is
produced with proper integrity, security and performance. The DBA will
perform reviews on the design and code frequently to ensure the site
standards are being adhered to.
10. Evaluates releases of Oracle and its tools, and third party products to ensure
that the site is running the products that are most appropriate. Planning is
also performed by the DBA, along with the application developers and System
administrators, to ensure that any new product usage or release upgrade
takes place with minimal impact.
11. Provides technical support to application development teams. This is usually
in the form of a help desk. The DBA is usually the point of contact for Oracle
Corporation.
12. Enforces and maintains database constraints to ensure integrity of the
database.
13. Administers all database objects, including tables, clusters, indexes, views,
sequences, packages and procedures.

14. Assists with impact analysis of any changes made to the database objects.
15. Troubleshoots problems with the databases, applications and
development tools.
16. Creates new database users as required.
17. Manages sharing of resources amongst applications.
18. The DBA has ultimate responsibility for the physical database design.

The DBA should possess the following skills:


1. A good knowledge of the operating system(s).
2. A good knowledge of physical database design.
3. Ability to perform both Oracle and operating system performance
monitoring and to make the necessary adjustments.
4. Be able to provide a strategic database direction for the organisation.
5. Excellent knowledge of Oracle backup and recovery scenarios.
6. Good skills in all Oracle tools.
7. A good knowledge of Oracle security management.
8. A good knowledge of how Oracle acquires and manages resources.
9. Sound knowledge of the applications at your site.
10. Experience and knowledge in migrating code, database changes, data and
menus through the various stages of the development life cycle.
11. A good knowledge of the way Oracle enforces data integrity.
12. A sound knowledge of both database and program code performance tuning.
13. A sound understanding of the business.
