Dbms Mod4


DBMS

MODULE 4
● Authentication

Authentication vs. Authorization

1. Authentication: The identity of the user is checked before access to the system is provided.
   Authorization: The user's authorities are checked before access to resources is provided.

2. Authentication: Users or persons are verified.
   Authorization: Users or persons are validated.

3. Authentication: It is done before the authorization process.
   Authorization: It is done after the authentication process.

4. Authentication: It usually needs the user's login details.
   Authorization: It needs the user's privilege or security levels.

5. Authentication: It determines whether the person is the user they claim to be.
   Authorization: It determines what permissions the user has.

6. Authentication: Information is generally transmitted through an ID token.
   Authorization: Information is generally transmitted through an access token.

7. Authentication: The OpenID Connect (OIDC) protocol is an authentication protocol that is generally in charge of the user authentication process.
   Authorization: The OAuth 2.0 protocol governs the overall user authorization process.

8. Popular authentication techniques:
   ● Password-based authentication
   ● Passwordless authentication
   ● 2FA/MFA (Two-Factor Authentication / Multi-Factor Authentication)
   ● Single sign-on (SSO)
   ● Social authentication
   Popular authorization techniques:
   ● Role-Based Access Controls (RBAC)
   ● JSON web token (JWT) authorization
   ● SAML authorization
   ● OpenID authorization
   ● OAuth 2.0 authorization

9. Authentication: The authentication credentials can be changed in part by the user as and when required.
   Authorization: The authorization permissions cannot be changed by the user. They are granted by the owner of the system, and only the owner can change them.

10. Authentication: The user authentication is visible at the user end.
    Authorization: The user authorization is not visible at the user end.

11. Authentication: The user is identified with a username, password, face recognition, retina scan, fingerprints, etc.
    Authorization: The user authorization is carried out through access rights to resources, using roles that have been pre-defined.

12. Authentication example: Employees in a company are required to authenticate through the network before accessing their company email.
    Authorization example: After an employee successfully authenticates, the system determines what information the employee is allowed to access.
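
The two steps compared above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory dictionaries `USERS` and `PERMISSIONS`, the function names, and the PBKDF2 iteration count are all hypothetical and are not taken from these notes. Authentication checks the login details first; authorization is consulted only afterwards.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory stores; a real system would keep these in a database.
USERS = {}        # username -> (salt, password_hash)
PERMISSIONS = {}  # username -> set of resources the user may access


def hash_password(password, salt):
    # PBKDF2 from the standard library; the iteration count is illustrative only.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(username, password, resources):
    salt = os.urandom(16)
    USERS[username] = (salt, hash_password(password, salt))
    PERMISSIONS[username] = set(resources)


def authenticate(username, password):
    """Step 1: verify the user's identity from the login details."""
    record = USERS.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    # Constant-time comparison of the stored and the computed hash.
    return hmac.compare_digest(stored_hash, hash_password(password, salt))


def authorize(username, resource):
    """Step 2: check what an already authenticated user may access."""
    return resource in PERMISSIONS.get(username, set())


register("alice", "s3cret", {"email"})
if authenticate("alice", "s3cret") and authorize("alice", "email"):
    print("alice may read her company email")
```

Keeping the two checks in separate functions mirrors the comparison: a user can pass authentication and still be refused a resource.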

● Authorization and access control


● DAC, MAC & RBAC models
Role-based access control (RBAC) is a security approach that restricts access based on the roles users hold within the organization. RBAC is perhaps the precursor to the Zero Trust security model, which assigns role-based permissions and limits employee access to corporate resources in order to prevent data breaches.

The average cost of a data breach in 2021 was reported as $5.04 million for organizations without a Zero Trust approach, compared with $3.28 million where Zero Trust was implemented. Role-based access control is essential when securing remote access and preventing external attacks that can lead to major breaches.
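
A minimal sketch of the RBAC idea, assuming hypothetical role, user, and permission names: permissions are attached to roles, users are assigned roles, and an access check never looks at the individual user's identity beyond the roles held.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "hr_manager": {"read_employee", "update_employee"},
    "contractor": {"read_project"},
}

# Users receive roles, never individual permissions.
USER_ROLES = {
    "asha": {"hr_manager"},
    "ben": {"contractor"},
}


def is_allowed(user, permission):
    """Return True if any role assigned to the user carries the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(is_allowed("asha", "update_employee"))
print(is_allowed("ben", "update_employee"))
```

Onboarding a contractor then means adding one entry to `USER_ROLES`, which is the kind of reduced administrative work RBAC aims for.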

Advantages of RBAC

● Increased flexibility by assigning roles to employees only when required
● Improves regulatory compliance as confidential data is managed more efficiently
● Helps to easily integrate third parties such as contractors and partners into your network by assigning them predefined roles
● Improves operational performance by eliminating the use of unnecessary applications that cause tool sprawl for IT admins
● Reduced administrative work

Disadvantages of RBAC

● Role explosion, which is when thousands of roles must be simultaneously managed across multiple applications
● Deployment can be quite complex, particularly in an enterprise environment
● Access to specific actions in your system may be restricted but not to all data
● Administrators may forget to assign permissions

What is Mandatory Access Control (MAC)?

MAC is system-controlled access to objects based on the level of clearance assigned to each user. MAC differs from other access control models in that it does not rely on permissions set by users but on security labels assigned to each resource, and it is controlled by a delegated administrator.

Under MAC, users cannot accidentally override a security policy because a system administrator sets all permissions. MAC systems are typically found in government environments because of the high level of security they provide.
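
The label-based decision can be sketched as follows. The level names, users, and objects are hypothetical, and the rule shown (a subject may read an object only if its clearance is at least the object's label) is one common MAC rule rather than a complete policy.

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical classification levels, ordered from lowest to highest.
    PUBLIC = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


# Set by the administrator only; users cannot edit these tables.
CLEARANCES = {"analyst": Level.SECRET, "intern": Level.PUBLIC}
LABELS = {"budget.xlsx": Level.CONFIDENTIAL, "plan.doc": Level.TOP_SECRET}


def can_read(user, obj):
    """Allow reading only when the user's clearance dominates the object's label."""
    return CLEARANCES[user] >= LABELS[obj]


print(can_read("analyst", "budget.xlsx"))
print(can_read("analyst", "plan.doc"))
```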

Advantages of MAC

● MAC provides tighter security as only an admin can alter controls, making it difficult for unauthorized users to access resources
● Subjects and objects have clearances and labels, such as secret or top secret, in order to protect highly confidential data

Disadvantages of MAC

● Clearing users is an expensive process
● Constant maintenance is required, which can burden management
● Complex to implement
● The classification labeling can overwhelm users and limit productivity
● It is not always compatible with certain applications or operating systems

What is Discretionary Access Control (DAC)?

Discretionary access control is a security system that allows users to access resources based on their permissions. DAC is among the most common types of access control and relies on a hierarchical structure in which administrators are granted greater privileges than regular users.

DAC was originally defined by the Trusted Computer System Evaluation Criteria (TCSEC) as “a means of restricting access to objects based on the identity of subjects and/or groups to which they belong.” The controls are discretionary in that a subject holding a permission can pass it on to other subjects. DAC is based on access control lists (ACLs) for specific company resources. Discretionary access control is often discussed and paired with mandatory access control, as both focus on securing the system from a higher level.
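
An ACL-based check might look like the sketch below. The `Resource` class, the user names, and the rule that only the owner may grant rights are assumptions made for illustration; real DAC systems differ in how permissions can be passed on.

```python
class Resource:
    """A resource with an access control list (ACL) managed by its owner."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        # The ACL maps each user to the set of rights they hold.
        self.acl = {owner: {"read", "write"}}

    def grant(self, grantor, user, right):
        # Discretionary: the owner decides who else gets access.
        if grantor != self.owner:
            raise PermissionError(f"{grantor} does not own {self.name}")
        self.acl.setdefault(user, set()).add(right)

    def is_allowed(self, user, right):
        return right in self.acl.get(user, set())


report = Resource("report.txt", owner="carol")
report.grant("carol", "dave", "read")
print(report.is_allowed("dave", "read"))
print(report.is_allowed("dave", "write"))
```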

Advantages of DAC

● The authentication process is very strong
● Lower administrative costs
● Flexible

Disadvantages of DAC

● ACL maintenance can be a very exhausting process
● Limited negative authorization power
● Difficult auditing due to extensive log entries

DAC vs MAC vs RBAC – And The Winner Is…

So, which access control model is the best? The answer is that it depends on your organization's needs. If you are looking for a reliable and secure option, RBAC is a good choice. If you are looking for a system that is easy to configure and manage, DAC is a good option. If you are looking for a system that is extremely secure, then MAC is ideal.
Discretionary access control (DAC) offers the most flexibility, as it allows resource owners to assign controls and permissions to users without the approval of the IT department. Whichever model is chosen, security policies should be enforced before any type of authorization is granted to anyone, and everyone should be kept up to date on those policies.

● Intrusion detection
An intrusion detection system (IDS) observes network traffic for malicious transactions and sends immediate alerts when it observes them. It is software that checks a network or system for malicious activity or policy violations. Each illegal activity or violation is typically either reported to an administrator or collected centrally using a security information and event management (SIEM) system. An IDS thereby helps protect a computer network from unauthorized access by users, including insiders. The intrusion detector learning task is to build a predictive model (i.e. a classifier) capable of distinguishing between 'bad connections' (intrusions/attacks) and 'good (normal) connections'.

How does an IDS work?

● An IDS (Intrusion Detection System) monitors the traffic on a computer network to detect any suspicious activity.
● It analyzes the data flowing through the network to look for patterns and signs of abnormal behavior.
● The IDS compares the network activity to a set of predefined rules and patterns to identify any activity that might indicate an attack or intrusion.
● If the IDS detects something that matches one of these rules or patterns, it sends an alert to the system administrator.
● The system administrator can then investigate the alert and take action to prevent any damage or further intrusion.

Classification of Intrusion Detection System

IDS are classified into 5 types:

● Network Intrusion Detection System (NIDS): Network intrusion detection systems (NIDS) are set up at a planned point within the network to examine traffic from all devices on the network. A NIDS observes the traffic passing on the entire subnet and matches it against a collection of known attacks. Once an attack is identified or abnormal behavior is observed, an alert can be sent to the administrator. For example, a NIDS can be installed on the subnet where the firewalls are located to see whether someone is trying to crack the firewall.


● Host Intrusion Detection System (HIDS): Host intrusion detection systems (HIDS) run on independent hosts or devices on the network. A HIDS monitors the incoming and outgoing packets of that device only and alerts the administrator if suspicious or malicious activity is detected. It takes a snapshot of the existing system files and compares it with the previous snapshot. If critical system files were edited or deleted, an alert is sent to the administrator to investigate. An example of HIDS usage can be seen on mission-critical machines, which are not expected to change their configuration.


● Protocol-based Intrusion Detection System (PIDS): A protocol-based intrusion detection system (PIDS) comprises a system or agent that consistently resides at the front end of a server, controlling and interpreting the protocol between a user/device and the server. It tries to secure the web server by regularly monitoring the HTTPS protocol stream and the related HTTP protocol. Because HTTPS traffic is encrypted, the system needs to reside at the interface where the traffic has been decrypted, immediately before it enters the web presentation layer.

● Application Protocol-based Intrusion Detection System (APIDS): An application protocol-based intrusion detection system (APIDS) is a system or agent that generally resides within a group of servers. It identifies intrusions by monitoring and interpreting the communication on application-specific protocols. For example, it would monitor the SQL protocol used by the middleware as it transacts with the database behind the web server.

● Hybrid Intrusion Detection System: A hybrid intrusion detection system combines two or more approaches to intrusion detection. In a hybrid intrusion detection system, the host agent or system data is combined with network information to develop a complete view of the network system. A hybrid intrusion detection system is more effective than the other intrusion detection systems. Prelude is an example of a hybrid IDS.
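
The snapshot comparison described for a HIDS above can be sketched with file hashes. The function names and the use of SHA-256 digests are assumptions for illustration; a production HIDS tracks much more than file contents.

```python
import hashlib
from pathlib import Path


def take_snapshot(directory):
    """Map every file under the directory to the SHA-256 digest of its contents."""
    snapshot = {}
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            snapshot[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    return snapshot


def compare_snapshots(previous, current):
    """Report files that were modified, deleted, or added between two snapshots."""
    modified = sorted(p for p in previous if p in current and previous[p] != current[p])
    deleted = sorted(p for p in previous if p not in current)
    added = sorted(p for p in current if p not in previous)
    return {"modified": modified, "deleted": deleted, "added": added}
```

A monitoring job would call `take_snapshot` on a schedule and raise an alert whenever `compare_snapshots` returns a non-empty list for a directory that is not expected to change.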

Benefits of IDS

● Detects malicious activity: IDS can detect any suspicious activities and alert the system administrator before any significant damage is done.
● Improves network performance: IDS can identify any performance issues on the network, which can be addressed to improve network performance.
● Compliance requirements: IDS can help in meeting compliance requirements by monitoring network activity and generating reports.
● Provides insights: IDS generates valuable insights into network traffic, which can be used to identify any weaknesses and improve network security.

Detection Method of IDS

1. Signature-based Method: A signature-based IDS detects attacks on the basis of specific patterns in the network traffic, such as the number of bytes or the number of 1s or 0s. It also detects attacks on the basis of known malicious instruction sequences used by malware. The patterns the IDS looks for are known as signatures. A signature-based IDS can easily detect attacks whose pattern (signature) already exists in the system, but it has great difficulty detecting new malware attacks, as their pattern (signature) is not yet known.

2. Anomaly-based Method: Anomaly-based IDS was introduced to detect unknown malware attacks, as new malware is developed rapidly. An anomaly-based IDS uses machine learning to create a model of trustworthy activity. Incoming activity is compared with that model and declared suspicious if it does not fit the model. The machine learning-based method generalizes better than a signature-based IDS, as these models can be trained according to the applications and hardware configurations.
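
Both detection methods can be sketched in a few lines. The signature list and traffic counts are hypothetical, and the anomaly check uses a simple statistical baseline (mean and standard deviation) in place of the machine-learning model mentioned above; it is a stand-in to show the idea, not the technique a real IDS would use.

```python
import statistics

# Hypothetical byte patterns standing in for known attack signatures.
SIGNATURES = [b"' OR 1=1", b"/etc/passwd", b"<script>"]


def signature_alerts(payload):
    """Signature-based: return every known pattern found in the payload."""
    lowered = payload.lower()
    return [sig for sig in SIGNATURES if sig.lower() in lowered]


def build_baseline(request_counts):
    """Anomaly-based, step 1: summarize normal activity (assumed trustworthy)."""
    return statistics.mean(request_counts), statistics.pstdev(request_counts)


def is_anomalous(count, baseline, threshold=3.0):
    """Anomaly-based, step 2: flag activity far from the baseline."""
    mean, stdev = baseline
    if stdev == 0:
        return count != mean
    return abs(count - mean) / stdev > threshold


baseline = build_baseline([100, 102, 98, 101, 99])
print(signature_alerts(b"GET /?id=1' or 1=1 --"))
print(is_anomalous(500, baseline))
```

The sketch also shows the trade-off described above: a payload that matches no entry in `SIGNATURES` passes the first check unnoticed, while the baseline check can flag it if it changes the traffic pattern.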

Comparison of IDS with Firewalls

An IDS and a firewall are both related to network security, but they work differently. A firewall looks outward for intrusions in order to stop them from happening: it restricts access between networks to prevent intrusion, and it does not signal an attack that comes from inside the network. An IDS describes a suspected intrusion once it has happened and then signals an alarm.

Conclusion:

An intrusion detection system (IDS) is a powerful tool that can help businesses detect unauthorized access to their network. By analyzing network traffic patterns, an IDS can identify suspicious activities and alert the system administrator. An IDS can be a valuable addition to any organization's security infrastructure, providing insights and improving network performance.

● SQL injection

SQL Injection
SQL injection is a code injection technique that can damage or destroy a database. It is one of the most common web hacking techniques: malicious code is placed in SQL statements via web page input. Malicious users can use SQL injection to manipulate the database behind a web application.

SQL injection generally occurs when we ask a user for input such as a username or user ID. Instead of a name or ID, the user gives us an SQL fragment that we unknowingly run on our database. For example, we create a SELECT statement by appending the variable "demoUserId" to a query string. The variable is fetched from user input (getRequestString).

demoUserId = getRequestString("UserId");
demoSQL = "SELECT * FROM users WHERE UserId = " + demoUserId;

Types of SQL injection attacks

SQL injection can do more harm than bypassing login checks. SQL injection attacks include:

○ Updating, deleting, and inserting data: an attacker can modify cookies to poison a web application's database query.

○ Executing commands on the server that download and install malicious programs such as Trojans.

○ Exporting valuable data such as credit card details, email addresses, and passwords to the attacker's remote server.

○ Getting user login details: this is the simplest form of SQL injection. A web application typically accepts user input through a form, and the front end passes the user input to the back-end database for processing.

Example of SQL Injection

We have an application based on employee records. Each employee can view only their own records by entering a unique and private employee ID in an Employee ID field. Suppose the employee enters the following in the input field:

236893238 or 1=1

It will translate to:

SELECT * FROM EMPLOYEE WHERE EMPLOYEE_ID = 236893238 OR 1=1

The SQL code above is valid. Because 1=1 is always true, the WHERE clause holds for every row, so the query returns all rows of the EMPLOYEE table rather than only the row for that EMPLOYEE_ID. All the employee data is compromised; the malicious user can delete the employee records in a similar way.
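
The effect can be reproduced with Python's built-in `sqlite3` module. This is a deliberately unsafe sketch: the table contents and the helper names `build_query` and `lookup_vulnerable` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(236893238, "Asha"), (2, "Ben"), (3, "Chen")],
)


def build_query(user_input):
    # UNSAFE on purpose: the input becomes part of the SQL text itself.
    return "SELECT * FROM employee WHERE employee_id = " + user_input


def lookup_vulnerable(user_input):
    return conn.execute(build_query(user_input)).fetchall()


print(len(lookup_vulnerable("236893238")))          # one employee's row
print(len(lookup_vulnerable("236893238 or 1=1")))   # every row in the table
```

The second call returns the whole table because the database parses `or 1=1` as part of the WHERE clause, not as part of the ID.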
Example:

SELECT * FROM Employee WHERE (Username = "" OR 1=1) AND (Password = "" OR 1=1)

The malicious user can exploit the '=' operator in this way to retrieve private and secure user information. The following query, when executed, retrieves protected data that was not intended to be shown to users:

SELECT * FROM EMPLOYEE WHERE (Employee_name = "" OR 1=1) AND (Password = "" OR 1=1)

SQL injection based on Batched SQL statements

Several databases support batched SQL statements. A batch is a group of two or more SQL statements separated by semicolons.

The SQL statement given below will return all rows from the Employee table, then delete the Employee_Add table.

SELECT * FROM Employee; DROP TABLE Employee_Add

How to detect SQL Injection attacks

Creating a SQL injection attack is not difficult, and even the best and most well-intentioned developers make mistakes. Detecting SQL injection is therefore an essential part of mitigating the risk of an SQL injection attack. A web application firewall can detect and block basic SQL injection attacks, but we should not depend on it as the sole preventive measure.

An intrusion detection system (IDS), whether network-based or host-based, can be tuned to detect SQL injection attacks. A network-based IDS can monitor all connections to our database server and flag suspicious activities. A host-based IDS can monitor web server logs and alert when something strange happens.

Impact of SQL Injection


The intruder can retrieve all the user data present in the database, such as user details, credit card information, and social security numbers, and can also gain access to protected areas like the administrator portal. It is also possible to delete user data from the tables. Online shopping applications and bank transactions all rely on back-end database servers, so if an intruder can exploit SQL injection, the entire server may be compromised.

How to prevent SQL Injection attack

● We should validate input from the user by pre-defining the length, input type, and input field.

● We should restrict the access privileges of users and define how much data any outsider can access from the database. In general, a user should not be granted permission to access everything in the database.

● We should not use system administrator accounts for application access to the database.
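
The first point, validating length and input type, can be sketched as below. The length limit and function names are hypothetical. The sketch also binds the validated value as a query parameter instead of concatenating it into the SQL text; parameter binding is not in the list above but is a widely used complementary measure.

```python
import sqlite3

MAX_ID_LENGTH = 10  # hypothetical limit for an employee ID field

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(236893238, "Asha"), (2, "Ben")],
)


def validate_employee_id(raw):
    """Accept only short, purely numeric input and convert it to an integer."""
    if len(raw) > MAX_ID_LENGTH or not (raw.isascii() and raw.isdigit()):
        raise ValueError("employee ID must be 1 to 10 digits")
    return int(raw)


def find_employee(raw_id):
    employee_id = validate_employee_id(raw_id)
    # The "?" placeholder binds the value; it is never parsed as SQL.
    return conn.execute(
        "SELECT * FROM employee WHERE employee_id = ?", (employee_id,)
    ).fetchall()


print(find_employee("236893238"))
try:
    find_employee("236893238 or 1=1")
except ValueError as error:
    print("rejected:", error)
```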

● Object oriented and object relational database

RDBMS vs. OODBMS

1. Long form
   RDBMS: Stands for Relational Database Management System.
   OODBMS: Stands for Object-Oriented Database Management System.

2. Way of storing data
   RDBMS: Stores data in entities, defined as tables that hold specific information.
   OODBMS: Stores data as objects.

3. Data complexity
   RDBMS: Handles comparatively simpler data.
   OODBMS: Handles larger and more complex data than RDBMS.

4. Grouping
   RDBMS: An entity type refers to a collection of entities that share a common definition.
   OODBMS: A class describes a group of objects that have common relationships and behaviors and also have similar properties.

5. Data handling
   RDBMS: Stores only data.
   OODBMS: Stores data as well as the methods to use it.

6. Main objective
   RDBMS: Data independence from the application program.
   OODBMS: Data encapsulation.

7. Key
   RDBMS: A primary key uniquely identifies a row in a table.
   OODBMS: An object identifier (OID) is an unambiguous, long-term name for any type of object or entity.

8. Data retrieval
   RDBMS: SQL (Structured Query Language)
   OODBMS: Object Query Language (OQL)

9. Scalability
   RDBMS: Limited scalability due to rigid schema.
   OODBMS: Highly scalable due to flexible schema.

10. Concurrency control
    RDBMS: Fine-grained locking.
    OODBMS: Optimistic concurrency control.

11. Data relationships
    RDBMS: Relational data is stored in tables and linked via foreign keys.
    OODBMS: Objects are linked through direct references between objects.

12. Performance
    RDBMS: Efficient for complex queries involving multiple tables.
    OODBMS: Faster for complex object-oriented queries.

13. Flexibility
    RDBMS: Limited flexibility due to fixed schema.
    OODBMS: Highly flexible due to its object-oriented nature.

14. Data persistence
    RDBMS: Data is stored in tables on disk.
    OODBMS: Data is stored in objects in memory or on disk.

15. Examples
    RDBMS: MySQL, Oracle, SQL Server
    OODBMS: db4o, Versant, Objectivity/DB
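
The contrast in rows 5 and 7 (data only versus data plus methods, primary key versus OID) can be illustrated with a small sketch. The relational half uses Python's `sqlite3` module; the object half is only a stand-in, a dictionary keyed by generated identifiers, since a real OODBMS would persist the objects. All names here are hypothetical.

```python
import sqlite3
import uuid

# Relational style: the table holds only data; behavior lives in SQL or the application.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = balance + 50 WHERE account_id = 1")
relational_balance = conn.execute(
    "SELECT balance FROM account WHERE account_id = 1"
).fetchone()[0]


# Object style: data and the methods that operate on it are stored together.
class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount


# Stand-in for an object store: objects are looked up by an object identifier (OID).
object_store = {}
oid = str(uuid.uuid4())
object_store[oid] = Account(100.0)
object_store[oid].deposit(50)

print(relational_balance, object_store[oid].balance)
```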

● Logical database
A logical database is a special type of ABAP (Advanced Business Application Programming) program that retrieves interrelated data from various tables. A logical database also provides a read-only view of the data.

Structure Of Logical Database:


A logical database uses only a hierarchical structure of tables, i.e. data is organized in a tree-like structure and stored as records that are connected to each other through edges (links). A logical database contains Open SQL statements, which are used to read data from the database. The logical database reads the data, stores it in the program if required, and passes it line by line to the application program.
Figure: Structure of a logical database

Features of Logical Database:


In this section, let us look at some features of a logical database:

● We can select only the type of data that we need.
● Data authentication is done in order to maintain security.
● A logical database uses a hierarchical structure, which helps maintain data integrity.

Goal Of Logical Database:


The goal of a logical database is to create well-structured tables that reflect the needs of the user. The tables of a logical database store data in a non-redundant manner, and foreign keys are used in the tables to support relationships among tables and entities.

Tasks Of Logical Database:


Some important tasks of a logical database are listed below:

● With the help of a logical database, the same data can be read from multiple programs.
● A logical database defines the same user interface for multiple programs.
● A logical database ensures centralized authorization checks for access to sensitive data.
● A logical database improves performance. For example, it uses joins instead of multiple SELECT statements, which improves response time.

Data View Of Logical Database:


A logical database provides a particular view of the logical database tables. A logical database is appropriate when the structure of the database is large. To work with such databases efficiently, it is convenient to use the following flow:

● SELECT
● READ
● PROCESS
● DISPLAY

The data of a logical database is hierarchical in nature. The tables are linked to each other through foreign key relationships.

Figure: Data view of a logical database

Points To Remember:

● Tables must have foreign key relationships.
● A logical database consists of logically related tables, arranged in a hierarchical manner, that are used for reading or retrieving data.
● A logical database consists of three main elements:
    ● Structure of the database
    ● Selections of data from the database
    ● Database program
● If we want to improve the access time on data, we use views in the logical database.

Example:
Suppose that in a university or college, a head of department (HOD) wants to get information about a specific student. The HOD first retrieves the data for the student's batch and branch from a large amount of data and can then easily get the required student's information, but cannot alter it.
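
The read-only lookup in the example above can be sketched in Python with `sqlite3`. This is an analogy, not ABAP: the `branch` and `student` tables, their contents, and the function name are hypothetical, and a real logical database is defined inside the SAP system. The sketch follows the SELECT, READ, PROCESS, DISPLAY flow over two tables linked by a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name TEXT,
    branch_id INTEGER REFERENCES branch(branch_id)
);
INSERT INTO branch VALUES (1, 'CSE'), (2, 'ECE');
INSERT INTO student VALUES (10, 'Asha', 1), (11, 'Ben', 1), (12, 'Chen', 2);
""")


def students_in_branch(branch_name):
    # SELECT and READ: one join walks the hierarchy from branch down to student.
    rows = conn.execute(
        "SELECT b.name, s.student_id, s.name "
        "FROM branch AS b JOIN student AS s ON s.branch_id = b.branch_id "
        "WHERE b.name = ? ORDER BY s.student_id",
        (branch_name,),
    ).fetchall()
    # PROCESS: turn raw rows into report lines without modifying any data.
    return [f"{branch}: {student_id} {name}" for branch, student_id, name in rows]


# DISPLAY
for line in students_in_branch("CSE"):
    print(line)
```

Using one join rather than a separate SELECT per student reflects the performance point made under the tasks of a logical database.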

Advantages Of Logical Database:


Let us look at some advantages of the logical database:

● In a logical database, we can select meaningful data from a large amount of data.
● A logical database provides central authorization, which checks whether database accesses are authenticated.
● Less coding is required to retrieve data from the database compared with other databases.
● Access performance when reading data from the hierarchical structure of the database is good.
● The user interfaces are easy to understand.
● A logical database provides check functions, which verify that user input is complete, correct, and plausible.

Disadvantages Of Logical Database:


This section shows the disadvantages of the logical database:

● A logical database takes more time when the required data is at the lowest level of the hierarchy, because all the upper-level tables must be read first, which slows down performance.
● A logical database has no ENDGET command, so the code block associated with an event ends at the next event statement.

● Web database
The web-based database management system is one of the essential parts of DBMS and is used to store web application data. A web-based database management system handles databases that hold data for e-commerce, e-business, blogs, e-mail, and other online applications.

Requirements:

While many DBMS vendors are working to provide proprietary database connectivity solutions for the Web, most organizations need a more general solution so that they are not tied to a single technology. Listed below are some of the most significant requirements for database integration applications on the Web. These requirements are ideals and are not fully attainable at present. They are listed in no particular order:

● The ability to access valuable corporate data in a fully secured manner.
● Data- and vendor-independent connectivity that allows freedom of choice in selecting the DBMS for present and future use.
● The capability to interface to the database independently of any proprietary Web browser and/or Web server.
● A connectivity solution that takes advantage of all the features of an organization's DBMS.
● An open architecture that allows interoperability with a variety of systems and technologies, such as:
○ Different types of Web servers
○ Microsoft's Distributed Component Object Model (DCOM) / Component Object Model (COM)
○ CORBA / IIOP
○ Java / RMI (Remote Method Invocation)
○ XML (Extensible Markup Language)
○ Various Web services (SOAP, UDDI, etc.)
● A cost-effective solution that allows for scalability, growth, and changes in strategic direction, and helps reduce the costs of developing and maintaining applications.
● Support for transactions that span multiple HTTP requests.
● Minimal administration overhead.

Benefits of the Web-DBMS Approach


The benefits that come with the use of a web-based DBMS include:

● Simplicity
● Platform independence
● Graphical User Interface (GUI)
● Standardization
● Cross-platform support
● Transparent network access
● Scalability
● Innovation

● Distributed database

Distributed Database System in DBMS


A distributed database is essentially a database that is dispersed across numerous
sites, i.e., on various computers or over a network of computers, and is not restricted to
a single system. A distributed database system is spread across several locations with
distinct physical components. This can be necessary when different people from all
over the world need to access a certain database. It must be handled such that, to
users, it seems to be a single database.

Types:
1. Homogeneous Database: A homogeneous database stores data uniformly across all locations. All sites use the same operating system, database management system, and data structures. They are therefore simple to handle.

2. Heterogeneous Database: In a heterogeneous distributed database, different locations may employ different software and schemas, which may cause problems with queries and transactions. Moreover, one site might not even be aware of the existence of the other sites. Different machines may use different operating systems and database applications. They could even employ different data models. Translations are therefore necessary for communication across the sites.

Data may be stored at several sites in two ways using distributed data storage:

1. Replication - With this strategy, the entire relation is stored redundantly at two or more sites. If the entire database is available at every site, it is a fully redundant database. With replication, systems therefore maintain copies of the data. This has advantages, since it makes data available at more sites and query requests can be handled in parallel. But there are drawbacks as well. Data must be kept up to date: every change made at one site must be recorded at every other site where that relation is stored, in order to avoid inconsistent results. This is a lot of overhead. Moreover, since concurrent access must now be checked across several sites, concurrency control becomes far more complicated.

2. Fragmentation - In this method, relations are broken up into smaller pieces (fragments), and each fragment is stored at the sites where it is needed. To ensure there is no data loss, the fragments must be created in a way that allows the original relation to be reconstructed. As fragmentation doesn't create duplicate data, consistency is not a concern.

Relations can be fragmented in one of two ways:

○ Horizontal fragmentation splits the relation by rows into groups of tuples, where each tuple is assigned to at least one fragment.

○ Vertical fragmentation, also known as splitting by columns, splits a relation's schema into smaller schemas. Each fragment must contain a common candidate key in order to guarantee a lossless join.

Sometimes a strategy that combines fragmentation and replication is employed.
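
Both kinds of fragmentation, and the reconstruction they must allow, can be sketched with plain Python lists of dictionaries standing in for relations. The `EMPLOYEE` relation, its columns, and the function names are hypothetical.

```python
# A small relation; emp_id is the candidate key shared by the vertical fragments.
EMPLOYEE = [
    {"emp_id": 1, "name": "Asha", "branch": "Delhi", "salary": 50000},
    {"emp_id": 2, "name": "Ben", "branch": "Mumbai", "salary": 60000},
    {"emp_id": 3, "name": "Chen", "branch": "Delhi", "salary": 55000},
]


def horizontal_fragment(relation, predicate):
    """Split by rows: keep the tuples that satisfy the predicate."""
    return [dict(row) for row in relation if predicate(row)]


def vertical_fragment(relation, key, columns):
    """Split by columns: keep the key plus the chosen attributes."""
    return [{col: row[col] for col in (key, *columns)} for row in relation]


def union_fragments(fragments, key):
    """Rebuild a relation from horizontal fragments (union of the rows)."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged[row[key]] = row
    return [merged[k] for k in sorted(merged)]


def join_fragments(left, right, key):
    """Rebuild a relation from vertical fragments (join on the common key)."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


delhi = horizontal_fragment(EMPLOYEE, lambda r: r["branch"] == "Delhi")
mumbai = horizontal_fragment(EMPLOYEE, lambda r: r["branch"] == "Mumbai")
personal = vertical_fragment(EMPLOYEE, "emp_id", ["name", "branch"])
payroll = vertical_fragment(EMPLOYEE, "emp_id", ["salary"])

print(union_fragments([delhi, mumbai], "emp_id") == EMPLOYEE)
print(join_fragments(personal, payroll, "emp_id") == EMPLOYEE)
```

Dropping `emp_id` from either vertical fragment would make the join impossible, which is why the common candidate key is required.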

Uses for distributed databases


○ Corporate management information systems.

○ Multimedia applications.

○ Hotel chains, military command systems, etc.

○ Production control systems.

Characteristics of distributed databases

The databases in a distributed collection are logically connected to one another, and they frequently form a single logical database. Data is physically stored across several sites and is managed separately at each one. The processors at each site are connected to one another through a network, but they are not set up for multiprocessing.

A widespread misunderstanding is that a distributed database is equivalent to a loosely coupled file system. In reality, it is considerably more complex than that. Although distributed databases use transaction processing, they are not the same thing as transaction processing systems.

Generally speaking, distributed databases have the following characteristics:

○ Location independence

○ Distributed query processing

○ Distributed transaction management

○ Hardware independence

○ Operating system independence

○ Network independence

○ Transaction transparency

○ DBMS independence

Architecture for a distributed database

Both homogeneous and heterogeneous distributed databases exist.

All of the physical sites in a homogeneous distributed database system use the same operating system and database software, as well as the same underlying hardware. Homogeneous distributed database systems can be significantly simpler to build and administer, since they appear to the user as a single system. For a distributed database system to be considered homogeneous, the data structures at each site must be either the same or compatible. The database software used at each site must also be the same or compatible.

In a heterogeneous distributed database, the hardware, operating systems, or database software may vary from site to site. Separate sites may employ different technologies and schemas, and a difference in schema can make query and transaction processing challenging.

Different nodes could have dissimilar hardware, software, and data structures, or they might be located at sites that are not compatible with one another. Users may be able to read data stored at a different site but not upload or modify it. Because heterogeneous distributed databases are often difficult to use, many organizations find them economically unviable.

Distributed databases' benefits

Using distributed databases has a lot of benefits.

○ Because distributed databases allow modular development, systems can be
enlarged by placing new computers and local data at a new location and
seamlessly connecting them to the distributed system.

○ With centralized databases, failures result in a total shutdown of the system.
Distributed database systems, however, continue to operate with lower
performance when a component fails until the issue is resolved.

○ If the data is located near where it is most often used, administrators can reduce
transmission costs for distributed database systems. Centralized systems are
unable to accommodate this.
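The second benefit above (continuing to operate when a component fails) can be sketched in Python. This is only an illustration under stated assumptions: the `Node` class, the `read_with_failover` function, and the node names are hypothetical and do not correspond to any particular distributed DBMS.

```python
class Node:
    """Hypothetical database site holding a local copy of some data."""

    def __init__(self, name, data, online=True):
        self.name = name
        self.data = data
        self.online = online

    def read(self, key):
        # An offline site cannot answer requests.
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.data[key]


def read_with_failover(nodes, key):
    """Try each site in order and return (site name, value) from the first that responds."""
    for node in nodes:
        try:
            return node.name, node.read(key)
        except ConnectionError:
            continue  # fall back to the next site (slower, but still working)
    raise RuntimeError("no site holding the data is available")


# The nearby site has failed, so the read is served by the remote site.
local = Node("local", {"account-1": 1200}, online=False)
remote = Node("remote", {"account-1": 1200})
served_by, value = read_with_failover([local, remote], "account-1")
```

In a centralized system there would be no second site to try, so the failure of the single node would stop every read.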

Types of Distributed Database


○ Replicated data is used to create instances of data in different parts of the
database. By using replicated data, distributed databases can access identical data
locally, which reduces bandwidth. Replicated data falls into two types: read-only
and writable.

○ With read-only replicated data, only the initial instance can be changed; all
subsequent replicas of the enterprise data are then updated to match. Writable data
can be modified, but only the initial instance is affected.

○ Horizontally fragmented data is identified by primary keys, each of which refers to
a single database record. Horizontal fragmentation is typically used when
business locations only need access to the data for their own branch.

○ Vertically fragmented data is organized using copies of the primary keys that are
available in every fragment and accessible to each branch of the database. Vertical
fragmentation is used when a company's branch and central location deal with the
same accounts in different ways.

○ Reorganised data is data that has been adjusted or altered for decision support
databases. It is generally used when two distinct systems handle transactions
and decision support. Decision support systems can be difficult to maintain, and
online transaction processing must be reconfigured when there are numerous
requests.

○ Separate schema data partitions the database and the software used to access it in
order to accommodate various departments and circumstances. Often, there is
overlap between the different databases and the separate schema data.
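Horizontal and vertical fragmentation, as described in the list above, can be illustrated with a short Python sketch. The account records, column names, and helper functions here are made up for illustration; a real distributed DBMS performs fragmentation inside the database engine rather than in application code.

```python
# Hypothetical account records held by the central database.
accounts = [
    {"id": 1, "branch": "north", "owner": "Asha", "balance": 1200},
    {"id": 2, "branch": "south", "owner": "Ben", "balance": 450},
    {"id": 3, "branch": "north", "owner": "Chen", "balance": 980},
]


def horizontal_fragment(rows, branch):
    """Keep whole rows, but only those that belong to one branch."""
    return [row for row in rows if row["branch"] == branch]


def vertical_fragment(rows, columns):
    """Keep every row, but only some columns; the primary key is copied into each fragment."""
    return [{"id": row["id"], **{c: row[c] for c in columns}} for row in rows]


def rejoin(left, right):
    """Rebuild full rows by joining two vertical fragments on the primary key."""
    right_by_id = {row["id"]: row for row in right}
    return [{**row, **right_by_id[row["id"]]} for row in left]


# Horizontal: the north branch sees only its own records.
north = horizontal_fragment(accounts, "north")

# Vertical: the branch and the central location keep different columns of the same accounts.
branch_view = vertical_fragment(accounts, ["branch", "owner"])
central_view = vertical_fragment(accounts, ["balance"])
```

Because both vertical fragments carry a copy of the primary key, `rejoin(branch_view, central_view)` reconstructs the original records.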

Distributed database examples

● Apache Ignite, Apache Cassandra, Apache HBase, Couchbase Server, Amazon
SimpleDB, Clusterpoint, and FoundationDB are just a few examples of the
numerous distributed databases available.

● Apache Ignite can store and process large data sets across clusters of nodes.
GridGain Systems released Ignite as open source in 2014, and it was later
accepted into the Apache Incubator program. In Apache Ignite, RAM serves as the
database's primary processing and storage layer.

● Apache Cassandra has its own query language, Cassandra Query Language (CQL),
and it supports clusters that span several locations. Replication strategies in
Cassandra can also be customized.

● Apache HBase offers a fault-tolerant mechanism to store huge amounts of
sparse data on top of the Hadoop Distributed File System. Moreover, it offers
per-column Bloom filters, in-memory execution, and compression. Although
Apache Phoenix offers a SQL layer for HBase, HBase is not meant to replace SQL
databases.

● Couchbase Server is a NoSQL software package best suited to interactive
applications that serve many concurrent users by creating, storing, retrieving,
aggregating, manipulating, and presenting data. To meet these varied application
demands, Couchbase Server provides scalable key-value and JSON document access.

● Amazon SimpleDB is used as a web service together with Amazon S3 and Amazon
Elastic Compute Cloud. With Amazon SimpleDB, developers can store and query data
with a minimum of database maintenance and administrative work.

● Clusterpoint is designed to remove the complexity, scalability problems, and
performance limitations of relational database designs. Data is handled in XML or
JSON format through open APIs. Because it is a schema-free document database,
Clusterpoint avoids the scalability and performance difficulties that relational
database systems experience.

● Data warehousing
● Data mining

Data Warehouse:
A Data Warehouse is a place where data is stored so that it can be usefully mined. It
behaves like a fast computer system with an exceptionally large data storage capacity.
Data from the organization's various systems is copied to the warehouse, where it can
be retrieved and conformed to remove errors. Advanced queries can then be run against
the data stored in the warehouse.

A data warehouse combines data from numerous sources, which helps ensure data quality,
accuracy, and consistency. It boosts system performance by separating analytics
processing from transactional databases. Data flows into a data warehouse from
different databases. A data warehouse works by organizing data into a schema that
describes the layout and types of the data. Query tools then examine the data tables
using the schema.

Data warehouses and databases are both relational data systems, but they are built to
serve different purposes. A data warehouse is built to store a huge amount of historical
data and to enable fast queries over all of that data, typically using Online Analytical
Processing (OLAP). A database is built to store current transactions and to allow quick
access to specific transactions for ongoing business processes, commonly known as
Online Transaction Processing (OLTP).
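The difference between the two access patterns can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration: the `sales` table and its rows are invented, and a single in-memory SQLite table stands in for what would normally be two separate systems (an operational database and a warehouse).

```python
import sqlite3

# One in-memory table stands in for both systems in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, year INTEGER, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (id, year, region, amount) VALUES (?, ?, ?, ?)",
    [
        (1, 2022, "north", 100.0),
        (2, 2022, "south", 250.0),
        (3, 2023, "north", 300.0),
        (4, 2023, "south", 50.0),
    ],
)

# OLTP-style access: fetch one specific transaction by its key.
one_sale = conn.execute(
    "SELECT region, amount FROM sales WHERE id = ?", (3,)
).fetchone()

# OLAP-style access: aggregate over all of the historical rows.
totals_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
```

The first query touches a single row, while the second scans the whole history, which is why the two workloads are usually kept on separate systems.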

Important Features of Data Warehouse

The Important features of Data Warehouse are given below:

1. Subject Oriented

A data warehouse is subject-oriented. It provides useful data about a subject instead of
the company's ongoing operations, and these subjects can be customers, suppliers,
marketing, product, promotion, etc. A data warehouse usually focuses on modeling and
analysis of data that helps the business organization to make data-driven decisions.

2. Time-Variant

The data in the data warehouse is tied to a specific period, so it provides
information from a historical point of view.

3. Integrated

A data warehouse is built by joining data from heterogeneous sources, such as
relational databases, flat files, etc.

4. Non-Volatile

This means that once data has been entered into the warehouse, it cannot be changed.

Advantages of Data Warehouse:

○ More accurate data access

○ Improved productivity and performance


○ Cost-efficient

○ Consistent and quality data

Data Mining:
Data mining refers to the analysis of data. It is the computer-supported process of
analyzing huge sets of data that have either been compiled by computer systems or
have been downloaded into the computer. In the data mining process, the computer
analyzes the data and extracts useful information from it. It looks for hidden patterns
within the data set and tries to predict future behavior. Data mining is primarily used
to discover and indicate relationships among the data sets.

Data mining aims to enable business organizations to view business behaviors, trends,
and relationships that allow the business to make data-driven decisions. It is also
known as Knowledge Discovery in Databases (KDD). Data mining tools use AI, statistics,
databases, and machine learning systems to discover the relationships within the
data. Data mining tools can answer business-related questions that were traditionally
too time-consuming to resolve.
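As a minimal sketch of what "looking for hidden patterns" can mean, the Python example below counts which pairs of products are bought together most often. The shopping baskets, the `frequent_pairs` function, and the `min_support` threshold are all hypothetical; real data mining tools use far more sophisticated algorithms over much larger data sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def frequent_pairs(baskets, min_support):
    """Return the item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a fixed order, so (bread, milk) is counted once per basket.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(baskets, min_support=2)
```

With this data, bread and milk appear together in three of the four baskets, and no other pair appears together more than once.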

Important features of Data Mining:

The important features of Data Mining are given below:

○ It uses automated discovery of patterns.

○ It predicts the expected results.

○ It focuses on large data sets and databases.

○ It creates actionable information.

Advantages of Data Mining:

i. Market Analysis:

Data mining can predict market behavior, which helps the business to make decisions.
For example, it predicts who is likely to purchase what type of products.

ii. Fraud detection:

Data mining methods can help to find which cellular phone calls, insurance claims, and
credit or debit card purchases are likely to be fraudulent.

iii. Financial Market Analysis:

Data mining techniques are widely used to help model financial markets.

iv. Trend Analysis:

Analyzing the current trends in the marketplace is a strategic benefit because it
helps to reduce costs and to align the manufacturing process with market demand.
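The fraud detection use case above can be illustrated with a very simple statistical check in Python: flag purchases whose amount lies unusually far from the average. The purchase amounts, the `flag_outliers` function, and the threshold of 2 standard deviations are assumptions made for this sketch; production fraud detection relies on much richer models and features.

```python
from statistics import mean, pstdev


def flag_outliers(amounts, threshold=2.0):
    """Return the amounts that lie more than `threshold` standard deviations from the mean."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all amounts are identical, so nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Hypothetical card purchases: one amount is far larger than the rest.
purchases = [20, 25, 22, 19, 24, 21, 23, 500]
suspicious = flag_outliers(purchases)
```

Here only the purchase of 500 is flagged; it would then be passed on for closer review rather than treated as proven fraud.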

Differences between Data Mining and Data Warehousing:

Data Mining                       Data Warehousing

Data mining is the process of     A data warehouse is a database system
determining data patterns.        designed for analytics.

Data mining is generally          Data warehousing is the process of
considered as the process of      combining all the relevant data.
extracting useful data from a
large set of data.

Business entrepreneurs carry out  Data warehousing is entirely carried out
data mining with the help of      by the engineers.
engineers.

In data mining, data is           In data warehousing, data is stored
analyzed repeatedly.              periodically.

Data mining uses pattern          Data warehousing is the process of
recognition techniques to         extracting and storing data to allow
identify patterns.                easier reporting.

One of the most useful data       One of the advantages of the data
mining techniques is the          warehouse is its ability to update
detection and identification of   frequently. That is why it is ideal for
unwanted errors that occur in     business entrepreneurs who want to stay
the system.                       up to date with the latest information.

Data mining techniques are        The responsibility of the data warehouse
cost-efficient compared to other  is to simplify every type of business
statistical data applications.    data.

Data mining techniques are not    In the data warehouse, there is a high
100 percent accurate, which may   possibility that the data required for
lead to serious consequences in   analysis by the company may not be
certain conditions.               integrated into the warehouse. It can
                                  simply lead to loss of data.

Companies can benefit from this   A data warehouse stores a huge amount
analytical tool by equipping      of historical data that helps users to
suitable and accessible           analyze different periods and trends to
knowledge-based data.             make future predictions.
