B SC (IT) VI-DSE3-M5


Department of B.Sc. (IT), DSPMU, RANCHI

Module 4 [Data Mining]

Outline of this Module:


 Overview of Data Mining.
 Definition of the Knowledge Discovery (KDD) Process.
 OLAP and OLTP.
 Aspects of Data Mining: Association Rules, Outlier Analysis, and Predictive Analytics.
 Concept of Data Mining in a Data Warehouse environment.

On Completion, the student will be able to:


 Understand the definition of data mining.
 Describe KDD and the KDD process.
 Narrate the key differences between data mining and data warehousing.
 List and discuss various data mining tools.
 Discuss various data mining applications.
Overview of Data Mining

Data mining is a technical process by which consistent patterns are identified, explored, sorted, and organized. Data mining provides two primary advantages:

1. To give businesses the predictive power to estimate the unknown or future values and
2. To provide businesses the descriptive power by finding interesting patterns in the data.

With data mining tools, organizations of any size can extract valuable insights from their datasets, including information about consumers, costs, and future trends. This process can be employed to:

A. Answer business questions that were traditionally too time-consuming to address.
B. Make knowledge-driven decisions based on the best available data.

DATA MINING AND ITS BENEFITS

Data mining, also known as knowledge discovery in databases (KDD), is the process of uncovering patterns and other valuable information from large data sets. Given the evolution of data warehousing technology and the growth of big data, adoption of data mining techniques has accelerated rapidly over the last couple of decades, assisting companies by transforming their raw data into useful knowledge. However, although the technology continuously evolves to handle data at large scale, leaders still face challenges with scalability and automation.

Data mining has improved organizational decision-making through insightful data analyses. The data mining techniques that underpin these analyses serve two main purposes: they can either describe the target dataset or predict outcomes through the use of machine learning algorithms. These methods are used to organize and filter data, surfacing the most interesting information, from fraud detection to user behaviors, bottlenecks, and even security breaches.

The purposes of mining data are manifold and include:

 Predicting various outcomes.
 Modeling the target audience.
 Collecting information about products.


With this technique, we analyze data and then convert it into meaningful information. This helps businesses make accurate and better decisions. It helps develop smart marketing decisions, run accurate campaigns, make predictions, and more. With the help of data mining, we can analyze customer behaviors and gain insights into them. This leads to great success and a data-driven business.

In order to gain a thorough understanding of customer behavior, data mining helps to extract patterns from databases. The knowledge thus acquired allows companies to offer the best possible services.

Benefits of Data Mining


Data mining provides us with the means of resolving problems in this challenging information age.
Data mining benefits include:

 It helps companies gather reliable information.
 It is an efficient, cost-effective solution compared to other data applications.
 It helps businesses make profitable production and operational adjustments.
 Data mining uses both new and legacy systems.
 It helps businesses make informed decisions.
 It helps detect credit risks and fraud.
 It helps data scientists analyze enormous amounts of data quickly.
 Data scientists can use the information to detect fraud, build risk models, and improve product safety.
 It helps data scientists quickly initiate automated predictions of behaviors and trends and discover hidden patterns.

Types of Data that can be Mined

Data mining is not specific to one type of media or data. Data mining should be applicable to any
kind of information repository. However, algorithms and approaches may differ when applied to
different types of data. Indeed, the challenges presented by different types of data vary significantly.
Data mining is being put into use and studied for databases, including relational databases, object-
relational databases and object-oriented databases, data warehouses, transactional databases,
unstructured and semi-structured repositories such as the World Wide Web, social media data,
advanced databases such as spatial databases, multimedia databases, time-series databases and
textual databases, and even flat files.

KDD- Knowledge Discovery in Databases

The term KDD stands for Knowledge Discovery in Databases. It refers to the broad procedure of
discovering knowledge in data and emphasizes the high-level applications of specific Data Mining
techniques. It is a field of interest to researchers in various fields, including artificial intelligence,
machine learning, pattern recognition, databases, statistics, knowledge acquisition for expert
systems, and data visualization.

The main objective of the KDD process is to extract information from data in the context of large
databases. It does this by using Data Mining algorithms to identify what is deemed knowledge.

Knowledge Discovery in Databases is considered an automated, exploratory analysis and modeling of vast data repositories. KDD is the organized procedure of recognizing valid, useful, and understandable patterns in huge and complex data sets. Data mining is the root of the KDD procedure, involving the algorithms that investigate the data, develop the model, and find previously unknown patterns. The model is used to extract knowledge from the data, analyze the data, and make predictions.

The availability and abundance of data today make knowledge discovery and data mining matters of great significance and need. Given the recent development of the field, it is not surprising that a wide variety of techniques is now accessible to specialists and experts.

The KDD Process

The knowledge discovery process is iterative and interactive, comprising nine steps. The process is iterative at each stage, implying that moving back to previous steps might be required. The process has many imaginative aspects, in the sense that one cannot present a single formula or a complete scientific categorization of the correct decisions for each step and application type. Thus, it is necessary to understand the process and the different requirements and possibilities at each stage.

KDD (Knowledge Discovery in Databases) involves the extraction of useful, previously unknown, and potentially valuable information from large datasets. It is an iterative process that may require multiple passes through the steps below to extract accurate knowledge from the data. The following steps are included in the KDD process:

1. Building up an understanding of the application domain

This is the initial preliminary step. It sets the scene for understanding what should be done with the various decisions (transformation, algorithms, representation, etc.). The individuals in charge of a KDD venture need to understand and characterize the objectives of the end user and the environment in which the knowledge discovery process will occur.

2. Choosing and creating a data set on which discovery will be performed

Once the objectives are defined, the data that will be utilized for the knowledge discovery process should be determined. This incorporates discovering what data is accessible, obtaining important data, and afterward integrating all the data for knowledge discovery into one data set, including the attributes that will be considered for the process. This step is important because data mining learns and discovers from the accessible data, which is the evidence base for building the models. If some significant attributes are missing, the entire study may be unsuccessful; in this respect, the more attributes that are considered, the better. On the other hand, organizing, collecting, and operating advanced data repositories is expensive, so there is a trade-off against the opportunity for best understanding the phenomena. This trade-off is one place where the interactive and iterative nature of KDD comes into play: one begins with the best available data sets and later expands, observing the impact in terms of knowledge discovery and modelling.

3. Preprocessing and cleansing

In this step, data reliability is improved. It incorporates data cleaning, for example, handling missing values and removing noise or outliers. This may involve complex statistical techniques or the use of a data mining algorithm. For example, when one suspects that a specific attribute is of insufficient reliability or has many missing values, this attribute could become the target of a supervised data mining algorithm: a prediction model for the attribute is created, and the missing values can then be predicted. The extent to which one pays attention to this step depends on many factors. Regardless, studying these aspects is significant and often revealing in itself with respect to enterprise data frameworks.
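
For illustration, here is a minimal sketch of that idea, assuming the pandas and scikit-learn libraries (the attributes income, visits, and age are invented for the example): a model is trained on the rows where the unreliable attribute is present and then used to predict its missing values.

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Hypothetical customer records; 'age' has missing values to fill in.
    df = pd.DataFrame({"income": [30, 45, 60, 52, 38],
                       "visits": [2, 5, 8, 6, 3],
                       "age":    [25, 40, None, 48, None]})

    known = df[df["age"].notna()]      # rows where the attribute is present
    missing = df[df["age"].isna()]     # rows whose attribute must be predicted

    # Supervised model: predict the unreliable attribute from the others.
    model = LinearRegression().fit(known[["income", "visits"]], known["age"])
    df.loc[df["age"].isna(), "age"] = model.predict(missing[["income", "visits"]])
    print(df)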

4. Data Transformation

In this stage, appropriate data for data mining is prepared and developed. Techniques here incorporate dimension reduction (for example, feature selection, feature extraction, and record sampling) and attribute transformation (for example, discretization of numerical attributes and functional transformations). This step can be essential for the success of the entire KDD project, and it is typically very project-specific. For example, in medical assessments, the ratio between attributes may often be the most significant factor, rather than each attribute by itself. In business, we may need to consider effects beyond our control as well as efforts and transient issues, for example, studying the accumulated impact of advertising. However, if we do not utilize the right transformation at the start, we may obtain a surprising effect that hints at the transformation required in the next iteration. Thus, the KDD process feeds back on itself and prompts an understanding of the transformations required.
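
As one concrete example of attribute transformation, the following sketch (assuming the pandas library; the ages and bin edges are invented) discretizes a numerical attribute into ordered categories:

    import pandas as pd

    # Discretization: turn a continuous attribute into ordered bands.
    ages = pd.Series([22, 35, 47, 51, 64, 78])
    bands = pd.cut(ages, bins=[0, 30, 50, 70, 120],
                   labels=["young", "middle", "senior", "elderly"])
    print(bands)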

5. Prediction and description


We are now prepared to decide which kind of data mining to use, for example, classification, regression, or clustering. This mainly depends on the KDD objectives and on the previous steps. There are two major objectives in data mining: the first is prediction, and the second is description. Prediction is often referred to as supervised data mining, while descriptive data mining incorporates the unsupervised and visualization aspects of data mining. Most data mining techniques depend on inductive learning, where a model is built explicitly or implicitly by generalizing from an adequate number of training examples. The fundamental assumption of the inductive approach is that the trained model applies to future cases. The technique also takes into account the level of meta-learning for the specific set of accessible data.

6. Selecting the Data Mining algorithm

Having chosen the technique, we now decide on the tactics. This stage involves choosing a specific method to be used for searching for patterns, from among multiple inducers. For example, considering precision versus understandability, the former is better with neural networks, while the latter is better with decision trees. For each strategy of meta-learning, there are several possibilities for how it can be applied. Meta-learning focuses on clarifying what causes a data mining algorithm to be successful or not on a specific problem. Thus, this methodology attempts to understand the conditions under which a data mining algorithm is most suitable. Each algorithm has parameters and strategies of learning, such as ten-fold cross-validation or another division of the data for training and testing.
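
A minimal sketch of this comparison, assuming the scikit-learn library (its iris sample dataset stands in for project data), evaluates a decision tree and a neural network with ten-fold cross-validation:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Ten-fold cross-validation of two candidate inducers.
    for name, clf in [("decision tree", DecisionTreeClassifier()),
                      ("neural network", MLPClassifier(max_iter=2000))]:
        scores = cross_val_score(clf, X, y, cv=10)
        print(name, "mean accuracy:", round(scores.mean(), 3))

The tree's structure can be inspected and explained, while the network's weights cannot, which is exactly the precision-versus-understandability trade-off described above.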

7. Utilizing the Data Mining algorithm

At last, the implementation of the data mining algorithm is reached. In this stage, we may need to run the algorithm several times until a satisfying outcome is obtained, for example, by tuning the algorithm's control parameters, such as the minimum number of instances in a single leaf of a decision tree.
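
For example, a minimal sketch (again assuming scikit-learn and a sample dataset) that re-runs a decision tree while tuning exactly that control parameter:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Re-run the algorithm with different minimum leaf sizes.
    for leaf in [1, 5, 10, 20]:
        tree = DecisionTreeClassifier(min_samples_leaf=leaf)
        score = cross_val_score(tree, X, y, cv=10).mean()
        print("min_samples_leaf =", leaf, "accuracy:", round(score, 3))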

8. Pattern Evaluation

In this step, we assess and interpret the mined patterns and rules with respect to the objectives characterized in the first step. Here we consider the preprocessing steps in terms of their impact on the data mining algorithm's results, for example, adding a feature in step 4 and repeating from there. This step focuses on the comprehensibility and utility of the induced model. The identified knowledge is also recorded for further use, together with overall feedback on the discovery results acquired by data mining.

9. Presentation of discovered knowledge

Now we are prepared to incorporate the knowledge into another system for further activity. The knowledge becomes effective in the sense that we may make changes to the system and measure the impacts. The accomplishment of this step determines the effectiveness of the whole KDD process. There are numerous challenges in this step, such as losing the "laboratory conditions" under which we have worked. For example, the knowledge was discovered from a certain static snapshot (usually a fixed set of data), but now the data becomes dynamic. Data structures may change (certain quantities may become unavailable), and the data domain might be modified, such as an attribute taking a value that was not expected previously.

Advantages of KDD

 Improves decision-making: KDD provides valuable insights and knowledge that can help
organizations make better decisions.
 Increased efficiency: KDD automates repetitive and time-consuming tasks and makes the
data ready for analysis, which saves time and money.
 Better customer service: KDD helps organizations gain a better understanding of their
customers’ needs and preferences, which can help them provide better customer service.
 Fraud detection: KDD can be used to detect fraudulent activities by identifying patterns and
anomalies in the data that may indicate fraud.


 Predictive modeling: KDD can be used to build predictive models that can forecast future
trends and patterns.

Disadvantages of KDD

 Privacy concerns: KDD can raise privacy concerns as it involves collecting and analyzing large
amounts of data, which can include sensitive information about individuals.
 Complexity: KDD can be a complex process that requires specialized skills and knowledge to
implement and interpret the results.
 Unintended consequences: KDD can lead to unintended consequences, such as bias or
discrimination, if the data or models are not properly understood or used.
 Data quality: The KDD process depends heavily on the quality of the data; if the data is not accurate or consistent, the results can be misleading.
 High cost: KDD can be an expensive process, requiring significant investments in hardware, software, and personnel.
 Overfitting: The KDD process can lead to overfitting, a common problem in machine learning where a model learns the detail and noise in the training data to the extent that it negatively impacts the model's performance on new, unseen data (see the sketch after this list).
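
The following minimal sketch, assuming the scikit-learn library and one of its sample datasets, shows how overfitting is commonly detected: an unconstrained decision tree scores perfectly on the data it was trained on but noticeably worse on data it has never seen.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # An unconstrained tree memorizes the training data, noise included.
    tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("train accuracy:", tree.score(X_train, y_train))  # typically 1.0
    print("test accuracy: ", tree.score(X_test, y_test))    # noticeably lower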

OLTP AND OLAP

Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) are two terms that look similar but refer to different kinds of systems. Online transaction processing (OLTP) captures, stores, and processes data from transactions in real time. Online analytical processing (OLAP) uses complex queries to analyze aggregated historical data from OLTP systems.

1. Online Transaction Processing (OLTP)


An OLTP system captures and maintains transaction data in a database. Each transaction
involves individual database records made up of multiple fields or columns. Examples
include banking and credit card activity or retail checkout scanning.
In OLTP, the emphasis is on fast processing, because OLTP databases are read, written, and
updated frequently. If a transaction fails, built-in system logic ensures data integrity.

2. Online Analytical Processing (OLAP)


OLAP applies complex queries to large amounts of historical data, aggregated from OLTP
databases and other sources, for data mining, analytics, and business intelligence projects.
In OLAP, the emphasis is on response time to these complex queries. Each query involves
one or more columns of data aggregated from many rows.

Examples include year-over-year financial performance or marketing lead generation trends.


OLAP databases and data warehouses give analysts and decision-makers the ability to use custom reporting tools to turn data into information. Query failure in OLAP does not interrupt or delay transaction processing for customers, but it can delay or impact the accuracy of business intelligence insights. OLTP is operational, while OLAP is informational. A glance at the key features of both kinds of processing illustrates their fundamental differences and how they work together; the sketch below contrasts the two workloads.
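
A minimal sketch of the contrast, using Python's built-in sqlite3 module (the table and figures are invented for the example): OLTP work is many small writes, while OLAP work is one large aggregating read.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY,"
                " item TEXT, amount REAL, sold_on TEXT)")

    # OLTP-style work: short, fast INSERT/UPDATE transactions.
    con.execute("INSERT INTO sales (item, amount, sold_on) VALUES (?, ?, ?)",
                ("milk", 2.50, "2023-01-15"))
    con.commit()

    # OLAP-style work: a SELECT that aggregates many rows for reporting.
    for row in con.execute("SELECT item, COUNT(*) AS txns, SUM(amount) AS revenue"
                           " FROM sales GROUP BY item ORDER BY revenue DESC"):
        print(row)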


Differences between OLTP and OLAP


Characteristics: OLTP handles a large number of small transactions; OLAP handles large volumes of data with complex queries.
Query types: OLTP uses simple, standardized queries; OLAP uses complex queries.
Operations: OLTP is based on INSERT, UPDATE, and DELETE commands; OLAP is based on SELECT commands to aggregate data for reporting.
Response time: OLTP responds in milliseconds; OLAP takes seconds, minutes, or hours depending on the amount of data to process.
Design: OLTP systems are industry-specific, such as retail, manufacturing, or banking; OLAP systems are subject-specific, such as sales, inventory, or marketing.
Source: OLTP works on transactions; OLAP works on aggregated data from transactions.
Purpose: OLTP controls and runs essential business operations in real time; OLAP is used to plan, solve problems, support decisions, and discover hidden insights.
Data updates: OLTP receives short, fast updates initiated by users; OLAP data is periodically refreshed with scheduled, long-running batch jobs.
Space requirements: OLTP databases are generally small if historical data is archived; OLAP databases are generally large due to aggregating large datasets.
Backup and recovery: OLTP requires regular backups to ensure business continuity and meet legal and governance requirements; in OLAP, lost data can be reloaded from the OLTP database as needed in lieu of regular backups.
Productivity: OLTP increases the productivity of end users; OLAP increases the productivity of business managers, data analysts, and executives.
Data view: OLTP lists day-to-day business transactions; OLAP gives a multi-dimensional view of enterprise data.
User examples: OLTP serves customer-facing personnel, clerks, and online shoppers; OLAP serves knowledge workers such as data analysts, business analysts, and executives.
Database design: OLTP uses normalized databases for efficiency; OLAP uses denormalized databases for analysis.

Note: OLTP provides an immediate record of current business activity, while OLAP
generates and validates insights from that data as it’s compiled over time. That historical
perspective empowers accurate forecasting, but as with all business intelligence, the
insights generated with OLAP are only as good as the data pipeline from which they
emanate.

DATA MINING TECHNIQUES


Data mining is most useful in identifying data patterns and deriving useful business insights from those patterns. To accomplish these tasks, data miners use a variety of techniques to generate different results. Here are six common data mining techniques:

1. Association Rule Learning


This function seeks to uncover the relationships between data points; it is used to determine
whether a specific action or variable has any traits that can be linked to other actions (e.g.,
business travelers’ room choices and dining habits). A hotelier might use association rule
insights to offer room upgrades or food and beverage promotions to attract additional
business travelers.
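A minimal sketch of mining such rules, assuming the third-party mlxtend package (the guest baskets are invented for the example):

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    # Each inner list is one guest's "basket" of purchased services.
    baskets = [["room_upgrade", "dinner", "spa"],
               ["dinner", "spa"],
               ["room_upgrade", "dinner"],
               ["room_upgrade", "dinner", "breakfast"]]

    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(baskets).transform(baskets),
                          columns=te.columns_)

    frequent = apriori(onehot, min_support=0.5, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
    print(rules[["antecedents", "consequents", "support", "confidence"]])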
2. Anomaly or Outlier Detection
In addition to searching for patterns, data mining seeks to uncover unusual data within a set.
Anomaly detection is the process of finding data that doesn’t conform to the pattern. This
process can help find instances of fraud and help retailers learn more about spikes, or
declines, in the sales of certain products.
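A minimal sketch of outlier detection, assuming scikit-learn's IsolationForest (the sales figures are invented, with a spike and a collapse planted as anomalies):

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Daily sales with two planted anomalies: a spike and a collapse.
    sales = np.array([[200], [210], [195], [205], [1200], [198], [15]])

    detector = IsolationForest(contamination=0.3, random_state=0)
    labels = detector.fit_predict(sales)  # -1 marks an outlier, 1 normal
    print(labels)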
3. Prediction
Data prediction is a two-step process, similar to that of data classification. However, for prediction, we do not use the term "class label attribute" because the attribute for which values are being predicted is continuously valued (ordered) rather than categorical (discrete-valued and unordered). The attribute can be referred to simply as the predicted attribute. Prediction can be viewed as the construction and use of a model to assess the class of an unlabelled object, or to assess the value or value ranges of an attribute that a given object is likely to have.
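A minimal sketch of the distinction, assuming scikit-learn (the visit counts and spend values are invented): the predicted attribute here is a continuous number, not a class label.

    from sklearn.linear_model import LinearRegression

    # The predicted attribute (spend) is continuous-valued, not categorical.
    visits = [[1], [3], [5], [7], [9]]
    spend = [12.0, 30.5, 51.0, 70.2, 88.9]

    model = LinearRegression().fit(visits, spend)
    print(model.predict([[6]]))  # estimated spend for an unseen customer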
4. Classification Analysis
With this technique, data points are assigned to groups, or classes, based on a specific
question or problem to address. For instance, if a consumer packaged goods company wants
to optimize its coupon discount strategy for a specific product, it might review inventory
levels, sales data, coupon redemption rates, and consumer behavioral data in order to make
the best decision possible.
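A minimal sketch of classification, assuming scikit-learn (the coupon features and labels are invented): each data point is assigned to one of two predefined classes.

    from sklearn.linear_model import LogisticRegression

    # Hypothetical features: [inventory_level, past_redemption_rate]
    X = [[120, 0.10], [80, 0.35], [200, 0.05], [60, 0.50], [150, 0.20]]
    y = [0, 1, 0, 1, 0]  # class labels: 1 = shopper redeemed the coupon

    clf = LogisticRegression().fit(X, y)
    print(clf.predict([[70, 0.40]]))  # predicted class for a new shopper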
5. Clustering Analysis
Clustering looks for similarities within a data set, separating data points that share common
traits into subsets. This is similar to the classification type of analysis in that it groups data
points, but, in clustering analysis, the data is not assigned to previously defined groups.
Clustering is useful for defining traits within a data set, such as the segmentation of
customers based on purchase behavior, need state, life stage, or likely preferences in
marketing communication.
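A minimal sketch of clustering-based segmentation, assuming scikit-learn (the customer figures are invented): note that, unlike classification, no group labels are supplied.

    from sklearn.cluster import KMeans

    # Hypothetical customers: [annual_spend, purchase_frequency]
    customers = [[500, 2], [520, 3], [4800, 40],
                 [5100, 38], [2500, 15], [2400, 18]]

    # No predefined groups: k-means discovers the segments itself.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
    print(kmeans.labels_)  # segment index assigned to each customer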
6. Regression Analysis
Regression analysis is about understanding which factors within a data set are most important, which can be ignored, and how these factors interact. With this technique, data miners are able to validate theories such as "when a lot of snow is predicted, more bread and milk will be sold before the storm." While this seems obvious enough, there are a number of variables that need to be verified and quantified for the store manager to make sure enough stock is available. For example, how much is "a lot" of snow? How much is "more milk and bread"? Which types of weather forecasts tend to cause consumer action, and how many days before the storm will consumers start buying? What is the relationship between inches of snow, units of bread, and units of milk? Through regression analysis, specific inventory levels of milk and bread (in units or cases) can be recommended for specific levels of snow forecast (in inches) at specific points in time (days before the storm). In this way, regression analysis maximizes sales, minimizes out-of-stock instances, and helps avoid overstocking, which results in product spoilage after the storm.
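A minimal sketch of the snow example, assuming scikit-learn (all figures invented): fitting bread sales against the forecast quantifies the units sold per inch of snow and supports a stocking recommendation.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical history: [inches of snow forecast, days before storm]
    X = np.array([[2, 3], [6, 2], [10, 1], [4, 2], [12, 1], [8, 3]])
    bread_units = np.array([120, 310, 520, 200, 610, 380])

    model = LinearRegression().fit(X, bread_units)
    print("bread units per inch of snow:", model.coef_[0])
    print("effect of each extra day of lead time:", model.coef_[1])
    print("stock for 9 inches, 2 days out:", model.predict([[9, 2]]))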

Concept of Data Mining in a Data Warehouse Environment


A data warehouse is a technique for collecting and managing data from varied sources to provide meaningful business insights. It is a blend of technologies and components that allows the strategic use of data. A data warehouse is a repository of a large amount of data, designed for query and analysis rather than transaction processing. Data warehousing is a process of transforming data into information and making it available to users for analysis.

Data mining, on the other hand, is the search for hidden, valid, and potentially useful patterns in huge data sets. Data mining is all about discovering unsuspected, previously unknown relationships amongst the data. It is a multi-disciplinary skill that uses machine learning, statistics, AI, and database technology. The insights extracted via data mining can be used for marketing, fraud detection, scientific discovery, and more.

Data Mining vs. Data Warehouse

 Data mining is the process of analyzing data for unknown patterns, whereas a data warehouse is a database system designed for analytical rather than transactional work.
 Data mining is a method of examining large amounts of data to find the right patterns, whereas data warehousing is a method of centralizing data from different sources into one common repository.
 Data mining is usually done by business users with the assistance of engineers, whereas data warehousing is a process that needs to occur before any data mining can take place.
 Data mining is considered the process of extracting data from large data sets, whereas data warehousing is the process of pooling all relevant data together.
 One of the most important benefits of data mining techniques is the detection and identification of errors in the system; one of the pros of a data warehouse is its ability to update consistently, which is why it is ideal for business owners who want the best and latest features.
 Data mining helps to create suggestive patterns of important factors, like the buying habits of customers, products, and sales, so that companies can make the necessary adjustments in operations and production; a data warehouse adds extra value to operational business systems, such as CRM systems, when the warehouse is integrated.
 Data mining techniques are never 100% accurate and may cause serious consequences in certain conditions; with a data warehouse, there is a real chance that data required for analysis by the organization was never integrated into the warehouse, which can easily lead to loss of information.
 The information gathered through data mining can be misused against a group of people; data warehouses are created as huge IT projects and therefore involve high-maintenance systems, which can impact the revenue of medium- to small-scale organizations.
 After successful initial queries, users may ask more complicated queries, which increases the workload; a data warehouse is complicated to implement and maintain.
 Organizations can benefit from this analytical tool by being equipped with pertinent and usable knowledge-based information; a data warehouse stores a large amount of historical data, which helps users analyze different time periods and trends to make future predictions.
 Organizations need to spend considerable resources on training and implementation, and data mining tools work in different manners due to the different algorithms employed in their design; in a data warehouse, data is pooled from multiple sources and needs to be cleaned and transformed, which can be a challenge.
 Data mining methods are cost-effective and efficient compared to other statistical data applications; a data warehouse's responsibility is to simplify every type of business data, and most of the work on the user's part is inputting the raw data.
 Another critical benefit of data mining techniques is the identification of errors that can lead to losses, for example, using generated data to detect a drop in sales; a data warehouse allows users to access critical data from a number of sources in a single place, saving users time in retrieving data from multiple sources.
 Data mining helps to generate actionable strategies built on data insights; once you input information into a data warehouse system, you are unlikely to lose track of it again, and a quick search helps you find the right statistical information.

DATA MINING TOOLS

Data mining techniques utilize domain knowledge from statistical analysis, artificial intelligence, and database systems in order to analyze data properly from different dimensions and perspectives. Data mining tools discover patterns or trends in large data sets and transform data into useful information for making decisions. The most popular data mining tools include Orange, SAS, Rattle, RapidMiner, DataMelt, and Oracle BI.

1. Orange Data Mining


Developed at the bioinformatics laboratory of the Faculty of Computer and Information Science, University of Ljubljana, Slovenia, Orange is a machine learning and data mining software suite. It supports data visualization and is component-based software written in Python. The components of Orange are called "widgets". These widgets range from data visualization and preprocessing to the assessment of algorithms, and are used for predictive modeling.
Widgets deliver important functionality such as:
a. Displaying data tables and allowing feature selection
b. Reading data
c. Training predictors and comparing learning algorithms
d. Visualizing data elements, etc.
Data coming into Orange is quickly formatted to the desired pattern, and the widgets can be easily transferred wherever and whenever needed. Orange is engaging for users, allowing them to make smarter decisions in a short time by rapidly comparing and analyzing data. Data mining can be performed via visual programming or Python scripting. Many analyses are feasible through its visual programming interface (drag and drop of connected widgets), and many visual tools are supported, such as bar charts, scatter plots, trees, dendrograms, and heat maps. A substantial number of widgets (more than 100) are supported. The tool has machine learning components, add-ons for bioinformatics and text mining, and is packed with features for data analytics. It can also be used as a Python library.
Orange comprises a canvas interface onto which the user places widgets and creates a data analysis workflow. Widgets provide fundamental operations, for example, reading data, showing a data table, selecting features, training predictors, comparing learning algorithms, and visualizing data elements. Orange comes with multiple regression and classification algorithms, and it can read documents in native and other data formats. Orange is dedicated to machine learning techniques for classification, or supervised data mining. There are two types of objects used in classification: learners and classifiers. Learners take class-labeled data and return a classifier. Regression methods are very similar to classification in Orange; both are designed for supervised data mining and require class-labeled data. Ensemble learning combines the predictions of individual models for gains in precision; the models can either come from different training data or use different learners on the same data set.
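A minimal sketch of that scripting use, assuming the Orange 3 Python package and its bundled iris sample data:

    import Orange

    data = Orange.data.Table("iris")               # bundled sample dataset
    learner = Orange.classification.TreeLearner()  # a learner takes data...
    model = learner(data)                          # ...and returns a classifier
    print(model(data[:5]))                         # predictions for first rows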

2. SAS Data Mining


SAS stands for Statistical Analysis System. It is a product of the SAS Institute created for analytics and data management. SAS can mine data, change it, manage information from various sources, and analyze statistics. It offers a graphical UI for non-technical users. The SAS data mining tool allows users to analyze big data and provides accurate insight for timely decision-making purposes. SAS has a distributed memory processing architecture that is highly scalable. It is suitable for data mining, optimization, and text mining purposes.
3. Rattle Data Mining
Rattle is a popular GUI for data mining using R. It presents statistical and visual summaries of data, transforms data so that it can be readily modelled, builds both unsupervised and supervised machine learning models from the data, presents the performance of models graphically, and scores new datasets for deployment into production. Rattle is free open-source software. A key feature is that all of your interactions through the graphical user interface are captured as an R script that can be executed in R independently of the Rattle interface. Use it as a tool to learn and develop your skills in R, and then to build your initial models in Rattle, which can later be tuned in R, where considerably more powerful options are available.
4. RapidMiner
RapidMiner is one of the most popular tools for performing predictions. It is written in the Java programming language. It offers an integrated environment for text mining, deep learning, machine learning, and predictive analysis.
The tool can be used for a wide range of applications, including company applications, commercial applications, research, education, training, application development, and machine learning.
RapidMiner provides a server on-site as well as in public or private cloud infrastructure. It is based on a client/server model. RapidMiner comes with template-based frameworks that enable fast delivery with few errors.
5. DataMelt
DataMelt is a free-to-use tool for numeric computation, mathematics, data analysis, and data visualization. This program offers the simplicity of scripting languages such as Python, Ruby, and Groovy combined with the power of hundreds of Java packages.
Features:
1. DataMelt offers statistics, analysis of large data volumes, and scientific visualization.
2. You can use it with different programming languages on different operating systems.
3. It allows you to create high-quality vector-graphics images (EPS, SVG, PDF, etc.), which can be included in LaTeX and other text processors.
4. DataMelt offers the use of scripting languages, which are significantly faster than the standard Python implemented in C.

APPLICATIONS OF DATA MINING

Data mining is primarily used today by companies with a strong consumer focus, in areas like retail, financial services, communication, and marketing, to "drill down" into their transactional data and determine pricing, customer preferences, product positioning, impact on sales, customer satisfaction, and corporate profits. With data mining, a retailer can use point-of-sale records of customer purchases to develop products and promotions that appeal to specific customer segments.

Following are some of the applications of data mining:

1. Basket Analysis
In its most basic application, retailers use basket analysis to analyze what consumers buy (or put in their "baskets"). This is a form of the association technique, giving retailers insight into buying habits and allowing them to recommend other purchases. A less familiar application is one used by law enforcement, where vast amounts of anonymous consumer data are analyzed in search of combinations of products one would use in bomb-making or the production of methamphetamine.
2. Sales Forecasting
Sales forecasting is a form of predictive analysis to which businesses are devoting more of their budgets. Data mining can help businesses project sales and set targets by examining historical data such as sales records, financial indicators (e.g., consumer price index and inflation markers), consumer spending habits, sales attributed to a specific time of year, and trends that may impact standard assumptions about the business.
3. Bioinformatics
Data mining approaches seem ideally suited to bioinformatics, since the field is data-rich. Mining biological data helps to extract useful knowledge from massive datasets gathered in biology and in other related life sciences areas such as medicine and neuroscience. Applications of data mining in bioinformatics include gene finding, protein function inference, disease diagnosis, disease prognosis, disease treatment optimization, protein and gene interaction network reconstruction, data cleansing, and protein sub-cellular location prediction.
4. Inventory Planning
Data mining can provide businesses with up-to-date information regarding product
inventory, delivery schedules, and production requirements. Data mining also can help
remove some of the uncertainty that comes with simple supply-and-demand issues within
the supply chain. The speed with which data mining can discern patterns and devise
projections helps companies better manage their product stock and operate more
efficiently.
5. Customer Segmentation
Traditional market research may help us to segment customers, but data mining goes deeper and increases marketing effectiveness. Data mining aids in grouping customers into distinct segments and tailoring offerings to customers' needs. Marketing is always about retaining customers. Data mining allows a business to find segments of customers based on their vulnerability, so that the business can target them with special offers and enhance satisfaction.
6. Customer Relationship Management
Customer relationship management is all about acquiring and retaining customers, improving customer loyalty, and implementing customer-focused strategies. To maintain a proper relationship with a customer, a business needs to collect data and analyze the information. This is where data mining plays its part. With data mining technologies, the collected data can be used for analysis. Instead of being confused about where to focus to retain customers, those seeking a solution get filtered results.
7. Healthcare
Data mining can be used to predict the volume of patients in every category. Processes are developed to make sure that patients receive appropriate care at the right place and at the right time. Data mining can also help healthcare insurers detect fraud and abuse.
8. Education
There is an emerging field called Educational Data Mining (EDM), concerned with developing methods that discover knowledge from data originating from educational environments. The goals of EDM include predicting students' future learning behavior, studying the effects of educational support, and advancing scientific knowledge about learning. Data mining can be used by an institution to make accurate decisions and to predict students' results. With these results, the institution can focus on what to teach and how to teach it. Students' learning patterns can be captured and used to develop techniques to teach them.
9. Intrusion Detection
Any action that compromises the integrity and confidentiality of a resource is an intrusion. Defensive measures to avoid an intrusion include user authentication, avoiding programming errors, and information protection. Data mining can help improve intrusion detection by adding a level of focus to anomaly detection. It helps an analyst distinguish an unusual activity from common everyday network activity. Data mining also helps extract data that is more relevant to the problem.
10. Criminal Investigation
Criminology is a process that aims to identify crime characteristics. Crime analysis includes exploring and detecting crimes and their relationships with criminals. The high volume of crime datasets, as well as the complexity of the relationships among these kinds of data, has made criminology an appropriate field for applying data mining techniques. Text-based crime reports can be converted into word processing files, and this information can be used to perform crime-matching processes.
11. Fraud Detection
While frequently occurring patterns in data can provide teams with valuable insight,
observing data anomalies is also beneficial, assisting companies in detecting fraud. While
this is a well-known use case within banking and other financial institutions, SaaS-based
companies have also started to adopt these practices to eliminate fake user accounts from
their datasets.
12. Operational Optimization
Process mining leverages data mining techniques to reduce costs across operational functions, enabling organizations to run more efficiently. This practice has helped to identify costly bottlenecks and improve decision-making among business leaders.
