
International Journal of Computer Applications (0975 – 8887)

Volume 82 – No 16, November 2013

Classification Through Machine Learning Technique:
C4.5 Algorithm based on Various Entropies

Seema Sharma, Jitendra Agrawal, Sanjeev Sharma
School of Information Technology, UTD, RGPV, Bhopal, M.P., India

ABSTRACT
Data mining is an interdisciplinary field of computer science that refers to extracting or mining knowledge from large amounts of data. Classification is one of the data mining techniques; it maps data into predefined classes and groups and is used to predict group membership for data instances. Many areas adopt data mining techniques, such as medicine, marketing, telecommunications, the stock market, health care and so on. C4.5 can be regarded as a statistical classifier. The algorithm uses the gain ratio for feature selection and to construct the decision tree, and it handles both continuous and discrete features. The C4.5 algorithm is widely used because of its quick classification and high precision. This paper proposes a C4.5 classifier based on various entropies (Shannon entropy, Havrda and Charvat entropy, Quadratic entropy) instead of the Shannon entropy alone for classification. Experimental results show that the various-entropy based approach is effective in achieving a high classification rate.

Keywords
Data Mining, Classification technique, Machine learning, Decision tree technique, C4.5 algorithm.

1. INTRODUCTION
In recent years, huge amounts of data have been collected and stored in databases across the globe, mainly coming from the information industry and social sites. There is a need to extract and classify useful information and knowledge from such large data. Data mining is the most popular knowledge acquisition technique that deals with this problem. Data mining (DM) is used to extract the required data from large databases. It is the process of performing automated extraction and generating predictive information from large databases; it is actually the process of finding hidden information or patterns in the repositories [13]. Data mining draws on various technical approaches, including machine learning, statistics and database systems. The goal of the data mining process is to discover knowledge in large databases and to transform it into a human-understandable format. DM and knowledge discovery are essential components of an organization because of their role in decision-making strategy.

Classification, regression and clustering are three approaches of data mining in which instances are grouped into identified classes [13]. Classification is a popular task in data mining, especially in knowledge discovery and future planning; it provides intelligent decision making. Classification not only studies and examines the existing sample data but also predicts the future behaviour of that sample data. Classification includes two phases: the first is the learning phase, in which the training data are analysed and rules and patterns are created; the second phase tests the data and assesses the accuracy of the classification patterns [31]. The clustering approach is based on unsupervised learning because there are no predefined classes; in this approach similar data may be grouped together as a cluster [2][11]. Regression is used to map a data item onto a real-valued prediction variable. The classification technique has various algorithms such as decision tree, nearest neighbour, genetic algorithm, support vector machine (SVM), etc. [3], among which the decision tree algorithm is widely used. In this paper, we examine the C4.5 decision tree algorithm based on various entropies and construct a classifier that classifies the problem using multiple entropies, namely the Shannon entropy, Quadratic entropy, and Havrda and Charvat entropy. The rest of this paper gives machine learning concepts in Section 2, related work in Section 3, the experimental model in Section 4, performance metrics in Section 5, and comparative studies and results in Section 6. The paper is concluded in Section 7.

2. MACHINE LEARNING TECHNIQUES
In the context of data mining, learning techniques are generally classified as supervised and unsupervised; both belong to machine learning. Classification is supervised learning that focuses on prediction based on known properties. A classification task begins with a data set in which the class assignments are known. If the target or class label has numerical values, then a predictive regression model is used instead; a regression algorithm is not a classification algorithm. There are many classification algorithms; the most commonly used methods include decision tree, support vector machine, Naive Bayes, KNN, etc.

2.1 Decision tree
The decision tree is one of the classification techniques and works by applying a splitting criterion. A decision tree is a flow-chart-like tree structure that classifies instances by sorting them based on feature (attribute) values. Each node in a decision tree represents a feature of an instance to be classified, each branch denotes an outcome of the test, and each leaf node holds a class label. Instances are classified from the root based on their feature values. A decision tree generates the rules for the classification of the data set. Three basic algorithms are widely used: ID3, C4.5 and CART [12]. ID3 (Iterative Dichotomiser 3) is an older decision tree algorithm introduced by Quinlan Ross in 1986 [25]; its basic concept is to build a decision tree using a top-down greedy approach. C4.5 is the decision tree algorithm developed by Quinlan [24] as an extension of ID3; it is widely used because of its quick classification and high precision. CART stands for Classification and Regression Tree and was introduced by Breiman [4]; a distinguishing property of CART is that it is able to generate regression trees, in which a leaf node contains a real number instead of a class. A decision tree classifier is built in two phases:

• A growth phase
• A prune phase

After the preliminary tree has been built in the 'growth phase', a sub-tree with the least estimated error rate is created in the 'prune phase'. Pruning the preliminary tree consists of removing small, deep nodes of the tree that result from 'noise' contained in the training sample, thus decreasing the risk of 'over-fitting' and producing a more precise classification of unknown data.

As the decision tree is being built, the goal at each node is to decide the split attribute (feature) and the split point that best divide the training instances belonging to that node. The value of a split point depends on how well it separates the classes, and numerous splitting indices have been proposed in the literature to evaluate the quality of a split. Fig. 1 shows the decision tree of a weather prediction database.

Fig. 1 Decision Tree
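To make the splitting idea concrete, the short sketch below fits an entropy-driven decision tree on a standard data set with scikit-learn; the library, data set and parameter choices are illustrative assumptions, not the implementation used in this paper.

```python
# Illustrative sketch (not the paper's code): an entropy-based decision tree.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# criterion="entropy" selects splits by information gain, in the spirit of ID3/C4.5.
tree = DecisionTreeClassifier(criterion="entropy", random_state=42)
tree.fit(X_train, y_train)
print("Test accuracy:", tree.score(X_test, y_test))
```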

2.2 K-Nearest Neighbor
The K-Nearest Neighbor (KNN) is the simplest method of machine learning. It is a type of instance-based learning in which an object is classified based on the closest training examples in the feature space. It implicitly computes the decision boundary, although it is also possible to compute the decision boundary explicitly, so the computational complexity of KNN is a function of the boundary complexity [9]. The KNN algorithm is sensitive to the local structure of the data set. The special case when k = 1 is called the nearest neighbour algorithm. The best choice of k depends upon the data set; larger values of k reduce the effect of noise on the classification [5] but make boundaries between classes less distinct. Various heuristic techniques are used to select the optimal value of k. KNN has some strong consistency results: as the amount of data approaches infinity, the algorithm is guaranteed to yield an error rate no worse than twice the Bayes error rate [5].
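The role of k can be illustrated with a short sketch; scikit-learn is used here as an assumed stand-in rather than the paper's own code.

```python
# Illustrative sketch: effect of k in k-nearest-neighbour classification.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy; larger k smooths noise but blurs boundaries.
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"k={k}: mean accuracy {score:.3f}")
```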
2.3 Support Vector Machine (SVM)
The support vector machine (SVM) is a training algorithm that learns a classification rule from the data set and trains the classifier, which is then used to predict the class of a new sample. SVM is based on the concept of decision planes that define the decision boundary; the points that form the decision boundary between the classes are called support vectors and are treated as parameters. SVM is a machine learning algorithm invented by Vapnik in the 1960s, and it uses the structural risk minimization principle to prevent over-fitting. There are two key ingredients of the SVM technique: mathematical programming and the kernel function [14]. It finds an optimal hyperplane between data points of different classes in a high-dimensional space. Here we are concerned with two-class classification, the classes being P and N for Yn = 1, -1, which we can extend to K-class classification by using K two-class classifiers. The support vector classifier (SVC) searches for a separating hyperplane, and kernel functions are introduced so that a non-linear decision surface can be obtained.

2.3.1 SVC
A linear SVC is used for data classification in which the data are linearly separable. Let w be the weight vector, b the bias, and Xn the nearest data point. A linear classifier is not suitable for every classification hypothesis; SVM can also be extended to learn non-linear decision functions [23]. The kernel function allows us to obtain the hyperplane without explicitly performing the calculation in the high-dimensional space [32].

Fig. 2 Linear SVC
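A minimal sketch of a linear support vector classifier and a kernelised variant is given below, assuming scikit-learn and a standard data set; it is an illustration, not the experimental code of this paper.

```python
# Illustrative sketch: a linear SVC f(x) = sign(w.x + b) versus a kernelised SVC
# for data that is not linearly separable.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

linear_svc = SVC(kernel="linear").fit(X_train, y_train)           # separating hyperplane
rbf_svc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)   # non-linear decision surface

print("linear kernel accuracy:", linear_svc.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svc.score(X_test, y_test))
```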


3. RELATED WORK

3.1 Decision tree Based Classification and Prediction
There is growing interest in applying decision tree learning algorithms to very large data sets, so various technologies have been proposed around the globe for improving decision trees. Sonal Agrawal [1] discussed the classification decision tree approach in the field of education. Student data from a community college database were taken, a variety of classification approaches were performed and a comparative analysis was done. The research work shows that SVM is established as the best classifier with maximum accuracy and minimum root mean square error (RMSE). A decision tree approach is proposed which may be taken as a significant basis for the selection of students for several course programs.

Hamidah Jantan et al. [15] discussed an experimental study to discover possible data mining classification techniques for talent knowledge acquisition. Talent knowledge discovered from related databases can be used to classify the appropriate talent among employees. They use the decision tree (C4.5) technique to find patterns of talent performance in the field of human resources. The authors generate rules and evaluate them using unseen data in order to estimate the accuracy of the prediction result.

Phurivit et al. [27] proposed a real-time intrusion detection approach using supervised machine learning techniques. They applied various well-known machine learning techniques such as decision tree, back propagation, naive Bayesian classification and RBF neural networks to evaluate the performance of IDS. The authors showed through experimental results that the decision tree technique can outperform the other IDS approaches.

S. Ravikumar et al. [26] discussed the machine learning approach in the field of inspection of machine components. This problem includes image acquisition, pre-processing, feature selection and classification; the authors used the Naive Bayes algorithm and the decision tree algorithm, and the results showed that the accuracy of C4.5 is better than the other technique.

Baha Sen et al. [28] developed models to predict secondary education placement-test results and, using sensitivity analysis on those prediction models, identified the most important predictors. They proposed a decision tree algorithm for analysing the success factors behind the placement test, which may help understand and potentially improve achievement.

Juan de Ona et al. [7] discuss road accidents; a main concern of road safety analysts is to identify the factors that contribute to crash severity. They applied the decision tree approach in the road safety field, analysing various decision tree algorithms and extracting decision rules for road safety.

Jung Min Kim [16] studied what relationship the elements of meteorological change have with the incidence of five aggressive crimes using data mining. An analysis was made with the C4.5 decision tree algorithm to verify which crimes occur according to the elements of climate change.

3.2 Different entropy based C4.5 Algorithm for classification
Jiang Su and Harry Zhang [30] discussed the decision tree method and proposed a fast decision tree. They build the tree based on a conditional independence assumption. The authors show that the performance and accuracy of the new approach are better than C4.5, with less complexity compared to the C4.5 decision tree.

In 2008 Tomasz Maszczyk and Wlodzislaw Duch [19] modified the C4.5 algorithm based on the Tsallis and Renyi entropies. After a comparative analysis the authors found that the modified C4.5 algorithm is better than the Shannon entropy based C4.5 algorithm. Basic decision tree algorithms such as ID3 are very famous and easy to use for classification, but if the classifying attribute has many values then these algorithms are not beneficial.

Christiane Ferreira Lemos Lima et al. [18] describe a comparative study of the use of the Shannon, Renyi and Tsallis entropies for designing decision trees. The goal of that paper is to find a more efficient alternative entropy for intrusion-tolerant systems. The authors show that the resulting Tsallis and Renyi entropies can be used to construct more compact and efficient decision trees.

In 2011 Mosonyi et al. [21] proposed the quantum Renyi relative entropy formula and a related capacity formula. The Shannon entropy is sensitive to noisy samples and does not work well in real-world applications, so other measures of feature quality, such as rank mutual information, have been introduced.

Qinghua Hu et al. [13] proposed the rank entropy based decision tree for monotonic classification. They apply rank mutual information, which is combined with the Shannon entropy. The authors show that if the training sample is monotonically consistent then the performance remains good in the presence of noisy data.

4. EXPERIMENTAL SETUP
A decision tree is used as a predictive model that maps observations about an item to conclusions about the item's target value. C4.5 is one of the decision tree algorithms developed by Quinlan [24] as an extension of the ID3 algorithm, and C4.5 uses the Shannon entropy.

Here, the Shannon entropy has been used in the C4.5 algorithm to calculate the information gain ratio contained in the data, which helps to build the decision tree and to predict the targets. However, the results obtained from the Shannon entropy are rather complex. Therefore, to minimize these problems, we use other entropies such as the Renyi entropy, Quadratic entropy, Havrda and Charvat entropy and Taneja entropy instead of the Shannon entropy.

The architecture of the experimental classification model is divided into three phases.

First Phase: Pre-processing of the data.

Second Phase: In the second phase we apply various entropies to find the information gain ratio:
a) Shannon entropy
b) Havrda and Charvat entropy
c) Quadratic entropy
d) Renyi entropy
e) Taneja entropy
The C4.5 algorithm is built in this phase based on the above entropies.

Third Phase: Output.

The classification includes a 3-step process:
1) Model construction (Learning)
2) Model evaluation (Accuracy)
3) Model use (Classification)

Fig 3 Classification Steps

All three steps are illustrated in Fig 3. The first step is also known as learning; in this step, learning is done on the training sample data and the model is constructed on the basis of that learning. In the second step, the accuracy of the model is found on test data; this phase is treated as the testing phase. After the testing phase, the accuracy of each model used to classify the data set is found. All these steps are essential for classification.

4.1 C4.5 algorithm
C4.5 is based on the information gain ratio that is evaluated by entropy. The information gain ratio measure is used to select the test features at each node in the tree; such a measure is referred to as a feature (attribute) selection measure. The attribute with the highest information gain ratio is chosen as the test feature for the current node.


Let D be a data set partitioned by an attribute A into subsets (D1, …, Dj), and suppose the class label attribute has m distinct values defining m distinct classes Ci (for i = 1, …, m). Let |Dj| be the number of samples of D in subset Dj. The expected information needed to classify a given sample is given by

SplitInfoA(D) = - Σ (|Dj| / |D|) log2(|Dj| / |D|)    (4.1)

Gain ratio(A) = Gain(A) / SplitInfoA(D)    (4.2)

where

Info(D) = - Σ pi log2(pi)    (4.3)

Gain(A) = Info(D) - InfoA(D)    (4.4)

InfoA(D) = Σ (|Dj| / |D|) Info(Dj)

where pi is the probability of the distinct class Ci, D is the data set, A is a candidate attribute, and |Dj| / |D| acts as the weight of the jth partition. In other words, Gain(A) is the expected reduction in entropy caused by knowing the value of feature A.
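The following worked sketch evaluates equations (4.1)-(4.4) for a toy attribute that splits ten labelled instances into three subsets; the data values are invented for illustration only.

```python
# Worked sketch of equations (4.1)-(4.4): information gain ratio of an attribute A
# over a toy data set (illustrative values, not taken from the paper).
from collections import Counter
from math import log2

def info(labels):
    """Info(D) = -sum(pi * log2(pi))   (4.3)"""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(partitions):
    """partitions: the class labels of D split by attribute A into subsets Dj."""
    all_labels = [lab for part in partitions for lab in part]
    n = len(all_labels)
    info_d = info(all_labels)
    info_a = sum(len(p) / n * info(p) for p in partitions)                 # InfoA(D)
    gain = info_d - info_a                                                 # (4.4)
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions)   # (4.1)
    return gain / split_info if split_info else 0.0                        # (4.2)

# Attribute A splits 10 instances into three subsets D1, D2, D3.
print(gain_ratio([["yes", "yes", "no"],
                  ["yes", "no", "no", "no"],
                  ["yes", "yes", "yes"]]))
```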
Algorithm for Experimental Model

Input: dataset.
Output: classified output.

1. Take a data set as input.
2. If the set has many features, apply the feature selection technique (PCA) as a pre-processing technique.
3. Apply parallelism from step 4 to step 6.
4. Evaluate the entropy value and information gain ratio for all three entropies (Shannon, Havrda and Charvat's entropy and Quadratic entropy).
5. Construct the models separately using the C4.5 algorithm based on the various entropies.
6. Find the accuracy and execution time of each model and store the values in an array.
7. Find the model that has maximum accuracy.
8. If two models have the maximum accuracy, then
9. find the minimum execution time among the models with maximum accuracy and
10. classify with the model which has the minimum execution time;
11. else classification is done by the model which has maximum accuracy.
12. End

A sketch of steps 3-11 in code follows the listing.
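The sketch below is hedged: the entropy-specific C4.5 learners are not available off the shelf, so ordinary scikit-learn trees stand in as placeholders; only the selection logic (parallel evaluation, maximum accuracy, ties broken by execution time) mirrors the listing above.

```python
# Hedged sketch of steps 3-11: evaluate candidate models in parallel and keep the
# most accurate one, breaking ties by execution time. The candidates are
# placeholders for the paper's entropy-specific C4.5 variants.
import time
from concurrent.futures import ThreadPoolExecutor

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

CANDIDATES = {
    "shannon-style tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "gini tree (placeholder)": DecisionTreeClassifier(criterion="gini", random_state=0),
}

def score_one(name, model, X, y):
    # Steps 4-6: evaluate one model, recording accuracy and execution time.
    start = time.perf_counter()
    accuracy = cross_val_score(model, X, y, cv=5).mean()
    return name, accuracy, time.perf_counter() - start

def select_best(X, y):
    # Step 3: evaluate all candidate models in parallel.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(score_one, n, m, X, y) for n, m in CANDIDATES.items()]
        results = [f.result() for f in futures]
    # Steps 7-11: highest accuracy wins; equal accuracies fall back to shorter runtime.
    return min(results, key=lambda r: (-r[1], r[2]))

X, y = load_iris(return_X_y=True)
print(select_best(X, y))
```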

4.2 Various Entropies
Entropy is a method of measuring randomness or uncertainty in a given set of data. For calculating the entropy of a file, the data set is the sequence of bytes in the file [22]. In the C4.5 algorithm, information gain is derived from entropy. The various types of entropy used in the C4.5 classification algorithm are as follows:

1. The Shannon entropy H(X) of a random variable X with a discrete probability distribution P(i) = p1, p2, p3, …, pk is given by:

H(X) = - Σ p(i) log2 p(i)    (4.5)

2. The 'Quadratic entropy' was initiated by I. Vajda; this entropy was first used in theoretical physics by Fermi. Consider the finite discrete random variable X with a complete probability collection

p(i) ≥ 0, Σ p(i) = 1    (4.6)

Quadratic entropy is defined by the expression

HQ(X) = Σ p(i) (1 - p(i))    (4.7)

3. Havrda and Charvat's entropy [13] provides a further measure of entropy of the finite discrete random variable X with a complete probability mass function. The Havrda and Charvat entropy of order α (α > 0; α ≠ 1) is defined by the expression

Hα(X) = (Σ p(i)^α - 1) / (2^(1-α) - 1)

4. The Taneja entropy was introduced by I. J. Taneja in 1975 [29], where a new formula of order α was proposed.

(4.8)

(4.9)

5. The Renyi entropy [6] is parameterized according to the value of α, where α > 0 and α ≠ 1, having the Shannon entropy as the limit case. Renyi entropy is given by:

Hα(X) = (1 / (1 - α)) log2(Σ p(i)^α)    (4.10)

where α is a constant; in this paper we take α = 0.25.
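For reference, the sketch below computes the Shannon, Quadratic, Havrda-Charvat and Renyi entropies of a discrete distribution. The Havrda-Charvat normalisation constant is the one commonly quoted in the literature and should be read as an assumption; the Taneja form is omitted because its equation is not recoverable from the source.

```python
# Sketch of the entropy measures above for a discrete distribution p (summing to 1).
from math import log2

def shannon(p):                       # (4.5)
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def quadratic(p):                     # (4.7)
    return sum(pi * (1 - pi) for pi in p)

def havrda_charvat(p, alpha):         # order alpha > 0, alpha != 1 (assumed constant)
    return (sum(pi ** alpha for pi in p) - 1) / (2 ** (1 - alpha) - 1)

def renyi(p, alpha):                  # (4.10), alpha > 0, alpha != 1
    return log2(sum(pi ** alpha for pi in p)) / (1 - alpha)

p = [0.5, 0.25, 0.25]
print(shannon(p), quadratic(p), havrda_charvat(p, 0.25), renyi(p, 0.25))
```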


5. PERFORMANCE METRICS

5.1 Mean Squared Error
The mean squared error is probably the most important criterion used to evaluate the performance of a predictor. The MSE measures the average of the squares of the errors. If ŷ is a vector of n predictions and y is the vector of the true values, then the MSE of the predictor is:

MSE = (1/n) Σ (ŷi - yi)^2    (5.1)

5.2 Cross validation
Cross-validation is also known as rotation estimation [8][10][17]. It is a validation technique for measuring how the results of a statistical analysis will generalize to an independent data set. In the fundamental approach, called k-fold CV, the training set is divided into k smaller sets, and the following process is followed for each of the k "folds":

• A model is trained using k - 1 of the folds as training data;
• The resulting model is validated on the remaining part of the data.

This research work uses 5-fold cross validation for testing.
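A short sketch of both metrics, assuming scikit-learn as a stand-in for the paper's environment, is shown below.

```python
# Sketch of the two metrics above: mean squared error (5.1) and 5-fold
# cross-validated accuracy, using scikit-learn as an illustrative stand-in.
from sklearn.datasets import load_iris
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)

# 5-fold CV: each fold is held out once while the other four folds train the model.
fold_accuracies = cross_val_score(clf, X, y, cv=5)
print("5-fold accuracies:", fold_accuracies, "mean:", fold_accuracies.mean())

# MSE between predictions and true labels, per equation (5.1).
clf.fit(X, y)
print("MSE:", mean_squared_error(y, clf.predict(X)))
```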

5.3 Ttest2
Ttest2 performs a t-test of the null hypothesis that the x and y data are independent random samples from normal distributions with equal means and equal but unknown variances, against the alternative that the means are not equal. The result of the test is returned in h. x and y need not be vectors of the same length; they can also be N-dimensional arrays or matrices, in which case x and y must have the same size. The test treats NaN values as missing data and ignores them.

[h, p, ci, stats] = ttest2(x, y)    (5.2)

• h = 1 indicates a rejection of the null hypothesis at the 5% significance level; h = 0 indicates a failure to reject the null hypothesis at the 5% significance level.
• The p value is the probability, under the null hypothesis, of observing a value as extreme as or more extreme than the test statistic.
• ci is the 100 × (1 - alpha)% confidence interval on the difference of the population means.
• stats has the following fields: tstat — value of the test statistic; df — degrees of freedom of the test; sd — pooled sample standard deviation, or a vector with the sample standard deviations.

An equivalent test in Python is sketched after this list.
deviations Pima Classific Real 768 8 2
6. EXPERIMENTAL RESULT Indians ation world
The main objective of this paper is to classify the dataset. We Diabetes
have eight standard datasets. As shown in Table 1, Bases on this
data set valuate the accuracy, MSE of C4.5 algorithm based on Thyroid Classific Real 7200 21 3
different entropies shown in table 2,3 respectively. In this paper Disease ation world
also comparative analysis among three entropies (Shannon,
quadratic and Havrda & Charvt) based C4.5, C4.5 algorithm, Ozone Classific Real 2536 73 2
KNN and SVM. Result has in table 4, fig 3 shows the Level ation world
graphically representation of output. And also apply the ttest2 on Detection
all method on six data sets.

6.1 Test Dataset 6.2 Accuracy of Different Methods


For evaluating the all eight C4.5 method based on various There are five classification method testes by 5 fold cross
entropy; eight real world dataset were considered validation on the eight data set table 2 Show the accuracy of
each method on each data set and graphically shows by fig4.


Table 2. Accuracy of Different Methods

Data set | C4.5 based on Shannon entropy | C4.5 based on Havrda & Charvat entropy | C4.5 based on Quadratic entropy | C4.5 based on Renyi entropy | C4.5 based on Taneja entropy
Iris | 89.33% | 92.67% | 93.33% | 90.00% | 89.33%
Breast cancer | 63.97% | 93.09% | 90.73% | 63.82% | 37.88%
Blood transfer | 78.93% | 75.44% | 78.79% | 76.24% | 73.83%
Wine quality | 54.67% | 55.67% | 55.68% | 33.60% | 44.26%
Yeast | 33.58% | 36.76% | 29.53% | 22.90% | 8.24%
Pima | 72.42% | 66.93% | 70.33% | 65.23% | 66.40%
Thyroid | 97.35% | 99.08% | 99.10% | 93.37% | 92.58%
Ozone layer | 97.12% | 97.04% | 97.12% | 97.12% | 97.12%
Mean | 69.90% | 73.72% | 73.05% | 63.58% | 64.56%

Fig. 4 Accuracy Chart of Different Entropies Based C4.5 Algorithm

Fig. 4 shows the five different entropy based C4.5 methods for classification on the eight different data sets (Iris, breast cancer, blood transfer, wine quality, yeast, Pima, thyroid and ozone layer). This graphical representation shows the accuracy of each method on every data set. The graph shows that the Havrda and Charvat based C4.5 algorithm has the maximum accuracy among all the methods described above.

6.3 Mean Squared Error of Different Methods
The five classification methods are tested by 5-fold cross validation on the eight data sets. This section evaluates the mean squared error of the different methods; Table 3 shows the MSE of each method on each data set. Another contribution of this work is to build a three-entropy (Shannon, Havrda and Charvat, Quadratic) based C4.5 that applies all three entropies in parallel for building the model and takes the best one for classifying the data. This technique is compared with other machine learning techniques such as the C4.5 algorithm, SVM (support vector machine) and KNN (K-Nearest Neighbor). Table 4 shows the accuracy of all the machine learning techniques.

Table 3. Mean Squared Error of Different Methods

Data set | Shannon entropy based C4.5 | Havrda and Charvat entropy based C4.5 | Quadratic entropy based C4.5 | Renyi entropy based C4.5 | Taneja entropy based C4.5
Iris | 0.0033 | 0.0022 | 0.0017 | 0.0036 | 0.0033
Breast cancer | 0.0357 | 0.0023 | 0.0105 | 0.0277 | 0.0154
Blood transfer | 0.0109 | 0.0153 | 0.011 | 0.0148 | 0.0183
Wine quality | 0.0414 | 0.0396 | 0.0394 | 0.0885 | 0.0639
Yeast | 0.0888 | 0.081 | 0.0997 | 0.1204 | 0.1714
Pima | 0.0157 | 0.0225 | 0.0184 | 0.0248 | 0.0233
Thyroid | 0.0001 | 0 | 0 | 0.0009 | 0.0011
Ozone layer | 0.00024 | 0.00025 | 0.00024 | 0.00024 | 0.00024
Mean | 0.02451 | 0.02039 | 0.02261 | 0.03511 | 0.03711

6.4 Testing
Ttest2 is performed on the above machine learning algorithms. These tests carry out a t-test of the null hypothesis that the x, y data are independent random samples. The t-test is carried out by equation (6.1), that is

[h, p, ci, stats] = ttest2(x, y)    (6.1)

The result of ttest2 is shown in Table 5, which consists of the degrees of freedom, standard deviation value, confidence interval and probability.

Table 4. Accuracy of Machine Learning Algorithms

Data set | C4.5 | C4.5 based on various entropies | SVM | KNN
Iris | 89.33% | 93.33% | 92.00% | 90.67%
Breast cancer | 63.97% | 93.09% | 94.85% | 57.50%
Wine quality | 54.67% | 55.68% | 19.69% | 42.57%
Yeast | 33.58% | 36.76% | 50.14% | 50.47%
Pima | 72.42% | 72.42% | 73.46% | 70.46%
Thyroid | 97.35% | 99.10% | 94.28% | 94.03%
Mean | 68.55% | 75.06% | 70.74% | 67.62%

Fig. 5 Mean Accuracy of Machine Learning Algorithms

Table 5. Ttest-2 of Machine Learning Techniques

Sample Data (X, Y) | Probability (p) | Confidence interval (CI) | Test statistic | Degrees of freedom | Standard deviation
U, V | 0.6497 | -0.2447, 0.3749 | 0.4682 | 10 | 0.2408
U, W | 0.7928 | -0.3140, 0.4005 | 0.2699 | 10 | 0.2777
U, Z | 0.5893 | -0.223, 0.3719 | 0.5578 | 10 | 0.2312
V, W | 0.8917 | -0.3703, 0.3266 | -0.1396 | 10 | 0.2709
V, Z | 0.943 | -0.2775, 0.2963 | 0.0727 | 10 | 0.223
W, Z | 0.841 | -0.3064, 0.3688 | 0.2059 | 10 | 0.2624

7. CONCLUSION
This experiment is performed over eight real data sets using five methods, namely the C4.5 decision tree algorithm based on Shannon entropy, on Havrda and Charvat entropy, on Quadratic entropy, on Renyi entropy and on Taneja entropy. As shown in Table 4, the accuracy of the experimental method based on three entropies is better than that of the plain C4.5 algorithm. This paper also shows a comparative analysis between machine learning techniques, as presented in the above tables.

Entropy computation is used to create compact decision trees with successful classification. The size of the decision tree and the performance of the classifier depend on the entropy calculation, so the most precise entropy can be applied to a particular classification problem. The different-entropy based approach can be applied to any classification problem, such as detecting faults in industrial applications, medical diagnosis, loan approval, pattern recognition, classifying market trends, etc. This work is a comparative study based on the Shannon, Renyi, Quadratic, Havrda and Charvat, and Taneja entropies, and it also builds a model that takes the Shannon, Quadratic, and Havrda and Charvat entropies in parallel and produces a more precise classification for the data set; the result of this classification is comparable with the other machine learning techniques. This entropy based approach can be applied to real-world classification problems.

8. REFERENCES
[1] Agarwal, S., Pandey, G. N., & Tiwari, M. D. Data Mining in Education: Data Classification and Decision Tree Approach.
[2] Merceron, A., & Yacef, K. (2005, May). Educational Data Mining: a Case Study. In AIED (pp. 467-474).
[3] Bakar, A. A., Othman, Z. A., & Shuib, N. L. M. (2009, October). Building a new taxonomy for data discretization techniques. In Data Mining and Optimization, 2009. DMO'09. 2nd Conference on (pp. 132-140). IEEE.


[4] Burrows, W. R., Benjamin, M., Beauchamp, S., Lord, E. R., McCollor, D., & Thomson, B. (1995). CART decision-tree statistical analysis and prediction of summer season maximum surface ozone for the Vancouver, Montreal, and Atlantic regions of Canada. Journal of Applied Meteorology, 34(8), 1848-1862.
[5] Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. Information Theory, IEEE Transactions on, 13(1), 21-27.
[6] Dasarathy, B. V. (1980). Nosing around the neighborhood: A new system structure and classification rule for recognition in partially exposed environments. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (1), 67-71.
[7] de Oña, J., López, G., & Abellán, J. (2012). Extracting decision rules from police accident reports through decision trees. Accident Analysis & Prevention.
[8] Devijver, P. A., & Kittler, J. (1982). Pattern recognition: A statistical approach (p. 448). Englewood Cliffs, NJ: Prentice/Hall International.
[9] Everitt, B. S., Landau, S., Leese, M., & Stahl, D. Miscellaneous Clustering Methods. Cluster Analysis, 5th Edition, 215-255.
[10] Geisser, S. (1993). Predictive inference: an introduction (Vol. 55). CRC Press.
[11] Han, J., Kamber, M., & Pei, J. (2006). Data mining: concepts and techniques. Morgan Kaufmann.
[12] Horton, P., & Nakai, K. (1996, June). A probabilistic classification system for predicting the cellular localization sites of proteins. In ISMB (Vol. 4, pp. 109-115).
[13] Havrda, J., & Charvát, F. (1967). Quantification method of classification processes. Concept of structural a-entropy. Kybernetika, 3(1), 30-35.
[14] James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). Support Vector Machines. In An Introduction to Statistical Learning (pp. 337-372). Springer New York.
[15] Jantan, H., Hamdan, A. R., & Othman, Z. A. (2011, June). Talent knowledge acquisition using data mining classification techniques. In Data Mining and Optimization (DMO), 2011 3rd Conference on (pp. 32-37). IEEE.
[16] Kim, J. M., Ahn, H. K., & Lee, D. H. (2013). A Study on the Occurrence of Crimes Due to Climate Changes Using Decision Tree. In IT Convergence and Security 2012 (pp. 1027-1036). Springer Netherlands.
[17] Kohavi, R. (1995, August). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Vol. 14, No. 2, pp. 1137-1145).
[18] Lima, C. F. L., de Assis, F. M., & de Souza, C. P. (2010, May). Decision tree based on Shannon, Renyi and Tsallis entropies for intrusion tolerant systems. In Internet Monitoring and Protection (ICIMP), 2010 Fifth International Conference on (pp. 117-122). IEEE.
[19] Maszczyk, T., & Duch, W. (2008). Comparison of Shannon, Renyi and Tsallis entropy used in decision trees. In Artificial Intelligence and Soft Computing – ICAISC 2008 (pp. 643-651). Springer Berlin Heidelberg.
[20] Mathur, N., Kumar, S., Kumar, S., & Jindal, R. The Base Strategy for ID3 Algorithm of Data Mining Using Havrda and Charvat Entropy Based on Decision Tree.
[21] Mosonyi, M., & Hiai, F. (2011). On the quantum Renyi relative entropies and related capacity formulas. Information Theory, IEEE Transactions on, 57(4), 2474-2487.
[22] Pareek, H., Eswari, P. R. L., Babu, N. S. C., & Bangalore, C. D. A. C. (2013). Entropy and n-gram Analysis of Malicious PDF Documents. International Journal of Engineering, 2(2).
[23] Quinlan, J. R. (1987). Simplifying decision trees. International Journal of Man-Machine Studies, 27(3), 221-234.
[24] Quinlan, J. R. (1993). C4.5: programs for machine learning (Vol. 1). Morgan Kaufmann.
[25] Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
[26] Ravikumar, S., Ramachandran, K. I., & Sugumaran, V. (2011). Machine learning approach for automated visual inspection of machine components. Expert Systems with Applications, 38(4), 3260-3266.
[27] Sangkatsanee, P., Wattanapongsakorn, N., & Charnsripinyo, C. (2011). Practical real-time intrusion detection using machine learning approaches. Computer Communications, 34(18), 2227-2235.
[28] Şen, B., Uçar, E., & Delen, D. (2012). Predicting and analyzing secondary education placement-test scores: A data mining approach. Expert Systems with Applications, 39(10), 9468-9476.
[29] Sharma, B. D., & Taneja, I. J. (1975). Entropy of type (α, β) and other generalized measures in information theory. Metrika, 22(1), 205-215.
[30] Su, J., & Zhang, H. (2006, July). A fast decision tree learning algorithm. In Proceedings of the National Conference on Artificial Intelligence (Vol. 21, No. 1, p. 500). Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press.
[31] Jin, C., De-lin, L., & Fen-xiang, M. (2009, July). An improved ID3 decision tree algorithm. In Computer Science & Education 2009, 4th International Conference on (pp. 127-130). IEEE.
[32] Balagatabi, Z. N., & Balagatabi, H. N. (2013). Comparison of Decision Tree and SVM Methods in Classification of Researcher's Cognitive Styles in Academic Environment. Indian Journal of Automation and Artificial Intelligence, 1(1), 31-43.
