Classification Through Machine Learning Technique: C4.5 Algorithm Based On Various Entropies
After the preliminary tree has been built in the 'growth phase', a sub-tree with the least estimated error rate is created in the 'prune phase'. Pruning the preliminary tree consists of removing the small, deep nodes that result from 'noise' in the training sample, thus decreasing the risk of over-fitting and yielding a more precise classification of unknown data.

As the decision tree is being built, the goal at each node is to choose the split attribute (feature) and the split point that best divide the training instances belonging to that node. The value of a split point depends on how well it separates the classes, and numerous splitting indices have been proposed in the past to evaluate the quality of a split. Fig 1 shows the decision tree for the weather prediction database.
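To make the split-evaluation step concrete, the following minimal Python sketch (our illustration rather than the paper's implementation; the function names are hypothetical) scores a candidate split of a node by its information gain, i.e. the reduction in Shannon entropy:

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy (in bits) of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(parent, partitions):
        """Entropy reduction achieved by splitting `parent` into `partitions`."""
        n = len(parent)
        children = sum(len(p) / n * entropy(p) for p in partitions)
        return entropy(parent) - children

    # Example: a candidate binary split of a weather-style node.
    parent = ["play", "play", "play", "no", "no"]
    left, right = ["play", "play", "play"], ["no", "no"]
    print(round(information_gain(parent, [left, right]), 3))  # 0.971 bits gained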
2.2 K-Nearest Neighbor
The K-nearest neighbor (KNN) algorithm is one of the simplest methods of machine learning. It is a type of instance-based learning in which an object is classified based on the closest training examples in the feature space. It implicitly computes the decision boundary, although it is also possible to compute the boundary explicitly, so the computational complexity of KNN is a function of the boundary complexity [9]. The k-NN algorithm is sensitive to the local structure of the data set. The special case k = 1 is called the nearest neighbor algorithm. The best choice of k depends on the data set; larger values of k reduce the effect of noise on the classification [5] but make the boundaries between classes less distinct. Various heuristic techniques are used to select the optimal value of k. KNN has some strong consistency results: as the amount of data approaches infinity, the nearest neighbor algorithm is guaranteed to yield an error rate no worse than twice the Bayes error rate [5].
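A minimal k-NN sketch in Python (our own illustration, assuming Euclidean distance and majority voting, which the text does not spell out):

    import numpy as np

    def knn_predict(X_train, y_train, x, k=3):
        """Classify a single point x by majority vote of its k nearest
        training points under Euclidean distance."""
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]          # indices of the k closest points
        classes, counts = np.unique(y_train[nearest], return_counts=True)
        return classes[np.argmax(counts)]

    # Toy example: two clusters labeled 0 and 1.
    X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y = np.array([0, 0, 1, 1])
    print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # -> 0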
2.3 Support Vector Machine (SVM)
The support vector machine (SVM) is a training algorithm for learning a classification rule from a data set: it trains the classifier, which is then used to predict the class of a new sample. SVM is based on the concept of decision planes that define the decision boundary; the points that form the decision boundary between the classes are called support vectors and are treated as parameters. The SVM, invented by Vapnik in the 1960s, is based on the structural risk minimization principle to prevent over-fitting. There are two key implementations of the SVM technique: mathematical programming and kernel functions [14]. It finds an optimal hyperplane between the data points of different classes in a high-dimensional space. We are concerned here with two-class classification, the classes being P and N for Yn = 1, -1, which we can extend to K-class classification by using K two-class classifiers. The support vector classifier (SVC) searches for a separating hyperplane, and kernel functions are introduced so that non-linear decision surfaces can be handled.

2.3.1 SVC
A linear SVC is used for classification when the data are linearly separable. Let w be the weight vector, b the bias, and Xn the nearest data point. The margin achieved on the nearest data point is then

    margin = |w . Xn + b| / ||w||

A linear classifier is not suitable for every class of hypotheses, but SVM can also be extended to learn non-linear decision functions [23]. The kernel function allows us to construct the hyperplane without explicitly performing the calculation in the high-dimensional feature space [32].
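As a concrete sketch, a linear SVC and a kernelized SVC can be fit with scikit-learn (a tooling choice of ours; the text does not name a library):

    import numpy as np
    from sklearn.svm import SVC

    # Toy two-class problem: classes P (+1) and N (-1).
    X = np.array([[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]], dtype=float)
    y = np.array([-1, -1, -1, 1, 1, 1])

    linear_clf = SVC(kernel="linear").fit(X, y)  # separating hyperplane w.x + b = 0
    rbf_clf = SVC(kernel="rbf").fit(X, y)        # kernel trick: non-linear surface

    print(linear_clf.coef_, linear_clf.intercept_)    # w and b of the hyperplane
    print(rbf_clf.predict([[2.5, 2.5], [0.5, 0.5]]))  # -> [ 1 -1 ]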
3. RELATED WORK
3.1 Decision Tree Based Classification and Prediction
There is growing interest in applying decision tree learning algorithms to very large data sets, and around the globe various techniques have been proposed for improving decision trees. Sonal Agarwal [1] discussed the classification decision tree approach in the field of education. Student data from a community college database were taken, a variety of classification approaches were performed, and a comparative analysis was done. The research work shows that SVM is established as the best classifier, with maximum accuracy and minimum root mean square error (RMSE). A decision tree approach is proposed which may be taken as a significant basis for the selection of students for several course programs.

Hamidah Jantan et al. [15] discussed an experimental study to discover possible data mining classification techniques for talent knowledge acquisition. Talent knowledge discovered from related databases can be used to classify the appropriate talent among employees. They used the decision tree (C4.5) technique to find patterns of talent performance in the field of human resources. The authors generated rules and evaluated them on unseen data in order to estimate the accuracy of the prediction results.

Phurivit Sangkatsanee et al. [27] proposed a real-time intrusion detection approach using supervised machine learning techniques. They applied various well-known machine learning techniques, such as decision trees, back-propagation networks, naive Bayesian classification and RBF neural networks, to evaluate the performance of an IDS. The authors' experimental results showed that the decision tree technique can outperform the other IDS approaches.

S. Ravikumar et al. [26] discussed a machine learning approach in the field of inspection of machine components. This problem includes image acquisition, pre-processing, feature selection and classification; the authors used the naive Bayes algorithm and a decision tree algorithm, and the results showed that the accuracy of C4.5 is better than that of the other technique.

Baha Şen et al. [28] developed models to predict secondary education placement test results and, using sensitivity analysis on those prediction models, found the most important predictors. They proposed a decision tree algorithm for analysis of the success factors behind the placement test that may help understand and potentially improve achievement.

Juan de Oña et al. [7] discuss road accidents; a main task for road safety analysts is to identify the main factors that contribute to crash severity. They applied the decision tree approach to the road safety field, analyzing various decision tree algorithms and extracting decision rules for road safety.

Jung Min Kim [16] studied what relationship the elements of meteorological change have with the incidence of five aggressive crimes, using data mining. An analysis was made with the C4.5 decision tree algorithm to verify which crimes occur according to the elements of climate change.

In 2011 Mosonyi et al. [21] proposed the quantum Renyi relative entropies and related capacity formulas. The Shannon entropy is sensitive to noisy samples and does not work well in real-world applications, which motivates other measures of feature quality such as rank mutual information.

Qinghua Hu et al. [13] proposed a rank entropy based decision tree for monotonic classification. They applied rank mutual information, which is combined with the Shannon entropy. The authors show that if the training sample set is monotonically consistent, then the performance remains good even in the presence of noisy data.

4. EXPERIMENTAL SETUP
A decision tree is used as a predictive model that maps observations about an item to conclusions about the item's target value. C4.5 is one of the decision tree algorithms developed by Quinlan [24]. It is an extension of the ID3 algorithm, and C4.5 uses the Shannon entropy.

Here, Shannon entropy has been used to find the information gain in the C4.5 algorithm and to calculate the information gain ratio contained in the data, which helps to build the decision tree and to predict the targets. However, the results obtained from the Shannon entropy are rather complex. Therefore, to minimize these problems, we use other entropies, namely the Rényi entropy, quadratic entropy, Havrda and Charvát entropy, and Taneja entropy, instead of the Shannon entropy.

The architecture of the experimental classification method has been divided into three phases.

First phase: pre-processing of the data.

Second phase: various entropies are applied to find the information gain ratio:
a) Shannon entropy
b) Havrda and Charvát entropy
c) Quadratic entropy
d) Rényi entropy
e) Taneja entropy
The C4.5 algorithm is built in this phase based on the above entropies.

All three steps are illustrated in fig 3. The first step, also known as learning, trains on the training sample data and constructs the model on the basis of this learning. In the second step, the accuracy of the model is found on the test data; this phase is treated as the testing phase. After the testing phase, the accuracy of each model used to classify the data set is found. All these steps are essential for classification.
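As a sketch of the learning and testing steps just described (the data set and library choices here are our assumptions, not the paper's), a model is built on the training sample and its accuracy measured on held-out test data:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Step 1 (learning): fit the model on the training sample.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    model = DecisionTreeClassifier(criterion="entropy").fit(X_train, y_train)

    # Step 2 (testing): measure the accuracy of the model on the test data.
    print(round(accuracy_score(y_test, model.predict(X_test)), 4))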
4.1 C4.5 algorithm
C4.5 is based on the information gain ratio, which is evaluated using entropy. The information gain ratio measure is used to select the test feature at each node in the tree; such a measure is referred to as a feature (attribute) selection measure. The attribute with the highest information gain ratio is chosen as the test feature for the current node. Let D be a set consisting of data instances (D1, …, Dj). Suppose the class label attribute has m distinct values defining m distinct classes Ci (for i = 1, …, m), and let di be the number of samples of D in class Ci. The expected information needed to classify a given sample is given by

    Info(D) = - Σi pi log2(pi)

where pi = di / |D| is the probability that a sample belongs to class Ci.
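C4.5 then divides the information gain of a split by the split information of the partition to obtain the gain ratio. A minimal sketch (our illustration; the helper names are hypothetical):

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gain_ratio(parent, partitions):
        """C4.5 gain ratio: information gain divided by split information."""
        n = len(parent)
        gain = entropy(parent) - sum(len(p) / n * entropy(p) for p in partitions)
        split_info = -sum(
            len(p) / n * math.log2(len(p) / n) for p in partitions if p)
        return gain / split_info if split_info > 0 else 0.0

    parent = ["yes"] * 9 + ["no"] * 5            # the classic 14-instance weather data
    outlook = [["yes"] * 2 + ["no"] * 3,         # sunny
               ["yes"] * 4,                      # overcast
               ["yes"] * 3 + ["no"] * 2]         # rainy
    print(round(gain_ratio(parent, outlook), 3))  # ~0.156 for the outlook split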
4.2 Various Entropies
Entropy is a method of measuring randomness or uncertainty in a given set of data; for calculating the entropy of a file, for example, the data set is the sequence of bytes in the file [22]. In the C4.5 algorithm the information gain is derived from entropy. The various types of entropies used in the C4.5 classification algorithm are as follows:

1. The Shannon entropy H(X) of a random variable X with a discrete probability distribution P(i) = p1, p2, p3, p4, …, pk is given by:

    H(X) = - Σi p(i) log2 p(i)    (4.5)
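The remaining measures generalize (4.5). As a sketch, the following functions implement commonly cited textbook forms of the five entropies; the exact parameterizations the paper uses for the generalized entropies are not reproduced above, so the alpha-dependent forms below should be read as standard variants rather than the authors' definitions:

    import numpy as np

    def shannon(p):
        """H(X) = -sum p_i log2 p_i, equation (4.5)."""
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def renyi(p, alpha=2.0):
        """Renyi entropy of order alpha (alpha != 1); tends to Shannon as alpha -> 1."""
        return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

    def havrda_charvat(p, alpha=2.0):
        """Havrda-Charvat structural alpha-entropy (one common normalization)."""
        return (np.sum(p ** alpha) - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)

    def quadratic(p):
        """Quadratic entropy sum p_i (1 - p_i), i.e. a Gini-style measure."""
        return np.sum(p * (1.0 - p))

    def taneja(p, alpha=2.0):
        """One common form of the Taneja entropy of degree alpha."""
        p = p[p > 0]
        return -np.sum(p ** alpha * np.log2(p))

    probs = np.array([0.5, 0.25, 0.25])
    for f in (shannon, renyi, havrda_charvat, quadratic, taneja):
        print(f.__name__, round(float(f(probs)), 4))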
The quality of the predictions is measured by the mean square error:

    MSE = (1/n) Σi (yi - ŷi)^2    (5.1)

where yi is the observed value and ŷi the predicted value.

5.2 Cross validation
Cross-validation is also known as the rotation estimation model [8][10][17]. It is a validation technique for measuring how the results of a statistical analysis will generalize to an independent data set. In the fundamental approach, called k-fold CV, the training set is divided into k smaller sets, and the following process is followed for each of the k "folds": the model is trained on the other k-1 folds, validated on the remaining fold, and the k results are then averaged.
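A minimal k-fold cross-validation sketch (scikit-learn and the iris data are stand-ins of our choosing):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # 10-fold CV: each fold serves once as the validation set
    # while the remaining 9 folds are used for training.
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    tree = DecisionTreeClassifier(criterion="entropy")  # entropy-based splits
    scores = cross_val_score(tree, X, y, cv=cv)

    print(np.round(scores, 3))      # per-fold accuracy
    print(round(scores.mean(), 3))  # mean accuracy across folds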
Table 2. Accuracy of Different Methods

    Data set         Shannon-based   Havrda-Charvát-based   Quadratic-based   Rényi-based   Taneja-based
                     C4.5            C4.5                   C4.5              C4.5          C4.5
    Blood transfer   78.93%          75.44%                 78.79%            76.24%        73.83%
    Wine quality     54.67%          55.67%                 55.68%            33.60%        44.26%
    Yeast            33.58%          36.76%                 29.53%            22.90%        8.24%
    Pima             72.42%          66.93%                 70.33%            65.23%        66.40%
    Thyroid          97.35%          99.08%                 99.10%            93.37%        92.58%
    Ozone layer      97.12%          97.04%                 97.12%            97.12%        97.12%
    Mean             69.90%          73.72%                 73.05%            63.58%        64.56%

Table 3. MSE of Different Methods

    Data set         Shannon-based   Havrda-Charvát-based   Quadratic-based   Rényi-based   Taneja-based
                     C4.5            C4.5                   C4.5              C4.5          C4.5
    Iris             0.0033          0.0022                 0.0017            0.0036        0.0033
    Breast cancer    0.0357          0.0023                 0.0105            0.0277        0.0154
    Blood transfer   0.0109          0.0153                 0.011             0.0148        0.0183
    Wine quality     0.0414          0.0396                 0.0394            0.0885        0.0639
    Ozone layer      0.00024         0.00025                0.00024           0.00024       0.00024
6.4 Testing
A two-sample t-test (ttest2) is performed on the above machine learning algorithms. The test evaluates the null hypothesis that the x and y data are independent random samples with equal means. The t statistic is computed by equation (6.1):

    t = (mean(x) - mean(y)) / (s * sqrt(1/n + 1/m))    (6.1)

where n and m are the sample sizes and s is the pooled standard deviation. The result of ttest2 is shown in Table 5, which consists of the degrees of freedom, the standard deviation value, the confidence interval and the probability.
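As a sketch of this comparison, SciPy's ttest_ind performs the same equal-variance two-sample t-test as MATLAB's ttest2; here it compares the per-data-set accuracies of two of the methods from Table 2:

    from scipy import stats

    # Per-data-set accuracies of two entropy-based C4.5 variants (Table 2).
    shannon = [78.93, 54.67, 33.58, 72.42, 97.35, 97.12]
    havrda  = [75.44, 55.67, 36.76, 66.93, 99.08, 97.04]

    # Two-sample t-test of the null hypothesis of equal means.
    t_stat, p_value = stats.ttest_ind(shannon, havrda)
    print(round(t_stat, 4), round(p_value, 4))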
Fig 4 shows the five different entropy-based C4.5 methods for classification on eight different data sets: iris, breast cancer, blood transfer, wine quality, yeast, pima, thyroid and ozone layer. This graphical representation shows the accuracy of each method on every data set. The graph shows that the Havrda and Charvát based C4.5 algorithm has the maximum accuracy among all the methods described above.

[Fig. 4: Accuracy Chart of Different Entropies Based C4.5 Algorithm]
[Table 4: Accuracy of Machine Learning Algorithms]

[Table 5: Ttest-2 of Machine Learning Techniques]

[Fig. 5: Mean Accuracy of Machine Learning Algorithms]
7. CONCLUSION
This experiment is performed over eight real data sets using five methods, namely the C4.5 decision tree algorithm based on Shannon entropy, on Havrda and Charvát entropy, on quadratic entropy, on Rényi entropy, and on Taneja entropy. As shown in Table 5, the accuracy of the experimental method based on three of the entropies is better than that of the standard Shannon entropy based C4.5 algorithm. This paper also presents a comparative analysis between machine learning techniques, shown in the tables above.

Entropy computation is used to create compact decision trees with successful classification. The size of the decision tree and the performance of the classifier depend on the entropy calculation, so the most precise entropy can be applied to the particular classification problem at hand. The different entropy-based approaches can be applied to any classification problem, such as detecting faults in industrial applications, medical diagnosis, loan approval, pattern recognition, classifying market trends, etc. This work is a comparative study based on the Shannon, Rényi, quadratic, Havrda and Charvát, and Taneja entropies; it also builds a model that takes the Shannon, quadratic, and Havrda and Charvát entropies in parallel and produces a more precise classification for the data set, and the result of this classification is comparable with the other machine learning techniques. This entropy-based approach can be applied to real-world classification problems.
8. REFERENCES
[1] Agarwal, S., Pandey, G. N., & Tiwari, M. D. Data Mining
in Education: Data Classification and Decision Tree
Approach.
[2] Merceron, A., & Yacef, K. (2005, May). Educational Data
Mining: a Case Study. In AIED (pp. 467-474).
[3] Bakar, A. A., Othman, Z. A., & Shuib, N. L. M. (2009,
October). Building a new taxonomy for data discretization
techniques. In Data Mining and Optimization, 2009.
DMO'09. 2nd Conference on (pp. 132-140). IEEE.
[4] Burrows, W. R., Benjamin, M., Beauchamp, S., Lord, E. R., McCollor, D., & Thomson, B. (1995). CART decision-tree statistical analysis and prediction of summer season maximum surface ozone for the Vancouver, Montreal, and Atlantic regions of Canada. Journal of Applied Meteorology, 34(8), 1848-1862.

[5] Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27.

[6] Dasarathy, B. V. (1980). Nosing around the neighborhood: A new system structure and classification rule for recognition in partially exposed environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, (1), 67-71.

[7] de Oña, J., López, G., & Abellán, J. (2012). Extracting decision rules from police accident reports through decision trees. Accident Analysis & Prevention.

[8] Devijver, P. A., & Kittler, J. (1982). Pattern Recognition: A Statistical Approach (p. 448). Englewood Cliffs, NJ: Prentice/Hall International.

[9] Everitt, B. S., Landau, S., Leese, M., & Stahl, D. Miscellaneous Clustering Methods. Cluster Analysis, 5th Edition, 215-255.

[10] Geisser, S. (1993). Predictive Inference: An Introduction (Vol. 55). CRC Press.

[11] Han, J., Kamber, M., & Pei, J. (2006). Data Mining: Concepts and Techniques. Morgan Kaufmann.

[12] Horton, P., & Nakai, K. (1996, June). A probabilistic classification system for predicting the cellular localization sites of proteins. In ISMB (Vol. 4, pp. 109-115).

[13] Havrda, J., & Charvát, F. (1967). Quantification method of classification processes. Concept of structural a-entropy. Kybernetika, 3(1), 30-35.

[14] James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). Support Vector Machines. In An Introduction to Statistical Learning (pp. 337-372). Springer New York.

[15] Jantan, H., Hamdan, A. R., & Othman, Z. A. (2011, June). Talent knowledge acquisition using data mining classification techniques. In Data Mining and Optimization (DMO), 2011 3rd Conference on (pp. 32-37). IEEE.

[16] Kim, J. M., Ahn, H. K., & Lee, D. H. (2013). A study on the occurrence of crimes due to climate changes using decision tree. In IT Convergence and Security 2012 (pp. 1027-1036). Springer Netherlands.

[17] Kohavi, R. (1995, August). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Vol. 14, No. 2, pp. 1137-1145).

[18] Lima, C. F. L., de Assis, F. M., & de Souza, C. P. (2010, May). Decision tree based on Shannon, Renyi and Tsallis entropies for intrusion tolerant systems. In Internet Monitoring and Protection (ICIMP), 2010 Fifth International Conference on (pp. 117-122). IEEE.

[19] Maszczyk, T., & Duch, W. (2008). Comparison of Shannon, Renyi and Tsallis entropy used in decision trees. In Artificial Intelligence and Soft Computing - ICAISC 2008 (pp. 643-651). Springer Berlin Heidelberg.

[20] Mathur, N., Kumar, S., Kumar, S., & Jindal, R. The Base Strategy for ID3 Algorithm of Data Mining Using Havrda and Charvat Entropy Based on Decision Tree.

[21] Mosonyi, M., & Hiai, F. (2011). On the quantum Renyi relative entropies and related capacity formulas. IEEE Transactions on Information Theory, 57(4), 2474-2487.

[22] Pareek, H., Eswari, P. R. L., Babu, N. S. C., & Bangalore, C. D. A. C. (2013). Entropy and n-gram analysis of malicious PDF documents. International Journal of Engineering, 2(2).

[23] Quinlan, J. R. (1987). Simplifying decision trees. International Journal of Man-Machine Studies, 27(3), 221-234.

[24] Quinlan, J. R. (1993). C4.5: Programs for Machine Learning (Vol. 1). Morgan Kaufmann.

[25] Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.

[26] Ravikumar, S., Ramachandran, K. I., & Sugumaran, V. (2011). Machine learning approach for automated visual inspection of machine components. Expert Systems with Applications, 38(4), 3260-3266.

[27] Sangkatsanee, P., Wattanapongsakorn, N., & Charnsripinyo, C. (2011). Practical real-time intrusion detection using machine learning approaches. Computer Communications, 34(18), 2227-2235.

[28] Şen, B., Uçar, E., & Delen, D. (2012). Predicting and analyzing secondary education placement-test scores: A data mining approach. Expert Systems with Applications, 39(10), 9468-9476.

[29] Sharma, B. D., & Taneja, I. J. (1975). Entropy of type (α, β) and other generalized measures in information theory. Metrika, 22(1), 205-215.

[30] Su, J., & Zhang, H. (2006, July). A fast decision tree learning algorithm. In Proceedings of the National Conference on Artificial Intelligence (Vol. 21, No. 1, p. 500). AAAI Press / MIT Press.

[31] Jin, C., De-lin, L., & Fen-xiang, M. (2009, July). An improved ID3 decision tree algorithm. In Computer Science & Education 2009, 4th International Conference on (pp. 127-130). IEEE.

[32] Balagatabi, Z. N., & Balagatabi, H. N. (2013). Comparison of decision tree and SVM methods in classification of researcher's cognitive styles in academic environment. Indian Journal of Automation and Artificial Intelligence, 1(1), 31-43.