A Comparative Study of Existing Machine Learning Approaches For Parkinson's Disease Detection


IETE Journal of Research

ISSN: 0377-2063 (Print) 0974-780X (Online) Journal homepage: http://www.tandfonline.com/loi/tijr20


Gunjan Pahuja & T. N. Nagabhushan

To cite this article: Gunjan Pahuja & T. N. Nagabhushan (2018): A Comparative Study of Existing
Machine Learning Approaches for Parkinson's Disease Detection, IETE Journal of Research, DOI:
10.1080/03772063.2018.1531730

To link to this article: https://doi.org/10.1080/03772063.2018.1531730

Published online: 22 Oct 2018.

Gunjan Pahuja (1) and T. N. Nagabhushan (2)
(1) Department of Computer Science & Engineering, JSSATEN affiliated to Dr. A.P.J Abdul Kalam Technical University, Noida, UP, India
(2) Department of Information Science & Engineering, SJCE, Mysuru, India

ABSTRACT
Parkinson’s disease (PD) affects millions of people worldwide and is more prevalent in people over the age of 50. Even today, despite many technological advances, early detection of this disease remains a challenge. This creates a need for automatic, machine learning-based approaches that help clinicians detect the disease accurately at an early stage. The focus of this paper is therefore to survey and compare the existing computational intelligence techniques used for PD detection. Classification has found its place in PD detection because it saves time and increases treatment efficiency. The literature indicates that many classification algorithms have been used to achieve better results, but identifying the most efficient classifier for PD detection remains a problem. The challenge in identifying the most appropriate classification algorithm lies in their application on local datasets. Thus, in this paper three classifiers, namely Multilayer Perceptron, Support Vector Machine and K-nearest neighbor, are compared on the benchmark voice dataset to determine which of them is the most efficient and accurate for PD classification. The voice dataset for these classifiers was obtained from the UCI machine learning repository. ANN with the Levenberg–Marquardt algorithm was found to be the best classifier, with the highest classification accuracy (95.89%). Moreover, we compared our results with those obtained by Resul Das [“A comparison of multiple classification methods for diagnosis of Parkinson Disease,” Expert Systems and Applications, vol. 37, pp. 1568–1572, 2010].

KEYWORDS
Artificial neural networks (ANN); K-nearest neighbors (KNN); Parkinson’s disease (PD); support vector machine (SVM)

1. INTRODUCTION
Parkinson’s disease (PD) is a progressive neurodegenerative disorder of the nervous system that affects body movements, including speech. Dr. James Parkinson first described the condition in 1817 [1] and called it the ‘Shaking Palsy’. Neurodegenerative diseases are defined as hereditary and sporadic conditions which are characterized by progressive dysfunction of the nervous system (JPND research, 2015). Out of many neurodegenerative diseases like Alzheimer’s disease, Brain Cancer, Degenerative Nerve Diseases and Epilepsy, “Parkinson’s Disease” is considered to be the second most common neurodegenerative disease [2].

PD is mainly caused by the progressive loss of dopamine neurons in the area of the midbrain called the substantia nigra, the “movement control center” of the brain (Figure 1). Loss of dopamine causes the neurons to fire out of control, producing what is called a hypokinetic movement disorder [3]. Although this disease can be diagnosed easily in the advanced stage, effective treatment is still very challenging. To date, there exists no cure/medical treatment for PD.

After decades of exhaustive study, the causes of PD are still unknown. Many researchers think that a combination of genetic [4] and environmental factors [5], such as exposure to environmental toxins, head injury, rural living, drinking water, manganese and exposure to pesticides, is responsible for PD. These factors may vary from person to person. Also, each PD patient experiences the specific symptoms of the disease differently. The different stages of PD are described in Table 1. Primary motor symptoms of PD include tremor of the hands, arms, legs, jaw and face; bradykinesia or slowness of movement; rigidity or stiffness of the limbs and trunk; and postural instability or impaired balance and coordination [6–8]. In addition to these symptoms, non-motor symptoms such as depression and loss of memory may occur and affect the quality of life [9,10]. At the advanced stage, PD can be easily and accurately diagnosed, but effective treatment is a challenging task. Also, if treatment is started in advanced stages, it might be less effective in controlling PD progression. This situation necessitates the early and accurate diagnosis of PD, which helps the patients

© 2018 IETE
2 G. PAHUJA AND T. N. NAGABHUSHAN: A COMPARATIVE STUDY OF EXISTING MACHINE LEARNING APPROACHES

in maintaining a good quality of life. To date, no single blood or laboratory test exists that is helpful in the identification of PD and its progression. However, rating methods such as the Hoehn and Yahr scale (1967), the Unified Parkinson Disease Rating Scale (UPDRS) and its modified version MDS-UPDRS [11] are sometimes used for the early diagnosis of PD. Certain drawbacks associated with these methods are as follows:

(1) Availability of skilled workforce
(2) Time and cooperation required from patients for a longer period [12] etc.

Sometimes it is difficult to distinguish between various neurological disorders because they share the same etiology. Approximately 75% of clinical diagnoses of PD are confirmed to be idiopathic PD. Thus, automatic methods based on machine learning are required to improve the diagnostic accuracy rate and to help doctors make the right decisions.

Figure 1: Parkinson’s disease (normal movement vs. movement disorders) [www.medindia.net]

1.1 Objective
The objectives and contributions of this paper are as follows:

(a) To present a comprehensive survey including the most recent research papers up to the year 2017.
(b) To offer a wide range of comparisons from diverse angles and perspectives in terms of data acquisition, feature extraction, feature subset selection, different classifiers and the organization of result comparisons.
(c) To compare the accuracy of existing classifiers on the vocal dataset available from the UCI repository and also to validate the performance of the implemented classifiers on two other benchmark datasets:
    (1) Wisconsin Breast Cancer Database
    (2) Pima Indians Diabetes Dataset
(d) To recommend the potential opportunity for automatic diagnosis of PD.

2. REVIEW OF THE LITERATURE
PD is also termed idiopathic or primary Parkinsonism/hypokinetic-rigid syndrome. From the literature, it is clear that various machine learning approaches have been used for classification of PD using vocal and gait features [13–21]. MA Little et al. [2007] recorded speech signals and created a database in collaboration with the National Center for Voice and Speech, Denver, Colorado. The authors used a kernel support vector machine for PD classification [22]. Since then, several studies have been conducted using Little’s PD voice dataset, and different accuracies have been achieved using different classification algorithms. Resul Das [13] used the same dataset created by Little et al. and compared four independent classification approaches (neural networks, data mining neural, logistic regression and decision trees) for diagnosis of PD. Among the four approaches, the best performance of 92.9% was yielded by a multi-layer feed-forward neural network with the Levenberg–Marquardt algorithm. The authors also compared the results with kernel SVM results from the literature and found that the obtained results were better than those of the kernel SVM approach. Freddie and Rasit [14] used a set of nine parallel feed-forward neural networks on the same voice dataset for PD prediction. Although the complexity increased, this approach yielded an improvement of 8.4% in the prediction of PD as compared to a single network. On similar voice

Table 1: Stages of Parkinson’s disease

Stage | Symptoms
Mildest stage (Stage 1) | PD patients experience the least interference with routine tasks. Tremors and other symptoms are restricted to one side of the body.
Moderate stage (Stage 2) | Symptoms like stiffness, resting tremors and trembling can be sensed on both sides of the body. Facial expressions of PD patients may also change.
Mid-stage (Stage 3) | Major changes like balance loss and decreased reflexes, in addition to Stage 2 symptoms, are observed in PD patients. Occupational therapy combined with medication may help in decreasing the symptoms.
Progressive stage (Stage 4) | The condition of the PD patient gets worse, and it becomes difficult for the patient to move without an assistive device like a walker.
Advanced stage (Stage 5) | Stage 5 is the most advanced and debilitating stage of PD. Stiffness in the legs may cause freezing when standing. Patients are frequently unable to stand without falling. They may experience hallucinations and occasional delusions.

dataset, Hui-Ling Chen et al. [23] used a fuzzy k-nearest neighbor approach with Principal Component Analysis for predicting PD and constructing the feature subset from the whole feature space. The authors reported that their proposed method outperformed the other methods in the literature.

Omer et al. [16] compared the performance of LS-SVM, SVM, MLPNN and GRNN in the remote tracking of PD progression. It was observed that LS-SVM outperforms the other methods while mapping the vocal features to UPDRS data.

It is clear from the literature that most PD patients exhibit gait disorder [20] along with vocal impairment. You-Yin et al. [24] developed a gait regression model for predicting the severity of motor dysfunction from gait image sequences. The studies done so far also indicate that there is a loss of neurons in the dopamine region of the brain in individuals affected by PD. Thus, over the past 2 decades, neuroimaging techniques such as MRI, SPECT, fMRI and PET have been used to visually assess and quantify the loss of neurons in different lobes of the brain [25–27]. MRI is preferred over the others because of its non-invasiveness and high spatial resolution [28,29]. In the literature, various machine learning techniques/approaches exist that are found to be effective for diagnosing PD patients using neuroimaging techniques.

The changes in the functional connectivity of motor networks in the resting state in PD, using fMRI and a network model based on graph theory, were demonstrated by Tao et al. [30]. The authors found that functional connectivity in the supplementary motor area, left dorsal lateral prefrontal cortex and left putamen of PD patients in the off state had significantly decreased, while functional connectivity in the left cerebellum, left primary motor cortex and left parietal cortex had increased as compared to normal subjects. Defeng et al. [31] conducted a real-time case study using deep brain electrode implantation to predict the PD tremor. Similarly, Christian Salvatore et al. [32] used a dataset of MRI scans from 28 controls, 28 PD patients and 28 Progressive Supranuclear Palsy (PSP) patients. A supervised machine learning algorithm was used, based on PCA as the feature extraction method and SVM as the classification algorithm. The authors tried to overcome the problem of an imbalanced dataset by taking the same number of patients from the different classes (PD, HC and PSP). Nowadays many classifiers are available for PD detection and their performance is measured with metrics such as accuracy, sensitivity and specificity [15,17,33,34]. In general, accuracy is a measure of how many cases are correctly identified in total, irrespective of positive or negative cases; that is, it measures the overall performance of the method. Table 2 describes some of the studies available in the literature for PD diagnosis and classification using machine learning approaches.

2.1 Feature Subset Selection (FSS) Techniques
The diagnosis of neurodegenerative diseases through machine learning approaches includes the following:

(1) Data acquisition (brain MRI images, gait movements, vocal data, local field potential etc.).
(2) Feature extraction (extract the features suitable for training and testing a classifier).
(3) Feature subset selection (to reduce the redundant features).
(4) Training and validating the performance of the classifier.

Figure 2 shows the steps involved in medical image processing (MIP) using machine learning techniques.

In the literature, a variety of machine learning algorithms exist for medical imaging classification, such as induction-based (ID3, CART) and instance-based algorithms (IBL). However, the prediction accuracy of these algorithms degrades when many of the available features are not necessary for predicting the output. Thus there is a need for FSS methods which optimize the number of features by selecting the relevant subset and thus improve the classification accuracy. A typical FSS method consists of 4 basic steps: Subset Generation, Subset Evaluation, Stopping Criterion and Result Validation [39].

Subset generation is a search procedure that produces feature subsets for evaluation based on a predefined criterion [40]. An evaluation function is used to evaluate the subset under examination, the stopping criterion is used to decide when to stop, and a validation procedure is used to check whether the subset is valid. Based on different evaluation criteria, FSS algorithms are categorized into three categories: (1) the filter model, (2) the wrapper model [39] and (3) the hybrid model. In all the categories, algorithms can be further differentiated by how the space of feature subsets is explored and by the exact nature of their evaluation function.

The filter model relies on general characteristics of the data to evaluate and select the feature subsets without involving any learning algorithm. However, the filter method sometimes fails to select the right subset of features if the applied criterion deviates from the one that is used for training. Also, the filter approach may fail to

Table 2: Literature survey for diagnosis of Parkinson’s disease using machine learning approaches

Study | Dataset | Method | Results
Song Pan et al. [15] | Local field potential signals | Radial Basis Function + Support Vector Machine + Multilayer Perceptron | Accuracy: SVM 81.14%, RBF 80.13%, MLP 79.25%
Sang-Hong Lee and Joon S. Lim [17] | Gait characteristics | Wavelet-based feature extraction + Neural Network with weighted fuzzy membership functions | Accuracy: 77.33%
G. Sateesh Babu and S. Suresh [18] | Gene expressions | ICA + Meta-cognitive neural classifier | Accuracy: 95.55%
R. Armananzas et al. [35] | Movement disorder | Wrapper feature selection + 5 classifiers: Naïve Bayes (NB), k-nearest neighbors (KNN), LDA, C4.5 decision trees, ANN | Accuracy: NB 82.08%, KNN 80.06%, LDA 83.24%, C4.5 81.50%, ANN 64.74%
G.S. Babu et al. [33] | Brain MRI images | Voxel-Based Morphometry + PBL-McRBFN + RFE | Accuracy: 87.21%
F.J. Martinez-Murcia et al. [36] | DaTSCAN images | Independent Component Analysis (ICA) + Support Vector Machines (SVM) | Accuracy: 91.3% on the PPMI dataset; 94.7% on the “Virgen de la Victoria” Hospital (VV) dataset, Málaga, Spain
G. Singh and L. Samavedham [37] | T1-weighted MRI images | Kohonen Self Organizing Map + Least Square Support Vector Machine | Accuracy: 99.9% (for classifying PD, HC and SWEDD subjects)
A. Benba et al. [38] | Voice assessment | Principal Component Analysis + Support Vector Machine | Accuracy: 87.50% (on 3 vowel samples /a/, /o/, /u/)
L. Naranjo [21] | Acoustic features extracted from replicated voice recordings | Gibbs Sampling Algorithm + Bayesian Approach | Accuracy: 86.2%; Sensitivity: 82.5%; Specificity: 90.0%

Figure 2: Steps involved in medical image processing (MIP) using machine learning techniques

find a feature subset that would jointly maximize the criterion, thus degrading the performance of the learning model [41,42]. On the other hand, the wrapper method requires a learning algorithm and uses its performance as the evaluation criterion. Wrappers can show even better results than the other models because they take prediction accuracy into account. However, wrapper models are less general and more computationally expensive than filter models because they need more computational resources and are tied to a specific learning algorithm [43,44]. Since filters execute many times faster than wrappers, the filter approach has a much better chance of scaling to databases with a large number of features. Also, filters do not require re-execution for different learning algorithms. Thus filters can provide the same benefits for learning as wrappers do.

The hybrid model combines the advantages of the filter and wrapper models by utilizing different evaluation criteria in different search stages [45]. Although the hybrid methods are more efficient than the wrapper and filter approaches, they are much more complex and limited to a specific learning machine [46,47]. Much work has been done in this field as well [48]; different researchers have mentioned advantages and disadvantages of the filter and wrapper approaches. Table 3 lists the FSS/dimensionality reduction methods currently available in the literature for reducing the dimensionality or removing the irrelevant/redundant features in PD detection and classification using machine learning methods.

2.2 Classification
After feature extraction and subset selection, the next phase is classification. It is an instance of supervised learning and can be defined as the problem of identifying the category to which a new observation belongs. The various methods used for classification are categorized as (a) Statistical Algorithms, (b) Pattern Recognition and

Table 3: Feature subset selection techniques

Study | Dataset | Feature subset selection technique
Y.-Y. Chen et al. [24] | Monocular image sequences | Linear discriminant analysis
R. Armananzas et al. [35] | Movement disorders | Wrapper feature selection scheme
G.S. Babu et al. [33] | Brain MRI images | Wrapper method based on recursive feature elimination
M. Hariharan et al. [19] | Vocal dataset | Principal Component Analysis + Linear Discriminant Analysis
F.J. Martinez-Murcia et al. [36] | DaTSCAN images | Independent Component Analysis (ICA)
B. Rana et al. [34] | T1-weighted MRI images | Filter feature selection approach (based on mutual information)
P. Shrivastava et al. [49] | Gait and voice dataset | Evolutionary approaches (like Bat Algorithm, Cuckoo Search algorithm, PSO and Genetic Algorithms)
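
As an illustration of the filter model described in Section 2.1, which scores features from general characteristics of the data without involving a learning algorithm, a minimal Python sketch is given below. The scoring criterion (absolute Pearson correlation with the class label), the function names and the toy data are assumptions made here for illustration only; the studies in Table 3 use other criteria, such as mutual information.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_select(rows, labels, k):
    """Rank features by |correlation with the label| and return the top-k indices.

    No classifier is trained here, which is what makes this a filter method.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties broken by the lower feature index.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scores[:k]]


# Hypothetical toy data: feature 0 tracks the label, feature 1 is constant,
# feature 2 is only weakly related to the label.
rows = [[0.1, 5.0, 0.3], [0.2, 5.0, 0.9], [0.9, 5.0, 0.4], [1.0, 5.0, 0.9]]
labels = [0, 0, 1, 1]
selected = filter_select(rows, labels, k=2)
print(selected)
```

Because the ranking is computed once from the data alone, the same selected subset can be reused with any classifier, which reflects the property of filters noted in Section 2.1.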

learning-based algorithms, (c) Search heuristics and a combination of algorithms.

In statistical approaches, the mean and standard deviation of the features in the template are computed. Distance techniques such as Euclidean distance, weighted Euclidean distance and Manhattan distance are used for comparing the training data with the testing data.

Pattern recognition is defined as the act of taking raw data and classifying them into different categories based on machine learning algorithms such as the K-NN rule, Bayes classifier, SVM, artificial neural networks (ANN) [13] and clustering techniques like K-means [13,16,19,35,50].

Various evolutionary algorithms such as Ant colony optimization and Particle swarm optimization can also be used for classification [18,49,51,52]. The advantage of using these evolutionary algorithms is that they can handle large databases. Figure 3 depicts the methods applied for PD classification in this study.

Figure 3: Methods applied for PD classification

3. MATERIALS AND METHODS
This section describes the methods and materials used in this study for classifying the PD patients from healthy subjects.

3.1 Dataset
In this paper, a dataset of PD patients regarding general voice disorders has been used. MA Little [53] of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, recorded the speech signals and created the database. This database is freely available and can be easily downloaded from the UCI repository. There are six recordings per patient. The first column of the dataset specifies the name of the patient and the last column specifies the status, which is set to 1 for PD and 0 for healthy subjects.

In addition to classifying the PD patients using the voice dataset, we also evaluated the classifiers’ performance on two other benchmark datasets available from the UCI repository. Table 4 summarizes the benchmark datasets used in this study.

3.2 Applied Methods

3.2.1 Artificial Neural Network (ANN)
ANN is a parallel architecture motivated by the way biological neural processing takes place. Although many types of ANN architectures exist, MLP (multi-layer feed-forward neural network) is the most commonly used architecture (Figure 4). The backpropagation algorithm proposed by Rumelhart in 1986 is a generalized delta rule that is utilized by the MLP network for the adjustment of weights [13,16,54]. Levenberg–Marquardt, gradient descent, scaled conjugate gradient and resilient backpropagation are some of the variants of the backpropagation algorithm. According to M.T. Hagan and M. Menhaj [55], for small- and medium-sized networks, the Levenberg–Marquardt algorithm is efficient and strongly recommended for neural network training; therefore, the same algorithm has been implemented here.

3.2.2 Support Vector Machine (SVM)
SVM is a supervised classification approach. Vapnik [56] first proposed SVM for binary classification. Binary classification is based on the concept of dividing the data into classes using a hyperplane.

For linear classification problems, SVM is considered an extension of the perceptron. From Figure 5, it is clear that the distance between the 2 hyperplanes is 2/||w||. So, the optimization problem is to minimize ||w||, or equivalently to maximize the margin

Table 4: Summary of Benchmark datasets

Title | Features | Instances | Classes
Parkinson’s disease voice dataset (https://archive.ics.uci.edu/ml/datasets/Parkinsons) | 23 | 197 | 2 (Binary)
Wisconsin Breast Cancer database (http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)) | 10 | 699 | 2 (Binary)
Pima Indians Diabetes Dataset (http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes) | 8 | 768 | 2 (Binary)
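
Section 3.2.2 describes the linear SVM in terms of a separating hyperplane, a margin of width 2/||w|| and a tuning parameter C that weights misclassification. The Python sketch below only illustrates those quantities; it is not the Matlab implementation used in this study. The hyperplane (w, b), the toy samples and the hinge form of the slack, ξ_i = max(0, 1 − y_i(w·x_i + b)), are assumptions introduced here for illustration.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b of sample x with respect to the hyperplane (w, b)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Return 1 if w.x + b >= 0, otherwise -1."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance 2 / ||w|| between the two margin hyperplanes."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def soft_margin_objective(w, b, samples, labels, C):
    """0.5 * ||w||^2 + C * sum of slacks, with hinge-form slack (assumed here)."""
    regulariser = 0.5 * sum(wi * wi for wi in w)
    slack = sum(
        max(0.0, 1.0 - y * decision_value(w, b, x))
        for x, y in zip(samples, labels)
    )
    return regulariser + C * slack


# Hypothetical hyperplane x1 = 2 and toy samples; labels are +1 / -1.
w, b = [1.0, 0.0], -2.0
samples = [[0.0, 1.0], [1.0, 0.0], [3.0, 1.0], [4.0, 0.0], [2.5, 0.0]]
labels = [-1, -1, 1, 1, 1]

predictions = [predict(w, b, x) for x in samples]
print(predictions)
print(margin_width(w))
print(soft_margin_objective(w, b, samples, labels, C=1.0))
```

In this toy case every sample is classified correctly, but the last one lies inside the margin and therefore contributes a nonzero slack to the objective.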

between the support vectors. Thus the binary optimization problem can be stated as

$$\arg\min_{w,\,b,\,\xi} \; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \xi_i$$

where C > 0 is a tuning parameter and the slack variables ξ_i are required to tolerate misclassification.

If w^T x + b ≥ 0, then the SVM predicts “1”; otherwise it predicts “−1”. The decision boundary is given by w^T x + b = 0. In this paper, SVM has been chosen as a classifier for classifying PD patients from normal subjects because of its computational capability in dealing with the overfitting and dimensionality problems that generally occur during classification [15,36–38].

Figure 4: Artificial neural network architecture

Figure 5: SVM trained with data/samples from 2 classes

3.2.3 K-nearest Neighbor (K-NN)
K-NN is one of the nonparametric classification approaches used in machine learning [57]. The result of the K-NN approach depends on the type of output required for the particular application. For classification, an object is assigned the class that is most common among its K nearest neighbors. If K = 1, then the class of that single nearest neighbor is assigned to the object. Hui-Ling Chen et al. [23] presented an efficient diagnosis system for PD detection using a fuzzy k-nearest neighbor approach.

4. RESULTS AND DISCUSSIONS
This section discusses the results obtained using ANN, K-NN and SVM for classifying the PD patients from healthy subjects using the voice dataset. The same dataset has been used by various researchers in their studies on PD detection: Resul Das [13], Hui-Ling Chen et al. [23] and G.S. Babu [18], to name a few. All the implementations in this study were carried out using Matlab R2013a.

4.1 ANN with Levenberg–Marquardt and Scaled Conjugate Gradient Algorithms
The neural network architecture used for classification is a feed-forward back-propagation network. Backpropagation has been used in this study based on Levenberg–Marquardt optimization and the scaled conjugate gradient method, with 10 neurons in the hidden layer. The input dataset is randomly partitioned into training, testing and validation sets for ANN classification. The initial weights were chosen randomly. The tuning of all the parameters for ANN classification has been done as in [13].

4.2 K-Nearest Neighbor (K-NN)
K-NN (lazy learning) is considered to be the simplest algorithm among the various machine learning algorithms available for classification. The predictions

made by this method are based on the outcome of the ‘K’ neighbors that are closest to the query point. Various distance metrics such as the Euclidean, squared Euclidean and cityblock distances can be used for calculating the distance between the sample cases (q) and the query point (y). Here, K-NN has been evaluated using the Euclidean and cityblock distance metrics, where the index i runs over the features:

$$D(y, q) = \sqrt{\sum_{i} (y_i - q_i)^{2}} \quad \text{(Euclidean distance)}$$

$$D(y, q) = \sum_{i} \lvert y_i - q_i \rvert \quad \text{(Cityblock distance)}$$
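
The K-NN rule with the two distance metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Matlab code used in this study: the function names, the majority-vote tie handling and the toy samples are hypothetical.

```python
import math
from collections import Counter


def euclidean(y, q):
    """Euclidean distance between query point y and sample case q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, q)))


def cityblock(y, q):
    """Cityblock (Manhattan) distance between query point y and sample case q."""
    return sum(abs(a - b) for a, b in zip(y, q))


def knn_predict(samples, labels, query, k, distance=euclidean):
    """Assign the class that is most common among the k nearest sample cases."""
    order = sorted(range(len(samples)), key=lambda i: distance(query, samples[i]))
    votes = Counter(labels[i] for i in order[:k])
    # With an odd k and two classes no tie can occur.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups labelled 0 and 1.
samples = [[1.0, 1.0], [1.2, 0.8], [3.0, 3.2], [3.1, 2.9], [2.9, 3.0]]
labels = [0, 0, 1, 1, 1]

print(knn_predict(samples, labels, [1.1, 1.0], k=3))
print(knn_predict(samples, labels, [3.0, 3.0], k=3, distance=cityblock))
```

Passing the metric as a parameter mirrors the comparison made in this study, where the same classifier is evaluated once with the Euclidean and once with the cityblock distance.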

4.3 Support Vector Machine (SVM)
SVM classifiers with RBF, polynomial and linear kernel functions have been implemented in Matlab. To validate classifier accuracy, 10-fold cross-validation [43] was used. The advantage of the 10-fold CV approach is that all the test sets are independent, and thus the reliability of the results is improved. Since the voice dataset is imbalanced, we use “stratified sampling” to split the data. Stratified 10-fold CV ensures the same class distribution in each subset, so that the sample proportion in each data subset is the same as that in the population. In 10-fold CV the data are split into 10 folds, each fold serves once as the test set, and the 10 results are averaged to produce a single estimate.

Table 5 shows the performance comparison of ANN, SVM and K-NN with different variants on the PD voice dataset. It is clear from Table 5 that ANN with Levenberg–Marquardt, K-NN with Euclidean distance and SVM with RBF kernel outperformed the other variants investigated in this study. Since the dataset is imbalanced, the geometric mean of the true rates is also calculated, along with performance parameters such as accuracy, sensitivity and specificity. The significance of this metric is that the geometric mean is high only when the accuracy on each of the 2 classes is high, so it rewards a good balance between them. Another approach to produce evaluation criteria is to make use of the ROC (Receiver Operating Characteristic) curve. Since the aim of this study is to compare the existing machine learning approaches for PD classification, the ROC curve is shown only for the Levenberg–Marquardt case in Figure 6. The ROC curve is used for validating the classifier performance. This curve specifies the true positive rate vs. the false positive rate for different thresholds of the classifier output. From Figure 6, the true positive rate vs. the false positive rate can be easily identified. Figure 7 demonstrates the validation performance graph using the Levenberg–Marquardt algorithm. There is no problem of overfitting the data, because the test curve and the validation curve are similar. Some overfitting could have occurred if the test curve had increased significantly before the validation curve increased. For comparison purposes, classification accuracies of previous methods investigated for PD diagnosis using voice data are listed in Table 6. In order to further validate the effectiveness of these methods, we implemented the same algorithms on two other benchmark datasets (Table 4), and the results are compiled in Table 7.

Figure 6: ROC curve showing true positive rate vs. false positive rate (Levenberg–Marquardt algorithm)

5. CHALLENGES AND ISSUES
Ingrid Scholl et al. [58] discussed Kilo-to-Tera byte challenges in MIP. These challenges are related to management and mining of medical images, bio-imaging, neuroimaging and virtual reality in medical visualizations. Technological advancement has enabled Peta-byte availability for medical imaging and hence

Table 5: Performance comparison of ANN, KNN and SVM on PD voice dataset (all values in %)

Performance parameter | ANN (Levenberg–Marquardt algorithm) | ANN (Scaled conjugate gradient) | KNN (Euclidean distance) | KNN (Cityblock distance) | SVM (RBF kernel) | SVM (Polynomial kernel) | SVM (Linear kernel)
Classification accuracy | 95.89 | 85.12 | 72.31 | 69.74 | 88.21 | 81.03 | 82.9
Sensitivity | 93.75 | 70 | 68.75 | 66.67 | 91.67 | 79.17 | 87.33
Specificity | 96.59 | 96.59 | 73.47 | 70.75 | 77.55 | 87.76 | 78.56
Geometric mean | 95.16 | 82.23 | 71.07 | 68.68 | 84.31 | 83.35 | 82.83
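
The performance parameters in Table 5 can be related to one another with a short Python sketch. The confusion-matrix counts below are hypothetical and are not taken from this study; only the last check uses values from Table 5, to show that the reported geometric mean for ANN with Levenberg–Marquardt equals the square root of the product of its sensitivity and specificity.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and geometric mean from confusion counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


def geometric_mean(sensitivity, specificity):
    """Geometric mean of the two true rates (same units as the inputs)."""
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts for an imbalanced two-class problem.
metrics = binary_metrics(tp=40, fn=10, tn=90, fp=10)
print(metrics)

# Consistency check against Table 5 (ANN, Levenberg-Marquardt column, values in %).
table5_check = round(geometric_mean(93.75, 96.59), 2)
print(table5_check)  # 95.16
```

Because the geometric mean is a product of the two rates, it drops sharply when either class is poorly recognized, which is why it is reported alongside accuracy for imbalanced data.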

Figure 7: Graph showing the validation performance (Levenberg–Marquardt algorithm)

Table 6: Classifier performance comparison with studies available in the literature on vocal dataset

Study | Method | Accuracy (%)
R. Das [13] | ANN | 92.9
F. Astrom and R. Kokar [14] | 9 parallel neural networks | 91.2
A. Khemphila and V. Boonjing [54] | Information Gain + ANN | 83.33
H.-L. Chen et al. [23] | PCA + FKNN | 96.07
A. Benba et al. [38] | PCA + SVM | 87.21

addresses the byte challenge. The other two biggest challenges that still exist in MIP are the dataset and computational power. For example, G.S. Babu et al. [33] developed the meta-cognitive algorithm for the identification of the brain regions responsible for PD using the RFE approach; 87.21% accuracy was achieved, but the computational cost was high. From a machine learning perspective, the dataset must be clean and of significant size to solve the problem. However, the availability of clean datasets is limited due to the nature of complexity.

Dataset collection has some inherent challenges like the “class imbalance problem” [23] and the presence of noise and outliers in the dataset. The class imbalance problem means that the total number of samples from one class of data (+ve) is not equal to the total number of samples from the other class of data (−ve). This problem exists not only in medical diagnosis but also in nearly all fields where “Machine Learning” is used, such as face recognition and biometrics. This problem may be overcome by using a balanced dataset, so that the decision model can learn without bias. The presence of noise and outliers during data collection can lead to poor diagnosis. Thus, preprocessing of medical data is a necessary step and must be handled automatically. After removal of noise and outliers, medical images can be processed and analyzed to extract meaningful information, such as the volume, shape and motion of organs, which is helpful in the diagnosis of the disease and abnormalities.

6. CONCLUSION
Research highlights that 90% of people with PD exhibit vocal impairment. Vocal impairment, or a disorder of voice, means that the voice sounds hoarse, strained or effortful. Several studies have been done to automate PD diagnosis using voice datasets. In this paper, the performance of ANN, KNN and SVM classifiers has been

Table 7: Performance Comparison of ANN, KNN and SVM on Wisconsin breast cancer dataset and Pima Indians diabetes dataset
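Dataset | Performance parameter | ANN: Levenberg–Marquardt algorithm | ANN: Scaled conjugate gradient | KNN: Euclidean distance | KNN: Cityblock distance | SVM: RBF kernel | SVM: Polynomial kernel | SVM: Linear kernel
Wisconsin Breast Cancer Database | Classification accuracy (%) | 98 | 97 | 73.33 | 72.31 | 96.71 | 90.1 | 95.02
Wisconsin Breast Cancer Database | Sensitivity (%) | 97.8 | 97.16 | 68.75 | 66.67 | 96.29 | 92.16 | 96.72
Wisconsin Breast Cancer Database | Specificity (%) | 95.85 | 98.3 | 74.83 | 74.15 | 97.51 | 88.8 | 94.51
Wisconsin Breast Cancer Database | Geometric mean (%) | 96.82 | 97.73 | 71.73 | 70.31 | 96.90 | 90.46 | 95.61
Pima Indians Diabetes Dataset | Classification accuracy (%) | 81.11 | 78.51 | 72.82 | 72.31 | 75.01 | 73.16 | 74.61
Pima Indians Diabetes Dataset | Sensitivity (%) | 90 | 80.62 | 68.75 | 68.75 | 73.4 | 77.4 | 78.3
Pima Indians Diabetes Dataset | Specificity (%) | 68.33 | 73.3 | 74.15 | 73.47 | 72.76 | 69.4 | 71.04
Pima Indians Diabetes Dataset | Geometric mean (%) | 78.42 | 76.87 | 71.40 | 71.07 | 73.08 | 73.29 | 74.58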

evaluated using sensitivity, specificity, total classification accuracy and geometric mean on the voice database. A similar discussion is also carried out for the Wisconsin Breast Cancer database and the Pima Indians Diabetes dataset. It is observed that the artificial neural network with the Levenberg–Marquardt algorithm gives the highest classification accuracy, 95.89%, for the voice dataset. We believe that the use of machine learning techniques as discussed here will be a great support to doctors. Although a large number of techniques are available for PD diagnosis, their performance is still imperfect. Hence, there is a need for further enhancements to improve the accuracy of CAD algorithms. In future, we will attempt to use other algorithms, such as the genetic algorithm and the Extreme Learning Machine, for PD detection and classification.

REFERENCES
1. J. Parkinson, An Essay on Shaking Palsy. London: Whittingham and Rowland Printing, 1817.

2. D. B. Calne, "Is idiopathic parkinsonism the consequence of an event or a process," Neurology, Vol. 44, no. 15, pp. 5–5, 1994.

3. A. E. Lang and A. M. Lozano, "Parkinson's disease: first of two parts," New England J. Med., Vol. 339, pp. 1044–53, 1998.

4. A. Samii, J. G. Nutt, and B. R. Ransom, "Parkinson's disease," Lancet, Vol. 363, no. 9423, pp. 1783–93, 2004.

5. L. M. de Lau and M. M. Breteler, "Epidemiology of Parkinson's disease," Lancet Neurol., Vol. 5, pp. 525–35, 2006.

6. E. M. Morris, "Movement disorder in people with Parkinson disease: A model for physical therapy," Phys. Ther., Vol. 80, pp. 578–97, 2000.

7. A. Schrag, C. D. Good, K. Miszkiel, H. R. Morris, C. J. Mathias, A. J. Lees, and N. P. Quinn, "Differentiation of atypical parkinsonian syndromes with routine MRI," Neurology, Vol. 54, pp. 697–702, 2000.

8. R. Angel, W. Alston, and J. R. Higgins, "Control of movement in Parkinson's disease," Brain, Vol. 93, no. 1, pp. 1–14, 1970.

9. S. L. Wu, R. M. Liscic, S. Kim, S. Sorbi, and Y. H. Yang, "Nonmotor symptoms of Parkinson's disease," Parkinson's Dis., 2017. DOI:10.1155/2017/4382518.

10. T. Yousaf, H. Wilson, and M. Politis, "Imaging the nonmotor symptoms in Parkinson's disease," Int. Rev. Neurobiol., Vol. 133, pp. 179–257, 2017.

11. C. G. Goetz, et al., "Movement disorder society-sponsored revision of the unified Parkinson's disease rating scale (MDS-UPDRS): Scale presentation and clinimetric testing results," Mov. Disord., Vol. 23, no. 15, pp. 2129–70, 2008.

12. B. Post, M. P. Merkus, R. M. de Bie, R. J. de Haan, and J. D. Speelman, "Unified Parkinson's disease rating scale motor examination: Are ratings of nurses, residents in neurology, and movement disorders specialists interchangeable?," Mov. Disord., Vol. 20, pp. 1577–84, 2005.

13. R. Das, "A comparison of multiple classification methods for diagnosis of Parkinson disease," Expert Syst. Appl., Vol. 37, pp. 1568–72, 2010.

14. F. Astrom and R. Koker, "A parallel neural network approach to prediction of Parkinson's disease," Expert Syst. Appl., Vol. 38, pp. 12470–4, 2011.

15. S. Pan, S. Iplikci, K. Warwick, and T. Z. Aziz, "Parkinson's disease tremor classification – a comparison between support vector machines and neural networks," Expert Syst. Appl., Vol. 39, pp. 10764–71, 2012.

16. O. Eskidere, F. Ertas, and C. Hanilci, "A comparison of regression methods for remote tracking of Parkinson's disease progression," Expert Syst. Appl., Vol. 39, pp. 5523–8, 2012.

17. S.-H. Lee and J. S. Lim, "Parkinson's disease classification using gait characteristics and wavelet-based feature extraction," Expert Syst. Appl., Vol. 39, pp. 7388–44, 2012.

18. G. Sateesh Babu and S. Suresh, "Parkinson's disease prediction using gene expression – a projection based learning meta-cognitive neural classifier approach," Expert Syst. Appl., Vol. 40, pp. 1519–29, 2013.

19. M. Hariharan, K. Polat, and R. Sindhu, "A new hybrid intelligent system for accurate detection of Parkinson's disease," Comp. Methods Prog. Biomed., Vol. 113, pp. 904–13, 2014.

20. W. Zeng and C. Wang, "Classification of neurodegenerative diseases using gait dynamics via deterministic learning," Inf. Sci., Vol. 317, pp. 246–58, 2015.

21. L. Naranjo, C. J. Perez, J. Martín, and Y. Campos-Roca, "A two-stage variable selection and classification approach for Parkinson's disease detection by using voice recording replications," Comp. Methods Prog. Biomed., Vol. 142, pp. 147–56, 2017.

22. M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman, and L. O. Ramig, "Suitability of dysphonia measurements for telemonitoring of Parkinson's disease," IEEE Trans. Biomed. Eng., Vol. 56, pp. 1015–22, 2009.

23. H.-L. Chen, C.-C. Huang, X.-G. Yu, X. Xu, X. Sun, G. Wang, and S.-J. Wang, "An efficient diagnosis system for detection of Parkinson's disease using fuzzy k-nearest neighbor approach," Expert Syst. Appl., Vol. 40, pp. 263–71, 2013.

24. Y.-Y. Chen, et al., "A vision-based regression model to evaluate parkinsonian gait from monocular image sequences," Expert Syst. Appl., Vol. 39, pp. 520–6, 2012.

25. P. Piccini and A. Whone, "Functional brain imaging in the differential diagnosis of Parkinson's disease," Lancet Neurol., Vol. 3, pp. 284–90, 2004.

26. A. G. Filler, "The history, development and impact of computed imaging in neurological diagnosis and neurosurgery: CT, MRI, and DTI," Nature, Vol. 7, pp. 1–69, 2009.

27. B. S. Mahanand, S. Suresh, N. Sundararajan, and M. Aswatha Kumar, "Identification of brain regions responsible for Alzheimer's disease using a self-adaptive resource allocation network," Neural Netw., Vol. 32, pp. 313–22, 2012.

28. A. Schrag, M. Jahanshahi, and N. Quinn, "How does Parkinson's disease affect quality of life? A comparison with quality of life in the general population," Mov. Disord., Vol. 15, pp. 1112–8, 2000.

29. B. Ravina, et al., "The role of radiotracer imaging in Parkinson disease," Neurology, Vol. 64, pp. 208–15, 2005.

30. T. Wu, L. Wang, Y. Chen, C. Zhao, K. Li, and P. Chan, "Changes of functional connectivity of the motor network in the resting state in Parkinson's disease," Neurosci. Lett., Vol. 460, pp. 6–10, 2009.

31. D. Wu, K. Warwick, Z. Ma, J. G. Burgess, S. Pan, and T. Z. Aziz, "Prediction of Parkinson's disease tremor onset using radial basis function neural networks," Expert Syst. Appl., Vol. 37, pp. 2923–8, 2010.

32. C. Salvatore, et al., "Machine learning on brain MRI data for differential diagnosis of Parkinson's disease and progressive supranuclear palsy," J. Neurosci. Methods, Vol. 222, pp. 230–7, 2014.

33. G. Sateesh Babu, S. Suresh, and B. S. Mahanand, "A novel PBL-McRBFN-RFE approach for identification of critical brain regions responsible for Parkinson's disease," Expert Syst. Appl., Vol. 41, pp. 478–88, 2014.

34. B. Rana, A. Juneja, M. Saxena, S. Gudwani, S. Senthil Kumaran, R. K. Agrawal, and M. Behari, "Regions-of-interest based automated diagnosis of Parkinson's disease using T1-weighted MRI," Expert Syst. Appl., Vol. 42, pp. 4506–16, 2015.

35. R. Armananzas, C. Bielza, K. R. Chaudhuri, P. Martinez-Martin, and P. Larrañaga, "Unveiling relevant non-motor Parkinson's disease severity symptoms using a machine learning approach," Artif. Intell. Med., Vol. 58, pp. 195–202, 2013.

36. F. J. Martinez-Murcia, J. M. Górriz, J. Ramírez, I. A. Illán, A. Ortiz, and the PPMI, "Automatic detection of parkinsonism using significance measures and component analysis in DaTSCAN imaging," Neurocomputing, Vol. 126, pp. 58–70, 2014.

37. G. Singh and L. Samavedham, "Unsupervised learning based feature extraction for differential diagnosis of neurodegenerative diseases: a case study on early-stage diagnosis of Parkinson disease," J. Neurosci. Methods, Vol. 256, pp. 30–40, 2015.

38. A. Benba, A. Jilbab, and A. Hammouch, "Voice assessments for detecting patients with Parkinson's diseases using PCA and NPCA," Int. J. Speech Technol., Vol. 19, no. 4, pp. 743–54, 2016.

39. M. Dash and H. Liu, "Feature selection for classification," Intell. Data Anal., Vol. 1, pp. 131–56, 1997.

40. L. Huan and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining. Boston, MA: Kluwer Academic, 1998.

41. M. E. ElAlami, "A filter model for feature subset selection based on genetic algorithm," Knowledge Based Syst., Vol. 22, pp. 356–62, 2009.

42. H. Yoon, C.-S. Park, J. S. Kim, and J.-G. Baek, "Algorithm learning based neural network integrating feature selection and classification," Expert Syst. Appl., Vol. 40, pp. 231–41, 2013.

43. R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artif. Intell., Vol. 97, pp. 273–324, 1997.

44. M. Hall, "Correlation based feature selection for machine learning," Ph.D. dissertation, Dept. of Computer Science, University of Waikato, 1999.

45. A. H. Hadjahmadi and T. J. Askari, "A decision support system for Parkinson's disease diagnosis using classification and regression tree," J. Math. Comp. Sci., Vol. 4, pp. 257–63, 2012.

46. O. Uncu and I. B. Turksen, "A novel feature selection approach: combining feature wrappers and filters," Inf. Sci., Vol. 177, pp. 449–66, 2007.

47. J. Huang, Y. Cai, and X. Xu, "A hybrid genetic algorithm for feature selection wrapper based on mutual information," Pattern Recog. Lett., Vol. 28, pp. 1825–44, 2007.

48. D. Guan, W. Yuan, Y.-K. Lee, K. Najeebullah, and M. K. Rasel, "A review of ensemble learning based feature selection," IETE Tech. Rev., Vol. 31, pp. 190–8, 2014.

49. P. Shrivastava, A. Shukla, P. Vepakomma, N. Bhansali, and K. Verma, "A survey of nature-inspired algorithms for feature selection to identify Parkinson's disease," Comp. Methods Prog. Biomed., Vol. 139, pp. 171–9, 2017.

50. M. Poletti, M. Emre, and U. Bonuccelli, "Mild cognitive impairment and cognitive reserve in Parkinson's disease," Parkinsonism Related Disord., Vol. 17, pp. 579–86, 2011.

51. K. Chandrasekaran, S. P. Simon, and N. P. Padhy, "Cuckoo search algorithm for emission reliable economic multi-objective dispatch problem," IETE J. Res., Vol. 60, pp. 128–38, 2014.

52. V. Mangat and R. Vig, "Dynamic PSO-based associative classifier for medical datasets," IETE Tech. Rev., Vol. 31, pp. 258–65, 2014.

53. M. A. Little, P. E. McSharry, S. J. Roberts, D. A. Costello, and I. M. Moroz, "Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection," BioMed. Eng. Online, Vol. 6, p. 23, 2007.

54. A. Khemphila and V. Boonjing, "Parkinsons disease classification using neural network and feature selection," World Acad. Sci. Tech., Vol. 64, pp. 1–15, 2012.

55. M. T. Hagan and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. Neural Netw., Vol. 5, pp. 989–93, 1994.

56. V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.

57. P. Hall, U. B. Park, and R. J. Samworth, "Choice of neighbor order in nearest-neighbor classification," Annals Stat., Vol. 36, pp. 2135–52, 2008.

58. I. Scholl, T. Aach, M. T. Deserno, and T. Kuhlen, "Challenges of medical image processing," Comput. Sci. Res. Develop., Vol. 26, pp. 5–13, 2011.

Authors

Gunjan Pahuja received M.Tech. from Guru Jambheshwar University of Science and Technology in 2006. Currently, she is pursuing her PhD in the department of Computer Science and Engineering from Dr. A.P.J. Abdul Kalam University, Lucknow. Her research focuses on Machine Learning and Medical Image Processing. She is a member of ISTE.

T. N. Nagabhushan received his Master's degree and PhD in Electrical Engineering from the Indian Institute of Science, Bangalore, in 1989 and 1996, respectively. He has over 32 years of experience in teaching, research and industry, besides holding the position of principal. His main area of focus is machine learning, development of new tools for supervised learning and applications to image processing. He is a member of ISTE and CSI.
Email: [email protected]
Corresponding author. Email: [email protected]
