Exploratory Analysis of Smartphone Sensor Data for Human Activity Recognition
ABSTRACT Precise recognition of human activities in smart environments such as smart homes or smart healthcare centers is vital for child care, elder care, monitoring of disabled patients, self-management systems, safety, and tracking of healthcare functionality. Automatic human activity recognition (HAR) based on smartphone sensor data is becoming increasingly widespread. However, understanding human activities from sensor data with machine learning is challenging, so the recognition accuracy of many state-of-the-art methods is relatively low, and improving it usually requires high computational overhead. The goal of this paper is to use exploratory data analysis (EDA) to address this difficulty: the analysis yields visualizations and dimensionality reductions that assist in deciding which data mining techniques to use. The HAR method proposed in this paper, based on smartphone accelerometer and gyroscope data, EDA, and prediction models, is a high-precision method whose highest accuracy is 97.12% on the HAR smartphone dataset. Two ensembles of heterogeneous models, stacking and voting, are used in this study to identify human activities of daily living (ADL). Three estimators are used in both the stacked and the voting generalization: Linear Discriminant Analysis, Linear Support Vector Machines, and Logistic Regression. The experimental results show that the generalization algorithms provide an automatic and precise HAR system and can serve as a decision-making tool to identify ADL in any smart environment.
INDEX TERMS Activities of daily living, exploratory data analysis, hard voting, heterogeneous model,
smartphone sensor, stacked generalization.
present life situations; hence, they may necessitate the help of other persons and/or mechanical devices [3].

Due to the widespread availability and ease of use of smartphones in recent years, recognizing ADL using smartphone data has gained significant attention. With built-in sensors such as accelerometers, gyroscopes, magnetometers, and GPS, smartphones have become powerful tools for collecting data about human activities. Moreover, researchers have achieved improved recognition performance and better discrimination between similar activities by exploring the fusion of multiple sensors in smartphones [4], [5], [6], [7]. Sensor fusion techniques, such as feature-level fusion, early fusion, and late fusion, have been used to combine data from multiple sensors effectively.

Thus, existing works on smartphone-based HAR have established the potential of smartphones as trustworthy and handy tools for recognizing human activities. Various data mining algorithms, sensor fusion approaches, and signal processing techniques have contributed considerably to improving performance and recognition accuracy.

Unlike existing works, which typically focus on the activity identification part of HAR systems, this study proposes a new and detailed exploratory data analysis (EDA) method for visualizing data to separate human activities. Using smartphone sensor data for ADLs and combining the results of multiple heterogeneous models, we achieve effective activity recognition. Our contributions are as follows:

• In this study, a detailed exploratory data analysis is outlined to deal with the recognition of activities of daily living based on smartphone sensor data. As far as we know, no existing work has presented the analysis of HAR data in such detail. The detailed EDA highlights the internal characteristics of the data, so that the data analyst's knowledge of identifying activities deepens, or the analyst's understanding changes as the real distribution of the activity data becomes clear.
• We use boxplots of the 'five-number summary' to visualize the dispersion of data, and histograms and bar plots of probability distribution functions to find a starting point for differentiating static and moving activities in univariate analysis. Moreover, we apply kernel Principal Component Analysis (kPCA) as well as t-distributed Stochastic Neighbor Embedding (t-SNE) manifold learning methods to investigate the separability of the data on all features. These detailed EDAs assist in selecting a robust model for the HAR method.
• The HAR method proposed in this paper, based on smartphone sensor data, detailed EDA, and recognition from multiple heterogeneous models, is a new lightweight ensemble method. Although the machine learning method used here is lightweight, it is a high-precision method. This study attains higher prediction accuracy and lower training time in comparison with state-of-the-art shallow and deep models. Various model evaluation techniques are used to measure the performance of the method and validate the estimated results.

The rest of this paper is organized as follows. The second section reviews the state of the art in the field of HAR and the use of smartphone sensors to capture data. The third section presents a new and detailed exploratory data analysis of the selected dataset and details the modeling used to handle sensor data for ADL recognition. The fourth section presents and discusses the experimental results, together with the estimators and their hyper-parameters. The fifth section summarizes the work, draws a conclusion, and offers some directions for future study.

II. RELATED WORKS
In the past decades, HAR has become an active field of research, and many researchers have worked on HAR systems for building various HAR applications in smart environments. Generally, a HAR system consists of several common steps [7]: sensing activity data from environment or body sensors, pre-processing and labeling the activity data, segmentation using a sliding window, feature extraction from the time and/or frequency domains, and modeling using shallow and/or deep learning methods with or without transfer learning. As modeling is an essential part of the HAR system, the selection of a classification model has a prominent effect on the overall precision of the system.

In the literature, there are two types of HAR systems based on the classification algorithms used. One prominent approach is based on shallow learning algorithms. These algorithms learn from labeled datasets where each activity is associated with a specific set of sensor data features. Commonly used features include time-domain features, frequency-domain features, statistical features, and spatial features. Researchers have employed classifiers such as decision trees [7], [8], k-nearest neighbors (k-NN) [8], random forests [7], [8], artificial neural networks (ANN) [8], [9], and support vector machines (SVM) [7], [8], [9], [10] to recognize activities such as walking, running, sitting, standing, and cycling. These approaches have demonstrated promising results, recognizing activities with high accuracy rates. Kong et al. [7] proposed a method based on six different shallow learning models and achieved the highest accuracy using linear SVC with grid-search tuning of the hyper-parameters. They present some data analysis like ours, but a detailed exploratory analysis of the smartphone sensor data is not provided. Moreover, although their methods provide relatively high accuracy, they require relatively long training time. Masum et al. [8] captured data using a Xiaomi Redmi 4A smartphone, used PCA for selecting features, applied several mining algorithms including a Dense Neural Network, decision tree, k-NN, random forests, and SVM, and achieved a highest accuracy of 94.38% on their prepared dataset. They compared the recognition results by gender (male and female), which had not been done in any former research, but their methods gave worse results for highly similar activities such as walking versus walking downstairs
and/or walking upstairs. Khan et al. [9] acquired data using an LG Nexus 4 smartphone at five different phone positions on the body from 40 subjects, sampling at 6 different rates, and used data from 30 subjects for offline training and data from the other 10 subjects for real-time testing. They used kernel discriminant analysis to reduce class variance and an ANN for modeling, and achieved a highest accuracy of 87.1%. They offered lightweight features that do not require higher sampling rates or longer time windows for their calculation and so help attain a fast response, but those features are not fully independent of the position and orientation of the phone (for example, in the user's hands, in a carrier bag, or in a coat's side pocket). Moreover, their recognition accuracy is somewhat lower than in many former works. Dinev et al. [10] captured accelerometer data using an Android smartphone from a single subject and proposed an SVM model for the recognition of three activities of daily living. The authors developed a representation of the initially generated vectors as compact clusters, but captured the training and recognition data from only one subject, so it cannot be a general solution for HAR applications.

Another approach is based on deep learning algorithms for HAR. Deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have shown remarkable performance in various recognition tasks including HAR. By leveraging the hierarchical representations learned from raw sensor data, deep learning models can automatically extract relevant features and capture complex temporal dependencies in activity sequences. Researchers have designed deep learning architectures for HAR [5], [6], [11], [12], [13], achieving state-of-the-art accuracy rates and robustness to different environments and user populations. Shi et al. [5] used the Boulic kinematic model to construct a dataset from body movement sensors and proposed a Deep Convolutional Generative Adversarial Network (DCGAN) together with VGG-16, a deep CNN architecture pre-trained on ImageNet, for recognizing three types of walking activities based on moving speed. This method is suitable for expanding and enriching the training set to avoid overfitting, and it achieves better results even when activities are highly similar, such as fast-walking and really-fast-walking, but the downside is that the authors consider only three types of walking activities. Ravi et al. [6] proposed CNN models using three different regularizations for each of four different datasets (ActiveMiles, WISDM v1.1, Daphnet FoG, and Skoda) and achieved 95.1%, 98.2%, 91.7%, and 96.7% for recognizing 2, 6, 7, and 10 activities respectively. They achieved consistent accuracy for real-time classification on low-power devices using features that are more discriminative and invariant to sensor orientation and placement, but their precision and computational times are not better than some former state-of-the-art methods. Hammerla et al. [11] worked on three different datasets (Opportunity, PAMAP2, and Daphnet Gait), proposed five different deep models for each of the datasets, and obtained highest accuracies of 92.7%, 93.7%, and 76% for the three datasets respectively. The authors presented a unique regularization method, explored the influence of hyper-parameters, and offered recommendations for future researchers who may use deep learning models. However, their suggested settings do not show consistent performance across all HAR benchmark datasets, and their guidelines are limited to a few deep models (DNN, CNN, and LSTM) and do not indicate whether they will work for other broadly used deep models for HAR such as Inception or Gated Recurrent Unit (GRU). Xu et al. [12] worked on 18 mid-level gesture activities from the Opportunity dataset, 18 lifestyle activities from the PAMAP2 dataset, and 6 activities of daily living from the dataset used in this study (HAR smartphone dataset), and achieved 94.6%, 93.5%, and 94.5% accuracy on the three datasets respectively, using Inception GoogLeNet and GRU. Although this method provides better generalization and more consistent performance than existing methods, it does not explore the class imbalance problem in the data for real-life HAR applications. Bhattacharya et al. [13] proposed Ensem-HAR, where CNN-Net, CNN-LSTM-Net, ConvLSTM-Net, and StackedLSTM-Net are used as base models and Random Forest is used as the meta-model of the stacking. They implemented their method on three different datasets, including the one used in this paper, and obtained 95.05% accuracy on the HAR smartphone dataset. Although their stacking of four deep learning-based models performs better than the other works to which it is compared, the cumulative training time of four different deep learning-based models is so high that it cannot be a typical method for real-time HAR applications.

In summary, although plenty of work has been done to boost and optimize the models in HAR methods, the following deficiencies remain:

(1) Human behavior when performing an activity is not only habitual and impulsive; people may also perform some unrelated activities. Besides this, different users perform the same activity in different ways. Another challenge is handling the speed of movement in moving activities. We use a new EDA method to deal with HAR, which can effectively and accurately separate the activities.

(2) There is a variety of machine learning techniques, so selecting the best machine learning model is a challenge. The HAR method based on EDA can find a robust classifier for the dataset that reduces error with low training time.

In this paper, the domain knowledge is enriched by exploratory analysis of the data, which in turn helps to select a robust model for the activity classification task. The selected model provides high-precision results by reducing associated HAR problems such as the confusion between highly similar activities like walking and walking downstairs. To evaluate the performance of the selected model, numerous comparative experiments are conducted and various performance metrics are used. The experimental outcomes show our
methodology outperforms the state of the art and reaches an accuracy of up to 97.12%.

III. METHODOLOGY
The methodology involves the following steps: data collection, data preprocessing and analysis (which includes data cleaning as well as exploratory data analysis to observe imbalance in the data, followed by univariate and multivariate analysis), and finally modeling with sensor data. The conceptual figure of the proposed framework is shown in Fig. 1 and is described in the sections below.

sensor is separated into two components, body acceleration and gravity acceleration, by using another Butterworth low-pass filter with a cut-off frequency of 0.3 Hz, because gravitational force is assumed to have only low-frequency components.

From each segmented window, a feature vector is obtained by estimating variables from both the time and frequency domains. The data of each feature are normalized and confined within [-1, 1]. Finally, each record of the dataset contains a 561-feature vector, its activity label, and an identifier of the user who carried out that experiment. Fig. 2 shows the total data in the dataset with the train and test split for each activity; about 70% of the total data is used for training and the rest is used for testing.
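The gravity/body separation described above can be illustrated with a minimal sketch. It is not the dataset authors' implementation: only the 0.3 Hz cut-off comes from the text, while the 50 Hz sampling rate, the filter order of 3, the zero-phase filtering with `filtfilt`, the function name `split_gravity_body`, and the synthetic signal are assumptions made for this example.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def split_gravity_body(acc, fs=50.0, cutoff_hz=0.3, order=3):
    """Split total acceleration into (gravity, body) components.

    fs and order are assumed values; cutoff_hz follows the text.
    """
    # Low-pass Butterworth filter: gravity is taken to be the slow part.
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    gravity = filtfilt(b, a, acc, axis=0)  # zero-phase filtering
    body = acc - gravity                   # remainder is body motion
    return gravity, body


# Synthetic example: constant 1 g offset plus a 2 Hz "movement" component.
t = np.arange(0, 20, 1 / 50.0)
acc = 1.0 + 0.2 * np.sin(2 * np.pi * 2.0 * t)
gravity, body = split_gravity_body(acc)
```

On this synthetic signal the gravity component stays close to the constant offset, and the oscillation ends up in the body component.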
1) OBSERVING IMBALANCE IN THE DATA
Fig. 3 below shows the data provided by each subject. Fig. 3(a) shows the percentage of data contributed by each user, and from this figure we observe that each subject has almost the same amount of data. Slightly less data is available from user 8 (eight) than from the others, but the difference is acceptable and need not concern us. Fig. 3(b) shows this in more detail, where the data of each user is further broken down by the user's activities; each user performs each activity an almost equal number of times, which means there is no significant gap in their readings.

Fig. 4 below shows the number of data points for each of the six activities of daily living. Fig. 4(a) shows the percentage of data for each activity, and from this figure we observe that each activity has almost the same amount of data. There is somewhat less data for the staircase-walking activities than for the others, but the difference is reasonable and need not concern us. Fig. 4(b) shows this in more detail, where the data of each activity is further broken down by user; each activity is performed by each subject an almost equal number of times, which means there is no significant gap in their readings.
But still, about 15% of WALKING_DOWNSTAIRS observations are below 0.25 and are therefore misclassified, so this condition produces an error of 15% in walking-downstairs classification.

distinguishes all data points belonging to the LAYING activity from other activities by just a single if-else statement: If (tGravityAcc-min()-X < 0.35) then Activity = "LAYING" else Activity = "others".
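The single-feature LAYING rule quoted above can be written as a small function. This is a sketch under stated assumptions: the threshold 0.35 and the feature `tGravityAcc-min()-X` come from the text, whereas the function names, the helper that measures the rule's error rate, and the five sample values are hypothetical and chosen only to exercise the code.

```python
def classify_laying(t_gravity_acc_min_x, threshold=0.35):
    """Rule from the text: LAYING if tGravityAcc-min()-X < 0.35, else others."""
    return "LAYING" if t_gravity_acc_min_x < threshold else "others"


def rule_error_rate(values, labels, threshold=0.35):
    """Fraction of samples where the rule disagrees with LAYING-vs-others truth."""
    wrong = 0
    for value, label in zip(values, labels):
        expected = "LAYING" if label == "LAYING" else "others"
        if classify_laying(value, threshold) != expected:
            wrong += 1
    return wrong / len(labels)


# Hypothetical feature values (not taken from the dataset).
values = [-0.45, 0.10, 0.92, 0.88, 0.95]
labels = ["LAYING", "LAYING", "SITTING", "WALKING", "STANDING"]
predictions = [classify_laying(v) for v in values]
error = rule_error_rate(values, labels)
```

The same error-rate helper makes it easy to quantify how well any such threshold separates one activity from the rest before a full model is trained.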
the X-axis and gravity acceleration mean can also separate the in-bed activity from others. These are illustrated in Fig. 23 to Fig. 26, shown in Appendix B.

e: ANGULAR VELOCITY FROM THE GYROSCOPE IS ALSO A FACTOR
Although accelerometer data is important for distinguishing static from dynamic activities, the analysis shows that gyroscope data can also discriminate them in many cases. Fig. 9 shows the boxplots for six ADLs based on the feature 'fBodyGyro-entropy()-Z' (entropy value of angular velocity from the gyroscope along the Z-dimension in the frequency domain for body motion). From the boxplots, we can see that moving activities can be clearly distinguished with a threshold value as follows: If (fBodyGyro-entropy()-Z > 0.04) then Activity = "Dynamic" else Activity = "Static".

a: INVESTIGATING THE SEPARABILITY OF DATA USING KPCA
kPCA is an extension of PCA that achieves non-linear dimensionality reduction through the use of kernels, decomposing a multivariate dataset into a set of components that explain a maximum amount of the variance [17]. In PCA the number of components is bounded by the number of features, whereas in kPCA that number is bounded by the number of instances [18].

We have used a polynomial kernel as well as a Radial Basis Function (RBF) kernel, and the resulting figures are shown in Fig. 10(a) for the polynomial kernel with degree 9 and in Fig. 10(b) for the RBF kernel with kernel coefficient 0.05.
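A two-dimensional kPCA projection with the two kernel settings named in this section can be sketched with scikit-learn's `KernelPCA`. The polynomial degree 9 and the value 0.05 come from the text; reading "kernel coefficient" as the `gamma` parameter, keeping two components, and using a small random matrix in [-1, 1] in place of the 561-feature data are assumptions made for this example.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Synthetic stand-in for the normalized feature matrix (values in [-1, 1]).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 20))

# Polynomial kernel of degree 9, as stated in the text.
kpca_poly = KernelPCA(n_components=2, kernel="poly", degree=9)
Z_poly = kpca_poly.fit_transform(X)

# RBF kernel; gamma=0.05 is our reading of "kernel coefficient, 0.05".
kpca_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=0.05)
Z_rbf = kpca_rbf.fit_transform(X)

print(Z_poly.shape, Z_rbf.shape)
```

Plotting the two columns of each projection, colored by activity label, gives the kind of separability view that Fig. 10 reports.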
clusters.

Perplexity is the number of closest neighbors of each point that t-SNE considers when producing conditional probabilities. The perplexity value has an impact on the optimization of t-SNE and therefore on the quality of the resulting embedding. That is why we analyze plots with different perplexities: 2, 5, 20, 30, 50, 60, 80, and 100. A higher perplexity considers a larger number of neighbors and ignores more local information in favor of the global structure of the data. Conversely, lower perplexities consider fewer nearest neighbors and thus are less sensitive to global information in favor of the local neighborhood [20]. Four other factors control the quality of the resulting embedding: early exaggeration, learning rate, maximum number of iterations, and angle [20]. The resulting image for perplexity 80 with early exaggeration 12.0, learning rate 153.167, maximum number of iterations 1000, and angle 0.5 is shown in Fig. 11.

We select perplexity 80 because it balances attention between local and global characteristics of the data better than the other perplexities. For clarification, the figures for four other perplexities (5, 20, 50, and 100) are shown in Fig. 27 to Fig. 30 in Appendix C.

In Fig. 11, we see the data points in 2 dimensions and observe their behavior. We can see the six activities in three folds/clusters. We again observe that all classes are fairly separable except for the 'standing' and 'sitting' classes, because of similarities in their sensor values; this is expected because both are static actions. Other sensors, such as a heartbeat sensor, might help to discriminate them because the heart rate differs between resting and standing poses. The laying activity is in a totally different position. Walking, walking downstairs, and walking upstairs are somewhat similar, so they are clustered together but are separable from each other. So, t-SNE is good for separating each of both types of activities, especially all dynamic and matting activities.

D. MODELING WITH SENSOR DATA
We have used two ensemble approaches in which the final prediction is obtained from multiple conceptually different, or heterogeneous, learning models: stacking and voting. In both approaches, we have combined the predictions of three classical linear machine learning models: Linear Discriminant Analysis (LDA), Linear Support Vector Machines (LSVM), and Logistic Regression (Logit). These heterogeneous models are applied with the same hyper-parameters in both the stacking and the voting classification cases, but their learning and recognition strategies are different. Both stacking and voting classifiers improve generalizability or robustness over a single classifier [21]. Below, the three estimators used in both our stacked and voting generalization are outlined with their hyper-parameters so that the results can be reproduced, and then the learning and recognition process of both ensembles is delineated.

1) ESTIMATORS AND THEIR HYPER-PARAMETERS
The parameters that are not directly learned within models are called hyper-parameters. They are provided as arguments to the constructors of the model classes in Scikit-learn [22].

Linear Discriminant Analysis provides a linear decision boundary, which is generated by fitting a class-conditional Gaussian density to each class and using Bayes' rule [23]. As the dataset contains six ADLs, the desired dimensionality here is five. We have used the least squares solution as the solver and automatic shrinkage as a form of regularization (to improve the estimation of covariance matrices) using the Ledoit-Wolf lemma [24].
FIGURE 14. Proposed voting model.

high, which indicates that, for any human activity class, the rate at which the classifier misclassifies it as another class and the rate at which another class is misclassified as it are almost equal and very low. The higher support value of static activities compared with dynamic activities suggests that people are less inclined to perform exercise-type activities. The high average values of precision, recall, and F1-score show the robustness of the models, and we see that the stacking model is better than the voting model for the HAR smartphone dataset.
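To make the per-class quantities discussed above concrete, the following minimal sketch computes precision, recall, F1-score, and support from true and predicted labels. The function name `per_class_report` and the six toy labels are hypothetical; they are not results from the study.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    support = Counter(y_true)
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1, support[label])
    return report


# Toy labels: one SITTING sample is confused with STANDING.
y_true = ["SITTING", "SITTING", "STANDING", "STANDING", "WALKING", "WALKING"]
y_pred = ["SITTING", "STANDING", "STANDING", "STANDING", "WALKING", "WALKING"]
report = per_class_report(y_true, y_pred)
```

In the toy data the single SITTING/STANDING confusion lowers the recall of SITTING and the precision of STANDING, which is the pattern of error that the static activities exhibit in this study.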
A. EXPERIMENTAL RESULTS
After developing the machine learning models with properly tuned estimator hyper-parameters, the heat maps of the confusion matrices are obtained. Fig. 15 shows the confusion matrix for both the stacking and the voting classifier. From the heat maps, we make three remarks: (1) the correlation degree of the stacking model is higher than that of the voting classifier; (2) the correlation degree of inactivity is higher than that of moving activity; (3) the correlation degree among static activities is higher than the correlation of any static activity with any moving activity. This means that if any static activity is misclassified, it is misclassified to another

FIGURE 15. Confusion matrix for ensemble models.
data within the table, we see that the stacking ensemble shows better results than the voting classifier in all cases except training time, although the training time is at a negligible level. The proposed stacking classifier achieves 97.12% accuracy.

Moreover, to clarify the performance of the proposed models, Cohen's kappa score, the Jaccard score, the Matthews correlation coefficient, the Hamming loss, and the zero-one loss are also obtained and shown in Fig. 16. These metrics also show the robustness of both models and, as before, show slightly better performance for the stacking model.

B. COMPARISON AND DISCUSSION
Some state-of-the-art works that used the same dataset are compared with the results of the proposed study in Table 5.

Comparing the models developed in this study with similar studies in the literature, it is seen that the results of this study are pleasing, with an accuracy rate of 97.12% for the same
FIGURE 24. Data extent of six ADLs for gravity-acceleration mean in the time domain along the X-axis.

FIGURE 25. Data extent of six ADLs for gravity-acceleration energy in the time domain along the X-axis.

FIGURE 26. Data extent of six ADLs for the position of gravity-acceleration mean with X-axis.

less than that of their best model. Therefore, the above two comparison tables show the strength of our models in terms of accuracy and time complexity.

Moreover, the accuracy of our best model (stacking model) is compared with its baseline algorithms in Table 7.

The above table shows that the stacking model provides better results than its baseline methods and so, this table focuses on the significance of organizing baseline models
has the potential to increase the performance of present HAR applications. Moreover, the solution can be extended to the classification of any multivariate time series data for other applications. As our experiment is carried out offline, our future target is to use our EDA to build and publish a real-time system to classify human activities. Again, as human beings may accomplish numerous activities at the same time, the future HAR hopes to be capable of recognizing parallel activities. Moreover, our future target is to extend HAR based on EDA to other human activities, human interactions, and relationships.

APPENDIX A
FIGURES MENTIONED IN THE SUBSECTION 'BODY ACCELERATION CAN SEPARATE IT WELL'
See Figs. 19–22.

APPENDIX B
FIGURES MENTIONED IN THE SUBSECTION 'GRAVITY ACCELERATION COMPONENTS ALSO MATTERS'
See Figs. 23–26.

APPENDIX C
FIGURES MENTIONED IN THE SUBSECTION 'INVESTIGATING THE SEPARABILITY OF DATA USING T-SNE'
See Figs. 27–30.

APPENDIX D
THE CODES OF THIS STUDY
The codes used to produce the outcomes of our study are available at the following GitHub link: https://github.com/SM-Mohidul-Islam/Exploratory-Analysis-of-HAR-Smartphone-Sensor-Data.

REFERENCES
[1] C. A. Ronao and S.-B. Cho, "Human activity recognition using smartphone sensors with two-stage continuous hidden Markov models," in Proc. 10th Int. Conf. Natural Comput. (ICNC), Xiamen, China, Aug. 2014, pp. 681–686.
[2] E. Gambi, G. Temperini, R. Galassi, L. Senigagliesi, and A. D. Santis, "ADL recognition through machine learning algorithms on IoT air quality sensor dataset," IEEE Sensors J., vol. 20, no. 22, pp. 13562–13570, Nov. 2020, doi: 10.1109/JSEN.2020.3005642.
[3] P. F. Edemekong, D. Bomgaars, S. Sukumaran, and S. B. Levy, Eds., Activities of Daily Living. Bethesda, MD, USA: StatPearls Publishing, 2020. Accessed: Mar. 15, 2023. [Online]. Available: https://www.ncbi.nlm.nih.gov/books/NBK470404/
[4] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, "A public domain dataset for human activity recognition using smartphones," in Proc. Esann, Bruges, Belgium, vol. 3, 2013, p. 3.
[5] X. Shi, Y. Li, F. Zhou, and L. Liu, "Human activity recognition based on deep learning method," in Proc. Int. Conf. Radar (RADAR), Brisbane, QLD, Australia, Aug. 2018, pp. 1–5.
[6] D. Ravi, C. Wong, B. Lo, and G.-Z. Yang, "Deep learning for human activity recognition: A resource efficient implementation on low-power devices," in Proc. IEEE 13th Int. Conf. Wearable Implant. Body Sensor Netw. (BSN), San Francisco, CA, USA, Jun. 2016, pp. 71–76.
[7] W. Kong, L. He, and H. Wang, "Exploratory data analysis of human activity recognition based on smart phone," IEEE Access, vol. 9, pp. 73355–73364, 2021, doi: 10.1109/ACCESS.2021.3079434.
[8] A. K. M. Masum, S. Jannat, E. H. Bahadur, M. G. R. Alam, S. I. Khan, and M. R. Alam, "Human activity recognition using smartphone sensors: A dense neural network approach," in Proc. 1st Int. Conf. Adv. Sci., Eng. Robot. Technol. (ICASERT), Dhaka, Bangladesh, May 2019, pp. 1–6.
[9] A. Khan, M. Siddiqi, and S.-W. Lee, "Exploratory data analysis of acceleration signals to select light-weight and accurate features for real-time activity recognition on smartphones," Sensors, vol. 13, no. 10, pp. 13099–13122, Sep. 2013, doi: 10.3390/s131013099.
[10] P. Dinev, I. R. Draganov, O. L. Boumbarov, and D. Brodić, "Preprocessing and clustering raw accelerometer data from smartphones for human activity recognition," in Proc. CEMA, Athens, Greece, 2017, pp. 20–24.
[11] N. Y. Hammerla, S. Halloran, and T. Plötz, "Deep, convolutional, and recurrent models for human activity recognition using wearables," in Proc. IJCAI, New York, NY, USA, 2016, pp. 1533–1540.
[12] C. Xu, D. Chai, J. He, X. Zhang, and S. Duan, "InnoHAR: A deep neural network for complex human activity recognition," IEEE Access, vol. 7, pp. 9893–9902, 2019, doi: 10.1109/ACCESS.2018.2890675.
[13] D. Bhattacharya, D. Sharma, W. Kim, M. F. Ijaz, and P. K. Singh, "Ensem-HAR: An ensemble deep learning model for smartphone sensor-based human activity recognition for measurement of elderly health monitoring," Biosensors, vol. 12, no. 6, p. 393, Jun. 2022, doi: 10.3390/bios12060393.
[14] Human Activity Recognition Using Smartphones Data Set. Accessed: Jan. 28, 2023. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones
[15] Signal Processing With Machine Learning—Human Activity Recognition Part 1—EDA. Accessed: Feb. 5, 2023. [Online]. Available: https://medium.com/analytics-vidhya/signal-processing-with-machine-learning-human-activity-recognition-part-i-eda-a1f3b0e91b63
[16] Step By Step All Classification Model for Beginners. Accessed: Feb. 7, 2023. [Online]. Available: https://www.kaggle.com/code/devson/stepbystep-all-classificationmodel-for-beginners
[17] B. Schölkopf, A. Smola, and K. R. Müller, "Kernel principal component analysis," in Proc. ICANN, Lausanne, Switzerland, 1997, pp. 583–588.
[18] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemometrics Intell. Lab. Syst., vol. 2, nos. 1–3, pp. 37–52, Aug. 1987, doi: 10.1016/0169-7439(87)80084-9.
[19] L. Van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, no. 11, pp. 2579–2605, Nov. 2008.
[20] A. J. Izenman, "Introduction to manifold learning," Wiley Interdiscipl. Rev., Comput. Statist., vol. 4, no. 5, pp. 439–446, Sep. 2012, doi: 10.1002/wics.1222.
[21] T. G. Dietterich, "Ensemble methods in machine learning," in Proc. MCS, Berlin, Germany, 2000, pp. 1–15.
[22] Tuning the Hyper-Parameters of an Estimator. Accessed: Feb. 17, 2023. [Online]. Available: https://scikit-learn.org/stable/modules/grid_search.html
[23] R. H. Riffenburgh, "Linear discriminant analysis," Ph.D. dissertation, Dept. Stat., Virginia Polytech. Inst., Blacksburg, VA, USA, 1957.
[24] O. Ledoit and M. Wolf, "Honey, I shrunk the sample covariance matrix," J. Portfolio Manage., vol. 30, no. 4, pp. 110–119, Jun. 2004, doi: 10.3905/jpm.2004.110.
[25] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, "Support vector machines," IEEE Intell. Syst. Appl., vol. 13, no. 4, pp. 18–28, Jul./Aug. 1998, doi: 10.1109/5254.708428.
[26] T. G. Nick and K. M. Campbell, "Logistic regression," in Topics in Biostatistics, W. T. Ambrosius, Ed. Totowa, NJ, USA: Humana Press, 2007, pp. 273–301.
[27] D. H. Wolpert, "Stacked generalization," Neural Netw., vol. 5, no. 2, pp. 241–259, Jan. 1992, doi: 10.1016/S0893-6080(05)80023-1.
[28] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, vol. 2, 2nd ed. New York, NY, USA: Springer, 2009, pp. 1–758.
[29] F. Ordóñez and D. Roggen, "Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition," Sensors, vol. 16, no. 1, p. 115, Jan. 2016, doi: 10.3390/s16010115.
[30] J. Yang, M. N. Nguyen, P. P. San, X. Li, and S. Krishnaswamy, "Deep convolutional neural networks on multichannel time series for human activity recognition," in Proc. IJCAI, Buenos Aires, Argentina, vol. 15, 2015, pp. 3995–4001.
[31] F. Cruciani, A. Vafeiadis, C. Nugent, I. M. P. Cleland, K. Votis, and R. Hamzaoui, "Feature learning for human activity recognition using convolutional neural networks," CCF Trans. Pervasive Comput. Interact., vol. 2, no. 1, pp. 18–32, 2020, doi: 10.1007/s42486-020-00026-2.
[32] N. Nair, C. Thomas, and D. B. Jayagopi, "Human activity recognition using temporal convolutional network," in Proc. 5th Int. Workshop Sensor-Based Activity Recognit. Interact., Berlin, Germany, Sep. 2018, pp. 1–8.

S.M. MOHIDUL ISLAM was born in 1983. He received the B.Sc. and M.Sc. degrees (Hons.) in CSE, in 2007 and 2016, respectively. He is currently pursuing the Ph.D. degree in computer science and engineering with Khulna University, Bangladesh.
He achieved an ICT Fellowship from Bangladesh Government for the M.Sc. degree in engineering research. He joined Khulna University, as a Faculty Member, in 2008. He has published several research papers in international journals and conferences. His research interests include human activity recognition, data science, machine learning, and smart technology. He is a Life Member of the Engineer's Institution Bangladesh (IEB) and the Bangladesh Computer Society (BCS). He is the former Joint Secretary of the Khulna Region of Bangladesh Open Source Network (BdOSN). He is the former Joint Secretary and a current Branch Member of the Khulna Branch of the Bangladesh Computer Society.

KAMRUL HASAN TALUKDER received the Bachelor of Science degree (Hons.) in CSE, the M.Sc. degree in computer science from the National University of Singapore (NUS), in 2004, and the Doctor of Engineering (D.Eng.) degree from Hiroshima University, Japan, in 2008.
He was the Head of the Computer Science and Engineering Discipline for three years. He joined Khulna University, as a Faculty Member, in 2000. He was a Postdoctoral Fellow with the Japan Society for the Promotion of Science (JSPS), Hiroshima University, for two years. He is currently a Professor with the Computer Science and Engineering Discipline, Khulna University. He is also the Dean of the Science, Engineering, and Technology School, Khulna University. He has published more than 70 peer-reviewed research articles over the years. His research interests include image analysis, software engineering, networking, and the IoT.