
Hindawi

Wireless Communications and Mobile Computing


Volume 2021, Article ID 3697536, 18 pages
https://doi.org/10.1155/2021/3697536

Research Article
Network Threat Detection Based on Group CNN for Privacy Protection

Yanping Xu,1 Xia Zhang,1 Chengdan Lu,2 Zhenliang Qiu,1 Chunfang Bi,3 Yuping Lai,4 Jian Qiu,5 and Hua Zhang6

1 School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China
2 Zhejiang Electronic Information Product Inspection and Research Institute (Key Laboratory of Information Security of Zhejiang Province), Hangzhou, China
3 Chengzhong Primary School, Zibo, China
4 School of Cyberspace, Beijing University of Posts and Telecommunications, Beijing, China
5 Center for Undergraduate Education, Westlake University, Hangzhou, China
6 School of Computer Science, Hangzhou Dianzi University, Hangzhou, China

Correspondence should be addressed to Chengdan Lu; [email protected]

Received 8 June 2021; Revised 15 July 2021; Accepted 3 August 2021; Published 3 September 2021

Academic Editor: Yaguang Lin

Copyright © 2021 Yanping Xu et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The Internet of Things (IoT) contains a large amount of data, which attracts various types of network attacks that lead to privacy leaks. With the upgrading of network attacks and the increase in network security data, traditional machine learning methods are no longer suitable for network threat detection. At the same time, data analysis techniques and deep learning algorithms have developed rapidly and have been successfully applied to a variety of tasks for privacy protection. Convolutional neural networks (CNNs) are typical deep learning models that can learn and reconstruct features accurately and efficiently. Therefore, in this paper, we propose a group CNN model that is based on feature correlations to learn features and reconstruct security data. First, feature correlation coefficients are computed to measure the relationships among the features. Then, we sort the correlation coefficients in descending order and group the data by columns. Second, a 1D group CNN model with multiple 1D convolution kernels and 1D pooling filters is built to address the grouped data for feature learning and reconstruction. Third, the reconstructed features are input to shallow machine learning models for network threat prediction. The experimental results show that the features reconstructed by the group CNN can reduce the dimensions and achieve the best performance compared with other existing dimension reduction algorithms. At the same time, the group CNN reduces the floating-point operations (FLOPs), parameter count, and running time compared with the basic 1D CNN.

1. Introduction

Application scenarios for the IoT are becoming increasingly mature, which brings people to a new digital lifestyle by connecting everything [1]. However, as the IoT scope and scale continue to expand, the threat of network intrusion has become more serious than ever before [2, 3]. Malicious software, DDoS attacks, vulnerability attacks, and other attacks constantly occur in the IoT cyberspace, which inevitably leads to privacy leaks [4–6]. These attacks harm not only physical terminal equipment but also people's lives and property [7].

In the IoT, there are three major security and privacy challenges: terminal authentication, network attack prevention, and personal data protection [8, 9]. In terms of the privacy challenge, blockchain-enabled technology using encryption algorithms does not cause private data leakage [10–12]. In terms of network attack prevention, network threat detection technology is required to find network intrusions and meet the demand for IoT assurance. In this situation, intrusion detection [13], malicious code detection [14], malware detection [15], malicious URL detection [16], and vulnerability mining [17] based on machine learning algorithms are

considered to be effective network threat behavior detection measures. With the upgrading of attacks and the increase in network security data, traditional machine learning methods are no longer suitable. At the same time, data analysis techniques and deep learning algorithms have developed rapidly and have been successfully applied to natural language processing, image recognition, and video detection [18, 19]. In the field of network security, many studies have used deep learning technology to detect network threats and have garnered many achievements [13, 20].

Big data analysis techniques include oversampling imbalanced datasets, dimension reduction of high-dimensional data, and correlation analysis between features [21]. Correlation analysis studies the correlation coefficients among two or more random variables [22]. In probability theory, the correlation coefficient reflects how closely two variables are related. The range of the correlation coefficient is [−1, 1]. The closer the absolute value of the correlation coefficient is to 1, the stronger the linear relationship between the two variables; conversely, the closer the absolute value of the correlation coefficient is to 0, the weaker the linear relationship between the two variables. Therefore, we use a correlation coefficient matrix to measure the relationships among the column vectors in the data matrix.

Machine learning algorithms are divided into shallow learning and deep learning [13, 22, 23]. Shallow learning is treated as a traditional machine learning technique that achieves desirable effects on small amounts of data. Shallow learning algorithms, including support vector machines (SVMs), random forests (RFs), decision trees (DTs), and K-means, have been employed to distinguish abnormal data from network activities [13, 20]. Compared with rule-based intrusion detection systems (IDSs), shallow learning methods do not rely on domain knowledge and can generalize to detect attack variants and unknown attacks. However, shallow learning is no longer suitable for addressing the complexity of the dataset and the diversity of the features [13]. In this situation, deep learning is required.

Deep learning, also known as deep neural networks (DNNs), is designed from hierarchical structures composed of multiple neural layers [24]. Deep learning can extract and learn information to generate reconstructed features from the input raw data through layer-by-layer neural processing. Benefitting from this feature reconstruction characteristic, deep learning algorithms, including CNNs, recurrent neural networks (RNNs), and generative adversarial networks (GANs) [25–31], have been widely used not only for visual recognition and language understanding but also for network threat detection. Studies in [32] show that deep learning-based methods can achieve better performance when working on reconstructed features.

The CNN, as one of the typical DNN models, was first proposed to solve the problem of 2D image recognition. 2D CNNs have been successfully used to learn and reconstruct features from raw data and have developed into the dominant approach for recognition and detection tasks in image and speech analysis [33]. Due to the good learning characteristics of CNNs, the 1D CNN, derived from the 2D CNN, has been proposed to address 1D signals and has achieved superior performance with high efficiency [34, 35]. To adapt to the data characteristics of 1D signals, the hierarchical architecture of the 1D CNN is simplified compared with that of the 2D CNN [36]. For example, in the structure of a 1D CNN, the convolution kernels and pooling filters are 1D, whereas in a 2D CNN they are 2D. Therefore, in both structure and running process, the 1D CNN is simpler than the 2D CNN [34]. Accordingly, we build a 1D CNN for analyzing network security data.

However, with the deepening of the network layers, the number of parameters increases exponentially [37]. For example, in a traditional basic 2D CNN, if the size of the input image feature is C ∗ H ∗ W, the number of convolution kernels is N, the size of each convolution kernel is K ∗ K, and the size of the feature map is M ∗ M, then the total number of parameters in all convolution layers is C ∗ N ∗ (K ∗ K + 1) ∗ (M ∗ M). Obviously, the number of parameters is large. To reduce the number of parameters and improve the efficiency of the CNN, the group CNN was proposed, which groups the convolution kernels separately [38]. Suppose that the convolution kernels are divided into T groups, the number of convolution kernels in each group is N/T, the size of each convolution kernel is K′ ∗ K′, and the size of the feature map is M′ ∗ M′. The total number of parameters in the convolution layers is then C ∗ (N/T) ∗ (K′ ∗ K′ + 1) ∗ (M′ ∗ M′). With grouping, the sizes of the convolution kernels and the feature maps become smaller, and the total number of parameters of all of the convolution layers is reduced. At the same time, the performance of the algorithm is improved. Therefore, we use a group CNN to address big network data.

When analyzing security data for network threat detection, we observed that each threat behavior has 1D characteristics, which makes the threats similar to 1D signals. Additionally, the group CNN can improve efficiency. Therefore, learning from the successful experience of using 1D CNNs to process 1D signals, we build a 1D group CNN model to perform feature learning and reconstruction on the security dataset. In this paper, we combine shallow learning and deep learning algorithms to build a network threat detection model. First, correlation coefficients are computed to measure the relationships of the features. Then, we sort the correlation coefficients in descending order and group the data by columns. Second, a 1D group CNN model with multiple 1D convolution kernels and 1D pooling filters is built for feature learning and reconstruction. In each convolution layer and pooling layer, the convolution kernels and pooling filters are grouped. Third, the reconstructed features are input to the shallow learning models for threat prediction.

The proposed method has the following advantages:

(1) Compared with the traditional basic 1D CNN, the proposed group CNN model with grouped convolution kernels and pooling filters reconstructs the features layer by layer and reduces the FLOPs, parameters, and running time

(2) The proposed data grouping, which is based on correlation coefficients between the features, can

enhance the structural information used by the group CNN to address the data

(3) The proposed group CNN model can reduce the dimensions by generating fewer reconstructed features and can achieve high performance

(4) The FLOPs and parameter counts of the group CNN are calculated and are less than those of the basic 1D CNN

The remainder of this paper is organized as follows. Section 2 discusses the related work using shallow learning algorithms and deep learning technology in network threat detection. A description of the 1D group CNN model is provided in Section 3. Experimental results and analysis are presented in Section 4. The work is concluded in Section 5.

2. Related Work

Machine learning techniques, including shallow learning and deep learning algorithms, have been used for anomaly detection since the early 2000s and can automatically mine hidden information on the differences between normal and malicious behaviors.

Shallow learning algorithms, i.e., traditional machine learning algorithms, were previously applied to analyze system logs, malicious software, and network traffic and to output the predicted labels of the input data. By comparing the predicted labels with the true labels, the performance of the shallow learning algorithms can be obtained. The most widely used algorithms include SVM, DT, NB, and K-means [39, 40]. Buczak et al. [39] provided a survey describing machine learning and data mining methods, such as DT, SVM, RF, and NB, that have been used for cybersecurity intrusion detection. Kruczkowski and Szynkiewicz [41] used SVM with kernels to build a malware detection model. The results revealed that SVM was a robust and efficient method for data analysis and that it increased the efficiency of malware detection. Bilge et al. [42] presented the EXPOSURE system to analyze large-scale and passive domain name service (DNS) data, with the classifier built using the J48 DT. The experimental results suggested that the minimum error was achieved by a decision tree. Aung and Min [43] used K-means and classification and regression tree (CART) algorithms to mine the KDD'99 dataset for intrusion detection. The experimental results showed that the hybrid data mining method could achieve good accuracy with acceptable time complexity. Mo et al. [44] discussed three data clustering algorithms, including K-means, fuzzy C-means (FCM), and expectation maximization (EM), to capture abnormal behavior in communication networks. The experimental results showed that FCM was more accurate.

More recently, deep learning technology has developed rapidly and has been successfully applied to a variety of tasks, such as natural language processing, image recognition, and computer vision [45]. CNNs, as typical DNN models, are feedforward neural networks with convolution calculations and deep structures, which can learn and reconstruct features accurately and efficiently. According to the type of raw data, either a 1D CNN or a 2D CNN model can be built: a 1D CNN is constructed to process one-dimensional sequence signal data and natural language, and a 2D CNN is constructed to address two-dimensional image and video data [36]. Because the CNN can learn and reconstruct features, both 1D CNNs and 2D CNNs are used for network threat detection.

Xiao et al. [46] proposed a network intrusion detection model based on a CNN. The original traffic data were reduced in dimensions through principal component analysis (PCA) and an autoencoder (AE), and then the data were converted into a 2D image format. Next, the 2D data were input to the CNN model to evaluate the performance. Wang et al. [47] proposed a method that represented raw flow data as an image and used the CNN for classification and identification without manually selecting and extracting features. Experimental results showed that this method had high availability and high accuracy in malicious traffic identification. Zhang et al. [48] proposed a feature-hybrid malware variant detection approach based on a 2D CNN and a 1D BPNN. The 2D CNN was designed to compute the dot product and compress the dimension of the PCA-initialized opcode matrix. Experimental results showed that the method achieved more than 95% malware detection accuracy. Zhang et al. [49] proposed converting opcodes into a 2D matrix and adopted a CNN to train on the 2D opcode matrix for malware recognition. Experimental results showed that their approach could significantly improve the detection accuracy by 15%. Yan et al. [50] proposed converting Android opcode into 2D gray images of fixed size and adopted a CNN to train and detect Android malware. From the above literature, we can see that the input data of a 2D CNN model must first be converted into 2D data.

Ma et al. [51] proposed a hybrid neural network comprised of a 1D CNN and a DNN to learn the characteristics of high-dimensional network flows for network anomaly detection. Experimental results showed that the proposed method outperformed other algorithms in comprehensive performance. Azizjon et al. [52] proposed a 1D CNN model that serializes the TCP/IP packets in a predetermined time range as an intrusion traffic model for the IDS. Experimental results showed that the 1D CNN and its variant architectures had the capability to extract high-level feature representations and outperformed the traditional machine learning classifiers. Wei et al. [53] proposed a 1D CNN-based model to identify phishing websites from URL address text converted to a one-hot character-level representation; this model applied the 1D CNN in the manner of natural language analysis. Experimental results showed that the method was faster at detecting zero-day attacks. Zhang et al. [54] designed a flow-based intrusion detection model called SGM-CNN, which first integrated SMOTE and GMM to handle the imbalanced classes and then used a 1D CNN to detect the network traffic data with high accuracy. Experimental results showed that SGM-CNN was superior to the state-of-the-art methods and effective for large-scale imbalanced intrusion detection.

Group convolution was first proposed and used in AlexNet by Krizhevsky et al. [37] for distributing the model over two GPUs to handle the insufficient memory issue.

Figure 1: The architecture of the 1D group CNN.

AlexNet was designed such that the group convolution method could increase the diagonal correlations between the convolution kernels, reduce the training parameters, and be less prone to overfitting. Zhang et al. [38] proposed interleaved group CNNs called IGCNets, which contained a pair of successive interleaved group convolutions, i.e., the primary group convolution and the secondary group convolution. IGCNets was wider than a regular convolution. Experimental results demonstrated that IGCNets was more efficient in parameters and computational complexity. Xie and Girshick [55] proposed a simple, highly modularized network architecture named ResNeXt, which was based on AlexNet and constructed by repeating a building block. The idea of ResNeXt is consistent with group convolutions. Without increasing the complexity of the parameters, the accuracy of the model could be improved, and the number of hyperparameters could be reduced. Lu et al. [39] proposed a novel repeated group convolutional kernel (RGC) to remove filter redundancy at the group level. SRGC-Nets worked well in not only reducing the model size and computational complexity but also decreasing the testing and training running time.

In 2D CNN-based models, the input data are converted to an image format. In 1D CNN-based models, the input data are treated as time-series signals, similar to natural language. Compared with a 2D CNN, the structure of a 1D CNN is simpler, which makes the computational complexity lower. Therefore, we intend to learn from the experience of applying the 1D CNN to address the data and to construct a network threat detection model for feature learning and reconstruction.

3. Proposed Solution

The architecture of the proposed network threat detection model, which combines the 1D group CNN algorithm and machine learning classification methods, is shown in Figure 1. First, correlation coefficients are computed to measure the relationships between the features. Then, we sort the correlation coefficients in descending order and group the data. Second, a group CNN model with multiple groups of convolution kernels and pooling filters is built for feature learning and reconstruction. In the group CNN model, the input data are divided into multiple groups. Similarly, the convolution kernels and pooling filters in each layer are divided into multiple groups. Each group of data is computed by each convolution kernel and is then computed by each pooling filter. Finally, a concatenating layer is used to concatenate the multiple groups of data to form one group of reconstructed data. Third, the reconstructed data are input to the shallow machine learning model for threat prediction. In the shallow machine learning model, traditional machine learning algorithms are used to identify normal or abnormal samples from the reconstructed data. Then, the accuracy, precision, recall, and F1, which are the detection performance indicators, are computed according to the statistics of the confusion matrix.

3.1. Group Convolutional Neural Network for Feature Reconstruction. The convolutional neural network (CNN) is one of the representative algorithms of deep learning. It is a type of deep feedforward neural network that uses convolution calculations [56]. CNNs have the capability of representation learning to generate reconstructed features. At the same time, through the convolution operation and the pooling operation, a CNN can reduce the dimensions of the input data [57]. Additionally, grouped convolution kernels and pooling filters can reduce the number of parameters and improve the performance [39]. Therefore, we use 1D group convolution kernels to build a 1D group CNN model in this work.

The 1D group CNN includes multiple convolutional layers, multiple pooling layers, a full connection layer, a concatenating layer, and an output layer. In each convolutional layer, the convolutional kernels are divided into multiple groups. At the same time, in each pooling layer, the pooling filters are divided into multiple groups. The fully connected layer determines the dimensions of the reconstructed features of each group. The concatenating layer is used to concatenate the reconstructed features of each group to form the final result. The combination of multiple layers makes the group CNN output the low-dimensional

reconstructed features, which can not only strengthen the original data's features but also relatively reduce the dimension.

3.2. Feature Correlation. In this work, we assume that the input data are X = (x_1, x_2, ⋯, x_n, ⋯, x_N), x_n = (x_{n1}, x_{n2}, ⋯, x_{nD})^T, n = 1, 2, ⋯, N, containing N independent D-dimensional samples. Usually, the malicious samples have similar values for the same features, and so do the benign samples. Thus, there are certain correlations between the features and the labels. We calculate the correlations between the data features and labels based on the correlation coefficients.

First, we calculate the correlations between the data features and labels based on the correlation coefficients to form a correlation coefficient matrix R. Then, we randomly select one row vector R_i and rank the correlation elements in descending order. Furthermore, we divide the data into T groups by columns equally according to the descending correlations. Usually, each group has the same number of features, which is D/T. The input data in the tth (0 < t ≤ T) group are expressed as X_t = (x_{1,t}, x_{2,t}, ⋯, x_{n,t}, ⋯, x_{N,t}). Thus, the correlation coefficients of the first group of data are the largest, and those of the last group are the smallest.
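The grouping step can be illustrated with a short sketch. This is not the authors' code: it assumes Pearson correlation computed with numpy.corrcoef between each feature column and the label vector (the text leaves the exact construction of the matrix R open), and it simply drops any remainder columns when D is not divisible by T.

```python
import numpy as np

def group_columns_by_correlation(X, y, T):
    """Split the columns of X into T groups, ordered by the absolute
    correlation between each feature column and the labels y."""
    N, D = X.shape
    # Pearson correlation between every feature column and the label vector.
    corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(D)])
    corr = np.nan_to_num(corr)            # constant columns would give NaN
    order = np.argsort(-np.abs(corr))     # descending |correlation|
    group_size = D // T                   # remainder columns are dropped here
    groups = [X[:, order[t * group_size:(t + 1) * group_size]] for t in range(T)]
    return groups, order

# Hypothetical example: 3795 samples with 470 features split into T = 4 groups.
X = np.random.rand(3795, 470)
y = np.random.randint(0, 2, size=3795)
groups, order = group_columns_by_correlation(X, y, T=4)
print([g.shape for g in groups])
```

Each returned group of columns is then fed to its own branch of the group CNN described in the next subsection.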
3.3. Group CNN. After the data are grouped, we start to establish the group CNN model, which contains L convolution layers, L pooling layers, a full connection layer, a concatenating layer, and an output layer. Like the groups of the input data, the convolution kernels and pooling filters in each layer are also divided into T groups. Further, there are M convolution kernels in each group.

Suppose that the mth (0 < m ≤ M) convolution kernel in the tth (0 < t ≤ T) group of the lth (0 < l ≤ L) convolution layer is expressed as K_l^{m,t}. Convolution operations are conducted between the grouped data X_t and the convolution kernel, or between the output R_{l-1}^{m,t} of the previous pooling layer and the convolution kernel. Then, an activation function is applied to generate the feature maps. Suppose the feature map in the tth group of the lth convolution layer produced by the mth convolution kernel is S_l^{m,t}, which is expressed as follows:

S_l^{m,t} = \begin{cases} \mathrm{ReLU}\big(\mathrm{conv1D}(K_l^{m,t}, X_t) + b_l^{m,t}\big), & l = 1, \\ \mathrm{ReLU}\big(\mathrm{conv1D}(K_l^{m,t}, R_{l-1}^{m,t}) + b_l^{m,t}\big), & 1 < l \le L, \end{cases} \qquad (1)

where ReLU(·) is the nonlinear activation function, conv1D(·) is the 1D convolution function, R_{l-1}^{m,t} is the output of the mth pooling filter in the tth group of the (l − 1)th pooling layer, and b_l^{m,t} is the bias of the tth group in the lth convolution layer.

After the convolution layer, a pooling layer not only reduces the dimensions of the feature maps from the upper convolution layer to reduce the computational cost but also provides basic translation invariance. The lth pooling layer immediately follows the lth convolution layer. Suppose the mth pooling filter of the tth group in the lth pooling layer is P_l^{m,t}. The input data of the lth pooling layer are the output of the lth convolution layer, and the output data of the tth group in the lth pooling layer are R_l^{m,t}, which is expressed as follows:

R_l^{m,t} = \mathrm{ReLU}\big(\mathrm{max\_pooling}(S_l^{m,t}, P_l^{m,t})\big), \qquad (2)

where max_pooling(·) is the pooling function. Max pooling is adopted in this paper.

The last pooling layer is connected to a fully connected layer. After the convolution operations and pooling operations, the original data are converted into feature maps. In the full connection layer, the tth group of feature maps is mapped to the group reconstruction features X_t′ by a global convolution operation:

X_t' = \mathrm{ReLU}\Big(\sum_m \mathrm{conv1D}(K_{full}^{m,t}, R_L^{m,t}) + b_L^{m,t}\Big), \qquad (3)

where K_{full}^{m,t} is the convolution kernel of the full connection layer and b_L^{m,t} is the bias of the full connection layer.

Further, the fully connected layer is connected to the concatenating layer. The T groups of reconstructed features X_t′ are concatenated to form the final reconstructed features X′:

X' = \mathrm{concatenate}(X_t'), \qquad (4)

where concatenate(·) is the concatenation function for the reconstructed features.

The size of X′ is N × D′. When D′ is less than D, the dimension of the reconstructed data is lower than that of the original data. In other words, the 1D group CNN realizes both the generation of reconstructed features and the reduction of the feature dimensions.

3.4. Floating-Point Operations and Parameters. The number of floating-point operations (FLOPs) counts the multiplications and additions, which are related to the overall running time of the model [58]. In this section, we want to calculate the FLOPs and parameter count of the group CNN. However, the group CNN is built on the basic 1D CNN, so we first calculate the FLOPs and parameter count of the basic 1D CNN and then calculate those of the group CNN on that basis.

3.4.1. FLOPs and Parameter Count of the Basic 1D CNN. Suppose that a basic 1D CNN with fully connected layers is used for feature reconstruction. First, the FLOPs are computed. We assume that the input data are X, containing N independent D-dimensional samples. In the basic 1D CNN, the number of input convolution channels is C_in, the number of convolution kernels is M′, and the size of each convolution kernel is 1 ∗ W_1′. The size of the feature map of the convolution operation is 1 ∗ W_2′. The number of output convolution channels is C_out. The FLOPs performed by a

convolution layer are as follows:

N ∗ C_{in} ∗ M' ∗ (1 ∗ W_1' + 1) ∗ W_2' ∗ M' ∗ W_1' ∗ C_{out}, \qquad (5)

where (1 ∗ W_1′ + 1) is the number of multiplications performed by one convolutional kernel sampling the input data, and the (+1) accounts for adding the bias. The factor ∗W_2′ is the number of such positions needed by one convolutional kernel to produce the feature map of the convolution operation; W_2′ is defined as W_2′ = (D + 2 ∗ padding − W_1′)/stride + 1, where padding = 0 and stride = 1. The factor ∗M′ accounts for the multiple convolutional kernels computed in the operation. The factor ∗M′ ∗ W_1′ is the number of additions from the feature map of the convolution operation to the output feature map of the convolution layer. Note that the operations of ReLU(·) and of the pooling layers contain no multiplications or additions, so they are not counted in the FLOPs. The factors ∗C_in and ∗C_out account for repeating the calculation over multiple input and output channels, and ∗N accounts for repeating it over all the samples.

The basic 1D CNN has L convolution layers, so the FLOPs of the basic 1D CNN model equal the sum of the FLOPs of each convolution layer, which can be computed as follows:

\sum_{l=1}^{L} N ∗ C_{l,in} ∗ M_l' ∗ (1 ∗ W_{l,1}' + 1) ∗ W_{l,2}' ∗ M_l' ∗ W_{l,1}' ∗ C_{l,out}. \qquad (6)

Then, the bias term is ignored and the FLOP calculation in formula (6) is written as follows:

O\Big(\sum_{l=1}^{L} N ∗ C_{l,in} ∗ {M_l'}^2 ∗ {W_{l,1}'}^2 ∗ W_{l,2}' ∗ C_{l,out}\Big). \qquad (7)

It can be seen that the FLOPs are determined by the number of samples, the number of convolutional layers, the number of convolutional kernels per layer, the size of each convolutional kernel, the length of the feature map of the convolution operation, and the numbers of input and output convolution channels.

Next, we compute the parameter count of the basic 1D CNN. The parameter count gathers the statistics of the weighting parameters and bias parameters that appear in the running process of the model. In the above basic 1D CNN, in the case of a single channel and a single convolution kernel, the number of parameters is (W_1′ + 1). When the number of convolution kernels is M′ and the number of convolution layers is L, the parameter count over the layers is \sum_{l=1}^{L} N ∗ C_{l,in} ∗ M_l' ∗ (W_{l,1}' + 1) ∗ C_{l,out}. Then, the bias term is ignored and the parameter count formula is written as follows:

O\Big(\sum_{l=1}^{L} N ∗ C_{l,in} ∗ M_l' ∗ W_{l,1}' ∗ C_{l,out}\Big). \qquad (8)

It can be seen that the parameter count is determined by the number of samples, the number of convolutional layers, the number of convolutional kernels per layer, the size of each convolutional kernel, and the numbers of input and output convolution channels.

3.4.2. FLOPs and Parameter Count of the Group CNN. As for the basic 1D CNN, the FLOPs and parameter count of the group CNN can be computed. Suppose that the input data are X, containing N independent D-dimensional samples, which are grouped into T groups; this means that the dimension of each group of data is D/T. The numbers of input and output convolution channels are C_in and C_out. The structure of the group CNN contains L convolution layers and L pooling layers. There are T groups of convolution kernels in each convolution layer, and the pooling layers are organized in the same way. There are M convolution kernels in each group. The size of each convolution kernel is 1 ∗ W_1, and the size of the feature map of the convolution operation is 1 ∗ W_2. Therefore, the FLOPs of each group are

N ∗ C_{in} ∗ M ∗ (1 ∗ W_1 + 1) ∗ W_2 ∗ M ∗ W_1 ∗ C_{out}, \qquad (9)

where W_2 = ((D/T) + 2 ∗ padding − W_1)/stride + 1, with padding = 0 and stride = 1.

The total FLOPs of the model equal the sum of the FLOPs of each convolution layer, which can be computed as follows:

\sum_{l=1}^{L} \sum_{t=1}^{T} N ∗ C_{l,t,in} ∗ M_{l,t} ∗ (1 ∗ W_{l,t,1} + 1) ∗ W_{l,t,2} ∗ M_{l,t} ∗ W_{l,t,1} ∗ C_{l,t,out}. \qquad (10)

Then, the bias term is ignored and the FLOPs in formula (10) are simplified as follows:

O\Big(\sum_{l=1}^{L} \sum_{t=1}^{T} N ∗ C_{l,t,in} ∗ M_{l,t}^2 ∗ W_{l,t,1}^2 ∗ W_{l,t,2} ∗ C_{l,t,out}\Big). \qquad (11)

Similarly, the parameter count of the group CNN can be computed as follows:

O\Big(\sum_{l=1}^{L} \sum_{t=1}^{T} N ∗ C_{l,t,in} ∗ M_{l,t} ∗ W_{l,t,1} ∗ C_{l,t,out}\Big). \qquad (12)

It can be seen that the FLOPs and parameter count are determined not only by the number of samples, the number of convolutional layers, the number of convolutional kernels per layer, the size of each convolutional kernel, and the numbers of input and output convolution channels, but also by the number of groups.
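The counting rules above can be written down directly. The following is a minimal sketch of formulas (6), (8), (10), and the corresponding group sums, not the authors' code; the layer dictionaries in the example are hypothetical values, and the bias term (+ 1) is kept, so dropping it recovers the order-of-magnitude forms (7), (8), (11), and (12).

```python
def flops_and_params_basic(layers, N):
    """layers: list of dicts with keys C_in, C_out, M, W1, W2 (per formula (6))."""
    flops = sum(N * c["C_in"] * c["M"] * (c["W1"] + 1) * c["W2"] * c["M"] * c["W1"] * c["C_out"]
                for c in layers)
    params = sum(N * c["C_in"] * c["M"] * (c["W1"] + 1) * c["C_out"] for c in layers)
    return flops, params

def flops_and_params_group(group_layers, N):
    """group_layers: list over layers, each a list over the T groups of the same dicts (formula (10))."""
    flops = sum(N * g["C_in"] * g["M"] * (g["W1"] + 1) * g["W2"] * g["M"] * g["W1"] * g["C_out"]
                for layer in group_layers for g in layer)
    params = sum(N * g["C_in"] * g["M"] * (g["W1"] + 1) * g["C_out"]
                 for layer in group_layers for g in layer)
    return flops, params

# Hypothetical comparison: one ungrouped layer versus the same kernels split into T = 2 groups.
basic = [dict(C_in=1, C_out=32, M=32, W1=8, W2=34)]
grouped = [[dict(C_in=1, C_out=16, M=16, W1=4, W2=18) for _ in range(2)]]
print(flops_and_params_basic(basic, N=10200))
print(flops_and_params_group(grouped, N=10200))
```

With the kernel length and feature-map length roughly halved per group, the grouped totals come out well below the ungrouped ones, which is the qualitative comparison made in the next paragraph.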

Now, let us compare the FLOPs and parameter count of the group CNN with those of the basic 1D CNN. From formulas (7), (8), (11), and (12), we can see that many parameters determine the FLOPs and parameter count, so we cannot compare them directly. However, we can assume some comparison conditions. Because the length of the input data in the group CNN is 1/T of that in the basic 1D CNN, we assume that the length of the convolutional kernels in each layer of the group CNN is 1/T of that in the basic 1D CNN, that is, W_1′ = T ∗ W_1. Similarly, W_2′ ≈ T ∗ W_2. From the comparison of formula (7) and formula (11), it can roughly be seen that the FLOPs of the group CNN are smaller than those of the basic 1D CNN. Similarly, from the comparison of formula (8) and formula (12), it can roughly be seen that the parameter count of the group CNN is smaller than that of the basic 1D CNN. In the experiments, we actually set completely different parameter values for the two models to achieve the best feature representation effect; a comparison of the results is given in Section 4.3.5.

3.5. Shallow Machine Learning Classifier. Shallow machine learning has good performance and high efficiency. Therefore, in this work, we use SVM as the shallow machine learning algorithm to build the classification model and identify the malicious samples in the dataset.

Shallow machine learning consists of two stages: a training stage and a testing stage [59]. In the training stage, the high-dimensional original dataset is reconstructed into low-dimensional features by training the group CNN. Then, the dataset containing the low-dimensional reconstructed features is input to the shallow machine learning classifier to train it and obtain the optimal model structure. In the testing stage, the high-dimensional original testing dataset is input to the trained group CNN model to obtain the low-dimensional reconstructed features [60, 61]. Then, the dataset containing the low-dimensional reconstructed features is input to the trained shallow machine learning classifier to obtain the predicted labels of the testing data.

In the experiment, the true labels of the testing dataset are known, so the performance of the shallow machine learning models, such as accuracy, precision, recall, and F1, can be obtained by comparing the true labels with the predicted labels and calculating the confusion matrix.

The confusion matrix for binary classification includes four index items: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). The evaluation metrics are then defined as follows:

\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}, \qquad (13)

\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad (14)

\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad (15)

F1 = \frac{(1 + \beta^2) \times \mathrm{Precision} \times \mathrm{Recall}}{\beta^2 \times \mathrm{Precision} + \mathrm{Recall}} = \frac{2 \times TP}{2 \times TP + FN + FP} \quad (\beta = 1). \qquad (16)

4. Experiments

4.1. Dataset. The data come from public datasets in cyberspace and contain data on network threat behaviors. The details of the datasets are shown in Table 1.

Table 1: Details of the datasets.

| Dataset | Number of features | Number of samples (normal/abnormal) |
|---|---|---|
| KDDCUP99 | 41 | 10200 (5000/5200) |
| CICMalDroid2020-139 | 139 | 3795 (1795/2100) |
| CICMalDroid2020-470 | 470 | 3795 (1795/2100) |

KDDCUP99 [61] is the most famous and frequently cited dataset on intrusion detection. The whole dataset is very large and is classified into 5 classes. In our work, we randomly extract a small part and use only 2 classes, consisting of the normal and abnormal samples. The dataset contains 41 features, which are divided into 4 categories: 9 basic features of the TCP connections, 13 content features of the TCP connections, 9 time-based traffic statistical features, and 10 host-based traffic statistical features.

CICMalDroid2020 [62] is downloaded from the Canadian Institute for Cybersecurity datasets website. The original dataset contains 5 categories of Android samples. In our work, we use the whole banking dataset, which contains 2100 malware samples, and the whole benign dataset, which contains 1795 benign samples. CICMalDroid2020-139 consists of 139 extracted features, including the frequencies of system calls. CICMalDroid2020-470 consists of 470 extracted features, including the frequencies of system calls, binders, and composite behaviors.

For most machine learning-based classification tasks, imbalanced datasets could cause the classification surfaces of the classifiers to be biased toward the majority class, which leads to the misclassification of the minority class. Generally, the network threat data are treated as the minority class. Therefore, in our experiment, the ratios of "Normal" and "Abnormal" instances in all three datasets are close to 1, which avoids the imbalance problem.

4.2. Machine Learning Classifiers. There are many shallow machine learning classifiers, e.g., NB, RF, and LR. From our previous experimental results and an analysis of the existing literature, we find that SVM is the most commonly used classifier.

SVM has many advantages: (1) It has good stability and in many cases maintains good classification performance. (2) It can deal with noise and outlier data well by introducing relaxation variables. (3) It can effectively solve the problem of nonlinear and high-dimensional data. (4) It keeps good classification efficiency and effectiveness for small datasets.
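The paper does not give its implementation of the shallow-classifier stage described in Sections 3.5 and 4.2, so the following is only a minimal sketch. It assumes scikit-learn's SVC, uses randomly generated stand-ins for the reconstructed features, and the RBF kernel and split ratio are assumptions rather than the authors' settings; the metrics follow formulas (13)–(16).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Stand-in for the reconstructed features produced by the group CNN
# (e.g., about 5% of 470 original features), with a small class shift
# so the toy problem is learnable.
y = np.random.randint(0, 2, size=3795)          # 0 = normal, 1 = abnormal
X_rec = np.random.rand(3795, 23) + 0.5 * y[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    X_rec, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf")                          # kernel choice is an assumption
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Metrics computed from the confusion matrix, as in formulas (13)-(16).
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fn + fp)
print(accuracy, precision, recall, f1)
```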

Figure 2: The performance of the reconstructed features at different ratios. (a) Accuracy. (b) Precision. (c) Recall. (d) F1.

To sum up, combined with the characteristics of our datasets, which are high-dimensional and small, we choose SVM as the classifier in our experiments.

All experiments are performed in JetBrains PyCharm 2017 with a Python 3.6 interpreter on a laptop with an Intel Core i5-6200U 2.3 GHz CPU and 8 GB of RAM running the Windows 10 OS.

4.3. Experiment Results and Discussion

4.3.1. Comparison of the Reconstructed Features. In this section, the performances of the reconstructed features at different ratios are compared. According to the output size of the fully connected layer, the dimensions of the reconstructed features differ. To identify the performance of the reconstructed features, the lengths of the reconstructed features are set according to different situations. Specifically, the ratios of the reconstructed feature length to the original data length are set to 5%, 10%, 15%, 20%, 25%, and 30%. First, the original data are input to the group CNN models to generate the reconstructed features. Second, the data composed of reconstructed features are input to SVM, and then the accuracy, precision, recall, and F1 are computed to evaluate the performance of the reconstructed features. The performances of the reconstructed features at different ratios are plotted in Figure 2. In addition, it should be noted that the number of iterations of the group CNN algorithms is 1000. The recorded results are the average of 5 experiments.

According to the curves of the performance of the reconstructed features at different ratios in Figure 2, including the accuracy, precision, recall, and F1, we can draw some conclusions. First, the performances of the reconstructed feature data at some low ratios are better than those of the original data, whose ratio is 100%. In particular, the improvements on the KDDCUP99 dataset are more obvious. Therefore, it is necessary to reduce the data dimensions by using the group CNN to reconstruct the features, which does not reduce the data quality.

Table 2: The parameters of the group CNN and basic 1D CNN network structures.

| Parameter | Basic 1D CNN: KDD99 | Basic 1D CNN: CICMalDroid2020-139 | Basic 1D CNN: CICMalDroid2020-470 | Group CNN: KDD99 | Group CNN: CICMalDroid2020-139 | Group CNN: CICMalDroid2020-470 |
|---|---|---|---|---|---|---|
| Count of the convolution layers | 3 | 4 | 5 | 3 | 4 | 5 |
| Count of the pooling layers | 0 | 3 | 2 | 0 | 3 | 2 |
| Learning rate | 0.0009 | 0.0009 | 0.0009 | 0.0009 | 0.0009 | 0.0009 |
| Count of groups | 1 | 1 | 1 | 2 | 2 | 4 |
| Stride | 1 | 1 | 1 | 1 | 1 | 1 |
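To make the grouped structure of Section 3.3 and Table 2 concrete, the following is a minimal sketch; the paper does not state its deep learning framework, so the use of PyTorch here is an assumption, and the layer sizes are illustrative rather than the paper's settings. Each feature group gets its own small stack of Conv1d/MaxPool1d layers with a per-group fully connected mapping (implemented as a linear layer), and the per-group outputs are concatenated, following formulas (1)–(4).

```python
import torch
import torch.nn as nn

class GroupCNN1D(nn.Module):
    """Illustrative 1D group CNN: the input features (already ordered by
    correlation) are split into T column groups, each group is processed by
    its own conv/pool branch, and the per-group reconstructions are concatenated."""
    def __init__(self, T=2, kernels=20, kernel_size=5, out_per_group=4):
        super().__init__()
        self.T = T
        self.branches = nn.ModuleList()
        for _ in range(T):
            self.branches.append(nn.Sequential(
                nn.Conv1d(1, kernels, kernel_size), nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(kernels, kernels, kernel_size), nn.ReLU(),
                nn.Flatten(),
                nn.LazyLinear(out_per_group), nn.ReLU(),  # per-group reconstructed features
            ))

    def forward(self, x):                       # x: (batch, T * group_len)
        chunks = torch.chunk(x, self.T, dim=1)  # one chunk of columns per group
        outs = [branch(c.unsqueeze(1)) for branch, c in zip(self.branches, chunks)]
        return torch.cat(outs, dim=1)           # concatenated reconstruction X'

# Example: 40 ordered features split into 2 groups of 20 columns each.
model = GroupCNN1D(T=2)
x = torch.rand(8, 40)
print(model(x).shape)  # torch.Size([8, 8]) -> 2 groups x 4 reconstructed features
```

The concatenated output plays the role of the reconstructed feature matrix that is then passed to the SVM classifier.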
Table 3: Accuracy of the group CNN and the basic 1D CNN.

(a) Accuracy of the basic 1D CNN

| Datasets | 5% | 10% | 15% | 20% | 25% | 30% | 100% |
|---|---|---|---|---|---|---|---|
| KDD99 | 0.9605 | 0.9566 | 0.9599 | 0.9627 | 0.9667 | 0.9548 | 0.8788 |
| CICMalDroid2020-139 | 0.7760 | 0.7854 | 0.7721 | 0.7587 | 0.7733 | 0.7724 | 0.7094 |
| CICMalDroid2020-470 | 0.6783 | 0.7065 | 0.6808 | 0.6320 | 0.6127 | 0.6246 | 0.7171 |

(b) Accuracy of the group CNN

| Datasets | 5% | 10% | 15% | 20% | 25% | 30% | 100% |
|---|---|---|---|---|---|---|---|
| KDD99 | 0.9612 | 0.9717 | 0.9724 | 0.9764 | 0.9696 | 0.9692 | 0.8788 |
| CICMalDroid2020-139 | 0.7681 | 0.7988 | 0.8091 | 0.7945 | 0.7967 | 0.7891 | 0.7094 |
| CICMalDroid2020-470 | 0.8111 | 0.8058 | 0.7937 | 0.8108 | 0.7985 | 0.7983 | 0.7171 |

Second, the higher the dimension of the original data is, the lower the ratio of the reconstructed features with better performance. For example, KDDCUP99 is a low-dimensional dataset, whose highest accuracy and F1 are at 15%. CICMalDroid2020-139 is a middle-to-high-dimensional dataset, whose highest accuracy and F1 are at 10%. Meanwhile, CICMalDroid2020-470 is a high-dimensional dataset, whose highest accuracy and F1 are at 5%. To sum up, we can conclude that reconstructed features help to reduce the data dimensions and improve the performance.

4.3.2. Comparison of the Group CNN and the Basic 1D CNN. Both the group CNN and the basic 1D CNN can reconstruct features. In this part, we compare the performance of the features reconstructed by these two methods. First, the original data are input to the group CNN and basic 1D CNN models, respectively, and reconstructed features at different ratios from 5% to 30% are generated. Second, the data composed of reconstructed features are input to SVM, and the accuracy is computed to evaluate the performance of the reconstructed features. The parameters of the network structures are shown in Table 2, and the performance at different ratios of the reconstructed features is recorded in Table 3. In addition, it should be noted that the number of iterations of the CNN algorithms is 1000. The recorded results are the average of 5 experiments.

The original data are directly input to SVM, and the accuracy is recorded in the last column of Tables 3(a) and 3(b). By contrast, the accuracies at different ratios from 5% to 30% of the reconstructed features are recorded in the other columns. Comparing the results in Table 3(a), we find that in some situations the accuracy of the features reconstructed by the basic 1D CNN is higher than that of the original data: KDDCUP99 achieves the highest accuracy at 25%, CICMalDroid2020-139 achieves the highest accuracy at 10%, and CICMalDroid2020-470 achieves the highest accuracy with the original data. Comparing the results in Table 3(b), we find that the accuracy of the features reconstructed by the group CNN is higher than that of the original data: KDDCUP99 achieves the highest accuracy of 0.9764 at 15%, CICMalDroid2020-139 achieves the highest accuracy of 0.8091 at 10%, and CICMalDroid2020-470 achieves the highest accuracy of 0.8111 at 5%. Comparing the results in Table 3(a) with those in Table 3(b), we find that the accuracy achieved with the group CNN is generally higher than that achieved with the basic 1D CNN, and the highest accuracy of each dataset obtained by the group CNN in Table 3(b) is higher than that obtained by the basic 1D CNN in Table 3(a). Furthermore, the datasets reach their highest accuracy with the group CNN at lower ratios; for example, KDDCUP99 reaches its highest accuracy with the group CNN at 15% but with the basic 1D CNN at 25%. Finally, we can conclude that the performance of the group CNN is better than that of the basic 1D CNN, mainly because grouping the data based on the feature correlation helps to improve the internal cohesion of the data in each group.

4.3.3. Training Loss of the Group CNN. During the training stage, the training loss is computed with the cross-entropy loss function to measure how close the predicted labels of the reconstructed features are to the real labels. The smaller the training loss is, the closer the predicted labels are to the true labels. In this section, we study the trend of the training loss of the group CNN. KDDCUP99 and CICMalDroid2020-139 are grouped into two groups, while CICMalDroid2020-470 is grouped into four groups. The grouped data are separately input to the group CNN to train the models, and reconstructed features at different ratios from 5% to 30% are generated. During the training of the group CNN, the loss of each iteration is recorded and plotted in Figure 3. The number of iterations in the training stage is 1000.

From the curves in Figure 3, on the one hand, we find that some training loss curves of the grouped data are close to each other and approach 0. For example, the training loss curves of the 20% reconstructed feature data of KDDCUP99, which are grouped into two groups, are closer in Figure 3(a), as are the training loss curves of the 15% reconstructed feature data of CICMalDroid2020-139 in Figure 3(b) and of the 5% reconstructed feature data of CICMalDroid2020-470 in Figure 3(c).

Figure 3: Training loss of the group CNN. (a) Training loss of KDDCUP99 by the group CNN. (b) Training loss of CICMalDroid2020-139 by the group CNN. (c) Training loss of CICMalDroid2020-470 by the group CNN. Each panel shows the training loss over 1000 epochs for the 5%, 10%, 15%, 20%, 25%, and 30% reconstructed features, with one curve per group.

Furthermore, the ratios of the closer training loss curves in Figure 3 are the same as the ratios of the highest accuracy in Table 3(a). On the other hand, we find that when the curves converge, the training loss curve of group 1 is below that of group 2 in Figures 3(a) and 3(b), and the curves behave the same way in Figure 3(c), where the loss curve of group 1 is at the bottom and the loss curve of group 4 is at the top. This is because the data are grouped based on the feature correlation: we first calculate the correlations between the features and rank them in descending order, and then we divide the data into several equal groups according to the descending correlation coefficients. Therefore, the correlation coefficients of the first group are the largest and those of the last group are the smallest, and the loss of the reconstructed features is smaller when the correlation coefficients are larger.

4.3.4. Comparison of the Dimension Reduction Algorithms. The group CNN can reconstruct features and reduce the dimensions of the features; therefore, the group CNN can be seen as a dimension reduction algorithm. At present, there are many dimension reduction algorithms, such as PCA, FA, ICA, and SVD. In this section, we choose PCA and SVD to compare with the basic 1D CNN and the group CNN. As in Section 4.3.1, first, the dimensions of the original data are reduced by the dimension reduction algorithms to 5%, 10%, 15%, 20%, 25%, and 30%, separately. Then, the dimension-reduced data are input to SVM, and accuracy and F1 are calculated to evaluate the performance of the low-dimensional data. The accuracy and F1 of the dimension reduction algorithms are recorded in Figure 4. In addition, it should be noted that the number of iterations of the basic 1D CNN and the group CNN algorithms is 1000. The recorded results are the average of 5 experiments.

According to the accuracy and F1 of the different dimension reduction algorithms in Figure 4, we can draw some conclusions. First, for a low-dimensional dataset such as KDDCUP99, the ratios of the highest accuracy and F1 are high, whereas for a high-dimensional dataset such as CICMalDroid2020-470, the ratios of the highest accuracy and F1 are low. Furthermore, the highest accuracy and F1 at the low ratios are even higher than those of the original data. Therefore, we think that it is quite necessary to reduce the data dimensions with the dimension reduction algorithms. Second, the accuracy and F1 at the different ratios achieved by the group CNN are the highest, so we can conclude that the group CNN is the best dimension reduction algorithm among those compared. At the same time, the accuracy and F1 of the basic 1D CNN are lower than those of the group CNN but higher than those of PCA and SVD, which are traditional methods.

Figure 4: The accuracy and F1 of the dimension reduction algorithms. (a) The accuracy and F1 of KDDCUP99 by the dimension reduction algorithms. (b) The accuracy and F1 of CICMalDroid2020-139 by the dimension reduction algorithms. (c) The accuracy and F1 of CICMalDroid2020-470 by the dimension reduction algorithms. Each panel plots accuracy and F1 against the ratio of the reconstruction features (5% to 25% and 100%) for PCA, SVD, the basic 1D CNN, and the group CNN.

Table 4: The structures, FLOPs, parameter counts, and running time of the basic 1D CNN and the group CNN.

(a) The basic 1D CNN

| Dataset | Structure of the basic 1D CNN | FLOPs | Parameter count | Running time (s) |
|---|---|---|---|---|
| KDD99 | Layers: 3. Layer 1: Cin = 1, Cout = 40, M′ = 40, W1′ = 10, W2′ = 32; Layer 2: Cin = 40, Cout = 40, M′ = 40, W1′ = 8, W2′ = 25; Layer 3: Cin = 40, Cout = 80, M′ = 80, W1′ = 8, W2′ = 18 | 4.63 × 10^9 | 45700 | 602.68 |
| CICMalDroid2020-139 | Layers: 4. Layer 1: Cin = 1, Cout = 30, M′ = 30, W1′ = 20, W2′ = 128; Layer 2: Cin = 30, Cout = 20, M′ = 20, W1′ = 20, W2′ = 109; Layer 3: Cin = 20, Cout = 40, M′ = 40, W1′ = 10, W2′ = 100; Layer 4: Cin = 40, Cout = 20, M′ = 20, W1′ = 10, W2′ = 91 | 6.69 × 10^9 | 86798 | 852.84 |
| CICMalDroid2020-470 | Layers: 5. Layer 1: Cin = 1, Cout = 249, M′ = 249, W1′ = 10, W2′ = 461; Layer 2: Cin = 249, Cout = 100, M′ = 100, W1′ = 40, W2′ = 422; Layer 3: Cin = 100, Cout = 50, M′ = 50, W1′ = 80, W2′ = 343; Layer 4: Cin = 50, Cout = 20, M′ = 20, W1′ = 40, W2′ = 304; Layer 5: Cin = 20, Cout = 20, M′ = 20, W1′ = 20, W2′ = 285 | 1480.39 × 10^9 | 3768915 | 48213.86 |

(b) The group CNN

| Dataset | Structure of the group CNN | FLOPs | Parameter count | Running time (s) |
|---|---|---|---|---|
| KDD99 | Groups: 2; layers: 3. Layer 1: Cin = 1, Cout = 20, M′ = 20, W1′ = 5, W2′ = 16; Layer 2: Cin = 20, Cout = 20, M′ = 20, W1′ = 4, W2′ = 13; Layer 3: Cin = 20, Cout = 40, M′ = 40, W1′ = 4, W2′ = 10 | 0.82 × 10^9 | 13994 | 595.77 |
| CICMalDroid2020-139 | Groups: 2; layers: 4. Layer 1: Cin = 1, Cout = 15, M′ = 15, W1′ = 10, W2′ = 60; Layer 2: Cin = 15, Cout = 10, M′ = 10, W1′ = 10, W2′ = 51; Layer 3: Cin = 10, Cout = 20, M′ = 20, W1′ = 5, W2′ = 47; Layer 4: Cin = 20, Cout = 10, M′ = 10, W1′ = 5, W2′ = 43 | 4.90 × 10^9 | 61180 | 484.02 |
| CICMalDroid2020-470 | Groups: 4; layers: 5. Layer 1: Cin = 1, Cout = 63, M′ = 63, W1′ = 3, W2′ = 115; Layer 2: Cin = 63, Cout = 25, M′ = 25, W1′ = 10, W2′ = 106; Layer 3: Cin = 25, Cout = 12, M′ = 12, W1′ = 20, W2′ = 87; Layer 4: Cin = 12, Cout = 5, M′ = 5, W1′ = 10, W2′ = 78; Layer 5: Cin = 5, Cout = 5, M′ = 5, W1′ = 5, W2′ = 74 | 23.52 × 10^9 | 98976 | 3237.18 |

Furthermore, we can conclude that the results of the deep learning methods are better than those of the traditional methods. Therefore, we suggest applying deep learning algorithms to reduce the dimensions.

4.3.5. Comparison of Running Time. In theory, we have already shown that the parameter count and FLOPs of the group CNN are smaller than those of the basic 1D CNN. In this section, we compare the values of the FLOPs, parameter counts, and running time of the basic 1D CNN and the group CNN. The basic 1D CNN and the group CNN are built with different structures to analyze the datasets. In particular, the numbers of layers and the parameters of each layer are shown in Table 4.

The basic 1D CNN and the group CNN have similar structures when dealing with the same dataset. It should be noted that the count of convolutional kernels in each layer of the basic 1D CNN is equal to that of the group CNN, which means that the count of convolutional kernels in each layer of the basic 1D CNN equals the number of groups multiplied by the count of convolutional kernels in each group. When the models are run to analyze the data, the running time is recorded; at the same time, the FLOPs and parameters are computed. The results are shown in Table 4.

Table 4 shows the structures, FLOPs, parameters, and running time of the basic 1D CNN and the group CNN. It is easy to see that the more layers a structure has, the larger its FLOPs, parameter count, and running time in both Table 4(a) and

Table 4 shows the structures, FLOP, parameters, and running time of the basic 1D CNN and the group CNN. It is easy to find that the more layers a structure has, the larger the FLOP, parameters, and running time are in both Table 4(a) and Table 4(b). Furthermore, the FLOP, parameters, and running time of the group CNN in Table 4(b) are less than those of the basic 1D CNN in Table 4(a) when the two CNN models deal with the same dataset.

In particular, the FLOP, parameter counts, and running time of the group CNN decrease more on CICMalDroid2020-470 than on KDD99 and CICMalDroid2020-139. This suggests that the larger the group count is, the more the FLOP, parameter counts, and running time are reduced. It should be noted that the structures of the basic 1D CNN and the group CNN in this section are chosen to compare running time and are not used in the other sections. By contrast, the structures of the basic 1D CNN and the group CNN in the other sections are chosen to obtain the highest performance and are therefore totally different from those used here.

5. Conclusions

In this paper, we present a 1D group CNN model to reconstruct the features and reduce the dimensionality. Its main characteristic is that the data are grouped based on feature correlations, which means that the data are grouped by column; in the CNN model, the grouping is realized as convolution kernel grouping. In summary, first, compared to using all features, our group CNN can achieve the best performance with fewer features. Second, the group CNN outperforms the basic 1D CNN on the features retained at different ratios. Third, compared to the other dimension reduction algorithms, the group CNN achieves the highest accuracies and F1 scores. Fourth, compared to the basic 1D CNN, the FLOP, parameters, and running time of the group CNN are lower. Therefore, across all of these evaluations, the group CNN spends less time yet achieves better performance with fewer features.

Data Availability

The datasets used to support the findings of this study can be downloaded from the public websites whose references are provided in this paper. The datasets are also available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of Zhejiang Province (Nos. LY20F020012 and LQ19F020008), the National Natural Science Foundation of China (No. 61802094), Zhejiang Electronic Information Product Inspection and Research Institute (Key Laboratory of Information Security of Zhejiang Province), and the Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province.

References

[1] X. Zhou, W. Liang, Z. Luo, and Y. Pan, "Periodic-aware intelligent prediction model for information diffusion in social networks," IEEE Transactions on Network Science and Engineering, vol. 8, no. 2, pp. 894–904, 2021.
[2] Z. Cai and Z. He, "Trading private range counting over big IoT data," in The 39th IEEE International Conference on Distributed Computing Systems, pp. 144–153, Dallas, TX, USA, 2019.
[3] L. Qi, C. Hu, X. Zhang et al., "Privacy-aware data fusion and prediction with spatial-temporal context for smart city industrial environment," IEEE Transactions on Industrial Informatics, vol. 17, no. 9, pp. 4159–4167, 2020.
[4] X. Yan, Y. Xu, X. Xing, B. Cui, Z. Guo, and T. Guo, "Trustworthy network anomaly detection based on an adaptive learning rate and momentum in IIoT," IEEE Transactions on Industrial Informatics, vol. 16, no. 9, pp. 6182–6192, 2020.
[5] X. Zheng and Z. Cai, "Privacy-preserved data sharing towards multiple parties in industrial IoTs," IEEE Journal on Selected Areas in Communications, vol. 38, no. 5, pp. 968–979, 2020.
[6] Z. Cai, Z. He, X. Guan, and Y. Li, "Collective data-sanitization for preventing sensitive information inference attacks in social networks," IEEE Transactions on Dependable and Secure Computing, vol. 15, no. 4, pp. 577–590, 2018.
[7] Y. Xu, C. Zhang, G. Wang, Z. Qin, and Q. Zeng, "A blockchain-enabled deduplicatable data auditing mechanism for network storage services," IEEE Transactions on Emerging Topics in Computing, 2020.
[8] Y. Xu, Q. Zeng, G. Wang, C. Zhang, J. Ren, and Y. Zhang, "An efficient privacy-enhanced attribute-based access control mechanism," Concurrency & Computation Practice & Experience, vol. 32, no. 5, pp. 1–10, 2020.
[9] Z. Cai and X. Zheng, "A private and efficient mechanism for data uploading in smart cyber-physical systems," IEEE Transactions on Network Science and Engineering, vol. 7, no. 2, pp. 766–775, 2020.
[10] Y. Xu, J. Ren, Y. Zhang, C. Zhang, B. Shen, and Y. Zhang, "Blockchain empowered arbitrable data auditing scheme for network storage as a service," IEEE Transactions on Services Computing, vol. 13, no. 2, pp. 289–300, 2020.
[11] C. Zhang, Y. Xu, Y. Hu, J. Wu, J. Ren, and Y. Zhang, "A blockchain-based multi-cloud storage data auditing scheme to locate faults," IEEE Transactions on Cloud Computing, 2021.
[12] Y. Xu, C. Zhang, Q. Zeng, G. Wang, J. Ren, and Y. Zhang, "Blockchain-enabled accountability mechanism against information leakage in vertical industry services," IEEE Transactions on Network Science and Engineering, vol. 8, no. 2, pp. 1202–1213, 2021.
[13] H. Liu and B. Lang, "Machine learning and deep learning methods for intrusion detection systems: a survey," Applied Sciences, vol. 20, no. 9, p. 4396, 2019.
[14] J. Cheng, J. Zheng, and X. Yu, "An ensemble framework for interpretable malicious code detection," International Journal of Intelligent Systems, 2020.
[15] Y. Ye, T. Li, D. Adjeroh, and S. S. Iyengar, "A survey on malware detection using data mining techniques," ACM Computing Surveys, vol. 50, no. 3, pp. 1–40, 2017.
[16] X. Yan, Y. Xu, B. Cui, S. Zhang, T. Guo, and C. Li, "Learning URL embedding for malicious website detection," IEEE Transactions on Industrial Informatics, vol. 16, no. 10, pp. 6673–6681, 2020.
[17] S. M. Ghaffarian and H. R. Shahriari, "Software vulnerability analysis and discovery using machine-learning and data-mining techniques," ACM Computing Surveys, vol. 50, no. 4, pp. 1–36, 2017.
[18] A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, "A survey of the recent architectures of deep convolutional neural networks," Artificial Intelligence Review, vol. 53, no. 8, pp. 5455–5516, 2019.
[19] X. Zhou, X. Xu, W. Liang et al., "Intelligent small object detection based on digital twinning for smart manufacturing in industrial CPS," IEEE Transactions on Industrial Informatics, 2021.
[20] C. F. Tsai, Y. F. Hsu, C. Y. Lin, and W. Y. Lin, "Intrusion detection by machine learning: a review," Expert Systems with Applications, vol. 36, no. 10, pp. 11994–12000, 2009.
[21] S. Maldonado, R. Weber, and F. Famili, "Feature selection for high-dimensional class-imbalanced data sets using support vector machines," Information Sciences, vol. 286, pp. 228–246, 2014.
[22] R. K. Vigneswaran, R. Vinayakumar, K. P. Soman, and P. Poornachandran, "Evaluating shallow and deep neural networks for network intrusion detection systems in cyber security," in 2018 9th International Conference on Computing, Communication and Networking Technologies, Bengaluru, India, 2018.
[23] S. Ansari, V. Bartos, and B. Lee, "Shallow and deep learning approaches for network intrusion alert prediction," Procedia Computer Science, vol. 171, pp. 644–653, 2020.
[24] X. Zhou, X. Xu, W. Liang, Z. Zeng, and Z. Yan, "Deep learning enhanced multi-target detection for end-edge-cloud surveillance in smart IoT," IEEE Internet of Things Journal, vol. 8, no. 16, pp. 12588–12596, 2021.
[25] Y. Liu, C. Wang, Y. Zhang, and J. Yuan, "Multiscale convolutional CNN model for network intrusion detection," Computer Engineering and Applications, vol. 55, no. 3, p. 90, 2019.
[26] M. Sheikhan and Z. Jadidi, "Flow-based anomaly detection in high-speed links using modified GSA-optimized neural network," Neural Computing and Applications, vol. 24, no. 3-4, pp. 599–611, 2014.
[27] X. Zhou, Y. Li, and W. Liang, "CNN-RNN based intelligent recommendation for online medical pre-diagnosis support," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 18, no. 3, pp. 912–921, 2021.
[28] Y. Xu, X. Yan, Y. Wu, Y. Hu, W. Liang, and J. Zhang, "Hierarchical bidirectional RNN for safety-enhanced B5G heterogeneous networks," IEEE Transactions on Network Science and Engineering, 2021.
[29] Z. Cai, Z. Xiong, H. Xu, P. Wang, W. Li, and Y. Pan, "Generative adversarial networks," ACM Computing Surveys, vol. 54, no. 6, pp. 1–38, 2021.
[30] X. Zhou, W. Liang, S. Shimizu, J. Ma, and Q. Jin, "Siamese neural network based few-shot learning for anomaly detection in industrial cyber-physical systems," IEEE Transactions on Industrial Informatics, vol. 17, no. 8, pp. 5790–5798, 2021.
[31] X. Yan, B. Cui, Y. Xu, P. Shi, and Z. Wang, "A method of information protection for collaborative deep learning under GAN model attack," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 18, no. 3, pp. 871–881, 2021.
[32] C. Yin, Y. Zhu, J. Fei, and X. He, "A deep learning approach for intrusion detection using recurrent neural networks," IEEE Access, vol. 5, pp. 21954–21961, 2017.
[33] Q. Yang, W. Shi, J. Chen, and W. Lin, "Deep convolution neural network-based transfer learning method for civil infrastructure crack detection," Automation in Construction, vol. 116, no. 10, article 103199, 2020.
[34] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J. Inman, "1D convolutional neural networks and applications: a survey," Mechanical Systems and Signal Processing, vol. 151, p. 107398, 2021.
[35] Q. Xiang, X. Wang, Y. Song, L. Lei, R. Li, and J. Lai, "One-dimensional convolutional neural networks for high-resolution range profile recognition via adaptively feature recalibrating and automatically channel pruning," International Journal of Intelligent Systems, vol. 36, no. 1, pp. 332–361, 2021.
[36] S. Chen, J. Yu, and S. Wang, "One-dimensional convolutional auto-encoder-based feature learning for fault diagnosis of multivariate processes," Journal of Process Control, vol. 87, pp. 54–67, 2020.
[37] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
[38] T. Zhang, G. J. Qi, B. Xiao, and J. Wang, "Interleaved group convolutions for deep neural networks," ICCV, vol. 1707, 2017.
[39] Y. Lu, G. Lu, R. Lin, J. Li, and D. Zhang, "SRGC-nets: sparse repeated group convolutional neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 99, pp. 1–14, 2019.
[40] A. L. Buczak and E. Guven, "A survey of data mining and machine learning methods for cyber security intrusion detection," IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1153–1176, 2016.
[41] M. Kruczkowski and E. N. Szynkiewicz, "Support vector machine for malware analysis and classification," in 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies, pp. 415–420, Warsaw, Poland, 2014.
[42] L. Bilge, S. Sen, and D. Balzarotti, "Exposure: a passive DNS analysis service to detect and report malicious domains," ACM Transactions on Information and System Security, vol. 16, no. 4, pp. 1–28, 2013.
[43] Y. Y. Aung and M. M. Min, "Hybrid intrusion detection system using K-means and classification and regression trees algorithms," in 2018 IEEE 16th International Conference on Software Engineering Research, Management and Applications, pp. 195–199, Kunming, China, 2018.
[44] A. Mo, K. Qader, and M. Alkasassbeh, "Comparative analysis of clustering techniques in network traffic faults classification," International Journal of Innovative Research in Computer and Communication Engineering, vol. 5, no. 4, pp. 6551–6563, 2017.
[45] D. Abdelhafiz, C. Yang, R. Ammar, and S. Nabavi, "Deep convolutional neural networks for mammography: advances, challenges and applications," BMC Bioinformatics, vol. 20, no. 11, pp. 281–301, 2019.
[46] Y. Xiao, C. Xing, T. Zhang, and Z. Zhao, "An intrusion detection model based on feature reduction and convolutional neural networks," IEEE Access, vol. 7, pp. 42210–42219, 2019.
[47] Y. Wang, J. An, and W. Huang, "Using CNN-based representation learning method for malicious traffic identification," in 2018 IEEE/ACIS 17th International Conference on Computer and Information Science, pp. 400–404, Singapore, Singapore, 2018.
[48] J. Zhang, Z. Qin, H. Yin, L. Ou, and K. Zhang, "A feature-hybrid malware variants detection using CNN based opcode embedding and BPNN based API embedding," Computers & Security, vol. 84, pp. 376–392, 2019.
[49] J. Zhang, Z. Qin, H. Yin, L. Ou, S. Xiao, and Y. Hu, "Malware variant detection using opcode image recognition with small training sets," in 2016 25th International Conference on Computer Communication and Networks, Waikoloa, HI, USA, 2016.
[50] J. Yan, Y. Qi, and Q. Rao, "Detecting malware with an ensemble method based on deep neural network," Security and Communication Networks, vol. 2018, 16 pages, 2018.
[51] C. Ma, X. Du, and L. Cao, "Analysis of multi-types of flow features based on hybrid neural network for improving network anomaly detection," IEEE Access, vol. 7, pp. 148363–148380, 2019.
[52] M. Azizjon, A. Jumabek, and W. Kim, "1D CNN based network intrusion detection with normalization on imbalanced data," in 2020 International Conference on Artificial Intelligence in Information and Communication, pp. 218–224, Fukuoka, Japan, 2020.
[53] W. Wei, Q. Ke, J. Nowak, M. Korytkowski, R. Scherer, and M. Woźniak, "Accurate and fast URL phishing detector: a convolutional neural network approach," Computer Networks, vol. 178, p. 107275, 2020.
[54] H. Zhang, L. Huang, C. Q. Wu, and Z. Li, "An effective convolutional neural network based on SMOTE and Gaussian mixture model for intrusion detection in imbalanced dataset," Computer Networks, vol. 177, p. 107315, 2020.
[55] S. Xie and R. Girshick, "Aggregated residual transformations for deep neural networks," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2016.
[56] M. D. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Neural Networks, Springer International Publishing, 2013.
[57] Y. Zhang, T. Shen, X. Ji, Y. Zhang, R. Xiong, and Q. Dai, "Residual highway convolutional neural networks for in-loop filtering in HEVC," IEEE Transactions on Image Processing, vol. 27, no. 8, pp. 3827–3841, 2018.
[58] M. E. Paoletti, J. M. Haut, X. Tao, J. Plaza, and A. Plaza, "FLOP-reduction through memory allocations within CNN for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 1109, no. 10, pp. 5938–5952, 2020.
[59] X. Zhou, Y. Hu, W. Liang, J. Ma, and Q. Jin, "Variational LSTM enhanced anomaly detection for industrial big data," IEEE Transactions on Industrial Informatics, vol. 17, no. 5, pp. 3469–3477, 2021.
[60] X. Yan, J. Zhang, H. Elahi, M. Jiang, and H. Gao, "A personalized search query generating method for safety-enhanced vehicle-to-people networks," IEEE Transactions on Vehicular Technology, vol. 70, no. 6, pp. 5296–5307, 2021.
[61] S. Hettich and S. D. Bay, KDD Cup 1999 Data, The UCI KDD Archive, 1999.
[62] S. Mahdavifar, A. Kadir, R. Fatemi, D. Alhadidi, and A. A. Ghorbani, "Dynamic android malware category classification using semi-supervised deep learning," in 2020 IEEE Int'l. Conf. on Dependable, Autonomic and Secure Computing, Int'l. Conf. on Pervasive Intelligence and Computing, Int'l. Conf. on Cloud and Big Data Computing, Int'l. Conf. on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), pp. 515–522, Calgary, AB, Canada, 2020.
