
Received February 5, 2018; accepted May 3, 2018; date of publication May 15, 2018; date of current version July 19, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2836950

Machine Learning and Deep Learning Methods for Cybersecurity

YANG XIN1,2, LINGSHUANG KONG3, ZHI LIU2,3, (Member, IEEE), YULING CHEN2, YANMIAO LI1, HONGLIANG ZHU1, MINGCHENG GAO1, HAIXIA HOU1, AND CHUNHUA WANG4
1 Centre of Information Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
3 School of Information Science and Engineering, Shandong University, Jinan 250100, China
4 China Changfeng Science Technology Industry Group Corporation, Beijing 100854, China

Corresponding author: Zhi Liu ([email protected])


This work was supported in part by the National Key R&D Program of China under Grant 2017YFB0802300, in part by the Foundation of
Guizhou Provincial Key Laboratory of Public Big Data under Grant 2017BDKFJJ015, in part by the Key Research and Development Plan
of Shandong Province under Grant 2017CXGC1503 and Grant 2018GSF118228, in part by the Shandong Provincial Natural Science
Foundation under Grant ZR2012FZ005, and in part by the Fundamental Research Funds of Shandong University under Grant 2015JC038.

ABSTRACT With the development of the Internet, cyber-attacks are changing rapidly, and the cybersecurity situation is not optimistic. This survey describes key literature on machine learning (ML) and deep learning (DL) methods for network analysis in intrusion detection and provides a brief tutorial description of each ML/DL method. Papers representing each method were indexed, read, and summarized based on their temporal relevance and popularity. Because data are so important in ML/DL methods, we describe some of the network datasets commonly used in ML/DL, discuss the challenges of using ML/DL for cybersecurity, and provide suggestions for research directions.

INDEX TERMS Cybersecurity, intrusion detection, deep learning, machine learning.

I. INTRODUCTION
With the increasingly in-depth integration of the Internet and social life, the Internet is changing how people learn and work, but it also exposes us to increasingly serious security threats. How to identify various network attacks, particularly attacks not previously seen, is a key issue that must be solved urgently.

Cybersecurity is a set of technologies and processes designed to protect computers, networks, programs, and data from attacks and unauthorized access, alteration, or destruction [1]. A cybersecurity system consists of a network security system and a computer security system. Each of these systems includes firewalls, antivirus software, and intrusion detection systems (IDS). IDSs help discover, determine, and identify unauthorized system behavior such as use, copying, modification, and destruction [2].

Security breaches include external intrusions and internal intrusions. There are three main types of network analysis for IDSs: misuse-based (also known as signature-based), anomaly-based, and hybrid. Misuse-based detection techniques aim to detect known attacks by using the signatures of these attacks [3]. They are effective for known types of attacks without generating a large number of false alarms, but administrators often must manually update the database rules and signatures, and new (zero-day) attacks cannot be detected by misuse-based techniques.

Anomaly-based techniques model normal network and system behavior and identify anomalies as deviations from that behavior. They are appealing because of their capacity to detect zero-day attacks. Another advantage is that the profiles of normal activity are customized for every system, application, or network, making it difficult for attackers to know which activities they can perform undetected. Additionally, the data on which anomaly-based techniques alert (novel attacks) can be used to define signatures for misuse detectors. The main disadvantage of anomaly-based techniques is the potential for high false alarm rates, because previously unseen system behaviors can be categorized as anomalies.

Hybrid detection combines misuse and anomaly detection [4]. It is used to increase the detection rate of known intrusions and to reduce the false positive rate for unknown attacks. Most ML/DL methods are hybrids.


This paper presents a literature review of machine learning (ML) and deep learning (DL) methods for cybersecurity applications. ML/DL methods and some applications of each method in network intrusion detection are described. The paper focuses on ML and DL technologies for network security and on descriptions of the ML/DL methods themselves. Our research draws on publications found by searching Google Scholar with "machine learning", "deep learning", and "cyber" as keywords; in particular, recent, influential papers are emphasized because they describe the currently popular techniques.

This paper is intended for those who want to study network intrusion detection with ML/DL. Thus, great emphasis is placed on a thorough description of the ML/DL methods, and references to seminal works for each ML and DL method are provided. Examples are provided concerning how the techniques were used in cybersecurity.

This paper does not describe all of the different techniques of network anomaly detection; instead, it concentrates only on ML and DL techniques. However, in addition to anomaly detection, signature-based and hybrid methods are depicted.

Patcha and Park [5] discuss technological trends in anomaly detection and identify open problems and challenges in anomaly detection systems and hybrid intrusion detection systems. However, their survey only covers papers published from 2002 to 2006, whereas our survey includes more recent papers. Unlike Modi et al. [6], this review covers the application of ML/DL in various areas of intrusion detection and is not limited to cloud security.

Revathi and Malathi [7] focus on machine-learning intrusion techniques. The authors present a comprehensive set of machine-learning algorithms on the NSL-KDD intrusion detection dataset, but their study only involves a misuse detection context. In contrast, this paper describes not only misuse detection but also anomaly detection.

Sahoo et al. [8] present the formal formulation of Malicious URL Detection as a machine-learning task and categorize and review the contributions of studies that address different dimensions of this problem (e.g., feature representation and algorithm design). However, unlike this paper, they do not explain the technical details of the algorithms.

Buczak and Guven [9] focus on machine-learning methods and their applications to intrusion detection. Algorithms such as Neural Networks, Support Vector Machines, Genetic Algorithms, Fuzzy Logic, Bayesian Networks, and Decision Trees are described in detail. However, major ML/DL methods such as clustering, Artificial Immune Systems, and Swarm Intelligence are not included. Their paper focuses on network intrusion detection. In a wired network, attackers must pass through multiple layers of firewall and operating-system defenses or gain physical access to the network; in a wireless network, however, any node can be a target, so the network is more vulnerable to malicious attacks and more difficult to defend than a wired network.

The ML and DL methods covered in this paper are applicable to intrusion detection in wired and wireless networks. Readers who wish to focus on wireless network protection can refer to essays such as Soni et al. [10], which focuses more on architectures for intrusion detection systems introduced for MANETs.

The rest of this survey is organized as follows: Section II focuses on similarities and differences in ML and DL. Section III introduces cybersecurity datasets used in ML and DL. Section IV describes the methods and related papers for ML and DL in cybersecurity. Section V discusses the research status quo and future directions. Section VI presents conclusions.

II. SIMILARITIES AND DIFFERENCES IN ML AND DL
There are many puzzles about the relationship among ML, DL, and artificial intelligence (AI). AI is a technological science that studies and develops theories, methods, techniques, and applications that simulate, expand, and extend human intelligence [11]. It is a branch of computer science that seeks to understand the essence of intelligence and to produce a new type of intelligent machine that responds in a manner similar to human intelligence. Research in this area includes robotics, computer vision, natural language processing, and expert systems. AI aims to simulate the information processes of human consciousness and thinking; it is not human intelligence itself, but machines that think like humans may eventually exceed human intelligence.

ML is a branch of AI and is closely related to (and often overlaps with) computational statistics, which also focuses on making predictions using computers. It has strong ties to mathematical optimization, which delivers methods, theory, and application domains to the field. ML is occasionally conflated with data mining [12], but the latter subfield focuses more on exploratory data analysis, which is largely unsupervised. ML can also be unsupervised and can be used to learn and establish baseline behavioral profiles for various entities and then to find meaningful anomalies [13]. The pioneer of ML, Arthur Samuel, defined ML as a "field of study that gives computers the ability to learn without being explicitly programmed." ML primarily focuses on classification and regression based on known features previously learned from the training data.

DL is a newer field of machine-learning research. Its motivation lies in the establishment of neural networks that simulate the human brain for analytical learning, mimicking the human brain's mechanisms to interpret data such as images, sounds, and text [14].

The concept of DL was proposed by Hinton [15] based on the deep belief network (DBN), in which an unsupervised greedy layer-by-layer training algorithm offers hope for solving the optimization problem of deep structures. The deep structure of the multi-layer autoencoder was proposed subsequently. In addition, the convolutional neural network proposed by LeCun et al. [16] was the first true multi-layer structured learning algorithm, which uses spatial relative relationships to reduce the number of parameters and improve training performance.


DL is a machine-learning method based on learned representations of data. An observation, such as an image, can be expressed in a variety of ways, such as a vector of per-pixel intensity values, or more abstractly as a series of edges, a region of a particular shape, or the like. Using specific representations makes it easier to learn tasks from instances. Like ML methods, DL methods include both supervised and unsupervised learning, and the learning models built under different learning frameworks are quite different. The benefit of DL is the use of unsupervised or semi-supervised feature learning and hierarchical feature extraction to efficiently replace manually designed features [17].

The differences between ML and DL include the following:
• Data dependencies. The main difference between deep learning and traditional machine learning is how performance changes as the amount of data increases. Deep learning algorithms do not perform as well when data volumes are small, because they require a large amount of data to model the data well. Conversely, in this case, traditional machine-learning algorithms that use established rules perform better [14].
• Hardware dependencies. DL algorithms require many matrix operations, and GPUs are widely used to run matrix operations efficiently; the GPU is therefore the hardware needed for DL to work properly. DL relies more on high-performance machines with GPUs than do traditional machine-learning algorithms [18].
• Feature processing. Feature processing is the process of putting domain knowledge into a feature extractor to reduce the complexity of the data and generate patterns that make learning algorithms work better. Feature processing is time-consuming and requires specialized knowledge. In ML, most of the characteristics of an application must be determined by an expert and then encoded as a data type; features can be pixel values, shapes, textures, locations, and orientations, and the performance of most ML algorithms depends on the accuracy of the features extracted. Obtaining high-level features directly from the data is a major difference between DL and traditional machine-learning algorithms [17]; thus, DL reduces the effort of designing a feature extractor for each problem.
• Problem-solving method. When applying traditional machine-learning algorithms to solve problems, traditional machine learning usually breaks the problem into multiple sub-problems, solves the sub-problems, and ultimately combines the results. In contrast, deep learning advocates direct end-to-end problem solving.
• Execution time. In general, it takes a long time to train a DL algorithm because there are many parameters, so the training step takes longer. The most advanced DL models, such as ResNet, can take around two weeks to train from scratch, whereas ML training takes relatively little time, only seconds to hours. The test time is often the opposite: deep learning algorithms require very little time to run during testing, whereas for some ML algorithms the test time increases as the amount of data increases. This does not apply to all ML algorithms, however, because some also have short test times.
• Interpretability. Crucially, interpretability is an important factor in comparing ML with DL. DL recognition of handwritten digits can approach human performance, which is impressive, but a DL algorithm will not tell you why it produces a given result [14]. Of course, from a mathematical point of view one can see that a node of a deep neural network was activated, but how should the neurons be modeled, and how do these layers of neurons work together? It is therefore difficult to explain how the result was generated. Conversely, machine-learning algorithms provide explicit rules for why a choice was made, so it is easy to explain the reasoning behind the decision.

An ML method primarily includes the following four steps [12] (see the code sketch at the end of this section):
• Feature engineering: choose the attributes (features) used as the basis for prediction.
• Choose the appropriate machine-learning algorithm (such as a classification or regression algorithm, a high-complexity or a fast one).
• Train the model and evaluate its performance (for different algorithms, evaluate and select the best-performing model).
• Use the trained model to classify or predict unknown data.

The steps of a DL approach are similar to those of ML, but as mentioned above, unlike machine-learning methods, its feature extraction is automated rather than manual. Model selection is a constant trial-and-error process that requires choosing a suitable ML/DL algorithm for each type of task. There are three types of ML/DL approaches: supervised, unsupervised, and semi-supervised. In supervised learning, each instance consists of an input sample and a label; the supervised learning algorithm analyzes the training data and uses the result of the analysis to map new instances. Unsupervised learning is a machine-learning task that deduces descriptions of hidden structures from unlabeled data. Because the samples are unlabeled, the accuracy of the algorithm's output cannot be evaluated directly, and only the key features of the data can be summarized and explained. Semi-supervised learning combines supervised learning with unsupervised learning: it uses a large amount of unlabeled data in addition to labeled data for pattern recognition. Using semi-supervised learning can reduce labeling effort while achieving high accuracy.

Commonly used ML algorithms include, for example, KNN, SVM, decision trees, and Bayesian methods. DL models include, for example, DBNs, CNNs, and LSTMs. There are many parameters to choose, such as the number of layers and nodes, as well as ways to improve and ensemble the models. After training is complete, the candidate models must be evaluated from different aspects.
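As a concrete illustration of the four ML steps listed above, the following minimal sketch trains and evaluates a simple classifier with scikit-learn. The synthetic data, the feature choices, and the use of a random forest are illustrative assumptions, not the setup of any of the surveyed papers.

```python
# Minimal sketch of the four ML steps: feature choice, algorithm choice,
# training/evaluation, and prediction on unseen data (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# 1) Feature engineering: here we simply generate a labeled feature matrix.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 2) Choose an algorithm (a random forest classifier in this sketch).
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 3) Train and evaluate the model.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# 4) Use the trained model to classify new, unlabeled samples.
new_labels = model.predict(X_test[:5])
```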


Model evaluation is a very important part of any machine-learning task. Different machine-learning tasks have different evaluation indicators, and tasks of the same type can also be evaluated with different indicators, each with a different emphasis, for example in classification, regression, and clustering [19]. The confusion matrix is a table that describes the classification results in detail, showing whether instances of each class are correctly or incorrectly classified: for binary classification it is a 2 × 2 matrix, and for n-class classification it is an n × n matrix [20].

For a binary classification, as shown in Table 1, the results can be divided into four categories:

TABLE 1. Confusion matrix.

• True Positive (TP): positive samples correctly classified by the model;
• False Negative (FN): positive samples misclassified by the model;
• False Positive (FP): negative samples misclassified by the model;
• True Negative (TN): negative samples correctly classified by the model.

Further, the following metrics can be calculated from the confusion matrix (a short worked example appears at the end of this section):
• Accuracy: (TP + TN)/(TP + TN + FP + FN). The ratio of the number of correctly classified samples to the total number of samples for a given test set. When classes are balanced, this is a good measure; if not, this metric is not very useful.
• Precision: TP/(TP + FP). The ratio of all correctly detected items to all items the model flagged as positive.
• Sensitivity, Recall, or True Positive Rate (TPR): TP/(TP + FN). The ratio of all correctly detected items to all items that should have been detected.
• False Negative Rate (FNR): FN/(TP + FN). The ratio of the number of misclassified positive samples to the number of positive samples.
• False Positive Rate (FPR): FP/(FP + TN). The ratio of the number of misclassified negative samples to the total number of negative samples.
• True Negative Rate (TNR): TN/(TN + FP). The ratio of the number of correctly classified negative samples to the number of negative samples.
• F1-score: 2 · TP/(2 · TP + FN + FP). The harmonic mean of precision and recall.
• ROC: In ROC space, the abscissa of each point is the FPR and the ordinate is the TPR, which describes the trade-off the classifier makes between true positives and false positives. ROC's main analysis tool is a curve drawn in ROC space, the ROC curve.
• AUC: The value of the AUC is the size of the area under the ROC curve. In general, AUC values range from 0.5 to 1.0, and larger AUCs represent better performance.

In the area of cybersecurity, the metrics most commonly used in model assessment are precision, recall, and F1-score. The higher the precision and recall of a model, the better; in practice, however, these two are sometimes in tension and must be balanced according to the needs of the task. The F1-score is the harmonic mean of precision and recall and takes both into account. In general, the higher the F1-score, the better the model performs.
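To make these definitions concrete, the short example below computes the metrics above from raw TP/FP/FN/TN counts; the counts themselves are made-up numbers used purely for illustration.

```python
# Compute the confusion-matrix metrics defined above from illustrative counts.
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)          # TPR / sensitivity
    fnr       = fn / (tp + fn)
    fpr       = fp / (fp + tn)
    tnr       = tn / (tn + fp)
    f1        = 2 * tp / (2 * tp + fn + fp)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "FNR": fnr, "FPR": fpr, "TNR": tnr, "F1": f1}

# Hypothetical counts: 90 attacks caught, 10 missed, 5 false alarms, 895 normal.
print(metrics(tp=90, fp=5, fn=10, tn=895))
```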


III. NETWORK SECURITY DATA SETS
Data constitute the basis of computer network security research. The correct choice and reasonable use of data are prerequisites for conducting relevant security research, and the size of the dataset also affects the training of ML and DL models. Computer network security data can usually be obtained in two ways: 1) collected directly, or 2) taken from an existing public dataset. Direct collection uses various means to gather the required cyber data, for example capturing network packets with tools such as WinDump or Wireshark. This approach is highly targeted and suitable for collecting short-term or small amounts of data, but for long-term collection or large amounts of data, acquisition time and storage costs escalate. Using existing network security datasets saves data-collection time and increases the efficiency of research by quickly providing the various data required. This section introduces some of the security datasets that are accessible on the Internet, to give a more comprehensive background for the research results surveyed in Section IV.

A. DARPA INTRUSION DETECTION DATA SETS
The DARPA Intrusion Detection Data Sets [21], produced under the direction of DARPA and AFRL/SNHS, are collected and published by the Cyber Systems and Technology Group (formerly the DARPA Intrusion Detection Evaluation Group) of MIT Lincoln Laboratory for evaluating computer network intrusion detection systems.

The first standard dataset provides a large amount of background traffic data and attack data and can be downloaded directly from the website. Currently, the dataset primarily includes the following three subsets:
· 1998 DARPA Intrusion Detection Assessment Dataset: includes 7 weeks of training data and 2 weeks of test data.
· 1999 DARPA Intrusion Detection Assessment Dataset: includes 3 weeks of training data and 2 weeks of test data.
· 2000 DARPA Intrusion Detection Scenario-Specific Dataset: includes LLDOS 1.0 attack scenario data, LLDOS 2.0.2 attack scenario data, and Windows NT attack data.

B. KDD CUP 99 DATASET
The KDD Cup 99 dataset [22] is one of the most widely used training sets; it is based on the DARPA 1998 dataset and contains about 4,900,000 replicated attack records.

Each record is labeled either as normal or as one of 22 attack types, which fall into five major categories: DoS (Denial of Service attacks), R2L (Remote to Local attacks), U2R (User to Root attacks), Probe (probing attacks), and Normal. Each record in the KDD Cup 99 training dataset contains 41 fixed feature attributes and a class identifier. Of the 41 feature attributes, seven are symbolic and the others are continuous. The features include basic features (No. 1–No. 10), content features (No. 11–No. 22), and traffic features (No. 23–No. 41), as shown in Table 2. The test set contains specific attack types that do not appear in the training set, which allows it to provide a more realistic basis for evaluating intrusion detection.

TABLE 2. Features of KDD Cup 99 dataset.

To date, the KDD Cup 99 dataset remains the most thoroughly studied and freely available dataset, with fully labeled connection records spanning several weeks of network traffic and a large number of different attacks [23].

Each connection record contains 41 input features grouped into basic features and higher-level features. The basic features are directly extracted or derived from the header information of the IP packets and TCP/UDP segments in the tcpdump files of each session. The list files for tcpdump from the DARPA training data were used to label the connection records. The so-called content-based higher-level features use domain knowledge to look specifically for attacks in the actual payload of the segments recorded in the tcpdump files. These address R2L and U2R attacks, which sometimes require only a single connection or lack any prominent sequential patterns. Typical features include the number of failed login attempts and whether root access was obtained during the session. Furthermore, there are time-based and connection-based derived features to address DoS and Probe attacks. Time-based features examine connections within a time window of two seconds and provide statistics about them.
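The 41-feature connection-record format described above can be illustrated with a small parsing sketch. The partial label-to-category mapping and the file name kddcup.data_10_percent are assumptions based on the commonly distributed CSV form of the dataset, not something prescribed by this paper.

```python
# Sketch: read KDD Cup 99-style CSV records (41 features + trailing label)
# and map each label to one of the five high-level categories.
import csv

CATEGORY = {
    "normal": "Normal",
    "smurf": "DoS", "neptune": "DoS", "back": "DoS",
    "portsweep": "Probe", "ipsweep": "Probe", "nmap": "Probe",
    "guess_passwd": "R2L", "warezclient": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
}   # partial mapping for illustration; the full dataset has 22 attack types

def load_records(path):
    with open(path, newline="") as f:
        for row in csv.reader(f):
            features, label = row[:41], row[41].rstrip(".")
            basic   = features[0:10]    # basic features   (No. 1-10)
            content = features[10:22]   # content features (No. 11-22)
            traffic = features[22:41]   # traffic features (No. 23-41)
            yield basic, content, traffic, CATEGORY.get(label, "unknown")

# Example usage (assumed local file name):
# for basic, content, traffic, cat in load_records("kddcup.data_10_percent"):
#     ...
```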


To provide statistical information about attacks that exceed the two-second time window, such as slow probing attacks, connection-based features use a connection window of 100 connections. Both kinds are further split into same-host features, which provide statistics about connections with the same destination host, and same-service features, which examine only connections with the same service [24].

The KDD Cup 99 competition provides the training and testing datasets as a full set and also provides a so-called '10%' subset version. The '10%' subset was created because of the huge number of connection records in the full set; some DoS attacks have millions of records, so not all of these connection records were selected. Furthermore, only connections within a time window of five minutes before and after the entire duration of an attack were added to the '10%' datasets [22]. To achieve approximately the same distribution of intrusions and normal traffic as the original DARPA dataset, a selected set of sequences with 'normal' connections was also left in the '10%' dataset.

The training and test sets have different probability distributions. The full training dataset contains nearly five million records. The full training dataset and the corresponding '10%' subset both contain 22 different attack types, in the order in which they were used in the 1998 DARPA experiments. The full test set, with nearly three million records, is only available unlabeled; however, a '10%' subset is provided as both unlabeled and labeled test data. It is designated the 'corrected' subset, with a different distribution and additional attacks not present in the training set. For the KDD Cup 99 competition, the '10%' subset was intended for training. The 'corrected' subset can be used for performance testing; it has over 300,000 records containing 37 different attacks.

C. NSL-KDD DATASET
The NSL-KDD dataset [7] is a new version of the KDD Cup 99 dataset that addresses some of its limitations. The KDD Cup 99 intrusion detection dataset was used in the 3rd International Knowledge Discovery and Data Mining Tools Competition, where the task was to identify features distinguishing intrusive from normal connections for building network intrusion detectors. In the NSL-KDD dataset, each instance has the characteristics of one type of network data. It contains 22 different attack types grouped into 4 major attack types, as shown in Table 3.

TABLE 3. Types of attack in NSL-KDD.

The dataset provides the KDDTrain+ dataset as the training set and the KDDTest+ and KDDTest−21 datasets as the testing sets, which contain different numbers of normal records and of the four types of attack records, as shown in Table 4. The KDDTest−21 dataset is a subset of KDDTest+ and is more difficult to classify [25].

TABLE 4. Different classifications in the NSL-KDD.

D. ADFA DATASET
The ADFA dataset is a collection of host-level intrusion detection datasets issued by the Australian Defence Force Academy (ADFA) [26] and is widely used in the testing of intrusion detection products. In the dataset, various system calls have been characterized and marked with the type of attack. The dataset covers two OS platforms, Linux (ADFA-LD) and Windows (ADFA-WD), and records sequences of system calls; ADFA-LD records operating-system call traces over a period of time. The kernel provides user-space programs with a set of standard interfaces through which they interact with kernel space; these interfaces give user programs restricted access to hardware devices, for example to request system resources, operate devices, read and write, or create new processes. User space issues the requests and kernel space is responsible for executing them, so these interfaces are the bridge between user space and kernel space. On Linux, a user-space program makes a system call by raising a soft interrupt that switches the program into kernel mode, where the corresponding operation is performed, and every system call has a corresponding system call number. ADFA-LD is labeled with the attack type. It contains 5 different attack types and 2 normal types, as shown in Table 5.

TABLE 5. Types of attack in ADFA-LD.
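Host-based detectors built on ADFA-LD typically turn each trace of system-call numbers into fixed-length short sequences (n-grams). The sketch below shows one common way to do this windowing; the window length of 5 and the example trace are arbitrary illustrative choices, not values taken from the dataset documentation.

```python
# Slide a fixed-length window over a trace of system-call numbers to obtain
# the short sequences often used as features for host-based intrusion detection.
def short_sequences(trace, width=5):
    return [tuple(trace[i:i + width]) for i in range(len(trace) - width + 1)]

# Hypothetical ADFA-LD-style trace (each integer is a Linux system call number).
trace = [6, 6, 63, 6, 42, 120, 6, 195, 120, 6, 6, 114]
for seq in short_sequences(trace):
    print(seq)
```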


IV. ML AND DL ALGORITHMS FOR CYBERSECURITY
This section is divided into two parts. The first part introduces the application of traditional machine-learning algorithms in network security. The second part introduces the application of deep learning in the field of cybersecurity. It not only describes the research results but also compares similar studies.

A. SUPPORT VECTOR MACHINE
The Support Vector Machine (SVM) is one of the most robust and accurate of the classical machine-learning algorithms. It primarily includes Support Vector Classification (SVC) and Support Vector Regression (SVR). The SVC is based on the concept of decision boundaries: a decision boundary separates a set of instances having different class values into two groups. The SVC supports both binary and multi-class classification. The support vectors are the points closest to the separating hyperplane, and they determine the optimal separating hyperplane. In the classification process, input vectors that map to one side of the separating hyperplane in feature space fall into one class, and those that map to the other side fall into the other class. For data points that are not linearly separable, the SVM uses appropriate kernel functions to map them into higher-dimensional spaces so that they become separable there [27].

Kotpalliwar and Wajgi [28] choose two representative datasets, a ''Mixed'' dataset and the ''10% KDD Cup 99'' dataset. The RBF is used as the kernel function of the SVM to classify DoS, Probe, U2R, and R2L data. The study calculates parameter values related to intrusion-detector performance evaluation. The validation accuracy on the ''Mixed'' dataset and the classification accuracy on the ''10% KDD'' dataset were estimated to be 89.85% and 99.9%, respectively. Unfortunately, the study did not assess precision or recall, only accuracy.

Saxena and Richariya [29] proposed a hybrid PSO-SVM approach for building an IDS. The study used two feature-reduction techniques, Information Gain and BPSO, which reduced the 41 attributes to 18. The classification performance was reported as 99.4% on DoS, 99.3% on Probe or Scan, 98.7% on R2L, and 98.5% on U2R. The method provides a good detection rate for Denial of Service (DoS) attacks and achieves a good detection rate for U2R and R2L attacks. However, the precision on Probe, U2R, and R2L is 84.2%, 25.0%, and 89.4%, respectively; in other words, the method leads to a higher false alarm rate.

Pervez and Farid [30] propose a filtering algorithm based on a Support Vector Machine (SVM) classifier for feature selection in multiple intrusion classification tasks on the NSL-KDD intrusion detection dataset. The method achieves 91% classification accuracy using only three input features and 99% classification accuracy using 36 input features, whereas using all 41 input features also achieves 99% classification accuracy. The method performed well on the training set, with an F1-score of 0.99; on the test set, however, the performance is worse, with an F1-score of only 0.77. With such poor generalization, it cannot effectively detect unknown network intrusions.

Chandrasekhar and Raghuveer [31] integrate fuzzy C-means clustering, artificial neural networks, and support vector machines for intrusion detection. With the help of the fuzzy C-means clustering technique, the heterogeneous training data are grouped into homogeneous subsets, reducing the complexity of each subset and helping to improve detection accuracy. After the initial clustering, ANNs are trained on the corresponding homogeneous subsets, and a linear SVM classifier performs the final classification. The experimental results obtained with the calibrated KDD Cup 1999 dataset show the effectiveness of this method. In the same work, the KDD Cup 99 dataset is divided into four subsets according to intrusion type and trained separately; DoS and Probe attacks have a higher frequency and can be easily separated from normal activity. In contrast, U2R and R2L attacks are embedded in the data portion of the packets, making it difficult to achieve high detection accuracy on these two types of attacks. The technique attained consistently high scores for all types of intrusions: overall accuracy for the DoS, Probe, R2L, and U2R categories was 99.66%, 98.55%, 98.99%, and 98.81%, respectively. Compared with other reported intrusion detection approaches, this method classifies better, but the trained classifier cannot effectively detect anomalies in a real network.

Yan and Liu [32] attempt to use a direct support vector classifier to create a transductive method and introduce simulated annealing to relax the optimization model. The study used a subset of the DARPA 1998 dataset. For DoS-type attacks, 200 normal samples and 200 attack samples were selected; the feature dimension is 18, and samples were randomly divided into a training set and a test set in a 6:4 ratio. The experimental results show that the accuracy, FPR, and precision are 80.1%, 0.47%, and 81.2%, respectively. The dataset used in this study is too small, and the classification results are not very satisfactory.

Using the same dataset, Kokila et al. [33] focus on DDoS attacks against the SDN controller. A variety of machine-learning algorithms were compared and analyzed; the SVM classifier achieved a lower false alarm rate and a higher classification accuracy of 0.8% and 95.11%, respectively.

On the basis of a short-sequence model, Xie et al. [34] applied a one-class SVM algorithm to ADFA-LD. Because removing duplicate entries from the short sequences gives better separability between normal and abnormal behavior, the technique can reduce the computational cost while achieving acceptable performance; however, the recognition rate for individual attack types is low.

B. K-NEAREST NEIGHBOR
The kNN classifier is based on a distance function that measures the difference or similarity between two instances. The standard Euclidean distance d(x, y) between two instances x and y is defined as:

d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}

where x_k is the kth feature of instance x, y_k is the kth feature of instance y, and n is the total number of features in the dataset.

Assume that the design set for the kNN classifier is U, and that the total number of samples in the design set is S. Let C = {C_1, C_2, ..., C_L} be the L distinct class labels available in S. Let x be an input vector whose class label must be predicted, and let y_k denote the kth vector in the design set S. The kNN algorithm finds the k vectors in the design set S closest to the input vector x; the input vector x is then classified to class C_j if the majority of the k closest vectors have class C_j [34].


Rao and Swathi [36] used Indexed Partial Distance Search k-Nearest Neighbor (IKPDS) to experiment with various attack types and different k values (3, 5, and 10). They randomly selected 12,597 samples from the NSL-KDD dataset to test the classification results, obtaining 99.6% accuracy and a faster classification time. The experimental results show that IKPDS gives a network intrusion detection system (NIDS) better classification results in a short time. However, the evaluation of the experiment is not complete; it did not consider precision or recall.

Sharifi et al. [37] present a combined K-means and kNN intrusion detection system. First, the input intrusion data (NSL-KDD) are preprocessed with principal component analysis to select the 10 most important features. Then these preprocessed data are divided into three parts and fed into the k-means algorithm to obtain cluster centers and labels; this process is repeated 20 times to select the best clustering scheme. The cluster centers and labels are then used to classify the input KDD data with a simple kNN. In the experiments, two schemes were used to compare the proposed method with plain kNN, with measures based on the accuracy of correctly detecting an attack, the attack type, or normal traffic. In the first scheme the test data are kept separate from the training data, whereas in the second scheme some test data also appear in the training data. In either case, however, the average accuracy of the experiment is only approximately 90%, and precision and recall were not considered.

Shapoorifard and Shamsinejad [38] studied some new techniques to improve the classification performance of kNN in intrusion detection and evaluated them on the NSL-KDD dataset. The farthest neighbor (k-FN) and the nearest neighbor (kNN) are both used to classify the data, and when the farthest neighbor and the nearest neighbor have the same category label, the second nearest neighbor is used for discrimination. Experimental results show that this method improves accuracy, detection rate, and reduction of the false alarm rate. Because the experimental results in the paper are provided only as a histogram, these values can only be roughly estimated: the accuracy, detection rate, and false alarm rate are approximately 99%, 98%, and 4%, respectively. However, the study did not identify which specific types of attacks were flagged as abnormal.

To reduce the false alarm rate, Meng et al. [39] developed a knowledge-based alarm verification method and designed an intelligent alarm filter based on a multi-level k-nearest-neighbor classifier to filter out unwanted alarms. Expert knowledge is a key factor in evaluating alerts and determining rating thresholds. The alert filter classifies incoming alerts into appropriate clusters for tagging through an expert-knowledge rating mechanism. The authors further analyze the effect of different classifier settings on classification accuracy when evaluating the performance of the alarm filter on real datasets and in a network environment. Experimental results show that with K = 5 the alarm filter can effectively filter out many NIDS alarms.

In the same work, the study initially trained the filter using the DARPA 1999 dataset and then evaluated it in a network environment built with Snort and Wireshark. Snort was deployed to detect various types of network attacks on the internal network, and Wireshark was placed in front of Snort to record network packets and provide statistics. The kNN-based smart alarm filter is deployed behind Snort to filter Snort alarms: real-world web traffic passes through Wireshark and reaches Snort, Snort checks the network packets and generates alerts, and all generated Snort alarms are then forwarded to the intelligent kNN-based alarm filter for filtering. The experiments use accuracy and F-score as indicators; the averages of the results were 85.2% and 0.824, respectively.

Vishwakarma et al. [40] propose a kNN intrusion detection method based on the ant colony optimization (ACO) algorithm, pre-training on the KDD Cup 99 dataset using ACO, and study the performance of kNN-ACO, a BP neural network, and a support vector machine for comparative analysis using common performance measures (accuracy and false alarm rate). The overall accuracy reported for the proposed method is 94.17%, and the overall FAR is 5.82%. Unfortunately, the dataset used for this study was small, with only 26,167 samples participating in training.

Another study [41] used kNN for intrusion detection on the same KDD Cup 99 dataset in an approach similar to that of Vishwakarma et al. [40]. The main difference is that the kNN, SVM, and pdAPSO algorithms are combined to detect intrusions. The experimental results show that mixing different classifiers can improve classification accuracy; the reported classification accuracy is 98.55%. Other than accuracy, the study did not report other indicators.

C. DECISION TREE
A decision tree is a tree structure in which each internal node represents a test on one attribute, each branch represents a test outcome, and each leaf node represents a category. In machine learning, the decision tree is a predictive model; it represents a mapping between object attributes and object values. Each node in the tree represents an object, each divergent path represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf. A decision tree has only a single output; if complex output is needed, an independent decision tree can be established to handle each different output. Commonly used decision tree models are ID3, C4.5, and CART.

As shown in Fig. 1, a decision tree classifies samples according to conditions learned during training; it has good detection accuracy for known intrusion methods but is not suitable for detecting unknown intrusions.

FIGURE 1. An example decision tree.
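As a simple illustration of the decision-tree models just described (cf. Fig. 1), the sketch below fits a CART-style tree with scikit-learn and prints the learned if/then rules; the synthetic data and feature names are purely illustrative.

```python
# Fit a small CART-style decision tree and print the learned if/then rules
# (illustrative sketch; the features and data are synthetic).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=1)
tree = DecisionTreeClassifier(max_depth=3, random_state=1)   # limit depth to reduce overfitting
tree.fit(X, y)

print(export_text(tree, feature_names=[f"f{i}" for i in range(4)]))
print(tree.predict(X[:3]))   # classify a few samples with the trained tree
```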

Ingre et al. [42] propose a decision-tree-based IDS for the NSL-KDD dataset. Feature selection with a correlation feature selection (CFS) approach, selecting 14 features from each data sample, improves the prediction performance of the DT-based IDS. The performance was evaluated separately for five categories and for two categories; overall accuracy was 83.7% and 90.3%, respectively, and the FAR was 2.5% and 9.7%. Judging from the experimental results, the method is not outstanding and leaves room for improvement.

Malik and Khan [43] attempt to prune the decision tree using a particle swarm optimization (PSO) algorithm and apply it to network intrusion detection problems. Single-objective optimization decision-tree pruning (SO-DTP) and multi-objective optimization decision-tree pruning (MO-DTP) methods were used in experiments performed on the KDD99 dataset.

From the experimental results, the multi-objective optimization decision-tree pruning (MO-DTP) method is the most effective at minimizing the size of the tree; on average, its tree size is about three times smaller than that of any other tree classifier used. Single-objective optimization decision-tree pruning (SO-DTP) avoids overfitting, resulting in a more generalized tree, and using these generalized trees to classify attacks yields a significant performance improvement. The false alarm rate, accuracy, and precision of MO-DTP are 13.6%, 96.65%, and 99.98%, respectively; those of SO-DTP are 0.81%, 91.94%, and 99.89%. The research considers both binary and multi-class classification and various parameters, and it is highly representative.

Relan and Patil [44] propose two techniques that use feature selection: the C4.5 decision tree algorithm and the C4.5 decision tree with pruning. The classifiers are trained and tested on the KDD Cup 99 and NSL-KDD datasets. Only the discrete-valued attributes protocol_type, service, flag, land, logged_in, is_host_login, is_guest_login, and class are considered in the classification process. The experimental results show that C4.5 with pruning achieves higher precision and a lower FAR (98.45% and 1.55%) than the unpruned C4.5 decision tree.

Another study [45] used C4.5 for intrusion detection on the NSL-KDD dataset. In this work, feature selection and split values are treated as the key issues in building decision trees, and the algorithm is designed to address both. Information gain is used to select the most relevant features, and the split values are chosen such that the classifier has no bias toward the most frequent values. Sixteen attributes were selected as features on the NSL-KDD dataset. The proposed decision tree splitting (DTS) algorithm can be used for signature-based intrusion detection. However, the accuracy of this method is only 79.52%.


Modinat et al. [46] is similar to Ingre et al. [42]; the difference is the feature selection, for which they use the gain ratio. The experiment on the KDD99 dataset showed an improvement in the performance of the decision tree classifier for some attack categories, namely Remote to Local (R2L: 98.31% on the reduced dataset versus 98% on the full dataset) and User to Root (U2R: 76.92% on the reduced dataset versus 75% on the full dataset). For the DoS and Normal categories, both settings yielded the same result (100% for both the full and reduced datasets). In addition, there were somewhat worse results for the Probing category (97.78% on the reduced dataset versus 99.49% on the full dataset).

Azad and Jha [47] propose an intrusion detection system based on a C4.5 decision tree and a genetic algorithm. The proposed system addresses the problem of small disjuncts in the decision tree, improves classification accuracy, and reduces the false positive rate. The proposed system is compared with some well-known systems, such as Random Tree [48], Naïve Bayes, and REPTree. The results show that the proposed system performs better than the existing systems: training with the KDD Cup 99 dataset yielded the best results, with 99.89% accuracy and a 0.11% FAR.

Balogun and Jimoh [49] propose a hybrid intrusion detection algorithm based on a decision tree and k-nearest neighbors. The dataset first passes through the decision tree, which generates node information based on the rules it induces. This node information, added as an extra attribute, is then classified by kNN together with the original attribute set to obtain the final output. The KDD Cup 1999 dataset was used in the WEKA tool to perform a performance assessment with 10-fold cross-validation [50] on the single base classifiers (decision tree and kNN) and the proposed hybrid classifier (DT-kNN). Experimental results show that the hybrid classifier (DT-kNN) offers the best results in terms of accuracy and efficiency compared with a single classifier.

In the same study, the proposed hybrid classifier reaches an accuracy of 100% with a false positive rate of 0%. Compared with other NIDSs that also use KDD Cup 1999 as a dataset, this hybrid classifier showed superior performance on U2R, R2L, DoS, and Probe attacks, although it was not the best for U2R and R2L attacks; in terms of overall accuracy, however, it obtained the best performance at 100%. This experiment was performed on the 10% KDD Cup 1999 dataset without sampling, and some new attack instances in the test dataset, which never appeared in training, could also be detected by the system. From these results, the method performs best among those compared but might be overfitting.

Snort rule checking is one of the most popular forms of network intrusion detection. Ammar [51] shows that Snort alert priorities over real network traffic (real attacks) can be optimized in real time by a decision tree classifier in high-speed networks, using only three features (protocol, source port, and destination port), with an accuracy of 99%. Snort [52] by default assigns alert priorities for the common attack classes (34 classes); the decision tree model can adjust these priorities, and the resulting annotator can provide a useful supplement to an anomaly-based intrusion detection system.

An advanced persistent threat (APT) attack is a growing social problem. APT attacks use social engineering methods to target various systems for intrusion. Moon et al. [53] propose a decision-tree-based intrusion detection system that detects malicious code by running it on a virtual machine and analyzing its behavior information [54]. In addition, it detects the possibility of an initial intrusion and minimizes damage by responding quickly to APT attacks. The accuracy of the experiment was 84.7%. The proposed system appears less accurate than other detection systems; however, considering that it detects APT attacks related to malicious code, its detection accuracy is high.

D. DEEP BELIEF NETWORK
A Deep Belief Network (DBN) is a probabilistic generative model consisting of multiple layers of stochastic, hidden variables. The Restricted Boltzmann Machine (RBM) and the DBN are interrelated because composing and stacking a number of RBMs enables many hidden layers to be trained efficiently, with the activations of one RBM serving as the training data for the next stage [55]. The RBM is a Boltzmann machine (BM) with a special topological structure. The principle of the BM originated in statistical physics as a modeling method based on an energy function that can describe high-order interactions between variables. The BM is a symmetrically coupled stochastic feedback binary-unit neural network composed of a visible layer and a number of hidden layers; the network nodes are divided into visible units and hidden units, which are used to express the random network and the random environment, and the learning model expresses the correlation between units through weights.

Ding et al. [56] apply Deep Belief Nets (DBNs) to detect malware, using PE files from the Internet as samples. DBNs use unsupervised learning to discover multiple layers of features that are then used in a feed-forward neural network and fine-tuned to optimize discrimination. The unsupervised pre-training algorithm makes DBNs less prone to overfitting than feed-forward neural networks initialized with random weights, and it also makes it easier to train neural networks with many hidden layers. Because DBNs can learn from additional unlabeled data, in the experiments the DBNs produce better classification results than several other widely used learning techniques, outperforming SVM, kNN, and decision trees. The accuracy of the method is approximately 96.1%, but other metrics are not reported.
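To make the idea of stacking RBMs concrete, the sketch below greedily pre-trains two BernoulliRBM layers with scikit-learn and then fits a logistic-regression classifier on the top-level features. This is a rough stand-in for DBN pre-training plus supervised fine-tuning, not the exact procedure used in the works above, and it assumes binary (0/1-scaled) input features and toy labels.

```python
# Greedy layer-wise pre-training with two RBMs, then a supervised classifier
# on the learned features (a rough sketch of the DBN idea, not a full DBN).
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(1000, 64)).astype(float)   # toy binary features
y = rng.randint(0, 2, size=1000)                        # toy labels

rbm1 = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)
h1 = rbm1.fit_transform(X)          # train the first layer on the raw inputs

rbm2 = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)
h2 = rbm2.fit_transform(h1)         # train the second layer on first-layer activations

clf = LogisticRegression(max_iter=1000).fit(h2, y)   # supervised "fine-tuning" stage
print(clf.score(h2, y))
```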


Nadeem et al. [57] combine neural networks with semi-supervised learning to achieve good accuracy with only a small number of labeled samples. In experiments on the KDD Cup 99 dataset, unlabeled data were passed through a Ladder Network and a DBN was then used to classify the labeled data; the detection accuracy obtained was similar to that of supervised learning, at 99.18%. However, as with Ding et al. [56], no indicators other than the accuracy rate were calculated.

Gao et al. [58] compared different DBN structures, adjusting the number of layers and the number of hidden units in the network model, and obtained a four-layer DBN model. The KDD Cup 99 dataset was used for testing. The accuracy, precision, and FAR of the model were 93.49%, 92.33%, and 0.76%, respectively.

Zhao et al. [59] address the problems of a large amount of redundant information, large data volumes, long training times, and the tendency to fall into a local optimum in intrusion detection. An intrusion detection method based on a deep belief network (DBN) and a probabilistic neural network (PNN) is proposed. First, the original data are converted to low-dimensional data, using the DBN's nonlinear learning ability to retain the basic attributes of the original data. Second, to obtain the best learning performance, the particle swarm optimization algorithm is used to optimize the number of hidden nodes in each layer. Next, the PNN is used to classify the low-dimensional data. Finally, the KDD Cup 99 dataset was used to test the performance of this approach. The accuracy, precision, and FAR of the experimental results were 99.14%, 93.25%, and 0.615%, respectively.

The method Alrawashdeh and Purdy [60] implement is based on a deep belief network with a logistic regression softmax layer used to fine-tune the deep network. The multi-class logistic regression layer was trained for 10 epochs on the improved pre-trained data to improve the overall performance of the network. The method achieved a detection rate of 97.9% on the total 10% KDD Cup 99 test dataset and produced a low false negative rate of 2.47%.

After training with 40% of the NSL-KDD dataset, Alom et al. [61] explored the intrusion detection capabilities of the DBN through a series of experiments. The trained DBN can effectively identify unknown attacks presented to it, and the proposed system achieves 97.5% accuracy after 50 iterations.

Based on the behavioral characteristics of ad hoc networks, Tan et al. [62] design a DBN-based ad hoc network intrusion detection model and conduct a simulation experiment on the NS2 platform. The experimental results show that the DBN detection method can obtain good accuracy and can be applied to ad hoc network intrusion detection; accuracy and FAR were 97.6% and 0.9%, respectively.

E. RECURRENT NEURAL NETWORKS
The recurrent neural network (RNN) is used to process sequence data. In the traditional neural network model, data flow from the input layer to the hidden layer to the output layer; the layers are fully connected, and there are no connections among the nodes within a layer. Many problems exist that such a conventional network cannot solve.

The RNN is called a recurrent neural network because the current output of a sequence also depends on the outputs that came before it. Concretely, the network remembers the information of previous moments and applies it to the calculation of the current output; that is, the nodes of the hidden layer are connected across time steps, and the input of the hidden layer includes both the output of the input layer and the hidden-layer output from the previous moment. In theory, an RNN can process sequences of any length; in practice, to reduce complexity, it is often assumed that the current state is related only to a limited number of previous states.

Fig. 2 shows the temporal behavior of an RNN, unrolled here into a multi-layer network structure. Improved models based on the RNN include Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU).

FIGURE 2. An example RNN model structure.
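A minimal LSTM classifier of the kind used in the RNN-based IDS studies below can be sketched with tf.keras; the input shape (sequences of 41-feature records), layer sizes, and training settings are illustrative assumptions rather than the configurations reported in those papers.

```python
# Minimal LSTM-based binary classifier for sequences of feature vectors
# (illustrative sketch; sizes and hyperparameters are arbitrary).
import numpy as np
import tensorflow as tf

timesteps, n_features = 10, 41          # e.g., 10 consecutive 41-feature records
X = np.random.rand(256, timesteps, n_features).astype("float32")
y = np.random.randint(0, 2, size=(256,))          # 0 = normal, 1 = attack (toy labels)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, n_features)),
    tf.keras.layers.LSTM(64),                     # recurrent layer summarizes the sequence
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))
```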


The training accuracy and the test accuracy obtained by the model in binary classification are 99.81% and 83.28%, respectively. The training accuracy and test accuracy of the multi-class model are 99.53% and 81.29%, respectively.

In similar work, Krishnan and Raajan [64] also used an RNN for intrusion detection, but the dataset used was the KDD Cup 99. The experimental accuracy, recall, and precision were 96.6%, 97.8%, and 88.3% for Probe; 97.4%, 97.05%, and 99.9% for DoS; 86.5%, 62.7%, and 56.1% for U2R; and 29.73%, 28.81%, and 94.1% for R2L.

Staudemeyer [65] implements an LSTM recurrent neural network classifier for intrusion detection data. The results show that the LSTM classifier has certain advantages over other strong static classifiers on the 10% KDD Cup 99 dataset. These advantages lie in the detection of DoS and Probe attacks, both of which produce distinctive time-series events, and in attack categories that generate fewer events. The model's classification accuracy reached 93.85%, and the FAR was 1.62%.

Kim et al. [66] also use LSTM as their model and the 10% KDD Cup 99 as their dataset. They set the hidden-layer size to 80 and the learning rate to 0.01. The paper reports 96.93% accuracy, an average precision of 98.8%, and an average FAR of 10%. Compared with the experimental results of [65], the method obtained a higher detection rate but also a higher false-alarm rate.

To remedy the issue of high false-alarm rates, Kim et al. [67] propose a system-call language-modeling approach for designing LSTM-based host intrusion detection systems. The method consists of two parts: the front-end performs LSTM modeling of system calls in various settings, and the back-end performs anomaly prediction based on an ensemble of thresholding classifiers derived from the front-end. Models were evaluated using the KDD Cup 99 dataset and achieved a 5.5% FAR and 99.8% precision.
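The language-modeling idea in [67] can be sketched as follows: an LSTM is trained to predict the next system call, and a trace whose average negative log-likelihood exceeds a threshold tuned on normal data is flagged as anomalous. The vocabulary size, window length, and the use of a single threshold (instead of the paper's ensemble of thresholding classifiers) are simplifying assumptions.

```python
# Sketch of LSTM language modeling over system-call sequences for anomaly scoring.
# Vocabulary size, window length, and the single fixed threshold are illustrative
# assumptions; the paper's ensemble of thresholding classifiers is not reproduced here.
import numpy as np
import tensorflow as tf

VOCAB, WINDOW = 300, 20          # number of distinct system calls, context length

lm = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW,)),
    tf.keras.layers.Embedding(VOCAB, 32),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(VOCAB, activation="softmax"),  # distribution over the next system call
])
lm.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# (In practice the model would first be fitted on system-call traces of normal behavior.)

def anomaly_score(trace, model):
    """Average negative log-likelihood of each call given the preceding window."""
    nll, count = 0.0, 0
    for i in range(WINDOW, len(trace)):
        context = np.array(trace[i - WINDOW:i])[None, :]
        probs = model.predict(context, verbose=0)[0]
        nll += -np.log(probs[trace[i]] + 1e-9)
        count += 1
    return nll / max(count, 1)

# A trace whose score exceeds a threshold tuned on normal data would be flagged.
example_trace = np.random.randint(0, VOCAB, size=100).tolist()
print("score:", anomaly_score(example_trace, lm))
```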
Le et al. [68] compare the effect of six commonly used optimizers on the LSTM intrusion detection model. Experimenting with the KDD Cup 99 dataset, the LSTM RNN model with the Nadam optimizer [69] outperforms previous works: the intrusion detection accuracy is 97.54%, the precision is 98.95%, and the false alarm rate is a reasonable 9.98%.

The Gated Recurrent Unit (GRU) and the Long Short-Term Memory (LSTM) unit are both variants of the recurrent neural network (RNN). Conventionally, like most neural networks, each of these RNN variants employs the Softmax function as its final output layer for prediction and the cross-entropy function for computing its loss. Agarap [70] presents an amendment to this norm by introducing a linear support vector machine (SVM) as the replacement for Softmax in the final output layer of a GRU model. The proposal is primarily intended for binary classification of intrusion detection and uses the 2013 network traffic data from the honeypot systems [71] of Kyoto University. Results show that the GRU-SVM model performs better than the conventional GRU-Softmax model.

In the paper, the reported training accuracy of 81.54% and testing accuracy of 84.15% suggest that the proposed GRU-SVM model has stronger predictive performance than the conventional GRU-Softmax model (training accuracy of 63.07% and testing accuracy of 70.75%).
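A minimal way to express this substitution in code is to keep the GRU feature extractor but make the final layer linear, L2-regularized, and trained with a margin (hinge) loss instead of softmax cross-entropy. The sketch below illustrates that swap under assumed input shapes; it is not Agarap's exact implementation.

```python
# Sketch: GRU feature extractor with a linear, SVM-style output layer.
# The final Dense layer is linear and trained with a hinge loss and an L2 weight
# penalty, replacing the usual softmax + cross-entropy; shapes are assumptions.
import tensorflow as tf

TIMESTEPS, FEATURES = 10, 24

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(TIMESTEPS, FEATURES)),
    tf.keras.layers.GRU(64),
    tf.keras.layers.Dense(1, activation="linear",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
])
# Binary labels are expected as -1 / +1 for the hinge loss.
model.compile(optimizer="adam", loss="hinge")
```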
F. CONVOLUTIONAL NEURAL NETWORKS
The convolutional neural network (CNN) is a type of artificial neural network that has become a research hotspot in speech analysis and image recognition. Its weight-sharing network structure makes it more similar to a biological neural network, reducing the complexity of the network model and the number of weights. This advantage is most apparent when the network input is a multi-dimensional image: the image can be used directly as the input of the network, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional network is a multi-layer perceptron specifically designed to recognize two-dimensional shapes that are highly invariant to translation, scaling, tilting, and other forms of deformation [72].

CNN was the first truly successful learning algorithm for training multi-layer network structures, that is, the structure shown in Fig. 3. It reduces the number of parameters that must be learned and improves the training performance of the BP algorithm by exploiting spatial relationships. As a deep learning architecture, CNN was also proposed to minimize data pre-processing requirements. There are three main mechanisms by which a CNN reduces the number of training parameters: local receptive fields, weight sharing, and pooling. The most powerful aspect of CNNs is their ability to learn feature hierarchies from large amounts of unlabeled data, which makes them quite promising for application in network intrusion detection.
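These three mechanisms map directly onto standard layers. The following minimal sketch, with an assumed input size and class count, shows the kind of small convolution-and-pooling stack on which the CNN-based intrusion detection studies discussed below build.

```python
# Minimal CNN sketch: convolution layers provide local receptive fields with shared
# weights, and pooling layers downsample the feature maps. Input size and number
# of classes are illustrative assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),                # e.g. traffic bytes reshaped to a small image
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),   # local receptive fields + weight sharing
    tf.keras.layers.MaxPooling2D((2, 2)),                    # pooling
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```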
To learn useful feature representations automatically and efficiently from large amounts of unlabeled raw network traffic data using deep learning, Yu et al. [73] propose dilated convolutional autoencoders (DCAEs) for the network intrusion detection model, combining the advantages of stacked autoencoders and CNNs. In essence, the model can automatically learn essential features from large-scale and varied unlabeled raw network traffic consisting of real-world traffic from botnets, web-based malware, exploits, APTs (Advanced Persistent Threats), scans, and normal traffic streams.

FIGURE 3. An example CNN model structure.

In the same work, the Contagio-CTU-UNB and CTU-UNB datasets are created from various malware traffic data, and a classification task is performed to evaluate the proposed model. The precision, recall, and F-score of the classification tasks were 98.44%, 98.40%, and 0.984, respectively.
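The dilated convolutional autoencoder idea can be sketched as an unsupervised model that reconstructs a window of raw traffic bytes through dilated one-dimensional convolutions, after which the encoder is reused as a feature extractor for classification. The byte-window length, filter counts, and dilation rates below are illustrative assumptions rather than the configuration used in [73].

```python
# Sketch of a 1-D dilated convolutional autoencoder over raw traffic bytes.
# It is trained to reconstruct its input; the encoder can then be reused as a
# feature extractor for intrusion detection. All sizes are illustrative assumptions.
import tensorflow as tf

BYTES = 1024  # length of the raw byte window taken from each flow, scaled to [0, 1]

inp = tf.keras.layers.Input(shape=(BYTES, 1))
x = tf.keras.layers.Conv1D(32, 3, dilation_rate=1, padding="same", activation="relu")(inp)
x = tf.keras.layers.Conv1D(32, 3, dilation_rate=2, padding="same", activation="relu")(x)
x = tf.keras.layers.MaxPooling1D(4)(x)
code = tf.keras.layers.Conv1D(16, 3, dilation_rate=4, padding="same", activation="relu")(x)

x = tf.keras.layers.Conv1D(32, 3, padding="same", activation="relu")(code)
x = tf.keras.layers.UpSampling1D(4)(x)
out = tf.keras.layers.Conv1D(1, 3, padding="same", activation="sigmoid")(x)  # reconstructed bytes

autoencoder = tf.keras.Model(inp, out)
encoder = tf.keras.Model(inp, code)     # reused later with a small classifier head
autoencoder.compile(optimizer="adam", loss="mse")
```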
Kolosnjaji et al. [74] shift performance improvements made in the area of neural networks to modeling the execution sequences of disassembled malicious binary files. A neural network consisting of convolutional and feedforward structures is implemented. The architecture embodies a hierarchical feature-extraction method that combines n-gram instruction features [75] with simple vectorization by convolution.

In the paper, features are extracted from the headers of Portable Executable (PE) files for evaluation only. The results show that the proposed method outperforms benchmark methods such as a simple feedforward neural network and a support vector machine: an F1 score of 0.92 is reached, along with a precision and recall of 0.93.

Saxe and Berlin [76] propose the eXpose neural network, which uses deep learning to take common raw short strings as input (a common case for security inputs, which include artifacts such as potentially malicious URLs, file paths, named pipes, named mutexes, and registry keys) and learns to extract features and perform classification simultaneously using character-level embeddings and convolutional neural networks. In addition to fully automating the feature design and extraction process, eXpose outperformed baselines based on manual feature extraction on all of the intrusion detection problems tested, with a detection rate of 92% at a false alarm rate of 0.1%.
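The core of this character-level approach can be sketched in a few lines: each string is truncated or padded to a fixed length, each character is embedded, and convolution filters over the character sequence feed a sigmoid output. The alphabet size, maximum length, and filter settings below are assumptions, not the eXpose configuration.

```python
# Sketch of a character-level embedding + CNN classifier for short security strings
# (URLs, file paths, registry keys). Alphabet size, maximum length, and filter
# settings are illustrative assumptions.
import numpy as np
import tensorflow as tf

MAX_LEN, ALPHABET = 200, 128   # truncate/pad strings to 200 chars; ASCII-sized alphabet

def encode(s: str) -> np.ndarray:
    """Map a string to a fixed-length vector of character codes (0 = padding)."""
    codes = [min(ord(c), ALPHABET - 1) for c in s[:MAX_LEN]]
    return np.array(codes + [0] * (MAX_LEN - len(codes)))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(ALPHABET, 32),                 # character-level embedding
    tf.keras.layers.Conv1D(64, 5, activation="relu"),        # n-gram-like local patterns
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),          # malicious vs. benign
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Example (hypothetical) inputs: a URL and a Windows file path.
x = np.stack([encode("http://example.test/login.php?id=1"),
              encode("C:\\Windows\\system32\\cmd.exe")])
print(model.predict(x, verbose=0).ravel())
```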
Wang et al. [77] propose a one-dimensional convolutional neural network for end-to-end encrypted traffic classification. The method integrates feature extraction, feature selection, and the classifier into a unified end-to-end framework and automatically learns the nonlinear relationship between the original input and the expected output. The method is verified on the public ISCX VPN-nonVPN traffic dataset. The 1D-CNN performed well in 2-class classification, with 100% and 99% precision for non-VPN and VPN traffic, respectively; the recall rates for non-VPN and VPN traffic are 99% and 100%, respectively. For VPN traffic, the 1D-CNN also performed well in the 6-class and 12-class tasks, with precision of 94.9% and 92.0% and recall of 97.3% and 95.2%, respectively. However, the 1D-CNN performance on non-VPN services is not very good: the precision is only 85.5% and 85.8%, and the recall is only 85.8% and 85.9%.

Wang et al. [78] propose a malware traffic classification method that uses a convolutional neural network and treats traffic data as images. The method needs no hand-designed features and takes raw traffic directly as the input of the classifier. In this study, the USTC-TRC2016 flow dataset was established and the data preprocessing kit USTCTK2016 was developed. Based on the dataset and the toolkit, the authors found the best type of traffic characterization by analyzing the eight experimental results. The experiments show that the average accuracy of the classifiers is 99.41%.
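The traffic-as-images idea amounts to a simple preprocessing step: take a fixed number of leading bytes from each flow, zero-pad if necessary, and reshape them into a small grayscale image that an ordinary two-dimensional CNN (such as the sketch in the CNN overview above) can classify. The 784-byte and 28x28 choices below are assumptions made for illustration.

```python
# Sketch of the "traffic as images" preprocessing: the first 784 bytes of a flow
# are zero-padded if needed and reshaped into a 28x28 grayscale image.
# The byte count and image size are illustrative assumptions.
import numpy as np

IMG_BYTES = 784  # 28 * 28

def flow_to_image(raw_bytes: bytes) -> np.ndarray:
    buf = np.frombuffer(raw_bytes[:IMG_BYTES], dtype=np.uint8)
    padded = np.pad(buf, (0, IMG_BYTES - len(buf)))
    return (padded.reshape(28, 28) / 255.0).astype("float32")  # scale to [0, 1]

image = flow_to_image(b"\x16\x03\x01\x02\x00" * 50)  # e.g. the start of a captured TLS session
print(image.shape)  # (28, 28); stack such images and feed them to a small 2-D CNN
```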
V. DISCUSSION AND FUTURE DIRECTION
Our work examines a large number of academic intrusion detection studies based on machine learning and deep learning, as shown in Table 5. Many imbalances appear across these studies and expose some of the problems in this area of research, largely in the following areas: (i) the benchmark datasets are few, and even when the same dataset is used, the methods of sample extraction vary from one institute to another. (ii) The evaluation metrics are not uniform; many studies only assess test accuracy, and such results are one-sided.

TABLE 6. ML and DL methods and data use.

However, studies using multi-criteria evaluation often adopt different metric combinations, so the research results cannot be compared with one another. (iii) Little consideration is given to deployment efficiency; most of the research stays in the laboratory, irrespective of the time complexity of the algorithm and the efficiency of detection in an actual network.

In addition to these problems, trends in intrusion detection are also reflected in Table 5. (i) The study of hybrid models has become a hot topic in recent years, and better results are obtained by combining different algorithms sensibly. (ii) The advent of deep learning has made end-to-end learning possible, including the handling of large amounts of data without human involvement; however, fine-tuning requires many trials and much experience, and interpretability is poor. (iii) Papers comparing the performance of different algorithms are increasing year by year, and a growing number of researchers are beginning to value the practical significance of algorithms and models. (iv) A number of new datasets curated by academic institutions are enriching the existing research on cybersecurity issues, and the best of them is likely to become the benchmark dataset in this area.

The problems and trends described above also point to future directions for intrusion detection research:

A. DATA SETS
Existing datasets suffer from old data, redundant information, and unbalanced numbers of samples per category. Although the data can be improved by processing, the problem of insufficient data volume remains. Therefore, establishing network intrusion detection datasets with large amounts of data, wide coverage of attack types, and balanced sample numbers across attack categories is a top priority in the field of intrusion detection.

B. HYBRID METHOD
Hybrid detection methods mostly combine several machine-learning methods, such as those described in [30], [33], [41], whereas intrusion detection combining deep learning and machine-learning methods is less studied. AlphaGo has demonstrated the validity of this idea, which makes it an exciting research direction.

C. DETECTION SPEED
Detection time can be reduced, and detection speed improved, from both the algorithmic and the hardware side. Given the complexity of machine-learning and deep-learning algorithms, the algorithms themselves can be made less time-consuming, and the hardware can use multiple machines for parallel computing. Combining the two approaches is also an interesting topic.

D. ONLINE LEARNING
The means of network intrusion are increasing day by day. How to better fit new data with an already-trained model is also an exciting research direction. At present, transfer learning is a viable way to fine-tune a model with a small amount of labeled data, which should achieve better results in actual network detection.

VI. CONCLUSION
This paper presents a literature review of ML and DL methods for network security. The paper, which has mostly focused on the last three years, introduces the latest applications of ML and DL in the field of intrusion detection. Unfortunately, the most effective method of intrusion detection has not yet been established. Each approach to implementing an intrusion detection system has its own advantages and disadvantages, a point apparent from the discussion of comparisons among the various methods. Thus, it is difficult to choose one particular method to implement an intrusion detection system over the others.

Datasets for network intrusion detection are very important for training and testing systems. ML and DL methods do not work without representative data, and obtaining such a dataset is difficult and time-consuming. However, the existing public datasets have many problems, such as uneven data and outdated content, and these problems have largely limited the development of research in this area. Network information also updates very quickly, which makes training and using DL and ML models difficult: models need to be retrained quickly and over the long term. Incremental learning and lifelong learning will therefore be a focus of future study in this field.

ACKNOWLEDGMENT
(Yang Xin and Lingshuang Kong contributed equally to this work.)

REFERENCES
[1] S. Aftergood, "Cybersecurity: The cold war online," Nature, vol. 547, no. 7661, pp. 30–31, Jul. 2017.
[2] A. Milenkoski, M. Vieira, S. Kounev, A. Avritzer, and B. D. Payne, "Evaluating computer intrusion detection systems: A survey of common practices," ACM Comput. Surv., vol. 48, no. 1, pp. 1–41, 2015.
[3] C. N. Modi and K. Acha, "Virtualization layer security challenges and intrusion detection/prevention systems in cloud computing: A comprehensive review," J. Supercomput., vol. 73, no. 3, pp. 1192–1234, 2017.
[4] E. Viegas, A. O. Santin, A. França, R. Jasinski, V. A. Pedroni, and L. S. Oliveira, "Towards an energy-efficient anomaly-based intrusion detection engine for embedded systems," IEEE Trans. Comput., vol. 66, no. 1, pp. 163–177, Jan. 2017.
[5] A. Patcha and J.-M. Park, "An overview of anomaly detection techniques: Existing solutions and latest technological trends," Comput. Netw., vol. 51, no. 12, pp. 3448–3470, Aug. 2007.
[6] C. Modi, D. Patel, B. Borisaniya, H. Patel, A. Patel, and M. Rajarajan, "A survey of intrusion detection techniques in Cloud," J. Netw. Comput. Appl., vol. 36, no. 1, pp. 42–57, 2013.
[7] S. Revathi and A. Malathi, "A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection," in Proc. Int. J. Eng. Res. Technol., 2013, pp. 1848–1853.
[8] D. Sahoo, C. Liu, and S. C. H. Hoi. (2017). "Malicious URL detection using machine learning: A survey." [Online]. Available: https://arxiv.org/abs/1701.07179
[9] A. L. Buczak and E. Guven, "A survey of data mining and machine learning methods for cyber security intrusion detection," IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., 2016.
[10] M. Soni, M. Ahirwa, and S. Agrawal, "A survey on intrusion detection techniques in MANET," in Proc. Int. Conf. Comput. Intell. Commun. Netw., 2016, pp. 1027–1032.
[11] R. G. Smith and J. Eckroth, "Building AI applications: Yesterday, today, and tomorrow," AI Mag., vol. 38, no. 1, pp. 6–22, 2017.
[12] P. Louridas and C. Ebert, "Machine learning," IEEE Softw., vol. 33, no. 5, pp. 110–115, Sep./Oct. 2016.
[13] M. I. Jordan and T. M. Mitchell, "Machine learning: Trends, perspectives, and prospects," Science, vol. 349, no. 6245, pp. 255–260, 2015.
[14] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, May 2015.
[15] G. E. Hinton, "Deep belief networks," Scholarpedia, vol. 4, no. 5, p. 5947, 2009.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[17] L. Deng and D. Yu, "Deep learning: Methods and applications," Found. Trends Signal Process., vol. 7, nos. 3–4, pp. 197–387, Jun. 2014.
[18] I. M. Coelho, V. N. Coelho, E. J. da S. Luz, L. S. Ochi, F. G. Guimarães, and E. Rios, "A GPU deep learning metaheuristic based model for time series forecasting," Appl. Energy, vol. 201, no. 1, pp. 412–418, 2017.
[19] I. Žliobaitė, A. Bifet, J. Read, B. Pfahringer, and G. Holmes, "Evaluation methods and decision theory for classification of streaming data with temporal dependence," Mach. Learn., vol. 98, no. 3, pp. 455–482, 2015.
[20] J. N. Goetz, A. Brenning, H. Petschko, and P. Leopold, "Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling," Comput. Geosci., vol. 81, no. 3, pp. 1–11, 2015.
[21] R. P. Lippmann et al., "Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation," in Proc. DARPA Inf. Survivability Conf. Expo. (DISCEX), vol. 2, 2000, pp. 12–26.
[22] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proc. IEEE Int. Conf. Comput. Intell. Secur. Defense Appl., Jul. 2009, pp. 1–6.
[23] G. Meena and R. R. Choudhary, "A review paper on IDS classification using KDD 99 and NSL KDD dataset in WEKA," in Proc. Int. Conf. Comput., Commun. Electron., 2017, pp. 553–558.
[24] V. Bolón-Canedo, N. Sánchez-Maroño, and A. Alonso-Betanzos, "Feature selection and classification in multiple class datasets: An application to KDD CUP 99 dataset," Expert Syst. Appl., vol. 38, no. 5, pp. 5947–5957, 2011.
[25] S. Lakhina, S. Joseph, and B. Verma, "Feature reduction using principal component analysis for effective anomaly-based intrusion detection on NSL-KDD," Int. J. Eng. Sci. Technol., vol. 2, no. 6, pp. 3175–3180, 2010.
[26] M. Xie, J. Hu, X. Yu, and E. Chang, "Evaluating host-based anomaly detection systems: Application of the frequency-based algorithms to ADFA-LD," in Proc. Int. Conf. Netw. Syst. Secur., 2014, pp. 542–549.

[27] R. K. Sharma, H. K. Kalita, and P. Borah, "Analysis of machine learning techniques based intrusion detection systems," in Proc. Int. Conf. Adv. Comput., Netw., Inform., 2016, pp. 485–493.
[28] M. V. Kotpalliwar and R. Wajgi, "Classification of attacks using support vector machine (SVM) on KDDCUP'99 IDS database," in Proc. Int. Conf. Commun. Syst. Netw. Technol., 2015, pp. 987–990.
[29] H. Saxena and V. Richariya, "Intrusion detection in KDD99 dataset using SVM-PSO and feature reduction with information gain," Int. J. Comput. Appl., vol. 98, no. 6, pp. 25–29, 2014.
[30] M. S. Pervez and D. M. Farid, "Feature selection and intrusion classification in NSL-KDD CUP 99 dataset employing SVMs," in Proc. 8th Int. Conf. Softw., Knowl., Inf. Manage. Appl. (SKIMA), 2014, pp. 1–6.
[31] A. M. Chandrasekhar and K. Raghuveer, "Confederation of FCM clustering, ANN and SVM techniques to implement hybrid NIDS using corrected KDD CUP 99 dataset," in Proc. Int. Conf. Commun. Signal Process., 2014, pp. 672–676.
[32] M. Yan and Z. Liu, "A new method of transductive SVM-based network intrusion detection," in Proc. IFIP TC Conf., Nanchang, China, Oct. 2010, pp. 87–95.
[33] R. T. Kokila, S. T. Selvi, and K. Govindarajan, "DDoS detection and analysis in SDN-based environment using support vector machine classifier," in Proc. 6th Int. Conf. Adv. Comput., 2015, pp. 205–210.
[34] M. Xie, J. Hu, and J. Slay, "Evaluating host-based anomaly detection systems: Application of the one-class SVM algorithm to ADFA-LD," in Proc. 11th Int. Conf. Fuzzy Syst. Knowl. Discovery (FSKD), 2014, pp. 978–982.
[35] X. U. Peng and F. Jiang, "Network intrusion detection model based on particle swarm optimization and k-nearest neighbor," Comput. Eng. Appl., vol. 4, no. 5, pp. 31–38, 2014.
[36] B. B. Rao and K. Swathi, "Fast kNN classifiers for network intrusion detection system," Indian J. Sci. Technol., vol. 10, no. 14, pp. 1–10, 2017.
[37] A. M. Sharifi, S. A. Kasmani, and A. Pourebrahimi, "Intrusion detection based on joint of K-means and KNN," J. Converg. Inf. Technol., vol. 10, no. 5, pp. 42–51, 2015.
[38] H. Shapoorifard and P. Shamsinejad, "Intrusion detection using a novel hybrid method incorporating an improved KNN," Int. J. Comput. Appl., vol. 173, no. 1, pp. 5–9, 2017.
[39] W. Meng, W. Li, and L.-F. Kwok, "Design of intelligent KNN-based alarm filter using knowledge-based alert verification in intrusion detection," Secur. Commun. Netw., vol. 8, no. 18, pp. 3883–3895, 2015.
[40] S. Vishwakarma, V. Sharma, and A. Tiwari, "An intrusion detection system using KNN-ACO algorithm," Int. J. Comput. Appl., vol. 171, no. 10, pp. 18–23, 2017.
[41] E. G. Dada, "A hybridized SVM-kNN-pdAPSO approach to intrusion detection system," in Proc. Fac. Seminar Ser., 2017, pp. 14–21.
[42] B. Ingre, A. Yadav, and A. K. Soni, "Decision tree based intrusion detection system for NSL-KDD dataset," in Proc. Int. Conf. Inf. Commun. Technol. Intell. Syst., 2017, pp. 207–218.
[43] A. J. Malik and F. A. Khan, "A hybrid technique using binary particle swarm optimization and decision tree pruning for network intrusion detection," Clust. Comput., vol. 2, no. 3, pp. 1–14, Jul. 2017.
[44] N. G. Relan and D. R. Patil, "Implementation of network intrusion detection system using variant of decision tree algorithm," in Proc. Int. Conf. Nascent Technol. Eng. Field, 2015, pp. 1–5.
[45] K. Rai, M. S. Devi, and A. Guleria, "Decision tree based algorithm for intrusion detection," Int. J. Adv. Netw. Appl., vol. 7, no. 4, pp. 2828–2834, 2016.
[46] A. M. Modinat, G. A. Abimbola, O. B. Abdullateef, and A. Opeyemi, "Gain ratio and decision tree classifier for intrusion detection," Int. J. Comput. Appl., vol. 126, no. 1, pp. 8887–8975, 2015.
[47] C. Azad and V. K. Jha, "Genetic algorithm to solve the problem of small disjunct in the decision tree based intrusion detection system," Int. J. Comput. Netw. Inf. Secur., vol. 7, no. 8, pp. 56–71, 2015.
[48] S. Puthran and K. Shah, "Intrusion detection using improved decision tree algorithm with binary and quad split," in Proc. Int. Symp. Secur. Comput. Commun., 2016, pp. 427–438.
[49] A. O. Balogun and R. G. Jimoh, "Anomaly intrusion detection using an hybrid of decision tree and K-nearest neighbor," J. Adv. Sci. Res. Appl., vol. 2, no. 1, pp. 67–74, 2015.
[50] M. A. Iniesta-Bonillo, R. Sánchez-Fernández, and D. Jiménez-Castillo, "Sustainability, value, and satisfaction: Model testing and cross-validation in tourist destinations," J. Bus. Res., vol. 69, no. 11, pp. 5002–5007, 2016.
[51] A. Ammar, "A decision tree classifier for intrusion detection priority tagging," J. Comput. Commun., vol. 3, no. 4, pp. 52–58, 2015.
[52] R. Selvi, S. S. Kumar, and A. Suresh, "An intelligent intrusion detection system using average manhattan distance-based decision tree," Adv. Intell. Syst. Comput., vol. 324, pp. 205–212, 2015.
[53] D. Moon, H. Im, I. Kim, and J. H. Park, "DTB-IDS: An intrusion detection system based on decision tree using behavior analysis for preventing APT attacks," J. Supercomput., vol. 73, no. 7, pp. 2881–2895, 2017.
[54] S. Jo, H. Sung, and B. Ahn, "A comparative study on the performance of intrusion detection using decision tree and artificial neural network models," J. Korea Soc. Digit. Ind. Inf. Manage., vol. 11, no. 4, pp. 33–45, 2015.
[55] D. Kwon, H. Kim, J. Kim, S. C. Suh, I. Kim, and K. J. Kim, "A survey of deep learning-based network anomaly detection," Clust. Comput., vol. 4, no. 3, pp. 1–13, Sep. 2017.
[56] Y. Ding, S. Chen, and J. Xu, "Application of deep belief networks for opcode based malware detection," in Proc. Int. Joint Conf. Neural Netw., 2016, pp. 3901–3908.
[57] M. Nadeem, O. Marshall, S. Singh, X. Fang, and X. Yuan, "Semi-supervised deep neural network for network intrusion detection," in Proc. KSU Conf. Cybersecur. Educ. Res. Pract., Oct. 2016, pp. 1–13.
[58] N. Gao, L. Gao, Q. Gao, and H. Wang, "An intrusion detection model based on deep belief networks," in Proc. 2nd Int. Conf. Adv. Cloud Big Data, 2014, pp. 247–252.
[59] G. Zhao, C. Zhang, and L. Zheng, "Intrusion detection using deep belief network and probabilistic neural network," in Proc. IEEE Int. Conf. Comput. Sci. Eng., vol. 1, Jul. 2017, pp. 639–642.
[60] K. Alrawashdeh and C. Purdy, "Toward an online anomaly intrusion detection system based on deep learning," in Proc. IEEE Int. Conf. Mach. Learn. Appl., Dec. 2017, pp. 195–200.
[61] M. Z. Alom, V. R. Bontupalli, and T. M. Taha, "Intrusion detection using deep belief networks," in Proc. Aerosp. Electron. Conf., 2016, pp. 339–344.
[62] Q. Tan, W. Huang, and Q. Li, "An intrusion detection method based on DBN in ad hoc networks," in Proc. Int. Conf. Wireless Commun. Sensor Netw., 2016, pp. 477–485.
[63] C. L. Yin, Y. F. Zhu, J. L. Fei, and X. Z. He, "A deep learning approach for intrusion detection using recurrent neural networks," IEEE Access, vol. 5, pp. 21954–21961, 2017.
[64] R. B. Krishnan and N. R. Raajan, "An intellectual intrusion detection system model for attacks classification using RNN," Int. J. Pharm. Technol., vol. 8, no. 4, pp. 23157–23164, 2016.
[65] R. C. Staudemeyer, "Applying long short-term memory recurrent neural networks to intrusion detection," South Afr. Comput. J., vol. 56, no. 1, pp. 136–154, 2015.
[66] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, "Long short term memory recurrent neural network classifier for intrusion detection," in Proc. Int. Conf. Platform Technol. Service, 2016, pp. 1–5.
[67] G. Kim, H. Yi, J. Lee, Y. Paek, and S. Yoon. (2016). "LSTM-based system-call language modeling and robust ensemble method for designing host-based intrusion detection systems." [Online]. Available: https://arxiv.org/abs/1611.01726
[68] T.-T.-H. Le, J. Kim, and H. Kim, "An effective intrusion detection classifier using long short-term memory with gradient descent optimization," in Proc. Int. Conf. Platform Technol. Service, 2017, pp. 1–6.
[69] L. Bontemps, V. L. Cao, J. Mcdermott, and N. A. Le-Khac, "Collective anomaly detection based on long short-term memory recurrent neural networks," in Proc. Int. Conf. Future Data Secur. Eng., 2017, pp. 141–152.
[70] A. F. Agarap. (2017). "A neural network architecture combining gated recurrent unit (GRU) and support vector machine (SVM) for intrusion detection in network traffic data." [Online]. Available: https://arxiv.org/abs/1709.03082
[71] T. Ergen and S. S. Kozat, "Efficient online learning algorithms based on LSTM neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 4, no. 2, pp. 1–12, 2017.
[72] S.-J. Bu and S.-B. Cho, "A hybrid system of deep learning and learning classifier system for database intrusion detection," in Hybrid Artificial Intelligent Systems, 2017, pp. 615–625.
[73] Y. Yu, J. Long, and Z. Cai, "Network intrusion detection through stacking dilated convolutional autoencoders," Secur. Commun. Netw., vol. 2, no. 3, pp. 1–10, 2017.


[74] B. Kolosnjaji, G. Eraisha, G. Webster, A. Zarras, and C. Eckert, "Empowering convolutional networks for malware classification and analysis," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2017, pp. 3838–3845.
[75] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, "Deep learning for classification of malware system call sequences," in AI 2016: Advances in Artificial Intelligence, 2016, pp. 137–149.
[76] J. Saxe and K. Berlin. (2017). "eXpose: A character-level convolutional neural network with embeddings for detecting malicious urls, file paths and registry keys." [Online]. Available: https://arxiv.org/abs/1702.08568
[77] W. Wang, M. Zhu, J. Wang, X. Zeng, and Z. Yang, "End-to-end encrypted traffic classification with one-dimensional convolution neural networks," in Proc. IEEE Int. Conf. Intell. Secur. Inform. (ISI), Jul. 2017, pp. 43–48.
[78] W. Wang, M. Zhu, X. Zeng, X. Ye, and Y. Sheng, "Malware traffic classification using convolutional neural network for representation learning," in Proc. Int. Conf. Inf. Netw., 2017, pp. 712–717.

YANG XIN received the Ph.D. degree in information security from the Beijing University of Posts and Telecommunications (BUPT). He is currently an Associate Professor with BUPT, where he serves as the Vice Director of the Beijing Engineering Lab for Cloud Security, Information Security Center. His current research interests include network security and big data security.

LINGSHUANG KONG is currently pursuing the master's degree with the School of Information Science and Engineering, Shandong University. His research field includes pattern recognition, machine learning, and deep learning.

ZHI LIU received the Ph.D. degree from the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, in 2008. He is currently an Associate Professor with the School of Information Science and Engineering, Shandong University. He is also the Head of the Intelligent Information Processing Group. His current research interests are in applications of computational intelligence to linked multicomponent big data systems, medical imaging in the neurosciences, multimodal human computer interaction, remote sensing image processing, content-based image retrieval, semantic modeling, data processing, classification, and data mining.

YULING CHEN is currently an Associate Professor with the Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang, China. Her recent research interests include cryptography and information safety.

YANMIAO LI is currently pursuing the Ph.D. degree in telecommunications. His main research interests include information security, user cross-domain behavior analysis, and network security.

HONGLIANG ZHU received the Ph.D. degree in information security from the Beijing University of Posts and Telecommunications. He is currently a Lecturer with BUPT, where he serves as the Vice Director of the Beijing Engineering Lab for Cloud Security, Information Security Center. His current research interests include network security and big data security.

MINGCHENG GAO received the master's degree in electronics and communication engineering from Shandong University in 2011. He is currently pursuing the Ph.D. degree in information security with the Beijing University of Posts and Telecommunications. His main research interests include information security, user cross-domain behavior analysis, and network security.

HAIXIA HOU is currently pursuing the Ph.D. degree in telecommunications. His main research interests include information security, user cross-domain behavior analysis, networks, and blockchain.

CHUNHUA WANG is currently a Senior Engineer with the China Changfeng Science Technology Industry Group Corporation.

