Review
Machine Learning and Deep Learning Methods for
Intrusion Detection Systems: A Survey
Hongyu Liu * and Bo Lang
State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China;
[email protected]
* Correspondence: [email protected]
Received: 14 September 2019; Accepted: 11 October 2019; Published: 17 October 2019
Abstract: Networks play important roles in modern life, and cyber security has become a vital
research area. An intrusion detection system (IDS), an important cyber security technique,
monitors the state of software and hardware running in the network. Despite decades of development,
existing IDSs still face challenges in improving the detection accuracy, reducing the false alarm rate
and detecting unknown attacks. To solve the above problems, many researchers have focused
on developing IDSs that capitalize on machine learning methods. Machine learning methods can
automatically discover the essential differences between normal data and abnormal data with high
accuracy. In addition, machine learning methods have strong generalizability, so they are also able
to detect unknown attacks. Deep learning, a branch of machine learning, achieves remarkable
performance and has become a research hotspot. This survey proposes a taxonomy of IDSs that takes
data objects as the main dimension to classify and summarize machine learning-based and deep
learning-based IDS literature. We believe that this type of taxonomy framework is well suited to cyber security
researchers. The survey first clarifies the concept and taxonomy of IDSs. Then, the machine learning
algorithms frequently used in IDSs, metrics, and benchmark datasets are introduced. Next, combined
with the representative literature, we take the proposed taxonomic system as a baseline and explain
how to solve key IDS issues with machine learning and deep learning techniques. Finally, challenges
and future developments are discussed by reviewing recent representative studies.
Keywords: machine learning; deep learning; intrusion detection system; cyber security
1. Introduction
Networks have increasing influences on modern life, making cyber security an important field
of research. Cyber security techniques mainly include anti-virus software, firewalls and intrusion
detection systems (IDSs). These techniques protect networks from internal and external attacks. Among
them, an IDS is a type of detection system that plays a key role in protecting cyber security by
monitoring the states of software and hardware running in a network.
The first intrusion detection system was proposed in 1980 [1]. Since then, many mature IDS
products have arisen. However, many IDSs still suffer from a high false alarm rate, generating many
alerts for nonthreatening situations, which increases the burden on security analysts and can cause
seriously harmful attacks to be ignored. Thus, many researchers have focused on developing IDSs with
higher detection rates and reduced false alarm rates. Another problem with existing IDSs is that they
lack the ability to detect unknown attacks. Because network environments change quickly, attack
variants and novel attacks emerge constantly. Thus, it is necessary to develop IDSs that can detect
unknown attacks.
To address the above problems, researchers have begun to focus on constructing IDSs using
machine learning methods. Machine learning is a type of artificial intelligence technique that can
automatically discover useful information from massive datasets [2]. Machine learning-based IDSs can
achieve satisfactory detection levels when sufficient training data is available, and machine learning
models have sufficient generalizability to detect attack variants and novel attacks. In addition, machine
learning-based IDSs do not rely heavily on domain knowledge; therefore, they are easy to design and
construct. Deep learning is a branch of machine learning that can achieve outstanding performances.
Compared with traditional machine learning techniques, deep learning methods are better at dealing
with big data. Moreover, deep learning methods can automatically learn feature representations from
raw data and then output results; they operate in an end-to-end fashion and are practical. One notable
characteristic of deep learning is the deep structure, which contains multiple hidden layers. In contrast,
traditional machine learning models, such as the support vector machine (SVM) and k-nearest neighbor
(KNN), contain none or only one hidden layer. Therefore, these traditional machine learning models
are also called shallow models.
The purpose of this survey is to classify and summarize the machine learning-based IDSs proposed
to date, abstract the main ideas of applying machine learning to security domain problems, and analyze
the current challenges and future developments. For this survey, we selected representative papers
published from 2015 to 2019, which reflect the current progress. Several previous surveys [3–5]
have classified research efforts by their applied machine learning algorithms. These surveys are
primarily intended to introduce different machine learning algorithms applied to IDSs, which can be
helpful to machine learning researchers. However, this type of taxonomic system emphasizes specific
implementation technologies rather than cyber security domain problems. As a result, these surveys
do not directly address how to resolve IDS domain problems using machine learning. To cope with
this problem, we propose a new data-centered IDS taxonomy in this survey and introduce the related
studies following this taxonomy.
Data objects are the most basic elements in an IDS and carry the features related to attack
behaviors. Feature types and feature extraction methods differ among data types, and consequently
the most appropriate machine learning models also differ. Therefore, this survey thoroughly analyzes
the data processed in cyber security and classifies IDSs on the basis of data sources. This taxonomy
presents a path from data to features to attack behaviors to detection models, which makes it
convenient for readers to find research ideas for particular domain problems. For example, this
taxonomic system can answer the following questions: (1) What features best represent different attacks? (2) What type of data
is most suitable for detecting certain attacks? (3) What types of machine learning algorithms are the best
fit for a specific data type? (4) How do machine learning methods improve IDSs along different aspects?
These problems appeal to cyber security researchers. Finally, the challenges and future development
of machine learning methods for IDS are discussed by summarizing recent representative studies.
The rest of this paper is organized as follows: Section 2 introduces the key concepts and the
taxonomy of IDS. Section 3 introduces the frequently used machine learning algorithms in IDS, their
metrics, and common benchmark datasets. Section 4 classifies IDS according to data sources and sums
up the process of applying machine learning to IDSs. Section 5 discusses the challenges and future
directions of machine learning-based IDSs, and Section 6 concludes the paper.
Based on detection methods, IDSs can be divided into misuse detection and anomaly detection. Based on
the data source, IDSs can be divided into host-based and network-based methods [7]. This survey combines
these two types of IDS classification methods, taking the data source as the main classification
consideration and treating the detection method as
a secondary classification element. The proposed taxonomy is shown in Figure 1. Regarding detection
methods, the survey concentrates on machine learning methods. We introduce how to apply machine
learning to IDS using different types of data in detail in Section 4.
[Figure 1: The proposed IDS taxonomy. By source of data, IDSs comprise host-based IDSs (log-based detection: combination with rule-based systems, feature engineering, text analysis) and network-based IDSs (packet-based detection: packet parsing, payload analysis; flow-based detection: feature engineering, deep learning, traffic grouping; session-based detection: statistical features, sequence features). By detection method, IDSs comprise misuse detection (pattern matching, expert systems, finite-state machines) and anomaly detection (machine learning, statistical models, time series).]
As shown in Figure 1, a host-based IDS uses audit logs as a data source. Log detection methods
mainly include hybrids of rule-based systems and machine learning, feature engineering-based methods,
and text analysis-based methods. A network-based IDS uses network traffic as a data source, typically packets, which are the
basic units of network communication. A flow is the set of packets within a time window, which reflects
the network environment. A session is a packet sequence combined on the basis of a network
information 5-tuple (client IP, client port, server IP, server port, protocol). A session represents
high-level semantic information of traffic. Packets contain packet headers and payloads; therefore,
packet detection includes parsing-based and payload analysis-based methods. Based on feature
extraction, flow detection can be divided into feature engineering-based and deep learning-based
methods. In addition, traffic grouping is a unique approach in flow detection. Based on whether
sequence information is used, session detection can be divided into statistical feature-based and
sequence feature-based methods.
[Figure: Machine learning models used in IDSs. Shallow models include supervised learning methods (ANN, SVM, KNN, Naïve Bayes, logistic regression, decision tree) and unsupervised learning methods (K-means). Deep learning models include supervised learning methods (DBN, DNN, CNN, RNN variants such as Bi-RNN and GRU) and unsupervised learning methods (GAN, RBM, and autoencoders, including stacked, sparse, and denoising autoencoders).]
Artificial Neural Network (ANN). The design idea of an ANN is to mimic the way human brains
work. An ANN contains an input layer, several hidden layers, and an output layer. The units in
adjacent layers are fully connected. An ANN contains a huge number of units and can theoretically
approximate arbitrary functions; hence, it has strong fitting ability, especially for nonlinear functions.
Due to the complex model structure, training ANNs is time-consuming. It is noteworthy that these ANN
models are trained with the basic backpropagation algorithm, which is not effective for training deep
networks. Thus, such an ANN belongs to the shallow models and differs from the deep learning models
discussed in Section 3.1.2.
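As an illustration of the shallow ANN described above, the following sketch trains a one-hidden-layer network on synthetic feature vectors with scikit-learn's MLPClassifier; the data, layer size, and labels are invented for demonstration and are not taken from any IDS study.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                      # hypothetical 20-dimensional feature vectors
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)         # synthetic "attack" label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
ann.fit(X_train, y_train)                            # trained with backpropagation
print("test accuracy:", ann.score(X_test, y_test))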
Support Vector Machine (SVM). The strategy of an SVM is to find a max-margin separating
hyperplane in the n-dimensional feature space. SVMs can achieve satisfactory results even with small-scale
training sets because the separating hyperplane is determined by only a small number of support
vectors. However, SVMs are sensitive to noise near the hyperplane. SVMs solve linear
problems well. For nonlinear data, kernel functions are usually used. A kernel function maps the
original space into a new space so that the original nonlinear data can be separated. Kernel tricks are
widespread among both SVMs and other machine learning algorithms.
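The kernel trick mentioned above can be illustrated with a short, hedged example: on synthetic nonlinearly separable data (concentric circles), an RBF-kernel SVM succeeds where a linear kernel struggles. The dataset and parameters are assumptions for illustration only.

from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Concentric circles: not linearly separable in the original feature space.
X, y = make_circles(n_samples=600, noise=0.1, factor=0.4, random_state=0)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    print(kernel, "test accuracy:", clf.score(X_test, y_test))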
K-Nearest Neighbor (KNN). The core idea of KNN is based on the manifold hypothesis. If most
of a sample’s neighbors belong to the same class, the sample has a high probability of belonging to
the class. Thus, the classification result is only related to the top-k nearest neighbors. The parameter k
greatly influences the performance of KNN models. The smaller k is, the more complex the model
is and the higher the risk of overfitting. Conversely, the larger k is, the simpler the model is and the
weaker the fitting ability.
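A small sketch of the k trade-off discussed above, on synthetic noisy data (not an IDS benchmark): a very small k tends to overfit the noise, while a larger k yields a simpler, smoother model.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# 10% label noise so that k = 1 memorizes mistakes.
X, y = make_classification(n_samples=1000, n_features=15, flip_y=0.1, random_state=0)
for k in (1, 5, 25):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k:2d}  mean cross-validated accuracy={score:.3f}")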
Naïve Bayes. The Naïve Bayes algorithm is based on the conditional probability and the
hypothesis of attribute independence. For every sample, the Naïve Bayes classifier calculates the
conditional probabilities for different classes. The sample is classified into the maximum probability
class. The conditional probability is calculated as shown in Formula (1).

P(X = x | Y = c_k) = \prod_{i=1}^{n} P(X^{(i)} = x^{(i)} | Y = c_k)    (1)
When the attribute independence hypothesis is satisfied, the Naïve Bayes algorithm reaches the
optimal result. Unfortunately, that hypothesis is difficult to satisfy in reality; hence, the Naïve Bayes
algorithm does not perform well on attribute-related data.
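The following minimal sketch mirrors Formula (1) with scikit-learn's GaussianNB, which multiplies per-feature class-conditional (Gaussian) probabilities under the independence hypothesis; the data and labels are synthetic.

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.random((500, 10))                    # hypothetical feature vectors
y = rng.integers(0, 2, size=500)             # 0 = normal, 1 = attack (synthetic)

clf = GaussianNB().fit(X, y)
print(clf.predict_proba(X[:3]))              # each row gives P(Y = c_k | x)
print(clf.predict(X[:3]))                    # the maximum-probability class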
Logistic Regression (LR). LR is a type of log-linear model. The LR algorithm computes the
probabilities of the different classes through a parametric logistic distribution, as shown in
Formula (2).
P(Y = k | x) = \frac{e^{w_k \cdot x}}{1 + \sum_{k=1}^{K-1} e^{w_k \cdot x}}    (2)

where k = 1, 2, ..., K − 1. The sample x is classified into the maximum probability class. An LR model is
easy to construct, and model training is efficient. However, LR cannot deal well with nonlinear data,
which limits its application.
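A worked numeric illustration of Formula (2), with made-up weight vectors w_k and a made-up feature vector x (K = 3 classes); the last class receives the remaining probability mass.

import numpy as np

x = np.array([0.5, -1.2, 2.0])                    # hypothetical feature vector
W = np.array([[0.3, -0.1, 0.8],                   # w_1, ..., w_{K-1} for K = 3 classes
              [-0.5, 0.4, 0.2]])

scores = np.exp(W @ x)                            # e^{w_k * x} for k = 1, ..., K-1
denominator = 1.0 + scores.sum()
probs = np.append(scores / denominator, 1.0 / denominator)   # class K gets 1 / (1 + sum)
print(probs, probs.sum())                         # class probabilities; they sum to 1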
Decision tree. The decision tree algorithm classifies data using a series of rules. The model is
tree-like, which makes it interpretable. The decision tree algorithm can automatically exclude irrelevant and
redundant features. The learning process includes feature selection, tree generation, and tree pruning.
When training a decision tree model, the algorithm selects the most suitable features individually
and generates child nodes from the root node. The decision tree is a basic classifier. Some advanced
algorithms, such as the random forest and the extreme gradient boosting (XGBoost), consist of multiple
decision trees.
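A hedged sketch contrasting a single decision tree with a random forest built from many trees, as mentioned above; the data is synthetic rather than an IDS dataset.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=30, n_informative=8, random_state=0)
for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(n_estimators=100, random_state=0)):
    score = cross_val_score(model, X, y, cv=5).mean()      # 5-fold cross-validation
    print(type(model).__name__, f"mean accuracy={score:.3f}")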
Clustering. Clustering is based on similarity theory, i.e., grouping highly similar data into the
same clusters and grouping less-similar data into different clusters. Different from classification,
clustering is a type of unsupervised learning. No prior knowledge or labeled data is needed for
clustering algorithms; therefore, the dataset requirements are relatively low. However, when using
clustering algorithms to detect attacks, it is necessary to refer to external information.
K-means is a typical clustering algorithm, where K is the number of clusters and "means" refers to the
mean of the attributes. The K-means algorithm uses distance as a similarity measure criterion. The shorter
the distance between two data objects is, the more likely they are to be placed in the same cluster.
The K-means algorithm adapts well to linear data, but its results on nonconvex data are not ideal.
In addition, the K-means algorithm is sensitive to the initialization condition and the parameter K.
Consequently, many repeated experiments must be run to set the proper parameter value.
Ensembles and Hybrids. Every individual classifier has strengths and shortcomings. A natural
approach is to combine various weak classifiers to implement a strong classifier. Ensemble methods
train multiple classifiers; then, the classifiers vote to obtain the final results. Hybrid methods are
designed with multiple stages, each of which uses a classification model. Because ensemble and hybrid
classifiers usually perform better than single classifiers, an increasing number of researchers have
begun to study ensemble and hybrid classifiers. The key points lie in selecting which classifiers to
combine and how they are combined.
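As a minimal sketch of the "train multiple classifiers, then vote" idea above, the following combines three weak learners with scikit-learn's VotingClassifier on synthetic data; the choice of base classifiers is an assumption for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0))],
    voting="hard")                                          # majority vote over the base classifiers
print("ensemble mean accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())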
Algorithms    Suitable Data Types                          Supervised or Unsupervised    Functions
Autoencoder   Raw data; Feature vectors                    Unsupervised                  Feature extraction; Feature reduction; Denoising
RBM           Feature vectors                              Unsupervised                  Feature extraction; Feature reduction; Denoising
DBN           Feature vectors                              Supervised                    Feature extraction; Classification
DNN           Feature vectors                              Supervised                    Feature extraction; Classification
CNN           Raw data; Feature vectors; Matrices          Supervised                    Feature extraction; Classification
RNN           Raw data; Feature vectors; Sequence data     Supervised                    Feature extraction; Classification
GAN           Raw data; Feature vectors                    Unsupervised                  Data augmentation; Adversarial training
[Figure: autoencoder structure, in which raw data is encoded into features and decoded back into reconstructed data.]
Restricted Boltzmann Machine (RBM). An RBM is a stochastic neural network in which units
obey the Boltzmann distribution. An RBM is composed of a visible layer and a hidden layer. The units
in the same layer are not connected; however, the units in different layers are fully connected, as shown
in Figure 4, where v_i denotes a visible unit and h_i a hidden unit. RBMs do not distinguish between
the forward and backward directions; thus, the weights in both directions are the same. RBMs are
unsupervised learning models trained by the contrastive divergence algorithm [17], and they are
usually applied for feature extraction or denoising.
Deep Belief Network (DBN). A DBN consists of several RBM layers and a softmax classification
layer, as shown in Figure 5. Training a DBN involves two stages: unsupervised pretraining and
supervised fine-tuning [18,19]. First, each RBM is trained using greedy layer-wise pretraining.
Then, the weights of the softmax layer are learned from labeled data. In attack detection, DBNs are
used for both feature extraction and classification [20–22].
[Figure 5: DBN structure, with stacked RBMs (visible and hidden layers) between the input and a softmax output layer.]
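The following rough stand-in combines an unsupervised BernoulliRBM (trained with contrastive divergence) and a supervised softmax-style classifier, loosely mirroring the pretraining-plus-fine-tuning idea above; it is a single-RBM approximation on synthetic [0, 1]-scaled features, not a full multi-layer DBN.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.random((800, 41))                          # hypothetical normalized feature vectors
y = rng.integers(0, 2, size=800)                   # synthetic labels

dbn_like = Pipeline([
    ("rbm", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)),
    ("softmax", LogisticRegression(max_iter=1000)),
])
dbn_like.fit(X, y)                                 # unsupervised RBM features, then a supervised classifier
print("training accuracy:", dbn_like.score(X, y))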
Deep Neural Network (DNN). A layer-wise pretraining and fine-tuning strategy makes it
possible to construct DNNs with multiple layers, as shown in Figure 6. When training a DNN,
the parameters are learned first using unlabeled data, which is an unsupervised feature learning
stage; then, the network is tuned through the labeled data, which is a supervised learning stage.
The astonishing achievements of DNNs are mainly due to the unsupervised feature learning stage.
[Figure 6: DNN structure, with an input layer, hidden layers, and an output layer.]
Convolutional Neural Network (CNN). CNNs are designed to mimic the human visual system
(HVS); consequently, CNNs have made great achievements in the computer vision field [23–25].
A CNN is stacked with alternate convolutional and pooling layers, as shown in Figure 7.
The convolutional layers are used to extract features, and the pooling layers are used to enhance
the feature generalizability. CNNs work on 2-dimensional (2D) data, so the input data must be
translated into matrices for attack detection.
[Figure 7: CNN structure, with alternating convolutional and pooling layers followed by a fully connected layer.]
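A minimal sketch (not any cited architecture): 1024-byte inputs are rescaled, reshaped into 32 x 32 single-channel matrices, and classified by a small Keras CNN with alternating convolutional and pooling layers; all sizes, data, and labels are assumptions.

import numpy as np
from tensorflow.keras import layers, models

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(512, 1024)).astype("float32") / 255.0
y = rng.integers(0, 2, size=512)                    # synthetic benign/attack labels
X = X.reshape(-1, 32, 32, 1)                        # translate byte vectors into matrices

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(16, 3, activation="relu"),        # convolution: feature extraction
    layers.MaxPooling2D(),                          # pooling: feature generalization
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),          # benign vs. attack
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=64, verbose=0)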
Recurrent Neural Network (RNN). RNNs are networks designed for sequential data and are
widely used in natural language processing (NLP) [26–28]. The characteristics of sequential data
are contextual; analyzing isolated data from the sequence makes no sense. To obtain contextual
information, each unit in an RNN receives not only the current state but also previous states.
The structure of an RNN is shown in Figure 8, where all the weight matrices W are shared across
time steps. This characteristic causes RNNs to often suffer from vanishing or exploding gradients. In reality,
standard RNNs deal with only limited-length sequences. To solve the long-term dependence problem,
many RNN variants have been proposed, such as long short-term memory (LSTM) [29], gated recurrent
unit (GRU) [30], and bi-RNN [31].
[Figure 8: RNN structure unfolded over time, with hidden states h_{t-1}, h_t, h_{t+1} connected by the shared weight matrix W.]
The LSTM model was proposed by Hochreiter and Schmidhuber in 1997 [29]. Each LSTM unit
contains three gates: a forget gate, an input gate, and an output gate. The forget gate eliminates
outdated memory, the input gate receives new data, and the output gate combines short-term memory
with long-term memory to generate the current memory state. The GRU was proposed by Chung et al.
in 2014 [30]. The GRU model merges the forget gate and the input gate into a single update gate, which
is simpler than the LSTM.
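An illustrative Keras LSTM classifier over fixed-length event sequences; the sequence length, feature dimension, and layer sizes are assumptions rather than values from any cited study, and a GRU variant only swaps the recurrent layer.

import numpy as np
from tensorflow.keras import layers, models

rng = np.random.default_rng(0)
X = rng.random((256, 50, 8)).astype("float32")      # 256 sequences, 50 time steps, 8 features
y = rng.integers(0, 2, size=256)                    # synthetic labels

model = models.Sequential([
    layers.Input(shape=(50, 8)),
    layers.LSTM(32),                                # gated memory eases vanishing gradients
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
# A GRU variant replaces layers.LSTM(32) with layers.GRU(32).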
Generative Adversarial Network (GAN). A GAN model includes two subnetworks, i.e., a generator
and a discriminator. The generator aims to generate synthetic data similar to the real data, and
the discriminator intends to distinguish synthetic data from real data. Thus, the generator and the
discriminator improve each other. GANs are currently a hot research topic; in attack detection, they are
used to augment data, which partly eases the problem of IDS dataset shortages. Meanwhile, GANs belong
to the family of adversarial learning approaches, which can raise the detection accuracy of models by
adding adversarial samples to the training set.
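A bare-bones GAN training skeleton in TensorFlow for generating synthetic fixed-length feature vectors, in the spirit of the data-augmentation use described above; the dimensions, architectures, data, and training schedule are all invented for illustration.

import tensorflow as tf
from tensorflow.keras import layers, models

feat_dim, noise_dim, batch = 41, 16, 64               # e.g., KDD-style 41-dimensional vectors

generator = models.Sequential([
    layers.Input(shape=(noise_dim,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(feat_dim, activation="sigmoid"),      # synthetic feature vector
])
discriminator = models.Sequential([
    layers.Input(shape=(feat_dim,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),             # real vs. synthetic
])
g_opt, d_opt = tf.keras.optimizers.Adam(1e-3), tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy()

real = tf.random.uniform((batch, feat_dim))            # stand-in for real attack samples
for step in range(200):
    noise = tf.random.normal((batch, noise_dim))
    with tf.GradientTape() as d_tape:                  # discriminator step
        d_loss = (bce(tf.ones((batch, 1)), discriminator(real)) +
                  bce(tf.zeros((batch, 1)), discriminator(generator(noise))))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    with tf.GradientTape() as g_tape:                  # generator step: try to fool the discriminator
        g_loss = bce(tf.ones((batch, 1)), discriminator(generator(noise)))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))

augmented = generator(tf.random.normal((32, noise_dim)))   # synthetic samples for augmentation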
3.2. Metrics
Many metrics are used to evaluate machine learning methods. The optimal models are selected
using these metrics. To comprehensively measure the detection effect, multiple metrics are often used
simultaneously in IDS research.
• Accuracy is defined as the ratio of correctly classified samples to total samples. Accuracy
is a suitable metric when the dataset is balanced. In real network environments, however,
normal samples are far more abundant than abnormal samples; thus, accuracy may not
be a suitable metric.
Accuracy = \frac{TP + TN}{TP + FP + FN + TN}    (3)
• Precision (P) is defined as the ratio of true positive samples to predicted positive samples;
it represents the confidence of attack detection.
P = \frac{TP}{TP + FP}    (4)
• Recall (R) is defined as the ratio of true positive samples to total positive samples and is also called
the detection rate. The detection rate reflects the model’s ability to recognize attacks, which is an
important metric in IDS.
R = \frac{TP}{TP + FN}    (5)
• F-measure (F) is defined as the harmonic average of the precision and the recall.
F = \frac{2 \cdot P \cdot R}{P + R}    (6)
• The false negative rate (FNR) is defined as the ratio of false negative samples to total positive
samples. In attack detection, the FNR is also called the missed alarm rate.
FNR = \frac{FN}{TP + FN}    (7)
• The false positive rate (FPR) is defined as the ratio of false positive samples to predicted positive
samples. In attack detection, the FPR is also called the false alarm rate, and it is calculated
as follows:
FPR = \frac{FP}{TP + FP}    (8)
where TP denotes the true positives, FP the false positives, TN the true negatives, and FN the false
negatives. The purpose of an IDS is to recognize attacks; therefore, attack samples are usually regarded
as positives, and normal samples are usually regarded as negatives. In attack detection, the frequently
used metrics include accuracy, recall (or detection rate), FNR (or missed alarm rate), and FPR (or false
alarm rate).
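A worked numeric example of Formulas (3) to (8) computed from an invented confusion matrix (the counts are not from any experiment); note that the FPR here follows the definition given above, with predicted positives as the denominator.

TP, FP, TN, FN = 90, 15, 880, 15                 # invented confusion-matrix counts

accuracy  = (TP + TN) / (TP + FP + FN + TN)      # Formula (3)
precision = TP / (TP + FP)                       # Formula (4)
recall    = TP / (TP + FN)                       # Formula (5), the detection rate
f_measure = 2 * precision * recall / (precision + recall)   # Formula (6)
fnr       = FN / (TP + FN)                       # Formula (7), the missed alarm rate
fpr       = FP / (TP + FP)                       # Formula (8), the false alarm rate as defined above
print(accuracy, precision, recall, f_measure, fnr, fpr)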
IDS studies are typically evaluated using common benchmark datasets, which allows new study results to be compared with those of
previous studies.
(1) DARPA1998
The DARPA1998 dataset [36] was built by the Lincoln laboratory of MIT and is a widely used
benchmark dataset in IDS studies. To compile it, the researchers collected Internet traffic over nine
weeks; the first seven weeks form the training set, and the last two weeks form the test set. The dataset
contains both raw packets and labels. There are five types of labels: normal, denial of service (DOS),
Probe, User to Root (U2R) and Remote to Local (R2L). Because raw packets cannot be directly applied to
traditional machine learning models, the KDD99 dataset was constructed to overcome this drawback.
(2) KDD99
The KDD99 [37] dataset is the most widespread IDS benchmark dataset at present. Its compilers
extracted 41-dimensional features from data in DARPA1998. The labels in KDD99 are the same as the
DARPA1998. There are four types of features in KDD99, i.e., basic features, content features, host-based
statistical features, and time-based statistical features. Unfortunately, the KDD99 dataset includes many
defects. First, the data are severely unbalanced, making the classification results biased toward the
majority classes. Additionally, many duplicate and redundant records exist, and many researchers
have to filter the dataset carefully before they can use it. As a result, the experimental
results from different studies are not always comparable. Last but not least, KDD data are too old to
represent the current network environment.
(3) NSL-KDD
To overcome the shortcomings of the KDD99 dataset, the NSL-KDD [38] was proposed.
The records in the NSL-KDD were carefully selected based on the KDD99. Records of different
classes are balanced in the NSL-KDD, which avoids the classification bias problem. The NSL-KDD also
removed duplicate and redundant records; therefore, it contains only a moderate number of records.
As a result, experiments can be implemented on the whole dataset, and the results from different
papers are consistent and comparable. The NSL-KDD alleviates the problems of data bias and data
redundancy to some degree. However, the NSL-KDD does not include new data; thus, minority class
samples are still lacking, and its samples are still out-of-date.
(4) UNSW-NB15
The UNSW-NB15 [39] dataset was compiled by the University of New South Wales, where researchers
configured three virtual servers to capture network traffic and extracted 49-dimensional features
using a tool named Bro. The dataset includes more types of attacks than does the KDD99 dataset, and
its features are more plentiful. The data categories include normal data and nine types of attacks.
The features include flow features, basic features, content features, time features, additional features, and
labeled features. The UNSW-NB15 is representative of new IDS datasets, and has been used in some
recent studies. Although the influence of the UNSW-NB15 is currently inferior to that of KDD99, it is
necessary to construct new datasets for developing new machine learning-based IDSs.
A DOS attack generates a large volume of traffic within a short time; therefore, flow data is suitable for
detecting a DOS attack. A covert channel involves data-leaking activity between two specific IP
addresses, which is more suited to detection from session data.
Min et al. [43] conducted experiments on the ISCX 2012 dataset and detected attacks with both statistical and content
features. The statistical features mainly came from packet headers and included protocols, IPs, and
ports. The content features came from the payloads. First, payloads from different packets were
concatenated. Next, the concatenated payloads were encoded by skip-gram word embedding. Then,
the content features were extracted with a CNN. Finally, they trained a random forest model to detect
attacks. The final model reached an accuracy of 99.13%.
Combining various payload analysis techniques can capture comprehensive content information,
which can improve the effectiveness of the IDS. Zeng et al. [44] proposed a payload detection method
with multiple deep learning models. They adopted three deep learning models (a CNN, an LSTM,
and a stacked autoencoder) to extract features from different points of view. Among these, the CNN
extracted local features, the RNN extracted time series features, and the stacked autoencoder extracted
text features. The accuracy of this combined approach reached 99.22% on the ISCX 2012 dataset.
Extracting payload features with unsupervised learning is also an effective detection method.
Yu et al. [45] utilized a convolutional autoencoder to extract payload features and conducted
experiments on the CTU-UNB dataset. This dataset includes the raw packets of 8 attack types. To take
full advantage of convolutions, they first converted the packets into images. Then, they trained
a convolutional autoencoder model to extract features. Finally, they classified packets using
learned features. The precision, recall and F-measure on the test set reached 98.44%, 98.40%, and
98.41% respectively.
To enhance the robustness of IDSs, adversarial learning has become a novel approach. It can be used
to attack an IDS, but it is also a novel way to improve IDS detection accuracy. Rigaki et al. [46] used
a GAN to improve the malware detection effect. To evade
detection, malware applications try to generate packets similar to normal packets. Taking the malware
FLU as an example, the command & control (C & C) packets are very similar to packets generated by
Facebook. They configured a virtual network system with hosts, servers, and an IPS. Then, they started
up the malware FLU and trained a GAN model. The GAN guided the malware to produce packets
similar to Facebook traffic. As the training epochs increased, the packets blocked by the IPS decreased and
the packets that passed inspection increased; in other words, the malicious packets generated by the
GAN became more similar to normal packets. Then, by analyzing the generated packets, the robustness
of the IPS was improved.
The existing feature engineering-based IDSs often have high detection accuracy but suffer from
a high false alarm rate. One solution is to combine many weak classifiers to obtain a strong classifier.
Goeschel et al. [47] proposed a hybrid method that included SVM, decision tree, and Naïve Bayes
algorithms. They first trained an SVM model to divide the data into normal or abnormal samples.
For the abnormal samples, they utilized a decision tree model to determine specific attack types.
However, a decision tree model can identify only known attacks, not unknown attacks. Thus, they also
applied a Naïve Bayes classifier to discover unknown attacks. By taking advantage of three different
classifier types, this hybrid method achieved an accuracy of 99.62% and a false alarm rate of 1.57% on
the KDD99 dataset.
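The staged design can be sketched roughly as below (synthetic data, placeholder labels): an SVM first separates normal from abnormal traffic, a decision tree then assigns known attack types, and a Naïve Bayes model provides probability scores that can flag possible unknown attacks. This illustrates the general idea, not the authors' exact pipeline.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 20))                         # hypothetical feature vectors
y_binary = rng.integers(0, 2, size=1000)           # 0 = normal, 1 = abnormal (synthetic)
y_type = rng.integers(0, 4, size=1000)             # known attack types (synthetic)

stage1 = SVC().fit(X, y_binary)                                                  # normal vs. abnormal
stage2 = DecisionTreeClassifier().fit(X[y_binary == 1], y_type[y_binary == 1])   # known attack types
stage3 = GaussianNB().fit(X[y_binary == 1], y_type[y_binary == 1])               # fallback scorer

new_traffic = rng.random((50, 20))
abnormal = new_traffic[stage1.predict(new_traffic) == 1]
if len(abnormal) > 0:
    known_types = stage2.predict(abnormal)          # label known attacks
    type_scores = stage3.predict_proba(abnormal)    # low confidence may indicate unknown attacks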
Another research objective is to accelerate the detection speed. Kuttranont et al. [48] proposed
a KNN-based detection method and accelerated calculation via parallel computing techniques running
on a graphics processing unit (GPU). They modified the neighbor-selecting rule of the KNN algorithm.
The standard KNN selects the top K nearest samples as neighbors, while the improved algorithm
selects a fixed percentage (such as 50%) of the neighboring samples as neighbors. The proposed method
considers the unevenness of data distribution and performs well on sparse data. These experiments
were conducted using the KDD99 dataset, achieving an accuracy of 99.30%. They also applied parallel
computing on a GPU to accelerate the calculations. The experimental results showed that the method
with the GPU was approximately 30 times faster than that without the GPU.
Unsupervised learning methods are also applied to IDSs; a typical approach is to divide the data with
clustering algorithms. The standard K-means algorithm is inefficient on big datasets. To improve
detection efficiency, Peng et al. [13] proposed an improved K-means detection method with mini
batch. They first carried out data preprocessing on the KDD99 dataset. The nominal features were
transformed into numerical types, and each dimension of the features was normalized by the max-min
method. Then, they reduced the dimensions using the principal components analysis (PCA) algorithm.
Finally, they clustered the samples with the K-means algorithm, but they improved K-means from
two aspects. (1) They altered the method of initialization to avoid becoming stuck in a local optimum.
(2) They introduced the mini-batch trick to decrease the running time. Compared with the standard
K-means, the proposed method achieved higher accuracy and runtime efficiency.
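A compressed sketch of the preprocessing and clustering pipeline described above (min-max normalization, PCA, mini-batch K-means); the data here is random stand-in data, not KDD99, and the parameter values are assumptions.

import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.random((5000, 41))                                  # stand-in for encoded KDD99 records

X = MinMaxScaler().fit_transform(X)                         # max-min normalization per feature
X = PCA(n_components=10).fit_transform(X)                   # dimensionality reduction
labels = MiniBatchKMeans(n_clusters=5, batch_size=256, n_init=3,
                         random_state=0).fit_predict(X)     # mini-batch clustering
print(np.bincount(labels))                                  # cluster sizes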
enhancing its ability to detect unknown samples. Finally, they classified the data using an XGBoost
model. Their model achieved accuracies on the Normal, DOS, Probe, R2L, and U2R classes of 99.96%,
99.17%, 99.50%, 97.13%, and 89.00%, respectively.
Deep learning models have made great strides in big data analysis; however, their performances
are not ideal on small or unbalanced datasets. Adversarial learning approaches can improve the
detection accuracy on small datasets. Zhang et al. [51] conducted data augmentation with a GAN.
The KDD99 dataset is both unbalanced and lacks new data, which leads to poor generalizability of
machine learning models. To address these problems, they utilized a GAN to expand the dataset.
The GAN model generated data similar to the flow data of KDD99. Adding this generated data to
the training set allows attack variants to be detected. They selected 8 types of attacks and compared
the accuracies achieved on the original dataset with those on the expanded dataset. The experimental
results showed that adversarial learning improved the accuracy for 7 of the 8 attack types.
The existing session-based detection methods often face problems of low accuracy and high
runtime costs. Ahmim et al. [54] proposed a hierarchical decision tree method; to reduce the
detection time, they analyzed the frequency of different types of attacks and designed the detection
system to recognize specific attacks. They used data from the CICIDS 2017 dataset that included
79-dimensional features and 15 classes. The proposed detection system had a two-layer structure.
The first layer consisted of two independent classifiers (i.e., a decision tree and a rule-based model),
which processed part of the features. The second layer was a random forest classifier, which processed
all the features from the dataset as well as the output of the first layer. They compared multiple machine
learning models on 15 classes; their proposed methods performed best on 8 of the 15 classes. Moreover,
the proposed method had low time consumption, reflecting its practicability.
Session-based detection using supervised learning models depends on expert knowledge, which
is difficult to expand to new scenarios. To address this problem, Alseiari et al. [55] proposed an
unsupervised method to detect attacks in smart grids. Due to the lack of smart grid datasets, they
constructed a dataset through simulation experiments. First, they captured and cached packets to
construct sessions. Then, they extracted 23-dimensional features from the sessions. Next, they utilized
mini batch K-means to divide the data into many clusters. Finally, they labeled the clusters. This work
was based on two hypotheses. The first was that normal samples were the majority. The second one
was that the distances among the normal clusters were relatively short. When the size of a cluster was
less than 25% of the full sample amount or a cluster centroid was far away from all the other
cluster centroids, that cluster was judged as abnormal. No expert knowledge was required for any
part of this process. The proposed methods were able to detect intrusion behaviors in smart grids
effectively and locate the attack sources while holding the false alarm rate to less than 5%.
Wang et al. [58] conducted experiments on the DARPA 1998 and the ISCX 2012 datasets. They first applied a CNN to extract
spatial features from packets. Next, they concatenated the spatial features in sequence and extracted
time features using the LSTM model. The resulting model achieved accuracies between 99.92% and
99.96%, and detection rates between 95.76% and 98.99%.
classification. The CNN was good at finding local relationships and detecting abnormal behaviors
from system calls.
Model interpretation is another important research direction, which has attracted extensive
attention. Tuor et al. [62] proposed an interpretable deep learning detection method using data from
the CERT Insider Threat dataset, which consists of system logs. They first extracted 414-dimensional
features using a sliding window. Then, they adopted a DNN and an RNN to classify logs. The DNN
detected attacks based on the log contents, and the RNN detected attacks based on the log sequences.
The proposed methods reduced the analysis workload by 93.5% and reached a detection rate of 90%.
Furthermore, they decomposed the abnormal scores into the contributions of individual behaviors,
which aided the analysis. Interpretable models are more convincing than uninterpretable models.
Some logs lack labeled information; consequently, supervised learning is inappropriate.
Unsupervised learning methods are usually used with unlabeled logs. Bohara et al. [63] proposed an
unsupervised learning detection method for enterprise environments. They conducted experiments
on the VAST 2011 Mini Challenge 2 dataset and extracted features from the host and network logs.
Due to the different influences of each feature, they selected features using the Pearson correlation
coefficient. Then, they clustered the logs with the K-means and DBSCAN algorithms. By measuring the
salient cluster features, the clusters were associated with abnormal behaviors. Finally, they analyzed
the abnormal clusters manually to determine the specific attack types.
(1) Lack of available datasets. The most widespread dataset is currently KDD99, which has many
problems, and new datasets are required. However, constructing new datasets depends on expert
knowledge, and the labor cost is high. In addition, the variability of the Internet environment intensifies
the dataset shortage. New types of attacks are emerging, and some existing datasets are too old to
reflect these new attacks. Ideally, datasets should include most of the common attacks and correspond
to current network environments. Moreover, the available datasets should be representative, balanced
and have less redundancy and less noise. Systematic dataset construction and incremental learning
may be solutions to this problem.
(2) Inferior detection accuracy in actual environments. Machine learning methods have a certain
ability to detect intrusions, but they often do not perform well on completely unfamiliar data. Most of the
existing studies were conducted using labeled datasets. Consequently, when the dataset does not cover
all typical real-world samples, good performance in actual environments is not guaranteed—even if
the models achieve high accuracy on test sets.
(3) Low efficiency. Most studies emphasize the detection results; therefore, they usually employ
complicated models and extensive data preprocessing methods, leading to low efficiency. However,
to reduce harm as much as possible, IDSs need to detect attacks in real time. Thus, a trade-off exists
between effectiveness and efficiency. Parallel computing [66,67] approaches using GPUs [48,68,69] are common
solutions.
From summarizing the recent studies, we can conclude that the major trends of IDS research lie in
the following aspects.
(1) Utilizing domain knowledge. Combining domain knowledge with machine learning can
improve the detection effect, especially when the goal is to recognize specific types of attacks in specific
application scenarios.
• Rule-based detection methods, which encode considerable expert knowledge, have low false alarm
rates but high missed alarm rates. In contrast, machine learning methods usually have high
false alarm rates and low missed alarm rates. The advantages of both methods are complementary.
Combining machine learning methods with rule-based systems, such as Snort [70–73], can result
in IDSs with low false alarm rates and low missed alarm rates.
• For specific types of attacks, such as DOS [74–79], botnets [80], and phishing websites [81], proper
features must be extracted according to the attack characteristics, which can be abstracted using
domain knowledge.
• For specific application scenarios, such as cloud computing [82,83], IoT [84–86], and smart
grids [87,88], domain knowledge can be used to provide the environmental characteristics that
are helpful in data collection and data preprocessing.
• Compared with shallow models, deep learning methods learn features directly from raw data,
and their fitting ability is stronger. Deep learning models with deep structures can be used for
classification, feature extraction, feature reduction, data denoising, and data augmentation tasks.
Thus, deep learning methods can improve IDSs from many aspects.
• Unsupervised learning methods require no labeled data; thus they can be used even when
a dataset shortage exists. The usual approach involves dividing data using an unsupervised
learning model, manually labeling the clusters, and then training a classification model with
supervised learning [89–92].
(3) Developing practical models. Practical IDSs need not only high detection accuracy
but also high runtime efficiency and interpretability.
• In attack detection, the real-time requirement is essential. Thus, one research direction is to
improve the efficiency of machine learning models. Reducing the time required for data collection
and storage is also of concern.
• Interpretability is important for practical IDSs. Many machine learning models, especially deep
learning models, are black boxes. These models report only the detection results and have no
interpretable basis [93]. However, every cyber security decision should be made cautiously.
An output result with no identifiable reason is not convincing. Thus, an IDS with high accuracy,
high efficiency and interpretability is more practical.
6. Conclusions
The paper first proposes an IDS taxonomy that takes data sources as the main thread to present
the numerous machine learning algorithms used in this field. Based on this taxonomy, we then analyze
and discuss IDSs applied to various data sources, i.e., logs, packets, flows, and sessions. IDSs aim
to detect attacks; therefore, it is vital to select the proper data source according to the attack characteristics.
Logs contain detailed semantic information and are suitable for detecting SQL injection, U2R, and
R2L attacks. Packets provide communication contents and are fit for detecting U2R and R2L
attacks. Flows represent the whole network environment and can be used to detect DOS and Probe attacks.
Sessions, which reflect the communication between clients and servers, can be used to detect U2R, R2L,
tunnel, and Trojan attacks. For IDSs using these different data types, the paper emphasizes machine
learning techniques (especially deep learning algorithms) and application scenarios.
Deep learning models are playing an increasingly important role and have become an outstanding
direction of study. Deep learning approaches include multiple types of deep networks, which can be used to
improve the performance of IDSs. Compared with shallow machine learning models, deep learning
models have stronger fitting and generalization abilities. In addition, deep learning approaches are
largely independent of feature engineering and domain knowledge, which is an outstanding advantage
over shallow machine learning models. However, the running times of deep learning models are often
too long to meet the real-time requirements of IDSs.
By summarizing the recent typical studies, this paper analyzes and refines the challenges and
future trends in the field to provide references for other researchers conducting in-depth studies. The lack
of available datasets may be the biggest challenge, so unsupervised learning and incremental learning
approaches have broad development prospects. For practical IDSs, interpretability is essential, because
interpretable models are convincing and can guide users in making decisions. The interpretability of
models may become an important IDS research direction in the future.
Author Contributions: Writing—original draft preparation, H.L.; writing—review and editing, H.L.; writing—
review and editing, B.L.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Anderson, J.P. Computer Security Threat Monitoring and Surveillance; Technical Report; James P. Anderson
Company: Philadelphia, PA, USA, 1980.
2. Michie, D.; Spiegelhalter, D.J.; Taylor, C. Machine Learning, Neural and Statistical Classification; Ellis Horwood
Series in Artificial Intelligence: New York, NY, USA, 1994; Volume 13.
3. Buczak, A.L.; Guven, E. A survey of data mining and machine learning methods for cyber security intrusion
detection. IEEE Commun. Surv. Tutor. 2015, 18, 1153–1176. [CrossRef]
4. Xin, Y.; Kong, L.; Liu, Z.; Chen, Y.; Li, Y.; Zhu, H.; Gao, M.; Hou, H.; Wang, C. Machine learning and deep
learning methods for cybersecurity. IEEE Access 2018, 6, 35365–35381. [CrossRef]
5. Agrawal, S.; Agrawal, J. Survey on anomaly detection using data mining techniques. Procedia Comput. Sci.
2015, 60, 708–713. [CrossRef]
6. Denning, D.E. An intrusion-detection model. IEEE Trans. Softw. Eng. 1987, 222–232. [CrossRef]
7. Heberlein, L.T.; Dias, G.V.; Levitt, K.N.; Mukherjee, B.; Wood, J.; Wolber, D. A network security monitor.
In Proceedings of the 1990 IEEE Computer Society Symposium on Research in Security and Privacy, Oakland,
CA, USA, 7–9 May 1990; pp. 296–304.
8. Kuang, F.; Zhang, S.; Jin, Z.; Xu, W. A novel SVM by combining kernel principal component analysis and
improved chaotic particle swarm optimization for intrusion detection. Soft Comput. 2015, 19, 1187–1199.
[CrossRef]
9. Syarif, A.R.; Gata, W. Intrusion detection system using hybrid binary PSO and K-nearest neighborhood
algorithm. In Proceedings of the 2017 11th International Conference on Information & Communication
Technology and System (ICTS), Surabaya, Indonesia, 31 October 2017; pp. 181–186.
10. Pajouh, H.H.; Dastghaibyfard, G.; Hashemi, S. Two-tier network anomaly detection model: A machine
learning approach. J. Intell. Inf. Syst. 2017, 48, 61–74. [CrossRef]
11. Mahmood, H.A. Network Intrusion Detection System (NIDS) in Cloud Environment based on Hidden
Naïve Bayes Multiclass Classifier. Al-Mustansiriyah J. Sci. 2018, 28, 134–142. [CrossRef]
12. Shah, R.; Qian, Y.; Kumar, D.; Ali, M.; Alvi, M. Network intrusion detection through discriminative feature
selection by using sparse logistic regression. Future Internet 2017, 9, 81. [CrossRef]
13. Peng, K.; Leung, V.C.; Huang, Q. Clustering approach based on mini batch kmeans for intrusion detection
system over big data. IEEE Access 2018, 6, 11897–11906. [CrossRef]
14. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with
denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki,
Finland, 5–9 July 2008, pp. 1096–1103.
15. Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.A. Stacked denoising autoencoders: Learning
useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010,
11, 3371–3408.
16. Deng, J.; Zhang, Z.; Marchi, E.; Schuller, B. Sparse autoencoder-based feature transfer learning for speech
emotion recognition. In Proceedings of the 2013 Humaine Association Conference on Affective Computing
and Intelligent Interaction, Geneva, Switzerland, 2–5 September 2013; pp. 511–516.
17. Hinton, G.E. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the
Trade; Springer: Berlin, Germany, 2012; pp. 599–619.
18. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006,
18, 1527–1554. [CrossRef] [PubMed]
19. Boureau, Y.l.; Cun, Y.L.; Ranzato, M.A. Sparse feature learning for deep belief networks. In Proceedings
of the 21st Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–10
December 2008; pp. 1185–1192.
20. Zhao, G.; Zhang, C.; Zheng, L. Intrusion detection using deep belief network and probabilistic neural
network. In Proceedings of the 2017 IEEE International Conference on Computational Science and
Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC),
Guangzhou, China, 21–24 July 2017; Volume 1, pp. 639–642.
21. Alrawashdeh, K.; Purdy, C. Toward an online anomaly intrusion detection system based on deep learning.
In Proceedings of the 2016 15th IEEE International Conference on Machine Learning and Applications
(ICMLA), Anaheim, CA, USA, 18–20 December 2016; pp. 195–200.
22. Yang, Y.; Zheng, K.; Wu, C.; Niu, X.; Yang, Y. Building an effective intrusion detection system using the
modified density peak clustering algorithm and deep belief networks. Appl. Sci. 2019, 9, 238. [CrossRef]
23. Sharif Razavian, A.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN features off-the-shelf: An astounding
baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 806–813.
24. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks.
In Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV,
USA, 3–6 December 2012; pp. 1097–1105. [CrossRef]
25. Lawrence, S.; Giles, C.L.; Tsoi, A.C.; Back, A.D. Face recognition: A convolutional neural-network approach.
IEEE Trans. Neural Netw. 1997, 8, 98–113. [CrossRef]
26. Graves, A.; Mohamed, A.R.; Hinton, G. Speech recognition with deep recurrent neural networks.
In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing,
Vancouver, BC, Canada , 26–31 May 2013; pp. 6645–6649.
27. Graves, A.; Jaitly, N. Towards end-to-end speech recognition with recurrent neural networks. In Proceedings
of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1764–1772.
28. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the
Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December
2014; pp. 3104–3112.
29. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
30. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on
sequence modeling. arXiv 2014, arXiv:1412.3555.
31. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45,
2673–2681. [CrossRef]
32. Ribeiro, M.T.; Singh, S.; Guestrin, C. Why should i trust you?: Explaining the predictions of any classifier.
In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
33. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the Annual
Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017;
pp. 4765–4774.
34. Li, J.; Monroe, W.; Jurafsky, D. Understanding neural networks through representation erasure. arXiv 2016,
arXiv:1612.08220.
35. Fong, R.C.; Vedaldi, A. Interpretable explanations of black boxes by meaningful perturbation. In Proceedings
of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3429–3437.
36. DARPA1998 Dataset. 1998. Available online: http://www.ll.mit.edu/r-d/datasets/1998-darpa-intrusion-
detection-evaluation-dataset (accessed on 16 October 2019).
37. KDD99 Dataset. 1999. Available online: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
(accessed on 16 October 2019).
38. NSL-KDD99 Dataset. 2009. Available online: https://www.unb.ca/cic/datasets/nsl.html (accessed on 16
October 2019).
39. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems
(UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications And Information
Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; pp. 1–6.
40. Mayhew, M.; Atighetchi, M.; Adler, A.; Greenstadt, R. Use of machine learning in big data analytics
for insider threat detection. In Proceedings of the MILCOM 2015-2015 IEEE Military Communications
Conference, Canberra, Australia, 10–12 November 2015; pp. 915–922.
41. Hu, L.; Li, T.; Xie, N.; Hu, J. False positive elimination in intrusion detection based on clustering.
In Proceedings of the 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery
(FSKD), Zhangjiajie, China, 15–17 August 2015; pp. 519–523.
42. Liu, H.; Lang, B.; Liu, M.; Yan, H. CNN and RNN based payload classification methods for attack detection.
Knowl.-Based Syst. 2019, 163, 332–341. [CrossRef]
43. Min, E.; Long, J.; Liu, Q.; Cui, J.; Chen, W. TR-IDS: Anomaly-based intrusion detection through
text-convolutional neural network and random forest. Secur. Commun. Netw. 2018, 2018, 4943509. [CrossRef]
44. Zeng, Y.; Gu, H.; Wei, W.; Guo, Y. Deep-Full-Range: A Deep Learning Based Network Encrypted Traffic
Classification and Intrusion Detection Framework. IEEE Access 2019, 7, 45182–45190. [CrossRef]
45. Yu, Y.; Long, J.; Cai, Z. Network intrusion detection through stacking dilated convolutional autoencoders.
Secur. Commun. Netw. 2017, 2017, 4184196. [CrossRef]
46. Rigaki, M.; Garcia, S. Bringing a gan to a knife-fight: Adapting malware communication to avoid detection.
In Proceedings of the 2018 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 24 May
2018; pp. 70–75.
47. Goeschel, K. Reducing false positives in intrusion detection systems using data-mining techniques utilizing
support vector machines, decision trees, and naive Bayes for off-line analysis. In Proceedings of the
SoutheastCon 2016, Norfolk, VA, USA, 30 March–3 April 2016; pp. 1–6.
48. Kuttranont, P.; Boonprakob, K.; Phaudphut, C.; Permpol, S.; Aimtongkhamand, P.; KoKaew, U.; Waikham, B.;
So-In, C. Parallel KNN and Neighborhood Classification Implementations on GPU for Network Intrusion
Detection. J. Telecommun. Electron. Comput. Eng. (JTEC) 2017, 9, 29–33.
49. Potluri, S.; Ahmed, S.; Diedrich, C. Convolutional Neural Networks for Multi-class Intrusion Detection
System. In Mining Intelligence and Knowledge Exploration; Springer: Cham, Switzerland, 2018; pp. 225–238.
50. Zhang, B.; Yu, Y.; Li, J. Network Intrusion Detection Based on Stacked Sparse Autoencoder and Binary
Tree Ensemble Method. In Proceedings of the 2018 IEEE International Conference on Communications
Workshops (ICC Workshops), Kansas City, MO, USA, 20–24 May 2018; pp. 1–6.
51. Zhang, H.; Yu, X.; Ren, P.; Luo, C.; Min, G. Deep Adversarial Learning in Intrusion Detection: A Data
Augmentation Enhanced Framework. arXiv 2019, arXiv:1901.07949.
52. Teng, S.; Wu, N.; Zhu, H.; Teng, L.; Zhang, W. SVM-DT-based adaptive and collaborative intrusion detection.
IEEE/CAA J. Autom. Sin. 2017, 5, 108–118. [CrossRef]
53. Ma, T.; Wang, F.; Cheng, J.; Yu, Y.; Chen, X. A hybrid spectral clustering and deep neural network ensemble
algorithm for intrusion detection in sensor networks. Sensors 2016, 16, 1701. [CrossRef]
54. Ahmim, A.; Maglaras, L.; Ferrag, M.A.; Derdour, M.; Janicke, H. A novel hierarchical intrusion detection
system based on decision tree and rules-based models. In Proceedings of the 2019 15th International
Conference on Distributed Computing in Sensor Systems (DCOSS), Santorini Island, Greece, 29–31 May
2019; pp. 228–233.
55. Alseiari, F.A.A.; Aung, Z. Real-time anomaly-based distributed intrusion detection systems for advanced
Metering Infrastructure utilizing stream data mining. In Proceedings of the 2015 International Conference on
Smart Grid and Clean Energy Technologies (ICSGCE), Offenburg, Germany, 20–23 October 2015; pp. 148–153.
56. Yuan, X.; Li, C.; Li, X. DeepDefense: identifying DDoS attack via deep learning. In Proceedings of the 2017
IEEE International Conference on Smart Computing (SMARTCOMP), Hong Kong, China, 29–31 May 2017;
pp. 1–8.
57. Radford, B.J.; Apolonio, L.M.; Trias, A.J.; Simpson, J.A. Network traffic anomaly detection using recurrent
neural networks. arXiv 2018, arXiv:1803.10769.
58. Wang, W.; Sheng, Y.; Wang, J.; Zeng, X.; Ye, X.; Huang, Y.; Zhu, M. HAST-IDS: Learning hierarchical
spatial-temporal features using deep neural networks to improve intrusion detection. IEEE Access 2017,
6, 1792–1806. [CrossRef]
59. Meng, W.; Li, W.; Kwok, L.F. Design of intelligent KNN-based alarm filter using knowledge-based alert
verification in intrusion detection. Secur. Commun. Netw. 2015, 8, 3883–3895. [CrossRef]
60. McElwee, S.; Heaton, J.; Fraley, J.; Cannady, J. Deep learning for prioritizing and responding to intrusion
detection alerts. In Proceedings of the MILCOM 2017—2017 IEEE Military Communications Conference
(MILCOM), Baltimore, MD, USA, 23–25 October 2017; pp. 1–5.
61. Tran, N.N.; Sarker, R.; Hu, J. An Approach for Host-Based Intrusion Detection System Design Using
Convolutional Neural Network. In Proceedings of the International Conference on Mobile Networks and
Management, Chiba, Japan, 23–25 September 2017; Springer: Berlin, Germany, 2017; pp. 116–126.
62. Tuor, A.; Kaplan, S.; Hutchinson, B.; Nichols, N.; Robinson, S. Deep learning for unsupervised insider threat
detection in structured cybersecurity data streams. In Proceedings of the Workshops at the Thirty-First
AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
63. Bohara, A.; Thakore, U.; Sanders, W.H. Intrusion detection in enterprise systems by combining and clustering
diverse monitor data. In Proceedings of the Symposium and Bootcamp on the Science of Security, Pittsburgh,
PA, USA, 19–21 April 2016; pp. 7–16.
64. Uwagbole, S.O.; Buchanan, W.J.; Fan, L. Applied machine learning predictive analytics to SQL injection
attack detection and prevention. In Proceedings of the 2017 IFIP/IEEE Symposium on Integrated Network
and Service Management (IM), Lisbon, Portugal, 8–12 May 2017; pp. 1087–1090.
65. Vartouni, A.M.; Kashi, S.S.; Teshnehlab, M. An anomaly detection method to detect web attacks using
Stacked Auto-Encoder. In Proceedings of the 2018 6th Iranian Joint Congress on Fuzzy and Intelligent
Systems (CFIS), Kerman, Iran, 28 February–2 March 2018; pp. 131–134.
66. Potluri, S.; Diedrich, C. Accelerated deep neural networks for enhanced Intrusion Detection System.
In Proceedings of the 2016 IEEE 21st International Conference on Emerging Technologies and Factory
Automation (ETFA), Berlin, Germany, 6–9 September 2016; pp. 1–8.
67. Pektaş, A.; Acarman, T. Deep learning to detect botnet via network flow summaries. Neural Comput. Appl.
2018, 1–13. [CrossRef]
68. Kim, J.; Shin, N.; Jo, S.Y.; Kim, S.H. Method of intrusion detection using deep neural network. In Proceedings
of the 2017 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju, Korea,
13–16 February 2017; pp. 313–316.
69. Shone, N.; Ngoc, T.N.; Phai, V.D.; Shi, Q. A deep learning approach to network intrusion detection.
IEEE Trans. Emerg. Top. Comput. Intell. 2018, 2, 41–50. [CrossRef]
70. Ammar, A. A decision tree classifier for intrusion detection priority tagging. J. Comput. Commun. 2015, 3, 52.
[CrossRef]
71. Patel, J.; Panchal, K. Effective intrusion detection system using data mining technique. J. Emerg. Technol.
Innov. Res. 2015, 2, 1869–1878.
72. Khamphakdee, N.; Benjamas, N.; Saiyod, S. Improving intrusion detection system based on snort rules for
network probe attacks detection with association rules technique of data mining. J. ICT Res. Appl. 2015,
8, 234–250. [CrossRef]
73. Shah, S.A.R.; Issac, B. Performance comparison of intrusion detection systems and application of machine
learning to Snort system. Future Gener. Comput. Syst. 2018, 80, 157–170. [CrossRef]
74. Fouladi, R.F.; Kayatas, C.E.; Anarim, E. Frequency based DDoS attack detection approach using naive Bayes
classification. In Proceedings of the 2016 39th International Conference on Telecommunications and Signal
Processing (TSP), Vienna, Austria, 27–29 June 2016; pp. 104–107.
75. Alkasassbeh, M.; Al-Naymat, G.; Hassanat, A.; Almseidin, M. Detecting distributed denial of service attacks
using data mining techniques. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 436–445. [CrossRef]
76. Niyaz, Q.; Sun, W.; Javaid, A.Y. A deep learning based DDoS detection system in software-defined
networking (SDN). arXiv 2016, arXiv:1611.07400.
77. Yadav, S.; Subramanian, S. Detection of Application Layer DDoS attack by feature learning using Stacked
AutoEncoder. In Proceedings of the 2016 International Conference on Computational Techniques in
Information and Communication Technologies (ICCTICT), New Delhi, India, 11–13 March 2016; pp. 361–366.
78. Nguyen, S.N.; Nguyen, V.Q.; Choi, J.; Kim, K. Design and implementation of intrusion detection system
using convolutional neural network for dos detection. In Proceedings of the 2nd International Conference
on Machine Learning and Soft Computing, Phu Quoc Island, Vietnam, 2–4 February 2018; pp. 34–38.
79. Bontemps, L.; McDermott, J.; Le-Khac, N.A. Collective anomaly detection based on long short-term memory
recurrent neural networks. In Proceedings of the International Conference on Future Data and Security
Engineering, Tho City, Vietnam, 23–25 November 2016; Springer: Cham, Switzerland, 2016; pp. 141–152.
80. Bapat, R.; Mandya, A.; Liu, X.; Abraham, B.; Brown, D.E.; Kang, H.; Veeraraghavan, M. Identifying malicious
botnet traffic using logistic regression. In Proceedings of the 2018 Systems and Information Engineering
Design Symposium (SIEDS), Charlottesville, VA, USA, 27 April 2018; pp. 266–271.
81. Abdelhamid, N.; Thabtah, F.; Abdel-jaber, H. Phishing detection: A recent intelligent machine learning
comparison based on models content and features. In Proceedings of the 2017 IEEE International Conference
on Intelligence and Security Informatics (ISI), Beijing, China, 22–24 July 2017; pp. 72–77.
82. Peng, K.; Leung, V.; Zheng, L.; Wang, S.; Huang, C.; Lin, T. Intrusion detection system based on decision tree
over big data in fog environment. Wirel. Commun. Mob. Comput. 2018, 2018, 4680867. [CrossRef]
83. He, Z.; Zhang, T.; Lee, R.B. Machine learning based DDoS attack detection from source side in cloud.
In Proceedings of the 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing
(CSCloud), New York, NY, USA , 26–28 June 2017; pp. 114–120.
84. Doshi, R.; Apthorpe, N.; Feamster, N. Machine learning ddos detection for consumer internet of things
devices. In Proceedings of the 2018 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA,
24 May 2018; pp. 29–35.
85. Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Shabtai, A.; Breitenbacher, D.; Elovici, Y. N-BaIoT—
Network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Comput. 2018,
17, 12–22. [CrossRef]
86. Diro, A.; Chilamkurti, N. Leveraging LSTM networks for attack detection in fog-to-things communications.
IEEE Commun. Mag. 2018, 56, 124–130. [CrossRef]
87. Foroutan, S.A.; Salmasi, F.R. Detection of false data injection attacks against state estimation in smart
grids based on a mixture Gaussian distribution learning method. IET Cyber-Phys. Syst. Theory Appl. 2017,
2, 161–171. [CrossRef]
88. He, Y.; Mendis, G.J.; Wei, J. Real-time detection of false data injection attacks in smart grid: A deep
learning-based intelligent mechanism. IEEE Trans. Smart Grid 2017, 8, 2505–2516. [CrossRef]
89. Jing, X.; Bi, Y.; Deng, H. An Innovative Two-Stage Fuzzy kNN-DST Classifier for Unknown Intrusion
Detection. Int. Arab. J. Inf. Technol. (IAJIT) 2016, 13, 359–366.
90. Farnaaz, N.; Jabbar, M. Random forest modeling for network intrusion detection system. Procedia Comput. Sci.
2016, 89, 213–217. [CrossRef]
91. Ravale, U.; Marathe, N.; Padiya, P. Feature selection based hybrid anomaly intrusion detection system using
K means and RBF kernel function. Procedia Comput. Sci. 2015, 45, 428–435. [CrossRef]
92. Jabbar, M.; Aluvalu, R.; Reddy, S. Cluster based ensemble classification for intrusion detection system.
In Proceedings of the 9th International Conference on Machine Learning and Computing, Singapore,
24–26 February 2017; pp. 253–257.
93. Guo, W.; Mu, D.; Xu, J.; Su, P.; Wang, G.; Xing, X. Lemna: Explaining deep learning based security
applications. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications
Security, Toronto, ON, Canada, 15 October 2018; pp. 364–379.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).