Ids If
Abstract. In recent years, there has been a sharp increase in the number
of attacks that cause major damage and financial losses for both educational
and business organizations. Intrusion detection systems (IDSs) play a key
role in ensuring network security. With the emergence of new types of
security threats, traditional IDSs based on pattern matching and signature
filtering are limited by their need for new/up-to-date attack patterns. To
tackle this issue, machine learning and deep learning (ML/DL) techniques
have been proposed in the literature to enhance the detection ability of
traditional IDSs. In this paper, we investigate a novel problem of using
unsupervised learning in the task of network intrusion detection in software
defined networks (SDN). In particular, we develop a novel outlier detection
method with Isolation Forest (IDS-IF) to effectively detect network
anomalies in SDN. Most of the existing unsupervised ML/DL techniques suffer
from high false positive rates since they consider any deviation from the
normal behavior as an intrusion. To alleviate this issue, IDS-IF isolates
intrusions instead of profiling normal data samples. The proposed solution
not only enhances the detection performance but also reduces the false
positive rate and the computational complexity. The experimental results
using the well-known public network security dataset KDD show that IDS-IF
outperforms a recent state-of-the-art outlier detection method (i.e., Local
Outlier Factor (LOF)) in terms of accuracy, F1 score, and false positive
rate, making it a promising method to cope with the new emerging security
threats in SDN.
1 Introduction
(e.g., security issues with Internet of things (IoT) devices and theft of sensitive
data). The new emerging security threats have been increasing in sophistication
and strength and are predicted to cause financial losses of about $20 billion
(USD) by 2021 [1] for both educational and business organizations. Also, the
recent emergence of IoT botnets (e.g., the Mirai botnet), as well as the rapid
growth in the number of insecure IoT devices, with an estimated 75 billion
connected devices by the end of 2025 [2], can provide attackers with more
sophisticated tools (e.g., botnet-as-a-service) to conduct large-scale and
devastating attacks.
To ensure the security of networks, Internet service providers (ISPs) make use
of firewalls to control/filter connections between the local network and the
Internet. Also, multiple security enforcement mechanisms such as anti-virus,
access control, and data encryption are used to protect the network from any
suspicious activity. However, it has been shown that these security measures
are not sufficient to fully protect the network against zero-day attacks.
To this end, in addition to these preventive security mechanisms, intrusion
detection systems (IDSs) are used to secure the network, effectively and in a
timely manner, against any type of suspicious/unauthorized activity that can
damage data integrity, data confidentiality, or data availability. IDSs play a
key role in ensuring the security of the network. IDSs can be categorized into
two categories: (1) Signature filtering-based Intrusion Detection Systems
(misuse-based detection, SFIDSs); and (2) Anomaly-based Intrusion Detection
Systems (A-IDSs). SFIDSs detect network anomalies by using pre-defined attack
patterns/signatures of well-known intrusions, while A-IDSs learn the normal
behavior of activities and consider any deviation as an intrusion. SFIDSs are
vulnerable to zero-day attacks and are limited by their need for
new/up-to-date attack patterns, while A-IDSs suffer from high false positive
rates.
Software defined networking (SDN) is a novel paradigm that leverages network
programmability to solve the limitations of conventional networks. Through a
logically centralized component, SDN provides new capabilities to cope with
the new emerging security threats, ranging from DDoS attacks to phishing to
data leakage [3–13].
Machine learning and deep learning (ML/DL) techniques have recently achieved
promising results in many fields [14–18]. To handle the task of intrusion
detection efficiently and in a timely manner, IDSs have adopted these ML/DL
techniques. ML/DL based IDSs can effectively detect existing and new network
security threats [19–21]. In this paper, we investigate a novel problem of
using unsupervised learning in the task of network anomaly/intrusion detection
in software defined networks (SDN). Most existing unsupervised ML/DL based
IDSs, such as clustering-based techniques (e.g., K-means [22]), try to find a
profile of similar/normal data samples, then classify dissimilar samples as
anomalies/intrusions. These techniques have two main drawbacks: (1) high false
positive rates, since they consider any deviation from the normal behavior as
an intrusion; and (2) high computational complexity, since they focus on
memorizing a large number of normal data sample patterns (i.e., hidden feature
learning). To
A novel unsupervised learning method for intrusion detection in SDN 3
alleviate these issues, we develop a novel outlier detection method with
Isolation Forest (IDS-IF) that can effectively detect network anomalies in SDN
while having a low false positive rate as well as low computational
complexity. To achieve this, IDS-IF isolates intrusions instead of profiling
normal data samples. Anomalies/intrusions are mostly few and rare/different
data samples (i.e., a minority in the dataset), which makes them susceptible
to isolation at a lower computational cost than profiling a large number of
normal data samples. IDS-IF not only enhances the detection performance but
also reduces the false positive rate and the computational complexity. The
experimental results using the well-known public network security dataset KDD
[23] show that IDS-IF outperforms a recent state-of-the-art outlier detection
method (i.e., Local Outlier Factor (LOF)) in terms of accuracy, F1 score, and
false positive rate, making it a promising method to cope with the new
emerging security threats in SDN.
The main contributions of this paper can be summarized as follows:
2 Related Works
The new emerging security threats are becoming more devastating; several state-
of-the-art works have integrated supervised and unsupervised ML/DL techniques
to improve the efficiency of traditional IDSs to cope with these attacks. In the
following, we overview the most representative ML/DL based IDSs as well as
their security issues.
Ruoning et al. [24] proposed a novel real-time intrusion detection scheme that
uses a dynamic cumulative-distance anomaly detection algorithm (i.e., k-nearest
neighbors (k-NN)). Their proposed architecture consisted of a distributed data
processing platform that uses Flume for reliable log data aggregation and
collection, and Storm for distributed and reliable stream processing. The
effectiveness of the proposed scheme was evaluated using a real-world dataset.
4 Zakaria Abou El Houda, Abdelhakim Senhaji Hafid, and Lyes Khoukhi
The experimental results showed that this algorithm is suitable for real-time
network anomaly detection in high-speed networks.
Yang et al. [25] developed a new framework that uses the support vector
machine (SVM) method to detect and mitigate network anomalies in an SDN
environment. This framework consisted of three modules: (1) a traffic
collection module, to extract network traffic features/characteristics and
prepare them for the network anomaly identification module; (2) a network
anomaly identification module, to perform the classification and to identify
anomalies using the SVM method; and (3) a flow table delivery module, to
dynamically adjust OpenFlow (OF) rules according to the output of the network
anomaly identification module. The effectiveness of the proposed framework was
tested and evaluated using the KDD’99 dataset.
Majjed et al. [26] designed an effective DL framework, self-taught learning
(STL-IDS), that uses a sparse autoencoder (SAE) along with a support vector
machines (SVM) method to detect and mitigate network anomalies. STL-IDS
uses a feature selection method and a dimensionality reduction scheme to reduce
training time complexity while improving the prediction accuracy using the SVM
algorithm.
Taher et al. [27] proposed a novel supervised ML method that uses an
Artificial Neural Network (ANN) along with a feature selection method to
detect and mitigate network anomalies. The authors have shown that ANN with
feature selection outperforms SVM with respect to intrusion detection rate.
The effectiveness of the proposed framework was tested and evaluated using the
NSL-KDD dataset.
Yin et al. [28] proposed a novel deep learning scheme (RNN-IDS) that uses
recurrent neural networks for network intrusion detection. The authors have
studied the performance of RNN-IDS in binary and multi-class classification
using the NSL-KDD dataset. The experimental results showed that RNN-IDS
outperforms shallow ML models with respect to detection accuracy.
Wang et al. [29] proposed a hierarchical spatial-temporal feature-based IDS
called HAST-IDS. HAST-IDS has two main stages: (1) it uses deep convolutional
neural networks (CNNs) to learn the low-level spatial features of network
traffic; and (2) it uses long short-term memory (LSTM) networks to learn
high-level temporal features. The effectiveness of the proposed framework was
tested and evaluated using the standard DARPA and ISCX2012 datasets.
Tuan et al. [30] proposed a light-weight unsupervised learning scheme based
on the Local Outlier Factor (LoF) algorithm to detect and mitigate network
anomalies (e.g., DDoS attacks) in an SDN environment. LoF measures the local
deviation of a given data sample with respect to its neighbors (i.e., local
density). The proposed solution requires minimal network resources and
achieved promising results using the CAIDA dataset.
Gao et al. [31] proposed an adaptive ensemble learning method that combines
multiple ML models (i.e., Decision Trees (DT), support vector machine
(SVM), logistic regression (LR), k-nearest neighbors (KNN), Adaboost, random
forests (RF), and deep neural networks) to increase the detection rate of
shallow ML models. Also, they have proposed an ensemble adaptive voting
algorithm. The effectiveness of the proposed framework was tested and
evaluated using the NSL-KDD dataset.
Based on our analysis of existing works [24–30], we found that a number of
these schemes [27–30] are computationally expensive. Also, most of them suffer
from high false positive rates since they consider any deviation from the
normal behavior as an intrusion. To address the shortcomings of the existing
solutions [24–30], we propose a novel outlier detection method with Isolation
Forest (IDS-IF) to effectively detect network anomalies in SDN. IDS-IF
isolates intrusions instead of profiling normal data samples; it does not use
any computationally expensive measure (i.e., density measure, distance
measure) to detect intrusions. Also, IDS-IF can handle large and extremely
high-dimensional problems. IDS-IF not only enhances the detection performance
but also reduces the false positive rate and the computational complexity. The
experimental results using the well-known public network security dataset KDD
[23] show that IDS-IF outperforms a recent state-of-the-art outlier detection
method [30] in terms of accuracy, F1 score, and false positive rate, making it
a promising method to cope with the new emerging security threats in SDN.
3 IDS-IF: An Overview
This section presents an overview of IDS-IF. When designing IDS-IF, we
considered the following goals/objectives. First, IDS-IF should protect
against the new emerging security threats. Unlike existing ML/DL based IDSs
[24–30], which try to find a profile of similar/normal data samples and then
classify dissimilar samples as anomalies/intrusions, IDS-IF aims to isolate
intrusions instead of profiling normal data samples. Anomalies/intrusions are
mostly few and rare/different data samples, which makes them susceptible to
isolation at a lower computational cost than profiling the large number of
normal data samples. Then, these anomalies/intrusions should be detected and
mitigated effectively and in a timely manner, using an OpenFlow (OF) security
policy, and the overall system has to be as secure as possible.
Fig. 1 shows the architecture of IDS-IF. IDS-IF has two phases: (1) a novel
outlier detection method with Isolation Forest to effectively detect network
anomalies in SDN; this method is implemented on the application plane (i.e.,
on top of the SDN controller); and (2) a security policy mitigation scheme to
mitigate the detected network anomalies effectively and in a timely manner.
The Northbound API (i.e., REST API) is used in the detection/mitigation
process to offer interoperability with any type of SDN controller (e.g., Ryu
OpenFlow controller [32], Floodlight OpenFlow controller [33]).
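As an illustration of the mitigation phase, the following sketch builds the body of a drop rule that could be pushed through a controller's REST interface. It assumes a Ryu controller running its ofctl_rest application, which accepts flow entries via an HTTP POST to /stats/flowentry/add; the function name, switch identifier, source address, priority, and timeout are hypothetical values chosen for illustration and do not come from the paper.

```python
import json

def build_drop_rule(dpid, src_ip, priority=100, hard_timeout=60):
    """Build a flow entry that drops IPv4 traffic from a flagged source.

    All parameter values are illustrative assumptions, not the paper's policy.
    """
    return {
        "dpid": dpid,                      # datapath (switch) identifier
        "priority": priority,              # should exceed the normal forwarding rules
        "hard_timeout": hard_timeout,      # rule expires after this many seconds
        "match": {
            "eth_type": 0x0800,            # IPv4, required when matching on ipv4_src
            "ipv4_src": src_ip,
        },
        "actions": [],                     # empty action list: matching packets are dropped
    }

# Serialized body that would be sent to the controller's REST endpoint.
rule = build_drop_rule(dpid=1, src_ip="10.0.0.5")
payload = json.dumps(rule)
```

The sketch only prepares the request body; sending it (and choosing which flows to block) depends on the controller deployment and on the output of the detection phase.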
4 IDS-IF
In this section, we describe in more detail IDS-IF; in particular, we describe how
it effectively isolates anomalies without normal data sample profiling.
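To make the isolation idea concrete, here is a minimal pure-Python sketch of the general Isolation Forest technique: random trees split the data on a random feature at a random threshold, and samples that are separated after few splits receive a high anomaly score. It is an illustrative sketch, not the implementation evaluated in this paper; the function names, the number of trees, the subsample size, and the toy data are all assumptions.

```python
import math
import random
from statistics import median

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:  # simplification: stop if the chosen feature is constant
        return {"size": len(points)}
    split = rng.uniform(low, high)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(point, node, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    if "feature" not in node:
        return depth + average_path_length(node["size"])
    side = "left" if point[node["feature"]] < node["split"] else "right"
    return path_length(point, node[side], depth + 1)

def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Return one score in (0, 1] per sample; higher means easier to isolate."""
    if len(data) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(sample_size)
    scores = []
    for point in data:
        mean_depth = sum(path_length(point, tree) for tree in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores

# Toy data (assumed): 200 "normal" samples around the origin and 2 distant outliers.
demo_rng = random.Random(1)
normal = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
outliers = [(8.0, 8.0), (-9.0, 7.5)]
scores = anomaly_scores(normal + outliers)
```

No distance or density is computed anywhere in the sketch: each tree only compares one feature value against a threshold, which is why isolation is cheap compared with profiling the normal samples.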
Fig. 1. Architecture of IDS-IF (figure not reproduced; labels: IDS-IF, Security Action and Policy, AS A, AS B, AS C).
to decrease the entropy from the top of the tree (i.e., root node) to the bottom
of the tree (i.e., leaf node). IG is defined as follows:
\[
IG = -\sum_{j=1}^{N} p_j \log(p_j) \tag{1}
\]
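The quantity in Eq. (1) can be evaluated as in the short sketch below. Because the definition of p_j is not available in this part of the text, the sketch assumes that p_j is the fraction of samples belonging to the j-th of N distinct values, and it assumes the natural logarithm; the function name is hypothetical.

```python
import math
from collections import Counter

def entropy_term(labels):
    """Evaluate -sum_j p_j * log(p_j) over the distinct values in `labels`.

    Assumption: p_j is the empirical fraction of samples with the j-th value.
    """
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

pure = entropy_term(["normal", "normal", "normal"])   # a single value
mixed = entropy_term(["normal", "attack"])            # two equally likely values
```

Under these assumptions the value is zero for a node that contains a single class and grows as the classes become more evenly mixed.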
\[
X_i' = \frac{X_i - \mathrm{Mean}(X_i)}{\mathrm{stdev}(X_i)} \tag{5}
\]
Label Binary
Normal 0
DoS 1
Probe 1
R2L 1
U2R 1
where $X_i$ denotes a data input feature (e.g., flag), and $\mathrm{Mean}(X_i)$
and $\mathrm{stdev}(X_i)$ denote, respectively, the mean and standard deviation
of that feature.
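The standardization in Eq. (5) and the binary label mapping listed above can be sketched as follows. The function names are hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator is used), and the sketch expects features that are already numeric.

```python
from statistics import mean, pstdev

def standardize(column):
    """Apply Eq. (5) to one numeric feature column."""
    mu = mean(column)
    sigma = pstdev(column)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in column]  # constant feature carries no information
    return [(x - mu) / sigma for x in column]

# Binary mapping from the label table: normal traffic is 0, every attack class is 1.
BINARY_LABEL = {"Normal": 0, "DoS": 1, "Probe": 1, "R2L": 1, "U2R": 1}

def binarize(labels):
    """Map the five traffic classes to the binary normal/intrusion label."""
    return [BINARY_LABEL[label] for label in labels]

z = standardize([2.0, 4.0, 6.0])
y = binarize(["Normal", "DoS", "U2R"])
```

After this step every feature has zero mean and unit standard deviation, so no single feature dominates because of its scale.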
\[
F1 = \frac{2}{\frac{1}{Precision} + \frac{1}{Recall}} \tag{9}
\]
\[
FPR = \frac{FP}{TN + FP} \tag{10}
\]
where TP (True Positives) represent anomalies/intrusions that are correctly
identified/classified as intrusions, FN (False Negatives) represent anomalies/intrusions
that are classified as normal data samples, FP (False Positives) represent nor-
mal data samples that are identified/classified as anomalies/intrusions, and TN
(True Negatives) represent normal data samples that are identified/classified as
normal data samples.
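Given the four counts defined above, the metrics can be computed as in the sketch below. F1 and FPR follow Eqs. (9) and (10); accuracy, precision, and recall use their standard definitions, since the corresponding equations are not reproduced in this part of the text. The function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Eq. (9): harmonic mean of precision and recall.
    f1 = 2 / (1 / precision + 1 / recall) if precision and recall else 0.0
    # Eq. (10): share of normal samples flagged as intrusions.
    fpr = fp / (tn + fp) if tn + fp else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr}

# Illustrative counts only, not results from the paper.
m = classification_metrics(tp=8, fp=2, tn=85, fn=5)
```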
To test the effectiveness of IDS-IF, we defined two scenarios that are
summarized in Table 5. For scenario 1, we consider a binary classification
using the SA dataset, which contains 1% of abnormal data from the KDD dataset.
For scenario 2, we consider a binary classification using the SF dataset,
which contains 0.3% of abnormal data from the KDD dataset.
We compare the performance of IDS-IF with a recent state-of-the-art outlier
detection method [30] in terms of accuracy, precision, recall, and F1 score
in both scenarios. Higher values of these performance metrics indicate a
better classification model.
Figs. 2 and 3 show the confusion matrices on the KDD dataset for scenario
1 and scenario 2, respectively. In scenario 1, IDS-IF achieves 89%, 97%, 88%,
94%, and 92% in accuracy, precision, recall, AUC, and F1 score, respectively,
while LoF achieves 81%, 93%, 82%, 46%, and 87% in accuracy, precision, recall,
AUC, and F1 score, respectively. In scenario 2, IDS-IF achieves 89%, 96%, 88%,
85%, and 91% in accuracy, precision, recall, AUC, and F1 score, respectively,
while LoF achieves 81%, 91%, 81%, 47%, and 87% in accuracy, precision, recall,
AUC, and F1 score, respectively. Table 6 shows the performance metrics of
IDS-IF and LoF on the KDD dataset. We observe that in both scenarios, IDS-IF
achieves higher accuracy, precision, recall, AUC, and F1 score than LoF,
making it a promising method to mitigate the new emerging threats in SDN
environments.
Fig. 2. Confusion matrices for scenario 1 on KDD dataset for: a) IDS-IF; and b) LoF.
5 Conclusion
In this paper, we proposed a novel outlier detection method with Isolation
Forest (IDS-IF) to effectively detect network anomalies in SDN. The
experimental results using the well-known public network security dataset KDD
showed that IDS-IF outperforms a recent state-of-the-art outlier detection
method (i.e., Local
Fig. 3. Confusion matrices for scenario 2 on KDD dataset for: a) IDS-IF; and b) LoF.
References
1. S. Morgan, “Global ransomware damage costs predicted to reach $20 billion (usd)
by 2021.” [Online]. Available: https://cybersecurityventures.com/
2. L. Horwitz, “The future of iot miniguide: The burgeoning iot market
continues.” [Online]. Available:
https://www.cisco.com/c/en/us/solutions/internet-of-things/future-of-iot.html
3. S. Scott-Hayward, S. Natarajan, and S. Sezer, “A survey of security in software
defined networks,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp.
623–654, Firstquarter 2016.
4. Z. A. El Houda, L. Khoukhi, and A. Hafid, “Chainsecure - a scalable and proactive
solution for protecting blockchain applications using sdn,” in 2018 IEEE Global
Communications Conference (GLOBECOM), 2018, pp. 1–6.
5. D. B. Rawat and S. R. Reddy, “Software defined networking architecture, security
and energy efficiency: A survey,” IEEE Communications Surveys & Tutorials, vol. 19,
no. 1, pp. 325–346, Firstquarter 2017.
6. Z. A. El Houda, A. Hafid, and L. Khoukhi, “Co-iot: A collaborative ddos mitigation
scheme in iot environment based on blockchain using sdn,” in 2019 IEEE Global
Communications Conference (GLOBECOM), 2019, pp. 1–6.
7. D. Zhou, Z. Yan, G. Liu, and M. Atiquzzaman, “An adaptive network data collec-
tion system in sdn,” IEEE Transactions on Cognitive Communications and Net-
working, vol. 6, no. 2, pp. 562–574, 2020.
25. L. Yang and H. Zhao, “Ddos attack identification and defense using sdn based on
machine learning method,” in 2018 15th International Symposium on Pervasive
Systems, Algorithms and Networks (I-SPAN), 2018, pp. 174–178.
26. M. Al-Qatf, Y. Lasheng, M. Al-Habib, and K. Al-Sabahi, “Deep learning approach
combining sparse autoencoder with svm for network intrusion detection,” IEEE
Access, vol. 6, pp. 52 843–52 856, 2018.
27. K. A. Taher, B. Mohammed Yasin Jisan, and M. M. Rahman, “Network intrusion
detection using supervised machine learning technique with feature selection,” in
2019 International Conference on Robotics, Electrical and Signal Processing
Techniques (ICREST), 2019, pp. 643–646.
28. C. Yin, Y. Zhu, J. Fei, and X. He, “A deep learning approach for intrusion detection
using recurrent neural networks,” IEEE Access, vol. 5, pp. 21 954–21 961, 2017.
29. W. Wang, Y. Sheng, J. Wang, X. Zeng, X. Ye, Y. Huang, and M. Zhu, “Hast-
ids: Learning hierarchical spatial-temporal features using deep neural networks to
improve intrusion detection,” IEEE Access, vol. 6, pp. 1792–1806, 2018.
30. N. N. Tuan, N. Danh Nghia, P. H. Hung, D. Khac Tuyen, N. M. Hieu, N. Tai Hung,
and N. H. Thanh, “An abnormal network traffic detection scheme using local outlier
factor in sdn,” in 2020 IEEE Eighth International Conference on Communications
and Electronics (ICCE), 2021, pp. 141–146.
31. X. Gao, C. Shan, C. Hu, Z. Niu, and Z. Liu, “An adaptive ensemble machine
learning model for intrusion detection,” IEEE Access, vol. 7, pp. 82 512–82 521,
2019.
32. “Ryu controller.” [Online]. Available:
https://ryu.readthedocs.io/en/latest/library.html
33. “Floodlight openflow controller.” [Online]. Available:
https://floodlight.atlassian.net/wiki/spaces/HOME/overview
34. “Scikit-learn: Machine learning in python,” Journal of Machine Learning
Research, vol. 12, no. 85, pp. 2825–2830, 2011. [Online]. Available:
https://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf
35. “Google colaboratory.” [Online]. Available: https://colab.research.google.com/