DDoS Detection Approach Based on Continual Learning in the SDN Environment

Ameni Chetouane¹ (corresponding author) and Kamel Karoui¹,²

¹ RIADI Laboratory, ENSI, University of Manouba, Manouba, Tunisia
² National Institute of Applied Sciences and Technology, University of Carthage, Carthage, Tunisia

Abstract. Software Defined Networking (SDN) is a technology that has the
capacity to revolutionize the way we develop and operate network
infrastructure. It separates control and data functions and can be
programmed directly using a high-level programming language. However, given
the existing and growing security risks, this technology introduces a new
security burden into the network architecture. Intruders have more access to
the network and can develop various attacks in the SDN environment. In
addition, modern cyber threats are developing faster than ever. Distributed
Denial of Service (DDoS) attacks are the major security risk in the SDN
architecture. They attempt to interfere with network services by consuming
all available bandwidth and other network resources. In order to provide a
network with countermeasures against attacks, an Intrusion Detection System
(IDS) must be continually evolved and integrated into the SDN architecture.
In this paper, we focus on Continual Learning (CL) for DDoS detection in the
context of SDN. We propose a method of continually enriching datasets in
order to have a better prediction model. This is done without interrupting
the normal operation of the DDoS detection system.

Keywords: Software Defined Networking (SDN) · Network security · Security
threats · DDoS · Machine Learning (ML) · Continual Learning (CL)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Abraham et al. (Eds.): HIS 2022, LNNS 647, pp. 1199–1208, 2023.
https://doi.org/10.1007/978-3-031-27409-1_110

1 Introduction
Over the past several decades, traditional network architecture has remained
largely unchanged and has shown some limitations. Software Defined Networking
(SDN) is an open network design that has been proposed to address some of the
key flaws of traditional networks [1]. According to SDN proponents, network
control logic and network operations are two separate concepts that should be
split into layers. SDN therefore introduced the control plane and data plane
concepts: the centralized control plane manages network logic and traffic
engineering operations, whereas the data plane only handles packet transfer
among networks [2]. The characteristics of SDN, such as logically centralized
control, global network awareness, and dynamic updating of forwarding rules,
make it easy to identify and respond to attacks on the network. However,
because the control and data layers are separated, new attack opportunities
arise, and the SDN can become the target of various attacks such as
Distributed Denial of Service (DDoS) [3]. These attacks are designed to
cripple networks by flooding cables, network devices, and servers with
unauthorized traffic. Several DDoS attacks have occurred, resulting in
downtime and financial losses [4]. Therefore, an Intrusion Detection System
(IDS) must be integrated into the SDN environment. It examines network data,
analyzes it, and looks for anomalies or unwanted access [5]. For the past few
years, IDSs based on Machine Learning (ML) have been on the rise. However, the
results of the different ML methods depend highly on the dataset. A number of
public datasets have been used, including NSL-KDD [6]. However, before using
these datasets to train an ML intrusion detection model, the authors do not
consider the quality of the datasets. These datasets are also outdated and
are not specific to the SDN environment. In addition, one of the most
challenging aspects of cybersecurity is the changing nature of security
dangers [7]. New attack vectors emerge as new technologies are developed and
exploited in novel or unconventional ways. All cybersecurity components must
therefore be continually updated to guard against potential vulnerabilities.
In this paper, we propose a method for detecting DDoS in the SDN environment
based on Continual Learning (CL). The majority of CL research focuses on
computer vision and natural language processing, while the network anomaly
detection domain has received less attention [8]. The contributions of this
paper include:

– A CL system to detect DDoS in the SDN environment based on dataset
enrichment. This is accomplished without interfering with the detection
system's normal operation.
– Three metrics to verify the usefulness of the new dataset in terms of
quality, quantity, and representativity.

The remainder of the paper is organised as follows. The related works are
presented in Sect. 2. The proposed system is described in Sect. 3. In Sect. 4,
we present the case study. Section 5 concludes the paper and presents future
work.

2 Related Works

DDoS attacks are one of the most serious risks in SDN [9]. Several ML
approaches to detect DDoS in SDN have been tried and tested. In [10], the
authors proposed a method to detect DDoS in SDN based on ML. They evaluated
different feature selection methods. The best features are selected based on
the performance of the SDN controller and the classification accuracy of the
machine learning approaches. A comparison of feature selection and ML methods
for identifying SDN attacks was also carried out. The experimental results
show that the most accurate model is obtained by training the Random Forest
(RF) method on features selected with Recursive Feature Elimination (RFE),
with an accuracy rate of 99.97%. Ashodia et al. [11] suggested an ML technique
to detect DDoS in SDN that combines Naive Bayes (NB), Decision Trees (DT),
K-Nearest Neighbors (KNN), Logistic Regression (LR), and Random Forest (RF).
The experimental results demonstrate that the Decision Tree and Random Forest
algorithms offer superior accuracy and decision rates in comparison with the
other algorithms. The authors in [12] used various machine learning techniques
such as DT, NB, and LR for DDoS detection in SDN. The proposed method includes
different steps such as data preprocessing and data classification using ML
classifiers. The algorithm with the best results was DT, which had an accuracy
rate of 99.90%. The authors in [6] employed Decision Tree (DT) and Support
Vector Machine (SVM) techniques for DDoS detection in SDN. The authors
identified and selected crucial features for detection. The dataset is then
forwarded to the SVM classifier and the DT module. The classifiers classify
the traffic into two categories, attack and normal, according to the flag
value (0 or 1). When a DDoS attack is detected by the SVM and DT classifiers,
the controller broadcasts the forwarding table to handle the payload.
Otherwise, the controller chooses the route for the regular traffic packets.
According to the experiments, SVM performs better in a simulated environment
than the decision tree.

3 Proposed System
CL brings together research and methods that deal with the issue of learning
when the distribution of the data changes over time and knowledge fusion over
limitless data streams must be considered [13]. In a previous work, we
evaluated the performance of various ML approaches for DDoS detection in the
SDN environment. We compared various methods, such as DT, RF, NB, SVM, and
KNN. These methods are commonly used for DDoS detection in SDNs and perform
well with high accuracy [14]. We found that the RF method performed better
than the other methods. Therefore, we try to enhance the learning process of
this method for DDoS detection in SDN. Our goal is to provide our model with
new predictive capabilities without forgetting what has been learned
previously. We propose a method for continual dataset enrichment and
deployment of a new model whenever we have a better predictor model. This is
done without interrupting the detection system's operation. The flowchart of
the CL process is presented in Fig. 1.
Before explaining the different steps of the proposed system, we present the
notation that will be used.

Fig. 1. The Continual Learning process.

3.1 Notation

– $P = \{p_k\}$: the security policy of the institution. It gathers the types
of attacks that the institution would like to protect itself against. This
set is chosen by the security administrators of the institution.
– $D_i$: the initial dataset.
– $D_i.\mathrm{type}$: the set of types of attacks present in $D_i$.
– $D_i.\mathrm{dat}$: the data present in $D_i$.
– $D_i^{+}$: the newly generated dataset.
– $D_i^{+}.\mathrm{type}$: the set of types of intrusions present in $D_i^{+}$.
– $D_i^{+}.\mathrm{dat}$: the data present in $D_i^{+}$.
– $D_{i+1}$: the new dataset obtained by combining $D_i$ and $D_i^{+}$.
– $D_{i+1}.\mathrm{type}$: the set of types of intrusions of the new dataset,
obtained by combining $D_i.\mathrm{type}$ and $D_i^{+}.\mathrm{type}$.
– $D_{i+1}.\mathrm{dat}$: the data present in $D_{i+1}$.
– $D_{i+1}^{\mathrm{Diff}}.\mathrm{type} = D_i^{+}.\mathrm{type} \setminus D_i.\mathrm{type}$:
the difference between $D_i^{+}.\mathrm{type}$ and $D_i.\mathrm{type}$. It
includes the attack types that belong to $D_i^{+}.\mathrm{type}$ and do not
belong to $D_i.\mathrm{type}$. This set is used to display the new attack
types generated in $D_i^{+}$.
– $D_{i+1}^{\mathrm{Union}}.\mathrm{type} = D_i.\mathrm{type} \cup D_i^{+}.\mathrm{type}$:
the union of $D_i.\mathrm{type}$ and $D_i^{+}.\mathrm{type}$. It includes the
types of attacks that belong to $D_i.\mathrm{type}$ or $D_i^{+}.\mathrm{type}$.
– $D_{i+1}^{\mathrm{Inter}}.\mathrm{type} = P \cap D_{i+1}.\mathrm{type}$: the
intersection of $P$ and $D_{i+1}.\mathrm{type}$. It includes the types of
attacks that belong to both $P$ and $D_{i+1}.\mathrm{type}$.
– $D_{i+1}^{\mathrm{Diff}}.\mathrm{dat} = D_i^{+}.\mathrm{dat} \setminus D_i.\mathrm{dat}$:
the difference between $D_i^{+}.\mathrm{dat}$ and $D_i.\mathrm{dat}$. It
includes the data that belong to $D_i^{+}.\mathrm{dat}$ and do not belong to
$D_i.\mathrm{dat}$. This set is used to display the new data generated in
$D_i^{+}$.
– $D_{i+1}^{\mathrm{Union}}.\mathrm{dat} = D_i.\mathrm{dat} \cup D_i^{+}.\mathrm{dat}$:
the union of $D_i.\mathrm{dat}$ and $D_i^{+}.\mathrm{dat}$. It includes the
data that belong to $D_i.\mathrm{dat}$ or $D_i^{+}.\mathrm{dat}$.
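
The type-level sets in this notation correspond to ordinary set operations. The following minimal Python sketch illustrates them, assuming each dataset is summarised by a set of attack-type labels. The variable names and the labels "A" to "D" are hypothetical placeholders, not part of the original notation.

```python
# Illustrative sketch only: the labels "A".."D" are hypothetical attack types.
policy = {"A", "B", "C", "D"}        # P, the security policy
initial_types = {"A", "B"}           # types in the initial dataset
generated_types = {"B", "C"}         # types in the newly generated dataset

# Types of the combined dataset (initial plus generated).
combined_types = initial_types | generated_types

# Difference: types present in the generated dataset but not in the initial one.
diff_types = generated_types - initial_types

# Union of the two type sets (identical to combined_types by construction).
union_types = initial_types | generated_types

# Intersection of the policy with the types of the combined dataset.
inter_types = policy & combined_types

print(sorted(diff_types))    # new attack types contributed by the generated dataset
print(sorted(inter_types))   # policy attack types covered by the combined dataset
```

The data-level sets follow the same pattern, with records in place of labels.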

3.2 Dataset Creation


In order to achieve CL, we propose to enrich a selected dataset $D_i$. We
create a new dataset $D_i^{+}$ by generating new DDoS traffic based on the
attack types present in the security policy $P$. This is done without
interrupting the detection system in operation. We generate DDoS traffic
between hosts and collect the traffic statistics from the switches. The
generated DDoS traffic is new and is not included in the selected dataset
$D_i$. We then place the collected traffic statistics into the dataset
$D_i^{+}$ and combine the two datasets to obtain the new dataset $D_{i+1}$.
We propose a method to check whether this dataset is useful or not. After
checking the usefulness of $D_{i+1}$, we train the ML model with this new
dataset. Once the ML model is selected and trained, it is placed in the SDN
architecture. In addition, external SDN-based public datasets available
online can be used to enrich the initial dataset.
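
The combination step can be sketched as follows. This is only an illustration under stated assumptions: records are represented as dictionaries, every record of a dataset has the same columns, and only the columns shared by both datasets are kept. The text does not describe how differing feature sets are aligned, so this choice, the function name `combine_datasets`, and the column names are hypothetical.

```python
def combine_datasets(initial, generated):
    """Return the enriched dataset: all records, limited to shared columns.

    Assumption: each dataset is a list of dicts with uniform keys.
    """
    if not initial or not generated:
        # Nothing to align when one side is empty.
        return [dict(row) for row in initial + generated]
    shared = sorted(set(initial[0]) & set(generated[0]))
    return [{col: row[col] for col in shared} for row in initial + generated]


# Hypothetical records; the column names are placeholders.
initial = [
    {"protocol": "UDP", "pktcount": 4500, "duration": 10, "label": 1},
    {"protocol": "TCP", "pktcount": 120, "duration": 8, "label": 0},
]
generated = [
    {"protocol": "TCP", "pktcount": 9000, "label": 1},
]

enriched = combine_datasets(initial, generated)
print(len(enriched), sorted(enriched[0]))
```

Other alignment strategies, such as filling missing features with default values, would be equally compatible with the description above.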

3.3 Dataset Effectiveness


After combining the two datasets, we propose a method based on metrics to
determine the effectiveness of the new dataset $D_{i+1}$ in terms of quality,
quantity, and representativity. In the first step, we focus on the quality of
the new dataset, which in our case is represented by the types of attacks. We
present a metric called quality, $\mathrm{qual}(D_{i+1}.\mathrm{type})$, to
verify the effectiveness of $D_{i+1}.\mathrm{type}$. The metric determines
whether the dataset $D_{i+1}$ obtained by combining the two datasets is
enriched with respect to $D_i$ in terms of attack types; in other words,
whether the combination is able to handle new types of attacks. The metric is
calculated as follows:

$$\mathrm{qual}(D_{i+1}.\mathrm{type}) = \frac{|D_{i+1}^{\mathrm{Diff}}.\mathrm{type}|}{|D_{i+1}^{\mathrm{Union}}.\mathrm{type}|}, \qquad 0 \le \mathrm{qual}(D_{i+1}.\mathrm{type}) \le 1 \tag{1}$$

– where $|D_{i+1}^{\mathrm{Diff}}.\mathrm{type}|$ is the number of elements of
$D_{i+1}^{\mathrm{Diff}}.\mathrm{type}$ and
$|D_{i+1}^{\mathrm{Union}}.\mathrm{type}|$ is the number of elements of
$D_{i+1}^{\mathrm{Union}}.\mathrm{type}$.

For the effectiveness of $D_{i+1}$ in terms of quantity, we propose a metric
called quantity, $\mathrm{quan}(D_{i+1}.\mathrm{dat})$, that measures the
share of occurrences of the new attack types in the new dataset $D_{i+1}$:

$$\mathrm{quan}(D_{i+1}.\mathrm{dat}) = \frac{|D_{i+1}^{\mathrm{Diff}}.\mathrm{dat}|}{|D_{i+1}^{\mathrm{Union}}.\mathrm{dat}|}, \qquad 0 \le \mathrm{quan}(D_{i+1}.\mathrm{dat}) \le 1 \tag{2}$$

We also provide another metric called representativity,
$\mathrm{rep}(D_{i+1}.\mathrm{type})$, to assess how representative the new
dataset $D_{i+1}$ is with respect to all the attack types in $P$. The metric
is calculated as follows:

$$\mathrm{rep}(D_{i+1}.\mathrm{type}) = \frac{|D_{i+1}^{\mathrm{Inter}}.\mathrm{type}|}{|P|}, \qquad 0 \le \mathrm{rep}(D_{i+1}.\mathrm{type}) \le 1 \tag{3}$$

– where $|D_{i+1}^{\mathrm{Inter}}.\mathrm{type}|$ is the number of elements
in $D_{i+1}^{\mathrm{Inter}}.\mathrm{type}$ and $|P|$ is the number of
elements in $P$.
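
The three metrics defined above can be sketched in a few lines of Python. The sketch assumes that attack types are given as sets of labels and that data records are given as sets of hashable record identifiers. The function names mirror the metric names, and the example values are hypothetical.

```python
def qual(initial_types, generated_types):
    """Eq. (1): share of new attack types among all types of both datasets."""
    diff = generated_types - initial_types
    union = initial_types | generated_types
    return len(diff) / len(union) if union else 0.0


def quan(initial_data, generated_data):
    """Eq. (2): share of new records among all records of both datasets."""
    diff = generated_data - initial_data
    union = initial_data | generated_data
    return len(diff) / len(union) if union else 0.0


def rep(policy, combined_types):
    """Eq. (3): share of the policy's attack types covered by the new dataset."""
    return len(policy & combined_types) / len(policy) if policy else 0.0


# Hypothetical example: labels and record identifiers are placeholders.
initial_types, generated_types = {"A", "B", "C"}, {"C", "D"}
initial_data, generated_data = {1, 2, 3}, {4, 5}
policy = {"A", "B", "C", "D", "E"}

print(qual(initial_types, generated_types))           # 1 new type out of 4
print(quan(initial_data, generated_data))             # 2 new records out of 5
print(rep(policy, initial_types | generated_types))   # 4 policy types out of 5
```

Each function returns a value in the interval [0, 1], as required by Eqs. (1) to (3). Returning 0 for empty inputs is a convention chosen for the sketch.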
After calculating the different metrics, we move on to the next step, the
evaluation of the obtained values, which are considered to be decision
values. We use the method presented in [15] for evaluating the values of
decision-making attributes. The author proposed two approaches for
aggregating attribute values based on two levels of classification:
individual attribute classification and global classification. The measures
are aggregated into a single measure that is a good indicator for making a
decision. The obtained measure is reversible. We use both levels of
classification. We start with the classification of each metric value: we
associate with each metric value ($\mathrm{qual}(D_{i+1}.\mathrm{type})$,
$\mathrm{quan}(D_{i+1}.\mathrm{dat})$, $\mathrm{rep}(D_{i+1}.\mathrm{type})$)
a binary value based on the intervals presented in Table 1.

Table 1. Individual classification of metric values

Class       Condition                      Associated binary value
Low         0 ≤ metric value < 0.25        00
Medium      0.25 ≤ metric value < 0.5      01
High        0.5 ≤ metric value < 0.75      10
Excellent   0.75 ≤ metric value < 1        11
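
The individual classification of Table 1 can be sketched as a simple lookup. The function name `classify` is hypothetical. Table 1 lists the upper bound of the Excellent class as strictly below 1, while the metrics may reach 1, so the sketch assumes that a value of exactly 1 is also classified as Excellent.

```python
def classify(value):
    """Map a metric value in [0, 1] to its Table 1 class and 2-bit code."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("metric value must lie in [0, 1]")
    if value < 0.25:
        return "Low", "00"
    if value < 0.5:
        return "Medium", "01"
    if value < 0.75:
        return "High", "10"
    # Assumption: a value of exactly 1 is treated as Excellent as well.
    return "Excellent", "11"


for value in (0.1, 0.25, 0.6, 0.75):
    print(value, classify(value))
```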

Then, we use the bit alternation method for the global classification, which
allows constructing a single metric for decision making [15]. Before
alternating the individual classes of each metric, we order the metrics by
importance. For example, consider three factors
$\mathrm{qual}(D_{i+1}.\mathrm{type})$ = '10',
$\mathrm{quan}(D_{i+1}.\mathrm{dat})$ = '11' and
$\mathrm{rep}(D_{i+1}.\mathrm{type})$ = '11', and assume that the data
quality is more important than the data quantity and the representativity.
Applying bit alternation to the three factors in the order (qual, quan, rep)
yields the sequence M = '111011', with an integer value of 59. The procedure
is carried out by alternating the bits of the individual sequences (Fig. 2).

Fig. 2. The bit alternation method.
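
The bit alternation step can be sketched as an interleaving of the 2-bit class codes, taken in order of importance. The function names are hypothetical. The second function illustrates one reading of the reversibility of the measure, namely that the individual class codes can be recovered from the sequence M.

```python
def alternate_bits(codes):
    """Interleave equal-length bit strings, most important metric first."""
    width = len(codes[0])
    if any(len(code) != width for code in codes):
        raise ValueError("all codes must have the same length")
    # Take the first bit of every code, then the second bit of every code, ...
    return "".join(code[i] for i in range(width) for code in codes)


def split_bits(sequence, n_metrics):
    """Recover the individual codes from an interleaved sequence."""
    return [sequence[k::n_metrics] for k in range(n_metrics)]


# Example from the text: qual = '10', quan = '11', rep = '11'.
m_bits = alternate_bits(["10", "11", "11"])
print(m_bits, int(m_bits, 2))    # 111011 59
print(split_bits(m_bits, 3))     # ['10', '11', '11']
```

Because the bits of the most important metric occupy the leading position of each group, a higher class for that metric always produces a larger integer value.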

Finally, we define the threshold of acceptability as $S_{use}$. The choice of
the threshold $S_{use}$ is not part of the overall objectives of this
research.

If $M > S_{use}$, we can conclude that the new dataset $D_{i+1}$ resulting
from the combination of $D_i$ and $D_i^{+}$ is useful and effective. Once we
have verified that the new dataset is useful, we train the ML model with the
new data $D_{i+1}$. Then, we evaluate the performance of this new model using
the standard metrics, namely accuracy, precision, and recall. These metrics
are generally employed to assess the performance of ML methods [2]. If the
new ML model performs well, we deploy it in the SDN controller.
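
The evaluation step can be sketched with the standard definitions of accuracy, precision, and recall for a binary attack/normal classifier. The function names, the example labels, and the deployment threshold `min_score` are hypothetical. The text does not specify what "performs well" means numerically.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def should_deploy(scores, min_score):
    """Hypothetical rule: deploy only if every score reaches min_score."""
    return all(value >= min_score for value in scores.values())


# Hypothetical labels: 1 = attack, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = evaluate(y_true, y_pred)
print(scores, should_deploy(scores, min_score=0.9))
```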

4 Case Study
In this section, we apply the proposed system to a case study.

4.1 Dataset Creation

In this section, we create a new dataset $D_1^{+}$ that includes new DDoS
traffic. We first select an initial dataset $D_1$, the "DDoS attack SDN
dataset" [16], and try to enrich it. This dataset contains both benign TCP,
UDP, and ICMP traffic and malicious traffic, which consists of TCP SYN
attacks, UDP flood attacks, and ICMP attacks. The dataset has 23 features in
total. The class label in the last column indicates whether the traffic is
legitimate or malicious. We then used the Mininet emulator to create a new
SDN traffic dataset, namely the "new DDoS dataset". This dataset was produced
using the Mininet emulator [18] and a Ryu controller extended with a Python
program written with the Ryu API [17]. The program keeps track of all the
switches in the topology, regularly gathers various flow and port statistics,
and saves them in a file. We generate the DDoS attacks using hping3 [19].
Four types of attacks are generated: an ICMP flood, a TCP SYN flood, a UDP
flood, and a LAND attack. ICMP flood, TCP SYN flood, and UDP flood are
already present in the first dataset $D_1$. We generate new DDoS attacks of
these types to learn more about them and to obtain new results from another
SDN domain. The ML model can therefore learn these types of DDoS attacks from
the samples provided by both $D_1$ and $D_1^{+}$. We also generate LAND DDoS
attacks, which are not present in the selected dataset $D_1$. Other available
datasets can be used and enriched in the same way to obtain other
combinations of DDoS types. The characteristics of the used datasets are
presented in Table 2.

Table 2. Characteristics of the used datasets

Dataset Di          Number of samples |Di.dat|   Number of features   Types of attacks Di.type
DDoS attack SDN     104345                       23                   TCP SYN, UDP flood, ICMP flood
new DDoS dataset    969691                       21                   TCP SYN, UDP flood, ICMP flood, LAND attack

4.2 Dataset Effectiveness


In this section, we determine the effectiveness of the new dataset obtained
by combining the initial dataset and the newly generated dataset. First of
all, we present the values of the different notation fields introduced in
Sect. 3.1:
– $P$ = {ICMP, UDP, TCP, LAND}: the intrusion types that we aim to detect.
These intrusions are considered the most dangerous for the SDN environment
[20].
– $D_1$: "DDoS attack SDN dataset".
– $D_1.\mathrm{type}$: {TCP, UDP, ICMP}.
– $D_1.\mathrm{dat}$: the data present in $D_1$.
– $D_1^{+}$: "new DDoS dataset".
– $D_1^{+}.\mathrm{type}$: {TCP, ICMP, UDP, LAND}.
– $D_1^{+}.\mathrm{dat}$: the data present in $D_1^{+}$.
– $D_2$: the enriched dataset obtained by combining $D_1$ and $D_1^{+}$.
– $D_2.\mathrm{type}$: {TCP, ICMP, UDP, LAND}.
– $D_2.\mathrm{dat}$: the data present in $D_2$.
– $D_2^{\mathrm{Diff}}.\mathrm{type}$ = {LAND}.
– $D_2^{\mathrm{Union}}.\mathrm{type}$ = {TCP, UDP, ICMP, LAND}.
– $D_2^{\mathrm{Inter}}.\mathrm{type}$ = {TCP, UDP, ICMP}.
– $D_2^{\mathrm{Diff}}.\mathrm{dat}$ = {LAND.dat}.
– $D_2^{\mathrm{Union}}.\mathrm{dat}$ = {TCP.dat, UDP.dat, ICMP.dat, LAND.dat}.
Then, we calculate the different metrics using Eqs. (1), (2) and (3). We
obtain the following results:

$\mathrm{qual}(D_2.\mathrm{type}) = 0.25$, $\mathrm{quan}(D_2.\mathrm{dat}) = 0.4$, $\mathrm{rep}(D_2.\mathrm{type}) = 0.75$.

The next step consists in associating binary values with the values of the
three metrics. We obtain the following factors (see Table 1):
$\mathrm{qual}(D_2.\mathrm{type})$ = '01', $\mathrm{quan}(D_2.\mathrm{dat})$ =
'01', $\mathrm{rep}(D_2.\mathrm{type})$ = '11'. There are several ways to
define the order of importance of the different factors, depending on the
institution. In our case study, we assume that
$\mathrm{qual}(D_2.\mathrm{type})$ is more important than
$\mathrm{quan}(D_2.\mathrm{dat})$ and $\mathrm{rep}(D_2.\mathrm{type})$ [15].
We use the bit alternation approach to determine the decision value (see
Fig. 2). We obtain the sequence 001111, which corresponds to the integer
value $M = 15$. If we suppose, for example, that the threshold of
acceptability is $S_{use} = 10$, we can see that $M > S_{use}$.
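
Since the order of importance depends on the institution, it can be useful to see how the decision value changes with that order. The following Python sketch recomputes M for every ordering of the three factors, using the binary codes and the example threshold reported above. The function and variable names are hypothetical.

```python
from itertools import permutations

# Binary codes reported in the case study (see Table 1).
codes = {"qual": "01", "quan": "01", "rep": "11"}
S_USE = 10  # example threshold of acceptability used in the case study


def decision_value(order):
    """Interleave the 2-bit codes in the given order and return the integer M."""
    bits = "".join(codes[name][i] for i in range(2) for name in order)
    return int(bits, 2)


for order in permutations(codes):
    m = decision_value(order)
    print(order, m, m > S_USE)
```

With the order (qual, quan, rep) used in the case study, the sketch gives M = 15. Placing rep first would give M = 39, which shows how strongly the ordering influences the decision value.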

As a result, the new dataset $D_2$ formed by combining the initial dataset
$D_1$ and the newly generated dataset $D_1^{+}$ is effective. Therefore, we
can say that the initial dataset $D_1$ is enriched. We train the Random
Forest (RF) model with the new training data $D_2$. In the next step, we
assess the performance of the newly trained RF model using the standard
metrics, namely accuracy, precision, and recall. The RF model gives good
results, with a value of 99% for each of the three metrics. Since this model
performs well, we deploy it in the SDN architecture without interrupting the
system operation or the DDoS detection process.

5 Conclusion
In this paper, we propose a Continual Learning (CL) system for DDoS detection
in the SDN environment. We apply CL to the DDoS detection system to make it
self-adapting to modern threats and to reduce retraining costs. We create a
new DDoS dataset and combine it with a selected dataset. Then, we propose a
method to verify whether the dataset obtained by combining the selected
dataset with the newly generated dataset is useful or not; in other words,
whether we managed to enrich the selected dataset. We propose three metrics,
called quality, quantity, and representativity, to determine the
effectiveness of the new dataset. We use the bit alternation method to
integrate the three metrics and make a decision about the usefulness of the
new dataset. We train our Machine Learning (ML) model with the enriched
dataset. In the next step, we evaluate the performance of the new ML model
using the standard metrics, namely accuracy, precision, and recall. The new
model performs well, so we deploy it on the SDN controller. In future work,
we will use Deep Learning (DL) methods for DDoS detection in SDN.

References
1. Kreutz, D., Ramos, F.M., Verissimo, P.E., Rothenberg, C.E., Azodolmolky, S.,
Uhlig, S.: Software-defined networking: a comprehensive survey. Proc. IEEE
103(1), 14–76 (2014)
2. Chetouane, A., Karoui, K.: A survey of machine learning methods for DDoS threats
detection against SDN. In: International Workshop on Distributed Computing for
Emerging Smart Networks, pp. 99–127. Springer (2022)
3. Kreutz, D., Ramos, F.M.V., Verissimo, P.: Towards secure and dependable
software-defined networks. In: Proceedings of the Second ACM SIGCOMM Work-
shop on Hot Topics in Software Defined Networking, pp. 55–60 (2013)
4. Sachdeva, M., Singh, G., Kumar, K., Singh, K.: Measuring impact of DDoS attacks
on web services (2010)
5. Liao, H.J., Lin, C.H.R., Lin, Y.C., Tung, K.Y.: Intrusion detection system: a com-
prehensive review. J. Netw. Comput. Appl. 36(1), 16–24 (2013)
6. Sudar, K.M., Beulah, M., Deepalakshmi, P., Nagaraj, P., Chinnasamy, P.: Detec-
tion of distributed denial of service attacks in SDN using machine learning tech-
niques. In: 2021 International Conference on Computer Communication and Infor-
matics (ICCCI), pp. 1–5. IEEE (2021)
7. What is cybersecurity?
8. Amalapuram, S.K., Tadwai, A., Vinta, R., Channappayya, S.S., Tamma, B.R.:
Continual learning for anomaly based network intrusion detection. In: 2022
14th International Conference on COMmunication Systems & NETworkS (COM-
SNETS), pp. 497–505. IEEE (2022)
9. Eliyan, L.F., Di Pietro, R.: DoS and DDoS attacks in software defined networks: a
survey of existing solutions and research challenges. Futur. Gener. Comput. Syst.
122, 149–171 (2021)
10. Nadeem, M.W., Goh, H.G., Ponnusamy, V., Aun, Y.: DDoS detection in SDN using
machine learning techniques. Comput. Mater. Contin. 71(1), 771–789 (2022)

11. Ashodia, N., Makadiya, K.: Detection of DDoS attacks in SDN using machine
learning. In: 2022 International Conference on Electronics and Renewable Systems
(ICEARS), pp. 1322–1327. IEEE (2022)
12. Altamemi, A.J., Abdulhassan, A., Obeis, N.T.: DDoS attack detection in software
defined networking controller using machine learning techniques. Bull. Electr. Eng.
Inform. 11(5), 2836–2844 (2022)
13. Ring, M.B. et al.: Continual learning in reinforcement environments (1994)
14. Aslam, M., Ye, D., Tariq, A., Asad, M., Hanif, M., Ndzi, D., Chelloug, S.A., Elaziz,
M.A., Al-Qaness, M.A., Jilani, S.F.: Adaptive machine learning based distributed
denial-of-services attacks detection and mitigation system for SDN-enabled IoT.
Sensors 22(7), 2697 (2022)
15. Karoui, K.: Security novel risk assessment framework based on reversible metrics:
a case study of DDoS attacks on an e-commerce web server. Int. J. Netw. Manag.
26(6), 553–578 (2016)
16. Ahuja, N., Mukhopadhyay, D., Singal, G.: DDoS attack SDN dataset (2020)
17. mgen S. Natarajan. Ryu application api
18. Mininet emulation software (2018)
19. S. Natarajan. hping3
20. Sen, S., Gupta, K.D., Manjurul Ahsan, M.: Leveraging machine learning approach
to setup software-defined network (SDN) controller rules during DDoS attack. In:
Proceedings of International Joint Conference on Computational Intelligence, pp.
49–60. Springer (2020)
