EECE 655 Paper


Intrusion Detection System using Machine Learning

Nour Ardo
EECE Department
American University of Beirut
Beirut, Lebanon

Antoine El Choueiri
EECE Department
American University of Beirut
Beirut, Lebanon

Maria Khaled
EECE Department
American University of Beirut
Beirut, Lebanon

Abstract— With the enormous increase in protocols and applications and the growing complexity of Internet traffic, the number of cyberattacks and intrusions has surged. This has increased the importance of intrusion detection systems in ensuring security and privacy for private and governmental organizations. An intrusion detection system (IDS) is a system that monitors network traffic and raises alerts when it detects suspicious activity. Intrusion detection systems have shifted to machine learning models for better accuracy and faster detection. In this paper, we compare different machine learning models based on precision, recall, f1 score and accuracy. Our focus is on the most widely used models: KNN and SVM. Running DoS and port scanning attacks in real time and predicting the intrusions with these models showed similar output for both models, with a difference in the DDoS recall, which is higher with KNN.

Keywords— Intrusion Detection System, k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Convolutional Neural Network (CNN), False positive, False negative, Recall, F1 Score, Accuracy.

I. INTRODUCTION

Recent years have seen a tremendous advance in computer networking, with plenty of network-based and internet-based applications. Social media, e-commerce services, banking services and many other applications have become network and internet-based [3]. Unfortunately, this expansion of computer networks comes with a high risk of malicious attacks and network intrusions. The focus of researchers has lately shifted to network security and to implementing security mechanisms that prevent and detect malicious activity in the network. Intrusion detection is a field of research whose purpose is to detect intrusions into the network by an attacker as fast as possible, so that appropriate action may be taken to fix the damage caused promptly [1]. Several techniques can be used in intrusion detection systems. Anomaly detection techniques establish a normal traffic profile and compare the traffic on a network to this profile in order to detect any anomalous activity. The issue with anomaly detection is that it could result in false negatives, where anomalous traffic is flagged as normal traffic. Anomaly detection systems use either a statistical approach, where a behavior profile is created to train the intrusion detection model, or a predictive pattern generator, which predicts future events based on data from events that have occurred previously. A third approach is the use of neural networks, where a neural network model is trained to predict the user’s future command [1]. Another type of intrusion detection system is the misuse detection system, which defines an anomalous system behavior and classifies all other behaviors as normal. As one can see, a misuse detection system is the reverse of the anomaly detection system. One of the main issues with misuse detection is that it cannot identify unknown intrusions. Several algorithms have been used to implement intrusion detection systems, the most famous ones being k-Nearest Neighbors (kNN), Support Vector Machine (SVM) and Convolutional Neural Networks (CNN). The objective of this paper is to analyze the results of implementing an IDS with both kNN and SVM in terms of accuracy, recall, precision and f1 score. In section (II), we look at previous work on the use of machine learning in implementing intrusion detection systems. In section (III), we outline the comparison between the different models for implementing intrusion detection. In section (IV), we implement an intrusion detection system using both SVM and KNN models, and in section (V) we compare the results in terms of accuracy, precision, f1 score and recall.

II. RELATED WORK

Several authors have studied the use of machine learning for intrusion detection systems.

B. Basaveswara Rao and K. Swathi [2] worked on the adaptation of fast kNN classifiers for intrusion detection systems. Using the NSL-KDD data set, they evaluated the regular kNN classification as well as two fast kNN algorithms: Partial Distance Search k-Nearest Neighbor (KPDS) and Indexed Partial Distance Search k-Nearest Neighbor (IKPDS). After pre-processing the data, they executed the three algorithms to compare their results in terms of accuracy and computational time. Their results showed that the confusion matrices were the same, which led to the same accuracy for the three classifiers. However, the computational time for executing IKPDS was lower than that of the traditional kNN and KPDS.

H. Shapoorifard and P. Shamsinejad [3] offered a novel hybrid method with an improved kNN classifier for intrusion detection systems. They used the CANN approach, which is based on cluster centers and the k-nearest neighbor. They also introduced a new parameter, the k-farthest neighbor (kFN) classifier. They tested their new classifier on the NSL-KDD dataset by first preprocessing and normalizing the data and then executing their algorithm on four types of attack (U2R, DoS, R2L and Probe) as well as normal traffic. The results showed that the addition of the farthest neighbor improved the accuracy and detection rate and decreased the false positives in the best case, and performed just like a regular kNN in the worst case.

D. Ashok Kumar and S.R. Venugopalan in [4] analyzed the effect of normalization on intrusion detection system classifiers. To carry out their experiment, they tested both the Naïve Bayes and J48 classifiers on the KYOTO 2006+ and KDD CUP 99 datasets with four different normalization



techniques for quantitative attributes: frequency normalization, mean range normalization, maximize normalization and rational normalization. They also used probability function normalization for qualitative attributes. The results showed that using mean range normalization for quantitative attributes and the probability function for qualitative attributes gives a better percentage of correctly detected instances. The experiment did not take into consideration accuracy, detection rate, false positives and false negatives.

W. Li and Z. Liu in [5] used the SVM model for intrusion detection systems and compared the model without normalization, with Max normalization and with Min-Max normalization. The model was trained using the RBF kernel. The comparison was based on the accuracy of five-fold cross validation and the time taken by the model. The results showed that not normalizing wastes a significant amount of time and that Min-Max normalization gives better results than Max normalization in terms of accuracy.

S.T. Ikram and A.K. Cherukuri in [6] implemented an intrusion detection system with a multi-class SVM and RBF kernel on the NSL-KDD data set. The data was first normalized and the most relevant features were selected using chi-squared selection. The model was trained with the (C, gamma) pair that maximized the accuracy on the validation set. The model achieved a high accuracy, a high detection rate and a low false alarm rate compared to more traditional models.

J. Jha and L. Ragha in [7] implemented an SVM intrusion detection system that integrates the Information Gain Ratio (IGR) and the k-means algorithm. They selected the most relevant features of the NSL-KDD dataset with a hybrid approach. The features were ranked using the IGR. Then the k-means classifier was used to compute the accuracy of the subset of features. The results showed that picking a reduced dataset using the IGR and the k-means classifier improved the performance and accuracy, as well as the training and testing time, of the SVM model.

A. Bachar, N. El Makhfi and O. El Bannay in [8] implemented an intrusion detection system based on the SVM model with a Gaussian kernel and a polynomial kernel. The model was trained on 175,341 records of the UNSW-NB15 dataset and tested on 82,332 records. The data was first preprocessed and normalized. The model was evaluated in terms of false positive rate, true negative rate, accuracy, precision, recall and F1-score. The results showed that both kernels give almost the same metrics and that SVM, with an accuracy of 94%, performs better than the ANN, RepTree, multilayer perceptron and Random Forest models.

G.K. De Teyou and J. Ziazet in [9] implemented an intrusion detection system based on the Convolutional Neural Network (CNN) model using the NSL-KDD dataset. They compared the results with previously implemented models. After preprocessing and normalizing the data, they used a CNN to extract the features and used it again for intrusion classification. The results showed that this CNN-CNN scheme performed well compared to the CNN-SVM, CNN-KNN and CNN-DNN models, with an accuracy of 80.7% and 77.15% on the 2-class and 5-class classifications respectively.

S. Sriram, A. Shashank, R. Vinayakumar and K.P. Soman in [10] implemented a DCNN model for intrusion detection systems. They trained their model on the KDDCUP 99 dataset. Using the True Positive and True Negative results, the accuracy, precision, recall and F1-score of the model were evaluated. The results showed that the proposed model outperformed previous models like DNN and LSTM. The model uses 425,989 parameters and relatively uncomplicated preprocessing techniques. It can therefore be used in IoT devices with low and limited computation power.

III. THEORY

Different machine learning models have been used in intrusion detection, especially SVM and KNN, and they showed high accuracies in detecting malicious activities. In previous works, SVM showed an accuracy close to that of KNN. In this paper, we compare these two models based on the number of false negatives, false positives, f1 score and accuracy, as well as the prediction of each in real time.

IV. IMPLEMENTATION OF ATTACKS AND DETECTION SYSTEMS

From the different machine learning models that were previously implemented, we have chosen the two most used: SVM and KNN. Based on previous works, we designed the models with the parameters that gave the highest accuracy and the best output: the MinMaxScaler and the Normalizer as scalers, and sigmoid as the kernel. Regarding the dataset, we have used CICFlowMeter [11], an application developed by the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick that captures the flow of data and extracts 84 features. We have downloaded the Intrusion Detection Evaluation Dataset (CIC-IDS 2017) [12], also provided by the CIC, and used it to train our models. The pre-collected dataset contains normal flows of packets and malicious ones like DDoS, port scan, Bot, SSH-Patator… After training the models, we have written an algorithm that captures the flow of packets in real time using the CICFlowMeter Python library and predicts, with each model, whether there is an intrusion or not. We have implemented two attacks: the ping of death [13] and Nmap’s port scan. Note that CICFlowMeter outputs a csv file of the flow and this csv file is fed to the models to predict whether the activity is malicious, so the detection is not fully real-time: it took a few more minutes to load the file before a prediction could be made.

V. COMPARISON OF MODELS

Both models showed a very high accuracy: 0.96806 for SVM and 0.998407 for KNN. But this accuracy does not reflect the correctness of the models, because the number of benign samples used to train the models is significantly higher than the number of malicious ones, so the number of correctly classified samples is bound to be high. We will therefore focus on the precision and recall. The precision reflects the number of samples that the model
labeled as attacks and that are actually attacks, and the recall reflects the number of actual attacks that the model classified as attacks. A high precision indicates few false positives, and a high recall indicates few false negatives. The f1-score is the harmonic mean of precision and recall. Bot and SSH-Patator were not detected with SVM, but they presented a high precision (0.93 for Bot and 1 for SSH-Patator) and a high recall (0.85 for Bot and 0.94 for SSH-Patator). Benign packets and port scan also presented a very high precision and recall for both models. DDoS detection differed between SVM and KNN: the recall was average in SVM (0.56) but very high in KNN (1), which led to a significantly lower f1 for DDoS in SVM compared to KNN. In the real-time prediction, both models correctly predicted the port scan and the benign packets. But the DDoS was not predicted correctly using SVM, which is consistent with the low recall of the model. This experiment showed similar results for both models, except for the DDoS, where the prediction was better with KNN.

[Figure: SVM results]

VI. CONCLUSION

The complexity of networks and of the Internet has led to a continuous increase in vulnerabilities and attacks, and with the widespread usage of the Internet, intrusion detection systems were needed to detect and prevent malicious activities and to protect the security and privacy of users. Since everything is shifting nowadays to machine learning, intrusion detection was studied and built with different machine learning models to preserve accuracy and privacy. We considered in this paper two different models: KNN and SVM. We compared the performance of each in terms of f1 score, recall, precision and accuracy. They showed similar results except for the recall of DDoS in SVM, which was average. These models also predicted intrusions in real time and succeeded in classifying port scans and normal traffic, but the DDoS was not always detected by SVM due to the low recall value.

REFERENCES

[1] A. Sundaram, “An introduction to intrusion detection”, 1996.
[2] B. Basaveswara Rao and K. Swathi, “Fast kNN Classifiers for Network Intrusion Detection System”, Indian Journal of Science and Technology, India, vol. 10(14), April 2017.
[3] H. Shapoorifard and P. Shamsinejad, “Intrusion Detection using a Novel Hybrid Method Incorporating an Improved KNN”, International Journal of Computer Applications (0975-8887), vol. 173, no. 1, September 2017.
[4] D. Ashok Kumar and S.R. Venugopalan, “The Effect of Normalization on Intrusion Detection Classifiers (Naïve Bayes and J48)”, International Journal on Future Revolution in Computer Science and Communication Engineering, vol. 3, issue 7.
[5] W. Li and Z. Liu, “A method of SVM with Normalization in Intrusion Detection”, Procedia Environmental Sciences, vol. 11, part A, pp. 256-262, 2011 [2011 2nd International Conference on Challenges in Environmental Science and Computer Engineering (CESCE 2011)].
[6] S.T. Ikram and A.K. Cherukuri, “Intrusion detection model using fusion of chi-square feature selection and multi class SVM”, Journal of King Saud University, Computer and Information Sciences, Saudi Arabia, vol. 29, issue 4, pp. 462-472, October 2017.
[7] J. Jha and L. Ragha, “Intrusion Detection System using Support Vector Machine”, International Journal of Applied Information Systems (IJAIS), Foundation of Computer Science FCS, New York, USA [International Conference & workshop on Advanced Computing 2013 (ICWAC 2013)].
[8] A. Bachar, N. El Makhfi and O. El Bannay, “Towards a behavioral network intrusion detection system based on the SVM model”, 2020 1st International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), Morocco, April 2020.
[9] G.K. De Teyou and J. Ziazet, “Convolutional Neural Network for Intrusion Detection System In Cyber Physical Systems”, Cornell University, May 2019.
[10] S. Sriram, A. Shashank, R. Vinayakumar and K.P. Soman, “DCNN-IDS: Deep Convolutional Neural Network based Intrusion Detection System”, Center for Computational Engineering and Networking, Amrita School Of Engineering, India, Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Ohio, United States.
[11] A.H. Lashkari, A. Seo, G. Draper Gil and A.A. Ghorbani, “CIC-AB: An Online Ad Blocker for Browsers”, Canadian Institute for Cybersecurity, UNB Research Expo, April 2017. https://www.unb.ca/cic/research/applications.html#CICFlowMeter
[12] I. Sharafaldin, A.H. Lashkari and A.A. Ghorbani, “Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization”, 4th International Conference on Information Systems Security and Privacy (ICISSP), Portugal, January 2018.
[13] Muzixing, “Attack”, GitHub, July 2016. https://github.com/muzixing/Attack

Link to YouTube Demo: https://www.youtube.com/watch?v=Y4oHy1x4aw0&feature=youtu.be

Note: Maria focused on the literature review while Antoine and Nour focused on the simulation. Maria wrote the Introduction and Literature review sections, Antoine wrote the theory, implementation section and references, and Nour wrote the abstract, comparison of results and the conclusion.
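
The relation between accuracy, precision, recall and f1 score discussed in section (V) can be illustrated with a short worked example. The sketch below is illustrative only and is not the code used in the experiments: the helper names `precision_recall` and `f1_score` are hypothetical, and the confusion counts in the second part are invented to show the effect of class imbalance. Only the Bot and SSH-Patator precision and recall values are taken from section (V).

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true positive, false positive
    and false negative counts; undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# f1 computed from the precision and recall values reported in section (V).
reported = {"Bot": (0.93, 0.85), "SSH-Patator": (1.0, 0.94)}
for label, (p, r) in reported.items():
    print(f"{label}: f1 = {f1_score(p, r):.3f}")

# Hypothetical counts (NOT from the paper): 9,900 benign flows and 100 attack
# flows, with a model that catches 56 attacks and raises 4 false alarms.
tp, fn, fp, tn = 56, 44, 4, 9896
accuracy = (tp + tn) / (tp + tn + fp + fn)
p, r = precision_recall(tp, fp, fn)
print(f"accuracy = {accuracy:.4f}, precision = {p:.3f}, "
      f"recall = {r:.3f}, f1 = {f1_score(p, r):.3f}")
```

With these invented counts the accuracy stays above 0.99 even though almost half of the attacks are missed, which is why a per-class recall and f1 score say more about an imbalanced dataset than accuracy alone.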
