Detail Analysis of Attacks and Methods of Intrusion Detection System


International Journal of Trend in Scientific Research and Development (IJTSRD)

Volume 8 Issue 4, Jul-Aug 2024 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470

Keshav Sinha¹, Partha Paul²
¹Department of Computer Science & Engineering, University of Petroleum & Energy Studies, Dehradun, Uttarakhand, India
²Department of Computer Science & Engineering, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India

ABSTRACT
Computer networks link many activities, events, and applications. Network performance and capacity must improve to handle a growing number of users. The network's computer system should guarantee security, confidentiality, and integrity. An intrusion jeopardizes the operation and security of a wired or wireless network system. If invasions are not detected at the appropriate level, the loss to the system might be immeasurable. Intrusions occur when malicious actors harm information resources. Attackers tamper with normal operations or attempt to infiltrate the system via the gateway. The study analyzes the attack and normal traffic packets from the KDD Cup99 dataset. The KDD Cup99 data includes benchmark traffic and intrusion detection features. However, most intrusion detection systems today have significant false alarm rates and miss many attacks because they cannot distinguish between lawful and unlawful behaviors.

KEYWORDS: Invasion, Intrusion Detection, KDDCup99, Misuse detection, Anomaly detection

How to cite this paper: Keshav Sinha | Partha Paul "Detail Analysis of Attacks and Methods of Intrusion Detection System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-8 | Issue-4, August 2024, pp.1032-1041, URL: www.ijtsrd.com/papers/ijtsrd68276.pdf
Unique Paper ID: IJTSRD68276

Copyright © 2024 by author (s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)
I. INTRODUCTION
Daily life depends on the quick availability and processing of information. As demand increases, proportionately more data and resources must be stored across numerous computers with the necessary correlation, and problems such as data interference, unauthorized access, and uncontrolled system and network growth worsen. A virtual access path may grant access to unauthorized network users. On the other side, hackers can access confidential data by taking advantage of flaws in networks or systems. Access constraints and security measures are insufficient against internal and compromised threats. Recognizing breaches and intrusions is the only proven approach to keeping systems and networks safe. Along with identifying real attackers, intrusion detection systems should also keep track of attempted intrusions.

A trustworthy system should secure its data and resources from unauthorized access, tampering, and denial of service attacks. Any computer network system should offer an expected level of trust and confidence. A security policy must be formulated for every system based on future performance. Computer security is typically based on realizing the following factors in a computer system.
• Confidentiality: information is to be accessed only by authorized persons.
• Integrity: information must remain unaltered by mischievous or malicious attempts.
• Availability: the computer must function without degradation of access and provide resources to legitimate users when required.
In general, an intrusion is any action attempting to compromise a resource's confidentiality, integrity, or availability. Anderson (1980) defined intrusion as the potential opportunity of an intentionally unauthorized attempt to access information, manipulate information, or make a system untrustworthy.

Intrusion Detection System (IDS) was commercially introduced in the year 1990 [1]. It behaves like a burglar alarm that detects invasion and triggers audible or visual alarms or messages such as e-mail. The IDS

is used to prevent problem behaviors that attack or abuse the system and to detect and deal with attacks. The mechanism should have a low false alarm rate while ensuring invasion detection. Various approaches are present, but they are relatively ineffective in the classification and alarm rate dimensions. Machine learning-based anomaly detection approaches have been effectively used in the network intrusion detection scenario because of their intrinsic capability of discovering new attacks [2]. Most existing classification methods are based on neural networks, fuzzy logic, genetic algorithms, and support vector machines.

The motivation for this work is to study and analyze various attacks and explore benchmark datasets for designing the enhanced methodology. The paper's objective is to study the existing available methods to explore the possibilities of improved performance. The rest of the paper is organized as follows: Section 1 presents the introduction to IDS, Section 2 presents the surveys of significant work carried out in the domain of IDS, Section 3 describes the analysis of data applicable to IDS, and finally, the conclusion is presented for the entire work.
II. RESEARCH BACKGROUND
Detection of intrusions protects a computer network from unauthorized users as well as insider attacks. The intrusion detector's task is to construct a predictive model or classification method capable of distinguishing 'bad' connections, called intrusions or attacks, from 'good' or normal connections. IDSs are broadly classified into three categories based on deployment.
• Network-based IDS (NIDS): It is a passive device that resides in an organization's computer or network and observes the network traffic to indicate attacks. It recognizes attacks and immediately notifies system administrators of the malicious activity. It can be installed at the boundary router to observe the traffic going into and out of the network [3]. A small number of monitoring units can cover an extensive network without disturbing regular network operations. It is also not vulnerable to direct attack, but it can be overwhelmed by network traffic, cannot inspect encrypted packets, and fails to distinguish some attacks.
• Host-based IDS (HIDS): It resides in the computer or server, called the host, and examines only the host activities. It is employed to monitor the system and stored configuration files and to detect intruders' creation, modification, and deletion of system files. It can also detect local events and attacks that the NIDS has not detected. A HIDS is configured on an individual host, so installing and configuring it on multiple hosts requires more management effort. Also, HIDS are more vulnerable to direct attacks and susceptible to some Denial of Service (DoS) attacks [4].
• Application-based IDS (AppIDS): It is an enhancement of the HIDS that examines an application for abnormal events by looking into the files created by the application and at anomalous events such as users exceeding their authorization and invalid file execution. It also observes the interaction between the application and the user as well as the encrypted traffic. It is more susceptible to attack and cannot detect software tampering [5].
The accuracy of any IDS is measured based on the false alarm rate (both positive and negative). Based on the detection method, IDSs are classified into:
• Misuse Detection: In a misuse detection or signature-based intrusion detection system, the signatures or patterns of known attacks are placed in a database. They are matched with the signatures of traffic entering the network. A known attack can be detected accurately from its signature. Unfortunately, newly formed attacks with modified signatures can go undetected within the system and are classified as false negatives [6]. In general, false negatives are more strongly associated with signature-based IDS. It is also referred to as knowledge-based IDS.
• Anomaly Detection: The anomaly detection or statistical anomaly-based IDS gathers statistical summaries by watching traffic that is known to be normal, and a performance baseline is developed. The network activities are periodically monitored and compared with this baseline to detect intrusions. The statistical and behavioral patterns that detect attacks allow a low false negative rate. The behavioral patterns of users or programs are used to develop a pattern of normal and abnormal activities, which is used to detect the occurrence of an attack. Consequently, any variation from typical behavior by a user or program is detected and generates an alarm. Regrettably, most alarms are benign, and false positives are produced as a result. It is also referred to as behavior-based IDS [7].
The fundamental principle of anomaly intrusion detection is that any intrusive activity is a subset of anomalous activity. The intrusion may be recognized based on anomalous actions. For example, suppose an authorized employee of an organization opens the

system after office hours using their official account. In that case, it is also considered abnormal, and consequently, it may be an intrusion. Likewise, users who constantly log in outside working hours through the official server are also treated as anomalous. An intrusive activity can be carried out as a sum of individual activities, none of which is anomalous on its own. Flagging anomalous activities therefore produces false positives or false negatives. However, intrusive activity does not always coincide with anomalous activity. There are four possibilities (Sangeetha et al., 2022):
• Intrusive but not Anomalous: These are false negatives or Type II errors, in which the activity is intrusive but goes undetected because it is not anomalous. They are false negatives because the IDS falsely reports the absence of intrusions.
• Not Intrusive but Anomalous: These are false positives or Type I errors, in which the activity is not intrusive but is treated as intrusive because it is anomalous. They are false positives because the IDS falsely reports intrusions.
• Not Intrusive and not Anomalous: These are true negatives, in which the activity is not intrusive and is not reported as intrusive.
• Intrusive and Anomalous: These are true positives, in which the activity is intrusive and is reported as intrusive because it is also anomalous.
In an anomaly detection system, the activities of various subjects are observed, and profiles, called master profiles, are generated based on their behaviors. If behavior changes in an upcoming period, the profile measures are updated periodically. The current activities are stored in temporary profiles and periodically transferred to a master profile. In statistical intrusion detection systems, user activity profiles are trained regularly using behavioral moments, which are used to distinguish patterns as normal or abnormal. The advantage of anomaly intrusion detection is that a data point of a specific feature that lies more than a chosen multiple of the standard deviation away from the mean, on either side, can be treated as anomalous. The disadvantage is that anomaly intrusion detection is not sensitive to the order in which events occur, so it will probably miss intrusions that are indicated by sequential interrelationships among events. Moreover, fixing the threshold value of the deviation is challenging: a low threshold results in false positives, and a high threshold results in false negatives.
False positives are a major challenge for intrusion detection systems. Anomaly detection systems are particularly prone to false positives. Generally, no significant rate of false positives is reported in signature-based systems if the rules are correctly installed. Likewise, false negatives are also a problem for IDS. Typical data may generate false negatives in misuse-based systems due to the resemblance of existing attacks. The techniques for detecting intruders have evolved to face new attacks. For simplicity, normal and attack packets are indicated by '0' and '1', respectively. Table 1 presents various works in terms of security systems and feature selection. The Hybrid Association Classification (AC) approach, a hybrid classification methodology, was introduced by Hadi et al. (2018). Several rules are developed to reflect each attribute, and the number of categorization rules is kept to a minimum. Two Extreme Layer Machines (TELM) were suggested by Qu et al. (2016) to tackle challenging classification and regression problems with little storage. When a neural network has many hidden layers, TELM significantly improves performance. Nabipour et al. (2020) proposed a classification approach for high-dimensional situations. A genetic algorithm supports the fuzzy rule-based methodology used to create the classification model. The guidelines for choosing the best features were predicted using the Mixed Integer Programming Model.
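The statistical thresholding and the four detection outcomes described earlier in this section can be illustrated with a minimal Python sketch. It is not the method of any cited work: the helper names, the multiplier k = 3, and the connection counts are all hypothetical values chosen only to exercise each outcome once.

```python
from statistics import mean, stdev

def fit_baseline(normal_values):
    """Return (mean, sample standard deviation) of a feature seen on normal traffic."""
    return mean(normal_values), stdev(normal_values)

def is_anomalous(value, baseline, k=3.0):
    """Flag a value lying more than k standard deviations from the mean.

    k is an assumed threshold: lowering it raises false positives,
    raising it raises false negatives.
    """
    mu, sigma = baseline
    return abs(value - mu) > k * sigma

def outcome(intrusive, anomalous):
    """Map (ground truth, detector decision) to one of the four possibilities."""
    if intrusive and anomalous:
        return "true positive"
    if intrusive and not anomalous:
        return "false negative"
    if not intrusive and anomalous:
        return "false positive"
    return "true negative"

# Hypothetical per-window connection counts observed during normal operation.
normal_counts = [30, 33, 29, 35, 31, 34, 32, 28]
baseline = fit_baseline(normal_counts)

# (observed count, is it really an intrusion?) pairs, illustrative only.
observations = [(33, False), (136, True), (60, False), (31, True)]
results = [outcome(truth, is_anomalous(value, baseline))
           for value, truth in observations]

for (value, truth), label in zip(observations, results):
    print(value, truth, label)
```

The last observation shows the weakness noted above: an intrusion whose feature value stays close to the mean is missed regardless of the threshold.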
Table 1. Chronological Literature Review
Research | Technique Used | Methodology | Advantages/Disadvantages
Nadiammai et al. (2014) | Intrusion Detection System (IDS) with Data Mining; Efficient Data Adapted Decision Tree (EDADT) | Identify the relevant data; Classify the Distributed Denial of Service (DDoS) attack using labeled data | Efficient; High Detection Rate; High Accuracy
Shakshuki et al. (2012) | IDS for MANET | Enhanced Adaptive Acknowledgment (EAACK); Classify the malicious behavior | High Detection Rate; Low False Alarm Rate
Bhatia et al. (2017) | IDS with Artificial Neural Network (ANN) | Classify the training data; Compare the oversampling of the U2R and R2L; Categorize the attacks | Better Detection Rate; High Accuracy
Yahalom et al. (2019) | Intrusion Detection System for Hierarchical Data | MIL-STD protocol is used for the training of data; Reduce the false alarm rate | High Detection Rate; Efficiency; High Accuracy
Liu & Lang (2019) | Intrusion Detection by fusion of different feature selection algorithms | Feature selection is performed using the linear correlation coefficient and Cuttlefish algorithms | High Detection Rate; Low False Alarm Rate; High Accuracy
One of the best approaches to solving the multi-class problem in machine learning is to use a classification system based on fuzzy rules. The Cluster Center and Nearest Neighbor feature selection method was put forward by Lin et al. (2015). It computes the distance between each data sample and its cluster center, and then the distance between the sample and its nearest neighbor in the same cluster. Then, using the k-NN classifier, which has a high processing efficiency and detection rate, each piece of data may be utilized in the intrusion detection process. Composition of Feature Relevancy is a novel feature selection method proposed by Longde et al. (2018). Eight real-world datasets and two different classifiers are used to enhance feature selection. Liu et al. (2017) proposed a technique for selecting attributes based on aptitude. After identifying the closest traits, the quality is determined. These methods result in superior feature selection outcomes. Basu (2019) introduced a new data structure called a Grid Count Tree (GCD) to find outliers. It may be used to compute numerical value separation and category separation quickly and to separate meaningful signals from false data. Both real-world and artificial genetically connected applications are used to evaluate this GCD. Cai (2013) introduced the Iterative Self-organizing Map with Robust Distance (ISOMRD) for outlier detection based on this situation. When points with similar traits assemble, clusters are created. Many databases are processed via iterative processing. It is helpful for locating solutions in dynamic analysis and geographical data mining applications. Bai et al. (2016) proposed an outlier detection technique based on the local outlier factor (LOF) for large data sets. Outliers are identified using the Grid-Based Partition Algorithm and the Distributed LOF. The data collection is divided into a small number of grid sets, and data nodes are assigned. Tuples are categorized as cross-grid tuples or grid-local tuples. Distributed LOF is utilized effectively in distributed situations to reduce outliers. Di Mauro et al. (2021) suggested a feature selection technique for two categories of data sets. These data sets aid in the detection of false negatives and improve forecast accuracy. Idris (2014) suggested a negative feature selection technique for detecting e-mail spam. NSA-PSO defines a local outlier factor to estimate the threshold value. The proposed method outperforms non-FSA techniques. Fouladvand et al. (2017) introduced a novel attribute selection approach for normal and self-space using detectors, called the Distribution Estimation based Negative Selection Algorithm (DENSA). Random detectors performed well on a range of real-world data sets in this experiment.
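As a rough illustration of the cluster-center and nearest-neighbor distances mentioned above for Lin et al. (2015), the following Python sketch sums the two distances for one sample. It follows only the description given in this survey; the published method may differ in detail, and the function names and the toy cluster are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

def center_plus_neighbor_distance(sample, cluster):
    """Distance to the cluster center plus distance to the nearest other member.

    Assumption: the two distances are simply added to form one feature value.
    """
    center = centroid(cluster)
    others = [p for p in cluster if p != sample]
    # default=0.0 covers a cluster that contains only the sample itself.
    nearest = min((euclidean(sample, p) for p in others), default=0.0)
    return euclidean(sample, center) + nearest

# Toy two-dimensional cluster, for illustration only.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
feature = center_plus_neighbor_distance((0.0, 0.0), cluster)
print(round(feature, 4))
```

The resulting one-dimensional value could then be passed to a k-NN classifier in place of the original feature vector.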
Various applications exist for the IDS to detect different types of attacks and security violations. It also protects applications such as business transaction systems, document maintenance systems, banking, insurance systems, and e-governance from adversaries. The applications of IDS are not limited to a specific domain because all sensitive services are available on the Internet and Intranet. Service providers need to safeguard valuable information consistently. Technologies are growing exponentially, and protecting resources is becoming more complex. The system framework and the critical elements of the research model are covered in the following section.
III. THEORETICAL FRAMEWORK
Real-world data must be generated for intrusion detection to evaluate all potential risks. The stages involved in data analysis methodically identify patterns in the gathered data and link them to the problem that has been recognized. Data modeling determines how the data may be categorized and connected. The accuracy and reliability of the data collected for the evaluation are aspects of data quality. Figure 1 presents the theoretical framework for KDD Cup dataset analysis. The investigation is feasible only if the quality and attributes of the data are adequate for the task.


Figure 1. Theoretical Framework for KDD Cup Dataset Analysis


The research requires ground-truth datasets in its domain. In this paper, KDD Cup99 is used for intrusion detection systems to classify network traffic. The dataset originates from the KDD Cup competition of the special interest group on knowledge discovery and data mining (http://www.sigkdd.org/kddcup) (Nguyen et al., 2016). The Lincoln Laboratory at Massachusetts Institute of Technology produced standard network traffic data under the auspices of DARPA and the Air Force Research Laboratory to evaluate computer network intrusion detection. The research activity mainly focuses on the 1998 and 1999 datasets. Figure 2 presents the KddCup99 dataset description.

Figure 2: KddCup99 Dataset Description


The dataset is a standardized set of auditable data containing a variety of intrusions simulated in a military network environment. It emulates a US Air Force LAN and focuses mainly on real-environment attacks. The raw TCP/IP dump data was captured from the network. A connection is a sequence of TCP packets between a source IP address and a destination IP address that starts and stops at well-defined times, during which data is transferred under a specific protocol. Furthermore, each connection carries a label that marks it as either normal or an attack of a specific type.
A. Attributes in KddCup99
The features are grouped into three categories: (i) basic features of individual connections, (ii) content features within a connection, and (iii) traffic features, which are computed using a two-second time window. KDD Cup99 describes each connection, which is a series of packets, with a total of 41 features. Basic features are represented by features 1-9, content features by features 10-22, time-based traffic features by features 23-31, and host-based traffic features by features 32-41. Some of the terminology associated with the data set is as follows: (i) connections that were established with the same host as the current connection within the previous two seconds are referred to as having the 'same host,' and (ii) the term 'same service' refers to connections that used the same service as the current connection within the last two seconds. The features based on 'same host' and 'same service' are collectively referred to as the time-based traffic features of the connection records. Table 2 presents the various attributes of the KddCup99 dataset.
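The 'same host' and 'same service' definitions above can be sketched in Python as counts over a two-second window. This is only an illustration of the idea behind the count and srv_count features, not the procedure used to build the dataset: the Connection record, the sample history, and the choice to exclude the current connection from its own counts are assumptions.

```python
from collections import namedtuple

# Hypothetical minimal connection record: start time in seconds,
# destination host, and network service.
Connection = namedtuple("Connection", "time dst_host service")

WINDOW_SECONDS = 2.0  # the two-second window described in the text

def time_based_counts(history, current):
    """Return (count, srv_count) for `current` over the previous two seconds.

    count     -- earlier connections to the same destination host
    srv_count -- earlier connections to the same service
    Assumption: the current connection is not counted itself.
    """
    recent = [c for c in history
              if 0 <= current.time - c.time <= WINDOW_SECONDS]
    count = sum(1 for c in recent if c.dst_host == current.dst_host)
    srv_count = sum(1 for c in recent if c.service == current.service)
    return count, srv_count

# Illustrative history, not taken from the dataset.
history = [
    Connection(0.0, "10.0.0.5", "http"),     # outside the window
    Connection(1.2, "10.0.0.5", "private"),  # same host
    Connection(2.5, "10.0.0.9", "http"),     # same service
    Connection(2.7, "10.0.0.9", "http"),     # same service
    Connection(2.9, "10.0.0.5", "http"),     # same host and same service
]
current = Connection(3.0, "10.0.0.5", "http")
print(time_based_counts(history, current))
```

Rate features such as same_srv_rate would then be ratios derived from these counts over the same window.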

Table 2. KddCup99 Attribute Description
Feature Name | Variable Type* | Category** | Label | Description
duration | C | 1 | v1 | Length of the connection in seconds
protocol_type | D | 1 | v2 | Type of protocol (TCP, UDP, etc.)
service | D | 1 | v3 | Network service (HTTP, telnet, etc.)
flag | D | 1 | v4 | Normal or error connection status
src_bytes | C | 1 | v5 | Source to destination data bytes
dst_bytes | C | 1 | v6 | Destination to source data bytes
land | D | 1 | v7 | 1 - connection is from/to the same host/port; 0 - otherwise
wrong_fragment | C | 1 | v8 | Number of 'wrong' fragments
urgent | C | 1 | v9 | Number of urgent packets
hot | C | 2 | v10 | Count of system accesses
num_failed_logins | C | 2 | v11 | Number of failed login attempts
logged_in | D | 2 | v12 | 1 - successfully logged in; 0 - otherwise
num_compromised | C | 2 | v13 | Compromised conditions
root_shell | C | 2 | v14 | 1 - root shell is obtained; 0 - otherwise
su_attempted | C | 2 | v15 | 1 - 'su root' attempted; 0 - otherwise
num_root | C | 2 | v16 | 'Root' accesses
num_file_creations | C | 2 | v17 | File creation operations
num_shells | C | 2 | v18 | Number of shell prompts
num_access_files | C | 2 | v19 | Write, delete, and create operations
num_outbound_cmds | C | 2 | v20 | Outbound commands in FTP
is_hot_login | D | 2 | v21 | 1 - login is on the 'hot' list (root, adm, etc.); 0 - otherwise
is_guest_login | D | 2 | v22 | 1 - login is a guest login (guest, anonymous, etc.); 0 - otherwise
count | C | 3 | v23 | Same host connections
srv_count | C | 3 | v24 | Connections to the same service
serror_rate | C | 3 | v25 | 'SYN' errors to the same host
srv_serror_rate | C | 3 | v26 | 'SYN' errors to the same service
rerror_rate | C | 3 | v27 | 'REJ' errors to the same host
srv_rerror_rate | C | 3 | v28 | 'REJ' errors to the same service
same_srv_rate | C | 3 | v29 | Same service and the same host
diff_srv_rate | C | 3 | v30 | Different services and the same host
srv_diff_host_rate | C | 3 | v31 | Same service and different hosts
dst_host_count | C | 3 | v32 | Same host to the destination host
dst_host_srv_count | C | 3 | v33 | Same service to destination host as current connection
dst_host_same_srv_rate | C | 3 | v34 | Same service to the destination host
dst_host_diff_srv_rate | C | 3 | v35 | Different services to the destination host
dst_host_same_src_port_rate | C | 3 | v36 | Port services to the destination host
dst_host_srv_diff_host_rate | C | 3 | v37 | Different hosts from the same service to the destination host
dst_host_serror_rate | C | 3 | v38 | 'SYN' errors (same host to destination)
dst_host_srv_serror_rate | C | 3 | v39 | 'SYN' errors from the same service to the destination host
dst_host_rerror_rate | C | 3 | v40 | 'REJ' errors (same host to destination)
dst_host_srv_rerror_rate | C | 3 | v41 | 'REJ' errors (same service to destination)
* C - Continuous, D - Discrete ** 1 - Intrinsic, 2 - Content, 3 - Traffic


The protocol_type, service, flag, land, logged_in, is_hot_login, and is_guest_login features are labeled as discrete or categorical, and the other 34 features are labeled as continuous. Table 3 presents the description of the various flag values of KddCup99, and the categorical features protocol_type, service, and flag have different values listed in Table 4.

Table 3. Description of flag values
Flag | Label | Description
RSTOS0 | 1 | The originator sent a SYN followed by an RST but never saw a SYN-ACK from the responder
RSTR | 2 | Established, responder aborted
RSTO | 3 | Connection established; originator aborted (sent an RST)
OTH | 4 | No SYN seen, just midstream traffic (a “partial connection” that was not later closed)
REJ | 5 | Connection attempt rejected
S0 | 6 | A connection attempt was seen, but no reply
S1 | 7 | Connection established, not terminated
S2 | 8 | Connection established and the close attempt by originator seen (but no reply from responder)
S3 | 9 | Connection established and the close attempt by responder seen (but no reply from originator)
SF | 10 | Normal establishment and termination
SH | 11 | The originator sent a SYN followed by a FIN (finish 'flag') but never saw a SYN-ACK from the responder (hence the connection was “half” open)
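A categorical feature such as flag has to be turned into a number before most classifiers can use it. The following Python sketch encodes a flag with the integer labels of Table 3; the function name and the case-insensitive lookup are illustrative choices, not part of the dataset definition.

```python
# Integer labels for the flag feature, copied from Table 3.
FLAG_LABELS = {
    "RSTOS0": 1, "RSTR": 2, "RSTO": 3, "OTH": 4, "REJ": 5, "S0": 6,
    "S1": 7, "S2": 8, "S3": 9, "SF": 10, "SH": 11,
}

# Reverse lookup, useful when reading encoded records back.
LABEL_TO_FLAG = {label: flag for flag, label in FLAG_LABELS.items()}

def encode_flag(flag):
    """Return the Table 3 label for a flag value (hypothetical helper).

    Raises KeyError for a flag that is not listed in Table 3.
    """
    return FLAG_LABELS[flag.strip().upper()]

print(encode_flag("SF"), encode_flag("REJ"))
```

An integer code imposes an arbitrary order on the flags, so a one-hot encoding may be preferable for distance-based classifiers; which encoding the surveyed works use is not stated here.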
Table 4. Various services and flags in the KddCup99 dataset
Label | Service | Label | Service | Label | Service
1 | netbios_dgm | 25 | Z39_50 | 49 | time
2 | netbios_ssn | 26 | gopher | 50 | echo
3 | netbios_ns | 27 | domain | 51 | ldap
4 | remote_job | 28 | finger | 52 | link
5 | http_8001 | 29 | klogin | 53 | HTTP
6 | hostnames | 30 | kshell | 54 | SMTP
7 | uucp_path | 31 | supdup | 55 | UUCP
8 | http_2784 | 32 | systat | 56 | auth
9 | iso_tsap | 33 | telnet | 57 | nnsp
10 | csnet_ns | 34 | shell | 58 | nntp
11 | domain_u | 35 | imap4 | 59 | name
12 | ftp_data | 36 | eco_i | 60 | exec
13 | http_443 | 37 | ecr_i | 61 | AOL
14 | daytime | 38 | red_i | 62 | IRC
15 | harvest | 39 | pop_2 | 63 | X11
16 | discard | 40 | pop_3 | 64 | BGP
17 | netstat | 41 | login | 65 | CTF
18 | courier | 42 | tim_i | 66 | MTP
19 | pm_dump | 43 | urh_i | 67 | rje
20 | printer | 44 | urp_i | 68 | ssh
21 | private | 45 | ntp_u | 69 | efs
22 | sql_net | 46 | vmnet | 70 | ftp
23 | tftp_u | 47 | other | |
24 | sunrpc | 48 | whois | |
B. Classification of Attacks
A variety of attacks enter the network over time, and they are classified into the following four main classes:
• Denial of Service (DoS): It is a class of attacks where an attacker makes some computing or memory resource too busy or too full to handle legitimate requests, denying legitimate users access to a machine. The three different ways to launch a DoS attack are (i) by abusing the computer's legitimate features, (ii) by targeting implementation bugs, and (iii) by exploiting the misconfiguration of the systems. DoS attacks are classified based on the services an attacker renders unavailable to legitimate users.
• User to Root (U2R): The attacker starts with access to a normal user account on the system and gains root access. Common programming mistakes and environment assumptions give attackers vulnerabilities to exploit in order to gain root access.
• Remote to User (R2L): The attacker sends packets to a machine over a network and exploits the machine's vulnerability to illegally gain local access as a user. There are different types of R2L attacks, and the most common attack in this class is made using social engineering.
• Probing: It is a class of attacks where an attacker scans a network to gather information to find known vulnerabilities. An attacker with a map of machines and services available on a network can manipulate the information to look for exploits. Different probes exist; some abuse the computer's legitimate features, and some use social engineering techniques.
Table 5 presents the various classes of attacks that are most common in the analysis of the KddCup99 dataset.
Table 5. Various attacks on KddCup99 Dataset
Attack | Type | Mechanism | Attack Effect
back | DoS | Abuse/Bug | Slows down server response
land | DoS | Bug | Slows down server response
Neptune | DoS | Abuse | Slows down server response
smurf | DoS | Abuse | Slows down the network
pod | DoS | Abuse | Slows down server response
teardrop | DoS | Bug | Reboots the machine
loadmodule | U2R | Poor environment sanitation | Gains root shell
buffer_overflow | U2R | Abuse | Gains root shell
rootkit | U2R | Abuse | Gains root shell
Perl | U2R | Poor environment sanitation | Gains root shell
phf | R2L | Bug | Executes commands as root
guess_passwd | R2L | Login misconfiguration | Gains user access
warezmaster | R2L | Abuse | Gains user access
IMAP | R2L | Bug | Gains root access
multihop | R2L | Abuse | Gains root access
ftp_write | R2L | Misconfiguration | Gains user access
spy | R2L | Abuse | Gains user access
warezclient | R2L | Abuse | Gains user access
satan | Probe | Abuse of feature | Looks for known vulnerabilities
Nmap | Probe | Abuse of feature | Identifies active ports on a machine
portsweep | Probe | Abuse of feature | Identifies active ports on a machine
ipsweep | Probe | Abuse of feature | Identifies active machines
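For analysis, the 22 attack names of Table 5 are usually collapsed into the four classes described above. A minimal Python sketch of that lookup follows; the dictionary is transcribed from Table 5, while the lower-casing and the removal of a trailing period from raw labels are assumptions about how the labels are written in the data files.

```python
# Attack name -> attack class, transcribed from Table 5.
ATTACK_CLASS = {
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "smurf": "DoS", "pod": "DoS", "teardrop": "DoS",
    "loadmodule": "U2R", "buffer_overflow": "U2R",
    "rootkit": "U2R", "perl": "U2R",
    "phf": "R2L", "guess_passwd": "R2L", "warezmaster": "R2L",
    "imap": "R2L", "multihop": "R2L", "ftp_write": "R2L",
    "spy": "R2L", "warezclient": "R2L",
    "satan": "Probe", "nmap": "Probe",
    "portsweep": "Probe", "ipsweep": "Probe",
}

def attack_class(label):
    """Return 'normal' or the Table 5 class for a record label.

    Assumption: labels may differ in case and may end with a period.
    Raises KeyError for an attack name that is not in Table 5.
    """
    name = label.strip().rstrip(".").lower()
    if name == "normal":
        return "normal"
    return ATTACK_CLASS[name]

print(attack_class("Neptune"), attack_class("normal."))
```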

The KDD Cup99 dataset contains normal records and 22 attack types, each described by 41 features, and Table 6 shows two sample records. Every generated traffic pattern ends with a label, either 'normal' or a specific attack, for later analysis.
Table 6. Sample Data Packets
Feature Name | Packet-1 (Normal) | Packet-2 (Neptune)
duration | 0 | 0
protocol_type | TCP | TCP
service | HTTP | private
flag | SF | REJ
src_bytes | 327 | 0
dst_bytes | 467 | 0
land | 0 | 0
wrong_fragment | 0 | 0
urgent | 0 | 0
hot | 0 | 0
num_failed_logins | 0 | 0
logged_in | 1 | 0
num_compromised | 0 | 0
root_shell | 0 | 0
su_attempted | 0 | 0
num_root | 0 | 0
num_file_creations | 0 | 0
num_shells | 0 | 0
num_access_files | 0 | 0
num_outbound_cmds | 0 | 0
is_hot_login | 0 | 0
is_guest_login | 0 | 0
count | 33 | 136
srv_count | 47 | 1
serror_rate | 0 | 0
srv_serror_rate | 0 | 0
rerror_rate | 0 | 1
srv_rerror_rate | 0 | 1
same_srv_rate | 1 | 0.01
diff_srv_rate | 0 | 0.06
srv_diff_host_rate | 0.04 | 0
dst_host_count | 151 | 255
dst_host_srv_count | 255 | 1
dst_host_same_srv_rate | 1 | 0
dst_host_diff_srv_rate | 0 | 0.06
dst_host_same_src_port_rate | 0.01 | 0
dst_host_srv_diff_host_rate | 0.03 | 0
dst_host_serror_rate | 0 | 0
dst_host_srv_serror_rate | 0 | 0
dst_host_rerror_rate | 0 | 1
dst_host_srv_rerror_rate | 0 | 1
This section outlines the structure of the dataset used by the Intrusion detection system. The various kinds of
features, such as discrete and continuous, are studied with a focus on their role in the attack. The attacks are
classified with a brief introduction to each.
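To tie the dataset structure together, the following Python sketch parses one comma-separated connection record into named features and derives the binary target used in this paper ('0' for normal, '1' for attack). The feature names follow Table 2 and the sample values are those of Packet-2 in Table 6; the comma-separated layout with the label in the last field, and the helper name parse_record, are assumptions made for illustration.

```python
FEATURE_NAMES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins",
    "logged_in", "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files",
    "num_outbound_cmds", "is_hot_login", "is_guest_login", "count",
    "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate",
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]

# The seven features the text labels as discrete or categorical.
DISCRETE = {"protocol_type", "service", "flag", "land", "logged_in",
            "is_hot_login", "is_guest_login"}

def parse_record(line):
    """Split one record into a dict of features plus 'label' and 'target'.

    Discrete features stay as strings; continuous ones become floats.
    """
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != len(FEATURE_NAMES) + 1:
        raise ValueError("expected 41 features followed by a label")
    *values, label = fields
    record = {name: (raw if name in DISCRETE else float(raw))
              for name, raw in zip(FEATURE_NAMES, values)}
    record["label"] = label.rstrip(".")
    record["target"] = 0 if record["label"].lower() == "normal" else 1
    return record

# Packet-2 (Neptune) from Table 6 written as one comma-separated line.
PACKET_2 = ("0,TCP,private,REJ,0,0,0,0,0," + "0," * 13 +
            "136,1,0,0,1,1,0.01,0.06,0,255,1,0,0.06,0,0,0,0,1,1,neptune")

record = parse_record(PACKET_2)
print(record["service"], record["flag"], record["count"], record["target"])
```

The high count and REJ error rates with a near-zero same_srv_rate in this record are the kind of traffic features that separate it from the normal Packet-1.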
IV. CONCLUSION
Any network administrator's primary priority should be intrusion detection. We conducted a thorough yet simple study to examine different methods for developing Network Intrusion Detection models. Several research articles published in various journals served as the foundation for this study. Several tables provided in this publication analyze the Kddcup99 dataset's characteristics. The many strategies employed by network intrusion detection systems are described, along with each one's benefits and drawbacks. The presence of many packets, both normal and attack, is also observed. The investigation in this work is broadened based on several machine learning methods for identifying attacks on the system. While the machine is given the ability to learn, the behavior of the data has been studied for further research.

REFERENCES market trends using machine learning and deep
[1] Bass, T. (2000). Intrusion detection systems learning algorithms via continuous and binary
and multisensor data fusion. Communications data; a comparative analysis. IEEE Access, 8,
of the ACM, 43(4), 99-105. 150199-150212.
[2] Rimmer, V., Nadeem, A., Verwer, S., [12] Nadiammai, G. V., & Hemalatha, M. J. E. I. J.
Preuveneers, D., & Joosen, W. (2022). Open- (2014). Effective approach toward Intrusion
World Network Intrusion Detection. In Security Detection System using data mining
and Artificial Intelligence (pp. 254-283). techniques. Egyptian Informatics Journal,
Springer, Cham. 15(1), 37-50.
[3] Horchulhack, P., Viegas, E. K., & Santin, A. O. [13] Shakshuki, E. M., Kang, N., & Sheltami, T. R.
(2022). Toward feasible machine learning (2012). EAACK—a secure intrusion-detection
model updates in network-based intrusion system for MANETs. IEEE Transactions on
detection. Computer Networks, 202, 108618. industrial electronics, 60(3), 1089-1098.
[4] Ahmet, E. F. E., & ABACI, İ. N. (2022). [14] Bhatia, M. K., Ripudaman, S., Akashdeep, S.,
Comparison of the Host Based Intrusion & Bhardwaj, B. L. (2017). Knowledge,
Detection Systems and Network Based Attitude and Practice of self-medication among
Intrusion Detection Systems. Celal Bayar undergraduate medical students of Punjab. J
University Journal of Science, 18(1), 23-32. Med Res, 3(3), 151-4.
[5] Agarwal, N., & Hussain, S. Z. (2018). A closer [15] Yahalom, R., Steren, A., Nameri, Y., Roytman,
look at intrusion detection system for web M., Porgador, A., & Elovici, Y. (2019).
applications. Security and Communication Improving the effectiveness of intrusion
Networks, 2018. detection systems for hierarchical data.
[6] Fernando, P., Dadallage, K., Gamage, T., Knowledge-Based Systems, 168, 59-69.
Seneviratne, C., Madanayake, A., & Liyanage, [16] Liu, H., & Lang, B. (2019). Machine learning
M. (2022). Proof of Sense: A Novel Consensus and deep learning methods for intrusion
Mechanism for Spectrum Misuse Detection. detection systems: A survey. applied sciences,
IEEE Transactions on Industrial Informatics, 9(20), 4396.
18(12), 9206-9216.
[17] Lin, W. C., Ke, S. W., & Tsai, C. F. (2015).
[7] Sinha, K., & Verma, M. (2021). The Detection CANN: An intrusion detection system based
of SQL Injection on Blockchain-Based on combining cluster centers and nearest
Database. In Revolutionary Applications of neighbors. Knowledge-based systems, 78, 13-
Blockchain-Enabled Privacy and Access 21.
Control (pp. 234-262). IGI Global.
[18] Longde, S. U. N., Xiaolin, W. U., Wanfu, Z. H.
[8] Sangeetha, S. K., Mani, P., Maheshwari, V., O. U., Xuejun, L. I., & Peihui, H. (2018).
Jayagopal, P., Sandeep Kumar, M., & Allayear, Technologies of enhancing oil recovery by
S. M. (2022). Design and Analysis of chemical flooding in Daqing Oilfield, NE
Multilayered Neural Network-Based Intrusion China. Petroleum Exploration and
Detection System in the Internet of Things Development, 45(4), 673-684.
Network. Computational Intelligence &
[19] Liu, J., Lin, Y., Lin, M., Wu, S., & Zhang, J.
Neuroscience, 2022.
(2017). Feature selection based on quality of
[9] Hadi, W. E., Al-Radaideh, Q. A., & Alhawari, information. Neurocomputing, 225, 11-22.
S. (2018). Integrating associative rule-based
[20] Basu, P. (2019). Toward Reliable, Secure, and
classification with Naïve Bayes for text
Energy-Efficient Multi- Core System Design
classification. Applied Soft Computing, 69,
(Doctoral dissertation, Utah State University).
344-356.
[21] Cai, Q. (2013). Self-organizing learning model
[10] Qu, B. Y., Lang, B. F., Liang, J. J., Qin, A. K., &
for data mining applications. Stevens Institute
Crisalle, O. D. (2016). Two-hidden-layer
of Technology.
extreme learning machine for regression and
classification. Neurocomputing, 175, 826-834. [22] Bai, M., Wang, X., Xin, J., & Wang, G. (2016).
An efficient algorithm for distributed density-
[11] Nabipour, M., Nayyeri, P., Jabani, H., Shahab,
S., & Mosavi, A. (2020). Predicting stock

@ IJTSRD | Unique Paper ID – IJTSRD68276 | Volume – 8 | Issue – 4 | Jul-Aug 2024 Page 1041
International Journal of Trend in Scientific Research and Development @ www.ijtsrd.com eISSN: 2456-6470
based outlier detection on big data. [25] Fouladvand, S., Osareh, A., Shadgar, B.,
Neurocomputing, 181, 19-28. Pavone, M., & Sharafi, S. (2017). DENSA: An
effective negative selection algorithm with
[23] Di Mauro, M., Galatro, G., Fortino, G., &
Liotta, A. (2021). Supervised feature selection flexible boundaries for self-space and dynamic
techniques in network intrusion detection: A number of detectors. Engineering Applications
critical review. Engineering Applications of of Artificial Intelligence, 62, 359-372.
Artificial Intelligence, 101, 104216. [26] Nguyen, T. T., Nguyen, T. T. T., Pham, X. C.,
[24] Idris, I., & Selamat, A. (2014). Improved email & Liew, A. W. C. (2016). A novel combining
spam detection model with negative selection classifier method based on variational
algorithm and particle swarm optimization. inference. Pattern Recognition, 49, 198-212.
Applied Soft Computing, 22, 11-27.
