Intrusion Detection Using Neural Networks and Support Vector Machines

Srinivas Mukkamala, Guadalupe Janoski, Andrew Sung


{srinivas, silfalco, sung}@cs.nmt.edu
Department of Computer Science
New Mexico Institute of Mining and Technology
Socorro, New Mexico 87801, USA

Abstract - Information security is an issue of serious global concern. The complexity, accessibility, and openness of the Internet have served to increase the security risk of information systems tremendously. This paper concerns intrusion detection. We describe approaches to intrusion detection using neural networks and support vector machines. The key ideas are to discover useful patterns or features that describe user behavior on a system, and use the set of relevant features to build classifiers that can recognize anomalies and known intrusions, hopefully in real time. Using a set of benchmark data from a KDD (Knowledge Discovery and Data Mining) competition designed by DARPA, we demonstrate that efficient and accurate classifiers can be built to detect intrusions. We compare the performance of neural network based and support vector machine based systems for intrusion detection.

I. Introduction

Information assurance is an issue of serious global concern. The Internet has brought about great benefits to the modern society; meanwhile, the rapidly increasing connectivity and accessibility to the Internet has posed a tremendous security threat. Malicious usage, attacks, and sabotage have been on the rise as more and more computers are put into use. Connecting information systems to networks such as the Internet and public telephone systems further magnifies the potential for exposure through a variety of attack channels.

This paper concerns intrusion detection, an important issue in defensive information warfare. We present the use of neural networks and support vector machines for intrusion detection of information systems. Since most of the intrusions can be located by examining patterns of user activities, many IDSs have been built by utilizing the recognized attack and misuse patterns. Using neural networks for intrusion detection has been done within the security community [1,7,8,10,11]. In our experiments, the neural networks and support vector machines are trained with normal user activity and attack patterns. The data we used originated from MIT's Lincoln Labs. It was developed for the KDD competition by DARPA and is considered a standard benchmark for intrusion detection evaluations. Our goal for intrusion detection is to detect both anomalies and misuses. The approach is to train the neural networks or support vector machines to learn the normal behavior and attack patterns; then significant deviations from normal behavior are flagged as attacks. We begin by giving basic definitions and terms in the next section.

II. Intrusion

Intrusion can be defined as any set of actions that attempt to compromise the integrity, confidentiality or availability of a resource. In the context of information systems, intrusion refers to any unauthorized access, unauthorized attempt to access or damage, or malicious use of information resources. Intrusion can be categorized into two classes, anomaly intrusions and misuse intrusions.

Anomalies are deviations from normal usage behavior. Misuses, on the other hand, are recognized patterns of attack [2]. While misuse patterns are often simpler to process and locate, it is often the anomaly patterns that will help to locate problems. As misuses are recognized patterns of attack, the detection system tends to fail when novel attack methods are implemented. Detection of anomaly patterns is computationally expensive because of the overhead of keeping track of, and possibly updating, several system profile metrics, as it must be tailored system to system, and sometimes even user to user, due to the fact that behavior patterns and system usage vary greatly.

A. Intrusion Detection

The most popular way to detect intrusions is by using the audit data generated by the operating system. Since almost all activities are logged on a system, it is possible that a manual inspection of these logs would allow intrusions to be detected. It is important to analyze the audit data even after an attack has occurred to determine the extent of damage sustained; this analysis also helps in tracking down the attackers and in recording the attack patterns for future detection. A good IDS that can be used to analyze audit data for such insights makes a valuable tool for information systems.

The idea behind anomaly detection is to establish each user's normal activity profile, and to flag deviations from the established profile as possible intrusion attempts. A main issue concerning misuse detection is how to develop signatures that include all possible attacks to avoid false negatives, and how to develop signatures that do not match non-intrusive activities to avoid false positives. Though false negatives are frequently considered more serious, the selection of threshold levels is important so that neither of the above problems is unreasonably magnified.
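The profile-and-threshold idea described above can be illustrated with a minimal sketch. It is not the method used in this paper: the profile here is simply the mean and standard deviation of a single hypothetical metric (logins per hour), and the data, the function names, and the threshold values are all assumptions chosen for illustration. It shows how a low threshold inflates false positives while a high threshold inflates false negatives.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarize normal activity as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, profile, threshold):
    """Flag a value that deviates from the profile by more than
    `threshold` standard deviations."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


def error_counts(observations, profile, threshold):
    """Return (false positives, false negatives) for labeled observations.

    `observations` is a list of (value, is_intrusion) pairs.
    """
    false_pos = false_neg = 0
    for value, is_intrusion in observations:
        flagged = is_anomalous(value, profile, threshold)
        if flagged and not is_intrusion:
            false_pos += 1
        elif not flagged and is_intrusion:
            false_neg += 1
    return false_pos, false_neg


# Hypothetical data: logins per hour for one user during normal operation.
normal_history = [4, 5, 6, 5, 4, 6, 5, 5]
profile = build_profile(normal_history)

# Hypothetical labeled observations: (logins per hour, is_intrusion).
observations = [(5, False), (7, False), (9, True), (20, True)]

for threshold in (1.0, 3.0, 6.0):
    print(threshold, error_counts(observations, profile, threshold))
```

With these made-up numbers the lowest threshold produces a false positive, the highest produces a false negative, and the middle one produces neither, which is the trade-off the threshold selection has to balance.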

0-7803-7278-6/02/$10.00 ©2002 IEEE


B. Intrusion Detection Systems

Intrusion detection systems (IDS) [9] are designed to identify, preferably in real time, unauthorized use, misuse and attacks on information systems. An IDS maintains a set of historical profiles or recorded profiles for users, matches an audit record with the appropriate profile, updates the profile whenever necessary, and reports any anomalies detected. An IDS does not usually perform any action to prevent intrusions; its main function is to alert the system administrators that there is a possible security violation; as such it is a proactive tool rather than a reactive tool. IDSs are classified into two types: host based IDS and network based IDS. A host based IDS monitors all the activity on a single information system host. It ensures none of the information system security policies are being violated. A network IDS monitors activities on a whole network and analyzes traffic for potential security breaches or violations.

One of the main problems with IDSs is the overhead, which can become unacceptably high. To analyze system logs, the operating system must keep information regarding all the actions performed, which invariably results in huge amounts of data, requiring disk space and CPU resources. Next, the logs must be processed to convert them into a manageable format and then compared with the set of recognized misuse and attack patterns to identify possible security violations. Further, the stored patterns need to be continually updated, which would normally involve human expertise. An intelligent, adaptable and cost-effective tool that is capable of (mostly) real-time intrusion detection is the goal.

III. DARPA Data for Intrusion Detection

The data was acquired from the 1998 DARPA intrusion detection evaluation program. They set up an environment to acquire raw TCP/IP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. They operated the LAN as if it was a true environment, but blasted it with multiple attacks. For each TCP/IP connection, 41 various quantitative and qualitative features were extracted.

Attacks fall into four main categories:

1. DOS: denial of service
2. R2L: unauthorized access from a remote machine
3. U2R: unauthorized access to local super user (root) privileges
4. Probing: surveillance and other probing

Table 1 shows 32 different exploits that were used in the 1998 DARPA intrusion detection evaluation. This table presents attacks broken up into categories by type and operating system.

TABLE 1: ATTACKS USED IN DARPA EVALUATION

Attack class          Exploits (across the operating systems covered)
Denial of service     Back, Mail bomb, Neptune, Ping of death, Process table, Smurf, Syslogd, UDP storm
Remote to user        Dictionary, Ftp-write, Guest, Imap, Named, Phf, Sendmail, Xlock, Xsnoop
User to super-user    Eject, Ffbconfig, Fdformat, Load module, Perl, Ps, Xterm
Probing               IP sweep, Saint

A. Denial of Service Attack

A denial of service attack is a class of attacks in which an attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine. Examples are Apache2, Back, Land, Mailbomb, SYN Flood, Ping of death, Process table, Smurf, Syslogd, Teardrop, Udpstorm.

B. User to Root Attacks

User to root exploits are a class of attacks in which an attacker starts out with access to a normal user account on the system and is able to exploit a vulnerability to gain root access to the system. Examples are Eject, Ffbconfig, Fdformat, Loadmodule, Perl, Ps, Xterm.

C. Remote to User Attack

A remote to user attack is a class of attacks in which an attacker who does not have an account on a machine sends packets to that machine over a network and exploits some vulnerability to gain local access as a user of that machine. Examples are Dictionary, Ftp-write, Guest, Imap, Named, Phf, Sendmail, Xlock, Xsnoop.

D. Probing

Probing is a class of attacks in which an attacker scans a network of computers to gather information or find known



vulnerabilities. An attacker with a map of machines and services that are available on a network can use this information to look for exploits. Examples are Ipsweep, Mscan, Nmap, Saint, Satan.

E. List of Features

TABLE 2: LIST OF FEATURES (KDDCUP-99 TASK DESCRIPTION [16])

Feature               Description                                                                                    Type
Duration              Length (number of seconds) of the connection                                                   Continuous
Protocol-type         Type of protocol, e.g. tcp, udp, etc.                                                          Discrete
Service               Network service on the destination, e.g., http, telnet, etc.                                   Discrete
Src-bytes             Number of data bytes from source to destination                                                Continuous
Dst-bytes             Number of data bytes from destination to source                                                Continuous
Flag                  Normal or error status of the connection                                                       Discrete
Land                  1 if connection is from/to the same host/port; 0 otherwise                                     Discrete
Wrong-fragment        Number of "wrong" fragments                                                                    Continuous
Urgent                Number of urgent packets                                                                       Continuous
Hot                   Number of "hot" indicators                                                                     Continuous
Num-failed-logins     Number of failed login attempts                                                                Continuous
Logged-in             1 if successfully logged in; 0 otherwise                                                       Discrete
Num-compromised       Number of "compromised" conditions                                                             Continuous
Root-shell            1 if root shell is obtained; 0 otherwise                                                       Discrete
Su-attempted          1 if "su root" command attempted; 0 otherwise                                                  Discrete
Num-root              Number of "root" accesses                                                                      Continuous
Num-file-creations    Number of file creation operations                                                             Continuous
Num-shells            Number of shell prompts                                                                        Continuous
Num-access-files      Number of operations on access control files                                                   Continuous
Num-outbound-cmds     Number of outbound commands in an ftp session                                                  Continuous
Is-hot-login          1 if the login belongs to the "hot" list; 0 otherwise                                          Discrete
Is-guest-login        1 if the login is a "guest" login; 0 otherwise                                                 Discrete
Count                 Number of connections to the same host as the current connection in the past two seconds       Continuous
Serror-rate           % of connections that have "SYN" errors                                                        Continuous
Rerror-rate           % of connections that have "REJ" errors                                                        Continuous
Same-srv-rate         % of connections to the same service                                                           Continuous
Diff-srv-rate         % of connections to different services                                                         Continuous
Srv-count             Number of connections to the same service as the current connection in the past two seconds    Continuous
Srv-serror-rate       % of connections that have "SYN" errors                                                        Continuous
Srv-rerror-rate       % of connections that have "REJ" errors                                                        Continuous
Srv-diff-host-rate    % of connections to different hosts                                                            Continuous

IV. SVM Intrusion Detection System

The construction of an SVM intrusion detection system consists of three phases:

* Preprocessing: using automated parsers to process the randomly selected raw TCP/IP dump data into machine-readable form.
* Training: in this process the SVM is trained on different types of attacks and normal data. The data have 41 input features and fall into two classes: normal (+1) and attack (-1).
* Testing: measure the performance on testing data.

A. Support Vector Machines

Support vector machines, or SVMs, are learning machines that plot the training vectors in high-dimensional feature space, labeling each vector by its class. SVMs view the classification problem as a quadratic optimization problem. They combine generalization control with a technique to avoid the "curse of dimensionality" by placing an upper bound on the margin between the different classes, making it a practical tool for large and dynamic data sets. SVMs classify data by determining a set of support vectors, which are members of the set of training inputs that outline a hyper plane in feature space [12].

The SVMs are based on the idea of structural risk minimization, which minimizes the generalization error, i.e. true error on unseen examples. The number of free parameters used in the SVMs depends on the margin that separates the data points but not on the number of input features, thus SVMs do not require a reduction in the number of features in order to avoid overfitting. SVMs provide a generic mechanism to fit the surface of the hyper plane to the



data through the use of a kernel function. The user may provide a function, such as a linear, polynomial, or sigmoid curve, to the SVMs during the training process, which selects support vectors along the surface of this function. This capability allows classifying a broader range of problems. The primary advantage of SVMs is binary classification and regression that they provide to a classifier with a minimal VC-dimension [12], which implies low expected probability of generalization errors. In our case normal data are labeled +1 and all intrusions are labeled -1. All the SVM experiments described below use the freeware package SVM light [13].

There are two main reasons that we experiment with SVMs for intrusion detection. The first is speed: as real-time performance is of primary importance to intrusion detection systems, any classifier that can potentially outrun neural networks is worth considering. The second reason is scalability: SVMs are relatively insensitive to the number of data points and the classification complexity does not depend on the dimensionality of the feature space [14], so they can potentially learn a larger set of patterns and be able to scale better than neural networks. Once the data is classified into two classes, a suitable optimizing algorithm can be used if necessary for further feature identification, depending on the application [14].

B. The development of SVM IDS

The data is first partitioned into two classes: normal and attack, where attack represents a collection of 22 different attacks belonging to the four classes described in Section III. The objective is to separate normal (1) and intrusive (-1) patterns.

1) Training: The SVMs are trained with normal and intrusive data. Our processed data consists of 14292 data points: 7312 for training, 6980 for testing. Each point is located in the n-dimensional space, with each dimension corresponding to a feature of the data point. We used a training set of 7312 data points with 41 features [16]. Data points contain actual attacks and normal usage patterns. Data points are used for training using the RBF (radial basis function) kernel option; an important point of the kernel function is that it defines the feature space in which the training set examples will be classified [13].

During the training process the regularization parameter is set to c = 1000, with optimization done for 2733 iterations. During training only 6 data points from the 7312 training set are misclassified. A difference of 0.00072 was achieved with a CPU run time of 17.77 seconds. The number of support vectors used in the training process was 204, including 29 at the upper bound. Linear loss during the process was 17.78295. The norm of the weight vector (w) during the training process is 126.10847; the norm of the longest example vector (x) is 1.0000. The number of kernel evaluations is 3148450. The estimated VC-dimension [12] of the classifier is less than or equal to 31807.69124.

2) Testing: We apply SVMs to a set of intrusion data as described in Section III above. In our case we use the SVMs to differentiate intrusions and normal activities. The testing set, consisting of 6980 data points with 41 features, received 99.50% accuracy, with a total runtime of 1.63 sec. The following graph shows the results.

[Figure: SVM detection]
Fig. 1. SVMs results on KDD intrusion detection (outputs 1 denote 1 or normal; outputs 2 denote -1 or attack)

V. The Neural Network Intrusion Detection System

The neural network intrusion detection system consists of three phases:

* Preprocessing: using automated parsers to process the raw TCP/IP dump data into machine-readable form.
* Training: the neural network is trained on different types of attacks and normal data. The input has 41 features and the output assumes one of two values: intrusion (22 different attack types), and normal data.
* Testing: performed on the test set containing 6980 data points.

A. Experiments Using Neural Networks

Multi-layer, feed-forward networks are used. The scaled conjugate gradient descent algorithm, available from the MATLAB package, is used for training.

Our data consists of the same set of 14292 data points. The set of 7312 training data is divided into two classes: normal and attack, where the attack is a collection of 22 different types of instances that belong to the four classes described in Section III, and the other is the normal data.
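The two-class data setup shared by the SVM and neural network experiments can be sketched as follows. This is an illustrative sketch only: the raw label strings, the placeholder feature vectors, and the seeded shuffle used for the split are assumptions, since the paper does not state how the 7312/6980 partition was drawn. The label values follow the convention stated for the SVM IDS, normal (1) and intrusive (-1).

```python
import random

NORMAL, ATTACK = 1, -1  # convention from the SVM IDS: normal (1), intrusive (-1)


def binary_label(raw_label):
    """Collapse every attack type into one class: anything not 'normal' is an attack."""
    return NORMAL if raw_label == "normal" else ATTACK


def partition(records, n_train, seed=0):
    """Shuffle records reproducibly and split them into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical stand-in for the processed data: (41-feature vector, raw label) pairs.
raw_labels = ["normal", "smurf", "neptune", "guest", "ipsweep", "normal"]
records = [([0.0] * 41, label) for label in raw_labels * 2382]  # 14292 records

train, test = partition(records, n_train=7312)
train_y = [binary_label(label) for _, label in train]
test_y = [binary_label(label) for _, label in test]

print(len(train), len(test))  # 7312 6980
```

Any classifier trained on `train` can then be scored on `test` with `accuracy`, which is the quantity reported for the testing sets in this paper.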



In the study we use three different feed-forward neural networks with the following architectures:

Network A: 4-layer, 41-20-20-20-1.
Network B: 3-layer, 41-40-40-1.
Network C: 3-layer, 41-25-20-1.

We use an initial training set of 7312 normalized input-output pairs consisting of attack patterns and normal user patterns.
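To make the three architectures concrete, the following sketch builds fully connected feed-forward networks with the layer sizes listed above and pushes one 41-feature input through each. The tanh activation, the small random initial weights, and the function names are assumptions for illustration; the paper does not specify them, and the sketch performs no training.

```python
import math
import random

# Layer sizes as listed for Networks A, B, and C (input, hidden layers, output).
ARCHITECTURES = {
    "A": [41, 20, 20, 20, 1],
    "B": [41, 40, 40, 1],
    "C": [41, 25, 20, 1],
}


def init_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate one input vector through the network using tanh units (assumed)."""
    activation = x
    for weights, biases in layers:
        activation = [
            math.tanh(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


def parameter_count(layer_sizes):
    """Number of weights plus biases in a fully connected network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


for name, sizes in ARCHITECTURES.items():
    net = init_network(sizes)
    output = forward(net, [0.5] * 41)
    print(name, parameter_count(sizes), len(output))
```

Under these assumptions Network B has the most trainable parameters of the three, and each network maps a 41-feature connection record to a single output value that can be thresholded into the normal and attack classes.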
1) Training Neural Networks: The training of the neural networks was conducted using the feed-forward back propagation algorithm with scaled conjugate gradient descent (SCG) for learning. The network was set to train until the desired mean square error of 0.001 was met. During the training process the goal was met at 394 epochs with a performance of 0.0009962988; Fig. 2 shows the training process. The purpose of having multiple networks is to find a suitable architecture that can detect at a faster speed with a low error rate, minimizing false positives and false negatives. Out of all the network architectures used, network B performs the best detection with 99.25% accuracy.

Figure 2 below demonstrates extremely good results of the training of network B, which converges in 394 epochs, while other methods we tried (Gradient Descent with Adaptive Linear Back Propagation and Gradient Descent with Momentum and Adaptive Linear Back Propagation) took longer.

Fig. 2. Neural network training on KDD intrusion detection data-subset

2) Testing the Neural Network: The testing set, as before, consists of 6980 data points with 41 features. We have three different feed-forward, multi-layer neural network architectures. The following figure shows the results of the three architectures: network A performed with an accuracy of 99.05%; network B achieved an accuracy of 99.25%; network C performed with an accuracy of 99%.

[Figure: NN detection; x-axis: data points; legend: NN [41,40,40,1], NN [41,20,20,20,1], NN [41,25,20,1], Actual]
Fig. 3. Neural network testing on KDD intrusion detection

VI. Comparison of SVMs and Neural Networks

Figure 4 below shows a comparison of the results of (the best performing) neural network and support vector machines on the KDD data subset selected for testing. Due to the large size of the testing set, only thirty data points are shown here. As can be seen, the SVM IDS has a slightly higher rate of making the correct detection.

[Figure: Comparison of NNs and SVMs; legend: NN prediction, SVM prediction, Actual]
Fig. 4. Neural network and SVMs testing on two classes attack/normal data

VII. Conclusion

We have constructed intrusion detection systems using neural networks and support vector machines, and tested their performance on a set of benchmark DARPA data. It is observed that both the neural networks and SVMs deliver highly accurate results (greater than 99% accuracy on the testing set) and show a comparable level of performance. The training time for SVMs is significantly shorter (17.77 sec vs. 18 min), an advantage that becomes rather important in situations where retraining needs to be done quickly (e.g., when new attack patterns are discovered). The running time of SVMs is also notably shorter. On the other hand, SVMs can only make



binary classifications, which is a severe disadvantage where the intrusion detection system requires multiple-class identifications (e.g., all 22 different types of attacks need to be differentiated).

Statistical learning techniques are being used more extensively in recent intrusion detection systems, owing to their adaptability and their generalization capability regarding new attack signatures that would need to be 'learned' quickly, once discovered, by an IDS. Whether to use SVMs or neural networks in implementing an intrusion detector depends on the particular type of intrusion (anomaly or misuse) that is under watch, as well as other security policy requirements. SVMs have great potential to be used in place of neural networks due to their scalability (large data sets and large numbers of features in patterns can easily overwhelm neural networks) and faster training and running time. On the other hand, neural networks have already proven to be useful in many IDSs, and are especially suited for multi-category classifications.

VIII. Acknowledgements

Partial support for this research received from ICASA (Institute for Complex Additive Systems Analysis, a division of New Mexico Tech) is gratefully acknowledged. The second author also acknowledges her partial support received from Sandia National Laboratories under the Rio Grande Educational Initiative. We would also like to acknowledge many insightful conversations with Dr. Jean-Louis Lassez, David Duggan, and Bob Hutchinson that contributed greatly to our work.

IX. References

[1] Ryan J, Lin M-J, Miikkulainen R (1998) Intrusion Detection with Neural Networks. Advances in Neural Information Processing Systems 10, Cambridge, MA: MIT Press.
[2] Kumar S, Spafford EH (1994) An Application of Pattern Matching in Intrusion Detection. Technical Report CSD-TR-94-013. Purdue University.
[3] Luo J, Bridges SM (2000) Mining Fuzzy Association Rules and Fuzzy Frequency Episodes for Intrusion Detection. International Journal of Intelligent Systems, John Wiley & Sons, pp 15:687-703.
[4] Demuth H, Beale M (2000) Neural Network Toolbox User's Guide. Mathworks, Inc. Natick, MA.
[5] Sung AH (1998) Ranking Importance Of Input Parameters Of Neural Networks. Expert Systems with Applications, pp 15:405-411.
[6] Cramer M, et al. (1995) New Methods of Intrusion Detection using Control-Loop Measurement. Proceedings of the Technology in Information Security Conference (TISC) '95. pp 1-10.
[7] Debar H, Becke M, Siboni D (1992) A Neural Network Component for an Intrusion Detection System. Proceedings of the IEEE Computer Society Symposium on Research in Security and Privacy.
[8] Debar H, Dorizzi B (1992) An Application of a Recurrent Network to an Intrusion Detection System. Proceedings of the International Joint Conference on Neural Networks. pp 78-483.
[9] Denning D (Feb 1987) An Intrusion-Detection Model. IEEE Transactions on Software Engineering, Vol. SE-13, No 2.
[10] Ghosh AK (1999) Learning Program Behavior Profiles for Intrusion Detection. USENIX.
[11] Cannady J (1998) Artificial Neural Networks for Misuse Detection. National Information Systems Security Conference.
[12] Vapnik VN (1995) The Nature of Statistical Learning Theory. Springer, Berlin Heidelberg New York.
[13] Joachims T (2000) SVMlight is an implementation of Support Vector Machines (SVMs) in C. http://ais.gmd.de/~thorsten/svm-light/. University of Dortmund. Collaborative Research Center on 'Complexity Reduction in Multivariate Data' (SFB475).
[14] Joachims T (1998) Making Large-scale SVM Learning Practical. LS8-Report, University of Dortmund, LS VIII-Report.
[15] Joachims T (2000) Estimating the Generalization Performance of a SVM Efficiently. Proceedings of the International Conference on Machine Learning, Morgan Kaufmann.
[16] http://kdd.ics.uci.edu/databases/kddcup99/task.html.
