Ring 2018
Article history: Received 25 July 2018; Revised 21 November 2018; Accepted 23 December 2018; Available online 26 December 2018

Keywords: GANs; WGAN-GP; TTUR; NetFlow; Generation; IDS

Abstract

Flow-based data sets are necessary for evaluating network-based intrusion detection systems (NIDS). In this work, we propose a novel methodology for generating realistic flow-based network traffic. Our approach is based on Generative Adversarial Networks (GANs) which achieve good results for image generation. A major challenge lies in the fact that GANs can only process continuous attributes. However, flow-based data inevitably contain categorical attributes such as IP addresses or port numbers. Therefore, we propose three different preprocessing approaches for flow-based data in order to transform them into continuous values. Further, we present a new method for evaluating the generated flow-based network traffic which uses domain knowledge to define quality tests. We use the three approaches for generating flow-based network traffic based on the CIDDS-001 data set. Experiments indicate that two of the three approaches are able to generate high quality data.
1. Introduction

Detecting attacks within network-based traffic has been of great interest in the data mining community over decades. Recently, Buczak and Guven (2016) presented an overview of the community effort with regard to this issue. However, there are still open challenges (e.g., the high cost of false positives or the lack of labeled data sets which are publicly available) for the successful use of data mining algorithms for anomaly-based intrusion detection (Catania and Garino, 2012; Sommer and Paxson, 2010). In this work, we focus on a specific challenge within that setting.

Problem statement. For network-based intrusion detection, few labeled data sets are publicly available which contain realistic user behavior and up-to-date attack scenarios. Available data sets are often outdated or suffer from other shortcomings. Typically, network traffic is captured in packet-based or flow-based format. This work focuses on flow-based network traffic. Using real flow-based network traffic is problematic due to the missing ground truth. Since flow-based data sets contain millions up to billions of flows, manual labeling of real network traffic is difficult even for security experts and extremely time-consuming. As another disadvantage, real network traffic often cannot be shared within the community due to privacy concerns. However, labeled data sets are necessary for training supervised data mining methods (e.g., classification algorithms) and provide the basis for evaluating the performance of supervised as well as unsupervised anomaly-based intrusion detection methods.

Objective. Large training data sets with high variance can increase the robustness of anomaly-based intrusion detection
∗ Corresponding author.
E-mail addresses: [email protected] (M. Ring), [email protected] (D. Schlör), [email protected] (D. Landes), [email protected] (A. Hotho).
https://doi.org/10.1016/j.cose.2018.12.012
0167-4048/© 2018 Elsevier Ltd. All rights reserved.
computers & security 82 (2019) 156–172 157
2.2. GANs
to the following result:

sim(192.168.20.1, 192.168.20.2) > sim(192.168.20.1, 192.168.20.3),   (1)

where sim(X, Y) is an arbitrary similarity function (e.g., cosine similarity) between the IP addresses X and Y. IP2Vec considers the IP addresses 192.168.20.1 and 192.168.20.2 as more similar than 192.168.20.1 and 192.168.20.3 because the IP addresses 192.168.20.1 and 192.168.20.2 refer to the same targets and use the same services. In contrast to that, the IP address 192.168.20.3 targets different servers and uses different services (e.g., SSH traffic).

Let us assume the training data set contains 100,000 different IP addresses, 20,000 different destination ports and 3 different transport protocols. Then, the size of the one-hot vector is 120,003 and only one component is 1, while all others are 0. Input and output layers comprise exactly the same number of neurons, which is equal to the size of the vocabulary. The output layer uses a softmax classifier which indicates the probabilities for each value of the vocabulary that it appears in the same flow (context) as the input value to the neural network. The softmax classifier (Buduma and Locascio, 2017) normalizes the output of all output neurons such that the sum of the outputs is 1. The number of neurons in the hidden layer is much smaller than the number of neurons in the input layer.
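The similarity ordering of Eq. (1) can be checked directly on learned vectors. A minimal sketch using cosine similarity; the three 4-dimensional vectors are made-up stand-ins for IP2Vec output, not values from the paper:

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical embeddings; IP2Vec would learn m-dimensional vectors
# (e.g., m = 32) from the flows themselves.
emb = {
    "192.168.20.1": [0.9, 0.1, 0.4, 0.0],
    "192.168.20.2": [0.8, 0.2, 0.5, 0.1],  # same targets and services
    "192.168.20.3": [0.1, 0.9, 0.0, 0.7],  # different servers/services
}

s_12 = cosine_similarity(emb["192.168.20.1"], emb["192.168.20.2"])
s_13 = cosine_similarity(emb["192.168.20.1"], emb["192.168.20.3"])
assert s_12 > s_13  # the ordering required by Eq. (1)
```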
Table 2 – Generation of training samples in IP2Vec (Ring et al., 2017a). Input values are highlighted with cyan background and expected output values are highlighted with gray background. The following abbreviations are used: src IP addr. (source IP address), dst IP addr. (destination IP address), dst port (destination port), proto (transport protocol).
value, one training sample where the destination port is the input value and one training sample where the transport protocol is the input value.

In the training process, the neural network is fed with the input value and tries to predict the probabilities of the other values from the vocabulary. For training samples, the probability of the concrete output value is 1 and 0 for all other values. In general, the output layer indicates the probabilities for each value of the input vocabulary that it appears in the same flow as the given input value.

The network uses backpropagation for learning. This kind of training, however, could take a lot of time. Let us assume that the hidden layer comprises 32 neurons and the training data set encompasses one million different IP addresses and ports. This results in 32 million weights in each layer of the network. Consequently, training such a large neural network is going to be slow. To make things worse, a huge amount of training flows is required for adjusting that many weights and for avoiding overfitting. Consequently, we have to update millions of weights for millions of training samples. Therefore, IP2Vec attempts to reduce the training time by using Negative Sampling in a similar way as Word2Vec does (Mikolov et al., 2013a). In Negative Sampling, each training sample modifies only a small percentage of the weights, rather than all of them. More details on Negative Sampling may be found in Mikolov et al. (2013b).

2.3.4. Continuous representation of IP addresses

After the training phase, IP2Vec uses the weights of the hidden layer as m-dimensional vector representations of IP addresses. That means, a 32-dimensional continuous representation of each IP address, transport protocol and port is obtained if the hidden layer comprises 32 neurons.

Intuition. Why does this approach work? If two IP addresses refer to similar destination IP addresses, destination ports, and transport protocols, then the neural network needs to output similar results for these IP addresses. One way for the neural network to learn similar output values for different input values is to learn similar weights in the hidden layer of the network. Consequently, if two IP addresses exhibit similar network behavior, IP2Vec attempts to learn similar weights (which are the vectors of the target feature space R^m) in the hidden layer.

3. Transformation approaches

This section describes three different methods to transform the heterogeneous flow-based network data such that they may be processed by Improved Wasserstein Generative Adversarial Networks (WGAN-GP).

3.1. Preliminaries

In general, we use in all three methods the same preprocessing steps for the attributes date first seen, transport protocol, and TCP flags (see Table 1).

Usually, the concrete timestamp is marginal for generating realistic flow-based network data. Instead, many intrusion detection systems derive additional information from the timestamp like "is today a working day or weekend day" or "does the event occur during typical working hours or at night". Therefore, we do not generate timestamps. Instead, we create two attributes weekday and daytime. To be precise, we extract the weekday information of flows and generate seven binary attributes isMonday, isTuesday and so on. Then, we interpret the daytime as seconds [0,86400) and normalize them to the interval [0,1]. We transform the transport protocol (see #3 in Table 1) to three binary attributes, namely isTCP, isUDP, and isICMP. The same procedure is followed for TCP flags (see #10 in Table 1) which are transformed to six binary attributes isURG, isACK, isPUS, isSYN, isRES, and isFIN.

3.2. Method 1 – numeric transformation

Although IP addresses and ports look like real numbers, they are actually categorical. Yet, the simplest approach is to interpret them as numbers after all and treat them as continuous attributes. We refer to this method as Numeric-based Improved Wasserstein Generative Adversarial Networks (short: N-WGAN-GP). This method transforms each octet of an IP address to the interval [0,1], e.g., 192.168.220.14 is transformed to four continuous attributes: (ip_1) 192/255 = 0.7529, (ip_2) 168/255 = 0.6588, (ip_3) 220/255 = 0.8627 and (ip_4) 14/255 = 0.0549. We follow a similar procedure for ports by dividing them by the highest port number, e.g. the source port 80 will be transformed to one continuous attribute 80/65535 = 0.00122.
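The numeric transformation can be sketched in a few lines; the function name and the flat attribute list are ours, not taken from the paper's implementation:

```python
def numeric_transform(ip, port):
    """N-WGAN-GP preprocessing sketch: map an IP address and a port to [0,1].

    Each of the four octets is divided by 255 and the port by 65535,
    yielding five continuous attributes.
    """
    octets = [int(octet) / 255 for octet in ip.split(".")]
    return octets + [port / 65535]

attrs = numeric_transform("192.168.220.14", 80)
# ip_1..ip_4 ≈ 0.7529, 0.6588, 0.8627, 0.0549 and the port ≈ 0.00122
```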
Table 3 – Preprocessing of flow-based data. The first column provides the original flow attributes and exemplary values; the other columns show the extracted features (column Attr.) and the corresponding values (column Value) for each preprocessing method.
The attributes duration, bytes and packets (see attributes #2, #8 and #9 in Table 1) are normalized to the interval [0,1]. Table 3 provides examples and compares the three transformation methods.

3.3. Method 2 – binary transformation

The second method creates several binary attributes for IP addresses, ports, bytes, and packets. We refer to this method as Binary-based Improved Wasserstein Generative Adversarial Networks (short: B-WGAN-GP). Each octet of an IP address is mapped to an 8-bit binary representation. Consequently, IP addresses are transformed into 32 binary attributes, e.g., 192.168.220.14 is transformed to 11000000 10101000 11011100 00001110. Ports are converted to their 16-bit binary representation, e.g., the source port 80 is transformed to 00000000 01010000. For representing bytes and packets, we transform them to a binary representation as well and limit their length to 32 bit. The attribute duration is normalized to the interval [0,1]. Table 3 shows an example for this transformation procedure.

3.4. Method 3 – embedding transformation

The third method transforms IP addresses, ports, duration, bytes, and packets into so-called embeddings in an m-dimensional fea-
Table 4 – Extended generation of training samples in IP2Vec. Each line shows one training pair consisting of an input value (left of the arrow) and an expected output value (right of the arrow). The following abbreviations are used: src IP addr. (source IP address), dst IP addr. (destination IP address), dst port (destination port), proto (transport protocol).

input value → output value
src IP addr. → dst IP addr.
src IP addr. → src port
src IP addr. → proto
dst IP addr. → src IP addr.
dst IP addr. → dst port
dst IP addr. → proto
src port → src IP addr.
dst port → dst IP addr.
bytes → packets
bytes → duration
packets → bytes
packets → duration
duration → packets
ture space R^m following the ideas in Section 2.3. We refer to this method as Embedding-based Improved Wasserstein Generative Adversarial Networks (short: E-WGAN-GP).

E-WGAN-GP extends IP2Vec (see Section 2.3) for learning embeddings not only for IP addresses, ports, and transport protocols, but also for the attributes duration, bytes, and packets. To that end, the input vocabulary of IP2Vec is extended by the values of the latter three attributes and additional training pairs are extracted from each flow. Table 4 presents the extended training sample generation.

Each flow produces 13 training samples, each of which consists of an input and an expected output value. Our adapted training sample generation extracts further training samples for the attributes bytes, packets and duration. Further, we also create training pairs with the destination IP address as input. Ring et al. (2017a) argue that it is not necessary to extract training samples with destination IP addresses as input when working on unidirectional flows. Yet, in this case, IP2Vec does not learn meaningful representations for multi- and broadcast IP addresses which only appear as destination IP addresses in flow-based network traffic. Table 3 shows the result of an exemplary transformation.

E-WGAN-GP maps flows to embeddings which need to be re-transformed to the original space after generation. To that end, values are replaced by the closest embeddings generated by IP2Vec. For instance, we calculate the cosine similarity between the generated output for the source IP address and all existing IP address embeddings generated by IP2Vec. Then, we replace the output with the IP address which has the highest similarity.

4. Experiments

This section provides an experimental evaluation of our three approaches N-WGAN-GP, B-WGAN-GP and E-WGAN-GP for synthetic flow-based network traffic generation.

4.1. Data set

We use the publicly available CIDDS-001 data set (Ring et al., 2017b) which contains unidirectional flow-based network traffic as well as detailed information about the networks and IP addresses within the data set. Fig. 4 shows an overview of the emulated business environment of the CIDDS-001 data set. In essence, the CIDDS-001 data set contains four internal subnets which can be identified by their IP address ranges: a developer subnet (dev) with exclusively Linux clients, an office subnet (off) with exclusively Windows clients, a management subnet (mgt) with mixed clients, and a server subnet (srv). This additional knowledge facilitates the evaluation of the generated data (see Section 4.3).

The CIDDS-001 data set contains four weeks of network traffic. We consider only the network traffic which was captured at the network device within the OpenStack environment (see Fig. 4) and divide the network traffic in two parts: week1 and week2–4. The first two weeks contain normal user behavior and attacks, whereas week3 and week4 contain only normal user behavior and no attacks. We use this kind of splitting in order to obtain a large training data set week2–4 for our generative models and simultaneously provide a reference data set week1 which contains normal and malicious network behavior. Overall, week2–4 contains around 22 million flows and week1 contains around 8.5 million flows. We consider only the TCP, UDP and ICMP flows and remove the 895 IGMP flows from the data set.

4.2. Definition of a baseline

As baseline for our experiments, we build a generative model which creates new flows based on the empirical probability distribution of the input data. The baseline estimates the probability distribution for each attribute by counting from the input data. New flows are generated by drawing from the empirical probability distributions. Each attribute is drawn independently from the other attributes.
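The baseline can be sketched as follows; the dictionary-based flow representation and the attribute names are illustrative assumptions, not the authors' implementation:

```python
import random
from collections import Counter

def fit_baseline(flows):
    """Estimate an independent empirical distribution per attribute by counting."""
    return {attr: Counter(flow[attr] for flow in flows) for attr in flows[0]}

def sample_flow(model):
    """Draw every attribute independently from its empirical distribution."""
    return {
        attr: random.choices(list(counts), weights=list(counts.values()))[0]
        for attr, counts in model.items()
    }

# Toy input data with made-up values
flows = [
    {"proto": "TCP", "dst_port": 443},
    {"proto": "TCP", "dst_port": 80},
    {"proto": "UDP", "dst_port": 53},
]
model = fit_baseline(flows)
generated = sample_flow(model)
assert generated["proto"] in {"TCP", "UDP"}
```

Because each attribute is drawn on its own, any dependency between attributes (e.g., port 53 occurring together with UDP) is lost, which is exactly the weakness the later domain knowledge checks expose.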
Fig. 4 – Overview of the simulated network environment from the CIDDS-001 data set (Ring et al., 2017b).
4.3. Evaluation methodology

Evaluation of generative models is challenging and an open research topic: Borji (2018) analyzed different evaluation measures for GANs. Images generated with GANs are often presented to human judges and evaluated by visual comparison. Another well-known evaluation measure for images is the Inception Score (IS) (Salimans et al., 2016). IS classifies generated images in 1000 different classes using the Inception Net v3 (Szegedy et al., 2016). IS, however, is not applicable in our scenario since the Inception Net v3 can only classify images, but not flow-based network traffic.

In the IT security domain, there is neither consensus on how to evaluate network traffic generators, nor a standardized methodology (Molnár et al., 2013). Glasser and Lindauer (2013) discuss the problem of evaluating synthetic data. The authors conclude that synthetic data will only be realistic in some limited and measurable dimensions in the absence of a clear definition of realism. Therefore, Glasser and Lindauer use human feedback from domain experts to evaluate the quality of generated data for anomaly detection. Stiborek et al. (2015) use an anomaly score to evaluate their generated data. Siska et al. (2010) and Iannucci et al. (2017) build graphs and evaluate the diversity of the generated traffic by comparing the number of nodes and edges between generated and real network traffic. Other flow-based network traffic generators often focus on specific aspects in their evaluation, e.g. distributions of bytes or packets are compared with real NetFlow data in Sommers and Barford (2004) or Botta et al. (2012).

Since there is no single widely accepted evaluation methodology, we use several evaluation approaches to assess the quality of the generated data from different views. To evaluate the diversity and distribution of the generated data, we visualize attributes (see Section 4.4.2) and compute the Euclidean distances between generated and real flow-based network data (see Section 4.4.3). To evaluate the quality of the content and relationships between attributes within a flow, we introduce domain knowledge checks (see Section 4.4.4) as a new evaluation method. This method builds on the basic idea of Glasser and Lindauer (2013). While Glasser and Lindauer (2013) use feedback from human experts, the domain knowledge checks are automated test procedures on the basis of domain knowledge.
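Such automated checks reduce to simple predicates over a flow. A sketch of two of the seven rules defined in Section 4.4.4 (Tests 1 and 7); the field names and flag encoding are illustrative, not taken from the implementation:

```python
def check_test_1(flow):
    """Test 1: a UDP flow must not carry any TCP flags."""
    return flow["proto"] != "UDP" or flow["tcp_flags"] == ""

def check_test_7(flow):
    """Test 7: bytes must respect the minimum/maximum packet size,
    i.e. 42 * packets <= bytes <= 65535 * packets."""
    return 42 * flow["packets"] <= flow["bytes"] <= 65535 * flow["packets"]

plausible = {"proto": "UDP", "tcp_flags": "", "packets": 2, "bytes": 156}
implausible = {"proto": "UDP", "tcp_flags": ".A..S.", "packets": 1, "bytes": 20}

assert check_test_1(plausible) and check_test_7(plausible)
assert not check_test_1(implausible)
assert not check_test_7(implausible)
```

Running such predicates over a generated data set and reporting the fraction of flows that pass yields exactly the percentage scores shown later in Table 6.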
Fig. 6 – Distribution of the attribute source port for the subnets. The rows show in order: (1) data sampled from real data
(week 1) and data generated by (2) baseline, (3) N-WGAN-GP, (4) E-WGAN-GP and (5) B-WGAN-GP.
We will now briefly discuss the conditional distribution of source ports (Fig. 6). In the first row, we can clearly distinguish typical client-port (dev, mgt, off) and server-port (ext, srv) distributions. As expected, the maximum likelihood baseline is not able to capture the differences of the distributions depending on the subnet of the source IP address and models a distribution which is a combination of all five subnets from the input data. In contrast, B-WGAN-GP and E-WGAN-GP capture the conditional probability distributions for the source port given the subnet of the source IP address very well.

N-WGAN-GP is incapable of representing the distributions properly. Note that almost exclusively flows with external source IP addresses are generated in the selected samples. In-depth analysis of the generated data suggests that numeric representations fail to match the designated subnets exactly. As nearly all generated data is assigned to the ext subnet, it comes as no surprise that the distribution represents a combination of all five subnets from the input data for both source ports (Fig. 6) and destination IP addresses (Fig. 7).

For the attribute destination IP address, the distribution is a mixture of external and internal IP addresses for dev, mgt
Fig. 7 – Distribution of the attribute destination IP address for the subnets. The rows show in order: (1) data sampled from real data (week1) and data generated by (2) baseline, (3) N-WGAN-GP, (4) E-WGAN-GP and (5) B-WGAN-GP.
and off subnets (see reference week week1). This matches the user roles, surfing on the internet (external) as well as accessing internal services (e.g., printers). For external subnets, the destination IP address has to be within the internal IP address range. Traffic from external sources to external targets does not run through the simulated network environment of the CIDDS-001 data set. Consequently, there is no flow within the CIDDS-001 data set which has a source IP address and a destination IP address from the ext subnet. This fact can be seen for week1 in Fig. 7 where flows which have their origin in the ext subnet only address a small range of destination IP addresses which reflects the range of internal IP addresses. E-WGAN-GP and B-WGAN-GP capture this property very well while the baseline and N-WGAN-GP fail to capture this property.

4.4.3. Euclidean distances

The second evaluation compares the distribution of the generated and real flow-based network data in each attribute independently. Therefore, we calculate Euclidean distances between the probability distributions of the generated data and the input flow-based network data (week2–4) in each attribute. We choose the Euclidean distance over the Kullback–Leibler divergence in order to avoid calculation
Table 5 – Euclidean distances between the training data (week2–4) and the generated flow-based network traffic in each attribute.
problems where the probability of generated data is zero. Table 5 highlights the results. We refrain from calculating the Euclidean distance for the attribute date first seen since exact matches of timestamps (considering seconds and milliseconds) do not make sense. At this point, we refer to Fig. 5 which analyzes the temporal distribution of the generated timestamps.

Network traffic is subject to concept drift and exact reproduction of probability distributions is not desirable. This fact can be seen in Table 5 where the Euclidean distances between the probability distributions from week1 and week2–4 of the CIDDS-001 data set are between 0.02 and 0.14. Consequently, generated network traffic should have Euclidean distances to the training data similar to those of the reference week week1. However, it should be mentioned that there is no perfect distance value x which indicates the correct amount of concept drift. The generated data of E-WGAN-GP tends to have similar distances to the training data (week2–4) as the reference data set week1. Table 5 shows that the baseline has the lowest distance to the training data in each attribute. The generated data of N-WGAN-GP differs considerably from the training data set in some attributes. This is because N-WGAN-GP often does not generate the exact values but a large number of new values. The binary approach B-WGAN-GP has small distances in most attributes (except for the attribute duration). This may be caused by the distribution of duration in the training data, as most flows in the training data set have very small values in this attribute. Further, the normalization of the duration to the interval [0,1] entails that almost all flows have very low values in this attribute. N-WGAN-GP and B-WGAN-GP tend to generate the smallest possible duration (0.000 seconds) for all flows.

4.4.4. Domain knowledge checks

We use domain knowledge checks to evaluate the intrinsic quality of the generated data. To that end, we derive several properties that generated flow-based network data need to fulfill in order to be realistic. We use the following seven heuristics as sanity checks:

• Test 1: If the transport protocol is UDP, then the flow must not have any TCP flags.
• Test 2: The CIDDS-001 data set is captured within an emulated company network. Therefore, at least one IP address (source IP address or destination IP address) of each flow must be internal (starting with 192.168.XXX.XXX).
• Test 3: If the flow describes normal user behavior and the source port or destination port is 80 (HTTP) or 443 (HTTPS), the transport protocol must be TCP.
• Test 4: If the flow describes normal user behavior and the source port or destination port is 53 (DNS), the transport protocol must be UDP.
• Test 5: If a multi- or broadcast IP address appears in the flow, it must be the destination IP address.
• Test 6: If the flow represents a NetBIOS message (destination port is 137 or 138), the source IP address must be internal (192.168.XXX.XXX) and the destination IP address must be an internal broadcast (192.168.XXX.255).
• Test 7: TCP, UDP and ICMP packets have a minimum and maximum packet size. Therefore, we check the relationship between bytes and packets in each flow according to the following rule:

42 ∗ packets ≤ bytes ≤ 65,535 ∗ packets

Table 6 shows the results of checking the generated data against these rules.

The reference data set week1 achieves 100 percent in each test, which is not surprising since the data is real flow-based network traffic captured in the same environment as the training data set. The baseline approach does not capture dependencies between flow attributes and achieves worse results. This can especially be observed in Tests 1, 4, and 6. Since multi- and broadcast IP addresses appear only in the attribute destination IP address, the baseline cannot fail Test 5 and achieves 100 percent.

For our generative models, E-WGAN-GP achieves the best results on average. The usage of embeddings leads to more meaningful similarities within categorical attributes and facilitates the learning of interrelationships. Embeddings, however, also reduce the possible resulting space since no new values can be generated. B-WGAN-GP generates flows which achieve high accuracy in Tests 1–4. However, this approach shows weaknesses in Tests 5 and 6 where several internal relationships must be considered. The numerical approach N-WGAN-GP has the lowest accuracy in the tests. In particular, Test 4 shows that normalization of source port or destination port to a single continuous attribute is inappropriate. Straightfor-
Table 6 – Results of the domain knowledge checks in percent. Higher values indicate better results.
ward mapping of 2^16 different port values to one continuous attribute leads to too many values for a good reconstruction. In contrast to that, the binary representation of B-WGAN-GP leads to better results in that test.

5. Discussion

Flow-based network traffic consists of heterogeneous data and GANs can only process continuous input values. To solve this problem, we analyze three methods to transform categorical to continuous attributes. The advantages and disadvantages of these approaches are discussed in the following.

N-WGAN-GP is a straightforward numeric method but leads to unwanted similarities between categorical values which are not similar considering real data. For instance, this transformation approach assesses the IP addresses 192.168.220.10 and 191.168.220.10 as highly similar although the first IP address 192.168.220.10 is private and the second IP address 191.168.220.10 is public. Hence, the two addresses should be ranked as fairly dissimilar. Obviously, even small errors in the generation process can cause significant errors. This effect can be observed in Test 2 (see Table 6) where N-WGAN-GP has problems with the generation of private IP addresses. Instead, this approach often generates non-private IP addresses such as 191.168.X.X or 192.167.X.X. In image generation, the original application domain of GANs, small errors do not have serious consequences. A brightness of 191 instead of 192 in a generated pixel has nearly no effect on the image and the error is (normally) not visible to human eyes. Further, N-WGAN-GP normalizes the numeric attributes bytes and packets to the interval [0,1]. The generated data are then de-normalized using the original training data. Here, we can observe that real flows often have typical byte sizes like 66 bytes which are also not exactly matched. This results in higher Euclidean distances in these attributes (see Table 5). Overall, the first method N-WGAN-GP does not seem to be suitable for generating realistic flow-based network traffic.

B-WGAN-GP extracts binary attributes from categorical attributes and converts numerical attributes to their binary representation. Using this transformation, additional structural information (e.g., subnet information) of IP addresses can be maintained. Further, B-WGAN-GP assigns larger value ranges to categorical values in the transformed space than N-WGAN-GP. While N-WGAN-GP uses a single continuous attribute to represent a source port, B-WGAN-GP uses 16 binary attributes for representation. These two aspects support B-WGAN-GP in generating better categorical values of a flow as can be observed in the results of the domain knowledge checks (see e.g. Test 2 and Test 4 in Table 6). Further, Figs. 6 and 7 indicate that B-WGAN-GP captures the internal structure of the traffic very well even though it is less restricted than E-WGAN-GP with respect to the treatment of previously unseen values.

E-WGAN-GP learns embeddings for IP addresses, ports, bytes, packets, and duration. These embeddings are continuous vector representations and take contextual information into account. As a consequence, the generation of flows is less error-prone, as small variations in the embedding space generally do not change the outcome in input space much. For instance, if a GAN introduces a small error in IP address generation, it could find the embedding of the IP address 192.168.220.5 as nearest neighbor instead of the embedding of the expected IP address 192.168.220.13. Since both IP addresses are internal clients, the error has nearly no effect. As a consequence, E-WGAN-GP achieves the best results of the generative models in the evaluation. Yet, this approach (in contrast to N-WGAN-GP and B-WGAN-GP) cannot generate previously unseen values due to the embedding translation. This is not a problem for the attributes bytes, packets and duration. Given enough training data, embeddings for all (important) values of bytes, duration and packets are available. For example, consider the attribute bytes. We assume that the available embedding values b_1, b_2, b_3, ..., b_{k-1}, b_k sufficiently cover the possible value range of the attribute bytes. As specific byte values have no particular meaning, we are only interested in the magnitude of the attribute. Therefore, a non-existing value b_x can be replaced with an available embedding value without adversely affecting the meaning.

The situation may be different for IP addresses and ports. IP addresses represent hosts with a distinct, complex network behavior, for instance as a web server, printer, or Linux client. Generating new IP addresses goes along with the invention of a new host with new network behavior. To answer the question whether the generation of new IP addresses is necessary (or desired), the purpose needs to be considered in which the generated data shall be used later. If the training set comprises more than 10,000 or 100,000 different IP addresses, there is probably no need to generate new IP addresses for an IDS evaluation data set. However, this does not hold generally. Instead, one should ask the following two questions: (1) are there enough different IP addresses in the training data set and (2) is there a need to
computers & security 82 (2019) 156–172 169
generate previously unseen IP addresses? If previously unseen IP addresses are required, E-WGAN-GP is not suitable as a transformation method; otherwise, E-WGAN-GP will generate better flows than all other approaches.

The situation for ports is similar to IP addresses. Generally, there are 65,536 different ports and most of these ports should appear in the training data set. Generating new port values is also associated with generating new behavior. If the training data set comprises SSH connections (port 22) and HTTP connections (port 80), but no FTP connections (ports 20 and 21), generators are not able to produce realistic FTP connections since they have never seen such connections. Since the network behavior of FTP differs greatly from SSH and HTTP, it does not make much sense to generate unseen service ports. However, the situation is different for typical client ports.

Generally, GANs capture the implicit conditional probability distributions very well, given that a proper data representation is chosen, which is the case for E-WGAN-GP and B-WGAN-GP (see Figs. 6 and 7). While the visual differences between binary and embedded data representations are subtle, the domain knowledge checks show larger quality differences. Overall, this analysis suggests that E-WGAN-GP and B-WGAN-GP are able to generate good flow-based network traffic. While E-WGAN-GP achieves better evaluation results, B-WGAN-GP is not limited in the value range and is able to generate previously unseen values.

[…] data sets. Instead, a good network traffic generator for our purpose should be able to generate new synthetic flow-based network traffic.

Category (II) Maximum Throughput Generators usually aim to test end-to-end network performance (Molnár et al., 2013). Iperf (Jon et al.) is such a generator and can be used for testing bandwidth, delay jitter, and loss ratio characteristics. Consequently, methods from this category primarily aim at evaluating network bandwidth performance.

Category (III) Attack Generators use real network traffic as input and combine it with synthetically created attacks. FLAME (Brauckhoff et al., 2008) is a generator for malicious network traffic. The authors use rule-based approaches to inject, e.g., port scan attacks or denial of service attacks. Vasilomanolakis et al. (2016) present ID2T, a similar approach which combines real network traffic with synthetically created malicious network traffic. For creating malicious network traffic, the authors use rule-based scripts or manipulate parameters of the input network traffic. Sperotto et al. (2009) analyze SSH brute-force attacks at flow level and use a Hidden Markov Model to model their characteristics. However, their model generates only the number of bytes, packets and flows during a typical attack scenario and does not generate complete flow-based data.

[…] network traffic and is not limited to generating only malicious network traffic like category (III). While Siska et al. (2010) and Iannucci et al. (2017) use domain knowledge to generate flows by defining conditional dependencies between flow attributes, we use GAN-based approaches which learn all dependencies between the flow attributes inherently.

6.2. GANs

This section analyses how GANs were recently introduced in the domain of IT security. A more general discussion about attacks and defenses for deep learning against adversarial examples may be found in Yuan et al. (2017). Rigaki and Garcia (2016) use a GAN-based approach to modify malware communication in order to avoid detection. The authors evaluate their method using an Intrusion Prevention System (IPS) which is based on a Markov model. The IPS considers the bytes, duration and time-delta of flows for determining malicious network traffic. Therefore, Rigaki and Garcia use a GAN which learns to imitate Facebook chat traffic characteristics based on these flow attributes. For capturing the time-delta of the flows, the generator and discriminator of the GAN are Recurrent Neural Networks (RNNs). After the training phase, the authors use the GAN to generate legitimate Facebook traffic characteristics and adapt the malware to match these traffic patterns. Following this approach, the malware is able to successfully bypass the IPS. Hu and Tan (2017) present a GAN-based approach named MalGAN in order to generate synthetic malware examples which are able to bypass anomaly-based detection methods. Malware examples are represented as 160-dimensional binary attributes.

Anderson et al. (2016) developed a character-based GAN to mimic domain generation algorithms (DGA) as used by malware to contact command and control servers. The authors train an autoencoder to generate domain names and reassemble encoder and decoder into an adversarial setting to fool DGA-detection classifiers. DeepDGA generates domain names, which are categorical data; however, as the domain name is the only attribute generated, their setting is hardly comparable to our flow-based network data generation task.

Yin et al. (2018) propose Bot-GAN, a framework which generates synthetic network data in order to improve botnet detection methods. However, their framework does not consider the generation of categorical attributes like IP addresses and ports, which is one of the key contributions of our work.

Zheng et al. (2018) use a generative adversarial network based approach for fraud detection in bank transfers. To be precise, the authors use a deep denoising autoencoder and two Gaussian Mixture Models (GMM). The encoder and one GMM act as discriminator and the decoder acts as generator. The second GMM, in combination with a threshold, classifies the bank transfers into the classes normal or fraud. Zheng et al. achieve good results with this approach and are able to beat non-GAN-based approaches. However, their input data differ significantly from flow-based data and consist primarily of continuous attributes like amount of transferred money, balance of the account or frequency of transfers.

For the analysis of mobile traffic, Zhang et al. (2017) propose ZipNet-GAN, a GAN-based approach for fine-grained pattern extraction from coarse-grained network data, similar to super-resolution in image processing. Although they combine generative adversarial networks with the generation of fine-grained network traffic, their approach is very different from ours, since they work only with traffic as an aggregated continuous attribute rather than with network data at flow level, and rely on coarse-grained information as input.

As can be seen in this section, GANs are already used in the domain of IT security and have proven their general suitability. However, existing works are only applied to specific application scenarios and consider only continuous attributes. In contrast to that, the proposed approach aims to generate network data in standard flow-based NetFlow format and considers all typical categorical attributes like IP addresses or port numbers.

7. Summary

Labeled flow-based data sets are necessary for evaluating and comparing anomaly-based intrusion detection methods. Evaluation data sets like DARPA 98 and KDD Cup 99 cover several attack scenarios as well as normal user behavior. These data sets, however, were captured at some point in time such that concept drift of network traffic causes static data sets to become obsolete sooner or later.

In this paper, we proposed three synthetic flow-based network traffic generators which are based on Improved Wasserstein GANs (WGAN-GP) (Gulrajani et al., 2017) using the two time-scale update rule from Heusel et al. (2017). Our generators are initialized with flow-based network traffic and then generate new synthetic flow-based network traffic. In contrast to previous high-level generators, our GAN-based approaches learn all internal dependencies between attributes inherently and no additional knowledge has to be modeled.

Flow-based network traffic consists of heterogeneous data, but GANs can only process continuous input data. To overcome this challenge, we proposed three different methods to handle flow-based network data. In the first approach, N-WGAN-GP, we interpreted IP addresses and ports as continuous input values and normalized numeric attributes like bytes and packets to the interval [0,1]. In the second approach, B-WGAN-GP, we created binary attributes from categorical and numerical attributes. For instance, we converted ports to their 16-bit binary representation and extracted 16 binary attributes. B-WGAN-GP is able to maintain more information (e.g., subnet information of IP addresses) from the categorical input data. The third approach, E-WGAN-GP, learns meaningful continuous representations of categorical attributes like IP addresses using IP2Vec (Ring et al., 2017a). The preprocessing of E-WGAN-GP is inspired by the text mining domain, which also has to deal with non-continuous input values. Then, we generated new flow-based network traffic based on the CIDDS-001 data set (Ring et al., 2017b) in an experimental evaluation. Our experiments indicate that especially E-WGAN-GP is able to generate realistic data which achieves good evaluation results. B-WGAN-GP achieves similarly good results and, in contrast to E-WGAN-GP, is able to create new (unseen) values. The quality of network data generated by N-WGAN-GP is less convincing, which indicates that a straightforward numeric transformation is not appropriate.
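The three preprocessing schemes summarized above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, and the embedding table is a toy stand-in for the continuous vectors that IP2Vec would actually learn from traffic.

```python
# Sketch of the three attribute transformations (hypothetical helpers).

# N-WGAN-GP: min-max normalize numeric attributes such as bytes to [0, 1].
def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def denormalize(values, lo, hi):
    # generated values are mapped back using the original training range
    return [lo + v * (hi - lo) for v in values]

# B-WGAN-GP: encode a port (0..65535) as 16 binary attributes, MSB first.
def port_to_bits(port):
    return [(port >> i) & 1 for i in reversed(range(16))]

# E-WGAN-GP: replace each categorical value by a learned continuous vector
# (toy lookup table standing in for IP2Vec embeddings).
embeddings = {"192.168.100.5": [0.12, -0.70, 0.33]}

def embed(ip):
    return embeddings[ip]

print(normalize([66, 66, 1066]))   # typical byte counts mapped into [0, 1]
print(port_to_bits(22))            # SSH port as 16 binary attributes
```

The binary and embedding encodings are invertible by design (bit unpacking, nearest-neighbor lookup in the embedding table), which is what allows generated continuous output to be mapped back to valid flows.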
Our research indicates that GANs are well suited for generating flow-based network traffic. We plan to extend our approach in order to generate sequences of flows instead of individual flows. Therefore, we want to evaluate further network structures (e.g., LSTMs or CNNs) which are able to learn temporal relationships of flow sequences. In addition, we want to work on the development of further evaluation methods.

Acknowledgments

M.R. was supported by the BayWISS Consortium Digitization. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.cose.2018.12.012.

R E F E R E N C E S

Anderson HS, Woodbridge J, Filar B. DeepDGA: adversarially-tuned domain generation and detection. In: Proceedings of the 2016 ACM workshop on artificial intelligence and security. ACM; 2016. p. 13–21.

Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. In: Proceedings of the international conference on machine learning (ICML); 2017. p. 214–23.

Beigi EB, Jazi HH, Stakhanova N, Ghorbani AA. Towards effective feature selection in machine learning-based botnet detection approaches. In: Proceedings of the IEEE conference on communications and network security. IEEE; 2014. p. 247–55.

Borji A. Pros and cons of GAN evaluation measures. arXiv preprint 2018. arXiv:1802.03446.

Botta A, Dainotti A, Pescapé A. A tool for the generation of realistic network workload for emerging networking scenarios. Comput Netw 2012;56(15):3531–47.

Brauckhoff D, Wagner A, May M. FLAME: a flow-level anomaly modeling engine. In: Proceedings of the workshop on cyber security experimentation and test (CSET). USENIX Association; 2008. p. 1:1–1:6.

Buczak AL, Guven E. A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Commun Surv Tutor 2016;18(2):1153–76.

Buduma N, Locascio N. Fundamentals of deep learning: designing next-generation machine intelligence algorithms. O'Reilly Media; 2017.

Cao VL, Nicolau M, McDermott J. A hybrid autoencoder and density estimation model for anomaly detection. In: Proceedings of the international conference on parallel problem solving from nature. Springer; 2016. p. 717–26.

Catania CA, Garino CG. Automatic network intrusion detection: current techniques and open issues. Comput Electr Eng 2012;38(5):1062–72.

Claise B. Cisco systems NetFlow services export version 9. RFC 3954; 2004.

Claise B. Specification of the IP flow information export (IPFIX) protocol for the exchange of IP traffic flow information. RFC 5101; 2008.

Feng Wc, Goel A, Bezzaz A, Feng WC, Walpole J. TCPivo: a high-performance packet replay engine. In: Proceedings of the ACM workshop on models, methods and tools for reproducible network research. ACM; 2003. p. 57–64.

Garcia S, Grill M, Stiborek J, Zunino A. An empirical comparison of botnet detection methods. Comput Secur 2014;45:100–23.

Glasser J, Lindauer B. Bridging the gap: a pragmatic approach to generating insider threat data. In: Proceedings of the security and privacy workshops (SPW). IEEE; 2013. p. 98–104.

Goodfellow I. NIPS 2016 tutorial: generative adversarial networks. arXiv preprint 2016. arXiv:1701.00160.

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Proceedings of the advances in neural information processing systems (NIPS); 2014. p. 2672–80.

Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC. Improved training of Wasserstein GANs. In: Proceedings of the advances in neural information processing systems (NIPS); 2017. p. 5769–79.

Han J, Pei J, Kamber M. Data mining: concepts and techniques. 3rd ed. Elsevier; 2011.

Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Proceedings of the advances in neural information processing systems (NIPS); 2017. p. 6629–40.

Hu W, Tan Y. Generating adversarial malware examples for black-box attacks based on GAN. arXiv preprint 2017. arXiv:1702.05983.

Iannucci S, Kholidy HA, Ghimire AD, Jia R, Abdelwahed S, Banicescu I. A comparison of graph-based synthetic data generators for benchmarking next-generation intrusion detection systems. In: Proceedings of the IEEE international conference on cluster computing (CLUSTER). IEEE; 2017. p. 278–89.

Isola P, Zhu JY, Zhou T, Efros AA. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE; 2017. p. 5967–76.

Jon D, Seth E, Bruce MA, Jeff P, Kaustubh P. Iperf: the TCP/UDP bandwidth measurement tool. (Date last accessed 14-June-2018).

Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z, Shi W. Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). IEEE; 2017. p. 105–14.

Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint 2013a. arXiv:1301.3781.

Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Proceedings of the advances in neural information processing systems (NIPS); 2013b. p. 3111–19.

Molnár S, Megyesi P, Szabo G. How to validate traffic generators? In: Proceedings of the IEEE international conference on communications workshops (ICC). IEEE; 2013. p. 1340–4.

Najafabadi MM, Khoshgoftaar TM, Kemp C, Seliya N, Zuech R. Machine learning for detecting brute force attacks at the network level. In: Proceedings of the IEEE international conference on bioinformatics and bioengineering (BIBE). IEEE; 2014. p. 379–85.

Najafabadi MM, Khoshgoftaar TM, Napolitano A, Wheelus C. RUDY attack: detection at the network level and its important features. In: Proceedings of the international Florida artificial intelligence research society conference; 2016. p. 288–93.

Preuer K, Renz P, Unterthiner T, Hochreiter S, Klambauer G. Fréchet ChemblNet distance: a metric for generative models for molecules. CoRR 2018. abs/1803.09518.

Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. In: Proceedings of the international conference on learning representations (ICLR); 2016.
Rigaki M, Garcia S. Bringing a GAN to a knife-fight: adapting malware communication to avoid detection. In: Proceedings of the first deep learning and security workshop, San Francisco, USA; 2016.

Ring M, Landes D, Dallmann A, Hotho A. IP2Vec: learning similarities between IP addresses. In: Proceedings of the workshop on data mining for cyber security (DMCS), international conference on data mining. IEEE; 2017a. p. 657–66.

Ring M, Landes D, Hotho A. Detection of slow port scans in flow-based network traffic. PLOS ONE 2018;13(9):1–18. doi:10.1371/journal.pone.0204507.

Ring M, Wunderlich S, Grüdl D, Landes D, Hotho A. Flow-based benchmark data sets for intrusion detection. In: Proceedings of the European conference on cyber warfare and security (ECCWS). ACPI; 2017b. p. 361–9.

Salakhutdinov R, Larochelle H. Efficient learning of deep Boltzmann machines. In: Proceedings of the international conference on artificial intelligence and statistics; 2010. p. 693–700.

Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X. Improved techniques for training GANs. In: Proceedings of the advances in neural information processing systems (NIPS); 2016. p. 2234–42.

Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur 2012;31(3):357–74.

Siska P, Stoecklin MP, Kind A, Braun T. A flow trace generator using graph-based traffic classification techniques. In: Proceedings of the international wireless communications and mobile computing conference (IWCMC). ACM; 2010. p. 457–62. doi:10.1145/1815396.1815503.

Sommer R, Paxson V. Outside the closed world: on using machine learning for network intrusion detection. In: Proceedings of the IEEE symposium on security and privacy. IEEE; 2010. p. 305–16.

Sommers J, Barford P. Self-configuring network traffic generation. In: Proceedings of the ACM internet measurement conference (ACM IMC). ACM; 2004. p. 68–81.

Sperotto A, Sadre R, de Boer PT, Pras A. Hidden Markov model modeling of SSH brute-force attacks. In: Proceedings of the international workshop on distributed systems: operations and management. Springer; 2009. p. 164–76.

Stevanovic M, Pedersen JM. An analysis of network traffic classification for botnet detection. In: Proceedings of the IEEE international conference on cyber situational awareness, data analytics and assessment (CyberSA). IEEE; 2015. p. 1–8.

Stiborek J, Rehák M, Pevný T. Towards scalable network host simulation. In: Proceedings of the international workshop on agents and cybersecurity; 2015. p. 27–35.

Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 2818–26.

Tran QA, Jiang F, Hu J. A real-time NetFlow-based intrusion detection system with improved BBNN and high-frequency field programmable gate arrays. In: Proceedings of the international conference on trust, security and privacy in computing and communications. IEEE; 2012. p. 201–8.

Turner A. Tcpreplay. (Date last accessed 14-June-2018).

Vasilomanolakis E, Cordero CG, Milanov N, Mühlhäuser M. Towards the creation of synthetic, yet realistic, intrusion detection datasets. In: Proceedings of the IEEE network operations and management symposium (NOMS). IEEE; 2016. p. 1209–14.

Wagner C, François J, Engel T, et al. Machine learning approach for IP-flow record anomaly detection. In: Proceedings of the international conference on research in networking. Springer; 2011. p. 28–39.

Yin C, Zhu Y, Liu S, Fei J, Zhang H. An enhancing framework for botnet detection using generative adversarial networks. In: Proceedings of the international conference on artificial intelligence and big data (ICAIBD); 2018. p. 228–34. doi:10.1109/ICAIBD.2018.8396200.

Yu L, Zhang W, Wang J, Yu Y. SeqGAN: sequence generative adversarial nets with policy gradient. In: Proceedings of the conference on artificial intelligence (AAAI). AAAI Press; 2017. p. 2852–8.

Yuan X, He P, Zhu Q, Bhat RR, Li X. Adversarial examples: attacks and defenses for deep learning. arXiv preprint 2017. arXiv:1712.07107.

Zhang C, Ouyang X, Patras P. ZipNet-GAN: inferring fine-grained mobile traffic patterns via a generative adversarial neural network. In: Proceedings of the thirteenth international conference on emerging networking experiments and technologies. ACM; 2017. p. 363–75.

Zheng YJ, Zhou XH, Sheng WG, Xue Y, Chen SY. Generative adversarial network based telecom fraud detection at the receiving bank. Neural Netw 2018;102:78–86.

Markus Ring is a research associate at Coburg University of Applied Sciences and Arts where he is working on his doctoral thesis. He previously studied Informatics at Coburg and worked as a network administrator at T-Systems Enterprise GmbH. His research interests include the generation of realistic flow-based network data and the application of data-mining methods for intrusion detection.

Daniel Schlör is a Ph.D. student at the Department of Computer Science at the University of Würzburg, Germany. He is also a member of the junior research group Computational Literary Stylistics at the Department of Computational Philology. His research interests include data and text mining, machine learning and general applications of computer science methods in the field of digital humanities.

Dieter Landes is a professor of software engineering and database systems at Coburg University of Applied Sciences and Arts. He holds a diploma in informatics from the University of Erlangen-Nuremberg, and a doctorate in Knowledge-Based Systems from the University of Karlsruhe. After several years working in industry, including time with Daimler Research, he joined Coburg in 1999. He has published 70 papers in journals, books, and at conferences. His research interests include requirements engineering, software-engineering education, learning analytics, and data mining.

Andreas Hotho is a professor at the University of Würzburg. He holds a Ph.D. from the University of Karlsruhe, where he worked from 1999 to 2004 at the Institute of Applied Informatics and Formal Description Methods (AIFB) in the areas of text, data, and web mining, semantic web and information retrieval. From 2004 to 2009 he was a senior researcher at the University of Kassel. He joined the L3S in 2011. Since 2005 he has been leading the development of the social bookmark and publication sharing platform BibSonomy. Andreas Hotho has published over 100 articles in journals and at conferences, co-edited several special issues and books, and co-chaired several workshops. He worked as a reviewer for journals and was a member of many international conference and workshop program committees. His general research area is Data Science with focuses on the combination of data mining, information retrieval and the semantic web. More specifically, he is interested in the analysis of social media systems, especially tagging, sensor data emerging through ubiquitous and social activities, security, and the application of Text Mining on historic literature.