Lecture Notes in Computer Science 2459
Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
Berlin, Heidelberg, New York, Barcelona, Hong Kong, London, Milan, Paris, Tokyo
Maria Carla Calzarossa Salvatore Tucci (Eds.)
Performance Evaluation
of Complex Systems:
Techniques and Tools
Performance 2002 Tutorial Lectures
Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands
Volume Editors
Maria Carla Calzarossa
Università di Pavia, Dipartimento di Informatica e Sistemistica
via Ferrata 1, 27100 Pavia, Italy
E-mail: mcc@alice.unipv.it
Salvatore Tucci
Ufficio per l’Informatica, la Telematica e la Statistica
Presidenza del Consiglio dei Ministri
via della Stamperia 8, 00187 Roma, Italy
E-mail: tucci@torvergata.it
Cataloging-in-Publication Data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Performance evaluation of complex systems : techniques and tools ;
performance 2002 tutorial lectures / Maria Carla Calzarossa ; Salvatore
Tucci (ed.). - Berlin ; Heidelberg ; New York ; Hong Kong ; London ; Milan ;
Paris ; Tokyo : Springer, 2002
(Lecture notes in computer science ; Vol. 2459)
ISBN 3-540-44252-9
CR Subject Classification (1998): C.4, C.2, D.2.8, D.4, F.1, H.4
ISSN 0302-9743
ISBN 3-540-44252-9 Springer-Verlag Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or
part of the material is concerned, specifically the rights of translation,
reprinting, re-use of illustrations, recitation, broadcasting, reproduction on
microfilms or in any other way, and storage in data banks. Duplication of this
publication or parts thereof is permitted only under the provisions of the German
Copyright Law of September 9, 1965, in its current version, and permission for use
must always be obtained from Springer-Verlag. Violations are liable for prosecution
under the German Copyright Law.
Springer-Verlag Berlin Heidelberg New York,
a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de
© Springer-Verlag Berlin Heidelberg 2002
Printed in Germany
Typesetting: Camera-ready by author, data conversion by PTP-Berlin, Stefan Sossna e.K.
Printed on acid-free paper SPIN: 10871194 06/3142 543210
Preface
The fast evolution and the increased pervasiveness of computers and communication
networks have led to the development of a large variety of complex applications
and services which have become an integral part of our daily lives. Modern society
widely relies on information technologies. Hence, the Quality of Service, that is,
the efficiency, availability, reliability, and security of these technologies, is
an essential requirement for the proper functioning of modern society.
In this scenario, performance evaluation plays a central role. Performance
evaluation has to assess and predict the performance of hardware and software
systems, and to identify and prevent their current and future performance
bottlenecks.
In the past thirty years, many performance evaluation techniques and tools have
been developed and successfully applied in studies dealing with the configuration
and capacity planning of existing systems and with the design and development of
new systems. Recently, performance evaluation techniques have evolved to cope with
the increased complexity of the current systems and their workloads. Many of the
classical techniques have been revisited in light of the recent technological
advances, and novel techniques, methods, and tools have been developed.
This book is organized around a set of survey papers which provide a comprehensive
overview of the theories, techniques, and tools for performance and reliability
evaluation of current and newly emerging technologies. The papers, by leading
international experts in the field of performance evaluation, are based on the
tutorials presented at the IFIP WG 7.3 International Symposium on Computer
Modeling, Measurement, and Evaluation (Performance 2002) held in Rome on
September 23–27, 2002.
The papers address the state of the art of the theoretical and methodological
advances in the area of performance and reliability evaluation as well as new
perspectives in the major application domains. A broad spectrum of topics is
covered in this book. Modeling and verification formalisms, solution methods,
workload characterization, and benchmarking are addressed from a methodological
point of view. Applications of performance and reliability techniques to various
domains, such as hardware and software architectures, wired and wireless networks,
Grid environments, Web services, and real-time voice and video applications, are
also examined.
This book is intended to serve as a reference for students, scientists, and
engineers working in the areas of performance and reliability evaluation, hardware
and software design, and capacity planning.
Finally, as editors of the book, we would like to thank all authors for their
valuable contributions and their effort and cooperation in the preparation of
their manuscripts.
July 2002 Maria Carla Calzarossa

Salvatore Tucci
Table of Contents
G-Networks: Multiple Classes of Positive Customers, Signals, and Product Form Results
Erol Gelenbe
Spectral Expansion Solutions for Markov-Modulated Queues
Isi Mitrani
M/G/1-Type Markov Processes: A Tutorial
Alma Riska, Evgenia Smirni
An Algorithmic Approach to Stochastic Bounds
J.M. Fourneau, N. Pekergin
Dynamic Scheduling via Polymatroid Optimization
David D. Yao
Workload Modeling for Performance Evaluation
Dror G. Feitelson
Capacity Planning for Web Services (Techniques and Methodology)
Virgilio A.F. Almeida
End-to-End Performance of Web Services
Paolo Cremonesi, Giuseppe Serazzi
Benchmarking
Reinhold Weicker
Benchmarking Models and Tools for Distributed Web-Server Systems
Mauro Andreolini, Valeria Cardellini, Michele Colajanni
Stochastic Process Algebra: From an Algebraic Formalism to an Architectural Description Language
Marco Bernardo, Lorenzo Donatiello, Paolo Ciancarini
Automated Performance and Dependability Evaluation Using Model Checking
Christel Baier, Boudewijn Haverkort, Holger Hermanns, Joost-Pieter Katoen
Measurement-Based Analysis of System Dependability Using Fault Injection and Field Failure Data
Ravishankar K. Iyer, Zbigniew Kalbarczyk
Software Reliability and Rejuvenation: Modeling and Analysis
Kishor S. Trivedi, Kalyanaraman Vaidyanathan
Performance Validation of Mobile Software Architectures
Vincenzo Grassi, Vittorio Cortellessa, Raffaela Mirandola
Performance Issues of Multimedia Applications
Edmundo de Souza e Silva, Rosa M. M. Leão, Berthier Ribeiro-Neto, Sérgio Campos
Markovian Modeling of Real Data Traffic: Heuristic Phase Type and MAP Fitting of Heavy Tailed and Fractal Like Samples
András Horváth, Miklós Telek
Optimization of Bandwidth and Energy Consumption in Wireless Local Area Networks
Marco Conti, Enrico Gregori
Service Centric Computing – Next Generation Internet Computing
Jerry Rolia, Rich Friedrich, Chandrakant Patel
European DataGrid Project: Experiences of Deploying a Large Scale Testbed for E-science Applications
Fabrizio Gagliardi, Bob Jones, Mario Reale, Stephen Burke
Author Index

G-Networks: Multiple Classes of Positive
Customers, Signals, and Product Form Results
Erol Gelenbe
School of Electrical Engineering and Computer Science

University of Central Florida
Orlando, FL 32816
Abstract. The purpose of this tutorial presentation is to introduce
G-Networks, or Gelenbe Networks, which are product form queueing networks
that include normal or positive customers, as well as negative customers
which destroy other customers, and triggers which displace other customers
from one queue to another. We derive the balance equations for these models
in the context of multiple customer classes, show the product form results,
and exhibit the traffic equations which, in this case, contrary to BCMP and
Jackson networks, are non-linear. This leads to interesting issues of
existence and uniqueness of the steady-state solution. Gelenbe Networks can
be used to model large scale computer systems and networks in which
signaling functions represented by negative customers and triggers are used
to achieve flow and congestion control.
1 Introduction
In this survey and tutorial, we discuss a class of queueing networks, originally
inspired by our work on neural networks, in which customers are either “signals”
or positive customers.
Positive customers enter a queue and receive service as ordinary queueing
network customers; they make up the queue length. A signal may be a “negative
customer”, or it may be a “trigger”. Signals do not receive service, and disappear
after having visited a queue. If the signal is a trigger, then it actually transfers
a customer from the queue at which it arrives to some other queue according to a
probabilistic rule. On the other hand, a negative customer simply depletes the
length of the queue at which it arrives if the queue is non-empty. One can
also consider that a negative customer is a special kind of trigger which simply
sends a customer to the “outside world” rather than transferring it to another
queue. Positive customers which leave a queue to enter another queue can become
signals or remain positive customers.
Additional primitive operations for these networks have also been introduced
in [12]. The computation of numerical solutions to the non-linear traffic equations
of some of these models has been discussed in [6]. Applications to networking
problems are reported in [17]. A model of doubly redundant systems using
G-networks, where work is scheduled on two different processors and then cancelled
at one of the processors if the work is successfully completed at the other, is
presented in [7]. The extension of the original model with positive customers and
signals [4] to multiple classes was proposed and obtained in various papers
[9,11,15,19].
Some early neural network applications of G-networks are summarized in
a survey article [18]. From the neural network approach, the model in [9] was
applied to texture generation in colour images in an early paper [10].
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 1–16, 2002.
© Springer-Verlag Berlin Heidelberg 2002
The present survey includes the results presented in [20], where multiple
classes of positive customers and signals are discussed, and we also include
multiple classes of triggers. Thus in this paper we discuss G-Networks with
multiple classes of positive customers and one or more classes of signals.
Three types of service centers with their corresponding service disciplines are
examined:
– Type 1 : first-in-first-out (FIFO),

– Type 2 : processor sharing (PS),
– Type 4 : last-in-first-out with preemptive resume priority (LIFO/PR).
With reference to the usual terminology related to the BCMP theorem [2],
we exclude from the present discussion the Type 3 service centers with an
infinite number of servers since they are not covered by our results.
Furthermore, in this paper we deal only with exponentially distributed service
times.
In Section 3 we state, and in Section 4 we prove, that these multiple class
G-Networks, with Type 1, 2 and 4 service centers, have product form.
2 The Model
We consider networks with an arbitrary number n of queues, an arbitrary number R
of positive customer classes, and an arbitrary number S of signal classes.
External arrival streams to the network are independent Poisson processes for
positive customers of some class k and signals of some class m. We denote by Λi,k
the external arrival rate of positive customers of class k to queue i and by λi,m
the external arrival rate of signals of class m to queue i.
Only positive customers are served, and after service they may change class,
service center and nature (positive to signal), or depart from the system. The
movement of customers between queues, classes and nature (positive to signal) is
represented by a Markov chain.
At its arrival in a non-empty queue, a signal selects a positive customer
as its “target” in the queue in accordance with the service discipline at this
station. If the queue is empty, then the signal simply disappears. Once the target
is selected, the signal tries to trigger the movement of the selected customer. A
signal of some class m succeeds in triggering the movement of the selected
positive customer of some class k at service center i with probability
$K_{i,m,k}$. With probability $(1 - K_{i,m,k})$ it does not succeed. A signal
disappears as soon as it tries to trigger the movement of its targeted customer.
Recall that a signal is either exogenous, or is obtained by the transformation of
a positive customer as it leaves a queue.
In the equations that follow, $Q[i,j][k,l]$ denotes the probability that a class
$k$ customer whose movement is triggered at queue $i$ is transferred to queue $j$
as a positive customer of class $l$, and
$D[i,k] = 1 - \sum_{j=1}^{n}\sum_{l=1}^{R} Q[i,j][k,l]$ is the probability that it
leaves the network instead, which is the effect of a negative customer.
A positive customer of class k which leaves queue i (after finishing service)
goes to queue j as a positive customer of class l with probability $P^+[i,j][k,l]$,
or as a signal of class m with probability $P^-[i,j][k,m]$. It may also depart from
the network with probability $d[i,k]$. Obviously we have for all i, k:
$$
\sum_{j=1}^{n}\sum_{l=1}^{R} P^+[i,j][k,l] + \sum_{j=1}^{n}\sum_{m=1}^{S} P^-[i,j][k,m] + d[i,k] = 1 \tag{1}
$$
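As an illustration of the normalization condition (1), the following minimal Python sketch computes the departure probabilities d[i,k] from given routing tables. The function name `departure_probabilities`, the nested-list layout `P_plus[i][j][k][l]` and `P_minus[i][j][k][m]`, and the numerical values are assumptions made only for this example; they do not come from the paper.

```python
def departure_probabilities(P_plus, P_minus, tol=1e-12):
    """Return d[i][k] = 1 - sum_{j,l} P+[i][j][k][l] - sum_{j,m} P-[i][j][k][m].

    P_plus[i][j][k][l]  : class k at queue i -> positive customer of class l at queue j
    P_minus[i][j][k][m] : class k at queue i -> signal of class m at queue j
    (hypothetical layout chosen for this sketch)
    """
    n = len(P_plus)
    R = len(P_plus[0][0])
    d = [[0.0] * R for _ in range(n)]
    for i in range(n):
        for k in range(R):
            as_positive = sum(sum(P_plus[i][j][k]) for j in range(n))
            as_signal = sum(sum(P_minus[i][j][k]) for j in range(n))
            d[i][k] = 1.0 - as_positive - as_signal
            if d[i][k] < -tol:
                raise ValueError(f"routing probabilities exceed 1 for i={i}, k={k}")
    return d


# Two queues, one positive class, one signal class (illustrative numbers).
P_plus = [[[[0.0]], [[0.5]]],   # from queue 0: to queue 1 as positive w.p. 0.5
          [[[0.4]], [[0.0]]]]   # from queue 1: to queue 0 as positive w.p. 0.4
P_minus = [[[[0.0]], [[0.3]]],  # from queue 0: to queue 1 as signal w.p. 0.3
           [[[0.0]], [[0.0]]]]
d = departure_probabilities(P_plus, P_minus)
```

With these numbers a customer leaving queue 0 departs from the network with probability 1 − 0.5 − 0.3, and one leaving queue 1 with probability 1 − 0.4.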
We assume that all service centers have exponential service time distributions.
In the three types of service centers, each class of positive customers may
have a distinct service rate $r_{i,k}$. When the service center is of Type 1
(FIFO) we place the following constraint on the service rate and the movement
triggering rate due to incoming signals:
$$
r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} = c_i \tag{2}
$$
Note that this constraint, together with the constraint (3) given below, has
the effect of producing a single positive customer class equivalent for service
centers with FIFO discipline. The following constraints on the movement triggering
probability are assumed to hold. Note that because services are exponentially
distributed, positive customers of a given class are indistinguishable for
movement triggering because of the Markovian property of service time.
– The following constraint must hold for all stations i of Type 1 and classes of
signals m such that $\sum_{j=1}^{n}\sum_{l=1}^{R} P^-[j,i][l,m] > 0$:
$$
\text{for all classes of positive customers } a \text{ and } b,\quad K_{i,m,a} = K_{i,m,b} \tag{3}
$$
This constraint implies that a signal of some class m arriving from the network
does not “distinguish” between the positive customer classes whose movement it
will try to trigger, and that it will treat them all in the same manner.
– For a Type 2 server, the probability that any one positive customer of the
queue is selected by the arriving signal is 1/c if c is the total number of
customers in the queue.
For Type 1 service centers, one may consider the following conditions which
are simpler than (2) and (3):
$$
r_{i,a} = r_{i,b}, \qquad K_{i,m,a} = K_{i,m,b} \tag{4}
$$
for all classes of positive customers a and b, and all classes of signals m. Note
however that these new conditions are more restrictive, though they do imply
that (2) and (3) hold.
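To make the FIFO constraint (2) concrete, here is a small Python sketch that checks, for a single Type 1 station, whether the sum of the service rate and the external triggering rate is the same constant for every positive class. The function name `fifo_constant`, the data layout and the numbers are hypothetical and serve only as an illustration; the check covers condition (2) only, not condition (3).

```python
def fifo_constant(r_i, K_i, lam_i, tol=1e-12):
    """Return c_i if r_i[k] + sum_m K_i[m][k] * lam_i[m] is equal for all classes k.

    r_i[k]    : service rate of class k at the station
    K_i[m][k] : probability that a class m signal moves a class k customer
    lam_i[m]  : external arrival rate of class m signals
    Returns None when constraint (2) is violated.
    """
    totals = [
        r_i[k] + sum(K_i[m][k] * lam_i[m] for m in range(len(lam_i)))
        for k in range(len(r_i))
    ]
    if max(totals) - min(totals) <= tol:
        return totals[0]
    return None


# Identical rates and probabilities for both classes, as in the simpler conditions (4).
c_simple = fifo_constant([2.0, 2.0], [[0.5, 0.5]], [1.0])
# Different per-class values that still satisfy (2): 2.0 + 0.5 = 1.5 + 1.0.
c_general = fifo_constant([2.0, 1.5], [[0.5, 1.0]], [1.0])
# A violation: 2.0 + 0.5 differs from 1.0 + 0.5.
c_bad = fifo_constant([2.0, 1.0], [[0.5, 0.5]], [1.0])
```

The second call shows per-class values that differ and yet satisfy (2), which is one way to see that the simpler conditions (4) are sufficient for (2) but not necessary.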
2.1 State Representation
We denote the state at time t of the queueing network by a vector x(t) =
(x1 (t), ..., xn (t)). Here xi (t) represents the state of service center i. The vector
x = (x1 , ..., xn ) will denote a particular value of the state and |xi | will be the
total number of customers in queue i for state x.
For Type 1 and Type 4 servers, the instantaneous value of the state xi of
queue i is represented by the vector of elements whose length is the number of
customers in the queue and whose jth element xi,j is the class index of the jth
customer in the queue. Furthermore, the customers are ordered according to the
service order (FIFO or LIFO); it is always the customer at the head of the list
which is in service. We denote by ci,1 the class number of the customer in service
and by ci,∞ the class number of the last customer in the queue.
For a PS (Type 2) service station, the instantaneous value of the state xi is
represented by the vector (xi,k ) which is the number of customers of class k in
queue i.
3 Main Theorem
Let P (x) denote the stationary probability distribution of the state of the net-
work. It is given by the following product form result.
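Theorem 1 Consider a G-network with the restrictions and properties described
in the previous sections. If the system of non-linear equations:
$$
q_{i,k} = \frac{\Lambda_{i,k} + \Lambda^+_{i,k}}{r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,[\lambda_{i,m} + \lambda^-_{i,m}]} \tag{5}
$$
$$
\begin{aligned}
\Lambda^+_{i,k} ={}& \sum_{j=1}^{n}\sum_{l=1}^{R} P^+[j,i][l,k]\, r_{j,l}\, q_{j,l} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{h=1}^{n}\sum_{m=1}^{S}\sum_{s=1}^{R} r_{j,l}\, q_{j,l}\, P^-[j,h][l,m]\, K_{h,m,s}\, q_{h,s}\, Q[h,i][s,k] \\
&+ \sum_{j=1}^{n}\sum_{m=1}^{S}\sum_{s=1}^{R} \lambda_{j,m}\, K_{j,m,s}\, q_{j,s}\, Q[j,i][s,k]
\end{aligned} \tag{6}
$$
$$
\lambda^-_{i,m} = \sum_{j=1}^{n}\sum_{l=1}^{R} P^-[j,i][l,m]\, r_{j,l}\, q_{j,l} \tag{7}
$$
has a solution such that for each pair $i, k$: $0 < q_{i,k}$, and for each
station $i$: $\sum_{k=1}^{R} q_{i,k} < 1$, then the stationary probability
distribution of the network state is:
$$
P(x) = G \prod_{i=1}^{n} g_i(x_i) \tag{8}
$$
where each $g_i(x_i)$ depends on the type of service center i. The $g_i(x_i)$ in
(8) have the following form:
FIFO. If the service center is of Type 1, then
$$
g_i(x_i) = \prod_{n=1}^{|x_i|} q_{i,v_{i,n}} \tag{9}
$$
PS. If the service center is of Type 2, then
$$
g_i(x_i) = |x_i|! \prod_{k=1}^{R} \frac{(q_{i,k})^{x_{i,k}}}{x_{i,k}!} \tag{10}
$$
LIFO/PR. If the service center is of Type 4, then
$$
g_i(x_i) = \prod_{n=1}^{|x_i|} q_{i,v_{i,n}} \tag{11}
$$
and G is the normalization constant. In (9) and (11), $v_{i,n}$ is the class
index of the $n$th customer in queue $i$ (written $x_{i,n}$ in Section 2.1).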
Notice that $\Lambda^+_{i,k}$ may be written as:
$$
\begin{aligned}
\Lambda^+_{i,k} ={}& \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\, q_{j,l}\, P^+[j,i][l,k] \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\, Q[j,i][l,k]\, K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}]
\end{aligned} \tag{12}
$$
The conditions requiring that $q_{i,k} > 0$ and that their sum over all classes
at each center be less than 1 simply ensure the existence of the normalizing
constant G in (8) and the stability of the network.
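The non-linear traffic equations (5), (7) and (12) can be explored numerically. The sketch below uses plain successive substitution starting from q = 0; this is an assumption made for illustration, it is not claimed to be the method of [6], and it is not guaranteed to converge in general. The function names `zeros` and `solve_traffic`, the nested-list layouts and the two-queue example are all hypothetical.

```python
def zeros(*dims):
    """Nested list of zeros with the given dimensions."""
    if len(dims) == 1:
        return [0.0] * dims[0]
    return [zeros(*dims[1:]) for _ in range(dims[0])]


def solve_traffic(Lam, lam, r, K, Pp, Pm, Q, max_iter=10000, tol=1e-12):
    """Iterate equations (7), (12) and (5) until q stops changing.

    Lam[i][k], lam[i][m], r[i][k], K[i][m][k] follow the paper's symbols;
    Pp[j][i][l][k], Pm[j][i][l][m], Q[j][i][l][k] are the routing tables
    (index order chosen for this sketch).
    """
    n, R, S = len(Lam), len(Lam[0]), len(lam[0])
    q = zeros(n, R)
    for _ in range(max_iter):
        # Equation (7): signal flow arriving from inside the network.
        lam_minus = [[sum(Pm[j][i][l][m] * r[j][l] * q[j][l]
                          for j in range(n) for l in range(R))
                      for m in range(S)] for i in range(n)]
        # Equation (12): positive flow from service completions and from triggers.
        Lam_plus = [[sum(r[j][l] * q[j][l] * Pp[j][i][l][k]
                         for j in range(n) for l in range(R))
                     + sum(q[j][l] * Q[j][i][l][k] * K[j][m][l]
                           * (lam[j][m] + lam_minus[j][m])
                           for j in range(n) for l in range(R) for m in range(S))
                     for k in range(R)] for i in range(n)]
        # Equation (5): updated utilisation-like quantities q[i][k].
        new_q = [[(Lam[i][k] + Lam_plus[i][k])
                  / (r[i][k] + sum(K[i][m][k] * (lam[i][m] + lam_minus[i][m])
                                   for m in range(S)))
                  for k in range(R)] for i in range(n)]
        change = max(abs(new_q[i][k] - q[i][k]) for i in range(n) for k in range(R))
        q = new_q
        if change < tol:
            break
    return q


# Two queues, one positive class, one signal class, no triggers (Q = 0).
n, R, S = 2, 1, 1
Lam = [[1.0], [1.0]]           # external positive arrivals
lam = zeros(n, S)              # no external signals
r = [[4.0], [4.0]]             # service rates
K = [[[1.0]], [[1.0]]]         # a signal always removes its target
Pp, Pm, Q = zeros(n, n, R, R), zeros(n, n, R, S), zeros(n, n, R, R)
Pp[0][1][0][0] = 0.5           # queue 0 -> queue 1 as a positive customer
Pm[0][1][0][0] = 0.5           # queue 0 -> queue 1 as a signal
q = solve_traffic(Lam, lam, r, K, Pp, Pm, Q)
```

In this example queue 0 receives no internal traffic, so q for queue 0 is 1/4; queue 1 then sees a positive flow of 0.5 and a signal flow of 0.5, giving (1 + 0.5)/(4 + 0.5) = 1/3. Both values satisfy the positivity and stability conditions of Theorem 1.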
4 Proof of the Main Result

The proof follows the same lines as that for a similar but more restrictive
result in [20] which does not cover the case of triggers. The reader who is not
interested in the technical details may prefer to skip this section. We begin with
some technical Lemmas.
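Lemma 1 The following flow equation is satisfied:
$$
\begin{aligned}
\sum_{i=1}^{n}\sum_{k=1}^{R} \Lambda^+_{i,k} + \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}
={}& \sum_{i=1}^{n}\sum_{k=1}^{R} q_{i,k}\, r_{i,k}\,(1 - d[i,k]) \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\, Q[j,i][l,k]\, K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}]
\end{aligned}
$$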
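Proof: Consider (12), sum it over all the stations and all the classes,
and exchange the order of summation on the right-hand side of the equation:
$$
\begin{aligned}
\sum_{i=1}^{n}\sum_{k=1}^{R} \Lambda^+_{i,k}
={}& \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\, q_{j,l} \Big( \sum_{i=1}^{n}\sum_{k=1}^{R} P^+[j,i][l,k] \Big) \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\, Q[j,i][l,k]\, K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}]
\end{aligned}
$$
Similarly, using equation (7):
$$
\sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}
= \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\, q_{j,l} \Big( \sum_{i=1}^{n}\sum_{m=1}^{S} P^-[j,i][l,m] \Big)
$$
Furthermore:
$$
\begin{aligned}
\sum_{i=1}^{n}\sum_{k=1}^{R} \Lambda^+_{i,k} + \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}
={}& \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\, q_{j,l} \Big( \sum_{i=1}^{n}\sum_{k=1}^{R} P^+[j,i][l,k] + \sum_{i=1}^{n}\sum_{m=1}^{S} P^-[j,i][l,m] \Big) \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\, Q[j,i][l,k]\, K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}]
\end{aligned}
$$
According to the definition of the routing matrix P (equation (1)), we have
$$
\begin{aligned}
\sum_{i=1}^{n}\sum_{k=1}^{R} \Lambda^+_{i,k} + \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}
={}& \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\, q_{j,l}\,(1 - d[j,l]) \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\, Q[j,i][l,k]\, K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}]
\end{aligned}
$$
Thus the proof of the Lemma is complete.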

In order to carry out the algebraic manipulations of the stationary
Chapman-Kolmogorov (global balance) equations, we introduce some notation and
develop intermediate results:
– The state dependent service rates for customers at service center j will be
denoted by $M_{j,l}(x_j)$ where $x_j$ refers to the state of the service center
and l is the class of the customer concerned. From the definition of the service
rate $r_{j,l}$, we obtain for the three types of stations:
FIFO and LIFO/PR: $M_{j,l}(x_j) = r_{j,l}\,\mathbf{1}_{\{c_{j,1}=l\}}$,
PS: $M_{j,l}(x_j) = r_{j,l}\,\dfrac{x_{j,l}}{|x_j|}$.
– $N_{j,l}(x_j)$ is the movement triggering rate of class l positive customers
due to external arrivals of all the classes of signals:
FIFO and LIFO/PR: $N_{j,l}(x_j) = \mathbf{1}_{\{c_{j,1}=l\}} \sum_{m=1}^{S} K_{j,m,l}\,\lambda_{j,m}$,
PS: $N_{j,l}(x_j) = \dfrac{x_{j,l}}{|x_j|} \sum_{m=1}^{S} K_{j,m,l}\,\lambda_{j,m}$.
– $A_{j,l}(x_j)$ is the condition which establishes that it is possible to reach
state $x_j$ by an arrival of a positive customer of class l:
FIFO: $A_{j,l}(x_j) = \mathbf{1}_{\{c_{j,\infty}=l\}}$,
LIFO/PR: $A_{j,l}(x_j) = \mathbf{1}_{\{c_{j,1}=l\}}$,
PS: $A_{j,l}(x_j) = \mathbf{1}_{\{|x_{j,l}|>0\}}$.
– $Z_{j,l,m}(x_j)$ is the probability that a signal of class m, arriving from the
network, will trigger the movement of a positive customer of class l:
FIFO and LIFO/PR: $Z_{j,l,m}(x_j) = \mathbf{1}_{\{c_{j,1}=l\}}\, K_{j,m,l}$,
PS: $Z_{j,l,m}(x_j) = \dfrac{x_{j,l}}{|x_j|}\, K_{j,m,l}$.
– $Y_{j,m}(x_j)$ is the probability that a signal of class m which enters a
non-empty queue will not trigger the movement of a positive customer:
FIFO and LIFO/PR: $Y_{j,m}(x_j) = \sum_{l=1}^{R} \mathbf{1}_{\{c_{j,1}=l\}}\,(1 - K_{j,m,l})$,
PS: $Y_{j,m}(x_j) = \sum_{l=1}^{R} (1 - K_{j,m,l})\,\dfrac{x_{j,l}}{|x_j|}$.
Denote by $(x_j + e_{j,l})$ the state of station j obtained by adding to the
j-th queue a positive customer of class l. Let $(x_i - e_{i,k})$ be the state
obtained by removing from the end of the list of customers in queue a class k
customer if it is there; otherwise $(x_i - e_{i,k})$ is not defined.
Lemma 2 For any Type 1, 2, or 4 service center, the following relations hold:
$$
M_{j,l}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)} = r_{j,l}\, q_{j,l} \tag{13}
$$
$$
N_{j,l}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)} = \sum_{m=1}^{S} (K_{j,m,l}\,\lambda_{j,m})\, q_{j,l} \tag{14}
$$
$$
Z_{j,l,m}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)} = K_{j,m,l}\, q_{j,l} \tag{15}
$$
The proof is purely algebraic.
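As an illustration of the algebra involved, relation (13) can be checked directly for a PS (Type 2) station, using the form (10) of the product term and the PS service rate defined above. This is a worked instance for one station type only, not a replacement for the full proof.

```latex
% Ratio of product terms for a PS station, from (10):
\frac{g_j(x_j + e_{j,l})}{g_j(x_j)}
  = \frac{(|x_j|+1)!}{|x_j|!}\cdot
    \frac{(q_{j,l})^{x_{j,l}+1}/(x_{j,l}+1)!}{(q_{j,l})^{x_{j,l}}/x_{j,l}!}
  = \frac{(|x_j|+1)\,q_{j,l}}{x_{j,l}+1}

% PS service rate evaluated in the state x_j + e_{j,l}:
M_{j,l}(x_j + e_{j,l}) = r_{j,l}\,\frac{x_{j,l}+1}{|x_j|+1}

% Their product is relation (13):
M_{j,l}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)} = r_{j,l}\,q_{j,l}
```

Relations (14) and (15) follow for the PS case in the same way, since the PS forms of N and Z carry the same state-dependent factor as M.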

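Remark: As a consequence, we have from equations (12), (7) and (13):
$$
\begin{aligned}
\Lambda^+_{i,k} ={}& \sum_{j=1}^{n}\sum_{l=1}^{R} M_{j,l}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)}\, P^+[j,i][l,k] \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\, Q[j,i][l,k]\, K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}]
\end{aligned} \tag{16}
$$
and
$$
\lambda^-_{i,m} = \sum_{j=1}^{n}\sum_{l=1}^{R} M_{j,l}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)}\, P^-[j,i][l,m] \tag{17}
$$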
Lemma 3 Let i be any Type 1, 2, or 4 station, and let $\Delta_i(x_i)$ be:
$$
\begin{aligned}
\Delta_i(x_i) ={}& \sum_{m=1}^{S} \lambda^-_{i,m}\, Y_{i,m}(x_i)
- \sum_{k=1}^{R} \big( M_{i,k}(x_i) + N_{i,k}(x_i) \big) \\
&+ \sum_{k=1}^{R} A_{i,k}(x_i)\,(\Lambda_{i,k} + \Lambda^+_{i,k})\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}
\end{aligned}
$$
Then for the three types of service centers,
$\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) = \sum_{m=1}^{S} \lambda^-_{i,m}\,\mathbf{1}_{\{|x_i|>0\}}$.
The proof of Lemma 3 is in a separate subsection at the end of this paper in
order to make the text somewhat easier to follow.

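Let us now turn to the proof of Theorem 1. The global balance equation of
the networks which are considered is:
$$
\begin{aligned}
&P(x) \Big[ \sum_{j=1}^{n}\sum_{l=1}^{R} \big( \Lambda_{j,l} + M_{j,l}(x_j)\,\mathbf{1}_{\{|x_j|>0\}} + N_{j,l}(x_j)\,\mathbf{1}_{\{|x_j|>0\}} \big) \Big] \\
&= \sum_{j=1}^{n}\sum_{l=1}^{R} P(x - e_{j,l})\,\Lambda_{j,l}\, A_{j,l}(x_j)\,\mathbf{1}_{\{|x_j|>0\}} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R} P(x + e_{j,l})\, N_{j,l}(x_j + e_{j,l})\, D[j,l] \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R} P(x + e_{j,l})\, M_{j,l}(x_j + e_{j,l})\, d[j,l] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} M_{j,l}(x_j + e_{j,l})\, P(x + e_{j,l})\, P^-[j,i][l,m]\, Y_{i,m}(x_i)\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} M_{j,l}(x_j + e_{j,l})\, P(x + e_{j,l})\, P^-[j,i][l,m]\,\mathbf{1}_{\{|x_i|=0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R} M_{j,l}(x_j + e_{j,l})\, P(x - e_{i,k} + e_{j,l})\, P^+[j,i][l,k]\, A_{i,k}(x_i)\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R} N_{j,l}(x_j + e_{j,l})\, P(x - e_{i,k} + e_{j,l})\, Q[j,i][l,k]\, A_{i,k}(x_i)\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R}\sum_{m=1}^{S} M_{j,l}(x_j + e_{j,l})\, P(x + e_{i,k} + e_{j,l})\, P^-[j,i][l,m]\, Z_{i,k,m}(x_i + e_{i,k})\, D[i,k] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R}\sum_{m=1}^{S}\sum_{h=1}^{n}\sum_{s=1}^{R} \Big( M_{j,l}(x_j + e_{j,l})\, P(x + e_{i,k} + e_{j,l} - e_{h,s})\, P^-[j,i][l,m] \\
&\qquad\qquad Z_{i,k,m}(x_i + e_{i,k})\, Q[i,h][k,s]\, A_{h,s}(x_h)\,\mathbf{1}_{\{|x_h|>0\}} \Big)
\end{aligned}
$$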
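We divide both sides by P(x), assume that there is a product form solution,
and apply Lemma 2:
$$
\begin{aligned}
&\sum_{j=1}^{n}\sum_{l=1}^{R} \big( \Lambda_{j,l} + M_{j,l}(x_j)\,\mathbf{1}_{\{|x_j|>0\}} + N_{j,l}(x_j)\,\mathbf{1}_{\{|x_j|>0\}} \big) \\
&= \sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda_{j,l}\, A_{j,l}(x_j)\,\mathbf{1}_{\{|x_j|>0\}}\,\frac{g_j(x_j - e_{j,l})}{g_j(x_j)} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} \lambda_{j,m}\, K_{j,m,l}\, q_{j,l}\, D[j,l] + \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\, q_{j,l}\, d[j,l] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} r_{j,l}\, q_{j,l}\, P^-[j,i][l,m]\, Y_{i,m}(x_i)\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} r_{j,l}\, q_{j,l}\, P^-[j,i][l,m]\,\mathbf{1}_{\{|x_i|=0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R} r_{j,l}\, q_{j,l}\, P^+[j,i][l,k]\, A_{i,k}(x_i)\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R}\sum_{m=1}^{S} \lambda_{j,m}\, K_{j,m,l}\, q_{j,l}\, Q[j,i][l,k]\, A_{i,k}(x_i)\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R}\sum_{m=1}^{S} r_{j,l}\, q_{j,l}\, P^-[j,i][l,m]\, K_{i,m,k}\, q_{i,k}\, D[i,k] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{h=1}^{n}\sum_{l=1}^{R}\sum_{k=1}^{R}\sum_{s=1}^{R}\sum_{m=1}^{S} r_{j,l}\, q_{j,l}\, P^-[j,i][l,m]\, K_{i,m,k}\, q_{i,k}\, Q[i,h][k,s]\,\frac{g_h(x_h - e_{h,s})}{g_h(x_h)}\, A_{h,s}(x_h)\,\mathbf{1}_{\{|x_h|>0\}}
\end{aligned}
$$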
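We now apply (7) to the fourth, fifth, eighth and ninth terms of the right-hand
side of the equation:
$$
\begin{aligned}
&\sum_{j=1}^{n}\sum_{l=1}^{R} \big( \Lambda_{j,l} + M_{j,l}(x_j)\,\mathbf{1}_{\{|x_j|>0\}} + N_{j,l}(x_j)\,\mathbf{1}_{\{|x_j|>0\}} \big) \\
&= \sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda_{j,l}\, A_{j,l}(x_j)\,\mathbf{1}_{\{|x_j|>0\}}\,\frac{g_j(x_j - e_{j,l})}{g_j(x_j)} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} \lambda_{j,m}\, K_{j,m,l}\, q_{j,l}\, D[j,l] + \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\, q_{j,l}\, d[j,l] \\
&+ \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}\, Y_{i,m}(x_i)\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}\,\mathbf{1}_{\{|x_i|=0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R} r_{j,l}\, q_{j,l}\, P^+[j,i][l,k]\, A_{i,k}(x_i)\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R}\sum_{m=1}^{S} \lambda_{j,m}\, K_{j,m,l}\, q_{j,l}\, Q[j,i][l,k]\, A_{i,k}(x_i)\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{m=1}^{S} \lambda^-_{i,m}\, K_{i,m,k}\, q_{i,k}\, D[i,k] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{k=1}^{R}\sum_{m=1}^{S} \lambda^-_{j,m}\, K_{j,m,l}\, q_{j,l}\, Q[j,i][l,k]\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}\, A_{i,k}(x_i)\,\mathbf{1}_{\{|x_i|>0\}}
\end{aligned}
$$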
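We group the first, sixth, seventh and ninth terms of the right-hand side of the
equation using (12), and move the last two terms of the left-hand side to the
right-hand side:
$$
\begin{aligned}
\sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda_{j,l}
={}& - \sum_{i=1}^{n}\sum_{k=1}^{R} \big( M_{i,k}(x_i) + N_{i,k}(x_i) \big)\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R} A_{i,k}(x_i)\,\mathbf{1}_{\{|x_i|>0\}}\,(\Lambda_{i,k} + \Lambda^+_{i,k})\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)} \\
&+ \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}\, Y_{i,m}(x_i)\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} \lambda_{j,m}\, K_{j,m,l}\, q_{j,l}\, D[j,l] + \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\, q_{j,l}\, d[j,l] \\
&+ \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}\,\mathbf{1}_{\{|x_i|=0\}} \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{m=1}^{S} \lambda^-_{i,m}\, K_{i,m,k}\, q_{i,k}\, D[i,k]
\end{aligned}
$$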
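We now apply Lemma 3 to the sum of the first three terms of the right-hand side:
$$
\begin{aligned}
\sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda_{j,l}
={}& \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}\,\mathbf{1}_{\{|x_i|>0\}} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} \lambda_{j,m}\, K_{j,m,l}\, q_{j,l}\, D[j,l] + \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\, q_{j,l}\, d[j,l] \\
&+ \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}\,\mathbf{1}_{\{|x_i|=0\}} \\
&+ \sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{m=1}^{S} \lambda^-_{j,m}\, K_{j,m,k}\, q_{j,k}\, D[j,k]
\end{aligned}
$$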
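Now we group the first and fourth terms, and the second and fifth terms of the
right side of the equation.
$$
\begin{aligned}
\sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda_{j,l}
={}& \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\, K_{j,m,l}\,(\lambda_{j,m} + \lambda^-_{j,m})\, D[j,l] \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\, q_{j,l}\, d[j,l]
\end{aligned}
$$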
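Substituting the value of $D[j,l]$ and the value of $d[j,l]$,
$$
\begin{aligned}
\sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda_{j,l}
={}& \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}
+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\, K_{j,m,l}\,(\lambda_{j,m} + \lambda^-_{j,m})
+ \sum_{j=1}^{n}\sum_{l=1}^{R} q_{j,l}\, r_{j,l} \\
&- \Big( \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{k=1}^{R}\sum_{m=1}^{S} q_{j,l}\, K_{j,m,l}\, Q[j,i][l,k]\,(\lambda_{j,m} + \lambda^-_{j,m}) \\
&\qquad + \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{k=1}^{R} r_{j,l}\, q_{j,l}\, P^+[j,i][l,k] \Big) \\
&- \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\, r_{j,l}\, P^-[j,i][l,m]
\end{aligned}
$$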
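and substituting for $q_{j,l}$ from (5) in the second and third terms, grouping
them, and using (12) and (7) for the remaining terms, we have:
$$
\begin{aligned}
\sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda_{j,l}
={}& \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}
+ \sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda_{j,l} + \sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda^+_{j,l} \\
&- \sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda^+_{j,l} - \sum_{i=1}^{n}\sum_{m=1}^{S} \lambda^-_{i,m}
\end{aligned}
$$
which yields the following equality which is obviously satisfied,
$$
\sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda_{j,l} = \sum_{j=1}^{n}\sum_{l=1}^{R} \Lambda_{j,l},
$$
concluding the proof.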

As in the BCMP [2] theorem, we can also compute the steady state distribution
of the number of customers of each class in each queue. Let $y_i$ be the vector
whose elements are $(y_{i,k})$, the number of customers of class k in station i.
Let y be the vector of vectors $(y_i)$. We omit the proof of the following result.
Theorem 2 If the system of equations (5), (6) and (7) has a solution, then the
steady state distribution $\pi(y)$ is given by
$$
\pi(y) = \prod_{i=1}^{n} h_i(y_i) \tag{18}
$$
where the marginal probabilities $h_i(y_i)$ have the following form:
$$
h_i(y_i) = \Big( 1 - \sum_{k=1}^{R} q_{i,k} \Big)\, |y_i|! \prod_{k=1}^{R} \big[ (q_{i,k})^{y_{i,k}} / y_{i,k}! \big] \tag{19}
$$
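The marginal distribution (19) is straightforward to evaluate once the values of q for a station are known. The following Python sketch computes it for one station and checks numerically that the probabilities add up to one over a truncated range of class counts. The function name `marginal` and the values of q are assumptions chosen for illustration only.

```python
from math import factorial


def marginal(q_i, y_i):
    """Evaluate h_i(y_i) of equation (19) for one station.

    q_i[k] : solution of the traffic equations for class k at this station
    y_i[k] : number of class k customers at this station
    """
    rho = sum(q_i)
    value = (1.0 - rho) * factorial(sum(y_i))
    for q_ik, y_ik in zip(q_i, y_i):
        value *= q_ik ** y_ik / factorial(y_ik)
    return value


# Illustrative station with two classes; the values sum to 0.5 < 1.
q_i = [0.2, 0.3]
p_empty = marginal(q_i, (0, 0))

# Sum over all count vectors with at most N customers in total.
N = 40
mass = sum(marginal(q_i, (a, b)) for a in range(N + 1) for b in range(N + 1 - a))
```

By the multinomial theorem, the count vectors with a fixed total T contribute (1 − ρ)ρ^T, where ρ is the sum of the q values for the station, so the truncated sum equals 1 − ρ^(N+1) and approaches 1 as N grows. This is also why the stability condition requires ρ < 1.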
4.1 Proof of Lemma 3
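The proof of Lemma 3 consists in algebraic manipulations of the terms in the
balance equations related to each of the three types of stations.
LIFO/PR. First consider an arbitrary LIFO station and recall the definition of
$\Delta_i$:
$$
\begin{aligned}
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} A_{i,k}(x_i)\,(\Lambda_{i,k} + \Lambda^+_{i,k})\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)} \\
&- \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} M_{i,k}(x_i) - \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} N_{i,k}(x_i) \\
&+ \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m}\, Y_{i,m}(x_i)
\end{aligned}
$$
We substitute the values of $Y_{i,m}$, $M_{i,k}$, $N_{i,k}$ and $A_{i,k}$ for a
LIFO station:
$$
\begin{aligned}
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}}\,(\Lambda_{i,k} + \Lambda^+_{i,k})/q_{i,k} \\
&- \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}}\, r_{i,k} \\
&- \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}} \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} \\
&+ \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}}\,(1 - K_{i,m,k})
\end{aligned}
$$
We use the value of $q_{i,k}$ from equation (5) and some cancellations of terms
to obtain:
$$
\begin{aligned}
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) &= \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}} \Big( \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m} + \sum_{m=1}^{S} \lambda^-_{i,m}\,(1 - K_{i,m,k}) \Big) \\
&= \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}}
\end{aligned}
$$
and as $\mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}} = \mathbf{1}_{\{|x_i|>0\}}$,
we finally get the result:
$$
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) = \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m} \tag{20}
$$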
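FIFO. Consider now an arbitrary FIFO station:
$$
\begin{aligned}
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} A_{i,k}(x_i)\,(\Lambda_{i,k} + \Lambda^+_{i,k})\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)} \\
&- \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} M_{i,k}(x_i) - \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} N_{i,k}(x_i) \\
&+ \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m}\, Y_{i,m}(x_i)
\end{aligned}
$$
Similarly, we substitute the values of $Y_{i,m}$, $M_{i,k}$, $N_{i,k}$, $A_{i,k}$
and $q_{i,k}$:
$$
\begin{aligned}
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,\infty}=k\}} \Big( r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m} \Big) \\
&- \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}}\, r_{i,k} - \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}} \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} \\
&+ \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}}\,(1 - K_{i,m,k})
\end{aligned}
$$
We separate the last term into two parts, and regroup terms:
$$
\begin{aligned}
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,\infty}=k\}} \Big( r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m} \Big) \\
&- \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}} \Big( r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m} \Big) \\
&+ \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}}
\end{aligned}
$$
Conditions (2) and (3) imply that the following relation must hold:
$$
\begin{aligned}
&\sum_{k=1}^{R} \mathbf{1}_{\{c_{i,\infty}=k\}} \Big( r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m} \Big) \\
&\quad = \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}} \Big( r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m} \Big)
\end{aligned}
$$
Thus, as $\mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{c_{i,1}=k\}} = \mathbf{1}_{\{|x_i|>0\}}$,
we finally get the expected result:
$$
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) = \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m} \tag{21}
$$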
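PS. Consider now an arbitrary PS station:
$$
\begin{aligned}
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} A_{i,k}(x_i)\,(\Lambda_{i,k} + \Lambda^+_{i,k})\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)} \\
&- \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} M_{i,k}(x_i) - \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} N_{i,k}(x_i) \\
&+ \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m}\, Y_{i,m}(x_i)
\end{aligned}
$$
As usual, we substitute the values of $Y_{i,m}$, $M_{i,k}$, $N_{i,k}$, $A_{i,k}$:
$$
\begin{aligned}
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \mathbf{1}_{\{|x_{i,k}|>0\}}\,\frac{(\Lambda_{i,k} + \Lambda^+_{i,k})}{q_{i,k}}\,\frac{x_{i,k}}{|x_i|} \\
&- \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} r_{i,k}\,\frac{x_{i,k}}{|x_i|} \\
&- \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \frac{x_{i,k}}{|x_i|} \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} \\
&+ \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m} \sum_{k=1}^{R} (1 - K_{i,m,k})\,\frac{x_{i,k}}{|x_i|}
\end{aligned}
$$
Then, we apply equation (5) to substitute $q_{i,k}$. After some cancellations of
terms we obtain:
$$
\begin{aligned}
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \frac{x_{i,k}}{|x_i|} \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m} \\
&+ \mathbf{1}_{\{|x_i|>0\}} \sum_{m=1}^{S} \lambda^-_{i,m} \sum_{k=1}^{R} (1 - K_{i,m,k})\,\frac{x_{i,k}}{|x_i|}
\end{aligned}
$$
Finally we have:
$$
\mathbf{1}_{\{|x_i|>0\}}\,\Delta_i(x_i) = \mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \frac{x_{i,k}}{|x_i|} \sum_{m=1}^{S} \lambda^-_{i,m} \tag{22}
$$
Since $\mathbf{1}_{\{|x_i|>0\}} \sum_{k=1}^{R} \frac{x_{i,k}}{|x_i|} = \mathbf{1}_{\{|x_i|>0\}}$,
once again we establish the relation we need. This concludes the proof of Lemma 3.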

References
1. Kemeny, J.G., Snell, J.L. “Finite Markov Chains”, Van Nostrand, Princeton,
1965.
2. Baskett F., Chandy K., Muntz R.R., Palacios F.G. “Open, closed and mixed net-
works of queues with different classes of customers”, Journal ACM, Vol. 22, No 2,
pp 248–260, April 1975.
3. Gelenbe E. “Random neural networks with negative and positive signals and prod-
uct form solution”, Neural Computation, Vol. 1, No. 4, pp 502–510, 1989.
4. Gelenbe E. “Product form queueing networks with negative and positive cus-
tomers”, Journal of Applied Probability, Vol. 28, pp 656–663, 1991.
5. Gelenbe E., Glynn P., Sigman K. “Queues with negative customers”, Journal of
Applied Probability, Vol. 28, pp 245–250, 1991.
6. Fourneau J.M. “Computing the steady-state distribution of networks with positive
and negative customers”, Proc. 13-th IMACS World Congress on Computation and
Applied Mathematics, Dublin, 1991.
7. E. Gelenbe, S. Tucci “Performances d’un système informatique dupliqué”,
Comptes-Rendus Acad. Sci., t 312, Série II, pp. 27–30, 1991.
8. Gelenbe E., Schassberger R. “Stability of G-Networks”, Probability in the Engi-
neering and Informational Sciences, Vol. 6, pp 271–276, 1992.
9. Fourneau, J.M., Gelenbe, E. “Multiple class G-networks,” Conference of the ORSA
Technical Committee on Computer Science, Williamsburg, VA, Balci, O. (Ed.),
Pergamon, 1992.
10. Atalay, V., Gelenbe, E. “Parallel algorithm for colour texture generation using the
random neural network model”, International Journal of Pattern Recognition and
Artificial Intelligence, Vol. 6, No. 2 & 3, pp 437–446, 1992.
11. Miyazawa, M. “ Insensitivity and product form decomposability of reallocatable
GSMP”, Advances in Applied Probability, Vol. 25, No. 2, pp 415–437, 1993.
12. Henderson, W. “ Queueing networks with negative customers and negative queue
lengths”, Journal of Applied Probability, Vol. 30, No. 3, 1993.
13. Gelenbe E. “G-Networks with triggered customer movement”, Journal of Applied
Probability, Vol. 30, No. 3, pp 742–748, 1993.
14. Gelenbe E., “G-Networks with signals and batch removal”, Probability in the En-
gineering and Informational Sciences, Vol. 7, pp 335–342, 1993.
15. Chao, X., Pinedo, M. “On generalized networks of queues with positive and neg-
ative arrivals”, Probability in the Engineering and Informational Sciences, Vol. 7,
pp 301–334, 1993.
16. Henderson, W., Northcote, B.S., Taylor, P.G. “Geometric equilibrium distributions
for queues with interactive batch departures,” Annals of Operations Research, Vol.
48, No. 1–4, 1994.
17. Henderson, W., Northcote, B.S., Taylor, P.G. “Networks of customer queues and
resource queues”, Proc. International Teletraffic Congress 14, Labetoulle, J. and
Roberts, J. (Eds.), pp 853–864, Elsevier, 1994.
18. Gelenbe, E. “G-networks: an unifying model for neural and queueing networks”,
Annals of Operations Research, Vol. 48, No. 1–4, pp 433–461, 1994.
19. Chao, X., Pinedo, M. “Product form queueing networks with batch services, sig-
nals, and product form solutions”, Operations Research Letters, Vol. 17, pp 237–
242, 1995.
20. J.M. Fourneau, E. Gelenbe, R. Suros “G-networks with multiple classes of positive
and negative customers,” Theoretical Computer Science, Vol. 155, pp. 141–156,
1996.
21. Gelenbe, E., Labed A. “G-networks with multiple classes of customers and trig-
gers”, to appear.
Spectral Expansion Solutions for
Markov-Modulated Queues
Isi Mitrani
Computing Science Department, University of Newcastle

1 Introduction
There are many computer, communication and manufacturing systems which
give rise to queueing models where the arrival and/or service mechanisms are
influenced by some external processes. In such models, a single unbounded queue
evolves in an environment which changes state from time to time. The instanta-
neous arrival and service rates may depend on the state of the environment and
also, to a limited extent, on the number of jobs present.
The system state at time t is described by a pair of integer random variables,
(It , Jt ), where It represents the state of the environment and Jt is the num-
ber of jobs present. The variable It takes a finite number of values, numbered
0, 1, . . . , N ; these are also called the environmental phases. The possible values
of Jt are 0, 1, . . .. Thus, the system is in state (i, j) when the environment is in
phase i and there are j jobs waiting and/or being served.
The two-dimensional process X = {(It , Jt ) ; t ≥ 0} is assumed to have the
Markov property, i.e. given the current phase and number of jobs, the future
behaviour of X is independent of its past history. Such a model is referred to
as a Markov-modulated queue (see, for example, Prabhu and Zhu [21]). The
corresponding state space, {0, 1, . . . , N } × {0, 1, . . .} is known as a lattice strip.
A fully general Markov-modulated queue, with arbitrary state-dependent
transitions, is not tractable. However, one can consider a sub-class of models
which are sufficiently general to be useful, and yet can be solved efficiently. We
shall introduce the following restrictions:
(i) There is a threshold M , such that the instantaneous transition rates out of
state (i, j) do not depend on j when j ≥ M .
(ii) the jumps of the random variable J are bounded.
When the jumps of the random variable J are of size 1, i.e. when jobs arrive
and depart one at a time, the process is said to be of the Quasi-Birth-and-Death
type, or QBD (the term skip-free is also used, e.g. in Latouche et al. [12]). The
state diagram for this common model, showing some transitions out of state
(i, j), is illustrated in figure 1.
The requirement that all transition rates cease to depend on the size of the job
queue beyond a certain threshold is not too restrictive. Note that we impose no
limit on the magnitude of the threshold M , although it must be pointed out that
the larger M is, the greater the complexity of the solution. Similarly, although
jobs may arrive and/or depart in fixed or variable (but bounded) batches, the
larger the batch size, the more complex the solution.
[Figure: lattice of states with the phases 0, 1, . . . , i, . . . , N on the horizontal axis and the queue sizes 0, 1, . . . , j − 1, j, j + 1 on the vertical axis; arrows show some transitions out of state (i, j).]
Fig. 1. State diagram of a QBD process
The object of the analysis of a Markov-modulated queue is to determine the
joint steady-state distribution of the environmental phase and the number of
jobs in the system:
\[
p_{i,j} = \lim_{t \to \infty} P(I_t = i ,\, J_t = j) \; ; \quad i = 0, 1, \ldots, N \; ; \; j = 0, 1, \ldots . \tag{1}
\]
That distribution exists for an irreducible Markov process if, and only if, the
corresponding set of balance equations has a positive solution that can be nor-
malized.
The marginal distributions of the number of jobs in the system, and of the
phase, can be obtained from the joint distribution:

\[
p_{\cdot,j} = \sum_{i=0}^{N} p_{i,j} . \tag{2}
\]
\[
p_{i,\cdot} = \sum_{j=0}^{\infty} p_{i,j} . \tag{3}
\]
Various performance measures can then be computed in terms of these joint and
marginal distributions.
There are three ways of solving Markov-modulated queueing models exactly.
Perhaps the most widely used one is the matrix-geometric method [18]. This
approach relies on determining the minimal positive solution, R, of a non-linear
matrix equation; the equilibrium distribution is then expressed in terms of pow-
ers of R.
The second method uses generating functions to solve the set of balance
equations. A number of unknown probabilities which appear in the equations for
those generating functions are determined by exploiting the singularities of the
coefficient matrix. A comprehensive treatment of that approach, in the context
of a discrete-time process with an M/G/1 structure, is presented in Gail et al.
[5].
The third (and arguably best) method is the subject of this tutorial. It is
called spectral expansion, and is based on expressing the equilibrium distribution
of the process in terms of the eigenvalues and left eigenvectors of a certain matrix
polynomial. The idea of the spectral expansion solution method has been known
for some time (e.g., see Neuts [18]), but there are rather few examples of its
application in the performance evaluation literature. Some instances where that
solution has proved useful are reported in Elwalid et al. [3], and Mitrani and
Mitra [17]; a more detailed treatment, including numerical results, is presented
in Mitrani and Chakka [16]. More recently, Grassmann [7] has discussed models
where the eigenvalues can be isolated and determined very efficiently. Some
comparisons between the spectral expansion and the matrix-geometric solutions
can be found in [16] and in Haverkort and Ost [8]. The available evidence suggests
that, where both methods are applicable, spectral expansion is faster even if the
matrix R is computed by the most efficient algorithm.
The presentation in this tutorial is largely based on the material in chapter
6 of [13] and chapter 13 of [14].
Before describing the details of the spectral expansion solution, it would be
instructive to show some examples of systems which are modelled as Markov-
modulated queues.
2 Examples of Markov-Modulated Queues

We shall start with a few models of the Quasi-Birth-and-Death type, where the
queue size increases and decreases in steps of 1.
2.1 A Multiserver Queue with Breakdowns and Repairs

A single, unbounded queue is served by N identical parallel servers. Each server
goes through alternating periods of being operative and inoperative, independently
of the others and of the number of jobs in the system. The operative
and inoperative periods are distributed exponentially with parameters ξ and η,
respectively. Thus, the number of operative servers at time t, It , is a Markov
process on the state space {0, 1, . . . , N }. This is the environment in which the
queue evolves: it is in phase i when there are i operative servers (see [15,20]).
Jobs arrive according to a Markov-Modulated Poisson Process controlled by
It . When the phase is i, the instantaneous arrival rate is λi . Jobs are taken for
service from the front of the queue, one at a time, by available operative servers.
The required service times are distributed exponentially with parameter μ. An
operative server cannot be idle if there are jobs waiting to be served. A job
whose service is interrupted by a server breakdown is returned to the front of
the queue. When an operative server becomes available, the service is resumed
from the point of interruption, without any switching overheads. The flow of
jobs is shown in figure 2.
[Figure: jobs arrive at rate λi into a single queue that feeds N parallel servers, each with parameters μ, ξ, η.]
Fig. 2. A multiserver queue with breakdowns and repairs
The process X = {(It , Jt ) ; t ≥ 0} is QBD. The transitions out of state (i, j) are:
(a) to state (i − 1, j) (i > 0), with rate iξ;
(b) to state (i + 1, j) (i < N ), with rate (N − i)η;
(c) to state (i, j + 1) with rate λi ;
(d) to state (i, j − 1) with rate min(i, j)μ.
Note that only transition (d) has a rate which depends on j, and that dependency
vanishes when j ≥ N .
Remark. Even if the breakdown and repair processes were more compli-
cated, e.g., if servers could break down and be repaired in batches, or if a
server breakdown triggered a job departure, the queueing process would still
be QBD. The environmental state transitions can be arbitrary, as long as the
queue changes in steps of 1.
In this example, as in all models where the environment state transitions do
not depend on the number of jobs present, the marginal distribution of the number
of operative servers can be determined without finding the joint distribution
first. Moreover, since the servers break down and are repaired independently of
each other, that distribution is binomial:
\[
p_{i,\cdot} = \binom{N}{i} \left( \frac{\eta}{\xi+\eta} \right)^{i} \left( \frac{\xi}{\xi+\eta} \right)^{N-i} ; \quad i = 0, 1, \ldots, N . \tag{4}
\]
Hence, the steady-state average number of operative servers is equal to
\[
E(I_t) = \frac{N\eta}{\xi+\eta} . \tag{5}
\]
The overall average arrival rate is equal to

\[
\lambda = \sum_{i=0}^{N} p_{i,\cdot} \lambda_i . \tag{6}
\]
This gives us an explicit condition for stability. The offered load must be less
than the processing capacity:
\[
\frac{\lambda}{\mu} < \frac{N\eta}{\xi+\eta} . \tag{7}
\]
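Equations (4) to (7) are easy to evaluate numerically. The following is a minimal sketch in Python; the function names and the sample parameter values are illustrative assumptions and do not appear in the original text.

```python
from math import comb

def operative_server_distribution(N, xi, eta):
    """Binomial marginal distribution (4) of the number of operative servers."""
    p_up = eta / (xi + eta)  # probability that a given server is operative
    return [comb(N, i) * p_up**i * (1.0 - p_up)**(N - i) for i in range(N + 1)]

def is_stable(N, xi, eta, mu, arrival_rates):
    """Stability condition (7): offered load below the average processing capacity."""
    p = operative_server_distribution(N, xi, eta)
    # Overall average arrival rate, equation (6).
    mean_arrival_rate = sum(p_i * lam_i for p_i, lam_i in zip(p, arrival_rates))
    # Average number of operative servers, equation (5).
    mean_operative = N * eta / (xi + eta)
    return mean_arrival_rate / mu < mean_operative

# Hypothetical parameters: N = 2 servers, breakdown rate 1, repair rate 3.
p = operative_server_distribution(2, xi=1.0, eta=3.0)
print(p)
print(is_stable(2, xi=1.0, eta=3.0, mu=1.0, arrival_rates=[1.0, 1.0, 1.0]))
```

With these sample values each server is operative with probability 0.75, so the average capacity is 1.5 servers and a constant arrival rate of 1 with μ = 1 satisfies (7).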
2.2 Manufacturing Blocking
Consider a network of two nodes in tandem, such as the one in figure 3. Jobs
arrive into the first node in a Poisson stream with rate λ, and join an unbounded
queue. After completing service at node 1 (exponentially distributed with pa-
rameter μ ), they attempt to go to node 2, where there is a finite buffer with
room for a maximum of N − 1 jobs (including the one in service). If that transfer
is impossible because the buffer is full, the job remains at node 1, preventing its
server from starting a new service, until the completion of the current service at
node 2 (exponentially distributed with parameter ξ ). In this last case, server 1
is said to be ‘blocked’. Transfers from node 1 to node 2 are instantaneous (see
[1,19]).
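[Figure: a Poisson stream with rate λ joins the unbounded queue at node 1 (service rate μ), which feeds node 2 (service rate ξ) through a finite buffer of size N − 1.]
Fig. 3. Two nodes with a finite intermediate buffer
The above type of blocking is referred to as ‘manufacturing blocking’. (An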
alternative model, which also gives rise to a Markov-modulated queue, is the
‘communication blocking’. There node 1 does not start a service if the buffer is
full.)
In this system, the unbounded queue at node 1 is modulated by a finite-state
environment defined by node 2. We say that the environment, It , is in state i if
there are i jobs at node 2 and server 1 is not blocked (i = 0, 1, . . . , N − 1). An
extra state, It = N , is needed to describe the situation where there are N − 1
jobs at node 2 and server 1 is blocked.
The above assumptions imply that the pair X = {(It , Jt ) ; t ≥ 0}, where Jt
is the number of jobs at node 1, is a QBD process. Note that the state (N, 0)
does not exist: node 1 may be blocked only if there are jobs present.
The transitions out of state (i, j) are:
(a) to state (i − 1, j) (0 < i < N ), with rate ξ;

(b) to state (N − 1, j − 1) (i = N, j > 0), with rate ξ;
(c) to state (i + 1, j − 1) (0 ≤ i < N − 1, j > 0), with rate μ;
(d) to state (N, j) (i = N − 1, j > 0), with rate μ;
(e) to state (i, j + 1) with rate λ.
The only dependency on j comes from the fact that transitions (b), (c) and (d)
are not available when j = 0. In this example, the j-independency threshold is
M = 1.
Because the environmental process is coupled with the queueing process, the
marginal distribution of the former (i.e., the number of jobs at node 2), cannot
be determined without finding the joint distribution of It and Jt . Nor is the
stability condition as simple as in the previous example.
2.3 Phase-Type Distributions
There is a large and useful family of distributions that can be incorporated into
queueing models by means of Markovian environments. Those distributions are
‘almost’ general, in the sense that any distribution function either belongs to
this family or can be approximated as closely as desired by functions from it.
Let It be a Markov process with state space {0, 1, . . . , N } and generator
matrix Ã. States 0, 1, . . . , N − 1 are transient, while state N , reachable from any
of the other states, is absorbing (the last row of Ã is 0). At time 0, the process
starts in state i with probability αi (i = 0, 1, . . . , N − 1; α0 + α1 + . . . + αN−1 = 1).
Eventually, after an interval of length T , it is absorbed in state N . The random
variable T is said to have a ‘phase-type’ (PH) distribution with parameters Ã
and αi (see [18]).
The exponential distribution is obviously phase-type (N = 1). So is the
Erlang distribution, i.e. the convolution of N exponentials (exercise 5 in section
2.3). The corresponding generator matrix is
\[
\tilde{A} = \begin{bmatrix}
-\mu & \mu & & & \\
 & -\mu & \mu & & \\
 & & \ddots & \ddots & \\
 & & & -\mu & \mu \\
 & & & & 0
\end{bmatrix} ,
\]
and the initial probabilities are α0 = 1, α1 = . . . = αN−1 = 0.

Another common PH distribution is the ‘hyperexponential’, where I0 = i
with probability αi , and absorption occurs at the first transition. The generator
matrix of the hyperexponential distribution is
\[
\tilde{A} = \begin{bmatrix}
-\mu_0 & & & & \mu_0 \\
 & -\mu_1 & & & \mu_1 \\
 & & \ddots & & \vdots \\
 & & & -\mu_{N-1} & \mu_{N-1} \\
 & & & & 0
\end{bmatrix} .
\]
The corresponding probability distribution function, F (x), is a mixture of exponentials:
\[
F(x) = 1 - \sum_{i=0}^{N-1} \alpha_i e^{-\mu_i x} .
\]
The PH family is very versatile. It contains distributions with both low and
high coefficients of variation. It is closed with respect to mixing and convolution:
if X1 and X2 are two independent PH random variables with N1 and N2 (non-
absorbing) phases respectively, and c1 and c2 are constants, then c1 X1 + c2 X2
has a PH distribution with N1 + N2 phases.
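The range of coefficients of variation covered by the PH family can be checked numerically. The sketch below relies on the standard moment formula for phase-type distributions, E(T^n) = n! α (−T)^{−n} e, where T denotes the generator Ã restricted to the transient states 0, 1, . . . , N − 1. That formula, the symbol T, the function names and the sample parameters are assumptions brought in for illustration; they are not stated in the text above.

```python
import numpy as np

def ph_moments(alpha, T):
    """First two moments of a PH distribution with initial vector alpha and
    transient-state generator T (the matrix A-tilde without its last row and column)."""
    alpha = np.asarray(alpha, dtype=float)
    T = np.asarray(T, dtype=float)
    U = np.linalg.inv(-T)          # (-T)^{-1}
    ones = np.ones(T.shape[0])
    m1 = alpha @ U @ ones          # E(T)
    m2 = 2.0 * (alpha @ U @ U @ ones)  # E(T^2)
    return m1, m2

def squared_cv(alpha, T):
    """Squared coefficient of variation, Var(T) / E(T)^2."""
    m1, m2 = ph_moments(alpha, T)
    return m2 / m1**2 - 1.0

def erlang(N, mu):
    """Erlang distribution: N exponential phases with rate mu visited in sequence."""
    T = -mu * np.eye(N) + mu * np.eye(N, k=1)
    alpha = np.zeros(N)
    alpha[0] = 1.0
    return alpha, T

def hyperexponential(alphas, mus):
    """Hyperexponential distribution: phase i chosen with probability alphas[i]."""
    return np.asarray(alphas, dtype=float), -np.diag(np.asarray(mus, dtype=float))

print(squared_cv(*erlang(3, 2.0)))                             # below 1
print(squared_cv(*hyperexponential([0.5, 0.5], [1.0, 10.0])))  # above 1
```

The Erlang example has a squared coefficient of variation of 1/N, while the hyperexponential example exceeds 1, which illustrates the low and high variability mentioned above.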
A model with a single unbounded queue, where either the interarrival in-
tervals, or the service times, or both, have PH distributions, is easily cast in
the framework of a queue in Markovian environment. Consider, for instance, the
M/PH/1 queue. Its state at time t can be represented as a pair (It , Jt ), where Jt
is the number of jobs present and It is the phase of the current service (if Jt > 0).
When It has a transition into the absorbing state, the current service completes
and (if the queue is not empty) a new service starts immediately, entering phase
i with probability αi .
The PH/PH/n queue can also be represented as a QBD process. However,
the state of the environmental variable, It , now has to indicate the phase of the
current interarrival interval and the phases of the current services at all busy
servers. If the interarrival interval has N1 phases and the service has N2 phases,
the state space of It would be of size $N_1 N_2^n$.
2.4 Checkpointing and Recovery in the Presence of Faults
The last example is not a QBD process. Consider a system where transactions,
arriving according to a Poisson process with rate λ, are served in FIFO order by
a single server. The service times are i.i.d. random variables distributed exponentially
with parameter μ. After N consecutive transactions have been completed,
the system performs a checkpoint operation whose duration is an i.i.d. random
variable distributed exponentially with parameter β. Once a checkpoint is es-
tablished, the N completed transactions are deemed to have departed. However,
both transaction processing and checkpointing may be interrupted by the occur-
rence of a fault. The latter arrive according to an independent Poisson process
with rate ξ. When a fault occurs, the system instantaneously rolls back to the
last established checkpoint; all transactions which arrived since that moment
either remain in the queue, if they have not been processed, or return to it,
in order to be processed again (it is assumed that repeated service times are
resampled independently) (see [11,8]).
This system can be modelled as an unbounded queue of (uncompleted) trans-
actions, which is modulated by an environment consisting of completed trans-
actions and checkpoints. More precisely, the two state variables, I(t) and J(t),
are the number of transactions that have completed service since the last check-
point, and the number of transactions present that have not completed service
(including those requiring re-processing), respectively.
The Markov-modulated queueing process X = {[I(t), J(t)] ; t ≥ 0}, has the
following transitions out of state (i, j):
(a) to state (0, j + i), with rate ξ;
(b) to state (0, j) (i = N ), with rate β;
(c) to state (i, j + 1), with rate λ;
(d) to state (i + 1, j − 1) (0 ≤ i < N, j > 0), with rate μ.
Because transitions (a), resulting from arrivals of faults, cause the queue size
to jump by more than 1, this is not a QBD process.
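The transition list (a) to (d) can be written down directly as a function of the current state. The following Python sketch is illustrative only: the function name and the sample rates are hypothetical, and the case i = 0 of transition (a) is skipped because it would map the state to itself.

```python
def checkpoint_transitions(i, j, N, lam, mu, beta, xi):
    """Return [((next_i, next_j), rate), ...] for the transitions out of state (i, j)."""
    out = []
    if i > 0:
        # (a) fault: roll back to the last checkpoint, i transactions rejoin the queue
        out.append(((0, j + i), xi))
    if i == N:
        # (b) the checkpoint completes and the N transactions depart
        out.append(((0, j), beta))
    # (c) arrival of a new transaction
    out.append(((i, j + 1), lam))
    if i < N and j > 0:
        # (d) service completion: one more transaction awaits the next checkpoint
        out.append(((i + 1, j - 1), mu))
    return out

# Hypothetical rates; state (2, 3) with a checkpoint after every N = 3 completions.
for target, rate in checkpoint_transitions(2, 3, N=3, lam=1.0, mu=2.0, beta=5.0, xi=0.1):
    print(target, rate)
```

From state (2, 3) a fault leads to state (0, 5), so the queue size jumps by 2, which is the reason the process is not QBD.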
3 Spectral Expansion Solution
Let us now turn to the problem of determining the steady-state joint distribu-
tion of the environmental phase and the number of jobs present, for a Markov-
modulated queue. We shall start with the most commonly encountered case,
namely the QBD process, where jobs arrive and depart singly. The starting
point is of course the set of balance equations which the probabilities pi,j , defined
in (1), must satisfy. In order to write them in general terms, the following
notation for the instantaneous transition rates will be used.
(a) Phase transitions leaving the queue unchanged: from state (i, j) to state
(k, j) (0 ≤ i, k ≤ N ; i ≠ k), with rate aj (i, k);
(b) Transitions incrementing the queue: from state (i, j) to state (k, j + 1) (0 ≤
i, k ≤ N ), with rate bj (i, k);
(c) Transitions decrementing the queue: from state (i, j) to state (k, j − 1) (0 ≤
i, k ≤ N ; j > 0), with rate cj (i, k).
It is convenient to introduce the (N + 1) × (N + 1) matrices containing the
rates of type (a), (b) and (c): Aj = [aj (i, k)], Bj = [bj (i, k)] and Cj = [cj (i, k)],
respectively (the main diagonal of Aj is zero by definition; also, C0 = 0 by
definition). According to the assumptions of the Markov-modulated queue, there
is a threshold, M (M ≥ 1), such that those matrices do not depend on j when
j ≥ M . In other words,
Aj = A ; Bj = B ; Cj = C , j ≥ M . (8)
Note that transitions (b) may represent a job arrival coinciding with a change
of phase. If arrivals are not accompanied by such changes, then the matrices
Bj and B are diagonal. Similarly, a transition of type (c) may represent a job
departure coinciding with a change of phase. Again, if such coincidences do not
occur, then the matrices Cj and C are diagonal.
By way of illustration, here are the transition rate matrices for some of the
examples in the previous section.
Multiserver Queue with Breakdowns and Repairs
Since the phase transitions are independent of the queue size, the matrices Aj
are all equal:
\[
A_j = A = \begin{bmatrix}
0 & N\eta & & & \\
\xi & 0 & (N-1)\eta & & \\
 & 2\xi & 0 & \ddots & \\
 & & \ddots & \ddots & \eta \\
 & & & N\xi & 0
\end{bmatrix} .
\]
Similarly, the matrices Bj do not depend on j:
\[
B = \begin{bmatrix}
\lambda_0 & & & \\
 & \lambda_1 & & \\
 & & \ddots & \\
 & & & \lambda_N
\end{bmatrix} .
\]
Denoting
\[
\mu_{i,j} = \min(i, j)\mu \; ; \quad i = 0, 1, \ldots, N \; ; \; j = 1, 2, \ldots ,
\]
the departure rate matrices, Cj , can thus be written as
\[
C_j = \begin{bmatrix}
0 & & & \\
 & \mu_{1,j} & & \\
 & & \ddots & \\
 & & & \mu_{N,j}
\end{bmatrix} ; \quad j = 1, 2, \ldots .
\]
These matrices cease to depend on j when j ≥ N . Thus, the threshold M is now
equal to N , and
\[
C = \begin{bmatrix}
0 & & & \\
 & \mu & & \\
 & & \ddots & \\
 & & & N\mu
\end{bmatrix} .
\]
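As an illustration, the matrices A, B and Cj of the multiserver queue with breakdowns and repairs can be assembled as follows. This is a sketch using NumPy; the function name and the sample parameters are assumptions made for the example.

```python
import numpy as np

def breakdown_repair_matrices(N, arrival_rates, mu, xi, eta, j):
    """Return (A, B, C_j) for the multiserver queue with breakdowns and repairs."""
    A = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i > 0:
            A[i, i - 1] = i * xi         # one of the i operative servers breaks down
        if i < N:
            A[i, i + 1] = (N - i) * eta  # one of the N - i broken servers is repaired
    B = np.diag(np.asarray(arrival_rates, dtype=float))        # arrivals at rate lambda_i
    C_j = np.diag([min(i, j) * mu for i in range(N + 1)])      # departures at rate min(i, j) mu
    return A, B, C_j

# Hypothetical parameters with N = 2 servers.
A, B, C1 = breakdown_repair_matrices(2, [1.0, 2.0, 3.0], mu=4.0, xi=0.5, eta=1.5, j=1)
print(A)
print(C1)
```

Calling the function with any j ≥ N returns the same departure matrix, which reflects the threshold M = N.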
Manufacturing Blocking
Remember that the environment changes phase without changing the queue size
either when a service completes at node 2 and node 1 is not blocked, or when
node 1 becomes blocked (if node 1 is already blocked, then a completion at node
2 changes both phase and queue size). Hence, when j > 0,
\[
A_j = A = \begin{bmatrix}
0 & 0 & & & \\
\xi & 0 & 0 & & \\
 & \ddots & \ddots & \ddots & \\
 & & \xi & 0 & \mu \\
 & & & 0 & 0
\end{bmatrix} ; \quad j = 1, 2, \ldots .
\]
When node 1 is empty (j = 0), it cannot become blocked; the state (N, 0)
does not exist and the matrix A0 has only N rows and columns:
\[
A_0 = \begin{bmatrix}
0 & & & \\
\xi & 0 & & \\
 & \ddots & \ddots & \\
 & & \xi & 0
\end{bmatrix} ;
\]
Since the arrival rate into node 1 does not depend on either i or j, we have
Bj = B = λI, where I is the identity matrix of order N + 1. The departures
from node 1 (which can occur when i ≠ N − 1) are always accompanied by
environmental changes: from state (i, j) the system moves to state (i + 1, j − 1)
with rate μ for i < N − 1; from state (N, j) to state (N − 1, j − 1) with rate ξ,
in agreement with transition (b) of section 2.2.
Hence, the departure rate matrices do not depend on j and are equal to
\[
C_j = C = \begin{bmatrix}
0 & \mu & & & & \\
 & 0 & \mu & & & \\
 & & \ddots & \ddots & & \\
 & & & 0 & \mu & \\
 & & & & 0 & 0 \\
 & & & & \xi & 0
\end{bmatrix} .
\]
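The manufacturing blocking matrices for j ≥ 1 can be assembled in the same way. The sketch below follows the transition list (a) to (e) of section 2.2, in which a service completion at node 2 in the blocked phase N leads to state (N − 1, j − 1). The function name and the sample parameters are assumptions made for the example.

```python
import numpy as np

def blocking_matrices(N, lam, mu, xi):
    """Return (A, B, C) of the manufacturing blocking model for queue sizes j >= 1."""
    A = np.zeros((N + 1, N + 1))
    C = np.zeros((N + 1, N + 1))
    for i in range(1, N):
        A[i, i - 1] = xi       # (a) completion at node 2, server 1 not blocked
    A[N - 1, N] = mu           # (d) completion at node 1 finds the buffer full: blocked
    B = lam * np.eye(N + 1)    # (e) arrivals do not change the phase
    for i in range(N - 1):
        C[i, i + 1] = mu       # (c) completion at node 1, job moves to node 2
    C[N, N - 1] = xi           # (b) completion at node 2 releases the blocked job
    return A, B, C

# Hypothetical parameters with N = 3.
A, B, C = blocking_matrices(3, lam=1.0, mu=2.0, xi=3.0)
print((A + B + C).sum(axis=1))  # total rate out of each phase when j >= 1
```

The row sums of A + B + C give the total transition rate out of each phase, which is a convenient sanity check on the layout of the matrices.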
Balance Equations
Using the instantaneous transition rates defined at the beginning of this section,
the balance equations of a general QBD process can be written as

\[
p_{i,j} \sum_{k=0}^{N} [a_j(i,k) + b_j(i,k) + c_j(i,k)]
= \sum_{k=0}^{N} [p_{k,j} a_j(k,i) + p_{k,j-1} b_{j-1}(k,i) + p_{k,j+1} c_{j+1}(k,i)] , \tag{9}
\]
where $p_{i,-1} = b_{-1}(k,i) = c_0(i,k) = 0$ by definition. The left-hand side of (9)
gives the total average number of transitions out of state (i, j) per unit time (due
to changes of phase, arrivals and departures), while the right-hand side expresses
the total average number of transitions into state (i, j) (again due to changes of
phase, arrivals and departures). These balance equations can be written more
compactly by using vectors and matrices. Define the row vectors of probabilities
corresponding to states with j jobs in the system:
\[
v_j = (p_{0,j}, p_{1,j}, \ldots, p_{N,j}) \; ; \quad j = 0, 1, \ldots . \tag{10}
\]
Also, let $D^A_j$, $D^B_j$ and $D^C_j$ be the diagonal matrices whose i th diagonal element
is equal to the i th row sum of Aj , Bj and Cj , respectively. Then equations (9),
for j = 0, 1, . . ., can be written as:
\[
v_j [D^A_j + D^B_j + D^C_j] = v_{j-1} B_{j-1} + v_j A_j + v_{j+1} C_{j+1} , \tag{11}
\]
where $v_{-1} = 0$ and $D^C_0 = B_{-1} = 0$ by definition.

When j is greater than the threshold M , the coefficients in (11) cease to
depend on j:
\[
v_j [D^A + D^B + D^C] = v_{j-1} B + v_j A + v_{j+1} C , \tag{12}
\]
for j = M + 1, M + 2, . . ..
In addition, all probabilities must sum up to 1:
\[
\sum_{j=0}^{\infty} v_j e = 1 , \tag{13}
\]
where e is a column vector with N + 1 elements, all of which are equal to 1.

The first step of any solution method is to find the general solution of the
infinite set of balance equations with constant coefficients, (12). The latter are
normally written in the form of a homogeneous vector difference equation of
order 2:
\[
v_j Q_0 + v_{j+1} Q_1 + v_{j+2} Q_2 = 0 \; ; \quad j = M, M + 1, \ldots , \tag{14}
\]
where $Q_0 = B$, $Q_1 = A - D^A - D^B - D^C$ and $Q_2 = C$. There is more than one
way of solving such equations.
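The coefficient matrices of (14) are obtained mechanically from A, B and C. The following NumPy sketch (the function name and the sample matrices are hypothetical) also checks a simple consequence of the definitions: the rows of Q0 + Q1 + Q2 sum to zero, because the diagonal matrices D^A, D^B and D^C contain exactly the row sums of A, B and C.

```python
import numpy as np

def qbd_polynomial_coefficients(A, B, C):
    """Return (Q0, Q1, Q2) with Q0 = B, Q1 = A - D^A - D^B - D^C, Q2 = C."""
    A, B, C = (np.asarray(X, dtype=float) for X in (A, B, C))
    row_sums = (A + B + C).sum(axis=1)   # diagonal of D^A + D^B + D^C
    return B, A - np.diag(row_sums), C

# Hypothetical two-phase example.
A = np.array([[0.0, 1.0], [3.0, 0.0]])
B = np.diag([0.5, 2.0])
C = np.diag([4.0, 1.0])
Q0, Q1, Q2 = qbd_polynomial_coefficients(A, B, C)
print(Q1)
print((Q0 + Q1 + Q2).sum(axis=1))  # zero row sums
```

The zero row sums mean that x = 1 is always an eigenvalue of the characteristic matrix polynomial built from these coefficients, with right eigenvector e.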
Associated with equation (14) is the so-called ‘characteristic matrix polynomial’,
Q(x), defined as
\[
Q(x) = Q_0 + Q_1 x + Q_2 x^2 . \tag{15}
\]
Denote by xk and uk the ‘generalized eigenvalues’, and corresponding ‘gen-
eralized left eigenvectors’, of Q(x). In other words, these are quantities which
satisfy
det[Q(xk )] = 0 ,
uk Q(xk ) = 0 ; k = 1, 2, . . . , d , (16)
where det[Q(x)] is the determinant of Q(x) and d is its degree. In what follows,
the qualification generalized will be omitted.
The above eigenvalues do not have to be simple, but it is assumed that if
one of them has multiplicity m, then it also has m linearly independent left
eigenvectors. This tends to be the case in practice. So, the numbering in (16) is
such that each eigenvalue is counted according to its multiplicity.
It is readily seen that if xk and uk are any eigenvalue and corresponding left
eigenvector, then the sequence
\[
v_{k,j} = u_k x_k^{j-M} \; ; \quad j = M, M + 1, \ldots , \tag{17}
\]
is a solution of equation (14). Indeed, substituting (17) into (14) we get
\[
v_{k,j} Q_0 + v_{k,j+1} Q_1 + v_{k,j+2} Q_2 = x_k^{j-M} u_k [Q_0 + Q_1 x_k + Q_2 x_k^2] = 0 .
\]
By combining any multiple eigenvalues with each of their independent eigen-
vectors, we thus obtain d linearly independent solutions of (14). On the other
hand, it is known that there cannot be more than d linearly independent solu-
tions. Therefore, any solution of (14) can be expressed as a linear combination
of the d solutions (17):

\[
v_j = \sum_{k=1}^{d} \alpha_k u_k x_k^{j-M} \; ; \quad j = M, M + 1, \ldots , \tag{18}
\]
where αk (k = 1, 2, . . . , d), are arbitrary (complex) constants.

However, the only solutions that are of interest in the present context are
those which can be normalized to become probability distributions. Hence, it
is necessary to select from the set (18) those sequences for which the series
$\sum_j v_j e$ converges. This requirement implies that if |xk | ≥ 1 for some k, then the
corresponding coefficient αk must be 0.
So, suppose that c of the eigenvalues of Q(x) are strictly inside the unit
disk (each counted according to its multiplicity), while the others are on the
circumference or outside. Order them so that |xk | < 1 for k = 1, 2, . . . , c. The
corresponding independent eigenvectors are u1 , u2 , . . ., uc . Then any normaliz-
able solution of equation (14) can be expressed as

\[
v_j = \sum_{k=1}^{c} \alpha_k u_k x_k^{j-M} \; ; \quad j = M, M + 1, \ldots , \tag{19}
\]
where αk (k = 1, 2, . . . , c), are some constants.

Expression (19) is referred to as the ‘spectral expansion’ of the vectors vj .

The coefficients of that expansion, αk , are yet to be determined.
Note that if there are non-real eigenvalues in the unit disk, then they appear
in complex-conjugate pairs. The corresponding eigenvectors are also complex-
conjugate. The same must be true for the appropriate pairs of constants αk , in
order that the right-hand side of (19) be real. To ensure that it is also positive,
the real parts of xk , uk and αk should be positive.
So far, expressions have been obtained for the vectors vM , vM +1 , . . .; these
contain c unknown constants. Now it is time to consider the balance equations
(11), for j = 0, 1, . . . , M . This is a set of (M + 1)(N + 1) linear equations with
M (N +1) unknown probabilities (the vectors vj for j = 0, 1, . . . , M −1), plus the
c constants αk . However, only (M + 1)(N + 1) − 1 of these equations are linearly
independent, since the generator matrix of the Markov process is singular. On
the other hand, an additional independent equation is provided by (13).
In order that this set of linearly independent equations has a unique solution,
the number of unknowns must be equal to the number of equations, i.e. (M +
1)(N + 1) = M (N + 1) + c, or c = N + 1. This observation implies the following
Proposition 1 The QBD process has a steady-state distribution if, and only
if, the number of eigenvalues of Q(x) strictly inside the unit disk, each counted
according to its multiplicity, is equal to the number of states of the Markovian
environment, N +1. Then, assuming that the eigenvectors of multiple eigenvalues
are linearly independent, the spectral expansion solution of (12) has the form

\[
v_j = \sum_{k=1}^{N+1} \alpha_k u_k x_k^{j-M} \; ; \quad j = M, M + 1, \ldots . \tag{20}
\]
In summary, the spectral expansion solution procedure consists of the follow-

ing steps:
1. Compute the eigenvalues of Q(x), xk , inside the unit disk, and the corre-
sponding left eigenvectors uk . If their number is other than N + 1, stop; a
steady-state distribution does not exist.
2. Solve the finite set of linear equations (11), for j = 0, 1, . . . , M , and (13),
with vM and vM +1 given by (20), to determine the constants αk and the
vectors vj for j < M .
3. Use the obtained solution in order to determine various moments, marginal
probabilities, percentiles and other system performance measures that may
be of interest.
Careful attention should be paid to step 1. The ‘brute force’ approach which
relies on first evaluating the scalar polynomial det[Q(x)], then finding its roots,
may be very inefficient for large N . An alternative which is preferable in most
cases is to reduce the quadratic eigenvalue-eigenvector problem
\[
u[Q_0 + Q_1 x + Q_2 x^2] = 0 , \tag{21}
\]
to a linear one of the form uQ = xu, where Q is a matrix whose dimensions
are twice as large as those of Q0 , Q1 and Q2 . The latter problem is normally
solved by applying various transformation techniques. Efficient routines for that
purpose are available in most numerical packages.
This linearization can be achieved quite easily if the matrix C = Q2 is non-singular.
Indeed, after multiplying (21) on the right by $Q_2^{-1}$, it becomes
\[
u[H_0 + H_1 x + I x^2] = 0 , \tag{22}
\]
where $H_0 = Q_0 C^{-1}$, $H_1 = Q_1 C^{-1}$, and I is the identity matrix. By introducing
the vector y = xu, equation (22) can be rewritten in the equivalent linear form
\[
[u, y] \begin{bmatrix} 0 & -H_0 \\ I & -H_1 \end{bmatrix} = x [u, y] . \tag{23}
\]
If C is singular but B is not, a similar linearization is achieved by multiplying
(21) on the right by $B^{-1}$ and making a change of variable x → 1/x. Then the
relevant eigenvalues are those outside the unit disk.
If both B and C are singular, then the desired result is achieved by first
making a change of variable, x → (γ + x)/(γ − x), where the value of γ is chosen
so that the matrix $S = \gamma^2 Q_2 + \gamma Q_1 + Q_0$ is non-singular. In other words, γ can
have any value which is not an eigenvalue of Q(x). Having made that change
of variable, multiplying the resulting equation by $S^{-1}$ on the right reduces it to
the form (22).
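Step 1 of the procedure, in the case where C is non-singular, might be sketched as follows with NumPy. The function name, the numerical tolerance used to exclude the eigenvalue at x = 1, and the two-phase sample model are assumptions made for illustration; a production implementation would also have to treat the singular cases described above.

```python
import numpy as np

def spectral_eigs(Q0, Q1, Q2, tol=1e-9):
    """Eigenvalues of Q(x) strictly inside the unit disk and their left eigenvectors,
    computed through the linear form (23). Assumes Q2 is non-singular."""
    n = Q0.shape[0]
    C_inv = np.linalg.inv(Q2)
    H0, H1 = Q0 @ C_inv, Q1 @ C_inv
    companion = np.block([[np.zeros((n, n)), -H0],
                          [np.eye(n), -H1]])
    # Left eigenvectors of the companion matrix are right eigenvectors of its transpose.
    x, W = np.linalg.eig(companion.T)
    inside = np.abs(x) < 1.0 - tol
    # The first n components of each vector [u, y] form the eigenvector u.
    return x[inside], W[:n, inside].T

# Hypothetical model with two phases, arrival rate 1 and service rate 2 in each phase.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.diag([1.0, 1.0])
C = np.diag([2.0, 2.0])
Q0, Q1, Q2 = B, A - np.diag((A + B + C).sum(axis=1)), C
xs, us = spectral_eigs(Q0, Q1, Q2)
print(xs)
```

For this sample model det[Q(x)] factorizes as (2x² − 5x + 1)(2x² − 3x + 1), so the eigenvalues inside the unit disk are (5 − √17)/4 ≈ 0.219 and 0.5. Their number equals N + 1 = 2, as Proposition 1 requires for a stable queue.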
The computational demands of step 2 may be high if the threshold M is large.
However, if the matrices Bj (j = 0, 1, . . . , M −1) are non-singular (which is often
the case in practice), then the vectors vM −1 , vM −2 , . . . , v0 can be expressed in
terms of vM and vM +1 , with the aid of equations (11) for j = M, M − 1, . . . , 1.
One is then left with equations (11) for j = 0, plus (13) (a total of N + 1
independent linear equations), for the N + 1 unknowns αk .
Having determined the coefficients in the expansion (19) and the probabilities
pi,j for j < N , it is easy to compute performance measures. The steady-state
probability that the environment is in state i is given by

$$ p_{i,\cdot} = \sum_{j=0}^{M-1} p_{i,j} + \sum_{k=1}^{N+1} \frac{\alpha_k u_{k,i}}{1 - x_k} , \qquad (24) $$
where uk,i is the i th element of uk .

The conditional average number of jobs in the system, Li , given that the
environment is in state i, is obtained from
$$ L_i = \frac{1}{p_{i,\cdot}} \left[ \sum_{j=1}^{M-1} j\, p_{i,j} + \sum_{k=1}^{N+1} \alpha_k u_{k,i} \frac{M - (M-1)x_k}{(1 - x_k)^2} \right] . \qquad (25) $$
The overall average number of jobs in the system, L, is equal to

$$ L = \sum_{i=0}^{N} p_{i,\cdot} L_i . \qquad (26) $$
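Formulas (24)-(26) translate directly into array operations. The sketch below is one possible arrangement, assuming the boundary probabilities, the coefficients $\alpha_k$, the eigenvalues $x_k$ and the eigenvectors $u_k$ have already been computed; the function name `performance_measures` and the array layout are assumptions made for illustration only.

```python
import numpy as np

def performance_measures(p_low, alpha, x, U, M):
    """Evaluate (24), (25) and (26).

    p_low : array of shape (M, N+1), p_low[j, i] = p_{i,j} for j < M
    alpha : array of length N+1 with the expansion coefficients
    x     : array of length N+1 with the eigenvalues inside the unit disk
    U     : array of shape (N+1, N+1), row k is the left eigenvector u_k
    """
    alpha, x, U = np.asarray(alpha), np.asarray(x), np.asarray(U)
    w = alpha[:, None] * U                       # w[k, i] = alpha_k * u_{k,i}
    # (24): marginal probability of environment state i
    p_env = p_low.sum(axis=0) + (w / (1 - x)[:, None]).sum(axis=0)
    # (25): conditional mean number of jobs (the j = 0 term contributes nothing)
    j = np.arange(M)[:, None]
    tail = (w * ((M - (M - 1) * x) / (1 - x) ** 2)[:, None]).sum(axis=0)
    L_cond = ((j * p_low).sum(axis=0) + tail) / p_env
    # (26): overall mean number of jobs
    L = (p_env * L_cond).sum()
    # Complex eigenvalues occur in conjugate pairs, so imaginary parts cancel
    return np.real(p_env), np.real(L_cond), np.real(L)

# M/M/1 illustration (assumed): N = 0, M = 0, x_1 = rho, alpha_1 u_1 = 1 - rho
rho = 0.5
p_env, L_cond, L = performance_measures(np.zeros((0, 1)),
                                        np.array([1 - rho]),
                                        np.array([rho]),
                                        np.array([[1.0]]), M=0)
```

For this degenerate case the result should agree with the familiar M/M/1 mean $\rho/(1-\rho)$.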
The spectral expansion solution can also be used to provide simple estimates
of performance when the system is heavily loaded. The important observation
in this connection is that when the system approaches instability, the expansion
(19) is dominated by the eigenvalue with the largest modulus inside the unit
disk, xN +1 . That eigenvalue is always real. It can be shown that when the offered
load is high, the average number of jobs in the system is approximately equal to
xN +1 /(1 − xN +1 ).
3.1 Batch Arrivals and/or Departures

Consider now a Markov-modulated queue which is not a QBD process, i.e. one
where the queue size jumps may be bigger than 1. As before, the state of the
process at time t is described by the pair (It , Jt ), where It is the state of the
environment (the operational mode) and Jt is the number of jobs in the system.
The state space is the lattice strip {0, 1, . . . , N }×{0, 1, . . .}. The variable Jt may
jump by arbitrary, but bounded amounts in either direction. In other words, the
allowable transitions are:
(a) Phase transitions leaving the queue unchanged: from state (i, j) to state
(k, j) (0 ≤ i, k ≤ N ; i ≠ k), with rate aj (i, k);
(b) Transitions incrementing the queue by s: from state (i, j) to state (k, j + s)
(0 ≤ i, k ≤ N ; 1 ≤ s ≤ r1 ; r1 ≥ 1), with rate bj,s (i, k);
(c) Transitions decrementing the queue by s: from state (i, j) to state (k, j − s)
(0 ≤ i, k ≤ N ; 1 ≤ s ≤ r2 ; r2 ≥ 1), with rate cj,s (i, k),
provided of course that the source and destination states are valid.
Obviously, if r1 = r2 = 1 then this is a Quasi-Birth-and-Death process.
Denote by Aj = [aj (i, k)], Bj,s = [bj,s (i, k)] and Cj,s = [cj,s (i, k)], the tran-
sition rate matrices associated with (a), (b) and (c), respectively. There is a
threshold M , such that
Aj = A ; Bj,s = Bs ; Cj,s = Cs ; j ≥ M . (27)
Defining again the diagonal matrices DA , DBs and DCs , whose i th diagonal
element is equal to the i th row sum of A, Bs and Cs , respectively, the balance
equations for j > M + r1 can be written in a form analogous to (12):

$$ v_j \Big[ D^A + \sum_{s=1}^{r_1} D^{B_s} + \sum_{s=1}^{r_2} D^{C_s} \Big] = \sum_{s=1}^{r_1} v_{j-s} B_s + v_j A + \sum_{s=1}^{r_2} v_{j+s} C_s . \qquad (28) $$
Similar equations, involving Aj , Bj,s and Cj,s , together with the corresponding
diagonal matrices, can be written for j ≤ M + r1 .
As before, (28) can be rewritten as a vector difference equation, this time of
order r = r1 + r2 , with constant coefficients:

$$ \sum_{\ell=0}^{r} v_{j+\ell} Q_\ell = 0 ; \quad j \geq M . \qquad (29) $$
Here, $Q_\ell = B_{r_1 - \ell}$ for $\ell = 0, 1, \ldots, r_1 - 1$,
$$ Q_{r_1} = A - D^A - \sum_{s=1}^{r_1} D^{B_s} - \sum_{s=1}^{r_2} D^{C_s} , $$
and $Q_\ell = C_{\ell - r_1}$ for $\ell = r_1 + 1, r_1 + 2, \ldots, r_1 + r_2$.

The spectral expansion solution of this equation is obtained from the char-
acteristic matrix polynomial

$$ Q(x) = \sum_{\ell=0}^{r} Q_\ell x^\ell . \qquad (30) $$
The solution is of the form

$$ v_j = \sum_{k=1}^{c} \alpha_k u_k x_k^{j-M} ; \quad j = M, M + 1, \ldots , \qquad (31) $$
where $x_k$ are the eigenvalues of $Q(x)$ in the interior of the unit disk, $u_k$ are the
corresponding left eigenvectors, and $\alpha_k$ are constants ($k = 1, 2, \ldots, c$). These
constants, together with the probability vectors $v_j$ for $j < M$, are determined
with the aid of the state-dependent balance equations and the normalizing
equation.
There are now $(M + r_1)(N + 1)$ so-far-unused balance equations (the ones
where $j < M + r_1$), of which $(M + r_1)(N + 1) - 1$ are linearly independent, plus
one normalizing equation. The number of unknowns is $M(N + 1) + c$ (the vectors
$v_j$ for $j = 0, 1, \ldots, M - 1$, plus the $c$ constants $\alpha_k$). Hence, there is a unique
solution when $c = r_1(N + 1)$.
Proposition 2 The Markov-modulated queue has a steady-state distribution if,

and only if, the number of eigenvalues of Q(x) strictly inside the unit disk,
each counted according to its multiplicity, is equal to the number of states of the
Markovian environment, N + 1, multiplied by the largest arrival batch, r1 . Then,
assuming that the eigenvectors of multiple eigenvalues are linearly independent,
the spectral expansion solution of (28) has the form
$$ v_j = \sum_{k=1}^{r_1 (N+1)} \alpha_k u_k x_k^{j-M} ; \quad j = M, M + 1, \ldots . \qquad (32) $$
For computational purposes, the polynomial eigenvalue-eigenvector problem
of degree $r$ can be transformed into a linear one. For example, suppose that $Q_r$
is non-singular and multiply (29) on the right by $Q_r^{-1}$. This leads to the problem
$$ u \Big[ \sum_{\ell=0}^{r-1} H_\ell x^\ell + I x^r \Big] = 0 , \qquad (33) $$
where $H_\ell = Q_\ell Q_r^{-1}$. Introducing the vectors $y_\ell = x^\ell u$, $\ell = 1, 2, \ldots, r - 1$, one
obtains the equivalent linear form
$$ [u, y_1, \ldots, y_{r-1}] \begin{bmatrix} 0 & & & -H_0 \\ I & 0 & & -H_1 \\ & \ddots & \ddots & \vdots \\ & & I & -H_{r-1} \end{bmatrix} = x[u, y_1, \ldots, y_{r-1}] . $$
As in the quadratic case, if Qr is singular then the linear form can be achieved
by an appropriate change of variable.
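The degree-$r$ linearization, together with the eigenvalue count of Proposition 2, can be sketched as follows. This is an illustration only, assuming $Q_r$ is non-singular; the names `polynomial_left_eigs` and `count_inside_unit_disk`, the tolerance, and the scalar batch-arrival example (one environment state, $r_1 = 2$, $r_2 = 1$, assumed rates 1 and 0.5 for batches of size 1 and 2, service rate 4) are hypothetical.

```python
import numpy as np

def polynomial_left_eigs(Q):
    """Solve u * sum_l Q[l] x^l = 0 via the block companion form.

    Q is the list [Q_0, ..., Q_r] of (n x n) arrays; Q_r must be non-singular.
    Returns the r*n eigenvalues and a matrix whose k-th row is u_k.
    """
    r, n = len(Q) - 1, Q[0].shape[0]
    Qr_inv = np.linalg.inv(Q[r])
    comp = np.zeros((r * n, r * n))
    for l in range(r):
        comp[l * n:(l + 1) * n, (r - 1) * n:] = -Q[l] @ Qr_inv      # last block column: -H_l
        if l > 0:
            comp[l * n:(l + 1) * n, (l - 1) * n:l * n] = np.eye(n)  # identity below the diagonal
    x, W = np.linalg.eig(comp.T)      # left eigenvectors of comp
    return x, W[:n, :].T

def count_inside_unit_disk(x, tol=1e-9):
    """Number of eigenvalues strictly inside the unit disk (up to tol)."""
    return int(np.sum(np.abs(x) < 1 - tol))

# Scalar example: Q_0 = B_2, Q_1 = B_1, Q_2 = -(b1 + b2 + c), Q_3 = C_1
b1, b2, c = 1.0, 0.5, 4.0
Q = [np.array([[b2]]), np.array([[b1]]),
     np.array([[-(b1 + b2 + c)]]), np.array([[c]])]
x, U = polynomial_left_eigs(Q)
n_inside = count_inside_unit_disk(x)
```

With these assumed rates the mean arrival rate of jobs (2) is below the service rate (4), and the count `n_inside` should equal $r_1(N + 1) = 2$, in line with Proposition 2.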
Example: Checkpointing and Recovery

Consider the transaction processing system described in section 2.4. Here r1 = N
and r2 = 1 (the queue size is incremented by 1 when jobs arrive and by 1, 2, . . . , N
when faults occur; it is decremented by 1 when a transaction completes service).
The threshold M is equal to 0. The matrices A, Bs and Cs are given by:
$$ A_j = A = \begin{bmatrix} 0 & & & \\ 0 & 0 & & \\ \vdots & & \ddots & \\ \beta & 0 & \cdots & 0 \end{bmatrix} ; \quad j = 0, 1, \ldots . $$
The only transition which changes the environment, but not the queue, is the
establishment of a checkpoint in state (N, j).
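$$ B_{j,1} = B_1 = \begin{bmatrix} \lambda & & & & \\ \xi & \lambda & & & \\ 0 & 0 & \lambda & & \\ \vdots & & & \ddots & \\ 0 & 0 & \cdots & & \lambda \end{bmatrix} ; \quad j = 0, 1, \ldots . $$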
The queue size increases by 1 when a job arrives, causing a transition from (i, j)
to (i, j + 1), and also when a fault occurs in state (1, j); then the new state is
(0, j + 1).
$$ B_{j,2} = B_2 = \begin{bmatrix} 0 & & & \\ 0 & 0 & & \\ \xi & 0 & 0 & \\ \vdots & & & \ddots \\ 0 & 0 & \cdots & 0 \end{bmatrix} ; \quad j = 0, 1, \ldots . $$
The queue size increases by 2 when a fault occurs in state (2, j), causing a
transition to state (0, j + 2). The other Bs matrices have a similar form, until
$$ B_{j,N} = B_N = \begin{bmatrix} 0 & & & \\ 0 & 0 & & \\ \vdots & & \ddots & \\ \xi & 0 & \cdots & 0 \end{bmatrix} ; \quad j = 0, 1, \ldots . $$
There is only one matrix corresponding to decrementing the queue:
$$ C_{j,1} = C_1 = \begin{bmatrix} 0 & \mu & & & \\ & 0 & \mu & & \\ & & \ddots & \ddots & \\ & & & 0 & \mu \\ & & & & 0 \end{bmatrix} ; \quad j = 1, 2, \ldots . $$
The matrix polynomial Q(x) is of degree N + 1. According to Proposition

2, the condition for stability is that the number of eigenvalues in the interior of
the unit disk is N (N + 1).
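The matrices of this example, and the coefficients $Q_\ell$ of the characteristic polynomial, can be assembled mechanically. The sketch below assumes the matrix layouts shown above; the function name `checkpoint_model`, its argument names and the numerical values are illustrative assumptions.

```python
import numpy as np

def checkpoint_model(N, lam, xi, mu, beta):
    """Build A, B_1..B_N, C_1 and the coefficients Q_0..Q_{N+1} of Q(x)."""
    n = N + 1
    A = np.zeros((n, n))
    A[N, 0] = beta                        # checkpoint established in state (N, j)
    B = {}
    for s in range(1, N + 1):
        B[s] = np.zeros((n, n))
        B[s][s, 0] = xi                   # fault in state (s, j): new state (0, j + s)
    B[1] += lam * np.eye(n)               # arrival: (i, j) -> (i, j + 1)
    C1 = mu * np.eye(n, k=1)              # entries (i, i+1) of C_1, as in the matrix above
    # Diagonal matrix of total outgoing rates: D^A + sum_s D^{B_s} + D^{C_1}
    D = np.diag(A.sum(axis=1) + sum(Bs.sum(axis=1) for Bs in B.values()) + C1.sum(axis=1))
    # Q_l = B_{r1 - l} for l < r1, Q_{r1} = A - D, Q_{r1 + 1} = C_1, with r1 = N
    Q = [B[N - l] for l in range(N)] + [A - D, C1]
    return A, B, C1, Q

A, B, C1, Q = checkpoint_model(N=3, lam=1.0, xi=0.1, mu=5.0, beta=2.0)
degree = len(Q) - 1                       # degree of Q(x), here N + 1
```

Note that $C_1$ has a zero last row, so the leading coefficient of $Q(x)$ is singular in this example and the direct linearization (33) does not apply; a change of variable, as mentioned above, would be needed before calling a standard eigenvalue routine.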
References
1. J.A. Buzacott and J.G. Shanthikumar, Stochastic Models of Manufacturing Sys-

tems, Prentice-Hall, 1993.
2. J.N. Daigle and D.M. Lucantoni, Queueing systems having phase-dependent arrival
and service rates, in Numerical Solutions of Markov Chains, (ed. W.J. Stewart),
Marcel Dekker, 1991.
3. A.I. Elwalid, D. Mitra and T.E. Stern, Statistical multiplexing of Markov mod-
ulated sources: Theory and computational algorithms, Int. Teletraffic Congress,
1991.
4. M. Ettl and I. Mitrani, Applying spectral expansion in evaluating the performance
of multiprocessor systems, CWI Tracts (ed. O. Boxma and G. Koole), 1994.
5. H.R. Gail, S.L. Hantler and B.A. Taylor, Spectral analysis of M/G/1 type Markov
chains, RC17765, IBM Research Division, 1992.
6. I. Gohberg, P. Lancaster and L. Rodman, Matrix Polynomials, Academic Press,
1982.
7. W.K. Grassmann and S. Drekic, An analytical solution for a tandem queue with
blocking, Queueing Systems, 36, pp. 221–235, 2000.
8. B.R. Haverkort and A. Ost, Steady-State Analysis of Infinite Stochastic Petri Nets:
Comparing the Spectral Expansion and the Matrix-Geometric Method, Procs., 7th
Int. Workshop on Petri Nets and Performance Models, San Malo, 1997.
9. A. Jennings, Matrix Computations for Engineers and Scientists, Wiley, 1977.
10. A.G. Konheim and M. Reiser, A queueing model with finite waiting room and
blocking, JACM, 23, 2, pp. 328–341, 1976.
11. L. Kumar, M. Misra and I. Mitrani, Analysis of a Transaction System with Check-
pointing, Failures and Rollback, Computer Performance Evaluation (Eds T. Field,
P.G. Harrison and U. Harder), LNCS 2324, Springer, 2002.
12. G. Latouche, P.A. Jacobs and D.P. Gaver, Finite Markov chain models skip-free
in one direction, Naval Res. Log. Quart., 31, pp. 571–588, 1984.
13. I. Mitrani, Probabilistic Modelling, Cambridge University Press, 1998.
14. I. Mitrani, The Spectral Expansion Solution Method for Markov Processes on
Lattice Strips, Chapter 13 in Advances in Queueing, (Ed. J.H. Dshalalow), CRC
Press, 1995.
15. I. Mitrani and B. Avi-Itzhak, A many-server queue with service interruptions,
Operations Research, 16, 3, pp.628-638, 1968.
16. I. Mitrani and R. Chakka, Spectral expansion solution for a class of Markov mod-
els: Application and comparison with the matrix-geometric method, to appear in
Performance Evaluation, 1995.
17. I. Mitrani and D. Mitra, A spectral expansion method for random walks on semi-
infinite strips, IMACS Symposium on Iterative Methods in Linear Algebra, Brussels,
1991.
18. M.F. Neuts, Matrix Geometric Solutions in Stochastic Models, Johns Hopkins
University Press, 1981.
19. M.F. Neuts, Two queues in series with a finite intermediate waiting room, J. Appl.
Prob., 5, pp. 123–142, 1968.
20. M.F. Neuts and D.M. Lucantoni, A Markovian queue with N servers subject to
breakdowns and repairs, Management Science, 25, pp. 849–861, 1979.
21. N.U. Prabhu and Y. Zhu, Markov-modulated queueing systems, QUESTA, 5, pp.
215–246, 1989.
M/G/1-Type Markov Processes: A Tutorial∗
Alma Riska and Evgenia Smirni
Department of Computer Science

College of William and Mary
Williamsburg, VA 23187-8795
{riska,esmirni}@cs.wm.edu
Abstract. M/G/1-type processes are commonly encountered when

modeling modern complex computer and communication systems.
In this tutorial, we present a detailed survey of existing solution
methods for M/G/1-type processes, focusing on the matrix-analytic
methodology. From first principles and using simple examples, we derive
the fundamental matrix-analytic results and lay out recent advances.
Finally, we give an overview of an existing, state-of-the-art software tool
for the analysis of M/G/1-type processes.
Keywords: M/G/1-type processes; matrix analytic method; Markov

chains.
1 Introduction
Matrix analytic techniques, pioneered by Marcel Neuts [25,26], provide a frame-

work that is widely used for the exact analysis of a general and frequently encoun-
tered class of queueing models. In these models, the embedded Markov chains are
two-dimensional generalizations of elementary GI/M/1 and M/G/1 queues [13],
and their intersection, i.e., quasi-birth-death (QBD) processes. GI/M/1 and
M/G/1 Markov chains model systems with interarrival and service times char-
acterized, respectively, by general distributions rather than simple exponentials
and are often used as the modeling tool of choice in modern computer and com-
munication systems [24,30,35,6,18]. As a consequence, considerable effort has
been placed into the development of efficient matrix-analytic techniques for their
analysis [26,21,8,9,11,15]. Alternatively, GI/M/1 and M/G/1 Markov chains can
be analyzed by means of eigenvalues and eigenvectors [7].
The class of models that can be analyzed using M/G/1-type Markov chains
includes the important class of BMAP/G/1 queues, where the arrival process is
a batch Markovian arrival process (BMAP) [17,26,3]. Special cases of BMAPs in-
clude phase-type renewal processes (e.g., Erlang or Hyperexponential processes)
and non-renewal processes (e.g., the Markov modulated Poisson process). The
importance of BMAPs lies in their ability to be more effective and powerful
∗ This work has been supported by the National Science Foundation under grants
EIA-9974992, CCR-0098278, and ACI-0090221.

traffic models than the simple Poisson process or the batch Poisson process, as
they can effectively capture dependence and correlation, salient characteristics
of Internet traffic [27,12,33].
In this paper, we focus on the solution techniques for M/G/1-type Markov
chains. Neuts [25] defines various classes of infinite-state Markov chains with a
repetitive structure, whose state space¹ is partitioned into the boundary states
$\mathcal{S}^{(0)} = \{s_1^{(0)}, \ldots, s_m^{(0)}\}$ and the sets of states $\mathcal{S}^{(i)} = \{s_1^{(i)}, \ldots, s_n^{(i)}\}$, for $i \geq 1$, that
correspond to the repetitive portion of the chain. For the class of M/G/1-type
Markov chains, the infinitesimal generator $Q_{M/G/1}$ has upper block Hessenberg
form:
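$$ Q_{M/G/1} = \begin{bmatrix}
\hat{L} & \hat{F}^{(1)} & \hat{F}^{(2)} & \hat{F}^{(3)} & \hat{F}^{(4)} & \cdots \\
\hat{B} & L & F^{(1)} & F^{(2)} & F^{(3)} & \cdots \\
0 & B & L & F^{(1)} & F^{(2)} & \cdots \\
0 & 0 & B & L & F^{(1)} & \cdots \\
0 & 0 & 0 & B & L & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix} . \qquad (1) $$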
We use the letters “L”, “F”, and “B” to describe “local”, “forward”, and
“backward” transition rates, respectively, in relation to a set of states $\mathcal{S}^{(i)}$ for $i \geq 1$,
and a hat (“^”) for matrices related to $\mathcal{S}^{(0)}$.
For systems of the M/G/1-type, matrix analytic methods have been pro-
posed for the solution of the basic equation π · QM/G/1 = 0 [26], where π
is the (infinite) stationary probability vector of all states in the chain. Key to
the matrix-analytic methods is the computation of an auxiliary matrix called
G. Traditional solution methodologies for M/G/1-type processes compute the
stationary probability vector with a recursive function based on G. Iterative
algorithms are used to determine G [20,16].
Another class of Markov chains with repetitive structure that commonly oc-
curs in modeling of computer systems is the class of GI/M/1-type processes,
whose infinitesimal generator QGI/M/1 has a lower block Hessenberg form:
1
We use calligraphic letters to indicate sets (e.g., A), lower case boldface Roman or
Greek letters to indicate row vectors (e.g., a, α), and upper case boldface Roman
letters to indicate matrices (e.g., A). We use superscripts in parentheses or subscripts
to indicate family of related entities (e.g., A(1) , A1 ), and we extend the notation
to subvectors or submatrices by allowing sets of indices to be used instead of single
indices (e.g., a[A], A[A, B]). Vector and matrix elements are indicated using square
brackets (e.g., a[1], A[1, 2]). RowSum(·) indicates the diagonal matrix whose entry in
position (r, r) is the sum of the entries on the rth row of the argument (which can be
a rectangular matrix). Norm(·) indicates a matrix whose rows are normalized. 0 and
1 indicate a row vector or a matrix of 0’s, or a row vector of 1’s, of the appropriate
dimensions, respectively.
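$$ Q_{GI/M/1} = \begin{bmatrix}
\hat{L} & \hat{F} & 0 & 0 & 0 & \cdots \\
\hat{B}^{(1)} & L & F & 0 & 0 & \cdots \\
\hat{B}^{(2)} & B^{(1)} & L & F & 0 & \cdots \\
\hat{B}^{(3)} & B^{(2)} & B^{(1)} & L & F & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix} . \qquad (2) $$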
The solution of GI/M/1-type processes is significantly simpler than the solu-
tion of M/G/1-type processes because of the matrix geometric relation [25] that
exists among the stationary probabilities of sets S (i) for i ≥ 1. This property
leads to significant algebraic simplifications resulting in the very elegant matrix-
geometric solution technique that was pioneered by Neuts and that was later
popularized by Nelson in the early ’90s [23,24]. Key to the matrix-geometric so-
lution is a matrix called R which is used in the computation of the steady-state
probability vector and measures of interest.
Quasi-Birth-Death (QBD) processes are the intersection of M/G/1-type and
GI/M/1-type processes and their infinitesimal generator has the structure de-
picted in Eq.(3).
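$$ Q_{QBD} = \begin{bmatrix}
\hat{L} & \hat{F} & 0 & 0 & 0 & \cdots \\
\hat{B} & L & F & 0 & 0 & \cdots \\
0 & B & L & F & 0 & \cdots \\
0 & 0 & B & L & F & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix} . \qquad (3) $$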
Since QBDs are special cases of both M/G/1-type processes and GI/M/1-type
processes, either the matrix-analytic method or the matrix-geometric solution
can be used for their analysis. The matrix-geometric solution is the preferable one
because of its simplicity. Both matrices G and R are defined for QBD processes.
We direct the interested reader to [16] for recent advances on the analysis of
QBD processes.
Key to the solution of Markov chains of the M/G/1, GI/M/1, and QBD
types, is the existence of a repetitive structure, as illustrated in Eqs. (1), (2),
and (3), that allows for a certain recursive procedure to be applied for the compu-
tation of the stationary probability vector π (i) corresponding to S (i) for i ≥ 1.
It is this recursive relation that gives elegance to the solution for the case of
GI/M/1 (and consequently QBD) Markov chains, but results in unfortunately
more complicated mathematics for the case of the M/G/1-type.
The purpose of this tutorial is to shed light on the existing techniques
for the analysis of Markov chains of the M/G/1 type that are traditionally
considered not easy to solve. Our intention is to derive from first principles (i.e.,
global balance equations) the repetitive patterns that allow for their solution and
illustrate that the mathematics involved are less arduous than initially feared.
Our stated goals and outline of this tutorial are the following:
– Give an overview of the matrix-geometric solution of GI/M/1 and QBD
processes and establish from first principles why a geometric solution exists
(Section 2).
– Use first principles to establish the most stable recursive relation for the
case of M/G/1-type processes and essentially illustrate the absence of any
geometric relation among the steady state probabilities of sets S (i) , i ≥ 0,
for such chains (Section 3).
– Present an overview of the current state of the art of efficient solutions for
M/G/1-type processes (Section 4).
– State the stability conditions for M/G/1-type processes (Section 5).
– Summarize the features of an existing software tool that can provide M/G/1-
type solutions (Section 6).
Our aim is to make these results more accessible to performance modelers. We
do this by presenting simplified derivations that are often example driven and
by describing an existing tool for the solution of such processes.
2 Matrix Geometric Solutions for GI/M/1-Type and QBD Processes
In this section we give a brief overview² of the matrix geometric solution tech-
nique for GI/M/1-type and QBD processes. While QBDs fall under both the
M/G/1 and the GI/M/1-type cases, they are most commonly associated with
GI/M/1 processes because both can be solved using the very well-known
matrix geometric approach [25].
Key to the general solution for the generator of Eqs.(2) and (3) is the assump-
tion that a geometric relation³ holds among the stationary probability vectors
$\pi^{(i)}$ of states in $\mathcal{S}^{(i)}$ for $i \geq 1$:
$$ \forall i \geq 1, \quad \pi^{(i)} = \pi^{(1)} \cdot R^{i-1} , \qquad (4) $$
where, in the GI/M/1-type case, $R$ is the solution of the matrix equation
$$ F + R \cdot L + \sum_{k=1}^{\infty} R^{k+1} \cdot B^{(k)} = 0, \qquad (5) $$
and can be computed using iterative numerical algorithms. The above equation
is obtained from the balance equations of the repeating portion of the process,
i.e., starting from the third column of $Q_{GI/M/1}$. Using Eq.(4) and substituting
in the balance equation that corresponds to the second column of $Q_{GI/M/1}$, and
together with the normalization condition
$$ \pi^{(0)} \cdot 1^T + \pi^{(1)} \cdot \sum_{i=1}^{\infty} R^{i-1} \cdot 1^T = 1, \quad \text{i.e.,} \quad \pi^{(0)} \cdot 1^T + \pi^{(1)} \cdot (I - R)^{-1} \cdot 1^T = 1, $$
we obtain the following system of linear equations
$$ [\pi^{(0)}, \pi^{(1)}] \cdot \begin{bmatrix}
e & (\hat{L}^{(0)})^{\diamond} & \hat{F}^{(1)} \\
(I - R)^{-1} \cdot e & \big(\sum_{k=1}^{\infty} R^{k-1} \cdot \hat{B}^{(k)}\big)^{\diamond} & L + \sum_{k=1}^{\infty} R^{k} \cdot B^{(k)}
\end{bmatrix} = [1, 0], \qquad (6) $$
that yields a unique solution for $\pi^{(0)}$ and $\pi^{(1)}$. The symbol “$\diamond$” indicates that
we discard one (any) column of the corresponding matrix, since we added a
column representing the normalization condition. For $i \geq 2$, $\pi^{(i)}$ can be obtained
numerically from Eq. (4), but many useful performance metrics such as expected
system utilization, throughput, or queue length can be expressed explicitly in
closed form using $\pi^{(0)}$, $\pi^{(1)}$, and $R$ only (e.g., the average queue length is simply
given by $\pi^{(1)} \cdot (I - R)^{-2} \cdot 1^T$) [23].

² In this section and in the remainder of this tutorial we assume continuous time
Markov chains, or CTMCs, but our discussion applies just as well to discrete time
Markov chains, or DTMCs.
³ This is similar to the simplest degenerate case of a QBD process, the straightforward
birth-death M/M/1 case.
In the case of QBD processes, Eq. (5) simply reduces to the matrix quadratic
equation
$$ F + R \cdot L + R^2 \cdot B = 0, $$
while $\pi^{(0)}$ and $\pi^{(1)}$ are obtained as the solution of the following system of linear
equations [24]:
$$ [\pi^{(0)}, \pi^{(1)}] \cdot \begin{bmatrix}
e & (\hat{L}^{(0)})^{\diamond} & \hat{F}^{(1)} \\
(I - R)^{-1} \cdot e & (\hat{B})^{\diamond} & L + R \cdot B
\end{bmatrix} = [1, 0]. $$
Again, the average queue length is given by the same equation as in the GI/M/1
case.
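The QBD computation just described can be sketched in a few lines. The code below is a minimal illustration, not a production solver: it uses plain successive substitution for $R$ (one of many possible iterative schemes, chosen here for brevity), and the function name `solve_qbd`, the tolerance and the M/M/1 test values are assumptions.

```python
import numpy as np

def solve_qbd(L_hat, F_hat, B_hat, L, F, B, tol=1e-14, max_iter=100_000):
    """Matrix-geometric solution of a QBD: returns R, pi0, pi1, mean queue length."""
    m, n = L_hat.shape[0], L.shape[0]
    # Successive substitution on F + R L + R^2 B = 0, starting from R = 0
    R = np.zeros((n, n))
    L_inv = np.linalg.inv(L)
    for _ in range(max_iter):
        R_next = -(F + R @ R @ B) @ L_inv
        converged = np.max(np.abs(R_next - R)) < tol
        R = R_next
        if converged:
            break
    inv_IR = np.linalg.inv(np.eye(n) - R)
    # Normalization column first, then the balance columns of levels 0 and 1
    top = np.hstack([np.ones((m, 1)), L_hat, F_hat])
    bottom = np.hstack([inv_IR @ np.ones((n, 1)), B_hat, L + R @ B])
    M = np.delete(np.vstack([top, bottom]), 1, axis=1)   # discard one balance column
    rhs = np.zeros(m + n)
    rhs[0] = 1.0
    z = np.linalg.solve(M.T, rhs)                        # row vector z with z M = [1, 0]
    pi0, pi1 = z[:m], z[m:]
    mean_queue = pi1 @ inv_IR @ inv_IR @ np.ones(n)
    return R, pi0, pi1, mean_queue

# M/M/1 as the degenerate QBD (assumed rates)
lam, mu = 1.0, 2.0
R, pi0, pi1, mean_queue = solve_qbd(
    np.array([[-lam]]), np.array([[lam]]), np.array([[mu]]),
    np.array([[-(lam + mu)]]), np.array([[lam]]), np.array([[mu]]))
```

For the M/M/1 case the computed $R$ should reduce to the scalar $\rho = \lambda/\mu$ and the mean queue length to $\rho/(1-\rho)$, which gives a quick sanity check of the linear system.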
2.1 Why Does a Geometric Relation Hold for QBD Processes?

There is a clear intuitive appeal to the fact that a geometric relation holds for
QBD processes. In this section, we first focus on the reasons for the existence of
this relationship via a simple example. Our first example is a QBD process that
models an M/Cox2 /1 queue. The state transition diagram of the CTMC that
models this queue is depicted in Figure 1. The state space S of this CTMC is
divided into subsets S (0) = {(0, 0)} and S (i) = {(i, 1), (i, 2)} for i ≥ 1, implying
that the stationary probability vector is also divided in the respective subvectors
π (0) = [π(0, 0)] and π (i) = [π(i, 1), π(i, 2)], for i ≥ 1. The block-partitioned
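infinitesimal generator $Q_{QBD}$ is an infinite block tridiagonal matrix as defined in
Eq.(3) and its component matrices are:
$$ \hat{L} = [-\lambda], \quad \hat{F} = [\lambda \;\; 0], \quad \hat{B} = \begin{bmatrix} 0.2\mu \\ \gamma \end{bmatrix}, $$
$$ B = \begin{bmatrix} 0.2\mu & 0 \\ \gamma & 0 \end{bmatrix}, \quad
L = \begin{bmatrix} -(\lambda + \mu) & 0.8\mu \\ 0 & -(\lambda + \gamma) \end{bmatrix}, \quad
F = \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}. \qquad (7) $$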
To illustrate the existence of the geometric relationship among the various
stationary probability vectors $\pi^{(i)}$, we use the concept of stochastic comple-
mentation [22]. A detailed summary of some important results on stochastic
complementation is presented in Appendix A. We partition the state space into
two subsets, $\mathcal{A} = \cup_{j=0}^{i} \mathcal{S}^{(j)}$ and $\overline{\mathcal{A}} = \cup_{j=i+1}^{\infty} \mathcal{S}^{(j)}$, i.e., a set with a finite number of
states and a set with an infinite number of states, respectively. The stochastic
complement of the states in $\mathcal{A}$ is a new Markov chain that “skips over” all states
in $\overline{\mathcal{A}}$. This Markov chain includes states in $\mathcal{A}$ only, but all transitions out of $\mathcal{S}^{(i)}$
(i.e., the boundary set) to $\mathcal{S}^{(i+1)}$ (i.e., the first set in $\overline{\mathcal{A}}$) need to be “folded back”
to $\mathcal{A}$ (see Figure 2). This folding introduces a new direct transition with rate $x$
that ensures that the stochastic complement of the states in $\mathcal{A}$ is a stand-alone
process. Because of the structure of this particular process, i.e., $\mathcal{A}$ is entered from
$\overline{\mathcal{A}}$ only through state $(i, 1)$, $x$ is simply equal to $\lambda$ (see Lemma 1 in Appendix
A). Furthermore, because of the repetitive structure of the original chain, this
rate does not depend on $i$ (which essentially defines the size of set $\mathcal{A}$).

[Fig. 1. The CTMC modeling an M/Cox2/1 queue.]

[Fig. 2. The CTMC of the stochastic complement of $\mathcal{A} = \cup_{j=0}^{i} \mathcal{S}^{(j)}$ of the CTMC
modeling an M/Cox2/1 queue.]
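The steady state probability vector $\overline{\pi} = [\overline{\pi}^{(0)}, \cdots, \overline{\pi}^{(i)}]$ of the stochastic
complement of the states in $\mathcal{A}$ relates to the steady state probability $\pi_{\mathcal{A}}$ of the
original process with: $\overline{\pi} = \pi_{\mathcal{A}} / (\pi_{\mathcal{A}} \cdot 1^T)$. This implies that if a relation exists
between $\overline{\pi}^{(i-1)}$ and $\overline{\pi}^{(i)}$, then the same relation holds for $\pi^{(i-1)}$ and $\pi^{(i)}$.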
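The flow balance equations for states $(i, 1)$ and $(i, 2)$ in the stochastic com-
plement of $\mathcal{A}$ are:
$$ (0.2\mu + 0.8\mu)\pi^{(i)}[1] = \lambda\pi^{(i-1)}[1] + \lambda\pi^{(i)}[2], $$
$$ (\gamma + \lambda)\pi^{(i)}[2] = \lambda\pi^{(i-1)}[2] + 0.8\mu\pi^{(i)}[1], $$
which can further be expressed as:
$$ -\mu\pi^{(i)}[1] + \lambda\pi^{(i)}[2] = -\lambda\pi^{(i-1)}[1], $$
$$ 0.8\mu\pi^{(i)}[1] - (\gamma + \lambda)\pi^{(i)}[2] = -\lambda\pi^{(i-1)}[2]. $$
This last set of equations leads us to the following matrix equation
$$ \big[\pi^{(i)}[1], \pi^{(i)}[2]\big] \begin{bmatrix} -\mu & 0.8\mu \\ \lambda & -(\gamma + \lambda) \end{bmatrix}
= -\big[\pi^{(i-1)}[1], \pi^{(i-1)}[2]\big] \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}, $$
which implies that the relation between $\pi^{(i-1)}$ and $\pi^{(i)}$ can be expressed as
$$ \pi^{(i)} = \pi^{(i-1)} R, \qquad (8) $$
where matrix $R$ is defined as
$$ R = -\begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} \cdot \begin{bmatrix} -\mu & 0.8\mu \\ \lambda & -(\gamma + \lambda) \end{bmatrix}^{-1}
= -F \left( L + F \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \right)^{-1} . \qquad (9) $$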
Applying Eq.(8) recursively, one can obtain the result of Eq.(4). Observe that
in this particular case an explicit computation of R is possible (i.e., there is no
need to compute R [28] via an iterative numerical procedure as in the general
case). This is a direct effect of the fact that in this example backward transitions
from S (i) to S (i−1) are directed toward a single state only. In Appendix B, we
give details on the cases when matrix R can be explicitly computed.
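The explicit form (9) can be checked numerically against the matrix quadratic equation of the QBD case. The following sketch uses assumed rates ($\lambda = 1$, $\mu = 5$, $\gamma = 4$, giving a utilization of 0.4) and a plain successive-substitution iteration for comparison; the variable names are illustrative.

```python
import numpy as np

lam, mu, gamma = 1.0, 5.0, 4.0      # assumed rates; utilization lam*(1/mu + 0.8/gamma) = 0.4

# Repeating blocks of the M/Cox2/1 example, Eq. (7)
F = lam * np.eye(2)
L = np.array([[-(lam + mu), 0.8 * mu],
              [0.0, -(lam + gamma)]])
B = np.array([[0.2 * mu, 0.0],
              [gamma, 0.0]])

# Explicit expression (9)
ones_first_col = np.array([[1.0, 0.0],
                           [1.0, 0.0]])
R_explicit = -F @ np.linalg.inv(L + F @ ones_first_col)

# Residual of the matrix quadratic equation F + R L + R^2 B = 0
residual = F + R_explicit @ L + R_explicit @ R_explicit @ B

# Iterative alternative: R <- -(F + R^2 B) L^{-1}, starting from R = 0
R_iter = np.zeros((2, 2))
L_inv = np.linalg.inv(L)
for _ in range(10_000):
    R_next = -(F + R_iter @ R_iter @ B) @ L_inv
    if np.max(np.abs(R_next - R_iter)) < 1e-14:
        R_iter = R_next
        break
    R_iter = R_next
```

For these rates the residual should vanish up to rounding and the iteration should converge to the same matrix, which is consistent with the remark that no iterative procedure is needed in this example.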
2.2 Generalization: Geometric Solution for the GI/M/1 Processes
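We generalize the finding in the previous example by considering a GI/M/1-
queue with infinitesimal generator $Q_{GI/M/1}$ similarly to the proof given in [14].
To evaluate the relation between $\pi^{(i-1)}$ and $\pi^{(i)}$ for $i > 1$, we construct the
stochastic complement of the states in $\mathcal{A} = \cup_{j=0}^{i} \mathcal{S}^{(j)}$ ($\overline{\mathcal{A}} = \mathcal{S} - \mathcal{A}$). The stochastic
complement of states in $\mathcal{A}$ has an infinitesimal generator defined by the following
relation
$$ \overline{Q} = Q[\mathcal{A}, \mathcal{A}] + Q[\mathcal{A}, \overline{\mathcal{A}}] \cdot (-Q[\overline{\mathcal{A}}, \overline{\mathcal{A}}])^{-1} \cdot Q[\overline{\mathcal{A}}, \mathcal{A}], $$
where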
where
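$$ Q[\mathcal{A}, \mathcal{A}] = \begin{bmatrix}
\hat{L} & \hat{F} & \cdots & 0 & 0 \\
\hat{B}^{(1)} & L & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\hat{B}^{(i-1)} & B^{(i-2)} & \cdots & L & F \\
\hat{B}^{(i)} & B^{(i-1)} & \cdots & B^{(1)} & L
\end{bmatrix}, \quad
Q[\mathcal{A}, \overline{\mathcal{A}}] = \begin{bmatrix}
0 & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \\
0 & 0 & 0 & 0 & \cdots \\
F & 0 & 0 & 0 & \cdots
\end{bmatrix}, $$
$$ Q[\overline{\mathcal{A}}, \mathcal{A}] = \begin{bmatrix}
\hat{B}^{(i+1)} & B^{(i)} & \cdots & B^{(1)} \\
\hat{B}^{(i+2)} & B^{(i+1)} & \cdots & B^{(2)} \\
\hat{B}^{(i+3)} & B^{(i+2)} & \cdots & B^{(3)} \\
\hat{B}^{(i+4)} & B^{(i+3)} & \cdots & B^{(4)} \\
\vdots & \vdots & & \vdots
\end{bmatrix}, \quad
Q[\overline{\mathcal{A}}, \overline{\mathcal{A}}] = \begin{bmatrix}
L & F & 0 & 0 & \cdots \\
B^{(1)} & L & F & 0 & \cdots \\
B^{(2)} & B^{(1)} & L & F & \cdots \\
B^{(3)} & B^{(2)} & B^{(1)} & L & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}. \qquad (10) $$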
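Observe that $Q[\overline{\mathcal{A}}, \overline{\mathcal{A}}]$ is the same matrix for any $i > 1$. We define its inverse
to be as follows
$$ (-Q[\overline{\mathcal{A}}, \overline{\mathcal{A}}])^{-1} = \begin{bmatrix}
A_{0,0} & A_{0,1} & A_{0,2} & A_{0,3} & \cdots \\
A_{1,0} & A_{1,1} & A_{1,2} & A_{1,3} & \cdots \\
A_{2,0} & A_{2,1} & A_{2,2} & A_{2,3} & \cdots \\
A_{3,0} & A_{3,1} & A_{3,2} & A_{3,3} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix} . \qquad (11) $$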
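From the special structure of $Q[\mathcal{A}, \overline{\mathcal{A}}]$ we conclude that the second term in the
summation that defines $\overline{Q}$ is a matrix with all block entries equal to zero except
the very last block row, whose block entries $X_j$ are of the form:
$$ X_j = F \cdot \sum_{k=0}^{\infty} A_{0,k} \hat{B}^{(j+1+k)} , \quad j = i $$
and
$$ X_j = F \cdot \sum_{k=0}^{\infty} A_{0,k} B^{(j+1+k)} , \quad 0 \leq j < i. $$
Note that $X_0 = F \cdot \sum_{k=0}^{\infty} A_{0,k} B^{(1+k)}$, which means that $X_0$ does not depend on
the value of $i > 1$. The infinitesimal generator $\overline{Q}$ of the stochastic complement
of states in $\mathcal{A}$ is determined as
$$ \overline{Q} = \begin{bmatrix}
\hat{L} & \hat{F} & \cdots & 0 & 0 \\
\hat{B}^{(1)} & L & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\hat{B}^{(i-1)} & B^{(i-2)} & \cdots & L & F \\
\hat{B}^{(i)} + X_i & B^{(i-1)} + X_{i-1} & \cdots & B^{(1)} + X_1 & L + X_0
\end{bmatrix} . \qquad (12) $$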
Let $\overline{\pi}$ be the stationary probability vector of the CTMC with infinitesimal gen-
erator $\overline{Q}$ and $\pi_{\mathcal{A}}$ the steady-state probability vector of the CTMC of states in
$\mathcal{A}$ in the original process, i.e., the process with infinitesimal generator $Q_{GI/M/1}$.
There is a linear relation between $\overline{\pi}$ and $\pi_{\mathcal{A}}$ given in the following equation:
$$ \overline{\pi} = \frac{\pi_{\mathcal{A}}}{\pi_{\mathcal{A}} \cdot 1^T} . \qquad (13) $$
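The relation between the stochastic complement and the original chain can be verified on a small finite CTMC. The sketch below is illustrative only: it uses a randomly generated 6-state generator (an assumption made purely for the demonstration), and the helper names `stationary` and `stochastic_complement` are hypothetical.

```python
import numpy as np

def stationary(Q):
    """Stationary vector of a finite irreducible CTMC: pi Q = 0, pi 1^T = 1."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])      # balance equations plus normalization
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def stochastic_complement(Q, idx_A):
    """Generator of the stochastic complement of the states listed in idx_A."""
    idx_A = np.asarray(idx_A)
    idx_rest = np.setdiff1d(np.arange(Q.shape[0]), idx_A)
    Q_AA = Q[np.ix_(idx_A, idx_A)]
    Q_AR = Q[np.ix_(idx_A, idx_rest)]
    Q_RA = Q[np.ix_(idx_rest, idx_A)]
    Q_RR = Q[np.ix_(idx_rest, idx_rest)]
    # Q[A,A] + Q[A,rest] (-Q[rest,rest])^{-1} Q[rest,A]
    return Q_AA + Q_AR @ np.linalg.solve(-Q_RR, Q_RA)

# Random irreducible generator on 6 states (illustrative data)
rng = np.random.default_rng(0)
Q = rng.uniform(0.1, 1.0, size=(6, 6))
np.fill_diagonal(Q, 0.0)
Q -= np.diag(Q.sum(axis=1))

idx_A = [0, 1, 2]
pi = stationary(Q)
pi_bar = stationary(stochastic_complement(Q, idx_A))
pi_A_normalized = pi[idx_A] / pi[idx_A].sum()
```

Here `pi_bar` and `pi_A_normalized` should coincide up to rounding, which is the finite-state analogue of the linear relation stated above.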
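Since $\overline{\pi} \, \overline{Q} = 0$, the last block column of $\overline{Q}$ gives the following relation
$$ \overline{\pi}^{(i)} \cdot (L + X_0) = -\overline{\pi}^{(i-1)} \cdot F $$
which, by Eq. (13), implies:
$$ \pi^{(i)} \cdot (L + X_0) = -\pi^{(i-1)} \cdot F. $$
The above equation holds for any $i > 1$, because its matrix coefficients do not
depend on $i$. Setting $R = -F \cdot (L + X_0)^{-1}$ and applying the equation recursively
over all vectors $\pi^{(i)}$ for $i > 1$, we obtain the following geometric relation
$$ \pi^{(i)} = \pi^{(1)} \cdot R^{i-1} \quad \forall \, i \geq 1. $$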
Matrix R, the geometric coefficient, has an important probabilistic interpre-

tation: the entry (k, l) of R is the expected time spent in the state l of S (i) ,
before the first visit into S (i−1) , expressed in time unit Δi , given the starting
state is k in S (i−1) . Δi is the mean sojourn time in the state k of S (i−1) for
i ≥ 2 [25, pages 30-35].
3 Why M/G/1 Processes Are More Difficult

For M/G/1-type processes there is no geometric relation among the various
probability vectors π (i) for i ≥ 1 as in the case of QBD and GI/M/1-type
processes. In this section, we first demonstrate via a simple example why such a
geometric relation does not exist and then we generalize and derive Ramaswami’s
recursive formula, i.e., the classic methodology for the solution of M/G/1 chains.
3.1 Example: A BMAP1/Cox2/1 Queue

Figure 3 illustrates a Markov chain that models a BMAP1/Cox2/1 queue. This
chain is very similar to the one depicted in Figure 1; the only difference is that
the new chain models bulk arrivals of unlimited size. The infinitesimal generator
$Q_{M/G/1}$ of the process is block partitioned according to the partitioning of the
state space $\mathcal{S}$ of this CTMC into subsets $\mathcal{S}^{(0)} = \{(0, 0)\}$ and $\mathcal{S}^{(i)} = \{(i, 1), (i, 2)\}$
for $i \geq 1$. The definition of the component matrices of $Q_{M/G/1}$ is as follows:

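$$ \hat{L} = [-2\lambda], \quad \hat{B} = \begin{bmatrix} 0.2\mu \\ \gamma \end{bmatrix}, \quad
\hat{F}^{(i)} = \begin{bmatrix} 0.5^{i-1}\lambda & 0 \end{bmatrix}, \; i \geq 1, $$
$$ B = \begin{bmatrix} 0.2\mu & 0 \\ \gamma & 0 \end{bmatrix}, \quad
L = \begin{bmatrix} -(2\lambda + \mu) & 0.8\mu \\ 0 & -(2\lambda + \gamma) \end{bmatrix}, \quad
F^{(i)} = \begin{bmatrix} 0.5^{i-1}\lambda & 0 \\ 0 & 0.5^{i-1}\lambda \end{bmatrix}, \; i \geq 1. $$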
[Fig. 3. The CTMC that models a BMAP1/Cox2/1 queue.]
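As a consistency check on the component matrices of this example, one can verify that every row of the generator sums to zero and evaluate the total rate with which a state jumps above a given level. The sketch below uses assumed rates and truncates the geometric batch-size sums at an assumed length `K`; all names are illustrative.

```python
import numpy as np

lam, mu, gamma = 1.0, 10.0, 8.0     # illustrative rates (assumed values)
K = 60                              # truncation of the batch-size sums; 0.5**K is negligible

def F_hat(i):
    """Forward block out of level 0 for a batch of size i."""
    return np.array([[0.5 ** (i - 1) * lam, 0.0]])

def F(i):
    """Forward block out of a level >= 1 for a batch of size i."""
    return 0.5 ** (i - 1) * lam * np.eye(2)

L_hat = np.array([[-2 * lam]])
B_hat = np.array([[0.2 * mu], [gamma]])
B = np.array([[0.2 * mu, 0.0], [gamma, 0.0]])
L = np.array([[-(2 * lam + mu), 0.8 * mu],
              [0.0, -(2 * lam + gamma)]])

# Row sums of the (truncated) generator for level 0, level 1 and levels >= 2
forward = sum(F(i).sum(axis=1) for i in range(1, K + 1))
row_level0 = L_hat.sum() + sum(F_hat(i).sum() for i in range(1, K + 1))
row_level1 = B_hat.sum(axis=1) + L.sum(axis=1) + forward
row_repeat = B.sum(axis=1) + L.sum(axis=1) + forward

def rate_above(i, k):
    """Total rate from a state of level k <= i to all levels above i."""
    return sum(0.5 ** (j - 1) * lam for j in range(i - k + 1, i - k + 1 + K))
```

The three row sums should vanish up to the truncation error, and `rate_above(i, k)` should be close to $2 \cdot 0.5^{i-k}\lambda$, the closed form of the geometric sum.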
In the following, we derive the relation between $\pi^{(i)}$ for $i \geq 1$ and the rest of
the vectors in $\pi$ using stochastic complementation, i.e., similarly to the approach
described in Section 2. First we partition the state space $\mathcal{S}$ into two partitions
$\mathcal{A} = \cup_{j=0}^{i} \mathcal{S}^{(j)}$ and $\overline{\mathcal{A}} = \cup_{j=i+1}^{\infty} \mathcal{S}^{(j)}$ and then we construct the stochastic comple-
ment of states in $\mathcal{A}$. The Markov chain of the stochastic complement of states in
$\mathcal{A}$ (see Figure 4) illustrates how transitions from states $(j, 1)$ and $(j, 2)$ for $j \leq i$
and state $(0, 0)$ to states $(l, 1)$ and $(l, 2)$ for $l > i$ are folded back to state $(i, 1)$,
which is the single state to enter $\mathcal{A}$ from states in $\overline{\mathcal{A}}$. These “back-folded” tran-
sitions are marked by $x_{k,h}$ for $k \leq i$ and $h = 1, 2$ and represent the “correction”
needed to make the stochastic complement of states in $\mathcal{A}$ a stand-alone process.
Because of the single entry state in $\mathcal{A}$, the stochastic complement of states in $\mathcal{A}$
for this example can be explicitly derived (see Lemma 1 in Appendix A) and the
definition of rates $x_{k,h}$ is as follows:
$$ x_{k,h} = 2 \cdot 0.5^{i-k} \lambda = 0.5^{i-k-1} \lambda, \quad i \geq 1, \; k \leq i, \; h = 1, 2. $$
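The flow balance equations for states $(i, 1)$ and $(i, 2)$ for the stochastic comple-
ment of states in $\mathcal{A}$ are:
$$ \begin{aligned}
(0.2\mu + 0.8\mu)\pi^{(i)}[1] = {} & 2\lambda\pi^{(i)}[2] + 2 \cdot 0.5^{i-1}\lambda\pi^{(0)}[1] \\
& + 2 \cdot 0.5^{i-2}\lambda\pi^{(1)}[1] + 0.5^{i-2}\lambda\pi^{(1)}[2] + \ldots \\
& + 2 \cdot 0.5^{i-i}\lambda\pi^{(i-1)}[1] + 0.5^{i-i}\lambda\pi^{(i-1)}[2]
\end{aligned} $$
and
$$ (2\lambda + \gamma)\pi^{(i)}[2] = 0.8\mu\pi^{(i)}[1] + 0.5^{i-2}\lambda\pi^{(1)}[2] + \ldots + 0.5^{i-i}\lambda\pi^{(i-1)}[2]. $$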
[Fig. 4. Stochastic complement of BMAP1/Cox2/1-type queue at level i.]
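In the above equalities we group the elements of $\pi^{(i)}$ on the left and the
rest of the terms on the right in order to express their relation in terms of
the block matrices that describe the infinitesimal generator $\overline{Q}$ of the stochastic
complement of states in $\mathcal{A}$. By rearranging the terms, the above equations can
be re-written as:
$$ \begin{aligned}
-\mu\pi^{(i)}[1] + 2\lambda\pi^{(i)}[2] = -\big( & 2 \cdot 0.5^{i-1}\lambda\pi^{(0)}[1] \\
& + 2 \cdot 0.5^{i-2}\lambda\pi^{(1)}[1] + 0.5^{i-2}\lambda\pi^{(1)}[2] + \ldots \\
& + 2 \cdot 0.5^{i-i}\lambda\pi^{(i-1)}[1] + 0.5^{i-i}\lambda\pi^{(i-1)}[2] \big)
\end{aligned} $$
and
$$ 0.8\mu\pi^{(i)}[1] - (2\lambda + \gamma)\pi^{(i)}[2] = -\big( 0.5^{i-2}\lambda\pi^{(1)}[2] + \ldots + 0.5^{i-i}\lambda\pi^{(i-1)}[2] \big). $$
We can now re-write the above equations in the following matrix equation form:
$$ \begin{aligned}
\big[\pi^{(i)}[1], \pi^{(i)}[2]\big] \cdot \begin{bmatrix} -\mu & 0.8\mu \\ 2\lambda & -(2\lambda + \gamma) \end{bmatrix}
= -\Big( & \pi^{(0)}[1] \begin{bmatrix} 2 \cdot 0.5^{i-1}\lambda & 0 \end{bmatrix} \\
& + \big[\pi^{(1)}[1], \pi^{(1)}[2]\big] \begin{bmatrix} 2 \cdot 0.5^{i-2}\lambda & 0 \\ 0.5^{i-2}\lambda & 0.5^{i-2}\lambda \end{bmatrix} + \ldots \\
& + \big[\pi^{(i-1)}[1], \pi^{(i-1)}[2]\big] \begin{bmatrix} 2 \cdot 0.5^{i-i}\lambda & 0 \\ 0.5^{i-i}\lambda & 0.5^{i-i}\lambda \end{bmatrix} \Big).
\end{aligned} $$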
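By substituting $[\pi^{(i)}[1], \pi^{(i)}[2]]$ with $\pi^{(i)}$ and expressing the coefficient
matrices in the above equation in terms of the component matrices of the
infinitesimal generator $\overline{Q}$ of the stochastic complement of states in $A$, we
obtain$^4$ (with the convention $G^0 = I$; in this example $G^k = G$ for all $k \ge 1$):
4 Recall that $\overline{\pi}$ is the stationary probability vector of the stochastic complement
of states in $A$ and $\pi[A]$ is the stationary probability vector of states in $A$ in
\[
\pi^{(i)} \cdot \Bigl(L + \sum_{j=1}^{\infty} F^{(j)} G^{j}\Bigr)
= -\Bigl(\pi^{(0)} \sum_{j=i}^{\infty} \hat{F}^{(j)} G^{\,j-i}
+ \pi^{(1)} \sum_{j=i-1}^{\infty} F^{(j)} G^{\,j-i+1} + \dots
+ \pi^{(i-1)} \sum_{j=1}^{\infty} F^{(j)} G^{\,j-1}\Bigr),
\]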
where $G$ is a matrix with the following structure:
\[ G = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}. \]
Note that at this point, we have introduced a new matrix, G, that has an im-
portant probabilistic interpretation. In this specific example, the matrix G can
be explicitly derived [28]. This is a direct outcome of the fact that all states in
set S (i) for i ≥ 1 return to the same single state in set S (i−1) . Equivalently,
the matrix B of the infinitesimal generator QM/G/1 has only a single column
different from zero.
3.2 Generalization: Derivation of Ramaswami’s Recursive Formula
In this section, we investigate the relation between π (i) for i > 1 and π (j) for
0 ≤ j < i for the general case in the same spirit as [34]. We construct the
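stochastic complementation of the states in $A = \cup_{j=0}^{i} S^{(j)}$ ($\overline{A} = S - A$). We
obtain
\[
Q[A,A] = \begin{bmatrix}
\hat{L} & \hat{F}^{(1)} & \cdots & \hat{F}^{(i-1)} & \hat{F}^{(i)} \\
\hat{B} & L & \cdots & F^{(i-2)} & F^{(i-1)} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & L & F^{(1)} \\
0 & 0 & \cdots & B & L
\end{bmatrix},
\qquad
Q[A,\overline{A}] = \begin{bmatrix}
\hat{F}^{(i+1)} & \hat{F}^{(i+2)} & \hat{F}^{(i+3)} & \cdots \\
F^{(i)} & F^{(i+1)} & F^{(i+2)} & \cdots \\
\vdots & \vdots & \vdots & \cdots \\
F^{(2)} & F^{(3)} & F^{(4)} & \cdots \\
F^{(1)} & F^{(2)} & F^{(3)} & \cdots
\end{bmatrix},
\]
\[
Q[\overline{A},A] = \begin{bmatrix}
0 & 0 & \cdots & 0 & B \\
0 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots
\end{bmatrix},
\qquad
Q[\overline{A},\overline{A}] = \begin{bmatrix}
L & F^{(1)} & F^{(2)} & F^{(3)} & \cdots \\
B & L & F^{(1)} & F^{(2)} & \cdots \\
0 & B & L & F^{(1)} & \cdots \\
0 & 0 & B & L & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}.
\]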
The stochastic complement for states in $A$ has an infinitesimal generator
defined as follows
\[ \overline{Q} = Q[A,A] + Q[A,\overline{A}] \cdot (-Q[\overline{A},\overline{A}])^{-1} \cdot Q[\overline{A},A]. \]
the original M/G/1 process. They relate to each other based on the equation
$\overline{\pi} = \pi[A]/(\pi[A]\mathbf{1}^T)$, which implies that any relation that holds among subvectors
$\overline{\pi}^{(j)}$ for $j \le i$ would hold for subvectors $\pi^{(j)}$ for $j \le i$ as well.
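Observe that $Q[\overline{A},\overline{A}]$ is the same matrix for any $i \ge 1$. We define its inverse to
be as follows
\[
(-Q[\overline{A},\overline{A}])^{-1} = \begin{bmatrix}
A_{0,0} & A_{0,1} & A_{0,2} & A_{0,3} & \cdots \\
A_{1,0} & A_{1,1} & A_{1,2} & A_{1,3} & \cdots \\
A_{2,0} & A_{2,1} & A_{2,2} & A_{2,3} & \cdots \\
A_{3,0} & A_{3,1} & A_{3,2} & A_{3,3} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}. \tag{14}
\]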
From the special structure of $Q[\overline{A},A]$ we conclude that the second term of the
above summation is a matrix with all block entries equal to zero except the very
last block column, whose block entries $X_j$ are of the form:
\[ X_i = \sum_{k=0}^{\infty} \hat{F}^{(i+1+k)} \cdot A_{k,0} \cdot B \]
and
\[ X_j = \sum_{k=0}^{\infty} F^{(j+1+k)} \cdot A_{k,0} \cdot B, \qquad 0 \le j < i. \]
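The infinitesimal generator $\overline{Q}$ of the stochastic complement of states in $A$ is
determined as
\[
\overline{Q} = \begin{bmatrix}
\hat{L} & \hat{F}^{(1)} & \cdots & \hat{F}^{(i-1)} & \hat{F}^{(i)} + X_i \\
\hat{B} & L & \cdots & F^{(i-2)} & F^{(i-1)} + X_{i-1} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & L & F^{(1)} + X_1 \\
0 & 0 & \cdots & B & L + X_0
\end{bmatrix}. \tag{15}
\]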
We define $\overline{\pi}$ to be the steady-state probability vector of the CTMC with
infinitesimal generator $\overline{Q}$ and $\pi_A$ the steady-state probability vector of the CTMC
with infinitesimal generator $Q_{M/G/1}$ corresponding to the states in $A$. There is a
linear relation between $\overline{\pi}$ and $\pi_A$:
\[ \overline{\pi} = \frac{\pi_A}{\pi_A \cdot \mathbf{1}^T}. \tag{16} \]
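From the relation $\overline{\pi}\,\overline{Q} = 0$, it follows that
\[
\overline{\pi}^{(i)} \cdot (L + X_0) = -\Bigl(\overline{\pi}^{(0)} \cdot (\hat{F}^{(i)} + X_i)
+ \sum_{j=1}^{i-1} \overline{\pi}^{(j)} \cdot (F^{(i-j)} + X_{i-j})\Bigr) \qquad \forall i \ge 1
\]
and
\[
\pi^{(i)} \cdot (L + X_0) = -\Bigl(\pi^{(0)} \cdot (\hat{F}^{(i)} + X_i)
+ \sum_{j=1}^{i-1} \pi^{(j)} \cdot (F^{(i-j)} + X_{i-j})\Bigr) \qquad \forall i \ge 1. \tag{17}
\]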
The above equation shows that there is no geometric relation between vectors
$\pi^{(i)}$ for $i \ge 1$; however, it provides a recursive relation for the computation of
the steady-state probability vector for M/G/1 Markov chains. In the following,
we further work on simplifying the expression of matrices $X_j$ for $0 \le j \le i$.
From the definition of the stochastic complementation (see Appendix A) we
know that an entry $[r,c]$ in $(-Q[\overline{A},\overline{A}])^{-1} \cdot Q[\overline{A},A]$ $^5$ represents the probability
that starting from state $r \in \overline{A}$ the process enters $A$ through state $c$. Since
$A$ is entered from $\overline{A}$ only through states in $S^{(i)}$ we can use the probabilistic
interpretation of matrix $G$ to figure out the entries in $(-Q[\overline{A},\overline{A}])^{-1} \cdot Q[\overline{A},A]$.
An entry $[r,c]$ in $G^j$ for $j > 0$ represents the probability that starting from
state $r \in S^{(i+j)}$ for $i > 0$ the process enters set $S^{(i)}$ through state $c$. It is
straightforward now to define
\[
(-Q[\overline{A},\overline{A}])^{-1} \cdot Q[\overline{A},A] = \begin{bmatrix}
0 & 0 & \cdots & 0 & G \\
0 & 0 & \cdots & 0 & G^{2} \\
0 & 0 & \cdots & 0 & G^{3} \\
\vdots & \vdots & & \vdots & \vdots
\end{bmatrix}. \tag{18}
\]
The above result simplifies the expression of $X_j$ as follows
\[
X_i = \sum_{k=1}^{\infty} \hat{F}^{(i+k)} \cdot G^{k}
\quad\text{and}\quad
X_j = \sum_{k=1}^{\infty} F^{(j+k)} \cdot G^{k}, \qquad 0 \le j < i. \tag{19}
\]
This is in essence Ramaswami’s recursive formula. We will return to this in

the following section after we elaborate on matrix G, its implications, and its
probabilistic interpretation.
4 General Solution Method for M/G/1
For the solution of M/G/1-type processes, several algorithms exist [2,20,26].

These algorithms first compute matrix $G$ as the solution of the following equation:
\[ B + LG + \sum_{i=1}^{\infty} F^{(i)} G^{i+1} = 0. \tag{20} \]
The matrix $G$ has an important probabilistic interpretation: an entry $(r,c)$ in $G$
expresses the conditional probability of the process first entering $S^{(i-1)}$ through
state $c$, given that it starts from state $r$ of $S^{(i)}$ [26, page 81]$^6$. Figure 5 illustrates
the relation of entries in $G$ for different paths of the process. From the
probabilistic interpretation of $G$ the following structural properties hold [26]
– if the M/G/1 process with infinitesimal generator $Q_{M/G/1}$ is recurrent then
$G$ is row-stochastic,
– to any zero column in matrix $B$ of the infinitesimal generator $Q_{M/G/1}$, there
is a corresponding zero column in matrix $G$.
5 Only the entries of the last block column of $(-Q[\overline{A},\overline{A}])^{-1} \cdot Q[\overline{A},A]$ are different
from zero.
6 The probabilistic interpretation of $G$ is the same for both DTMCs and CTMCs.
[Figure 5: sample paths across levels $S^{(0)}, S^{(1)}, \dots, S^{(i)}, S^{(i+1)}, S^{(i+2)}$, annotated with entries of $G$ and its powers such as $G[r,k]$, $G[k,h]$, $G[h,j]$, and $G^{2}[r,h]$.]
Fig. 5. Probabilistic interpretation of G.
The G matrix is obtained by solving iteratively Eq.(20). However, recent ad-
vances show that the computation of G is more efficient when displacement
structures are used based on the representation of M/G/1-type processes by
means of QBD processes [20,2,1,16]. The most efficient algorithm for the com-
putation of G is the cyclic reduction algorithm [2].
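As a rough illustration of the classic iterative solution of Eq.(20) (not the cyclic reduction algorithm of [2]), the sketch below runs the natural fixed-point iteration $G \leftarrow (-L)^{-1}(B + \sum_i F^{(i)} G^{i+1})$ on the $BMAP_1/Cox_2/1$ example. The numerical rates, the truncation point `J`, and the helper name `solve_G` are assumptions made only for this example; the blocks $B$, $L$, $F^{(j)}$ are read off the flow balance equations of Section 3.1.

```python
import numpy as np

# Illustrative rates (assumed); they satisfy 4*lam < mu*gam/(0.8*mu + gam).
lam, mu, gam = 0.1, 2.0, 1.0
J = 60  # truncation point of the F^(j) series (assumption: 0.5**J is negligible)

# Repeating blocks of the example.
B = np.array([[0.2 * mu, 0.0],
              [gam,      0.0]])
L = np.array([[-(mu + 2 * lam), 0.8 * mu],
              [0.0,            -(gam + 2 * lam)]])
F = [0.5 ** (j - 1) * lam * np.eye(2) for j in range(1, J + 1)]  # F[j-1] = F^(j)


def solve_G(B, L, F, tol=1e-13, max_iter=10_000):
    """Fixed-point iteration G <- (-L)^{-1} (B + sum_i F^(i) G^(i+1)), started at G = 0."""
    neg_L_inv = np.linalg.inv(-L)
    G = np.zeros_like(B)
    for _ in range(max_iter):
        # Horner-style evaluation of F^(1) + F^(2) G + ... + F^(J) G^(J-1).
        acc = np.zeros_like(B)
        for Fi in reversed(F):
            acc = Fi + acc @ G
        G_next = neg_L_inv @ (B + acc @ G @ G)
        if np.max(np.abs(G_next - G)) < tol:
            return G_next
        G = G_next
    raise RuntimeError("G iteration did not converge")


G = solve_G(B, L, F)
residual = B + L @ G + sum(Fi @ np.linalg.matrix_power(G, i + 2) for i, Fi in enumerate(F))
print(np.round(G, 6))
print("max |residual| =", np.max(np.abs(residual)))
```

For these rates the iterate should approach the explicit matrix $G$ of Section 3.1, with a zero second column matching the zero second column of $B$.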
4.1 Ramaswami’s Formula

From Eqs.(17) and (19) and the aid of matrix $G$, we derive Ramaswami’s recursive
formula [29], which is numerically stable because it entails only additions and
multiplications$^7$. Ramaswami’s formula defines the following recursive relation
among stationary probability vectors $\pi^{(i)}$ for $i \ge 0$:
\[
\pi^{(i)} = -\Bigl(\pi^{(0)} \hat{S}^{(i)} + \sum_{k=1}^{i-1} \pi^{(k)} S^{(i-k)}\Bigr) \bigl(S^{(0)}\bigr)^{-1} \qquad \forall i \ge 1, \tag{21}
\]
where, letting $F^{(0)} \equiv L$, matrices $\hat{S}^{(i)}$ and $S^{(i)}$ are defined as follows:
\[
\hat{S}^{(i)} = \sum_{l=i}^{\infty} \hat{F}^{(l)} G^{\,l-i}, \quad i \ge 1,
\qquad
S^{(i)} = \sum_{l=i}^{\infty} F^{(l)} G^{\,l-i}, \quad i \ge 0. \tag{22}
\]
Observe that the above auxiliary sums represent the last column in the
infinitesimal generator $\overline{Q}$ defined in Eq.(15). We can express them in terms of matrices
$X_i$ defined in Eq.(19) as follows:
\[
\hat{S}^{(i)} = \hat{F}^{(i)} + X_i, \quad i \ge 1,
\qquad
S^{(i)} = F^{(i)} + X_i, \quad i \ge 0.
\]
7 Subtractions in this type of formula present the possibility of numerical
instability [26,29].
Given the above definition of $\pi^{(i)}$ for $i \ge 1$ and the normalization condition, a
unique vector $\pi^{(0)}$ can be obtained by solving the following system of $m$ linear
equations, where $m$ is the cardinality of set $S^{(0)}$:
\[
\pi^{(0)} \Bigl[ \bigl(\hat{L} - \hat{S}^{(1)} (S^{(0)})^{-1} \hat{B}\bigr)^{\diamond} \;\Big|\;
\mathbf{1}^T - \sum_{i=1}^{\infty} \hat{S}^{(i)} \Bigl(\sum_{j=0}^{\infty} S^{(j)}\Bigr)^{-1} \mathbf{1}^T \Bigr] = [\mathbf{0} \mid 1], \tag{23}
\]
where the symbol “$\diamond$” indicates that we discard one (any) column of the
corresponding matrix, since we added a column representing the normalization
condition. Once $\pi^{(0)}$ is known, we can then iteratively compute $\pi^{(i)}$ for $i \ge 1$, stopping
when the accumulated probability mass is close to one. After this point, measures
of interest can be computed. Since the relation between $\pi^{(i)}$ for $i \ge 1$ is
not straightforward, computation of measures of interest requires generation of
the whole stationary probability vector.
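The procedure of this section can be sketched numerically for the $BMAP_1/Cox_2/1$ example. The code below is only an illustration under stated assumptions: the rates, the truncation `p` of the $F$-series, the number of generated `levels`, and the helper `backward_sums` are all choices made for this sketch, and the boundary blocks $\hat{L}$, $\hat{B}$, $\hat{F}^{(j)}$ are inferred from the flow balance equations of Section 3.1. The sums of Eq.(22) are accumulated backwards through $S^{(i)} = F^{(i)} + S^{(i+1)} G$.

```python
import numpy as np

# Assumed illustrative rates; 4*lam < mu*gam/(0.8*mu + gam), so the queue is stable.
lam, mu, gam = 0.1, 2.0, 1.0
p = 60          # number of F-blocks kept (truncation, an assumption)
levels = 400    # number of vectors pi^(1), ..., pi^(levels) to generate

# Boundary level has one state (m = 1); repeating levels have two states (n = 2).
L_hat = np.array([[-2 * lam]])
B_hat = np.array([[0.2 * mu], [gam]])
F_hat = [np.array([[0.5 ** (j - 1) * lam, 0.0]]) for j in range(1, p + 1)]
B = np.array([[0.2 * mu, 0.0], [gam, 0.0]])
L = np.array([[-(mu + 2 * lam), 0.8 * mu], [0.0, -(gam + 2 * lam)]])
F = [0.5 ** (j - 1) * lam * np.eye(2) for j in range(1, p + 1)]
G = np.array([[1.0, 0.0], [1.0, 0.0]])  # explicit G of this example


def backward_sums(blocks, G):
    """Return the sums of Eq.(22) for the given blocks via S^(i) = F^(i) + S^(i+1) G."""
    sums = [None] * len(blocks)
    acc = np.zeros_like(blocks[-1])
    for i in range(len(blocks) - 1, -1, -1):
        acc = blocks[i] + acc @ G
        sums[i] = acc
    return sums


S = backward_sums([L] + F, G)      # S[i] = S^(i), i = 0..p, with F^(0) = L
S_hat = backward_sums(F_hat, G)    # S_hat[i-1] = S_hat^(i), i = 1..p
S0_inv = np.linalg.inv(S[0])

# Eq.(23): balance columns (one discarded) next to the normalization column.
m = L_hat.shape[0]
balance = L_hat - S_hat[0] @ S0_inv @ B_hat
norm_col = np.ones(m) - sum(S_hat) @ np.linalg.inv(sum(S)) @ np.ones(2)
M = np.column_stack([balance[:, :m - 1], norm_col])
rhs = np.zeros(m)
rhs[-1] = 1.0
pi0 = np.linalg.solve(M.T, rhs)

# Eq.(21): pi^(i) = -(pi^(0) S_hat^(i) + sum_k pi^(k) S^(i-k)) (S^(0))^{-1}.
pi = [pi0]
for i in range(1, levels + 1):
    acc = pi0 @ S_hat[i - 1] if i <= p else np.zeros(2)
    for k in range(max(1, i - p), i):
        acc = acc + pi[k] @ S[i - k]
    pi.append(-acc @ S0_inv)

total = sum(v.sum() for v in pi)
print("pi^(0) =", pi0, " total mass =", total)
```

With these rates the utilization is $4\lambda(0.8\mu+\gamma)/(\mu\gamma) = 0.52$, so $\pi^{(0)}$ should come out as $0.48$, which gives a simple sanity check on the boundary system.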
4.2 Special Case: Explicit Computation of G
A special case of M/G/1-type processes occurs when B is a product of two

vectors, i.e., B = α·β. Assuming, without loss of generality, that β is normalized,
then G = 1T · β, i.e., it is derived explicitly [28,30].
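For this special case, $G = G^n$ for $n \ge 1$. This special structure of matrix
$G$ simplifies the form of matrices $\hat{S}^{(i)}$ for $i \ge 1$, and $S^{(i)}$ for $i \ge 0$ defined in
Eq.(22):
\[
\begin{aligned}
\hat{S}^{(i)} &= \hat{F}^{(i)} + \Bigl(\sum_{j=i+1}^{\infty} \hat{F}^{(j)}\Bigr) \cdot G, \qquad i \ge 1, \\
S^{(i)} &= F^{(i)} + \Bigl(\sum_{j=i+1}^{\infty} F^{(j)}\Bigr) \cdot G, \qquad i \ge 0, \quad F^{(0)} \equiv L.
\end{aligned} \tag{24}
\]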
The major gain of this special case is the fact that G does not need to be either
computed or fully stored.
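A small numerical sketch of this special case, using the $B$ block of the $BMAP_1/Cox_2/1$ example, is given below. The rates and the truncation `p` are assumptions made for illustration; the check at the end uses the fact that, when $G^n = G$, Eq.(20) can be written as $B + S^{(0)} G = 0$.

```python
import numpy as np

lam, mu, gam = 0.1, 2.0, 1.0   # assumed illustrative rates
p = 60                          # truncation of the F^(j) series (assumption)

# B = alpha * beta with a column vector alpha and a normalized row vector beta.
alpha = np.array([[0.2 * mu], [gam]])
beta = np.array([[1.0, 0.0]])
B = alpha @ beta

# Explicit G = 1^T * beta: every row of G equals beta.
G = np.ones((2, 1)) @ beta
assert np.allclose(G @ G, G)    # hence G^n = G for all n >= 1

# Eq.(24): S^(i) = F^(i) + (sum_{j>i} F^(j)) G, with F^(0) = L.
L = np.array([[-(mu + 2 * lam), 0.8 * mu], [0.0, -(gam + 2 * lam)]])
blocks = [L] + [0.5 ** (j - 1) * lam * np.eye(2) for j in range(1, p + 1)]
tail = np.zeros((2, 2))         # running sum of F^(j) for j > i
S = [None] * len(blocks)
for i in range(len(blocks) - 1, -1, -1):
    S[i] = blocks[i] + tail @ G
    tail = tail + blocks[i]

# Residual of Eq.(20) in the form B + S^(0) G.
print(np.max(np.abs(B + S[0] @ G)))
```

Only the row vector $\beta$ and the running tail sums need to be kept, which is the storage saving mentioned above.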
4.3 Fast FFT Ramaswami’s Formula
[19] gives an improved version of Ramaswami’s formula. Once π (0) is known using
Eq.(23), the stationary probability vector is computed using matrix-generating
functions associated with block triangular Toeplitz matrices8 . These matrix-
generating functions are computed efficiently by using fast Fourier transforms
(FFT).
The algorithm of the Fast FFT Ramaswami’s formula is based on the fact
that in practice it is not possible to store an infinite number of matrices to
express the M/G/1-type process. Assuming that only $p$ matrices can be stored
then the infinitesimal generator $Q_{M/G/1}$ has the following structure
\[
Q_{M/G/1} = \begin{bmatrix}
\hat{L} & \hat{F}^{(1)} & \hat{F}^{(2)} & \hat{F}^{(3)} & \hat{F}^{(4)} & \cdots & \hat{F}^{(p)} & 0 & 0 & \cdots \\
\hat{B} & L & F^{(1)} & F^{(2)} & F^{(3)} & \cdots & F^{(p-1)} & F^{(p)} & 0 & \cdots \\
0 & B & L & F^{(1)} & F^{(2)} & \cdots & F^{(p-2)} & F^{(p-1)} & F^{(p)} & \cdots \\
0 & 0 & B & L & F^{(1)} & \cdots & F^{(p-3)} & F^{(p-2)} & F^{(p-1)} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \\
0 & 0 & 0 & 0 & 0 & \cdots & F^{(1)} & F^{(2)} & F^{(3)} & \cdots \\
0 & 0 & 0 & 0 & 0 & \cdots & L & F^{(1)} & F^{(2)} & \cdots \\
0 & 0 & 0 & 0 & 0 & \cdots & B & L & F^{(1)} & \cdots \\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & B & L & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}. \tag{25}
\]
8 A Toeplitz matrix has equal elements in each of its diagonals allowing the use of
computationally efficient methods.
Because there are only $p$ matrices of type $\hat{F}^{(i)}$ and $F^{(i)}$, there are only $p$
sums of type $\hat{S}^{(i)}$ and $S^{(i)}$ to be computed. Therefore, the computation of $\pi^{(i)}$
for $i > 0$ using Ramaswami’s formula, i.e., Eq.(21), depends only on $p$ vectors
$\pi^{(j)}$ for $\max(0, i-p) \le j < i$. Define
\[ \tilde{\pi}^{(1)} = [\pi^{(1)}, \dots, \pi^{(p)}] \quad\text{and}\quad \tilde{\pi}^{(i)} = [\pi^{(p(i-1)+1)}, \dots, \pi^{(pi)}] \quad\text{for } i \ge 2. \tag{26} \]
The above definition simplifies the formalization of Ramaswami’s formula since
$\tilde{\pi}^{(i)}$ depends only on the values of $\tilde{\pi}^{(i-1)}$ for $i > 1$. If we apply Ramaswami’s
formula for vectors $\pi^{(1)}$ to $\pi^{(p)}$, we obtain the following equations
\[
\begin{aligned}
\pi^{(1)} &= -\pi^{(0)} \hat{S}^{(1)} (S^{(0)})^{-1} \\
\pi^{(2)} &= -(\pi^{(0)} \hat{S}^{(2)} + \pi^{(1)} S^{(1)}) (S^{(0)})^{-1} \\
\pi^{(3)} &= -(\pi^{(0)} \hat{S}^{(3)} + \pi^{(1)} S^{(2)} + \pi^{(2)} S^{(1)}) (S^{(0)})^{-1} \\
&\;\;\vdots \\
\pi^{(p)} &= -(\pi^{(0)} \hat{S}^{(p)} + \pi^{(1)} S^{(p-1)} + \dots + \pi^{(p-1)} S^{(1)}) (S^{(0)})^{-1}.
\end{aligned} \tag{27}
\]
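We rewrite the above equations in the following form:
\[
\begin{aligned}
\pi^{(1)} S^{(0)} &= -\pi^{(0)} \hat{S}^{(1)} \\
\pi^{(2)} S^{(0)} + \pi^{(1)} S^{(1)} &= -\pi^{(0)} \hat{S}^{(2)} \\
\pi^{(3)} S^{(0)} + \pi^{(2)} S^{(1)} + \pi^{(1)} S^{(2)} &= -\pi^{(0)} \hat{S}^{(3)} \\
&\;\;\vdots \\
\pi^{(p)} S^{(0)} + \pi^{(p-1)} S^{(1)} + \dots + \pi^{(1)} S^{(p-1)} &= -\pi^{(0)} \hat{S}^{(p)}.
\end{aligned} \tag{28}
\]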
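Define
\[
Y = \begin{bmatrix}
S^{(0)} & S^{(1)} & S^{(2)} & \cdots & S^{(p-1)} \\
0 & S^{(0)} & S^{(1)} & \cdots & S^{(p-2)} \\
0 & 0 & S^{(0)} & \cdots & S^{(p-3)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & S^{(0)}
\end{bmatrix}
\quad\text{and}\quad
b = \bigl[\hat{S}^{(1)}, \hat{S}^{(2)}, \hat{S}^{(3)}, \cdots, \hat{S}^{(p)}\bigr]. \tag{29}
\]
The set of equations in Eq.(28) can be written in a compact way by using the
definitions in Eq.(26) and Eq.(29):
\[ \tilde{\pi}^{(1)} = -\pi^{(0)} \cdot b \cdot Y^{-1}. \tag{30} \]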
We apply Ramaswami’s formula for all vectors $\pi^{(j)}$, $p(i-1)+1 \le j \le pi$, in $\tilde{\pi}^{(i)}$
for $i > 1$:
\[
\begin{aligned}
\pi^{(p(i-1)+1)} &= -(\pi^{(p(i-2)+1)} S^{(p)} + \dots + \pi^{(p(i-1))} S^{(1)}) (S^{(0)})^{-1} \\
\pi^{(p(i-1)+2)} &= -(\pi^{(p(i-2)+2)} S^{(p)} + \dots + \pi^{(p(i-1)+1)} S^{(1)}) (S^{(0)})^{-1} \\
\pi^{(p(i-1)+3)} &= -(\pi^{(p(i-2)+3)} S^{(p)} + \dots + \pi^{(p(i-1)+2)} S^{(1)}) (S^{(0)})^{-1} \\
&\;\;\vdots \\
\pi^{(p(i-1)+p)} &= -(\pi^{(p(i-2)+p)} S^{(p)} + \dots + \pi^{(p(i-1)+p-1)} S^{(1)}) (S^{(0)})^{-1}.
\end{aligned}
\]
These equations can be rewritten in the following form
\[
\begin{aligned}
\pi^{(p(i-1)+1)} S^{(0)} &= -(\pi^{(p(i-2)+1)} S^{(p)} + \dots + \pi^{(p(i-1))} S^{(1)}) \\
\pi^{(p(i-1)+2)} S^{(0)} + \pi^{(p(i-1)+1)} S^{(1)} &= -(\pi^{(p(i-2)+2)} S^{(p)} + \dots + \pi^{(p(i-1))} S^{(2)}) \\
\pi^{(p(i-1)+3)} S^{(0)} + \dots + \pi^{(p(i-1)+1)} S^{(2)} &= -(\pi^{(p(i-2)+3)} S^{(p)} + \dots + \pi^{(p(i-1))} S^{(3)}) \\
&\;\;\vdots \\
\pi^{(p(i-1)+p)} S^{(0)} + \dots + \pi^{(p(i-1)+1)} S^{(p-1)} &= -\pi^{(p(i-1))} S^{(p)}.
\end{aligned}
\]
The above set of equations can be written in a matrix form as
\[ \tilde{\pi}^{(i)} = -\tilde{\pi}^{(i-1)} \cdot Z Y^{-1}, \qquad i \ge 2, \tag{31} \]
where matrix $Y$ is defined in Eq.(29) and the definition of matrix $Z$ is given by
the following
\[
Z = \begin{bmatrix}
S^{(p)} & 0 & \cdots & 0 & 0 \\
S^{(p-1)} & S^{(p)} & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
S^{(2)} & S^{(3)} & \cdots & S^{(p)} & 0 \\
S^{(1)} & S^{(2)} & \cdots & S^{(p-1)} & S^{(p)}
\end{bmatrix}. \tag{32}
\]
The Fast Ramaswami’s Formula consists of the set of equations defined in

Eq.(30) and Eq.(31). The effectiveness of the representation of the Ramaswami’s
formula in the form of Eq.(30) and Eq.(31) comes from the special structure of
matrices Y and Z. The matrix Y is an upper block triangular Toeplitz ma-
trix and the matrix Z is a lower block triangular Toeplitz matrix. Using fast
Fourier transforms one can compute efficiently the inverse of a Toeplitz matrix
or the multiplication of a vector with a Toeplitz matrix [19]. Although the use
of Fourier transforms for matrix operations may result in numerical instabilities,
in numerous test cases the above algorithm has not experienced instability [19,
21].
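To give a feel for why the Toeplitz structure helps, the following sketch multiplies a row vector by an upper triangular Toeplitz matrix with the FFT. It is deliberately simplified and is not the algorithm of [19]: the blocks are assumed to be $1 \times 1$ (scalars), the data are random, and the function name `row_times_upper_toeplitz` is hypothetical.

```python
import numpy as np


def row_times_upper_toeplitz(x, first_row):
    """Return x @ Y, where Y is upper-triangular Toeplitz with the given first row.

    (x Y)_j = sum_{k <= j} x_k * first_row[j - k], i.e. a truncated convolution,
    which the FFT evaluates in O(p log p) instead of O(p^2).
    """
    p = len(x)
    n = 2 * p  # zero-pad so the circular convolution equals the linear one
    spectrum = np.fft.rfft(x, n) * np.fft.rfft(first_row, n)
    return np.fft.irfft(spectrum, n)[:p]


rng = np.random.default_rng(0)
p = 8
x = rng.random(p)
s = rng.random(p)   # plays the role of S^(0), S^(1), ..., S^(p-1) with 1x1 blocks

# Dense reference: Y[k, j] = s[j - k] for j >= k, zero below the diagonal.
Y = np.array([[s[j - k] if j >= k else 0.0 for j in range(p)] for k in range(p)])

fast = row_times_upper_toeplitz(x, s)
print(np.max(np.abs(fast - x @ Y)))
```

The same convolution idea carries over to block Toeplitz matrices by transforming each block entry separately, at the cost of more bookkeeping.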
4.4 ETAQA-M/G/1
Etaqa [31] is an aggregation-based technique that computes only $\pi^{(0)}$, $\pi^{(1)}$ and
$\sum_{i=2}^{\infty} \pi^{(i)}$ for an M/G/1-type process by solving a finite system of linear
equations. Distinctively from the classic techniques of solving M/G/1-type processes,
this method recasts the problem into that of solving a finite linear system in
$m + 2n$ unknowns, where $m$ is the number of states in the boundary portion of
the process and $n$ is the number of states in each of the repetitive “levels” of the
state space, and is able to obtain exact results. The ETAQA methodology uses
basic, well-known results for Markov chains. Assuming that the state space $S$ is
partitioned into sets $S^{(j)}$, $j \ge 0$, instead of evaluating the probability distribution
of all states in each $S^{(j)}$, we calculate the aggregate probability distribution
of $n$ classes of states $T^{(i)}$, $1 \le i \le n$, appropriately defined (see Figure 6).
[Figure 6: the levels $S^{(0)}, S^{(1)}, S^{(2)}, S^{(3)}, \dots$ of the infinite state space and the aggregate classes $T^{(1)}, T^{(2)}, \dots$ built from them.]
Fig. 6. Aggregation of an infinite S into a finite number of states.
The following theorem formalizes the basic ETAQA result.

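Theorem 1. [Etaqa] Given an ergodic CTMC with infinitesimal generator
$Q_{M/G/1}$ having the structure shown in Eq.(1), with stationary probability vector
$\pi = [\pi^{(0)}, \pi^{(1)}, \pi^{(2)}, \dots]$, the system of linear equations
\[ x \cdot X = [1, \mathbf{0}] \tag{33} \]
where $X \in \mathbb{R}^{(m+2n)\times(m+2n)}$ is defined as follows
\[
X = \begin{bmatrix}
\mathbf{1}^T & \hat{L} & \hat{F}^{(1)} - \sum_{i=3}^{\infty} \hat{S}^{(i)} \cdot G & \bigl(\sum_{i=2}^{\infty} \hat{F}^{(i)} + \sum_{i=3}^{\infty} \hat{S}^{(i)} \cdot G\bigr)^{\diamond} \\
\mathbf{1}^T & \hat{B} & L - \sum_{i=2}^{\infty} S^{(i)} \cdot G & \bigl(\sum_{i=1}^{\infty} F^{(i)} + \sum_{i=2}^{\infty} S^{(i)} \cdot G\bigr)^{\diamond} \\
\mathbf{1}^T & 0 & B - \sum_{i=1}^{\infty} S^{(i)} \cdot G & \bigl(\sum_{i=1}^{\infty} F^{(i)} + L + \sum_{i=1}^{\infty} S^{(i)} \cdot G\bigr)^{\diamond}
\end{bmatrix} \tag{34}
\]
(with “$\diamond$” denoting, as in Eq.(23), that one column of the corresponding matrix is discarded)
admits a unique solution $x = [\pi^{(0)}, \pi^{(1)}, \pi^{(*)}]$, where $\pi^{(*)} = \sum_{i=2}^{\infty} \pi^{(i)}$.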
The Etaqa approach is in the same spirit as the one presented in [5,4] for the
exact solution of a very limited class of QBD and M/G/1-type Markov chains,
but in distinct contrast to these works, the above theorem does not require any
restriction on the form of the chain’s repeating pattern, and thus can be applied to
any type of M/G/1 chain.
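As an illustration of Theorem 1, the sketch below assembles the matrix $X$ for the $BMAP_1/Cox_2/1$ example ($m = 1$, $n = 2$) and solves the resulting $5 \times 5$ system. The rates, the truncation `p`, the helper `backward_sums`, and the choice to discard the last column of the final block column are assumptions of this sketch; the blocks are inferred from the flow balance equations of Section 3.1, and the explicit $G$ of the example is used.

```python
import numpy as np

lam, mu, gam = 0.1, 2.0, 1.0   # assumed illustrative rates (stable queue)
p = 60                          # truncation of the F-series (assumption)

L_hat = np.array([[-2 * lam]])
B_hat = np.array([[0.2 * mu], [gam]])
F_hat = [np.array([[0.5 ** (j - 1) * lam, 0.0]]) for j in range(1, p + 1)]
B = np.array([[0.2 * mu, 0.0], [gam, 0.0]])
L = np.array([[-(mu + 2 * lam), 0.8 * mu], [0.0, -(gam + 2 * lam)]])
F = [0.5 ** (j - 1) * lam * np.eye(2) for j in range(1, p + 1)]
G = np.array([[1.0, 0.0], [1.0, 0.0]])
m, n = 1, 2


def backward_sums(blocks, G):
    """S^(i) = F^(i) + S^(i+1) G, evaluated from the last stored block down."""
    out, acc = [], np.zeros_like(blocks[-1])
    for blk in reversed(blocks):
        acc = blk + acc @ G
        out.append(acc)
    return out[::-1]


S = backward_sums(F, G)          # S[i-1] = S^(i), i = 1..p
S_hat = backward_sums(F_hat, G)  # S_hat[i-1] = S_hat^(i), i = 1..p

# Block rows of X in Eq.(34); the last block column loses one column.
row0 = [np.ones((m, 1)), L_hat, F_hat[0] - sum(S_hat[2:]) @ G,
        (sum(F_hat[1:]) + sum(S_hat[2:]) @ G)[:, :-1]]
row1 = [np.ones((n, 1)), B_hat, L - sum(S[1:]) @ G,
        (sum(F) + sum(S[1:]) @ G)[:, :-1]]
row2 = [np.ones((n, 1)), np.zeros((n, m)), B - sum(S) @ G,
        (sum(F) + L + sum(S) @ G)[:, :-1]]
X = np.block([row0, row1, row2])

rhs = np.zeros(m + 2 * n)
rhs[0] = 1.0
x = np.linalg.solve(X.T, rhs)     # solves x X = [1, 0]
pi0, pi1, pi_star = x[:m], x[m:m + n], x[m + n:]
print(pi0, pi1, pi_star)
```

Only a single small linear system is solved here; no vector $\pi^{(i)}$ for $i \ge 2$ is ever generated, which is the point of the aggregation.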
Etaqa provides a recursive approach to compute metrics of interest once
$\pi^{(0)}$, $\pi^{(1)}$, and $\pi^{(*)}$ have been computed. We consider measures that can be
expressed as the expected reward rate:
\[ r = \sum_{j=0}^{\infty} \sum_{i \in S^{(j)}} \rho_i^{(j)} \pi_i^{(j)}, \]
where $\rho_i^{(j)}$ is the reward rate of state $s_i^{(j)}$. For example, to compute the expected
queue length in steady state, where $S^{(j)}$ represents the system states with $j$
customers in the queue, we let $\rho_i^{(j)} = j$. To compute the second moment of the
queue length, we let $\rho_i^{(j)} = j^2$.
Since our solution approach obtains $\pi^{(0)}$, $\pi^{(1)}$, and $\sum_{j=2}^{\infty} \pi^{(j)}$, we rewrite $r$
as
\[ r = \pi^{(0)} \rho^{(0)T} + \pi^{(1)} \rho^{(1)T} + \sum_{j=2}^{\infty} \pi^{(j)} \rho^{(j)T}, \]
where $\rho^{(0)} = [\rho_1^{(0)}, \dots, \rho_m^{(0)}]$ and $\rho^{(j)} = [\rho_1^{(j)}, \dots, \rho_n^{(j)}]$, for $j \ge 1$. Then, we
must show how to compute the above summation without explicitly using the
values of $\pi^{(j)}$ for $j \ge 2$. We can do so if the reward rate of state $s_i^{(j)}$, for $j \ge 2$
and $i = 1, \dots, n$, is a polynomial of degree $k$ in $j$ with arbitrary coefficients
$a_i^{[0]}, a_i^{[1]}, \dots, a_i^{[k]}$:
\[ \forall j \ge 2,\ \forall i \in \{1, 2, \dots, n\}, \qquad \rho_i^{(j)} = a_i^{[0]} + a_i^{[1]} j + \dots + a_i^{[k]} j^k. \tag{35} \]
The definition of $\rho_i^{(j)}$ illustrates that the set of measures of interest that one can
compute includes any moment of the probability vector $\pi$ as long as the reward
rate of the $i$th state in each set $S^{(j)}$ has the same polynomial coefficients for all
$j \ge 2$.
We compute $\sum_{j=2}^{\infty} \pi^{(j)} \rho^{(j)T}$ as follows
\[
\begin{aligned}
\sum_{j=2}^{\infty} \pi^{(j)} \rho^{(j)T} &= \sum_{j=2}^{\infty} \pi^{(j)} \bigl(a^{[0]} + a^{[1]} j + \dots + a^{[k]} j^k\bigr)^T \\
&= r^{[0]} a^{[0]T} + r^{[1]} a^{[1]T} + \dots + r^{[k]} a^{[k]T},
\end{aligned} \tag{36}
\]
and the problem is reduced to the computation of $r^{[l]} = \sum_{j=2}^{\infty} j^l \pi^{(j)}$, for $l =
0, \dots, k$. We show how $r^{[k]}$, $k > 0$, can be computed recursively, starting from
$r^{[0]}$, which is simply $\pi^{(*)}$.
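\[
r^{[k]} \cdot \Bigl[\bigl(B + L + \sum_{j=1}^{\infty} F^{(j)}\bigr) \;\Big|\; \bigl(\sum_{j=1}^{\infty} j^k F^{(j)} - B\bigr) \mathbf{1}^T\Bigr] = [b^{[k]} \mid c^{[k]}], \tag{37}
\]
where the definitions of $b^{[k]}$, $c^{[k]}$, $\hat{F}^{[k,j,h]}$, and $F^{[k,j,h]}$ are as follows
\[
b^{[k]} = -\Bigl(\pi^{(0)} \hat{F}^{[k,1,1]} + \pi^{(1)} \bigl(2^k L + F^{[k,1,2]}\bigr) + \sum_{l=1}^{k} \binom{k}{l} r^{[k-l]} \bigl(L + F^{[l,1,1]}\bigr)\Bigr),
\]
\[
c^{[k]} = -\Bigl(\pi^{(0)} \sum_{j=2}^{\infty} j^k \hat{F}^{[0,j,0]} \mathbf{1}^T + \pi^{(1)} \sum_{j=1}^{\infty} (j+1)^k F^{[0,j,0]} \mathbf{1}^T + \sum_{l=1}^{k} \binom{k}{l} r^{[k-l]} \sum_{j=1}^{\infty} j^l F^{[0,j,0]} \mathbf{1}^T\Bigr),
\]
\[
\hat{F}^{[k,j,h]} = \sum_{l=j}^{\infty} (l+h)^k \cdot \hat{F}^{(l)}, \qquad F^{[k,j,h]} = \sum_{l=j}^{\infty} (l+h)^k \cdot F^{(l)}, \qquad j \ge 1.
\]
As an example, we consider $r^{[1]}$, which is used to compute measures such as the
first moment of the queue length. In this case,
\[
b^{[1]} = -\pi^{(0)} \sum_{j=1}^{\infty} (j+1) \hat{F}^{(j)} - \pi^{(1)} \Bigl(2L + \sum_{j=1}^{\infty} (j+2) F^{(j)}\Bigr) - \pi^{(*)} \Bigl(L + \sum_{j=1}^{\infty} (j+1) F^{(j)}\Bigr),
\]
\[
c^{[1]} = -\pi^{(0)} \sum_{j=2}^{\infty} j \hat{F}^{[0,j,0]} \mathbf{1}^T - \pi^{(1)} \sum_{j=1}^{\infty} (j+1) F^{[0,j,0]} \mathbf{1}^T - \pi^{(*)} \sum_{j=1}^{\infty} j F^{[0,j,0]} \mathbf{1}^T.
\]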
5 Conditions for Stability

We briefly review the conditions that enable us to assert that the CTMC
described by the infinitesimal generator $Q_{M/G/1}$ in Eq.(1) is stable, that is, admits
a probability vector satisfying $\pi Q_{M/G/1} = 0$ and $\pi \mathbf{1}^T = 1$.
First observe that the matrix $\tilde{Q} = B + L + \sum_{j=1}^{\infty} F^{(j)}$ is an infinitesimal
generator, since it has zero row sums and non-negative off-diagonal entries. The
conditions for stability depend on the irreducibility of matrix $\tilde{Q}$.
If $\tilde{Q}$ is irreducible then there exists a unique positive vector $\tilde{\pi}$ that satisfies
the equations $\tilde{\pi} \tilde{Q} = 0$ and $\tilde{\pi} \mathbf{1}^T = 1$. In this case, the stability condition for the
M/G/1-type process with infinitesimal generator $Q_{M/G/1}$ [26] is given by the
following inequality
\[
\tilde{\pi} \Bigl(L + \sum_{j=1}^{\infty} (j+1) F^{(j)}\Bigr) \mathbf{1}^T
= \tilde{\pi} \Bigl(L + \sum_{j=1}^{\infty} F^{(j)}\Bigr) \mathbf{1}^T + \tilde{\pi} \sum_{j=1}^{\infty} j F^{(j)} \mathbf{1}^T < 0.
\]
Since $B + L + \sum_{j=1}^{\infty} F^{(j)}$ is an infinitesimal generator, then
$(B + L + \sum_{j=1}^{\infty} F^{(j)}) \mathbf{1}^T = 0$. By substituting in the condition for stability the term
$(L + \sum_{j=1}^{\infty} F^{(j)}) \mathbf{1}^T$ with its equal $(-B \mathbf{1}^T)$, the condition for stability can be
re-written as:
\[ \tilde{\pi} \sum_{j=1}^{\infty} j F^{(j)} \mathbf{1}^T < \tilde{\pi} B \mathbf{1}^T. \tag{38} \]
As in the scalar case, the equality in the above relation results in a null-recurrent
CTMC.
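In the example of the $BMAP_1/Cox_2/1$ queue depicted in Figure 3 the
infinitesimal generator $\tilde{Q}$ and its stationary probability vector are
\[
\tilde{Q} = \begin{bmatrix} -0.8\mu & 0.8\mu \\ \gamma & -\gamma \end{bmatrix}
\quad\text{and}\quad
\tilde{\pi} = \Bigl[\frac{\gamma}{\gamma + 0.8\mu}, \frac{0.8\mu}{\gamma + 0.8\mu}\Bigr],
\]
while
\[ \sum_{j=1}^{\infty} j F^{(j)} = \begin{bmatrix} 4\lambda & 0 \\ 0 & 4\lambda \end{bmatrix}. \]
The stability condition is expressed as
\[
\Bigl[\frac{\gamma}{\gamma + 0.8\mu}, \frac{0.8\mu}{\gamma + 0.8\mu}\Bigr] \cdot
\begin{bmatrix} 4\lambda & 0 \\ 0 & 4\lambda \end{bmatrix} \cdot
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
<
\Bigl[\frac{\gamma}{\gamma + 0.8\mu}, \frac{0.8\mu}{\gamma + 0.8\mu}\Bigr] \cdot
\begin{bmatrix} 0.2\mu & 0 \\ \gamma & 0 \end{bmatrix} \cdot
\begin{bmatrix} 1 \\ 1 \end{bmatrix},
\]
which can be written in the following compact form
\[ 4\lambda < \frac{\mu \cdot \gamma}{0.8\mu + \gamma}. \]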
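The condition of Eq.(38) is easy to check numerically for the irreducible case. The following sketch does so for the example above; the numerical rates, the truncation `p`, and the helper names `stationary`, `is_stable`, and `blocks` are assumptions introduced only for illustration.

```python
import numpy as np


def stationary(Q):
    """Stationary vector of a small irreducible generator: solve pi Q = 0, pi 1 = 1."""
    n = Q.shape[0]
    A = np.vstack([Q.T[:-1], np.ones(n)])   # drop one balance equation, add normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def is_stable(B, L, F):
    """Check Eq.(38) for a truncated list F = [F^(1), ..., F^(p)]."""
    Q_tilde = B + L + sum(F)
    pi_tilde = stationary(Q_tilde)
    drift_up = pi_tilde @ sum((j + 1) * Fj for j, Fj in enumerate(F)) @ np.ones(len(B))
    drift_down = pi_tilde @ B @ np.ones(len(B))
    return drift_up < drift_down, drift_up, drift_down


mu, gam, p = 2.0, 1.0, 80   # assumed illustrative values; p truncates the series


def blocks(lam):
    """Repeating blocks of the example for a given arrival parameter lam."""
    B = np.array([[0.2 * mu, 0.0], [gam, 0.0]])
    L = np.array([[-(mu + 2 * lam), 0.8 * mu], [0.0, -(gam + 2 * lam)]])
    F = [0.5 ** (j - 1) * lam * np.eye(2) for j in range(1, p + 1)]
    return B, L, F


bound = mu * gam / (0.8 * mu + gam)       # closed form: stable iff 4*lam < bound
for lam in (0.1, 0.25):
    stable, up, down = is_stable(*blocks(lam))
    print(lam, stable, up, down, 4 * lam < bound)
```

For each tested rate the numerical verdict should agree with the closed-form inequality $4\lambda < \mu\gamma/(0.8\mu + \gamma)$.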
If $\tilde{Q}$ is reducible, then the stability condition is different. By identifying the
absorbing states in $\tilde{Q}$, its state space can be rearranged as follows
\[
\tilde{Q} = \begin{bmatrix}
C_1 & 0 & \cdots & 0 & 0 \\
0 & C_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & 0 \\
0 & 0 & \cdots & C_k & 0 \\
D_1 & D_2 & \cdots & D_k & D_0
\end{bmatrix}, \tag{39}
\]
where the blocks $C_h$ for $1 \le h \le k$ are irreducible infinitesimal generators.
Since the matrices $B$, $L$ and $F^{(i)}$ for $i \ge 1$ have non-negative off-diagonal
elements, they can be restructured similarly and have block components $B_{C_h}$,
$B_{D_l}$, $L_{C_h}$, $L_{D_l}$, and $F^{(i)}_{C_h}$, $F^{(i)}_{D_l}$ for $1 \le h \le k$, $0 \le l \le k$, and $i \ge 1$.
This implies that each of the sets $S^{(i)}$ for $i \ge 1$ is partitioned into subsets that
communicate only through the boundary portion of the process, i.e., states in
$S^{(0)}$. The stability condition in Eq.(38) should be satisfied by all the irreducible
blocks identified in Eq.(39) in order for the M/G/1-type process to be stable, as
summarized below:
\[ \tilde{\pi}^{(h)} \sum_{j=1}^{\infty} j F^{(j)}_{C_h} \mathbf{1}^T < \tilde{\pi}^{(h)} B_{C_h} \mathbf{1}^T \qquad \forall\, 1 \le h \le k. \tag{40} \]
6 MAMSolver: A Matrix-Analytic Methods Tool

In this section we briefly describe the MAMSolver [32] software tool$^9$ for the
solution of M/G/1-type, GI/M/1-type, and QBD processes. MAMSolver is a
collection of the most efficient solution methodologies for M/G/1-type, GI/M/1-type,
and QBD processes. In contrast to existing tools such as MAGIC [36] and
MGMTool [10], which provide solutions only for QBD processes, MAMSolver
provides implementations for the classical and the most efficient algorithms that
solve M/G/1-type, GI/M/1-type, and QBD processes. The solution provided by
MAMSolver consists of the stationary probability vector for the processes under
study, the queue length and queue length distribution, as well as probabilistic
indicators for the queueing model such as the caudal characteristic [16].
9 Available at http://www.cs.wm.edu/MAMSolver/
MAMSolver provides solutions for both DTMCs and CTMCs. The matrix-
analytic algorithms are defined in terms of matrices, making matrix manipula-
tions and operations the basic elements of the tool. The input to MAMSolver,
in the form of a structured text file, indicates the method to be used for the
solution and the finite set of matrices that accurately describe the process to be
solved. Several tests are performed within the tool to ensure that special cases
are treated separately and therefore efficiently.
To address possible numerical problems that may arise during matrix op-
erations, we use well known and heavily tested routines provided by the La-
pack and BLAS packages10 . Methods such as LU-decomposition, GMRES, and
BiCGSTAB are used for the solution of systems of linear equations.
The solution of QBD processes starts with the computation of the matrix
R using the logarithmic reduction algorithm [16]. However for completeness
we provide also the classical iterative algorithm. There are cases when G (and
R) can be computed explicitly [28]. We check if the conditions for the explicit
computation hold in order to simplify and speed up the solution. The available
solution methods for QBD processes are matrix-geometric and Etaqa.
The classic matrix geometric solution is implemented to solve GI/M/1 processes.
The algorithm goes first through the classic iterative procedure to compute
R (to our knowledge, there is no more efficient alternative). Then, it
computes the boundary part of the stationary probability vector. Since there
exists a geometric relation between vectors $\pi^{(i)}$ for $i \ge 1$, there is no need to
compute the whole stationary probability vector.
M/G/1 processes require the computation of matrix G. More effort has
been placed on efficient solution of M/G/1 processes. MAMSolver provides the
classic iterative algorithm, the cyclic-reduction algorithm, and the explicit one
for special cases. The stationary probability vector is computed recursively using
either Ramaswami’s formula or its fast FFT version. Etaqa is the other available
alternative for the solution of M/G/1 processes.
For a set of input and output examples and the source code of MAMSolver,
we point the interested reader to the tool’s website
http://www.cs.wm.edu/MAMSolver/.
7 Concluding Remarks
In this tutorial, we derived the basic matrix analytic results for the solution of
M/G/1-type Markov processes. Via simple examples and from first principles,
we illustrated why the solution of QBD and GI/M/1-type processes is simpler
than the solution of M/G/1-type processes. We direct the interested reader to
the two books of Neuts [25,26] for further reading as well as to the book of
Latouche and Ramaswami [16]. Our target was to present enough material for
a modeler to solve performance models with embedded Markov chains of the
M/G/1 form.
10 Available from http://www.netlib.org.
References
1. D. A. Bini and B. Meini. Using displacement structure for solving non-skip-free
M/G/1 type Markov chains. In A. Alfa and S. Chakravarthy, editors, Advances in
Matrix Analytic Methods for Stochastic Models, pages 17–37, Notable Publications
Inc. NJ, 1998.
2. D. A. Bini, B. Meini, and V. Ramaswami. Analyzing M/G/1 paradigms through
QBDs: the role of the block structure in computing the matrix G. In G. Latouche
and P. Taylor, editors, Advances in Matrix-Analytic Methods for Stochastic Models,
pages 73–86, Notable Publications Inc. NJ, 2000.
3. L. Breuer. Parameter estimation for a class of BMAPs. In G. Latouche and
P. Taylor, editors, Advances in Matrix-Analytic Methods for Stochastic Models,
pages 87–97, Notable Publications Inc. NJ, 2000.
4. G. Ciardo, A. Riska, and E. Smirni. An aggregation-based solution method for
M/G/1-type processes. In B. Plateau, W. J. Stewart, and M. Silva, editors, Numer-
ical Solution of Markov Chains, pages 21–40. Prensas Universitarias de Zaragoza,
Zaragoza, Spain, 1999.
5. G. Ciardo and E. Smirni. ETAQA: an efficient technique for the analysis of QBD
processes by aggregation. Performance Evaluation, vol. 36-37, pages 71–93, 1999.
6. J. N. Daigle and D. M. Lucantoni. Queueing systems having phase-dependent
arrival and service rates. In W. J. Stewart, editor, Numerical Solution of Markov
Chains, pages 179–215, Marcel Dekker, New York, 1991.
7. H. R. Gail, S. L. Hantler, and B. A. Taylor. Use of characteristic roots for solving
infinite state Markov chains. In W. K. Grassmann, editor, Computational Proba-
bility, pages 205–255, Kluwer Academic Publishers, Boston, MA, 2000.
8. W. K. Grassmann and D. A. Stanford. Matrix analytic methods. In W. K. Grass-
mann, editor, Computational Probability, pages 153–204, Kluwer Academic Pub-
lishers, Boston, MA, 2000.
9. D. Green. Lag correlation of approximating departure process for MAP/PH/1
queues. In G. Latouche and P. Taylor, editors, Advances in Matrix-Analytic Meth-
ods for Stochastic Models, pages 135–151, Notable Publications Inc. NJ, 2000.
10. B. Haverkort, A. Van Moorsel, and A. Dijkstra. MGMtool: A Performance Analysis
Tool Based on Matrix Geometric Methods. In R. Pooley, and J. Hillston, editors,
Modelling Techniques and Tools, pages 312–316, Edinburgh University Press, 1993.
11. D. Heyman and A. Reeves. Numerical solutions of linear equations arising in
Markov chain models. ORSA Journal on Computing, vol. 1 pages 52–60, 1989.
12. D. Heyman and D. Lucantoni. Modeling multiple IP traffic streams with rate
limits. In Proceedings of the 17th International Teletraffic Congress, Brazil, Dec.
2001.
13. L. Kleinrock. Queueing systems. Volume I: Theory, Wiley, 1975.
14. G. Latouche. A simple proof for the matrix-geometric theorem. Applied Stochastic
Models and Data Analysis, vol. 8, pages 25–29, 1992.
15. G. Latouche and G. W. Stewart. Numerical methods for M/G/1 type queues. In
G. W. Stewart, editor, Computations with Markov chains, pages 571–581, Kluwer
Academic Publishers, Boston, MA, 1995.
16. G. Latouche and V. Ramaswami. Introduction to Matrix Analytic Methods in
Stochastic Modeling. ASA-SIAM Series on Statistics and Applied Probability.
SIAM, Philadelphia, PA, 1999.
17. D. M. Lucantoni. The BMAP/G/1 queue: A tutorial. In L. Donatiello and R. Nel-
son, editors, Models and Techniques for Performance Evaluation of Computer and
Communication Systems, pages 330–358. Springer-Verlag, 1993.
18. D. M. Lucantoni. An algorithmic analysis of a communication model with retrans-
mission of flawed messages. Pitman, Boston, 1983.
19. B. Meini. An improved FFT-based version of Ramaswami’s formula. Comm.
Statist. Stochastic Models, vol. 13, pages 223–238, 1997.
20. B. Meini. Solving M/G/1 type Markov chains: Recent advances and applications.
Comm. Statist. Stochastic Models, vol. 14(1&2), pages 479–496, 1998.
21. B. Meini. Fast algorithms for the numerical solution of structured Markov chains.
Ph.D. Thesis, Department of Mathematics, University of Pisa, 1998.
22. C. D. Meyer. Stochastic complementation, uncoupling Markov chains, and the
theory of nearly reducible systems. SIAM Review, vol. 31(2) pages 240–271, June
1989.
23. R. Nelson. Matrix geometric solutions in Markov models: a mathematical tutorial.
Research Report RC 16777 (#742931), IBM T.J. Watson Res. Center, Yorktown
Heights, NY, Apr. 1991.
24. R. Nelson. Probability, Stochastic Processes, and Queueing Theory. Springer-
Verlag, 1995.
25. M. F. Neuts. Matrix-geometric solutions in stochastic models. Johns Hopkins
University Press, Baltimore, MD, 1981.
26. M. F. Neuts. Structured stochastic matrices of M/G/1 type and their applications.
Marcel Dekker, New York, NY, 1989.
27. B. F. Nielsen. Modeling long-range dependent and heavy-tailed phenomena by
matrix analytic methods. In Advances in Matrix-Analytic Methods for Stochastic
Models, G. Latouche and P. Taylor, editors, Notable Publications, pages 265–278,
2000.
28. V. Ramaswami and G. Latouche. A general class of Markov processes with explicit
matrix-geometric solutions. OR Spektrum, vol. 8, pages 209–218, Aug. 1986.
29. V. Ramaswami. A stable recursion for the steady state vector in Markov chains of
M/G/1 type. Comm. Statist. Stochastic Models, vol. 4, pages 183–263, 1988.
30. V. Ramaswami and J. L. Wang. A hybrid analysis/simulation for ATM perfor-
mance with application to quality-of-service of CBR traffic. Telecommunication
Systems, vol. 5, pages 25–48, 1996.
31. A. Riska and E. Smirni. An exact aggregation approach for M/G/1-type Markov
chains. In the Proceedings of the ACM International Conference on Measurement
and Modeling of Computer Systems (ACM SIGMETRICS ’02), pages 86–96, Ma-
rina Del Rey, CA, 2002.
32. A. Riska and E. Smirni. MAMSolver: a Matrix-analytic methods tools. In T. Field
et al. (editors), TOOLS 2002, LNCS 2324, pages 205–211, Springer-Verlag, 2002.
33. A. Riska, M. S. Squillante, S.-Z. Yu, Z. Liu, and L. Zhang. Matrix-analytic analysis
of a MAP/PH/1 queue fitted to web server data. 4th Conference on Matrix-
Analytic Methods (to appear), Adelaide, Australia, July 2002.
34. H. Schellhaas. On Ramaswami’s algorithm for the computation of the steady state
vector in Markov chains of M/G/1 type. Comm. Statist. Stochastic Models, vol. 6,
pages 541–550, 1990.
35. M. S. Squillante. Matrix-analytic methods: Applications, results and software tools.
In G. Latouche and P. Taylor, editors, Advances in Matrix-Analytic Methods for
Stochastic Models, Notable Publications Inc. NJ, 2000.
36. M. S. Squillante. MAGIC: A computer performance modeling tool based on matrix-
geometric techniques. In G. Balbo and G. Serazzi, editors, Computer Performance
Evaluation: Modeling Techniques and Tools, North-Holland, Amsterdam, pages
411–425, 1992.
Appendix A: Stochastic Complementation
Here, we briefly outline the concept of stochastic complementation [22]. While
[22] introduces the concept of stochastic complementation for DTMCs with finite
state spaces we define it instead for the infinite case, a straightforward extension,
and state the results in terms of CTMCs.
Partition the state space $S$ of an ergodic CTMC with infinitesimal generator
matrix $Q$ and stationary probability vector $\pi$, satisfying $\pi Q = 0$, into two
disjoint subsets, $A$ and $\overline{A}$.
Definition 1. [22] (Stochastic complement) The stochastic complement of $A$ is
\[ \overline{Q} = Q[A,A] + Q[A,\overline{A}] (-Q[\overline{A},\overline{A}])^{-1} Q[\overline{A},A], \tag{41} \]
where $(-Q[\overline{A},\overline{A}])^{-1}[r,c]$ represents the mean time spent in state $c \in \overline{A}$,
starting from state $r \in \overline{A}$, before reaching any state in $A$, and
$((-Q[\overline{A},\overline{A}])^{-1} Q[\overline{A},A])[r,c']$ represents the probability that, starting
from $r \in \overline{A}$, we enter $A$ through state $c'$. $\Box$
The stochastic complement $\overline{Q}$ is the infinitesimal generator of a new CTMC
which mimics the original CTMC but “skips over” states in $\overline{A}$. The following
theorem formalizes this concept.
Theorem 2. [22] The stochastic complement $\overline{Q}$ of $A$ is an infinitesimal generator
and is irreducible if $Q$ is. If $\alpha$ is its stationary probability vector satisfying
$\alpha \overline{Q} = 0$, then $\alpha = \mathrm{Norm}(\pi[A])$. $\Box$
This implies that the stationary probability distribution α of the stochastic
complement differs from the corresponding portion of the stationary distribution
of the original CTMC π[A] only by the constant π[A]1T , which represents the
probability of being in A in the original CTMC.
There are cases where we can take advantage of the special structure of the
CTMC and explicitly generate the stochastic complement of a set of states A. To
consider these cases, rewrite the definition of stochastic complement in Eq. (41)
as
\[
\bar{Q} = Q[A,A] + \mathrm{RowSum}(Q[A,\bar{A}])\,Z, \tag{42}
\]
where $Z = \mathrm{Norm}(Q[A,\bar{A}])\,(-Q[\bar{A},\bar{A}])^{-1}\,Q[\bar{A},A]$. The rth diagonal element of
$\mathrm{RowSum}(Q[A,\bar{A}])$ represents the rate at which the set $A$ is left from its rth
state to reach any of the states in $\bar{A}$, while the rth row of Z, which sums to one,
specifies how this rate should be redistributed over the states in $A$ when the
process eventually reenters it.
Lemma 1. (Single entry) If $A$ can be entered from $\bar{A}$ only through
a single state $c \in A$, the matrix Z defined in Eq. (42) is trivially
computable: it is a matrix of zeros except for its cth column, which contains
all ones. □
[Figure 7: panel (a), the chain on states a, b, c, d, e with rates α, β, γ, δ, λ, μ, ν, τ; panel (b), the reduced chain on states a, b, c.]
Fig. 7. Stochastic Complementation for a finite Markov chain.
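We choose the simple finite Markov chain depicted in Figure 7(a) to explain the
concept of stochastic complementation. The state space of this Markov chain is
S = {a, b, c, d, e}. We construct the stochastic complement of the states in set
$A = \{a, b, c\}$ ($\bar{A} = \{d, e\}$), as shown in Figure 7(b). The matrices used in Eq. (41)
for this example are:
\[
Q[A,A] = \begin{bmatrix} -(\alpha+\nu) & \alpha & 0 \\ \beta & -(\gamma+\beta) & 0 \\ \delta & 0 & -\delta \end{bmatrix},
\qquad
Q[A,\bar{A}] = \begin{bmatrix} 0 & \nu \\ \gamma & 0 \\ 0 & 0 \end{bmatrix},
\]
\[
Q[\bar{A},A] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & \tau \end{bmatrix},
\qquad
Q[\bar{A},\bar{A}] = \begin{bmatrix} -\mu & \mu \\ \lambda & -(\lambda+\tau) \end{bmatrix}.
\]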
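Observe that in this case one can apply Lemma 1 to trivially construct the
stochastic complement, since $A$ is entered from states in $\bar{A}$ only through state
c. There are only two transitions from states in $A$ to states in $\bar{A}$: the transition
with rate γ from state b to state d and the transition with rate ν from state a to
state e. These two transitions are folded back into $A$ through state c, which is
the single entry in $A$. The following derivation shows that, because of the special
single entry state, the two folded transitions have the original rates, γ and ν
respectively.
\[
Z = \mathrm{Norm}(Q[A,\bar{A}])\,(-Q[\bar{A},\bar{A}])^{-1}\,Q[\bar{A},A]
= \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 0 \end{bmatrix}
\cdot \begin{bmatrix} \frac{\lambda+\tau}{\mu\tau} & \frac{1}{\tau} \\ \frac{\lambda}{\mu\tau} & \frac{1}{\tau} \end{bmatrix}
\cdot \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & \tau \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},
\]
which further results in:
\[
\mathrm{RowSum}(Q[A,\bar{A}]) \cdot Z
= \begin{bmatrix} \nu & 0 & 0 \\ 0 & \gamma & 0 \\ 0 & 0 & 0 \end{bmatrix}
\cdot \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & \nu \\ 0 & 0 & \gamma \\ 0 & 0 & 0 \end{bmatrix}.
\]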
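The example above can be checked numerically. The following is a minimal sketch, not part of the original text: it evaluates Eq. (41) for the chain of Figure 7(a) with hypothetical numeric rates (the values of alpha, beta, and so on are arbitrary assumptions), and the helper name `stochastic_complement` is illustrative. It relies on the complement set {d, e} having exactly two states, so that a closed-form 2 × 2 inverse suffices.

```python
def stochastic_complement(q_aa, q_ab, q_ba, q_bb):
    """Evaluate Q[A,A] + Q[A,B] (-Q[B,B])^{-1} Q[B,A] when B has two states."""
    # Closed-form inverse of the 2x2 matrix M = -Q[B,B].
    m = [[-x for x in row] for row in q_bb]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    n = len(q_aa)
    # entry[r][c]: probability of re-entering A through state c, starting from r in B.
    entry = [[sum(inv[r][k] * q_ba[k][c] for k in range(2)) for c in range(n)]
             for r in range(2)]
    return [[q_aa[r][c] + sum(q_ab[r][k] * entry[k][c] for k in range(2))
             for c in range(n)] for r in range(n)]

# Hypothetical rates, chosen only for illustration.
alpha, beta, gamma, delta = 1.0, 2.0, 3.0, 4.0
lam, mu, nu, tau = 5.0, 6.0, 7.0, 8.0

Q_AA = [[-(alpha + nu), alpha, 0.0], [beta, -(gamma + beta), 0.0], [delta, 0.0, -delta]]
Q_AB = [[0.0, nu], [gamma, 0.0], [0.0, 0.0]]
Q_BA = [[0.0, 0.0, 0.0], [0.0, 0.0, tau]]
Q_BB = [[-mu, mu], [lam, -(lam + tau)]]

Q_bar = stochastic_complement(Q_AA, Q_AB, Q_BA, Q_BB)
for row in Q_bar:
    print(row)
```

With these rates the rates ν and γ reappear unchanged in the column of state c, as Lemma 1 predicts, and every row of the result sums to zero.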
Appendix B: Explicit Computation of R

QBD processes are defined as the intersection of M/G/1 and GI/M/1-type
processes. Hence, both matrix G (characteristic for M/G/1) and matrix R
(characteristic for GI/M/1) can be defined for a QBD process as solutions of the
following quadratic equations [16]:
\[ B + LG + FG^2 = 0, \qquad F + RL + R^2 B = 0. \]
If the matrix-geometric method is used to solve a QBD process, then the relation
between $\pi^{(i)}$ and $\pi^{(i-1)}$ for i > 1 is expressed in terms of R:
\[ \pi^{(i)} = \pi^{(i-1)} R. \]
If the matrix-analytic method is used, then the relation between $\pi^{(i)}$ and
$\pi^{(i-1)}$ is based on Ramaswami’s recursive formula:
\[ \pi^{(i)} = -\pi^{(i-1)} S^{(1)} (S^{(0)})^{-1}, \]
where $S^{(1)} = F$ and $S^{(0)} = L + FG$, i.e., the only auxiliary sums (see
Subsection 4.1) used in the solution of M/G/1 processes that are defined for a QBD
process. The above equations allow the derivation of the fundamental relation
between R and G [16, pages 137-8],
\[ R = -F(L + FG)^{-1}. \tag{43} \]
Obviously, for the case of QBD processes, knowing G (or R) implies a direct
computation of R (or G). Computing G is usually easier than computing R:
G’s computation is a prerequisite to the computation of R in the logarithmic
reduction algorithm, the most efficient algorithm to compute R [16]. If B can
be expressed as a product of two vectors
\[ B = \alpha \cdot \beta, \]
where, without loss of generality, β is assumed to be a normalized vector, then
G and R can be explicitly obtained as
\[ G = \mathbf{1} \cdot \beta, \qquad R = -F(L + F\,\mathbf{1} \cdot \beta)^{-1}. \]
Representative examples where the above condition holds are the queues
M/Cox/1, M/Hr/1, and M/Er/1, whose service processes are Coxian,
hyperexponential, and Erlang, respectively.
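The explicit form of G can be checked directly against the quadratic equation. The short derivation below is a sketch under two assumptions that the text leaves implicit: α is a column vector and β a row vector with β·1 = 1, and the repeating blocks of the generator have zero row sums, (B + L + F)1 = 0.

```latex
\begin{align*}
G^2 &= \mathbf{1}\beta\,\mathbf{1}\beta = \mathbf{1}(\beta\mathbf{1})\beta = \mathbf{1}\beta = G,\\
B + LG + FG^2 &= \alpha\beta + L\mathbf{1}\beta + F\mathbf{1}\beta
  = (\alpha + L\mathbf{1} + F\mathbf{1})\,\beta,\\
(B + L + F)\mathbf{1} = 0 &\;\Longrightarrow\;
  \alpha(\beta\mathbf{1}) + L\mathbf{1} + F\mathbf{1} = \alpha + L\mathbf{1} + F\mathbf{1} = 0 .
\end{align*}
```

Under these assumptions G = 1·β satisfies B + LG + FG² = 0, and substituting it into Eq. (43) gives the stated expression for R.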
An Algorithmic Approach to Stochastic Bounds
J.M. Fourneau and N. Pekergin
PRiSM, Université de Versailles Saint-Quentin en Yvelines,

45 Av. des Etats Unis, 78000 Versailles, France
Abstract. We present a new methodology based on stochastic ordering,
algorithmic derivation of simpler Markov chains, and numerical
analysis of these chains. The performance indices defined by reward functions
are stochastically bounded by reward functions computed on much
simpler or smaller Markov chains. This leads to an important reduction
in numerical complexity. Stochastic bounds are a promising method to
analyze QoS requirements. Indeed, it is sufficient to prove that a bound
of the real performance satisfies the guarantee.
1 Introduction
Since Plateau’s seminal work on composition and compact tensor representation
of Markov chains using Stochastic Automata Networks (SAN), we know how to
model Markov systems with interacting components and large state space [29,30,
31]. The main idea of the SAN approach is to decompose the system of interest
into its components and to model each component separately. Once this is done,
interactions and dependencies among components can be added to complete the
model. The stochastic matrix of the chain is obtained after summations and
Kronecker (or tensor) products of local components. The benefit of the SAN
approach is twofold. First, each component can be modeled much more easily than
the global system. Second, the space required to store the description of
components is in general much smaller than the explicit list of transitions, even in
a sparse representation. However, using this representation instead of the usual
sparse matrix form increases the time required for numerical analysis of the
chains [6,15,37,33]. Note that we are interested in performance indices R defined
as reward functions on the steady-state distribution (i.e. $R = \sum_i r(i)\pi(i)$) and
we do not try to compute transient measures. Thus the numerical part
of the analysis is mainly the computation of the steady-state distribution and
then the summation of the elementary rewards r(i) to obtain R. The first step
is in general the most difficult because of the memory space and time
requirements (see Stewart’s book [34] for an overview of usual numerical techniques for
Markov chains). The decomposition and tensor representation has been
generalized to other modeling formalisms as well: Stochastic Petri nets [13], Stochastic
Process Algebra [20]. So we now have several well-founded methods to model
complex systems using Markov chains with large state space.
Despite considerable work [7,12,15,37], the numerical analysis of Markov
chains is still a very difficult problem when the state space is too large or the
eigenvalues are badly distributed. Fortunately, while modeling high speed
networks, it is often sufficient to satisfy the requirements for the Quality of
Service (QoS) we expect. Exact values of the performance indices are not necessary
in this case and bounding some reward functions is often sufficient.
So, we advocate the use of stochastic bounds to prove that the QoS re-
quirements are satisfied. Our approach differs from sample path techniques and
coupling theorem applied to models transformation (see [27] for an example on
Fair Queueing delays comparison based on sample-paths), as we only consider
Markov chains and algorithmic operations on stochastic matrices. Assume that
we have to model a problem using a very large Markov chain. We need to compute
its steady-state distribution in order to obtain reward functions (for instance,
the cell loss rates for a finite capacity buffer). The key idea of the methodol-
ogy is to design a new chain such that the reward functions will be upper or
lower bounds of the exact reward functions. This new chain is an aggregated or
simplified model of the former one. These bounds and the simplification criteria
are based on some stochastic orderings applied to Markov processes (see Stoyan
[35] and other references therein). As we drastically reduced the state space or
the complexity of the analysis, we may now use numerical methods to efficiently
compute a bound of the rewards.
Several methods have been proposed to bound rewards : resolution of a linear
algebra problem and polyhedra properties by Courtois and Semal [8,9], Markov
Decision Process by Van Dijk [38] and various stochastic bounds (see [35,22,32]
and references therein). Here we present recent results based on stochastic orders
and structure-based algorithms combined with usual numerical techniques. Thus
the algorithms we present can be easily implemented inside software tools based
on Markov chains. Unlike former approaches which are either analytical or not
really constructive, this new approach is only based on simple algorithms. These
algorithms can always be applied, even if the bounds are sometimes not
accurate enough.
We survey the results in two steps : first how to obtain a bounding matrix and
a bound of the distributions and in a second step how to simplify the numerical
computations. We present several algorithms based on stochastic bounds and
structural properties of the chains and some examples to show the effectiveness
of the approach. In section 2, we define the “st” and “icx” stochastic orderings
and we give the fundamental theorem on the stochastic matrices. We also present
Vincent’s algorithm [1] which is the starting point of all the algorithms for the
“st” ordering presented here. Then we present, in section 3, several algorithms
for “st” bounds based on structures: upper-Hessenberg, lumpability, stochastic
complement [26], Class C Matrices. Section 4 is devoted to the analysis of a real
problem: the loss rates of a finite buffer with batch arrivals and modulation,
Head of Line service discipline and Pushout buffer management. Such systems
have been proposed for ATM networks [18]. The example here is only slightly
simplified to focus on the algorithmic aspects. The reduction algorithms we have
used on this example have divided the state space by ten in several analyses.
Finally, in section 5, we present some algorithms for “icx” ordering.
2 Strong Stochastic Bounds

For the sake of simplicity, we restrict ourselves to Discrete Time Markov Chains
(DTMC) with finite state space E = {1, . . . , n} but continuous-time models can
be considered after uniformization. Here we restrict ourselves to “st” stochastic
ordering. The definitions and results for “icx” ordering are presented in section
5. In the following, n will denote the size of matrix P and Pi,∗ will refer to row
i of P .
First, we give a brief overview on stochastic ordering for Markov chains and
we obtain a set of inequalities to imply bounds. Then we present a basic al-
gorithm proposed by Vincent and Abuamsha [1] and we explain some of its
properties.
2.1 A Brief Overview

Following [35], we define the strong stochastic ordering by the set of non-
decreasing functions or by matrix Kst .
\[
K_{st} = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 \\
1 & 1 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 1 & 1 & \cdots & 1
\end{bmatrix}
\]
Definition 1 Let X and Y be random variables taking values on a totally
ordered space. Then X is said to be less than Y in the strong stochastic sense, that
is, $X <_{st} Y$, if and only if $E[f(X)] \le E[f(Y)]$ for all non decreasing functions
f whenever the expectations exist.
If X and Y take values on the finite state space {1, 2, . . . , n} with p and
q as probability distribution vectors, then X is said to be less than Y in the
strong stochastic sense, that is, $X <_{st} Y$, if and only if
$\sum_{j=k}^{n} p_j \le \sum_{j=k}^{n} q_j$ for k = 1, 2, . . . , n, or briefly: $pK_{st} \le qK_{st}$.
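The finite-state characterization translates directly into a comparison of tail sums. The sketch below is illustrative only: the function names and the two example distributions are assumptions, and a small tolerance is added to absorb floating-point rounding.

```python
def tail_sums(p):
    """Return the vector p K_st: entry k is the mass on states k, k+1, ..., n."""
    acc, out = 0.0, []
    for x in reversed(p):
        acc += x
        out.append(acc)
    return out[::-1]

def st_leq(p, q, tol=1e-12):
    """True if p is smaller than q in the strong stochastic sense."""
    return all(a <= b + tol for a, b in zip(tail_sums(p), tail_sums(q)))

# Hypothetical distributions on three ordered states.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(st_leq(p, q), st_leq(q, p))
```

Here q puts more mass on the larger states, so p is st-smaller than q but not the reverse.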
Important performance indices such as average population, loss rates or tail
probabilities are non decreasing functions. Therefore, bounds on the distribution
imply bounds on these performance indices as well. It is important to know that
st-bounds are valid for the transient distributions as well. We do not use this
property as we are mainly interested in performance measures on the steady
state. To the best of our knowledge, the work of linking st-bounds and
numerical analysis for the computation of transient distributions has still to be done.
It has long been known that monotonicity [21] and comparability of
the one step transition probability matrices of time-homogeneous MCs yield
sufficient conditions for their stochastic comparison. This is the fundamental
result we use in our algorithms. First let us define the st-comparability of
matrices and st-monotonicity.
Definition 2 Let P and Q be two stochastic matrices. $P <_{st} Q$ if and only if
$P K_{st} \le Q K_{st}$. This can also be characterized as $P_{i,*} <_{st} Q_{i,*}$ for all i.
Definition 3 Let P be a stochastic matrix. P is st-monotone if and only if for
all u and v, if $u <_{st} v$ then $uP <_{st} vP$.
Fortunately, st-monotone matrices are completely characterized (this is not the
case for other orderings, see [4]).
Definition 4 Let P be a stochastic matrix. P is $<_{st}$-monotone if and only if
$K_{st}^{-1} P K_{st} \ge 0$ component-wise.
Thus we get:
Property 1 Let P be a stochastic matrix. P is st-monotone if and only if for
all i and all j > i, we have $P_{i,*} <_{st} P_{j,*}$.
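Definition 2 and Property 1 both reduce to row-by-row comparisons of tail sums. A minimal sketch follows; the function names and the matrices M and N are hypothetical, and only consecutive rows are compared for monotonicity, which is enough because the order is transitive.

```python
def tail_sums(row):
    """Row times K_st: entry k is the probability mass on states k, k+1, ..., n."""
    acc, out = 0.0, []
    for x in reversed(row):
        acc += x
        out.append(acc)
    return out[::-1]

def row_st_leq(u, v, tol=1e-12):
    return all(a <= b + tol for a, b in zip(tail_sums(u), tail_sums(v)))

def st_comparable(P, Q):
    """Definition 2: every row of P is st-smaller than the same row of Q."""
    return all(row_st_leq(p, q) for p, q in zip(P, Q))

def is_st_monotone(P):
    """Property 1, checked on consecutive rows."""
    return all(row_st_leq(P[i], P[i + 1]) for i in range(len(P) - 1))

M = [[0.5, 0.5, 0.0], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]   # rows increase
N = [[0.2, 0.3, 0.5], [0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]   # row 2 is below row 1
print(is_st_monotone(M), is_st_monotone(N))
```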
Theorem 1 Let X(t) and Y (t) be two DTMC and P and Q be their respective
stochastic matrices. Then X(t) <st Y (t), t > 0, if
• X(0) <st Y (0),
• st-monotonicity of at least one of the matrices holds,
• st-comparability of the matrices holds, that is, Pi,∗ <st Qi,∗ ∀i.
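Thus, assuming that P is not monotone, we obtain a set of inequalities on the
elements of Q:
\[
\begin{cases}
\sum_{k=j}^{n} P_{i,k} \le \sum_{k=j}^{n} Q_{i,k} & \forall\, i, j \\
\sum_{k=j}^{n} Q_{i,k} \le \sum_{k=j}^{n} Q_{i+1,k} & \forall\, i, j
\end{cases}
\tag{1}
\]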
2.2 Algorithms
It is possible to derive a set of equalities, instead of inequalities. These equalities
provide, once they have been ordered (in increasing order for i and in decreasing
order for j in system 2), a constructive way to design a stochastic matrix which
yields a stochastic bound.
\[
\begin{cases}
\sum_{k=j}^{n} Q_{1,k} = \sum_{k=j}^{n} P_{1,k} \\
\sum_{k=j}^{n} Q_{i+1,k} = \max\left(\sum_{k=j}^{n} Q_{i,k},\ \sum_{k=j}^{n} P_{i+1,k}\right) & \forall\, i, j
\end{cases}
\tag{2}
\]
The following algorithm [1] constructs an st-monotone upper bounding
DTMC Q for a given DTMC P. For the sake of simplicity, we use a full matrix
representation for P and Q. Stochastic matrices associated with real performance
evaluation problems are usually sparse, and the sparse matrix version of all the
algorithms we present here is straightforward. Note that due to the ordering of
the indices, the summations $\sum_{j=l}^{n} q_{i-1,j}$ and $\sum_{j=l+1}^{n} q_{i,j}$ are already computed
when we need them, and they can be stored to avoid computations. However,
we let them appear as summations to show the relations with inequalities 1.
Algorithm 1 Construction of the optimal st-monotone upper bounding
DTMC Q:
  $q_{1,n} = p_{1,n}$;
  for $i = 2, 3, \ldots, n$ do $q_{i,n} = \max(q_{i-1,n}, p_{i,n})$; od
  for $l = n-1, n-2, \ldots, 1$ do
    $q_{1,l} = p_{1,l}$;
    for $i = 2, 3, \ldots, n$ do
      $q_{i,l} = \max\left(\sum_{j=l}^{n} q_{i-1,j},\ \sum_{j=l}^{n} p_{i,j}\right) - \sum_{j=l+1}^{n} q_{i,j}$;
    od
  od
Definition 5 We denote by v(P) the matrix obtained after application of
Algorithm 1 to a stochastic matrix P.
First let us illustrate Algorithm 1 on a small matrix. We consider a 5 × 5
matrix P1 and we compute matrix Q and both steady-state distributions.
\[
P1 = \begin{bmatrix}
0.5 & 0.2 & 0.1 & 0.2 & 0.0 \\
0.1 & 0.7 & 0.1 & 0.0 & 0.1 \\
0.2 & 0.1 & 0.5 & 0.2 & 0.0 \\
0.1 & 0.0 & 0.1 & 0.7 & 0.1 \\
0.0 & 0.2 & 0.2 & 0.1 & 0.5
\end{bmatrix}
\qquad
Q = v(P1) = \begin{bmatrix}
0.5 & 0.2 & 0.1 & 0.2 & 0.0 \\
0.1 & 0.6 & 0.1 & 0.1 & 0.1 \\
0.1 & 0.2 & 0.5 & 0.1 & 0.1 \\
0.1 & 0.0 & 0.1 & 0.7 & 0.1 \\
0.0 & 0.1 & 0.1 & 0.3 & 0.5
\end{bmatrix}
\]
Their steady-state distributions are respectively $\pi_{P1}$ =
(0.180, 0.252, 0.184, 0.278, 0.106) and $\pi_Q$ = (0.143, 0.190, 0.167, 0.357, 0.143).
Their expectations are respectively 1.87 and 2.16 (we assume that the first state
has index 0 to compute the reward f(i) = i associated with the expectation).
Remember that the strong stochastic ordering implies that the expectation of f
on distribution $\pi_{P1}$ is smaller than the expectation of f on distribution $\pi_Q$ for
all non decreasing functions f.
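Algorithm 1 can be sketched in a few lines once system 2 is read as a running maximum of tail sums down the rows. This is a reformulation for illustration rather than a transcription of the listing: the function name is hypothetical, a full matrix representation is assumed, and the entries of Q are recovered by differencing consecutive tail sums.

```python
def tail_sums(row):
    acc, out = 0.0, []
    for x in reversed(row):
        acc += x
        out.append(acc)
    return out[::-1]

def st_monotone_upper_bound(P):
    """Smallest st-monotone upper bound v(P), built from tail sums (system 2)."""
    n = len(P)
    bound_tails = None
    Q = []
    for row in P:
        tails = tail_sums(row)
        if bound_tails is not None:
            # Row i+1 of Q must dominate both row i of Q and row i+1 of P.
            tails = [max(a, b) for a, b in zip(bound_tails, tails)]
        bound_tails = tails
        # Recover the probabilities by differencing consecutive tail sums.
        Q.append([tails[j] - (tails[j + 1] if j + 1 < n else 0.0) for j in range(n)])
    return Q

P1 = [[0.5, 0.2, 0.1, 0.2, 0.0],
      [0.1, 0.7, 0.1, 0.0, 0.1],
      [0.2, 0.1, 0.5, 0.2, 0.0],
      [0.1, 0.0, 0.1, 0.7, 0.1],
      [0.0, 0.2, 0.2, 0.1, 0.5]]

Q = st_monotone_upper_bound(P1)
for row in Q:
    print([round(x, 3) for x in row])
```

Up to rounding, the output agrees with the matrix Q = v(P1) given above.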
It may happen that matrix v(P) computed by Algorithm 1 is not irreducible,
even if P is irreducible. Indeed, due to the subtraction operation in the inner loops,
some elements of v(P) may be zero even if the elements with the same indices
in P are positive. We have derived a new algorithm which tries to keep almost all
transitions of P in matrix v(P) and we have proved a necessary and sufficient
condition on P to obtain an irreducible matrix (the proof of the theorem is
omitted for the sake of readability):
Theorem 2 Let P be an irreducible finite stochastic matrix. Matrix Q computed

from P by Algorithm 2 is irreducible if and only if every row of the lower triangle
of matrix P contains at least one positive element.
Even if matrix v(P ) is reducible, it has one essential class of states and the
last state belongs to that class. So it is still possible to compute the steady-
state distribution for this class. We do not prove the theorem but we present an
example of a matrix P 2 such that v(P 2) is reducible (i.e. states 0, 1 and 2 are
transient in matrix v(P 2)).
\[
P2 = \begin{bmatrix}
0.5 & 0.2 & 0.1 & 0.2 & 0.0 \\
0.1 & 0.7 & 0.1 & 0.0 & 0.1 \\
0.2 & 0.1 & 0.5 & 0.2 & 0.0 \\
0.0 & 0.0 & 0.0 & 0.7 & 0.3 \\
0.0 & 0.2 & 0.2 & 0.1 & 0.5
\end{bmatrix}
\qquad
Q = v(P2) = \begin{bmatrix}
0.5 & 0.2 & 0.1 & 0.2 & 0.0 \\
0.1 & 0.6 & 0.1 & 0.1 & 0.1 \\
0.1 & 0.2 & 0.5 & 0.1 & 0.1 \\
0.0 & 0.0 & 0.0 & 0.7 & 0.3 \\
0.0 & 0.0 & 0.0 & 0.5 & 0.5
\end{bmatrix}
\]
In the following, ε is an arbitrary positive value, and we assume that a
summation with a lower index larger than the upper index is 0.
Algorithm 2 Construction of an st-monotone upper bounding DTMC without
transition deletion:
  $q_{1,n} = p_{1,n}$;
  for $i = 2, 3, \ldots, n$ do $q_{i,n} = \max(q_{i-1,n}, p_{i,n})$; od
  for $l = n-1, n-2, \ldots, 1$ do
    $q_{1,l} = p_{1,l}$;
    for $i = 2, 3, \ldots, n$ do
      $q_{i,l} = \max\left(\sum_{j=l}^{n} q_{i-1,j},\ \sum_{j=l}^{n} p_{i,j}\right) - \sum_{j=l+1}^{n} q_{i,j}$;
      if ($q_{i,l} = 0$) and ($p_{i,l} > 0$) and ($\sum_{j=l+1}^{n} q_{i,j} < 1$) then
        $q_{i,l} = \epsilon \times \left(1 - \sum_{j=l+1}^{n} q_{i,j}\right)$
    od
  od
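A possible transcription of Algorithm 2 is sketched below. Several points are assumptions rather than part of the listing: the function name, the default value of eps, the tolerance used to test for zero in floating point, and a clamp that keeps an entry non-negative after an earlier column of the same row has been raised by the eps rule. The 3 × 3 matrix is a hypothetical example in which Algorithm 1 would delete the transition from state 2 to itself.

```python
def upper_bound_keep_transitions(P, eps=0.5, tol=1e-12):
    """Column-by-column st-monotone upper bound that tries to keep transitions of P."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    for l in range(n - 1, -1, -1):            # columns, from the last to the first
        Q[0][l] = P[0][l]
        for i in range(1, n):
            prev_tail = sum(Q[i - 1][l:])     # tail sum of the previous row of Q
            p_tail = sum(P[i][l:])            # tail sum of the current row of P
            assigned = sum(Q[i][l + 1:])      # mass already placed to the right
            # Assumption: 'assigned' is included in the max so that the entry
            # cannot become negative once the eps rule has been applied.
            Q[i][l] = max(prev_tail, p_tail, assigned) - assigned
            if Q[i][l] <= tol and P[i][l] > 0 and assigned < 1 - tol:
                Q[i][l] = eps * (1 - assigned)
    return Q

# Hypothetical example: with Algorithm 1 the entry (2, 2) of the bound would be 0.
P = [[0.4, 0.0, 0.6],
     [0.5, 0.3, 0.2],
     [0.3, 0.3, 0.4]]
Q_keep = upper_bound_keep_transitions(P)
for row in Q_keep:
    print([round(x, 3) for x in row])
```

The second row keeps a positive self-transition, at the price of a bound that is no longer the smallest one.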
2.3 Properties
Algorithm 1 has several interesting properties which can be proved using a max-
plus formulation [10] which appears clearly in equation 2.
Theorem 3 Algorithm 1 provides the smallest st-monotone upper bound for a
matrix P : i.e. if we consider U another st-monotone upper bounding DTMC for
P then Q <st U [1].
However bounds on the probability distributions may still be improved. The
former theorem only states that Algorithm 1 provides the smallest matrix. We
have developed new techniques to improve the accuracy of the bounds on the
steady-state π which are based on some transformations on P [10].
We have studied a linear transformation for stochastic matrices α(P, δ) =
(1−δ)I +δP , for δ ∈ (0, 1). This transformation has no effect on the steady-state
distribution but it has a large influence on the effect of Algorithm 1. We have
proved in [10] that if the given stochastic matrix is not row diagonally dominant,
then the steady-state probability distribution of the optimal st-monotone upper
bounding matrix corresponding to the row diagonally dominant transformed
matrix is better in the strong stochastic sense than the one corresponding to
the original matrix. We have also established that the transformation P/2 +
I/2 provides the best bound for the family of linear transformations we have
considered. More precisely:
Theorem 4 Let P be a DTMC of order n, and let δ1, δ2 ∈ (0, 1) be two different values
such that δ1 < δ2. Then $\pi_{v(\alpha(P,\delta_1))} <_{st} \pi_{v(\alpha(P,\delta_2))} <_{st} \pi_{v(P)}$.
One may ask if there is an optimal value of δ. When the matrix is row diagonally
dominant (RDD), its diagonal serves as a barrier for the perturbation moving
from the upper-triangular part to the strictly lower-triangular part in forming
v(P).
Definition 6 A stochastic matrix is said to be row diagonally dominant (RDD)
if all of its diagonal elements are greater than or equal to 0.5.
Corollary 1 Let P be a DTMC of order n that is RDD. Then v(P ) and v(α(P ))
have the same steady-state probability distribution.
Corollary 1 implies that one cannot improve the steady-state probability

bounds by choosing a smaller δ value to transform an already RDD DTMC.
And δ = 1/2 is sufficient to transform an arbitrary stochastic matrix into a
RDD one. This first approach was then generalized to transformations based on
a set of polynomials which gives better (i.e. more accurate) bounds [5]. Let us
first introduce these transformations and their basic properties.
Definition 7 Let D be the set of polynomials Φ() such that Φ(1) = 1, Φ is different
from the identity, and all the coefficients of Φ are non negative.
Proposition 1 Let Φ() be an arbitrary polynomial in D. Then Φ(P) has the
same steady-state distribution as P.
Theorem 5 Let Φ be an arbitrary polynomial in D. Algorithm 1 applied to Φ(P)
provides a more accurate bound than the steady-state distribution of Q, i.e.:
\[ \pi_P <_{st} \pi_{v(\Phi(P))} <_{st} \pi_{v(P)} \]
For a stochastic interpretation of this result and a proof based on linear
algebra see [5]. Corollary 1 basically states that the optimal transformation, if
we restrict ourselves to degree 1 polynomials, is φ(X) = X/2 + 1/2. Such a result
is still unknown for arbitrary degree polynomials, even if it is clear that the
larger the degree of Φ, the more accurate the bound v(Φ(P)). This is illustrated
in the example below. Let us consider the stochastic matrix P3 and study the
polynomials φ(X) = X/2 + 1/2 and $\psi(X) = X^2/2 + 1/2$.
\[
P3 = \begin{bmatrix}
0.1 & 0.2 & 0.4 & 0.3 \\
0.2 & 0.3 & 0.2 & 0.3 \\
0.1 & 0.5 & 0.4 & 0 \\
0.2 & 0.1 & 0.3 & 0.4
\end{bmatrix}
\]
First, let us compute φ(P3) and ψ(P3).
\[
\varphi(P3) = \begin{bmatrix}
0.55 & 0.1 & 0.2 & 0.15 \\
0.1 & 0.65 & 0.1 & 0.15 \\
0.05 & 0.25 & 0.7 & 0 \\
0.1 & 0.05 & 0.15 & 0.7
\end{bmatrix}
\qquad
\psi(P3) = \begin{bmatrix}
0.575 & 0.155 & 0.165 & 0.105 \\
0.08 & 0.63 & 0.155 & 0.135 \\
0.075 & 0.185 & 0.65 & 0.09 \\
0.075 & 0.13 & 0.17 & 0.625
\end{bmatrix}
\]
Then, we apply operator v to obtain the bounds on the matrices:
\[
v(\varphi(P3)) = \begin{bmatrix}
0.55 & 0.1 & 0.2 & 0.15 \\
0.1 & 0.55 & 0.2 & 0.15 \\
0.05 & 0.25 & 0.55 & 0.15 \\
0.05 & 0.1 & 0.15 & 0.7
\end{bmatrix}
\qquad
v(\psi(P3)) = \begin{bmatrix}
0.575 & 0.155 & 0.165 & 0.105 \\
0.08 & 0.63 & 0.155 & 0.135 \\
0.075 & 0.185 & 0.605 & 0.135 \\
0.075 & 0.13 & 0.17 & 0.625
\end{bmatrix}
\]
And,
\[
v(P3) = \begin{bmatrix}
0.1 & 0.2 & 0.4 & 0.3 \\
0.1 & 0.2 & 0.4 & 0.3 \\
0.1 & 0.2 & 0.4 & 0.3 \\
0.1 & 0.2 & 0.3 & 0.4
\end{bmatrix}
\]
Finally, we compute the steady-state distributions for all matrices:
\[
\begin{cases}
\pi_{v(P3)} = (0.1,\ 0.2,\ 0.3667,\ 0.3333) \\
\pi_{v(\varphi(P3))} = (0.1259,\ 0.2587,\ 0.2821,\ 0.3333) \\
\pi_{v(\psi(P3))} = (0.1530,\ 0.2997,\ 0.2916,\ 0.2557) \\
\pi_{P3} = (0.1530,\ 0.3025,\ 0.3167,\ 0.2278)
\end{cases}
\]
Clearly, bounds obtained by polynomial ψ are more accurate than the other
bounds.
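The example can be replayed with a short script. The sketch below is illustrative: all function names are hypothetical, operator v is written with tail sums (a running maximum down the rows, equivalent to system 2), and the steady-state vectors are approximated by plain power iteration with an arbitrarily chosen iteration count, which is an assumption of this sketch and not the numerical method of the text.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def half_identity_plus(M):
    """Return I/2 + M/2."""
    n = len(M)
    return [[0.5 * M[i][j] + (0.5 if i == j else 0.0) for j in range(n)] for i in range(n)]

def v(P):
    """Operator v of Definition 5, computed from tail sums."""
    n, bound, Q = len(P), [0.0] * len(P), []
    for row in P:
        tails = [sum(row[j:]) for j in range(n)]
        bound = [max(a, b) for a, b in zip(bound, tails)]
        Q.append([bound[j] - (bound[j + 1] if j + 1 < n else 0.0) for j in range(n)])
    return Q

def steady_state(P, iterations=5000):
    """Power iteration pi <- pi P, started from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

P3 = [[0.1, 0.2, 0.4, 0.3],
      [0.2, 0.3, 0.2, 0.3],
      [0.1, 0.5, 0.4, 0.0],
      [0.2, 0.1, 0.3, 0.4]]

phi_P3 = half_identity_plus(P3)                 # phi(X) = X/2 + 1/2
psi_P3 = half_identity_plus(mat_mul(P3, P3))    # psi(X) = X^2/2 + 1/2

pi_v = steady_state(v(P3))
pi_v_phi = steady_state(v(phi_P3))
print([round(x, 4) for x in pi_v])
print([round(x, 4) for x in pi_v_phi])
```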
2.4 Time and Space Complexity

It must be clear at this point that Algorithm 1 builds a matrix Q which is, in
general, as difficult as P to analyze. This first algorithm is only presented here to
show that inequalities 1 have algorithmic implications. Concerning complexity
of Algorithm 1 on sparse matrix, we do not have positive results. Indeed, it may
be possible that matrix Q has many more positive elements than matrix P and
it may be even completely filled. For instance:
\[
P4 = \begin{bmatrix}
0.5 & 0.2 & 0.1 & 0.1 & 0.1 \\
1.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
1.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
1.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
1.0 & 0.0 & 0.0 & 0.0 & 0.0
\end{bmatrix}
\qquad
Q = v(P4) = \begin{bmatrix}
0.5 & 0.2 & 0.1 & 0.1 & 0.1 \\
0.5 & 0.2 & 0.1 & 0.1 & 0.1 \\
0.5 & 0.2 & 0.1 & 0.1 & 0.1 \\
0.5 & 0.2 & 0.1 & 0.1 & 0.1 \\
0.5 & 0.2 & 0.1 & 0.1 & 0.1
\end{bmatrix}
\]
More generally, it is easy to build a matrix P with 3n positive elements resulting
in a completely filled matrix v(P ). Of course the algorithms we survey in the
next section provide matrices with structural or numerical properties. Most of
them do not suffer the same complexity problem.
3 Structure Based Bounding Algorithms for “st” Comparison
We can also use the two sets of constraints of system 1 and add some structural
properties to simplify the resolution of the bounding matrix. For instance,
Algorithm 3 provides an upper bounding matrix which is upper-Hessenberg (i.e.
the lower triangle, except for the first sub-diagonal, is zero). Therefore the resolution
by direct elimination is quite simple. In the following we illustrate this
principle with several structures associated with simple resolution methods and present
algorithms to build structure based st-monotone bounding stochastic matrices.
Most of these algorithms do not assume any particular property or structure for
the initial stochastic matrix.
3.1 Upper-Hessenberg Structure
Definition 8 A matrix H is said to be upper-Hessenberg if and only if Hi,j = 0

for i > j + 1.
The paradigm for upper-Hessenberg case is the M/G/1 queue. The resolution
by recursion for these matrices requires o(m) operations [34].
Property 2 Let P be an irreducible finite stochastic matrix such that every

row of the lower triangle of P contains at least one positive element. Let Q be
computed from P by Algorithm 3. Then Q is irreducible, st-monotone, upper-
Hessenberg and an upper bound for P .
The proof is omitted. The algorithm is slightly different from Algorithm 2.
The last two instructions create the upper-Hessenberg structure. Note that the
generalization to block upper-Hessenberg matrices is straightforward.
Algorithm 3 An upper-Hessenberg st-monotone upper bound Q:
  $q_{1,n} = p_{1,n}$;
  for $i = 1, 2, \ldots, n$ do $q_{1,i} = p_{1,i}$; $q_{i+1,n} = \max(q_{i,n}, p_{i+1,n})$; od
  for $i = 2, 3, \ldots, n$ do
    for $l = n-1, n-2, \ldots, i$ do
      $q_{i,l} = \max\left(\sum_{j=l}^{n} q_{i-1,j},\ \sum_{j=l}^{n} p_{i,j}\right) - \sum_{j=l+1}^{n} q_{i,j}$;
      if ($q_{i,l} = 0$) and ($p_{i,l} > 0$) and ($\sum_{j=l+1}^{n} q_{i,j} < 1$) then
        $q_{i,l} = \epsilon \times \left(1 - \sum_{j=l+1}^{n} q_{i,j}\right)$
    od
    $q_{i,i-1} = 1 - \sum_{j=i}^{n} q_{i,j}$
    for $l = i-2, i-3, \ldots, 1$ do $q_{i,l} = 0$ od
  od
The application of this algorithm to matrix P1, already defined, leads to:
\[
Q = \begin{bmatrix}
0.5 & 0.2 & 0.1 & 0.2 & 0.0 \\
0.1 & 0.6 & 0.1 & 0.1 & 0.1 \\
0.0 & 0.3 & 0.5 & 0.1 & 0.1 \\
0.0 & 0.0 & 0.2 & 0.7 & 0.1 \\
0.0 & 0.0 & 0.0 & 0.5 & 0.5
\end{bmatrix}
\]
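The construction can be sketched as follows, using 0-based indices. The function name, the default eps, the zero tolerance and the non-negativity clamp are assumptions of this sketch; the last column is handled by the same loop as the others, which gives the same values as the separate initialization in the listing.

```python
def upper_hessenberg_bound(P, eps=0.5, tol=1e-12):
    """Upper-Hessenberg st-monotone upper bound in the spirit of Algorithm 3."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    Q[0] = list(P[0])                              # the first row is copied from P
    for i in range(1, n):
        for l in range(n - 1, i - 1, -1):          # columns on and above the diagonal
            prev_tail = sum(Q[i - 1][l:])
            p_tail = sum(P[i][l:])
            assigned = sum(Q[i][l + 1:])
            # Assumption: clamp with 'assigned' to keep the entry non-negative.
            Q[i][l] = max(prev_tail, p_tail, assigned) - assigned
            if Q[i][l] <= tol and P[i][l] > 0 and assigned < 1 - tol:
                Q[i][l] = eps * (1 - assigned)
        Q[i][i - 1] = 1.0 - sum(Q[i][i:])          # remaining mass on the sub-diagonal
        # Entries below the sub-diagonal stay at 0.
    return Q

P1 = [[0.5, 0.2, 0.1, 0.2, 0.0],
      [0.1, 0.7, 0.1, 0.0, 0.1],
      [0.2, 0.1, 0.5, 0.2, 0.0],
      [0.1, 0.0, 0.1, 0.7, 0.1],
      [0.0, 0.2, 0.2, 0.1, 0.5]]

Q_h = upper_hessenberg_bound(P1)
for row in Q_h:
    print([round(x, 3) for x in row])
```

Up to rounding, the result agrees with the matrix Q shown above for P1.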
3.2 Lumpability
Ordinary lumpability is another efficient technique to combine with stochastic
bounds [36]. Unlike the former algorithms, lumpability implies a state space
reduction. The algorithms are based on Algorithm 1 and on the decomposition
of the chain into macro-states. Again we assume that the states are ordered
according to the macro-state partition. Let r be the number of macro-states.
Let b(k) and e(k) be the indices of the first state and the last state, respectively,
of macro-state Ak . First, let us recall the definition of ordinary lumpability.
Definition 9 (ordinary lumpability) Let Q be the matrix of an irreducible
finite DTMC, let $A_k$ be a partition of the states of the chain. The chain is
ordinary lumpable according to partition $A_k$ if and only if for all states e and f
in the same arbitrary macro-state $A_i$, we have:
\[
\sum_{j \in A_k} q_{e,j} = \sum_{j \in A_k} q_{f,j} \qquad \forall\ \text{macro-state } A_k
\]
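Definition 9 can be tested mechanically. The following sketch assumes that the partition is given as a list of lists of 0-based state indices; the function name and both example matrices are hypothetical.

```python
def is_ordinary_lumpable(Q, partition, tol=1e-12):
    """Check that rows in the same macro-state have equal aggregated probabilities."""
    for block in partition:               # states e, f taken in the same macro-state
        for target in partition:          # aggregated probability into each macro-state
            sums = [sum(Q[e][j] for j in target) for e in block]
            if max(sums) - min(sums) > tol:
                return False
    return True

partition = [[0], [1, 2]]
lumpable = [[0.5, 0.2, 0.3],
            [0.2, 0.3, 0.5],
            [0.2, 0.5, 0.3]]
not_lumpable = [[0.5, 0.2, 0.3],
                [0.1, 0.4, 0.5],
                [0.3, 0.2, 0.5]]
print(is_ordinary_lumpable(lumpable, partition),
      is_ordinary_lumpable(not_lumpable, partition))
```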
Ordinary lumpability constraints are consistent with the st-monotonicity and

they provide a simple characterization for matrix Q.
Theorem 6 Let Q be an st-monotone matrix which is an upper bound for P .
Assume that Q is ordinary lumpable for partition Ak and let Qm,l and P m,l be
the blocks of transitions from set Am to set Al for Q and P respectively, then
for all m and l, block Qm,l is st-monotone.
Indeed, since Q is st-monotone we have:
\[
\sum_{j=a}^{n} Q(i,j) \le \sum_{j=a}^{n} Q(i+1,j) \tag{3}
\]
But as Q is ordinary lumpable, if i and i + 1 are in the same macro-state we
have:
\[
\sum_{j \in A_r} Q(i,j) = \sum_{j \in A_r} Q(i+1,j) \qquad \forall r
\]
So we can subtract from both sides of relation 3 the partial sums on the macro-states,
which are all equal due to ordinary lumpability. Therefore, assuming that a, i and
i + 1 are in the same macro-state $A_k$, we get
\[
\sum_{j \ge a,\ j \in A_k} Q(i,j) \le \sum_{j \ge a,\ j \in A_k} Q(i+1,j)
\]
The algorithm computes the matrix column by column. Each block needs two
steps. The first step is based on Algorithm 1 while second step modifies the
first column of the block to satisfy the ordinary lumpability constraint. More
precisely, the first step uses the same relations as Algorithm 1 but it has to
take into account that the first row of P and Q may now be different due to
the second step. The lumpability constraint is only known at the end of the
first step. Recall that ordinary lumpability is due to a constant row sum for
the block. Thus after the first step, we know how to modify the first column of
the block to obtain a constant row sum. Furthermore due to st-monotonicity,
we know that the maximal row sum is reached for the last row of the block. In
step 2, we modify the first column of the block taking into account the last row
sum. Once a block has been computed, it is now possible to compute the block
on the left.
Algorithm 4 Construction of an ordinary lumpable st-monotone upper
bounding DTMC Q:
  $q_{1,n} = p_{1,n}$;
  for $x = r, r-1, \ldots, 1$ do
    for $l = e(x)..b(x)$ do
      $q_{1,l} = \sum_{j=l}^{n} p_{1,j} - \sum_{j=l+1}^{n} q_{1,j}$;
      for $i = 2, 3, \ldots, n$ do
        $q_{i,l} = \max\left(\sum_{j=l}^{n} q_{i-1,j},\ \sum_{j=l}^{n} p_{i,j}\right) - \sum_{j=l+1}^{n} q_{i,j}$;
      od
      for $y = 1, 2, \ldots, r$ do
        $c = \sum_{j=b(y)}^{e(y)} q_{e(y),j}$;
        for $i = b(y), \ldots, e(y)-1$ do $q_{i,b(y)} = c - \sum_{j=b(y)+1}^{e(y)} q_{i,j}$; od
      od
    od
  od
Let us illustrate the two steps on a simple example using matrix P 1 formerly
defined. Assume that we divide the state-space into two macro-states: (1, 2) and
(3, 4, 5). We show the first block after the first step (the matrix on the left) and
after the second step.
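\[
\begin{bmatrix}
0.1 & 0.2 & 0.0 \\
0.1 & 0.1 & 0.1 \\
0.5 & 0.1 & 0.1 \\
\vdots & \vdots & \vdots
\end{bmatrix}
\qquad
\begin{bmatrix}
0.5 & 0.2 & 0.0 \\
0.5 & 0.1 & 0.1 \\
0.5 & 0.1 & 0.1 \\
\vdots & \vdots & \vdots
\end{bmatrix}
\]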
This algorithm is used in the next section for the analysis of a mechanism for
high speed networks. Most of the algorithms presented here may be applied but
the best results, for this particular problem, were found with this last approach.
3.3 Class C Stochastic Matrices

Some stochastic matrices also have a closed form steady-state solution, for in-
stance, the class C matrices defined in [4].
Definition 10 A stochastic matrix $Q = (q_{i,j})_{1 \le i,j \le n}$ belongs to class C if for
each column j there exists a real constant $c_j$ satisfying the following conditions:
$q_{i+1,j} = q_{i,j} + c_j$, $1 \le i \le n-1$. Since Q is a stochastic matrix, the sum of
elements in each row must be equal to 1, thus $\sum_{j=1}^{n} c_j = 0$.
For instance, the matrix
\[
\begin{bmatrix}
0.45 & 0.15 & 0.4 \\
0.35 & 0.20 & 0.45 \\
0.25 & 0.25 & 0.5
\end{bmatrix}
\]
is in class C. It is also st-monotone. These matrices have several interesting
properties and we also consider this class for “icx” ordering in section 5. First, the
steady-state distribution of Q can be computed in linear time:
\[
\pi_j = q_{1,j} + c_j\, \frac{\sum_{k=1}^{n} k\, q_{1,k} - 1}{1 - \sum_{k=1}^{n} k\, c_k} \tag{4}
\]
The st-monotonicity characterization is also quite simple in this class:
Proposition 2 Let P be a stochastic matrix belonging to class C. P is
st-monotone if and only if $\sum_{k=j}^{n} c_k \ge 0$, ∀j ∈ {1, . . . , n}.
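Definition 10, the closed form (4) and Proposition 2 can be exercised on the 3 × 3 matrix given above. This is a minimal sketch with hypothetical function names; the column index j runs from 1 to n in the formula, as in the text, and a small tolerance absorbs floating-point rounding.

```python
def class_c_coefficients(Q, tol=1e-12):
    """Return (first row, column increments c) if Q is in class C, else None."""
    n = len(Q)
    c = [Q[1][j] - Q[0][j] for j in range(n)]
    for i in range(1, n - 1):
        if any(abs(Q[i + 1][j] - Q[i][j] - c[j]) > tol for j in range(n)):
            return None
    return list(Q[0]), c

def class_c_steady_state(first_row, c):
    """Closed form (4) for the steady-state distribution."""
    num = sum(j * q for j, q in enumerate(first_row, start=1)) - 1.0
    den = 1.0 - sum(j * cj for j, cj in enumerate(c, start=1))
    return [q + cj * num / den for q, cj in zip(first_row, c)]

def class_c_is_st_monotone(c, tol=1e-12):
    """Proposition 2: every tail sum of the c_j must be non-negative."""
    acc = 0.0
    for cj in reversed(c):
        acc += cj
        if acc < -tol:
            return False
    return True

Q = [[0.45, 0.15, 0.40],
     [0.35, 0.20, 0.45],
     [0.25, 0.25, 0.50]]
first_row, c = class_c_coefficients(Q)
pi = class_c_steady_state(first_row, c)
print([round(x, 4) for x in pi], class_c_is_st_monotone(c))
```

The vector returned by the closed form can be checked by verifying that it sums to one and satisfies πQ = π.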
The algorithm to obtain a monotone upper bound Q of class C for an
arbitrary matrix P has been presented in [4]. First remark that since the
upper bounding matrix Q belongs to class C, we must determine its first row
$q_{1,j}$, $1 \le j \le n$, and the column coefficients $c_j$, $1 \le j \le n$, rather than
all the elements of Q. Within former algorithms the elements of Q are linked
by inequalities but now we add the linear relations which define the C class.
For instance we have $q_{n,n} = q_{1,n} + (n-1) \times c_n$. Therefore we must choose carefully
$q_{1,n}$ and $c_n$ to ensure that $0 \le q_{n,n} \le 1$. Note that $x^+$ denotes as usual max(x, 0).
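Algorithm 5 Construction of an st-monotone upper bounding DTMC Q
which belongs to class C:
  $q_{1,n} = \max_{1 \le i \le n-1} \dfrac{(n-1)\,p_{i,n} - (i-1)}{n-i}$
  $c_n = \left[ \max_{2 \le i \le n} \dfrac{p_{i,n} - q_{1,n}}{i-1} \right]^+$
  for $j = n-1, n-2, \ldots, 2$ do
    $\alpha_j = \max_{2 \le i \le n} \dfrac{\sum_{k=j}^{n} p_{i,k} - \sum_{k=j}^{n} q_{1,k}}{i-1}$
    $g_i = \dfrac{n-1}{n-i}\left(\sum_{k=j}^{n} p_{i,k} - \sum_{k=j+1}^{n} q_{i,k}\right) + \dfrac{i-1}{n-i}\left(\sum_{k=j+1}^{n} q_{n,k} - 1\right)$
    $q_{1,j} = \left[ \max_{1 \le i \le n-1} g_i \right]^+$
    $c_j = \max\left( \dfrac{-q_{1,j}}{n-1},\ \alpha_j^+ - \sum_{k=j+1}^{n} c_k \right)$
  od
  $q_{1,1} = 1 - \sum_{j=2}^{n} q_{1,j}$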
Again consider an example: let P5 be a matrix which does not belong to
class C, and Q its upper bounding matrix computed through Algorithm 5.
\[
P5 = \begin{bmatrix}
0.5 & 0.1 & 0.4 \\
0.7 & 0.1 & 0.2 \\
0.3 & 0.2 & 0.5
\end{bmatrix}
\qquad
Q = \begin{bmatrix}
0.5 & 0.1 & 0.4 \\
0.4 & 0.15 & 0.45 \\
0.3 & 0.2 & 0.5
\end{bmatrix}
\]
Since $c_3 = 0.05$, $c_2 = 0.05$, $c_1 = -0.1$, Q belongs to class C. The steady-state
distributions are:
$\pi_{P5} = (0.4456, 0.1413, 0.4130)$, $\pi_Q = (0.3941, 0.1529, 0.4529)$, and $\pi_{P5} <_{st} \pi_Q$.
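The P5 example can be reproduced with a direct transcription of Algorithm 5. The sketch below rests on a few assumptions: the function name is hypothetical, the listing is read with the bracketing that is consistent with the worked example, $q_{1,j}$ is computed before $\alpha_j$ because $\alpha_j$ uses it, and $c_1$ is obtained from the constraint that the $c_j$ sum to zero (Definition 10), since the listing does not state it. Indices are kept 1-based through small accessor functions.

```python
def class_c_upper_bound(P):
    """Class C st-monotone upper bound, following the steps of Algorithm 5."""
    n = len(P)
    p = lambda i, k: P[i - 1][k - 1]            # 1-based access to P
    q1 = [0.0] * (n + 2)                        # q1[j] holds q_{1,j}
    c = [0.0] * (n + 2)                         # c[j] holds c_j
    q = lambda i, k: q1[k] + (i - 1) * c[k]     # q_{i,k} from Definition 10

    q1[n] = max(((n - 1) * p(i, n) - (i - 1)) / (n - i) for i in range(1, n))
    c[n] = max(0.0, max((p(i, n) - q1[n]) / (i - 1) for i in range(2, n + 1)))
    for j in range(n - 1, 1, -1):
        g = []
        for i in range(1, n):
            need = (sum(p(i, k) for k in range(j, n + 1))
                    - sum(q(i, k) for k in range(j + 1, n + 1)))
            slack = sum(q(n, k) for k in range(j + 1, n + 1)) - 1.0
            g.append((n - 1) / (n - i) * need + (i - 1) / (n - i) * slack)
        q1[j] = max(0.0, max(g))
        alpha = max((sum(p(i, k) for k in range(j, n + 1))
                     - sum(q1[k] for k in range(j, n + 1))) / (i - 1)
                    for i in range(2, n + 1))
        c[j] = max(-q1[j] / (n - 1), max(alpha, 0.0) - sum(c[k] for k in range(j + 1, n + 1)))
    q1[1] = 1.0 - sum(q1[2:n + 1])
    c[1] = -sum(c[2:n + 1])                     # assumption: the c_j sum to zero
    return [[q(i, k) for k in range(1, n + 1)] for i in range(1, n + 1)]

P5 = [[0.5, 0.1, 0.4],
      [0.7, 0.1, 0.2],
      [0.3, 0.2, 0.5]]
Q5 = class_c_upper_bound(P5)
for row in Q5:
    print([round(x, 4) for x in row])
```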
3.4 Partition and Stochastic Complement
The stochastic complement was initially proposed by Meyer in [26] to uncouple
Markov chains and to provide a simple approximation for steady-state. Here we
propose a completely different idea based on an easy resolution of the stochastic
complement. Let us consider a block decomposition of Q:
\[
Q = \begin{bmatrix} A & B \\ C & D \end{bmatrix},
\]
where A, B, C, and D are matrices of size n0 × n0, n0 × n1, n1 × n0 and n1 × n1
(with n0 + n1 = n).
We know that I − D is not singular if P is not reducible [26]. We decompose π
into two components π0 and π1 to obtain the stochastic complement formulation
for the steady-state equation:
\[
\begin{cases}
\pi_0 R = 0 \\
\pi_0 r = 1 \\
\pi_1 = \pi_0 H
\end{cases}
\tag{5}
\]
where $H = B(I - D)^{-1}$, $R = I - A - HC$ and $r = e_0 + He_1$.

Following Quessette [17], we chose to partition the states such that matrix D
is upper triangular with positive diagonal elements. It should be clear that this
partition is not mandatory for the theory of stochastic complement. However it
simplifies the computation of H. Such a partition is always possible, even if for
some cases it implies that n1 is very small [17].
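To illustrate why an upper triangular D simplifies the computation of H, here
is a minimal Python sketch of system (5). The 4-state matrix is a made-up
example with n0 = 2, and the function name is hypothetical. H is obtained
column by column by substitution; the equation π0 R = 0 is then solved by power
iteration on A + HC, which is a choice made here for brevity and not
necessarily the method used by the authors.

```python
def stochastic_complement_steady_state(Q, n0, iters=2000):
    n = len(Q)
    n1 = n - n0
    A = [row[:n0] for row in Q[:n0]]
    B = [row[n0:] for row in Q[:n0]]
    C = [row[:n0] for row in Q[n0:]]
    D = [row[n0:] for row in Q[n0:]]

    # Solve H (I - D) = B. Since I - D is upper triangular, column j of H
    # only depends on columns 0..j-1 of H.
    H = [[0.0] * n1 for _ in range(n0)]
    for j in range(n1):
        for i in range(n0):
            acc = B[i][j] + sum(H[i][k] * D[k][j] for k in range(j))
            H[i][j] = acc / (1.0 - D[j][j])

    # pi0 R = 0 with R = I - A - HC means pi0 is invariant for S = A + HC.
    S = [[A[i][j] + sum(H[i][k] * C[k][j] for k in range(n1)) for j in range(n0)]
         for i in range(n0)]
    pi0 = [1.0 / n0] * n0
    for _ in range(iters):
        pi0 = [sum(pi0[i] * S[i][j] for i in range(n0)) for j in range(n0)]

    # Normalisation pi0 r = 1 with r = e0 + H e1, then pi1 = pi0 H.
    r = [1.0 + sum(H[i]) for i in range(n0)]
    scale = sum(pi0[i] * r[i] for i in range(n0))
    pi0 = [x / scale for x in pi0]
    pi1 = [sum(pi0[i] * H[i][j] for i in range(n0)) for j in range(n1)]
    return pi0 + pi1


# Hypothetical chain whose lower-right block D is upper triangular.
Q = [[0.5, 0.2, 0.2, 0.1],
     [0.3, 0.4, 0.1, 0.2],
     [0.2, 0.3, 0.4, 0.1],
     [0.4, 0.1, 0.0, 0.5]]
pi = stochastic_complement_steady_state(Q, 2)
print([round(x, 4) for x in pi])
```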
It is quite simple to derive from Algorithm 1 an algorithm which builds a
matrix of this form once the partition has been fixed. The algorithm has two
steps. The first step is Algorithm 1. Then we remove the transitions in the
lower triangle of D and sum up their probabilities in the corresponding diagonal
elements of D.
3.5 Single Input Macro State Markov Chain
Feinberg and Chiu [14] have studied chains divided into macro-states where
the transition entering a macro-state must go through exactly one node. This
node is denoted as the input node of the macro-state. They have developed an
algorithm to efficiently compute the steady-state distribution by decomposition.
It consists of the resolution of the macro-state in isolation and the analysis of
the chain reduced to input nodes. Unlike ordinary lumpability, the assumptions
of the theorem are based on the graph of the transitions and do not take into
account the real transition rates.
It is very easy to modify Algorithm 1 to create a Single Input Macro State
Markov chain. We assume that for every macro state, the input state is the last
state of the macro state. Thus the matrix Q looks like this:
⎡ ⎤
... ... ... ...
⎢ A ...0 ... ...0 ... ⎥
⎢ ⎥
⎢ ... ... ... ... ⎥
⎢ ⎥
⎢... ... ... ... ⎥
⎢ ⎥
⎢...0 ... B ...0 ... ⎥
⎢ ⎥
⎢... ... ... ... ⎥
⎢ ⎥
⎢... ... ... ... ⎥
⎢ ⎥
⎣...0 ... ...0 ... C ⎦
... ... ... ...
The algorithm is based on the following decomposition into three types of
blocks: diagonal blocks, upper triangle and lower triangle. The elements of
diagonal blocks are computed using the same equalities as in Algorithm 1:
\[ \begin{cases}
Q_{1,j} = \sum_{k=j}^{n} P_{1,k} - \sum_{k=j+1}^{n} Q_{1,k} \\
Q_{i+1,j} = \max\left( \sum_{k=j}^{n} Q_{i,k},\; \sum_{k=j}^{n} P_{i+1,k} \right) - \sum_{k=j+1}^{n} Q_{i+1,k}
\end{cases} \qquad (6) \]
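The two recurrences in (6) can be sketched in Python when they are applied to
a whole matrix, that is, without any block structure. The function names are
illustrative assumptions, and the test matrix is the matrix P5 of section 3.
Working with tail sums avoids recomputing the inner sums for every entry.

```python
def st_monotone_upper_bound(P):
    """Apply the recurrences of (6) row by row, column n down to column 1."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    prev_tail = [0.0] * n             # tail sums of the previous row of Q
    for i in range(n):
        tail_p = 0.0                  # sum_{k >= j} P[i][k]
        tail_q = 0.0                  # sum_{k >= j+1} Q[i][k]
        new_tail = [0.0] * n
        for j in range(n - 1, -1, -1):
            tail_p += P[i][j]
            # For the first row prev_tail is zero, so the max is tail_p.
            target = max(prev_tail[j], tail_p)
            Q[i][j] = target - tail_q
            tail_q = target
            new_tail[j] = target
        prev_tail = new_tail
    return Q


def is_st_monotone(M, tol=1e-12):
    """Check that the tail sums of each row dominate those of the previous row."""
    n = len(M)
    return all(sum(M[i][j:]) <= sum(M[i + 1][j:]) + tol
               for i in range(n - 1) for j in range(n))


P5 = [[0.5, 0.1, 0.4], [0.7, 0.1, 0.2], [0.3, 0.2, 0.5]]
Q = st_monotone_upper_bound(P5)
print([[round(v, 4) for v in row] for row in Q], is_st_monotone(Q))
```

For P5 the second row of the result is raised to (0.5, 0.1, 0.4), the other
rows being unchanged.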
The elements of blocks in the upper and lower triangles have the “single input”
structure: several columns of zeros followed by a last column which is positive.
Furthermore, lower and upper triangles differ because the elements of the lower
triangle of Q must satisfy inequalities which take into account the diagonal
blocks of Q. Let us denote by f(i) the lower index of the set which contains
state i. Then for all i, j in the upper triangle, we just have to sum up the
elements of P (take care of the lower index f(j) in the summation of the
elements of P):
\[ \begin{cases}
Q_{1,n} = \sum_{k=f(n)}^{n} P_{1,k} \\
Q_{1,j} = \sum_{k=f(j)}^{n} P_{1,k} - \sum_{k=j+1}^{n} Q_{1,k} \\
Q_{i+1,j} = \max\left( \sum_{k=j}^{n} Q_{i,k},\; \sum_{k=f(j)}^{n} P_{i+1,k} \right) - \sum_{k=j+1}^{n} Q_{i+1,k}
\end{cases} \qquad (7) \]
And for all i, j in the lower triangle (here the lower index f(j) is also
used in the summation of the elements in the former row of Q):
\[ Q_{i+1,j} = \max\left( \sum_{k=f(j)}^{n} Q_{i,k},\; \sum_{k=f(j)}^{n} P_{i+1,k} \right) - \sum_{k=j+1}^{n} Q_{i+1,k} \qquad (8) \]
The derivation of the algorithm is straightforward. Again let us apply this
algorithm on matrix P1 with a partition into two sets of size 2 and 3 to obtain
matrix Q (we also give the values of f for all the indices):
\[ f = (1, 1, 3, 3, 3), \qquad
Q = \begin{bmatrix}
0.5 & 0.2 & 0.0 & 0.0 & 0.3 \\
0.1 & 0.6 & 0.0 & 0.0 & 0.3 \\
0.0 & 0.3 & 0.4 & 0.0 & 0.3 \\
0.0 & 0.1 & 0.1 & 0.5 & 0.3 \\
0.0 & 0.2 & 0.0 & 0.3 & 0.5
\end{bmatrix} \]
This structure has been used by several authors, even if their proofs of
comparison are usually based on sample-path theorems [19,24,25].
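The “single input” structure of the matrix Q above can be verified
mechanically. The following Python sketch is only a structural check (it says
nothing about bounds or monotonicity); the function name is hypothetical, f is
the vector of lower indices given above, and the input state of each
macro-state is assumed to be its last state, as stated earlier.

```python
def has_single_input_structure(Q, f, tol=1e-12):
    """True if every transition between two macro-states enters the
    destination macro-state through its last state."""
    n = len(Q)
    last = {}
    for state in range(n):
        last[f[state]] = state        # overwritten until the last state of the block
    for i in range(n):
        for j in range(n):
            crosses = f[i] != f[j]
            if crosses and j != last[f[j]] and Q[i][j] > tol:
                return False
    return True


f = (1, 1, 3, 3, 3)
Q = [[0.5, 0.2, 0.0, 0.0, 0.3],
     [0.1, 0.6, 0.0, 0.0, 0.3],
     [0.0, 0.3, 0.4, 0.0, 0.3],
     [0.0, 0.1, 0.1, 0.5, 0.3],
     [0.0, 0.2, 0.0, 0.3, 0.5]]
print(has_single_input_structure(Q, f))  # True
```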
3.6 Quasi Birth and Death Process
Finally, we have to briefly mention QBD matrices. They have a well-known
algorithmic solution [23] but clearly it is not always possible to build an
upper bounding st-monotone matrix which is block-tridiagonal. However, it is
possible to derive some generalization of Algorithm 1 to get a QBD if the
initial matrix has upper bounded transitions to the right (i.e., there exists a
small integer k such that for all indices, if j − i > k then P(i, j) = 0). The
example presented in [24] is partially based on such a structure.
4 A Real Example with Large State Space
As an example, we present the analysis of a buffer policy which combines the

PushOut mechanism for the space management and a Head Of Line service
discipline. We assume that there exist two types of packets with distinct loss
rate requirements. In the sequel, we denote as high priority, the packets which
have the highest requirements, i.e., the smallest loss ratio. A low priority packet
which arrives in a full buffer is lost. If the buffer is not full, both types of packets
are accepted. The PushOut mechanism specifies that when the buffer is full,
an arriving high priority packet pushes out of the buffer a low priority one if
there is any in the buffer. Otherwise the high priority packet is lost. For the
sake of simplicity, we assume that packet size is constant. This is consistent with
ATM cells but it is clearly a modeling simplification for other networks. Such a
mechanism has been proposed for ATM networks [18]. We further assume that
the low priority packets are scheduled before high priority packets (recall that
the priority level is based on the access). We assume that the departure due to
service completion always takes place just before the arrivals.
[Figure 1: a buffer fed by batch Bernoulli arrivals of high and low priority
cells, with a deterministic service time. A high priority cell “pushes out” a
low priority cell, and the low priority cell is lost.]
Fig. 1. Push-Out mechanism description
As the buffer size is B, the number of states is (B + 1)(B + 2)/2 if the
arrivals follow a simple batch process. For the sake of simplicity we assume
that the batch size is between 0 and 2. We use the following representation for
the state space: (T, H), where T is the total number of packets and H is the
number of high priority packets. The states are ordered according to a
lexicographic non-decreasing ordering. It must be clear at this point that the
ordering of the states is a very important issue. First, the rewards have to be
non-decreasing functions of the state indices. Furthermore, as the st-monotone
property is based on the state representation and ordering, the accuracy of the
results may depend on this ordering. Here, we are interested in the expected
number of lost packets per slot. Let us denote by $R_i$ this expectation for
type i packets and let $R = R_H + R_L$. The difficult problem here is the
computation of $R_H$. Indeed R can be computed with a smaller chain since the
PushOut mechanism does not change the global number of losses. It is sufficient
to analyze the global number of packets (i.e., without distinction). Such a
chain has only B + 1 states if we use a simple batch arrival process. For
realistic values of the buffer size (e.g., 1000), such a chain is very simple
to solve with usual numerical algorithms. However, for the same value of B, the
chain of the HOL+PushOut mechanisms has roughly $5 \times 10^5$ states. So, we
use Algorithm 4 to get a lumpable bounding matrix, and we analyze the
macro-state chain. First let us describe the ordering of the states and the
rewards. Let $p^H_k$ be the probability of k arrivals of high priority packets
during one slot.

\[ R_H = \sum_{(T,H)} \Pi(T, H) \, p^H_2 \, \max(0,\; H + 2 - B - 1_{T=H}) \]
where $1_{T=H}$ is an indicator function which states whether one high priority
packet can leave the buffer at the end of the slot after service completion
(T = H means that there is no low priority packet). Thus
$\max(0, H + 2 - B - 1_{T=H})$ is the number of packets exceeding the buffer
size. For this particular case, due to the scheduling of arrivals and service,
$R_H$ can be computed with a simpler expression:
\[ R_H = p^H_2 \times \Pi(B, B) \]
Clearly, we have to estimate only one probability and the reward is a non
decreasing function which is zero everywhere except for the last state where its
value is one. For more general arrival process, the reward function is only slightly
different.
The key idea to reduce the state space is to avoid the states with a large
number of low priority packets. So, we bound the matrix with an ordinary
lumpable matrix Q with O(B × F) macro-states where the parameter F allows a
trade-off between the computational complexity and the accuracy of the results.
More precisely, we define macro-states (T, Y) where Y is constrained to evolve
in the range T − F..T. If Y = T − F, then the state (T, Y) is a real macro-state
which contains all the states (T, X) such that X ≤ T − F. In this case Y is an
upper bound of the number of high priority packets in the states which are
aggregated. If Y > T − F then the state contains only one state (T, X) where
Y = X. So, Y represents exactly the number of high priority packets in the
buffer (see figure 2). Clearly, if the value of F is large, we do not reduce the
state space but we expect that the bound would be tight.
[Figure 2: diagram of the (T, H) state space, both axes ranging up to B,
showing the line H = T − F, the macro-states, the unchanged states, and the
states where high priority cells can be lost.]
Fig. 2. The aggregated chain
In [16] we have analyzed small buffers to check the accuracy of the bound
and large buffers to show the efficiency of the method. Here, we only present
a typical comparison of these bounds for a small buffer of size 80 (this small
value allows the computation of the exact result for comparison purposes). The
load is 0.9 with 2/3 high level packets. With a sufficiently large value for F
(typically 10), the algorithm gives accurate results. The exact result for
$R_H$ in this example is $8.9 \times 10^{-13}$. The bound with F = 10 is
$9.5 \times 10^{-13}$. Of course if F is too small, the result is worse and can
reach $10^{-6}$ for F = 2. The exact chain has 3321 states while the bound with
F = 10 is computed with a chain of size 798. The number of states is divided by
4 and we only lose a few digits. It is worth remarking that a reduction by one
order of magnitude of the state space implies a reduction by two or three
orders of magnitude of the computation times for the steady-state distribution.
And the reduction is much more important if the original chain is bigger.
Typically, for a buffer size of 1000 and an aggregation factor F equal to 20,
the bounding matrix obtained from Algorithm 5 has roughly 20000 states.
The original state space is 25 times larger.
The results shown previously are very accurate. We have found several reasons
for that property. First the distribution is skewed. Almost all the probability
mass is concentrated on the states with a small number of packets. Moreover
the first part of the initial matrix is already st-monotone. This property is
due to the ordering of the states we have considered. Again, we have to
emphasize that the ordering of the states is a crucial issue for st-bounds [11].
For instance, consider the matrix of the chain for a small buffer of size 4.
The chain has 15 states ordered in a lexicographic way: {(0), (1, 0), (1, 1),
(2, 0), (2, 1), (2, 2), (3, 0), (3, 1), (3, 2), (3, 3), (4, 0), (4, 1), (4, 2),
(4, 3), (4, 4)}. Let us denote by p, q and r respectively the probabilities of
arrival for a batch of size 0, 1 or 2. And let a be the probability that an
arriving packet is a low priority one. Similarly b is the probability for a
high level packet. The distribution of packet types in a size 2 batch is
respectively c for two low level packets, e for two high level packets, and d
for a mixed batch. An independence assumption on the type of packets entering
the queue leads to an important reduction of the number of parameters (for
instance $c = a^2$). However, it is not necessary to illustrate the effect of
Algorithm 5.
⎛ ⎞
p qa qb rc rd re
⎜p qa qb rc rd re ⎟
⎜ ⎟
⎜p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa qb rc rd re ⎟
⎜ ⎟
⎜ ⎟
⎜ ⎟
⎜ ⎟
P =⎜ ⎜ p qa qb rc rd re ⎟
⎟
⎜ ⎟
⎜ ⎟
⎜ p qa + rc qb + rd re ⎟
⎜ ⎟
⎜ ⎟
⎜ ⎟
⎝ p qa + rc qb + rd + re⎠
p qa + rc bq + rd + re
A careful inspection of matrix P shows that the first 10 rows of the matrix
already satisfy the st-monotone property. For a bigger buffer model, this
property is still true for the states where the buffer is not full. We assume
that F = 2 (the only non-trivial value for such a small example). Thus, we
consider two real macro-states: {(3, 2), (3, 3)} and {(4, 2), (4, 3), (4, 4)}.
Note that the initial matrix is already lumpable since the scheduling of service
and arrivals implies that some states have similar transitions (for instance,
states (0), (1, 0) and (1, 1) can be gathered into one macro-state). We use this
property in the resolution algorithm but we do not develop it here, in order to
focus on the bounding algorithm. Algorithm 5 provides a lumpable matrix with the
macro-states already defined which can be aggregated to obtain
(f = min(qb, rc) and g = max(qb, rc)):
⎛ ⎞
p qa qb rc rd re
⎜ ⎟
⎜ ⎟
⎜ ⎟
⎜ p qa qb rc rd + re ⎟
⎜ ⎟
⎜ ⎟
⎜ ⎟
⎜ ⎟
⎜ p qa + qb r ⎟
⎜ ⎟
⎜ p qa + f g − rc r ⎟
⎜ ⎟
⎝ p qa + f g + rd + re ⎠
p q+r
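The exact chain of this section can be rebuilt from the verbal description of
the model, which makes it possible to check the claim about the first 10 rows
for B = 4. The Python sketch below is an interpretation, not the authors'
code: it assumes that service takes place before arrivals, that a low priority
packet is served first when there is one, and that with push-out the number of
high priority packets after a batch is min(B, H + arrivals). The function names
and the numerical probabilities are hypothetical.

```python
def pushout_chain(B, p, q, r, a, b, c, d, e):
    """Transition matrix on states (T, H) in lexicographic order."""
    states = [(t, h) for t in range(B + 1) for h in range(t + 1)]
    index = {s: k for k, s in enumerate(states)}
    # (probability, low arrivals, high arrivals) for each kind of batch
    batches = [(p, 0, 0), (q * a, 1, 0), (q * b, 0, 1),
               (r * c, 2, 0), (r * d, 1, 1), (r * e, 0, 2)]
    P = [[0.0] * len(states) for _ in states]
    for (t, h) in states:
        # Service first: a low priority packet leaves if there is one,
        # otherwise a high priority packet leaves (if the buffer is not empty).
        if t > h:
            t1, h1 = t - 1, h
        elif t > 0:
            t1, h1 = t - 1, h - 1
        else:
            t1, h1 = 0, 0
        for prob, low, high in batches:
            h2 = min(B, h1 + high)         # high packets may push out low ones
            t2 = min(B, t1 + low + high)   # the buffer holds at most B packets
            P[index[(t, h)]][index[(t2, h2)]] += prob
    return states, P


def is_st_monotone(rows, tol=1e-12):
    n = len(rows[0])
    for prev, nxt in zip(rows, rows[1:]):
        tail_prev = tail_next = 0.0
        for j in range(n - 1, -1, -1):
            tail_prev += prev[j]
            tail_next += nxt[j]
            if tail_prev > tail_next + tol:
                return False
    return True


# Hypothetical probabilities, with independent packet types (c = a * a, ...).
states, P = pushout_chain(4, 0.5, 0.3, 0.2, 0.6, 0.4, 0.36, 0.48, 0.16)
print(len(states), is_st_monotone(P[:10]), is_st_monotone(P))
```

Under these assumptions the chain has 15 states, the first 10 rows are
st-monotone, and monotonicity fails at the first row in which the buffer is
full.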
5 Algorithms for “icx” Comparison
Stoyan’s proof in Theorem 4.2.5 of [35] (p. 65) that the monotonicity and the
comparability of transition matrices yield sufficient conditions for chain
comparison is not restricted to the “st” ordering. Similarly, the definitions
of the monotonicity and the comparison of stochastic matrices are much more
general than the statements presented in section 2. First let us turn back to
the definitions for the “icx” ordering, which is supposed to be more accurate
than the st-ordering.
Definition 11 Let X and Y be two random variables taking values on a totally
ordered space. X is said to be less than Y ($X <_{icx} Y$) if and only if
E[f(X)] ≤ E[f(Y)], for all non-decreasing convex functions f, whenever the
expectations exist.
For a discrete state space, it is also possible to use a matrix formulation
through matrix $K_{icx}$. Let p and q be respectively the probability
distribution vectors of X and Y. $X <_{icx} Y$ if and only if
$p K_{icx} \le q K_{icx}$, where $K_{icx}$ is defined as follows:
\[ K_{icx} = \begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
2 & 1 & 0 & \dots & 0 \\
3 & 2 & 1 & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
n & n-1 & n-2 & \dots & 1
\end{bmatrix} \]
This can be rewritten as follows:
\[ X <_{icx} Y \iff \sum_{k=i}^{n} (k - i + 1)\, p_k \le \sum_{k=i}^{n} (k - i + 1)\, q_k, \quad \forall i \in \{1, \dots, n\} \]
Similarly, we can define the increasing concave ordering by the set of
non-decreasing concave functions. In this case $K_{icv} = -K_{icx}^T$, where
$A^T$ denotes the transpose of matrix A.
Clearly, the icx-comparison and the icx-monotonicity of stochastic matrices
are defined in the same manner as for the st-ordering (see definitions 2 and
3). However, the characterization of the $<_{icx}$-monotonicity through matrix
$K_{icx}$ must take into account the finiteness of matrix P. Indeed, the
conditions $K_{icx}^{-1} P K_{icx} \ge 0$ provide sufficient conditions for the
$<_{icx}$-monotonicity. It has been known for a long time that these conditions
are also necessary for infinite chains.
For finite chains, the necessary conditions were unknown until recently.
Moreover the conditions $K_{icx}^{-1} P K_{icx} \ge 0$ are very restrictive and
they lead to a chain whose first and last states are absorbing. Thus, it was
not possible to develop an algorithmic approach without an efficient necessary
and sufficient condition for monotonicity. Recently, in [2], Benmammoun has
proved such conditions for the icx-monotonicity of finite chains. This
characterization is based on matrix $Z_{icx}$ which is slightly different from
matrix $K_{icx}^{-1}$.
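\[ K_{icx}^{-1} = \begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
-2 & 1 & 0 & \dots & 0 \\
1 & -2 & 1 & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \dots & 1 & -2 & 1
\end{bmatrix}
\qquad
Z_{icx} = \begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
-1 & 1 & 0 & \dots & 0 \\
1 & -2 & 1 & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \dots & 1 & -2 & 1
\end{bmatrix} \]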
5.1 Basic Algorithm
The sufficient conditions to compare Markov chains through the monotonicity

and the comparability of matrices (see theorem 1) are also valid for the
icx-ordering. Therefore, it is possible to design an algorithm to construct an icx-
monotone and upper bounding chain based on Benmammoun’s characterization.
Algorithm 6 An icx-monotone upper bound Q:

$q_{1,n} = p_{1,n}$;
$q_{2,n} = \max(q_{1,n}, p_{2,n})$;
for i = 3, . . . , n do $q_{i,n} = \max(p_{i,n},\; 2q_{i-1,n} - q_{i-2,n})$; od
for j = n − 1, n − 2, · · · , 2 do
  $q_{1,j} = \sum_{k=j}^{n} (k-j+1)\, p_{1,k} - \sum_{k=j+1}^{n} (k-j+1)\, q_{1,k}$;
  $q_{2,j} = \max\left( \sum_{k=j}^{n} (k-j+1)\, p_{2,k},\; \sum_{k=j}^{n} (k-j+1)\, q_{1,k} \right) - \sum_{k=j+1}^{n} (k-j+1)\, q_{2,k}$;
  for i = 3, 4, · · · , n do
    $q_{i,j} = \max\left( \sum_{k=j}^{n} (k-j+1)\, p_{i,k},\; 2 \sum_{k=j}^{n} (k-j+1)\, q_{i-1,k} - \sum_{k=j}^{n} (k-j+1)\, q_{i-2,k} \right) - \sum_{k=j+1}^{n} (k-j+1)\, q_{i,k}$;
  od
od
for i = 1, 2, · · · , n do $q_{i,1} = 1 - \sum_{j=2}^{n} q_{i,j}$; od
Unfortunately, the output of this algorithm is not always a stochastic matrix
as we may obtain elements larger than 1.0. First, we apply this algorithm to
matrix P6 and the output is a stochastic upper bound Q:
\[ P6 = \begin{bmatrix} 0.5 & 0.15 & 0.35 \\ 0.3 & 0.4 & 0.3 \\ 0.45 & 0.1 & 0.45 \end{bmatrix}
\qquad
Q = \begin{bmatrix} 0.5 & 0.15 & 0.35 \\ 0.35 & 0.3 & 0.35 \\ 0.3 & 0.25 & 0.45 \end{bmatrix} \]
However, for matrix P7, the output of Algorithm 6 is not a stochastic matrix
since $Q_{3,3} > 1$.
\[ P7 = \begin{bmatrix} 0.5 & 0.15 & 0.35 \\ 0.3 & 0.0 & 0.7 \\ 0.45 & 0.1 & 0.45 \end{bmatrix} \]
Indeed, the last column of Q is:
\[ \begin{bmatrix} 0.35 \\ 0.7 \\ 1.05 \end{bmatrix} \]
Several heuristics may be used to solve this problem. Further research is
still necessary to obtain a simple and efficient algorithm.
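Algorithm 6 and its failure mode can be sketched in Python as follows. The
function names are illustrative assumptions. The sketch treats column n with
the same formulas as the other columns, which gives the same values as the
three initial lines of the algorithm because the sums over k = j + 1..n are
empty for j = n.

```python
def icx_upper_bound(P):
    """Sketch of Algorithm 6; the result is not guaranteed to be stochastic."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]

    def w(row, j):
        # sum_{k=j}^{n} (k - j + 1) * row_k with 1-based indices j and k
        return sum((k - j + 1) * row[k - 1] for k in range(j, n + 1))

    for j in range(n, 1, -1):                 # columns n, n-1, ..., 2
        for i in range(n):                    # rows 1..n (0-based i)
            target = w(P[i], j)
            if i == 1:
                target = max(target, w(Q[0], j))
            elif i >= 2:
                target = max(target, 2 * w(Q[i - 1], j) - w(Q[i - 2], j))
            # Q[i][j-1] is still 0 here, so w(Q[i], j) is exactly the sum
            # over k = j+1..n of (k - j + 1) * q_{i,k}.
            Q[i][j - 1] = target - w(Q[i], j)
    for i in range(n):
        Q[i][0] = 1.0 - sum(Q[i][1:])
    return Q


def is_stochastic(M, tol=1e-9):
    return all(abs(sum(row) - 1.0) < tol and all(-tol <= v <= 1 + tol for v in row)
               for row in M)


P6 = [[0.5, 0.15, 0.35], [0.3, 0.4, 0.3], [0.45, 0.1, 0.45]]
P7 = [[0.5, 0.15, 0.35], [0.3, 0.0, 0.7], [0.45, 0.1, 0.45]]
Q6 = icx_upper_bound(P6)
Q7 = icx_upper_bound(P7)
print(is_stochastic(Q6), is_stochastic(Q7), round(Q7[2][2], 4))
```

For P6 the output is a stochastic matrix, whereas for P7 the last entry of the
third row exceeds 1, as in the example above.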
5.2 Class C Matrices

Besides the closed-form solution of the stationary distribution, class C
matrices have nice properties regarding stochastic monotonicity. First, we
present the stochastic monotonicity characterization for this class and then we
show an algorithm to construct an icx-monotone, upper bounding, class C matrix.
Proposition 3
\[ P \text{ is icx-monotone} \iff \sum_{k=j}^{n} (k - j + 1)\, c_k \ge 0, \quad \forall j \in \{1, \dots, n\} \]
Proposition 4 If P is in class C, then
\[ P \text{ is st-monotone} \implies P \text{ is icx-monotone} \]
Let us emphasize here by an example that, in general, st-monotonicity does
not imply icx-monotonicity:
\[ P = \begin{bmatrix} 0.6 & 0.4 & 0 \\ 0.2 & 0.2 & 0.6 \\ 0.1 & 0.3 & 0.6 \end{bmatrix} \]
Clearly, P is st-monotone. On the other hand, consider p = [0.2 0.4 0.4]
and q = [0.3 0.1 0.6]. Then pP = [0.24 0.28 0.48] and qP = [0.26 0.32 0.42],
so $p <_{icx} q$ is true while $pP <_{icx} qP$ is false. Thus P is not
icx-monotone.
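This counterexample is easy to check numerically. The following Python sketch
implements the icx comparison through the weighted sums of the matrix
formulation given earlier in this section; the helper names are illustrative
assumptions.

```python
def icx_sums(p):
    n = len(p)
    # Entry i (0-based) is sum_{k >= i} (k - i + 1) * p[k], i.e. (p K_icx)_i.
    return [sum((k - i + 1) * p[k] for k in range(i, n)) for i in range(n)]


def icx_leq(p, q, tol=1e-12):
    return all(x <= y + tol for x, y in zip(icx_sums(p), icx_sums(q)))


def st_leq(p, q, tol=1e-12):
    return all(sum(p[i:]) <= sum(q[i:]) + tol for i in range(len(p)))


def is_st_monotone(M):
    return all(st_leq(M[i], M[i + 1]) for i in range(len(M) - 1))


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


P = [[0.6, 0.4, 0.0], [0.2, 0.2, 0.6], [0.1, 0.3, 0.6]]
p = [0.2, 0.4, 0.4]
q = [0.3, 0.1, 0.6]
print(is_st_monotone(P), icx_leq(p, q), icx_leq(vec_mat(p, P), vec_mat(q, P)))
# True True False
```

The comparison fails on the last component: the last entry of pP is 0.48 while
the last entry of qP is 0.42.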
In the following algorithm we compute an icx-monotone, upper bounding,
class C matrix Q for a given matrix P [4]. As in the st-ordering case, this
algorithm consists in computing the first row $q_{1,j}$, 1 ≤ j ≤ n, and the
constants $c_j$, 1 ≤ j ≤ n. In fact, these parameters take values within
intervals ($[\underline{c}_j, \overline{c}_j]$ and
$[\underline{q}_{1,j}, \overline{q}_{1,j}]$). Since we construct an upper bound,
one must intuitively choose
the smallest values for these parameters in order to have elements which are as
close as possible to the original ones. Moreover, we define a constant const which
is greater than 1, but less than cj . By doing so, all entries of Q are positive, thus
Q is irreducible.
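Algorithm 7 Construction of an icx-monotone upper bounding DTMC Q
which belongs to class C:

$\underline{q}_{1,n} = \max_{1 \le i \le n-1} \left[ \frac{(n-1)\, p_{i,n} - i + 1}{n-i} \right]$; $q_{1,n} = \underline{q}_{1,n} + \frac{1 - \underline{q}_{1,n}}{const}$;
$\underline{c}_n = \max_{2 \le i \le n} \left[ \frac{p_{i,n} - q_{1,n}}{i-1} \right]$; $\overline{c}_n = \frac{1 - q_{1,n}}{n-1}$;
if $\underline{c}_n < 0$ then $c_n = 0$ else $c_n = \underline{c}_n + \frac{\overline{c}_n - \underline{c}_n}{const}$;
for j = n − 1, n − 2, · · · , 2 do
  $f(i, j) = \frac{n-1}{n-i} \left[ \sum_{k=j}^{n} (k-j+1)\, p_{i,k} - \sum_{k=j+1}^{n} (k-j+1)\, q_{i,k} \right] - \frac{i-1}{n-i} \left[ 1 - \sum_{k=j+1}^{n} q_{n,k} \right]$;
  $\underline{q}_{1,j} = \left( \max_{1 \le i \le n-1} f(i, j) \right)^+$; $q_{1,j} = \underline{q}_{1,j} + \frac{1 - \sum_{k=j+1}^{n} q_{1,k} - \underline{q}_{1,j}}{const}$;
  $\alpha_j = \max_{2 \le i \le n} \left( \frac{\sum_{k=j}^{n} (k-j+1)\, p_{i,k} - \sum_{k=j+1}^{n} (k-j+1)\, q_{i,k} - q_{1,j}}{i-1} \right)$;
  $\underline{c}_j = \max\left( \alpha_j,\; \frac{-q_{1,j}}{n-1} \right)$; $\overline{c}_j = \frac{1 - \sum_{k=j+1}^{n} q_{n,k} - q_{1,j}}{n-1}$;
  if $\underline{c}_j < -\sum_{k=j+1}^{n} (k-j+1)\, c_k$ then $c_j = -\sum_{k=j+1}^{n} (k-j+1)\, c_k$
  else $c_j = \underline{c}_j + \frac{\overline{c}_j - \underline{c}_j}{const}$;
od
$q_{1,1} = 1 - \sum_{j=2}^{n} q_{1,j}$;
$c_1 = -\sum_{j=2}^{n} c_j$;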
We illustrate the application of this algorithm on matrix P7.
\[ P7 = \begin{bmatrix}
0.25 & 0.2 & 0.25 & 0.3 \\
0.15 & 0.1 & 0.65 & 0.1 \\
0.35 & 0.05 & 0.15 & 0.45 \\
0.3 & 0.2 & 0.1 & 0.4
\end{bmatrix}
\qquad
Q = \begin{bmatrix}
0.2718 & 0.1962 & 0.162 & 0.37 \\
0.279 & 0.158 & 0.136 & 0.427 \\
0.2863 & 0.1199 & 0.1098 & 0.484 \\
0.2935 & 0.0818 & 0.0837 & 0.541
\end{bmatrix} \]
Matrix Q obtained by this algorithm belongs to class C with $c_1 = 0.00723$,
$c_2 = -0.03813$, $c_3 = -0.0261$, $c_4 = 0.057$. Their steady-state
distributions are $\pi_P = (0.2755, 0.1497, 0.2354, 0.3393)$ and
$\pi_Q = (0.2846, 0.1286, 0.1157, 0.4711)$, and we have $\pi_P <_{icx} \pi_Q$.
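A Python sketch of Algorithm 7 makes it possible to reproduce this example. It
is an illustration under stated assumptions: the function name is
hypothetical, each parameter is taken as its lower bound plus the width of its
interval divided by const, and const = 10 is an assumed value chosen because it
reproduces the coefficients printed above (the text does not state the value
that was used).

```python
def class_c_icx_bound(P, const=10.0):
    """Sketch of Algorithm 7: returns (q1, c) for the class C icx bound."""
    n = len(P)
    q1 = [0.0] * n
    c = [0.0] * n

    def p(i, k):                      # 1-based access to P
        return P[i - 1][k - 1]

    def q(i, k):                      # 1-based entry of the class C bound
        return q1[k - 1] + (i - 1) * c[k - 1]

    # Last column.
    q_low = max(((n - 1) * p(i, n) - i + 1) / (n - i) for i in range(1, n))
    q1[n - 1] = q_low + (1.0 - q_low) / const
    c_low = max((p(i, n) - q1[n - 1]) / (i - 1) for i in range(2, n + 1))
    c_high = (1.0 - q1[n - 1]) / (n - 1)
    c[n - 1] = 0.0 if c_low < 0 else c_low + (c_high - c_low) / const

    for j in range(n - 1, 1, -1):     # j = n-1, ..., 2
        def s(i):
            # weighted sum of row i of P minus the part already covered by Q
            return (sum((k - j + 1) * p(i, k) for k in range(j, n + 1))
                    - sum((k - j + 1) * q(i, k) for k in range(j + 1, n + 1)))

        room_n = 1.0 - sum(q(n, k) for k in range(j + 1, n + 1))
        f = max((n - 1) / (n - i) * s(i) - (i - 1) / (n - i) * room_n
                for i in range(1, n))
        q_low = max(f, 0.0)
        q_room = 1.0 - sum(q1[k - 1] for k in range(j + 1, n + 1))
        q1[j - 1] = q_low + (q_room - q_low) / const
        alpha = max((s(i) - q1[j - 1]) / (i - 1) for i in range(2, n + 1))
        c_low = max(alpha, -q1[j - 1] / (n - 1))
        c_high = (room_n - q1[j - 1]) / (n - 1)
        floor = -sum((k - j + 1) * c[k - 1] for k in range(j + 1, n + 1))
        c[j - 1] = floor if c_low < floor else c_low + (c_high - c_low) / const

    q1[0] = 1.0 - sum(q1[1:])
    c[0] = -sum(c[1:])
    return q1, c


P7 = [[0.25, 0.2, 0.25, 0.3], [0.15, 0.1, 0.65, 0.1],
      [0.35, 0.05, 0.15, 0.45], [0.3, 0.2, 0.1, 0.4]]
q1, c = class_c_icx_bound(P7)
print([round(v, 4) for v in q1], [round(v, 5) for v in c])
```

With const = 10 the sketch returns the first row (0.2718, 0.1962, 0.162, 0.37)
and the coefficients (0.00723, −0.03813, −0.0261, 0.057), up to rounding.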
6 Conclusions
Strong stochastic bounds are not limited to sample-path proofs. It is now possi-
ble to compute bounds of the steady-state distribution directly from the chain.
This approach may be specially useful for high speed networks modeling where
the performance requirements are thresholds. Using the algorithmic approach
we survey in this paper, a sample-path proof is not necessary anymore and these
algorithms may be integrated into software performance tools based on Markov
chains. Generalizations to other orderings or to computation of transient mea-
sures are still important problems for performance analysis.
References
1. Abu-Amsha O., Vincent J.-M.: An algorithm to bound functionals of Markov chains
with large state space. In: 4th INFORMS Conference on Telecommunications,
Boca Raton, Florida, (1998)
2. Benmammoun M.: Encadrement stochastiques et évaluation de performances des
réseaux, PHD, Université de Versailles St-Quentin en Yvelines, (2002)
3. Benmammoun M., Fourneau J.M., Pekergin N., Troubnikoff A.: An algorithmic and
numerical approach to bound the performance of high speed networks, Submitted,
(2002)
4. Benmammoun M., Pekergin N.: Closed form stochastic bounds on the stationary
distribution of Markov chains. To appear in Probability in the Engineering and
Informational Sciences, (2002)
5. Boujdaine F., Dayar T., Fourneau J.M., Pekergin N., Saadi S., Vincent J.M.: A new
proof of st-comparison for polynomials of a stochastic matrix, Submitted, (2002)
6. Buchholz P.: An aggregation/disaggregation algorithm for stochastic automata
networks. In: Probability in the Engineering and Informational Sciences, V 11,
(1997) 229–253
7. Buchholz P.: Projection methods for the analysis of stochastic automata networks.
In: Proc. of the 3rd International Workshop on the Numerical Solution of Markov
Chains, B. Plateau, W. J. Stewart, M. Silva, (Eds.), Prensas Universitarias de
Zaragoza, Spain, (1999) pp. 149–168.
8. Courtois P.J., Semal P.: Bounds for the positive eigenvectors of nonnegative ma-
trices and for their approximations by decomposition. In: Journal of ACM, V 31
(1984) 804–825
9. Courtois P.J., Semal P.: Computable bounds for conditional steady-state prob-
abilities in large Markov chains and queueing models. In: IEEE JSAC, V4, N6,
(1986)
10. Dayar T., Fourneau J.M., Pekergin N.: Transforming stochastic matrices for
stochastic comparison with the st-order, Submitted, (2002)
11. Dayar T., Pekergin, N.: Stochastic comparison, reorderings, and nearly completely
decomposable Markov chains. In: Proceedings of the International Conference on
the Numerical Solution of Markov Chains (NSMC’99), (Ed. Plateau, B. Stewart,
W.), Prensas universitarias de Zaragoza. (1999) 228–246
12. Dayar T., Stewart W. J.: Comparison of partitioning techniques for two-level iter-
ative solvers on large sparse Markov chains. In: SIAM Journal on Scientific Com-
puting V21 (2000) 1691–1705.
13. Donatelli S.: Superposed generalized stochastic Petri nets: definition and efficient
solution. In: Proc. 15th Int. Conf. on Application and Theory of Petri Nets,
Zaragoza, Spain, (1994)
14. Feinberg B.N., Chiu S.S.: A method to calculate steady-state distributions of large
Markov chains by aggregating states. In: Oper. Res, V 35 (1987) 282-290
15. Fernandes P., Plateau B., Stewart W.J.: Efficient descriptor-vector multiplications
in stochastic automata networks. In: Journal of the ACM, V45 (1998) 381–414.
16. Fourneau J.M., Pekergin N., Taleb H.: An Application of Stochastic Ordering to
the Analysis of the PushOut Mechanism. In Performance Modelling and Evaluation
of ATM Networks, Chapman and Hall, (1995) 227–244
17. Fourneau J.M., Quessette F.: Graphs and Stochastic Automata Networks. In: Pro-
ceedings of the 2nd Int. Workshop on the Numerical Solution of Markov Chains,
Raleigh, USA, (1995)
18. Hébuterne G., Gravey A.: A space priority queueing mechanism for multiplexing
ATM channels. In: ITC Specialist Seminar, Computer Network and ISDN Systems,
V20 (1990) 37–43
19. Golubchik, L. and Lui, J.: Bounding of performance measures for a threshold-based
queuing systems with hysteresis. In: Proceeding of ACM SIGMETRICS’97, (1997)
147–157
20. Hillston J., Kloul L.: An Efficient Kronecker Representation for PEPA Models. In:
PAPM’2001, Aachen Germany, (2001)
21. Keilson J., Kester A.: Monotone matrices and monotone Markov processes. In:
Stochastic Processes and Their Applications, V5 (1977) 231–241
22. Kijima M.: Markov Processes for stochastic modeling. Chapman & Hall (1997)
23. Latouche G., Ramaswami V.: Introduction to Matrix Analytic Methods in Stochas-
tic Modeling. SIAM, (1999)
24. Lui, J. Muntz, R. and Towsley, D.: Bounding the mean response time of the min-
imum expected delay routing policy: an algorithmic approach. In: IEEE Transac-
tions on Computers. V44 N12 (1995) 1371–1382
25. Lui, J. Muntz, R. and Towsley, D.: Computing performance bounds of Fork-Join
parallel programs under a multiprocessing environment. In: IEEE Transactions on
Parallel and Distributed Systems. V9 N3 (1998) 295–311
26. Meyer C.D.: Stochastic complementation, uncoupling Markov chains, and the the-
ory of nearly reducible systems. In: SIAM Review. V31 (1989) 240–272.
27. Pekergin N.: Stochastic delay bounds on fair queueing algorithms. In: Proceedings
of INFOCOM’99 New York (1999) 1212–1220
28. Pekergin N.: Stochastic performance bounds by state reduction. In: Performance
Evaluation V36-37 (1999) 1–17
29. Plateau B.: On the stochastic structure of parallelism and synchronization models
for distributed algorithms. In: Proceedings of the SIGMETRICS Conference on
Measurement and Modeling of Computer Systems, Texas (1985) 147–154
30. Plateau B., Fourneau J.-M., Lee K.-H.: PEPS: A package for solving complex
Markov models of parallel systems. In: Modeling Techniques and Tools for Com-
puter Performance Evaluation, R. Puigjaner, D. Potier (Eds.), Spain (1988) 291–
305
31. Plateau B., Fourneau J.-M.: A methodology for solving Markov models of parallel
systems. In: Journal of Parallel and Distributed Computing. V12 (1991) 370–387.
32. Shaked M., Shanthikumar J.G.: Stochastic Orders and Their Applications.
Academic Press, California (1994)
33. Stewart W.J., Atif K., Plateau B.: The numerical solution of stochastic automata
networks. In: European Journal of Operational Research V86 (1995) 503–525
34. Stewart W. J.: Introduction to the Numerical Solution of Markov Chains. Princeton
University Press, (1994)
35. Stoyan D.: Comparison Methods for Queues and Other Stochastic Models. John
Wiley & Sons, Berlin, Germany, (1983)
36. Truffet L.: Reduction Technique For Discrete Time Markov Chains on Totally
Ordered State Space Using Stochastic Comparisons. In: Journal of Applied Prob-
ability, V37 N3 (2000)
37. Uysal E., Dayar T.: Iterative methods based on splittings for stochastic automata
networks. In: European Journal of Operational Research, V 110 (1998) 166–186
38. Van Dijk N.: Error bound analysis for queueing networks. In: Performance 96
Tutorials, Lausanne, (1996)
Dynamic Scheduling via Polymatroid
Optimization
David D. Yao
Columbia University, New York, NY 10027, USA,

[email protected],
http://www.ieor.columbia.edu/~yao
Abstract. Dynamic scheduling of multi-class jobs in queueing systems

has wide ranging applications, but in general is a very difficult control
problem. Here we focus on a class of systems for which conservation laws
hold. Consequently, the performance space becomes a polymatroid — a
polytope with a matroid-like structure, with all the vertices correspond-
ing to the performance under priority rules, and all the vertices are easily
identified. This structure translates the optimal control problem to an
optimization problem, which, under a linear objective, becomes a special
linear program; and the optimal schedule is a priority rule. In a more
general setting, conservation laws extend to so-called generalized conser-
vation laws, under which the performance space becomes more involved;
however, the basic structure that ensures the optimality of priority rules
remains intact. This tutorial provides an overview to the subject, fo-
cusing on the main ideas, basic mathematical facts, and computational
implications.
1 Polymatroid
1.1 Equivalent Definitions and Properties
We start with three equivalent definitions of a polymatroid. Definition 1 is the
most standard one; Definition 2 will later motivate the definition for EP;
Definition 3 provides a contrast against the structure of the EP in §4 (refer
to Definition 7).
Throughout, E = {1, ..., n} is a finite set; Ac denotes the complement of set
A: Ac = E \ A; and the terms, “increasing” and “decreasing” are used in the
non-strict sense, meaning “non-decreasing” and “non-increasing”, respectively.
Definition 1. (Welsh [47], Chapter 18) The following polytope
\[ \mathcal{P}(f) = \Big\{ x \ge 0 : \sum_{i \in A} x_i \le f(A),\; A \subseteq E \Big\} \qquad (1) \]
is termed a polymatroid if the function $f : 2^E \to \mathbb{R}_+$ satisfies the
following properties:

(i) (normalized) f (∅) = 0;

(ii) (increasing) if A ⊆ B ⊆ E, then f (A) ≤ f (B);
(iii) (submodular) if A, B ⊆ E, then f (A) + f (B) ≥ f (A ∪ B) + f (A ∩ B).
In matroid parlance, a function f that satisfies the above properties is termed
a “rank function.” Also note that a companion to submodularity is supermod-
ularity, defined as when the inequality in (iii) holds in the opposite direction
(≤).
We now present the second definition for polymatroid. Given a set function
$f : 2^E \to \mathbb{R}_+$, with f(∅) = 0, and a permutation π of {1, 2, · · · , n},
the elements of the set E, we define a vector $x^\pi$ with the following
components (to simplify notation, $x^\pi_i$ below is understood to be
$x^\pi_{\pi_i}$):
\[ \begin{aligned}
x^\pi_1 &= f(\{\pi_1\}) \\
x^\pi_2 &= f(\{\pi_1, \pi_2\}) - x^\pi_1 = f(\{\pi_1, \pi_2\}) - f(\{\pi_1\}) \\
&\;\;\vdots \\
x^\pi_n &= f(\{\pi_1, \pi_2, \dots, \pi_n\}) - f(\{\pi_1, \pi_2, \dots, \pi_{n-1}\})
\end{aligned} \]
xπ is termed a “vertex” of the polytope P(f ) in (1). Note, however, that this
terminology could be misleading, since a priori there is no guarantee that xπ
necessarily belongs to the polytope, since we simply do not know, as yet, whether
or not xπ defined as above satisfies the set of inequalities that define P(f ) in
(1). In fact, this is the key point in the second definition of polymatroid below.
Definition 2. P(f) of (1) is a polymatroid if $x^\pi \in$ P(f) for every permutation π.
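Definition 2 can be tested by brute force on a small ground set. The Python
sketch below builds the vector attached to a permutation and checks every
inequality of (1). The two set functions are hypothetical examples: f is a
capped weighted sum, which is normalized, increasing and submodular, while g,
the squared cardinality, is supermodular and fails the test.

```python
from itertools import combinations, permutations


def vertex(f, perm):
    """Vector x^pi built from marginal values along the permutation."""
    x = {}
    prefix = frozenset()
    for elem in perm:
        nxt = prefix | {elem}
        x[elem] = f(nxt) - f(prefix)
        prefix = nxt
    return x


def in_polytope(f, x, ground, tol=1e-12):
    """Check x >= 0 and sum_{i in A} x_i <= f(A) for every non-empty A."""
    if any(v < -tol for v in x.values()):
        return False
    for size in range(1, len(ground) + 1):
        for A in combinations(ground, size):
            if sum(x[i] for i in A) > f(frozenset(A)) + tol:
                return False
    return True


weights = {1: 3, 2: 2, 3: 2}          # hypothetical data


def f(A):
    return min(sum(weights[i] for i in A), 5)


def g(A):
    return len(A) ** 2


ground = (1, 2, 3)
print(all(in_polytope(f, vertex(f, perm), ground) for perm in permutations(ground)))
print(in_polytope(g, vertex(g, (1, 2, 3)), ground))
```

The enumeration of all subsets is exponential in n, so this is only meant as a
check on toy instances.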
Here is a third definition.
Definition 3. P(f) of (1) is a polymatroid if for any A ⊂ B ⊆ E, there exists
a point x ∈ P(f), such that
\[ \sum_{i \in A} x_i = f(A) \quad \text{and} \quad \sum_{i \in B} x_i = f(B). \]
Below we show the three definitions are equivalent.

Theorem 1. The above three definitions for polymatroid are equivalent.
Proof. ( Definition 1 =⇒ Definition 2 )
That xπi ≥ 0 for all i follows directly from the increasing property of f .
For any A ⊆ E and πi ∈ A, since f is submodular, we have
f (A ∩ {π1 , · · · , πi }) + f ({π1 , · · · , πi−1 })
≥ f (A ∩ {π1 , · · · , πi−1 }) + f ((A ∩ {π1 , · · · , πi }) ∪ {π1 , · · · , πi−1 })
= f (A ∩ {π1 , · · · , πi−1 }) + f ({π1 , · · · , πi }),
which implies
f ({π1 , · · · , πi }) − f ({π1 , · · · , πi−1 })
≤ f (A ∩ {π1 , · · · , πi }) − f (A ∩ {π1 , · · · , πi−1 }).
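Summing over πi ∈ A, we have
\[ \sum_{\pi_i \in A} x^\pi_{\pi_i}
= \sum_{\pi_i \in A} \big[ f(\{\pi_1, \dots, \pi_i\}) - f(\{\pi_1, \dots, \pi_{i-1}\}) \big]
\le \sum_{\pi_i \in A} \big[ f(A \cap \{\pi_1, \dots, \pi_i\}) - f(A \cap \{\pi_1, \dots, \pi_{i-1}\}) \big]
= f(A). \]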
Hence, xπ ∈ P(f ), and Definition 2 follows.

(Definition 2 =⇒ Definition 3)
For any given A ⊂ B ⊆ E, from Definition 2, it suffices to pick a vertex
xπ , such that its first |A| components constitute the set A, and its first |B|
components constitute the set B.
( Definition 3 =⇒ Definition 1 )
Taking A = ∅ in Definition 3 yields f(∅) = 0. Monotonicity is trivial, since
$x_i \ge 0$. For submodularity, take any A, B ⊆ E, A ≠ B; then there exists
x ∈ P(f) such that
$$\sum_{i \in A \cup B} x_i = f(A \cup B) \quad \text{and} \quad \sum_{i \in A \cap B} x_i = f(A \cap B),$$
since A ∩ B ⊂ A ∪ B. Therefore,
$$f(A \cup B) + f(A \cap B) = \sum_{i \in A \cup B} x_i + \sum_{i \in A \cap B} x_i = \sum_{i \in A} x_i + \sum_{i \in B} x_i \le f(A) + f(B),$$
where the inequality follows from x ∈ P(f ).
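The equivalence in Theorem 1 can be checked by brute force on a small ground set. The sketch below is illustrative only: the helper names (`subsets`, `is_submodular`, `vertex`, `in_polytope`) and the two test functions are assumptions made for this example, not part of the text. It builds every vertex from the marginal increments of f and tests membership in P(f).

```python
from itertools import combinations, permutations
from math import sqrt

def subsets(E):
    """All subsets of E as frozensets, including the empty set."""
    E = list(E)
    return [frozenset(c) for r in range(len(E) + 1) for c in combinations(E, r)]

def is_submodular(f, E, tol=1e-12):
    """Check f(A) + f(B) >= f(A | B) + f(A & B) for all pairs A, B."""
    S = subsets(E)
    return all(f(A) + f(B) + tol >= f(A | B) + f(A & B) for A in S for B in S)

def vertex(f, perm):
    """x^pi: marginal increments of f along the permutation perm."""
    x, prefix = {}, frozenset()
    for j in perm:
        x[j] = f(prefix | {j}) - f(prefix)
        prefix = prefix | {j}
    return x

def in_polytope(f, x, E, tol=1e-12):
    """Membership in P(f): x >= 0 and x(A) <= f(A) for every A."""
    return (all(v >= -tol for v in x.values())
            and all(sum(x[i] for i in A) <= f(A) + tol for A in subsets(E)))

def all_vertices_feasible(f, E):
    """Definition 2: every x^pi lies in P(f)."""
    return all(in_polytope(f, vertex(f, p), E) for p in permutations(E))

E = (1, 2, 3)
w = {1: 1.0, 2: 2.0, 3: 4.0}                 # hypothetical weights
f = lambda A: sqrt(sum(w[i] for i in A))     # concave of a modular function
g = lambda A: len(A) ** 2                    # increasing but supermodular
```

Here `f` is normalized, increasing and submodular, so Definitions 1 and 2 both hold; `g` fails submodularity, and correspondingly some vertex leaves the polytope.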
1.2 Optimization
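Here we consider the optimization problem of maximizing a linear function over
the polymatroid P(f):
$$\begin{aligned}
(\mathrm{P}) \quad \max\ & \sum_{i \in E} c_i x_i \\
\text{s.t.}\ & \sum_{i \in A} x_i \le f(A), \quad \text{for all } A \subseteq E, \\
& x_i \ge 0, \quad \text{for all } i \in E.
\end{aligned}$$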
Assume
c1 ≥ c2 ≥ · · · ≥ cn ≥ 0, (2)
without loss of generality, since any negative ci clearly results in the correspond-
ing xi = 0. Let π = (1, 2, · · · , n). Then, we claim that the vertex xπ in Definition
2 is the optimal solution to (P).
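To verify the claim, we start with writing down the dual problem as follows:
$$\begin{aligned}
(\mathrm{D}) \quad \min\ & \sum_{A \subseteq E} y_A f(A) \\
\text{s.t.}\ & \sum_{A \ni i} y_A \ge c_i, \quad \text{for all } i \in E, \\
& y_A \ge 0, \quad \text{for all } A \subseteq E.
\end{aligned}$$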
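Define $y^\pi$, a candidate dual solution, componentwise as follows:
$$\begin{aligned}
y^\pi_{\{1\}} &= c_1 - c_2, \\
y^\pi_{\{1,2\}} &= c_2 - c_3, \\
&\ \vdots \\
y^\pi_{\{1,\ldots,n-1\}} &= c_{n-1} - c_n, \\
y^\pi_{\{1,\ldots,n\}} &= c_n;
\end{aligned}$$
and set $y^\pi_A = 0$, for all other A ⊆ E.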
Now the claimed optimality follows from
(1) primal feasibility: xπ is a vertex of the polymatroid P(f ), and hence is
feasible by definition (refer to Definition 2);
(2) dual feasibility: that y π is feasible is easily checked (in particular, non-
negativity follows from (2));
(3) complementary slackness: also easily checked; in particular, the n binding
constraints in (P) that define the vertex $x^\pi$ correspond to the n components
of $y^\pi$ listed above, which are the only ones that can be non-zero (to be
precise, they need not all be non-zero).
It is also easy to verify that the primal and the dual objectives are equal: letting
$c_{n+1} := 0$, we have
$$\sum_{i \in E} c_i x^\pi_i = \sum_{i \in E} c_i \big[ f(\{1, \ldots, i\}) - f(\{1, \ldots, i-1\}) \big]
= \sum_{i=1}^{n} (c_i - c_{i+1}) f(\{1, \ldots, i\}) = \sum_{A \subseteq E} y^\pi_A f(A).$$
To summarize, $x^\pi$ is optimal for (P) and $y^\pi$ is optimal for (D). It is important
to note that
(a) Primal feasibility is always satisfied, by definition of the polymatroid.
(b) It is the dual feasibility that determines the permutation π, which, by way
of complementary slackness, points to a vertex of P(f ) that is optimal.
More specifically, the sum of the dual variables yields the cost coefficients:
$$y^\pi_{\{1,\ldots,i\}} + \cdots + y^\pi_{\{1,\ldots,n\}} = c_i, \quad i = 1, \ldots, n; \qquad (3)$$
the order of which [cf. (2)] decides the permutation π.
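The primal-dual argument above amounts to a greedy procedure: sort the classes by decreasing cost coefficient, read off the vertex from the marginal increments of f, and place the dual weights on the nested prefix sets. The following is a minimal sketch under that reading; the function name `greedy_primal_dual` and the small rank-like function used as f are assumptions chosen for illustration.

```python
def greedy_primal_dual(c, f):
    """Greedy solution of (P) and the matching dual solution of (D).

    c: dict mapping class -> nonnegative cost coefficient.
    f: set function on frozensets, assumed normalized, increasing, submodular.
    """
    order = sorted(c, key=lambda i: -c[i])        # permutation pi: decreasing c
    x, y, prefix = {}, {}, frozenset()
    for k, i in enumerate(order):
        nxt = prefix | {i}
        x[i] = f(nxt) - f(prefix)                  # component of the vertex x^pi
        c_next = c[order[k + 1]] if k + 1 < len(order) else 0
        y[nxt] = c[i] - c_next                     # dual weight on the prefix set
        prefix = nxt
    primal = sum(c[i] * x[i] for i in order)
    dual = sum(y[A] * f(A) for A in y)
    return order, x, y, primal, dual

f = lambda A: min(len(A), 2)                       # hypothetical submodular f
c = {1: 5, 2: 3, 3: 1}
order, x, y, primal, dual = greedy_primal_dual(c, f)
```

For this instance the two objective values coincide, and summing the dual weights over the prefixes containing class i returns c_i, as in (3).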

2 Conservation Laws
2.1 Polymatroid Structure
To relate to the last section, here E = {1, 2, ..., n} denotes the set of all job
classes, and x denotes the vector of performance measures of interest. For in-
stance, xi is the (long-run) average delay or throughput of job class i.
The conservation laws defined below were first formalized in Shanthikumar
and Yao [39], where the connection to polymatroid was made. In [39], as well as
subsequent papers in the literature, these laws are termed “strong conservation
laws.” Here, we shall simply refer to these as conservation laws.
Verbally, conservation laws can be summarized into the following two state-
ments:
(i) the total performance (i.e., the sum) over all job classes in E is invariant
under any admissible policy;
(ii) the total performance over any given subset, A ⊂ E, of job classes is mini-
mized (or maximized) by offering priority to job classes in this subset over
all other classes.
As a simple example, consider a system of two job classes. Each job (of either
class) brings a certain amount of “work” (service requirement) to the system.
Suppose the server serves (i.e., depletes work) at unit rate. Then it is not difficult
to see that (i) the total amount of work, summing over all jobs of both classes
that are present in the system, will remain invariant regardless of the actual
policy that schedules the server, as long as it is non-idling; and (ii) if class 1
jobs are given preemptive priority over class 2 jobs, then the amount of work
in system summing over class 1 jobs is minimized, namely, it cannot be further
reduced by any other admissible policy.
We now state the formal definition of conservation laws. For any A ⊆ E, denote
by |A| the cardinality of A. Let $\mathcal{A}$ denote the space of all admissible policies,
i.e., all non-anticipative and non-idling policies (see more details below), and $x^u$
the performance vector under an admissible policy $u \in \mathcal{A}$. As before, let π denote
a permutation of the integers {1, 2, ..., n}. In particular, $\pi = (\pi_1, \ldots, \pi_n)$ denotes
a priority rule, which is admissible, and in which class $\pi_1$ jobs are assigned the
highest priority, and class $\pi_n$ jobs, the lowest priority.
Definition 4. (Conservation Laws) The performance vector x is said to satisfy
conservation laws, if there exists a set function b (or respectively f): $2^E \to \mathbb{R}_+$,
satisfying
$$b(A) = \sum_{i \in A} x^\pi_i, \quad \forall \pi : \{\pi_1, \ldots, \pi_{|A|}\} = A, \ \forall A \subseteq E; \qquad (4)$$
or respectively,
$$f(A) = \sum_{i \in A} x^\pi_i, \quad \forall \pi : \{\pi_1, \ldots, \pi_{|A|}\} = A, \ \forall A \subseteq E; \qquad (5)$$
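(when A = ∅, by definition, b(∅) = f(∅) = 0); such that for all admissible
policies $u \in \mathcal{A}$ the following is satisfied:
$$\sum_{i \in A} x^u_i \ge b(A), \ \forall A \subset E; \qquad \sum_{i \in E} x^u_i = b(E); \qquad (6)$$
or respectively,
$$\sum_{i \in A} x^u_i \le f(A), \ \forall A \subset E; \qquad \sum_{i \in E} x^u_i = f(E). \qquad (7)$$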
Note that whether the function b or the function f applies in a particular
context is determined by whether the performance in question is minimized
or maximized by the priority rules. (For instance, b applies to delay, and f
applies to throughput.) It is important to note that this minimal (or maximal)
performance is required to be independent of the priority assignment among
the classes within the subset A on the one hand and the priority assignment
among the classes within the subset E \ A on the other hand, as long as any
class in A has priority over any class in E \ A. This requirement is reflected
in the qualifications imposed on π in defining b(A) and f (A) in (4) and (5).
In particular, the definition requires that b(A) and f (A) be respectively, the
minimal and the maximal total performance summing over all job classes in the
subset A that are given priority over all the other classes.
For the time being, ignore the b part of Definition 4. It is clear that when x
satisfies the conservation laws, the performance space, as defined by the polytope
in (7), is a polymatroid. This is because following (5) and (7), all the vertices
$x^\pi$ indeed, by definition, belong to the polytope. In fact, the polytope in (7)
is the polymatroid P(f) of (1) restricted to the hyperplane $\sum_{i \in E} x_i = f(E)$
(instead of the half-space $\sum_{i \in E} x_i \le f(E)$), and is hence termed the base of the
polymatroid P(f), denoted B(f) below. Furthermore, following Theorem 1, we
know that when x satisfies conservation laws, the function f(·) as defined in (5)
is increasing and submodular.
Next, consider the b part of Definition 4. Subtracting the inequality constraint
for E \ A from the equality constraint in (6) gives
$\sum_{i \in A} x^u_i \le b(E) - b(E \setminus A)$, so these constraints take the same form
as in (7) upon letting f(A) := b(E) − b(E \ A), or equivalently,
b(A) := b(E) − f(E \ A). Hence, the polytope in (6) is also (the base of) a
polymatroid. Furthermore, the increasingness and submodularity of f translate
into the increasingness and supermodularity of b.
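The last claim can be spelled out in two short steps. The derivation below is a sketch that assumes only the relation b(A) = b(E) − f(E \ A) stated above, together with f being increasing and submodular.

```latex
% Monotonicity: A \subseteq B implies E \setminus B \subseteq E \setminus A, hence
b(A) = b(E) - f(E \setminus A) \le b(E) - f(E \setminus B) = b(B).

% Supermodularity: apply submodularity of f to E \setminus A and E \setminus B.
\begin{aligned}
b(A) + b(B) &= 2\,b(E) - \big[ f(E \setminus A) + f(E \setminus B) \big] \\
  &\le 2\,b(E) - \big[ f\big((E \setminus A) \cup (E \setminus B)\big)
       + f\big((E \setminus A) \cap (E \setminus B)\big) \big] \\
  &= 2\,b(E) - f\big(E \setminus (A \cap B)\big) - f\big(E \setminus (A \cup B)\big) \\
  &= b(A \cap B) + b(A \cup B).
\end{aligned}
```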
To sum up the above discussion, we have
Theorem 2. If the performance vector x satisfies conservation laws, then its
feasible space (i.e., the achievable performance region) constitutes the base
polytope of a polymatroid, B(f) or B(b), of which the vertices correspond to the
priority rules. Furthermore, the functions f and b, which are the performance
functions corresponding to priority rules, are, respectively, increasing and sub-
modular, and increasing and supermodular.
2.2 Examples
Consider a queueing system with n different job classes which are denoted by the
set E. Let u be the control or scheduling rule that governs the order of service
among different classes of jobs. Let A denote the class of admissible controls,
which are required to be non-idling and non-anticipative. That is, no server is
allowed to be idle when there are jobs waiting to be served, and the control is
only allowed to make use of past history and current state of the system. Neither
can an admissible control affect the arrival processes or the service requirements
of the jobs. Otherwise we impose no further restrictions on the system. For
instance, the arrival processes and the service requirements of the jobs can be
arbitrary. Indeed, since the control cannot affect the arrival processes and the
service requirements, all the arrival and service data can be viewed as generated
a priori following any given (joint) distribution and with any given dependence
relations. We allow multiple servers, and multiple stages (e.g., tandem queues
or networks of queues). We also allow the control to be either preemptive or
non-preemptive. (Some restrictions will be imposed on individual systems to be
studied below.)
Let $x^u_i$ be a performance measure of class i (i ∈ E) jobs under control u.
This need not be a steady-state quantity or an expectation; it can very well
be a sample-path realization over a finite time interval, for instance, the delay
(sojourn time) of the first m class i jobs, the number of class i jobs in the system
at time t, or the number of class i job completions by time t. Let $x^u := (x^u_i)_{i \in E}$
be the performance vector.
For any given permutation π ∈ Π, let xπ denote the performance vector
under a priority scheduling rule that assigns priority to the job classes according
to the permutation π, i.e., class π1 has the highest priority, ..., class πn has the
lowest priority. Clearly any such priority rule belongs to the admissible class.
In all the queueing systems studied below, the service requirements of the
jobs are mutually independent, and are also independent of the arrival processes.
(One exception to these independence requirements is Example 1 below, where
these independence assumptions are not needed.) No independence assumption,
however, is required for the arrival processes, which can be arbitrary. When a
performance vector satisfies conservation laws, whether its state space is B(b)
(6) or B(f ) (7) depends on whether the performance of a given subset of job
classes is minimized or maximized by giving priority to this subset. This is often
immediately evident from the context.
Example 1 Consider a G/G/1 system that allows preemption. For i ∈ E, let
Vi (t) denote the amount of work (processing requirement) in the system at time
t due to jobs of class i. (Note that for any given t, Vi (t) is a random quantity,
corresponding to some sample realization of the work-load process.) Then it is
easily verified that for any t, x := [Vi (t)]i∈E satisfies conservation laws.
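Example 1 can be illustrated on a single sample path. The sketch below is an assumption-laden toy, not a general simulator: the arrival list is made up, the server works at unit rate, and the helpers `deplete` and `work_at` are hypothetical names. It tracks the per-class work under a preemptive, non-idling priority rule and compares the two priority orderings of a two-class system.

```python
def deplete(V, priority, dt):
    """Serve at unit rate for dt time units, highest-priority class first."""
    for cls in priority:
        served = min(V[cls], dt)
        V[cls] -= served
        dt -= served

def work_at(arrivals, priority, t):
    """Per-class work V_i(t) under a preemptive, non-idling priority rule.

    arrivals: list of (arrival_time, class, work);
    priority: tuple of classes, highest priority first.
    """
    V = {cls: 0.0 for cls in priority}
    now = 0.0
    for when, cls, work in sorted(arrivals):
        if when > t:
            break
        deplete(V, priority, when - now)   # work served since the last event
        V[cls] += work                     # the new job adds its work
        now = when
    deplete(V, priority, t - now)
    return V

# Hypothetical sample path: (arrival time, class, work brought).
arrivals = [(0.0, 1, 2.0), (0.0, 2, 3.0), (1.0, 1, 1.5), (2.0, 2, 1.0)]
v12 = work_at(arrivals, (1, 2), t=3.0)     # class 1 has priority
v21 = work_at(arrivals, (2, 1), t=3.0)     # class 2 has priority
```

On this path the total work at t = 3 is the same under both rules, while the work of each class is smallest when that class has priority, which is exactly the pair of statements (i) and (ii) of the conservation laws.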
Example 2 Continue with the last example. For all i ∈ E, let $N_i(t)$ be the
number of class i jobs in the system at time t. When the service times follow
exponential distributions, with mean $1/\mu_i$ for class i jobs, we have
$E N_i(t) = \mu_i E V_i(t)$. Let $W_i$ be the steady-state sojourn time in system for class i jobs.
From Little's Law we have $E W_i = E N_i / \lambda_i = E V_i / \rho_i$, where $\lambda_i$ is the arrival rate
of class i jobs, $\rho_i := \lambda_i / \mu_i$, and $N_i$ and $V_i$ are the steady-state counterparts of $N_i(t)$
and $V_i(t)$, respectively. Hence, the following x also satisfies conservation laws:
(i) for any given t, $x := [E N_i(t) / \mu_i]_{i \in E}$;
(ii) $x := [\rho_i E W_i]_{i \in E}$.
Example 3 In a G/M/c (c > 1) system that allows preemption, if all job classes
follow the same exponential service-time distribution (with mean 1/μ), then it
is easy to verify that for any t, x := [ENi (t)]i∈E satisfies conservation laws. In
this case, EVi (t) = ENi (t)/μ and EWi = ENi /λi . Hence, x defined as follows
satisfies conservation laws:
(i) for any given t, x := [ENi (t)]i∈E , x := [EVi (t)]i∈E ;
(ii) x := [λi EWi ]i∈E .
(If the control is restricted to be non-preemptive, the results here still hold true.
See Example 6 below.)
Example 4 The results in Example 3 still hold when the system is a network
of queues, provided all job classes follow the same exponential service-time dis-
tribution and the same routing probabilities at each node (service-time distri-
butions and routing probabilities can, however, be node dependent); (external)
job arrival processes can be arbitrary and can be different among the classes.
Example 5 Another variation of Example 3 is the queue, G/M/c/K, where
K ≥ c denotes the upper limit on the total number of jobs allowed in the system
at any time. In this system, higher priority jobs can preempt lower priority jobs
not only in service but also in occupancy. That is, whenever a higher priority
job finds (on its arrival) a fully occupied system, a lower priority job within the
system (if any) will be removed from the system and its occupancy given to the
higher priority job. If there is no lower priority job, then the arrived job is rejected
and lost. As in Example 3, all jobs follow the same exponential service-time
distribution. Let Ri (t) and Di (t) (i ∈ E) denote, respectively, the (cumulated)
number of rejected/removed class i jobs and the (cumulated) number of class i
departures (service completions) up to time t. Then, for any given t, (i) x :=
[ERi (t)]i∈E and (ii) x := [EDi (t)]i∈E satisfy conservation laws.
We next turn to considering cases where the admissible controls are restricted
to be non-preemptive.
Example 6 Consider the G/G/c system, c ≥ 1. If all job classes follow the same
service-time distribution, then it is easy to see that the scheduling of the servers
will not affect the departure epochs of jobs (in a pathwise sense); although
it will affect the identity (class) of the departing jobs at those epochs. (See
Shanthikumar and Sumita [38], §2, for the G/G/1 case; the results there also
hold true for the G/G/c case.) Hence, for any given t, x := [Ni (t)]i∈E satisfies
conservation laws.
Example 7 Comparing the above with Example 3, we know that the results
there also hold for non-preemptive controls. However, in contrast to the extension
of Example 3 to the network case in Example 4, the above can only be extended
to queues in tandem, where overtaking is excluded. Specifically, the result in
Example 6 also holds for a series of G/G/c queues in tandem, where at each
node all job classes have the same service-time distribution, which, however, can
be node dependent. External job arrival processes can be arbitrary and can be
different among classes. The number of servers can also be node dependent.
Example 8 With non-preemptive control, there is a special case for the G/G/1
system with only two job classes (n = 2) which may follow different service-time
distributions: for any given t, x := [Vi (t)]i∈E satisfies conservation laws.
For steady-state measures, from standard results in GI/G/1 queues (see,
e.g., Asmussen [1], Chapter VIII, Proposition 3.4), we have
$$E V_i = \mu_i^{-1} [E N_i - \rho_i] + \rho_i \mu_i m_i / 2$$
and
$$E V_i = \rho_i [E W_i - \mu_i^{-1} + \mu_i m_i / 2],$$
where $m_i$ is the second moment of the service time of class i jobs. Hence, following
the above, we know that $x = [E N_i / \mu_i]_{i \in E}$ and $x = [\rho_i E W_i]_{i \in E}$ also satisfy
conservation laws.
Example 9 Two more examples that satisfy conservation laws:
(i) for the G/G/1 system with preemption,
$$x := \Big[ \int_0^t \exp(-\alpha\tau)\, V_i(\tau)\, d\tau \Big]_{i \in E};$$
(ii) for the G/M/1 system with preemption,
$$x := \Big[ E \int_0^t \exp(-\alpha\tau)\, N_i(\tau)\, d\tau \,/\, \mu_i \Big]_{i \in E},$$
where in both (i) and (ii) α > 0 is a discount rate, and t is any given time.
Finally, note that in all the above examples, with the exception of Exam-
ple 5, whenever [ENi (t)]i∈E satisfies conservation laws, [EDi (t)]i∈E also satisfies
conservation laws, since in a no-loss system the number of departures is the dif-
ference between the number of arrivals (which is independent of the control) and
the number in system.
Evidently, based on the above discussions, the state space of the performance
vectors in each of the examples above is a polymatroid.
2.3 Optimal Scheduling
Theorem 3. Consider the optimal control (scheduling) of n job classes in the
set E:
$$\max_{u \in \mathcal{A}} \sum_{i \in E} c_i x^u_i \quad \Big[ \text{or } \min_{u \in \mathcal{A}} \sum_{i \in E} c_i x^u_i \Big],$$
where x is a performance measure that satisfies conservation laws, and the cost
coefficients $c_i$ (i ∈ E) satisfy, without loss of generality, the ordering in (2).
Then, this optimal control problem can be solved by solving the following linear
program (LP):
$$\max_{x \in B(f)} \sum_{i \in E} c_i x_i \quad \Big[ \text{or } \min_{x \in B(b)} \sum_{i \in E} c_i x_i \Big].$$
The optimal solution to this LP is simply the vertex $x^\pi \in B(f)$, with π = (1, ..., n)
being the permutation corresponding to the decreasing order of the cost coefficients
in (2). And the optimal control policy is the corresponding priority rule,
which assigns the highest priority to class 1 jobs, and the lowest priority to class
n jobs.
Example 10 (cμ-rule) Consider one of the performance vectors in Example 2,
$x := [E(N_i) / \mu_i]_{i \in E}$, where $N_i$ is the number of jobs of class i in the system (or,
“inventory”) in steady state, and $\mu_i$ is the service rate. Suppose our objective is
to minimize the total inventory cost,
$$\min \sum_{i \in E} c_i E(N_i),$$
where $c_i$ is the inventory holding cost rate for class i jobs. We then rewrite this
objective as
$$\min \sum_{i \in E} c_i \mu_i x_i.$$
(Note that (Ni )i∈E does not satisfy conservation laws; (xi )i∈E does.) Then, we
know from the above theorem that the optimal policy is a priority rule, with the
priorities assigned according to the ci μi values — the larger the value, the higher
the priority. This is what is known as the “cμ-rule”. When all jobs have the same
cost rate, the priorities follow the μi values, i.e., the faster the processing rate (or,
the shorter the processing time), the higher the priority, which is the so-called
SPT (shortest processing time) rule.
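The cμ-rule can be checked numerically in a setting where closed forms are available. The sketch below assumes a preemptive-resume M/M/1 priority queue and uses the standard mean sojourn time formula for that model (an assumption brought in for the example, not taken from the text); the rates and costs are made up. It enumerates all priority orderings, confirms that $\sum_i E(N_i)/\mu_i$ is the same for each of them, and compares the best ordering with the one given by decreasing $c_i \mu_i$.

```python
from itertools import permutations

def mean_numbers(lam, mu, order):
    """E[N_i] for each class in a preemptive-resume M/M/1 priority queue.

    order lists the classes from highest to lowest priority.
    """
    EN, sigma_prev, R = {}, 0.0, 0.0
    for i in order:
        sigma = sigma_prev + lam[i] / mu[i]        # load of classes served so far
        R += lam[i] / mu[i] ** 2                   # lam * E[S^2] / 2, exponential S
        T = (1 / mu[i]) / (1 - sigma_prev) + R / ((1 - sigma_prev) * (1 - sigma))
        EN[i] = lam[i] * T                         # Little's law
        sigma_prev = sigma
    return EN

# Hypothetical data: arrival rates, service rates, holding cost rates.
lam = [0.2, 0.1, 0.4]
mu = [1.0, 0.5, 2.0]
c = [1.0, 3.0, 2.0]
classes = range(len(lam))

def cost(order):
    EN = mean_numbers(lam, mu, order)
    return sum(c[i] * EN[i] for i in order)

cmu_order = tuple(sorted(classes, key=lambda i: -c[i] * mu[i]))
best_order = min(permutations(classes), key=cost)
work_totals = [sum(n / mu[i] for i, n in mean_numbers(lam, mu, p).items())
               for p in permutations(classes)]
```

The brute-force minimizer agrees with the cμ ordering, and the invariant total is the quantity whose conservation makes the LP argument of Theorem 3 applicable.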
The connection between conservation laws and polymatroid, as specified in
Theorem 2, guarantees that any admissible control will yield a performance
vector that belongs to the polymatroid. Furthermore, the converse is also true:
any performance vector that belongs to the polymatroid can be realized by an
admissible control. This is because B(f) (or B(b)) is a convex polytope, so any
vector in the performance space can be expressed as a convex combination of
the vertices. Following Carathéodory's theorem (refer to, e.g., Chvátal [8]), any
vector in the performance space can be expressed as a convex combination of no
more than n + 1 vertices. In other words, any performance vector can be realized
by a control that is a randomization of at most n + 1 priority rules, with the
convex combination coefficients being the probabilities for the randomization.
In terms of implementation, however, randomization can be impractical.
First, computationally, there is no easy way to derive the randomization co-
efficients. Second, in order to have an unbiased implementation, randomization
will have to be applied at the beginning of each regenerative cycle, e.g., a busy
period. In heavy traffic, busy periods could be very long, making implementation
extremely difficult, and also creating large variance of the performance.
In fact, one can do better than randomization. It is known (e.g., Federgruen
and Groenevelt [16]) that any interior point of the performance space can be
realized by a particular dynamic scheduling policy, due originally to Kleinrock
[30,31], in which the priority index of each job present in the system grows
in proportion to the time it has spent waiting in queue, and the server always
serves the job that has the highest index. This scheduling policy is completely
specified by the proportionality coefficients associated with the job classes, which,
in turn, are easily determined by the performance vector (provided it is in the
interior of the performance space). In terms of practical implementation, there
are several versions of this scheduling policy; refer to [18,19].
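The selection step of such a delay-dependent priority policy is easy to state in code. The following is only a sketch of that one step, under the assumption that each class k carries a growth coefficient `b[k]` and that a waiting job's index is `b[k]` times its time in queue; the function name `next_job` and the data layout are hypothetical, and how the coefficients are derived from a target performance vector is not shown.

```python
def next_job(waiting, b, now):
    """Pick the waiting job with the highest delay-dependent priority index.

    waiting: list of (job_id, cls, arrival_time) for jobs in queue;
    b: dict mapping class -> growth coefficient of the priority index;
    now: current time. The index of a job is b[cls] * (now - arrival_time).
    """
    return max(waiting, key=lambda job: b[job[1]] * (now - job[2]))[0]

# Hypothetical queue content and coefficients.
waiting = [("a", 1, 0.0), ("b", 2, 3.0)]
b = {1: 1.0, 2: 4.0}
early_choice = next_job(waiting, b, now=3.5)   # "a": index 3.5 against 2.0
late_choice = next_job(waiting, b, now=5.0)    # "b": index 8.0 against 5.0
```

Setting one coefficient much larger than the others makes the rule approach a strict priority rule, while equal coefficients give first-come-first-served order; intermediate choices trace out intermediate performance vectors.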
3 Generalized Conservation Laws
3.1 Motivation and Definition
Although conservation laws apply to the many examples in the last section, there
are other interesting and important problems that do not fall into this category.
A primary class of such examples includes systems with feedback, i.e., jobs may
come back after service completion. For example, consider the so-called Klimov’s
problem: a multi-class M/G/1 queue in which jobs, after service completion, may
return and switch to another class, following a Bernoulli mechanism. Without
feedback, we know this is a special case of Example 1, and the work in system,
[Vi (t)]i∈E , satisfies conservation laws. With feedback, however, the conservation
laws as defined in Definition 4, need to be modified.
Specifically, with the possibility of feedback, the work of a particular job class,
say class i, should not only include the work associated with class i jobs that are
present in the system, it should also take into account the potential work that
will be generated by feedback jobs, which not only include class i jobs but also
all other classes that may feedback to become class i. With this modification,
the two intuitive principles of conservation laws listed at the beginning of §2.1
will apply.
To be concrete, let us paraphrase here the simple example at the beginning of
§2.1 with two job classes, allowing the additional feature of feedback. As before,
suppose the server serves at unit rate. Then it is not difficult to see that (i) the
total amount of potential work, summing over both classes, will remain invariant
regardless of the actual schedule that the server follows, as long as it is a non-
idling schedule; and (ii) if class 1 jobs are given (preemptive) priority over class 2
jobs, then the amount of potential work due to class 1 jobs is minimized, namely,
it cannot be further reduced by any other scheduling rule. And the same holds
for class 2 jobs, if given priority over class 1 jobs.
Another way to look at this example: Let T be the first time there are no class 1
jobs left in the system. Then, T is minimized by giving class 1 jobs (preemptive)
priority over class 2 jobs. In particular, T is no smaller than the potential work
of class 1 generated by class 1 jobs (only); T is equal to the latter if and only if
class 1 jobs are given priority over class 2 jobs.
Therefore, with this modification, the conservation laws in Definition 4 can
be generalized. The net effect, as will be demonstrated in the examples below,
is that the variables $x_i$ in Definition 4 will have to be multiplied with different
coefficients $a^A_i$ that depend on both the job classes (i) and the subsets (A). In
particular, when $x_i$ is, for instance, the average number of jobs of class i, $a^A_i$
denotes the rate of potential work of those classes in set A that is generated by
class i jobs.
We now state the formal definition of generalized conservation laws (GCL),
using the same notation wherever possible as in Definition 4.
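Definition 5. (Generalized Conservation Laws) The performance vector x is
said to satisfy generalized conservation laws (GCL), if there exists a set function
b (or respectively f): $2^E \to \mathbb{R}_+$, and a matrix $(a^S_i)_{i \in E,\, S \subseteq E}$ (which is in general
different for b and f, but we will not make this distinction below for notational
simplicity) satisfying:
$$a^S_i > 0, \ i \in S; \quad \text{and} \quad a^S_i = 0, \ i \notin S; \quad \forall S \subseteq E;$$
such that
$$b(A) = \sum_{i \in A} a^A_i x^\pi_i, \quad \forall \pi : \{\pi_1, \ldots, \pi_{|A|}\} = A, \ \forall A \subseteq E; \qquad (8)$$
or respectively,
$$f(A) = \sum_{i \in A} a^A_i x^\pi_i, \quad \forall \pi : \{\pi_1, \ldots, \pi_{|A|}\} = A, \ \forall A \subseteq E; \qquad (9)$$
such that for all admissible policies $u \in \mathcal{A}$ the following is satisfied:
$$\sum_{i \in A} a^A_i x^u_i \ge b(A), \ \forall A \subset E; \qquad \sum_{i \in E} a^E_i x^u_i = b(E); \qquad (10)$$
or respectively,
$$\sum_{i \in A} a^A_i x^u_i \le f(A), \ \forall A \subset E; \qquad \sum_{i \in E} a^E_i x^u_i = f(E). \qquad (11)$$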
It is obvious from the above definition that GCL reduces to the conservation
laws if $a^A_i = 1$ for all i ∈ A, and all A ⊆ E.
3.2 Examples
Example 11 (Klimov’s problem [32]) This concerns the optimal control of a
system in which a single server is available to serve n classes of jobs. Class i jobs
arrive according to a Poisson process with rate αi , which is independent of other
classes of jobs. The service times for class i jobs are independent and identically
distributed with mean μi . When the service of a class i job is completed, it either
returns to become a class j job, with probability $p_{ij}$, or leaves the system with
probability $1 - \sum_j p_{ij}$. Denote $\alpha = (\alpha_i)_{i \in E}$, $\mu = (\mu_i)_{i \in E}$, and $P = [p_{ij}]_{i,j \in E}$.
Consider the class of non-preemptive policies. The performance measure is
$x^u_i$ = long-run average number of class i jobs in system under policy u.
The objective is to find the optimal policy that minimizes $\sum_j c_j x^u_j$. Klimov
proved that a priority policy is optimal and gave a recursive procedure for ob-
taining the priority indices.
Tsoucas [42] showed that the performance space of Klimov’s problem is the
following polytope:
$$\Big\{ x \ge 0 : \sum_{i \in S} a^S_i x_i \ge b(S), \ S \subset E; \ \sum_{i \in E} a^E_i x_i = b(E) \Big\},$$
where the coefficients are given as $a^S_i = \lambda_i \beta^S_i$, with $\lambda = (\lambda_i)_{i \in E}$ and $\beta^S = (\beta^S_i)_{i \in S}$
obtained as follows:
$$\lambda = (I - P)^{-1} \alpha \quad \text{and} \quad \beta^S = (I - P_{SS})^{-1} \mu_S,$$
where $P_{SS}$ and $\mu_S$ are, respectively, the restriction of P and μ to the set S. Note
that here, $\lambda_i$ is the overall arrival rate of class i jobs (including both external
arrivals and feedback jobs), $\beta^S_i$ is the amount of potential work of the classes in
S generated by a class i job. (Hence, this potential work is generated at rate αi
in the system.) Summing over i ∈ S yields the total amount of potential work
of the classes in S (generated by the same set of jobs), which is minimized when
these jobs are given priority over other classes. This is the basic intuition as to
why x satisfies GCL.
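The coefficients in Klimov's problem come from two small linear systems, which the sketch below solves with exact fractions. Several things here are assumptions made for illustration: the helper names `solve` and `klimov_coefficients`, the two-class data, and the reading of the traffic equations as $\lambda_j = \alpha_j + \sum_i \lambda_i p_{ij}$ (that is, λ treated as a row vector in $\lambda = (I - P)^{-1}\alpha$). As in the example, $\mu_i$ denotes the mean service time of class i.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination over exact fractions."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def klimov_coefficients(alpha, mu, P, S):
    """Return {i: lambda_i * beta_i^S for i in S}."""
    n = len(alpha)
    # Traffic equations: lambda_j - sum_i lambda_i P[i][j] = alpha_j.
    lam = solve([[(1 if i == j else 0) - P[j][i] for j in range(n)]
                 for i in range(n)], alpha)
    S = sorted(S)
    # Potential work: beta_i - sum_{j in S} P[i][j] beta_j = mu_i, for i in S.
    beta = solve([[(1 if i == j else 0) - P[i][j] for j in S] for i in S],
                 [mu[i] for i in S])
    return {i: lam[i] * beta[k] for k, i in enumerate(S)}

# Hypothetical two-class instance: class 0 feeds back to class 1 w.p. 1/2.
alpha = [Fraction(1, 4), Fraction(1, 4)]
mu = [Fraction(1), Fraction(1, 2)]
P = [[0, Fraction(1, 2)], [0, 0]]
a_E = klimov_coefficients(alpha, mu, P, {0, 1})
a_0 = klimov_coefficients(alpha, mu, P, {0})
```

In this instance a class 0 job generates its own service plus, with probability 1/2, a class 1 service, so $\beta^E_0 = 5/4$ while $\beta^{\{0\}}_0 = 1$; the coefficient of the same class therefore changes with the subset, which is the feature that distinguishes GCL from the ordinary conservation laws.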
Example 12 (Branching bandit process) There are m projects at time 0. They
are of K classes, labeled k = 1, · · · , K. Each class k project can be in one of a
finite number of states, with Ek denoting the state space. Classifying different
project classes or projects of the same class but in different states as different
“classes,” we denote E = ∪k Ek = {1, · · · , n} as the set of all project classes. A
single server works on the projects one at a time. Each class i project keeps the
server busy for a duration of vi time units. Upon completion, the class i project
is replaced by Nij projects of class j. The server then has to decide which
next project to serve, following a scheduling rule (control) u. The collection
{(vi , Nij ), j ∈ E}, follows a general joint distribution, which is independent and
identically distributed for all i ∈ E.
Given S ⊆ E, the S-descendants of a project of class i ∈ S refer to all of its
immediate descendants that are of classes belonging to S, as well as the immediate
descendants of those descendants, and so on. (If a project in S transfers into
a class that is not in S, and later transfers back into a class in S, it will not be
considered as an S-descendant of the original project.) Given a class i project,
the union of the time intervals in which its S-descendants are being served is
called an (i, S) period. Let $T^S_i$ denote the length of an (i, S) period. It is the
“potential work” of the classes in the set S generated by the class i project. And
we use $T^S_m$ to denote the time until the system has completely cleared all classes
of projects in S, under a policy that gives priority to those classes in S
over other classes. Note, in particular, that $T^E_m$ represents the length of a busy
period.
In the discounted case, the expected reward associated with the control u is
$\sum_{i \in E} c_i x^u_i$, where
$$x^u_i = E_u \Big[ \int_0^\infty e^{-\alpha t} I^u_i(t)\, dt \Big],$$
α > 0 is the discount rate and
$$I^u_i(t) = \begin{cases} 1, & \text{if a class } i \text{ project is being served at time } t, \\ 0, & \text{otherwise.} \end{cases}$$
Bertsimas and Niño-Mora [3] showed that $x^u = (x^u_i)_{i \in E}$, as defined above,
satisfies the GCL, with coefficients
$$a^S_i = \frac{E\big[ \int_0^{T_i^{S^c}} e^{-\alpha t}\, dt \big]}{E\big[ \int_0^{v_i} e^{-\alpha t}\, dt \big]}, \quad i \in S \subseteq E,$$
and
$$b(S) = E\Big[ \int_0^{T_m^{E}} e^{-\alpha t}\, dt \Big] - E\Big[ \int_0^{T_m^{S^c}} e^{-\alpha t}\, dt \Big],$$
where $S^c := E \setminus S$.
Intuitively, the GCL here says that the time until all the $S^c$-descendants of all
the projects in S are served is minimized by giving project classes in $S^c$ priority
over those in S.
An undiscounted version is also available in [3]. (This includes Klimov’s problem,
the last example above, as a special case.) The criterion here is to minimize
the total expected cost incurred under control u during the first busy period (of
the server) [0, T], $\sum_{i \in E} c_i x^u_i$, with
$$x^u_i = E_u \Big[ \int_0^\infty t\, I^u_i(t)\, dt \Big].$$
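Following [3], $x^u$ satisfies GCL with coefficients
$$a^S_i = E[T_i^{S^c}] / E[v_i], \quad i \in S \subseteq E,$$
and
$$b(S) = \tfrac{1}{2} E\big[ (T_m^{S^c})^2 \big] - \tfrac{1}{2} E\big[ (T_m^{E})^2 \big] + \sum_{i \in S} b_i(S),$$
where
$$b_i(S) = \frac{E[v_i]\, E[v_i^2]}{2} \left( \frac{E[T_i^{S^c}]}{E[v_i]} - \frac{E[(T_i^{S^c})^2]}{E[v_i^2]} \right), \quad i \in S.$$
The intuition is similar to the discounted case.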
4 Extended Polymatroid
4.1 Equivalent Definitions
Recall the space of any performance measure that satisfies conservation laws is a
polymatroid. Analogously, one can ask what is the structure of the performance
space under GCL, i.e., what is the structure of the following polytopes:

$$EP(b) = \Big\{ x \ge 0 : \sum_{i \in S} a^S_i x_i \ge b(S), \ S \subseteq E \Big\}, \qquad (12)$$
$$EP(f) = \Big\{ x \ge 0 : \sum_{i \in S} a^S_i x_i \le f(S), \ S \subseteq E \Big\}. \qquad (13)$$
The most natural route to approach this issue appears to be mimicking Def-
inition 2 of polymatroid (and this is indeed the route taken in [3]). Similar
to the definition of xπ preceding Definition 2, here, given a permutation π (of
{ 1, 2, · · · , n }), we can generate a vertex xπ as follows.
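$$\begin{aligned}
x^\pi_{\pi_1} &= f(\{\pi_1\}) \,/\, a^{\{\pi_1\}}_{\pi_1}, \\
x^\pi_{\pi_2} &= \Big[ f(\{\pi_1, \pi_2\}) - a^{\{\pi_1, \pi_2\}}_{\pi_1} x^\pi_{\pi_1} \Big] \,/\, a^{\{\pi_1, \pi_2\}}_{\pi_2}, \\
&\ \vdots \\
x^\pi_{\pi_n} &= \Big[ f(\{\pi_1, \ldots, \pi_n\}) - \sum_{i=1}^{n-1} a^{\{\pi_1, \ldots, \pi_n\}}_{\pi_i} x^\pi_{\pi_i} \Big] \,/\, a^{\{\pi_1, \ldots, \pi_n\}}_{\pi_n}.
\end{aligned}$$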
Same as in the polymatroid case, we should emphasize here that as yet, $x^\pi$
does not necessarily belong to the polytope in (13). The vertices for EP(b) are
analogously generated, with f(·) replaced by b(·).
Definition 6. EP(f ) (respectively EP(b)) is an extended polymatroid (EP) if
xπ as generated above (respectively with b replacing f ) belongs to the polytope
EP(f ), (respectively EP(b)), for any permutation π.
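Definition 6 suggests a direct, if exponential, test on small instances: generate each candidate vertex by forward substitution in the triangular system and check it against every inequality of (13). The sketch below does this; the function names and the two test functions are assumptions for illustration, and the coefficients are taken to be identically 1 only so that the outcome can be checked against the polymatroid case.

```python
from itertools import combinations, permutations

def ep_vertex(f, a, perm):
    """Solve the triangular system for x^pi; a(i, S) is the coefficient a_i^S."""
    x, prefix = {}, []
    for j in perm:
        prefix.append(j)
        S = frozenset(prefix)
        used = sum(a(i, S) * x[i] for i in prefix[:-1])
        x[j] = (f(S) - used) / a(j, S)
    return x

def in_ep(f, a, x, E, tol=1e-12):
    """Membership in EP(f): x >= 0 and sum_{i in S} a_i^S x_i <= f(S) for all S."""
    if any(v < -tol for v in x.values()):
        return False
    for r in range(1, len(E) + 1):
        for S in map(frozenset, combinations(E, r)):
            if sum(a(i, S) * x[i] for i in S) > f(S) + tol:
                return False
    return True

def is_extended_polymatroid(f, a, E):
    """Definition 6: every generated x^pi belongs to EP(f)."""
    return all(in_ep(f, a, ep_vertex(f, a, p), E) for p in permutations(E))

E = (1, 2, 3)
unit = lambda i, S: 1.0                 # a_i^S = 1: the ordinary polymatroid case
concave = lambda S: min(len(S), 2)      # submodular right-hand side
convex = lambda S: len(S) ** 2          # supermodular right-hand side
```

With unit coefficients the test reproduces Definition 2: it succeeds for the submodular function and fails for the supermodular one. Supplying subset-dependent coefficients through `a` exercises the genuinely extended case.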
(The term, “extended polymatroid,” was previously used to refer to a poly-
matroid without the requirement that x ≥ 0; e.g., see [26], p. 306. Since [3,
42] and other works in the queueing literature, it has been used to refer to the
polytopes defined above. Also, in [3], the EP corresponding to the b function is
termed “extended contra-polymatroid,” with the term “extended polymatroid”
reserved for the f function. For simplicity, we do not make such a distinction
here and below.)
With the above definition for EP, the right hand side functions b and f are
not necessarily increasing and supermodular/submodular. In other words, we do
not have a counterpart of Definition 1 for EP (more on this later). On the other
hand, the counterpart for Definition 3 does apply.
Definition 7. EP(f) is an extended polymatroid if the following is satisfied: for
any A ⊂ B ⊂ E, there exists a point x ∈ EP(f), such that
$$\sum_{i \in A} a^A_i x_i = f(A) \quad \text{and} \quad \sum_{i \in B} a^B_i x_i = f(B).$$
Theorem 4. Definitions 6 and 7 of EP are equivalent.
Proof. If EP(f) is EP, then the stated condition in Definition 7 is obviously
satisfied: just pick the vertex $x^\pi$ such that the first |A| components in π constitute
the set A, and the first |B| components constitute the set B.
For the other direction, i.e., if the stated condition in Definition 7 holds, then
EP(f ) is EP, we use induction on n = |E|. That this holds for n = 1 is trivial.
Suppose this holds for n = k, i.e. for a polytope of the kind in (13) with k
variables. Now consider such a polytope with k + 1 variables, i.e., |E| = k + 1.
Without loss of generality, consider the permutation π = (1, 2, ..., k + 1). We
want to show that the corresponding $x^\pi$ (i.e., generated from the triangular
system above) is in the polytope EP(f).
Since $x^\pi_1 = f(\{1\}) / a^{\{1\}}_1$, we substitute it into the other $x^\pi_i$ expressions, i ≠ 1,
to arrive at the following polytope of k variables:
$$EP(\tilde f) = \Big\{ x \ge 0 : \sum_{i \in S,\, i \ne 1} a^S_i x_i \le \tilde f(S), \ 1 \in S \subseteq E \Big\},$$
where
$$\tilde f(S) := f(S) - \frac{f(\{1\})}{a^{\{1\}}_1}\, a^S_1.$$
Clearly, since the stated condition in Definition 7 is assumed to hold for EP(f)
(the one with k + 1 variables), it also holds for $EP(\tilde f)$ (the one with k variables),
since the equations in question all differ by an amount $f(\{1\})\, a^S_1 / a^{\{1\}}_1$ on both
sides. Hence, the induction hypothesis confirms that $EP(\tilde f)$ is an EP. This implies
that $(x^\pi_2, \ldots, x^\pi_n) \in EP(\tilde f)$, which is equivalent to $x^\pi = (x^\pi_1, x^\pi_2, \ldots, x^\pi_n)$ satisfying
all the constraints in EP(f) that involve S ⊆ E with 1 ∈ S.
We still need to check that $x^\pi$ satisfies all the other constraints in EP(f)
corresponding to S ⊆ E with 1 ∉ S. To this end, consider the following polytope:
$$\Big\{ x \ge 0 : \sum_{i \in S} a^S_i x_i \le f(S), \ S \subseteq E \setminus \{1\} \Big\}. \qquad (14)$$
The above is another polytope with k variables. Obviously the stated condition
in Definition 7, which is assumed to hold for the polytope EP(f ), holds for the
above polytope as well (since the defining inequalities in the latter are just part
of those in EP(f)). Hence, based on the induction hypothesis, the polytope in (14) is also an EP. This implies that $(x^\pi_2, \dots, x^\pi_n)$, and hence $x^\pi$, satisfies all the inequalities involved in (14).
Hence, we have established that given the stated condition in Definition 7, $x^\pi$ does satisfy all the constraints in EP(f), for each permutation π. Therefore, EP(f) is an EP.
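The vertex construction used in the proof can be made concrete with a small numerical check. The following is a minimal sketch, not taken from the text: it assumes the coefficients and the right-hand side are supplied as hypothetical callables `a(i, S)` and `f(S)`, solves the triangular system for $x^\pi$, and tests by brute force whether every such vertex lies in EP(f). The toy data (unit coefficients and a concave set function of |S|) are illustrative assumptions only.

```python
from itertools import combinations, permutations

def vertex(perm, a, f):
    """Solve the triangular system  sum_{i<=j} a(pi_i, S_j) * x[pi_i] = f(S_j),
    where S_j = {pi_1, ..., pi_j}, one class at a time."""
    x = {}
    prefix = []
    for cls in perm:
        prefix.append(cls)
        S = frozenset(prefix)
        # Contribution of the classes already fixed earlier in the permutation.
        used = sum(a(i, S) * x[i] for i in prefix[:-1])
        x[cls] = (f(S) - used) / a(cls, S)
    return x

def in_EP(x, a, f, E, tol=1e-9):
    """Brute-force membership test: x >= 0 and all 2^n - 1 inequalities hold."""
    if any(v < -tol for v in x.values()):
        return False
    for r in range(1, len(E) + 1):
        for subset in combinations(E, r):
            S = frozenset(subset)
            if sum(a(i, S) * x[i] for i in S) > f(S) + tol:
                return False
    return True

def is_extended_polymatroid(a, f, E):
    """True if the vertex x^pi lies in EP(f) for every permutation pi of E."""
    return all(in_EP(vertex(p, a, f), a, f, E) for p in permutations(E))

# Toy instance (an assumption for illustration): unit coefficients and
# f(S) = g(|S|) with g increasing and concave.
E = (1, 2, 3)
unit_a = lambda i, S: 1.0
g = {1: 3.0, 2: 5.0, 3: 6.0}
f_good = lambda S: g[len(S)]
print(vertex((1, 2, 3), unit_a, f_good))
print(is_extended_polymatroid(unit_a, f_good, E))
```

The check enumerates all permutations and all subsets, so it is only practical for very small n; it is meant to illustrate the definition, not to serve as an efficient test.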
The above theorem leads immediately to the following:
Corollary 1. If EP(f) is an extended polymatroid, then
$$EP^-(f) := \Big\{x \ge 0 : \sum_{i\in S} a_i^S x_i \le f(S),\ S \subseteq E \setminus E_0\Big\}$$
is also an extended polymatroid, for any $E_0 \subset E$.

Proof. Simply verify Definition 7. Since EP(f) is an EP, we can pick any $A \subset B \subseteq E \setminus E_0 \subset E$, and there exists an x ∈ EP(f), such that $\sum_{i\in A} a_i^A x_i = f(A)$ and $\sum_{i\in B} a_i^B x_i = f(B)$. But this is exactly what is required for $EP^-(f)$ to be EP.
In summary, we have
Theorem 5. If the performance vector x satisfies GCL, then the performance
polytope is an EP, of which the vertices correspond to the performance under
priority rules, and the functions b(A) and f (A) correspond to the performance
of job classes in set A when A is given priority over all other classes in E \ A.
5 Optimization over EP
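Here we consider the optimization problem of maximizing a linear function over the EP, EP(f), defined in (13):
$$\text{(PG)}\quad \max \sum_{i\in E} c_i x_i$$
$$\text{s.t.}\quad \sum_{i\in A} a_i^A x_i \le f(A), \quad \text{for all } A \subseteq E,$$
$$x_i \ge 0, \quad \text{for all } i \in E.$$
The dual problem can be written as follows:
$$\text{(DG)}\quad \min \sum_{A\subseteq E} y_A f(A)$$
$$\text{s.t.}\quad \sum_{A \ni i} y_A a_i^A \ge c_i, \quad \text{for all } i \in E,$$
$$y_A \ge 0, \quad \text{for all } A \subseteq E.$$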
Let us start with π = (1, 2, · · · , n), and consider xπ , the vertex defined at
the beginning part of the last section. Below we write out the objective function
of (PG) at xπ , and use the expression, along with complementary slackness, to
identify a candidate for the dual solution. From dual feasibility, we then identify
the conditions under which π is the optimal permutation. Collectively, these
steps constitute an algorithm that finds the optimal π.
For simplicity, write x for $x^\pi$ below. We first write out $x_n$ in the objective function:
$$\sum_{i=1}^{n} c_i x_i = c_n \Big[ f(\{1,\dots,n\}) - \sum_{i=1}^{n-1} a_i^{\{1,\dots,n\}} x_i \Big] \Big/ a_n^{\{1,\dots,n\}} + \sum_{i=1}^{n-1} c_i x_i$$
$$= y_{\{1,\dots,n\}}\, f(\{1,\dots,n\}) + \sum_{i=1}^{n-1} \big( c_i - y_{\{1,\dots,n\}}\, a_i^{\{1,\dots,n\}} \big) x_i,$$
where we set
$$y_{\{1,\dots,n\}} = c_n / a_n^{\{1,\dots,n\}}.$$
Next, we write out $x_{n-1}$ in the summation above, and set
$$y_{\{1,\dots,n-1\}} = \big( c_{n-1} - y_{\{1,\dots,n\}}\, a_{n-1}^{\{1,\dots,n\}} \big) / a_{n-1}^{\{1,\dots,n-1\}},$$
to reach the following expression:
$$\sum_{i=1}^{n} c_i x_i = y_{\{1,\dots,n\}}\, f(\{1,\dots,n\}) + y_{\{1,\dots,n-1\}}\, f(\{1,\dots,n-1\}) + \sum_{i=1}^{n-2} \big( c_i - y_{\{1,\dots,n\}}\, a_i^{\{1,\dots,n\}} - y_{\{1,\dots,n-1\}}\, a_i^{\{1,\dots,n-1\}} \big) x_i.$$
This procedure can be repeated to yield the following:
$$\sum_{i=1}^{n} c_i x_i = y_{\{1,\dots,n\}}\, f(\{1,\dots,n\}) + y_{\{1,\dots,n-1\}}\, f(\{1,\dots,n-1\}) + \cdots + y_{\{1,2\}}\, f(\{1,2\}) + y_{\{1\}}\, f(\{1\}), \qquad (15)$$
where
$$y_{\{1,\dots,k\}} = \Big( c_k - \sum_{j=k+1}^{n} y_{\{1,\dots,j\}}\, a_k^{\{1,\dots,j\}} \Big) \Big/ a_k^{\{1,\dots,k\}}, \qquad (16)$$
for k = 1, ..., n. (When k = n, the vacuous summation in (16) vanishes.) Furthermore, set $y_A := 0$ for all other A ⊆ E.
With the above choice of x and y, it is easy to check that complementary
slackness is satisfied. Also, primal feasibility is automatic — guaranteed by the
definition of EP, since x is a vertex. Hence, we only need to check dual feasibility.
From the construction of y in (16), we have
$$\sum_{j=i}^{n} y_{\{1,\dots,j\}}\, a_i^{\{1,\dots,j\}} = c_i, \quad i \in E,$$
satisfying the first set of constraints in (DG). So it suffices to show that the
n non-zero dual variables in (16) are non-negative. To this end, we need to be
specific about the construction of the permutation π = (1, ..., n).
Let us start from the last element in π. Note that from (16), we have
$$y_{\{1,\dots,n\}} = \frac{c_n}{a_n^{\{1,\dots,n\}}} \ge 0.$$
Next, to ensure $y_{\{1,\dots,n-1\}} \ge 0$, the numerator of its expression in (16) must be non-negative, i.e.,
$$\frac{c_{n-1}}{a_{n-1}^{\{1,\dots,n\}}} \ge y_{\{1,\dots,n\}} = \frac{c_n}{a_n^{\{1,\dots,n\}}}.$$
Therefore, the index n has to be:
$$n = \arg\min_{i} \frac{c_i}{a_i^{\{1,\dots,n\}}}.$$
Note that this choice of n guarantees $y_{\{1,\dots,n-1\}} \ge 0$, independent of the ordering of the other n − 1 elements in the permutation.
Similarly, to ensure $y_{\{1,\dots,n-2\}} \ge 0$, from (16), we must have
$$c_{n-2} - y_{\{1,\dots,n-1\}}\, a_{n-2}^{\{1,\dots,n-1\}} - y_{\{1,\dots,n\}}\, a_{n-2}^{\{1,\dots,n\}} \ge 0,$$
or
$$\frac{c_{n-2} - y_{\{1,\dots,n\}}\, a_{n-2}^{\{1,\dots,n\}}}{a_{n-2}^{\{1,\dots,n-1\}}} \ge y_{\{1,\dots,n-1\}}.$$
Hence, the choice of n − 1 has to be:
$$n-1 = \arg\min_{i \le n-1} \frac{c_i - y_{\{1,\dots,n\}}\, a_i^{\{1,\dots,n\}}}{a_i^{\{1,\dots,n-1\}}}.$$
This procedure can be repeated until all elements of the permutation are determined. In general, the index k is chosen in the order of k = n, n − 1, ..., 1, and it has to satisfy:
$$k = \arg\min_{i \le k} \frac{c_i - \sum_{j=k+1}^{n} y_{\{1,\dots,j\}}\, a_i^{\{1,\dots,j\}}}{a_i^{\{1,\dots,k\}}}.$$
Formally, the following algorithm solves the dual problem (DG) in terms of generating the permutation π, along with the dual solution $y^\pi$. The optimal primal solution is then the vertex, $x^\pi$, corresponding to the permutation π.
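Algorithm 1 [for (DG)]

(i) Initialization: S(n) = E, k = n;
(ii) If k = 0, stop, and output $\{\pi;\ S(k),\ y^\pi_{S(k)},\ k = 1, \dots, n\}$; else, set
$$\pi_k := \arg\min_{i \in S(k)} \frac{c_i - \sum_{j=k+1}^{n} y^\pi_{S(j)}\, a_i^{S(j)}}{a_i^{S(k)}}, \qquad y^\pi_{S(k)} := \min_{i \in S(k)} \frac{c_i - \sum_{j=k+1}^{n} y^\pi_{S(j)}\, a_i^{S(j)}}{a_i^{S(k)}};$$
(iii) k ← k − 1, $S(k) = S(k+1) \setminus \{\pi_{k+1}\}$; goto (ii).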
Theorem 6. Given an extended polymatroid EP(f), the above algorithm solves the primal and dual LPs, (PG) and (DG), in $O(n^2)$ steps, with $x^\pi$ and $y^\pi$ being the optimal primal-dual solution pair.
Proof. Following the discussions preceding the algorithm, it is clear that we only need to check $y^\pi_{S(k)} \ge 0$, for k = 1, ..., n.
When k = n, following the algorithm, we have S(n) = E, and
$$\pi_n = \arg\min_i \{ c_i / a_i^E \}, \qquad y^\pi_E = c_{\pi_n} / a^E_{\pi_n} \ge 0.$$
Inductively, suppose $y^\pi_{S(j)} \ge 0$, for j = k + 1, ..., n, have all been determined. The choice of $\pi_{k+1}$ and hence $y^\pi_{S(k+1)}$ in the algorithm guarantees
$$c_k - \sum_{j=k+1}^{n} y^\pi_{S(j)}\, a_k^{S(j)} \ge 0,$$
and hence $y^\pi_{S(k)} \ge 0$.
That the optimal solution is generated in $O(n^2)$ steps is evident from the description of the algorithm.
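The selection rule of Algorithm 1 can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions rather than a reference implementation: the coefficients $a_i^S$ are supplied through a hypothetical callable `a(i, S)`, the costs through a dictionary `c`, and the selection step is run for every k = n, ..., 1 so that the dual value of the smallest set S(1) is computed as well. The reduced costs $c_i - \sum_{j>k} y_{S(j)} a_i^{S(j)}$ are updated incrementally instead of being recomputed at each step, and ties in the minimum are broken by class label, which is an arbitrary choice.

```python
def algorithm1(E, c, a):
    """Greedy construction of the permutation pi and the dual solution y.

    E : iterable of class labels
    c : dict mapping class -> cost coefficient c_i
    a : callable a(i, S) returning the coefficient a_i^S (S is a frozenset)
    Returns (pi, y): pi = [pi_1, ..., pi_n]; y maps each nested set S(k)
    to its dual value (all other subsets implicitly have y_A = 0).
    """
    S = set(E)
    reduced = dict(c)          # reduced[i] = c_i - sum_{j>k} y_{S(j)} * a_i^{S(j)}
    pi_reversed, y = [], {}
    while S:
        Sk = frozenset(S)
        # pi_k attains the minimum ratio over the classes still in S(k).
        best = min(sorted(S), key=lambda i: reduced[i] / a(i, Sk))
        y_k = reduced[best] / a(best, Sk)
        y[Sk] = y_k
        pi_reversed.append(best)
        S.remove(best)
        for i in S:            # fold y_{S(k)} into the remaining reduced costs
            reduced[i] -= y_k * a(i, Sk)
    return pi_reversed[::-1], y

# Toy instance (assumed for illustration): unit coefficients, i.e. the
# ordinary polymatroid case, with costs 5 > 3 > 1.
costs = {1: 5.0, 2: 3.0, 3: 1.0}
unit_a = lambda i, S: 1.0
pi, y = algorithm1([1, 2, 3], costs, unit_a)
print(pi)   # classes in priority order pi_1, ..., pi_n
print(y)
```

With unit coefficients the sketch orders the classes by decreasing cost coefficient, which is the ordering one expects in the polymatroid case.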
To summarize, the two remarks at the end of §1.2 for the polymatroid op-
timization also apply here: (i) primal feasibility is automatic, by way of the
definition of EP; and (ii) dual feasibility, along with complementary slackness,
identifies the permutation π that defines the (primal) optimal vertex.
Furthermore, there is also an analogy to (3), i.e., the sum of dual variables
yields the priority index. To see this, for concreteness consider Klimov’s problem,
with the performance measure xi being the (long-run) average number of class
i jobs in the system. (For this example, we are dealing with a minimization
problem over the EP EP(b). But all of the above discussions, including the
algorithm, still apply, mutatis mutandis, such as changing f to b and max to
min, etc.) The optimal policy is a priority rule corresponding to the permutation π generated by the above algorithm, with the jobs of class $\pi_1$ given the highest priority, and jobs of class $\pi_n$, the lowest priority. Let $y^*$ be the optimal dual solution generated by the algorithm. Define
$$\gamma_i := \sum_{S \ni i} y^*_S, \quad i \in E.$$
Then, we have
$$\gamma_{\pi_i} = y^*_{\{\pi_1,\dots,\pi_i\}} + \cdots + y^*_{\{\pi_1,\dots,\pi_n\}}, \quad i \in E. \qquad (17)$$
Note that γπi is decreasing in i, since the dual variables are non-negative. Hence,
the order of γπi ’s is in the same direction as the priority assignment. In other
words, (17) is completely analogous to (3): just like the indexing role played by
the cost coefficients in the polymatroid case, in the EP case here {γi } is also a
set of indices upon which the priorities are assigned: at each decision epoch, the
server chooses to serve, among all waiting jobs, the job class with the highest γ
index.
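As a small illustration of (17), the indices can be accumulated as suffix sums of the dual values along the nested chain of sets. The sketch below assumes the permutation is given as a list `pi` and the dual solution as a dictionary `y` keyed by frozensets; both names and the sample values are hypothetical and chosen only for the example.

```python
def priority_indices(pi, y):
    """gamma[pi_i] = y[{pi_1..pi_i}] + ... + y[{pi_1..pi_n}], as in (17)."""
    gamma = {}
    running = 0.0
    # Walk from the largest set S(n) = E down to S(1) = {pi_1}.
    for k in range(len(pi), 0, -1):
        running += y.get(frozenset(pi[:k]), 0.0)
        gamma[pi[k - 1]] = running
    return gamma

# Sample dual values on the chain {1} ⊂ {1,2} ⊂ {1,2,3} (illustrative numbers).
pi = [1, 2, 3]
y = {frozenset({1, 2, 3}): 1.0, frozenset({1, 2}): 2.0, frozenset({1}): 2.0}
gamma = priority_indices(pi, y)
print(gamma)
```

Because every term added to the running sum is non-negative, the index of $\pi_i$ can only grow as i decreases, which is the monotonicity noted above.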
Finally, we can synthesize all the above discussions on GCL and its con-
nection to EP, and on optimization over an EP, to come up with the following
generalization of Theorem 3.
Theorem 7. Consider the optimal control problem in Theorem 3:
$$\max_{u\in\mathcal{A}} \sum_{i\in E} c_i x_i^u \quad \Big[\text{or } \min_{u\in\mathcal{A}} \sum_{i\in E} c_i x_i^u \Big].$$
Suppose x is a performance measure that satisfies GCL. Then, this optimal control problem can be solved by solving the following LP:
$$\max_{x\in EP(f)} \sum_{i\in E} c_i x_i \quad \Big[\text{or } \min_{x\in EP(b)} \sum_{i\in E} c_i x_i \Big].$$
The optimal solution to this LP is simply the vertex $x^\pi \in B(f)$, with π being the permutation identified by Algorithm 1; and the optimal policy is the corresponding priority rule, which assigns the highest priority to class $\pi_1$ jobs, and the lowest priority to class $\pi_n$ jobs.
Applying the above theorem to Klimov’s model we can generate the optimal
policy, which is a priority rule dictated by the permutation π, which, in turn, is
generated by Algorithm 1.
6 Notes and Comments
The materials presented here are drawn from Chapter 11 of the book by Chen
and Yao [7], to which the reader is also referred for preliminaries in queueing
networks. A standard reference to matroid, as well as polymatroid, is Welsh [47].
The equivalence of the first two definitions of the polymatroid, Definitions 1 and
2, is a classical result; refer to, e.g., Edmonds [13], Welsh [47], and Dunstan and
Welsh [12].
The original version of conservation laws, due to Kleinrock [31], takes the form of a single equality constraint, $\sum_{i\in E} x_i = b(E)$ or $= f(E)$. In the works
of Coffman and Mitrani [9], and Gelenbe and Mitrani [20], the additional in-
equality constraints were introduced, which, along with the equality constraint,
give a full characterization of the performance space. In a sequence of papers,
Federgruen and Groenevelt [15,16,17], established the polymatroid structure of
the performance space of several queueing systems, by showing that the RHS
(right hand side) functions are increasing and submodular.
Shanthikumar and Yao [39] revealed the equivalence between conservation laws and the polymatroid nature of the performance polytope. In other words,
the increasingness and submodularity of the RHS functions are not only sufficient
but also necessary conditions for conservation laws. This equivalence is based on
two key ingredients: On the one hand, the polymatroid Definition 2 asserts that if the “vertex” $x^\pi$, generated through a triangular system of n linear equations (made out of a total of $2^n - 1$ inequalities that define the polytope), belongs to the polytope (i.e., if it satisfies all the other inequalities) for every permutation π, then the polytope is a polymatroid. On the other hand, in conservation laws
the RHS functions that characterize the performance polytope can be defined
in such a way that they correspond to those “vertices”. This way, the vertices
will automatically belong to the performance space, since they are achievable by
priority rules.
The direct implication of the connection between conservation laws and poly-
matroid is the translation of the scheduling (control) problem into an optimiza-
tion problem. In the case of a linear objective, the optimal solution follows im-
mediately from examining the primal-dual pair: primal feasibility is guaranteed
by the polymatroid property — all vertices belong to the polytope, and dual
feasibility, along with complementary slackness, yields the priority indices.
Motivated by Klimov’s problem, Tsoucas [42], and Bertsimas and Niño-Mora [3] extended conservation laws and the related polymatroid structure to GCL and EP. The key ingredients in the conservation laws/polymatroid theory of [39] are carried over to GCL/EP. In particular, EP is defined in complete analogy to the polymatroid Definition 2 mentioned above, via the “vertex” $x^\pi$; whereas GCL is such that for every permutation π, $x^\pi$ corresponds to a priority rule, and thereby guarantees its membership in the performance polytope. The equivalent definition of EP in Definition 7 is due to Lu [34] and Zhang [52] (also see [51]).
Dynamic scheduling of a multi-class stochastic network is a complex and
difficult problem that has continued to attract much research effort. A sample
of more recent works shows a variety of different approaches to the problem,
from Markov decision programming (e.g., Harrison [27], Weber and Stidham
[45]), monotone control of generalized semi-Markov processes (Glasserman and
Yao [24,25]), to asymptotic techniques via diffusion limits (Harrison [28], and
Harrison and Wein [29]). This chapter presents yet another approach, which is
based on polymatroid optimization. It exploits, in the presence of conservation

laws and GCL, the polymatroid or EP structure of the performance polytope
and turns the dynamic control problem into a static optimization problem.
The cμ-rule in Example 10 is a subject with a long history that can be
traced back to Smith [40], and the monograph of Cox and Smith [10]; also see,
e.g., [5,6]. More ambitious examples of applications that are based on Theorem
3 include: scheduling in a Jackson network ([36]), scheduling and load balancing
in a distributed computer system ([37]), and scheduling multi-class jobs in a
flexible manufacturing system ([50]).
Klimov’s problem generalizes the cμ-rule model by allowing completed jobs
to feedback and change classes. Variations of Klimov’s model have also been
widely studied using different techniques; e.g., Harrison [27], Tcha and Pliska
[41]. The optimal priority policy is often referred to as the “Gittins index” rule,
as the priority indices are closely related to those indices in dynamic resource
allocation problems that are made famous by Gittins ([21,22,23]).
Klimov’s model, in turn, belongs to the more general class of branching bandit
problems, (refer to §3), for which scheduling rules based on Gittins indices are
optimal. There is a vast literature on this subject; refer to, e.g., Lai and Ying
[33], Meilijson and Weiss [35], Varaiya et al. [43], Weber [44], Weiss [46], Whittle
[48,49]; as well as Gittins [21,22], and Gittins and Jones [23].
GCL corresponds to the so-called “indexable” class of stochastic systems,
including Klimov’s model and branching bandits as primary examples; refer to
[3,4]. Beyond this indexable class, however, the performance space is not even an EP. There have been recent studies that try to bound such a performance space by more structured polytopes (e.g., polymatroids and EPs); see Bertsimas [2], Bertsimas et al. [4], and Dacre et al. [11].
References
1. Asmussen, S., Applied Probability and Queues. Wiley, Chichester, U.K., 1987.
2. Bertsimas, D., The Achievable Region Method in the Optimal Control of Queueing
Systems; Formulations, Bounds and Policies. Queueing Systems, 21 (1995), 337–
389.
3. Bertsimas, D. and Niño-Mora, J., Conservation Laws, Extended Polymatroid and
Multi-Armed Bandit Problems: A Unified Approach to Indexable Systems. Math-
ematics of Operations Research, 21 (1996), 257–306.
4. Bertsimas, D. Paschalidis, I.C. and Tsitsiklis, J.N., Optimization of Multiclass
Queueing Networks: Polyhedral and Nonlinear Characterization of Achievable Per-
formance. Ann. Appl. Prob., 4 (1994), 43–75.
5. Baras, J.S., Dorsey, A.J. and Makowski, A.M., Two Competing Queues with Linear
Cost: the μc Rule Is Often Optimal. Adv. Appl. Prob., 17 (1985), 186–209.
6. Buyukkoc, C., Varaiya, P. and Walrand, J., The cμ Rule Revisited. Adv. Appl. Prob., 17 (1985), 237–238.
7. Chen, H. and Yao, D.D., Fundamentals of Queueing Networks: Performance,
Asymptotics and Optimization. Springer-Verlag, New York, 2001.
8. Chvátal, V., Linear Programming. W.H. Freeman, New York, 1983.
9. Coffman, E. and Mitrani, I., A Characterization of Waiting Time Performance

Realizable by Single Server Queues. Operations Research, 28 (1980), 810–821.
10. Cox, D.R. and Smith, W.L., Queues. Methuen, London, 1961.
11. Dacre, K.D., Glazebrook, K.D., and Niño-Mora, J., The Achievable Region Approach to the Optimal Control of Stochastic Systems. J. Royal Statist. Soc. (1999).
12. Dunstan, F.D.J. and Welsh, D.J.A., A Greedy Algorithm for Solving a Certain
Class of Linear Programmes. Math. Programming, 5 (1973), 338–353.
13. Edmonds, J., Submodular Functions, Matroids and Certain Polyhedra. Proc. Int.
Conf. on Combinatorics (Calgary), Gordon and Breach, New York, 69-87, 1970.
14. Federgruen, A. and Groenevelt, H., The Greedy Procedure for Resource Allocation
Problems: Necessary and Sufficient Conditions for Optimality. Operations Res., 34
(1986), 909–918.
15. Federgruen, A. and Groenevelt, H., The Impact of the Composition of the Cus-
tomer Base in General Queueing Models. J. Appl. Prob., 24 (1987), 709–724.
16. Federgruen, A. and Groenevelt, H., M/G/c Queueing Systems with Multiple Cus-
tomer Classes: Characterization and Control of Achievable Performance under
Non-Preemptive Priority Rules. Management Science, 34 (1988), 1121–1138.
17. Federgruen, A. and Groenevelt, H., Characterization and Optimization of Achiev-
able Performance in Queueing Systems. Operations Res., 36 (1988), 733–741.
18. Fong, L.L. and Squillante, M.S., Time-Function Scheduling: A General Approach
to Controllable Resource Management. IBM Research Report RC-20155, IBM Re-
search Division, T.J. Watson Research Center, Yorktown Hts., New York, NY
10598, 1995.
19. Franaszek, P.A. and Nelson, R.D., Properties of Delay Cost Scheduling in Time-
sharing Systems. IBM Research Report RC-13777, IBM Research Division, T.J.
Watson Research Center, Yorktown Hts., New York, NY 10598, 1990.
20. Gelenbe, E. and Mitrani, I., Analysis and Synthesis of Computer Systems. Aca-
demic Press, London, 1980.
21. Gittins, J.C., Bandit Processes and Dynamic Allocation Indices (with discussions).
J. Royal Statistical Society, Ser. B, 41 (1979), 148–177.
22. Gittins, J.C., Multiarmed Bandit Allocation Indices. Wiley, Chichester, 1989.
23. Gittins, J.C. and Jones, D.M., A Dynamic Allocation Index for the Sequential
Design of Experiments. In: Progress in Statistics: European Meeting of Statisti-
cians, Budapest, 1972, J. Gani, K. Sarkadi and I. Vince (eds.), North-Holland,
Amsterdam, 1974, 241–266.
24. Glasserman, P. and Yao, D.D., Monotone Structure in Discrete-Event Systems.
Wiley, New York, 1994.
25. Glasserman, P. and Yao, D.D., Monotone Optimal Control of Permutable GSMP’s.
Mathematics of Operations Research, 19 (1994), 449–476.
26. Grötschel, M., Lovász, L and Schrijver, A., Geometric Algorithms and Combina-
torial Optimization, second corrected edition. Springer-Verlag, Berlin, 1993.
27. Harrison, J.M., Dynamic Scheduling of a Multiclass Queue: Discount Optimality.
Operations Res., 23 (1975), 270–282.
28. Harrison, J.M., The BIGSTEP Approach to Flow Management in Stochastic
Processing Networks. In: Stochastic Networks: Theory and Applications, Kelly,
Zachary, and Ziedins (eds.), Royal Statistical Society Lecture Note Series, #4,
1996, 57–90.
29. Harrison, J.M. and Wein, L., Scheduling Networks of Queues: Heavy Traffic Anal-
ysis of a Simple Open Network. Queueing Systems, 5 (1989), 265–280.
30. Kleinrock, L., A Delay Dependent Queue Discipline. Naval Research Logistics
Quarterly, 11 (1964), 329–341.
31. Kleinrock, L., Queueing Systems, Vol. 2. Wiley, New York, 1976.
32. Klimov, G.P., Time Sharing Service Systems, Theory of Probability and Its Appli-
cations, 19 (1974), 532–551 (Part I) and 23 (1978), 314–321 (Part II).
33. Lai, T.L. and Ying, Z., Open Bandit Processes and Optimal Scheduling of Queueing
Networks. Adv. Appl. Prob., 20 (1988), 447-472.
34. Lu, Y., Dynamic Scheduling of Stochastic Networks with Side Constraints. Ph.D.
Thesis, Columbia University, 1998.
35. Meilijson, I. and Weiss, G., Multiple Feedback at a Single-Server Station. Stochastic
Proc. and Appl., 5 (1977), 195–205.
36. Ross, K.W. and Yao, D.D., Optimal Dynamic Scheduling in Jackson Networks.
IEEE Transactions on Automatic Control, 34 (1989), 47-53.
37. Ross, K.W. and Yao, D.D., Optimal Load Balancing and Scheduling in a Dis-
tributed Computer System. Journal of the Association for Computing Machinery,
38 (1991), 676–690.
38. Shanthikumar, J.G. and Sumita, U., Convex Ordering of Sojourn Times in Single-
Server Queues: Extremal Properties of FIFO and LIFO Service Disciplines. J. Appl.
Prob., 24 (1987), 737–748.
39. Shanthikumar, J.G. and Yao, D.D., Multiclass Queueing Systems: Polymatroid Structure and Optimal Scheduling Control. Operations Research, 40 (1992), Supplement 2, S293–299.
40. Smith, W.L., Various Optimizers for Single-Stage Production. Naval Research Lo-
gistics Quarterly, 3 (1956), 59–66.
41. Tcha, D. and Pliska, S.R., Optimal Control of Single-Server Queueing Networks
and Multiclass M/G/1 Queues with Feedback. Operations Research, 25 (1977),
248–258.
42. Tsoucas, P., The Region of Achievable Performance in a Model of Klimov. IBM
Research Report RC-16543, IBM Research Division, T.J. Watson Research Center,
Yorktown Hts., New York, NY 10598, 1991.
43. Varaiya, P., Walrand, J. and Buyukkoc, C., Extensions of the Multiarmed Bandit
Problem: The Discounted Case. IEEE Trans. Automatic Control, 30 (1985), 426–
439.
44. Weber, R., On the Gittins Index for Multiarmed Bandits. Annals of Applied Prob-
ability, (1992), 1024–1033.
45. Weber, R. and Stidham, S., Jr., Optimal Control of Service Rates in Networks of
Queues. Adv. Appl. Prob., 19 (1987), 202–218.
46. Weiss, G., Branching Bandit Processes. Probability in the Engineering and Infor-
mational Sciences, 2 (1988), 269–278.
47. Welsh, D., Matroid Theory, (1976), Academic Press, London.
48. Whittle, P., Multiarmed Bandits and the Gittins Index. J. Royal Statistical Society,
Ser. B, 42 (1980), 143–149.
49. Whittle, P., Optimization over Time: Dynamic Programming and Stochastic Con-
trol, vols. I, II, Wiley, Chichester, 1982.
50. Yao, D.D. and Shanthikumar, J.G., Optimal Scheduling Control of a Flexible Ma-
chine. IEEE Trans. on Robotics and Automation, 6 (1990), 706–712.
51. Yao, D.D. and Zhang, L., Stochastic Scheduling and Polymatroid Optimization,
Lecture Notes in Applied Mathematics, 33, G. Ying and Q. Zhang (eds.), Springer-
Verlag, 1997, 333–364.
52. Zhang, L., Reliability and Dynamic Scheduling in Stochastic Networks. Ph.D. The-
sis, Columbia University, 1997.
Workload Modeling for Performance Evaluation
Dror G. Feitelson
School of Computer Science and Engineering

The Hebrew University, 91904 Jerusalem, Israel
http://www.cs.huji.ac.il/~feit
Abstract. The performance of a computer system depends on the char-

acteristics of the workload it must serve: for example, if work is evenly
distributed performance will be better than if it comes in unpredictable
bursts that lead to congestion. Thus performance evaluations require the
use of representative workloads in order to produce dependable results.
This can be achieved by collecting data about real workloads, and cre-
ating statistical models that capture their salient features. This survey
covers methodologies for doing so. Emphasis is placed on problematic is-
sues such as dealing with correlations between workload parameters and
dealing with heavy-tailed distributions and rare events. These consider-
ations lead to the notion of structural modeling, in which the general
statistical model of the workload is replaced by a model of the process
generating the workload.
1 Introduction
The goal of performance evaluation is often to compare different system designs

or implementations. The evaluation is expected to bring out performance differ-
ences that will allow for an educated decision regarding what design to employ
or what system to buy. Thus it is implicitly assumed that observed performance
differences indeed reflect important differences between the systems being stud-
ied.
However, performance differences may also be an artifact of the evaluation
methodology. The performance of a system is not only a function of the system
design and implementation. It may also be affected by the workload to which
the system is subjected. For example, communication networks have often been
analyzed using Poisson-related models of traffic, which indicated that the vari-
ance in load should smooth out over time and when multiple data sources are
combined. But in 1994 Leland and co-workers showed, based on extensive obser-
vations and measurements, that this does not happen in practice [52]. Instead,
they proposed a self-similar traffic model that captures the burstiness of network
traffic and leads to more realistic evaluations of required buffer space and other
parameters [24].
Analyzing network traffic was easy, in a sense, because all packets are of equal
size and the only characteristic that required measurement and modeling was

the arrival process. But if we consider a complete computer system, the problem
becomes more complex [13,11]. For example, a computer program may require a
certain amount of CPU time, memory, and I/O, and these resource requirements
may be interleaved in various ways during its execution. In addition there are
several levels at which we might model the system: we can study the functional
units used by a stream of instructions, the subsystems used by a job during its
execution, or the requirements of jobs submitted to the system over time. Each
of these scales is relevant for the design and evaluation of different parts of the
system: the CPU, the hardware configuration, or the operating system.
The main domain used as a source of examples in this survey is that of paral-
lel job scheduling. Workloads in this field are interesting due to the combination
of being relatively small and at the same time relatively complex. The size of
typical workloads is tens of thousands of jobs, as opposed to millions of packets
in communication workloads. These workloads are characterized by a large num-
ber of factors, including the job sizes, runtimes, runtime estimates, and arrival
patterns. The complexity derives not only from the multiple factors themselves,
but from various correlations between them. Research on these issues is facili-
tated by the availability of data and models in the Parallel Workloads Archive
[60]. In addition, there are several documented cases of how workload parameters
influence the outcomes of performance evaluation studies [53,57,25].
2 Data Sources
The suggestion that workload modeling should be based on measurements is

not new [32,4]. However, for a long time relatively few models based on actual
measurements were published. As a result, many performance studies did not use experimental workload models at all (and many still do not).
It is true that real-world data is not always available, or may be hard to
obtain. But not using real data may lead to flawed evaluations [26]. This real-
ization has led to a new wave of workload analyses in various fields of system
design in recent years. Perhaps the most prominent are studies of Internet traffic patterns [52,62,75] and world-wide web traffic patterns, with the intent of
using the knowledge to evaluate server performance and caching schemes [5,18,
6]. Other examples include studies of process arrivals and runtimes [12,37], file
systems [36], and video streams [48]. In the area of parallel systems, descriptive
studies of workloads have only started to appear in recent years [29,76,58,27,
14]. There are also some attempts at modeling [10,28,21,41,23,54,15] and on-line
characterization [34].
But where does the data come from? There are two main options: use data
that is available anyway, or collect data specifically for the workload model.
The latter can be done in two ways: active or passive instrumentation. Impor-
tantly, collected data can and should be made publicly available for use by other
researchers [60,33].
2.1 Using Accounting and Activity Logs
The most readily available source of data is accounting or activity logs. Such
logs are kept by the system for auditing, and record selected attributes of all
activities. For example, many computer systems keep a log of all executed jobs.
In large scale parallel systems, these logs can be quite detailed and are a rich
source of information for workload studies [60]. Another example is web servers, which are often configured to log all requests.
A good example is provided by the analysis of three months of activity on
the 128-node NASA Ames iPSC/860 hypercube supercomputer. This analysis
provided the following data [29]:
– The distribution of job sizes (in number of nodes) for system jobs, and for
user jobs classified according to when they ran: during the day, at night, or
on the weekend.
– The distribution of total resource consumption (node seconds), for the same
job classifications.
– The same two distributions, but classifying jobs according to their type:
those that were submitted directly, batch jobs, and Unix utilities.
– The changes in system utilization throughout the day, for weekdays and
weekends.
– The distribution of multiprogramming level seen during the day, at night,
and on weekends. This also included the measured down time (a special case
of 0 multiprogramming).
– The distribution of runtimes for system jobs, sequential jobs, and parallel
jobs, and for jobs with different degrees of parallelism. This included a con-
nection between common runtimes and the queue time limits of the batch
scheduling system.
– The correlation between resource usage and job size, for jobs that ran during
the day, at night, and over the weekend.
– The arrival pattern of jobs during the day, on weekdays and weekends, and
the distribution of interarrival times.
– The correlation between the time of day a job is submitted and its resource
consumption.
– The activity of different users, in terms of number of jobs submitted, and
how many of them were different.
– Profiles of application usage, including repeated runs by the same user and
by different users, on the same or on different numbers of nodes.
– The dispersion of runtimes when the same application is executed many
times.
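Several of the statistics listed above can be derived from a job log with very little code. The following is a minimal sketch under stated assumptions: the `Job` record and its fields (`submit`, `nodes`, `runtime`, `user`) are hypothetical and do not correspond to any particular accounting-log format, and the four sample jobs are made up for illustration. It computes the job-size distribution, the interarrival times, and the correlation between job size and resource consumption in node-seconds.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Job:
    submit: float    # submit time, seconds since the start of the log
    nodes: int       # job size in nodes
    runtime: float   # runtime in seconds
    user: str

def size_distribution(jobs):
    """Fraction of jobs at each job size (number of nodes)."""
    counts = Counter(j.nodes for j in jobs)
    return {size: n / len(jobs) for size, n in sorted(counts.items())}

def interarrival_times(jobs):
    """Gaps between consecutive submit times."""
    times = sorted(j.submit for j in jobs)
    return [later - earlier for earlier, later in zip(times, times[1:])]

def node_seconds(job):
    """Total resource consumption of a job."""
    return job.nodes * job.runtime

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Made-up sample log, for illustration only.
log = [
    Job(0.0, 1, 10.0, "a"),
    Job(60.0, 4, 100.0, "a"),
    Job(90.0, 4, 50.0, "b"),
    Job(300.0, 16, 200.0, "c"),
]
print(size_distribution(log))
print(interarrival_times(log))
print(pearson([j.nodes for j in log], [node_seconds(j) for j in log]))
```

Classifying jobs by time of day or by job type, as in the analysis described above, amounts to filtering the list before calling the same functions.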
Note, however, that accounting logs do not always exist at the desired level
of detail. For example, even if all communication on a web server is logged,
this is at the request level, not at the packet level. To obtain packet-level data,
specialized instrumentation is needed.
2.2 Passive and Active Instrumentation

If data is not readily available, it should be collected. This is done by instrument-
ing the system with special facilities that record its activity. A major problem
with this is being unobtrusive, and not modifying the behavior of the system
while we measure it.
Passive instrumentation refers to designs in which the system itself is not modified. The instrumentation is done by adding external components to the system that monitor system activity but do not interfere with it. This approach is commonly used in studies of communication, where it is relatively easy to add a node to a system that only listens to the traffic on the communication network [52,73,35]. A more extreme example is a proposal to add a shadow parallel
machine to a production parallel machine, with each shadow node monitoring
the corresponding production node, and all of them cooperating to filter and
summarize the data [66].
Active instrumentation refers to the modification of the system so that it will
collect data about its activity. This can be integrated with the original system
design, as was done for example in the RP3 [43]. However, it is more commonly
done after the fact, when a need to collect data about a specific system arises.
A good example is the Charisma project, which set out to characterize the I/O
patterns on parallel machines [59]. This was done by instrumenting the I/O
library and requesting users to re-link their applications; when running with the
instrumented library, all I/O activity was recorded for subsequent analysis.
Obviously, instrumenting a system to collect data at runtime can affect the
system's behavior and performance. This may not be very troublesome in the
case of I/O activity, which suffers from high overhead anyway, but may be very
problematic for the study of fine grain events related to communication, synchro-
nization, and memory usage. One possible solution to this problem is to model
the effect of the instrumentation, thereby enabling it to be factored out of the
measurement results [55]. This leads to results that reflect real system behavior
(that is, unaffected by the instrumentation), but leaves the problem of perfor-
mance degradation while the measurements are being taken. An alternative is
to selectively activate only those parts of the instrumentation that are needed
at each instant, rather than collecting data about the whole system all the time.
Remarkably, this can be done efficiently by modifying the system’s object code
as it runs [38].
2.3 Data Sanitation

Before data can be used to create a workload model, it has to be cleaned up.
This has several aspects.
One important aspect is the handling of outliers. Workload logs sometimes
include uncommon events that “don’t make sense”. Examples include
– In the two-year log of jobs run on the LANL CM-5 parallel machine, there
is a 10-day stretch in which a single user ran about 5000 instances of a job
that executed in 1–2 seconds on 128 nodes.
– In the two-year log of jobs run on the SDSC Paragon parallel machine, there
is a large concentration of short jobs that arrive at 3:30 AM on different
days. This is probably due to periodic invocation of administrative scripts.
– In the two-year log of jobs run on the SDSC SP2 parallel machine, there is
a single hour in which a single user submitted some 580 similar jobs.
Of course, the decision that something is “uncommon” is subjective. The purist
approach would be to leave everything in, because in fact it did happen in a
real system. But on the other hand, while strange things may happen, it is
difficult to argue for a specific one; if we leave it in the workload that is used to
analyze systems, we run the risk of promoting systems that specifically cater for
a singular unusual condition that is unlikely to ever occur again.
A procedure that was advocated by Cirne and Berman is to use clustering as
a means to distinguish between “normal” and “abnormal” data [15]. Specifically,
they characterize days in a workload log by an n-valued vector, and cluster these
vectors into two clusters in $\mathbb{R}^n$. If the clustering procedure distinguishes a single
day and puts it in a cluster by itself, this day is removed and the procedure is
repeated with the data that is left. Note, however, that this has its risks: first,
abnormal behavior may span more than a single day, as the above examples show;
moreover, removing days may taint other data, e.g. when interarrival times are
considered.
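The iterative procedure can be sketched as follows. This is only an illustration under stated assumptions: the per-day feature vector (here a job count and a mean runtime), the use of plain 2-means (Lloyd's algorithm) with a deterministic initialization, and all names and data values are hypothetical and are not taken from [15].

```python
def dist2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def two_means(points, iters=50):
    """Split points into two clusters with Lloyd's algorithm (k = 2)."""
    # Deterministic start (an assumption): the point farthest from the
    # centroid, and the point farthest from that one.
    centroid = [sum(c) / len(points) for c in zip(*points)]
    c0 = max(points, key=lambda p: dist2(p, centroid))
    c1 = max(points, key=lambda p: dist2(p, c0))
    centers = [list(c0), list(c1)]
    labels = []
    for _ in range(iters):
        new_labels = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels

def remove_outlier_days(days):
    """Repeatedly drop a day that ends up in a cluster by itself."""
    kept = dict(days)
    removed = []
    while len(kept) > 2:
        names = list(kept)
        labels = two_means([kept[d] for d in names])
        sizes = [labels.count(0), labels.count(1)]
        if 1 not in sizes:
            break                      # no singleton cluster: stop
        lone = names[labels.index(sizes.index(1))]
        removed.append(lone)
        del kept[lone]
    return kept, removed

# Hypothetical per-day vectors: (number of jobs, mean runtime in seconds).
days = {
    "mon": (100, 50.0), "tue": (110, 55.0), "wed": (105, 48.0),
    "thu": (95, 52.0), "sat": (20, 60.0), "sun": (25, 58.0),
    "sat2": (22, 62.0), "flurry": (5000, 1.5),
}
kept, removed = remove_outlier_days(days)
```

In this toy data set the day with the flurry of short jobs is isolated and removed, after which the remaining days split into two multi-day clusters and the loop stops. Note that the features are not normalized here; with real data the choice of features and scaling would strongly affect which days are flagged.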
Another aspect of workload sanitation involves errors. Workload logs may
contain data about activities that failed to complete successfully, e.g. jobs that
were submitted and either failed or were killed by the user. Should these jobs be
included or deleted from the data? On one hand, they represent work that the
system had to handle, even if nothing came of it. On the other hand, they do not
represent useful work, and may have been submitted again later. An interesting
compromise is to keep such data, and explicitly include it in the workload model
[15]. This will enable the study of how failed work affects system utilization and
the performance of “good” work.
Finally, an important issue is determining the degree to which data is gener-
ally representative. One problem is that data may be affected by local procedures
and constraints where it was collected. For example, data on programs run on
a machine equipped with only 32MB memory will show that programs do not
have larger resident sets, but this is probably an artifact of this limit, and not a
real characteristic of general workloads. A more striking example is provided by
the NASA iPSC log mentioned above. In this log a full 57% of the jobs are in-
vocations of the Unix pwd command on various nodes, which was the technique
used by system personnel to verify that the system was working [29]. Another
problem is that workloads may evolve with time [39], especially on large and
unique installations such as parallel supercomputers. It is therefore important
to capture data from a mature system, and not a new (or old) one.
3 Workload Modeling
There are two common ways to use a measured workload to analyze or evaluate
a system design [32]: (1) use the traced workload directly to drive a simulation,
or (2) create a model from the trace and use the model for either analysis or
simulation. For example, trace-driven simulations based on large address traces
are often used to evaluate cache designs [45,42]. But models of how applications
traverse their address space have also been proposed, and provide interesting
insights into program behavior [71,72].
3.1 Why Model

The advantage of using a trace directly is that it is the most “real” test of the
system; the workload reflects a real workload precisely, with all its complexities,
even if they are not known to the person performing the analysis.
The drawback is that the trace reflects a specific workload, and there is al-
ways the question of whether the results generalize to other systems or load
conditions. In particular, there are cases where the workload depends on the
system configuration, and therefore a given workload is not necessarily represen-
tative of workloads on systems with other configurations. Obviously, this makes
the comparison of different configurations problematic. In addition, traces are of-
ten misleading if we have incomplete information about the circumstances when
they were collected. For example, workload traces often contain intervals when
the machine was down or part of it was dedicated to a specific project, but this
information may not be available.
Workload models have a number of advantages over traces [70].
– It is possible to change model parameters one at a time, in order to inves-
tigate the influence of each one, while keeping other parameters constant.
This allows for direct measurement of system sensitivity to the different pa-
rameters. It is also possible to select model parameters that are expected to
match the specific workload at a given site.
In general it is not possible to manipulate traces in this way, and even when
it is possible, it can be problematic. For example, it is common practice to
increase the modeled load on a system by reducing the average interarrival
time. But this practice has the undesirable consequence of shrinking the
daily load cycle as well. With a workload model, we can control the load
independent of the daily cycle.
– Using a model, it is possible to repeat experiments under statistically similar
conditions that are nevertheless not identical. For example, a simulation can
be run several times with different seeds for the random number generator.
This is needed in order to compute confidence intervals.
– Logs may not represent the real workload due to various problems: a limit
of 4 hours may force users to break long jobs into multiple short jobs, jobs
killed by the system may be repeated, etc. If taken at face value this may
be misleading, but the problem is that often we do not know about such
problems.
Conversely, a modeler has full knowledge of model workload characteristics.
For example, it is easy to know which workload parameters are correlated
with each other because this information is part of the model.
– Finally, modeling increases our understanding, and can lead to new designs
based on this understanding. For example, identifying the repetitive nature
of job submittal can be used for learning about job requirements from history.
One can design a resource management policy that is parameterized by a
workload model, and use measured values for the local workload to tune the
policy.
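The point about repeating an experiment with different random seeds can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the system is a single-server FIFO queue with exponential interarrival and service times, the function and parameter names are hypothetical, and the 95% confidence interval uses the tabulated Student t quantile for 9 degrees of freedom.

```python
import random
import statistics

def simulate_mean_wait(seed, n_jobs=20000, arrival_rate=0.5, service_rate=1.0):
    """Mean queueing delay in a single-server FIFO queue (toy model)."""
    rng = random.Random(seed)
    wait = 0.0      # waiting time of the current job
    total = 0.0
    for _ in range(n_jobs):
        total += wait
        service = rng.expovariate(service_rate)
        gap = rng.expovariate(arrival_rate)      # time until the next arrival
        wait = max(0.0, wait + service - gap)    # Lindley's recursion
    return total / n_jobs

# Ten statistically similar, but not identical, repetitions.
seeds = range(1, 11)
results = [simulate_mean_wait(s) for s in seeds]

mean = statistics.mean(results)
T_975_DF9 = 2.262   # two-sided 95% Student t quantile, 9 degrees of freedom
half_width = T_975_DF9 * statistics.stdev(results) / len(results) ** 0.5
interval = (mean - half_width, mean + half_width)
```

With a trace there is only one sample of the workload, so this kind of interval cannot be computed without additional assumptions.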
The main problem with models, as with traces, is that of representativeness.
That is, to what degree does the model represent the workload that the system
will encounter in practice? The answer depends in part on the degree of detail
that is included. As noted above, each job is composed of procedures that are
built of instructions, and these interact with the computer at different levels.
One option is to model these levels explicitly, creating a hierarchy of interlocked
models for the different levels [13,10,64]. This has the obvious advantage of
conveying a full and detailed picture of the structure of the workload. In fact,
it is possible to create a whole spectrum of models spanning the range from
condensed rudimentary models to direct use of a detailed trace.
For example, the sizes of a sequence of jobs need not be modeled indepen-
dently. Rather, they can be derived from a lower-level model of the jobs’ struc-
tures [30]. Hence the combined model will be useful both for evaluating systems
in which jobs are executed on predefined partitions, and for evaluating systems
in which the partition size is defined at runtime to reflect the current load and
the specific requirements of jobs.
The drawback of this approach is that as more detailed levels are added, the
complexity of the model increases. This is detrimental for three reasons. First,
more detailed traces are needed in order to create the lower levels of the model.
Second, it is commonly the case that there is wider diversity at lower levels.
For example, there may be many jobs that use 32 nodes, but at a finer detail,
some of them are coded as data parallel with serial and parallel phases, whereas
others are written with MPI in an SPMD style. Creating a representative model
that captures this diversity is hard, and possibly arbitrary decisions regarding
the relative weight of the various options have to be made. Third, it is harder
to handle such complex models. While this consideration can be mitigated by
automation [70,44], it leaves the problem of having to check the importance and
impact of very many different parameters.
3.2 How to Model
The most common approach used in workload modeling is to create a statistical
summary of an observed workload. This is applied to all the workload attributes,
e.g. computation, memory usage, I/O behavior, communication, etc. [46]. It is
typically assumed that the longer the observation period, the better. Thus we
can summarize a whole year’s workload by analyzing a record of all the jobs
that ran on a given system during this year. A synthetic workload can then
be generated according to the model, by sampling from the distributions that
constitute the model.
The question of what exactly to model, and at what degree of detail, is a
hard one. On one hand, we want to fully characterize all important workload
attributes. On the other hand a parsimonious model is more manageable, as
there are fewer parameters whose values need to be assessed and whose influence
needs to be studied. Also, there is a danger of over-fitting a particular workload
at the expense of generality.
Fitting Distributions. The goal of a model is to be able to create a synthetic
workload that mimics the original (possibly with certain modifications,
according to the effects we wish to study). The statistical summary is therefore
a distribution, or collection of distributions for various workload attributes. By
sampling from these distributions we then create the model workload [49].
[Figure 1 plots: four panels (CTC SP2, Jann model, SDSC SP2, Feitelson model), each showing the cumulative probability of the runtime [s] on a logarithmic axis from 1 to 100000, with separate curves for job sizes serial, 2-4, 5-8, 9-32, and >32.]
Fig. 1. Distributions of runtimes for different ranges of job sizes, in two workload logs
and two models of parallel jobs.
One way to select suitable distributions is based on moments, and especially
the mean and the variance of the sample data [23]. For example, these statistics
indicate that the distribution of job runtimes has a wide dispersion, leading to
a preference for a hyper-exponential model over an exponential one. Jann et al.
have used hyper-Erlang distributions to create models that match the first 3
moments of a distribution [41]. However, such summaries may be misleading,
because they may not represent the shape of the distribution correctly. Specifi-
cally, in the Jann models, the distributions become distinctly bimodal, whereas
the original data is much more continuous (Figure 1). The Feitelson model,
which uses a three-stage hyper-exponential distribution, more closely resembles
the original data in this respect.
The use of distributions with the right shape is not just an esthetic is-
sue. Some 25 years ago Lazowska showed that using models based on a hyper-
exponential distribution with matching moments to evaluate a simple queueing
system leads to inaccurate results [50], and advocated the use of distributions
with matching percentiles instead. He also noted that a hyper-exponential distri-
bution has three parameters, whereas the mean and standard deviation of data
only define two, so many different hyper-exponential distributions that match
the first two moments are possible — and lead to different results.
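The ambiguity noted above can be made concrete with a small sketch. It constructs two different two-stage hyper-exponential distributions that share the same mean and coefficient of variation (the values 9371 s and 3.1 are those of the first row of Table 1) and then compares their medians. The parameterization by `p_long`, the closed-form solution, and all function names are assumptions introduced for this illustration.

```python
import math

def h2_from_moments(mean, cv, p_long):
    """Two-stage hyper-exponential with the given mean and CV.

    p_long is the probability of the branch with the larger mean; it is
    the free third parameter that the two moments do not determine.
    """
    k = (cv * cv - 1.0) / 2.0
    if k <= 0 or not 0.0 < p_long < 1.0 / (1.0 + k):
        raise ValueError("no valid two-stage hyper-exponential for these inputs")
    mean_long = mean * (1.0 + math.sqrt((1.0 - p_long) * k / p_long))
    mean_short = mean * (1.0 - math.sqrt(p_long * k / (1.0 - p_long)))
    return p_long, mean_long, mean_short

def h2_mean_cv(p, m_long, m_short):
    """Mean and coefficient of variation of the mixture."""
    m1 = p * m_long + (1 - p) * m_short
    m2 = 2 * (p * m_long ** 2 + (1 - p) * m_short ** 2)
    return m1, math.sqrt(m2 - m1 * m1) / m1

def h2_median(p, m_long, m_short):
    """Median by bisection on the survival function."""
    lo, hi = 0.0, 100.0 * m_long
    for _ in range(200):
        mid = (lo + hi) / 2
        survival = p * math.exp(-mid / m_long) + (1 - p) * math.exp(-mid / m_short)
        if survival > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

model_a = h2_from_moments(9371.0, 3.1, 0.05)
model_b = h2_from_moments(9371.0, 3.1, 0.15)
median_a = h2_median(*model_a)
median_b = h2_median(*model_b)
```

Both models reproduce the first two moments exactly, yet their medians differ by more than a factor of two, so a queueing study driven by one or the other may well reach different conclusions.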
Table 1. Sensitivity of statistics to the largest data points. Data regarding runtimes
on the CTC SP2 machine from [23] courtesy of Allen Downey.
Rec’s omitted      statistic (% change)
(% of total)       mean [sec]       CV             median [sec]
  0 (0%)           9371             3.1            552
  5 (0.01%)        9177 (-2.1%)     2.2 (-29%)     551 (-0.2%)
 10 (0.02%)        9094 (-3.0%)     2.0 (-35%)     551 (-0.2%)
 20 (0.04%)        9023 (-3.7%)     1.9 (-39%)     551 (-0.2%)
 40 (0.08%)        8941 (-4.6%)     1.9 (-39%)     550 (-0.4%)
 80 (0.16%)        8834 (-5.7%)     1.8 (-42%)     549 (-0.5%)
160 (0.31%)        8704 (-7.1%)     1.8 (-42%)     546 (-1.1%)
Another problem with using statistics based on high moments of the data is
that they are very sensitive to rare large samples [23]. Table 1 shows data based
on the runtimes of 50866 parallel jobs from the CTC SP2 machine. Removing just
the top 5 values causes the mean to drop by 2%, and the coefficient of variation
(the standard deviation divided by the mean) to drop by 29%. The median, as
a representative of order statistics, only changes by 0.2%. As the extreme values
observed in a sample are not necessarily representative, this implies that the
model may be largely governed by a small number of unrepresentative samples.
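The same kind of sensitivity check can be run on any data set. The sketch below uses synthetic data, not the CTC SP2 log: the samples are drawn from a Pareto distribution with shape 1.5, which is purely an assumption chosen to produce a long tail, and the function name is hypothetical.

```python
import random
import statistics

def summary(values):
    """Return (mean, coefficient of variation, median) of a sample."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance ** 0.5 / mean, statistics.median(values)

rng = random.Random(2002)
# Synthetic stand-in for a runtime log (assumption: Pareto, shape 1.5).
runtimes = sorted(rng.paretovariate(1.5) for _ in range(50000))

before = summary(runtimes)
after = summary(runtimes[:-5])          # omit the 5 largest records

# Relative change of each statistic caused by omitting the top 5 values.
mean_change, cv_change, median_change = [
    (a - b) / b for a, b in zip(after, before)
]
```

As in Table 1, the moment-based statistics react to the removal of a handful of records, whereas the median, being an order statistic, barely moves.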
Finding a distribution that matches given moments is relatively easy, be-
cause it can be done based on inverting equations that relate a distribution’s
parameters to its moments. Finding a distribution that fits a given shape is typ-
ically harder [54]. One possibility is to use a maximum likelihood method, which
finds the parameters that most likely gave rise to the observed data. Another
option is to use an iterative method, in which the goodness of fit at each stage
is quantified using the Chi-square test, the Kolmogorov-Smirnov test, or the
Anderson-Darling test (which is like the Kolmogorov-Smirnov test but places
more emphasis on the tail of the distribution).
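As a minimal sketch of these two ingredients, the code below fits an exponential distribution by maximum likelihood (for which the estimated rate is simply the reciprocal of the sample mean) and measures the quality of the fit with the Kolmogorov-Smirnov distance, that is, the largest gap between the empirical and the fitted distribution functions. The synthetic data sets and all names are assumptions for illustration; a full goodness-of-fit test would also compare the distance against a critical value.

```python
import math
import random

def fit_exponential_mle(sample):
    """Maximum likelihood estimate of an exponential's rate."""
    return len(sample) / sum(sample)

def ks_distance(sample, cdf):
    """Largest gap between the empirical CDF and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def fitted_ks(sample):
    """Fit an exponential by MLE and return the KS distance of the fit."""
    rate = fit_exponential_mle(sample)
    return ks_distance(sample, lambda x: 1.0 - math.exp(-rate * x))

rng = random.Random(7)
# Data that really is exponential (mean 2).
exp_data = [rng.expovariate(0.5) for _ in range(10000)]
# Data from a two-stage hyper-exponential: 90% mean 1, 10% mean 20.
h2_data = [rng.expovariate(1.0) if rng.random() < 0.9 else rng.expovariate(0.05)
           for _ in range(10000)]

ks_exp = fitted_ks(exp_data)
ks_h2 = fitted_ks(h2_data)
```

The exponential fit to the hyper-exponential data matches its mean by construction, but the large distance shows that the shape is wrong.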
Correlations. Modeling the distribution of each workload attribute in isolation
is not enough. An important issue that has to be considered is possible
correlations between different attributes.
Correlations are important because they can have a dramatic impact on
system behavior. Consider the scheduling of parallel jobs on a massively parallel
machine as an example. Such scheduling is akin to 2D bin packing: each job is
represented by a rectangle in processors×time space, and these rectangles have
to be packed as tightly as possible. Assuming that when each job is submitted
we know how many processors it needs, but do not know for how long it will run,
it is natural to do the packing according to size. Specifically, packing the bigger
jobs first may be expected to lead to better performance [16]. But what if there
is a correlation between size and running time? If this is an inverse correlation,
we find a win-win situation: the larger jobs are also shorter, so packing them
first is statistically similar to using SJF (shortest job first) [47]. But if size and
runtime are positively correlated, so that large jobs run longer, scheduling them first may
cause significant delays for subsequent smaller jobs, leading to dismal average
performance [53].
Table 2. Correlation coefficient of runtime and size for different parallel supercom-
puter workloads.
System          Correlation
CTC SP2         −0.029
KTH SP2          0.011
SDSC SP2         0.145
LANL CM-5        0.211
SDSC Paragon     0.305
Establishing whether or not a correlation exists is not always easy. The com-
monly used correlation coefficient only yields high values if a strong linear rela-
tionship exists between the variables. In the example of the size and runtime of
parallel jobs, the correlation coefficient is typically rather small (Table 2), and a
scatter plot shows no significant correlation either (Figure 2). However, these two
attributes are actually correlated with each other, as seen from the distributions
for the CTC and SDSC logs in Figure 1. In both of these, the distribution of
runtimes for ranges of larger job-sizes distinctly favors longer runtimes, whereas
smaller job sizes favor short runtimes¹.
¹ The only exception is the serial jobs on the CTC machine, which have very long
runtimes; but this anomaly is unique to the CTC workload.
Fig. 2. The correlation between job sizes and runtimes on parallel supercomputers.
The scatter-plot data is from the SDSC Paragon parallel machine.
A coarse way to model correlation, which avoids this problem altogether, is to
represent the workload as a set of points in a multidimensional space, and apply
clustering [13]. For example, each job can be represented by a tuple including its
runtime, its size, its memory usage, and so on. By clustering we can then select
a small number of representative jobs, and use them as the basis of our workload
model; each such job comes with a certain (representative) combination of values
for the different attributes. However, many workloads do not cluster nicely —
rather, attribute values come from continuous distributions, and many different
combinations are all possible.
The direct way to model a correlation between two attributes is to use the
joint distribution of the two attributes. This suffers from two problems. One is
that it may be expected to be hard to find an analytical distribution function
that matches the data. The other is that for a large part of the range, the data
may be very sparse. For example, most parallel jobs are small and run for a
short time, so we have a lot of data about small short jobs. But we may not
have enough data about large long jobs to say anything meaningful about the
distribution — we just have a small set of unrelated samples.
The typical solution is therefore to divide the range of one attribute into
sub-ranges, and model the distribution of the other attribute for each such sub-
range. For example, the Jann model of supercomputer workloads divides the job
size scale according to powers of two, and creates an independent model of the
runtimes for each range of sizes [41]. As can be seen in Figure 1, these models
are completely different from each other. An alternative is to use the same model
for all subranges, and define a functional dependency of the model parameters
on the subrange. For example, the Feitelson model first selects the size of each
job according to the distribution of job sizes, and then selects a runtime from a
distribution of runtimes that is conditioned on the selected size [28]. Specifically,
the runtime is selected from a two-stage hyperexponential distribution, and the
probability p(n) of using the exponential with the smaller mean depends linearly
on the size:
p(n) = 0.95 − 0.2 (n/N)
Thus, for small jobs (the job size n is small relative to the machine size N) the
probability of using the exponential with the smaller mean is close to 0.95, and for
the largest jobs (n = N) this drops to 0.75.
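A minimal sketch of sampling from such a conditional model is given below. Only the formula for p(n) comes from the text; the machine size, the two exponential means, the distribution of job sizes (uniform over powers of two), and all names are hypothetical values chosen for illustration. The sketch also computes the correlation coefficient of size and runtime, to show that a clear dependence can coexist with a small coefficient.

```python
import math
import random

N = 128                # machine size (assumed)
MEAN_SHORT = 100.0     # mean of the short exponential, seconds (hypothetical)
MEAN_LONG = 10000.0    # mean of the long exponential, seconds (hypothetical)

def p_short(n, machine_size=N):
    """Probability of using the exponential with the smaller mean."""
    return 0.95 - 0.2 * (n / machine_size)

def sample_runtime(n, rng):
    """Runtime from a two-stage hyperexponential conditioned on size n."""
    mean = MEAN_SHORT if rng.random() < p_short(n) else MEAN_LONG
    return rng.expovariate(1.0 / mean)

def pearson(xs, ys):
    """Sample correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(1)
# Hypothetical size distribution: powers of two from 1 to 128, equally likely.
sizes = [2 ** rng.randrange(8) for _ in range(40000)]
runtimes = [sample_runtime(n, rng) for n in sizes]

r = pearson(sizes, runtimes)
mean_by_size = {}
for s in sorted(set(sizes)):
    group = [t for n, t in zip(sizes, runtimes) if n == s]
    mean_by_size[s] = sum(group) / len(group)
```

Under these assumed parameters the mean runtime of the largest jobs is several times that of serial jobs, while the correlation coefficient stays small, in the same range as the values in Table 2.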
Stationarity. A special type of correlation is correlation with time. This means
that the workload changes with time: it is not stationary.
On short time scales, the most commonly encountered non-stationary phe-
nomenon is the daily work cycle. In many systems, the workload at night is
quite different from the workload during the day. Many workload models ignore
this and focus on the daytime workload, assuming that it is stationary. How-
ever, when the workload includes items whose duration is on the scale of hours
(such as parallel jobs), the daily cycle cannot be ignored. There are two typical
ways for dealing with it. One is to divide the day into a number of ranges, and
model each one separately assuming that it is stationary [14]. The other is to
use parameterized distributions, and model the daily cycle by showing how the
parameters change with time of day [54].
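The second approach can be sketched by letting the arrival rate be a function of the time of day and generating arrivals from a non-homogeneous Poisson process by thinning. The sinusoidal rate function, its parameters, and the function names are assumptions made for illustration and are not taken from [54].

```python
import math
import random

def arrival_rate(hour_of_day, base=10.0, amplitude=8.0):
    """Hypothetical daily cycle in jobs/hour: peak at 14:00, trough at 02:00."""
    return base + amplitude * math.cos(2 * math.pi * (hour_of_day - 14) / 24)

def generate_arrivals(days, rng, max_rate=18.0):
    """Arrival times (in hours) of a non-homogeneous Poisson process.

    Thinning: generate candidates at the constant rate max_rate, and keep
    each one with probability arrival_rate(t) / max_rate. max_rate must be
    an upper bound on arrival_rate (here base + amplitude).
    """
    t = 0.0
    horizon = 24.0 * days
    arrivals = []
    while True:
        t += rng.expovariate(max_rate)
        if t >= horizon:
            return arrivals
        if rng.random() < arrival_rate(t % 24) / max_rate:
            arrivals.append(t)

rng = random.Random(3)
arrivals = generate_arrivals(30, rng)
daytime = sum(1 for t in arrivals if 8 <= t % 24 < 20)
night = len(arrivals) - daytime
```

Because the cycle is part of the model, the overall load can be changed by scaling `base` and `amplitude` without compressing the day, which is not possible when a trace is sped up by shrinking its interarrival times.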
Over long ranges, a non-stationary workload can be the result of changing
usage patterns as users get to know the system better. It can also result from
changing missions, e.g. when one project ends and another takes its place. Such
effects are typically not included in workload models, but they could affect the
data on which models are based. We return to this issue in Section 5.
Assumptions. An important point that is often overlooked in workload modeling
is that everything has to be modeled. It is not good to model one attribute
with great precision, but use unfounded assumptions for the others.
The problem is that assumptions can be very tempting and reasonable, but
still be totally untrue. For example, it is reasonable to assume that parallel jobs
are used for speedup, that is, to complete the computation faster. After all, this
is the basis for Amdahl’s Law. But other possibilities also exist — for example,
parallelism can be used to solve the same problem with greater precision rather
than faster. The problem is that assuming speedup is the goal leads to a model
in which parallelism is inversely correlated with runtime, and this has an effect
on scheduling [53,26]. Observations of real workloads indicate that this is not
the case, as shown above.
Another reasonable assumption is that users will provide the system with
accurate estimates of job runtimes when asked to. At least on large scale parallel
systems, users indeed spend significant effort tuning their applications, and may
be expected to have this information. Also, backfilling schedulers reward low
estimates but penalize underestimates, leading to a convergence towards accurate
estimates. Nevertheless, studies of user estimates reveal that they are often highly
inaccurate, and often represent an overestimate by a full order of magnitude [57].
Surprisingly, this can sway results comparing schedulers that use the estimates
to decide whether to backfill jobs (that is, to use them to fill holes in an existing
schedule) [25].
4 Heavy Tails, Self Similarity, and Burstiness

A major problem with applying the techniques described in the previous section
occurs when the data is “bad” [3]. This is best explained by an example. If the
data fits, say, an exponential distribution, then a running average of growing
numbers of data samples quickly converges to the mean of the distribution. But
bad data is ill-behaved: it does not converge when averaged, but rather continues
to grow and fluctuate. Such effects have received considerable attention lately, as
many different data sets were found to display them. For more technical detail
on this topic, see [62,61].
4.1 Distributions with Heavy Tails

A very common situation is that distributions have many small elements, and
few large elements. For example, there are many small files and few large files;
many short processes and few long processes. The question is how dominant are
the large elements relative to the small ones. In heavy-tailed distributions, the
rare large elements (from the tail of the distribution) dominate.
In general, the relative importance of the tail can be classified into one of
three cases [62]. Consider trying to estimate the length of a process, given that
we know that it has already run for a certain time, and that the mean of the
distribution of process lengths is m.
– If the distribution of process lengths has a short tail, then the more we have
waited already, the less additional time we expect to wait. The mean of the
tail is smaller than m. For example, this would be the case if the distribution
was uniform over a certain range.
– If the distribution is memoryless, the expected additional time we need to
wait for the process to terminate is independent of how long we have waited
already. The mean length of the tail is always the same as the mean length
of the whole distribution. This is the case for the exponential distribution.
– But if the distribution is heavy tailed, the additional time we may expect to
wait till the process terminates grows with the time we have already waited.
The mean of the tail is larger than m, the mean of the whole distribution.
An example of this type is the Pareto distribution.
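The three cases can be checked with the closed-form expected remaining time E[X − t | X > t] for one representative distribution of each kind. The formulas are standard results; the function names and the choice of a uniform distribution on [0, upper] for the short-tailed case are assumptions made for this sketch.

```python
def remaining_uniform(t, upper):
    """Expected remaining time for a uniform distribution on [0, upper]."""
    if not 0 <= t < upper:
        raise ValueError("t must lie in [0, upper)")
    return (upper - t) / 2

def remaining_exponential(t, mean):
    """Expected remaining time for an exponential distribution (memoryless)."""
    return mean

def remaining_pareto(t, shape):
    """Expected remaining time for a Pareto distribution with shape > 1.

    Valid for t at or above the minimal value of the distribution; the
    result t / (shape - 1) does not depend on that minimal value.
    """
    if shape <= 1:
        raise ValueError("the mean is infinite for shape <= 1")
    return t / (shape - 1)

# Having already waited longer (second column), how much more do we expect?
short_tail = (remaining_uniform(2.0, 10.0), remaining_uniform(8.0, 10.0))
memoryless = (remaining_exponential(2.0, 5.0), remaining_exponential(8.0, 5.0))
heavy_tail = (remaining_pareto(2.0, 1.5), remaining_pareto(8.0, 1.5))
```

The expected additional wait shrinks in the first case, stays constant in the second, and grows in proportion to the time already waited in the third.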
An important consequence of heavy tailed distributions is the mass disparity phe-
nomenon: a small number of samples account for the majority of mass, whereas
all small samples account for negligible mass [17]. Conversely, a typical sample
is small, but a typical unit of mass comes from a large sample. Using concrete
examples from computers, a typical process is short, but a typical second of
CPU activity is part of a long process; a typical file is small, but a typical byte
of storage belongs to a large file (Figure 3). This disparity is sometimes referred
to as the “mice and elephants” phenomenon. But this metaphor may conjure
the image of a bimodal distribution², which could be misleading: in most cases,
the distribution is continuous.
² A typical mouse weighs about 28 grams, whereas an elephant weighs 3 to 6 tons,
depending on whether it is Indian or African. Cats, dogs, and zebras, which fall in
between, are missing from this picture.
[Figure 3 plots. Left: cumulative percent of the number of files and of the disk space as a function of file size (0 to 1G). Right: survival probability (log) as a function of file size (log) for the Unix files, together with a Pareto model with a = 1.25.]
Fig. 3. The distribution of file sizes, from a 1993 survey of 12 million Unix files [40].
Left: 90% of the files are less than 16KB long, and use only some 10% of the total
disk space. Half the disk space is occupied by a very small fraction of large files. Right:
log-log complementary distribution plot, with possible Pareto model of the tail; see
Equation (2).
Formally, it is common to define heavy tailed distributions to be those whose
tails decay like a power law: the probability of sampling a value larger than x
is proportional to one over x raised to some power [62]:
$$\bar{F}(x) = \Pr[X > x] \sim x^{-a}, \qquad 0 < a < 2 \qquad (1)$$
where $\bar{F}(x)$ is the survival function (that is, $\bar{F}(x) = 1 - F(x)$), and $\sim$ means
that the two sides are proportional for large x. This is a very strong statement. Consider an exponential
distribution. The probability of sampling a value larger than say 100 times the
mean is $e^{-100}$, which is totally negligible for all intents and purposes. But for a
Pareto distribution with a = 2, this probability is 1/40000: one in every 40000
samples will be bigger than 100 times the mean. While rare, such events can
certainly happen. When the shape parameter is a = 1.1, and the tail is heavier,
this probability increases to one in 2216 samples.
An important characteristic of heavy tailed distributions is that some of their
moments may be undefined. Specifically, using the above definition, if a ≤ 1 the
mean will be undefined, and if a ≤ 2 the variance will be undefined. But what
does this mean? Consider a Pareto distribution with a = 1, whose probability
density is proportional to $x^{-2}$. Trying to evaluate its mean leads to
$$E[x] = \int c\, x\, \frac{1}{x^2}\, dx = c \ln x$$
so the mean is infinite. But for any finite number of samples, the mean obviously
exists. The answer is that the mean grows logarithmically with the number of
observations. However, this statement is misleading, as the running mean does
not actually resemble the log function. In fact, it grows in big jumps every time a
large observation from the tail of the distribution is sampled, and then it slowly
decays again towards the log function (Figure 4).
[Figure 4 plot: running average as a function of sample size (millions, 0 to 100), with log(x) shown for reference.]
Fig. 4. Examples of the running mean of samples from a Pareto distribution. Four
plots using different random number generator seeds are shown.
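The tail probabilities quoted above and the behavior of the running mean can be reproduced with a short sketch. The tail probability uses the facts that a Pareto distribution with shape a > 1 and minimal value k has mean ak/(a − 1) and survival function (k/x)^a, so that k cancels. The sample size of 200,000 (far smaller than in Figure 4), the seed, and the function names are assumptions made to keep the example quick to run.

```python
import random

def pareto_tail_beyond_multiple_of_mean(a, multiple):
    """Pr[X > multiple * E[X]] for a Pareto distribution with shape a > 1."""
    mean_over_min = a / (a - 1.0)          # E[X] divided by the minimal value
    return (multiple * mean_over_min) ** (-a)

def running_means(samples):
    """Mean of the first i samples, for i = 1 .. len(samples)."""
    total = 0.0
    means = []
    for i, x in enumerate(samples, start=1):
        total += x
        means.append(total / i)
    return means

one_in_a2 = 1.0 / pareto_tail_beyond_multiple_of_mean(2.0, 100)
one_in_a11 = 1.0 / pareto_tail_beyond_multiple_of_mean(1.1, 100)

rng = random.Random(42)
n = 200000
# Exponential samples: the running mean settles near the true mean of 1.
exp_means = running_means([rng.expovariate(1.0) for _ in range(n)])
# Pareto samples with a = 1: the mean is infinite, so the running mean
# keeps drifting upward instead of converging.
pareto_means = running_means([rng.paretovariate(1.0) for _ in range(n)])
```

Plotting `pareto_means` against the sample index for several seeds gives curves of the kind shown in Figure 4, with sudden jumps followed by slow decay.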
The definition (1) can also be used to determine if a given data set is heavy
tailed. Taking the log from both sides we observe that
$$\log \bar{F}(x) = \log x^{-a} = -a \log x \qquad (2)$$
So plotting log F̄ (x) (the log of the fraction of observations larger than x) as a
function of log x should lead to a straight line with slope −a (this is sometimes
called a “log-log complementary distribution plot”, or LLCD, see Figure 3).
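A minimal sketch of this technique is given below: it computes the points of the LLCD plot from a sample and estimates the slope by ordinary least squares. The data is synthetic (Pareto with a = 1.25, the value used for the model in Figure 3), and fitting a single line to all points is an assumption that only makes sense here because the whole synthetic distribution, not just its tail, follows a power law.

```python
import math
import random

def llcd_points(sample):
    """(log10 x, log10 of the fraction of observations larger than x)."""
    xs = sorted(sample)
    n = len(xs)
    # The largest observation is skipped: nothing is larger than it.
    return [(math.log10(x), math.log10((n - 1 - i) / n))
            for i, x in enumerate(xs[:-1])]

def least_squares_slope(points):
    """Slope of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

rng = random.Random(11)
sample = [rng.paretovariate(1.25) for _ in range(20000)]
slope = least_squares_slope(llcd_points(sample))   # expected to be near -1.25
```

With real data the fit would normally be restricted to values above some threshold, since only the tail is expected to follow the power law.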
This technique can be further improved by aggregating successive observa-
tions (that is, replacing each sequence of k observations by their sum). Distribu-
tions for which such aggregated random variables have the same distribution as
the original are called stable distributions. The Normal distribution is the only
stable distribution with finite variance. Heavy tailed distributions (according to
definition (1)) are also stable, but have an infinite variance. Thus the central
limit theorem does not apply, and the aggregated random variables do not have
a Normal distribution. Rather, they have the same heavy-tailed distribution.
This can be verified by creating LLCD plots of the aggregated samples, and
checking that they too are straight lines with the same slope as the original [19,
18]. If the distribution is not heavy tailed, the aggregated samples will tend to
be Normally distributed (the more so as the level of aggregation increases), and
the slopes of the LLCD plots will increase with the level of aggregation.
Using these and other procedures, the following have been argued to be heavy
tailed:
– Process runtimes on general purpose workstations [51,37]. Note that this
only applies to the tail of the distribution, i.e. to processes longer than a
certain threshold. Measurements show the power to be close to 1.
Model: $\Pr[T > t] = t^{-k}$
source        tail      k
[51] (’86)    > 3s      1.05–1.25
[37] (’96)    > 1s      0.78–1.29
– File sizes on a general purpose system (Figure 3), again limited to the tail
of the distribution. There has been some discussion on whether this is best
modeled by a Pareto or a lognormal distribution, but at least some data sets
seem to fit a Pareto model better, and in any case they are highly skewed
[22].
– Various aspects of Internet traffic, specifically [62,69]
• Flow sizes
• FTP data transfer sizes
• TELNET packet interarrival times
– Various aspects of web server load, specifically [18,6]
• The tail of the distribution of file sizes on a server
• The distribution of request sizes
• The popularity of the different files (this is a Zipf distribution — see
below)
• The distribution of off times (between requests)
• The distribution of the number of embedded references in a web page
– The popularity of items (e.g. pages on the web) is often found to follow Zipf’s
Law [77], which is also a power law [7]. Assume a set of items are ordered
according to their popularity counts, i.e. according to how many times each
was selected. Zipf’s Law is that the count y is inversely proportional to the
rank r according to
$$y \approx r^{-b}, \qquad b \approx 1 \qquad (3)$$
This means that there are r items with count larger than y, or
$$\Pr[Y > y] = r/N \qquad (4)$$
where N is the total number of items. We can express r as a function of y
by inverting the original expression (3), leading to $r \approx y^{-1/b}$; substituting
this into (4) gives a power-law tail
$$\Pr[Y > y] = C \cdot y^{-a}$$
with $a = 1/b$; moreover, b ≈ 1 implies a ≈ 1 [2].
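The inversion from a Zipf rank-count relation to a power-law tail can be checked numerically. The sketch below assumes idealized counts that follow (3) exactly, and takes the tail probability at the count of rank r to be r/N as in (4); the function names and the `top_count` constant are hypothetical.

```python
import math

def zipf_counts(num_items, b=1.0, top_count=1.0e6):
    """Idealized popularity counts y_r = top_count * r**(-b), r = 1..num_items."""
    return [top_count * r ** (-b) for r in range(1, num_items + 1)]

def tail_exponent(counts):
    """Estimate a in Pr[Y > y] = C * y**(-a) from a log-log least-squares fit."""
    ys = sorted(counts, reverse=True)
    n = len(ys)
    # Following (4), the tail probability at the count of rank r is r / N.
    points = [(math.log(y), math.log(rank / n))
              for rank, y in enumerate(ys, start=1)]
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx
```

For counts generated with exponent b, `tail_exponent` returns 1/b up to rounding, so b ≈ 1 gives a ≈ 1. Real popularity data would only follow this approximately.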

The problem with procedures such as plotting log F̄ (x) as a function of log x
and measuring the slope of the line is that data regarding the tail is sparse by
definition. When applying an automatic classification procedure, a single large
sample may sway the decision in favor of “heavy”. But is this the correct
generalization? The question is one of identifying the nature of the underlying
distribution, without having adequate data. Claiming a truly heavy tailed distribution is
almost always unfounded, because such a claim means that unbounded samples
should be expected as more and more samples are generated. In all real cases,
samples must be bounded by some number (a process cannot run for longer than
130 D.G. Feitelson
the uptime of the computer; a file cannot be larger than the total available disk
space).
One simple option is to postulate a certain upper bound on the distribution,
but this does not really solve the problem because the question of where to
place the bound remains unanswered. Another option is to try fitting alternative
distributions for which all moments converge. For example, there have been
successful attempts to model file sizes using a lognormal distribution rather than
a Pareto distribution [22]. This has the additional benefit of fitting the whole
distribution rather than just the tail.
A more general approach is to use phase-type distributions, which employ
a mixture of exponentials. Consider a simple example, in which N samples are
drawn from an exponential distribution, and one additional sample is a far out-
lier. This can be modeled as a hyperexponential distribution, with probability
N/(N + 1) to sample from the main exponential, and probability 1/(N + 1) to
sample from a second exponential distribution with a mean equal to the outlier
value. In general, it is possible to construct mixtures of exponentials to fit any
observed distribution [9]. This is especially important for analytical modeling, as
distributions with infinite moments cause severe problems for such analysis. For
simulation the exact definition is somewhat less important, as long as significant
mass is concentrated in the tail.
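The outlier example above can be written down directly. In this sketch the mean of the main exponential is estimated by the sample mean of the N ordinary samples, which is an assumption (the text only says they are drawn from an exponential distribution); the function names are hypothetical.

```python
import random

def outlier_hyperexp(samples, outlier):
    """Parameters (p_main, mean_main, mean_tail) of the two-branch mixture.

    `samples` are the N ordinary observations, `outlier` the single far outlier.
    """
    n = len(samples)
    p_main = n / (n + 1)             # probability of the main exponential
    mean_main = sum(samples) / n     # assumed estimate of its mean
    return p_main, mean_main, outlier

def hyperexp_mean(p_main, mean_main, mean_tail):
    """Mean of the mixture: weighted average of the two branch means."""
    return p_main * mean_main + (1.0 - p_main) * mean_tail

def sample_hyperexp(p_main, mean_main, mean_tail, rng):
    """Draw one value: pick a branch, then sample that exponential."""
    mean = mean_main if rng.random() < p_main else mean_tail
    return rng.expovariate(1.0 / mean)   # expovariate takes the rate 1/mean
```

All moments of this mixture are finite, yet a small probability mass sits far out in the tail, which is the property the text asks for in simulation.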
4.2 The Phenomenon of Self Similarity
Self similarity refers to situations in which a phenomenon has the same general
characteristics at different scales [56,67]. In particular, parts of the whole may
be scaled-down copies of the whole, as in well known fractals such as the Cantor
set and the Sierpiński triangle. In natural phenomena we cannot expect perfect
copies of the whole, but we can expect the same statistical properties. A well
known natural fractal is the coast of Britain [56]. Workloads often also display
such behavior.
The first demonstrations of self similarity in computer workloads were for
Internet traffic, and used a striking visual demonstration. A time series rep-
resenting the number of packets transmitted during successive time units was
recorded. At a fine granularity, i.e. when using small time units, this was seen
to be bursty. But the same bursty behavior persisted also when the time series
was aggregated over several orders of magnitude, by using larger and larger time
units. This contradicted the common Poisson model of packet arrivals, which
predicted that the traffic should average out when aggregated.
Similar demonstrations have since been done for other types of workloads.
Figure 5 gives an example from jobs arriving at a parallel supercomputer. Self
similarity has also been shown in file systems [36] and in web usage [18].
The mathematical description of self similarity is based on the notion of long-
range correlations. Actually, there are correlations at many different time scales:
self similarity implies that the workload at a certain instant is similar to the
workload at other instants at different scales, starting with a short time scale,
through medium time scales, and up to long time scales. But the strength of the
correlation decreases as a power law with the time scale.
[Figure 5 plots: five pairs of panels showing jobs (left) and processes (right) per time unit versus time, for time units of 36 sec., 6 min., 1 hr., 10 hr., and 4 days.]
Fig. 5. Burstiness of job arrivals to the SDSC Paragon parallel supercomputer at
different time scales. Left: jobs per time unit. Right: processes per time unit (each
parallel job is composed of multiple processes). In all the graphs time is in seconds; the
duration of the log is two years, which is about 63 million seconds.
A model useful for understanding the correlations leading to self similarity is
provided by random walks. In a one-dimensional random walk, each step is either
to the left or to the right with equal probabilities. It is well known that after n
steps the expected distance from the origin grows like $\sqrt{n}$, or $n^{0.5}$. But what happens if
the steps are correlated with each other? If each step has a probability higher
than 1/2 of being in the same direction as the previous step, we can expect slightly
longer stretches of steps in the same direction. But this is not enough to change
the expected distance from the origin after n steps: it still grows like $n^{0.5}$. This remains
true also if each step is correlated with all previous steps with exponentially
decreasing weights. In both these cases, the correlation only has a short range,
and the effect of each step decays to zero very quickly.
But if a step is correlated with previous steps with polynomially decreasing
weights, meaning that the weight of the step taken k steps back is proportional
to $k^{-a}$, stretches of steps in the same direction become much longer. And the
expected distance from the origin is found to behave like $n^H$, with 0.5 < H < 1.
H is called the Hurst parameter [63]. The closer it is to 1, the more self-similar
the walk.
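The short-range case can be checked by simulation. The sketch below covers only the walk in which each step repeats the previous direction with a fixed probability; it does not implement polynomially decaying weights. It uses the root-mean-square distance as the measure of distance, and estimates the growth exponent from two walk lengths. All names and default values are assumptions made for this illustration.

```python
import math
import random

def rms_distance(n_steps, persistence, trials, rng):
    """Root-mean-square distance from the origin after n_steps.

    Each step repeats the direction of the previous step with probability
    `persistence`; a value of 0.5 gives the uncorrelated walk.
    """
    total = 0.0
    for _trial in range(trials):
        step = 1 if rng.random() < 0.5 else -1
        position = 0
        for _step in range(n_steps):
            position += step
            if rng.random() >= persistence:
                step = -step        # change direction for the next step
        total += position * position
    return math.sqrt(total / trials)

def growth_exponent(persistence, n_small=100, n_large=900, trials=1000, seed=0):
    """Exponent e such that the RMS distance grows roughly like n**e."""
    rng = random.Random(seed)
    small = rms_distance(n_small, persistence, trials, rng)
    large = rms_distance(n_large, persistence, trials, rng)
    return math.log(large / small) / math.log(n_large / n_small)
```

Both `growth_exponent(0.5)` and `growth_exponent(0.8)` should come out close to 0.5: persistence with a short memory changes the constant in front of $n^{0.5}$, not the exponent.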
One way of checking whether a process is self similar is directly based on the
above: measure the range covered after n steps, and check the exponent that
relates it to n. Assume you start with a time series x1 , x2 , . . .. The procedure is
as follows [63]:
1. Normalize it by subtracting the mean x̄ from each sample, giving $z_i = x_i - \bar{x}$.
The mean of the new series is obviously 0.
2. Calculate the distance covered after j steps:
$$y_j = \sum_{i=1}^{j} z_i$$
3. The range covered after n steps is the difference between the largest and smallest distances that have occurred:
$$R_n = \max_{j=1 \ldots n} y_j - \min_{j=1 \ldots n} y_j$$
4. Rescale this by dividing by the standard deviation s of the original data.
5. The model is that the rescaled range, R/s, should grow like $c n^H$. To check
this take the log, leading to
$$\log \left( \frac{R}{s} \right)_n = \log c + H \log n$$
If the process is indeed self similar, we expect to see a straight line in log-log axes, and the
slope of the line gives H.
If a long time series is given, the calculation for small values of n is repeated
for non-overlapping sub-series of length n each, and the average is used. An
example of the results of doing so is given in Figure 6, based on the data shown
graphically in Figure 5.
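The five steps, together with the averaging over non-overlapping sub-series, can be sketched as below. The function names, the choice of sub-series lengths, and the use of an ordinary least-squares fit are assumptions; this is not the code used to produce Figure 6.

```python
import math
import random

def rescaled_range(x):
    """R/s of one series, following steps 1 to 4."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if s == 0.0:
        return None                  # constant series: R/s is undefined
    distance = 0.0
    distances = []
    for v in x:
        distance += v - mean         # y_j, the sum of the first j z_i
        distances.append(distance)
    return (max(distances) - min(distances)) / s

def hurst_rs(series, sizes):
    """Slope of log(average R/s) against log n over the given sizes n."""
    points = []
    for n in sizes:
        values = []
        for start in range(0, len(series) - n + 1, n):
            rs = rescaled_range(series[start:start + n])
            if rs is not None:
                values.append(rs)
        if values:
            points.append((math.log(n), math.log(sum(values) / len(values))))
    mean_x = sum(px for px, _ in points) / len(points)
    mean_y = sum(py for _, py in points) / len(points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    return sxy / sxx
```

For independent noise the estimate tends to land somewhat above 0.5, since the method is known to be biased upward for small n; strongly persistent series give values approaching 1.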
Other ways of checking for self similarity are based on the rate at which the
variance decays as observations are aggregated, or on the decay of the spectral
density, possibly using wavelet analysis [1]. Results of the Variance-time method
are also shown in Figure 6. This is based on aggregating the original time series
(that is, replacing each m consecutive values by their average) and calculating
the variance of the new series. This decays polynomially with a rate of −β,
leading to a straight line with this slope in log-log axes. The Hurst parameter is
then given by
H = 1 − (β/2)
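The variance-time method can be sketched in the same style. The function names and the set of aggregation sizes are assumptions; the fit is an ordinary least-squares line through the points (log m, log variance).

```python
import math
import random

def aggregate_means(x, m):
    """Replace each m consecutive values by their average."""
    return [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]

def variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def hurst_variance_time(series, block_sizes):
    """Return (H, beta), where the variance decays like m**(-beta)."""
    points = [(math.log(m), math.log(variance(aggregate_means(series, m))))
              for m in block_sizes]
    mean_x = sum(px for px, _ in points) / len(points)
    mean_y = sum(py for _, py in points) / len(points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    beta = -sxy / sxx                # the fitted slope is -beta
    return 1.0 - beta / 2.0, beta
```

For independent samples the variance of the block averages falls like 1/m, so β ≈ 1 and H ≈ 0.5, in line with the Poisson reference in Figure 6. A slower decay (β < 1) gives H > 0.5.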
[Figure 6 plots: left, average R/s versus n; right, average variance versus aggregate size m. Series: SDSC jobs, SDSC proc, and Poisson, each with a regression line annotated with its estimated H (and β in the right panel).]
Fig. 6. The (R/s)n and variance-time methods for measuring self similarity, applied to
the data in Figure 5. A Poisson process with no self-similarity is included as reference,
as well as linear regression lines.
4.3 Modeling Self-Similarity
Heavy tailed distributions and self similarity are intimately tied to each other,
and the modeling of self-similar workloads depends on this. As noted above,
self similarity is a result of long-range correlation in the workload. By using
heavy tailed distributions to create a workload model with the desired long
range correlation, we get a model that also displays self similarity.
The idea is that the workload is not uniform, but rather generated by mul-
tiple on-off processes [75,18,36]. “On” periods are active periods, in which the
workload arrives at the system at a certain rate (jobs per hour, packets per sec-
ond, etc). “Off” periods are inactive periods during which no load is generated.
The complete workload is the result of many such on-off processes.
The crux of the model is the distributions governing the lengths of the on and
off periods. If these distributions are heavy tailed, we get long-range correlation:
if a unit of work arrives at time t, similar units of work will continue to arrive for
the duration d of the on period to which it belongs, leading to a correlation with
subsequent times up to t + d. As this duration is heavy tailed, the correlation
created by this burst will typically be for a short d; but occasionally a long on
period will lead to a correlation over a long span of time. As many different
bursts may be active at time t, what we actually get is a combination of such
correlations for durations that correspond to the distribution of the on periods.
But this is heavy tailed, so we get a correlation that decays polynomially — a
long range dependence.
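A workload built from many on-off processes can be sketched as follows. The sketch assumes Pareto-distributed period lengths with the same shape parameter for on and off periods, one unit of work per active source per time unit, and a random initial state; these choices and all names are illustrative assumptions, not the models of [75,18,36].

```python
import math
import random

def onoff_source(length, alpha, rng):
    """0/1 activity of one source over `length` time units."""
    activity = []
    on = rng.random() < 0.5
    while len(activity) < length:
        # Pareto period length (at least 1 time unit), truncated to the
        # remaining horizon so that a huge sample cannot exhaust memory.
        period = min(math.ceil(rng.paretovariate(alpha)),
                     length - len(activity))
        activity.extend([1 if on else 0] * period)
        on = not on
    return activity

def onoff_workload(num_sources, length, alpha=1.2, seed=0):
    """Load per time unit: the number of sources that are in an on period."""
    rng = random.Random(seed)
    load = [0] * length
    for _source in range(num_sources):
        for t, active in enumerate(onoff_source(length, alpha, rng)):
            load[t] += active
    return load
```

With a shape parameter between 1 and 2 the period lengths have a finite mean but infinite variance, which is the regime in which the aggregate load is expected to show long-range dependence. The resulting series can be fed to an R/s or variance-time estimator.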
In some cases, this type of behavior is built in, and a direct result of the
heavy tailed nature of certain workload parameters. For example, given that
web server file sizes are heavy tailed, the distribution of service times will also
be heavy tailed (as the time to serve a file is proportional to its size). During the
time a file is served, data is transmitted at a constant rate. This is correlated
with later transmittals according to the heavy-tailed distribution of sizes and
transmission times, leading to long range correlation and self similarity [18].
5 Workload Dynamics and Structural Modeling
The on-off process used for modeling self-similar workloads has another very
important benefit. It provides a mechanism for introducing locality into the
workload, so that not only the statistics will be modeled, but also the dynamics.
5.1 User Behavior
The procedure for workload modeling outlined in Section 3.2 was to analyze real
workloads, recover distributions that characterize them, and then sample from
these distributions. The main problem with this procedure is that it loses all
structural information.
A real workload is not a random sampling from a distribution. For example,
the load on a server used by students at a university changes from week to week,
depending on the assignments that are due each time. In each week, everybody
is working on the same task, so the workload is composed of many jobs that are
statistically similar. The next week all the jobs are similar to each other again,
but they are all different from the jobs of the previous week. Over the whole year
we indeed observe a wide distribution with many job types, but at any given
time we do not see a representative sampling of this distribution. Instead, we
only see samples concentrated in a small part of the distribution (Figure 7). The
workload displays a “locality of sampling” (footnote 3).
[Figure 7 plots: left, average number of different users; right, average number of different job sizes; both versus observation window (day, week, month, quarter, all). Series: CTC SP2, LANL CM-5, SDSC Paragon, SDSC SP2.]
Fig. 7. The dynamics of workloads. Left: the active set of users grows with the obser-
vation window. Right: so does the diversity of the workload, in this case represented
by the number of different job sizes observed. Note that the x scale is not linear.
(Footnote 3: The existence of such local repetitiveness in workloads was suggested to me by Larry
Rudolph over ten years ago.)
The common way to model workload dynamics is with a user behavior graph
[31]. This is a graph whose nodes represent states. In each state, the user executes
a certain job with characteristics drawn from a certain distribution. The
arcs denote the probability of moving from state to state. The graph therefore
encodes a Markovian model of the workload dynamics. A random walk on the
graph, subject to the model’s transition probabilities, creates a random workload
sequence such that the probability of each job matches the limiting probability
of that job’s state, but it also abides by the model of which jobs come after each
other, and how many times a job may be repeated (using self-looping arcs in the
graph) [64]. However, this needs to be adjusted in order to create heavy tailed
distributions.
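A random walk on a user behavior graph can be sketched as below. The three states, their transition probabilities, and the exponential runtime per state are hypothetical values chosen for illustration; they are not taken from [31] or [64].

```python
import random

# Hypothetical graph: state -> {next state: transition probability}.
GRAPH = {
    "edit":    {"edit": 0.5, "compile": 0.5},
    "compile": {"edit": 0.3, "run": 0.7},
    "run":     {"edit": 0.6, "run": 0.4},
}

# Hypothetical mean runtime (seconds) of the job executed in each state.
MEAN_RUNTIME = {"edit": 30.0, "compile": 10.0, "run": 120.0}

def generate_jobs(graph, mean_runtime, start, num_jobs, rng):
    """Random walk on the graph; returns a list of (state, runtime) jobs."""
    jobs = []
    state = start
    for _job in range(num_jobs):
        # Job characteristics are drawn from the distribution of the state.
        runtime = rng.expovariate(1.0 / mean_runtime[state])
        jobs.append((state, runtime))
        # Follow an outgoing arc according to the transition probabilities.
        successors = list(graph[state])
        weights = [graph[state][s] for s in successors]
        state = rng.choices(successors, weights=weights)[0]
    return jobs
```

The self-loops on "edit" and "run" produce runs of repeated jobs, and over a long sequence the fraction of jobs from each state approaches the limiting probability of that state (0.48, 0.24 and 0.28 for this particular graph). Because the runtimes here are exponential, the sketch does not produce heavy tails.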
In a university it may be plausible to argue that all students should be
modeled using the same user behavior graph. But in a production environment
one would expect different users, with different levels of activity and different
behaviors. In addition, the active population changes with time (Figure 7) [23].
Thus what we actually need is not one user behavior graph, but a model of the
user population as a whole: how the population of users changes, and what user
behavior graph each one should have. Using such a model has two important
advantages. First, it has built-in support for generating self-similar workloads
(assuming users have long-tailed on and off activity times). Second, it provides a
good way to control load without modifying the underlying distributions: simply
change the number of users [6].
Another aspect of user behavior, which is not captured by the user behav-
ior graph, is the feedback from the system performance to the generation of
new work. Real users are not oblivious to the system’s behavior: They typically
submit additional work only when existing work is finished. Thus, if the user
population is bounded, the system’s current performance modulates the offered
load, automatically reducing it when congestion occurs, and spreading the load
more evenly over time. But adding this integrates the workload model with the
system, and prevents the use of an independent workload model.
5.2 Internal Structure

User modeling implants a structure on the workload. But it does not by itself
define the basic building blocks of the workload — the jobs that are submitted
to the system.
One approach is to use a descriptive model. For example, modeling of parallel
applications requires a functional relationship between the number of processors
and the runtimes — in short, a speedup function of the application. A model
of speedups based on the average parallelism and its variance was proposed by
Downey [21]. Another model, based on the parallel and sequential parts of the
application and on the overheads of parallelization, was proposed by Sevcik [68].
An alternative is to model the application’s internal structure. It is com-
mon practice to measure systems using parameterized synthetic applications [8].
Such applications typically involve several nested loops that mimic the behavior
of iterative applications, and perform different amounts of computations, I/O
operations, and memory accesses. The number of iterations, types of operations,
and spread of addresses are all parameters, thus allowing a single simple and
generic benchmark to mimic many different applications.
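The structure of such a parameterized synthetic application can be sketched as nested loops whose bounds are the parameters. The sketch is a toy under stated assumptions: the parameter names are hypothetical, the "I/O" is an append to an in-memory list, and the "memory" is a Python list, so it mimics the control structure rather than real resource usage.

```python
import random

def synthetic_app(iterations, compute_ops, io_ops, mem_accesses,
                  address_spread, seed=0):
    """Run a toy iterative application and report what it did."""
    rng = random.Random(seed)
    memory = [0] * address_spread    # stands in for the address space
    io_buffer = []                   # stands in for an I/O device
    checksum = 0
    for _iteration in range(iterations):
        for k in range(compute_ops):             # computation phase
            checksum = (checksum * 31 + k) % 1000003
        for _access in range(mem_accesses):      # memory phase
            memory[rng.randrange(address_spread)] += 1
        for _op in range(io_ops):                # I/O phase
            io_buffer.append(checksum)
    return {
        "checksum": checksum,
        "io_operations": len(io_buffer),
        "memory_accesses": sum(memory),
        "addresses_touched": sum(1 for count in memory if count > 0),
    }
```

Varying the arguments changes the mix of computation, memory traffic, and I/O without changing the program, which is the property that lets one generic benchmark stand in for many applications.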
A similar approach can be used to generate a synthetic workload: use a
parameterized program, selecting the parameters from suitable distributions in
order to create the desired mix of behaviors. For example, Rudolph and Feitelson
have proposed a model of parallel applications with relatively few parameters,
including the total work done, the average size of work units and its variability,
the way in which these work units are partitioned into threads, and the number
of barriers by which they are synchronized [30].
The question is what distributions to use. While there has been some work
done on characterizing specific applications [20,74,65], there has been little if
any work on characterizing the mix of application characteristics in a typical
workload. A rather singular example is the Charisma project, in which a whole
workload was measured [59]. Interestingly, this requires the same statistical tech-
niques described in Section 3.2, just applied to a different level. Indeed, such hier-
archical structuring of workloads has been recognized as an important workload
structuring tool [64].
Naturally, all this applies to practically all types of workloads, and not only
to jobs on (parallel) machines. For example, web workloads can be viewed as
sessions that each include a sequence of requests for pages that each have several
embedded components; database workloads include transactions that contain a
number of embedded database operations, and so on.
6 Conclusions
Performance evaluation depends on workload modeling. We have outlined the
conceptual framework of such modeling, starting with simple statistical charac-
terization, continuing with the handling of self similarity, and ending with the
need to also model user behavior. But all this is useless without real measured
data from which distributions and parameters can be learned. One of the most
important tasks is to collect large amounts of high resolution data about the
behavior of workloads, and to share this data to facilitate the creation of better
workload models.
Apart from collecting data, there are also many methodological issues that
beg for additional work. These include techniques to analyze and characterize
workloads, evaluations of the relative importance of different workload parame-
ters, and demonstrations of how workloads affect system performance. In all of
these, emphasis should be placed on the dynamics of workloads. And as with the
workload data, it is important to share the programs that perform the analysis
and implement the models — both to facilitate the dissemination and use of new
techniques, and to help ensure that researchers use compatible methodologies.
Acknowledgement. This research was supported by the Israel Science
Foundation (grant no. 219/99).
References
1. P. Abry and D. Veitch, “Wavelet analysis of long-range-dependent traffic”. IEEE
Trans. Information Theory 44(1), pp. 2–15, Jan 1998.
2. L. A. Adamic, “Zipf, power-laws, and Pareto – a ranking tutorial”. 2000.
http://www.hpl.hp.com/shl/papers/ranking/.
3. R. J. Adler, R. E. Feldman, and M. S. Taqqu (eds.), A Practical Guide to Heavy
Tails: Statistical Techniques and Applications. Birkhäuser, 1998.
4. A. K. Agrawala, J. M. Mohr, and R. M. Bryant, “An approach to the workload
characterization problem”. Computer 9(6), pp. 18–32, Jun 1976.
5. M. F. Arlitt and C. L. Williamson, “Web server workload characterization: the
search for invariants”. In SIGMETRICS Conf. Measurement & Modeling of Com-
put. Syst., pp. 126–137, May 1996.
6. P. Barford and M. Crovella, “Generating representative web workloads for net-
work and server performance evaluation”. In SIGMETRICS Conf. Measurement
& Modeling of Comput. Syst., pp. 151–160, Jun 1998.
7. L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, “Web caching and Zipf-
like distributions: evidence and implications”. In IEEE Infocom, pp. 126–134, Mar
1999.
8. W. Buchholz, “A synthetic job for measuring system performance”. IBM Syst. J.
8(4), pp. 309–318, 1969.
9. W. Bux and U. Herzog, “The phase concept: approximation of measured data and
performance analysis”. In Computer Performance, K. M. Chandy and M. Reiser
(eds.), pp. 23–38, North Holland, 1977.
10. M. Calzarossa, G. Haring, G. Kotsis, A. Merlo, and D. Tessera, “A hierarchical
approach to workload characterization for parallel systems”. In High-Performance
Computing and Networking, pp. 102–109, Springer-Verlag, May 1995. Lect. Notes
Comput. Sci. vol. 919.
11. M. Calzarossa, L. Massari, and D. Tessera, “Workload characterization issues and
methodologies”. In Performance Evaluation: Origins and Directions, G. Haring,
C. Lindemann, and M. Reiser (eds.), pp. 459–482, Springer-Verlag, 2000. Lect.
Notes Comput. Sci. vol. 1769.
12. M. Calzarossa and G. Serazzi, “A characterization of the variation in time of work-
load arrival patterns”. IEEE Trans. Comput. C-34(2), pp. 156–162, Feb 1985.
13. M. Calzarossa and G. Serazzi, “Workload characterization: a survey”. Proc. IEEE
81(8), pp. 1136–1150, Aug 1993.
14. S-H. Chiang and M. K. Vernon, “Characteristics of a large shared memory produc-
tion workload”. In Job Scheduling Strategies for Parallel Processing, D. G. Feitelson
and L. Rudolph (eds.), pp. 159–187, Springer Verlag, 2001. Lect. Notes Comput.
Sci. vol. 2221.
15. W. Cirne and F. Berman, “A comprehensive model of the supercomputer work-
load”. In 4th Workshop on Workload Characterization, Dec 2001.
16. E. G. Coffman, Jr., M. R. Garey, and D. S. Johnson, “Approximation algorithms for
bin-packing — an updated survey”. In Algorithm Design for Computer Systems
Design, G. Ausiello, M. Lucertini, and P. Serafini (eds.), pp. 49–106, Springer-
Verlag, 1984.
17. M. E. Crovella, “Performance evaluation with heavy tailed distributions”. In Job
Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.),
pp. 1–10, Springer Verlag, 2001. Lect. Notes Comput. Sci. vol. 2221.
18. M. E. Crovella and A. Bestavros, “Self-similarity in world wide web traffic: evi-
dence and possible causes”. In SIGMETRICS Conf. Measurement & Modeling of
Comput. Syst., pp. 160–169, May 1996.
19. M. E. Crovella and M. S. Taqqu, “Estimating the heavy tail index from scaling
properties”. Methodology & Comput. in Applied Probability 1(1), pp. 55–79, Jul
1999.
20. R. Cypher, A. Ho, S. Konstantinidou, and P. Messina, “A quantitative study of par-
allel scientific applications with explicit communication”. J. Supercomput. 10(1),
pp. 5–24, 1996.
21. A. B. Downey, “A parallel workload model and its implications for processor allo-
cation”. In 6th Intl. Symp. High Performance Distributed Comput., Aug 1997.
22. A. B. Downey, “The structural cause of file size distributions”. In 9th Modeling,
Anal. & Simulation of Comput. & Telecomm. Syst., Aug 2001.
23. A. B. Downey and D. G. Feitelson, “The elusive goal of workload characterization”.
Performance Evaluation Rev. 26(4), pp. 14–29, Mar 1999.
24. A. Erramilli, U. Narayan, and W. Willinger, “Experimental queueing analysis
with long-range dependent packet traffic”. IEEE/ACM Trans. Networking 4(2),
pp. 209–223, Apr 1996.
25. D. G. Feitelson, Analyzing the Root Causes of Performance Evaluation Results.
Technical Report 2002–4, School of Computer Science and Engineering, Hebrew
University, Mar 2002.
26. D. G. Feitelson, “The forgotten factor: facts”. In EuroPar, Springer-Verlag, Aug
2002. Lect. Notes Comput. Sci.
27. D. G. Feitelson, “Memory usage in the LANL CM-5 workload”. In Job Scheduling
Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 78–94,
Springer Verlag, 1997. Lect. Notes Comput. Sci. vol. 1291.
28. D. G. Feitelson, “Packing schemes for gang scheduling”. In Job Scheduling Strate-
gies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 89–110,
Springer-Verlag, 1996. Lect. Notes Comput. Sci. vol. 1162.
29. D. G. Feitelson and B. Nitzberg, “Job characteristics of a production parallel sci-
entific workload on the NASA Ames iPSC/860”. In Job Scheduling Strategies for
Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 337–360, Springer-
Verlag, 1995. Lect. Notes Comput. Sci. vol. 949.
30. D. G. Feitelson and L. Rudolph, “Metrics and benchmarking for parallel job
scheduling”. In Job Scheduling Strategies for Parallel Processing, D. G. Feitel-
son and L. Rudolph (eds.), pp. 1–24, Springer-Verlag, 1998. Lect. Notes Comput.
Sci. vol. 1459.
31. D. Ferrari, “On the foundation of artificial workload design”. In SIGMETRICS
Conf. Measurement & Modeling of Comput. Syst., pp. 8–14, Aug 1984.
32. D. Ferrari, “Workload characterization and selection in computer performance
measurement”. Computer 5(4), pp. 18–24, Jul/Aug 1972.
33. K. Ferschweiler, M. Calzarossa, C. Pancake, D. Tessera, and D. Keon, “A commu-
nity databank for performance tracefiles”. In Euro PVM/MPI, Y. Cotronis and
J. Dongarra (eds.), pp. 233–240, Springer-Verlag, 2001. Lect. Notes Comput. Sci.
vol. 2131.
34. R. Gibbons, “A historical application profiler for use by parallel schedulers”. In
Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph
(eds.), pp. 58–77, Springer Verlag, 1997. Lect. Notes Comput. Sci. vol. 1291.
35. S. D. Gribble and E. A. Brewer, “System design issues for internet middleware
services: deductions from a large client trace”. In Symp. Internet Technologies and
Systems, USENIX, Dec 1997.
36. S. D. Gribble, G. S. Manku, D. Roselli, E. A. Brewer, T. J. Gibson, and E. L. Miller,
“Self-similarity in file systems”. In SIGMETRICS Conf. Measurement & Modeling
of Comput. Syst., pp. 141–150, Jun 1998.
37. M. Harchol-Balter and A. B. Downey, “Exploiting process lifetime distributions for
dynamic load balancing”. ACM Trans. Comput. Syst. 15(3), pp. 253–285, Aug
1997.
38. J. K. Hollingsworth, B. P. Miller, and J. Cargille, “Dynamic program instrumenta-
tion for scalable performance tools”. In Scalable High-Performance Comput. Conf.,
pp. 841–850, May 1994.
39. S. Hotovy, “Workload evolution on the Cornell Theory Center IBM SP2”. In
Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph
(eds.), pp. 27–40, Springer-Verlag, 1996. Lect. Notes Comput. Sci. vol. 1162.
40. G. Irlam, “Unix file size survey - 1993”. http://www.base.com/gordoni/ufs93.html.
41. J. Jann, P. Pattnaik, H. Franke, F. Wang, J. Skovira, and J. Riodan, “Model-
ing of workload in MPPs”. In Job Scheduling Strategies for Parallel Processing,
D. G. Feitelson and L. Rudolph (eds.), pp. 95–116, Springer Verlag, 1997. Lect.
Notes Comput. Sci. vol. 1291.
42. R. E. Kessler, M. D. Hill, and D. A. Wood, “A comparison of trace-sampling
techniques for multi-megabyte caches”. IEEE Trans. Comput. 43(6), pp. 664–
675, Jun 1994.
43. D. N. Kimelman and T. A. Ngo, “The RP3 program visualization environment”.
IBM J. Res. Dev. 35(5/6), pp. 635–651, Sep/Nov 1991.
44. D. L. Kiskis and K. G. Shin, “SWSL: a synthetic workload specification language
for real-time systems”. IEEE Trans. Softw. Eng. 20(10), pp. 798–811, Oct 1994.
45. E. J. Koldinger, S. J. Eggers, and H. M. Levy, “On the validity of trace-driven
simulation for multiprocessors”. In 18th Ann. Intl. Symp. Computer Architecture
Conf. Proc., pp. 244–253, May 1991.
46. G. Kotsis, “A systematic approach for workload modeling for parallel processing
systems”. Parallel Comput. 22, pp. 1771–1787, 1997.
47. P. Krueger, T-H. Lai, and V. A. Dixit-Radiya, “Job scheduling is more impor-
tant than processor allocation for hypercube computers”. IEEE Trans. Parallel &
Distributed Syst. 5(5), pp. 488–497, May 1994.
48. M. Krunz and S. K. Tripathi, “On the characterization of VBR MPEG streams”.
In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 192–202,
Jun 1997.
49. A. M. Law and W. D. Kelton, Simulation Modeling and Analysis. McGraw Hill,
3rd ed., 2000.
50. E. D. Lazowska, “The use of percentiles in modeling CPU service time distribu-
tions”. In Computer Performance, K. M. Chandy and M. Reiser (eds.), pp. 53–66,
North-Holland, 1977.
51. W. E. Leland and T. J. Ott, “Load-balancing heuristics and process behavior”. In
SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 54–69, 1986.
52. W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, “On the self-similar
nature of Ethernet traffic”. IEEE/ACM Trans. Networking 2(1), pp. 1–15, Feb
1994.
53. V. Lo, J. Mache, and K. Windisch, “A comparative study of real workload traces
and synthetic workload models for parallel job scheduling”. In Job Scheduling
Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 25–
46, Springer Verlag, 1998. Lect. Notes Comput. Sci. vol. 1459.
54. U. Lublin and D. G. Feitelson, The Workload on Parallel Supercomputers: Modeling
the Characteristics of Rigid Jobs. Technical Report 2001-12, Hebrew University,
Oct 2001.
55. A. D. Malony, D. A. Reed, and H. A. G. Wijshoff, “Performance measurement
intrusion and perturbation analysis”. IEEE Trans. Parallel & Distributed Syst.
3(4), pp. 433–450, Jul 1992.
56. B. B. Mandelbrot, The Fractal Geometry of Nature. W. H. Freeman and Co., 1982.
57. A. W. Mu’alem and D. G. Feitelson, “Utilization, predictability, workloads, and
user runtime estimates in scheduling the IBM SP2 with backfilling”. IEEE Trans.
Parallel & Distributed Syst. 12(6), pp. 529–543, Jun 2001.
58. T. D. Nguyen, R. Vaswani, and J. Zahorjan, “Parallel application characteriza-
tion for multiprocessor scheduling policy design”. In Job Scheduling Strategies for
Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 175–199, Springer-
Verlag, 1996. Lect. Notes Comput. Sci. vol. 1162.
59. N. Nieuwejaar, D. Kotz, A. Purakayastha, C. S. Ellis, and M. L. Best, “File-access
characteristics of parallel scientific workloads”. IEEE Trans. Parallel & Distributed
Syst. 7(10), pp. 1075–1089, Oct 1996.
60. Parallel workloads archive. http://www.cs.huji.ac.il/labs/parallel/workload/.
61. K. Park and W. Willinger, “Self-similar network traffic: an overview”. In Self-
Similar Network Traffic and Performance Evaluation, K. Park and W. Willinger
(eds.), pp. 1–38, John Wiley & Sons, 2000.
62. V. Paxon and S. Floyd, “Wide-area traffic: the failure of Poisson modeling”.
IEEE/ACM Trans. Networking 3(3), pp. 226–244, Jun 1995.
63. E. E. Peters, Fractal Market Analysis. John Wiley & Sons, 1994.
64. S. V. Raghavan, D. Vasukiammaiyar, and G. Haring, “Generative workload models
for a single server environment”. In SIGMETRICS Conf. Measurement & Modeling
of Comput. Syst., pp. 118–127, May 1994.
65. E. Rosti, G. Serazzi, E. Smirni, and M. S. Squillante, “Models of parallel appli-
cations with large computation and I/O requirements”. IEEE Trans. Softw. Eng.
28(3), pp. 286–307, Mar 2002.
66. R. V. Rubin, L. Rudolph, and D. Zernik, “Debugging parallel programs in par-
allel”. In Workshop on Parallel and Distributed Debugging, pp. 216–225, SIG-
PLAN/SIGOPS, May 1988.
67. M. Schroeder, Fractals, chaos, Power Laws. W. H. Freeman and Co., 1991.
68. K. C. Sevcik, “Application scheduling and processor allocation in multipro-
grammed parallel processing systems”. Performance Evaluation 19(2-3), pp. 107–
140, Mar 1994.
69. A. Shaikh, J. Rexford, and K. G. Shin, “Load-sensitive routing of long-lived IP
flows”. In SIGCOMM, pp. 215–226, Aug 1999.
70. A. Singh and Z. Segall, “Synthetic workload generation for experimentation with
multiprocessors”. In 3rd Intl. Conf. Distributed Comput. Syst., pp. 778–785, Oct
1982.
71. D. Thiébaut, “On the fractal dimension of computer programs and its application
to the prediction of the cache miss ratio”. IEEE Trans. Comput. 38(7), pp. 1012–
1026, Jul 1989.
72. D. Thiébaut, J. L. Wolf, and H. S. Stone, “Synthetic traces for trace-driven simu-
lation of cache memories”. IEEE Trans. Comput. 41(4), pp. 388–410, Apr 1992.
(Corrected in IEEE Trans. Comput. 42(5) p. 635, May 1993).
73. J. J. P. Tsai, K-Y. Fang, and H-Y. Chen, “A noninvasive architecture to monitor
real-time distributed systems”. Computer 23(3), pp. 11–23, Mar 1990.
74. J. S. Vetter and F. Mueller, “Communication characteristics of large-scale scien-

tific applications for contemporary cluster architectures”. In 16th Intl. Parallel &
Distributed Processing Symp., May 2002.
75. W. Willinger, M. S. Taqqu, R. Sherman, and D. V. Wilson, “Self-similarity through
high-variability: statistical analysis of Ethernet LAN traffic at the source level”.
In ACM SIGCOMM, pp. 100–113, 1995.
76. K. Windisch, V. Lo, R. Moore, D. Feitelson, and B. Nitzberg, “A comparison of
workload traces from two production parallel machines”. In 6th Symp. Frontiers
Massively Parallel Comput., pp. 319–326, Oct 1996.
77. G. K. Zipf, Human Behavior and the Principle of Least Effort. Addison-Wesley,
1949.
Capacity Planning for Web Services
Techniques and Methodology
Virgilio A.F. Almeida
Department of Computer Science,

Federal University of Minas Gerais,
31270-010 Belo Horizonte, Brazil
Abstract. Capacity planning is a powerful tool for managing quality of
service on the Web. This tutorial presents a capacity planning methodology
for Web-based environments, where the main steps are: understanding the
environment, characterizing the workload, modeling the workload, validating
and calibrating the models, forecasting the workload, predicting the
performance, analyzing the cost-performance plans, and suggesting actions.
The main steps are based on two models: a workload model and a performance
model. The first model results from understanding and characterizing the
workload and the second from a quantitative description of the system
behavior. Instead of relying on intuition, ad hoc procedures, and rules of
thumb to understand and analyze the behavior of Web services, this tutorial
emphasizes the role of models as a uniform and formal way of dealing with
capacity planning problems.
1 Introduction
Performance, around-the-clock availability, and security are the most common
indicators of quality of service on the Internet. Management faces a twofold
challenge. On the one hand, it has to meet customer expectations in terms of
quality of service. On the other hand, companies have to keep IT costs under
control to stay competitive. Many possible alternative architectures can be used
to implement a Web service; one has to be able to determine the most cost-
effective architecture and system. This is where the quantitative approach and
capacity planning techniques come into play. This tutorial introduces capacity
planning [19,1] as an essential tool for managing quality of service on the Web
and presents a methodology, where the main steps are: understanding the envi-
ronment, characterizing the workload, modeling the workload, validating and cal-
ibrating the models, predicting the performance, analyzing the cost-performance
plans, and suggesting actions. It provides a framework for planning the capacity
of Web services and understanding their behavior. The tutorial also discusses a
state transition graph called Customer Behavior Model Graph (CBMG), that is
used to describe the behavior of groups of users who exhibit similar navigational
patterns. The rest of the paper is organized as follows. Section two presents the
main steps of the capacity planning methodology. Section three discusses the
role of models in capacity planning. Section four describes workload models.
Section five discusses issues related to performance models. Finally, section six
presents concluding remarks.
2 Capacity Planning as a Management Tool
Planning the capacity of Web services requires that a series of steps be followed
in a systematic way. Figure 1 gives an overview of the main steps of the quanti-
tative approach to analyze Web services. The starting point of the process is the
business model and its measurable objectives, which are used to establish service
level goals and to find out the applications that are central to the goals. Once
the business model and its quantitative objectives have been understood, one
is able to go through the quantitative analysis cycle. We now cover the various
steps of the capacity planning process.
2.1 Understand the Environment
The first step entails obtaining an in-depth understanding of the service architec-
ture. This means answering questions such as: What are the system requirements
of the business model? What is the configuration of the site in terms of servers
and internal connectivity? How many internal layers are there in the site? What
types of servers (e.g., HTTP, database, authentication, streaming media) is the
site running? What type of software (e.g., operating system, HTTP server
software, transaction monitor, DBMS) is used in each server machine? How reliable
and scalable is the architecture? This step should yield a systematic descrip-
tion of the Web environment, its components, and services. This initial phase of
the process consists of learning what kind of hardware and software resources,
network connectivity, and network protocols, are present in the environment. It
also involves the identification of peak usage periods, management structures,
and service-level agreements. This information is gathered by various means in-
cluding user group meetings, audits, questionnaires, help desk records, planning
documents, interviews, and other information-gathering techniques [14].
Table 1 summarizes the main elements of a system that must be catalogued
and understood before the remaining steps of the methodology can be taken.
2.2 Characterize Workload
Workload characterization is the process of precisely describing the system’s
global workload in terms of its main components. Each workload component
is further decomposed into basic components. The basic components are then
characterized by workload intensity (e.g., transaction arrival rates) and service
demand parameters at each resource.

Table 1. Elements in Understanding the Environment

Element                    Description
Web Server                 Quantity, type, configuration, and function.
Application Server         Quantity, type, configuration, and function.
Database Server            Quantity, type, configuration, and function.
Middleware                 Type (e.g., TP monitors and DBMS).
Application                Main applications.
Network connectivity       Network connectivity diagram showing LANs, WANs,
                           routers, servers, etc.
Network protocols          List of protocols used.
Service-level agreements   Existing SLAs per application or service.
User Community             Number of potential users, geographic location, etc.
Procurement procedures     Elements of the procurement process, expenditure limits.

[Figure 1: flowchart of the capacity planning cycle. Boxes: Business Model &
Measurable Goals; Understand Service Architecture; Characterize Service Workload;
Obtain Model Parameters; Forecast Workload Evolution; Develop Performance Models;
Model Validation and Calibration; Predict Service Performance; Cost Performance
Analysis and Actions. Labels: Workload Model; Performance Model.]
Fig. 1. Capacity Planning Process

Capacity planning procedures have been used to assure that users receive
adequate quality of service as they navigate through the site. A key step of any
performance evaluation and capacity planning study is workload characterization
[4,18,3]. Thus, the second step of the methodology aims at characterizing
the workload of a Web service. In Web-based environments [3], users interact
with the site through a series of consecutive and related requests, called a
session. A session is a sequence of requests to execute e-business functions made
by a single customer during a single visit to a Web service. Different navigational
patterns can be observed for different groups of users. Examples of e-business
functions requested by an online shopper include browsing the catalog, searching
for products or services based on keywords, selecting products to obtain more
detailed information, adding items to the shopping cart, registering, and checking
out. A customer of an online brokerage site would request different functions,
such as entering a stock order, researching a mutual fund history, obtaining
real-time quotes, retrieving company profiles, and computing earnings estimates.
Web workload is unique in its characteristics, and some studies [2,5] have
identified workload properties and invariants, such as the heavy-tailed
distributions (e.g., the Pareto distribution) of file sizes on the Web. It has
also been observed that Web traffic is bursty across several time scales [20,8].
2.3 Obtain Parameters for the Workload Model

The third step consists of obtaining values for the parameters of the workload
models. This step also involves monitoring and measuring the performance of a
Web service. It is a key step in the process of guaranteeing quality of service and
preventing problems. Performance measurements should be collected from dif-
ferent reference points, carefully chosen to observe and monitor the environment
under study. For example, logs of transactions and accesses to servers are the
main source of information. Further information, such as page download times
from different points in the network may help to track the service level perceived
by customers. The information collected should help us answer questions such
as: What is the number of user visits per day? What is the average and peak
traffic to the site? What characterizes the shoppers of a particular set of prod-
ucts? What are the demands generated by the main requests on the resources
(e.g., processors, disks, and networks) of the IT infrastructure? Steps 2 and 3
generate the workload model, which is a synthetic and compact representation
of the workload seen by a Web service.
The parameters for a basic component are seldom directly obtained from
measurements. In most cases, they must be derived from other parameters that
are measured directly. Table 2 shows an example of two basic components, along
with examples of parameters that can be measured for each. The last column
indicates the type of basic component parameter—workload intensity (WI) or
service demand (SD). Values must be obtained or estimated for these parameters,
preferably through measurements with performance monitors and accounting
systems. Measurements must be made during peak workload periods and for an
appropriate monitoring interval.
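
As an illustration of deriving a parameter from directly measured quantities, the sketch below applies the Service Demand Law from operational analysis, D = U / X, where U is the measured utilization of a resource and X is the measured system throughput. This section does not name the law, so its use here is an assumption about one common derivation; the function name and all measurement values are hypothetical.

```python
def service_demand(utilization, throughput):
    """Service Demand Law: D = U / X, in seconds of service per request."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return utilization / throughput


# Hypothetical measurements taken over a one-hour monitoring interval.
interval_s = 3600.0
completed_requests = 7200      # e.g., counted from the server access log
cpu_utilization = 0.30         # fraction of the interval the CPU was busy
disk_utilization = 0.45        # fraction of the interval the disk was busy

# Throughput is measured directly; the service demands are derived from it.
throughput = completed_requests / interval_s            # requests per second
cpu_demand = service_demand(cpu_utilization, throughput)
disk_demand = service_demand(disk_utilization, throughput)

print(f"throughput  = {throughput:.2f} requests/s")
print(f"CPU demand  = {cpu_demand:.3f} s/request")
print(f"disk demand = {disk_demand:.3f} s/request")
```

The utilizations and the request count are the kind of data a performance monitor or log provides; the service demands are the derived values that a performance model takes as input.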

Table 2. Example of Basic Component Parameters and Types

Basic Component and Parameters                              Parameter Type
Order transaction
  Number of transactions submitted per customer             WI
  Number of registered customers                            WI
  Total number of IOs to the Sales DB                       SD
  CPU utilization at the DB server                          SD
  Average message size sent/received by the DB server       SD
Web-based training
  Average number of training sessions/day                   WI
  Average size of image files retrieved                     SD
  Average size of HTTP documents retrieved                  SD
  Average number of image files retrieved/session           SD
  Average number of documents retrieved/session             SD
  Average CPU utilization of the httpd server               SD

2.4 Forecast Workload Evolution

The fourth step forecasts the expected workload intensity for a Web service.
Forecasting is the art and science of predicting future events. It has been
extensively used in many areas, such as the financial market, climate studies,
and production and operations management [12]. For example, one could forecast
the number and type of employees, the volume and type of production, product
demand, and the volume and destination of products. On the Internet, demand
forecasting is essential for guaranteeing quality of service and is critical for
the operation of Web services. Consider the following scenario [16]. Unprecedented demand
for the newest product slows Web servers to a crawl. The company servers were
overwhelmed on Tuesday as a wave of customers attempted to download the
company’s new software product. Web services, in terms of responsiveness and
speed, started degrading as more and more customers tried to access the service.
And it is clear that many frustrated customers simply stopped trying. This un-
desirable scenario emphasizes the importance of good forecasting and planning
for Web environments.
A good forecast is more than just a single number; it is a set of scenarios and
assumptions. Time plays a key role in the forecasting process. The longer the
time horizon, the less accurate the forecast will be. Forecasting horizons can be
grouped into three classes: short term (e.g., less than three months), intermediate
term (e.g., from three months to one year), and long term (e.g., more than two
years). Demand forecasting on the Web can be illustrated by typical questions
that come up very often during the course of capacity planning projects. Can
we forecast the number of visitors to the company’s Web site in order to plan
the adequate capacity to support the load? What is the expected workload for
the credit card authorization service during the Christmas season? How will the
number of messages processed by the e-mail servers vary over the next year?
What will be the number of simultaneous users for the streaming media ser-
vices six months from now? Implementation of Web services should rely on a
careful planning process, a planning process that pays attention to performance
and capacity right from the beginning. The goal of this step is to use existing
forecasting methods and techniques to predict future workload for Web services.
The literature [12,7] describes several forecasting techniques. In selecting one,
several factors need to be considered. The first is the availability and
reliability of historical data. The required degree of accuracy and the planning
horizon also determine the choice of forecasting technique. The pattern found in
historical data has a strong influence on the choice of technique. The nature of
historical data may be determined through visual inspection of a plot of the data
as a function of time. Three patterns of historical data are commonly identified:
random, trend, and seasonal. While the trend pattern reflects a workload that
tends to increase (or decrease, in some cases), seasonal patterns show the
presence of fluctuations. The underlying hypothesis of forecasting techniques is
that the information to be forecast is somehow directly related to historical
data; this emphasizes the importance of knowing the pattern of historical data.
Many commercial packages (e.g., Matlab, S-PLUS, MS-EXCEL [11]) implement
various forecasting techniques.
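
For historical data that follows a trend pattern, one simple technique is a least-squares straight-line fit extrapolated into the future. The sketch below is a minimal illustration of that idea only; it is not a method prescribed by this tutorial, the function names are hypothetical, and the monthly visit counts are made-up data chosen to be exactly linear so that the result is easy to check by hand.

```python
def fit_linear_trend(values):
    """Least-squares fit of y = intercept + slope * t for t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, periods_ahead):
    """Extrapolate the fitted line periods_ahead steps past the last observation."""
    intercept, slope = fit_linear_trend(values)
    t = len(values) - 1 + periods_ahead
    return intercept + slope * t


# Hypothetical history: visits per month, in thousands, for six months.
monthly_visits = [100, 110, 120, 130, 140, 150]

intercept, slope = fit_linear_trend(monthly_visits)
three_months_out = forecast(monthly_visits, 3)

print(f"growth per month    = {slope:.1f} thousand visits")
print(f"forecast (3 months) = {three_months_out:.1f} thousand visits")
```

A straight-line fit is only appropriate when a plot of the data shows a trend pattern; seasonal or random data call for other techniques.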
2.5 Develop Performance Model

In the fifth step, quantitative techniques and analytical models based on queuing
network theory are used to develop performance models of Web services. Perfor-
mance models are used to predict performance when any aspect of the workload
or the site architecture is changed. Two types of models may be used: simu-
lation models and analytical models. Analytical models [6] specify the interac-
tions between the various components of a Web system via formulas. Simulation
models [10] mimic the behavior of the actual system by running a simulation
program. After model construction and parameterization, the model is solved.
That is, the model parameters are manipulated in some fashion to yield per-
formance measures (e.g., throughput, utilization, response time). Many solution
techniques have been suggested [13].
Performance models have to consider contention for resources and the queues
that arise at each system resource—processors, disks, routers, and network links.
Queues also arise for software resources—threads, database locks, and protocol
ports. The various queues that represent a distributed system are interconnected,
giving rise to a network of queues, called a queuing network (QN). The level of
detail at which resources are depicted in the QN depends on the purpose of the
model and on the availability of detailed information about the operation and
parameters of specific resources.
The input parameters for queuing network models describe the resources of
the system, the software, and the workload of the system under study. These
parameters include four groups of information:
– servers or components
– workload classes
– workload intensity
– service demands
In order to increase the model’s representativeness, workloads are partitioned
into classes of similar components. Programs that are alike in their resource
usage may be grouped into workload classes. Depending on the way a given class
is processed by a system, it may be classified as one of two types: open or
closed.
Servers, or service centers, are components of performance models intended
to represent the resources of a system. The first step in specifying a model is
the definition of the servers that make up the model. The scope of the capacity
planning project helps to select which servers are relevant to the performance
model. Consider the case of a Web site composed of Web servers, application
servers, and database servers connected via a LAN. The capacity planner wants to
examine the impact on the system of the estimated growth of sales transactions.
The specific focus of the project may be used to define the components of a
performance model. For example, the system under study could be well represented
by an open queuing network model consisting of queues that correspond to
the servers of the site. A different performance model, with other queues, would
be required if the planner were interested in studying the effect of a proxy cache
on the performance of the system.
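
To make the open queuing network example concrete, the sketch below solves a single-class open model with one load-independent queue per server. It uses the standard textbook results U_i = λ D_i for utilization and R_i = D_i / (1 − U_i) for residence time; these formulas are not stated in this section and are assumed here for illustration. The server names, service demands, and arrival rate are hypothetical.

```python
def solve_open_qn(arrival_rate, demands):
    """Solve a single-class open queuing network of load-independent queues.

    arrival_rate: requests per second entering the site (lambda).
    demands: mapping from server name to service demand (seconds per request).
    Returns (per-server results, total response time in seconds).
    """
    results = {}
    for name, demand in demands.items():
        utilization = arrival_rate * demand           # U_i = lambda * D_i
        if utilization >= 1.0:
            raise ValueError(f"{name} is saturated (U = {utilization:.2f})")
        results[name] = {
            "utilization": utilization,
            "residence_time": demand / (1.0 - utilization),  # R_i = D_i / (1 - U_i)
        }
    response_time = sum(r["residence_time"] for r in results.values())
    return results, response_time


# Hypothetical service demands (seconds per sales transaction) at each tier.
demands = {"web": 0.010, "app": 0.020, "db": 0.025}
arrival_rate = 20.0   # hypothetical: 20 transactions per second

per_server, response_time = solve_open_qn(arrival_rate, demands)
for name, r in per_server.items():
    print(f"{name}: U = {r['utilization']:.2f}, R = {r['residence_time'] * 1000:.1f} ms")
print(f"response time = {response_time * 1000:.1f} ms")
```

Re-running the function with a larger arrival rate is the kind of what-if question the planner in the example would ask about sales growth; the server with the largest service demand saturates first.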
2.6 Validate Performance Model

Once the model has been constructed, parameterized, and solved, it should be
validated. That is, the performance measures found by solving the model should
be compared against actual measurement data of the system being modeled. A
performance model is said to be valid if the performance metrics (e.g., response
time, resource utilizations, and throughputs) calculated by the model match the
measurements of the actual system within a certain acceptable margin of er-
ror. For instance, the actual server utilizations should be compared against the
server utilizations found by solving the model. This comparison check will be
judged to be either acceptable or unacceptable. The choice of what determines
acceptable versus unacceptable is left to the modeler. As a rule of thumb, de-
vice utilizations within 10%, system throughput within 10%, and response time
within 20% are considered acceptable [13]. If the comparison is unacceptable, a
series of questions must be addressed to determine the source of the errors.
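
The rule of thumb above can be written as a small check. The sketch below assumes that "within 10%" and "within 20%" refer to relative error with respect to the measured value, which is one possible reading; the function names and all measured and predicted values are hypothetical.

```python
# Acceptable relative errors from the rule of thumb quoted above.
TOLERANCE = {"utilization": 0.10, "throughput": 0.10, "response_time": 0.20}


def relative_error(measured, predicted):
    """Relative error of the model's prediction with respect to the measurement."""
    return abs(predicted - measured) / abs(measured)


def validate(measured, predicted, tolerance=TOLERANCE):
    """Return {metric: (relative error, acceptable?)} for each metric checked."""
    report = {}
    for metric, limit in tolerance.items():
        err = relative_error(measured[metric], predicted[metric])
        report[metric] = (err, err <= limit)
    return report


# Hypothetical measurements of the real system and model predictions.
measured = {"utilization": 0.50, "throughput": 20.0, "response_time": 0.100}
predicted = {"utilization": 0.53, "throughput": 19.0, "response_time": 0.125}

report = validate(measured, predicted)
model_is_valid = all(ok for _, ok in report.values())

for metric, (err, ok) in report.items():
    print(f"{metric}: error = {err:.0%}, {'acceptable' if ok else 'unacceptable'}")
print("model valid" if model_is_valid else "model needs calibration")
```

In this made-up case the utilization and throughput predictions are acceptable, but the response time prediction is off by about 25%, so the model would have to be calibrated before being used for prediction.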
Errors are possible within each capacity planning step. During workload char-
acterization, measurements are taken for service demands, workload intensity,
and for performance metrics such as response time, throughput, and device uti-
lization. The same measures are computed by means of the performance model.
If the computed values do not match the measured values within an accept-
able level, the model must be calibrated. Even though one can hypothesize the
source of possible errors, it is often difficult to pinpoint them and correct them.
Therefore, it is normal to iterate among the steps of the methodology until an
acceptable model is found. This changing of the model to force it to match
the actual system is referred to as the calibration procedure. A calibration is a
change to some target parameter of the analytic model. A detailed discussion
of calibration techniques is given in [13]. When the model is considered valid it
can be used for performance prediction. The sixth step aims at validating the
models used to represent performance and workload.
2.7 Predict Service Performance

Prediction is key to capacity planning because one needs to be able to determine
how a Web service will react when changes in load levels and customer behavior
occur or when new business models are developed. This determination requires
predictive models and not experimentation. So, in the seventh step, one uses
performance models to predict the performance of Web services under many
different scenarios [9].
Performance models aim at representing the behavior of real systems in terms
of their performance. In order to use performance models for predicting future
scenarios, one needs to obtain the input parameters to feed the model. The in-
put parameters for performance models describe the hardware configuration, the
software environment, and the workload of the system under study. The
representativeness of a model depends directly on the quality of input parameters.
Therefore, a key issue in conducting practical performance prediction is the
determination of input parameters for performance models. Two practical questions
naturally arise when one thinks of modeling a real system:
– What are the information sources for determining input parameters?
– What techniques are used to calculate input parameters?
The main source of information is the set of performance measurements collected
from the observation of the real system under study. Further information can
also be obtained from benchmarks and from product specifications provided by
manufacturers. However, typical measurement data do not coincide with the kind
of information required as input by performance models. For modeling purposes,
typical measurement data need to be reworked in order to become useful.
2.8 Analyze Future Scenarios

In the eighth step of the cycle, many possible candidate architectures are ana-
lyzed in order to determine the most cost-effective one. Future scenarios should
take into consideration the expected workload, the site cost, and the quality of
service perceived by customers. Finally, this step should indicate to management
what actions should be taken to guarantee that the Web services will meet the
business goals set for the future.
The performance model and cost models can be used to assess various scenarios
and configurations. Some example scenarios are: “Should we use CDN
services to serve images?” “Should we use Web hosting services?” “Should we
mirror the site to balance the load, cut down on network traffic, and improve
global performance?” For each scenario, we can predict what the performance of
each system component will be and what the costs are for the scenario. The
comparison of the various scenarios yields a configuration plan, an investment
plan, and a personnel plan. The configuration plan specifies which upgrades should
be made to existing hardware and software resources. Once the performance model
is built and solved and a cost model developed, various analyses can be made
regarding cost-performance tradeoffs. The investment plan specifies a timeline
for investing in the necessary upgrades. The personnel plan determines what
changes in the support personnel size and structure must be made in order to
accommodate changes in the system.

[Figure 2: diagram relating the User Model, the Workload Model, and the
Performance Model. Annotations: what-if questions regarding impacts of user
behavior; what-if questions regarding impacts of workload, architecture, and
configuration changes. Metrics: availability, response time, throughput.]
Fig. 2. Customer, Workload, and Resource Models.
3 Models for Capacity Planning

Models play a central role in capacity planning. In the methodology discussed
here, we consider two types of models: performance model and workload model.
In Web environments, user models are important to provide information to
workload models. Figure 2 shows the relationship between user model, work-
load model, and performance model.
Each Web service request (e.g., a credit card authorization or a search) may
exercise the site’s resources in different manners. Some services may use a large
amount of processing time from the application server while others may
concentrate on the database server. Other services, such as requests for streaming
media, may demand high network bandwidth. Different users exhibit
different navigational patterns and, as a consequence, invoke services in
different ways with different frequencies. For instance, in an e-business service,
some customers may be considered heavy buyers while others, considered occasional
buyers, would spend most of their time browsing and searching the site.
Understanding customer behavior is critical for characterizing the workload as
well as for adequately sizing the site’s resources. Models of user behavior can be
quite useful. In addition to characterizing navigational patterns within sessions,
one needs to characterize the rate at which sessions of different types are started.
This gives us an indication of the workload intensity. Workload models provide
input parameters for performance models, which predict the system behavior for
that specific workload.
Customer (i.e., user) models capture elements of user behavior in terms of
navigational patterns, e-business functions used, frequency of access to the vari-
ous e-business functions, and times between access to the various services offered
by the site. A customer model can be useful for navigational and workload pre-
diction.
– Model User Navigational Patterns for Predictive Purposes. By building models,
one can answer what-if questions regarding the effects on user behavior
due to site layout changes or content redesign.
– Capture Workload Parameters. If the only purpose of a customer model is
to generate a workload model to be used as input to a resource model, then
it is not necessary to use a detailed model.
Workload models describe the workload of a Web service in terms of workload
intensity (e.g., transaction arrival rates) and service demands on the various
resources (e.g., processors, I/O subsystems, networks) that make up the site. The
workload model can be derived from the customer model as shown in [16].
Performance models represent the various resources of the site and capture the
effects of the workload model on these resources. A performance model can be
used for predictive purposes to answer what-if questions regarding performance
impacts due to changes in configuration, software and hardware architecture,
and other parameters. A performance model is used to compute the values of
metrics such as response time, throughput, and business-oriented metrics such
as revenue throughput.
4 Workload Models
A workload model is a representation that mimics the real workload under study.
Although each system may require a specific approach to characterize and gen-
erate a workload model, there are some general guidelines that apply well to all
types of systems [4]. The common steps to be followed by any workload charac-
terization include: (1) specification of a point of view from which the workload
will be analyzed, (2) choice of the set of parameters that captures the most rel-
evant characteristics of the workload for the purpose of capacity planning, (3)
monitoring the system to obtain the raw performance data, (4) analysis and
reduction of performance data, (5) construction of a workload model, and (6)
verification that the model does capture all the important performance informa-
tion.
Graphs are also used to represent workloads. For example, a graph-based
model can be used to characterize Web sessions and generate information for
constructing workload models. This section concentrates on models that
represent the behavior of users (i.e., customers). User models capture elements of
user behavior in terms of navigational patterns, Web service functions used,
frequency of access to the various functions, and times between access to the
various services offered by the site. Two different types of models are commonly
used in the capacity planning methodology.
[Figure 3: state transition graph with states Entry (1), Browse (2), Search (3),
Select (4), Add to Cart (5), and Pay (6); each edge is labeled with a transition
probability.]
Fig. 3. The Customer Behavior Model Graph
4.1 Customer Behavior Model Graph

In Web-based environments, users interact with the site through a series of
consecutive and related requests, called sessions. It has been observed that
different customers exhibit different navigational patterns. The Customer Behavior
Model Graph (CBMG), introduced in [15,17], can be used to capture the
navigational pattern of a customer through an e-business site. This pattern includes
two aspects: a transitional and a temporal one. The former determines how a
customer moves from one state (i.e., an e-business function) to the next. This is
represented by the matrix of transition probabilities. The temporal aspect has
to do with the time it takes for a customer to move from one state to the next.
This time is measured from the server’s perspective and is called server-perceived
think time or just think time. This is defined as the average time elapsed since
the server completes a request for a customer until it receives the next request
from the same customer during the same session. A think time can be associated
with each transition in the CBMG.
So, a CBMG can be defined by a pair $(P, Z)$, where $P = [p_{i,j}]$ is an
$n \times n$ matrix of transition probabilities between the $n$ states of the CBMG
and $Z = [z_{i,j}]$ is an $n \times n$ matrix that represents the average think
times between the states of the CBMG. Recall that state 1 is the Entry state and
state $n$ is the Exit state.
Consider the CBMG of Figure 3. This CBMG has seven states; the Exit state,
state seven, is not explicitly represented in the figure. Let $V_j$ be the average
number of times that state $j$ of the CBMG is visited for each visit to the
e-business site, i.e., for each visit to the state Entry. Consider the Add to Cart
state. We can see that the average number of visits ($V_{Add}$) to this state is
equal to the average number of visits to the state Select ($V_{Select}$) multiplied
by the probability (0.2) that a customer will go from Select to Add to Cart. We
can then write the relationship

$$V_{Add} = V_{Select} \times 0.2. \qquad (1)$$

Consider now the Browse state. The average number of visits ($V_{Browse}$) to this
state is equal to the average number of visits to state Search ($V_{Search}$)
multiplied by the probability (0.20) that a customer will go from Search to Browse,
plus the average number of visits to state Select ($V_{Select}$) multiplied by the
probability (0.30) that a customer will go from Select to Browse, plus the average
number of visits to the state Add to Cart ($V_{Add}$) multiplied by the probability
(0.25) that a customer will go from Add to Cart to Browse, plus the average number
of visits to the state Browse ($V_{Browse}$) multiplied by the probability (0.30)
that a customer will remain in the Browse state, plus the number of visits to the
Entry state multiplied by the probability (0.5) of going from the Entry state to
the Browse state. Hence,

$$V_{Browse} = V_{Search} \times 0.20 + V_{Select} \times 0.30 + V_{Add} \times 0.25 + V_{Browse} \times 0.30 + V_{Entry} \times 0.5. \qquad (2)$$

So, in general, the average number of visits to a state j of the CBMG is
equal to the sum, over all states k of the CBMG, of the number of visits to state k
multiplied by the transition probability from state k to state j. Thus, for
any state j ($j = 2, \ldots, n-1$) of the CBMG, one can write the equation
$$V_j = \sum_{k=1}^{n-1} V_k \times p_{k,j}, \tag{3}$$
where $p_{k,j}$ is the probability that a customer makes a transition from state k
to state j. Note that the summation in Eq. (3) does not include state n (the
Exit state) since there are no possible transitions from this state to any other
state. Since $V_1 = 1$ (because state 1 is the Entry state), we can find the average
number of visits $V_j$ by solving the system of linear equations
$$V_1 = 1 \tag{4}$$
$$V_j = \sum_{k=1}^{n-1} V_k \times p_{k,j}, \qquad j = 2, \ldots, n-1. \tag{5}$$
Note that $V_n = 1$ since, by definition, the Exit state is only visited once per
session.
Useful metrics can be obtained from the CBMG. Once we have the average
number of visits ($V_j$) to each state of the CBMG, we can obtain the average
session length as
$$AverageSessionLength = \sum_{j=2}^{n-1} V_j. \tag{6}$$
Table 3. Using the CVM to Characterize a Session

State          Session 1   Session 2   Session 3
Home               1           2           3
Browse             4           8           4
Search             5           5           3
Login              0           1           1
Pay                0           0           1
Register           0           0           1
Add to Cart        0           2           1
Select             3           3           2
For the visit ratios of Fig. 3, the average session length is
$$AverageSessionLength = V_{Browse} + V_{Search} + V_{Select} + V_{Add} + V_{Pay} = 2.498 + 4.413 + 1.324 + 0.265 + 0.053 = 8.552. \tag{7}$$
The buy-to-visit ratio is simply given by $V_{Pay}$.
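The linear system of Eqs. (4)-(5) is small enough to solve directly. The following is a minimal sketch in Python, not the tool used to produce the numbers above. The function name `visit_ratios` is hypothetical, and the transition matrix reuses only the probabilities quoted in the text (0.5, 0.2, 0.30, 0.25, 0.30); every other entry is an assumption chosen so that each row sums to one, so the output will not reproduce the values of Eq. (7).

```python
STATES = ["Entry", "Browse", "Search", "Select", "Add", "Pay", "Exit"]

# Transition probabilities p[k][j]; entries not quoted in the text are assumed.
P = [
    # Entry Browse Search Select Add   Pay   Exit
    [0.0,  0.50,  0.50,  0.00,  0.00, 0.00, 0.00],  # Entry
    [0.0,  0.30,  0.30,  0.30,  0.00, 0.00, 0.10],  # Browse
    [0.0,  0.20,  0.40,  0.30,  0.00, 0.00, 0.10],  # Search
    [0.0,  0.30,  0.30,  0.00,  0.20, 0.00, 0.20],  # Select
    [0.0,  0.25,  0.25,  0.00,  0.00, 0.20, 0.30],  # Add
    [0.0,  0.00,  0.00,  0.00,  0.00, 0.00, 1.00],  # Pay
    [0.0,  0.00,  0.00,  0.00,  0.00, 0.00, 0.00],  # Exit: no outgoing transitions
]


def visit_ratios(p):
    """Solve V_1 = 1 and V_j = sum_k V_k * p[k][j] for states 1..n-1."""
    m = len(p) - 1  # the Exit state (last) is excluded from the system
    # Row j encodes V_j - sum_k p[k][j] * V_k = 0.
    a = [[(1.0 if j == k else 0.0) - p[k][j] for k in range(m)] for j in range(m)]
    b = [0.0] * m
    a[0] = [1.0] + [0.0] * (m - 1)  # Eq. (4): V_1 = 1
    b[0] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    v = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(a[r][c] * v[c] for c in range(r + 1, m))
        v[r] = (b[r] - tail) / a[r][r]
    return v


V = visit_ratios(P)
avg_session_length = sum(V[1:])        # Eq. (6): states 2 .. n-1
buy_to_visit = V[STATES.index("Pay")]  # buy-to-visit ratio
for name, visits in zip(STATES, V):
    print(f"{name:7s} {visits:.3f}")
print("average session length:", round(avg_session_length, 3))
```

A quick sanity check on any such matrix is that the total flow into the Exit state, i.e., the sum of $V_k \times p_{k,n}$ over all k, equals one.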

Each customer session can be represented by a CBMG, which can be derived
from HTTP logs. “Similar” sessions can be clustered to represent each group
by a single CBMG. The goal is to characterize the workload by a relatively
small and representative number of CBMGs as opposed to having to deal with
thousands or even hundreds of thousands of CBMGs. Procedures and algorithms
for clustering CBMGs are described in detail in [15].
4.2 The Customer Visit Model (CVM)
An alternative and less detailed representation of a session would entail
representing a session as a vector of visit ratios to each state of the CBMG. The visit
ratio is the number of visits to a state during a session.
Table 3 shows an example of three sessions described by the number of visits
to each state of the CBMG. Note that states Entry and Exit are not represented
in the CVM since the number of visits to these states is always one. Session 1 in
the table represents a session of a customer who browsed through the site, did a
few searches, but did not login or buy anything. In Session 2, the customer logged
in but did not need to register because it was an already registered customer.
This customer abandoned the site before paying even though two items had been
added to the shopping cart. Finally, Session 3 represents a new customer who
registers with the site, adds one item to the shopping cart, and pays for it.
The CVM is then a set of vectors (columns in Table 3) that indicate the
number of times each of the functions supplied by the e-business site is
executed. For example, Session 1 would be represented by the vector (1, 4, 5, 0, 0,
0, 0, 3) and Session 2 by the vector (2, 8, 5, 1, 0, 0, 2, 3).
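As an illustration, a session vector can be built by counting the functions invoked in a session. The sketch below is only an assumption about how this might be coded: the name `session_vector` is hypothetical, and the request sequence is invented so that its counts match Session 1 of Table 3 (the table gives only the counts, not the order of the requests).

```python
from collections import Counter

# Column order used for the vectors, following the rows of Table 3.
# Entry and Exit are omitted because they are visited exactly once.
FUNCTIONS = ["Home", "Browse", "Search", "Login", "Pay",
             "Register", "Add to Cart", "Select"]


def session_vector(requests):
    """Count how many times each e-business function was invoked in a session."""
    counts = Counter(requests)
    unknown = set(counts) - set(FUNCTIONS)
    if unknown:
        raise ValueError(f"unknown functions: {sorted(unknown)}")
    return tuple(counts[f] for f in FUNCTIONS)


# Hypothetical request sequence whose counts match Session 1 of Table 3.
session_1 = ["Home", "Browse", "Search", "Search", "Select", "Browse",
             "Search", "Select", "Browse", "Search", "Browse", "Search",
             "Select"]
print(session_vector(session_1))  # (1, 4, 5, 0, 0, 0, 0, 3)
```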
The CBMG is a state transition graph, in which the nodes correspond to
states in the session (e.g., browsing, searching, selecting, checking out, and
paying) and the arcs correspond to transitions between states. Probabilities are
associated with transitions as in a Markov chain. A Customer Visit Model (CVM)
represents sessions as a collection of session vectors, one per session. A session
vector $V_j = (v_1, v_2, \ldots, v_m)$ for the j-th session indicates the number of times
that each of the different functions (e.g., search, browse, add to cart) was
invoked during the session. To be able to perform capacity planning studies of
a Web service, one needs to map each CBMG or CVM resulting from the
workload characterization process described above to IT resources. In other words,
one has to associate service demands at the various components (e.g., processors,
disks, and network) with the execution of the functions [16].
5 Performance Models
Performance models represent the way a system’s resources are used by the
workload and capture the main factors determining system performance. These
models use information provided by workload models and system architecture
descriptions. Performance models are used to compute both traditional performance
metrics, such as response time, throughput, utilization, and mean queue length,
and innovative business-oriented performance metrics, such as revenue
throughput or lost-revenue throughput. Basically, performance models can be
grouped into two categories: analytic and simulation models. Performance
models help us understand the quantitative behavior of complex systems, such as
electronic business applications, e-government, and entertainment. Performance
models have been used for multiple purposes:
– In the infrastructure design of Web-based applications, various issues call for
the use of models to evaluate system alternatives. For example, a distributed
Web server system is any architecture consisting of multiple Web server hosts
distributed on a LAN, with some sort of mechanism to distribute incoming
requests among the servers. So, for a specific type of workload, what is the
most effective scheme for load balancing in a certain distributed Web server
system? Models are also useful for analyzing document replacement policies
in caching proxies. Bandwidth capacity of certain network links can also be
estimated by performance models. In summary, performance models are an
essential tool for studying resource allocation problems in the context of Web
services.
– Most Web-based applications operate in multi-tiered environments. Models
can be used to analyze performance of distributed applications running on
three-tiered architectures, composed of Web servers, application servers and
database servers.
– Performance tuning of complex applications is a huge territory. When a
Web-based application presents performance problems, a mandatory step to
solve them is to tune the underlying system. This means measuring the
system and trying to identify the sources of performance problems: application
design, lack of capacity, excess of load, or problems in the infrastructure
(i.e., network, servers, ISP). Performance models can help find performance
problems by answering what-if questions as opposed to making changes in
the production environment.
Parameters for queuing network (QN) models are divided into the following
categories. (1) System parameters specify the characteristics of a system that
affect performance. Examples include load-balancing disciplines for Web server
mirroring, network protocols, maximum number of connections supported by a
Web server, and maximum number of threads supported by the database man-
agement system. (2) Resource parameters describe the intrinsic features of a
resource that affect performance. Examples include disk seek times, latency and
transfer rates, network bandwidth, router latency, and processor speed ratings.
(3) Workload parameters are derived from workload characterization and
are divided into two types: workload intensity and service demand. Workload
intensity parameters provide a measure of the load placed on the system, indicated
by the number of units of work that contend for system resources. Examples in-
clude the number of requests/sec submitted to the database server and number
of sales transactions submitted per second to the credit card service. Workload
service demand parameters specify the total amount of service time required by
each basic component at each resource. Examples include the processor time of
transactions at the database server, the total transmission time of replies from
the database server and the total I/O time at the streaming media server.
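To make the role of the two workload parameter types concrete, the sketch below answers a simple what-if question for increasing workload intensity. It is an illustration under stated assumptions, not a model taken from this tutorial: it assumes a single-class open QN with load-independent queueing resources, for which the utilization of a resource is the arrival rate times its service demand and its residence time is the demand divided by one minus the utilization. The function name and the service demand values are hypothetical.

```python
def open_qn_metrics(arrival_rate, demands):
    """Per-resource (utilization, residence time) in an open single-class QN.

    arrival_rate: workload intensity in requests/sec
    demands: dict mapping resource name -> service demand in sec/request
    """
    metrics = {}
    for name, demand in demands.items():
        utilization = arrival_rate * demand        # utilization law
        if utilization >= 1.0:
            raise ValueError(f"{name} is saturated at {arrival_rate} req/s")
        residence = demand / (1.0 - utilization)   # service plus queueing time
        metrics[name] = (utilization, residence)
    return metrics


# Hypothetical service demands (seconds per request), for illustration only.
DEMANDS = {"cpu": 0.010, "disk": 0.015, "network": 0.005}

for rate in (20.0, 40.0, 60.0):  # what-if: increasing workload intensity
    per_resource = open_qn_metrics(rate, DEMANDS)
    response_time = sum(r for _, r in per_resource.values())
    print(f"{rate:5.1f} req/s -> response time {response_time:.4f} s")
```

Under these assumptions the resource with the largest service demand (here the disk) saturates first and bounds the sustainable arrival rate at the reciprocal of its demand.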
6 Concluding Remarks
Capacity planning techniques are needed to avoid the pitfalls of inadequate ca-
pacity and to meet users’ performance expectations in a cost-effective manner.
This tutorial provides the foundations required to carry out capacity planning
studies. Planning the capacity of Web services requires that a series of steps be
followed in a systematic way. This paper gives an overview of the main steps of
the quantitative approach to analyze Web services. The main steps are based
on two models: a workload model and a performance model. The two models
can be used in capacity planning projects to answer typical what-if questions,
frequently faced by managers of Web services.
References
1. V. Almeida and D. Menascé, “Capacity Planning: an essential tool for managing
Web services”, IEEE IT Pro, Vol. 4, Issue 4, July-August, 2002.
2. M. Arlitt and C. Williamson, “Internet Web Servers: workload characterization and
performance implication”, in IEEE/ACM Trans. on Networking, October 1997.
3. M. Arlitt, D. Krishnamurthy, and J. Rolia, “Workload Characterization and Perfor-
mance Scalability of a Large Web-based Shopping System”, in ACM Transactions
on Internet Technologies, Vol.1, No. 1, Aug. 2001.
4. M. Calzarossa and G. Serazzi, “Workload Characterization: A Survey,” Proceedings
of the IEEE, Vol. 81, No. 8, August 1993.
5. M. Crovella and A. Bestavros, “Self-Similarity in World Wide Web Traffic: evidence
and possible causes”, in IEEE/ACM Transactions on Networking, 5(6):835–846,
December 1997.
6. P. Denning and J. Buzen, “The operational analysis of queuing network models”,
Computing Surveys, Vol. 10, No. 3 , September 1978, pp. 225-261.
7. R. Jain, The Art of Computer Systems Performance Analysis. New York: Wiley,
1991.
8. K. Kant and Y. Won “Server Capacity Planning for Web Traffic Workload”, in
IEEE Trans. on Knowledge and Data Engineering, September 1999.
9. D. Krishnamurthy and J. Rolia, “Predicting the Performance of an E-Commerce
Server: Those Mean Percentiles,” in Proc. First Workshop on Internet Server Per-
formance, ACM SIGMETRICS 98, June 1998.
10. A. Law and W. Kelton, Simulation Modeling and Analysis. 2nd ed. New York:
McGraw-Hill, 1990.
11. D. Levine, P. Ramsey, R. Smidt, Applied Statistics for Engineers and Scientists:
Using Microsoft Excel & MINITAB, Upper Saddle River, Prentice Hall, 2001.
12. J. Martinich, Production and Operations Management : An Applied Modern Ap-
proach, John Wiley & Sons, 1996.
13. D. A. Menascé, V. A. F. Almeida, and L. W. Dowdy, Capacity Planning and
Performance Modeling: From Mainframes to Client-Server Systems. Upper Saddle
River, NJ: Prentice Hall, 1994.
14. D. A. Menascé, D. Dregits, R. Rossin, and D. Gantz, “A federation-oriented
capacity management methodology for LAN environments”, Proc. 1995 Conf. Comput.
Measurement Group, Nashville, TN, Dec. 3–8, 1995.
15. D. A. Menascé, V. Almeida, R. Fonseca, and M. Mendes, “A Methodology for
Workload Characterization for E-Commerce Servers”, Proc. 1999 ACM Conference
in Electronic Commerce, Denver, 1999.
16. D. A. Menascé and V. A. F. Almeida, Scaling for E-Business: technologies, models,
performance and capacity planning, Prentice Hall, Upper Saddle River, 2000.
17. D. A. Menascé, V. A. F. Almeida, R. Fonseca, and M. A. Mendes, “Business-
oriented Resource Management Policies for E-Commerce Servers,” Performance
Evaluation, September 2000.
18. D. A. Menascé, V. Almeida, R. Fonseca, R. Riedi, F. Ribeiro, and W. Meira Jr.,
“In Search of Invariants for E-Business Workloads”, Proc. 2000 ACM Conference
in Electronic Commerce, Minneapolis, 2000.
19. D. A. Menascé and V. A. F. Almeida, Capacity Planning for Web Services: metrics,
models and methods, Prentice Hall, Upper Saddle River, 2002.
20. V. Paxson and S. Floyd, “Wide area traffic: The failure of Poisson modeling,”
IEEE/ACM Transactions on Networking 3, pp. 226–244, 1995.
End-to-End Performance of Web Services
Paolo Cremonesi and Giuseppe Serazzi
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy,

{cremones, serazzi}@elet.polimi.it
Abstract. As the number of applications that are made available over
the Internet rapidly grows, providing services with adequate performance
becomes an increasingly critical issue. The performance requirements of
the new applications range from a few milliseconds to hundreds of seconds.
In spite of continuous technological improvement (e.g., faster servers
and clients, multi-threaded browsers supporting several simultaneous and
persistent TCP connections, access to the network with larger bandwidth
for both servers and clients), network performance as captured by
response time and throughput does not keep up and progressively degrades.
There are several causes of the poor “Quality of Web Services” that users
very often experience. The characteristics of the traffic (self-similarity
and heavy-tailedness) and the widely varying resource requirements (in
terms of bandwidth, size and number of downloaded objects, processor
time, number of I/Os, etc.) of web requests are among the most
important ones. Other factors relate to the architectural complexity of the
network path connecting the client browser to the web server and to the
behavior of the protocols at the different layers.
In this paper we present a study of the performance of web services.
The first part of the paper is devoted to the analysis of the origins of
the fluctuations in web data traffic. This peculiar characteristic is one of
the most important causes of the performance degradation of web ap-
plications. In the second part of the paper experimental measurements
of performance indices, such as end-to-end response time, TCP connec-
tion time, transfer time, of several web applications are presented. The
presence of self-similarity characteristics in the traffic measurements is
shown.
1 Introduction
In the last few years, the number of network-based services available on the
Internet has grown considerably. Web servers are now used as the ubiquitous interface
for information exchange and retrieval both at enterprise level, via intranets, and
at global level, via the World Wide Web. In spite of the continuous increase
of the network capacity, in terms of investments in new technologies and in new
network components, the Internet still fails to satisfy the needs of a substantial
fraction of users. New network-based applications require interactive response
times ranging from a few milliseconds to tens of seconds. The traditional best-effort
service that characterizes the Internet is not adequate to guarantee the strict
response time requirements of many new applications. Hence the need for Quality
of Service (QoS) capabilities.

This work has been supported by MIUR project COFIN 2001: High quality Web
systems.

In order to develop techniques that improve performance, it is
important to understand and reduce the various sources of delay in the response
time experienced by end users. The delays introduced by all the components,
both hardware and software, that are involved in the execution of a web
service transaction are cumulative. Therefore, in order to decrease the end-to-end
response time it is necessary to improve all the individual component response
times in the chain, and primarily that of the slowest one.
A first effort should be devoted to the analysis of the workload characteristics
with the goal of identifying the causes of traffic fluctuations in the Internet.
Such fluctuations contribute to transient congestion in the network components
and therefore are the primary sources of increased response time. At the
application level, it is known that the applications that contribute major portions
of the network traffic transmit their load in a highly bursty manner, which is a
cause of further congestion.
The complexity of the network structure and the behavior of the trans-
port/network protocols play a fundamental role in the propagation of the fluc-
tuations from the application level to the link level.
The complexity of the Internet infrastructure, from the network level up
to the application level, results in performance indexes characterized by high
variability and long–range dependence. Such features introduce new problems
in the analysis and design of networks and web applications, and many of the
past assumptions upon which web systems have been built are no longer valid.
Usual statistics such as the average and variance become meaningless in the presence of
heavy-tailedness and self-similarity.
The paper is organized as follows. In Sect. 2 we illustrate some of the main
sources of web delays: the complexity of the request path browser-server-browser
and the origin of the self-similarity property in the Internet traffic are analyzed.
In Sect. 3, experimental results concerning the heavy-tailedness properties of
end-to-end response times of some web sites are presented. Section 4 describes
a few case studies that show how to measure and improve web user satisfaction.
Section 5 summarizes our contributions and concludes the paper.
2 Sources of Web Delays
One of the typical misconceptions related to the Internet is that bandwidth
is the only factor limiting the speed of web services, so that, with the diffusion of
broadband networks in the next few years, high performance will be guaranteed.
This conclusion is clearly wrong.
Indeed, although high bandwidth is necessary for the efficient download of
large files such as video, audio and images, as more and more services are offered
on the Internet, a small end-to-end response time, i.e., the overall waiting time
that end users experience, is becoming a requirement. The main components
that contribute to the end-to-end response time fall into three categories: client
side, server side, and network architecture and protocols.
On the client side, the browser parameters, such as the number of
simultaneous TCP connections, the page cache size, the memory and computation
requirements of the code downloaded from the server (applets, JavaScript,
plug-ins, etc.), and the bandwidth of the access network, must be taken into account.
On the server side, among the factors that should be carefully analyzed are:
the behavior of the server performance with respect to the forecast workload
increase, some of the application architecture parameters (e.g., multithreading
level, maximum number of opened connections, parallelism level), the CPU and
I/O power available (as demand for dynamic content of pages increases, more
and more computational power and I/O performance/capacity are required),
and the bandwidth of the access network.
The Internet network architecture is characterized by the large number of
components that a user request visits along the path between the browser and
the web server and back. Each of these components, both hardware and software,
introduces some delay that contributes to the end-to-end response
time. The global throughput of a connection between a browser and a web server,
and back, corresponds to the throughput of the slowest component in the path.
This component, referred to as the bottleneck, is likely to be in a congested state and
to cause severe performance degradation.
Two factors contribute to the congestion of a component: the
frequency of the arriving requests and the service time required for the complete
execution of a request. These two factors are related to the characteristics of the
workload and to the characteristics of the component. Thus, concentrating only
on the bandwidth with the objective of providing a small end-to-end response
time is not enough.
In this section we will analyze some of the most important sources of web
delays, namely, the complexity of the path between browsers and servers, and
the self-similarity characteristic of Internet traffic.
2.1 The Complexity of the Request Path
A current trend in the infrastructure of the Internet is the increase of the com-
plexity of the chain of networks between a client and a server, that is, the path
in both directions between the user browser and the web server, also referred
to as request path. From the instant a request is issued by a browser, a series
of hardware components and software processes are involved in the delivery of
the request to the server. Hardware components comprise routers, gateways, in-
termediate hosts, proxy cache hosts, firewalls, application servers, etc. Software
processes involved in the delivery of a request refer to the protocol layers (HTTP,
TCP, IP, and those of lower layers), the routing algorithms, the address transla-
tion process, the security controls, etc. In Fig. 1, a simplified model of a request
path between a user browser and a web server is illustrated.
[Figure: request path from the user browser through an Internet Service Provider, a National Service Provider, the International Backbone, a National Service Provider, and an Internet Service Provider to the web server; the path traverses routers, gateways, hosts, switches, and firewalls.]
Fig. 1. Simplified model of a request path from a user browser to a web server.
As a consequence, the end-to-end chain of components, or hops, between a
client browser and a web server (including also the return path) may be very
long. The hops can be subdivided into those located in the access networks (ISP
and NSP), both on the client and server sides, those located in the server farm
or the corporate intranet, if any, and those in the Internet infrastructure
(a mix of national and international carriers, international backbone, routers,
etc.). Several statistics collected at business users show that the average number
of hops is increasing with the popularity of the Internet, reaching an average value
of about 15-20. The trend is clearly towards an increase of the number of hops
since the architectures of the Internet and of the intranets and server farms are
becoming more and more complex due to various new functions to be executed
(e.g., security controls, complex back–end applications).
The majority of the components of a request path operate on a
store-and-forward basis, i.e., the incoming requests are queued waiting to use the resource,
and thus are potential sources of delay. The request path browser-server-browser,
represented in Fig. 1, can be modeled as an open queueing network, i.e., a network
of interconnected queues characterized by multiple sources of arriving requests and
by the independence of the arrival processes from the network conditions.
Queueing networks are well suited for representing resource contention and queueing
for service (see, e.g., [6]). In an open network the number of customers can grow
to infinity depending on the saturation condition of the bottleneck resource.
Assuming that the distributions of request interarrival times and service times at
all resources are exponential and that the scheduling discipline at each resource
is FCFS, a typical characteristic of such networks [8] is that each resource
behaves like an independent M/M/1 queue. In this case the response time tends
to a vertical asymptote as the load increases toward resource saturation. The
rate of increase of a component response time R, normalized with respect to the
square of service time S, as a function of the request arrival rate λ is given by:
$$\frac{dR}{d\lambda}\,\frac{1}{S^2} = \frac{1}{(1-\lambda S)^2} = \frac{1}{(1-U)^2} \tag{1}$$
where $U = \lambda S$ is the utilization of the component, i.e., the proportion of time
the component is busy. As can be seen from Fig. 2, when the arrival rate is
such that the utilization is greater than 80%, the rate of increase of the response
time is extremely high, i.e., the resource is congested and the delay introduced
in the packet flow is huge.
[Figure: rate of increase of response time (vertical axis, 0 to 400) plotted against utilization (horizontal axis, 0 to 1).]
Fig. 2. Rate of increase of response time of a component (normalized with respect to
the square of service time) vs component utilization.
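The right-hand side of Eq. (1) is easy to evaluate numerically. The short sketch below assumes the standard M/M/1 mean response time R = S / (1 − λS), from which Eq. (1) follows by differentiating with respect to λ; the function names are hypothetical.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time R = S / (1 - lambda * S) of an M/M/1 queue."""
    utilization = arrival_rate * service_time
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


def normalized_growth(utilization):
    """Right-hand side of Eq. (1): (dR/dlambda) / S^2 = 1 / (1 - U)^2."""
    return 1.0 / (1.0 - utilization) ** 2


for u in (0.5, 0.8, 0.9, 0.95):
    print(f"U = {u:.2f}  normalized rate of increase = {normalized_growth(u):.1f}")
```

The normalized rate of increase is 4 at 50% utilization, 25 at 80%, 100 at 90%, and 400 at 95%, which is why the region above 80% utilization is treated as congested.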
The burstiness of Internet traffic produces a variability in the request
interarrival time much higher than the one exhibited by traffic following the exponential
assumption, making the actual situation even worse than the one described by
(1). As a consequence, the probability of finding a congested component along
a request path is much higher than in usual telecommunication environments.
Thus, the further away the client browser is from the web server, the greater
the likelihood that one or more components of the path are found congested.
Let p be the probability that a component is in a congestion state and let n be
the number of (independent) components along the request path, including the
return path. Then the probability of finding exactly i congested components is
$$\binom{n}{i}\, p^i (1-p)^{n-i}, \qquad i = 0, 1, 2, \ldots, n \tag{2}$$
and the probability of finding at least one component congested is $1 - (1-p)^n$. In
Fig. 3, the probability of finding one or more congested components (i.e., of a very
high response time experienced by a user) along a request path
browser-server-browser as a function of the congestion probability p of a single component and
the path length n is reported. The maximum number of components considered
in a path is 15, a conservative value if compared to the actual average situation
encountered in the Internet. As can be seen, with a probability of congestion
of a component p = 0.01, i.e., 1%, and a path length n = 15 hops, the probability
of finding at least one component congested is 13.9%, a clearly high value.
[Figure: surface plot of the probability of high response time (vertical axis, 0 to 1) as a function of the path length (0 to 15) and of the probability of component congestion (0 to 0.2); the value 0.139 is marked.]
Fig. 3. Probability of finding one or more congested components along a request path
browser-web server-browser (i.e., of a very high response time) as a function of the
congestion probability p of a single component and the path length n.
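The binomial expression of Eq. (2) and the probability of at least one congested component can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names; like the text, it assumes that the n components are congested independently with the same probability p.

```python
from math import comb


def prob_exactly(i, n, p):
    """Eq. (2): probability that exactly i of n independent components are congested."""
    return comb(n, i) * p**i * (1.0 - p) ** (n - i)


def prob_at_least_one(n, p):
    """Probability that one or more of n independent components are congested."""
    return 1.0 - (1.0 - p) ** n


for n in (5, 10, 15):
    print(f"n = {n:2d}  P(at least one congested) = {prob_at_least_one(n, 0.01):.4f}")
```

For p = 0.01 and n = 15 the result lies just below 0.14, in line with the value marked in Fig. 3.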
2.2 The Self-Similarity of Web Traffic
The World Wide Web is a more variable system than was expected. Several
analyses show that the limited variability notion widely used for several decades
in telecommunication modelling, i.e., the assumption of the Poisson nature of
traffic related phenomena, has very little in common with Internet reality. Ev-
idence is provided by the fact that the behavior of the aggregated traffic does
not become less bursty as the number of sources increases [12].
More precisely, models in which the exponential distribution of the variables
is assumed are not able to describe Internet conditions in which the variables
(e.g., duration of the sessions, end-to-end response times, size of downloaded
files) show a variability encompassing several time scales. The high temporal
variability in traffic processes is captured assuming the long-term dependence of
the corresponding variables and the heavy-tail distribution of their values (i.e.,
distribution whose tail declines according to a power-law).
A distribution is heavy-tailed (see, e.g., [5]) if its complementary cumulative
distribution $1 - F(x)$ decays more slowly than the exponential, i.e., if
$$\lim_{x \to +\infty} e^{\gamma x}\,[1 - F(x)] = +\infty \tag{3}$$
for all $\gamma > 0$. One of the simplest heavy-tailed distributions is the Pareto
distribution, whose probability density function $f(x)$ and distribution function $F(x)$
are given by (see, e.g., [15]):
$$f(x) = \alpha k^{\alpha} x^{-\alpha-1}, \qquad F(x) = 1 - k^{\alpha} x^{-\alpha}, \qquad 0 < k \le x, \quad \alpha > 0 \tag{4}$$
where k is a positive constant independent of x and α represents the tail index.
If $1 < \alpha \le 2$, the random variable has infinite variance; if $0 < \alpha \le 1$, the
random variable also has infinite mean. Note that the first and second moments are
infinite only if the tail stretches to infinity, while in practice infinite moments
are exhibited as non-convergence of sample statistics.
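The Pareto functions of Eq. (4) can be explored numerically. The sketch below is illustrative only: the function names, the seed, and the sample size are arbitrary choices, and samples are drawn by inverse-transform sampling, i.e., by solving F(x) = u for a uniform random u. The parameters α = 1.5 and k = 1/3 are chosen so that the mean αk/(α − 1) equals one while the variance is infinite.

```python
import math
import random


def pareto_pdf(x, alpha, k):
    """f(x) of Eq. (4); zero below the constant k."""
    return alpha * k**alpha * x ** (-alpha - 1.0) if x >= k else 0.0


def pareto_cdf(x, alpha, k):
    """F(x) of Eq. (4); zero below the constant k."""
    return 1.0 - (k / x) ** alpha if x >= k else 0.0


def pareto_sample(rng, alpha, k):
    """Inverse-transform sampling: x = k * u^(-1/alpha), u uniform on (0, 1]."""
    u = 1.0 - rng.random()
    return k * u ** (-1.0 / alpha)


alpha, k = 1.5, 1.0 / 3.0      # mean 1, infinite variance
rng = random.Random(42)        # fixed seed, arbitrary choice
samples = [pareto_sample(rng, alpha, k) for _ in range(100_000)]

# Tail probability at x = 20 versus an exponential variable with the same mean.
pareto_tail = 1.0 - pareto_cdf(20.0, alpha, k)
exp_tail = math.exp(-20.0)
print("Pareto tail:", pareto_tail, "exponential tail:", exp_tail)
print("largest sample:", max(samples))
```

The Pareto tail at x = 20 is several orders of magnitude larger than the exponential one, which is the behavior expressed by Eq. (3).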
An interesting property exhibited by the processes whose values follow heavy-
tailed distributions is the self-similar, or fractal-like, behavior, i.e., the behavior
of the variables is invariant over all time scales. The autocorrelation function of
self-similar time series declines like a power-law for large lags. As a consequence,
autocorrelations exist at all time scales, i.e., the high values of the tail of the
distribution occur with non-negligible probability and the corresponding traffic
is bursty.
The probability density functions (in log scale) of several Pareto random
variables, with different parameters α and k (4), are compared with the proba-
bility density function (dashed line) of an exponential variable in Fig. 4. All the
functions have the same mean value equal to one. As can be seen, the tails of
the Pareto random variables are much higher than that of the exponential
random variable.
Evidence of Internet traffic self-similarity is reported in several papers. This
type of behavior has been identified in high-speed Ethernet local area networks
[9], in Internet traffic [16], in the file sizes of the web servers and in the think
time of browsers [4], in the number of bytes in FTP transmissions [12], and in
several other variables.
As shown above, the self-similarity property of Internet traffic implies that
the values of the corresponding variables exhibit fluctuations over a wide range
of time scales, i.e., their variance is infinite. The peculiar nature of the load
generated at the application layer, namely its self-similarity and heavy-tail
characteristics, propagates to the lower layers, affecting the behavior of the transport and
network protocols. This, in turn, induces self-similar behavior of the link
traffic, negatively affecting network performance. The most important causes of
such high variability and of its ubiquitous presence at all layers in the network
environment fall into three categories: source-related, request-path-related,
and protocol-related.
[Figure: probability density f(x), in log scale from 10^0 down to 10^-8, for x from 0 to 20; Pareto curves for (α = 1.1, k = 0.09), (α = 1.5, k = 0.33), (α = 2, k = 0.5), (α = 3, k = 0.66), and (α = 4, k = 0.75), and an exponential curve (dashed line).]
Fig. 4. Probability density functions (in log scale) of several Pareto random variables,
with different parameters α and k, compared with the probability density function
(dashed line) of an exponential random variable; the mean value of all the functions is
one.
The activity of a typical Internet user can be regarded as a sequence of
active periods interleaved with idle periods. In the usage patterns of the
most significant Internet applications, such as the retrieval/download/upload cycle
of web files using HTTP, the file transfer with FTP, and the send/receive process
of SMTP, execution can be seen as a sequence of activity phases, during
which a given amount of data is transferred from one site to another, intermixed
with idle phases, when users analyze the downloaded objects (or pages) and type
a message or issue a new command, but no load is generated on the network.
Such behavior contributes to the known burstiness of application data transmission.
Thus, a user can be modeled as a source of traffic that alternates between two
states identified with ON and OFF, respectively. ON/OFF sources are widely
used to model the workload generated by the users of the Internet. During the
ON periods, the source is active and data packets are sent on the network, i.e., a
burst of load is generated. During the OFF periods, no activity is performed. The
characteristics of the ON and OFF periods, e.g., average durations, distributions,
traffic generation rates, depend on the application considered. Typically, in the
ON periods the traffic is generated at a constant rate and the lengths of the ON and
OFF periods follow known distributions, which may differ from each other and may have
finite or infinite variance. The very high, or infinite, variance of the input traffic
parameters is explained by the results of several empirical studies (see, e.g., [2])
that have shown the presence of self-similarity in the size distribution of web
files transferred over the network and thus of their transmission times.
At a more aggregated level than that of a single source, the traffic generated
by a set of users can be modeled by considering several ON/OFF sources sharing the
network resources. It has been shown [16] [17] that, under certain assumptions,
the superposition of many ON/OFF sources generates a process exhibiting
long-range dependence. Thus, the corresponding model is able to
capture the self-similar nature of Internet traffic.
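The ON/OFF construction can be sketched in a few lines. This is only an illustration: time is divided into slots, ON and OFF durations are drawn from Pareto distributions, and the tail indices (1.4 and 1.2, both giving infinite variance), the unit rate, and the number of sources are assumptions chosen for the example, not values taken from [16] or [17].

```python
import random

def on_off_source(n_slots, alpha_on, alpha_off, rate, rng):
    """Per-slot load of one source with Pareto-distributed ON and OFF durations."""
    load = [0.0] * n_slots
    t = 0
    on = rng.random() < 0.5                      # random initial state
    while t < n_slots:
        alpha = alpha_on if on else alpha_off
        # Duration in whole slots, at least one slot long.
        duration = max(1, round(rng.paretovariate(alpha)))
        if on:
            for s in range(t, min(t + duration, n_slots)):
                load[s] = rate                   # constant rate while ON
        t += duration
        on = not on
    return load

def aggregate(n_sources, n_slots, alpha_on=1.4, alpha_off=1.2, rate=1.0, seed=1):
    """Superpose independent ON/OFF sources and return the total load per slot."""
    rng = random.Random(seed)
    total = [0.0] * n_slots
    for _ in range(n_sources):
        source = on_off_source(n_slots, alpha_on, alpha_off, rate, rng)
        for s, value in enumerate(source):
            total[s] += value
    return total

traffic = aggregate(n_sources=50, n_slots=2000)
print(min(traffic), max(traffic), sum(traffic) / len(traffic))
```

The sketch only generates the aggregate trace; whether that trace is long-range dependent has to be checked separately, for instance with the scaling analysis of Sect. 3.3.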
Another phenomenon that contributes to the fluctuations of Internet
traffic (at a more macroscopic level than that of a single source)
is the amount of correlation existing among the sources. Empirical
observations suggest the presence of traffic cycles on a temporal basis, among
which the daily cycle is the most evident. The existence of such a cycle is
quite intuitive and is connected to office working hours and to the availability
periods of some on-line services (e.g., traffic typically peaks during the morning
and the afternoon hours). Time differences across the globe may also generate
cycles with different periodicity. Other types of source correlation are generated
by the occurrence of special events (sport competitions, natural disasters, wars,
etc.).
As we have seen, the Internet is a network environment where load fluctuations
should be considered normal rather than exceptional events. The
self-similarity of the load propagates its effects to all the network
layers, from the application to the link layer. As a consequence, transient
congestion may occur with non-negligible probability in each of the components
along the request path browser-server-browser (Sect. 2.1). While the task of
performance optimization is relatively straightforward in a network with limited
load variability, it becomes significantly more complex in the Internet because of
transient congestion. The load imbalance among the resources of a request path,
usually modeled as an open network of queues (Fig. 1), will be extreme and will
grow as the load increases. Thus, the probability of finding a component subject
to transient congestion in a relatively long request path, e.g., of about 15 hops,
is substantial (Fig. 3).
When a fluctuation of traffic creates congestion in a component (e.g., a
router) of an open network of queues, the performance degradation due to the
overload is huge since the asymptotes of the performance indices are vertical
(Fig. 2): the response time increases by several orders of magnitude, the
throughput reaches saturation, and the number of customers at the congested
component tends to infinity.
This unexpected increase of response time triggers the congestion control
mechanism implemented in TCP, which prevents the source of
traffic from overloading the network. The source uses a feedback control signal,
directly computed from the network or received from intermediate components,
to tune the load sent on the network. An increase of the response time (in this
context usually referred to as round trip time) beyond a threshold value therefore
triggers an immediate reduction of the congestion window size, and thus a reduction
of the traffic input to the network. The throughput decreases suddenly and then
increases slowly according to the algorithm implemented by the TCP version adopted.
The various versions of TCP implement different congestion control mechanisms,
which have a different impact on network performance [11]. Clearly, this type of
behavior introduces further fluctuations in the throughput and, more generally,
in the indices capturing the traffic of the Internet.
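The sudden decrease and slow recovery can be illustrated with a deliberately simplified additive-increase/multiplicative-decrease rule. The sketch is not the algorithm of any specific TCP version: the round trip time threshold stands in for the congestion signal, and the increase step, the halving factor, and the sample values are assumptions made for the example.

```python
def aimd_window(rtt_samples, rtt_threshold, cwnd=1.0, increase=1.0, decrease=0.5):
    """Congestion window after each RTT sample under a simplified AIMD rule."""
    history = []
    for rtt in rtt_samples:
        if rtt > rtt_threshold:
            # Congestion signal: cut the window sharply, never below one segment.
            cwnd = max(1.0, cwnd * decrease)
        else:
            # No congestion: grow the window by a fixed step per round trip.
            cwnd += increase
        history.append(cwnd)
    return history

# One transient congestion episode in an otherwise quiet path (times in seconds).
rtts = [0.1] * 10 + [2.5] + [0.1] * 5
windows = aimd_window(rtts, rtt_threshold=1.0)
print(windows)
```

A single slow round trip halves the window, and several quiet round trips are needed to regain the lost capacity, so fluctuations in the network are mirrored in the throughput of the source.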
3 Measurements of End-to-End Performance

In the previous section we have seen that there is ample evidence of high
variability and self-similarity in aggregate Internet traffic. In this section we
will see that these properties also hold for end-to-end performance.
3.1 Experiments
The monitoring system used to collect the data is WPET (Web Performance
Evaluation Tool), a Java-based tool developed at the Politecnico di Milano.
WPET is composed of a set of agents for the collection of Internet performance
data. Each agent is an automated browser that can be programmed
to periodically download web pages and to measure several performance metrics
(e.g., download times). The agents are connected to the Internet through different
connection types (e.g., ISDN, xDSL, cable, backbone), from different geographical
locations (e.g., Rome, Milan), and through different providers. A WPET
agent can surf a web site performing a set of complex operations, such as
filling in a form, selecting an item from a list, or following a link. An agent can
handle the HTTP and HTTPS protocols, session tracking (URL rewriting and cookies),
and plug-ins (e.g., Flash animations, applets, ActiveX controls). For each visited
page, the agent collects performance data for all the objects in the page. For each
object, several performance metrics are measured: DNS lookup time, connection time,
redirect time, HTTPS handshake time, server response time, object download time,
object size, error conditions. All the data collected by the agents are stored in a
centralized database and analyzed in order to extract meaningful statistics.
3.2 Evidence of Heavy-Tail Distribution

Figure 5 shows the time required to download the home page of the MIT web
site (www.mit.edu). Measurements were collected for 9 days (from March 20
to March 28, 2002) by downloading the home page every 15 minutes with
a WPET agent located in Milan and connected to the Internet directly through a
backbone. The upper part of the figure shows a sample of the download times.
In order to investigate the heavy-tail properties of the download times, a log-log
plot of the page time complementary cumulative distribution is shown in the
lower left part of Fig. 5.
This plot is a graphical method to check the heavy-tailedness of
a data set. If a good portion of the log-log complementary plot of the
distribution is well fitted by a straight line, then the distribution has the
heavy-tail property. The plot of Fig. 5 is well approximated by a straight line with
slope −3.2, indicating that the distribution is the Pareto one (4) with α = 3.2
[17].
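The graphical check can be turned into a rough numerical estimate by fitting a least-squares line to the upper part of the log-log complementary distribution. The sketch below uses synthetic Pareto samples as a stand-in for the measured download times; the choice of fitting the upper half of the points (tail_fraction) is an assumption of the example, not part of the method described above.

```python
import math
import random

def empirical_ccdf(samples):
    """Points (x, 1 - F(x)), skipping the largest sample where the CCDF is zero."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, 1.0 - (i + 1) / n) for i, x in enumerate(xs[:-1])]

def tail_index(samples, tail_fraction=0.5):
    """Least-squares slope of log10(1 - F(x)) vs. log10(x) over the tail; alpha = -slope."""
    points = empirical_ccdf(samples)
    tail = points[int(len(points) * (1.0 - tail_fraction)):]
    lx = [math.log10(x) for x, _ in tail]
    ly = [math.log10(p) for _, p in tail]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return -slope

rng = random.Random(42)
synthetic = [rng.paretovariate(3.2) for _ in range(20000)]   # stand-in for measured times
print(f"estimated alpha = {tail_index(synthetic):.2f}")
```

The estimate depends on where the tail is assumed to start and on the few largest observations, so it should be treated as indicative only.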
[Figure 5: three panels for www.mit.edu. Upper: download time (sec.) versus day, from 20/03 to 28/03. Lower left: log-log plot of 1 − F(x) versus x, with a fitted line labeled α = 3.2. Lower right: Y Quantiles versus X Quantiles.]
Fig. 5. Download times of the home page of the MIT web site (upper part). Log-Log
complementary plot of cumulative distribution F (x) (lower left). Quantile-quantile plot
of the estimated Pareto distribution vs. the real distribution (lower right).
While the log-log complementary distribution plot provides solid evidence for
a Pareto distribution in a given data set, the method described above for producing
an estimate for α is prone to errors. In order to confirm the correctness of the
estimated parameter α we can use the quantile-quantile plot method (lower
right part of Fig. 5). The purpose of this plot is to determine whether two
samples come from the same distribution type. If the samples do come from the
same distribution, the plot will be linear. The quantile-quantile plot in Fig. 5
shows quantiles of the measured data set (x axis) versus the quantiles of a
Pareto distribution with tail parameter α = 3.2 (y axis). The plot confirms the
correctness of the results.
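A quantile-quantile comparison of this kind can be sketched as follows. The Pareto quantile function follows from F(x) = 1 − (k/x)^α; the synthetic data, the scale k = 1, and the use of a correlation coefficient as a numerical summary of linearity are assumptions of the example.

```python
import random

def pareto_quantile(p, alpha, k):
    """Inverse CDF of a Pareto distribution with F(x) = 1 - (k / x)**alpha, x >= k."""
    return k * (1.0 - p) ** (-1.0 / alpha)

def qq_points(samples, alpha, k):
    """Pair each ordered sample (x axis) with the matching Pareto quantile (y axis)."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, pareto_quantile((i + 0.5) / n, alpha, k)) for i, x in enumerate(xs)]

def correlation(points):
    """Pearson correlation of the plotted points; values near 1 suggest a straight line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(7)
data = [rng.paretovariate(3.2) for _ in range(5000)]      # synthetic stand-in, k = 1
print(f"QQ correlation = {correlation(qq_points(data, alpha=3.2, k=1.0)):.3f}")
```

In practice the plot itself remains the primary tool, because a single summary number can hide systematic curvature in the tail.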
Figures 6 and 7 extend the analysis by comparing the download times of the
home pages of four web servers:
– Massachusetts Institute of Technology (www.mit.edu)
– Stanford University (www.standford.edu)
– Google (www.google.com)
– Altavista (www.altavista.com).
The four plots in both figures show the log-log complementary cumulative
distributions (continuous lines), together with the approximating Pareto
distributions (dashed lines). The measurements of Fig. 6 were collected with a
WPET agent running on a system directly connected to a backbone. The
measurements of Fig. 7 were collected with an agent connected to the Internet
via an ADSL line. Both agents were located in Milan. All the figures confirm
the heavy-tail property of end-to-end download times.
[Figure 6: four log-log panels of 1 − F(x). Panel titles: www.mit.edu, α = 3.2; www.standford.edu, α = 2.66; www.google.com, α = 2.33; www.altavista.com, α = 3.11.]
Fig. 6. Log-Log complementary plots of the home page download times distribution of
four web sites measured from a backbone Internet connection. The real data distribution
(continuous line) and the approximated Pareto distribution (dashed line) are shown.
The estimated tail index α is reported on each plot.
It is interesting to observe that all the plots in Fig. 7 (ADSL connection) have
a lower value of α than the corresponding plots in Fig. 6 (backbone
connection). Recall that lower values of α mean higher variability. This
suggests that slow client connections are characterized by higher variability,
because (i) the source of congestion is in the network, not in the client connection,
and (ii) the overhead of retransmissions is higher for slower client connections.
[Figure 7: four log-log panels of 1 − F(x) versus download time (sec.). Panel titles: www.mit.edu, α = 1.94; www.standford.edu, α = 1.84; www.google.com, α = 2.13; www.altavista.com, α = 2.61.]
Fig. 7. Log-Log complementary plots of the home page download times distribution
of the same four web servers of Fig. 6, measured from an ADSL Internet connection.
The real data distribution (continuous line) and the approximated Pareto distribution
(dashed line) are shown. The estimated tail index α is reported on each plot.
3.3 Evidence of Self-Similarity

In Fig. 8 we use the wavelet-based method proposed by Abry and Veitch [1] for
the analysis of self-similar data and for the estimation of the associated Hurst
parameter (for a formal definition of self-similarity see, e.g., [1]). Here we recall
that the Hurst parameter H measures the degree of long-range dependence.
For self-similar phenomena its value is between 0.5 and 1, and the degree of
self-similarity increases as the Hurst parameter approaches 1. For short-range
dependent processes, H → 0.5. Abry and Veitch’s method utilizes the ability of
wavelets to localize the energy of a signal associated with each time scale. It is
possible to study the scaling of a process by log-plotting the energy associated
with several time scales: a self-similar signal will yield a linear plot for
the larger time scales. The slope m of the linear portion of the plot is related
to the Hurst parameter by the equation
$$ H = \frac{m + 1}{2} \tag{5} $$
Figure 8 shows the scaling properties of the MIT home page download times
plotted in Fig. 5. The wavelet energy (continuous line) is approximated with a
straight line (dashed line) with slope m = 0.90. According to (5), the measure-
ments are consistent with a self-similar process with Hurst parameter H = 0.95
(very close to 1).
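The last step of the method, going from the fitted slope to the Hurst parameter, is simple enough to sketch. The log-energy values below are hypothetical points placed on a line of slope 0.90; they are not the measured values behind Fig. 8, and the wavelet decomposition itself is not reproduced here.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def hurst_from_slope(m):
    """Equation (5): H = (m + 1) / 2."""
    return (m + 1.0) / 2.0

# Hypothetical log-energy values for six time scales (30 min ... 16 h).
octaves = [1, 2, 3, 4, 5, 6]
log_energy = [26.0 + 0.90 * j for j in octaves]

m = fit_slope(octaves, log_energy)
print(f"m = {m:.2f}, H = {hurst_from_slope(m):.2f}")
```

Only the linear portion of the plot should enter the fit; including the small time scales, where the plot is typically not linear, would bias the slope.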
[Figure 8: wavelet energy versus time scale (30 min, 1 h, 2 h, 4 h, 8 h, 16 h), with a fitted line labeled H = 0.95.]
Fig. 8. Scaling analysis of the download times of MIT web site home page. The wavelet
energy (continuous line) is approximated with a straight line (dashed line) with slope
0.90.
4 Improving Web Performance

For web sites that need to retain users beyond the first page, there is a strong
motivation to reduce the delay between the browser click and the delivery of
the page content on the user’s screen. Although many of the reasons behind
poor web performance are not due to the web server alone (e.g., low
bandwidth, high latency, network congestion), in this section we discuss some
remedial actions that can be taken in order to reduce the negative influence of
such factors on the user end-to-end response time.
Section 4.1 describes an analysis of how long users are willing to wait for web
pages to download. Section 4.2 presents some case studies oriented to the
detection of performance problems and to the improvement of the end-to-end
performance of web sites.
4.1 User Satisfaction
User-perceived response time has a strong impact on how long users would stay at
a web site and on the frequency with which they return to the site. Acceptable
response times are difficult to determine because people’s expectations differ
from situation to situation. Users seem willing to wait varying amounts of time
for different types of interactions [13]. The amount of time a user is willing to
wait appears to be a function of the perceived complexity of the request. For
example, people will wait longer:
– for requests that they think are hard or time-consuming for the web site to
perform (e.g., search engines);
– when there are no simple or valid alternatives to the visited web site (e.g.,
the overhead required to move a bank account increases the tolerance of
home banking users).
On the contrary, users will be less tolerant of long delays for web tasks that they
consider simple or when they know there are valid alternatives to the web site.
Selvidge and Chaparro [14] conducted a study to examine the effect of download
delays on user performance. They used delays of 1 second, 30 seconds, and
60 seconds. They found that users were less frustrated with the one-second delay,
but their satisfaction was not affected by the 30-second response times.
According to Nielsen, download times greater than 10 seconds cause user
discomfort [10]. According to a study presented by IBM researchers, a download
time longer than 30 seconds is considered too slow [7].
Studies on how long users would wait for the complete download of a web
page have been performed by Bouch, Kuchinsky and Bhatti [3]. They reported
good ratings for pages with latencies up to 5 seconds, and poor ratings for pages
with delays over 10 seconds. In a second study, they applied incremental loading
of web pages (with the banner first, text next and graphics last). Under these
conditions, users were much more tolerant of longer latencies. They rated the
delay as “good” with latencies up to 30 seconds. In a third study they observed
that, as users interact more with a web site, their frustration with download
delays seems to accumulate. In general, the longer a user interacts with a site
(i.e., the longer the navigation path), the less delay he or she will tolerate.
In Fig. 9 we have integrated the results of these studies in order to identify two
thresholds for the definition of user satisfaction. The thresholds are a function
of the navigation step:
– the lower threshold (continuous line) identifies the acceptable experience:
users are always satisfied when web pages have a latency up to the lower
threshold, independently of the situation;
– the higher threshold (dashed line) identifies the unsatisfactory experience:
users will not tolerate longer latencies, independently of the other conditions.
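The two-threshold rule can be written as a small classifier. In the sketch below the thresholds are supplied by the caller; the per-step values in the table are placeholders invented for the example and are not read from Fig. 9.

```python
def rate_experience(download_time, lower, upper):
    """Classify a download time (seconds) against the two satisfaction thresholds."""
    if lower > upper:
        raise ValueError("the lower threshold cannot exceed the upper one")
    if download_time <= lower:
        return "acceptable"            # users are always satisfied
    if download_time > upper:
        return "unacceptable"          # users will not tolerate the delay
    return "depends on context"        # between the two thresholds

# Hypothetical (lower, upper) thresholds per navigation step; placeholder values only.
thresholds = {1: (10.0, 30.0), 5: (5.0, 20.0)}

for step, (lower, upper) in thresholds.items():
    print(step, rate_experience(12.0, lower, upper))
```

Between the two thresholds the outcome depends on the situation, for example on the perceived complexity of the request, so the classifier deliberately returns no verdict there.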
[Figure 9: download time (sec.), from 0 to 30, versus navigation step (1, 2, 3, 4, 5, 10, 20). Curves: acceptable, unacceptable.]
Fig. 9. User satisfaction as a function of the navigation steps. Users are always satisfied
with web pages whose download time is below the lower threshold (continuous line).
Users will not tolerate latencies longer than the upper threshold (dashed line).
4.2 Optimization Issues

The possible sources of unsatisfactory end-to-end delays fall into three categories:
– Network problems: high delays originate from network problems along
the path connecting the user to the web site (such problems can be classified
into insufficient bandwidth at the client or web site, and congestion in a network
component).
– Site problems: one or more components of the web site are undersized
(e.g., web server, back-end systems, firewall, load balancer).
– Complexity problems: page content and web applications are not optimized
(e.g., too many objects in a page, usage of secure protocols with high over-
head to deliver non-sensitive information, low-quality application servers).
Figure 10 shows the performance of an Italian web site. Measurements were
collected for two weeks (from December 23, 2001 to January 5, 2002)
by downloading the home page every 30 minutes during work hours (8.00–20.00)
with three WPET agents located in Milan. Each agent was connected to the
Internet through a 64 kbit/s ISDN line with a different provider. Each bar in the
figure is the median of the measurements collected in one day. The three main
components of the page download time, namely the connection time (i.e., the
round-trip or network latency time), the response time of the server, and the
transfer time (or transmission time), are reported.
It is evident that the web site has performance problems because the average
download time for the home page is higher than 30 seconds (i.e., the maximum
tolerance threshold) for all the 14 days. The most probable source of problems
resides in the network on the web site side. In fact, the average connection time,
which measures the round-trip time of one packet between the client and the
server, is about 10 seconds, while it should usually be smaller than one second.
[Figure 10: bars showing the connection, response, and transfer components (vertical axis from 0 to 60) versus date.]
Fig. 10. End-to-end response time for the download of the home page of a web site
with network problems. The three basic components, the TCP/IP connection time, the
server response time and the page transfer time are shown.
Figure 11 shows the performance of a second web site. Measurements were
collected according to the same scheme as in the previous experiment. The
performance of this web site is satisfactory, although not excellent. Download
time is always lower than 30 seconds but higher than 10 seconds. Connection
time is always around 1 second. However, there is still room for optimization.
In fact, the average response time, which measures the time required for the web
server to load the page from disk (or to generate the page dynamically), is about
10 seconds in most of the cases. By adding new hardware or improving the web
application, the response time should be reduced to 1–2 seconds.
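The reasoning used in these two case studies can be summarized as a rule of thumb on the three components of the download time. The sketch below uses the reference values mentioned in the text (about one second for the connection time, 1–2 seconds for the response time, 30 seconds as the maximum tolerance); the function names and the sample component times are hypothetical.

```python
from statistics import median

def daily_median(samples):
    """Median of the measurements collected in one day, as plotted in Figs. 10 and 11."""
    return median(samples)

def diagnose(connection, response, transfer,
             max_connection=1.0, max_response=2.0, max_total=30.0):
    """Return the suspected problem areas for one set of component times (seconds)."""
    findings = []
    if connection > max_connection:
        findings.append("network")       # round trip well above one second
    if response > max_response:
        findings.append("server")        # server is slow to produce the page
    if connection + response + transfer > max_total:
        findings.append("total download time above tolerance")
    return findings

# Hypothetical component times, loosely shaped like the two case studies.
print(diagnose(connection=10.0, response=2.0, transfer=20.0))
print(diagnose(connection=1.0, response=10.0, transfer=8.0))
```

Such a rule only points to the component worth investigating first; it does not replace the analysis of the individual measurements.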
[Figure 11: bars showing the connection, response, and transfer components (vertical axis from 0 to 30) versus date.]
Fig. 11. End-to-end response time for the download of the home page of a web site
with server problems. The three basic components, the TCP/IP connection time, the
server response time and the page transfer time are shown.
Figure 12 presents an example of a complexity problem. The figure shows a
page component plot. Each vertical bar in the plot represents the download time
of a single object in the page. The bar in the lower-left corner is the main HTML
document. All the other bars are banners, images, scripts, frames, etc. The time
required to download the whole page is measured from the beginning of the first
object to the end of the last object to complete the download. Together with the
download time, the figure shows the size of each object. The measurements
were collected with an agent connected to a backbone. The object indicated
by the arrow is a small one (about 1.5 KByte), but it is the slowest object
in the page (it requires almost 20 seconds for its complete download). Without
this object the whole page would be received in less than 8 seconds. This object
is a banner downloaded from an external ad server which is poorly connected to
the Internet. Because of the banner, the users experience a long delay.
A possible way for the web site to improve the performance experienced by
the end user is to download the banners from the ad server off-line and to cache
them locally on the web server.
[Figure 12: page component plot. Bars: download time of each object (left axis, 0 to 25). Line: object size in KByte (right axis, 0 to 16). Horizontal axis: objects.]
Fig. 12. Page components plot. Vertical bars represent the download times for all the
objects in the page. The line indicates the size of each object. The object indicated
by the arrow is a banner.
Figure 13 is another example of a complexity problem. Although the overall
size of the page is rather small (less than 100 KBytes), the page is composed of
more than 50 different small objects. The overhead introduced by the download
of each object (e.g., DNS lookup time, connection time, response time) makes
it more convenient for a web site to have pages with a few big objects than pages
with many small objects.
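A back-of-the-envelope model makes the effect of the per-object overhead visible. The sketch assumes that objects are downloaded one after the other, that each object pays the same fixed overhead, and that the line carries 8 KByte/s (the capacity of a 64 kbit/s link); the 0.3 second overhead and the object counts are illustrative values, not measurements.

```python
def page_download_time(object_sizes_kb, per_object_overhead_s, bandwidth_kb_per_s):
    """Sequential download time: a fixed overhead plus the transfer time for each object."""
    return sum(per_object_overhead_s + size / bandwidth_kb_per_s
               for size in object_sizes_kb)

# The same 100 KByte page split into many small objects or a few big ones.
many_small = [2.0] * 50
few_big = [20.0] * 5

for name, page in (("50 objects", many_small), ("5 objects", few_big)):
    t = page_download_time(page, per_object_overhead_s=0.3, bandwidth_kb_per_s=8.0)
    print(f"{name}: {t:.1f} s")
```

The transfer time is identical in the two cases, so the whole difference comes from the overhead term, which grows linearly with the number of objects. Browsers that download objects in parallel reduce this penalty but do not remove it.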
[Figure 13: page component plot. Bars: download time of each object (left axis). Line: object size in KByte (right axis). Horizontal axis: objects.]
Fig. 13. Page components plot. Vertical bars represent the download times for all the
objects in the page. The line indicates the size of each object.
5 Conclusions
In this paper we have analyzed the origins of the high fluctuations in web
traffic. The sources of these fluctuations lie in the characteristics of the
applications, the complexity of the network path connecting the web user to the
web server, the self-similarity of web traffic (file sizes and user think times), and
the congestion control mechanism in the TCP/IP protocol. Empirical evidence
of self-similar and heavy-tail features in measured end-to-end web site
performance is provided. We have integrated this technical knowledge with the results
of recent studies aimed at determining the effects of long download delays on
user satisfaction. We have shown that user satisfaction can be modeled with
two thresholds. Simple guidelines for the detection of web performance problems
and for their optimization are also presented.
References
1. Abry, P., Veitch, D.: Wavelet analysis of long-range dependent traffic. IEEE
Trans. on Information Theory 44 (1998) 2–15.
2. Barford, P., Bestavros, A., Bradley, A., Crovella, M.E.: Changes in Web Client Ac-
cess Patterns: Characteristics and Caching Implications. World Wide Web Journal
2 (1999) 15–28.
3. Bhatti, N., Bouch, A., Kuchinsky, A.: Integrating User–Perceived Quality into Web
Server Design. Proc. of the 9th International World-Wide Web Conference. Elsevier
(2000) 1–16.
4. Crovella, M.E., Bestavros, A.: Self-Similarity in World Wide Web traffic: evidence
and possible causes. IEEE/ACM Trans. on Networking 5 (1997) 835–846.
5. Feldmann, A., Whitt, W.: Fitting mixtures of exponentials to long-tail distributions
to analyze network performance models. Performance Evaluation 31 (1998) 245–
279.
6. Haverkort, B.R.: Performance of Computer Communication Systems: A Model-
based Approach. Wiley, New York (1998).
7. IBM: Designed for Performance.
http://www.boulder.ibm.com/wsdd/library/techarticles/hvws/perform.html
8. Jackson, J.R.: Network of waiting lines. Oper. Res. 5 (1957) 518–521.
9. Leland, W.E., Taqqu, M.S., Willinger, W., Wilson, D.V.: On the Self-Similar Na-
ture of Ethernet Traffic. IEEE/ACM Trans. on Networking 2 (1994) 1–15.
10. Nielsen, J.: Designing Web Usability. New Riders (2000).
11. Park, K., Kim, G., Crovella, M.E.: On the Effect of Traffic Self-similarity on Net-
work Performance. Proc. of the SPIE International Conference on Performance
and Control of Network Systems (1997) 296–310.
12. Paxson, V., Floyd, S.: Wide area traffic: The failure of Poisson modeling.
IEEE/ACM Trans. on Networking 3 (1995) 226–244.
13. Ramsay, J., Barbesi, A., Preece, J.: A Psychological Investigation of Long Retrieval
Times on the World Wide Web. Interacting with Computers 10 (1998) 77–86.
14. Selvidge, P.R., Chaparro, B., Bender, G.T.: The World Wide Wait: Effects of
Delays on User Performance. Proc. of the IEA 2000/HFES 2000 Congress (2000)
416–419.
15. Trivedi, K.S.: Probability and Statistics with Reliability, Queueing and Computer
Science Applications. Wiley, New York (2002).
16. Willinger, W., Paxson, V., Taqqu, M.S.: Self-Similarity and Heavy-Tails: Structural
Modeling of Network Traffic. In A Practical Guide To Heavy Tails: Statistical
Techniques and Applications. R.Adler, R.Feldman and M.Taqqu Eds., Birkhauser,
Boston (1998) 27–53.
17. Willinger, W., Taqqu, M.S., Sherman, R., Wilson, D.V.: Self-Similarity Through
High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level.
IEEE/ACM Trans. on Networking 5 (1997) 71–86.
Benchmarking
Reinhold Weicker
Fujitsu Siemens Computers, 33094 Paderborn, Germany
reinhold.weicker@fujitsu-siemens.com
Abstract. After a definition (list of properties) of a benchmark, the major
benchmarks currently used are classified according to several criteria:
Ownership, definition of the benchmark, pricing, area covered by the
benchmark. The SPEC Open Systems Group, TPC and SAP benchmarks are
discussed in more detail. The use of benchmarks in academic research is
discussed. Finally, some current issues in benchmarking are listed that users of
benchmark results should be aware of.
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 179–207, 2002.
© Springer-Verlag Berlin Heidelberg 2002
1 Introduction: Use, Definition of Benchmarks
A panel discussion on the occasion of the ASPLOS-III symposium in 1989 had the
title “Fair Benchmarking – An Oxymoron?” The event looked somewhat strange
among all the other, technically oriented presentations at the conference. This
anecdotal evidence indicates the unique status that benchmarking has in the computer
area: Benchmarking is, on the one hand, a highly technical endeavor. But it also is,
almost by definition, related to computer marketing.
We see that benchmarks are used in three areas:
1. Computer customers as well as the general public use benchmark results to
compare different computer systems, be it a vague perception of performance
or a specific purchasing decision. Consequently, marketing
departments of hardware and software vendors drive to a large degree what
happens in benchmarking. After all, the development of good benchmarks is
neither easy nor cheap, and it is mostly the computer vendors’ marketing
departments that in the end, directly or indirectly, pay the bill.
2. Developers in hardware or software departments use benchmarks to optimize
their products, to compare them with alternative or competitive designs.
Quite often, design decisions are made on the basis of such comparisons.
3. Finally, benchmarks are heavily used in computer research papers because
they are readily available, and they can be expected to be easily portable.
Therefore, when quantitative claims are made in research papers, they are
often based on benchmarks.
Although the first usage of benchmarks (comparison of existing computers) is for
many the primary usage, this paper tries to touch all three aspects.
What makes a program a benchmark? We can say that a benchmark is a standardized
program (or detailed specifications of a program) designed or selected to be run on
different computer systems, with the goal of a fair comparison of the performance of
those systems. As a consequence, benchmarks are expected to be
1 . P o rta b le : It m u s t b e p o s s ib le to e x e c u te th e p ro g ra m o n d iffe re n t c o m p u te r
s y s te m s .
2 . F a ir: E v e n p o rta b le p ro g ra m s m a y h a v e a c e rta in b u ilt-in b ia s fo r a s p e c ific
s y s te m , th e s y s te m th a t th e y h a v e b e e n o rig in a lly w ritte n fo r. B e n c h m a rk s
a re e x p e c te d to m in im iz e s u c h a b ia s .
3 . R e le v a n t, w h ic h ty p ic a lly m e a n s “ re p re s e n ta tiv e fo r a re le v a n t a p p lic a tio n
a re a ” : A p e rfo rm a n c e c o m p a ris o n m a k e s n o s e n s e if it is b a s e d o n s o m e
e x o tic p ro g ra m th a t m a y b e p o rta b le b u t h a s n o re la tio n to a re le v a n t
a p p lic a tio n a re a . T h e b e n c h m a rk m u s t p e rfo rm a ta s k th a t is in s o m e
id e n tifia b le w a y re p re s e n ta tiv e fo r a b ro a d e r a re a .
4 . E a s y to m e a s u re : It h a s o fte n b e e n s a id th a t th e b e s t b e n c h m a rk is th e
c u s to m e r’s a p p lic a tio n its e lf. H o w e v e r, it o fte n w o u ld b e p ro h ib itiv e ly
e x p e n s iv e to p o rt th is a p p lic a tio n to e v e ry s y s te m o n e is in te re s te d in . T h e
n e x t b e s t th in g is a s ta n d a rd b e n c h m a rk a c c e p te d b y th e re le v a n t p la y e rs in
th e fie ld . B o th c u s to m e rs a n d d e v e lo p e rs a p p re c ia te it if b e n c h m a rk re s u lts
a re a v a ila b le fo r a la rg e n u m b e r o f s y s te m s .
5 . E a s y to e x p la in : It h e lp s th e a c c e p ta n c e o f a b e n c h m a rk if re a d e rs o f
b e n c h m a rk re s u lts h a v e s o m e fe e lin g a b o u t th e m e a n in g o f th e re s u lt m e tric s .
S im ila rly , a b e n c h m a rk is o fte n e x p e c te d to d e liv e r its re s u lt a s o n e “ s in g le
fig u re o f m e rit” . E x p e rts c a n a n d s h o u ld lo o k a t a ll d e ta ils o f a re s u lt b u t it is
a fa c t o f life th a t m a n y re a d e rs o n ly c a re a b o u t o n e n u m b e r, a n u m b e r th a t
s o m e h o w c h a ra c te riz e s th e p e rfo rm a n c e o f a p a rtic u la r s y s te m .
In his “Benchmark Handbook” [3], Jim Gray lists another property: Scalability, i.e.
the benchmark should be applicable to small and large computer systems. While this
is a desirable property, some benchmarks (e.g., BAPCo) are limited to certain classes
of computers, and still are useful benchmarks.
We will come back to these goals during the following presentations of some
important benchmarks. Not surprisingly, some of the goals can be in conflict with
each other.
2 Overview, Classification of Benchmarks
The benchmarks that have been or are widely used can be classified according to
several criteria. Perhaps the best starting point is “Who owns / administers a
benchmark?” It turns out that this also roughly corresponds to a chronological order in
the history of benchmarking.
2.1 Classification by Benchmark Ownership, History of Benchmarks
In earlier years, benchmarks were administered by single authors; then industry
associations took over. In recent years, benchmarks administered by important
software vendors have become popular.
2.1.1 Individual Authors, Complete Programs
The first benchmarks were published by individual authors and have, after initial
publication, spread basically through “word of mouth”. Often, their popularity was a
surprise for the author. Among those benchmarks are:
Table 1. Single-author complete benchmarks
Name and author(s)          Year  Language            Code size (bytes)  Characterization
--------------------------  ----  ------------------  -----------------  -----------------------------------------------------
Whetstone (Curnow/Wichman)  1976  ALGOL 60 / Fortran  2,120              Synthetic; numerical code, FP-intensive
Linpack (J. Dongarra)       1976  Fortran             230 (inner loop)   Package, linear algebra; numerical code, FP-intensive
Dhrystone (R. Weicker)      1984  Ada / Pascal / C    1,040              Synthetic; system-type code, integer only
A detailed overview of these three benchmarks, written at about the peak of their
popularity, can be found in [11]. Among these three benchmarks, only Linpack has
retained a certain importance, mainly through the popular “Top 500” list
(www.top500.org). It must be stated, and the author, Jack Dongarra, has
acknowledged, that it represents just “one application” (solution of a system of linear
equations with a dense matrix), resulting in “one number”. On the other hand, this
weakness can turn into a strength: There is hardly any computer system in the world
for which this benchmark has not been run; therefore many results are available. This
has led to the effect that systems which will never run, in real life, scientific-numeric
code like Linpack are nevertheless compared on this basis.
2.1.2 Individual Authors, Microbenchmarks
The term “microbenchmark” is used for program pieces that intentionally have been
constructed to test only one particular feature of the system under test; they do not
claim to be representative for a whole application area. However, the feature that they
test is important enough that such a specialized test is interesting. The best-known
example, and an often-used one, is probably John McCalpin’s “Stream” benchmark
(www.cs.virginia.edu/stream). It measures “sustainable memory bandwidth and the
corresponding computation rate for simple vector kernels” [7]. The once popular
“lmbench” benchmark (www.bitmover.com/lmbench), consisting of various small
programs executing individual Unix system calls, seems to be no longer actively
pursued by its author.
2.1.3 Benchmarks Owned and Administered by Industry Associations
After the success of some small single-author benchmarks in the 1980’s, it became
evident that small single-author benchmarks like Dhrystone were insufficient to
characterize the emerging larger and more sophisticated systems. For example,
benchmarks with a small working set cannot adequately measure the effect of
memory hierarchies (multi-level caches, main memory). Also, small benchmarks can
easily be subject to targeted compiler optimizations. To satisfy the need for
benchmarks that are larger and cover a broader area, technical representatives from
various computer manufacturers founded industry associations that define
benchmarks, set rules for measurements, and review and publish results.
Table 2. Benchmarks owned by industry associations
Benchmark group     Since  URL                   Language of benchmarks  Application area           Systems typically tested
------------------  -----  --------------------  ----------------------  -------------------------  ------------------------
GPC / SPEC GPC      1986   www.spec.org/gpc      C                       Graphics programs          Workstations
Perfect / SPEC HPG  1987   www.spec.org/hpg      Fortran                 Numerical programs         Supercomputers
SPEC CPU            1988   www.spec.org/osg/cpu  C, C++, Fortran         Mixed programs             Workstations, servers
TPC                 1988   www.tpc.org           Specification only      OLTP programs              Database servers
BAPCo               1991   www.bapco.com         Object code             PC applications            PCs
SPEC System         1992   www.spec.org/osg      C (driver)              Selected system functions  Servers
EEMBC               1997   www.eembc.org         C                       Mixed programs             Embedded processors
Among these benchmarking groups, SPEC seems to be some kind of a role model:
Some later associations (BAPCo, EEMBC, Storage Performance Council) have
followed, to a smaller or larger degree, SPEC’s approach in the development and
administration of benchmarks. Also, some older benchmarking groups like the
“Perfect” or GPC groups decided to use SPEC’s established infrastructure for result
publication and to continue their efforts as a subgroup of SPEC. The latest example
is the ECPerf benchmarking group, which is currently continuing its effort for a “Java
Application Server” benchmark within the SPEC Java subcommittee.
2.1.4 Benchmarks Owned and Administered by Major Software Vendors
During the last decade, benchmarks became popular that were developed by some
major software vendors. Often, these vendors are asked about sizing decisions: How
many users will be supported on a given system, running a specific application
package? Therefore, the vendor typically combined one or more of his application
packages with a fixed input and defined this as a benchmark. Later, the major system
vendors who run the benchmark cooperate with the software vendor in the evolution
of the benchmark, and there is usually some form of organized cooperation around a
software vendor’s benchmark. Still, the responsibility for result publication typically
lies with the software vendor. The attractiveness of these benchmarks for computer
customers lies in the fact that they immediately have a feeling for the programs that
are executed during the benchmark measurement: Ideally, they are the same programs
that customers run in their daily operations.
Table 3. Benchmarks administered by major software vendors
Software vendor      Since  URL                             Benchmarks covered                      Systems tested
-------------------  -----  ------------------------------  --------------------------------------  --------------
SAP                  1993   www.sap.com/benchmark/          ERP software                            Servers
Lotus                1996   www.notesbench.org              Domino and Lotus software, mainly mail  Servers
Oracle Applications  1999   www.oracle.com/apps_benchmark/  ERP software                            Servers
There are more software vendors that have created their own benchmarks, again often
as a byproduct of sizing considerations; among them are Baan, Peoplesoft, Siebel, and
others. Table 3 only lists those where the external use as a benchmark has become
more important than just sizing.
Note that in this group, there is no column “Source Language”: Although the
application package typically has been developed in a high-level language, the
benchmark code is the binary code generated by the software vendor and/or the
system vendor for a particular platform.
2.1.5 Result Collections by Third Parties
For completeness, result collections should also be mentioned where a commercial
organization does not develop a new benchmark but rather collects benchmark results
measured elsewhere. The best-known example is IDEAS International
(www.ideasinternational.com). Their web page on benchmarks displays the top
results for the TPC, SPEC, Oracle Applications, Lotus, SAP, and BAPCo
benchmarks. Such lists or collections of benchmark results from various sources may
serve a need of those that just want a short answer to the non-trivial question of
performance ranking: “Give me a simple list ranking all systems according to their
overall performance”, or “Just give me the top 10 systems, without all the details”.
Often, the media, or high-level managers hard pressed on time, want such a ranking.
However, the inevitable danger of such condensed result presentations is that
important caveats, important details of a particular result get lost.
2.2 Other Classifications: Benchmark Definition, Pricing, Areas Covered
2.2.1 Benchmark Definition
There are basically three classes of benchmark definitions, with an important
additional special case:
1. Benchmarks that are defined in source code form: This is the “classical”
   form of benchmarks. They require a compiler for the system under test but
   this is usually no problem. SPEC, EEMBC, and all older single-author
   benchmarks listed here belong to this group.
2. Benchmarks that are defined as binary codes: By definition, these
   benchmarks cover a limited range of systems only. However, the market of
   Intel/Windows compatible PCs is large enough that benchmarks covering
   only this area can be quite popular. The BAPCo benchmarks and other
   benchmarks often used by the popular PC press (not covered here) belong to
   this category.
3. Benchmarks that are defined as specifications only: The TPC benchmarks
   are defined by a specification document; TPC does not provide source code.
   Because of the need to provide a level playing field in the absence of source
   code, and to prevent loopholes, these specification documents are quite
   voluminous. For example, as of 2002, the current TPC-C definition contains
   137 pages, the TPC-W definition even 199 pages.
4. Benchmarks administered by a software vendor are a somewhat special case:
   The code running on the system under test is machine code, but it usually is
   the code sold by the software vendor to his customers, and there is typically
   a version for every major instruction set architecture / operating system
   combination. The only problem can be that in the case of a small system
   vendor, where fewer systems are sold, the software vendor may not have tuned
   the code (compilation, use of specific OS features) as well as he does in the
   case of a big system vendor, where many copies of the software are sold.
   On the one hand, the software systems sold to customers will also have this
   property; so one can say that the situation represents real life. On the other
   hand, the feeling remains that this is somewhat unfair, penalizing smaller
   system vendors.
In all cases, even in the case where the code executed on the system is given in source
or binary form, a benchmark is defined not only by the code that is executed but also
by input data and by a document, typically called “Run and Reporting Rules”; it
describes the rules and requirements for the measurement environment.
2.2.2 Price / Performance
Some benchmarks include “pricing” rules, i.e. result quotations must contain not only
a performance metric but also a price/performance metric. Since its beginnings, TPC
results have included such a metric, e.g. “price per tpm-C”. Among the other
benchmarks, only the Lotus benchmark and the SPEC GPC benchmarks have a
price/performance metric.
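As an illustration of what a price/performance metric such as “price per tpm-C” expresses, the following minimal sketch divides a total configuration price by a measured throughput. The function name, the system names, and all prices and throughput figures are hypothetical; the actual pricing rules of a benchmark organization define in detail which components enter the price and are not reproduced here.

```python
def price_performance(total_price: float, throughput: float) -> float:
    """Return the price per unit of throughput (e.g. currency units per tpm-C)."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return total_price / throughput


# Hypothetical configurations: (name, total price, measured throughput).
systems = [
    ("System A", 500_000.0, 25_000.0),
    ("System B", 2_400_000.0, 80_000.0),
]

# Rank by price/performance (lower is better), not by raw throughput.
ranked = sorted(systems, key=lambda s: price_performance(s[1], s[2]))
```

In this made-up example the system with the lower throughput ranks first on price/performance, which is the corrective effect on pure hardware accumulation that such a metric is meant to have.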
The value of pricing in benchmarks is often subject to debate, in the benchmark
organizations themselves and in the press. Arguments for pricing are:
- Customers naturally are interested in prices, and prices determined according
  to uniform pricing rules set by the benchmark organization have a chance to
  be more uniform than, say, prices published by a magazine.
- There is always a tendency among system vendors to aim for the top spot in
  the performance list. The requirement to provide price information can be a
  useful corrective against benchmark configurations that degrade into sheer
  battles of material: If a benchmark scales well, e.g. for clusters, then whoever
  can accumulate enough hardware in the benchmark lab wins the competition
  for performance. The requirement to quote the price of the configuration may
  prevent such useless battles.
On the other hand, there are the arguments against pricing:
- In the case of systems that are larger than just a single workstation, prices are
  difficult to determine and have many components: Hardware, software,
  maintenance. It is hard to find uniform criteria for all components, in
  particular for maintenance; different companies may have different business
  models.
- In the computer business, prices get outdated very fast. It is tempting but
  misleading to compare a price published today with a price published a year
  ago.
- With the goal to be realistic, some pricing rules (e.g. TPC’s rules) allow
  discounts, provided that they are generally available. On the other hand, this
  allows system vendors to grant such discounts just for configurations that
  have been selected with an eye on important benchmark results, making the
  price less realistic than it appears.
- Experience in benchmark organizations like TPC shows that a large
  percentage of result challenges have to do with pricing. This distracts energy
  from the member organizations that could be better spent in the improvement
  of benchmarks.
Overall, the arguments against pricing appear to be more convincing. The traditional
approach of the SPEC Open Systems Group (OSG), “Have the test sponsor publish, in
detail, all components that were used, and encourage the interested reader to get a
price quotation directly from vendors’ sales offices”, seems to work quite well.
2.2.3 Areas Covered by Benchmarks
Finally, an important classification of benchmarks is related to the area the
benchmarks intend to cover. One such classification is shown in Figure 1.
[Figure 1: chart of the testing range per benchmark category. Layers shown:
Application and Benchmark Kit; Operating system, Compiler, Libraries. Components
shown: CPU, Cache, Memory, Multi-CPU, Disk-IO, LAN, DBMS. Categories and
benchmarks: CPU (SPEC CPU2000); CPUs (SPEC CPU2000, capacity); Java VM
(SPECjbb2000); Web server (SPECweb99); OLTP (TPC-C); DSS (TPC-H, TPC-R);
e-Commerce (TPC-W); ERP (SAP); ERP (Oracle Applications).]
Fig. 1. Areas covered by some major benchmarks (from [10]).
Of course, the benchmark coverage also has some relation to the cost of
benchmarking. For example, if networking is involved, the benchmark setup typically
includes one or more servers and several (often many) clients. It is not surprising that
the number of results for such benchmarks, where measurements take weeks or
months, is much smaller than for those that involve only one system.
3 A Closer Look at the More Popular Benchmarks
In this section, the more important benchmarks from SPEC, TPC, and SAP are
covered. They are, for large systems (servers), the most widely quoted benchmarks.
3.1 SPEC CPU Benchmarks, Measurement Methods
A CPU benchmark suite, now called CPU89, was SPEC’s first benchmark product.
When SPEC was founded in 1988, the explicit intention was to provide something
better than the small single-author benchmarks that had been used before, and to
provide standardized versions of larger programs (e.g. gcc, spice) that were already
used by some RISC system vendors.
Since 1989, SPEC has replaced the CPU benchmarks three times, with CPU92,
CPU95, and CPU2000. Currently, the SPEC CPU subcommittee is working on
CPU2004, intended to replace the current suite CPU2000. On the other hand, the
principle of SPEC CPU benchmarking has been remarkably consistent:
- A number of programs are contributed by the SPEC member companies, by
  the open source community, or by researchers in the academic community.
  For the CPU2000 suite, and again for CPU2004, SPEC has initiated an
  award program to encourage such contributions.
- The member companies that are active in the CPU subcommittee port the
  benchmarks to their various platforms; dependency on I/O or operating
  system activity is removed, if necessary. Care is taken that all changes are
  performance-neutral across platforms. If possible, the cooperation of the
  original program’s author(s) is sought for all these activities.
- The benchmarks are tested in a tool harness provided by SPEC. Compilation
  and execution of the benchmarks is automated as much as possible. The
  tester supplies a “configuration file” with system-specific parameters (e.g.
  location of the C compiler, compilation flags, libraries, description of the
  system under test).
Table 4. SPEC’s CPU benchmarks over the years
                                   CPU89   CPU92   CPU95    CPU2000
---------------------------------  ------  ------  -------  ----------------
Integer programs                   4       6       8        12
Floating-point programs            6       14      10       14
Total source lines, integer        77,100  85,500  275,000  389,300
Total source lines, FP             24,200  44,000  20,600   158,300
Source languages                   C, F77  C, F77  C, F77   C, C++, F77, F90
Number of results through Q1/2002  191     1292    1881     1043
SPEC’s current CPU2000 suite is described in detail in [5]; this article also describes
how SPEC selects its CPU benchmarks, and some problems that have to be solved in
the definition of a new benchmark suite. Table 4 summarizes some key data on
SPEC’s CPU benchmarks (Note: Source line counts do not include comment lines).
The popular “speed” metric typically measures the execution time of each program in
the suite when it is executed on one CPU (other CPUs are removed or deactivated); it
replaces old “MIPS” metrics that had previously been used. The execution time of
each program is set in relation to a “SPEC Reference Time” (execution time on a
specific older popular system), and the geometric mean of all these performance ratios
is computed as an overall figure of merit (SPECint, SPECfp).
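The computation described above can be sketched as follows: each program's reference time is divided by its measured time, and the geometric mean of these ratios gives the overall figure. The function names, the program names, and all timings below are hypothetical, and any constant scaling factor that a particular suite may apply is left out.

```python
import math


def performance_ratios(reference_times, measured_times):
    """Ratio of reference time to measured time per program (higher is faster)."""
    return {
        name: reference_times[name] / measured_times[name]
        for name in reference_times
    }


def geometric_mean(values):
    """Geometric mean of positive values, computed via logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times in seconds for a two-program suite.
reference = {"prog_a": 1000.0, "prog_b": 2000.0}
measured = {"prog_a": 250.0, "prog_b": 1000.0}

ratios = performance_ratios(reference, measured)
overall = geometric_mean(ratios.values())
```

With a geometric mean, doubling the speed of any one program changes the overall figure by the same factor, regardless of which program it is or how long its reference time is.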
Soon after the introduction of the CPU89 suite, manufacturers of multi-CPU systems
wanted to show the performance of their systems with the same benchmarks. In
response, SPEC developed rules for a “SPECrate” (throughput) computation: For
each benchmark in the suite, several copies are executed simultaneously (typically, as
many copies as there are CPUs in the system), and total throughput of this parallel
execution (jobs per time period) is computed on a per-benchmark basis. A geometric
mean is computed, similar to the “speed” case.
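The throughput idea can be illustrated with a small sketch: per benchmark, the number of simultaneously executed copies is divided by the elapsed time, and the per-benchmark values are combined with a geometric mean. This is a simplified illustration, not SPEC's official formula; any normalization against reference times is not shown, and all names and numbers are hypothetical.

```python
import math


def throughput(copies: int, elapsed_seconds: float) -> float:
    """Jobs completed per second when `copies` jobs run in parallel."""
    if copies <= 0 or elapsed_seconds <= 0:
        raise ValueError("copies and elapsed time must be positive")
    return copies / elapsed_seconds


def overall_rate(runs):
    """Geometric mean of the per-benchmark throughput values.

    `runs` maps a benchmark name to (copies, elapsed_seconds).
    """
    rates = [throughput(copies, elapsed) for copies, elapsed in runs.values()]
    return math.exp(sum(math.log(r) for r in rates) / len(rates))


# Hypothetical 4-CPU system: four copies of each benchmark run simultaneously.
runs = {"prog_a": (4, 2.0), "prog_b": (4, 8.0)}
rate = overall_rate(runs)
```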
In its official statements, SPEC emphasizes the advice “Look at all the numbers”.
This means that a customer interested in a particular application area weights
benchmarks from this area higher than the rest. Experts can draw even more
interesting conclusions, correlating, for example, the working sets for particular
benchmarks with the test system’s cache architecture and cache sizes [5]. However,
such evaluations remained, to a large degree, an area of experts only; customers rarely
look at more than the number achieved in the overall metric.
3.2 Evolution of the SPEC CPU Benchmarks, Issues
Despite the consistency of the measurement method, a number of new elements were
brought into the suite over the years. Both the SPEC-provided tool harness and the
Run Rules regulating conformant executions of the suite grew in complexity. The
most important single change was the introduction of the “baseline” metric in 1994.
In the first publications of SPEC CPU benchmark results, the “Notes” section where
all compilation parameters (in Unix terminology: “Flags”) must be listed, consisted of
a few short lines, like
    Optimization was set to -O3 for all benchmarks except fppp and
    spice2g6, which used -O2
Over time, this changed drastically; today, one to three small-print lines per
benchmark like
    197.parser: -fast -xprefetch=no%auto -xcrossfile -xregs=syst
    -Wc,-Qgsched-trace_late=1,-Qgsched-T4 -xalias_level=strong
    -Wc,-Qipa:valueprediction -xprofile
listing the optimization flags used just for this single benchmark, are not uncommon.

Compiler writers had always used internal variables controlling, for example, the
tradeoff between the benefit of inlining (elimination of calling sequences etc.) and its
negative effects (longer code, more instruction cache misses). Now, these variables
were made accessible to the compiler user, an easy and straightforward change in the
compiler driver. However, the user often does not have the knowledge to apply the
additional command-line parameters properly, nor the time to optimize their use. In
addition, sometimes “assertion” flags are used which help a particular benchmark but
which would cause other programs to fail. This soon prompted the question whether
such excessive optimizations are representative for real programming, or whether it
would be better to “burn all flags” [13]. The issue was later discussed in academic
publications [1,8] and in the trade press.
After a considerable debate, the final compromise for SPEC, established in January
1994, was that two metrics were established, “baseline” and “peak”: Every SPEC
CPU benchmark measurement has to measure performance with a restricted
“baseline” set of flags (metric name: SPECint_base2000 or similar), and optionally
with a more extended “peak” set of flags. In its own publications, on the SPEC web
site, SPEC treats them equally. In marketing material, most companies emphasize the
higher peak numbers; therefore even the existence of a baseline definition may be
unknown to some users of SPEC results. The exact definition of “baseline” can be
found in the suite’s Run and Reporting Rules. For CPU2000, it can be summarized as
follows:
- All benchmarks of a suite must be compiled with the same set of flags (except
  flags that may be necessary for portability).
- The number of optimization flags is restricted to four.
- Assertion flags (flags that assert a certain property of the program, one that may
  not hold for other programs, and that typically allow more aggressive
  optimizations) are not allowed: One of the basic principles of baseline is that
  its flags are “safe”, i.e. do not lead to erroneous behavior for any
  language-compliant program.
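The three summarized rules can be expressed as a simple check. The following is only a
minimal sketch under stated assumptions: the `Flag` record, the classification of each
flag as "optimization", "portability" or "assertion", and the function name
`baseline_violations` are hypothetical and are not part of SPEC's tools or of the Run
and Reporting Rules, which remain the authoritative definition.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_OPTIMIZATION_FLAGS = 4  # limit taken from the summary above


@dataclass(frozen=True)
class Flag:
    name: str  # hypothetical flag spelling
    kind: str  # assumed classification: "optimization", "portability", "assertion"


def baseline_violations(builds: Dict[str, List[Flag]]) -> List[str]:
    """Return readable violations of the three summarized baseline rules."""
    problems: List[str] = []
    reference = None
    for bench, flags in builds.items():
        # Rule 1: same flags everywhere, portability flags excepted.
        common = frozenset(f.name for f in flags if f.kind != "portability")
        if reference is None:
            reference = common
        elif common != reference:
            problems.append(f"{bench}: flag set differs from the other benchmarks")
        # Rule 2: at most four optimization flags.
        n_opt = sum(1 for f in flags if f.kind == "optimization")
        if n_opt > MAX_OPTIMIZATION_FLAGS:
            problems.append(f"{bench}: {n_opt} optimization flags exceed the limit")
        # Rule 3: no assertion flags.
        for f in flags:
            if f.kind == "assertion":
                problems.append(f"{bench}: assertion flag {f.name} is not allowed")
    return problems
```

A peak build would simply not be passed through such a check, since peak allows
per-benchmark flag sets.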
The idea is now generally accepted that it makes sense to have, in addition to the
“everything goes (except source code changes)” of the “peak” results, a “baseline”
result. However, the details are often controversial; they emerge as a compromise in
SPEC’s CPU subcommittee. The question “What is the philosophy behind baseline?”
may generate different answers if different participants are asked. A possible answer
could be:
Baseline rules serve to form a “baseline” of performance that takes into account
- reasonable ease of use for developers,
- correctness and safety of the generated code,
- recommendations of the compiler vendor for good performance,
- representativeness of the compilation/linkage process for what happens in the
  production of important software packages.
Again, it cannot be disputed that in some cases, individual points may contradict each
other: What if the vendor of a popular compiler sets, for performance reasons, the
default behavior to a mode that does not implement all features required by the
language definition? This usage mode may be very common (most users lack the
expertise to recognize such cases anyway), but correctness of the generated code, as
defined by the language standard, is not guaranteed. Since SPEC releases the CPU
benchmarks in source code form, correct implementation of the programming
language as defined by the standard is a necessary requirement for a fair comparison.
The baseline topic is probably the most important single issue connected with the
SPEC CPU benchmarks. But there are other important issues as well:
- How should a compute-intensive multi-CPU metric be defined? The current
  SPECrate method takes a simplistic approach: Execute, for example on an n-CPU
  system, n copies of a given benchmark in parallel, and record their start and
  end times. This introduces some artificial properties into the workload. In a
  real-life compute-intensive environment, e.g. in a university computing center,
  batch jobs come and go at irregular intervals and not in lockstep. Typically,
  there is also an element of overloading: More jobs are in the run queue than
  CPUs in the system.
- SPEC needs to distribute its CPU benchmarks as source code. For the
  floating-point suite, it seems possible to get, with some effort, good,
  state-of-the-art codes from researchers (engineering, natural sciences). It is
  more difficult to get similarly good, representative source codes in
  non-numeric programming. Such programs are typically very large, and the
  developers, the vendors of such software, cannot give them away for free in
  source code form. Traditionally, SPEC has relied heavily on integer programs
  from the Open Source community (GNU C compiler, Perl, etc.), but these programs
  are only one part of the spectrum.
- It has been observed [2] that the SPEC CPU benchmarks, in particular the
  integer benchmarks, deviate in their cache usage from what is typically
  observed on “live” systems. The observations in [2] were for the CPU92
  benchmarks but the tendency can still be observed today. To some degree, such
  differences are an unavoidable consequence of the measurement method: For the
  purpose of benchmarking, with a reasonably long measurement interval, a SPEC
  CPU benchmark runs for a longer time uninterrupted on one CPU than in real-life
  environments where jobs often consist of several cooperating processes. Due to
  the absence of context switches, SPEC CPU measurements show almost no process
  migration with subsequent instruction cache invalidations.
In particular the last point should be taken by SPEC as a reason to think about other
alternatives to measure the raw computing power of a system: In a time when
multiple CPUs are placed on a single die, does it make sense to artificially isolate the
speed of one CPU for a traditional “speed” measurement?
3.3 SPEC System Benchmarks
SPEC OSG started with CPU benchmarks but very soon also developed benchmarks
that measured system performance. The areas that SPEC chose to put efforts in were
determined by a perception of the market demands as seen by the SPEC OSG member
companies. For example, when the Internet and Java gained popularity, SPEC OSG
soon developed its Web and JVM benchmarks. Currently, Java on servers and mail
servers are seen as hot topics; therefore, Java server benchmarks and mail server
benchmarks are areas where SPEC member companies invest considerable efforts in
the development of new benchmarks. There are also areas where SPEC’s efforts were
unsuccessful: SPEC worked for some time on an I/O benchmark but finally could not
find a practical way between raw device measurements and system-specific I/O
library calls. (These efforts apparently are taken up now by a separate industry
organization, the “Storage Performance Council”, see www.storageperformance.org.)
The only area intentionally left out by SPEC is transaction processing, the traditional
domain of the sister benchmarking organization TPC.
It is important to realize that SPEC’s system benchmarks are both broader and
narrower than the component (CPU) benchmarks:
- They are broader than the CPU benchmarks because they test more components
  of the system, typically including the OS, networking, and (for some
  benchmarks) the I/O subsystem.
- They are narrower than the CPU benchmarks because they test the system when
  it is executing specific, specialized tasks only, e.g. acting as a file server, a web
  server or a mail server.
This narrower scope of some system benchmarks is not unrelated to real-life practice:
Many computers are exclusively used as file servers, web servers, database servers,
mail servers, etc. Therefore it makes sense to test them in such a limited scenario
only.
Most system benchmarks present results in the form of a table or a curve, giving, for
example, the throughput correlated with the response time. This corresponds to
SPEC’s philosophy “Look at all the numbers”, just as the CPU benchmark suites
give the results for each individual benchmark. However, SPEC is realistic enough to
know that the market often demands a “single figure of merit”, and has defined such a
number for each benchmark (e.g. maximum throughput, throughput at or below a
specific response time). Table 5 lists SPEC’s current system benchmarks, with the
number of publications over the years.
Table 5. SPEC OSG system benchmark results over the years
Year      SDM   SFS     SPECweb96  SPECweb99  SPECjvm98  SPECjbb2000  SPECmail2001
1991       51   -       -          -          -          -            -
1992       17   -       -          -          -          -            -
1993        5   21      -          -          -          -            -
1994        6   18      -          -          -          -            -
1995       15   21      -          -          -          -            -
1996        -   14      22         -          -          -            -
1997        -   36      50         -          -          -            -
1998        -   19      80         -          21         -            -
1999        -   96      61         5          26         -            -
2000        -   62      12         49         23         22           -
2001        -   25+34   -          57         4          57           6
Q1/2002     -   21      -          13         3          21           3
Overall    94   215     225        124        77         100          9
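The “single figure of merit” mentioned before Table 5 (maximum throughput, or
throughput at or below a specific response time) can be read off a measured
throughput/response-time curve. The sketch below only illustrates that idea; the
function names, the units and the representation of the curve as a list of points are
assumptions, and each SPEC benchmark defines its own metric in its run rules.

```python
from typing import List, Optional, Tuple

# One measured load point: (throughput in operations/s, response time in ms).
Point = Tuple[float, float]


def max_throughput(curve: List[Point]) -> float:
    """Highest throughput reached at any load level."""
    return max(throughput for throughput, _ in curve)


def throughput_within(curve: List[Point], limit_ms: float) -> Optional[float]:
    """Highest throughput whose response time stays at or below limit_ms.

    Returns None if no load point satisfies the limit.
    """
    admissible = [tp for tp, rt in curve if rt <= limit_ms]
    return max(admissible) if admissible else None
```

The two functions generally give different numbers for the same curve, which is one
reason why the full table or curve remains more informative than either single value.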
Some general differences from the CPU benchmarks are:
- The workload is, almost inevitably, synthetic. Whereas SPEC avoids synthetic
  benchmarks for the CPU suites, workloads for file server or web server
  benchmarks cannot be derived from “real life” without extensive trace
  capturing/replay capabilities that make use in benchmarks impractical. On the
  other hand, properties that are important for system benchmarks, e.g. file sizes,
  can be more easily modeled in synthetic workloads.
- The results are more difficult to understand; therefore the benchmarks are
  possibly not as well known and not as popular as the CPU benchmarks.
- Finally, the measurement effort is much larger. Typically, these benchmarks need
  clients/workload generators that add considerably to the cost of benchmarking.
  Therefore, fewer results exist than for the CPU benchmarks.
3.3.1 SDM Benchmark
SPEC’s first system benchmark suite, released in 1991, was SDM (System
Development Multiuser). It consists of two individual benchmarks that
differ in some aspects (workload ingredients, think time) but have many properties in
common: Both have a mix of general-purpose Unix commands (ls, cd, find, cc, ...) as
their workload. Both use scripts or simulated concurrent users that put more and more
stress on the system, until the system becomes overloaded and the addition of users
results in a decrease in throughput.
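The load-ramping idea described above can be sketched as a small search loop. This is
not SDM code: the function `find_peak`, its parameters and the `measure` callback
(assumed to run the workload with a given number of simulated users and return the
observed throughput) are hypothetical names introduced for illustration.

```python
from typing import Callable, Tuple


def find_peak(measure: Callable[[int], float],
              max_users: int, step: int = 1) -> Tuple[int, float]:
    """Add simulated users until throughput drops; return (users, throughput) at the peak."""
    best_users, best_throughput = 0, 0.0
    users = step
    while users <= max_users:
        throughput = measure(users)
        if throughput < best_throughput:
            break  # more users now reduce throughput: the system is overloaded
        best_users, best_throughput = users, throughput
        users += step
    return best_users, best_throughput


def toy_system(users: int) -> float:
    """Made-up throughput model: linear gain up to 20 users, then degradation."""
    return 10.0 * users if users <= 20 else 200.0 - 5.0 * (users - 20)
```

With the made-up `toy_system` model, `find_peak(toy_system, 100)` stops as soon as the
21st user lowers the throughput and reports the 20-user load point as the peak.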
After initial enthusiasm, interest in the benchmark died down. Today, SDM results are
no longer requested by customers, and therefore not reported by SPEC member
companies. This is an interesting phenomenon because we know that many SPEC
member companies still use SDM heavily as an internal performance tracking tool:
Whenever a new release of the OS is developed, its performance impact is measured
with the SDM benchmarks. Why has it then been unsuccessful as a benchmark? It
seems that there are several reasons:
- SDM measures typical Unix all-around systems; today, we have more
  client-server configurations.
- For marketing, the fact that there is not an overall, easy-to-explain single figure
  of merit (only individual results for sdet and kenbus; no requirement to report
  both) is probably a severe disadvantage.
- Most important, the ingredients are the system’s Unix commands, and several of
  them (cc, nroff) take a large percentage of time in the benchmark. This opens up
  possibilities for SDM-specific versions of these commands, e.g. a compiler (cc
  command) that does little more than syntax checking: Good for great SDM
  numbers but useless otherwise.
SDM is a good example of the experience that a good performance measurement tool
is not necessarily a good benchmark: An in-house tool needs no provisions against
unintended optimizations (“cheating”); the user would only cheat himself or herself.
A benchmark whose results are used in marketing must have additional qualities: It
must be tamper-proof against substitution of fast, special-case, benchmark-specific
components where normally other software components would be used.
3.3.2 SFS Benchmark
The SFS benchmark (System benchmark, File Server, first release 1993) is SPEC’s
first client-server benchmark and has established a method that was later used for
other benchmarks also: The benchmark code runs solely on the clients, which generate
NFS requests like lookup, getattr, read, write, etc. Unix is required for the clients, but
the server, the system under test, can be any server capable of accepting NFS
requests. SFS tries to be as independent from the clients’ performance as possible,
concentrating solely on measuring the server’s performance.
Despite the large investment necessary for the test sponsor (the setup for a large
server may include as many as 480 disks for the server, and as many as 20
load-generating clients), there has been a steady flow of result publications, and SFS is
the established benchmark in its area. In 1997, SFS 1.0 was replaced by SFS 2.0. The
newer benchmark covers the NFS protocol version 3, and it has a newly designed mix
of operations.
In summer 2001, prompted by observations during result reviews, SPEC discovered
significant defects in its SFS97 benchmark suite: Certain properties built into the
benchmark (periodic changes between high and low file system activities, distribution
of file accesses, numeric accuracy of the random process selecting the files that are
accessed) were no longer guaranteed with today’s fast processors. As a consequence,
SPEC has suspended sales of the SFS97 (2.0) benchmark and replaced it by SFS97
R1 V3.0. Result submissions had to start over, since results measured by the defective
benchmark cannot be used without considerable care in interpretation.
3.3.3 SPECweb Benchmarks
The SPECweb benchmark (Web server benchmark, first release 1996) was derived
from the SFS benchmark, and it has many properties in common with SFS:
- It measures the performance of the server and tries to do this, as much as
  possible, independently from the clients’ performance.
- A synthetic load is generated on the clients, generating HTTP requests. In the
  case of SPECweb96, these were static GET requests only, the most common type
  of HTTP requests at that time. SPECweb99 added dynamic requests (dynamic
  GET, dynamic POST, simulation of an ad rotation scheme based on cookies).
- The file size distribution is based on logs from several large web sites; the file
  set size is required to scale with the performance.
Unlike SFS, the benchmark code can run on NT client systems as well as on
Unix client systems. Similar to SFS, the server can be any system capable of serving
HTTP requests. As is apparent from the large number of result publications (see Table
5), SPEC’s web benchmarks have become very popular.
When it became evident that electronic commerce now constitutes a large percentage
of WWW usage, and that this typically involves use of a “Secure Socket Layer”
(SSL) mechanism, SPEC responded with a new Web benchmark, SPECweb99_SSL.
In the interest of a fast release of the benchmark, SPEC did not change the workload
but just added SSL usage to the existing SPECweb99 benchmark. A new SSL
handshake is required whenever SPECweb99 terminates a “keepalive” connection (on
average, every tenth request). Of course, it makes no sense to compare
SPECweb99_SSL results with (non-SSL) SPECweb99 results. An exception can be
results that have been obtained for the same system; they can help to evaluate the
performance impact of SSL encryption on the server.
3.3.4 SPEC Java Benchmarks
In the years 1997/1998, when it became clear that Java performance was a hot topic in
industry, SPEC felt compelled to produce, during a relatively short time, a suite for
Java benchmarking. Previous benchmark collections in this area (CaffeineMark etc.)
had known weaknesses that made them vulnerable to specific optimizations. The first
SPEC Java benchmark suite followed the pattern of the established CPU benchmarks:
A collection of programs, taken from real applications where possible, individual
benchmark performance ratios, and the geometric mean over all benchmarks as a
“single figure of merit”. In the design of the benchmark suite and the run rules for the
suite, several new aspects had to be dealt with:
- Garbage collection performance, even though it may occur at unpredictable
  times, is important for Java performance.
- Just-In-Time (JIT) compilers are typically used with Java virtual machines, in
  particular for systems for which the manufacturer wants to show good
  performance.
- During the first years, Java was typically used on small client systems (e.g.
  within web browsers), and memory size is an important issue for such systems.
- Finally, SPEC had to decide whether benchmark execution needed to follow the
  strict rules of the official Java definition, or whether some sort of offline
  compilation should be allowed.
SPEC decided to start with a suite that requires strict compliance with the Java Virtual
Machine model. A novel feature of SPECjvm98 results is their grouping into three
categories according to the memory size: Under 48 MB, 48 to 256 MB, over 256 MB.
The experience seems to show that the test sponsors (mostly hardware manufacturers)
followed SPEC’s suggestion and produced results not only for big machines but also
for “thin” clients.
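Two of the mechanisms mentioned above, the geometric mean over individual benchmark
ratios and the grouping of results by memory size, can be sketched in a few lines. The
function names and the example program names are hypothetical, and the treatment of
the exact boundary values (48 MB and 256 MB are placed in the middle category here) is
an assumption; the SPECjvm98 run rules are authoritative.

```python
import math
from typing import Dict


def geometric_mean(ratios: Dict[str, float]) -> float:
    """Geometric mean of per-benchmark performance ratios (all must be > 0)."""
    values = list(ratios.values())
    return math.exp(sum(math.log(v) for v in values) / len(values))


def memory_category(memory_mb: float) -> str:
    """Assign a result to one of the three memory-size categories."""
    if memory_mb < 48:
        return "under 48 MB"
    if memory_mb <= 256:  # boundary handling is an assumption of this sketch
        return "48 to 256 MB"
    return "over 256 MB"


# Illustrative ratios for two made-up programs, not measured data.
example_ratios = {"program_a": 2.0, "program_b": 8.0}
```

Unlike the arithmetic mean, the geometric mean of ratios does not depend on which
system was chosen as the reference, which is the usual argument for using it as a
summary of normalized results.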
During the last years, Java became more popular as a programming language not only
for small (client) systems, but also for servers. SPEC responded with the
SPECjbb2000 Java Business Benchmark which has become quite popular.
Manufacturers of server systems now use it to demonstrate the capabilities of their
high-end servers. The Java code executed in SPECjbb2000 resembles TPC’s TPC-C
benchmark (warehouses, processing of orders; see section 3.4) but the realization is
different: Instead of real database objects stored on disks, Java objects (in memory)
are used to represent the benchmark’s data; therefore Java memory administration and
garbage collection play an important role for performance. Overall, the intention is to
model the middle tier in three-tier software systems; such software is now often
written in Java.
Currently, SPEC is working on another benchmark that relies more directly on Java
application software. A “Java Application Server” benchmark SPECjAppServer will
include the use of program parts that use popular Java application packages
(Enterprise Java Beans™) and will also involve the use of database software. In this
benchmark, client systems drive the server; therefore the cost of benchmarking will be
much higher.
3.4 TPC Benchmarks
Like SPEC, the Transaction Processing Performance Council (TPC), founded in 1988,
is a non-profit corporation with the mission to deliver objective performance
evaluation standards to the industry. However, TPC focuses on transaction processing
and database benchmarks. Another important difference from the SPEC OSG
benchmarks is the fact that TPC, from its beginning, included a price/performance
metric and made price reporting mandatory (see section 2.2). The most widely used
benchmark of this organization, and therefore the classic system benchmark, is
TPC-C, published in 1992, an OLTP benchmark simulating an order-entry business
scenario.
The TPC-C benchmark simulates an environment where virtual users are concurrently
executing transactions against a single database. The environment models a wholesale
supplier, i.e. a company running a certain number of warehouses, each with an
inventory of items on stock. Each warehouse is serving ten sales districts, and each
district has 3000 customers. Sales clerks for each district are the virtual users of the
benchmark who execute transactions. There are five types of transactions of which the
new-order type is the most important, representing the entry of an order on behalf of a
customer. The number of order entries per minute the system can handle is the
primary performance metric; it is called tpmC (transactions per minute, TPC-C).
The user simulation has to obey realistic assumptions on keying and think times. For
the database response times, the benchmark defines permissible upper limits, e.g. five
seconds for the new-order transaction type. Further constraints apply to the shares of
the transaction types, e.g. only about 45% of all transactions are new order entries; the
remaining share is distributed to payments, posting of deliveries, and status checks.
Reporting only the rate of new order entries is motivated by the desire to express a
business throughput instead of providing abstract technical quantities. As transactions
cannot be submitted to the system at arbitrary speed due to keying and think times, a
dependency is enforced between the throughput and the number of emulated users, or
the number of configured warehouses and ultimately the size of the database. The
warehouse with its ten users is the scaling unit of the benchmark; it comes with
customer and stock data and an initial population of orders that occupy physical disk
space.
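The scaling relationship described above follows directly from the numbers given in
the text (ten districts per warehouse, 3000 customers per district, ten users per
warehouse). The sketch below only restates this arithmetic; the names `Scale`,
`scale_for`, `tpmc` and `new_order_share` are hypothetical, and the TPC-C
specification, not this sketch, defines the exact rules.

```python
from dataclasses import dataclass
from typing import Dict

DISTRICTS_PER_WAREHOUSE = 10    # from the text
CUSTOMERS_PER_DISTRICT = 3000   # from the text
USERS_PER_WAREHOUSE = 10        # from the text: "the warehouse with its ten users"


@dataclass(frozen=True)
class Scale:
    warehouses: int
    districts: int
    customers: int
    users: int


def scale_for(warehouses: int) -> Scale:
    """Derive the configuration size from the number of warehouses."""
    districts = warehouses * DISTRICTS_PER_WAREHOUSE
    return Scale(warehouses, districts,
                 districts * CUSTOMERS_PER_DISTRICT,
                 warehouses * USERS_PER_WAREHOUSE)


def tpmc(new_orders_completed: int, interval_minutes: float) -> float:
    """New-order transactions per minute over the measurement interval."""
    return new_orders_completed / interval_minutes


def new_order_share(counts: Dict[str, int]) -> float:
    """Fraction of all executed transactions that were new-order entries."""
    return counts["new_order"] / sum(counts.values())
```

Because the number of users grows only with the number of warehouses, a higher
reported tpmC necessarily implies a larger configured database.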
Since TPC does not define its benchmarks via source code, the database vendors,
often in cooperation with the major system vendors, typically produce a “benchmark
kit” which implements the benchmark for a specific hardware/software configuration.
It is then the task of the TPC-certified auditor to check that the kit implements the
benchmark correctly, in addition to verifying the performance data (throughput,
response times). An important requirement checked by the auditor, which is unique to
TPC, is the set of so-called “ACID properties”:
- Atomicity: The entire sequence of actions must be either completed or
  aborted.
- Consistency: Transactions take the resources from one consistent state to
  another consistent state.
- Isolation: A transaction’s effect is not visible to other transactions until the
  transaction is committed.
- Durability: Changes made by committed transactions are permanent and
  must survive system failures.
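Atomicity, the first of these properties, can be illustrated with a very small example.
The sketch uses Python's standard `sqlite3` module purely as a convenient stand-in for
a “real database”; it is not a TPC-C kit, and the `stock` table, its columns and the
`transfer` function are hypothetical.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move stock between two warehouses as one atomic transaction."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE stock SET qty = qty - ? WHERE wh = ?", (amount, src))
        row = conn.execute("SELECT qty FROM stock WHERE wh = ?", (src,)).fetchone()
        if row is None or row[0] < 0:
            raise ValueError("insufficient stock")  # aborts the whole sequence
        conn.execute("UPDATE stock SET qty = qty + ? WHERE wh = ?", (amount, dst))


# Illustrative in-memory database with two made-up warehouses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (wh TEXT PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("A", 5), ("B", 0)])
conn.commit()
```

If a transfer fails after the first update, the rollback also undoes that update, so
other transactions never observe a half-completed sequence. An in-memory database
says nothing about durability, which is exactly the kind of shortcut an audit is
meant to rule out.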
The rationale for these “ACID properties” is the fundamental requirement that a
benchmark must be representative for an application environment, and that therefore,
the database used needs to be a “real database”. In the case of transaction processing,
faster execution could easily be achieved if the implementation dropped one or
more of the “ACID property” requirements. But customers would not have enough
trust in such an environment to store their business data in it,
and any performance results for such an unstable environment would be meaningless.
Over the years, TPC-C underwent several revisions; the current major revision is 5.0.
It differs from revision 3.5 in some aspects only (revised pricing rules); the algorithms
remained unchanged. An earlier attempt (“revision 4.0”) to make the benchmark
easier to handle (e.g. fewer disks required), and at the same time more realistic (e.g.
more complex transactions) failed and did not get the necessary votes within TPC. It
can be assumed that the investment that companies had made into the existing results
(which would then lose their value for comparisons) played a role in this decision.
Complementing TPC-C, TPC introduced, in 1995, its “decision support” benchmark
TPC-D. While TPC-C is update-intensive and tries to represent on-line transactions as
they occur in real life, TPC-D modeled transactions that change the database content
quite infrequently but perform compute-intensive operations on it, with the goal of
supporting typical business decisions. This different task immediately had
consequences for practical benchmarking:
- Decision support benchmarks are more CPU-intensive and less I/O-intensive
  than transaction processing benchmarks.
- Decision support benchmarks scale better for cluster systems.
In 1999, TPC faced a problem that can be, in a larger sense, a problem for all
benchmarks where the algorithm and the input data are known in advance to the
implementors: It is possible for the database implementation to store data not only in
the customary form as relations (n-tuples of values, in some physical representation)
but also to anticipate computations that are likely to be performed on the data later, and to
store partial results of such computations. The database can then present these precomputed
“materialized views”, as they are called, to the user. It is only a short step from this
observation to the construction of materialized views that are designed with an eye
towards the requirements of the TPC-D benchmark. Within a short time, such
implementations brought a huge increase in reported performance, and TPC was
forced to set new rules or to withdraw the benchmark. It decided to replace TPC-D
with two successor benchmarks that differ with respect to optimizations based
on advance knowledge of the queries: TPC-R allows them, TPC-H does not. It turned
out that the users of TPC benchmark results apparently considered a situation with
materialized views as not representative of their environment. After two initial
results in 1999/2000, no more TPC-R results were published, and TPC-R can be
considered dead. In a broad sense, TPC-R and TPC-H can be compared with SPEC’s
“peak” and “baseline” metrics: Similar computations, but in one case, more
optimizations are allowed. It is interesting to note that the TPC customers apparently
were more interested in the “baseline” version of decision support benchmarks.
Realizing that the split into two benchmarks was a temporary solution only, TPC
is currently working on a common successor benchmark for both TPC-H and TPC-R.
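The effect of storing partial results in advance can be illustrated with a small sketch. It is not taken from any TPC kit: the table and column names (sales, region, price, revenue_by_region) are hypothetical, and because SQLite has no materialized views, a precomputed summary table stands in for one.

```python
import sqlite3


def build_database(rows):
    """Create an in-memory database holding a base table plus a precomputed summary.

    `rows` is an iterable of (region, price) pairs. All names are hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, price REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Stand-in for a materialized view: the aggregate is computed once at
    # load time, in anticipation of a query that is known in advance.
    con.execute(
        "CREATE TABLE revenue_by_region AS "
        "SELECT region, SUM(price) AS revenue FROM sales GROUP BY region"
    )
    return con


def revenue_from_base(con):
    """Answer the query by scanning and aggregating the base table."""
    return dict(con.execute("SELECT region, SUM(price) FROM sales GROUP BY region"))


def revenue_from_summary(con):
    """Answer the same query by reading the precomputed summary."""
    return dict(con.execute("SELECT region, revenue FROM revenue_by_region"))
```

Both functions return the same answer, but the second one does almost no work at query time. If the query set is published with the benchmark, the summary can be tailored to exactly those queries, which is the situation that led to the TPC-R/TPC-H split.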
TPC’s latest benchmark, TPC-W, covers an important new area; it has been designed
to simulate the activities of a business-oriented transactional Internet web server, as it
might be used in electronic commerce. Correspondingly, the application portrayed by
the benchmark is a retail store on the Internet with a customer “browse and order”
scenario. The figure of merit computed by the benchmark is “Web Interactions Per
Second” (WIPS), for a given scale factor (overall item count). The initial problem of
TPC-W seems to be its complexity since there are many components that can
influence the result:
- Web server, application server, image server, database software, all of
  which can come from different sources
- SSL implementation
- TCP/IP realization
- Route balancing
- Caching
It could be due to this complexity that there are still relatively few TPC-W results
(currently, as of June 2002, 13 results), far fewer than for TPC-C (79 active results
for version 5). This may be an indication that in the necessary tradeoff between
representativity (realistic scenarios) and ease of use, TPC might have been too
ambitious and might have designed a benchmark that is too costly to measure. Also,
the benchmark does not have an easy-to-understand, intuitive result metric. SPEC’s
“HTTP ops/sec” (in SPECweb96) may be unrealistic if one looks closer at the
definition, but it at least appears more intuitive than TPC-W’s “WIPS” (Web
Interactions Per Second). In addition, when the first results were submitted, it became
clear that more rules are necessary for the benchmark with respect to the role of the
various software layers (web server, application server, database server).
3.5 SAP Benchmarks
Similarly to SPEC and TPC, SAP offers not only a single benchmark but a family of
benchmarks; there are various benchmarks for various business scenarios. The SAP
Standard Application Benchmarks have accompanied the SAP R/3 product releases
since 1993. Since 1995, issues relating to benchmarking and the public use of results
have been discussed by SAP and its partners in the SAP Benchmark Council. Benchmark
definition and certification of results, however, are at the discretion of SAP. The
number of benchmarks in the family is about a dozen, with the majority simulating
users in online dialog with SAP R/3.
Two of the benchmarks, Sales and Distribution (SD) and Assemble-to-Order (ATO),
cover nearly 90% of all published results. Historically, SD came first and gained its
importance by being the tool to measure the SAPS throughput metric (SAP
Application Benchmark Performance Standard) that is at the center of all SAP R/3
sizing:
198 R. Weicker
100 SAPS are defined as 2000 fully business processed order line items per hour
in the Standard SD benchmark. This is equivalent to 6000 dialog steps and 1600
postings per hour or 2400 SAP transactions per hour.
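The definition is a fixed set of proportions, so any SAPS value can be scaled into the equivalent hourly rates. The following is a minimal sketch of that arithmetic; the dictionary keys and function names are illustrative only, and the constants are the ones quoted in the definition above.

```python
# Hourly rates that correspond to 100 SAPS, as quoted in the definition above.
PER_100_SAPS = {
    "order_line_items_per_hour": 2000,
    "dialog_steps_per_hour": 6000,
    "postings_per_hour": 1600,
    "sap_transactions_per_hour": 2400,
}


def saps_from_line_items(line_items_per_hour):
    """Convert fully processed order line items per hour into SAPS."""
    return 100.0 * line_items_per_hour / PER_100_SAPS["order_line_items_per_hour"]


def hourly_rates(saps):
    """Scale the 100-SAPS equivalences linearly to an arbitrary SAPS value."""
    return {name: value * saps / 100.0 for name, value in PER_100_SAPS.items()}
```

For example, a system rated at 250 SAPS would, under this linear scaling, process 5000 order line items or 15000 dialog steps per hour.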
The SD and ATO business scenario is that of a supply chain: A customer order is
placed, the delivery of goods is scheduled and initiated, an invoice is written. In SD,
an order comprises five simple and independent items from a warehouse. In ATO, an
individually configured and assembled PC is ordered, which explains the differences
in complexity. The sequence of SAP transactions consists of a number of dialog steps
or screen changes. By means of a benchmark driver the benchmarks simulate
concurrent users passing through the respective sequence with 10 seconds think time
after each dialog step. After all virtual users have logged into the SAP system and
started working in a ramp-up phase, the users repeat the sequence as many times as is
necessary to provide a steady state measurement window of at least 15 minutes. It is
required that the average response time of the dialog steps is less than two seconds. In
the case of SD, users, response time, and the throughput expressed as SAPS are the main
performance metrics. For ATO, where the complete sequence of dialog steps is called
a fully business processed assembly order, only the throughput in terms of assembly
orders per hour is reported.
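The relation between the number of simulated users, think time, response time, and throughput can be sketched with the interactive response time law for a closed system, X = N / (Z + R). This law is a standard queueing result and is not part of the benchmark description; the sketch assumes a steady state in which every user alternates between thinking for Z seconds and waiting R seconds for a dialog step. The conversion to SAPS uses the equivalence of 6000 dialog steps per hour to 100 SAPS from the definition above.

```python
def dialog_steps_per_hour(users, think_time_s=10.0, response_time_s=2.0):
    """Throughput of a closed system: N users, each cycling through Z + R seconds."""
    if users < 0 or think_time_s + response_time_s <= 0:
        raise ValueError("need users >= 0 and a positive cycle time")
    steps_per_second = users / (think_time_s + response_time_s)
    return steps_per_second * 3600.0


def saps_estimate(users, think_time_s=10.0, response_time_s=2.0):
    """Rough SAPS figure, using 6000 dialog steps per hour = 100 SAPS."""
    return dialog_steps_per_hour(users, think_time_s, response_time_s) / 6000.0 * 100.0
```

With the 10-second think time and a response time at the two-second limit, each user completes one dialog step every 12 seconds, so 1200 users would generate 360000 dialog steps per hour, or roughly 6000 SAPS. Shorter response times raise the figure slightly for the same user count.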
Looking at SAP benchmark results, it is important to distinguish “two-tier” and
“three-tier” results, with the difference lying in the allocation of the different layers
for presentation, application and database. The presentation layer is where users are
running their front-end tools, typically PCs running a tool called sapgui (SAP
Graphical User Interface). In the benchmark, this layer is represented by a driver
system where for each virtual user a special process is started that behaves like a
“sapgui”. This driver layer corresponds to the RTE layer in the TPC-C benchmark. It
is the location of the SAP application layer software that determines the important
distinction between “two-tier” and “three-tier” results:
- If it resides on the same system as the database layer, the result is called a
  “two-tier” or “central” result. In this case, about 90% of the CPU cycles
  are spent in the SAP R/3 application and only 10% in the database code.
- If it resides on separate application servers (typically, there are several of
  them since the application part of the workload is easily distributable), the
  result is called a “three-tier” result.
Since in result quotations, the number of users is typically related to the database
server, the 90-10 relation explains why the same hardware system can have a much
higher “three-tier” result than “two-tier” result: In the “three-tier” case, all activities
of the SAP application code are offloaded to the second tier. Typically, the
benchmarker configures as many application systems as are necessary to saturate the
database server. In a large three-tier benchmark installation in November 2000,
Fujitsu Siemens used 160 R/3 application servers (4-CPU PRIMERGY N400,
Pentium/Xeon-based) in connection with a large 64-CPU PRIMEPOWER 2000 as the
database server. Figure 4 shows the impressive scenario of this large SAP three-tier
SD measurement.
[Figure: Benchmark Driver (PRIMEPOWER 800); R/3 Central Instance (PRIMERGY N800);
R/3 Update Server (PRIMERGY 400/N400); R/3 Dialog Server (PRIMERGY H400/N400);
160 R/3 Application Servers; 64-CPU DB Server (PRIMEPOWER 2000) with EMC Storage]
Fig. 2. Example of a large SAP three-tier installation (Fujitsu Siemens, Dec. 2000)
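The 90-10 relation gives a quick back-of-the-envelope estimate of how far a three-tier result can exceed a two-tier result on the same database machine. The sketch below is an idealization, not a sizing rule: it assumes that the database share of the CPU load per user stays constant when the application code moves away, and it ignores the extra network and interrupt processing that the three-tier setup brings. The function name and parameters are illustrative.

```python
def three_tier_user_estimate(two_tier_users, db_share=0.10):
    """Idealized number of users one machine can serve as a pure database server.

    two_tier_users: users the machine sustains when it runs both the
                    application and the database (two-tier result).
    db_share:       fraction of CPU cycles spent in database code in the
                    two-tier case (about 0.10 according to the text).
    """
    if not 0.0 < db_share <= 1.0:
        raise ValueError("db_share must be in (0, 1]")
    # If only db_share of the machine was needed for the database work of
    # two_tier_users, the whole machine can do that work for 1/db_share as many.
    return two_tier_users / db_share
```

Under these assumptions a machine with a two-tier result of 1000 users could act as database server for about 10000 users, provided enough application servers are configured to saturate it.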
The sponsor of a three-tier SAP benchmark is free to decide how to accommodate the
processing needs of the application layer. For example, in another SAP measurement
(IBM, with the p680 as database server), the application layer consisted of large p680
systems, too. The benchmarker is free to choose between configurations, based for
example on ease of handling, or on the desire to implement a configuration typical
of real-life client/server application environments.
It may be of interest to provide some insight into the workload on the database server
as the most important factor in the three-tier case. There is a similarity to TPC-C
because both benchmarks deal with database server performance. But while in
TPC-C disk I/O is at the center of interest, it is network I/O in the SAP benchmark.
The two workloads complement each other in this regard in a very nice way. In
the benchmark setup summarized in Figure 4 there were 65000 network roundtrips or
130000 network I/Os per second. Five out of 64 CPUs of the database server were
assigned to handle the interrupt processing for this traffic between application layer
and database. The network stack was standard TCP/IP, a mature industry standard.
4 Benchmarking and Computer Research
It is well known that several benchmarks, in particular the SPEC CPU benchmarks,
are used extensively in the manufacturers’ development labs; we shall discuss this
aspect in section 5. A critical look at recent computer science conferences or journals,
especially in the area of computer architecture and compiler optimization, shows that
this phenomenon is not restricted to manufacturers; it appears in academic research
also. Table 6 shows a snapshot for several major conferences in 2000/2001, listing
how often benchmarks were used in conference papers (note that some papers use
more than one benchmark program collection).
Table 6. Use of benchmark collections in conference papers

                                          ASPLOS      SIGPLAN     SIGARCH
                                          Nov. 2000   June 2001   June 2001
Overall number of papers                  24          30          24
SPEC CPU (92, 95, 2000)                   8           4           17
SPEC JVM98                                1           4           1
SPLASH benchmarks (Parallel Systems)      2           -           -
Olden benchmarks (Pointer, Memory)        -           -           3
OLTP / TPC                                1           -           1
Various other program collections         11          13          8
No benchmark used                         6           9           3
Looking at table 6, we can say:
- Benchmarks that are composed of several individual components (SPEC
  CPU, SPEC JVM98, Olden, SPLASH) are particularly popular. These are
  also the benchmarks that are relatively easy to run, compared with others.
- For specific research topics, some benchmark collections that emphasize a
  particular aspect are popular: SPEC JVM98 for Java, the parallelizable
  SPLASH codes for research on parallel processing, the Olden benchmark
  collection of pointer- and memory-intensive programs for research on
  pointers and memory hierarchy.
- The SPEC CPU benchmarks are the most popular benchmark collection
  overall.
- Very few papers base quantitative data on OLTP workloads.
Given the importance of OLTP, TPC- and SAP-type workloads in commercial
computing, one can ask whether academic research neglects to give guidance for
computer developers as far as these environments are concerned. Fortunately,
specialized workshops like the “Workshop on Computer Architecture Evaluation
using Commercial Workloads” held in conjunction with the “Symposium on High
Performance Computer Architecture” (www.hpcaconf.org) fill this gap.
“A Quantitative Approach” is the subtitle of one of the most popular and influential
books on computer architecture [4]. The underlying thesis is that computer
architecture should be driven by ideas whose value can be judged not only by
intuition but also by quantitative considerations. A new design principle has to show a
quantitative advantage over other designs. Following this approach, there is an
apparent trend in Ph.D. dissertations and other research papers:
- A new idea is presented.
- A possible implementation (hardware, software, or a combination of both) is
  discussed, with operation times for individual operations.
- Simulation results are presented, on the basis of some popular benchmarks, very
  often some or all of the SPEC CPU benchmarks.
- The conclusion is “We found an improvement of xx to yy percent”.
Such a statement is considered proof that the idea has value, and it is relevant for
successor projects, for research grants, and for academic promotion.
This tendency, both in manufacturers’ development labs and in academic research,
places a responsibility on benchmark development groups that can be frightening:
Sometimes, aspects of benchmarks become important for design optimizations
that were not yet thought of, or never discussed, when the benchmarks were selected.
Suddenly, they develop an influence not only on the comparison of today’s computers
but also on the design of tomorrow’s computers. For example, when the earlier CPU
benchmark suites were put together, SPEC looked mainly at the source codes. Now,
with techniques like feedback optimization and value prediction becoming more
popular in research and possibly also in state-of-the-art compilers, one also has to
look much more closely at the input data that are used with the benchmarks: Are they
representative of typical problems in the area covered by the benchmarks? Do they
encourage optimization techniques that have a disproportionate effect on
benchmarks, as opposed to normal programs?
This author once attended an ASPLOS (Architectural Support for Programming
Languages and Operating Systems) conference and made a critical remark about some
papers that relied, in his opinion, too much on the SPEC CPU benchmarks. He was
asked: “Do you mean that we should not use the SPEC benchmarks?” This, of course,
would mean to “throw out the baby with the bath water”. The result was a short
contribution in a newsletter widely read by academic computer architects [12], asking
to continue using the SPEC CPU benchmarks, but to use, if possible, all of them,
and to report all relevant conditions (e.g. compiler flags). The main request was not to
take benchmarks blindly as given, but to include a critical discussion of those
properties of the benchmarks that may be relevant for the architectural feature that is
studied. For example, in [14] it is shown that the program “HEALTH”, one of the
often-used “Olden Benchmarks” supposedly representative of linked-list data
structures, is algorithmically so inefficient that any performance evaluations based on
it are highly questionable. Surprisingly few research papers discuss the value of
SPEC’s benchmarks from an independent perspective; [2] is one of them (however,
discussing the old CPU92 benchmarks). Occasionally, when surprising new results
come up, online discussion groups are full of statements like “Benchmarks, and in
particular the SPEC CPU benchmarks, are bogus anyway”. But it is hard to find good,
constructive criticisms of benchmark programs. More papers that compare
benchmarks with other programs that are heavily used would be particularly useful. In
one such paper [6], the authors compare the execution characteristics of popular
Windows-based desktop applications with those of selected SPEC CINT95 benchmarks
and find similarities (e.g., data cache behavior) as well as differences (e.g., indirect
branches). Given the large influence (direct or indirect, even in academic research)
that benchmarks can have, it would be beneficial for both computer development and
computer science research if the benchmarks got the attention and critical scrutiny
they deserve.
One has to acknowledge that the three usage areas for benchmarks mentioned in the
introduction
1. Customer information and marketing, goal: Compare today’s computers on a fair
   basis;
2. Design in manufacturers’ labs, goal: Build better computers;
3. Computer science research, goal: Develop design ideas for the long-term future;
can call for different selection criteria: For goal 1, it makes sense to have
representatives of today’s programs in a benchmark suite, including instances of
“dusty deck” code. For goals 2 and 3, it would make much more sense to only have
programs of tomorrow, programs with good coding style, possibly programs of a type
that rarely exists today. For example, it has been observed [6] that object-oriented
programs and programs that make frequent use of dynamically linked libraries
(DLLs) have different execution characteristics than the classical C and Fortran
programs in today’s SPEC CPU suites.
In addition to its under-representation in academic research, a critical discussion of
benchmarks is also often missing from computer science curricula. In the area of
computer architecture, universities tend to educate future computer designers, not so
much future computer users. On the other hand, it is obvious that many more
graduates will eventually end up making purchase decisions than making design
decisions for computers. They would benefit from a deeper knowledge about the
values and the pitfalls of benchmarks.
5 Use of Benchmarks, Issues, and Opportunities
The presentation of various important benchmarks in section 3 has already shown
some aspects that seem to appear across several benchmarks. Benchmarks as drivers
of optimizations, conflicting goals for benchmarks, and a possible broader use of
benchmarks, for more than just the generation of magical numbers, are some of these
aspects.
5.1 Benchmarks as Drivers of Optimization
It is well known, and intended and encouraged by benchmark organizations, that good
benchmarks drive industry and technology forward. Examples are:
- Advances in compiler optimization encouraged by the SPEC CPU
  benchmarks.
- Advances in database software encouraged by the TPC benchmarks.
- Optimizations in Web servers like caching for static requests, encouraged
  by the SPECweb benchmarks.
It must be emphasized that such advances in performance are a welcome and
intended effect of benchmarks, and that the benchmarking organizations can take
pride in these developments. However, some developments can also be
counterproductive:
- Compiler writers may concentrate on optimizations that are allowed for
  the SPEC CPU benchmarks but that are not used by 80 or 90% of the
  software developers: Then, the benchmarks lead to a one-sided and
  problematic allocation of resources. SPEC has tried to counter this with
  the requirement to publish “baseline” results. However, this can only be
  successful as long as “baseline” optimizations allowed by SPEC are really
  relevant in the compilation/linkage process of real-life programs.
- Hardware designers may optimize the size of their caches to what the
  current benchmarks require, irrespective of longer-term needs.
- Database designers may concentrate too much on query situations that
  occur in benchmarks.
Looking at such issues from a more general point of view, the overall issue is that of
the representativity of the benchmark; it tends to appear in several contexts:
- TPC had its problems with materialized views;
- SPEC CPU has its ongoing debates on peak vs. baseline;
- SPECWeb had debates about web server software that was, in one way or
  another, integrated with the operating system, leading to a sudden increase
  in performance.
Let us look at the example of the SPEC CPU benchmarks where the component
benchmarks are given in source code form. If some aspect of the benchmark turns out
to be particularly relevant for the measured performance, but this property is not
shared by important state-of-the-art programs, a particular optimization can suggest to
the naive reader a performance improvement that is unreal, i.e. that is based on the
specific properties of a particular benchmark only. An issue that seems to come up
repeatedly with the SPEC CPU benchmarks is a “staircase” effect of certain single
benchmarks with respect to caches. Some programs, together with their input data
sets, have a critical working set size: If the working set fits into a cache, performance
is an order of magnitude better than in the case that the working set size makes many
memory accesses necessary, possibly connected with “cache thrashing” effects. Several SPEC
CPU benchmarks (030.matrix300 in CPU89, 023.eqntott in CPU92, 179.art in
CPU2000) apparently had such “magical” working set size parameters and, consequently,
showed sudden increases in their performance through compiler optimizations that
managed to push the working set size below a critical, cache-related boundary. Such
optimizations typically generated heated discussions inside and outside of SPEC:
“Can such an increase in performance be real?” The optimizations themselves can be
“legal”, i.e. general enough that they are applicable to other programs also. But the
specific performance gain may not be representative; it may come from a particular
programming style that SPEC overlooked in its benchmark selection process.
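The staircase can be reproduced with a toy model. The following sketch is not a model of any SPEC benchmark or real processor: it assumes a fully associative cache with LRU replacement and a program that sweeps cyclically over its working set, which is the textbook worst case for LRU. The function name and its parameters are illustrative.

```python
from collections import OrderedDict


def lru_hit_ratio(working_set_blocks, cache_blocks, passes=10):
    """Hit ratio of repeated cyclic sweeps over a working set in an LRU cache.

    The first sweep only warms the cache and is not counted, so `passes`
    must be at least 2 and `working_set_blocks` at least 1.
    """
    cache = OrderedDict()  # keys are block numbers, ordered from oldest to newest use
    hits = 0
    accesses = 0
    for sweep in range(passes):
        for block in range(working_set_blocks):
            counted = sweep > 0
            if counted:
                accesses += 1
            if block in cache:
                cache.move_to_end(block)  # mark as most recently used
                if counted:
                    hits += 1
            else:
                cache[block] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict the least recently used block
    return hits / accesses
```

In this model a working set of 64 blocks in a 64-block cache hits on every counted access, while a working set of 65 blocks misses on every access, because each block is evicted just before it is needed again. An optimization that shrinks the working set by a few percent can therefore move a program from one step of the staircase to the other, which is why the resulting gain says little about programs whose working sets are not close to the boundary.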
It is not only the compiler and the CPU benchmarks where the question of
representativity is relevant. System benchmarks often test the performance of a
layered software system. The seven-layer ISO reference model for network
architectures is a good example: In the interest of good software engineering practices
(modularity, encapsulation, maintainability), software is often organized in layers,
each performing a specific task. On the other hand, it is well known that shortcuts
through these layers usually result in better performance; the various Web server
caches now used in most SPECweb results are a good example. Where should
benchmark specifications draw the line when such layer-transgressing optimizations
occur for the first time on the occasion of a new benchmark result? What will be seen
as an advance in technology, and what will be seen as a “benchmark special”, a
construct that is outlawed in most benchmarks’ rules? Everyone will agree on extreme
cases:
- Results where the traditional barrier between system mode and user mode is
  abandoned in favor of performance do not reflect typical usage and are rightly
  forbidden.
- On the other hand, the fact that databases running on top of Unix typically omit
  the Unix file system layer and operate directly on raw devices will be considered
  standard practice.
It is the cases in between such extremes that often cause lengthy debates in
benchmark organizations. Because such optimizations are rarely foreseen at the time
when the benchmark, together with its rules, is adopted, these cases typically arise
during result reviews, where immediate vendor interests are also involved. TPC has a
separate body, the “Technical Advisory Board” (TAB), that makes suggestions in cases
of result compliance complaints or when TPC auditors ask for a decision. SPEC leaves
these decisions to the appropriate subcommittee, where such discussions often take
considerable time away from the committee’s technical work, e.g. new benchmark
development. But someone has to decide; the alternative that a test sponsor can do
anything and remain unchallenged would be even worse.
With its ACID tests, TPC has found an interesting way to enforce the requirement
that the software used during the benchmark measurement is “real database software”.
The ACID tests are a required part of the benchmark; if the system fails them, no
result can be submitted. But they are executed separately from the performance
measurement, outside the measurement interval. So far, SPEC has not formulated
similar tests for its benchmarks, although one could imagine such tests:
- For the CPU benchmarks, SPEC requires that “hardware and software used to run the
  CINT2000/CFP2000 programs must provide a suitable environment for running typical
  C, C++, or Fortran programs” (Run Rules, section 1.2). For baseline, there is the
  additional requirement that “the optimizations used are expected to be safe”
  (2.2.1), which is normally interpreted by the subcommittee as a requirement that
  “the system, as used in baseline, implements the language correctly” (2.2.7). The
  idea of a “Mhillstone” program that would, outside the measurement interval and
  with more specialized programs, test for correct implementation, and that would
  flag illegal over-optimizations, has been debated a few times. However, with its
  limited resources, SPEC so far has not yet implemented such a test.
- The SPECweb benchmarks are heavily dominated by TCP/IP activity. SPEC requires in
  the Run Rules that the relevant protocols are followed; but there is no check
  included in the benchmark. Since the benchmark typically is executed in an
  isolated local network, an implementation could, theoretically, take a shortcut
  around requirements like the ability to repeat transmissions that have failed.
- The SPEC SFS benchmark requires stable storage for the file system used in the
  benchmark measurement. However, the benchmark does not contain code that would
  test this.
Other aspects of representativity so far have barely been touched by benchmark
designers: In [9], Shubhendu Mukherjee points out that future processors might have
several modes of operation in relation to cosmic rays: A “fault tolerant” mode where
intermittent errors caused by such cosmic rays are compensated by additional
circuits (which make the CPU slower), and a “fast execution” mode where such
circuitry is not present or not activated. He asks what relevance benchmark results
would have if they can only be reproduced in the basement of the manufacturer’s
testing lab where the computer is better shielded from cosmic rays. In the software
area, future Shared Memory Processor (SMP) systems may have cache coherency
protocols that can switch from “snoopy” to “directory based” and vice versa, with
possible performance implications for large vs. small systems. Benchmark results
that have been obtained in one environment may not be representative for the other
environment.
5.2 Conflicting Goals for Benchmarks
The discussions in the previous sections can be subsumed under the title
“Representativity”. An obvious way towards achieving this goal is the introduction
of new benchmarks (like SPEC’s sequence of CPU benchmarks, from CPU89 to CPU2000),
or at least the introduction of new rules for existing benchmarks (like TPC’s
sequence of “major revisions” for its TPC-C benchmark). New benchmarks can make
over-aggressive optimizations irrelevant, and the expectation that benchmarks will
be retired after a few years can discourage special-case optimizations in the first
place. On the other hand, marketing departments and end users want benchmarks to be
“stable”, to be valid over many years. This leads, for example, to one of the most
frequently asked questions to SPEC: “Can I convert SPECint95 ratings to SPECint2000
ratings?” SPEC’s official answer “You cannot – the programs are different” is not
satisfying for marketing but necessary from a technical point of view.
5.3 More Than Just Generators of Single Numbers: Benchmarks as Useful
Tools for the Interested User
Looking at official benchmark result publications, readers will notice that almost
all results have been measured and submitted by hardware or software vendors. Given
the large costs that are often involved, this is understandable. From a fairness
point of view, this can also make sense: Every benchmarker tries to achieve the best
result possible; therefore, if all measurements are governed by the same rules, the
results should be comparable.
On the other hand, some important questions remain unanswered with this practice:
What would be the result if the measurements were performed under conditions that
are not optimal but reflect typical system usage better? Examples are:
- What CPU benchmark results would be achieved if the compilation uses a basic “-O”
  optimization only?
- What would be the result of a Web benchmark that intentionally does not use one of
  the popular Web cache accelerators?
- What would be the result of a TPC or SAP benchmark measurement if the CPU
  utilization of the server is at a level recommended for everyday use, and not as
  close to 100% as possible?
Manufacturers typically do not publish such measurements because they are afraid of
unfair comparisons: Everyone not using a performance-relevant optimization that is
legal according to the benchmark’s rules would run the risk that the result would
let his system appear inferior to competitive systems for which only highly
optimized results are published.
As a consequence, such measurements under sub-optimal conditions are often performed
in the development labs, but the results are never published. The only means to
change this would be results published by independent organizations such as
technical magazines. For its CPU benchmarks, SPEC has rules stating that if results
measured by third parties are submitted to SPEC, a vendor whose products are
affected can protest against such a result and hold up result publication for a few
weeks but cannot prevent an eventual publication. The idea is that such a
publication could be based on really silly conditions (“Let’s make this
manufacturer’s system look bad”), and that the manufacturer should have a fair
chance to counter with his own measurement. But in principle, measurements and
result submissions by non-vendors are allowed; they just happen very rarely. On the
other hand, such measurements, with appropriate accompanying documentation, could
have considerable value for an informed and knowledgeable reader.
At least for easy-to-use benchmarks like SPEC’s CPU and SPECjbb2000 benchmarks,
valuable insights could be gained by the publication of more results than those
promoted for marketing purposes. With appropriate documentation, readers could ask
for answers to questions like
- What is the performance gain for a new processor architecture if programs are not
  recompiled? This often happens with important ISV codes.
- What is the performance loss if a PC uses RAM components that are slower but
  cheaper?
- For different CPU architectures, how much do they depend on sophisticated
  compilation techniques, how well can they deliver acceptable performance in
  environments that do not include such techniques?
It would be unrealistic to expect answers to such questions from vendor-published
benchmark results; vendors cannot afford to publish anything but the best results.
But if benchmarks are well-designed, representative, and portable (as they should
be), then, with some effort from informed user organizations, researchers, or
magazines, they could be used for much more than what is visible today.
Acknowledgments. I want to thank my colleagues in the Fujitsu Siemens Benchmark
Center in Paderborn for valuable suggestions, in particular Walter Nitsche, Ludger
Meyer, and Stefan Gradek, whose “Performance Brief PRIMEPOWER” [10] provided
valuable information on several benchmarks that I have not been pursuing actively
myself. In addition, I want to thank my colleagues in the SPEC Open Systems Group; a
large part of this paper is based on many years of experience in this group.
However, it is evident that it contains a number of statements that reflect the
author’s personal opinion on certain topics, which may or may not be shared by other
SPEC OSG representatives. Therefore, it must be stated that this is a personal
account rather than a policy statement of either SPEC or Fujitsu Siemens Computers.
References
1. Yin Chan, Ashok Sundarsanam, and Andrew Wolfe: The Effect of Compiler-Flag Tuning
   in SPEC Benchmark Performance. ACM Computer Architecture News 22,4 (Sept. 1994),
   60–70
2. Jeffrey D. Gee, Mark D. Hill, Dionisios N. Pnevmatikos, and Alan Jay Smith: Cache
   Performance of the SPEC92 Benchmark Suite. IEEE Micro, August 1993, 17–27
3. Jim Gray: The Benchmark Handbook for Database and Transaction Processing Systems.
   Morgan Kaufmann, San Mateo, 2nd Edition, 1993, 592 pages
4. John Hennessy and David Patterson: Computer Architecture. A Quantitative
   Approach. 2nd Edition, Morgan Kaufmann, San Francisco 1996, 760 pages plus
   appendixes
5. John L. Henning: SPEC CPU2000: Measuring CPU Performance in the New Millennium.
   Computer (IEEE) 33,7 (July 2000), 28–35
6. Dennis C. Lee, et al: Execution Characteristics of Desktop Applications on
   Windows NT. 25th Annual Symposium on Computer Architecture, = ACM Computer
   Architecture News 26,3 (June 1998), 27–38
7. John McCalpin: http://www.cs.virginia.edu/stream/ref.html: FAQ list (Frequently
   Asked Questions) about Stream
8. Nikki Mirghafori, Margret Jacoby, and David Patterson: Truth in SPEC Benchmarks.
   ACM Computer Architecture News 23,5 (Dec. 1995), 34–42
9. Shubhendu S. Mukherjee: New Challenges in Benchmarking Future Processors. Fifth
   Workshop on Computer Architecture Evaluation using Commercial Workloads, 2002
10. Performance Brief PRIMEPOWER (PRIMEPOWER Performance Paper). Fujitsu Siemens,
    LoB Unix, Benchmark Center Paderborn, March 2002. Available, as of June 2002,
    under http://www.fujitsu-siemens.com/bcc/performance.html
11. Reinhold P. Weicker: An Overview of Common Benchmarks. Computer (IEEE) 23,12
    (Dec. 1990), 65–75
12. Reinhold Weicker: On the Use of SPEC Benchmarks in Computer Architecture
    Research. ACM Computer Architecture News 25,1 (March 1997), 19–22
13. Reinhold Weicker: Point: Why SPEC Should Burn (Almost) All Flags, and Walter
    Bays. Counterpoint: Defending the Flag. SPEC Newsletter 7,4, Dec. 1995, 5–6
14. Craig B. Zilles: Benchmark HEALTH Considered Harmful. ACM Computer Architecture
    News 29,3 (June 2001), 4–5
Benchmarking Models and Tools for Distributed
Web-Server Systems
Mauro Andreolini¹, Valeria Cardellini¹, and Michele Colajanni²

¹ Dept. of Computer, Systems and Production
University of Roma “Tor Vergata”
Roma I-00133, Italy
{andreolini,cardellini}@ing.uniroma2.it
² Dept. of Information Engineering
University of Modena
Modena I-41100, Italy
[email protected]
Abstract. This tutorial reviews benchmarking tools and techniques that can be
used to evaluate the performance and scalability of highly accessed Web-server
systems. The focus is on design and testing of locally and geographically
distributed architectures where the performance evaluation is obtained through
workload generators and analyzers in a laboratory environment. The tutorial
identifies the qualities and issues of existing tools with respect to the main
features that characterize a benchmarking tool (workload representation, load
generation, data collection, output analysis and report) and their applicability
to the analysis of distributed Web-server systems.
1 Introduction
The explosive growth in size and usage of the Web is causing enormous strain on
users, network service, and content providers. Sophisticated software components
have been implemented for the provision of critical services through the Web.
Consequently, many research efforts have been directed toward improving the
performance of Web-based services through caching and replication solutions. A
large variety of novel content delivery architectures, such as distributed Web-
server systems, cooperative proxy systems, and content distribution networks
have been proposed and implemented [35].
One of the key issues is the evaluation of the performance and scalability of
these systems under realistic workload conditions. In this tutorial, we focus on
the use of benchmarking models and tools during the design, testing, and alter-
native comparison of locally and geographically distributed systems for highly
accessed Web sites. We discuss the properties that should be provided by a
benchmarking tool in terms of various parameters: applicability to distributed
Web-server systems, realism of workload and significance of the output results.
The analysis is also influenced by the availability of the source code and the
customizability of the workload model. We analyze popular products that are
free or available at nominal cost, and that provide source code: httperf [32],
SPECweb99 (including the version supporting SSL encryption/decryption) [38,39],
SURGE [7,8], S-Clients [6], TPC-W [41], WebBench [45], Web Polygraph [42], and
WebStone [30]. For this reason, we do not consider commercial tools (e.g.,
Technovations’ Websizr [40], Neal Nelson’s Web Server Benchmark [34]) that are
more expensive and typically unavailable to the academic community, although
they provide richer functionality. Other benchmarking tools that come from
research (e.g., Flintstone [15], WAGON [24]) have not been included because
they are not publicly available.
We can anticipate that none of the observed tools is specifically oriented to
testing distributed Web-server systems, and only a minority of them reproduces
the load imposed by a modern user session. Many existing benchmarks prefer
to test the maximum capacity of a Web server by requesting objects as quickly
as possible or at a constant rate. Others reproduce user session behavior more
realistically (multiple requests for Web pages separated by think times) but
refer to the request and delivery of static content only. This result is rather
surprising, considering that the variety and complexity of the offered Web-based
services require system structures that are quite different from the typical
browser/server solutions of the early days of the Web. The increasing need for
dynamic requests, multimedia services, e-commerce transactions, and security is
typically met by multi-tier distributed systems. These novel architectures have
considerably complicated the user and client interactions with a Web system,
which range from simple browsing to elaborate sessions involving queries to
application and database servers. In addition, a user request can be subject to
many manipulations, from cookie-based identification to tunneling, caching, and
redirection. Moreover, an increasing amount of Web services and content is
subject to security restrictions and secure communication channels involving
strong authentication, which is becoming common practice in the e-business
world. Since distributed Web-server systems typically provide dynamic and secure
services, a modern benchmarking tool should model and monitor the complex
interactions occurring between clients and servers. No tool with these
capabilities seems to be publicly available to the academic community.
We illustrate in Fig. 1 the basic structure of a benchmarking tool for
distributed Web-server systems, which we assume to be based on six main
components (benchmarking goal and scope, workload characterization, content
mapping on servers, workload generation, data collection, data analysis and
report) that will be analyzed in detail in the following sections. The clear
identification of the characteristics to be evaluated is at the basis of any
serious benchmarking study, which cannot expect to achieve multiple goals. From
this choice, the workload representation phase takes as its input the set of
parameters representing a given workload configuration and produces an
unambiguous Web workload specification. In the case of a distributed Web-server
system, the content is not always replicated among all the servers, hence it is
important that the content mapping phase decides the assignment of the Web
content among multiple front-end and back-end servers. The workload generation
engine of a benchmark analyzes the workload specification and produces the
offered Web workload, issuing the necessary amount of requests to the Web system
and handling the server responses. The component responsible for data collection
considers the metrics of interest that have been chosen in the first phase of
the benchmarking study and stores the related measurements. Often, the whole set
of measurements must be aggregated and processed in order to present meaningful
results to the benchmark user. The output analysis and report component of a
benchmark takes the collected data set, computes the desired statistics, and
presents them to the benchmark user in a readable form.
[Figure: block diagram. Scope and goals and the configuration parameters feed
workload characterization; content mapping on servers places the Web content and
services on the distributed Web system; workload generation exchanges requests
and responses with that system; data collection gathers measurements for the
chosen metrics; data analysis & report turns the collected data into statistics.]
Fig. 1. Main components of a benchmarking tool for distributed Web-server systems.
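The data flow of Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the structure of any of the tools surveyed here: all names (`WorkloadSpec`, `map_content`, `simulated_system`, and so on) are hypothetical, content is placed round robin without replication, and the system under test is replaced by a stand-in that returns random response times instead of contacting real servers.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class WorkloadSpec:
    """Output of workload characterization (hypothetical fields)."""
    sessions: int            # number of user sessions to emulate
    pages_per_session: int   # requests issued by each session
    objects: list            # names of the Web objects that can be requested
    seed: int = 0            # fixed seed so that runs are repeatable


def map_content(objects, servers):
    """Content mapping: place each object on one server (round robin)."""
    return {obj: servers[i % len(servers)] for i, obj in enumerate(objects)}


def generate_workload(spec):
    """Workload generation: yield (session id, requested object) pairs."""
    rng = random.Random(spec.seed)
    for session in range(spec.sessions):
        for _ in range(spec.pages_per_session):
            yield session, rng.choice(spec.objects)


def simulated_system(obj, placement, rng):
    """Stand-in for the distributed Web system: (server, response time in ms)."""
    return placement[obj], rng.uniform(5.0, 50.0)


def run_benchmark(spec, servers):
    """Issue the workload and collect one measurement record per request."""
    placement = map_content(spec.objects, servers)
    rng = random.Random(spec.seed + 1)
    samples = []
    for session, obj in generate_workload(spec):
        server, latency = simulated_system(obj, placement, rng)
        samples.append({"session": session, "object": obj,
                        "server": server, "latency_ms": latency})
    return samples


def analyze(samples):
    """Data analysis and report: aggregate the collected measurements."""
    latencies = [s["latency_ms"] for s in samples]
    per_server = {}
    for s in samples:
        per_server[s["server"]] = per_server.get(s["server"], 0) + 1
    return {"requests": len(samples),
            "mean_ms": statistics.mean(latencies),
            "max_ms": max(latencies),
            "per_server": per_server}


if __name__ == "__main__":
    spec = WorkloadSpec(sessions=3, pages_per_session=4, objects=["a", "b", "c"])
    print(analyze(run_benchmark(spec, servers=["s1", "s2"])))
```

Keeping the stages as separate functions mirrors the figure: a different workload model or a real HTTP client can replace one stage without touching data collection or analysis.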
After a brief description in Sect. 2 of the main architectures for locally and
geographically distributed Web-server systems, the remaining sections of this
tutorial follow the components outlined in Fig. 1. Finally, Sect. 9 concludes the
paper and summarizes some open issues for future research.
2 Distributed Web-Server Systems
In this section we outline the main characteristics of the Web-server systems we
consider in this tutorial, distinguishing locally from geographically distributed
architectures.
Any distributed Web-server system needs to appear as one host to the outside
world, so that users need not be concerned about the names or locations of the
replicated servers. Although a large system may consist of dozens of nodes, it is
publicized with one site name to provide a single interface to users at least at
the site name level.
2.1 Locally Distributed Architectures
A locally distributed Web-server system, called a Web cluster, consists of a
multi-tier architecture placed at a single location. A typical architecture is
shown in Fig. 2. A modern Web cluster typically has a front-end component
(called Web switch) that is located between the Internet and the first tier of
Web server nodes and acts as a network representative for the Web site.
The Web system also comprises one authoritative Domain Name System (DNS)
server for translating the Web site name into one IP address. The role of this
name server is simple because a Web cluster presents to the external world a
single virtual IP address that corresponds to the IP address of the Web switch.
The HTTP processes running on the Web server nodes listen on some network
port for the client requests assigned by the Web switch, prepare the content
requested by the clients, send the response back to the clients or to the Web
switch depending on the cluster architecture, and finally return to the
listening state. The Web server nodes are capable of handling requests for static
content, whereas they forward requests for dynamic content to other processes
that are interposed between the Web servers and the back-end servers. In less
complex architectures these middle-tier processes (e.g., CGI, ASP, JSP) are
executed on the same nodes where the HTTP processes run, so as to avoid a
connection with another server node. These middle-tier processes are activated
by and accept requests from the HTTP processes. They interact with database
servers or other legacy applications running on the back-end server nodes to
provide dynamic content.
In Fig. 2 we highlight the three main flows of interaction of a client with
the Web cluster, not including secure connections: requests for static files that
are served from the disk cache of the Web servers, requests for static files that
require disk access, and requests for dynamic content.
The Web switch receives all inbound packets and distributes them among the Web
servers. The two main architecture alternatives can be broadly classified
according to the OSI protocol stack layer at which the Web switch performs the
request assignment, that is, layer-4 and layer-7 Web switches. The main
difference is the kind of information available to the Web switch to perform
assignment and routing decisions.
Layer-4 Web switches work at the TCP/IP layer. They are content-blind, because
they determine the target server when the client establishes the TCP connection,
before the client sends the HTTP request. Therefore, the information regarding
the client is limited to that contained in TCP/IP packets, that is, IP source
address, TCP port numbers, and SYN/FIN flags in the TCP header.
[Figure: a client, its local DNS server, and the authoritative DNS server for
www.website.com reach, through the Internet, the Web switch (135.64.56.20) of the
Web cluster for www.website.com; behind the switch a LAN connects Web servers 1
to N and back-end servers 1 to M. Labeled flows: static request, dynamic request,
static response (cache), static response (disk), dynamic response.]
Fig. 2. Flows of interaction in a locally distributed architecture.
Layer-7 Web switches work at the application layer. They can deploy
content-based request distribution. The Web switch establishes a complete TCP
connection with the client, inspects the HTTP request content, and then relays
it to the chosen Web server. The selection of the target server can be based on
the Web service/content requested, such as URL content, SSL identifiers, and
cookies.
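The difference in the information available to the two kinds of switch can be made concrete with a small sketch. It is an illustration under assumptions, not the dispatching policy of any real product: the server names, the use of a CRC32 hash at layer 4, the URL-prefix rules, and the cookie name `node` are all hypothetical.

```python
import zlib

SERVERS = ["web1", "web2", "web3"]  # hypothetical Web server nodes


def layer4_dispatch(src_ip, src_port, servers=SERVERS):
    """Layer-4 choice: only TCP/IP header fields are known at connection setup.

    Assumption: the server is picked by hashing the source address and port.
    """
    key = f"{src_ip}:{src_port}".encode()
    return servers[zlib.crc32(key) % len(servers)]


def layer7_dispatch(http_request, url_rules, servers=SERVERS):
    """Layer-7 choice: the switch has read the HTTP request before deciding."""
    head = http_request.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    _method, url, _version = request_line.split(" ", 2)

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Cookie-based affinity: a (hypothetical) "node" cookie names the server
    # that handled the earlier requests of this session.
    for item in headers.get("cookie", "").split(";"):
        name, _, value = item.strip().partition("=")
        if name == "node" and value in servers:
            return value

    # Content-based distribution: the first matching URL prefix wins.
    for prefix, server in url_rules:
        if url.startswith(prefix):
            return server
    return servers[0]


if __name__ == "__main__":
    rules = [("/cgi-bin/", "web3"), ("/images/", "web2")]
    request = "GET /cgi-bin/search?q=x HTTP/1.1\r\nHost: www.website.com\r\n\r\n"
    print(layer4_dispatch("10.0.0.1", 40000))
    print(layer7_dispatch(request, rules))
```

The layer-4 function never sees the URL, so it cannot separate static from dynamic requests; the layer-7 function can, at the price of terminating the TCP connection and parsing the request at the switch.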
Another important classification regards the mechanism used by the Web
cluster to route outbound packets to the clients. In two-way architectures, both
inbound and outbound traffic pass through the Web switch. In one-way
architectures, only inbound packets flow through the Web switch, while outbound
packets use a separate high-bandwidth network connection. A detailed description
of request routing mechanisms and dispatching algorithms for locally distributed
architectures can be found in [10].
2.2 Geographically Distributed Architectures
A locally distributed system is a powerful and robust architecture from the
server point of view, but does not solve the problems related to network
delivery, such as first and last mile connectivity, router overload, and peering
points. An alternative solution is to distribute the server nodes over the
Internet. With respect to clusters of nodes that reside at a single location,
geographically distributed Web-server systems can reduce the network delays
experienced by the client, and also provide high availability in the face of
network failures and congestion.
For performance and availability reasons, the distribution typically takes place
at the granularity of Web clusters; that is, each geographically distributed node
consists of a cluster of servers like that described in the previous section. We
refer to this architecture as a Web multi-cluster. It exposes one hostname to the
outside world, as in the Web cluster case, but now each Web cluster has a visible
IP address. Hence, the request assignment process can occur in two or more
steps. The first request assignment (inter-cluster) is typically carried out by the
authoritative Domain Name Server (DNS) of the Web site, which selects the IP
address of the target Web cluster during the address lookup of the client request.
The second (intra-cluster) dispatching level is executed by the Web switch of the
target cluster, which distributes the requests reaching the cluster among the local
Web server nodes. A third (extra-cluster) dispatching level based on some request
re-routing technique may be integrated with the previous two mechanisms [11,
35].
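The first two dispatching levels can be sketched in a few lines. This is only an
illustration under stated assumptions: the cluster IP addresses, the node names,
and the round-robin policy used at both levels are hypothetical and are not taken
from any of the systems discussed here.

```python
from itertools import cycle

# Hypothetical multi-cluster: each cluster exposes one visible IP address
# (that of its Web switch) and hides its local server nodes.
CLUSTERS = {
    "192.0.2.10": ["eu-web1", "eu-web2"],
    "198.51.100.10": ["us-web1", "us-web2", "us-web3"],
}


class AuthoritativeDNS:
    """Inter-cluster level: maps the single hostname to a cluster IP."""

    def __init__(self, cluster_ips):
        self._next_ip = cycle(cluster_ips)  # round-robin is an assumed policy

    def resolve(self, hostname):
        return next(self._next_ip)


class WebSwitch:
    """Intra-cluster level: spreads requests over the local server nodes."""

    def __init__(self, servers):
        self._next_server = cycle(servers)  # round-robin is an assumed policy

    def dispatch(self, request):
        return next(self._next_server)


dns = AuthoritativeDNS(list(CLUSTERS))
switches = {ip: WebSwitch(servers) for ip, servers in CLUSTERS.items()}


def route(hostname, request):
    """Return the (cluster IP, server node) pair chosen for one request."""
    cluster_ip = dns.resolve(hostname)               # first step: DNS lookup
    server = switches[cluster_ip].dispatch(request)  # second step: Web switch
    return cluster_ip, server
```

Successive calls to `route("www.website.com", "/index.html")` alternate between
the two clusters and, within each cluster, between its nodes.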
3 Scope and Goals of the Benchmarking Study

In considering the performance of a Web system we should take into account its
software, operating system, and hardware environment, because each of these
factors can dramatically influence the results. In a distributed Web-server system,
this environment is further complicated by the presence of multiple components
that require connection handoffs, process activations, and request dispatching.
For example, referring to the Web switch component in Fig. 2, we may be interested
in evaluating several alternatives, such as hardware, operating system,
network-related software, request dispatching policy, and request forwarding
mechanism. A Web server is characterized by similar hardware and software layers,
and in addition by the HTTP software, the data distribution, and the software for
dynamic requests. A back-end server is also characterized by application and
database software. A geographically distributed system adds further complexity.
A complete performance evaluation of all layers and components of a distributed
Web-server system is simply impossible. Hence, any serious benchmarking study
should clearly define its goals and limit the scope of the alternatives
to be considered. In particular, this tutorial focuses mainly on benchmarking
tools used in the design and prototype phase, when different architectures must
be evaluated and alternative solutions must be compared through experiments
in a laboratory. Our main interest is not in the hardware and operating
system, which in most cases are simply given. Similarly, we are not interested in
evaluating the end-to-end performance of an installed Web system, although many
considerations also apply to that purpose.
4 Workload Characterization
The characterization of the workload generated by a Web benchmarking tool
is a central aspect of benchmarking and a distinguishing feature of existing
tools, because the attempt to mimic the real-world traffic patterns observed by
Web-server systems rests on it. The generation of synthetic Web traffic is not a
trivial task because it aims at reproducing as accurately as possible the
characteristics of real traffic patterns, which exhibit some unusual features
such as burstiness and self-similarity [4,12]. On the other hand, real-world
workloads are inherently irreproducible, since it is impossible to replicate the
overall conditions under which the performance testing was originally
performed.
In this section, we identify the main properties that underlie the specification
of the workload characterization. Moreover, we analyze the requirements that are
specific to the benchmarking of distributed Web-server systems, compare the
identified approaches, and discuss how the existing benchmarks realize these
properties. We also provide directions that we feel should be considered in the
realization of benchmarking tools specific to distributed Web-server systems.
4.1 Classification of Alternatives

The workload characterization of a Web benchmark deals with three main aspects:
– the Web service characterization defines the types of services requested from
the Web-server system;
– the request stream characterization defines the characteristics and the
methodology used to generate the stream of requests issued to the Web-server
system under evaluation;
– the Web client characterization defines the behavioral model of the Web
client (i.e., the browser) and specifies to what extent the client supports
the HTTP specifications.
Characterization of Web-based Services. Let us first examine the characterization
of Web-based services. As the variety of services and functions offered
over the Web is steadily increasing and puts dramatic performance demands
on Web servers, the workload characterization of a benchmark should attempt
to model realistic Web traffic and aim to capture this large variety of services.
That is to say, the requests cannot be limited to static resources; the workload
should at least include dynamic services, which typically impose higher resource
demands on Web servers [2]. Streaming multimedia services provided over the Web
are also becoming increasingly popular and should be taken into account in the
workload model. Security is a further issue that is often neglected in existing
Web server benchmarks. With the increasing number of sensitive and private
transactions being conducted on the Web, security has grown in importance;
therefore, modern workload characterizations should also include
encrypted client-server interactions.
In Table 1 we summarize the core parameters that are involved in the specifi-
cation of the offered workload. The definition of the parameters is oriented to the
user session and resembles that described in [7,8,23]. The first set of parameters
reviews some basic terminology, the second contains user-oriented parameters,
while the third concerns Web object characteristics.
Table 1. Main parameters involved in the specification of Web workload.
Name                       Meaning
Web page                   A collection of objects constituting a multipart document
                           intended to be rendered simultaneously; the base object
                           is fetched first from the server, then it is parsed, and
                           all embedded objects are subsequently requested
User session               A sequence of requests for Web pages (clicks) issued by
                           the same user during an entire visit to the Web site
Session length             The number of Web pages constituting a user session
Session interarrival rate  The rate at which new user sessions are generated
User think time            The time between two consecutive Web page retrievals
Object sizes               The size of the collection of objects stored on the Web system
Request sizes              The size of objects transferred from the Web system
Object popularity          The relative frequency of requests made to individual objects
Embedded objects           The number of objects (not counting the base object)
                           composing a single Web page
Temporal locality          How likely a requested object will be requested again in the
                           near future

Characterization of the Request Stream. There are several possibilities to
generate the stream of Web requests that will reach the tested system. The choice
of a methodology affects the characteristics of the offered Web workload as
well as the mapping of the synthetic content on the Web-server system that
will be analyzed in Sect. 5. As shown in Fig. 3, the approaches to generating the
stream of Web requests fall into four main categories.
[Figure 3: tree diagram. The Web request stream specification branches into four
approaches: trace-based, filelist-based, analytical distribution-driven, and hybrid.]
Fig. 3. Possible approaches to generate the stream of Web requests.
In the trace-based approach, the characteristics of the Web workload are based
on pre-recorded (or synthetically generated) trace logs derived from server access
logs [20]. The workload characteristics can be reproduced by replaying (or
sampling) the requests as logged in the trace. An alternative is to create an
abstract model of the Web site and extract session-oriented high-level information
(such as session lengths and inter-arrival times) through a preliminary trace
analysis that pre-processes server logs [25]. Some techniques to infer Web session
characteristics from trace logs have been described in [1,27]. The trace-based
approach allows the benchmark tool to mimic the user behavior in a realistic way.
However, the conclusions drawn from the experiments depend on the
representativeness of the trace, as a trace can present workload properties that
are peculiar to it and do not have general validity. Furthermore, it can be hard
to adjust the workload to imitate future conditions or varying demands.
It should also be remarked that, unlike in the early days of the Web, server
access logfiles have become a precious source of business and marketing
information. As a consequence, companies and organizations are not willing to give
away their traces for free (or even at all), or they release them only after
years, when the realism of these traces is doubtful at best. A further issue of
the trace-based approach regards the reconstruction of the user sessions from the
trace logs, which is not a trivial task [1]. For example, as sessions are
identified through the client IP address, clients behind the same proxy may be
treated as coming from the same machine, which may lead to an improper
characterization of the Web workload. Another issue that may complicate the
reconstruction of user sessions, especially for highly accessed Web systems,
concerns the coarse time resolution at which requests are recorded in server
access logs [20].
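The session reconstruction problem can be made concrete with a small sketch. It
assumes that each log entry has already been parsed into a (timestamp, client IP,
URL) tuple and that a session ends after a period of inactivity; the 30-minute
timeout and the function name are illustrative assumptions, not values taken from
[1] or [20].

```python
from collections import defaultdict

SESSION_TIMEOUT = 1800  # seconds of inactivity that end a session (assumed)


def reconstruct_sessions(log_entries, timeout=SESSION_TIMEOUT):
    """Group (timestamp, client_ip, url) entries into per-IP sessions.

    Clients behind the same proxy share one IP address and are therefore
    merged into the same session, which is the weakness noted in the text.
    """
    by_ip = defaultdict(list)
    for timestamp, ip, url in sorted(log_entries):
        by_ip[ip].append((timestamp, url))

    sessions = []
    for ip, hits in by_ip.items():
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:
                # A long silence closes the current session.
                sessions.append((ip, current))
                current = []
            current.append(hit)
        sessions.append((ip, current))
    return sessions
```

With timestamps recorded at one-second resolution, several requests of a busy
site may carry the same timestamp, so their relative order within a session
cannot be recovered from the log alone.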
In the filelist-based approach, the tool provides a list of Web objects with their
access frequencies. The object sizes are typically based upon the analysis of logs
from several Web sites. During the workload generation phase, the next object to
be retrieved is chosen on the basis of its access frequency. Time characteristics
are typically not taken into account; hence the stream of requests depends only on
the filelist, while the request inter-arrival time is fixed. The filelist approach
lacks flexibility with respect to the workload specification, and also ignores the
concept of user sessions. As discussed in [3,4,8,12], Web traffic is bursty,
session-oriented, and characterized by heavy-tailed distributions, which have high
or even infinite variance and therefore show extreme variability on all time scales.
To emulate these workload characteristics, it is not sufficient to mimic the user
activity by requesting a set of files as quickly as possible; it is necessary to
provide some support for modeling the session-oriented nature of Web traffic.
As a consequence, a benchmark that uses just a filelist is not able to reproduce
a realistic Web workload. When using filelists, the only feasible alternative is
to provide some support for defining the characteristics of a user session (such
as user think times); otherwise the workload generator will not be able to emulate
a realistic load. Furthermore, the overall size of the file set being used should be
checked to ensure that the server caching mechanism is fully exercised.
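A minimal sketch of frequency-driven object selection follows. The filelist
contents and the function name are hypothetical; real tools use their own filelist
formats.

```python
import random

# Hypothetical filelist: (path, relative access frequency).
FILELIST = [
    ("/index.html", 0.50),
    ("/img/logo.gif", 0.30),
    ("/docs/manual.pdf", 0.15),
    ("/video/intro.mpg", 0.05),
]


def next_requests(filelist, count, seed=None):
    """Draw `count` objects, each with probability proportional to its
    access frequency. No timing or session information is produced."""
    rng = random.Random(seed)
    paths = [path for path, _ in filelist]
    weights = [freq for _, freq in filelist]
    return rng.choices(paths, weights=weights, k=count)
```

The output is only a sequence of object names, which illustrates why a plain
filelist cannot express think times or session boundaries.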
In the analytical distribution-driven approach, the Web workload characteristics
are specified by means of mathematical distributions. The requests are issued
according to the parameters of the workload model. The probability distributions
may be used to generate random values that reproduce all the characteristics of
the request stream during the execution of the benchmarking test. An alternative
is to pre-generate all user sessions and the resulting sequence of requests, and
to store them in a trace file that will be used by the workload generator. The
analytical distribution-driven approach allows a tool to define a detailed Web
workload characterization because all features are specified through mathematical
models. One may question the realism and accuracy of the workload
characterization, but changing the parameters of a distribution, or the
distribution itself, to evaluate the performance under different conditions is easy.
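The pre-generation alternative can be sketched as follows. The choice of a
geometric session length and Pareto think times, as well as every parameter value,
is an assumption made for illustration; a real study would fit the distributions
to measured traffic.

```python
import random


def generate_session(rng, mean_pages=5.0, think_shape=1.5, think_min=1.0):
    """Return a list of think times (seconds), one per page of a session.

    Session length is geometric with the given mean, and think times are
    Pareto distributed; both choices are illustrative assumptions.
    """
    pages = 1
    # Continue the session with probability 1 - 1/mean_pages after each page.
    while rng.random() < 1.0 - 1.0 / mean_pages:
        pages += 1
    # paretovariate(alpha) returns values >= 1, so scale by the minimum.
    return [think_min * rng.paretovariate(think_shape) for _ in range(pages)]


def pregenerate_trace(num_sessions, seed=0):
    """Pre-generate all sessions so they can be stored in a trace file."""
    rng = random.Random(seed)
    return [generate_session(rng) for _ in range(num_sessions)]
```

Changing `think_shape` or replacing `paretovariate` with another generator is
all that is needed to test a different workload, which is the flexibility the
text refers to.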
The hybrid approach is a mix of the filelist and analytical techniques.
For example, the objects to be accessed may be specified through a filelist,
while the parameters that shape the session-oriented nature of the workload,
such as session lengths and user think times, are modeled through analytical
(stochastic) distributions.
Web Client Characterization. The first important characterization of a
Web client regards the choice between an open and a closed loop model.
In a closed model, a pre-determined number of clients send requests only after
having received the previous server responses. Although this model does not
give a realistic view of the offered load, it is adopted by several tools that aim
to evaluate the performance of a Web system subject to constant load. However,
this behavior becomes unrealistic and unacceptable for a distributed Web-server
system under heavy load conditions. Indeed, as Web traffic increases, clients
spend most of their time waiting for responses and, in practice, issue
requests at the rate imposed by the system responses. This situation
is far from reality, in which clients access a popular distributed Web site
concurrently and independently of the server responses. Hence, an open client
model, characterized by periodic client interarrival times, is typically preferred
when evaluating the performance of a distributed Web-server system.
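The difference between the two models can be seen by computing when requests are
issued. The sketch below is a deliberately simplified model with one emulated
client, constant response times, and hypothetical function names.

```python
def closed_loop_issue_times(num_requests, response_time, think_time=0.0):
    """One emulated client: each request waits for the previous response."""
    times, now = [], 0.0
    for _ in range(num_requests):
        times.append(now)
        now += response_time + think_time
    return times


def open_loop_issue_times(num_requests, interarrival):
    """Requests are issued on schedule, regardless of server responses."""
    return [i * interarrival for i in range(num_requests)]


def offered_rate(times):
    """Average number of requests issued per second."""
    return (len(times) - 1) / (times[-1] - times[0])
```

If the server response time grows from 0.1 s to 1 s, the closed-loop client drops
from about 10 to about 1 request per second, whereas the open-loop generator
with a 0.1 s interarrival time keeps offering about 10 requests per second.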
Another main feature related to the client requests is the version of the
HTTP protocol supported by the emulated browser. The client should be
capable of requesting objects using both HTTP/1.0 and HTTP/1.1. Indeed, the
latter provides some interesting features (such as persistent connections, request
pipelining, and chunked transfer encoding [20]) which affect the performance of
the Web system under test [8,19]. In particular, persistent connections are
used to limit the number of open TCP connections (thereby reducing resource
consumption on the Web-server system) and to avoid slow start each time a new
object is requested. It is also important to fully support the various
request methods (GET, POST, HEAD). Further issues regard the possibility of
tracking sessions via cookies and of supporting SSL/TLS encryption in order to
request secure Web services.
To properly mimic the resource usage of the Web-server system, the emulated
client could also use multiple parallel connections for the retrieval of the
embedded objects in a Web page. Although this technique is discouraged because of
its impact on the Web servers, it is commonly employed by modern browsers (together
with closing active connections by means of TCP resets) to reduce the latency
experienced by users. This implies that the browser behavior cannot be naively
emulated by a simple model in which the client opens a single TCP connection
at a time for the retrieval of a single Web object.
4.2 Requirements for Distributed Web-Server Systems
In this section we identify the requirements on the workload characterization
component that are needed to benchmark distributed Web-server systems. Besides
workload characteristics that mimic those of real Web traffic as closely as
possible and an open system model for client requests, the distinguishing feature
that characterizes the benchmarking of distributed Web-server systems regards the
mechanisms supported by the client for request routing.
No particular support is required for benchmarking Web clusters, as the
Web switch completely masks the distributed nature of the architecture from the
clients, which interact with the Web system as if it were a single server node. On
the other hand, some request routing support must be provided for benchmarking
geographically distributed Web-server systems, in which multiple IP addresses
may be visible to client applications. The most important feature to add to the
client model is the DNS mechanism with all main steps related to the address
lookup phase. This would allow us to test the impact of alternative routing
mechanisms, such as DNS-based routing, URL rewriting, and HTTP redirection [13].
To support the last technique, the client should also be able to redirect the
request as indicated in the response header.
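A client model with these two capabilities might look like the following sketch.
The `resolve` and `send` callables are hypothetical hooks standing in for the real
address lookup and HTTP exchange, so that the routing logic can be shown without
network access.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307}  # common HTTP redirection codes


def fetch_with_redirects(url, resolve, send, max_redirects=5):
    """Follow HTTP redirections, resolving each hostname through DNS.

    `resolve(hostname) -> ip` and `send(ip, url) -> (status, headers)` are
    hypothetical callables supplied by the benchmark.
    Returns the final status and the list of (ip, url) pairs contacted.
    """
    visited = []
    for _ in range(max_redirects + 1):
        ip = resolve(urlsplit(url).hostname)  # address lookup phase
        status, headers = send(ip, url)
        visited.append((ip, url))
        if status not in REDIRECT_STATUSES:
            return status, visited
        # The Location response header names the new target of the request.
        url = urljoin(url, headers["Location"])
    raise RuntimeError("too many redirections")
```

Calling `resolve` once per session, rather than once per test, is what allows a
DNS-based routing policy to influence where each session is served.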
4.3 Comparison of Selected Tools
In this subsection we analyze how the selected Web benchmarks specify their
workload. We appreciate that most benchmark tools allow us to customize and
extend the workload model in order to test different scenarios. On the other
hand, the options for workload configuration of the SPECweb and TPC-W
benchmarks are quite limited because their goal is to measure the performance of
different systems in a well-defined and standardized scenario. Obviously, we do
not penalize these benchmarks for a limit that is intrinsic to their design.
Httperf permits two approaches to generate the request stream, namely hybrid
and trace-based [32]. Both methods enable a session-oriented workload
characterization and requests for both static and dynamic services. In the hybrid
approach, single or multiple URL sequences may be specified, together with
some session-oriented parameters, such as user think times. In the trace-based
approach, user sessions are defined in a trace file. The requests are issued
according to an open model. Both HTTP/1.0 and HTTP/1.1 protocols are fully
supported, including cookies (although only one cookie per user session). Basic
SSL support is provided, including the possibility of specifying session
reuse, which is an important feature as it avoids a full handshake on every client
request. Httperf also allows the user to specify some realistic browser
characteristics, such as the use of multiple concurrent connections.
SURGE relies on an analytically generated workload aimed at dealing with
the self-similarity of Web traffic [7,8]. The workload model
derives from empirical analysis of Web server usage to mimic real-world traffic
properties. In SURGE, the workload is measured in terms of User Equivalents,
each defined as a single process in an endless loop that alternates between
requests and think times. Therefore, the user behavior is modeled as a bursty
two-state ON/OFF process, where ON periods correspond to the transfer of Web
objects, and OFF periods correspond to the silent intervals after all objects in a
Web page have been retrieved. It has been demonstrated that the superposition
of a large number of ON/OFF sources results in self-similar traffic if the
durations of the ON and OFF phases are described by heavy-tailed distributions
[12,43]. The characteristics of the request stream are specified through
heavy-tailed distributions for file size, request size, file popularity, embedded
object references, temporal locality, and OFF times. Support for HTTP/1.0 and
HTTP/1.1 protocols is provided (the latter with request pipelining), while no
security support is provided. The browser activity is emulated using only one
connection at a time. SURGE remains the most accurate tool for the
characterization of static requests. Its main limits, especially for the analysis
of multi-tier Web systems, are that the workload model does not take into account
requests for dynamic services and that the generation of requests follows a
closed-loop model.
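The ON/OFF behavior of a single User Equivalent can be sketched as a generator.
This is not SURGE code: the Pareto OFF times follow the heavy-tailed idea described
above, but the parameter values, the fixed per-object transfer time, and the
bounded number of pages (instead of an endless loop) are assumptions made to keep
the example short.

```python
import random


def user_equivalent(rng, pages, off_shape=1.5, off_min=1.0, max_embedded=5,
                    transfer_time=0.2):
    """Yield ('ON', duration) and ('OFF', duration) periods for one user.

    ON covers the transfer of a page (base object plus embedded objects);
    OFF is the silent interval after the whole page has been retrieved.
    All parameter values are illustrative assumptions.
    """
    for _ in range(pages):
        objects = 1 + rng.randint(0, max_embedded)  # base + embedded objects
        yield "ON", objects * transfer_time
        yield "OFF", off_min * rng.paretovariate(off_shape)
```

Running many such generators in parallel and summing their ON periods over time
gives the superposition of ON/OFF sources mentioned in the text.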
The S-Clients workload is intentionally not realistic, being characterized by
a single file that is requested at a specified fixed rate [6]. This choice provides
excellent measurements of the server performance and capacity, but does not
exercise other system resources, starting from the disk, as the file is always
served from the cache. With S-Clients it is not possible to specify any browser
behavior, only the plain HTTP/1.0 protocol is supported, and no session encryption
is allowed. These aspects make the workload characterization provided by S-Clients
inappropriate from the point of view of workload realism, although its
sustained load solution for stress testing distributed Web-server systems is
valuable, as discussed in Sect. 6.
WebStone specifies the characteristics of the request stream through a
filelist [30]. The benchmark workload includes both static and dynamic services,
the latter generated through CGIs and server APIs. Since the maximum size of
the filelist is limited to 100 files, it is difficult to model typical workloads of
distributed Web-server systems, which consist of thousands of files. Moreover,
there is no way of specifying a session-oriented workload, since requests are
intended to be issued consecutively. The workload is generated following a closed
loop model. The emulation of the browser characteristics is quite limited, as
WebStone supports only standard HTTP/1.0 without keep-alive. Support for
encryption and authentication is not officially included, although a patched
version exists which enables it [31].
WebBench follows a hybrid approach, where the workload characterization
is done through test suites, that is, appropriate combinations of request streams
(which model specific user interactions) along with their reproduction
modalities [45]. Static, dynamic (CGI and API), and secure services may be
configured. Both HTTP/1.0 and HTTP/1.1 protocols are supported. The two main
drawbacks are the impossibility of specifying the session-oriented nature of
client requests and the closed loop model.
Web Polygraph permits a fairly complete specification of the Web workload,
characterized by a session-oriented request stream, Web pages, popularity of files,
cacheability at the client, and server delays due to network congestion [42]. Many
of these properties may be specified through probability distributions. Requests
may be issued through both HTTP/1.0 and HTTP/1.1 protocols, in an open or
closed loop model. An interesting feature is the presence of preconfigured
Web workloads oriented to layer-4 and layer-7 Web clusters.
A separate observation applies to the workloads of the SPECweb99 and TPC-W
benchmarks. They define standardized Web workloads
that are not intended to be customized by the user. Hence, they cannot be used
to define workloads for different categories of Web sites. The basic workload of
SPECweb99 [38] includes both static and dynamically generated content, while
an enhanced version also supports secure services [39]. The static workload is
characterized by four classes of file sets, modeling different types of Web servers
and spread over a precomputed number of directories. Directory access and class
access are chosen according to a Zipf distribution. The dynamic workload models
two common features of commercial Web servers: advertising and user
registration. The client model is closed because a fixed number of clients is
executed during each experiment.
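Zipf-distributed access can be illustrated with a short sketch. It does not
reproduce the actual SPECweb99 parameters; the exponent of 1 and the function
names are assumptions.

```python
import random


def zipf_weights(n, exponent=1.0):
    """Unnormalized Zipf weights: rank r gets weight 1 / r**exponent."""
    return [1.0 / rank ** exponent for rank in range(1, n + 1)]


def pick_directory(rng, num_directories):
    """Pick a directory index (0 = most popular) with Zipf probability."""
    weights = zipf_weights(num_directories)
    return rng.choices(range(num_directories), weights=weights, k=1)[0]
```

With this weighting the most popular directory is requested ten times as often as
the tenth one, so a few directories receive most of the accesses.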
The TPC-W benchmark specification (note that it is not a tool) defines
the details of the Web services and content at the site and the workload
offered by clients [26,41]. It specifies a database structure oriented to e-commerce
transactions for an online bookstore, together with its Web interface. Clients
are characterized by Web interactions, that is, well-defined sequences of Web
page traversals that pursue particular actions such as browsing, searching, and
ordering. Request streams are session-oriented, with think times between Web
page retrievals. The specification also includes secure connections because some
client actions (e.g., online buying) require SSL/TLS encryption.
No tool provides explicit support for DNS routing, which is of key importance in
geographically distributed Web-server systems and Content Delivery Networks.
Most benchmarking tools perform only one DNS lookup at the beginning of the
test, which is unrealistic since a DNS lookup is needed for each client session.
There is also little support for other (application-based) request routing
mechanisms; for example, only Web Polygraph and WebBench support HTTP
redirection.
5 Content Mapping on Servers
An interesting issue for a benchmarking tool for distributed Web-server systems
is the replication of the synthetic content among the multiple server nodes.
Once the synthetic workload has been specified, it must be replicated on the
Web nodes composing the Web-server system before the workload generation
engine starts to generate the request stream. This is an error-prone
operation that should be automated as much as possible. Another peculiarity
of distributed Web-server systems is that the replication strategy may differ on
the basis of the system architecture, because the content is not always replicated
among all the servers.
Let us first examine the problem of mapping the Web site content onto
the Web servers in the case of one Web server, for which we identify three
alternatives: full support, partial support, and no support.
The most attractive feature for the benchmark user is full support:
once the benchmark user provides the specification of the entire Web site
content (the tree of static documents as well as the set of data to be placed on
the back-end servers), the content is automatically generated and uploaded to the
Web and back-end server disks. Partial support means that only a portion of the
Web site content (that is, static documents) is put on the server disk, while other
content (that is, dynamic services) is left up to the benchmark user. If the
benchmark does not provide any support for content generation and mapping, the
content must be generated and uploaded manually to the server. Manual
generation is error-prone and is often infeasible due to the large number of files
involved. Thus, the presence of a mapping component is strongly encouraged.
WebStone provides partial support for Web content creation [30]. It is
possible to specify and generate a set of static files with given sizes, while
dynamic content creation is left to the user. The other Web benchmarking tools,
although in some cases they provide predefined Web content (SPECweb99,
WebBench), neither map the content across different Web servers nor
install it. Every decision is left to the benchmark user.
The benchmark study of a distributed Web-server system has an additional
requirement because the site content may be fully replicated, partially replicated,
or partitioned among the multiple server nodes. The last two configurations are
typically used to increase the secondary storage scalability [10,44] or to enhance
the features of specialized server nodes providing dynamically generated content
or streaming media files. It is also important to observe that full replication can
be easily avoided only if we use a layer-7 Web switch that can take content-aware
dispatching decisions. An alternative is to use a layer-4 Web switch combined
with a distributed file system, because any selected server node should be able
to respond to client requests for any part of the Web site content.
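The three placement configurations, and the reason why a content-aware switch is
needed when replication is not full, can be sketched as follows. The hash-based
placement rule and all names are illustrative assumptions; none of the surveyed
tools provides such a utility.

```python
import zlib


def place_content(paths, servers, strategy="full", copies=2):
    """Return {server: [paths]} for one replication strategy.

    'full' copies every object to every node, 'partition' stores each
    object on exactly one node, and 'partial' stores each object on
    `copies` consecutive nodes. Hash-based placement is an assumption.
    """
    if strategy not in ("full", "partition", "partial"):
        raise ValueError("unknown strategy: " + strategy)
    layout = {server: [] for server in servers}
    for path in paths:
        if strategy == "full":
            targets = servers
        else:
            n = 1 if strategy == "partition" else copies
            first = zlib.crc32(path.encode()) % len(servers)
            targets = [servers[(first + i) % len(servers)] for i in range(n)]
        for server in targets:
            layout[server].append(path)
    return layout


def layer7_dispatch(path, layout):
    """Content-aware choice: only nodes that hold the object are eligible."""
    return [server for server, stored in layout.items() if path in stored]
```

A layer-4 switch cannot inspect `path`, so without full replication or a
distributed file system it could select a node that does not hold the object.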
We can easily observe that none of the selected benchmarking tools includes
any utility for full or partial content replication among multiple servers.
6 Workload Generation Engine
An important component of a Web benchmarking tool is the workload generation
engine, which is responsible for reproducing the specified workload in the most
accurate and efficient way.
Distributed Web-server systems are characterized by a huge number of accesses,
which have to be emulated with a usually limited amount of resources.
This may be obtained by generating and sustaining overload [6], that is,
constantly offering a load that exceeds the capacity of the distributed Web system.
In this section, we identify the main features of workload generation, analyze the
requirements that are specific to distributed Web systems, and discuss how the
selected Web benchmarking tools behave with respect to the identified features
and requirements.
The two main features of a workload generator are the engine architecture, which
denotes the computational units used to generate Web traffic (processes or threads)
and their mutual interactions, and the coordination scheme, which defines how the
executions of the computational units are configured and synchronized.
Engine Architectures. We give a possible taxonomy of workload generator
architectures in Fig. 4. In a centralized architecture, a single instance of the
workload generator runs on a single node, whereas in distributed architectures
the engine is spread across multiple nodes.
[Figure 4: tree diagram. The workload generation engine runs either on a single
node (centralized) or on multiple nodes (distributed); in both cases each node may
use a single process, multiple processes, multiple threads, or a hybrid of
processes and threads.]
Fig. 4. Architecture of a workload generator.
The architecture characterization defines the nature of the computational
units on each client node. In single-process architectures, one process is
responsible for the generation of the whole workload on the node on which it is
running. In multiple-process architectures, the task of generating client requests
is split up among several user-level processes. The multiple-process approach is
relatively straightforward, but suffers from two drawbacks. First, it is
CPU-intensive because of frequent context switches, especially when many user
processes are spawned on the same machine. Second, since process address spaces are
usually separate, most information (e.g., the workload configuration) must be
replicated, thus wasting main memory, which is an important resource for the
scalability of the load generated by the client node.
In multi-threaded architectures, light-weight processes sharing the same
address space are used to generate the appropriate portion of workload, while in
hybrid architectures each node runs several user processes, each handling
multiple threads. The multi-threaded architecture does not suffer from the context
switch drawback. In general, light-weight processes provide better scalability,
but multi-threaded programming involves a higher degree of complexity.
Sharing the address space leads to better memory utilization than in the
multi-process architecture, at the cost of implementing synchronization
primitives that could block client activity. Finally, several threads usually share
one set of system resources, which could be exhausted (for example, the file
descriptor set used to reference TCP connections).
The hybrid architecture aims to combine the advantages of multi-threaded
architectures (lower CPU overhead due to less frequent context switches) with
those of multi-process architectures (increase in available system resources such
as socket descriptors).
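The structure of a multi-threaded engine, including the synchronization cost
mentioned above, can be sketched as follows. The `send` callable is a hypothetical
hook that issues one request; in a hybrid architecture, several processes would
each run a function like this one. Note that in CPython the global interpreter
lock limits CPU parallelism among threads, so the sketch illustrates the
structure rather than the achievable load.

```python
import threading


def run_threads(num_threads, requests_per_thread, send):
    """Generate load from several threads sharing one address space.

    `send(thread_id, seq)` is a hypothetical callable that issues one
    request. The shared counter must be protected by a lock, which is
    the kind of synchronization primitive that may block client activity.
    """
    completed = 0
    lock = threading.Lock()

    def worker(thread_id):
        nonlocal completed
        for seq in range(requests_per_thread):
            send(thread_id, seq)
            with lock:  # protect the shared statistic
                completed += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed
```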
Coordination Schemes. The task of configuring and coordinating the execution
of the computational units may be performed manually or automatically. In
the latter case, two coordination schemes are possible: master-client and
master-collector-client.
In the master-client scheme (see Fig. 5), the client generation task is delegated
to a master component, which reads the configuration and performs several op-
erations. First, it decides how many computational units have to be started and
how they are distributed among the client nodes, in order to offer the specified
workload. Then, it distributes part of the workload specification (for example,
the filelist) among all clients. Finally, it synchronizes the start of the benchmark-
ing experiment.
[Figure 5: a master node running the master process coordinates client nodes 1 to
K, each running several client processes that issue requests to the distributed
Web system.]
Fig. 5. The master-client coordination scheme.

The master-client approach is further extended in the master-collector-client
coordination scheme, illustrated in Fig. 6. One or more collector processes are
activated on each client node, either manually or automatically through a master
process (for clarity of representation, Fig. 6 shows only one collector). The
master connects to each collector, distributes the workload configuration, and
synchronizes the start of the benchmarking experiment. Each collector reads its
portion of the configuration from the master, spawns the necessary number of
computational units, and waits for a start signal from the master. Master,
collector, and clients are logically separated, but they may reside on the same
node.
[Figure: client nodes 1 to K, each running a collector and several clients; a master on the master node; the distributed Web system.]
Fig. 6. The master-collector-client coordination scheme.
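The synchronization steps of the master-collector-client scheme can be sketched as follows. This is only an illustration under strong assumptions: master, collectors, and clients are emulated by threads on a single machine, and the names run_experiment, collector, and client are hypothetical and do not belong to any of the tools discussed in this chapter.

```python
import threading


def run_experiment(units_per_collector):
    """Emulate master -> collectors -> clients with threads (illustrative only)."""
    start = threading.Event()  # start signal issued by the master
    # Every collector plus the master must reach this barrier.
    ready = threading.Barrier(len(units_per_collector) + 1)
    lock = threading.Lock()
    started = []  # (collector id, client id) pairs that actually ran

    def client(cid, uid):
        start.wait()  # a client does nothing until the master says go
        with lock:
            started.append((cid, uid))

    def collector(cid, n_units):
        # The collector spawns its share of computational units ...
        clients = [threading.Thread(target=client, args=(cid, u))
                   for u in range(n_units)]
        for t in clients:
            t.start()
        ready.wait()  # ... and reports to the master that it is ready
        for t in clients:
            t.join()

    collectors = [threading.Thread(target=collector, args=(c, n))
                  for c, n in enumerate(units_per_collector)]
    for t in collectors:
        t.start()
    ready.wait()  # master waits until all collectors are ready
    start.set()   # synchronized start of the experiment
    for t in collectors:
        t.join()
    return sorted(started)
```

In a real tool the barrier and the start signal would be messages exchanged over the network between distinct nodes; the thread primitives used here only mimic that ordering.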
We can conclude that it is clearly preferable to have an automated generation
of client emulators among different nodes rather than relying on manual activation.
This is especially true if the coordinator is able to share the Web workload among
multiple nodes according to the capacity of each client node.
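One possible way for a coordinator to share the workload according to node capacity is a proportional split with largest-remainder rounding. The sketch below is our own assumption, not the policy of any specific tool; the function name split_by_capacity and the notion of a numeric capacity weight per node are hypothetical.

```python
def split_by_capacity(total_units, capacities):
    """Assign computational units to client nodes in proportion to capacity.

    capacities holds one positive weight per node (hypothetical measure,
    for example a relative CPU rating). Largest-remainder rounding keeps
    the sum of the shares equal to total_units.
    """
    total_capacity = sum(capacities)
    exact = [total_units * c / total_capacity for c in capacities]
    shares = [int(x) for x in exact]  # round every share down first
    leftover = total_units - sum(shares)
    # Give the remaining units to the nodes with the largest fractional part.
    by_fraction = sorted(range(len(capacities)),
                         key=lambda i: exact[i] - shares[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares
```

For instance, with 7 units and capacities 3 and 1, the exact shares are 5.25 and 1.75, so the extra unit goes to the second node.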
6.2 Requirements for Distributed Web-Server Systems

Distributed Web-server systems are typically subject to a large amount of traffic,
which has to be reproduced to evaluate their performance under
realistic conditions. Thus, the scalability of the workload generation engine is
a strict requirement for a benchmarking tool. A single-node architecture is not
adequate, since operating system resource constraints typically limit the number
of concurrent clients. The generation of Web workload should be distributed
across as many nodes as possible. The number of client nodes available for tests
is usually limited to a few tens. Thus, it is desirable to generate the maximum
amount of workload on a given node. This holds especially when client nodes
have heterogeneous capacities: an unbalanced assignment of workload may cause
the under-utilization of some nodes and the partial inability to generate
requests on others.
Since the scalability of the workload generation engine is an important requirement
for the performance evaluation of distributed Web systems, we observe that
it is quite difficult to achieve it using a centralized architecture. The only way
to keep one process spawning many concurrent client sessions is to use an
event-driven approach, combined with non-blocking I/O [18]. This single pro-
cess polls the network for events and reacts accordingly. This approach involves
programming non-blocking I/O, which can be tricky and much more difficult
than in a multi-process or multi-threaded model. Moreover, one process may
run out of file descriptors if the machine is not well tuned. On the other hand,
this approach does not suffer from context switch overheads, provided that the
client node does not execute other resource intensive tasks.
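The event-driven approach with non-blocking I/O can be illustrated with a minimal sketch based on the Python standard library modules selectors and socket. It is not the implementation of any tool discussed here. To keep the example self-contained, the same event loop also plays the role of a toy local server; the function name run_clients, the loopback address, and the fixed reply are assumptions made for illustration.

```python
import selectors
import socket


def run_clients(n_clients, reply=b"HTTP/1.0 200 OK\r\n\r\nhello"):
    """Drive n_clients concurrent sessions from one thread (illustrative)."""
    sel = selectors.DefaultSelector()

    # Toy local server socket, present only to make the sketch runnable.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(max(1, n_clients))
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, ("accept", None))

    received = {}
    for i in range(n_clients):
        client = socket.socket()
        client.setblocking(False)
        client.connect_ex(listener.getsockname())  # returns without waiting
        sel.register(client, selectors.EVENT_WRITE, ("send", i))
        received[i] = b""

    def drop(sock):
        sel.unregister(sock)
        sock.close()

    pending = n_clients
    while pending:
        events = sel.select(timeout=5)
        if not events:  # no event for 5 seconds: give up
            break
        for key, _mask in events:
            sock, (state, i) = key.fileobj, key.data
            if state == "accept":    # server side: new connection
                conn, _addr = sock.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, ("serve", None))
            elif state == "serve":   # server side: answer and close
                sock.recv(4096)
                sock.sendall(reply)
                drop(sock)
            elif state == "send":    # client connected: send the request
                sock.send(b"GET / HTTP/1.0\r\n\r\n")
                sel.modify(sock, selectors.EVENT_READ, ("recv", i))
            else:                    # client: read the reply until EOF
                chunk = sock.recv(4096)
                if chunk:
                    received[i] += chunk
                else:
                    drop(sock)
                    pending -= 1
    sel.close()
    listener.close()
    return received
```

Each client socket consumes one file descriptor of the single process, which is why the descriptor limit mentioned above bounds the number of concurrent sessions.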
In this section we analyze how the selected benchmarking tools generate the load
offered to the Web system.
Httperf generates the specified workload through one process, implement-
ing an event-driven approach with non-blocking I/O [32]. As a consequence,
the workload generator keeps a single CPU constantly occupied, so it is recom-
mended not to run more than one httperf process per CPU. Furthermore, the
maximum number of concurrent sessions is bounded by typical process limits
such as the maximum number of open descriptors. As there is no coordination
scheme, several instances of httperf must be executed manually on distinct nodes
to scale to the desired workload; a helper utility can be used to automate this
task [29]. The workload generation engine of httperf is adequate for the performance
evaluation of distributed Web systems.
In SURGE the client activity is modeled through a User Equivalent, which
is represented by a thread [7,8]. The benchmarking experiment is activated by
invoking a master which spawns a predefined number of client processes. Each
client process generates a predefined number of client threads (i.e., User Equivalents).
Therefore, the SURGE architecture can be described as centralized,
multi-process, and multi-threaded. The coordination scheme is master-collector-client,
although on a single node. Since no support is provided to
automatically distribute clients among multiple nodes, several instances of the
SURGE master have to be activated manually on distinct client nodes, in order
to scale the workload.
The workload generator of S-Clients is executed by a single process on one
client node [6]. The engine aims at generating excess load by using non-blocking
connects and closing the socket if no connection was established within a given
interval. There is no means to automatically start different workload generators
on distinct nodes; this operation has to be performed manually. Furthermore,
since timers are implemented using the rdtsc primitive [14], the ability to
generate connections at a specified rate depends heavily on the CPU speed of
the client and on the CPU type, which should be a Pentium. The most interesting
feature of S-Clients for the benchmarking of distributed Web systems is the use
of non-blocking connections combined with timeouts, as it makes it possible to
guarantee a specified connection rate.
In WebStone, the activity of Web users is emulated through a preconfigured,
fixed number of Web clients [30]. The architecture of WebStone is distributed
and multi-process; each Web client is executed as a distinct user process that
requests files continuously. Web clients are distributed over several nodes through
another user process, called Web master. The workload generator of WebStone
is not able to sustain high loads and, consequently, it is not adequate for the
performance evaluation of distributed Web-server systems.
In WebBench, Web clients are emulated through client processes running
on distinct nodes [45]. The architecture of WebBench is distributed and may
be either single-process or multi-threaded. In the former case, each client runs
as a user process (called physical client); in the latter, multiple clients run as
threads (called logical clients). A controller on a distinct node coordinates the
client execution. The recommended coordination scheme is master-client with
one physical client per node. It is also possible to specify a master-collector-client
scheme where logical clients are locally coordinated. In both cases, processes
residing on client nodes must be started manually. The features of the WebBench
workload generation engine are not sufficient for the benchmarking of distributed
Web-server systems.
The workload generation engine in Web Polygraph has a centralized, single-
process architecture, which is capable of sustaining overload [42]. Optionally,
server agents may be used to emulate parts of a distributed Web server system,
besides exercising real components. A drawback of Web Polygraph is the lack
of support for automatically distributing the generation of requests across
multiple client nodes.
The TPC-W benchmark specification requires that client requests be issued
by a given number of “emulated browsers”, which remains constant throughout
the experiment [26,41]. The number of clients is obtained as a function of the
database table size and appropriate scaling factors. As a consequence, it is dif-
ficult to generate a considerable amount of traffic without modifying the Web
content.
SPECWeb99 distributes clients on several machines in order to achieve work-
load scalability [38]. If the operating system supports POSIX threads, clients
are executed as threads, otherwise as processes. Thus, the architecture of the
SPECWeb99 engine is distributed and multi-process or multi-threaded. Clients
are executed by processes called collectors, which must be manually activated
before starting the test. A master process connects to the collectors, sends them
the configuration parameters and synchronizes the runs. The workload genera-
tion allows for a certain degree of scalability but cannot be sustained when the
distributed Web system is under stress.
7 Data Collection and Analysis
The measurement and collection of data during the benchmarking test is of
key importance. If done superficially, it leads to improper conclusions about
the performance of the resources constituting the system. A first issue concerns
the definition of the metrics and the statistics which can yield the most useful
information about the components of the distributed Web-server system. Then
it is important to investigate the data collection strategies, which depend on
the chosen metrics and statistics.
The most common metrics for Web system performance are reported in Table 2 [28].

Table 2. Typical Web performance metrics.

Name                      Meaning
Throughput                The rate at which data is sent through the network
Connection rate           The number of open connections per second
Request rate              The number of client requests per second
Reply rate                The number of server responses per second
Error rate                The percentage of errors of a given type
DNS lookup time           The time to translate the hostname into the IP address
Connect time              The time interval between the initial SYN and the final
                          ACK sent by the client to establish the TCP connection
Latency time              The time interval between the sending of the last byte
                          of a client request and the receipt of the first byte
                          of the corresponding response
Transfer time             The time interval between the receipt of the first response
                          byte and the last response byte
Web object response time  The sum of latency time and transfer time
Web page response time    The sum of Web object response times pertaining to a
                          single Web page, plus the connect time
Session time              The sum of all Web page response times and think times
                          in a user session
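The composite metrics of Table 2 follow directly from the elementary times. The sketch below simply transcribes those definitions; the class ObjectTiming and the function names are hypothetical, and the sample values used with them are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ObjectTiming:
    latency: float   # last request byte sent -> first response byte received
    transfer: float  # first response byte -> last response byte

    @property
    def response_time(self):
        # Web object response time = latency time + transfer time
        return self.latency + self.transfer


def page_response_time(connect_time, objects):
    # Web page response time = sum of object response times + connect time
    return connect_time + sum(o.response_time for o in objects)


def session_time(page_times, think_times):
    # Session time = all Web page response times + all think times
    return sum(page_times) + sum(think_times)
```

As a usage example, a page made of two objects with (latency, transfer) equal to (0.25, 0.5) and (0.125, 0.25) seconds and a connect time of 0.125 seconds has a page response time of 1.25 seconds.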
As Web workload is characterized by heavy-tailed distributions, most performance
metrics may assume highly variable values with non-negligible probability.
Collecting just minimum, mean, and maximum times and error levels is not wrong,
but these metrics may not yield a representative view of the system behavior.
The metrics subject to high variability should be represented by means of higher
moments, percentiles, or cumulative distributions [16,22]. Mean values may
reveal little about peaks due to heavy load. This holds for throughput and response
times (specifically, object and page response times), which may exhibit
high variations from the mean value.
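A small numerical example may help to see why the mean alone is not representative. The response times below are invented for illustration, and percentile is a hypothetical helper implementing the nearest-rank definition with integer arithmetic.

```python
import statistics


def percentile(samples, p):
    """Nearest-rank percentile for an integer p with 0 < p <= 100."""
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # ceil(p * n / 100)
    return ordered[rank - 1]


# Hypothetical response times in seconds: 19 fast replies and one very slow one.
times = [0.1] * 18 + [0.2, 10.0]

mean = statistics.mean(times)      # about 0.6 s, dominated by the outlier
median = statistics.median(times)  # 0.1 s, what a typical request sees
p95 = percentile(times, 95)        # 0.2 s
p99 = percentile(times, 99)        # 10.0 s, the peak hidden by the mean
```

Here the mean is six times the median, yet it still gives no hint that one request in twenty took 10 seconds; the high percentiles expose exactly that.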
These performance statistics require more expensive or more sophisticated
data collection strategies, because measurements should be collected and stored
to allow later creation of histograms. The alternative is to implement techniques
to dynamically calculate the median and other percentiles without storing all
observations [17]. Let us analyze the main approaches to data collection
(that is, record storage, data set processing, and hybrid) and the output analysis
that is strictly related to them.
In the record storage approach every record is stored. The generation of
meaningful statistics is entirely delegated to the output analysis. This technique
allows us to easily compute histograms and percentiles, but it requires an enormous
amount of memory. The main memory is often not sufficient, and the use of
secondary memory introduces other problems, such as delays and possible interference
with the experiment. Moreover, the processing of large amounts of data
tends to be resource expensive even if done post-mortem. Actually, a complete
collection and processing of all measurements is seldom necessary, and the use
of sampling techniques is the best alternative when we want to use the record
storage approach.
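The text does not prescribe a particular sampling technique. One common choice that keeps memory bounded is reservoir sampling (Algorithm R), sketched below as an assumption of ours; the function name reservoir_sample and the optional rng argument are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k records from a stream.

    Memory use is bounded by k regardless of how many measurements arrive.
    """
    rng = rng or random.Random()
    sample = []
    for i, record in enumerate(stream):
        if i < k:
            sample.append(record)    # fill the reservoir first
        else:
            j = rng.randint(0, i)    # uniform in 0..i, both ends included
            if j < k:
                sample[j] = record   # replace a stored record
    return sample
```

Percentiles and histograms computed from the reservoir are then estimates of those of the full set of measurements, with an accuracy that grows with k.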
In the data set processing approach, measurements are not stored directly
into some repository, but are used to update a data set that holds the
statistics of interest. Data set processing does not use great amounts of system
resources such as CPU or memory. This is the standard way of computing performance
indexes which do not require sophisticated statistics, such as minimum,
maximum, and mean values. It would also be possible to implement techniques
that dynamically calculate the median and other percentiles without storing all
observations [17], but even these more complex computations may interfere with
the experiment. The data set may or may not coincide with the set of parameters
presented as final statistics. When they do not coincide, the generation of useful
statistics is partially delegated to the output analysis component that processes
the data set at the end of the benchmarking test.
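A data set of this kind can be maintained in constant memory. The following sketch updates count, minimum, maximum, mean, and variance one measurement at a time using Welford's method; the class name RunningStats is hypothetical, and the sketch does not reproduce the code of any tool discussed here.

```python
class RunningStats:
    """Constant-memory data set: count, min, max, mean, and variance."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def add(self, x):
        # Welford's update: no measurement is stored.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 with fewer than two measurements.
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0
```

In this case the data set (count, mean, internal sum of squares) does not coincide with the final statistics: the standard deviation is derived from it by the output analysis at the end of the test.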
None of the previous techniques is clearly the best. However, we can observe
that sophisticated statistics are really necessary only for those metrics which are
subject to high variance. In many other cases, min, max and mean values are
acceptable. For this reason, we also consider the hybrid approach, which is a mix of
the previous two techniques. Each measurement may be stored, used to update
a data set, or both. This approach leads to a better trade-off between
main memory utilization and usefulness of the collected data. The performance
indexes that do not require sophisticated statistics may be computed
at run time; for the other indexes we can store the related measurements and
postpone the evaluation to the output analysis after the experiment.
When multiple client emulators are used, it is necessary to use the data sets
and samples stored by each of them to compute the final metrics which are
presented to the benchmark user in a clear form. This operation is mandatory
in the case of distributed Web-server systems.
7.2 Requirements for Distributed Web-Server Systems

A typical benchmarking tool for distributed Web-server systems distributes the
generation of high volumes of Web traffic across different client nodes. Data
collection is usually done at the level of each computational unit. While it is
good to have per-process (or per-thread) statistics, it is certainly crucial to have
global reports, to understand how well the whole system has performed. To
obtain global session statistics, cumulative distributions and percentiles (not
only per-node statistics), data sets and records must be aggregated before the
computation of global statistics. Therefore, aggregation of collected data is a key
feature that should characterize all tools for distributed Web-server systems. We
also consider it important to have session-oriented statistics, that is, final reports
including metrics relative to user sessions, in addition to global statistics, which
are quite useful for evaluating the performance of the whole system.
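The aggregation step can be sketched as follows, assuming that every client reports a small data set (count, sum, min, max) and a histogram of response times with common bucket boundaries. The dictionary layout and the function names are hypothetical. Note that global percentiles are obtained from the merged histogram, since averaging per-node percentiles does not in general give the global value.

```python
from collections import Counter


def merge_data_sets(data_sets):
    """Combine per-client data sets into one global data set."""
    count = sum(d["count"] for d in data_sets)
    total = sum(d["sum"] for d in data_sets)
    return {
        "count": count,
        "sum": total,
        "min": min(d["min"] for d in data_sets),
        "max": max(d["max"] for d in data_sets),
        "mean": total / count,
    }


def merge_histograms(histograms):
    """Add up per-client histograms (bucket upper bound -> occurrences)."""
    merged = Counter()
    for h in histograms:
        merged.update(h)
    return merged


def histogram_percentile(hist, p):
    """Nearest-rank percentile (integer p) read from a merged histogram."""
    n = sum(hist.values())
    rank = max(1, (p * n + 99) // 100)
    seen = 0
    for bucket in sorted(hist):
        seen += hist[bucket]
        if seen >= rank:
            return bucket
```

The result of histogram_percentile is the upper bound of the bucket containing the requested rank, so its resolution is limited by the bucket width chosen by the clients.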
Besides the previous considerations, there is a serious problem that makes
traditional benchmarking tools for Web servers unsuitable for collecting important
statistics for an accurate performance evaluation of distributed Web-server systems,
especially in the design and prototype phase, when alternative
architectures and solutions must be evaluated. Indeed, all considered tools have
been designed for the interaction of multiple clients with one server and give
global metrics that cannot take into account that the server side consists of multiple
components usually running on different machines. In a distributed Web-server
system, the interaction of the client with the system consists of several
steps, such as switching to the right server and invoking the appropriate process
for the generation of dynamic content. The delay of each of these phases contributes
to the response time seen by the clients. A high response time means bad
system performance, but it does not indicate where the bottleneck is. The overhead
associated with each phase of the Web transaction must be measured and
evaluated, since bottlenecks in one component make the whole system slower.
Some of the phases of a Web transaction in a distributed system are very hard
(if not impossible) to measure at run time without making modifications to the
system components. For example, in a locally distributed Web system, the time
required by the switch to dispatch a client request cannot be measured from the
client side. In other cases, the performance of some components may be inferred
from the external performance metrics. For example, in one-way Web clusters with
a layer-4 Web switch, the initial client SYN is processed by the switch and sent
to the appropriate Web server, which establishes the TCP connection. Thus, the
connect time embodies switch and server latencies, leaving us in doubt
about which node is potentially overloaded. Instead, the latency time is an approximate
measure of server performance, since TCP segments do not pass through the
switch once the connection has been established with the appropriate server.
In layer-7 one-way Web clusters, the opposite is true. Connect time is an
approximate measure of switch overload, since it establishes TCP connections
with clients prior to assigning requests to the appropriate servers. On the other
hand, the latency time embodies the Web switch and server delays, since every
client TCP segment directed to a server passes through the Web switch. In this
case, the latency time does not give sufficient information to localize a possibly
overloaded Web system component.
If one-way architectures allow an approximate evaluation of component per-
formance, this estimation is practically impossible in two-way architectures, since
both packet flows pass through the switch. Hence, the above-mentioned procedure
may lead to gross evaluation errors. In general, there is no way to measure
the performance of the Web switch and of the individual servers through client
measurements. Therefore, the right approach is to enable logging at every
system component and to analyze the resulting logs at the end of the test.
Monitor facilities and a log analyzer are required for this purpose. They should
be highly configurable because different applications may have different logfile
formats. Analyzing log outputs may require integration or modifications of the
network application software because the standard logs have too coarse granu-
larities (e.g., 1 second in the Apache server). Moreover, the statistics obtained by
the internal monitors must be integrated with those of the benchmark reports.
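As an illustration of what a minimal log analyzer could do, the sketch below counts requests and errors per second from server log lines in common log format. The regular expression, the function name per_second_counts, and the sample lines used to exercise it are our own assumptions; a real analyzer would have to be configurable for other logfile formats.

```python
import re
from collections import Counter

# host ident user [timestamp] "request line" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)')


def per_second_counts(lines):
    """Return (requests, errors) counters keyed by the log timestamp.

    The timestamp string has a granularity of one second, so each key
    identifies one second of server activity.
    """
    requests, errors = Counter(), Counter()
    for line in lines:
        match = CLF.match(line)
        if match is None:
            continue  # skip lines in an unexpected format
        timestamp, status = match.group(2), match.group(4)
        requests[timestamp] += 1
        if status[0] in "45":  # 4xx and 5xx responses counted as errors
            errors[timestamp] += 1
    return requests, errors
```

Because all requests served within the same second share one key, this kind of log cannot order or time events at a finer scale, which is the granularity limitation noted above.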
For geographically distributed Web systems, it is necessary to measure the
time taken by the request routing mechanism, such as DNS lookup and request
redirection times.

Httperf collects a large variety of metrics, both session- and request-oriented [32].
The most interesting non-session oriented metrics are connect time, latency time,
request and reply rate, throughput and error rates. Response time at the gran-
ularity of Web objects is not collected; session-oriented metrics include session
time and session rate. For each of these metrics, minimum, mean, maximum val-
ues and their standard deviations are computed through data sets. Support for
record storage through histograms is given only for some metrics such as session
length and connection duration. Httperf has a hybrid data collector and a cen-
tralized output analyzer. It also performs hybrid processing of the collected data.
A final report is presented with global and per-session statistics, thus providing a
way for detecting the degree of user concurrency in a distributed Web-server sys-
tem. However, it requires some extensions. For example, it would be interesting
to have the Web page response time as a metric. Moreover, the records should
be stored in histograms for later processing to evaluate higher order statistics.
SURGE stores only records of transaction time and Web object size for later
processing [7,8]. The output analyzer of SURGE is centralized and oriented to
record processing. It operates on server logs (in common log format) and on
the log file generated at the end of the benchmarking experiment. Final metrics
provided to the benchmark user are session-oriented; a log-log cumulative
distribution table of Web page response times is also provided.
S-Clients collects a data set consisting of connection life time sums (which
are used to approximate transaction times, since HTTP/1.0 is used) and global
counters of opened connections and successfully delivered responses [6]. S-Clients
presents only request rate and average response time of the requested URLs.
WebBench keeps at run-time the following data set: a global count of success-
ful requests, a sum of transaction times and a sum of transfer sizes [45]. These
data sets are required to compute the final metrics, that is, number of requests
per second and throughput. WebBench gives only two overall metrics: interac-
tion times per second and throughput in bytes per second. They are computed
locally on each client and centrally gathered by the controller.
WebStone uses a hybrid data collector and output analyzer [30]. The retrieval
phases of a Web object (connect, latency, and transfer times) are marked
by timestamps which are all recorded, while global counters are kept through
appropriate data sets. WebStone provides a report with global and per Web object
connect times, response times, and error rates. It also computes a global connection
rate and a metric known as Little’s Load Factor [28]. No session-oriented metrics
are reported, and no subdivision of response time into latency and transfer is evaluated,
although the collected records allow it to be computed later.
The data collector of Web Polygraph is hybrid [42]. It stores records for later
computation of reply size and hit/miss response times. It also keeps global counts
for error rates, client and server side throughput, cache hit ratio and byte hit
ratio. No session-oriented metrics are provided. In Web Polygraph,
each client and server agent process generates its own log file. They have to be
manually concatenated before processing by the report module. Reports include
several performance graphs for throughput, cache hits and misses response times,
persistent connection usage, error rates.
SPECweb99 uses its own performance metric [38]. Substantially, it is the
maximum number of connections supported by the Web server under certain
conditions (throughput ranging between 300000 and 400000 bits per second).
To this purpose, SPECweb99 collects throughput, request and response times
over a single connection. According to online documentation [38], this is done
with the data set approach. There are no session-oriented statistics. In SPECweb99, the output
analyzer is centralized: data collected from each client is gathered by the master
process which reports test results for that iteration. A report consists of sum-
mary, results, overall metric, and configuration information. The SPECweb99
metric is the median of the connection average result over 3 iterations.
The TPC-W benchmark specification defines the collection of the Web In-
teraction response time (WIRT), which is the time interval occurring between
the sending of the first byte of a client request that starts a Web interaction
and the retrieval of the last byte in the last response of the same Web interac-
tion [26,41]. This is necessary to compute the final metric that is, the throughput
of Web interactions per second. The specification also suggests running perfor-
mance monitors on the servers for monitoring CPU utilization, memory utiliza-
tion, page/swap activity, database I/O activity, system I/O activity and Web
server statistics. The TPC-W benchmark specification defines three performance
indexes: WIPS, WIPSb, and WIPSo, which are counted as the number of Web interactions
per second during shopping, browsing, and ordering sessions, respectively.
The TPC-W specification recommends a report including graphs for the follow-
ing metrics: CPU utilization, memory utilization, page/swap activity, system
activity, Web server statistics (number of requests and error rates per second).
No session-oriented statistics are planned, but the provided graphs should give
an idea about the load conditions of the system.
8 Benchmark Support for Wide Area Networks
Benchmarking experiments of Web-server systems are usually carried out in a
closed, isolated, and high-speed local-area network (LAN) environment. These
laboratory experiments do not take into consideration network-related factors,
such as high and variable delays, transmission errors, packet losses, and network
connection limitations [28]. Modeling interactions that occur in the real
Web environment using client machines connected to the Web-server system by
a low round-trip time, high-speed LAN may lead to incorrect results, because
the provision of Web-based services in the real world involves wide-area network
connections in which the presence of network components (such as routers)
makes the environment noisy and error-prone and influences Web server
performance [33]. Indeed, as a result of benchmarking experiments carried out
in LAN environments, performance aspects of the Web-server system that depend on the
network characteristics are not exposed or are inaccurately evaluated. As a consequence
of WAN delays, Web server resources (such as the listen socket’s SYN-RCVD
queue) remain tied up to clients for much longer periods and therefore the system
throughput decreases. Furthermore, in the wide-area Internet, packets are
lost or corrupted; this causes performance degradation as these packets have to
be retransmitted.
To take into account WAN effects in the benchmarking of distributed Web-server
systems, two approaches are possible: WAN emulation in a LAN environment and
experiments in a real WAN environment. The first consists in emulating the WAN
characteristics in a controlled environment, where client and server machines are
interconnected through a LAN, by incorporating factors such as delays
and packet losses into the benchmarking tool. The WAN emulation approach
allows the tests to be performed in a controllable, configurable, and reproducible
environment, allowing easy changes in test conditions and iterative analysis [6,33,37].
However, incorporating delays and packet losses due to WANs is not a trivial
task. On the other hand, experiments performed in a WAN environment make it
possible to identify many problems and causes of delays in Web transfers that do
not manifest themselves in a LAN environment [5,9,21]. At the same time, these wide-area
benchmarking experiments are hard to reproduce due to the uniqueness of the
test environment.
In the WAN environment, the benchmarking experiments are carried out by
spreading the client machines across a wide-area network. This approach suffers
from the difficulty of changing the network parameters of interest for different
test scenarios. Furthermore, it may be hard to generate a high workload in this
way, as discussed in [9], in which SURGE clients have been spread among different
network locations.
The majority of currently available Web benchmarking tools that operate in
a high-speed LAN environment ignore the emulation of WAN conditions. Some
efforts in this direction have been made in some of the benchmarking tools
already considered (S-Clients [6], Web Polygraph [42], and SPECweb99 [38], although
quite limited in the latter) and also in WASPclient [33].
There are two main approaches to emulating WAN conditions in
a LAN environment: centralized and distributed. In the centralized approach,
one machine acting as a WAN emulator is interposed between the client
machines and the Web-server system to model WAN delays and packet losses
by dropping and delaying packets. S-Clients follows this approach, by putting a
router between the S-Clients machines and the server system in order to introduce
an artificial delay and drop packets at a controlled rate [6].
In the distributed approach, each client acts as a WAN emulator, by di-
rectly delaying and dropping packets. WASPclient implements an interesting
distributed approach [33], by using an extended Dummynet layer in the proto-
col stack of the client machines to drop and delay packets [36]. The centralized
approach is transparent to the operating system of both client and server machines;
however, its scalability is limited [33]. In contrast, the distributed
approach has the advantage of providing higher scalability, but it requires
modifications to the operating system of the client machines.
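The basic effect of both emulation approaches, delaying and dropping packets, can be mimicked by a toy model. The sketch below is not the interface of Dummynet, of the S-Clients router, or of WASPclient; the function emulate_wan, the constant one-way delay, and the independent loss probability are simplifying assumptions of ours.

```python
import random


def emulate_wan(send_times, delay, loss_rate, rng=None):
    """Return delivery times of the packets that survive the emulated WAN.

    send_times: times (in seconds) at which packets enter the emulator
    delay:      constant one-way delay added to every surviving packet
    loss_rate:  probability in [0, 1] that a packet is dropped
    """
    rng = rng or random.Random()
    delivered = []
    for t in send_times:
        if rng.random() < loss_rate:
            continue                 # packet dropped by the emulator
        delivered.append(t + delay)  # packet delayed, then forwarded
    return delivered
```

In the centralized approach a single instance of such a model would sit on the interposed machine and see all traffic, whereas in the distributed approach each client node would apply it to its own packets.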
9 Conclusions
This study leads us to conclude that many Web benchmark tools work fine when
used to analyze a single server system, but none of them is able to address all
issues related to the analysis of distributed Web-server systems. Many popular
tools, such as SURGE and WebStone, show their age, as they do not support
dynamic requests and more recent protocols. Very few of them consider
application-level routing of the requests, such as DNS and HTTP redirection or
URL rewriting. In summary, we notice the lack of ability to sustain realistic Web
traffic under critical load conditions, the difficulty or impossibility of emulating
realistic dynamic and secure Web services, and the poor support for analyzing
collected statistics other than min, max, and mean values. Hence, we can conclude
that there is a lot of room for further research and implementation in this area.
References
[1] M. Arlitt. Characterizing Web user sessions. ACM Performance Evaluation Re-
view, 28(2):50–63, Sept. 2000.
[2] M. Arlitt, D. Krishnamurthy, and J. Rolia. Characterizing the scalability of a
large Web-based shopping system. ACM Trans. on Internet Technology, 1(1):44–
69, Sept. 2001.
[3] M. F. Arlitt and T. Jin. A workload characterization study of the 1998 World
Cup Web site. IEEE Network, 14(3):30–37, May/June 2000.
[4] M. F. Arlitt and C. L. Williamson. Internet Web servers: Workload characteriza-
tion and performance implications. IEEE/ACM Trans. on Networking, 5(5):631–
645, Oct. 1997.
[5] H. Balakrishnan, V. Padmanabhan, S. Seshan, M. Stemm, and R. Katz. TCP
behavior of a busy Internet server: Analysis and improvements. In Proc. of IEEE
Infocom 1998, pages 252–262, San Francisco, CA, Mar. 1998.
[6] G. Banga and P. Druschel. Measuring the capacity of a Web server under realistic
loads. World Wide Web, 2(1-2):69–89, May 1999.
[7] P. Barford and M. E. Crovella. Generating representative Web workloads for net-
work and server performance evaluation. In Proc. of ACM Performance 1998/Sig-
metrics 1998, pages 151–160, Madison, WI, July 1998.
[8] P. Barford and M. E. Crovella. A performance evaluation of Hyper Text Transfer
Protocols. In Proc. of ACM Sigmetrics 1999, pages 188–197, Atlanta, May 1999.
[9] P. Barford and M. E. Crovella. Critical path analysis of TCP transactions.
IEEE/ACM Trans. on Networking, 9(3):238–248, June 2001.
[10] V. Cardellini, E. Casalicchio, M. Colajanni, and P. S. Yu. The state of the art in
locally distributed Web-server systems. ACM Computing Surveys, 34(2):263–311,
June 2002.
[11] V. Cardellini, M. Colajanni, and P. S. Yu. Geographic load balancing for scalable
distributed Web systems. In Proc. of IEEE MASCOTS 2000, pages 20–27, San
Francisco, CA, Aug./Sept. 2000.
[12] M. E. Crovella and A. Bestavros. Self-similarity in World Wide Web traffic:
Evidence and possible causes. IEEE/ACM Trans. on Networking, 5(6):835–846,
Dec. 1997.
[13] R. T. Fielding, J. Gettys, J. C. Mogul, H. F. Frystyk, L. Masinter, P. J. Leach,
and T. Berners-Lee. Hypertext Transfer Protocol – HTTP/1.1. RFC 2616, June
1999.
[14] Intel Corp. Using the RDTSC instruction for performance monitoring, July 1998.
http://cedar.intel.com/software/idap/media/pdf/rdtscpm1.pdf.
[15] A. K. Iyengar, M. S. Squillante, and L. Zhang. Analysis and characterization
of large-scale Web server access patterns and performance. World Wide Web,
2(1-2):85–100, Mar. 1999.
[16] R. Jain. The Art of Computer Systems Performance Analysis: Techniques for Ex-
perimental Design, Measurement, Simulation, and Modeling. Wiley-Interscience,
1991.
[17] R. Jain and I. Chlamtac. The P-Square algorithm for dynamic calculation of
percentiles and histograms without storing observations. ACM Communications,
28(10), Oct. 1985.
[18] D. Kegel. The C10K problem, 2002. http://www.kegel.com/c10k.html.
[19] B. Krishnamurthy, J. C. Mogul, and D. M. Kristol. Key differences between
HTTP/1.0 and HTTP/1.1. Computer Networks, 31(11-16):1737–1751, 1999.
[20] B. Krishnamurthy and J. Rexford. Web Protocols and Practice: HTTP/1.1, Net-
working Protocols, Caching, and Traffic Measurement. Addison-Wesley, Reading,
MA, 2001.
[21] B. Krishnamurthy and C. E. Wills. Analyzing factors that influence end-to-end
Web performance. Computer Networks, 33(1-6):17–32, 2000.
[22] D. Krishnamurthy and J. Rolia. Predicting the QoS of an electronic commerce
server: Those mean percentiles. In Proc. of Workshop on Internet Server Perfor-
mance, Madison, WI, June 1998.
[23] B. Lavoie and H. F. Frystyk. Web Characterization Terminology & Definitions
Sheet. W3C Working Draft, May 1999.
[24] Z. Liu, N. Niclausse, and C. Jalpa-Villanueva. Traffic model and performance
evaluation of Web servers. Performance Evaluation, 46(2-3):77–100, Oct. 2001.
[25] S. Manley, M. Seltzer, and M. Courage. A self-scaling and self-configuring bench-
mark for Web servers. In Proc. of ACM Sigmetrics 1998 Conf., pages 170–171,
Madison, WI, June 1998.
[26] D. A. Menascé. TPC-W: A benchmark for e-commerce. IEEE Internet Computing,
6(3):83–87, May/June 2002.
[27] D. A. Menascé and V. A. F. Almeida. Scaling for E-business. Technologies, Mod-
els, Performance and Capacity planning. Prentice Hall, Upper Saddle River, NJ,
2000.
[28] D. A. Menascé and V. A. F. Almeida. Capacity Planning for Web Services.
Metrics, Models, and Methods. Prentice Hall, Upper Saddle River, NJ, 2002.
[29] J. Midgley. Autobench, 2002. http://www.xenoclast.org/autobench/.
[30] Mindcraft. WebStone. http://www.mindcraft.com/webstone/.
[31] N. Modadugu. WebStone SSL.
http://crypto.stanford.edu/~nagendra/projects/WebStone/.
[32] D. Mosberger and T. Jin. httperf — A tool for measuring Web server performance.
ACM Performance Evaluation Review, 26(3):31–37, Dec. 1998.
[33] E. M. Nahum, M. Rosu, S. Seshan, and J. Almeida. The effects of wide-area
conditions on WWW server performance. In Proc. of ACM Sigmetrics 2001,
pages 257–267, Cambridge, MA, June 2001.
[34] Neal Nelson. Web Server Benchmark. http://www.nna.com/.
[35] M. Rabinovich and O. Spatscheck. Web Caching and Replication. Addison Wesley,
2002.
[36] L. Rizzo. Dummynet: A simple approach to the evaluation of network protocols.
ACM Computer Communication Review, 27(1):31–41, Jan. 1997.
[37] R. Simmonds, C. Williamson, M. Arlitt, R. Bradford, and B. Unger. A case study
of Web server benchmarking using parallel WAN emulation. In Proc. of IFIP
Int’l Symposium Performance 2002, Roma, Italy, Sept. 2002.
[38] Standard Performance Evaluation Corp. SPECweb99.
http://www.spec.org/osg/web99/.
[39] Standard Performance Evaluation Corp. SPECweb99 SSL.
http://www.spec.org/osg/web99ssl/.
[40] Technovations. Websizr. http://www.technovations.com/websizr.htm.
[41] Transaction Processing Performance Council. TPC-W.
http://www.tpc.org/tpcw/.
[42] Web Polygraph. http://www.web-polygraph.org/.
[43] W. Willinger, M. S. Taqqu, R. Sherman, and D. V. Wilson. Self-similarity through
high-variability: Statistical analysis of Ethernet LAN traffic at the source level.
IEEE/ACM Trans. on Networking, 5(1):71–86, Jan. 1997.
[44] C.-S. Yang and M.-Y. Luo. A content placement and management system for
distributed Web-server systems. In Proceedings of the 20th IEEE International
Conference on Distributed Computing Systems, pages 691–698, Taipei, Taiwan,
Apr. 2000.
[45] Ziff Davis Media. WebBench.
http://www.etestinglabs.com/benchmarks/webbench/webbench.asp.
Stochastic Process Algebra: From an Algebraic
Formalism to an Architectural Description
Language
Marco Bernardo¹, Lorenzo Donatiello², and Paolo Ciancarini²

¹ Università di Urbino, Centro per l’Appl. delle Sc. e Tecn. dell’Inf.
Piazza della Repubblica 13, 61029 Urbino, Italy
[email protected]
² Università di Bologna, Dipartimento di Scienze dell’Informazione
Mura Anteo Zamboni 7, 40127 Bologna, Italy
donat, [email protected]
Abstract. The objective of this tutorial is to describe the evolution of
the field of stochastic process algebra in the past decade, through a pre-
sentation of the main achievements in the field. In particular, the tutorial
stresses the current transformation of stochastic process algebra from a
simple formalism to a fully fledged architectural description language
for the functional verification and performance evaluation of complex
computer, communication and software systems.
1 Introduction
Many computing systems consist of a possibly huge number of components that
not only work independently but also communicate with each other. Examples of
such systems are communication protocols, operating systems, embedded control
systems for automobiles, airplanes, and medical equipment, banking systems,
automated production systems, control systems of nuclear and chemical plants,
railway signaling systems, air traffic control systems, distributed systems and
algorithms, computer architectures, and integrated circuits.
The catastrophic consequences – loss of human lives, environmental damages,
and financial losses – of failures in many of these critical systems have compelled
computer scientists and engineers to develop techniques for ensuring that these
systems are designed and implemented correctly despite their complexity.
The need for formal methods in developing complex systems is becoming well
accepted. Formal methods seek to introduce mathematical rigor into each stage
of the design process in order to build more reliable systems.
The need for formal methods is even more urgent when planning and implementing
concurrent and distributed systems. In fact, they require a huge
amount of detail to be taken into account (e.g., interconnection and synchroniza-
tion structure, allocation and management of resources, real time constraints,
performance requirements) and involve many people with different skills in the
project (designers, implementors, debugging experts, performance and quality
analysts). A uniform and formal description of the system under investigation
reduces misunderstandings to a minimum when passing information from one
task of the project to another.
Moreover, it is well known that the sooner errors are discovered, the less
costly they are to fix. Consequently, it is imperative that a correct design is
available before implementation begins. Formal methods are conceived to allow
the correctness of a system design to be formally verified. Using formal methods,
the design can be described in a mathematically precise fashion, correctness cri-
teria can be specified in a similarly precise way, and the design can be rigorously
proved to meet, or fail to meet, the stated criteria.
Although a number of description techniques and related software tools have
been developed to support the formal modeling and verification of functional
properties of systems, only in recent years have temporal characteristics received
attention. This has required extending formal description techniques by intro-
ducing the concept of time, represented either in a deterministic way or in a
stochastic way.
In the deterministic case, the focus typically is on verifying the satisfaction
of real time constraints, i.e. the fact that the execution of specific actions is
guaranteed within a fixed deadline after some event has happened. As an example,
if a train is approaching a railroad crossing, then the bars must be guaranteed to be
lowered in due time.
In the stochastic case, instead, systems are considered whose behavior can-
not be deterministically predicted as it fluctuates according to some probability
distribution. Due to economic reasons, such stochastically behaving systems are
referred to as shared resource systems, because there is a varying number of
demands competing for the same resources. The consequences are mutual in-
terference, delays due to contention, and varying service quality. Additionally,
resource failures significantly influence the system behavior. In this case, the
focus is on evaluating the performance of the systems. As an example, if we
consider again a railway system, we may be interested in minimizing the average
train delay or studying the characteristics of the flow of passengers. The purpose
of performance evaluation is to investigate and optimize the time varying behav-
ior within and among individual components of shared resource systems. This
is achieved by modeling and assessing the temporal behavior of systems, iden-
tifying characteristic performance measures, and developing design rules that
guarantee an adequate quality of service.
The desirability of taking account of the performance aspects of shared re-
source systems in the early stages of their design has been widely recognized [33,
68] and has fostered the development of formal methods for both functional ver-
ification and performance evaluation of rigorous system models. The focus of
this tutorial is on stochastic process algebra (SPA), a formalism proposed in a
seminal work by Ulrich Herzog [46,47] in the early ’90s; the growing interest in it
is witnessed by the annual organization of the international workshop on Process
Algebra and Performance Modeling (PAPM) and a number of Ph.D. theses
on this subject [48,38,70,67,63,54,62,59,40,53,10,31,24,29,22,25]. With respect to
formalisms traditionally used for performance evaluation purposes like Markov
chains (MCs) and queueing networks (QNs) [56,58], SPA provides a more com-
plete framework in which also functional verification can be carried out. With
respect to previous formal methods for performance evaluation like stochastic
Petri nets (SPNs) [1], SPA provides novel capabilities related to compositionality
and abstraction that help system modeling and analysis. The first part of this
tutorial (Sect. 2) is devoted to the presentation of the main results achieved in
the field of SPA since the early ’90s.
Although SPA supports compositional modeling via algebraic operators, this
feature has not been exploited yet to enforce a more controlled way of describ-
ing systems that makes SPA technicalities transparent. By this we mean that
in a SPA specification the basic concepts of system component and connection
are not clearly elucidated, nor are checks available to detect mismatches when
assembling components together. Since nowadays systems are made out of numerous
components, in the early design stages it is crucial to be equipped with a
formal specification language that permits reasoning in terms of components and
component interactions and to identify components that result in mismatches
when put together. The importance of this activity is witnessed by the growing
interest in the field of software architecture and the development of architectural
description languages (ADLs) [61,66]. The formal description of the architecture
of a complex system serves two purposes. First and foremost is making avail-
able a precise document describing the structure of the system to all the people
involved in the design, implementation, and maintenance of the system itself.
The second one is concerned with the possibility of analyzing the properties of
the system at the architectural level, thus allowing for the early detection of
design errors. The second part of this tutorial (Sect. 3) is devoted to showing how
SPA can easily be transformed into a compositional, graphical and hierarchical
ADL endowed with some architectural checks, which can be profitably employed
for both functional verification and performance evaluation at the architectural
level of design.
The tutorial finally concludes with some remarks about future directions in
the field of SPA based ADLs.
2 SPA: Basic Notions and Main Achievements

SPA is a compositional specification language of algebraic nature that integrates
process algebra theory [60,50,5] and stochastic processes. In this section we pro-
vide a quick overview of the basic notions about the syntax, the semantics, and
the equivalences for SPA, as well as the main results and applications that have
been developed in the past decade.
2.1 Syntax: Actions, Operators, and Synchronization Disciplines

SPA is characterized by three main ingredients: the actions modeling the system
activities, the algebraic operators for composing the subsystem specifications,
and the synchronization disciplines.
An action is usually composed of a type a and an exponential rate λ: <a, λ>
[48,45,27]. The type indicates the kind of activity that is performed by the system
at a certain point, while the rate indicates the reciprocal of the average dura-
tion of the activity assuming that the duration is an exponentially distributed
random variable. A special action type, traditionally denoted by τ , designates a
system activity whose functionality cannot be observed and serves for functional
abstraction purposes. In order to increase the expressiveness, in [10] prioritized,
weighted immediate actions of the form <a, ∞_{l,w}> are proposed, which are useful
to model activities whose timing is irrelevant from the performance viewpoint
as well as activities whose duration follows a phase type distribution. As an
alternative to the durational actions considered so far, in [40] a different view is
taken according to which an action is either an instantaneous activity a or an
exponentially distributed time passage λ.
Several algebraic operators are usually present. The zeroary operator 0 rep-
resents the term that cannot execute any action. The action prefix operator
<a, λ>.E denotes the term that can execute an action with type a and rate λ
and then behaves as term E; in the approach of [40], there are the two action
prefix operators a.E and λ.E. The functional abstraction operator E/L, where
L is a set of action types not including τ , denotes the term that behaves as
term E except that the type a of each executed action is turned into τ whenever
a ∈ L. The functional relabeling operator E[ϕ], where ϕ is a function over action
types preserving observability, denotes a term that behaves as term E except
that the type a of each executed action becomes ϕ(a). The alternative compo-
sition operator E1 + E2 denotes a term that behaves as either term E1 or term
E2 depending on whether an action of E1 or an action of E2 is executed. The
action choice is regulated by the race policy (the fastest one succeeds), so that
each action of E1 and E2 has an execution probability proportional to its rate.
In the approach of [10], immediate actions take precedence over exponentially
timed ones and the choice among them is governed by the preselection policy:
the lower priority immediate actions are discarded, then each of the remaining
immediate actions is given an execution probability proportional to its weight.
In the approach of [40], the choice between two instantaneous activities is
nondeterministic. The parallel composition operator E1 ‖_S E2 , where S is a set of
action types not including τ , denotes a term that asynchronously executes ac-
tions of E1 or E2 whose type does not belong to S, and synchronously executes
– according to a synchronization discipline – equally typed actions of E1 and
E2 whose type belongs to S. Finally, a constant A denotes a term that behaves
according to the associated defining equation A ≜ E, which allows for recursive
behaviors.
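As a small numeric illustration of the race policy described above, consider the following sketch. The function name race and the rate values are assumptions made for this example only; they do not come from any of the cited calculi or tools. Given the exponentially timed actions enabled by a term such as <a, 2>.E1 + <b, 3>.E2, it computes the total exit rate and the execution probability of each action, which is proportional to its rate.

```python
from fractions import Fraction


def race(actions):
    """Resolve a choice among exponentially timed actions under the race policy.

    `actions` is a list of (action_type, rate) pairs enabled by a term.
    Returns (total_rate, probabilities): the sojourn time in the term is
    exponentially distributed with rate `total_rate`, and each action is
    executed with probability rate / total_rate.
    """
    total = sum(Fraction(rate) for _, rate in actions)
    probabilities = [(a, Fraction(rate) / total) for a, rate in actions]
    return total, probabilities


# Hypothetical term <a, 2>.E1 + <b, 3>.E2
total, probabilities = race([("a", 2), ("b", 3)])
print("total exit rate:", total)
print("execution probabilities:", probabilities)
```

Exact fractions are used instead of floating point numbers only to keep the arithmetic easy to check by hand.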
There are many different synchronization disciplines. In [45] the rate of the
action resulting from the synchronization of two actions is the product of the
rates of the two synchronizing actions, where the physical interpretation is that
one rate is the formal rate and the other rate acts like a scaling factor. In [48]
the bounded capacity assumption is introduced, according to which the rate
of an action cannot be increased/decreased due to the synchronization with
another action of the same type. In this approach, patient synchronizations are
considered, i.e. the rate of the action resulting from the synchronization of two
equally typed actions of E1 and E2 is given by the minimum of the two total
rates with which E1 and E2 can execute actions of the considered type, multiplied
by the local execution probabilities of the two synchronizing actions. Following
the terminology of [36], in [26] a generative-reactive synchronization discipline
complying with the bounded capacity assumption is adopted, which is based on
the systematic use of prioritized, weighted passive actions of the form <a, ∗_{l,w}>.
The idea is that the nonpassive actions probabilistically determine the type of
action to be executed at each step, while the passive actions of the determined
type probabilistically react in order to identify the subterms taking part in the
synchronization. In order for two equally typed actions to synchronize, in this
approach one of them must be passive and the rate of the resulting action is
given by the rate of the nonpassive action multiplied by the local execution
probability of the passive action. Finally, in [40] equal instantaneous activities
can synchronize, while time passages cannot. Therefore, when both E1 and E2
can let time pass, in this approach the overall time passage is the maximum of
the two local, exponentially distributed time passages.
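The rate computations prescribed by the first three disciplines above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and in the generative-reactive case all passive actions are assumed to have the same priority level, so that the local execution probability of a passive action is its weight divided by the total weight of the competing passive actions.

```python
from fractions import Fraction


def sync_product(rate1, rate2):
    """Discipline of [45]: the resulting rate is the product of the two rates."""
    return rate1 * rate2


def sync_patient(rate1, total1, rate2, total2):
    """Patient synchronization of [48].

    total1 and total2 are the total rates with which E1 and E2 can execute
    actions of the considered type; rate1 / total1 and rate2 / total2 are the
    local execution probabilities of the two synchronizing actions.
    """
    return (rate1 / total1) * (rate2 / total2) * min(total1, total2)


def sync_generative_reactive(active_rate, passive_weight, total_passive_weight):
    """Generative-reactive discipline of [26], assuming a single priority level."""
    return active_rate * passive_weight / total_passive_weight


# Hypothetical example: E1 enables two a-actions with rates 2 and 4,
# E2 enables one a-action with rate 3.
r1 = sync_patient(Fraction(2), Fraction(6), Fraction(3), Fraction(3))
r2 = sync_patient(Fraction(4), Fraction(6), Fraction(3), Fraction(3))
print(r1, r2, r1 + r2)
```

In the example the two resulting rates add up to min(6, 3) = 3, which is how the patient discipline complies with the bounded capacity assumption: the synchronized activity is never faster than the slower of the two participants.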
2.2 Semantics: Interleaving and Memoryless Property
The semantics for SPA is defined in an operational fashion by means of a set of
axioms and inference rules that formalize the meaning of the algebraic opera-
tors. The result of the application of such rules is a labeled transition system
(LTS), where states are in correspondence with process terms and transitions
are labeled with actions. As an example, the axiom for the action prefix operator
$$\langle a, \lambda \rangle.E \xrightarrow{\,a,\lambda\,} E$$
establishes that term/state <a, λ>.E can evolve into term/state E by performing
action/transition <a, λ>. As another example, the inference rule for the
functional relabeling operator
$$\frac{E \xrightarrow{\,a,\lambda\,} E'}{E[\varphi] \xrightarrow{\,\varphi(a),\lambda\,} E'[\varphi]}$$
establishes that, whenever term/state E can evolve into term/state E′ by performing
action/transition <a, λ>, term/state E[ϕ] can evolve into term/state
E′[ϕ] by performing action/transition <ϕ(a), λ>.
The most complicated inference rules are those for the alternative composi-
tion operator and the parallel composition operator. As far as the alternative
composition operator is concerned, the problem is that, in the case of terms like
<a, λ>.E + <a, λ>.E, the transition generation process must keep track of the
fact that the total rate is 2 · λ by virtue of the race policy. In [48] it is proposed
to use labeled multitransition systems, so that a single transition labeled with
<a, λ> is generated for the term above, which has multiplicity two. In [45], in-
stead, it is proposed to decorate the transitions with an additional distinguishing
label, whose value depends on whether the transitions are due to the left hand
side or the right hand side summand of the alternative compositions. As far as
the parallel composition operator is concerned, the related inference rules must
embody the desired synchronization discipline.
The resulting LTS is an interleaving semantic model, which means that ev-
ery parallel computation is represented through a choice between all the se-
quential computations that can be obtained by interleaving the execution of the
actions of the subterms composed in parallel. As an example, the parallel term
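<a, λ>.0 ‖_∅ <b, μ>.0 and the sequential term <a, λ>.<b, μ>.0 + <b, μ>.<a, λ>.0
are given the same LTS up to state names:
[Figure: LTS with four states. From the initial state, a transition labeled a,λ and a transition labeled b,μ lead to two distinct intermediate states, from which transitions labeled b,μ and a,λ, respectively, lead to the same final state.]
This is correct from the functional viewpoint, because an external observer,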
who is not aware of the structure of the systems represented by the two terms,
sees exactly the same behavior. Moreover, this is correct from the performance
viewpoint as well, by virtue of the memoryless property of the exponential dis-
tribution. For instance, if in the parallel term action <a, λ> is completed before
action <b, μ>, then state 0 ‖_∅ <b, μ>.0 is reached and the time to the completion
of action <b, μ> is still exponentially distributed with rate μ. In other words,
the interleaving style fits well with the fact that the execution of an exponen-
tially timed action can be considered as being started in the state in which it
terminates.
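The interleaving example above can be reproduced with a minimal sketch of the operational rules for the zeroary, action prefix, alternative composition, and parallel composition operators. The tuple encoding of terms, the function names, and the rate values are assumptions made for this illustration. Only the empty synchronization set is handled, so no synchronization discipline is implemented. The comparison numbers states in breadth first order; this works as an isomorphism check here only because the outgoing transitions of every state carry distinct labels.

```python
NIL = ("nil",)


def prefix(a, rate, cont):
    return ("prefix", a, rate, cont)


def choice(left, right):
    return ("choice", left, right)


def par(left, right):
    # Parallel composition with an EMPTY synchronization set (pure interleaving).
    return ("par", left, right)


def transitions(term):
    """Return the list of (type, rate, target) transitions of a term."""
    kind = term[0]
    if kind == "nil":
        return []
    if kind == "prefix":
        _, a, rate, cont = term
        return [(a, rate, cont)]
    if kind == "choice":
        return transitions(term[1]) + transitions(term[2])
    if kind == "par":
        _, left, right = term
        moves_left = [(a, r, par(l2, right)) for a, r, l2 in transitions(left)]
        moves_right = [(a, r, par(left, r2)) for a, r, r2 in transitions(right)]
        return moves_left + moves_right
    raise ValueError(f"unknown term: {term!r}")


def lts(initial):
    """Explore the state space and return edges with states renamed 0, 1, 2, ..."""
    index = {initial: 0}
    queue = [initial]
    edges = []
    while queue:
        state = queue.pop(0)
        for a, rate, target in sorted(transitions(state), key=lambda t: (t[0], t[1])):
            if target not in index:
                index[target] = len(index)
                queue.append(target)
            edges.append((index[state], a, rate, index[target]))
    return sorted(edges)


lam, mu = 2.0, 3.0  # hypothetical rates
parallel = par(prefix("a", lam, NIL), prefix("b", mu, NIL))
sequential = choice(prefix("a", lam, prefix("b", mu, NIL)),
                    prefix("b", mu, prefix("a", lam, NIL)))
print(lts(parallel))
print(lts(parallel) == lts(sequential))
```

Both terms yield four states and four transitions. No residual duration has to be stored in the intermediate states, which is exactly what the memoryless property allows.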
The LTS produced by applying the operational semantic rules to a pro-
cess term represents the integrated semantic model of the process term. It can
undergo integrated analysis techniques, like integrated model checking [8]
and integrated equivalence checking (see Sect. 2.3), to detect mixed functional-
performance properties, like the probability of executing a certain sequence of
activities. From the integrated semantic model, two projected semantic models
can be derived. The functional semantic model is an LTS obtained by discarding
information about action rates; it can be analyzed through traditional
techniques like model checking [30] and equivalence/preorder checking [28]. The
performance semantic model is an LTS obtained by discarding information about
action types, which happens to be a continuous time Markov chain (CTMC). In
the approach of [26], where prioritized, weighted immediate and passive actions
are considered, the projected semantic models are generated after pruning the
lower priority transitions from the integrated semantic model. Furthermore, the
performance semantic model can be generated only if the integrated semantic
model is performance closed, i.e. has no passive transitions. If this is the case,
the performance semantic model is a CTMC whenever the integrated semantic
model has only exponentially timed transitions or both exponentially timed and
immediate transitions (in which case states having outgoing immediate transi-
tions are removed as their sojourn time is zero). If instead the integrated semantic
model has only immediate transitions, then it is assumed that the execution of
each of them takes one time unit so that the performance model turns out to be
a discrete time Markov chain (DTMC). CTMCs and DTMCs can then be ana-
lyzed through standard techniques [69], mainly based on rewards [52], to derive
performance measures.
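To make the derivation of the performance semantic model concrete, the following sketch builds the generator matrix of a CTMC from an integrated LTS by discarding the action types, and then computes the stationary state probabilities. The two-state LTS, the action type names, and the function names are hypothetical. The solver is plain Gauss-Jordan elimination over exact fractions and assumes a finite, irreducible chain with exponentially timed transitions only.

```python
from fractions import Fraction


def generator(n, lts):
    """Build the generator matrix Q from (source, type, rate, target) transitions.

    Action types are ignored: this is the projection onto the performance model.
    """
    q = [[Fraction(0)] * n for _ in range(n)]
    for source, _action_type, rate, target in lts:
        if source != target:  # self-loops do not change the state of the CTMC
            q[source][target] += Fraction(rate)
            q[source][source] -= Fraction(rate)
    return q


def steady_state(q):
    """Solve pi * Q = 0 with sum(pi) = 1 (assumes an irreducible CTMC)."""
    n = len(q)
    # One balance equation per state (columns of Q), plus a right-hand side.
    a = [[q[i][j] for i in range(n)] + [Fraction(0)] for j in range(n)]
    # Replace the last (redundant) balance equation by the normalization.
    a[-1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [x / a[col][col] for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]


# Hypothetical integrated LTS: state 0 is idle, state 1 is busy.
integrated_lts = [(0, "arrive", 2, 1), (1, "serve", 3, 0)]
pi = steady_state(generator(2, integrated_lts))
print(pi)
```

For this example the stationary probabilities agree with the closed form μ/(λ+μ) and λ/(λ+μ) for λ = 2 and μ = 3.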
2.3 Equivalences: Congruence and Lumpability

SPA terms can be equated on the basis of their functional and performance
behavior. The most widely used approach, inspired by [57], is the Markovian
bisimulation equivalence [48,45,27], based on the ability of two terms to
simulate each other's behavior. The idea is that, given an equivalence relation B
over process terms, B is a Markovian bisimulation if, for each pair (E1 , E2 ) ∈ B,
action type a, and equivalence class C of B, the total rate with which E1 reaches
states in C by executing actions of type a is equal to the total rate with which E2
reaches states in C by executing actions of type a. The Markovian bisimulation
equivalence is then defined as the union of all the Markovian bisimulations.
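On a finite LTS, the definition above can be turned into a naive partition refinement procedure, sketched below. The function name and the example state space are assumptions made for illustration, and the procedure is not the algorithm of any specific tool. Starting from a single class, states are repeatedly split according to the total rate with which they reach each current class for each action type, until no class is split any further. To compare two terms, the procedure is run on the disjoint union of their state spaces.

```python
from collections import defaultdict
from fractions import Fraction


def markovian_bisimulation(n, lts):
    """Return a list mapping each state 0..n-1 to the index of its class.

    `lts` is a list of (source, type, rate, target) transitions.
    """
    block = [0] * n
    while True:
        signatures = []
        for s in range(n):
            totals = defaultdict(Fraction)
            for source, a, rate, target in lts:
                if source == s:
                    # Total rate towards each (action type, class) pair.
                    totals[(a, block[target])] += Fraction(rate)
            signatures.append((block[s], frozenset(totals.items())))
        ids = {}
        new_block = [ids.setdefault(sig, len(ids)) for sig in signatures]
        if len(ids) == len(set(block)):  # no class was split: partition is stable
            return new_block
        block = new_block


# Hypothetical state space (union of three small terms):
#   state 0: <a, 1>.0 + <a, 2>.0
#   state 1: <a, 3>.0
#   state 2: 0
#   state 3: <a, 4>.0
lts = [(0, "a", 1, 2), (0, "a", 2, 2), (1, "a", 3, 2), (3, "a", 4, 2)]
classes = markovian_bisimulation(4, lts)
print(classes)
```

States 0 and 1 end up in the same class because both reach the class of 0 with total rate 3 via actions of type a, whereas state 3 does so with total rate 4.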
The Markovian bisimulation equivalence enjoys several properties. First, it
is a congruence w.r.t. all the operators as well as recursive constant defining
equations [48,45,27,26]. This ensures substitutivity, i.e. compositionality at the
semantic level: given a term, if any of its subterms is replaced by a Markovian
bisimulation equivalent subterm, the new term is Markovian bisimulation
equivalent to the original one. Second, the Markovian bisimulation equivalence
complies with the ordinary lumping for MCs [64], thus ensuring that equivalent
terms possess the same performance characteristics [48,27]. Third, the Markovian
bisimulation equivalence is the coarsest congruence contained in the intersection
of the bisimulation equivalence [60] and the ordinary lumping, which means that
it is the best Markovian equivalence we can hope for in a bisimulation setting [10].
Fourth, the Markovian bisimulation equivalence has a sound and complete ax-
iomatization – with <a, λ1 >.E + <a, λ2 >.E = <a, λ1 + λ2 >.E as typical axiom
besides the usual expansion law for the parallel composition operator – which
provides an alternative characterization easier to understand [45].
There are some variants of the Markovian bisimulation equivalence. In the
approach of [26], the Markovian bisimulation equivalence is extended to deal with
prioritized, weighted immediate and passive actions. In the approach of [40], a
weak Markovian bisimulation equivalence is defined that abstracts from instan-
taneous τ activities. In [48], a different weak Markovian bisimulation equivalence
is proposed that, in some cases, abstracts from exponentially timed τ actions.
Finally, an alternative view is taken in [17]: following the testing approach of [32],
it is proposed to equate two terms whenever they have the same probability
of passing the same tests within the same average time. The resulting equivalence,
called Markovian testing equivalence, is coarser than the Markovian bisimulation
equivalence, abstracts from internal immediate actions and in some cases from
internal exponentially timed actions, and possesses an alternative characteriza-
tion in terms of extended traces. The congruence property, the axiomatization,
and the relationship with the ordinary lumping for the Markovian testing equiv-
alence are still under investigation. As far as ordinary lumping is concerned, it
is known that in some cases the Markovian testing equivalence produces a more
compact exact aggregation.
2.4 Performance Properties: Algebraic and Logic Approaches

SPA provides the capability of expressing the performance aspects of the be-
havior of complex systems, but not the performance properties of interest. In a
Markovian framework, stationary and transient performance measures (system
throughput, resource utilization, average buffer occupation, mean response time,
etc.) are usually described as weighted sums of state probabilities and transition
frequencies, where state weights are called yield rewards and transition weights
are called bonus rewards [52].
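For stationary measures, the weighted sum just described can be written out as follows. The symbols are assumptions introduced for this sketch: \pi(s) is the stationary probability of state s, y(s) is the yield reward of s, and b is the bonus reward of a transition, whose stationary frequency is \lambda\,\pi(s).

```latex
M \;=\; \sum_{s} y(s)\,\pi(s)
\;+\; \sum_{s \xrightarrow{a,\lambda} s'} b(s,a,\lambda,s')\;\lambda\;\pi(s)
```

As a hypothetical instance, take a two-state system that moves from idle to busy with rate \lambda and back with rate \mu, so that \pi(\mathit{busy}) = \lambda/(\lambda+\mu). The utilization is obtained with y(\mathit{busy}) = 1 and all other rewards set to zero, giving \lambda/(\lambda+\mu). The throughput is obtained with bonus reward 1 on the transition leaving the busy state and all other rewards set to zero, giving \mu\,\lambda/(\lambda+\mu).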
In [13,12] it is proposed to reuse the classical technique of rewards by ex-
tending the action format to include as many pairs of yield and bonus rewards
as there are performance measures of interest. In this framework, at semantic
model construction time every state is given a yield reward that is equal to the
sum of the yield rewards of the actions it can execute. The Markovian bisimu-
lation equivalence is then extended to take rewards into account, in a way that
preserves compositionality as well as the performance measures of interest.
In [29] an alternative reward based approach is proposed, which associates
certain rewards with those states satisfying certain formulas of a Markovian
modal logic that characterizes the Markovian bisimulation equivalence. This ap-
proach is implemented through a high level language for enquiring about the
stationary performance characteristics possessed by a process term. Such a lan-
guage, whose formal underpinning is constituted by the Markovian modal logic,
relies on the combination of the standard mathematical notation, a notation
based on the Markovian bisimulation equivalence to focus queries directly on
states, and a notation expressing the potential to perform an action of a given
type.
Finally, in [8,7,6] it is proposed to directly express the performance properties
of interest through logical formulas, whose validity is verified through an inte-
grated model checking procedure. The continuous stochastic logic is used in this
framework to inquire about the value of stationary and transient performability
measures of a system. According to the observation that the progress of time
can be regarded as the earning of reward, a reward based variant of such a logic
is then introduced, where yield rewards are assumed to be already attached to
the states.
2.5 General Distributions

When introducing generally distributed durations in SPA, the memoryless prop-
erty can no longer be exploited to define the semantics in the plain interleaving
style. The reason is that the actions can no longer be thought of as being started
in the states where they are terminated; the underlying performance models are
no longer MCs. Therefore, we have to keep track of the sequence of states in
which an action started and continued its execution.
There are several approaches in the literature, among which we mention
below those for which a notion of equivalence (in the bisimulation style) is de-
veloped. In [70] the problem of identifying the start and the termination of an
action is solved at the syntactic level by means of suitable operators that rep-
resent the random setting of a timer and the expiration of a timer, respectively.
Semantic models are infinite LTSs from which performance measures can be
derived via simulation.
In [31] the problem is again solved at the syntax level through suitable clock
related operators, with the difference that the semantic models are finitely rep-
resented through stochastic automata equipped with clocks.
In [25], instead, the problem of identifying the start and the termination of
an action is addressed at the semantic level through the ST approach of [37]. At
semantic model construction time, the start and the termination of each action
are distinguished and kept connected to each other. This framework naturally
supports action refinement, which can be exploited to replace a generally timed
action with a process term composed only of exponentially timed actions result-
ing in a phase type duration that approximates the original duration.
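As an illustration of approximating a generally timed action by a phase type duration, the following sketch uses an Erlang distribution, i.e. a sequence of k exponentially timed phases with identical rates. The choice of the Erlang family and the function names are assumptions made for this example; the source does not prescribe a particular phase type distribution. The sketch checks that the sequence preserves the mean duration, while its variance shrinks as k grows.

```python
from fractions import Fraction


def erlang_phases(mean_duration, k):
    """Rates of k sequential exponential phases whose total mean is mean_duration."""
    rate = Fraction(k) / Fraction(mean_duration)
    return [rate] * k


def mean_and_variance(rates):
    """Mean and variance of a sum of independent exponential phases."""
    mean = sum(1 / r for r in rates)
    variance = sum(1 / (r * r) for r in rates)
    return mean, variance


# Hypothetical action with mean duration 2, refined into 4 phases.
phases = erlang_phases(2, 4)
mean, variance = mean_and_variance(phases)
print(phases, mean, variance)
```

With k phases the squared coefficient of variation is 1/k, so a nearly deterministic duration needs many phases. This is one way in which refinement can enlarge the state space discussed in the next subsection.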
2.6 State Space Explosion

The semantic models for SPA are state based, hence suffer from the state space
explosion problem, i.e. the fact that the size of the state space grows exponen-
tially with the number of subterms composed in parallel. In general, this problem
can be tackled with traditional congruence based techniques. For instance, it is
wise to build the state space underlying a process term in a stepwise fashion,
along the structure imposed by the occurrences of the parallel composition op-
erator, and minimize the state space obtained at every step according to the
Markovian bisimulation equivalence. An alternative strategy is to operate at the
syntactical level using the axioms of the Markovian bisimulation equivalence as
rewriting rules.
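The effect of minimization can be illustrated on a deliberately simple case: n identical and independent two-state components composed in parallel with empty synchronization sets. This scenario is a hypothetical example, chosen because its aggregation is known in closed form; no general minimization algorithm is run. The flat state space has 2^n states, whereas states that agree on the number of components in their second local state can be aggregated, leaving n + 1 classes.

```python
from itertools import product


def flat_states(n):
    """All global states of n two-state components (local states 0 and 1)."""
    return list(product((0, 1), repeat=n))


def aggregated_states(n):
    """Classes obtained by only counting how many components are in local state 1.

    Valid only under the assumption that the components are identical
    (same action types and rates) and do not synchronize.
    """
    return sorted({sum(state) for state in flat_states(n)})


for n in (2, 4, 8):
    print(n, len(flat_states(n)), len(aggregated_states(n)))
```

The gap between the two counts widens exponentially with n, which is why minimizing after each composition step, rather than once at the end, keeps the intermediate state spaces small.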
More specific techniques to fight the state space explosion problem are present
in the literature. Among them we mention those based on Kronecker represen-
tation [27,67], time scale decomposition [59], product form solution [39,65,49],
symbolic representation [44], stochastic Petri net semantics [14], and queueing
network representation [9].
2.7 Tools and Case Studies

A few tools are under distribution for the modeling and analysis of systems with
SPA. Among them we mention the PEPA Workbench [34], the TIPPtool [55],
and TwoTowers [18].
With such tools several case studies have been conducted, which are con-
cerned with computer systems, communication protocols, and distributed algo-
rithms. Among such case studies we mention those related to CSMA/CD [10],
token ring [10], electronic mail system [41], multiprocessor mainframe [42], in-
dustrial production cell [51], robot control [35], plain old telephone system [43],
multimedia stream [23], adaptive mechanisms for transmitting voice over IP [21,
3], ATM switches [2], replicated web services [11], Lehmann-Rabin randomized
algorithm for dining philosophers [10], and comparison of six mutual exclusion
algorithms [13].
3 Turning SPA into an ADL
SPA supports the compositional modeling of complex systems via algebraic op-
erators. However, this feature has not been exploited yet to enforce an easier
and more controlled way of describing systems that makes SPA technicalities
transparent to the designer. As an example, if a system is made out of a certain
number of components, with SPA the system is simply described as the parallel
composition of a certain number of subterms, each representing the behavior of
a single component, with suitable synchronization sets to represent the compo-
nent interactions. It is desirable to be able to describe the same system at a
higher level of abstraction, where the parallel composition operators and the re-
lated synchronization sets do not come into play. It is more natural to separately
define the behavior of each type of component, to indicate the actions through
which each component type interacts with the others, to declare the instances of
each component type that form the system, and to specify the way in which the
interacting actions are attached to each other in order to make the component
instances interact. This view brings the advantage that the system components
and the component interactions are clearly elucidated, with the synchronization
mechanism being hidden (e.g. interacting actions need not have the
same type). Another strength is the capability of defining the behavior (possibly
parametrized w.r.t. action rates) and the interactions of a component type
just once and subsequently reusing it as many times as there are instances of
that component type in the system. Additionally, it is desirable that composite
systems can be described in a hierarchical way, and that graphical support is
provided for the whole modeling process.
Besides this useful syntactical sugar, checks are needed to detect possible
mismatches when assembling components together and to identify the compo-
nents that cause such mismatches. A typical example is deadlock freedom. If
we put together some components that we know to be deadlock free, we would
like their combination to be deadlock free as well. In order to investigate that,
we need suitable checks that allow deadlock to be quickly detected and some
diagnostic information to be obtained for localizing the source of deadlock. As
another example, in order to evaluate the performance of a system, its model
must be performance closed. In this case, a check at the syntax level is helpful
to easily detect and pinpoint possible violations of the performance closure.
In this section we show how SPA can be enhanced to work at the architectural
level of design. Based on ideas contained in [4,16,19,20], we illustrate
how SPA can be turned into a fully fledged ADL for the modeling, functional
verification, and performance evaluation of complex systems. Recalling that the
transformation is largely independent of the specific SPA, we concentrate on
EMPAgr [26] – which includes prioritized, weighted immediate and passive ac-
tions and the generative-reactive synchronization discipline – and we exhibit the
resulting SPA based ADL called Æmilia [15,9]. The description of a system with
Æmilia can be done in a compositional, hierarchical, graphical and controlled
way. First, we have to define the behavior of the types of components in the
system and their interactions with the other components. The functional and
performance aspects of the behavior are described through a family of EMPAgr
terms or the invocation of the specification of a previously modeled system,
while the interactions are described through actions occurring in the behavior.
Second, we have to declare the instances of each type of component present in
the system and the way in which their interactions are attached to each other
in order to allow the instances to communicate. This process is supported by
a graphical notation. Then, the whole behavior of the system is a family of
EMPAgr terms transparently obtained by composing in parallel the behavior of
the declared instances according to the specified attachments. From the whole
behavior, integrated, functional and performance semantic models can be automatically
derived, which can undergo the analysis techniques mentioned
in Sect. 2. In addition to that, Æmilia comes equipped with some architectural
checks for ensuring deadlock freedom and performance closure.
3.1 Components and Topology: Textual and Graphical Notations
A description in Æmilia represents an architectural type. As shown in Table 1,

the description of an architectural type starts with the name of the architectural
type and its numeric parameters, which often are values for exponential rates
and weights. Each architectural type is defined as a function of its architectural
element types (AETs) and its architectural topology. An AET is defined as a
function of its behavior, specified either as a family of sequential 1 EMPAgr
terms or through an invocation of a previously defined architectural type, and
its interactions, specified as a set of EMPAgr action types occurring in the be-
havior that act as interfaces for the AET. The architectural topology is specified
through the declaration of a set of architectural element instances (AEIs) rep-
resenting the system components, a set of architectural (as opposed to local)
interactions given by some interactions of the AEIs that act as interfaces for
the whole architectural type, and a set of directed architectural attachments
among the interactions of the AEIs. Every interaction is declared to be an input
interaction or an output interaction and the attachments must respect such a
classification: every attachment must involve an output interaction and an input
interaction of two different AEIs. An AEI can have different types of interactions
(input/output, local/architectural); it must have at least one local interaction.
Every local interaction must be involved in at least one attachment, while ev-
ery architectural interaction must not be involved in any attachment. In order
to allow several AEIs to synchronize, every local interaction can be involved in
several attachments provided that no autosynchronization arises, i.e. no chain
of attachments is created that starts from a local interaction of an AEI and
terminates on a local interaction of the same AEI. On the performance side,
we require that, for the sake of modeling consistency, all the occurrences of an
action type in the behavior of an AET have the same kind of rate (exponential,
immediate with the same priority level, or passive with the same priority level)
and that, to comply with the generative-reactive synchronization discipline of
EMPAgr, every chain of attachments contains at most one interaction whose
associated rate is exponential or immediate.
1 Including only 0, constants, action prefix operators, and alternative composition
operators.
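The structural rules on interactions and attachments stated above can be sketched as a small checker. The data model (`Interaction`, `attached_sets`, `topology_violations`) is hypothetical and only meant for illustration; in particular, autosynchronization is read here as two interactions of the same AEI ending up in the same set of interactions connected by attachments, which is our interpretation of the rule rather than its formal definition. The rate-related requirements are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    aei: str        # architectural element instance owning the interaction
    name: str       # action type used as interface
    direction: str  # "input" or "output"
    scope: str      # "local" or "architectural"


def attached_sets(attachments):
    """Group the interactions that are connected by chains of attachments."""
    groups = []
    for src, dst in attachments:
        merged = {src, dst}
        untouched = []
        for group in groups:
            if group & merged:
                merged |= group
            else:
                untouched.append(group)
        groups = untouched + [merged]
    return groups


def topology_violations(interactions, attachments):
    """Return a list of messages, one per violated structural rule."""
    errors = []
    for src, dst in attachments:
        if src.direction != "output" or dst.direction != "input":
            errors.append(f"{src.aei}.{src.name} -> {dst.aei}.{dst.name}: "
                          "must go from an output to an input")
        if src.aei == dst.aei:
            errors.append(f"{src.aei}: attachment within a single AEI")
        for end in (src, dst):
            if end.scope == "architectural":
                errors.append(f"{end.aei}.{end.name}: architectural "
                              "interaction is attached")
    attached = {end for pair in attachments for end in pair}
    for inter in interactions:
        if inter.scope == "local" and inter not in attached:
            errors.append(f"{inter.aei}.{inter.name}: local interaction "
                          "is not attached")
    with_local = {i.aei for i in interactions if i.scope == "local"}
    for aei in sorted({i.aei for i in interactions} - with_local):
        errors.append(f"{aei}: no local interaction")
    for group in attached_sets(attachments):
        owners = [i.aei for i in group]
        if len(owners) != len(set(owners)):
            names = ", ".join(sorted(f"{i.aei}.{i.name}" for i in group))
            errors.append(f"autosynchronization among {names}")
    return errors


# A topology with one upstream filter, one pipe, and two downstream filters.
f0_in = Interaction("F0", "accept_item", "input", "architectural")
f0_out = Interaction("F0", "serve_item", "output", "local")
p_in = Interaction("P", "accept_item", "input", "local")
p_out1 = Interaction("P", "forward_item_1", "output", "local")
p_out2 = Interaction("P", "forward_item_2", "output", "local")
f1_in = Interaction("F1", "accept_item", "input", "local")
f1_out = Interaction("F1", "serve_item", "output", "architectural")
f2_in = Interaction("F2", "accept_item", "input", "local")
f2_out = Interaction("F2", "serve_item", "output", "architectural")

interactions = [f0_in, f0_out, p_in, p_out1, p_out2,
                f1_in, f1_out, f2_in, f2_out]
attachments = [(f0_out, p_in), (p_out1, f1_in), (p_out2, f2_in)]

if __name__ == "__main__":
    print(topology_violations(interactions, attachments))
```

Dropping the last attachment leaves two local interactions unattached, which the checker reports as two separate violations.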
Table 1. Structure of an Æmilia textual description
archi type                  name and numeric parameters
archi elem types            architectural element types: behaviors and interactions
archi topology
  archi elem instances      architectural element instances
  archi interactions        architectural interactions
  archi attachments         architectural attachments
end
We now illustrate the textual notation of Æmilia by means of an example

concerning a pipe-filter system. The system is composed of three identical filters
and one pipe. Each filter acts as a service center of capacity two that is subject
to failures and subsequent repairs, which is characterized by a service rate σ,
a failure rate φ, and a repair rate ρ. For each item processed by the upstream
filter, the pipe instantaneously forwards it to one of the two downstream filters
according to the availability of free positions in their buffers. If both have free
positions, the choice is resolved probabilistically based on prouting . The Æmilia
textual description is provided in Table 2. 2 Such a description establishes that
there are three instances F0 , F1 , and F2 of FilterT as well as one instance P of
PipeT , connected in such a way that the items flow from F0 to P and from P to
F1 or F2 . It is worth observing that the system components are clearly elucidated
and easily connected to each other, and that the numeric parameters allow for a
good degree of specification reuse: e.g., the behavior of the filters is defined only
once. Additionally, the accept item input interaction of F0 and the serve item
output interactions of F1 and F2 are declared as being architectural. Therefore,
they can be used for hierarchical modeling, e.g. to describe a client-server system
where the server structure is like the pipe-filter organization above.
2 Wherever omitted, priority levels and weights are taken to be 1.
Table 2. Textual description of PipeFilter
archi type PipeFilter (exp rate σ0, σ1, σ2, φ0, φ1, φ2, ρ0, ρ1, ρ2;
                       weight prouting)
archi elem types
  elem type FilterT (exp rate σ, φ, ρ)
    behavior Filter ≜ <accept_item, ∗>.Filter +
                      <fail, φ>.<repair, ρ>.Filter
             Filter ≜ <accept_item, ∗>.Filter +
                      <serve_item, σ>.Filter +
             Filter ≜ <serve_item, σ>.Filter +
    interactions input accept_item
                 output serve_item
  elem type PipeT (weight p)
    behavior Pipe ≜ <accept_item, ∗>.(<forward_item_1, ∞_{1,p}>.Pipe +
                                      <forward_item_2, ∞_{1,1−p}>.Pipe)
    interactions input accept_item
                 output forward_item_1, forward_item_2
archi topology
  archi elem instances F0 : FilterT (σ0, φ0, ρ0)
                       F1 : FilterT (σ1, φ1, ρ1)
                       F2 : FilterT (σ2, φ2, ρ2)
                       P : PipeT (prouting)
  archi interactions input F0.accept_item
                     output F1.serve_item, F2.serve_item
  archi attachments from F0.serve_item to P.accept_item
                    from P.forward_item_1 to F1.accept_item
                    from P.forward_item_2 to F2.accept_item
end
Æmilia comes equipped with a graphical notation as well, in order to provide
a visual help during the architectural design of complex systems. Such a graphical
notation is based on flow graphs [60]. In a flow graph representing an architectural
description in Æmilia, the boxes denote the AEIs, the black circles denote
the local interactions, the white squares denote the architectural interactions,
and the directed edges denote the attachments. As an example, the architectural
type PipeFilter can be pictorially represented through the flow graph of
Fig. 1. From a methodological viewpoint, when modeling an architectural type
with Æmilia, it is convenient to start with the flow graph representation of the
architectural type and then to textually specify the behavior of each AET.
3.2 Translation Semantics

The semantics of an Æmilia specification is given by translation into EMPAgr .
While only the dynamic operators (action prefix and alternative composition)
of EMPAgr can be used in the syntax of an Æmilia specification, the more
complicated static operators (functional abstraction, functional relabeling, and
parallel composition) of EMPAgr are transparently used in the semantics of an
Æmilia specification. The translation into EMPAgr is accomplished in two steps.
In the first step, the semantics of all the instances of each AET is defined to
be the behavior of the AET projected onto its interactions. Such a projected
[Figure: boxes F0 : FilterT, P : PipeT, F1 : FilterT, and F2 : FilterT; interactions accept_item and serve_item on each filter, and accept_item, forward_item_1, and forward_item_2 on the pipe]
Fig. 1. Flow graph of PipeFilter
behavior is obtained from the family of sequential EMPAgr terms representing

the behavior of the AET by applying a functional abstraction operator on all
the actions that are not interactions. In this way, we abstract from all the
internal details of the behavior of the instances of the AET. For the pipe-filter
system of Table 2 we have
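\[
\begin{array}{l}
[\![\mathit{FilterT}]\!] = [\![F_0]\!] = [\![F_1]\!] = [\![F_2]\!] = \mathit{Filter} / \{\mathit{fail}, \mathit{repair}\} \\
[\![\mathit{PipeT}]\!] = [\![P]\!] = \mathit{Pipe}
\end{array}
\]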
thus abstracting from the internal activities fail and repair .
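The first step can be sketched on an explicit labeled transition system. The two-state filter below is a deliberately simplified, hypothetical model (it ignores the buffer capacity), and the tuple layout `(source, action, rate, target)` as well as the name `tau` for the invisible action are assumptions made for illustration only.

```python
TAU = "tau"  # assumed name for the invisible action


def hide(transitions, interactions):
    """Turn every action that is not an interaction into the invisible
    action, leaving source state, rate, and target state untouched."""
    return [(src, action if action in interactions else TAU, rate, dst)
            for src, action, rate, dst in transitions]


# Simplified filter: it works while "up" and is being repaired while "down".
filter_lts = [
    ("up", "accept_item", "*", "up"),
    ("up", "serve_item", "sigma", "up"),
    ("up", "fail", "phi", "down"),
    ("down", "repair", "rho", "up"),
]

projected = hide(filter_lts, {"accept_item", "serve_item"})

if __name__ == "__main__":
    for transition in projected:
        print(transition)
```

After the projection only accept_item and serve_item remain observable, while the failure and repair transitions keep their rates but lose their names.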
In the second step, the semantics of an architectural type is obtained by
composing in parallel the semantics of its AEIs according to the specified
attachments. Recalling that the parallel composition operator is left associative,
for the pipe-filter system we have
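\[
\begin{array}{rcl}
[\![\mathit{PipeFilter}]\!] & = & [\![F_0]\!][\mathit{serve\_item} \rightarrow a] \parallel_{\emptyset} \\
 & & [\![F_1]\!][\mathit{accept\_item} \rightarrow a_1] \parallel_{\emptyset} \\
 & & [\![F_2]\!][\mathit{accept\_item} \rightarrow a_2] \parallel_{\{a, a_1, a_2\}} \\
 & & [\![P]\!][\mathit{accept\_item} \rightarrow a, \mathit{forward\_item}_1 \rightarrow a_1, \mathit{forward\_item}_2 \rightarrow a_2]
\end{array}
\]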
The use of the functional relabeling operator is necessary to make the AEIs
interact. As an example, F0 and P must interact via serve item and accept item,
which are different from each other. Since the parallel composition operator
allows only equally typed actions to synchronize, in [[PipeFilter ]] each serve item
action executed by [[F0 ]] and each accept item action executed by [[P ]] is relabeled
to an action with the same type a. In order to avoid interferences, it is important
that a be a fresh action type, i.e. an action type occurring neither in [[F0 ]] nor
in [[P ]]. Then a synchronization on a is forced between the relabeled versions of
[[F0]] and [[P]] by means of the operator $\parallel_{\{a, a_1, a_2\}}$. It is worth recalling that the
transformation of PipeFilter into [[PipeFilter]], which can be analyzed through
the techniques mentioned in Sect. 2, is completely transparent to the designer.
The interested reader is referred to [9,16] for a formal definition of the trans-
lation semantics.
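The bookkeeping behind the second step can be sketched as follows: one fresh action type per attachment, a relabeling for each AEI, and the resulting synchronization set. The function name and the `sync#` prefix are hypothetical; the sketch assumes that every interaction is involved in exactly one attachment and that no existing action type starts with that prefix.

```python
from itertools import count


def translate_attachments(attachments, prefix="sync#"):
    """Map every attachment to a fresh action type.

    attachments: list of ((src_aei, src_action), (dst_aei, dst_action)).
    Returns (relabelings, sync_set), where relabelings maps each AEI to a
    dictionary from its interaction names to the fresh action types.
    """
    numbering = count()
    relabelings = {}
    sync_set = set()
    for (src_aei, src_action), (dst_aei, dst_action) in attachments:
        fresh = f"{prefix}{next(numbering)}"
        relabelings.setdefault(src_aei, {})[src_action] = fresh
        relabelings.setdefault(dst_aei, {})[dst_action] = fresh
        sync_set.add(fresh)
    return relabelings, sync_set


pipe_filter = [
    (("F0", "serve_item"), ("P", "accept_item")),
    (("P", "forward_item_1"), ("F1", "accept_item")),
    (("P", "forward_item_2"), ("F2", "accept_item")),
]

relabelings, sync_set = translate_attachments(pipe_filter)

if __name__ == "__main__":
    for aei, mapping in relabelings.items():
        print(aei, mapping)
    print(sorted(sync_set))
```

For the pipe-filter system the three fresh action types play the role of a, a1, and a2 in the equation for [[PipeFilter]].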
3.3 Architectural Checks

Æmilia is equipped with some architectural checks that the designer can use to
verify the well formedness of the architectural types and, in case a mismatch
is detected, to identify the components that cause it. Most of such checks are
based on the weak bisimulation equivalence [60], which captures the ability of
the functional semantic models of two terms to simulate each other's behavior
up to internal actions.
The first two checks take care of verifying whether the deadlock free AEIs
of an architectural type fit together well, i.e. do not lead to system blocks. The
first check (compatibility) is concerned with architectural types whose topology
is acyclic. For an acyclic architectural type, if we take an AEI K and we consider
all the AEIs C1 , . . . , Cn attached to it, we can observe that they form a star
topology whose center is K, as the absence of cycles prevents any two AEIs
among C1 , . . . , Cn from communicating via an AEI different from K. It can
easily be recognized that an acyclic architectural type is just a composition of
star topologies. An efficient compatibility check based on the weak bisimulation
equivalence (together with a simple constraint on action priorities) ensures the
absence of deadlock within a star topology whose center K is deadlock free, and
this check scales to the whole acyclic architectural type. The basic condition
to check is that every Ci is compatible with K, i.e. the functional semantics
of their parallel composition is weakly bisimulation equivalent to the functional
semantics of K itself. Intuitively, this means that attaching Ci to K does not alter
the behavior of K, i.e. K is designed in such a way that it suitably coordinates
with Ci .
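The equivalence underlying the compatibility condition can be sketched with a naive weak bisimulation test on small, explicit labeled transition systems. This is not the efficient check referred to above: it is a quadratic fixed-point computation written for readability, it ignores the constraint on action priorities, and it assumes that the two systems to compare have already been built (e.g. K alone and K composed with Ci with the remaining actions hidden) and share one set of state names. All names are hypothetical.

```python
TAU = "tau"  # assumed name for the invisible action


def tau_closure(states, transitions):
    """reach[s] is the set of states reachable from s via zero or more
    invisible steps."""
    reach = {s: {s} for s in states}
    changed = True
    while changed:
        changed = False
        for src, action, dst in transitions:
            if action != TAU:
                continue
            for s in states:
                if src in reach[s] and not reach[dst] <= reach[s]:
                    reach[s] |= reach[dst]
                    changed = True
    return reach


def weak_steps(states, transitions):
    """weak[s][a] is the set of states reachable from s by performing a,
    possibly preceded and followed by invisible steps."""
    reach = tau_closure(states, transitions)
    weak = {s: {TAU: set(reach[s])} for s in states}
    for s in states:
        for src, action, dst in transitions:
            if action != TAU and src in reach[s]:
                weak[s].setdefault(action, set()).update(reach[dst])
    return weak


def weakly_bisimilar(states, transitions, first, second):
    """Start from the full relation and discard the pairs of states that
    cannot match each other's weak steps, until nothing changes."""
    weak = weak_steps(states, transitions)
    rel = {(p, q) for p in states for q in states}
    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            forth = all(any((p2, q2) in rel for q2 in weak[q].get(a, ()))
                        for a in weak[p] for p2 in weak[p][a])
            back = all(any((p2, q2) in rel for p2 in weak[p].get(a, ()))
                       for a in weak[q] for q2 in weak[q][a])
            if not (forth and back):
                rel.discard((p, q))
                changed = True
    return (first, second) in rel


# k0 repeats a; c0 does the same after an invisible step; d0 repeats b.
states = {"k0", "c0", "c1", "d0"}
transitions = [
    ("k0", "a", "k0"),
    ("c0", TAU, "c1"),
    ("c1", "a", "c0"),
    ("d0", "b", "d0"),
]

if __name__ == "__main__":
    print(weakly_bisimilar(states, transitions, "k0", "c0"))
    print(weakly_bisimilar(states, transitions, "k0", "d0"))
```

The first comparison succeeds because the extra step of c0 is invisible, whereas the second fails because d0 can never perform a.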
Since the compatibility check is not sufficient for cyclic architectural types,
the second check (interoperability) deals with cycles. A suitable interoperabil-
ity check based on the weak bisimulation equivalence (together with a simple
constraint on action priorities) ensures the absence of deadlock within a cycle
C1 , . . . , Cn of AEIs in the case that at least one of such AEIs is deadlock free.
The basic condition to check is that at least one deadlock free Ci interoperates
with the other AEIs in the cycle, i.e. the functional semantics of the parallel
composition of the AEIs in the cycle projected on the interactions with Ci only
is weakly bisimulation equivalent to the functional semantics of Ci . Intuitively,
this means that inserting Ci into the cycle does not alter the behavior of Ci ,
i.e. that the behavior of the cycle assumed by Ci matches the actual behavior of
the cycle. In the case in which no deadlock free AEI is found in the cycle that
interoperates with the other AEIs, a loop shrinking procedure can be used to
single out the AEIs in the cycle responsible for the deadlock.
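Which of the two checks applies depends on whether the topology contains cycles. A minimal sketch of that preliminary test is given below, assuming the topology is available as a list of pairs of attached AEIs; the function name is hypothetical, and several attachments between the same two AEIs are counted as a single edge.

```python
def is_acyclic(attached_aeis):
    """Return True if the undirected graph whose nodes are the AEIs and
    whose edges are the pairs of attached AEIs contains no cycle."""
    edges = {frozenset(pair) for pair in attached_aeis}
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for edge in edges:
        if len(edge) != 2:
            raise ValueError("an attachment must involve two different AEIs")
        first, second = edge
        root_first, root_second = find(first), find(second)
        if root_first == root_second:
            return False  # the edge closes a cycle
        parent[root_first] = root_second
    return True


pipe_filter = [("F0", "P"), ("P", "F1"), ("P", "F2")]
client_server = [("C", "Nr"), ("Nr", "S"), ("S", "No"), ("No", "C")]

if __name__ == "__main__":
    print(is_acyclic(pipe_filter))
    print(is_acyclic(client_server))
```

A topology with one upstream filter, one pipe, and two downstream filters is a star centered on the pipe, so the compatibility check would be the relevant one; a client connected to a server through two separate links forms a cycle and would call for the interoperability check.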
On the performance side, there is a third check to detect architectural mis-
matches resulting in performance underspecification. This check (performance
closure) ensures that the performance semantic model underlying an architec-
tural type exists in the form of a CTMC or DTMC. In order for an architectural
type to be performance closed, the basic condition to check is that no AET be-
havior contains a passive action whose type is not an interaction, and that every
set of attached local interactions contains one interaction whose associated rate
is exponential or immediate.
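The basic condition just stated can be sketched as a syntax-level check. The representation is hypothetical: rate kinds are given per AET and action type as "exp", "imm", or "passive", and each set of attached local interactions is a set of (AET, action type) pairs. The sketch only tests the condition as worded here; the requirement of Sect. 3.1 that a chain of attachments contains at most one such interaction is assumed to be enforced separately.

```python
ACTIVE = ("exp", "imm")


def performance_closure_violations(rate_kinds, interactions, attached_sets):
    """Return a list of messages describing why the model is not
    performance closed; an empty list means the condition holds."""
    violations = []
    for aet, kinds in rate_kinds.items():
        for action, kind in kinds.items():
            if kind == "passive" and action not in interactions.get(aet, set()):
                violations.append(
                    f"{aet}: passive action {action} is not an interaction")
    for group in attached_sets:
        if not any(rate_kinds[aet][action] in ACTIVE for aet, action in group):
            names = ", ".join(sorted(f"{aet}.{action}" for aet, action in group))
            violations.append(f"no exponential or immediate interaction "
                              f"among {names}")
    return violations


# Rate kinds of a filter type and a pipe type.
rate_kinds = {
    "FilterT": {"accept_item": "passive", "serve_item": "exp",
                "fail": "exp", "repair": "exp"},
    "PipeT": {"accept_item": "passive",
              "forward_item_1": "imm", "forward_item_2": "imm"},
}
interactions = {
    "FilterT": {"accept_item", "serve_item"},
    "PipeT": {"accept_item", "forward_item_1", "forward_item_2"},
}
attached_sets = [
    {("FilterT", "serve_item"), ("PipeT", "accept_item")},
    {("PipeT", "forward_item_1"), ("FilterT", "accept_item")},
    {("PipeT", "forward_item_2"), ("FilterT", "accept_item")},
]

if __name__ == "__main__":
    print(performance_closure_violations(rate_kinds, interactions, attached_sets))
```

If serve_item were made passive, the first set of attached interactions would contain only passive actions and the check would report it.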
We conclude by referring the interested reader to [9,16] for a precise definition
and examples of application of the architectural checks outlined in this section.
3.4 Families of Architectures and Hierarchical Modeling

An Æmilia description represents a family of architectures called an architec-
tural type. An architectural type is an intermediate abstraction between a single
architecture and an architectural style [66]. An important goal of the software
architecture discipline is the creation of an established and shared understanding
of the common forms of software design. Starting from the user requirements,
the designer should be able to identify a suitable organizational style, in or-
der to capitalize on codified principles and experience to specify, analyze, plan,
and monitor the construction of a system with high levels of efficiency and con-
fidence. An architectural style defines a family of systems having a common
vocabulary of components as well as a common topology and set of constraints
on the interactions among the components. As examples of architectural styles
we mention main program-subroutines, pipe-filter, client-server, and the layered
organization. Since an architectural style encompasses an entire family of soft-
ware systems, it is desirable to formalize the concept of architectural style both
to have a precise definition of the system family and to study the architectural
properties common to all the systems of the family. This is not a trivial task
because there are at least two degrees of freedom: variability of the component
topology and variability of the component internal behavior.
An architectural type is an approximation of an architectural style, where
the component topology and the component internal behavior can vary from in-
stance to instance of the architectural type in a controlled way, which preserves
the architectural checks. More precisely, all the instances of an architectural
type must have the same observable functional behavior and conforming topolo-
gies, while the internal behavior and the performance characteristics can freely
vary. An instance of an architectural type can be obtained by invoking the ar-
chitectural type and passing actual AETs preserving the observable functional
behavior of the formal AETs, an actual topology (actual AEIs, actual architec-
tural interactions, and actual attachments) that conforms to the formal topology,
actual names for the architectural interactions, and actual values for the numeric
parameters.
The simplest form of architectural invocation is the one in which the actual
parameters coincide with the formal ones, in which case the actual parameters
are omitted for the sake of conciseness. The possibility of defining the behavior
of an AET through an architectural invocation as well as declaring architec-
tural interactions can be exploited to model a system architecture in a hier-
archical way. As an example, consider the pipe-filter organization of Table 2
and suppose that it is the architecture of the server of a client-server system.
[Figure: boxes C : ClientT, Nr : NetworkT, No : NetworkT, and S : ServerT, the latter containing the flow graph of PipeFilter with F0 : FilterT and P : PipeT; interactions generate_request, accept_outcome, receive, forward, accept_request, and generate_outcome]
Fig. 2. Flow graph of ClientServer
The flow graph description of the resulting client-server system is depicted in

Fig. 2, while its textual description is reported in Table 3. The client descrip-
tion is parametrized w.r.t. the request generation rate λ, while the communi-
cation link description is parametrized w.r.t. the communication speed δ. As
can be observed, the behavior of the server is defined through an invocation
of the previously defined architectural type PipeFilter , where the actual names
accept request, generate outcome, and generate outcome substitute for the for-
mal architectural interactions F0 .accept item, F1 .serve item, and F2 .serve item,
respectively.
Table 3. Textual description of ClientServer
archi type ClientServer (exp rate λ, δr, δo,
                         σ0, σ1, σ2, φ0, φ1, φ2, ρ0, ρ1, ρ2;
                         weight prouting)
archi elem types
  elem type ClientT (exp rate λ)
    behavior Client ≜ <generate_request, λ>.
                      <accept_outcome, ∗>.Client
    interactions output generate_request
                 input accept_outcome
  elem type NetworkT (exp rate δ)
    behavior Network ≜ <receive, ∗>.<forward, δ>.Network
    interactions input receive
                 output forward
  elem type ServerT (exp rate σ0, σ1, σ2, φ0, φ1, φ2, ρ0, ρ1, ρ2;
                     weight prouting)
    behavior Server ≜ PipeFilter (; /* actual AETs */
                                  ; /* actual AEIs */
                                  ; /* actual arch. interactions */
                                  ; /* actual attachments */
                                  accept_request,
                                  generate_outcome,
                                  generate_outcome;
                                  σ0, σ1, σ2, φ0, φ1, φ2, ρ0, ρ1, ρ2,
                                  prouting)
    interactions input accept_request
                 output generate_outcome
archi topology
  archi elem instances C : ClientT (λ)
                       Nr : NetworkT (δr)
                       No : NetworkT (δo)
                       S : ServerT (σ0, σ1, σ2, φ0, φ1, φ2, ρ0, ρ1, ρ2, prouting)
  archi interactions
  archi attachments from C.generate_request to Nr.receive
                    from Nr.forward to S.accept_request
                    from S.generate_outcome to No.receive
                    from No.forward to C.accept_outcome
end
A more complex form of architectural invocation is the one in which actual
AETs are passed that are different from the corresponding formal AETs. In this
case, we have to make sure that the actual AETs preserve the functional behavior
determined by the formal ones. To this purpose, Æmilia is endowed with
an efficient behavioral conformity check based on the weak bisimulation equivalence
(together with a simple constraint on action rates) to verify whether an
architectural type invocation conforms to an architectural type definition, in the
sense that the architectural type invocation and the architectural type definition
have the same observable functional semantics up to some relabeling. The basic
condition to check is that the functional semantics of each actual AET is weakly
bisimulation equivalent to the functional semantics of the corresponding formal
AET up to some relabeling. This behavioral conformity check ensures that all
the correct instances of an architectural type possess the same compatibility, in-
teroperability, and performance closure properties. In other words, the outcome
of the application of the compatibility, interoperability, and performance closure
checks to the definition of an architectural type scales to all the behaviorally
conforming invocations of the architectural type.
The most complete form of architectural invocation is the one in which both
actual AETs and an actual topology are passed that are different from the
corresponding formal AETs and formal topology, respectively. In this case, we
have to additionally make sure that the actual topology conforms to the formal
topology. There are three kinds of admitted topological extensions, all of which
preserve the compatibility, interoperability, and performance closure properties
under some general conditions.
[Figure: three panels labeled uniconn−uniconn, uniconn−andconn, and uniconn−orconn]
Fig. 3. Legal attachments in case of extensible and/or connections
The first kind of topological extension is given by the extensible and/or

connections. As an example, consider the client-server system of Table 3. Every
instance of such an architectural type can admit a single client and a single
server, whereas it would be useful to allow for an arbitrary number of clients
(to be instantiated when invoking the architectural type) that can connect to
the server. From the syntactical viewpoint, the extensible and/or connections are
introduced in Æmilia by further typing the interactions of the AETs. Besides the
input/output qualification, the interactions are classified as uniconn, andconn,
and orconn, with only the three types of attachments shown in Fig. 3 considered
legal. A uniconn interaction is an interaction to which a single AEI can be
attached; e.g., all the interactions of ClientServer are of this type. An andconn
interaction is an interaction to which a variable number of AEIs can be attached,
such that all the attached AEIs must synchronize when that interaction takes
place; e.g., a broadcast transmission. An orconn interaction is an interaction
to which a variable number of AEIs can be attached, such that only one of
the attached AEIs must synchronize when that interaction takes place; e.g., a
client-server system with several clients. Every output orconn interaction must
be declared to depend on one input orconn interaction, with the occurrences
of the two interactions alternating in the behavior of the AET that contains
them. On the semantic side, the treatment of uniconn and andconn interactions is
trivial. Instead, every occurrence of an input orconn interaction must be replaced
by a choice among as many indexed occurrences of that interaction as there are
attached AEIs, while every occurrence of an output orconn interaction must
be augmented with the same index given to the occurrence of the preceding
input orconn interaction on which it depends. Such modifications, which are
completely transparent to the designer, are necessary to reflect the fact that an
orconn interaction expresses a choice among different attached AEIs whenever

the interaction takes place.
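The rewriting of orconn interactions can be sketched on a very restricted kind of behavior. The sketch assumes a purely sequential cycle of actions, given as a list of action type names, that starts with the input orconn interaction and later performs the output orconn interaction depending on it; the function name, the example action types, and the indexing scheme are hypothetical.

```python
def expand_orconn(sequence, orconn_input, orconn_output, n_attached):
    """Replace one cycle of a sequential behavior by a choice among
    n_attached branches. In branch i, the input orconn interaction and the
    output orconn interaction depending on it both carry index i."""
    branches = []
    for index in range(1, n_attached + 1):
        branch = []
        for action in sequence:
            if action in (orconn_input, orconn_output):
                branch.append(f"{action}_{index}")
            else:
                branch.append(action)
        branches.append(branch)
    return branches


# A server cycle attached to two clients through orconn interactions.
server_cycle = ["receive_request", "compute", "send_response"]
branches = expand_orconn(server_cycle, "receive_request", "send_response", 2)

if __name__ == "__main__":
    for branch in branches:
        print(" . ".join(branch))
```

Each branch corresponds to serving one of the attached AEIs, and the shared index guarantees that the response goes back to the AEI whose request was accepted.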
[Figure: boxes F0 : FilterT, P : PipeT, P' : PipeT, P'' : PipeT, F3 : FilterT, F4 : FilterT, F5 : FilterT, and F6 : FilterT, with interactions accept_item, serve_item, forward_item_1, and forward_item_2]
Fig. 4. Flow graph of an exogenous extension of PipeFilter
The second kind of topological extension is given by the exogenous one. As

an example, consider the pipe-filter system of Table 2. Every instance of such an
architectural type can admit a single pipe connected to one upstream filter and
two downstream filters, whereas it would be desirable to be able to express by
means of that architectural type any pipe-filter system with an arbitrary number
of filters and pipes, such that every pipe is connected to one upstream filter and
two downstream filters. E.g., the flow graph in Fig. 4 should be considered as
a legal extension of the flow graph in Fig. 1. The idea behind the exogenous
extensions is that, since the architectural interactions of an architectural type
are the frontier of the whole architectural type, it is reasonable to extend the
architectural type at some of its architectural interactions with instances of the
already defined AETs, in a way that follows the prescribed topology.
The third kind of topological extension is given by the endogenous one. As an
example, consider the Æmilia description of a ring of stations each following the
same protocol: wait for a message from the previous station in the ring, process
the received message, and send the processed message to the next station in the
ring. Since such a protocol guarantees that only one station can transmit at a
given instant, the protocol can be considered as an abstraction of the IEEE 802.5
standard medium access control protocol for local area networks known as token
ring. One of the stations is designated to be the initial one, in the sense that it is
[Figure: boxes IS : InitStationT, S1 : StationT, S2' : StationT, S2'' : StationT, and S3 : StationT, each with interactions receive and send]
Fig. 5. Flow graph of an endogenous extension of Ring
the first station allowed to send a message. Suppose that the Æmilia description
declares one instance of the initial station and three instances of the normal
station. Every instance of the architectural type, say Ring, can thus admit a
single initial station and three normal stations connected to form a ring, whereas
it would be desirable to be able to express by means of that architectural type
any ring system with an arbitrary number of normal stations. E.g., the flow graph
in Fig. 5 should be considered as a legal extension of the architectural type Ring.
The idea behind the endogenous extensions is that of replacing a set of AEIs with
a set of new instances of the already defined AETs, in a way that follows the
prescribed topology. In this case, we consider the frontier of the architectural
type w.r.t. one of the replaced AEIs to be the set of interactions previously
attached to the local interactions of the replaced AEI. On the other hand, all
the replacing AEIs that will be attached to the frontier of the architectural type
w.r.t. one of the replaced AEIs must be of the same type as the replaced AEI.
We conclude by referring the interested reader to [9,16,19,20] for a precise
definition of the behavioral and topological conformity checks outlined in this
section.
4 Conclusion
In this paper we have recalled the basic notions and the main achievements in
the field of SPA and we have stressed its current transformation into a fully
fledged ADL for the compositional, graphical, hierarchical and controlled mod-
eling of complex systems as well as their functional verification and performance
evaluation. Such a transformation eases the modeling process and provides an
added value given by some architectural checks for detecting deadlock as well as
performance underspecification, which scale over families of architectures.
Concerning future work in the area of SPA based ADLs, first of all we mention
the importance of devising additional architectural checks on the performance
side, that provide diagnostic information like in the case of the compatibility and
interoperability checks. At the architectural level of design, it is extremely useful

to be able to reinterpret the performance results in terms of components and their
interactions. In order to achieve that, the performance must be calculated not
on a flat model like a MC, but on a model that maintains some correspondence
with the system structure, so that there is the possibility to localize bottlenecks.
Some work in this direction can be found in [9], where Æmilia descriptions are
translated into queueing network models.
Furthermore, SPA based ADLs should be viewed in the context of the whole
software life cycle. A link should be established from higher level notations like
UML, where requirements are expressed in a less formal way, as well as to object
oriented programming languages, aiming at the automatic generation of code
that possesses the functional and performance properties formally proved at the
architectural level.
References
1. M. Ajmone Marsan, G. Balbo, G. Conte, S. Donatelli, G. Franceschinis, “Modelling
with Generalized Stochastic Petri Nets”, John Wiley & Sons, 1995
2. A. Aldini, M. Bernardo, R. Gorrieri, “An Algebraic Model for Evaluating the Per-
formance of an ATM Switch with Explicit Rate Marking”, in Proc. of the 7th Int.
Workshop on Process Algebra and Performance Modelling (PAPM 1999), Prensas
Universitarias de Zaragoza, pp. 119-138, Zaragoza (Spain), 1999
3. A. Aldini, M. Bernardo, R. Gorrieri, M. Roccetti, “Comparing the QoS of Inter-
net Audio Mechanisms via Formal Methods”, in ACM Trans. on Modeling and
Computer Simulation 11:1-42, 2001
4. R. Allen, D. Garlan, “A Formal Basis for Architectural Connection”, in ACM
Trans. on Software Engineering and Methodology 6:213-249, 1997
5. J.C.M. Baeten, W.P. Weijland, “Process Algebra”, Cambridge University Press,
1990
6. C. Baier, B. Haverkort, H. Hermanns, J.-P. Katoen, “On the Logical Charac-
terisation of Performability Properties”, in Proc. of the 27th Int. Coll. on Au-
tomata, Languages and Programming (ICALP 2000), LNCS 1853:780-792, Geneve
(Switzerland), 2000
7. C. Baier, B. Haverkort, H. Hermanns, J.-P. Katoen, “Model Checking Continuous-
Time Markov Chains by Transient Analysis”, in Proc. of the 12th Int. Conf. on
Computer Aided Verification (CAV 2000), LNCS 1855:358-372, Chicago (IL), 2000
8. C. Baier, J.-P. Katoen, H. Hermanns, “Approximate Symbolic Model Checking of
Continuous Time Markov Chains”, in Proc. of the 10th Int. Conf. on Concurrency
Theory (CONCUR 1999), LNCS 1664:146-162, Eindhoven (The Netherlands), 1999
9. S. Balsamo, M. Bernardo, M. Simeoni, “Combining Stochastic Process Algebras
and Queueing Networks for Software Architecture Analysis”, to appear in Proc. of
the 3rd Int. Workshop on Software and Performance (WOSP 2002), Rome (Italy),
2002
10. M. Bernardo, “Theory and Application of Extended Markovian Process Algebra”,
Ph.D. Thesis, University of Bologna (Italy), 1999
11. M. Bernardo, “A Simulation Analysis of Dynamic Server Selection Algorithms for
Replicated Web Services”, in Proc. of the 9th Int. Symp. on Modeling, Analysis
and Simulation of Computer and Telecommunication Systems (MASCOTS 2001),
IEEE-CS Press, pp. 371-378, Cincinnati (OH), 2001
12. M. Bernardo, M. Bravetti, “Reward Based Congruences: Can We Aggre-

gate More?”, in Proc. of the 1st Joint Int. Workshop on Process Al-
gebra and Performance Modelling and Probabilistic Methods in Verification
(PAPM/PROBMIV 2001), LNCS 2165:136-151, Aachen (Germany), 2001
13. M. Bernardo, M. Bravetti, “Performance Measure Sensitive Congruences for
Markovian Process Algebras”, to appear in Theoretical Computer Science, 2002
14. M. Bernardo, N. Busi, M. Ribaudo, “Integrating TwoTowers and GreatSPN through
a Compact Net Semantics”, to appear in Performance Evaluation, 2002
15. M. Bernardo, P. Ciancarini, L. Donatiello, “ÆMPA: A Process Algebraic Descrip-
tion Language for the Performance Analysis of Software Architectures”, in Proc. of
the 2nd Int. Workshop on Software and Performance (WOSP 2000), ACM Press,
pp. 1-11, Ottawa (Canada), 2000
16. M. Bernardo, P. Ciancarini, L. Donatiello, “Architecting Software Systems with
Process Algebras”, Tech. Rep. UBLCS-2001-07, University of Bologna (Italy), 2001
17. M. Bernardo, W.R. Cleaveland, “A Theory of Testing for Markovian Processes”,
in Proc. of the 11th Int. Conf. on Concurrency Theory (CONCUR 2000),
LNCS 1877:305-319, State College (PA), 2000
18. M. Bernardo, W.R. Cleaveland, W.S. Stewart, “TwoTowers 1.0 User Manual”,
http://www.sti.uniurb.it/bernardo/twotowers/, 2001
19. M. Bernardo, F. Franzè, “Architectural Types Revisited: Extensible And/Or Con-
nections”, in Proc. of the 5th Int. Conf. on Fundamental Approaches to Software
Engineering (FASE 2002), LNCS 2306:113-128, Grenoble (France), 2002
20. M. Bernardo, F. Franzè, “Exogenous and Endogenous Extensions of Architectural
Types”, in Proc. of the 5th Int. Conf. on Coordination Models and Languages
(COORDINATION 2002), LNCS 2315:40-55, York (UK), 2002
21. M. Bernardo, R. Gorrieri, M. Roccetti, “Formal Performance Modelling and Evalu-
ation of an Adaptive Mechanism for Packetised Audio over the Internet”, in Formal
Aspects of Computing 10:313-337, 1999
22. H. Bohnenkamp, “Compositional Solution of Stochastic Process Algebra Models”,
Ph.D. Thesis, RWTH Aachen (Germany), 2001
23. H. Bowman, J.W. Bryans, J. Derrick, “Analysis of a Multimedia Stream using
Stochastic Process Algebra”, in Proc. of the 6th Int. Workshop on Process Algebra
and Performance Modelling (PAPM 1998), pp. 51-69, Nice (France), 1998
24. J.T. Bradley, “Towards Reliable Modelling with Stochastic Process Algebras”, Ph.D.
Thesis, University of Bristol (UK), 1999
25. M. Bravetti, “Specification and Analysis of Stochastic Real-Time Systems”, Ph.D.
Thesis, University of Bologna (Italy), 2002
26. M. Bravetti, M. Bernardo, “Compositional Asymmetric Cooperations for Process
Algebras with Probabilities, Priorities, and Time”, in Proc. of the 1st Int. Workshop
on Models for Time Critical Systems (MTCS 2000), Electronic Notes in Theoretical
Computer Science 39(3), State College (PA), 2000
27. P. Buchholz, “Markovian Process Algebra: Composition and Equivalence”, in
Proc. of the 2nd Int. Workshop on Process Algebra and Performance Modelling
(PAPM 1994), pp. 11-30, Erlangen (Germany), 1994
28. W.R. Cleaveland, J. Parrow, B. Steffen, “The Concurrency Workbench: A
Semantics-Based Tool for the Verification of Concurrent Systems”, in ACM Trans.
on Programming Languages and Systems 15:36-72, 1993
29. G. Clark, “Techniques for the Construction and Analysis of Algebraic Performance
Models”, Ph.D. Thesis, University of Edinburgh (UK), 2000
30. E.M. Clarke, O. Grumberg, D.A. Peled, “Model Checking”, MIT Press, 1999
31. P. D’Argenio, “Algebras and Automata for Timed and Stochastic Systems”, Ph.D.
Thesis, University of Twente (The Netherlands), 1999
32. R. De Nicola, M.C.B. Hennessy, “Testing Equivalences for Processes”, in Theoret-
ical Computer Science 34:83-133, 1983
33. D. Ferrari, “Considerations on the Insularity of Performance Evaluation”, in IEEE
Trans. on Software Engineering 12:678-683, 1986
34. S. Gilmore, “The PEPA Workbench User Manual”,
http://www.dcs.ed.ac.uk/pepa/tools.html, 2001
35. S. Gilmore, J. Hillston, D.R.W. Holton, M. Rettelbach, “Specifications in Stochas-
tic Process Algebra for a Robot Control Problem”, in Journal of Production Re-
search 34:1065-1080, 1996
36. R.J. van Glabbeek, S.A. Smolka, B. Steffen, “Reactive, Generative and Stratified
Models of Probabilistic Processes”, in Information and Computation 121:59-80,
1995
37. R.J. van Glabbeek, F.W. Vaandrager, “Petri Net Models for Algebraic Theories
of Concurrency”, in Proc. of the Conf. on Parallel Architectures and Languages
Europe (PARLE 1987), LNCS 259:224-242, Eindhoven (The Netherlands), 1987
38. N. Götz, “Stochastische Prozeßalgebren – Integration von funktionalem Entwurf
und Leistungsbewertung Verteilter Systeme”, Ph.D. Thesis, University of Erlangen
(Germany), 1994
39. P.G. Harrison, J. Hillston, “Exploiting Quasi-Reversible Structures in Markovian
Process Algebra Models”, in Computer Journal 38:510-520, 1995
40. H. Hermanns, “Interactive Markov Chains”, Ph.D. Thesis, University of Erlangen
(Germany), 1998
41. H. Hermanns, U. Herzog, J. Hillston, V. Mertsiotakis, M. Rettelbach, “Stochas-
tic Process Algebras: Integrating Qualitative and Quantitative Modelling”, Tech.
Rep. 11/94, University of Erlangen (Germany), 1994
42. H. Hermanns, U. Herzog, V. Mertsiotakis, “Stochastic Process Algebras as a Tool
for Performance and Dependability Modelling”, in Proc. of the 1st IEEE Int. Com-
puter Performance and Dependability Symp. (IPDS 1995), IEEE-CS Press, pp. 102-
111, Erlangen (Germany), 1995
43. H. Hermanns, J.-P. Katoen, “Automated Compositional Markov Chain Generation
for a Plain-Old Telephone System”, in Science of Computer Programming 36:97-
127, 2000
44. H. Hermanns, J. Meyer-Kayser, M. Siegle, “Multi Terminal Binary Decision Di-
agrams to Represent and Analyse Continuous Time Markov Chains”, in Proc. of
the 3rd Int. Workshop on the Numerical Solution of Markov Chains (NSMC 1999),
Zaragoza (Spain), 1999
45. H. Hermanns, M. Rettelbach, “Syntax, Semantics, Equivalences, and Axioms for
MTIPP”, in Proc. of the 2nd Int. Workshop on Process Algebra and Performance
Modelling (PAPM 1994), pp. 71-87, Erlangen (Germany), 1994
46. U. Herzog, “Formal Description, Time and Performance Analysis – A Framework”,
in Entwurf und Betrieb verteilter Systeme, Informatik Fachberichte 264, Springer,
1990
47. U. Herzog, “EXL: Syntax, Semantics and Examples”, Tech. Rep. 16/90, University
of Erlangen (Germany), 1990
48. J. Hillston, “A Compositional Approach to Performance Modelling”, Cambridge
University Press, 1996
49. J. Hillston, N. Thomas, “Product Form Solution for a Class of PEPA Models”, in
Performance Evaluation 35:171-192, 1999
50. C.A.R. Hoare, “Communicating Sequential Processes”, Prentice Hall, 1985

51. D.R.W. Holton, “A PEPA Specification of an Industrial Production Cell”, in Com-
puter Journal 38:542-551, 1995
52. R.A. Howard, “Dynamic Probabilistic Systems”, John Wiley & Sons, 1971
53. K. Kanani, “A Unified Framework for Systematic Quantitative and Qualitative
Analysis of Communicating Systems”, Ph.D. Thesis, Imperial College (UK), 1998
54. J.-P. Katoen, “Quantitative and Qualitative Extensions of Event Structures”, Ph.D.
Thesis, University of Twente (The Netherlands), 1996
55. U. Klehmet, V. Mertsiotakis, “TIPPtool – User’s Guide”,
http://www7.informatik.uni-erlangen.de/tipp/tool.html, 1998
56. L. Kleinrock, “Queueing Systems”, John Wiley & Sons, 1975
57. K.G. Larsen, A. Skou, “Bisimulation through Probabilistic Testing”, in Information
and Computation 94:1-28, 1991
58. S.S. Lavenberg editor, “Computer Performance Modeling Handbook”, Academic
Press, 1983
59. V. Mertsiotakis, “Approximate Analysis Methods for Stochastic Process Algebras”,
Ph.D. Thesis, University of Erlangen (Germany), 1998
60. R. Milner, “Communication and Concurrency”, Prentice Hall, 1989
61. D.E. Perry, A.L. Wolf, “Foundations for the Study of Software Architecture”, in
ACM SIGSOFT Software Engineering Notes 17:40-52, 1992
62. M. Rettelbach, “Stochastische Prozeßalgebren mit zeitlosen Aktivitäten und prob-
abilistischen Verzweigungen”, Ph.D. Thesis, University of Erlangen (Germany),
1996
63. M. Ribaudo, “On the Relationship between Stochastic Process Algebras and
Stochastic Petri Nets”, Ph.D. Thesis, University of Torino (Italy), 1995
64. P. Schweitzer, “Aggregation Methods for Large Markov Chains”, in Mathematical
Computer Performance and Reliability, North Holland, pp. 275-286, 1984
65. M. Sereno, “Towards a Product Form Solution for Stochastic Process Algebras”, in
Computer Journal 38:622-632, 1995
66. M. Shaw, D. Garlan, “Software Architecture: Perspectives on an Emerging Disci-
pline”, Prentice Hall, 1996
67. M. Siegle, “Beschreibung und Analyse von Markovmodellen mit grossem Zustand-
sraum”, Ph.D. Thesis, University of Erlangen (Germany), 1995
68. C.U. Smith, “Performance Engineering of Software Systems”, Addison-Wesley,
1990
69. W.J. Stewart, “Introduction to the Numerical Solution of Markov Chains”, Prince-
ton University Press, 1994
70. B. Strulo, “Process Algebra for Discrete Event Simulation”, Ph.D. Thesis, Imperial
College (UK), 1994
Automated Performance and Dependability
Evaluation Using Model Checking
Christel Baier¹, Boudewijn Haverkort², Holger Hermanns³, and Joost-Pieter Katoen³
¹ Institut für Informatik I, University of Bonn
Römerstraße 164, D-53117 Bonn, Germany
² Dept. of Computer Science, RWTH Aachen
Ahornstraße 55, D-52056 Aachen, Germany
³ Faculty of Computer Science, University of Twente
P.O. Box 217, 7500 AE Enschede, The Netherlands
Abstract. Markov chains (and their extensions with rewards) have been
widely used to determine performance, dependability and performability
characteristics of computer communication systems, such as throughput,
delay, mean time to failure, or the probability to accumulate at least a
certain amount of reward in a given time.
Due to the rapidly increasing size and complexity of systems, Markov
chains and Markov reward models are difficult and cumbersome to spec-
ify by hand at the state-space level. Therefore, various specification for-
malisms, such as stochastic Petri nets and stochastic process algebras,
have been developed to facilitate the specification of these models at a
higher level of abstraction. Up till now, however, the specification of the
measure-of-interest is often done in an informal and relatively unstruc-
tured way. Furthermore, some measures-of-interest can not be expressed
conveniently at all.
In this tutorial paper, we present a logic-based specification technique
to specify performance, dependability and performability measures-of-
interest and show how for a given finite Markov chain (or Markov re-
ward model) such measures can be evaluated in a fully automated way.
Particular emphasis will be given to so-called path-based measures and
hierarchically-specified measures. For this purpose, we extend so-called
model checking techniques to reason about discrete- and continuous-time
Markov chains and their rewards. We also report on the use of techniques
such as (compositional) model reduction and measure-driven state-space
generation to combat the infamous state space explosion problem.
1 Introduction
Over the last decades many techniques have been developed to specify and solve
performance, dependability and performability models. In many cases, the mod-
els addressed possess a continuous-time Markov chain as their associated stochas-
tic process. To avoid the specification of performance models directly at the state

Corresponding author; [email protected], phone: +31 53 489-4661.

262 C. Baier et al.
level, high-level specification methods have been developed, most notably those
based on stochastic Petri nets, stochastic process algebras, and stochastic ac-
tivity networks. With appropriate tools supporting these specification methods,
such as, for instance, provided by TIPPtool [36], the PEPA workbench [23],
GreatSPN [13], UltraSAN [56] or SPNP [14], it is relatively comfortable to spec-
ify performance models of which the associated CTMCs have millions of states.
In combination with state-of-the-art numerical means to solve the resulting linear
system of equations (for steady-state measures) or the linear system of differen-
tial equations (for time-dependent or transient measures) a good workbench is
available to construct and solve dependability models of complex systems.
However, whereas the specification of performance and dependability mod-
els has become very comfortable, the specification of the measures of interest
most often has remained fairly cumbersome. In particular, most often only sim-
ple state-based measures can be defined with relative ease.
In contrast, in the area of formal methods for system verification, in particu-
lar in the area of model checking, very powerful logic-based methods have been
developed to express properties of systems specified as finite state automata
(note that we can view a CTMC as a special type of such an automaton). Not
only are suitable means available to express state-based properties, a logic like
CTL [16] (Computation Tree Logic; see below) also allows one to express prop-
erties over state sequences. Such capabilities would also be welcome in specifying
performance and dependability measures.
To fulfil this aim, we have introduced the so-called continuous stochastic
logic (CSL), which provides ample means to specify state- as well as path-based
performance measures for CTMCs in a compact and flexible way [1,2,3,4,5].
Moreover, due to the formal syntax and semantics of CSL, we can exploit the
structure of CSL-specified measures in the subsequent evaluation process, such
that typically the size of the underlying Markov chains that need to be evaluated
can be reduced considerably.
To further strengthen the applicability of the stochastic model checking ap-
proach we recently considered Markov models involving costs or rewards, as they
are often used in the performability context. We extended the logic CSL to the
continuous stochastic reward logic CSRL in order to specify steady-state, tran-
sient and path-based measures over CTMCs extended with a reward structure
(Markov reward models) [4]. We showed that well-known performability mea-
sures, most notably also the performability distribution introduced by Meyer [51,
52,53], can be specified using CSRL. However, CSRL allows for the specification
of new measures that have not yet been addressed in the performability liter-
ature. For instance, when rewards are interpreted as costs, we can express the
probability that, given a starting state, a certain goal state is reached within
t time units, thereby deliberately avoiding or visiting certain intermediate states,
and with a total cost (accumulated reward) below a certain threshold.
We have introduced CSL and CSRL (including their complete syntax and for-
mal semantics) in a much more theoretical context than we do in this tutorial paper
(cf. [2,3,4,5,33]).
The rest of the paper is organised as follows. In Section 2 and Section 3 we

present the two system evaluation techniques that will be merged in this paper:
performance and dependability evaluation and formal verification by means of
model checking. We then proceed with the specification of performance measures
using CSL in Section 4, and of performability measures using CSRL in Section
5. Section 6 addresses lumpability to combat the state space explosion problem;
Section 7 concludes the paper.
2 Performance Modelling with Markov Chains

2.1 Introduction
Performance and dependability evaluation aim at forecasting system behaviour
in a quantitative way by trying to answer questions related to the performance
and dependability of systems. Typical problems that are addressed are: how
many clients can this file server adequately support, how large should the buffers
in a router be to guarantee a packet loss of at most $10^{-6}$, or how long does it take
before 2 failures have occurred? Notice that we restrict ourselves to model-based
performance and dependability evaluation, as opposed to measurement-based
evaluation.
[Fig. 1. The model-based performance evaluation cycle: the real system is turned
into a model of the system (modelling), the performance requirements are
formalised into a measure specification (desired performance), and the
evaluation and solution of the model yield numerical results.]
The basic idea of model-based performance and dependability evaluation is to

construct an abstract (and most often approximate) model of the system under
consideration that is just detailed enough to evaluate the measures of interest
(such as time-to-failure, system throughput, or number of failed components)
with the required accuracy (mean values, variances or complete distributions).

The generated model is “solved” using either analytical, numerical or simula-
tion techniques. We focus on numerical techniques as they pair a good mod-
elling flexibility with still reasonable computational requirements. Due to the
ever increasing size and complexity of real systems, performance and depend-
ability models that are directly amenable to a numerical solution, i.e., typically
continuous-time Markov chains (CTMCs), are awkward to specify “by hand”
and are therefore generated automatically from high-level description/modelling
languages such as stochastic Petri nets, stochastic process algebras or queueing
networks [30]. The steps in the process from a system to a useful dependability
or performance evaluation are illustrated in the model-based performance and
dependability evaluation cycle in Fig. 1.
It remains to be stated at this point that even though good support exists
for the actual model description, the specification of the measures of interest is
mostly done in an informal or less abstract way.
2.2 Discrete and Continuous-Time Markov Chains
This section recalls the basic concepts of discrete- and continuous-time Markov
chains with finite state space. The presentation is focused on the concepts needed
for the understanding of the rest of this paper; for a more elaborate treatment
we refer to [21,43,47,48,59]. We slightly depart from the standard notations by
representing a Markov chain as an ordinary finite transition system where the
edges are equipped with probabilistic information, and where states are labelled
with atomic propositions, taken from a set AP. Atomic propositions identify
specific situations the system may be in, such as “acknowledgement pending”,
“buffer empty”, or “variable X is positive”.
Discrete-time Markov chains. A DTMC is a triple $\mathcal{M} = (S, \mathbf{P}, L)$ where $S$
is a finite set of states, $\mathbf{P} : S \times S \to [0, 1]$ is the transition probability matrix,
and $L : S \to 2^{AP}$ is the labelling function. Intuitively, $\mathbf{P}(s, s')$ specifies the
probability to move from state $s$ to $s'$ in a single step, and function $L$ assigns
to each state $s \in S$ the set $L(s)$ of atomic propositions $a \in AP$ that are valid in
$s$. One may view a DTMC as a finite state automaton equipped with transition
probabilities and in which time evolves in discrete steps.
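The triple (S, P, L) maps directly onto a small data structure. The following is only a minimal sketch: the class name `DTMC`, the two-state example and its labels are hypothetical, and the check that every row of P sums to one is an assumption made explicit here rather than something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DTMC:
    """A labelled DTMC M = (S, P, L) over a finite state space."""
    states: tuple   # S
    P: dict         # P[(s, s2)] = one-step probability; missing pairs mean 0
    labels: dict    # L[s] = set of atomic propositions valid in s

    def prob(self, s, s2):
        return self.P.get((s, s2), 0.0)

    def is_stochastic(self, tol=1e-12):
        # Assumption: each row of P sums to 1.
        return all(
            abs(sum(self.prob(s, s2) for s2 in self.states) - 1.0) <= tol
            for s in self.states
        )

    def step(self, dist):
        """Push a distribution over S one discrete time step forward."""
        return {
            s2: sum(dist.get(s, 0.0) * self.prob(s, s2) for s in self.states)
            for s2 in self.states
        }

    def sat(self, a):
        """States whose label set contains the atomic proposition a."""
        return {s for s in self.states if a in self.labels.get(s, set())}


# Hypothetical two-state chain, for illustration only.
M = DTMC(
    states=("idle", "busy"),
    P={("idle", "idle"): 0.5, ("idle", "busy"): 0.5, ("busy", "idle"): 1.0},
    labels={"idle": {"buffer_empty"}, "busy": {"ack_pending"}},
)
dist1 = M.step({"idle": 1.0})
dist2 = M.step(dist1)
```

Storing P sparsely as a dictionary keeps the sketch close to the view of a DTMC as a transition system whose edges carry probabilities.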
Continuous-time Markov chains. A CTMC is a tuple $\mathcal{M} = (S, \mathbf{R}, L)$ where
state space $S$ and labelling function $L$ are as for DTMCs, and $\mathbf{R} : S \times S \to \mathbb{R}_{\geq 0}$ is
the rate matrix. Intuitively, $\mathbf{R}(s, s')$ specifies that the probability of moving from
state $s$ to $s'$ within $t$ time-units (for positive $t$) is $1 - e^{-\mathbf{R}(s,s') \cdot t}$. Alternatively,
a CTMC can be viewed as a finite state automaton enhanced with transition
labels specifying (in a certain way) the time it takes to proceed along them. It
should be noted that this definition does not require $\mathbf{R}(s, s) = -\sum_{s' \neq s} \mathbf{R}(s, s')$,
as is usual for CTMCs. In the traditional interpretation, at the end of a stay
in state $s$, the system will move to a different state. In our setting, self-loops at
state $s$ are possible and are modelled by having $\mathbf{R}(s, s) > 0$. We thus allow the
system to occupy the same state before and after taking a transition.
Let $E(s) = \sum_{s' \in S} \mathbf{R}(s, s')$, the total rate at which any transition emanating
from state $s$ is taken.¹ More precisely, $E(s)$ specifies that the probability of
leaving $s$ within $t$ time-units (for positive $t$) is $1 - e^{-E(s) \cdot t}$. The probability of
eventually moving from state $s$ to $s'$, denoted $\mathbf{P}(s, s')$, is determined by the
probability that the delay of going from $s$ to $s'$ finishes before the delays of
other outgoing edges from $s$; formally, $\mathbf{P}(s, s') = \mathbf{R}(s, s')/E(s)$ (except if $s$ is an
absorbing state, i.e. if $E(s) = 0$; in this case we define $\mathbf{P}(s, s') = 0$). The matrix
$\mathbf{P}$ describes an embedded DTMC of the CTMC.
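The quotient form of the embedded probabilities can be checked with a short calculation. The sketch below assumes, as is usual for CTMCs but not spelled out above, that the delays of the outgoing transitions of $s$ are independent and exponentially distributed with rates $\mathbf{R}(s, s'')$, and that $E(s) > 0$:

```latex
\[
\Pr\{\text{the delay towards } s' \text{ finishes first}\}
  = \int_0^\infty \mathbf{R}(s,s')\, e^{-\mathbf{R}(s,s')\, t}
      \prod_{s'' \neq s'} e^{-\mathbf{R}(s,s'')\, t} \, dt
  = \int_0^\infty \mathbf{R}(s,s')\, e^{-E(s)\, t} \, dt
  = \frac{\mathbf{R}(s,s')}{E(s)} .
\]
```

The product collects the probabilities that each competing delay is still running at time $t$; multiplying it with the density of the delay towards $s'$ yields the single exponent $E(s)$.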
Example 1. As a running example we address a triple modular redundant system
(TMR) taken from [28], a fault-tolerant computer system consisting of three
processors and a single (majority) voter. We model this system as a CTMC
where state $s_{i,j}$ models that $i$ ($0 \leq i < 4$) processors and $j$ ($0 \leq j \leq 1$) voters are
operational. As atomic propositions we use $AP = \{\, up_i \mid 0 \leq i < 4 \,\} \cup \{\, down \,\}$.
The processors generate results and the voter decides upon the correct value by
taking a majority vote. The failure rate of a single processor is $\lambda$ and of the
voter $\nu$ failures per hour (fph). The expected repair time of a processor is $1/\mu$
and of the voter $1/\delta$ hours. It is assumed that one component can be repaired
at a time. The system is operational if at least two processors and the voter
are functioning correctly. If the voter fails, the entire system is assumed to have
failed, and after a repair (with rate $\delta$) the system is assumed to start “as good
as new”. The details of the CTMC modelling this system are (with a clock-wise
ordering of states for the matrix/vector-representation, starting with $s_{3,1}$):
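[Figure: state-transition diagram of the TMR model with states $s_{3,1}$ ($up_3$),
$s_{2,1}$ ($up_2$), $s_{1,1}$ ($up_1$), $s_{0,1}$ ($up_0$) and $s_{0,0}$ ($down$); the edges carry the
rates listed in $\mathbf{R}$.]
\[
\mathbf{R} =
\begin{pmatrix}
0      & 3\lambda & 0        & 0       & \nu \\
\mu    & 0        & 2\lambda & 0       & \nu \\
0      & \mu      & 0        & \lambda & \nu \\
0      & 0        & \mu      & 0       & \nu \\
\delta & 0        & 0        & 0       & 0
\end{pmatrix}
\quad\text{and}\quad
E =
\begin{pmatrix}
3\lambda + \nu \\
2\lambda + \mu + \nu \\
\lambda + \mu + \nu \\
\mu + \nu \\
\delta
\end{pmatrix}
\]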
States are represented by circles and there is an edge between state $s$ and state
$s'$ if and only if $\mathbf{R}(s, s') > 0$. The labelling is defined by $L(s_{i,1}) = \{ up_i \}$ for
$0 \leq i < 4$ and $L(s_{0,0}) = \{ down \}$, and is indicated near the states (set braces
are omitted for singletons). For the transition probabilities we have, for instance,
$\mathbf{P}(s_{2,1}, s_{3,1}) = \mu/(2\lambda+\mu+\nu)$ and $\mathbf{P}(s_{0,1}, s_{0,0}) = \nu/(\mu+\nu)$.
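The quantities of Example 1 can be reproduced mechanically from the rate matrix. The sketch below is illustrative only: the numeric values chosen for λ, ν, μ and δ are assumptions (the example leaves them symbolic), and the helper names are hypothetical.

```python
# Assumed numeric rates (per hour); the example itself keeps them symbolic.
lam, nu, mu, delta = 0.01, 0.001, 1.0, 0.2

# State order as in the text: s31, s21, s11, s01, s00.
STATES = ["s31", "s21", "s11", "s01", "s00"]

R = [
    [0.0,   3 * lam, 0.0,     0.0, nu],
    [mu,    0.0,     2 * lam, 0.0, nu],
    [0.0,   mu,      0.0,     lam, nu],
    [0.0,   0.0,     mu,      0.0, nu],
    [delta, 0.0,     0.0,     0.0, 0.0],
]


def exit_rates(R):
    """E(s): sum of the rates of all transitions leaving s."""
    return [sum(row) for row in R]


def embedded_dtmc(R):
    """P(s, s') = R(s, s') / E(s), and 0 for absorbing states."""
    E = exit_rates(R)
    return [
        [(r / E[i]) if E[i] > 0 else 0.0 for r in row]
        for i, row in enumerate(R)
    ]


def generator(R):
    """Infinitesimal generator Q = R - diag(E)."""
    E = exit_rates(R)
    n = len(R)
    return [
        [R[i][j] - (E[i] if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


E = exit_rates(R)
P = embedded_dtmc(R)
Q = generator(R)
```

With these definitions `P[1][0]` corresponds to P(s2,1, s3,1) and `P[3][4]` to P(s0,1, s0,0), so both closed forms given in the example can be compared against the computed entries.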
¹ Note that $\mathbf{R}$ and $E$ just form an alternative representation of the usual infinitesimal
generator matrix $\mathbf{Q}$; more precisely, $\mathbf{Q} = \mathbf{R} - \mathrm{diag}(E)$. Note that this alternative
representation does not affect the transient and steady-state behaviour of the CTMC,
and is used for technical convenience only.
State sequences. A path $\sigma$ through a CTMC is a (finite or infinite) sequence
of states where the time spent in any of the states is recorded. For instance,
$\sigma = s_0, t_0, s_1, t_1, s_2, t_2, \ldots$ is an infinite path with, for natural $i$, state $s_i \in S$ and
time $t_i \in \mathbb{R}_{>0}$ such that $\mathbf{R}(s_i, s_{i+1}) > 0$. We let $\sigma[i] = s_i$ denote the $(i+1)$-st
state along a path, $\delta(\sigma, i) = t_i$ the time spent in $s_i$, and $\sigma@t$ the state of $\sigma$ at
time $t$. (For finite paths these notions have to be slightly adapted so as to deal
with the end state of a path.) Let $Path(s)$ be the set of paths starting in $s$. A
Borel space (with probability measure Pr) can be defined over the set $Path(s)$
in a straightforward way; for details see [2].
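The path notation can be made concrete on a finite prefix of a path. This is a sketch under stated assumptions: the class `TimedPath` is hypothetical, only a finite prefix is stored, the condition R(s_i, s_{i+1}) > 0 is not checked, and σ@t is taken to be σ[i] for the smallest i with t ≤ t_0 + ... + t_i, which is one common convention and is not fixed by the text above.

```python
from bisect import bisect_left
from itertools import accumulate


class TimedPath:
    """Finite prefix s0, t0, s1, t1, ..., s_{n-1}, t_{n-1} of a CTMC path."""

    def __init__(self, states, sojourns):
        if len(states) != len(sojourns):
            raise ValueError("need exactly one sojourn time per state")
        if any(t <= 0 for t in sojourns):
            raise ValueError("sojourn times must be positive")
        self.states = list(states)
        self.sojourns = list(sojourns)
        # _leave[i] = t_0 + ... + t_i, the instant at which s_i is left.
        self._leave = list(accumulate(self.sojourns))

    def state(self, i):
        """sigma[i]: the (i+1)-st state along the path."""
        return self.states[i]

    def delay(self, i):
        """delta(sigma, i): the time spent in sigma[i]."""
        return self.sojourns[i]

    def at(self, t):
        """sigma@t under the assumed convention described in the lead-in."""
        i = bisect_left(self._leave, t)  # smallest i with t <= _leave[i]
        if i == len(self.states):
            raise ValueError("t lies beyond the recorded prefix")
        return self.states[i]


# Hypothetical prefix through the TMR model of Example 1.
sigma = TimedPath(["s31", "s21", "s31", "s00"], [2.0, 0.5, 3.0, 1.0])
```

For this prefix, `sigma.at(2.2)` falls into the sojourn in `s21`, while any query beyond time 6.5 raises an error because the prefix ends there.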
Steady-state and transient measures. For CTMCs, two major types of state
probabilities are normally considered: steady-state probabilities where the sys-
tem is considered “in the long run”, i.e., when an equilibrium has been reached,
and transient probabilities where the system is considered at a given time instant
t. Formally, the transient probability
\[ \pi(s, s', t) = \Pr\{\sigma \in Path(s) \mid \sigma@t = s'\}, \]
stands for the probability to be in state $s'$ at time $t$ given the initial state $s$. We
denote with $\pi(s, t)$ the vector of state probabilities (ranging over states $s'$) at
time $t$, when the starting state is $s$. The transient probabilities are then computed
from a system of linear differential equations:
\[ \pi'(s, t) = \pi(s, t) \cdot \mathbf{Q}, \]
which can be solved by standard numerical methods or by specialised methods
such as uniformisation [45,26,25]. With uniformisation, the transient probabili-
ties of a CTMC are computed via a uniformised DTMC which characterises the
CTMC at discrete state transition epochs. Steady-state probabilities are defined
as
\[ \pi(s, s') = \lim_{t \to \infty} \pi(s, s', t). \]
This limit always exists for finite CTMCs. In case the steady-state distribution
does not depend on the starting state $s$ we often simply write $\pi(s')$ instead of
$\pi(s, s')$. For $S' \subseteq S$, $\pi(s, S') = \sum_{s' \in S'} \pi(s, s')$ denotes the steady-state probabil-
ity for the set of states $S'$. In this case, steady-state probabilities are computed
from a system of linear equations:
\[ \pi(s) \cdot \mathbf{Q} = 0 \quad\text{with}\quad \sum_{s'} \pi(s, s') = 1, \]
which can be solved by direct methods (such as Gaussian elimination) or iterative
methods (such as SOR or Gauss-Seidel).
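Both kinds of state probabilities can be computed for the TMR model of Example 1 with a few lines of code. This is a sketch, not the implementation used by any of the cited tools: the numeric rates are assumptions, the steady-state solver is plain Gauss-Jordan elimination and presumes an irreducible chain, and the transient solver is a basic uniformisation loop that truncates the Poisson series once the neglected mass drops below `eps`.

```python
import math

# Assumed numeric rates (per hour) for the TMR model of Example 1.
lam, nu, mu, delta = 0.01, 0.001, 1.0, 0.2
R = [  # state order: s31, s21, s11, s01, s00
    [0.0,   3 * lam, 0.0,     0.0, nu],
    [mu,    0.0,     2 * lam, 0.0, nu],
    [0.0,   mu,      0.0,     lam, nu],
    [0.0,   0.0,     mu,      0.0, nu],
    [delta, 0.0,     0.0,     0.0, 0.0],
]
n = len(R)
E = [sum(row) for row in R]
Q = [[R[i][j] - (E[i] if i == j else 0.0) for j in range(n)] for i in range(n)]


def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 (assumes an irreducible chain)."""
    n = len(Q)
    # Augmented system for Q^T x = 0; the last equation is replaced by
    # the normalisation condition x_0 + ... + x_{n-1} = 1.
    A = [[Q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    A[n - 1] = [1.0] * n + [1.0]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def transient(Q, pi0, t, eps=1e-10):
    """Transient distribution at time t via uniformisation."""
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))  # uniformisation rate
    if q == 0.0 or t == 0.0:
        return list(pi0)
    # One-step matrix of the uniformised DTMC: U = I + Q / q.
    U = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    weight = math.exp(-q * t)  # Poisson(q*t) probability of 0 jumps
    vec = list(pi0)
    result = [weight * v for v in vec]
    acc, k = weight, 0
    k_max = int(q * t + 10.0 * math.sqrt(q * t) + 50)  # safety bound
    while 1.0 - acc > eps and k < k_max:
        k += 1
        vec = [sum(vec[i] * U[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        acc += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


pi_inf = steady_state(Q)
pi_10 = transient(Q, [1.0, 0.0, 0.0, 0.0, 0.0], 10.0)
```

Because every operational state moves to the down state with the same rate ν, the down-state probabilities have simple closed forms (ν/(ν+δ) in steady state), which gives a convenient sanity check on both routines. For large values of q·t the initial weight underflows, so a production implementation would compute the Poisson weights more carefully.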
Notice that the above two types of measures are truly state based; they con-
sider the probability for particular states. Although this is interesting as such,
one can imagine that for many performance and dependability questions, there
is an interest in the occurrence probability of certain state sequences. Stated
differently, we would also like to be able to express measures that address the
probability of particular paths through the CTMC. Except for the recent work
by Obal and Sanders [54], we are not aware of suitable mechanisms to express
such measures. In the sequel, we will specifically address this issue.
3 Formal Verification with Model Checking

Whereas performance and dependability evaluation focusses on answering ques-
tions concerning quantitative system issues, traditional formal verification tech-
niques try to answer questions related to the functional correctness of systems.
Thus, formal verification aims at forecasting system behaviour in a qualitative
way. Typical problems that are addressed by formal verification are: (i) safety,
e.g., does a given mutual exclusion algorithm guarantee mutual exclusion? (ii)
liveness, e.g., does a routing protocol eventually transfer packets to the cor-
rect destination? and (iii) fairness, e.g., will a repetitive attempt to carry out a
transaction be eventually granted?
Prominent formal verification techniques are theorem proving and model
checking, as well as (but to a less formal extent) testing [17,50,55,8]. Impor-
tant to note at this point is that for an ever-increasing class of systems, their
“formal correctness” cannot be separated anymore from their “quantitative cor-
rectness”, e.g., in real-time systems, multi-media communication protocols and
many embedded systems.
3.1 Model Checking

The model checking approach requires a model of the system under consider-
ation together with a desired property and systematically checks whether the
given model satisfies this property. The basic technique of model checking is a
systematic, usually exhaustive, state-space search to check whether the property
is satisfied in each state of the system model, thereby using effective methods to
combat the infamous state-space explosion problem.
Using model checking, the user specifies a model of the system (the “possible
behaviour”) and a specification of the requirements (the “desirable behaviour”)
and leaves the verification up to the model checker. If an error is found, the model
checker provides a counter-example showing under which circumstance the error
can be generated. The counter-example consists of an example scenario in which
the model behaves in an undesired way, thus providing evidence that the system
(or the model) is faulty and needs to be revised, cf. Fig. 2. This allows the user
to locate the error and to repair the system (or model specification). If no errors
are found, the user can refine the model description and continue the verification
process, e.g., by taking more design decisions into account, so that the model
becomes more concrete and realistic.
Typically, models of systems are finite-state automata, where transitions
model the evolution of the system while moving from one state to another.
These automata are usually generated from a high-level description language
such as Petri nets, Promela [41] or Statecharts [27]. At this point, notice the
similarities with the models used for performance and dependability evaluation.
Fig. 2. The model checking approach. (Diagram: the requirements are formalized into a property specification and the system is modelled as a system model; model checking reports either "satisfied" or "violated" together with a counterexample, which can be simulated to locate the error.)

Computation Tree Logic. Required system properties can be specified in
an extension of propositional logic called temporal logic. Temporal logics allow
the formulation of properties that refer to the dynamic behaviour of a system;
they make it possible to express, for instance, the temporal ordering of events. Note that the
term “temporal” is meant in a qualitative sense, not in a quantitative sense. An
important logic for which efficient model checking algorithms exist is CTL [16]
(Computation Tree Logic). This logic allows one to state properties over states
and over paths using the following syntax:
State-formulas
Φ ::= a | ¬Φ | Φ ∨ Φ | ∃ϕ | ∀ϕ
a : atomic proposition
∃ϕ : there Exists a path that fulfils ϕ
∀ϕ : All paths fulfil ϕ
Path-formulas
ϕ ::= X Φ | Φ U Φ
X Φ : the neXt state fulfils Φ
Φ U Ψ : Φ holds along the path, Until Ψ holds
◇Φ : true U Φ, i.e., eventually Φ
□Φ : ¬◇¬Φ, i.e., invariantly Φ
The meaning of atomic propositions, negation (¬) and disjunction (∨) is stan-
dard; note that using these operators, other boolean operators such as conjunc-
tion (∧), implication (⇒), and so forth, can be defined. The state-formula ∃ϕ
is valid in state s if there exists some path starting in s and satisfying ϕ. The
formula ∃◇deadlock, for example, expresses that for some system run eventually
a deadlock can be reached (potential deadlock). In contrast, ∀ϕ is valid if
all paths satisfy ϕ; ∀◇deadlock thus means that a deadlock is inevitable. A path
satisfies an until-formula Φ U Ψ if the path has an initial finite prefix (possibly
only containing state s) such that Φ holds at all states along the path until a
state for which Ψ holds is encountered along the path.
Example 2. Considering the TMR system example as a finite-state automaton,
some properties one can express with CTL are:
– up3 ⇒ ∃◇down:
if the system is fully operational, it may eventually go down.
– up3 ⇒ ∀X (up2 ∨ down):
if the system is fully operational, any next step involves the failure of a
component.
– ∃□¬down:
it is possible that the voter never fails.
– ∃((up3 ∨ up2 ) U down):
it is possible to have two or three processors continuously working until the
voter fails.
Model checking CTL. A model, i.e., a finite-state automaton where states are
labelled with atomic propositions, is said to satisfy a property if and only if all
its initial states satisfy this property. In order to check whether a model satisfies
a property Φ, the set Sat(Φ) of states that satisfy Φ is computed recursively,
after which it is checked whether the initial states belong to this set. For atomic
propositions this set is directly obtained from the above mentioned labelling
of the states; Sat(Φ ∧ Ψ ) is obtained by computing Sat(Φ) and Sat(Ψ ), and
then intersecting these sets; Sat(¬Φ) is obtained by taking the complement of
Sat(Φ) with respect to the entire state space. The algorithms for the temporal
operators are slightly more involved. For instance, for Sat(∃X Φ) we first compute
the set Sat(Φ) and then compute those states from which one can move to this
set by a single transition. Sat(∃(Φ U Ψ )) is computed in an iterative way: (i) as
a precomputation we determine Sat(Φ) and Sat(Ψ ); (ii) we start the iteration
with Sat(Ψ ) as these states will surely satisfy the property of interest; (iii) we
extend this set by the states in Sat(Φ) from which one can move to the already
computed set by a single transition; (iv) if no new states have been added in step
(iii), we have found the required set, otherwise we repeat (iii). As the number of
states is finite, this procedure is guaranteed to terminate. The worst-case time
complexity of this algorithm (after an appropriate treatment of the ∃□-operator
[16]) is linear in the size of the formula and the number of transitions in the
model.
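The computation of Sat(∃X Φ) and the iterative computation of Sat(∃(Φ U Ψ)) described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the linear-time algorithm of [16]: the transition relation is assumed to be given as a dictionary `succ` mapping each state to the set of its direct successors, and the function names and the small five-state automaton are hypothetical, loosely modelled on the TMR example.

```python
from typing import Dict, Hashable, Set

State = Hashable


def sat_exists_next(succ: Dict[State, Set[State]], sat_phi: Set[State]) -> Set[State]:
    """States from which a Phi-state can be reached by a single transition."""
    return {s for s, targets in succ.items() if targets & sat_phi}


def sat_exists_until(succ: Dict[State, Set[State]],
                     sat_phi: Set[State], sat_psi: Set[State]) -> Set[State]:
    """Iterative computation of Sat(E(Phi U Psi)), following steps (ii)-(iv)."""
    result = set(sat_psi)                      # (ii) start with the Psi-states
    while True:
        # (iii) Phi-states with a direct successor in the set computed so far
        new = {s for s in sat_phi - result if succ.get(s, set()) & result}
        if not new:                            # (iv) nothing added: fixpoint reached
            return result
        result |= new


# Hypothetical automaton (an assumption for illustration only).
succ = {
    "up3": {"up2", "down"},
    "up2": {"up3", "up1", "down"},
    "up1": {"up2", "up0", "down"},
    "up0": {"up1", "down"},
    "down": {"up3"},
}
# States satisfying E((up3 or up2) U down) in this automaton.
until_states = sat_exists_until(succ, {"up3", "up2"}, {"down"})
```

Each pass of the loop rescans all remaining Φ-states, so this naive version may need quadratic time; a worklist over predecessor sets would be one way to approach the linear bound mentioned in the text.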
Applications. Although the model checking algorithms are conceptually relatively
simple, their combination with clever techniques to combat the state-space
explosion problem (such as binary decision diagrams, bit-state hashing and
partial-order reduction) makes model checking a widely applicable and successful
verification technique. This is illustrated by the success of model checkers such
as SPIN, SMV, Uppaal and Murϕ, and their successful application to a large set
of industrial case studies ranging from hardware verification (VHDL, Intel P7
Processor) and software control systems (traffic collision avoidance and alert system
TCAS-II, storm surge barrier) to communication protocols (ISDN-User Part
and IEEE Futurebus+); see [18] for an overview.
4 Stochastic Model Checking CTMCs

As has become clear from the previous section, the existing approaches for formal
verification using model checking and performance and dependability evaluation
have a lot in common. Our aim is to integrate these two evaluation approaches
even more, thereby trying to combine the best of both worlds.
4.1 A Logic for Performance and Dependability

To specify and evaluate performance and dependability measures as logical for-
mulas over CTMCs, we describe in this section CSL [1,2], a stochastic variant of
CTL, and explain how model checking this logic can be performed, summarising
work reported in [2,3,46].
Syntax. CSL extends CTL with two probabilistic operators that refer to the
steady-state and transient behaviour of the system being studied. Whereas the
steady-state operator refers to the probability of residing in a particular set of
states (specified by a state-formula) in the long run, the transient operator allows
us to refer to the probability of the occurrence of particular paths in the CTMC.
In order to express the time-span of a certain path, the path-operators until U
and next X are extended with a parameter that specifies a time-interval. Let I
be an interval on the real line, p a probability and ⊴ a comparison operator, i.e.,
⊴ ∈ {≤, ≥}. The syntax of CSL now becomes:
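State-formulas
$$\Phi ::= a \mid \neg\Phi \mid \Phi \vee \Phi \mid \mathcal{S}_{\unlhd p}(\Phi) \mid \mathcal{P}_{\unlhd p}(\varphi)$$
$\mathcal{S}_{\unlhd p}(\Phi)$ : the probability that $\Phi$ holds in steady state is $\unlhd p$
$\mathcal{P}_{\unlhd p}(\varphi)$ : the probability that a path fulfils $\varphi$ is $\unlhd p$
Path-formulas
$$\varphi ::= X^{I}\,\Phi \mid \Phi\; U^{I}\; \Phi$$
$X^{I}\Phi$ : the next state is reached at time $t \in I$ and fulfils $\Phi$
$\Phi\, U^{I}\, \Psi$ : $\Phi$ holds along the path until $\Psi$ holds at time $t \in I$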
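The state-formula $\mathcal{S}_{\unlhd p}(\Phi)$ asserts that the steady-state probability for the set
of Φ-states meets the bound $\unlhd p$. The operator $\mathcal{P}_{\unlhd p}(\cdot)$ replaces the usual CTL
path quantifiers ∃ and ∀. In fact, for most cases ∃ϕ can be written as $\mathcal{P}_{>0}(\varphi)$
and ∀ϕ as $\mathcal{P}_{\geq 1}(\varphi)$. These rules are not generally applicable due to fairness
considerations [6]. $\mathcal{P}_{\unlhd p}(\varphi)$ asserts that the probability measure of the paths
satisfying ϕ meets the bound $\unlhd p$. Temporal operators like ◇, □ and their real-time
variants $\Diamond^{I}$ or $\Box^{I}$ can be derived, e.g., $\mathcal{P}_{\unlhd p}(\Diamond^{I}\Phi) = \mathcal{P}_{\unlhd p}(\mathit{true}\ U^{I}\ \Phi)$ and,
for instance, $\mathcal{P}_{\geq p}(\Box^{I}\Phi) = \mathcal{P}_{\leq 1-p}(\Diamond^{I}\neg\Phi)$. The untimed next- and until-operators are
obtained by $X\Phi = X^{I}\Phi$ and $\Phi_1\, U\, \Phi_2 = \Phi_1\, U^{I}\, \Phi_2$ for $I = [0, \infty)$.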
Semantics. State-formulas are interpreted over the states of a CTMC. Let
M = (S, R, L) be a CTMC with labels in AP. The meaning of CSL-formulas is defined by
means of a so-called satisfaction relation (denoted by |=) between a CTMC M,
one of its states s, and a formula Φ. For simplicity the CTMC identifier M
is often omitted as it is clear from the context. The pair (s, Φ) belongs to the
relation |=, usually denoted by s |= Φ, if and only if Φ is valid in s. For CSL
state-formulas we have:
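$$\begin{array}{lcl}
s \models a & \text{iff} & a \in L(s),\\
s \models \neg\Phi & \text{iff} & s \not\models \Phi,\\
s \models \Phi_1 \vee \Phi_2 & \text{iff} & s \models \Phi_1 \ \vee\ s \models \Phi_2,\\
s \models \mathcal{S}_{\unlhd p}(\Phi) & \text{iff} & \pi(s, \mathit{Sat}(\Phi)) \unlhd p,\\
s \models \mathcal{P}_{\unlhd p}(\varphi) & \text{iff} & \mathit{Prob}(s, \varphi) \unlhd p,
\end{array}$$
where Prob(s, ϕ) denotes the probability of all paths σ ∈ Path(s) satisfying ϕ
when the system starts in state s, i.e.,
$$\mathit{Prob}(s, \varphi) = \Pr\{\, \sigma \in \mathit{Path}(s) \mid \sigma \models \varphi \,\}.$$
The meaning of path-formulas is defined by a second satisfaction relation
(also denoted by |=) between paths and CSL path-formulas as follows. We have
that $\sigma \models X^{I}\Phi$ iff
$$\sigma[1] \text{ is defined and } \sigma[1] \models \Phi \,\wedge\, \delta(\sigma, 0) \in I,$$
and that $\sigma \models \Phi_1\, U^{I}\, \Phi_2$ iff
$$\exists t \in I.\ \bigl(\sigma@t \models \Phi_2 \,\wedge\, \forall u \in [0, t).\ \sigma@u \models \Phi_1\bigr).$$
Note that the formula $\Phi_1\, U^{\emptyset}\, \Phi_2$ cannot be satisfied.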
4.2 Expressing Measures in CSL

What types of performance and dependability measures can be expressed using
CSL? As a first observation, we remark that by means of the logic one does
not specify a measure but in fact a constraint (or: bound) on a performance or
dependability measure. Four types of measures can be identified: steady-state
measures, transient-state measures, path-based measures, and nested measures.
Assume that for each state s, we have a characteristic atomic proposition in(s)
valid in state s and invalid in any other state.
Steady-state measures. The formula $\mathcal{S}_{\unlhd p}(\mathit{in}(s))$ imposes a requirement on the
steady-state probability to be in state s. For instance, $\mathcal{S}_{\leq 10^{-5}}(\mathit{in}(s_{2,1}))$ is valid
in state $s_{0,0}$ (cf. the running example) if the steady-state probability of having
a system configuration in which a single processor has failed is at most 0.00001
(when starting in state $s_{0,0}$). This can be easily generalized towards selecting sets
of states by using more general state-formulas. The formula $\mathcal{S}_{\unlhd p}(\Phi)$ imposes a
constraint on the probability to be in some Φ-state in the long run. For instance,
the formula $\mathcal{S}_{\geq 0.99}(\mathit{up}_3 \vee \mathit{up}_2)$ states that in the long run, for at least 99% of
the time at least 2 processors are operational.
Transient measures. The combination of the probabilistic operator with the
temporal operator $\Diamond^{[t,t]}$ can be used to reason about transient probabilities since
$$\pi(s, s', t) = \mathit{Prob}(s, \Diamond^{[t,t]}\, \mathit{in}(s')).$$
More specifically, $\mathcal{P}_{\unlhd p}(\Diamond^{[t,t]}\, \mathit{in}(s'))$ is valid in state s if the transient probability
at time t to be in state s' satisfies the bound $\unlhd p$. For instance, $\mathcal{P}_{\leq 0.2}(\Diamond^{[t,t]}\, \mathit{in}(s_{2,1}))$
is valid in state $s_{0,0}$ if the transient probability of state $s_{2,1}$ at time t is at most
0.2 when starting in state $s_{0,0}$. In a similar way as done for steady-state measures,
the formula $\mathcal{P}_{\geq 0.99}(\Diamond^{[t,t]}\,(\mathit{up}_3 \vee \mathit{up}_2))$ requires that the probability to have 3 or
2 processors running at time t is at least 0.99. For specification convenience, a
transient-state operator
$$\mathcal{T}^{@t}_{\unlhd p}(\Phi) = \mathcal{P}_{\unlhd p}(\Diamond^{[t,t]}\,\Phi)$$
could be defined. It states that the probability for a Φ-state at time t meets the
bound $\unlhd p$.
Path-based measures. The standard transient measures on (sets of) states are
expressed using a specific instance of the P-operator. However, by the fact that
this operator allows an arbitrary path-formula as argument, much more general
measures can be described. An example is the probability of reaching a certain
set of states provided that all paths to these states obey certain properties. For
instance,
$$\mathcal{P}_{\leq 0.01}\bigl((\mathit{up}_3 \vee \mathit{up}_2)\ U^{[0,10]}\ \mathit{down}\bigr)$$
is valid for those states where the probability of the system going down within
10 time-units after having continuously operated with at least 2 processors is at
most 0.01.
Nested measures. By nesting the P- and S-operators more complex measures

of interest can be specified. These are useful to obtain a more detailed insight
into the system’s behaviour. We provide two examples. The property
$$\mathcal{S}_{\geq 0.9}\bigl(\mathcal{P}_{\geq 0.8}(\Box^{[0,10]}\, \neg\mathit{down})\bigr)$$
is valid in those states that guarantee that in equilibrium with probability at

least 0.9 the probability that the system will not go down within 10 time units
is at least 0.8. Conversely,
$$\mathcal{P}_{\geq 0.5}\bigl((\neg\mathit{down})\ U^{[10,20]}\ \mathcal{S}_{\geq 0.8}(\mathit{up}_3 \vee \mathit{up}_2)\bigr)$$
is valid for those states that with probability at least 0.5 will reach a state s
between 10 and 20 time-units, which guarantees the system to be operational
with at least 2 processors when the system is in equilibrium. Besides, prior to
reaching state s the system must be operational continuously.
To put it in a nutshell, we believe that there are several main benefits of using
CSL for specifying constraints on measures-of-interest. First, the specification is
completely formal such that the interpretation is unambiguous. Whereas this is

also the case for standard transient and steady-state measures, this often does
not apply to measures that are derived from these elementary measures. Such
measures are typically described in an informal manner. A rigorous specification
of such more intricate measures is of utmost importance for their automated
analysis (as proposed in the sequel). Furthermore, an important aspect of CSL
is the possibility to state performance and dependability requirements over a
selective set of paths through a model, which was not possible previously. Finally,
the possibility to nest steady-state and transient measures provides a means to
specify complex, though important measures in a compact and flexible way.
4.3 Model Checking CSL-Specified Measures

Once we have formally specified the (constraint on the) measure-of-interest in
CSL by a formula Φ, and have obtained our model, i.e., CTMC M, of the system
under consideration, the next step is to adapt the model checking algorithm for
CTL to support the automated validation of Φ over a given state s in M. The
basic procedure is as for model checking CTL: in order to check whether state
s satisfies the formula Φ, we recursively compute the set Sat(Φ) of states that
satisfy Φ, and check whether s is a member of that set. For the non-probabilistic
state-operators this procedure is the same as for CTL. The main problem we
have to face is how to compute Sat(Φ) for the S and P-operators. We deal with
these operators separately.
Steady-state measures. For an ergodic (strongly connected) CTMC:
$$s \in \mathit{Sat}(\mathcal{S}_{\unlhd p}(\Phi)) \quad\text{iff}\quad \sum_{s' \in \mathit{Sat}(\Phi)} \pi(s, s') \unlhd p.$$
Thus, to check whether state s satisfies $\mathcal{S}_{\unlhd p}(\Phi)$, a standard steady-state analysis
has to be carried out, i.e., a system of linear equations has to be solved.
In case the CTMC M is not strongly-connected, the approach is to determine
the so-called bottom strongly-connected components (BSCCs) of M, i.e., the
set of strongly-connected components that cannot be left once they are reached.
Then, for each BSCC (which is an ergodic CTMC) the steady-state probability
of a Φ-state (determined in the standard way) and the probability to reach any
BSCC B from state s are determined. To check whether state s satisfies $\mathcal{S}_{\unlhd p}(\Phi)$
it then suffices to verify
$$\sum_{B} \Bigl( \mathit{Prob}(s, \Diamond B) \cdot \sum_{s' \in B \cap \mathit{Sat}(\Phi)} \pi^{B}(s') \Bigr) \unlhd p,$$
where $\pi^{B}(s')$ denotes the steady-state probability of s' in BSCC B, and
$\mathit{Prob}(s, \Diamond B)$ is the probability to reach BSCC B from state s. To compute
these probabilities, standard methods for steady-state and graph analysis can
be used.
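The graph-analysis step, i.e., finding the bottom strongly-connected components, can be illustrated as follows. This is a simple sketch under stated assumptions rather than an efficient implementation: the rate matrix is assumed to be given as a nested dictionary `rates[s][s2]`, the function names are hypothetical, and reachability is recomputed per state instead of using a linear-time SCC algorithm. A state lies in a BSCC exactly if every state reachable from it can reach it back, in which case its reachable set is the BSCC.

```python
from typing import Dict, FrozenSet, Hashable, List, Set

State = Hashable
Rates = Dict[State, Dict[State, float]]


def reachable(rates: Rates, start: State) -> Set[State]:
    """All states reachable from `start` via transitions with positive rate."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v, rate in rates.get(u, {}).items():
            if rate > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def bottom_sccs(rates: Rates) -> List[FrozenSet[State]]:
    """BSCCs: strongly connected sets of states that cannot be left."""
    states = set(rates) | {v for out in rates.values() for v in out}
    reach = {s: reachable(rates, s) for s in states}
    bsccs: List[FrozenSet[State]] = []
    for s in states:
        # s is in a BSCC iff every state reachable from s can return to s.
        if all(s in reach[t] for t in reach[s]):
            component = frozenset(reach[s])
            if component not in bsccs:
                bsccs.append(component)
    return bsccs


# Hypothetical CTMC: state 0 is transient, state 1 is absorbing, 2 and 3 form a cycle.
rates = {0: {1: 2.0, 2: 1.0}, 1: {}, 2: {3: 1.0}, 3: {2: 4.0}}
components = bottom_sccs(rates)
```

For this example the BSCCs are {1} and {2, 3}; the remaining ingredients of the check (the reachability probabilities Prob(s, ◇B) and the steady-state distribution within each BSCC) each require solving a linear equation system, which is not shown here.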
Path-based measures. In order to understand how the model checking of the

path-based operators is carried out it turns out to be helpful to give (recursive)
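characterisations of $\mathit{Prob}(s, \varphi)$, since
$$s \in \mathit{Sat}(\mathcal{P}_{\unlhd p}(\varphi)) \quad\text{iff}\quad \mathit{Prob}(s, \varphi) \unlhd p.$$
– Timed Next: For the timed next-operator we obtain that $\mathit{Prob}(s, X^{I}\Phi)$
equals
$$\bigl(e^{-E(s)\cdot \inf I} - e^{-E(s)\cdot \sup I}\bigr) \cdot \sum_{s' \in \mathit{Sat}(\Phi)} \mathbf{P}(s, s'), \qquad (1)$$
i.e., the probability to leave state s in the interval I times the probability
to reach a Φ-state in one step. Thus, in order to compute the set $\mathit{Sat}(\mathcal{P}_{\unlhd p}(X^{I}\Phi))$
we first recursively compute Sat(Φ) and add state s to the set if the value of
(1) meets the bound $\unlhd p$; this check boils down to a matrix-vector multiplication.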
– Time-Bounded Until: For the sake of simplicity, we only treat the case
I = [0, t]; the general case is a bit more involved, but can be treated in a
similar way [3]. The probability $\mathit{Prob}(s, \Phi\, U^{[0,t]}\, \Psi)$ is the least solution of the
following set of equations: (i) 1, if $s \in \mathit{Sat}(\Psi)$, (ii) 0, if $s \notin \mathit{Sat}(\Phi) \cup \mathit{Sat}(\Psi)$,
and
$$\int_{0}^{t} \sum_{s' \in S} \mathbf{R}(s, s') \cdot e^{-E(s)\cdot x} \cdot \mathit{Prob}(s', \Phi\, U^{[0,t-x]}\, \Psi)\; dx \qquad (2)$$
otherwise. The first two cases are self-explanatory; the last equation is explained
as follows. If s satisfies Φ but not Ψ, the probability of reaching a
Ψ-state from s within t time-units equals the probability of reaching some
direct successor state s' of s within x time-units ($x \leq t$), multiplied by the
probability to reach a Ψ-state from s' in the remaining time-span t−x.
It is easy to check that for the untimed until-operator (i.e., I = [0, ∞))
equation (2) reduces to
$$\sum_{s' \in S} \mathbf{P}(s, s') \cdot \mathit{Prob}(s', \Phi\, U\, \Psi).$$
Thus, for the standard until-operator, we can check whether a state satisfies
$\mathcal{P}_{\unlhd p}(\Phi\, U\, \Psi)$ by first computing recursively the sets Sat(Φ) and Sat(Ψ)
followed by solving a linear system of equations.
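As a small worked illustration of the timed-next case, expression (1) can be evaluated directly from the rate matrix. The sketch below is hedged: it assumes that the rates are stored in a nested dictionary `R[s][s2]`, that E(s) is the sum of the outgoing rates of s, and that the one-step probabilities are P(s, s') = R(s, s')/E(s); the function name and the three-state example are hypothetical.

```python
import math
from typing import Dict, Hashable, Set

State = Hashable
Rates = Dict[State, Dict[State, float]]


def prob_timed_next(R: Rates, s: State, sat_phi: Set[State],
                    lower: float, upper: float) -> float:
    """Evaluate expression (1) for the interval I = [lower, upper]."""
    exit_rate = sum(R[s].values())             # E(s), total rate of leaving s
    if exit_rate == 0.0:
        return 0.0                             # absorbing state: no next state
    # Probability that the sojourn time in s ends within I.
    tail = 0.0 if math.isinf(upper) else math.exp(-exit_rate * upper)
    leave_in_interval = math.exp(-exit_rate * lower) - tail
    # Probability that the jump leads to a Phi-state.
    jump_to_phi = sum(rate for target, rate in R[s].items()
                      if target in sat_phi) / exit_rate
    return leave_in_interval * jump_to_phi


# Hypothetical example: from "a", rate 1.0 to "b" and rate 3.0 to "c".
R = {"a": {"b": 1.0, "c": 3.0}, "b": {}, "c": {}}
untimed = prob_timed_next(R, "a", {"c"}, 0.0, math.inf)   # 0.75
timed = prob_timed_next(R, "a", {"c"}, 0.0, 0.5)
```

With I = [0, ∞) the first factor is 1 and the value is the plain one-step probability 3/4; with I = [0, 0.5] it is scaled by 1 − e^{−2}, the probability of leaving state "a" within half a time unit.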
Solution for time-bounded until. We now concentrate on numerical tech-

niques for solving the so-called Volterra integral equation system (2) arising in
the time-bounded until case.
As a first approach, numerical integration techniques can be applied. Exper-
iments with integration techniques based on equally-sized abscissas have shown
that the computation time for solving (2) is rapidly increasing when the state
space becomes larger (above 10,000 states), or when the required accuracy becomes
higher, e.g., between $10^{-6}$ and $10^{-9}$. Numerical stability is another issue
of concern when using this method [37].
An alternative method is to reduce the problem of computing
$\mathit{Prob}(s, \Phi\, U^{[0,t]}\, \Psi)$ to a transient analysis problem for which well-known and
efficient computation techniques do exist. This idea is based on the earlier observation
that a specific instance of the time-bounded until-operator
characterises a standard transient probability measure:
$$\mathcal{T}^{@t}_{\unlhd p}(\Phi) = \mathcal{P}_{\unlhd p}(\mathit{true}\ U^{[t,t]}\ \Phi).$$
Thus, for computing $\mathit{Prob}(s, \mathit{true}\ U^{[t,t]}\ \Phi)$ standard transient analysis techniques
can be exploited. This raises the question whether we might be able to reduce
the general case, i.e., $\mathit{Prob}(s, \Phi\, U^{[0,t]}\, \Psi)$, to an instance of transient analysis as
well. This is indeed possible: the idea is to transform the CTMC M under
consideration into another CTMC M′ such that checking $\varphi = \Phi\, U^{[0,t]}\, \Psi$ on
M amounts to checking $\varphi' = \mathit{true}\ U^{[t,t]}\ \Psi$ on M′; a transient analysis of M′
(for time t) then suffices. The question then is, how do we transform M into
M′? Two simple observations form the basis for this transformation. First, we
observe that once a Ψ-state in M has been reached (along a Φ-path) before
time t, we may conclude that ϕ holds, regardless of which states will be visited
after having reached Ψ. Thus, as a first transformation we make all Ψ-states
absorbing. Secondly, we observe that ϕ is violated once a state has been reached
that satisfies neither Φ nor Ψ. Again, this is regardless of the states that are
visited after having reached such a ¬(Φ ∨ Ψ)-state. Thus, as a second transformation, all the
¬(Φ ∨ Ψ)-states are made absorbing. It then suffices to carry out a transient
analysis on the resulting CTMC M′ for time t and collect the probability mass
to be in a Ψ-state (note that M′ typically is smaller than M):
$$\mathit{Prob}^{\mathcal{M}}(s, \Phi\, U^{[0,t]}\, \Psi) = \mathit{Prob}^{\mathcal{M}'}(s, \mathit{true}\ U^{[t,t]}\ \Psi).$$
In fact, by similar observations it turns out that also verifying the general
$U^{I}$-operator can be reduced to instances of (nested) transient analysis [3]. As
mentioned above, the transient probability distribution can be computed via a
uniformised DTMC which characterises the CTMC at discrete state transition
epochs. A direct application of uniformisation to compute $\mathit{Prob}^{\mathcal{M}}(s, \Phi\, U^{[0,t]}\, \Psi)$
requires performing this procedure for each state s. An improvement suggested
in [46] cumulates the entire vector $\mathit{Prob}^{\mathcal{M}}(\Phi\, U^{I}\, \Psi)$ for all states simultaneously.
For a single operator $U^{I}$ this yields a time complexity of $O(|\mathbf{R}| \cdot N_\varepsilon)$, where
$|\mathbf{R}|$ is the number of non-zero entries in R, and $N_\varepsilon$ is the number of iterations
within the uniformisation algorithm needed to achieve a given accuracy ε. The
value $N_\varepsilon$ can be computed a priori; it depends linearly on the maximal diagonal
entry $E_{\max}$ of the generator matrix, and on the maximal time bound $t_{\max}$
occurring in Φ.
In total, the time complexity to decide the validity of a CSL formula Φ on a
CTMC (S, R, L) is $O(|\Phi| \cdot (|\mathbf{R}| \cdot E_{\max} \cdot t_{\max} + |S|^{2.81}))$, and the space complexity
is $O(|\mathbf{R}|)$ [5].
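The reduction of time-bounded until to transient analysis can be sketched as follows. The code first makes all Ψ-states and all states satisfying neither Φ nor Ψ absorbing, and then applies uniformisation, propagating one vector over all states at once. It is a minimal sketch under stated assumptions, not the implementation of [46]: the rate matrix is a nested dictionary in which every state occurs as a key, the Poisson weights are computed naively (so the product of uniformisation rate and time bound must stay moderate to avoid underflow), and the names `bounded_until`, `eps` and `max_terms` are hypothetical.

```python
import math
from typing import Dict, Hashable, Set

State = Hashable
Rates = Dict[State, Dict[State, float]]


def bounded_until(R: Rates, sat_phi: Set[State], sat_psi: Set[State], t: float,
                  eps: float = 1e-9, max_terms: int = 100_000) -> Dict[State, float]:
    """Approximate Prob(s, Phi U^[0,t] Psi) for every state s."""
    states = list(R)
    # Transformation: Psi-states and states satisfying neither Phi nor Psi
    # lose their outgoing transitions, i.e., they become absorbing.
    absorbing = {s for s in states if s in sat_psi or s not in sat_phi}
    Rp = {s: ({} if s in absorbing else dict(R[s])) for s in states}
    E = {s: sum(Rp[s].values()) for s in states}
    q = max(E.values(), default=0.0)               # uniformisation rate
    v = {s: 1.0 if s in sat_psi else 0.0 for s in states}
    if q == 0.0 or t == 0.0:
        return v
    result = {s: 0.0 for s in states}
    weight, mass = math.exp(-q * t), 0.0           # Poisson probability of 0 jumps
    for k in range(1, max_terms + 1):
        for s in states:
            result[s] += weight * v[s]
        mass += weight
        if mass >= 1.0 - eps:                      # remaining Poisson tail is below eps
            break
        weight *= q * t / k                        # Poisson probability of k jumps
        # One step of the uniformised DTMC, applied to the whole vector.
        v = {s: (1.0 - E[s] / q) * v[s]
                + sum(rate / q * v[u] for u, rate in Rp[s].items())
             for s in states}
    return result


# Hypothetical CTMC: state 0 satisfies Phi, state 1 satisfies Psi, state 2 neither.
R = {0: {1: 2.0, 2: 1.0}, 1: {0: 5.0}, 2: {1: 7.0}}
probs = bounded_until(R, sat_phi={0}, sat_psi={1}, t=0.5)
```

In this example state 0 is left at rate 3 and moves to the Ψ-state with probability 2/3, so the exact value for state 0 is (2/3)(1 − e^{−1.5}); the sketch reproduces this up to the truncation error `eps`.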
5 Stochastic Model Checking Markov Reward Models
5.1 Introduction
With the advent of fault-tolerant gracefully-degradable computer systems, the

separation between performance and dependability aspects of a system does not
make sense anymore. Indeed, fault-tolerant systems can operate “correctly” at
various levels of performance, and the dependability of a system might be expressed
in terms of providing a minimum performance level, rather than in terms
of a certain amount of operational hardware resources. These considerations led,
in the late 1970s and the early 1980s, to the concept of performability [51,52],
in which it is investigated how well a system performs over a finite time horizon,
provided (partial) system failures and repair actions are taken into account. As
it turned out later, the notion of performability also fits quite naturally to the
notion of quality of service as specified in ITU-T Recommendation G.106 [12].
Furthermore, as a natural model for performability evaluations, so-called Markov
reward models have been adopted, as will be explained below; for further details
on performability evaluation, see [29].
Markov reward models. An MRM is a CTMC augmented with a reward

structure assigning a real-valued reward to each state in the model. Such reward
can be interpreted as bonus, gain, or conversely, as cost. Typical measures of
interest express the amount of gain accumulated by the system, over a finite
or infinite time-horizon. Formally, an MRM is a tuple M = (S, R, L, ρ) where
(S, R, L) is a CTMC, and $\rho : S \to \mathbb{R}_{\geq 0}$ is a reward structure that assigns to
each state s a reward ρ(s), also called gain or bonus, or dually, cost.
Example 3. For the TMR example, the reward structure can be instantiated in
different ways so as to specify a variety of performability measures. The simplest
reward structure (leading to an availability model) divides the states into opera-
tional and non-operational ones: ρ1 (s0,0 ) = 0 and ρ1 (si,0 ) = 1 for the remaining
states. A reward structure in which varying levels of trustworthiness are repre-
sented is for instance based on the number of operational processors: ρ2 (s0,0 ) =
0 and ρ2 (si,1 ) = i. As a third reward structure, one may consider the mainte-
nance costs of the system, by setting: ρ3 (s0,0 ) = c2 and ρ3 (si,1 ) = c1 · (3 − i),
where c1 is the cost to replace a processor, and c2 the cost to renew the entire
system. As a fourth option (which we do not further consider here) one can also
imagine a reward structure quantifying the power consumption in each state.
Accumulating reward along a path. The presence of a reward structure

allows one to reason about (at least) two different aspects of system cost/reward.
One either may refer to the instantaneous reward at a certain point in time
(even in steady-state), or one may refer to the reward accumulated in a certain
interval of time. For an MRM (S, R, L, ρ), and $\sigma = s_0, t_0, s_1, t_1, s_2, t_2, \ldots$ an
infinite path (through the corresponding CTMC (S, R, L)) the instantaneous
reward at time t is given by ρ(σ@t). The cumulated reward y(σ, t) along σ up
to time t can be formalised as follows. For $t = \sum_{j=0}^{k-1} t_j + t'$ with $t' \leq t_k$ we
define $y(\sigma, t) = \sum_{j=0}^{k-1} t_j \cdot \rho(s_j) + t' \cdot \rho(s_k)$. For finite paths ending at time point
t the cumulated reward definition is slightly adapted, basically replacing t' by
$t - \sum_{j=0}^{l-1} t_j$.
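The definition of the cumulated reward y(σ, t) can be made concrete with a short sketch. It assumes that a path prefix is given as a list of (state, sojourn time) pairs and that the reward structure is a dictionary from states to reward rates; the function name, the state names and the numbers are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable


def cumulated_reward(path: List[Tuple[State, float]],
                     rho: Dict[State, float], t: float) -> float:
    """Reward y(sigma, t) accumulated along the path prefix up to time t."""
    total, elapsed = 0.0, 0.0
    for state, sojourn in path:
        if t <= elapsed + sojourn:
            # t falls into the sojourn in this state: add the partial stay t'.
            return total + (t - elapsed) * rho[state]
        total += sojourn * rho[state]          # full stay t_j earns t_j * rho(s_j)
        elapsed += sojourn
    raise ValueError("t lies beyond the given path prefix")


# Hypothetical path: 2.0 time units in "s3", then 1.5 in "s2", then 4.0 in "s1".
path = [("s3", 2.0), ("s2", 1.5), ("s1", 4.0)]
rho = {"s3": 3.0, "s2": 2.0, "s1": 1.0}
reward_at_3 = cumulated_reward(path, rho, 3.0)   # 2.0 * 3.0 + 1.0 * 2.0 = 8.0
```

At a jump instant the two applicable cases give the same value, so the function is continuous in t, as one would expect from an accumulated quantity.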
Measure specification. The specification of the measure-of-interest for a given

MRM can not always be done conveniently, nor can all possible measures-of-
interest be expressed conveniently. In particular, until recently it has not been
possible to directly express measures where state sequences or paths matter, nor
to accumulate rewards only in certain subsets of states, if the rewards outside
these subsets are non-zero. Such measures are then either “specified” informally,
with all its negative implications, or require a manual tailoring of the model so
as to address the right subsets of states. Below we will address a rigorous but
flexible way of expressing performability measures.
Finally, note that Obal and Sanders recently proposed a technique to specify
so-called path-based reward variables [54] by which the specification of measures
over state sequences becomes more convenient, because it avoids the manual
tailoring of the model. In the context of the stochastic process algebra PEPA,
Clark et al. recently proposed the use of a probabilistic modal logic to ease the
specification of reward structures of MRMs [15], as opposed to the specification
of reward-based measures, as we do.
5.2 A Logic for Performability

The addition of rewards on the model level raises the question how they can be
reflected on the measure specification level, i.e., on the level of the logic. We re-
strict ourselves for the moment to consider the accumulation of reward, because
this turns out to be a conceptually interesting extension that fits very well to
the temporal logic approach. We shall later (in Section 5.6) return to the question
of how to support other forms of reward quantification, such as instantaneous
reward.
Since rewards are accumulated along a path, it appears wise to extend the
path formulas of CSL to account for the earning of reward, and this is what dis-
tinguishes CSRL from CSL. The state formulas of CSRL are unchanged relative
to CSL (until Section 5.6), whereas path formulas ϕ now become
$$\varphi ::= X^{I}_{J}\,\Phi \;\mid\; \Phi\; U^{I}_{J}\; \Phi,$$
for intervals $I, J \subseteq \mathbb{R}_{\geq 0}$. In a similar way as before, we define $\Diamond^{I}_{J}\Phi = \mathit{true}\ U^{I}_{J}\ \Phi$
and $\mathcal{P}_{\unlhd p}(\Box^{I}_{J}\Phi) = \neg\mathcal{P}_{\unlhd p}(\Diamond^{I}_{J}\neg\Phi)$. Interval I can be considered as a timing constraint
whereas J represents a bound for the cumulative reward. The path-formula
$X^{I}_{J}\Phi$ asserts that a transition is made to a Φ-state at time point t ∈ I
such that the earned cumulative reward r until time t meets the bounds specified
by J, i.e., r ∈ J. The semantics of $\Phi_1\, U^{I}_{J}\, \Phi_2$ is as for $\Phi_1\, U^{I}\, \Phi_2$ with the
additional constraint that the earned cumulative reward r at the time of reaching
some Φ2-state lies in J, i.e., r ∈ J.
Example 4. As an example property for the TMR system, $\mathcal{P}_{\geq 0.95}(\Diamond^{[60,60]}_{[0,200]}\,\mathit{true})$
denotes that with probability of at least 0.95 the cumulative reward, e.g., the
incurred costs of the system for reward structure ρ3, at time instant 60 is at most
200. Given that the reward of a state indicates the power consumed per time-unit,
property $\mathcal{P}_{<0.08}(\mathit{up}_3\ U^{[0,30]}_{[7,\infty)}\ (\mathit{down} \vee \mathit{up}_2))$ expresses that with probability
less than 0.08 within 30 time units at least 7 units of power have been consumed
in full operational mode before some component fails. A simpler property, that
only refers to reward accumulation, $\mathcal{P}_{>0.5}(\Diamond^{[0,\infty)}_{[0,10]}\,\mathit{down})$ would say that it is likely
(probability > 0.5) to spend less than 10 units of energy before a voter failure.
The semantics of the CSRL path-formulas is an extension of the CSL semantics
we introduced in Section 4.1. It differs from the latter in that additional
constraints are imposed reflecting that the accumulated reward on the path must
be in the required interval. We have that $\sigma \models X^{I}_{J}\Phi$ iff
$$\sigma[1] \text{ is defined and } \sigma[1] \models \Phi \,\wedge\, \delta(\sigma, 0) \in I \,\wedge\, y(\sigma, \delta(\sigma, 0)) \in J$$
and that $\sigma \models \Phi_1\, U^{I}_{J}\, \Phi_2$ iff
$$\exists t \in I.\ \bigl(\sigma@t \models \Phi_2 \,\wedge\, (\forall u \in [0, t).\ \sigma@u \models \Phi_1) \,\wedge\, y(\sigma, t) \in J\bigr).$$
For the $X^{I}_{J}$ case, the definition refines the one for CSL by demanding that the
reward accumulated during the time δ(σ, 0) of staying in the first state of the
path lies in J, while for $U^{I}_{J}$ the reward accumulated until the time t when
touching a Φ2-state must be in J.
5.3 Expressing Measures in CSRL
MRMs are an extension of CTMCs, and so is CSRL an extension of CSL.
Since CSL does not allow any reference to rewards, it is obtained by putting
no constraint on the reward accumulated, i.e., by setting J = [0, ∞) for all
sub-formulas:
$$X^{I}\Phi = X^{I}_{[0,\infty)}\Phi \quad\text{and}\quad \Phi_1\, U^{I}\, \Phi_2 = \Phi_1\, U^{I}_{[0,\infty)}\, \Phi_2.$$
Similarly, we can identify a new logic CRL (continuous reward logic) in case I =
[0, ∞) for all sub-formulas. In CRL it is only possible to refer to the cumulation
of rewards, but not to the advance of time. The formula $\mathcal{P}_{>0.5}(\Diamond^{[0,\infty)}_{[0,10]}\,\mathit{down})$
is an example property of the CRL subset of CSRL. The CRL logic will play
a special role when describing the model checking of CSRL, and therefore we
will first discuss how model checking CRL can be performed, before turning our
attention to CSRL. Before doing so, we list in Table 1 a variety of standard
performance, dependability, and performability measures and how they can be
phrased in CSRL. Here F is a generic formula playing the role of an identifier
of the failed system states of the model under study (in the TMR example, F
would be down ∨ up0).

Table 1. Some typical performability measures
performability measure                       formula                                                        logic
steady-state availability                    $\mathcal{S}_{\unlhd p}(\neg F)$                               CSL
instantaneous availability at time t         $\mathcal{P}_{\unlhd p}(\Diamond^{[t,t]}\, \neg F)$            CSL
distribution of time to failure              $\mathcal{P}_{\unlhd p}(\neg F\ U^{[0,t]}\ F)$                 CSL
distribution of reward until failure         $\mathcal{P}_{\unlhd p}(\neg F\ U_{[0,r]}\ F)$                 CRL
distribution of cumulative reward until t    $\mathcal{P}_{\unlhd p}(\Diamond^{[t,t]}_{[0,r]}\, \mathit{true})$   CSRL

These measures correspond to basic formulas in the logic,
and it is worth highlighting that much more involved and nested measures are
easily expressible in CSRL, such as
$$\mathcal{S}_{>0.3}\Bigl(\mathcal{P}_{<0.3}(\Diamond^{[0,85]}_{[3,5]}\,\mathit{up}_2) \Rightarrow \mathcal{P}_{>0.1}((\neg\mathit{down})\ U^{[5,\infty)}\ \mathit{up}_3)\Bigr).$$
5.4 Model Checking CRL-Specified Measures
This section discusses how model checking can be performed for CRL properties,
i.e., formulas which do only refer to the cumulation of rewards, but not to the
advance of time. We will explain how a duality result can be used to reduce
model checking of such formulas to the CSL model checking algorithm described
above.
The basic strategy is the same as for CSL, and only the path operators $X_J$ and
$U_J$ need specific considerations. To calculate the probability of satisfying such
a path formula we rely on a general duality result for MRMs and CSRL [4].
Duality. Assume an MRM M = (S, R, L, ρ) with positive reward structure,

i.e., ρ(s) > 0 for each state s. The basic idea behind the duality phenomenon
is that the progress of time can be regarded as the earning of reward and vice
versa. This observation is inspired by [7]. To make it concrete, we define an MRM
$\mathcal{M}^{-1} = (S, \mathbf{R}', L, \rho')$ that results from M by
– rescaling the transition rates by the reward of their originating state, i.e.,
$\mathbf{R}'(s, s') = \mathbf{R}(s, s')/\rho(s)$ and,
– inverting the reward structure, i.e., $\rho'(s) = 1/\rho(s)$.
Intuitively, the transformation of M into $\mathcal{M}^{-1}$ stretches the residence time in
state s by the factor ρ(s) if ρ(s) > 1, and it compresses the residence time by
that factor if 0 < ρ(s) < 1.
The reward structure is changed similarly. Note that $\mathcal{M} = (\mathcal{M}^{-1})^{-1}$.
One might interpret the residence of t time units in $M^{-1}$ as the earning of t
reward in state s in M, or (reversely) an earning of a reward r in state s in M
corresponds to a residence of r in $M^{-1}$. As a consequence, the rôles of time and
reward in M are reversed in $M^{-1}$. In terms of the logic CSRL, this corresponds
to swapping reward and time intervals inside a CSRL formula, and allows one
to establish that
$$\mathit{Prob}^{M}(s, X^{I}_{J}\,\Phi) = \mathit{Prob}^{M^{-1}}(s, X^{J}_{I}\,\Phi), \text{ and}$$
$$\mathit{Prob}^{M}(s, \Phi_1\, U^{I}_{J}\, \Phi_2) = \mathit{Prob}^{M^{-1}}(s, \Phi_1\, U^{J}_{I}\, \Phi_2).$$
As a consequence, one can obtain the set $\mathit{Sat}^{M}(\Phi)$ (comprising the states in M
satisfying Φ) by computing instead $\mathit{Sat}^{M^{-1}}(\Phi^{-1})$, i.e.,
$$\mathit{Sat}^{M}(\Phi) = \mathit{Sat}^{M^{-1}}(\Phi^{-1}),$$
where $\Phi^{-1}$ is defined as Φ where for each sub-formula in Φ of the form $X^{I}_{J}$
or $U^{I}_{J}$ the intervals I and J are swapped. For the TMR example, for
$\Phi = P_{0.9}(\neg F\ U^{[50,50]}_{[10,\infty)}\ F)$ we have
$\Phi^{-1} = P_{0.9}(\neg F\ U^{[10,\infty)}_{[50,50]}\ F)$. We refer to [4] for a
proof of this property, and for extensions of this result to some cases with zero
rewards. Note that we excluded zero rewards here, since otherwise the model
inversion would imply divisions by zero.
The duality result is the key to model checking CRL on MRMs (satisfying the
above restriction), since swapping the intervals of a formula turns $X_J$ into $X^J$,
and $U_J$ into $U^J$. Hence, any CRL formula corresponds to a CSL formula
interpreted on the dual MRM. As a consequence, model checking CRL can proceed
via the algorithm for CSL, with some overhead (linear in the model size plus the
formula length) needed to swap the model and swap the formula.
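The model-swapping step can be illustrated with a minimal sketch. It assumes that the rate matrix R is stored as a dictionary of off-diagonal entries and the reward structure ρ as a dictionary per state; the function name `dual_mrm` and the two-state numbers are hypothetical and serve only as an illustration.

```python
from typing import Dict, Tuple

State = str
Rates = Dict[Tuple[State, State], float]   # R(s, s') for s != s'
Rewards = Dict[State, float]               # rho(s)


def dual_mrm(rates: Rates, rewards: Rewards) -> Tuple[Rates, Rewards]:
    """Return (R', rho') of the dual model M^-1.

    R'(s, s') = R(s, s') / rho(s) and rho'(s) = 1 / rho(s).
    Non-positive rewards are rejected, since the duality result
    described in the text assumes rho(s) > 0 for every state.
    """
    for s, r in rewards.items():
        if r <= 0:
            raise ValueError(f"state {s!r} has non-positive reward {r}")
    dual_rates = {(s, t): rate / rewards[s] for (s, t), rate in rates.items()}
    dual_rewards = {s: 1.0 / r for s, r in rewards.items()}
    return dual_rates, dual_rewards


# Hypothetical two-state model (rates and rewards are made up).
rates = {("up", "down"): 0.5, ("down", "up"): 4.0}
rewards = {"up": 2.0, "down": 0.5}

dual_rates, dual_rewards = dual_mrm(rates, rewards)
# Applying the transformation twice gives back the original model.
back_rates, back_rewards = dual_mrm(dual_rates, dual_rewards)
```

The transformation touches every rate and every reward exactly once, which matches the linear overhead in the model size mentioned above.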
5.5 Model Checking CSRL-Specified Measures

For the general case of CSRL, model checking algorithms are more involved, and
research on their effectiveness is ongoing [32,33]. In this section we describe the
basic strategy and sketch three algorithms implementing this strategy. A more
detailed comparison of the algorithmic intricacies can be found in [33].
Given an MRM M = (S, R, L, ρ) the model checking algorithm proceeds as
in the earlier considered cases: in order to check whether state s satisfies the
formula Φ, we recursively compute the set Sat(Φ) of states that satisfy Φ, and
check whether s is a member of that set. Most of the cases have been discussed
before in this paper, except for the handling of path operators with both time
and reward intervals. For the sake of simplicity, we do not consider the next
operator X, and we (again) restrict to formulas where all occurring intervals are
of the form [0, x], i.e., they impose upper bounds on the time or the cumulated
reward, but no lower bound.
So, the question is how to compute $\mathit{Prob}(s, \Phi\ U^{[0,t]}_{[0,r]}\ \Psi)$. Recall that in the
CSL case, the crucial step has been to reduce the computation to instances of
transient analysis. Indeed, it is possible to proceed in a similar way. In analogy
to the CSL strategy, we can show that the above probability agrees with the
probability $\mathit{Prob}(s, \mathit{true}\ U^{[t,t]}_{[0,r]}\ \Psi)$ on a transformed MRM where all Ψ-states and
all ¬(Φ ∨ Ψ)-states are made absorbing, and have reward 0 assigned to them.
The intuitive justification is as in the CSL setting. The rewards are set to 0 since
once a path reaches a Ψ-state at time t' < t, while not having accumulated more
than r reward, it suffices to be trapped in that state until time t provided no
reward will be earned anymore, i.e., ρ(s) = 0 for Ψ-state s. Note that we can
amalgamate all states satisfying Ψ and all states satisfying ¬(Φ ∨ Ψ), thereby
making the MRM considerably smaller.
[Figure 3: the states of the CTMC ("CTMC dimension"), grouped into Sat(Φ ∧ ¬Ψ),
Sat(Ψ) and Sat(¬Φ ∧ ¬Ψ), drawn as columns along the "accumulated reward
dimension"; the column of state s grows at rate ρ(s) up to an absorbing barrier
at height r.]
Fig. 3. Two-dimensional stochastic process $((X_t, Y_t), t \geq 0)$ for model checking
$\mathit{Prob}(s, \Phi\ U^{[0,t]}_{[0,r]}\ \Psi)$.
Thus, we can restrict our attention to the computation of
$\mathit{Prob}(s, \mathit{true}\ U^{[t,t]}_{[0,r]}\ \Psi)$. This probability, in turn, can be derived from the
transient accumulated reward distribution of the MRM. (Compare this to the
transient distribution used in the CSL case at this point.) To explain why this
is the case, we consider a two-dimensional stochastic process $((X_t, Y_t), t \geq 0)$ on
$S \times \mathbb{R}_{\geq 0}$, as illustrated in Figure 3. Informally speaking, this stochastic process
has a discrete component that describes the transition behaviour in the original
MRM, combined with a continuous component that describes the accumulated
reward gained over time. For t = 0 we have $Y_t = 0$, and for t > 0 the value of $Y_t$
increases continuously with rate $\rho(X_t)$. Hence, the discrete states of the original
CTMC become “columns” of which the height models the accumulated reward.
To take into account the reward bound (≤ r), we introduce an absorbing barrier
in the process whenever $Y_t$ reaches the level r. Actually, we are interested in
$\Pr\{Y_t \leq r, X_t \in S'\}$, i.e., the probability of being in a certain subset S' of states
at time t, having accumulated a reward of at most r. For our purposes, S'
shall be chosen to be the set Sat(Ψ) of states satisfying Ψ and we start the
process in the state s under consideration. We have that
$$\mathit{Prob}(s, \mathit{true}\ U^{[t,t]}_{[0,r]}\ \Psi) = \Pr\{Y_t \leq r, X_t \in \mathit{Sat}(\Psi)\}$$
for the transformed MRM described above. This allows us to decide the
satisfaction of time- and reward-bounded until formulas via numerical recipes for
calculating $\Pr\{Y_t \leq r, X_t \in S'\}$ on the two-dimensional stochastic process $(X_t, Y_t)$.
It is worth remarking that similar processes (with mixed discrete-continuous
state spaces) also emerge in the analysis of non-Markovian stochastic Petri nets
(when using the supplementary variable approach, cf. [22]), Markov-regenerative
stochastic Petri nets [9], and in fluid-stochastic Petri nets [42]. We briefly sketch
three other approaches to compute $\Pr\{Y_t \leq r, X_t \in S'\}$ here, which are more
directly applicable to the problem.
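To make the two-dimensional process concrete, the following sketch estimates the probability of being in a target set at time t with accumulated reward at most r by plain Monte Carlo simulation. Simulation is not one of the three numerical approaches discussed in this section; it is used here only to illustrate how the accumulated reward grows with rate ρ of the current state. The function names and the example model are hypothetical. Since rewards are non-negative, the accumulated reward never decreases, so checking the bound at time t is equivalent to checking that the absorbing barrier at level r was never crossed.

```python
import random


def simulate_once(rates, rewards, start, t, rng):
    """Simulate one path of the MRM up to time t and return (X_t, Y_t)."""
    state, clock, reward = start, 0.0, 0.0
    while True:
        out = [(dst, rate) for (src, dst), rate in rates.items()
               if src == state and rate > 0.0]
        exit_rate = sum(rate for _, rate in out)
        # An absorbing state is never left again.
        sojourn = rng.expovariate(exit_rate) if exit_rate > 0.0 else float("inf")
        if clock + sojourn >= t:
            # Y grows linearly with slope rho(state) until time t.
            return state, reward + rewards[state] * (t - clock)
        reward += rewards[state] * sojourn
        clock += sojourn
        # Pick the successor with probability rate / exit_rate.
        pick, acc = rng.random() * exit_rate, 0.0
        for dst, rate in out:
            acc += rate
            if pick < acc:
                break
        state = dst


def estimate(rates, rewards, start, targets, t, r, runs=20000, seed=1):
    """Monte Carlo estimate of Pr{Y_t <= r, X_t in targets}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        x, y = simulate_once(rates, rewards, start, t, rng)
        hits += (x in targets) and (y <= r)
    return hits / runs


# Hypothetical model: "up" earns reward at rate 1 and fails with rate 1;
# "down" is absorbing and earns nothing.
example = estimate({("up", "down"): 1.0}, {"up": 1.0, "down": 0.0},
                   start="up", targets={"down"}, t=1.0, r=10.0)
```

Such an estimate comes without an a priori error bound; it is only a convenient cross-check for the numerical procedures sketched next.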
An Erlangian approximation. A first approach to compute
$\Pr\{Y_t \leq r, X_t \in S'\}$ is to approximate the fixed reward bound r by a
reward bound that is Erlang-k distributed with mean r. One may view this as
some kind of discretisation of the continuous reward dimension into k steps.
The main advantage of this approach is that the resulting model is both
discrete-space and completely Markovian, and hence the techniques developed
for CSL properties (cf. Section 4.3) can be used to approximate the required
probabilities; reaching the reward bound in the original model corresponds to
reaching a particular set of states in the approximated model. As a disadvantage
we mention that an appropriate value for k (the number of phases in the
Erlangian approximation) is not known a priori. Furthermore, when CSRL
expressions are nested, it is as yet unclear how the error in the approximation
propagates. Moreover, the resulting Markov chain becomes substantially
larger, especially if k is large. On the other hand, the MRM can be described as
a tensor product of two smaller MRMs, which can be exploited in the solution
procedure (as far as the storage of the generator matrix is concerned).
With an Erlang-k distributed approximation of the reward bound together
with uniformisation, the space complexity of this method is $O(|S|^2 \cdot k^2)$, and the
time complexity is $O(N_\varepsilon \cdot |S|^2 \cdot k^2)$, where $N_\varepsilon$ equals the number of steps required
to reach a certain accuracy ε (which can be computed a priori). Note that $N_\varepsilon$
determines the accuracy of only the transient analysis; it does not account for
the (in-)accuracy of the approximation itself.
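A minimal sketch of this construction is given below. It assumes that each product state pairs an MRM state with the number of completed Erlang phases, that a phase completes with rate ρ(s)·k/r while the MRM resides in state s, and that a single absorbing state collects all paths that have exhausted the k phases. The transient probabilities are computed with a naive uniformisation loop that is adequate only for moderate values of the uniformisation rate times t. All names are hypothetical, and the code is not taken from the implementation reported in [33].

```python
import math


def erlang_bounded_probability(states, rates, rewards, start, targets,
                               t, r, k, eps=1e-9, max_steps=100000):
    """Approximate Pr{Y_t <= r, X_t in targets} with an Erlang-k reward bound."""
    # Product states (s, i): MRM state s with i completed phases, i < k.
    index = {pair: n for n, pair in
             enumerate((s, i) for s in states for i in range(k))}
    sink = len(index)                      # reward bound exceeded
    size = sink + 1
    trans = [[] for _ in range(size)]      # off-diagonal generator entries
    for (s, i), a in index.items():
        for (src, dst), rate in rates.items():
            if src == s and rate > 0.0:
                trans[a].append((index[(dst, i)], rate))
        phase_rate = rewards[s] * k / r    # reward is earned at rate rho(s)
        if phase_rate > 0.0:
            nxt = index[(s, i + 1)] if i + 1 < k else sink
            trans[a].append((nxt, phase_rate))
    exit_rates = [sum(rate for _, rate in row) for row in trans]
    q = max(exit_rates) or 1.0             # uniformisation rate

    # Uniformisation: pi(t) = sum_n Poisson(q*t; n) * pi(0) * P^n, P = I + Q/q.
    vec = [0.0] * size
    vec[index[(start, 0)]] = 1.0
    weight = math.exp(-q * t)              # naive Poisson weights
    covered, n = weight, 0
    result = [weight * v for v in vec]
    while covered < 1.0 - eps and n < max_steps:
        nxt_vec = [v * (1.0 - exit_rates[a] / q) for a, v in enumerate(vec)]
        for a, v in enumerate(vec):
            if v:
                for b, rate in trans[a]:
                    nxt_vec[b] += v * rate / q
        vec = nxt_vec
        n += 1
        weight *= q * t / n
        covered += weight
        for a in range(size):
            result[a] += weight * vec[a]
    return sum(result[index[(s, i)]] for s in targets for i in range(k))


# Hypothetical one-state model earning reward at rate 1, bound r = 1, t = 1.
one_phase = erlang_bounded_probability(["a"], {}, {"a": 1.0}, "a", {"a"},
                                       t=1.0, r=1.0, k=1)
two_phases = erlang_bounded_probability(["a"], {}, {"a": 1.0}, "a", {"a"},
                                        t=1.0, r=1.0, k=2)
```

In this degenerate example the exact answer for a fixed bound r = 1 at t = 1 is 1, whereas the approximation with one or two phases is well below that, which illustrates why the choice of k matters and why the truncation parameter of the transient analysis says nothing about the quality of the Erlang approximation itself.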
Discretisation. Recently, Tijms and Veldman [60] proposed a discretisation
method for computing the transient distribution of the accumulated reward in
an MRM. Their algorithm is a generalisation of an earlier algorithm by Goyal and
Tantawi [24] for MRMs with only 0- and 1-rewards. The basic idea is to discretise
both the time and the accumulated reward as multiples of the same step size d,
where d is chosen such that the probability of more than one transition in the
MRM in an interval of length d is negligible. The algorithm allows only natural
number rewards, but this is no severe restriction since rational rewards can be
scaled to yield natural numbers.
The time complexity of this method is $O(|S| \cdot t \cdot |t - r| / d^2)$ and the space
complexity is $O(|S| \cdot r/d)$. As the computational effort is proportional to $d^{-2}$,
the computation time grows rapidly when a higher accuracy is required.
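The idea of discretising time and reward with the same step size can be sketched as follows. This is a simplified recursion written in the spirit of the description above, not a transcription of the algorithm in [60]: in every step of length d the process either stays in its state s or takes exactly one transition, and in both cases ρ(s) reward levels of size d are added. Probability mass that would exceed the bound r is dropped, which plays the role of the absorbing barrier. The function name and the example model are hypothetical.

```python
def discretised_probability(states, rates, rewards, start, targets, t, r, d):
    """Approximate Pr{Y_t <= r, X_t in targets} on a grid with step size d.

    rewards must map every state to a natural number.
    """
    steps = round(t / d)
    levels = int(r / d + 1e-9)             # reward levels 0 .. levels
    succ = {s: [(dst, rate) for (src, dst), rate in rates.items() if src == s]
            for s in states}
    stay = {s: 1.0 - d * sum(rate for _, rate in succ[s]) for s in states}
    if min(stay.values()) < 0.0:
        raise ValueError("step size d is too large for the exit rates")

    # mass[s][m]: probability of being in s with accumulated reward m * d.
    mass = {s: [0.0] * (levels + 1) for s in states}
    mass[start][0] = 1.0
    for _ in range(steps):
        new = {s: [0.0] * (levels + 1) for s in states}
        for s in states:
            gain = rewards[s]              # levels gained during one step in s
            for m, p in enumerate(mass[s]):
                level = m + gain
                if p == 0.0 or level > levels:
                    continue               # mass above r hits the barrier
                new[s][level] += p * stay[s]
                for dst, rate in succ[s]:
                    new[dst][level] += p * rate * d
        mass = new
    return sum(sum(mass[s]) for s in targets)


# Hypothetical model: "up" (reward 1) fails with rate 1 into absorbing "down".
approx = discretised_probability(["up", "down"], {("up", "down"): 1.0},
                                 {"up": 1, "down": 0}, "up", {"down"},
                                 t=1.0, r=2.0, d=0.01)
```

Halving d doubles both the number of time steps and the number of reward levels, which is the quadratic growth in effort noted above.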
Occupation time distributions. In 2000, Sericola [57] derived a result for the
distribution of occupation times in CTMCs prior to a given point in time t. The
approach is based on uniformisation, and (as with uniformisation) it is possible
to calculate an a priori error bound for the computed values. The distribution
of this occupation time can be used to derive $\Pr\{Y_t \leq r, X_t \in S'\}$, based on
the observation that if O(s, t) is the occupation time of state s prior to t then
ρ(s) · O(s, t) is the reward accumulated in this state prior to t. Summing over
all states yields the required accumulated reward.
The computation of the occupation time distribution is an iterative procedure,
which in each iteration updates a linearly growing set of matrices. The
computational and storage requirements of the approach are therefore
considerable. If we truncate after the $N_\varepsilon$-th iteration, we obtain an overall time
complexity of $O(N_\varepsilon^3 |S|^3)$ and an overall space complexity of $O(N_\varepsilon^2 |S|^3)$. Contrary to the
Erlangian approximation, $N_\varepsilon$ determines the accuracy of the entire computation
procedure in this approach.
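The relation between occupation times and accumulated reward used in this approach can be written out as follows. The numbers in the second line are purely illustrative (a hypothetical two-state model) and are not taken from the case study.

```latex
% Accumulated reward as a weighted sum of occupation times
Y_t \;=\; \sum_{s \in S} \rho(s)\, O(s,t),
\qquad \sum_{s \in S} O(s,t) \;=\; t .

% Illustration: S = \{1,2\},\ \rho(1) = 2,\ \rho(2) = \tfrac{1}{2},\
% t = 4,\ O(1,4) = 3,\ O(2,4) = 1
Y_4 \;=\; 2 \cdot 3 + \tfrac{1}{2} \cdot 1 \;=\; 6.5 .
```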
General observations. We have implemented all three algorithms, and
experimented with them on a case study analysing the power consumption in ad-hoc
mobile networks [33]. We can report the following observations:
– The three computational procedures converge to the same value; however,
only for the occupation time distribution approach is an a priori error bound
(and hence a stopping criterion) available.
– The method based on occupation time distributions is fast and accurate. In
the current case study (which is small) we did not run into storage problems;
however, the cubic storage requirements will limit this method to relatively
small case studies.
– The discretisation method is slow when a fine-grained discretisation is used.
Unfortunately, we have no method available (yet) to determine the step size
required to achieve a certain accuracy.
– The Erlangian approach is fast (even though we did not exploit the tensor
structure in the generator matrix), but here, too, we have to guess a
reasonable number of phases for the approximation.
– The discretisation method suffers particularly from large time bounds and
large state spaces, as these make the number of matrices to be computed
larger.
– The method based on occupation time distributions becomes less attractive
when the time bound is large in comparison to the uniformisation rate. We
are currently investigating whether some kind of steady-state detection can
be employed to shorten the computation in these cases.
5.6 Extending CSRL with Further Reward Operators

So far we have considered the accumulation of reward along paths, because
this is the basic novelty we support via the enriched path operators $X^{I}_{J}$ and
$U^{I}_{J}$. In an orthogonal manner, it is possible to support further reward-based
measures, namely by allowing further state operators.
To do so, consider state s in MRM M. For time t and set of states S', the
instantaneous reward $\rho^{M}(s, S', t)$ equals $\sum_{s' \in S'} \pi^{M}(s, s', t) \cdot \rho(s')$ and denotes
the rate at which reward is earned in some state in S' at time t. The expected
(or long run) reward rate $\rho^{M}(s, S')$ equals $\sum_{s' \in S'} \pi^{M}(s, s') \cdot \rho(s')$. We can now
add the following state operators to our framework:
Expected reward rate $E_J$: The operator $E_J(\Phi)$ is true if, when starting in
state s, the expected (long run) reward rate lies in the interval J:
$s \models E_J(\Phi)$ iff $\rho^{M}(s, \mathit{Sat}^{M}(\Phi)) \in J$.
Expected instantaneous reward rate $E^{t}_{J}$: The operator $E^{t}_{J}(\Phi)$ states that
the expected instantaneous reward rate at time t lies in J:
$s \models E^{t}_{J}(\Phi)$ iff $\rho^{M}(s, \mathit{Sat}^{M}(\Phi), t) \in J$.
Expected cumulated reward $C^{I}_{J}$: The operator $C^{I}_{J}(\Phi)$ states that the
expected amount of reward accumulated in Φ-states during the interval I lies
in J:
$s \models C^{I}_{J}(\Phi)$ iff $\int_{I} \rho^{M}(s, \mathit{Sat}^{M}(\Phi), u)\, du \in J$.
The inclusion of these operators in CSRL is possible because their model
checking is rather straightforward. The first two operators require the summation of
the Φ-conforming steady-state or transient state probabilities multiplied with
the corresponding rewards. The operator $C^{I}_{J}(\Phi)$ can be evaluated using a variant
of uniformisation [28,58]. Some example properties are now: $E_J(\neg F)$, which
refers to the expected reward rate (e.g., the system's capacity) of an operational
system; $E^{t}_{J}(\mathit{true})$, which refers to the expected instantaneous reward rate at
time t; and $C^{[0,t]}_{J}(\mathit{true})$, which refers to the expected amount of reward
accumulated up to time t.
The suggestion to include these operators into CSRL exemplifies how a
pragmatic approach (providing new algorithms for new measures) can be combined
with our logical approach, and can profit from the latter due to the ability of
nesting state and path formulas in an arbitrary manner.
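As a small illustration of how the expected reward rate operator can be checked, the sketch below sums steady-state probabilities multiplied by rewards over the states satisfying Φ and tests membership in J. It assumes an ergodic two-state model whose steady-state distribution is known in closed form (and therefore does not depend on the start state); the function names, rates and rewards are hypothetical.

```python
def two_state_steady_state(fail_rate, repair_rate):
    """Steady-state distribution of a CTMC with up -> down and down -> up."""
    total = fail_rate + repair_rate
    return {"up": repair_rate / total, "down": fail_rate / total}


def expected_reward_rate(steady_state, rewards, sat_states):
    """Sum of pi(s') * rho(s') over the states s' that satisfy Phi."""
    return sum(steady_state[s] * rewards[s] for s in sat_states)


def holds_expected_reward(steady_state, rewards, sat_states, interval):
    """Check whether the expected reward rate lies in the closed interval J."""
    low, high = interval
    return low <= expected_reward_rate(steady_state, rewards, sat_states) <= high


# Hypothetical numbers: failure rate 1, repair rate 9, capacity 10 when up.
pi = two_state_steady_state(fail_rate=1.0, repair_rate=9.0)
rewards = {"up": 10.0, "down": 0.0}
rate = expected_reward_rate(pi, rewards, {"up"})
satisfied = holds_expected_reward(pi, rewards, {"up"}, (8.0, 10.0))
```

For the instantaneous variant the steady-state vector would be replaced by the transient distribution at time t, obtained for instance by uniformisation.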
6 Stochastic Model Checking and Lumpability

This section is devoted to an important property that the CSRL logic fam-
ily possesses. The property relates the well-known concepts of lumpability and
bisimulation to the distinguishing power of the logic. We exemplify this property
for CSRL, since this includes the other logics as subsets.
Bisimulation (lumping) equivalence. Lumpability enables the aggregation
of CTMCs and MRMs without affecting performance properties [47,10,40,35].
We adapt the standard notion slightly in order to deal with MRMs with
state-labellings. We only sketch the concepts here, and refer to the papers [4,5] for more
details. For some MRM M = (S, R, L, ρ) we say that an equivalence relation $\mathcal{R}$
on S is a bisimulation if whenever $(s, s') \in \mathcal{R}$ then
$L(s) = L(s')$ and $\rho(s) = \rho(s')$ and $R(s, C) = R(s', C)$ for all $C \in S/\mathcal{R}$,
where $S/\mathcal{R}$ denotes the quotient space under $\mathcal{R}$ and $R(s, C) = \sum_{s' \in C} R(s, s')$.
States s and s' are said to be bisimilar iff there exists a bisimulation $\mathcal{R}$ that
contains (s, s'). Thus, any two bisimilar states are equally labelled, earn the same
reward, and the cumulative rate of moving from these states to any equivalence
class C is equal. Since $\mathcal{R}$ is an equivalence relation, we can construct the
quotient $M/\mathcal{R}$, often called the lumped Markov model of M.
Example 5. The reflexive, symmetric and transitive closure of the relation
$\mathcal{R}$ = { (0111, 1011), (1011, 1101), (0011, 0101), (0101, 1001) }
is a bisimulation on the set of states of the MRM depicted in Fig. 4. For
convenience, double arrows are used to indicate that there exists a transition from a
state to another state and vice versa. The lumped MRM $M/\mathcal{R}$ consists of five
aggregated states, yielding, in fact, the MRM of the TMR system discussed in
Example 1. For instance, state $s_{2,1}$ of the original model can be considered as
the lumped state representing the three possible configurations in which, out of
three, a single processor has failed. These configurations are represented in the
detailed version of Fig. 4 by the states 0111, 1011, and 1101.
[Figure 4: state-transition diagram over the states 1111 (up3); 0111, 1011, 1101
(up2); 0011, 0101, 1001 (up1); 0001 (up0); and 0000 (down), with transition rates
λ, μ, ν and δ.]
Fig. 4. A detailed version of the TMR model
It is well known that the measures derived from M and its quotient $M/\mathcal{R}$ are
strongly related if $\mathcal{R}$ is a bisimulation. Without going into details, it is possible
to compute transient as well as steady-state probabilities on the lumped MRM
$M/\mathcal{R}$ if one is only interested in probabilities of equivalence classes. For a given
MRM it is therefore possible to establish the following property [4,19,5]:
$s \models \Phi$ iff $s' \models \Phi$ for all CSRL formulas Φ
if and only if s and s' are bisimilar.
In other words, CSRL cannot distinguish between lumping equivalent states, but
non-equivalent states can always be distinguished by some CSRL formula. This
looks like a theoretical result, but it also has practical implications: it allows
one to carry out model checking of CSRL (and CSL, and CRL) on the quotient
state space with respect to lumpability. This lumped state space is often much
smaller than the original one. It can be computed by a partition refinement
algorithm [39,20].
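The following sketch shows a naive form of partition refinement for the bisimulation defined in this section. It starts from the partition induced by equal labelling and equal reward and keeps splitting blocks until all states in a block have the same cumulative rate into every block. It is meant as an illustration only and is not the (more efficient) algorithms of [39,20]; the function name and the four-state model are hypothetical.

```python
from collections import defaultdict


def lump(states, rates, labels, rewards):
    """Return the coarsest bisimulation partition as a list of frozensets."""
    # Initial partition: equal labelling L(s) and equal reward rho(s).
    initial = defaultdict(list)
    for s in states:
        initial[(frozenset(labels[s]), rewards[s])].append(s)
    partition = [frozenset(block) for block in initial.values()]

    while True:
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined = []
        for block in partition:
            groups = defaultdict(list)
            for s in block:
                # Cumulative rate R(s, C) into every current block C.
                cumulative = defaultdict(float)
                for (src, dst), rate in rates.items():
                    if src == s:
                        cumulative[block_of[dst]] += rate
                signature = frozenset((c, round(v, 12))
                                      for c, v in cumulative.items() if v != 0.0)
                groups[signature].append(s)
            refined.extend(frozenset(g) for g in groups.values())
        if len(refined) == len(partition):   # no block was split: stable
            return refined
        partition = refined


# Hypothetical model: states 1 and 2 are symmetric and can be lumped.
states = [0, 1, 2, 3]
labels = {0: {"up"}, 1: {"mid"}, 2: {"mid"}, 3: {"down"}}
rewards = {0: 2.0, 1: 1.0, 2: 1.0, 3: 0.0}
rates = {(0, 1): 1.0, (0, 2): 1.0, (1, 3): 2.0, (2, 3): 2.0, (3, 0): 5.0}
blocks = lump(states, rates, labels, rewards)
```

Here the result consists of three blocks, {0}, {1, 2} and {3}; changing the rate from state 2 to state 3 would make states 1 and 2 distinguishable and yield four singleton blocks.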
7 Conclusions and Future Outlook

In this paper we have tried to give a tutorial-style overview of the model checking
approach to continuous-time Markov chains and Markov reward models. While
the logics CSL and CRL can be model checked using well-known numerical
techniques to analyse Markov chains, more work is needed in the context of
model checking performability properties expressible with CSRL to make the
analysis effective.
Since the first paper on algorithms for CSL model checking was published
in 1999 [2], the approach has been implemented in (at least) three research
tools, namely the ETMCC model checker [37], the model checker Prism, and the
APNN toolbox [11]. While ETMCC is a dedicated CSL model checker based on
sparse matrix data structures, Prism employs BDD-based techniques to combat
the state space explosion problem. The APNN toolbox uses Kronecker
representations to achieve better space efficiency.
So far, we (and others) have applied stochastic model checking to various
small and medium size case studies, including the analysis of a dependable work-
station cluster [31], the verification of the performance of the plain ordinary tele-
phone system protocol [3], the estimation of power consumption in mobile ad
hoc networks [33], and the assessment of the survivability of the Hubble space
telescope [34]. Among work that extends the basic stochastic model checking
approach to a broader context, we are aware of the extension of CSL to pro-
cess algebra specifications [38], to semi-Markov chains [44] and to random time
bounds [49].
More work is foreseen in many exciting areas extending what has been de-
scribed in this tutorial paper, ranging from research on the inclusion of non-
determinism, to efforts to improve the effectiveness of the algorithms described,
to the application of stochastic model checking to realistic case studies.
Acknowledgements. The authors thank Joachim Meyer-Kayser, Markus
Siegle and Lucia Cloth for valuable discussions and contributions. Holger
Hermanns is partially supported by the Netherlands Organization for Scientific
Research (NWO) and Joost-Pieter Katoen is partially supported by the Dutch
Technology Foundation (STW). The cooperation between the research groups
in Aachen, Bonn, and Twente takes place as part of the Validation of Stochastic
Systems (VOSS) project, funded by the Dutch NWO and the German Research
Council DFG.
References
1. A. Aziz, K. Sanwal, V. Singhal, and R. Brayton. Model checking continuous time
Markov chains. ACM Transactions on Computational Logic, 1(1): 162–170, 2000.
2. C. Baier, J.-P. Katoen, and H. Hermanns. Approximate symbolic model checking
of continuous-time Markov chains. In Concurrency Theory, LNCS 1664: 146–162,
Springer-Verlag, 1999.
3. C. Baier, B.R. Haverkort, H. Hermanns, and J.-P. Katoen. Model checking
continuous-time Markov chains by transient analysis. In Computer Aided Veri-
fication, LNCS 1855: 358–372, Springer-Verlag, 2000.
4. C. Baier, B.R. Haverkort, H. Hermanns, and J.-P. Katoen. On the logical character-
isation of performability properties. In Automata, Languages, and Programming,
LNCS 1853: 780–792, Springer-Verlag, 2000.
5. C. Baier, B.R. Haverkort, H. Hermanns, and J.-P. Katoen. Model checking al-
gorithms for continuous-time Markov chains. Technical report TR-CTIT-02-10.
Centre for Telematics and Information Technology, University of Twente. 2001.
6. C. Baier and M. Kwiatkowska. On the verification of qualitative properties of
probabilistic processes under fairness constraints. Information Processing Letters,
66(2): 71–79, 1998.
7. M.D. Beaudry. Performance-related reliability measures for computing systems.
IEEE Transactions on Computers, C-27: 540–547, 1978.
8. B. Bérard, M. Bidoit, A. Finkel, F. Laroussinie, A. Petit, L. Petrucci, and
Ph. Schnoebelen. Systems and Software Verification. Springer-Verlag, 2001.
9. A. Bobbio and M. Telek. Markov regenerative SPN with non-overlapping activity
cycles. In Proc. Int’l IEEE Performance and Dependability Symposium: 124–133,
1995.
10. P. Buchholz. Exact and ordinary lumpability in finite Markov chains. Journal of
Applied Probability, 31: 59–75, 1994.
11. P. Buchholz, J.-P. Katoen, P. Kemper, and C. Tepper. Model checking large struc-
tured Markov chains. Journal of Logic and Algebraic Programming, to appear,
2001.
12. CCITT Blue Book, Fascicle III.1, International Telecommunication Union,
Geneva, 1989.
13. G. Chiola, G. Franceschinis, R. Gaeta, and M. Ribaudo. GreatSPN 1.7: graphical
editor and analyzer for timed and stochastic Petri nets. Performance Evaluation,
24 (1-2):47-68, 1995.
14. G. Ciardo, J.K. Muppala, and K.S. Trivedi. SPNP: stochastic Petri net package.
In Proc. 3rd Int. Workshop on Petri Nets and Performance Models, pp. 142–151,
IEEE CS Press, 1989.
15. G. Clark, S. Gilmore, and J. Hillston. Specifying performance measures for PEPA.
In Formal Methods for Real-Time and Probabilistic Systems, LNCS 1601: 211–227,
16. E. Clarke, E. Emerson, and A. Sistla. Automatic verification of finite-state
concurrent systems using temporal logic specifications. ACM Transactions on
Programming Languages and Systems, 8: 244–263, 1986.
17. E. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 1999.
18. E. Clarke and R. Kurshan. Computer-aided verification. IEEE Spectrum, 33(6):
61–67, 1996.
19. J. Desharnais and P. Panangaden. Continuous stochastic logic characterizes bisim-
ulation of continuous-time Markov processes. Journal of Logic and Algebraic Pro-
gramming, to appear, 2001.
20. S. Derisavi, H. Hermanns, and W.H. Sanders. Optimal state space lumping in
Markov models. 2002. submitted for publication.
21. W. Feller. An Introduction to Probability Theory and its Applications. John Wiley
& Sons, 1968.
22. R. German. Performance Analysis of Communication Systems: Modeling with Non-
Markovian Stochastic Petri Nets. John Wiley & Sons, 2000.
23. S. Gilmore and J. Hillston. The PEPA workbench: a tool to support a process
algebra-based approach to performance modelling. In Computer Performance Eval-
uation, Modeling Techniques and Tools, LNCS 794: 353-368, Springer-Verlag, 1994.
24. A. Goyal and A.N. Tantawi. A measure of guaranteed availability and its numerical
evaluation. IEEE Transactions on Computers, 37: 25–32, 1988.
25. W.K. Grassmann. Finding transient solutions in Markovian event systems through
randomization. In Numerical Solution of Markov Chains, pp. 357–371, Marcel
Dekker Inc, 1991.
26. D. Gross and D.R. Miller. The randomization technique as a modeling tool and
solution procedure for transient Markov chains. Operations Research 32(2): 343–
361, 1984.
27. D. Harel. Statecharts: a visual formalism for complex systems. Science of Computer
Programming, 8: 231–274, 1987.
28. B.R. Haverkort. Performance of Computer Communication Systems: A Model-
Based Approach. John Wiley & Sons, 1998.
29. B.R. Haverkort, R. Marie, G. Rubino, and K.S. Trivedi (editors). Performability
Modelling: Techniques and Tools. John Wiley & Sons, 2001.
30. B.R. Haverkort and I. Niemegeers. Performability modelling tools and techniques.
Performance Evaluation, 25: 17–40, 1996.
31. B.R. Haverkort, H. Hermanns, and J.-P. Katoen. The use of model checking tech-
niques for quantitative dependability evaluation. In IEEE Symposium on Reliable
Distributed Systems., pp. 228–238. IEEE CS Press, 2000.
32. B.R. Haverkort, L. Cloth, H. Hermanns, J.-P. Katoen, and C. Baier. Model
checking CSRL-specified performability properties. In 5th Int. Workshop on
Performability Modeling of Computer and Communication Systems, Erlangen,
Arbeitsberichte des IMMD, 34 (13), 2001.
33. B.R. Haverkort, L. Cloth, H. Hermanns, J.-P. Katoen, and C. Baier. Model check-
ing performability properties. In Proc. IEEE Int’l Conference on Dependable Sys-
tems and Networks, IEEE CS press, 2002.
34. H. Hermanns. Construction and verification of performance and reliability models.
Bulletin of the EATCS, 74:135-154, 2001.
35. H. Hermanns, U. Herzog, and J.-P. Katoen. Process algebra for performance eval-
uation. Theoretical Computer Science, 274(1-2):43–87, 2002.
36. H. Hermanns, U. Herzog, U. Klehmet, V. Mertsiotakis, and M. Siegle. Compo-
sitional performance modelling with the TIPPtool. Performance Evaluation,
39(1-4): 5–35, 2000.
37. H. Hermanns, J.-P. Katoen, J. Meyer-Kayser, and M. Siegle. A Markov chain model
checker. In Tools and Algorithms for the Construction and Analysis of Systems,
LNCS 1785: 347–362, Springer-Verlag, 2000.
38. H. Hermanns, J.-P. Katoen, J. Meyer-Kayser, and M. Siegle. Towards model check-
ing stochastic process algebra. In Integrated Formal Methods, LNCS 1945: 420–439,
39. H. Hermanns and M. Siegle. Bisimulation algorithms for stochastic process alge-
bras and their BDD-based implementation. In Formal Methods for Real-Time and
Probabilistic Systems, LNCS 1601: 244–265, Springer-Verlag, 1999.
40. J. Hillston. A Compositional Approach to Performance Modelling. Cambridge
University Press, 1996.
41. G.J. Holzmann. The model checker Spin. IEEE Transactions on Software Engi-
neering, 23(5): 279–295, 1997.
42. G. Horton, V. Kulkarni, D. Nicol, K. Trivedi. Fluid stochastic Petri nets: Theory,
application and solution techniques. Eur. J. Oper. Res., 105(1): 184–201,1998.
43. R.A. Howard. Dynamic Probabilistic Systems; Volume 1: Markov Models. John
Wiley & Sons, 1971.
44. G.G. Infante-Lopez, H. Hermanns, and J.-P. Katoen. Beyond memoryless distri-
butions: Model checking semi-Markov chains. In Process Algebra and Probabilistic
Methods, LNCS 2165: 57–70, Springer-Verlag, 2001.
45. A. Jensen. Markov chains as an aid in the study of Markov processes. Skandinavisk
Aktuarietidskrift 36: 87–91, 1953.
46. J.-P. Katoen, M.Z. Kwiatkowska, G. Norman, and D. Parker. Faster and symbolic
CTMC model checking. In Process Algebra and Probabilistic Methods, LNCS 2165:
23–38, Springer-Verlag, 2001.
47. J.G. Kemeny and J.L. Snell. Finite Markov Chains. Van Nostrand, 1960.
48. V.G. Kulkarni. Modeling and Analysis of Stochastic Systems. Chapman & Hall,
1995.
49. M.Z. Kwiatkowska, G. Norman, and A. Pacheco. Model checking CSL until for-
mulae with random time bounds. In Process Algebra and Probabilistic Methods,
LNCS 2399, Springer-Verlag, 2002.
50. K.L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
51. J.F. Meyer. On evaluating the performability of degradable computing systems.
IEEE Transactions on Computers, 29(8): 720–731, 1980.
52. J.F. Meyer. Closed-form solutions of performability, IEEE Transactions on Com-
puters, 31(7): 648–657, 1982.
53. J.F. Meyer. Performability: a retrospective and some pointers to the future. Per-
formance Evaluation, 14(3-4): 139–156, 1992.
54. W.D. Obal II and W.H. Sanders. State-space support for path-based reward vari-
ables. Performance Evaluation, 35: 233–251, 1999.
55. D. Peled. Software Reliability Methods. Springer-Verlag, 2001.
56. W.H. Sanders, W.D. Obal II, M.A. Qureshi, and F.K. Widjanarko. The UltraSAN
modeling environment. Performance Evaluation, 24: 89–115, 1995.
57. B. Sericola. Occupation times in Markov processes. Stochastic Models, 16(5): 339–
351, 2000.
58. E. de Souza e Silva and H.R. Gail. Performability analysis of computer systems:
from model specification to solution. Perf. Ev., 14: 157–196, 1992.
59. W.J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton
University Press, 1994.
60. H.C. Tijms and R. Veldman. A fast algorithm for the transient reward distribution
in continuous-time Markov chains. Operations Research Letters, 26: 155–158, 2000.
Measurement-Based Analysis of System Dependability
Using Fault Injection and Field Failure Data
Ravishankar K. Iyer and Zbigniew Kalbarczyk
Center for Reliable and High-Performance Computing
University of Illinois at Urbana-Champaign
1308 W. Main St., Urbana, IL 61801-2307
{iyer, kalbar}@crhc.uiuc.edu
Abstract. The discussion in this paper focuses on the issues involved in
analyzing the availability of networked systems using fault injection and the
failure data collected by the logging mechanisms built into the system. In
particular we address: (1) analysis in the prototype phase using physical fault
injection into an actual system. We use the example of a fault injection-based
evaluation of a software-implemented fault tolerance (SIFT) environment (built
around a set of self-checking processes called ARMORS) that provides error
detection and recovery services to spaceborne scientific applications; and (2)
measurement-based analysis of systems in the field. We use the example of a LAN of
Windows NT based computers to present methods for collecting and analyzing
failure data to characterize network system dependability. Both fault injection
and failure data analysis enable us to study naturally occurring errors and to
provide feedback to system designers on potential availability bottlenecks. For
example, the study of failures in a network of Windows NT machines reveals
that most of the problems that lead to reboots are software related and that
though the average availability evaluates to over 99%, a typical machine, on
average, provides acceptable service only about 92% of the time.
1 I n tr o d u c tio n
T h e d e p e n d a b ility o f a s y s te m c a n b e e x p e rim e n ta lly e v a lu a te d a t d iffe re n t p h a s e s o f

its life c y c le . In th e d e s ig n p h a s e , c o m p u te r-a id e d d e s ig n (C A D ) e n v iro n m e n ts a re
u s e d to e v a lu a te th e d e s ig n v ia s im u la tio n , in c lu d in g s im u la te d fa u lt in je c tio n . S u c h
fa u lt in je c tio n te s ts th e e ffe c tiv e n e s s o f fa u lt-to le ra n t m e c h a n is m s a n d e v a lu a te s
s y s te m d e p e n d a b ility , p ro v id in g tim e ly fe e d b a c k to s y s te m d e s ig n e rs . S im u la tio n ,
h o w e v e r, re q u ire s a c c u ra te in p u t p a ra m e te rs a n d v a lid a tio n o f o u tp u t re s u lts .
A lth o u g h th e p a ra m e te r e s tim a te s c a n b e o b ta in e d fro m p a s t m e a s u re m e n ts , th is is
o fte n c o m p lic a te d b y d e s ig n a n d te c h n o lo g y c h a n g e s . In th e p r o to ty p e p h a s e , th e
s y s te m ru n s u n d e r c o n tro lle d w o rk lo a d c o n d itio n s . In th is s ta g e , c o n tro lle d p h y s ic a l
fa u lt in je c tio n is u s e d to e v a lu a te th e s y s te m b e h a v io r u n d e r fa u lts , in c lu d in g th e
d e te c tio n c o v e ra g e a n d th e re c o v e ry c a p a b ility o f v a rio u s fa u lt to le ra n c e m e c h a n is m s .
F a u lt in je c tio n o n th e re a l s y s te m c a n p ro v id e in fo rm a tio n a b o u t th e fa ilu re p ro c e s s ,
fro m fa u lt o c c u rre n c e to s y s te m re c o v e ry , in c lu d in g e rro r la te n c y , p ro p a g a tio n ,
d e te c tio n , a n d re c o v e ry (w h ic h m a y in v o lv e re c o n fig u ra tio n ). In th e o p e r a tio n a l
p h a s e , a d ire c t m e a s u re m e n t-b a s e d a p p ro a c h c a n b e u s e d to m e a s u re s y s te m s in th e
M e a s u re m e n t-B a s e d A n a ly s is o f S y s te m D e p e n d a b ility 2 9 1
field under real workloads. The collected data contain a large amount of information about naturally occurring errors/failures. Analysis of these data can provide understanding of actual error/failure characteristics and insight into analytical models. Although measurement-based analysis is useful for evaluating the real system, it is limited to detected errors. Further, conditions in the field can vary widely, casting doubt on the statistical validity of the results. Thus, all three approaches (simulated fault injection, physical fault injection, and measurement-based analysis) are required for accurate dependability analysis.
In the design phase, simulated fault injection can be conducted at different levels: the electrical level, the logic level, and the function level. The objectives of simulated fault injection are to determine dependability bottlenecks, the coverage of error detection/recovery mechanisms, the effectiveness of reconfiguration schemes, performance loss, and other dependability measures. The feedback from simulation can be extremely useful in cost-effective redesign of the system. A thorough discussion of different techniques for simulated fault injection can be found in [10].
In the prototype phase, while the objectives of physical fault injection are similar to those of simulated fault injection, the methods differ radically because real fault injection and monitoring facilities are involved. Physical faults can be injected at the hardware level (logic or electrical faults) or at the software level (code or data corruption). Heavy-ion radiation techniques can also be used to inject faults and stress the system. A detailed treatment of the instrumentation involved in fault injection experiments using real examples, including several fault injection environments, is given in [10].
In the operational phase, measurement-based analysis must address issues such as how to monitor computer errors and failures and how to analyze measured data to quantify system dependability characteristics. Although methods for the design and evaluation of fault-tolerant systems have been extensively researched, little is known about how well these strategies work in the field. A study of production systems is valuable not only for accurate evaluation but also for identifying reliability bottlenecks in system design. In [10] the measurement-based analysis is based on over 200 machine-years of data gathered from IBM, DEC, and Tandem systems (note that these are not networked systems).
In this paper we discuss the current research in the area of experimental analysis of computer system dependability in the context of methodologies suited for measurement-based dependability analysis of networked systems. In particular we focus on:
• Analysis in the prototype phase using physical fault injection into an actual system. We use the example of fault injection-based evaluation of a software-implemented fault tolerance (SIFT) environment (built around a set of self-checking processes called ARMORs [13]) that provides error detection and recovery services to spaceborne scientific applications.
• Measurement-based analysis of systems in the field. We use the example of a LAN of Windows NT-based computers to present methods for collecting and analyzing failure data to characterize network system dependability.
2 Fault/Error Injection Characterization of the SIFT Environment for Spaceborne Applications
Fault/error injection is an attractive approach to the experimental validation of dependable systems. The objective of fault injection is to mimic the existence of faults and errors and hence to enable the study of the failure behavior of the system. Fault/error injection can be employed to conduct detailed studies of the complex interactions between faults and fault handling mechanisms, e.g., [1] and [10]. In particular, fault injection aims at (1) exposing deficiencies of fault tolerance mechanisms (i.e., fault removal), e.g., [3], and (2) evaluating coverage of fault tolerance mechanisms (i.e., fault forecasting), e.g., [2]. A number of tools have been proposed to support fault injection analysis and evaluation of systems, e.g., FERRARI [14], FIAT [5], and NFTAPE [22].
This section presents an example of applying fault/error injection in assessing the fault tolerance mechanisms of a software-implemented fault tolerance environment for spaceborne applications. In traditional spaceborne applications, onboard instruments collect and transmit raw data back to Earth for processing. The amount of science that can be done is clearly limited by the telemetry bandwidth to Earth. The Remote Exploration and Experimentation (REE) project at NASA/JPL intends to use a cluster of commercial off-the-shelf (COTS) processors to analyze the data onboard and send only the results back to Earth. This approach not only saves downlink bandwidth, but also provides the possibility of making real-time, application-oriented decisions.
While failures in the scientific applications are not critical to the spacecraft’s health in this environment (spacecraft control is performed by a separate, trusted computer), they can be expensive nonetheless. The commercial components used by REE are expected to experience a high rate of radiation-induced transient errors in space (ranging from one per day to several per hour), and downtime directly leads to the loss of scientific data. Hence, a fault-tolerant environment is needed to manage the REE applications.
The missions envisioned to take advantage of the SIFT environment for executing MPI-based [19] scientific applications include the Mars Rover and the Orbiting Thermal Imaging Spectrometer (OTIS). More details on the applications and the full dependability analysis can be found in [31] and [32], respectively.
The remainder of this section presents a methodology for experimentally evaluating a distributed SIFT environment executing an REE texture analysis program from the Mars Rover mission. Errors are injected so that the consequences of faults can be studied. The experiments do not attempt to analyze the cause of the errors or fault coverage. Rather, the error injections progressively stress the detection and recovery mechanisms of the SIFT environment:
1. SIGINT/SIGSTOP injections. Many faults are known to lead to crash and hang failures. SIGINT/SIGSTOP injections reproduce these first-order effects of faults in a controlled manner that minimizes the possibility of error propagation or checkpoint corruption.
2. Register and text-segment injections. The next set of error injections represents common effects of single-event upsets by corrupting the state in the register set and text segment memory. This introduces the possibility of error propagation and checkpoint corruption.
3. Heap injections. The third set of experiments further broadens the failure scenarios by injecting errors in the dynamic heap data to maximize the possibility of error propagation. The results from these experiments are especially useful in evaluating how well intraprocess self-checks limit error propagation.
REE computational model. The REE computational model consists of a trusted, radiation-hardened (rad-hard) Spacecraft Control Computer (SCC) and a cluster of COTS processors that execute the SIFT environment and the scientific applications. The SCC schedules applications for execution on the REE cluster through the SIFT environment.
REE testbed configuration. The experiments were executed on a 4-node testbed consisting of PowerPC 750 processors running the Lynx real-time operating system. Nodes are connected through 100 Mbps Ethernet in the testbed. Between one and two megabytes of RAM on each processor were set aside to emulate local nonvolatile memory available to each node. The nonvolatile RAM is expected to store temporary state information that must survive hardware reboots (e.g., checkpointing information needed during recovery). Nonvolatile memory visible to all nodes is emulated by a remote file system residing on a Sun workstation that stores program executables, application input data, and application output data.
2.1 SIFT Environment for REE
The REE applications are protected by a SIFT environment designed around a set of self-checking processes called ARMORs (Adaptive Reconfigurable Mobile Objects of Reliability) that execute on each node in the testbed. ARMORs control all operations in the SIFT environment and provide error detection and recovery to the application and to the ARMOR processes themselves. We provide a brief summary of the ARMOR-based SIFT environment as implemented for the REE applications; additional details of the general ARMOR architecture appear in [13].
SIFT Architecture
An ARMOR is a multithreaded process internally structured around objects called elements that contain their own private data and provide elementary functions or services (e.g., detection and recovery for remote ARMOR processes, internal self-checking mechanisms, or checkpointing support). Together, the elements constitute the functionality that defines an ARMOR’s behavior. All ARMORs contain a basic set of elements that provide a core functionality, including the ability to (1) implement reliable point-to-point message communication between ARMORs, (2) communicate with the local daemon ARMOR process, (3) respond to heartbeats from the local daemon, and (4) capture ARMOR state. Specific ARMORs extend this core functionality by adding extra elements.
Types of ARMORs. The SIFT environment for REE applications consists of four kinds of ARMOR processes: a Fault Tolerance Manager (FTM), a Heartbeat ARMOR, daemons, and Execution ARMORs.
• Fault Tolerance Manager (FTM). A single FTM executes on one of the nodes and is responsible for recovering from ARMOR and node failures as well as interfacing with the external Spacecraft Control Computer (SCC).
• Heartbeat ARMOR. The Heartbeat ARMOR executes on a node separate from the FTM. Its sole responsibility is to detect and recover from failures in the FTM through periodic polling for liveness.
• Daemons. Each node on the network executes a daemon process. Daemons are the gateways for ARMOR-to-ARMOR communication, and they detect failures in the local ARMORs.
• Execution ARMORs. Each application process is directly overseen by a local Execution ARMOR.
Executing REE Applications
Fig. 1 illustrates a configuration of the SIFT environment with two MPI applications (from the Mars Rover and OTIS missions) executing on a four-node testbed. Arrows in the figure depict the relationships among the various processes (e.g., the application sends progress indicators to the Execution ARMORs, the FTM is responsible for recovering from failures in the Heartbeat ARMOR, and the FTM heartbeats the daemon processes).
Each application process is linked with a SIFT interface that establishes a one-way communication channel with the local Execution ARMOR at application initialization. The application programmer can use this interface to invoke a variety of fault tolerance services provided by the ARMOR.
Error Detection Hierarchy
The top-down error detection hierarchy consists of:
• Node and daemon errors. The FTM periodically exchanges heartbeat messages with each daemon (every 10 s in our experiments) to detect node crashes and hangs. If the FTM does not receive a response by the next heartbeat round, it assumes that the node has failed. A daemon failure is treated as a node failure.
• ARMOR errors. Each ARMOR contains a set of assertions on its internal state, including range checks, validity checks on data (e.g., a valid ARMOR ID), and data structure integrity checks. Other internal self-checks available to the ARMORs include preemptive control flow checking, I/O signature checking, and deadlock/livelock detection [4]. In order to limit error propagation, the ARMOR kills itself when an internal check detects an error. The daemon detects crash failures in the ARMORs on the node via operating system calls. To detect hang failures, the daemon periodically (every 10 s in the experiments) sends “Are-you-alive?” messages to its local ARMORs.
• REE applications. All application crash failures are detected by the local Execution ARMOR. Crash failures in the MPI process with rank 0 can be detected by the Execution ARMOR through operating system calls (i.e., waitpid). The other Execution ARMORs periodically check that their MPI processes (ranks 1 through n) are still in the operating system’s process table. If a process is missing, the Execution ARMOR concludes that the application has crashed. An application process notifies the local Execution ARMOR through its communication channel before exiting normally so that the ARMOR does not misinterpret this exit as an abnormal termination.
[Figure 1: four nodes (Node 1 to Node 4) connected by a network. Each node runs a daemon and an Execution ARMOR; the Heartbeat ARMOR and the FTM run on separate nodes. Each application process (Rover ranks 0 and 1, OTIS ranks 0 and 1) attaches to its local Execution ARMOR through a SIFT interface. Legend: heartbeats, progress indicators, recovery.]
Fig. 1. SIFT Architecture for Executing two MPI Applications on a Four-Node Network.
Application hangs are detected with a polling technique in which the Execution ARMOR periodically checks for progress indicator updates sent by the application. A progress indicator is an “I’m-alive” message containing information that denotes application progress (e.g., a loop iteration counter). If the Execution ARMOR does not receive a progress indicator within an application-specific time period, the ARMOR concludes that the application process has hung.
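The progress-indicator check described above can be sketched as follows. This is a minimal illustration only, not the ARMOR implementation: the class name `ProgressMonitor`, its fields, and the choice to ignore a repeated counter value are assumptions made for the example, and time is passed in explicitly so the logic can be followed without a running clock.

```python
from dataclasses import dataclass


@dataclass
class ProgressMonitor:
    """Hypothetical stand-in for the hang check inside an Execution ARMOR."""
    timeout_s: float            # application-specific time period (assumed value)
    last_counter: int = -1      # last progress value seen
    last_update_s: float = 0.0  # time at which the last new indicator arrived

    def on_progress(self, counter: int, now_s: float) -> None:
        """Record an "I'm-alive" message carrying, e.g., a loop iteration counter."""
        if counter != self.last_counter:   # assumption: a repeated value is not progress
            self.last_counter = counter
            self.last_update_s = now_s

    def is_hung(self, now_s: float) -> bool:
        """Periodic check: no fresh indicator within the timeout means a hang."""
        return now_s - self.last_update_s > self.timeout_s


monitor = ProgressMonitor(timeout_s=20.0)
monitor.on_progress(counter=1, now_s=5.0)
monitor.on_progress(counter=2, now_s=12.0)
print(monitor.is_hung(now_s=30.0))  # False: 18 s since the last update
print(monitor.is_hung(now_s=33.0))  # True: 21 s without progress
```

Carrying a counter rather than a bare liveness flag lets the monitor tell a process that is still sending messages but making no progress apart from one that is advancing.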
Error Recovery
Nodes. The FTM migrates the ARMOR and application processes that were executing on the failed node to other working nodes in the SIFT environment.
ARMORs. ARMOR state is recovered from a checkpoint. To protect the ARMOR state against process failures, a checkpointing technique called microcheckpointing is used [30]. Microcheckpointing leverages the modular element composition of the ARMOR process to incrementally checkpoint state on an element-by-element basis.
REE Applications. On detecting an application failure, the Execution ARMOR notifies the FTM to initiate recovery. The version of MPI used on the REE testbed precludes individual MPI processes from being restarted within an application; therefore, the FTM instructs all Execution ARMORs to terminate their MPI processes before restarting the application. The application executable binaries must be reloaded from the remote disk during recovery.
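The element-by-element idea behind microcheckpointing can be illustrated with a small sketch. It is only a rough approximation under stated assumptions: the `Element` and `Armor` classes, the dictionary-based state, and the policy of saving an element right after it handles a message are all hypothetical, and the actual technique in [30] involves details not given here.

```python
import copy


class Element:
    """Hypothetical ARMOR element: private state plus a message handler."""

    def __init__(self, name, state):
        self.name = name
        self.state = state

    def handle(self, message):
        self.state["messages_seen"] = self.state.get("messages_seen", 0) + 1
        self.state["last_message"] = message


class Armor:
    """Hypothetical process composed of elements with a per-element checkpoint."""

    def __init__(self, elements):
        self.elements = {e.name: e for e in elements}
        # The checkpoint buffer holds one saved copy of state per element.
        self.checkpoint = {e.name: copy.deepcopy(e.state) for e in elements}

    def deliver(self, element_name, message):
        element = self.elements[element_name]
        element.handle(message)
        # Incremental step: save only the element that just ran.
        self.checkpoint[element_name] = copy.deepcopy(element.state)

    def recover(self):
        """Rebuild every element's state from the checkpoint buffer."""
        for name, element in self.elements.items():
            element.state = copy.deepcopy(self.checkpoint[name])


armor = Armor([Element("heartbeat", {}), Element("app_monitor", {})])
armor.deliver("heartbeat", "ping")
armor.elements["heartbeat"].state["messages_seen"] = 999  # corruption after the checkpoint
armor.recover()
print(armor.elements["heartbeat"].state["messages_seen"])  # 1
```

Because only the element touched by a message is copied, the cost of each checkpoint step scales with that element's state rather than with the state of the whole process.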
2.2 Injection Experiments
Error injection experiments into the application and SIFT processes were conducted to: (1) stress the detection and recovery mechanisms of the SIFT environment, (2) determine the failure dependencies among SIFT and application processes, (3) measure the SIFT environment overhead on application performance, (4) measure the overhead of recovering SIFT processes as seen by the application, and (5) study the effects of error propagation and the effectiveness of internal self-checks in limiting error propagation.
The experiments used NFTAPE, a software framework for conducting injection campaigns [22].
Error Models
The error models used in the injection experiments represent a combination of those employed in several past experimental studies and those proposed by JPL engineers.
SIGINT/SIGSTOP. These signals were used to mimic “clean” crash and hang failures as described in the introduction.
Register and text-segment errors. Fault analysis has predicted that the most prevalent faults in the targeted spaceborne environment will be single-bit memory and register faults, although shrinking feature sizes have raised the likelihood of clock errors and multiple-bit flips in future technologies. Several error injections were uniformly distributed within each run since each injection was unlikely to cause an immediate failure, and only the most frequently used registers and functions in the text segment were targeted for injection.
Heap errors. Heap injections were used to study the effects of error propagation. One error was injected per run into non-pointer data values only, and the effects of the error were traced through the system.
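The single-bit error model and the uniform spread of injection times within a run can be sketched as below. This is an illustration only and does not use or imitate the NFTAPE interface: the helper names `flip_bit` and `injection_times` are hypothetical, and the sketch corrupts a local byte buffer rather than the registers, text segment, or heap of a live process.

```python
import random


def flip_bit(buffer: bytearray, bit_index: int) -> None:
    """Flip one bit in place; bit 0 is the least significant bit of byte 0."""
    byte_index, bit_in_byte = divmod(bit_index, 8)
    buffer[byte_index] ^= 1 << bit_in_byte


def injection_times(run_length_s: float, count: int, rng: random.Random) -> list:
    """Draw `count` injection times uniformly over one run, in time order."""
    return sorted(rng.uniform(0.0, run_length_s) for _ in range(count))


rng = random.Random(2002)          # fixed seed so a campaign can be repeated
memory = bytearray(b"\x00\x0f")    # stand-in for a target's data
flip_bit(memory, 9)                # bit 1 of byte 1: 0x0f -> 0x0d
print(memory.hex())                # 000d
print(len(injection_times(75.0, 3, rng)))  # 3
```

Flipping the same bit twice restores the original value, which makes it easy to confirm that exactly one bit was disturbed.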
Errors were not injected into the operating system since our experience has shown that kernel injections typically led to a crash, led to a hang, or had no impact. Madeira et al. [18] used the same REE testbed to examine the impact of transient errors on LynxOS.
Definitions and Measurements
System, experiment, and run. We use the term system to refer to the REE cluster and associated software (i.e., the SIFT environment and applications). The system does not include the radiation-hardened SCC or communication channel to the ground. An error injection experiment targeted a specific process (application process, FTM, Execution ARMOR, or Heartbeat ARMOR) using a particular error model. For each process/error model pair, a series of runs was executed in which one or more errors were injected into the target process.
Activated errors and failures. An injection causes an error to be introduced into the system (e.g., corruption at a selected memory location or corruption of the value in a register). An error is said to be activated if program execution accesses the erroneous value. A failure refers to a process deviating from its expected (correct) behavior as determined by a run without fault injection. The application can also fail by producing output that falls outside acceptable tolerance limits as defined by an external application-provided verification program.
[Figure 2: timeline of one run. Events in order: user submits application job; set up the environment; application starts; application ends; ARMORs uninstalled; user notified of termination. The actual application execution time spans application start to application end; the perceived application execution time spans job submission to notification.]
Fig. 2. Perceived vs. Actual Execution Time
A system failure occurs when either (1) the application cannot complete within a predefined timeout or (2) the SIFT environment cannot recognize that the application has completed successfully. System failures require that the SCC reinitialize the SIFT environment before continuing, but they do not threaten the SCC or spacecraft integrity.¹
Recovery time. Recovery time is the interval between the time at which a failure is detected and the time at which the target process restarts. For ARMOR processes, this includes the time required to restore the ARMOR’s state from checkpoint. In the case of an application failure, the time lost to rolling back to the most recent application checkpoint is accounted for in the application’s total execution time, not in the recovery time for the application.
Perceived application execution time. The perceived execution time is the interval between the time at which the SCC submits an application for execution and the time at which the SIFT environment reports to the SCC that the application has completed.
Actual application execution time. The actual execution time is the interval between the start and the end of the application. The difference between perceived and actual execution time accounts for the time required to install the Execution ARMORs before running the application and the time required to uninstall the Execution ARMORs after the application completes (see Fig. 2). This is a fixed overhead independent of the actual application execution time.
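The relationship between the two execution times can be written down directly. The sketch below is illustrative: the `RunTimeline` class, its field names, and the timestamps are made up for the example and are not measurements from the experiments.

```python
from dataclasses import dataclass


@dataclass
class RunTimeline:
    """Timestamps (seconds) for one run; field names are illustrative."""
    submitted: float   # SCC submits the application
    app_start: float   # application starts, after Execution ARMORs are installed
    app_end: float     # application ends
    reported: float    # SIFT environment reports completion to the SCC

    @property
    def perceived(self) -> float:
        return self.reported - self.submitted

    @property
    def actual(self) -> float:
        return self.app_end - self.app_start

    @property
    def fixed_overhead(self) -> float:
        """Install plus uninstall time for the Execution ARMORs."""
        return self.perceived - self.actual


run = RunTimeline(submitted=0.0, app_start=1.0, app_end=76.0, reported=77.5)
print(run.perceived, run.actual, run.fixed_overhead)  # 77.5 75.0 2.5
```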
Baseline application execution time. In the injection experiments, the perceived and actual application execution times are compared to a baseline measurement in order to determine the performance overhead added by the SIFT environment and recovery. Two measures of baseline application performance are used: (1) the application executing without the SIFT environment and without fault injection and (2) the application executing in the SIFT environment but without fault injection. The difference between these two measures provides the overhead that the SIFT processes impose on the application. Table 1 shows that the SIFT environment adds less than two seconds to the perceived application execution time. The mean application execution time and recovery time are calculated for each fault model. Ninety-five percent confidence intervals (t-distribution) are also calculated for all measurements.
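A 95% confidence interval based on the t-distribution can be computed as sketched below. The sample values are made up, the function name `mean_ci95` is hypothetical, and the small table of critical values is an assumption of the example that covers only a few sample sizes; a statistics package would normally supply the quantile.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for a few degrees of freedom.
T_CRIT_95 = {4: 2.776, 9: 2.262, 29: 2.045, 99: 1.984}


def mean_ci95(samples):
    """Return (mean, half_width) of the 95% confidence interval."""
    n = len(samples)
    t_crit = T_CRIT_95[n - 1]          # KeyError if n - 1 is not tabulated
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return statistics.mean(samples), half_width


times_s = [75.2, 76.1, 75.9, 75.4, 75.9]   # made-up execution times
mean, half = mean_ci95(times_s)
print(f"{mean:.2f} +/- {half:.2f} s")      # 75.70 +/- 0.47 s
```

With 100 runs per target, the relevant entry is the one for 99 degrees of freedom.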
Table 1. Baseline Application Execution Time
                Perceived (s)    Actual (s)
Without SIFT    75.71 ± 0.65     75.71 ± 0.65
With SIFT       77.97 ± 0.48     75.74 ± 0.48
¹ While the vast majority of failures in the SIFT environment will not affect the trusted SCC, in reality there exists a nonzero probability that the SCC can be impacted by SIFT failures. We discount this possibility in the paper because there is not a full-fledged SCC available for conducting such an analysis.
2.3 Crash and Hang Failures
This section presents results from SIGINT and SIGSTOP injections into the application and SIFT processes, which were used to evaluate the SIFT environment’s ability to handle crash and hang failures. We first summarize the major findings from over 700 crash and hang injections:
• All errors injected into both the application and SIFT processes were recovered.
• Recovering from errors in SIFT processes imposed a mean overhead of 5% on the application’s actual execution time. This 5% overhead includes 25 cases out of roughly 700 runs in which the application was forced to block or restart because of the unavailability of a SIFT process. Neglecting those cases in which the application must redo lost computation, the overhead imposed by a recovering SIFT process was insignificant.
• Correlated failures involving a SIFT process and the application were observed. In 25 cases, crash and hang failures caused a SIFT process to become unavailable, prompting the application to fail when it did not receive a timely response from the failed SIFT process. All correlated failures were successfully recovered.
Results for 100 runs per target are summarized in Table 2. In some cases, the
injection time (used to determine when to inject the error) occurred after the
application completed. For these runs, no error was injected. The row “Baseline”
reports the application execution time with no fault injection. One hundred runs were
chosen in order to ensure that failures occurred throughout the various phases of an
application’s execution (including an idle SIFT environment before application
execution, application submission and initialization, application execution,
application termination, and subsequent cleanup of the SIFT environment).
Application Recovery
Hangs are the most expensive application failures in terms of lost processing time.
Application hangs are detected using a polling technique in which the Execution
ARMOR executes a thread that wakes up every 20 seconds to check the value of a
counter incremented by progress indicator messages sent by the application. Because
the counter is polled at fixed intervals, the error detection latency for hangs can be up
to twice the checking period.
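The polling scheme and its worst-case latency can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the Execution ARMOR's actual code: the function name `hang_detection_time` and the list of progress-message times are hypothetical, and time is modeled in whole seconds.

```python
def hang_detection_time(progress_times, period, horizon):
    """Return the first polling instant at which a hang is declared, or None.

    progress_times: times (s) at which the application sent progress messages
                    (hypothetical input; the last entry is when progress stopped).
    period:         polling period of the checking thread (20 s in the text).
    horizon:        last time (s) at which the simulation still polls.
    """
    counter_at_last_poll = 0
    t = period
    while t <= horizon:
        # Counter value seen by the polling thread at time t.
        counter = sum(1 for p in progress_times if p <= t)
        if counter == counter_at_last_poll:
            return t  # no progress since the previous poll: declare a hang
        counter_at_last_poll = counter
        t += period
    return None


# Progress stops at 21 s, just after the poll at 20 s: the poll at 40 s still
# sees a changed counter, so the hang is declared only at 60 s.
late = hang_detection_time([5, 15, 21], period=20, horizon=200)
# Progress stops at 39 s, just before the poll at 40 s: also declared at 60 s.
early = hang_detection_time([5, 15, 39], period=20, horizon=200)
print(late - 21, early - 39)  # detection latencies in seconds
```

In this model the latency ranges from just over one period to just under two, depending on where the last progress message falls relative to the polling instants.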
Table 2. SIGINT/SIGSTOP Injection Results

                                Successful   App. Exec. Time (s)              Recovery
Target               Failures   Recoveries   Perceived        Actual          Time (s)
SIGINT
  Baseline           -          -            74.78 ± 0.55     72.68 ± 0.49    -
  Application        100        100          89.80 ± 1.50     87.88 ± 1.50    0.48 ± 0.05
  FTM                81         81           79.60 ± 1.61     73.89 ± 0.25    0.64 ± 0.16
  Execution ARMOR    100        100          77.91 ± 1.01     75.98 ± 1.00    0.61 ± 0.07
  Heartbeat ARMOR    97         97           75.26 ± 0.92     74.39 ± 0.96    0.47 ± 0.12
SIGSTOP
  Baseline           -          -            71.96 ± 0.32     70.03 ± 0.27    -
  Application        84         84           112.21 ± 1.87    110.21 ± 1.87   0.47 ± 0.05
  FTM                97         97           76.20 ± 1.94     70.09 ± 0.88    0.79 ± 0.15
  Execution ARMOR    98         98           85.01 ± 4.41     82.21 ± 4.28    0.63 ± 0.15
  Heartbeat ARMOR    77         77           71.88 ± 0.24     70.24 ± 0.24    0.56 ± 0.21

SIFT Environment Recovery
FTM recovery. The perceived execution time for the application is extended if (1) the
FTM fails while setting up the environment before the application execution begins or
(2) the FTM fails while cleaning up the environment and notifying the Spacecraft
Control Computer that the application terminated. The application is decoupled from
the FTM’s execution after starting, so failures in the FTM do not affect it. The only
overhead in actual execution time originates from the network contention during the
FTM’s recovery, which lasts for only 0.6-0.7 s.
An FTM-application correlated failure. The error injections also revealed a
correlated failure in which the FTM failure caused the application to restart in 2 of the
178 runs (see [32] for a description of correlated failure scenarios).
The SIFT environment is able to recover from this correlated failure because the
components performing the detection (Heartbeat ARMOR detecting FTM failures and
Execution ARMOR detecting application failures) are not affected by the failures.
Execution ARMOR. Of the 198 crash/hang errors injected into the Execution
ARMORs, 175 required recovery only in the Execution ARMOR. For these runs, the
application execution overhead was negligible. The overhead reported in Table 2 (up
to 10% for hang failures) resulted from the remaining 23 cases in which the
application was forced to restart.
An Execution ARMOR-application correlated failure. If the application process
attempted to contact the Execution ARMOR (e.g., to send progress indicator updates
or to notify the Execution ARMOR that it is terminating normally) while the ARMOR
was recovering, the application process blocked until the Execution ARMOR
completely recovered. Because the MPI processes are tightly coupled, a correlated
failure is possible if the Execution ARMOR overseeing the other MPI process
diagnosed the blocking as an application hang and initiated recovery.
This correlated failure occurred most often when the Execution ARMOR hung
(i.e., due to SIGSTOP injections): 22 correlated failures were due to SIGSTOP
injections as opposed to 1 correlated failure resulting from an ARMOR crash (i.e.,
due to SIGINT injections). This is because an Execution ARMOR crash failure is
detected immediately by the daemon through operating system calls, making the
Execution ARMOR unavailable for only a short time. Hangs, however, are detected
via a 10-second heartbeat.
2.4 Register and Text-Segment Injections
This section expands the scope of the injections to further stress the detection and
recovery mechanisms by allowing for the possibility of checkpoint corruption and
error propagation to another process. Results from approximately 9,000 single-bit
errors injected into the register set and text segment of the application and SIFT
processes show that:
• Most register and text-segment errors led to crash and hang failures that were
  recovered by the SIFT environment.
• Eleven of the approximately 700 observed failures led to system failures in which
  either the application did not complete or the SIFT environment did not detect that
  the application successfully completed. These 11 system failures resulted from
  injected errors that corrupted an ARMOR’s checkpoint or propagated outside the
  injected process.
• Text-segment errors were more likely than register errors to lead to system failures.
  This was because values in registers typically had a shorter lifetime (i.e., they were
  either never used or quickly overwritten) when compared to information stored in
  the text segment.
Table 3 summarizes the results of approximately 6,000 register injections and
3,000 text-segment injections into both the application and ARMOR processes.
Failures are classified into four categories: segmentation faults, illegal instructions,
hangs, and errors detected via assertions. The second column in Table 3 gives the
number of successful recoveries vs. the number of failures for each set of
experiments. Errors that were not successfully recovered led to system failures (4 due
to FTM failures, 5 due to Execution ARMOR failures, and 2 due to Heartbeat
ARMOR failures).
FTM recovery. Table 3 shows that the FTM successfully recovered from all
register injections. Two text-segment injections were detected through assertions on
the FTM’s internal data structures, and both of these errors were recovered.
Table 3 also shows that the FTM could not recover from four text-segment errors.
In each case, the error corrupted the FTM’s checkpoint prior to crashing. Because the
checkpoint was corrupted, the FTM crashed shortly after being recovered. This cycle
of failure and recovery repeated until the run timed out.
There were seven cases of a correlated failure in which the FTM failed during the
application’s initialization: three from text-segment injections and four from register
injections. Both the FTM and the application recovered from all seven correlated
failures.
Table 3. Register and Text-Segment Injection Results

                     Recoveries/   Failure Classification               App. Exec. Time (s)            Recovery
Target               Failures      Seg. fault  Illegal instr.  Hang  Assertion   Perceived       Actual         Time (s)
Baseline             -             -           -               -     -           71.96 ± 0.32    70.03 ± 0.27   -
Register Injections
  Application        95 / 95       71          4               20    0           90.70 ± 2.57    88.81 ± 2.57   0.70 ± 0.21
  FTM                84 / 84       58          6               16    4           75.65 ± 1.54    73.42 ± 1.28   0.71 ± 0.03
  Execution ARMOR    77 / 80       56          6               15    3           76.19 ± 1.82    73.56 ± 1.83   0.45 ± 0.08
  Heartbeat ARMOR    77 / 77       62          6               8     1           73.00 ± 0.22    70.66 ± 0.21   0.31 ± 0.04
Text-segment Injections
  Application        82 / 82       41          23              18    0           89.47 ± 2.87    87.49 ± 2.88   1.05 ± 0.33
  FTM                84 / 88       53          28              5     2           76.47 ± 2.87    71.00 ± 2.31   0.51 ± 0.05
  Execution ARMOR    93 / 95       45          31              11    8           77.48 ± 1.93    74.83 ± 1.86   0.43 ± 0.04
  Heartbeat ARMOR    95 / 97       53          33              11    0           73.23 ± 0.37    71.21 ± 0.36   0.30 ± 0.01

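The system-failure counts quoted in the text can be cross-checked against the recoveries/failures column of Table 3. The snippet below is only an illustrative tally; the dictionary layout and the helper name `unrecovered_by_target` are assumptions made for this sketch, while the numbers are transcribed from the table.

```python
# (successful recoveries, failures) per target, transcribed from Table 3.
TABLE3 = {
    "register": {
        "Application": (95, 95),
        "FTM": (84, 84),
        "Execution ARMOR": (77, 80),
        "Heartbeat ARMOR": (77, 77),
    },
    "text-segment": {
        "Application": (82, 82),
        "FTM": (84, 88),
        "Execution ARMOR": (93, 95),
        "Heartbeat ARMOR": (95, 97),
    },
}


def unrecovered_by_target(table):
    """Sum failures that were not recovered, per target, over both injection types."""
    totals = {}
    for rows in table.values():
        for target, (recovered, failures) in rows.items():
            totals[target] = totals.get(target, 0) + (failures - recovered)
    return totals


total_failures = sum(f for rows in TABLE3.values() for _, f in rows.values())
system_failures = unrecovered_by_target(TABLE3)
print(total_failures, system_failures, sum(system_failures.values()))
```

The tally gives 698 observed failures (the "approximately 700" of the text) and 11 unrecovered failures, split 4, 5, and 2 among the FTM, the Execution ARMOR, and the Heartbeat ARMOR.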
Execution ARMOR recovery. Three register injections and two text-segment
injections into the Execution ARMOR led to system failure. In each of these cases,
the error propagated to other ARMOR processes or to the Execution ARMOR’s
checkpoint.
One text-segment injection and three register injections caused errors in the
Execution ARMOR to propagate to the FTM (i.e., the error was not fail-silent).
Although the Execution ARMOR did not crash, it sent corrupted data to the FTM
when the application terminated, causing the FTM to crash. The FTM state in its
checkpoint was not affected by the error, so the FTM was able to recover to a valid
state. Because the FTM did not complete processing the Execution ARMOR’s
notification message, the FTM did not send an acknowledgment back to the
Execution ARMOR. The missing acknowledgment prompted the Execution ARMOR
to resend the faulty message, which again caused the FTM to crash. This cycle of
recovery followed by the retransmission of faulty data continued until the run timed
out.
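The retransmission cycle described above can be captured in a toy model. This is a sketch under stated assumptions rather than the ARMOR protocol itself: the function name and all timing parameters (recovery time, resend interval, run timeout, in milliseconds) are hypothetical values chosen only to make the loop concrete.

```python
def cycles_until_timeout(message_corrupt, recovery_ms, resend_ms, run_timeout_ms):
    """Count crash/recover/resend cycles before the run times out.

    A corrupt notification crashes the receiver every time it is processed;
    the receiver rolls back to its (clean) checkpoint and sends no
    acknowledgment, so the sender retransmits the same faulty message.
    """
    elapsed = 0
    cycles = 0
    while elapsed < run_timeout_ms:
        if not message_corrupt:
            return ("acknowledged", cycles)
        # Receiver crashes, recovers, and the sender waits before resending.
        elapsed += recovery_ms + resend_ms
        cycles += 1
    return ("timed out", cycles)


print(cycles_until_timeout(True, 500, 2000, 10000))
print(cycles_until_timeout(False, 500, 2000, 10000))
```

Because the faulty message itself is never repaired, a clean checkpoint on the receiving side does not break the cycle in this model; only the run timeout ends it.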
One of the text-segment injections caused the Execution ARMOR to save a
corrupted checkpoint before crashing. When the ARMOR recovered, it restored its
state from the faulty checkpoint and crashed shortly thereafter. This cycle repeated
until the run timed out.
In addition to the system failures described above, three text-segment injections
into the Execution ARMOR resulted in the restarting of the texture analysis
application. All three of these correlated failures were successfully recovered.
Heartbeat ARMOR recovery. The Heartbeat ARMOR recovered from all register
errors, while text-segment injections brought about two system failures. Although no
corrupted state escaped the Heartbeat ARMOR, the error prevented the Heartbeat
ARMOR from receiving incoming messages. Thus, the Heartbeat ARMOR falsely
detected that the FTM had failed, since it did not receive a heartbeat reply from the
FTM. The ARMOR then began to initiate recovery of the FTM by (1) instructing the
FTM’s daemon to reinstall the FTM process, and (2) instructing the FTM to restore its
state from checkpoint after receiving acknowledgment that the FTM has been
successfully reinstalled.
Among the successful recoveries from text-segment errors shown in Table 3, four
involved corrupted heartbeat messages that caused the FTM to fail. Although faulty
data escaped the Heartbeat ARMOR, the corrupted message did not compromise the
FTM’s checkpoint. Thus, the FTM was able to recover from these four failures.
2.5 Heap Injections
Careful examination of the register injection experiments showed that crash failures
were most often caused by segmentation faults raised from dereferencing a corrupted
pointer. To maximize the chances for error propagation, errors were injected only into
data (not pointers) on the heap. Results from targeted injections into FTM heap memory
were grouped by the element into which the error was injected. Table 4 shows the number
of system failures observed from 100 error injections per element, classified as to
their effect on the system. One hundred targeted injections were sufficient to observe
either escaped or detected errors given the amount of state in each element; overall,
500 heap injections were conducted on the FTM.
Table 4. System Failures Observed Through Heap Injections

Legend (Effect on system): (A) unable to register daemons, (B) unable to install Execution ARMORs,
(C) unable to start applications, (D) unable to uninstall Execution ARMORs after application completes.
Legend (System failure/assertion check classification): (2) system failure without assertion firing,
(3) system failure with assertion firing, (4) successful recoveries after assertion fired.

                     Effect on System              System Failures
Element              A     B     C     D    Total    #2     #3         #4
mgr_armor_info       4     1     5     4    14       6      8          19
exec_armor_info      0     0     5     4    9        4      5          9
app_param            0     0     0     0    0        0      0          2
mgr_app_detect       0     0     0     0    0        0      0          4
node_mgmt            0     14    0     0    14       0      14         3
TOTAL                4     15    10    8    37       10     27         37

Element descriptions:
mgr_armor_info. Stores information about subordinate ARMORs such as location and element composition.
exec_armor_info. Stores information about each Execution ARMOR such as status of subordinate application.
app_param. Stores information about application such as executable name, command-line arguments,
and number of times application restarted.
mgr_app_detect. Used to detect that all processes for MPI application have terminated and to initiate
recovery if necessary.
node_mgmt. Stores information about the nodes, including the resident daemon and hostname.

Many data errors were detectable through assertions within the FTM, but not all
assertions were effective in preventing system failures. One of four scenarios resulted
after a data error was injected (the last three columns in Table 4 are numbered to refer
to scenarios 2-4):
1. The data error was not detected by an assertion and had no effect on the system.
   The application completed successfully as if there were no error.
2. The data error was not detected by an assertion but led to a system failure. None of
   the system failures impacted the application while it was executing.
3. The data error was detected by an assertion check, but only after the error had
   propagated to the FTM’s checkpoint or to another process. Rolling back the
   FTM’s state in these circumstances was ineffective, and system failures resulted
   from which the SIFT environment could not recover. These cases show that error
   latency is a factor when attempting to recover from errors in a distributed
   environment.
4. The data error was detected by an assertion check before propagating to the FTM’s
   checkpoint or to another process. After an assertion fired, the FTM killed itself
   and recovered as if it had experienced an ordinary crash failure.
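The four scenarios amount to a two-by-two classification on whether an assertion fired and whether a system failure resulted. A minimal sketch of that mapping follows; the function name `classify_scenario` is hypothetical and is not part of the SIFT environment.

```python
def classify_scenario(assertion_fired: bool, system_failure: bool) -> int:
    """Map one heap-injection outcome to scenario 1-4 as numbered in the text."""
    if not assertion_fired:
        # Undetected error: either benign (1) or an undetected system failure (2).
        return 2 if system_failure else 1
    # Detected error: detected too late (3) or detected in time and recovered (4).
    return 3 if system_failure else 4


for fired in (False, True):
    for failed in (False, True):
        print(fired, failed, classify_scenario(fired, failed))
```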
The injection results in Table 4 show that the least sensitive elements (app_param
and mgr_app_detect) were those modules whose state was substantially read-only
after being written early within the run. With assertions in place, none of the data
errors in these two elements led to system failures. At the other end of the sensitivity
spectrum, 28 errors in two elements caused system failures. In contrast with the
elements causing no system failures, the data in mgr_armor_info and node_mgmt were
repeatedly written during the initialization phases of a run.
Table 4 also shows the efficiency of assertion checks in preventing system failures.
The rightmost two columns in the table represent the total number of runs in which
assertions detected errors. For example, assertions in the mgr_armor_info element
detected 27 errors, and 19 of those errors were successfully recovered. The data also
show that assertions coupled with the incremental microcheckpointing were able to
prevent system failures in 58% of the cases (37 of 64 runs in which assertions fired).
On the other hand, assertions detected the error too late to prevent system failures
in 27 cases. For example, 14 of the 17 runs in which assertions detected errors in the
node_mgmt element resulted in system failures. This problem was rectified by adding
checks to the translation results before sending the message.
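The assertion-effectiveness figures can be recomputed from the last three columns of Table 4. The snippet below is an illustrative tally only; the dictionary layout and variable names are assumptions of this sketch, and the counts are transcribed from the table.

```python
# Per-element counts from Table 4: (#2 failure without assertion,
# #3 failure with assertion, #4 recovery after assertion).
TABLE4 = {
    "mgr_armor_info": (6, 8, 19),
    "exec_armor_info": (4, 5, 9),
    "app_param": (0, 0, 2),
    "mgr_app_detect": (0, 0, 4),
    "node_mgmt": (0, 14, 3),
}

fired = sum(s3 + s4 for _, s3, s4 in TABLE4.values())   # runs in which an assertion fired
recovered = sum(s4 for _, _, s4 in TABLE4.values())     # assertion fired in time
too_late = sum(s3 for _, s3, _ in TABLE4.values())      # assertion fired too late
coverage = recovered / fired

print(f"assertions fired in {fired} runs; {recovered} recovered ({coverage:.0%}); {too_late} too late")
```

With these counts, assertions fired in 64 runs, of which 37 were recovered (about 58%) and 27 still ended in system failure.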
2.6 Lessons Learned
SIFT overhead should be kept small. System designers must be aware that SIFT
solutions have the potential to degrade the performance and even the dependability of
the applications they are intended to protect. Our experiments show that the
functionality in SIFT can be distributed among several processes throughout the
network so that the overhead imposed by the SIFT processes is insignificant while the
application is running.
SIFT recovery time should be kept small. Minimizing the SIFT process recovery
time is desirable from two standpoints: (1) recovering SIFT processes have the
potential to affect application performance by contending for processor and network
resources, and (2) applications requiring support from the SIFT environment are
affected when SIFT processes become unavailable. Our results indicate that fully
recovering a SIFT process takes approximately 0.5 s. The mean overhead as seen by
the application from SIFT recovery is less than 5%, which takes into account 10 out
of roughly 800 failures from register, text-segment and heap injections that caused the
application to block or restart because of the unavailability of a SIFT process. The
overhead from recovery is insignificant when these 10 cases are neglected.
SIFT/application interface should be kept simple. In any multiprocess SIFT
design, some SIFT processes must be coupled to the application in order to provide
error detection and recovery. The Execution ARMORs play this role in our SIFT
environment. Because of this dependency, it is important to make the Execution
ARMORs as simple as possible. All recovery actions and those operations that affect
the global system (e.g., job submission and detecting remote node failures) are
delegated to a remote SIFT process that is decoupled from the application’s
execution. This strategy appears to work, as only 5 of 373 observed Execution
ARMOR failures led to system failures.
SIFT availability impacts the application. Low recovery time and aggressive
checkpointing of the SIFT processes help minimize the SIFT environment downtime,
making the environment available for processing application requests and for
recovering from application failures.
System failures are not necessarily fatal. Only 11 of the 10,000 injections resulted
in a system failure in which the SIFT environment could not recover from the error.
These system failures did not affect an executing application.
3 Error and Failure Analysis of a LAN of Windows NT-Based Servers
Direct monitoring, recording, and analysis of naturally occurring errors and failures in
the system can provide valuable information on actual error/failure behavior, identify
system bottlenecks, quantify dependability measures, and verify assumptions made in
analytical models. In this section we provide an example of system dependability
analysis using failure data collected from a Local Area Network (LAN) of Windows
NT servers.
In most commercial systems, information about failures can be obtained from the
manual logs maintained by administrators or from the automated event-logging
mechanisms in the underlying operating system. Manual logs are very subjective and
often unavailable. Hence they are not typically suited for automated analysis of
failures. In contrast, the event logs maintained by the system have predefined formats,
provide contextual information in case of failures (e.g., a trace of significant events
that precede a failure), and are thus conducive to automated analysis. Moreover, as
failures are relatively rare events, it is necessary to meticulously collect and analyze
error data for many machine-months for the results of the data analysis to be
statistically valid. Such regular and prolonged data acquisition is possible only
through automated event logging. Hence most studies of failures in single and
networked computer systems are based on the error logs maintained by the operating
system running on those machines.
This section presents the methodology and results of an analysis of failures found in
a network of about 70 Windows NT-based mail servers (running Microsoft Exchange
software). The data for the study are obtained from event logs (i.e., logs of machine
events that are maintained and modified by the Windows NT operating system)
collected over a six-month period from the mail routing network of a commercial
organization. In this study we analyze only machine reboots because they constitute a
significant portion of all logged failure data and are the most severe type of failure.
As a starting point, a preliminary data analysis is conducted to classify the nature of
observed failure events. This failure categorization is then used to examine the
behavior of individual machines in detail and to derive a finite state model. The model
depicts the behavior of a typical machine. Finally, a domain-wide analysis is
performed to capture the behavior of the domain in a finite state model. The reader
can find the thorough failure data analysis in [12].
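As a rough illustration of the first processing step, the sketch below pulls reboot events out of a merged event list and computes the time between consecutive reboots per machine. It is a minimal sketch under stated assumptions: the record layout `(machine, timestamp, kind)`, the machine names, the dates, and the `"reboot"` label are all hypothetical and do not reflect the Windows NT event log format or the tooling used in the study.

```python
from datetime import datetime


def hours_between_reboots(events):
    """Return {machine: [hours between consecutive reboots]}.

    events: iterable of (machine, timestamp, kind) tuples (hypothetical layout);
            only records with kind == "reboot" are used.
    """
    reboots = {}
    for machine, ts, kind in sorted(events, key=lambda e: (e[0], e[1])):
        if kind == "reboot":
            reboots.setdefault(machine, []).append(ts)
    return {
        machine: [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
        for machine, times in reboots.items()
    }


# Made-up sample records, deliberately out of order.
sample = [
    ("mail01", datetime(2000, 1, 2, 13, 0), "reboot"),
    ("mail01", datetime(2000, 1, 1, 0, 0), "reboot"),
    ("mail01", datetime(2000, 1, 1, 8, 0), "disk warning"),
    ("mail01", datetime(2000, 1, 2, 12, 0), "reboot"),
    ("mail02", datetime(2000, 1, 1, 6, 0), "reboot"),
]
intervals = hours_between_reboots(sample)
print(intervals)
```

Per-machine interval lists of this kind are one plausible input for classifying reboots and estimating the state-holding times of a finite state model.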
Related Work. Analysis of failures in computer systems has been the focus of
active research for quite some time. Studies of failures occurring in commercial
systems (e.g., VAX/VMS, Tandem/GUARDIAN) are based primarily on failure data
collected from the field. The focus of such studies is on categorizing the nature of
failures in the systems (e.g., software failures, hardware failures), identifying
availability bottlenecks, and obtaining models to estimate the availability of the
systems being analyzed. Lee [15], [16] analyzed failures in Tandem’s GUARDIAN
operating system. Tang [25] analyzed error logs pertaining to a multicomputer
environment based on a VAX/VMS cluster. Thakur [27] presented an analysis of
failures in the Tandem Nonstop-UX operating system.
Hsueh [9] explored errors and recovery in IBM’s MVS operating system. Based on
the error logs collected from MVS systems, a semi-Markov model of multiple errors
(i.e., errors that manifest themselves in multiple ways) was constructed to analyze
system failure behavior. Measurement-based software reliability models were also
presented in [15], [16] (for the GUARDIAN system) and [25], [26] (for the VAX
cluster).
The impact of workload on system failures was also extensively studied. Castillo
[6] developed a software reliability prediction model that took into account the
workload imposed on the system. Iyer [11] examined the effect of workload on the
reliability of the IBM 3081 operating system. Mourad [21] performed a reliability
study on the IBM MVS/XA operating system and found that the error distribution is
heavily dependent on the type of system utilization. Meyer [20] presented an analysis
of the influence of workload on the dependability of computer systems.
Lin [17] and Tsao [28] focused on trend analysis in error logs. Gray [8] presented
results from a census of Tandem systems. Chillarege [7] presented a study of the
impact of failures on customers and the fault lifetimes. Sullivan [23], [24] examined
software defects occurring in operating systems and databases (based on field data).
Velardi [29] examined failures and recovery in the MVS operating system.
3.1 Error Logging in Windows NT
The Windows NT operating system offers capabilities for error logging. This software records information on errors occurring in the various subsystems, such as memory, disk, and network subsystems, as well as other system events, such as reboots and shutdowns. The reports usually include information on the location, time, and type of the error, the system state at the time of the error, and sometimes error recovery (e.g., retry) information. The main advantage of on-line automatic logging is its ability to record a large amount of information about transient errors and to provide details of automatic error recovery processes, which cannot be done manually. The disadvantages are that an on-line log does not usually include information about the cause and propagation of the error or about off-line diagnosis. Also, under some crash scenarios, the system may fail too quickly for any error messages to be recorded.
An important question to be asked here is: How accurate are event logs in characterizing the failure behavior of the system? While event logs provide valuable insight into the nature and dynamics of typical problems observed in a network system, in many cases the information in event logs is not sufficient to precisely determine the nature of a problem (e.g., whether it was a software or hardware component failure). The only reliable way to improve the accuracy of logs is (1) to perform more frequent, detailed logging by each component and (2) to instrument the Windows NT code with new (more precise) logging mechanisms. However, there is always a trade-off between accuracy and intrusiveness of measurements. No commercial organization will permit someone to install an untested tool to monitor the network. Consequently, we use existing logs not only to characterize failure behavior of the network (presented in this paper), but also to determine how the logging system could be improved (e.g., by adding to the operating system a query mechanism to remotely probe system components about their status). It should be noted that in many commercial operating systems (e.g., MVS) event logs are accurate enough to document failures.
3.2 Classification of Data Collected from a LAN of Windows NT-Based Servers
The initial breakup of the data on a system reboot is primarily based on the events that preceded the current reboot by no more than an hour (and that occurred after the previous reboot). For each instance of a reboot, the most severe and frequently occurring events (hereafter referred to as prominent events) are identified. The corresponding reboot is then categorized based on the source and the id of these prominent events. In some cases, the prominent events are specific enough to identify the problem that caused the reboot. In other cases, only a high-level description of the problem can be obtained based on the knowledge of the prominent events. Table 5 shows the breakup of the reboots by category.
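The selection of prominent events described above might be sketched as follows. This is only an illustration under stated assumptions: the tuple layout, the function name `prominent_events`, and the rule for combining "most severe" with "most frequent" (take the most severe level present, then the most frequent source/id pairs at that level) are not taken from the study, and a lower severity number is assumed to denote a more severe event.

```python
from collections import Counter
from datetime import datetime, timedelta

def prominent_events(events, prev_reboot, reboot, window=timedelta(hours=1)):
    """Pick the (source, event_id) pairs that characterize one reboot.

    events: iterable of (timestamp, source, event_id, severity) tuples.
    Assumption: a lower severity number means a more severe event.
    """
    # Only events after the previous reboot and at most one hour before this one.
    start = max(prev_reboot, reboot - window)
    candidates = [e for e in events if start < e[0] < reboot]
    if not candidates:
        return []  # nothing logged: the reboot cannot be categorized this way
    worst = min(e[3] for e in candidates)
    counts = Counter((e[1], e[2]) for e in candidates if e[3] == worst)
    top = max(counts.values())
    return sorted(key for key, n in counts.items() if n == top)
```

The returned source/id pairs would then be looked up in a hand-built table of categories, which is where most of the manual judgment in such a classification lies.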
Hardware or firmware related problems: This category includes events that indicate a problem with hardware components (network adapter, disk, etc.), their associated drivers (typically drivers failing to load because of a problem with the device), or some firmware (e.g., some events indicated that the Power On Self Test had failed).
Connectivity problems: This category denotes events that indicated that either a system component (e.g., redirector, server) or a critical application (e.g., MS Exchange System Attendant) could not retrieve information from a remote machine. In these scenarios, it is not possible to pinpoint the actual cause of the connectivity problem. Some of the connectivity failures result from network adapter problems and hence are categorized as hardware related.
Table 5. Breakup of Reboots Based on Prominent Events
Category                                                    Frequency   Percentage
Total reboots                                                    1100          100
Hardware or firmware problems                                     105            9
Connectivity problems                                             241           22
Crucial application failures                                      152           14
Problems with a software component                                 42            4
Normal shutdowns                                                   63            6
Normal reboots/power-off (no indication of any problems)          178           16
Unknown                                                           319           29
Crucial application failure: This category encompasses reboots that are preceded by severe problems with, and possibly shutdown of, critical application software (such as Message Transfer Agent). In such cases, it was not clear why the application reported problems. If an application shutdown occurs as a result of a connectivity problem, then the corresponding reboot is categorized as connectivity-related.
Problems with a software component: Typically these reboots are characterized by startup problems (such as a critical system component not loading or a driver entry point not being found). Another significant type of problem in this category is the machine running out of virtual memory, possibly due to a memory leak in a software component. In many of these cases, the component causing the problem is not identifiable.
Normal shutdowns: This category covers reboots that are not preceded by warnings or error messages. Additionally, there are events that indicate shutting down of critical application software and some system components (e.g., the BROWSER). These represent shutdowns for maintenance or for correcting problems not captured in the event logs.
Normal reboots/power-off: This category covers reboots that are typically not preceded by shutdown events, but do not appear to be caused by any problems either. No warnings or error messages appear in the event log before the reboot.
Based on data in Table 5, the following observations can be made about the failures:
1. 29% of the reboots cannot be categorized. Such reboots are indeed preceded by events of severity 2 or lower, but there is not enough information available to decide (a) whether the events were severe enough to force a reboot of the machine or (b) the nature of the problem that the events reflect.
2. A significant percentage (22%) of the reboots have reported connectivity problems. Connectivity problems suggest that there could be propagated failures in the domain. Furthermore, it is possible that the machines functioning as the master browser and as the Primary Domain Controller (PDC) (see footnote 2) are potential reliability bottlenecks of the domain.
3. Only a small percentage (about 10%) of the reboots can be traced to a system hardware component. Most of the identifiable problems are software related.
4. Nearly 50% of the reboots are abnormal reboots (i.e., the reboots were due to a problem with the machine rather than due to a normal shutdown).
5. In nearly 15% of the cases, severe problems with a crucial mail server application force a reboot of the machine.
3.3 Analysis of Failure Behavior of Individual Machines
After the preliminary investigation of the causes of failures, we probe failures from the perspective of an individual machine as well as the whole network. First we focus on the failure behavior of individual machines in the domain to obtain (1) estimates of machine uptimes and downtimes, (2) an estimate of the availability of each machine, and (3) a finite state model to describe the failure behavior of a typical machine in the domain. Machine uptimes and downtimes are estimated as follows:
- For every reboot event encountered, the timestamp of the reboot is recorded.
- The timestamp of the event immediately preceding the reboot is also recorded. (This would be the last event logged by the machine before it goes down.)
- A smoothing factor of one hour is applied to the reboots (i.e., of multiple reboots that occurred within a period of one hour, all except the last one are disregarded).
- Each uptime estimate is generated by calculating the time difference between a reboot timestamp and the timestamp of the event preceding the next reboot.
- Each downtime estimate is obtained by calculating the time difference between a reboot timestamp and the timestamp of the event preceding it.
Footnote 2: In the analyzed network, the machines belonged to a common Windows NT domain. One of the machines was configured as the Primary Domain Controller (PDC). The rest of the machines functioned as Backup Domain Controllers (BDCs).
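The uptime and downtime estimation steps listed above can be sketched as follows. This is a minimal illustration, not the tooling used in the study: the function names and input layout are hypothetical, "reboots within one hour" is read as consecutive reboots at most one hour apart, and each retained reboot is assumed to keep its own preceding event.

```python
from datetime import datetime, timedelta

def smooth_reboots(reboots, window=timedelta(hours=1)):
    """Of reboots that follow each other within `window`, keep only the last.

    reboots: list of (reboot_time, last_event_before_reboot) pairs,
    sorted by reboot_time.
    """
    kept = []
    for i, r in enumerate(reboots):
        is_last = i + 1 == len(reboots)
        if is_last or reboots[i + 1][0] - r[0] > window:
            kept.append(r)
    return kept

def up_and_down_times(reboots):
    kept = smooth_reboots(reboots)
    # Downtime: from the last event logged before going down to the reboot.
    downtimes = [reboot - last_event for reboot, last_event in kept]
    # Uptime: from a reboot to the last event logged before the next reboot.
    uptimes = [kept[i + 1][1] - kept[i][0] for i in range(len(kept) - 1)]
    return uptimes, downtimes
```

Note that the downtime obtained this way is an upper bound on the true outage, since the machine may have continued to run for some time after logging its last event.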
Machine uptimes and machine downtimes are presented in Table 6. As the standard deviation suggests, there is a great degree of variation in the machine uptimes. The longest uptime was nearly three months. The average is skewed because of some of the longer uptimes. The median is more representative of the typical uptime.
Table 6. Machine Uptime & Downtime Statistics
Item                  Machine Uptime Statistics   Machine Downtime Statistics
Number of entries     616                         682
Maximum               85.2 days                   15.76 days
Minimum               1 hour                      1 second
Average               11.82 days                  1.97 hours
Median                5.54 days                   11.43 minutes
Standard Deviation    15.656 days                 15.86 hours
As the table shows, 50% of the downtimes last about 12 minutes or less. This is probably too short a period to replace hardware components and reconfigure the machine. The implication is that the majority of the problems are software related (memory leaks, misloaded drivers, application errors, etc.). The maximum value is unrealistic and might have been due to the machine being temporarily taken off-line and put back in after a fortnight.
Since the machines under consideration are dedicated mail servers, bringing down one or more of them would potentially disrupt storage, forwarding, reception, and delivery of mail. The disruption can be prevented if explicit rerouting is performed to avoid the machines that are down. But it is not clear if such rerouting was done or can be done. In this context, the following observations would be causes for concern: (1) the average downtime measured was nearly 2 hours and (2) 50% of the measured uptime samples were about 5.5 days or less.
Availability
Having estimated machine uptime and downtime, we can estimate the availability of each machine. The availability is evaluated as the ratio:
[<average uptime> / (<average uptime> + <average downtime>)] * 100
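As a worked illustration of this ratio, the sketch below plugs in the pooled averages from Table 6 (11.82 days of uptime, 1.97 hours of downtime). The function name is hypothetical, and the pooled figure is only indicative: the study evaluates the ratio per machine, so it need not coincide with the average of the per-machine values.

```python
def availability_percent(avg_uptime_hours, avg_downtime_hours):
    """Availability as a percentage, from average uptime and downtime."""
    return 100.0 * avg_uptime_hours / (avg_uptime_hours + avg_downtime_hours)

# Pooled averages from Table 6, both converted to hours.
pooled = availability_percent(11.82 * 24, 1.97)
print(round(pooled, 2))  # about 99.31
```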
Table 7 summarizes the availability measurements. As the table depicts, the majority of the machines have an availability of 99.7% or higher. Also, there is not a large variation among the individual values. This is surprising considering the rather large degree of variation in the average uptimes. It follows that machines with smaller average uptimes also had correspondingly smaller average downtimes, so that the ratios are not very different. Hence, the domain has two types of machines: those that reboot often but recover quickly and those that stay up relatively longer but take longer to recover from a failure.
Table 7. Machine Availability
Item                  Value
Number of machines    66
Maximum               99.99%
Minimum               89.39%
Median                99.76%
Average               99.35%
Standard Deviation    1.52%
Fig. 3 shows the unavailability distribution across the machines (unavailability was evaluated as: 100 - Availability). Less than 20% of the machines had an availability of 99.9% or higher. However, nearly 90% of the machines had an availability of 99% or higher. It should be noted that these numbers indicate the fraction of time the machine is alive. They do not necessarily indicate the ability of the machine to provide useful service because the machine could be alive but still unable to provide the service expected of it. To elaborate, each of the machines in the domain acts as a mail server for a set of user machines. Hence, if any of these mail servers has problems that prevent it from receiving, storing, forwarding, or delivering mail, then that server would effectively be unavailable to the user machines even though it is up and running. Hence, to obtain a better estimate of machine availability, it is necessary to examine how long the machine is actually able to provide service to user machines.
Fig. 3. Unavailability Distribution
Modeling Machine Behavior
To obtain more accurate estimates of machine availability, we modeled the behavior of a typical machine in terms of a finite state model. The model was based on the events that each machine logs. In the model, each state represents a level of functionality of the machine. A machine is either in a fully functional state, in which it logs events that indicate normal activity, or in a partially functional state, in which it logs events that indicate problems of a specific nature.
Selection and assignment of states to a machine was performed as follows. The logs were split into time windows of one hour each. For each such window, the machine was assigned a state, which it occupied throughout the duration of the window. The assignment was based on the events that the machine logged in the window. Table 8 describes the states identified for the model.
Table 8. Machine States
State Name              Main Events (id/source/severity)   Explanation
Reboot                  6005/EventLog/4                    Machine logs reboot and other initialization events
Functional              5715/NETLOGON/4                    Machine logs successful communication with PDC
                        1016/MSExchangeIS Private/8
Connectivity problems   3096/NETLOGON/1                    Problems locating the PDC
                        5719/NETLOGON/1
Startup problems        7000/Service Control Manager/1     Some system component or application failed to start up
                        7001/Service Control Manager/1
MTA problems            2206/MSExchangeMTA/2               Message Transfer Agent has problems with some internal databases
                        2207/MSExchangeMTA/2
Adapter problems        4105/CpqNF3/1                      The NetFlex Adapter driver reports problems
                        4106/CpqNF3/1
Temporary MTA problems  9322/MSExchangeMTA/4               Message Transfer Agent reports problems of a temporary (or less severe) nature
                        9277/MSExchangeMTA/2
                        3175/MSExchangeMTA/2
                        1209/MSExchangeMTA/2
Server problems         2006/Srv/1                         Server component reports having received badly formatted requests
BROWSER problems        8021/BROWSER/2                     Browser reports inability to contact the master browser
                        8032/BROWSER/1
Disk problems           11/Cpq32fs2/1                      Disk drivers report problems
                        5/Cpq32fs2/1
                        9/Cpqarray/1
                        11/Cpqarray/1
Tape problems           15/dlttape/1                       Tape driver reports problems
Snmpelea problems       3006/Snmpelea/1                    Snmp event log agent reports error while reading an event log record
Shutdown                8033/BROWSER/4                     Application/machine shutdown in progress
                        1003/MSExchangeSA/4
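The assignment of one state per one-hour window could be sketched as below, using a small excerpt of the event-to-state mapping in Table 8. The text does not say how a window containing events of several kinds is resolved, so the precedence used here (Reboot first, then the most frequent problem state, then Functional) is purely an assumption for illustration, as are the names `STATE_OF_EVENT` and `state_for_window`.

```python
from collections import Counter

# (event id, source) -> state name; a small excerpt of Table 8.
STATE_OF_EVENT = {
    (6005, "EventLog"): "Reboot",
    (5715, "NETLOGON"): "Functional",
    (5719, "NETLOGON"): "Connectivity problems",
    (7000, "Service Control Manager"): "Startup problems",
    (8021, "BROWSER"): "BROWSER problems",
}

def state_for_window(events):
    """events: list of (event_id, source) pairs logged in one one-hour window."""
    states = Counter(STATE_OF_EVENT[e] for e in events if e in STATE_OF_EVENT)
    if not states:
        return None  # nothing recognizable was logged in this window
    if "Reboot" in states:
        return "Reboot"  # assumed precedence: a reboot dominates the window
    problems = {s: n for s, n in states.items() if s != "Functional"}
    if problems:
        # Most frequent problem state; ties resolved alphabetically.
        return max(sorted(problems), key=problems.get)
    return "Functional"
```

Applying `state_for_window` to consecutive windows of a machine's log yields the state sequence from which transitions are counted.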
Each machine in the domain (except the Primary Domain Controller (PDC), whose transitions were different from the rest) was modeled in terms of the states listed in Table 8. A hypothetical machine was created by combining the transitions of all the individual machines and filtering out transitions that occurred less frequently. Fig. 4 describes this hypothetical machine. In the figure, the weight on each outgoing edge represents the fraction of all transitions from the originating state (i.e., tail of the arrow) that end up in a given terminating state (i.e., head of the arrow). For example, if there is an edge from state A to state B with a weight of 0.5, then it would indicate that 50% of all transitions from state A are to state B. From Fig. 4 the following observations can be made:
- Only about 40% of the transitions out of the Reboot state are to the Functional state. This indicates that in the majority of the cases, either the reboot is not able to solve the original problem, or it creates new ones.
- More than half of the transitions out of the Startup problems state are to the Connectivity problems state. Thus, the majority of the startup problems are related to components that participate in network activity.
- Most of the problems that appear when the machine is functional are related to network activity. Problems with the disk and other components are less frequent.
- More than 50% of the transitions out of the Disk problems state are to the Functional state. Also, we do not observe any significant transitions from the Disk problems state to other states. This could be due to one or more of the following:
  1. The machines are equipped with redundant disks so that even if one of them is down, the functionality is not disrupted in a major way.
  2. The disk problems, though persistent, are not severe enough to disrupt normal activity (maybe retries to access the disk succeed).
  3. The activities that are considered to be representative of the Functional state may not involve much disk activity.
- Over 11% of the transitions out of the Temporary MTA problems state are to the BROWSER problems state. We suspect that there was a local problem that caused RPCs to time out or fail and caused problems for the MTA and BROWSER. Another possibility is that, in both cases, it was the same remote machine that could not be contacted. Based on the available data, it was not possible to determine the real cause of the problem.
Fig. 4. State Transitions of a Typical Machine
To view the transitions from a different perspective, we computed the weight of each outgoing edge as a fraction of all the transitions in the finite state machine. Such a computation provided some interesting insights, which are enumerated below:
1. Nearly 10% of all the transitions are between the Functional and Temporary MTA problems states. These MTA problems are typically problems with some RPC calls (either failing or being canceled).
2. About 0.5% (1 in 200) of all transitions are to the Reboot state.
3. The majority of the transitions into the MTA problems state are from the Reboot state. Thus, MTA problems are primarily problems that occur at startup. In contrast, the majority of the transitions into the Server problems state and the BROWSER problems state (excluding the self-loops) are from the Functional state. So, these problems (or at least a significant fraction of them) typically appear after the machine is functional.
4. About 92% of all transitions are into the Functional state. Because each state is occupied for a fixed one-hour window, this figure is approximately a measure of the average fraction of time the hypothetical machine spends in the Functional state. Hence it is a measure of the average availability of a typical machine. In this case, availability measures the ability of the machine to provide service, not just to stay alive.
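The two ways of weighting edges discussed above (relative to the originating state, and relative to all transitions) can be sketched as follows. The function name and input layout are hypothetical, and the filtering of infrequent transitions that was applied to the hypothetical machine is omitted here.

```python
from collections import Counter

def transition_weights(sequences):
    """sequences: one list of per-window states for each machine.

    Returns two dicts keyed by (from_state, to_state):
    weights relative to the originating state, and relative to all transitions.
    """
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))  # consecutive windows, self-loops included
    total = sum(pairs.values())
    out_of = Counter()
    for (src, _dst), n in pairs.items():
        out_of[src] += n
    per_origin = {p: n / out_of[p[0]] for p, n in pairs.items()}
    overall = {p: n / total for p, n in pairs.items()}
    return per_origin, overall
```

Summing the `overall` weights of all edges that end in a given state gives the share of transitions into that state, which is the quantity quoted for the Functional state in item 4.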
3.4 Modeling Domain Behavior
Analyzing system behavior from the perspective of the whole domain (1) provides a macroscopic view of the system rather than a machine-specific view, (2) helps to characterize the nature of interactions in the network, and (3) aids in identifying potential reliability bottlenecks and suggests ways to improve resilience to operational faults.
Inter-reboot Times. An important characteristic of the domain is how often reboots occur within it. To examine this, the whole domain is treated as a black box, and every reboot of every machine in the domain is considered to be a reboot of the black box. Table 9 shows the statistics of such inter-reboot times measured across the whole domain.
Table 9. Inter-reboot Time Statistics for the Domain
Item                  Value
Number of samples     882
Maximum               2.46 days
Minimum               Less than 1 second
Median                2402 seconds
Average               4.09 hours
Standard Deviation    7.52 hours
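Treating the domain as a black box amounts to merging the reboot timestamps of all machines into one sorted sequence and taking successive differences. A minimal sketch, with hypothetical function names and input layout, follows.

```python
from datetime import datetime
from statistics import mean, median

def inter_reboot_seconds(reboots_by_machine):
    """reboots_by_machine: dict mapping machine name -> list of reboot datetimes."""
    # Every reboot of every machine counts as a reboot of the black box.
    all_reboots = sorted(t for times in reboots_by_machine.values() for t in times)
    return [(b - a).total_seconds() for a, b in zip(all_reboots, all_reboots[1:])]

def summarize(gaps):
    """Statistics of the kind reported in Table 9, in seconds."""
    return {
        "samples": len(gaps),
        "maximum": max(gaps),
        "minimum": min(gaps),
        "median": median(gaps),
        "average": mean(gaps),
    }
```

Because reboots of different machines are interleaved, very short gaps can arise from unrelated machines rebooting at nearly the same moment as well as from repeated reboots of a single machine.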
Finite State Model of the Domain
The proper functioning of the domain relies on the proper functioning of the PDC and its interactions with the Backup Domain Controllers (BDCs). Thus it would seem useful to represent the domain in terms of how many BDCs are alive at any given moment and also in terms of the PDC being functional or not. Accordingly, a finite state model was constructed as follows:
1. The data collection period was broken up into time windows of a fixed length,
2. For each such time window, the state of the domain was computed, and
3. A transition diagram was constructed based on the state information.
The state of the domain during a given time window was computed by evaluating the number of machines that rebooted during that time window. More specifically, the states were identified as shown in Table 10. Fig. 5 shows the transitions in the domain. Each time window was one hour long.
Table 10. Domain States and their Interpretation
State Name    Meaning
PDC           Primary Domain Controller (PDC) rebooted
BDC           One Backup Domain Controller (BDC) rebooted
MBDC          Many BDCs rebooted
PDC+BDC       PDC and one BDC rebooted
PDC+MBDC      PDC and many BDCs rebooted
F             Functional (no reboots observed)
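The mapping from the set of machines that rebooted in a window to one of the states in Table 10 can be sketched as below. The function name and the machine naming are hypothetical, and "many" is assumed to mean two or more BDCs, which the text does not state explicitly.

```python
def domain_state(rebooted, pdc="PDC"):
    """rebooted: set of names of machines that rebooted during one time window.

    Assumption: "many" BDCs means two or more.
    """
    bdcs = len(rebooted - {pdc})
    pdc_rebooted = pdc in rebooted
    if bdcs == 0:
        return "PDC" if pdc_rebooted else "F"
    suffix = "BDC" if bdcs == 1 else "MBDC"
    return "PDC+" + suffix if pdc_rebooted else suffix
```

Evaluating `domain_state` for each consecutive one-hour window gives the state sequence from which a transition diagram of the domain can be built.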
Fig. 5. Domain State Transitions
Fig. 5 reveals some interesting insights.
1. Nearly 77% of all transitions from the F state, excluding the self-loops, are to the BDC state. If these transitions do indeed result in disruption in service, then it is possible to improve the overall availability significantly just by tolerating single machine failures.
2. A non-negligible number of transitions are between the F state and the MBDC state and between states BDC and MBDC. This would indicate potentially correlated failures and recovery (see [12] for more details).
3. The majority of transitions from state PDC are to state F. This could be explained by one of the following:
- Most of the problems with the PDC are not propagated to the BDCs,
- The PDC typically recovers before any such propagation takes effect on the BDCs, or
- The problems on the PDC are not severe enough to bring it down, but they might worsen as they propagate to the BDCs and force a reboot.
However, 20% of the transitions from the PDC state are to the PDC+BDC state. So there is a possibility of the propagation of failures.
4 Conclusions
The discussion in this paper focused on the issues involved in analyzing the availability of networked systems using fault injection and the failure data collected by the logging mechanisms built into the system. To achieve accurate and comprehensive system dependability evaluation, the analysis must span the three phases of system life: design phase, prototype phase, and operational phase.
For example, the presented fault injection study of the ARMOR-based SIFT environment demonstrated that:
1. Structuring the fault injection experiments to progressively stress the error detection and recovery mechanisms is a useful approach to evaluating performance and error propagation.
2. Even though the probability of correlated failures is small, their potential impact on application availability is significant.
3. The SIFT environment successfully recovered from all correlated failures involving the application and a SIFT process because the processes performing error detection and recovery were decoupled from the failed processes.
4. Targeted injections into dynamic data on the heap were useful in further investigating system failures brought about by error propagation. Assertions within the SIFT processes were shown to reduce the number of system failures from data error propagation by up to 42%.
Similarly, analysis of failure data collected in a network of Windows NT machines
provides insights into network system failure behavior.
1. Most of the problems that lead to reboots are software related. Only 10% are
attributable to specific hardware components.
2. Rebooting the machine does not appear to solve the problem in many cases. In
about 60% of the reboots, the rebooted machine reported problems within an hour or
two of the reboot.
3. Though the average availability evaluates to over 99%, a typical machine in the
domain, on average, provides acceptable service only about 92% of the time.
4. About 1% of the reboots indicate memory leaks in the software.
5. There are indications of propagated or correlated failures. Typically, in such cases,
multiple machines exhibit identical or similar problems at almost the same time.
Moreover, the failure data analysis also provides insights into the error logging
mechanism. For example, event-logging features that are absent, but desirable, in
Windows NT can be suggested:
1. The presence of a Windows NT shutdown event will improve the accuracy in
identifying the causes of reboots. It will also lead to better estimates of machine
availability.
2. Most of the events observed in the logs were either due to applications or to
high-level system components, such as file-system drivers. It is not evident whether this is due
to a genuine absence of problems at the lower levels or just because the
lower-level system components log events sparingly or resort to other means to report
events. If the latter is true, then improved event logging by the lower-level system
components (protocol drivers, memory managers) can enhance the value of event
logs in diagnosis.
Acknowledgments. This manuscript is based on research supported in part by
NASA under grant NAG-1-613, in cooperation with the Illinois Computer Laboratory
for Aerospace Systems and Software (ICLASS), by Tandem Computers, and in part
by a NASA/JPL contract 961345, and by NSF grants CCR 00-86096 ITR and CCR
99-02026.
References
1. J. Arlat, et al., “Fault Injection for Dependability Validation – A Methodology and Some
Applications,” IEEE Trans. on Software Engineering, Vol. 16, No. 2, pp. 166-182, Feb.
1990.
2. J. Arlat, et al., “Fault Injection and Dependability Evaluation of Fault-Tolerant Systems,”
IEEE Trans. on Computers, Vol. 42, No. 8, pp. 913-923, Aug. 1993.
3. D. Avresky, et al., “Fault Injection for the Formal Testing of Fault Tolerance,” Proc. 22nd
Int. Symp. Fault-Tolerant Computing, pp. 345-354, June 1992.
4. S. Bagchi, “Hierarchical error detection in a software-implemented fault tolerance (SIFT)
environment,” Ph.D. Thesis, University of Illinois, Urbana, IL, 2001.
5. J.H. Barton, E.W. Czeck, Z.Z. Segall, and D.P. Siewiorek, “Fault injection experiments
using FIAT,” IEEE Trans. Computers, Vol. 39, pp. 575-582, Apr. 1990.
6. X. Castillo and D.P. Siewiorek, “A Workload Dependent Software Reliability Prediction
Model,” Proc. 12th Int. Symp. Fault-Tolerant Computing, pp. 279-286, 1982.
7. R. Chillarege, S. Biyani, and J. Rosenthal, “Measurement Of Failure Rate in Widely
Distributed Software,” Proc. 25th Int. Symp. Fault-Tolerant Computing, pp. 424-433,
1995.
8. J. Gray, “A Census of Tandem System Availability between 1985 and 1990,” IEEE Trans.
Reliability, Vol. 39, No. 4, pp. 409-418, 1990.
9. M.C. Hsueh, R.K. Iyer, and K.S. Trivedi, “Performability Modeling Based on Real Data:
A Case Study,” IEEE Trans. Computers, Vol. 37, No. 4, pp. 478-484, April 1988.
10. R. Iyer, D. Tang, “Experimental Analysis of Computer System Dependability,” Chapter 5
in Fault Tolerant Computer Design, D.K. Pradhan, Prentice Hall, pp. 282-392, 1996.
11. R.K. Iyer and D.J. Rossetti, “Effect of System Workload on Operating System Reliability:
A Study on IBM 3081,” IEEE Trans. Software Engineering, Vol. SE-11, No. 12, pp. 1438-1448,
1985.
12. M. Kalyanakrishnam, “Failure Data Analysis of LAN of Windows NT Based Computers,”
Proc. 18th Symp. on Reliable Distributed Systems, pp. 178-187, October 1999.
13. Z. Kalbarczyk, R. Iyer, S. Bagchi, K. Whisnant, “Chameleon: A software infrastructure for
adaptive fault tolerance,” IEEE Trans. on Parallel and Distributed Systems, Vol. 10, No. 6,
pp. 560-579, 1999.
14. G.A. Kanawati, N.A. Kanawati, and J.A. Abraham, “FERRARI: A flexible software-based
fault and error injection system,” IEEE Trans. Computers, Vol. 44, pp. 248-260, Feb. 1995.
15. I. Lee and R.K. Iyer, “Analysis of Software Halts in Tandem System,” Proc. 3rd Int.
Symp. Software Reliability Engineering, pp. 227-236, 1992.
16. I. Lee and R.K. Iyer, “Software Dependability in the Tandem GUARDIAN Operating
System,” IEEE Trans. on Software Engineering, Vol. 21, No. 5, pp. 455-467, 1995.
17. T.T. Lin, D.P. Siewiorek, “Error Log Analysis: Statistical Modeling and Heuristic Trend
Analysis,” IEEE Trans. Reliability, Vol. 39, No. 4, pp. 419-432, 1990.
18. H. Maderia, R. Some, F. Moereira, D. Costa, D. Rennels, “Experimental evaluation of a
COTS system for space applications,” Proc. of Int. Conf. on Dependable Systems and
Networks (DSN’02), Washington DC, pp. 325-330, June 2002.
19. Message Passing Interface Forum, “MPI-2: Extensions to the Message Passing Interface,”
http://www.mpi-forum.org/docs/mpi-20.ps.
20. J.F. Meyer and L. Wei, “Analysis of Workload Influence on Dependability,” Proc. 18th Int.
Symp. Fault-Tolerant Computing, pp. 84-89, 1988.
21. S. Mourad and D. Andrews, “On the Reliability of the IBM MVS/XA Operating System,”
IEEE Trans. on Software Engineering, October 1987.
22. D. Stott, B. Floering, Z. Kalbarczyk, and R. Iyer, “Dependability assessment in distributed
systems with lightweight fault injectors in NFTAPE,” Proc. Int. Performance and
Dependability Symposium, IPDS-00, pp. 91-100, 2000.
23. M.S. Sullivan, R. Chillarege, “Software Defects and Their Impact on System Availability
– A Study of Field Failures in Operating Systems,” Proc. 21st Int. Symp. Fault-Tolerant
Computing, pp. 2-9, 1991.
24. M.S. Sullivan and R. Chillarege, “A Comparison of Software Defects in Database
Management Systems and Operating Systems,” Proc. 22nd Int. Symp. Fault-Tolerant
Computing, pp. 475-484, 1992.
25. D. Tang and R.K. Iyer, “Analysis of the VAX/VMS Error Logs in Multicomputer
Environments – A Case Study of Software Dependability,” Proc. 3rd Int. Symp. Software
Reliability Engineering, Research Triangle Park, North Carolina, pp. 216-226, October
1992.
26. D. Tang and R.K. Iyer, “Dependability Measurement and Modeling of a Multicomputer
System,” IEEE Trans. Computers, Vol. 42, No. 1, pp. 62-75, January 1993.
27. A. Thakur, R.K. Iyer, L. Young, I. Lee, “Analysis of Failures in the Tandem NonStop-UX
Operating System,” Proc. Int’l Symp. Software Reliability Engineering, pp. 40-49, 1995.
28. M.M. Tsao and D.P. Siewiorek, “Trend Analysis on System Error files,” Proc. 13th Int.
Symp. Fault-Tolerant Computing, pp. 116-119, June 1983.
29. P. Velardi and R.K. Iyer, “A Study of Software Failures and Recovery in the MVS
Operating System,” IEEE Trans. on Computers, Vol. C-33, No. 6, pp. 564-568, June
1984.
30. K. Whisnant, Z. Kalbarczyk, and R. Iyer, “Micro-checkpointing: Checkpointing for
multithreaded applications,” in Proceedings of the 6th International On-Line Testing
Workshop, July 2000.
31. K. Whisnant, R. Iyer, Z. Kalbarczyk, P. Jones, “An Experimental Evaluation of the
ARMOR-based REE Software-Implemented Fault Tolerance Environment,” pending
technical report, University of Illinois, Urbana, IL, 2001.
32. K. Whisnant, et al., “An Experimental Evaluation of the REE SIFT Environment for
Spaceborne Applications,” Proc. of Int. Conf. on Dependable Systems and Networks
(DSN’02), Washington DC, pp. 585-594, June 2002.
Software Reliability and Rejuvenation: Modeling
and Analysis
Kishor S. Trivedi and Kalyanaraman Vaidyanathan
Dept. of Electrical & Computer Engineering

Duke University
Durham, NC 27708-0291, USA
{kst,kv}@ee.duke.edu
Abstract. Several recent studies have established that most system out-
ages are due to software faults. Given the ever increasing complexity of
software and the well-developed techniques and analysis for hardware
reliability, this trend is not likely to change in the near future. In this
paper, we classify software faults and discuss various techniques to deal
with them in the testing/debugging phase and the operational phase of
the software. We discuss the phenomenon of software aging and a preven-
tive maintenance technique to deal with this problem called software re-
juvenation. Stochastic models to evaluate the effectiveness of preventive
maintenance in operational software systems and to determine optimal
times to perform rejuvenation for different scenarios are described. We
also present measurement-based methodologies to detect software aging
and estimate its effect on various system resources. These models are
intended to help develop software rejuvenation policies. An automated
online measurement-based approach has been used in the software reju-
venation agent implemented in a major commercial server.
1 Introduction
Outages in computer systems consist of both hardware and software failures.

While hardware failures have been studied extensively and varied mechanisms
have been presented to increase system availability with regard to such failures,
software failures and the corresponding reliability/availability analysis have
not drawn much attention from researchers. The study of software failures has
now become more important since it has been recognized that computer system
outages are due more to software faults than to hardware faults [19,35,40].
Therefore, software reliability is one of the weakest links in system reliability.
In this paper, we attempt to classify software faults based on an extension of
Gray’s classification [17] and discuss the various techniques to deal with these
faults in the testing/debugging and operational phase of the software. We then
describe the phenomenon of software aging, where the state of the software
system gradually degrades with time. This might eventually cause a performance
degradation of the system or result in a crash/hang failure. Particular attention
is given to software rejuvenation - a proactive form of environment diversity

to deal with software aging, explaining its various approaches and methods in
practice.
1.1 What Is a Software Failure?

According to Laprie et al. [29], “a system failure occurs when the delivered service
no longer complies with the specifications, the latter being an agreed descrip-
tion of the system’s expected function and/or service”. This definition applies to
both hardware and software system failures. Faults or bugs in a hardware or a
software component cause errors. An error is defined as that part of the system
state which is liable to lead to subsequent failure, and an error affecting the service is
an indication that a failure occurs or has occurred. If the system comprises
multiple components, errors can lead to a component failure. As various compo-
nents in the system interact, failure of one component might introduce one or
more faults in another.
1.2 Classification of Software Faults

Gray [17] classifies software faults into Bohrbugs and Heisenbugs. Bohrbugs are
essentially permanent design faults and hence almost deterministic in nature.
They can be identified easily and weeded out during the testing and debugging
phase (or early deployment phase) of the software life cycle. Heisenbugs, on
the other hand, are essentially permanent faults whose conditions of activation
occur rarely or are not easily reproducible. Hence, these faults result in transient
failures, i.e., failures which may not recur if the software is restarted. Some
typical situations in which Heisenbugs might surface are boundaries between
various software components, improper or insufficient exception handling and
interdependent timing of various events. It is for this reason that Heisenbugs are
extremely difficult to identify through testing. Hence, a mature piece of software
in the operational phase, released after its development and testing stage, is more
likely to experience failures caused by Heisenbugs than due to Bohrbugs. Most
recent studies on failure data have reported that a large proportion of software
failures are transient in nature [17,18], caused by phenomena such as overloads
or timing and exception errors [9,40]. The study of failure data from Tandem’s
fault tolerant computer system indicated that 70% of the failures were transient
failures, caused by faults like race conditions and timing problems [30].
1.3 Software Aging

The phenomenon of software aging has been reported by several recent stud-
ies [16,25]. It was observed that once the software was started, potential fault
conditions gradually accumulated with time leading to either performance degra-
dation or transient failures or both. Failures may be of crash/hang type or those
resulting from data inconsistency because of aging. Typical causes of aging, i.e.,
slow degradation, are memory bloating or leaking, unreleased file-locks, data
corruption, storage space fragmentation and accumulation of round off errors.
Popular and widely used software like the web browser Netscape is known
to suffer from serious memory leaks which lead to occasional crash/hang of the
application. This problem is particularly pronounced in systems with low swap
space. The newsreader software xrn also experiences problems due to memory
leaks. Software aging has not only been observed in software used on a mass scale
but also in specialized software used in high availability and safety-critical ap-
plications. This phenomenon has been observed in general purpose UNIX appli-
cations [25]. The applications experienced a crash/hang failure over time which
resulted in unplanned and expensive downtime. Avritzer and Weyuker [4] report
aging manifesting as gradual performance degradation in an industrial telecommunication
software system. They deal with soft failures, i.e., a type of failure
where the system may enter a faulty state in which the system is still available
for service but has degraded to unacceptable performance levels, losing users or
packets. A similar kind of gradual performance degradation in file systems lead-
ing to a soft failure is discussed by Smith and Seltzer [39]. Their study shows
that in a degraded file system caused by normal usage and filling up of stor-
age space, the read throughput may be as much as 40% lower than that in an
empty file system. The reason behind this is the fragmentation of storage space
over time which results in non-sequential allocation of blocks. The most glaring
example of software aging in recent times is reported by Marshall [32]. In this
case, software aging resulted in loss of human life. The software system in the US
Patriot missiles deployed during the Gulf War accumulated numerical roundoff
error. This led to the interpretation of an incoming Iraqi Scud missile as a false
alarm which cost the lives of 28 US soldiers.
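As a small illustration of how round-off error can accumulate in a long-running process, the following sketch repeatedly adds a clock tick in binary floating point and compares the running sum with the exact decimal value. It is a generic demonstration only: the function name `accumulated_drift` and the tick value are assumptions made for this example, and it does not reproduce the arithmetic used in the system described in [32].

```python
from fractions import Fraction


def accumulated_drift(tick: float, n_ticks: int) -> float:
    """Return (float running sum) - (exact decimal value) after n_ticks additions."""
    clock = 0.0
    for _ in range(n_ticks):
        clock += tick  # each addition may round; the rounding errors accumulate
    # Exact reference value: the decimal tick times the number of ticks.
    exact = Fraction(str(tick)) * n_ticks
    return float(Fraction(clock) - exact)


if __name__ == "__main__":
    # 0.1 has no finite binary representation, so the drift grows with uptime.
    for n in (10, 10_000, 1_000_000):
        print(n, accumulated_drift(0.1, n))
```

A tick that is exactly representable in binary (for example 0.125) shows no drift in this sketch, which is why the effect depends on the numerical representation chosen and not only on the running time.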
We designate faults attributed to software aging, which are quite different
from Bohrbugs and Heisenbugs, as aging-related faults. These faults are similar to
Heisenbugs in that they are activated under certain conditions (for example, lack
of OS resources) which may not be easily reproducible. However, as discussed
later, their modes and methods of recovery differ significantly. Figure 1 shows
our extended classification and treatment strategies for each class.
Fig. 1. Classification and treatment of software faults

1.4 Software Fault Tolerance
Techniques for tolerating faults in software have been divided into three classes:
– Design diversity: Design diversity techniques are specifically developed

to tolerate design faults in software arising out of wrong specifications and
incorrect coding. Two or more variants of a software developed by different
teams, but to a common specification are used. These variants are then used
in a time or space redundant manner to achieve fault tolerance. Popular
techniques which are based on the design diversity concept for fault tolerance
in software are N-version programming [3], recovery block [23] and N-self
checking programming [28]. The design diversity approach was developed
mainly to deal with Bohrbugs, but can to some extent deal with Heisenbugs.
– Data diversity: Data diversity, a technique for fault tolerance in software,
was introduced by Amman and Knight [2]. While the design diversity ap-
proaches to provide fault tolerance rely on multiple versions of the software
written to the same specifications, the data diversity approach uses only one
version of the software. This approach relies on the observation that software
sometimes fails for certain values in the input space and this failure could be
averted if there is a minor perturbation of input data which is acceptable to
the software. Data diversity can work well with Bohrbugs and is cheaper to
implement than design diversity techniques. To some extent, data diversity
can also deal with Heisenbugs since different input data is presented and by
definition, these bugs are non-deterministic and non-repeatable.
– Environment diversity: Environment diversity is the newest approach to
fault tolerance in software. Although this technique has been used for long
in an ad hoc manner, only recently has it gained recognition and importance.
Having its basis on the observation that most software failures are transient
in nature, the environment diversity approach requires reexecuting the soft-
ware in a different environment [27]. Environment diversity deals effectively
with Heisenbugs by exploiting their definition and nature. Adams [1] has
proposed restarting the system as the best approach to masking software
faults. Environment diversity, a generalization of restart [24,27], is a cheap
but effective technique for fault tolerance in software. Examples of environment
diversity techniques include operation retry, application
restart and node reboot. The retry and restart operations can be done on the
same node or on another spare (cold/warm/hot) node [30]. A specific form
of environment diversity, called software rejuvenation [25,47], which forms
the crux of this paper, is discussed in detail in the following sections.
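The escalation from operation retry to application restart mentioned under environment diversity can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation from the cited work: the function name `run_with_environment_diversity`, the callback `restart_application`, and the retry limits are all hypothetical.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_environment_diversity(
    operation: Callable[[], T],
    restart_application: Callable[[], None],
    max_retries: int = 2,
    max_restarts: int = 1,
) -> T:
    """Retry `operation`; if the retries keep failing, restart and try again."""
    last_error: Optional[BaseException] = None
    for restart_round in range(max_restarts + 1):
        # Operation retry: re-execute in the (slightly changed) current environment.
        for _ in range(max_retries + 1):
            try:
                return operation()
            except Exception as exc:  # a transient failure is assumed here
                last_error = exc
        # Application restart: a stronger change of environment (hypothetical hook).
        if restart_round < max_restarts:
            restart_application()
    raise RuntimeError("operation failed after retries and restarts") from last_error
```

The design choice is to try the cheapest form of environment diversity first and escalate only when it does not help; a node reboot or failover to a spare node would be a further escalation step that this sketch leaves out.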
1.5 Software Rejuvenation
To counteract software aging, a proactive technique called software rejuvenation

has been proposed [25,47]. It involves stopping the running software occasion-
ally, “cleaning” its internal state and restarting it. Garbage collection, flushing
operating system kernel tables, and reinitializing internal data structures are some
examples of what cleaning the internal state of a software might involve. An extreme
but well-known example of rejuvenation is a hardware reboot. Rejuvenation has been
implemented in the real-time system collecting billing data for most telephone
exchanges in the United States [5]. A very similar technique called software
capacity restoration, has been used by Avritzer and Weyuker in a large telecom-
munications switching software [4], where the switching computer is rebooted
occasionally upon which its service rate is restored to the peak value. Grey [20]
proposed performing operations solely for fault management in SDI (Strategic
Defense Initiative) software which are invoked whether or not the fault exists
and called it operational redundancy. Tai et al. [41] have proposed and analyzed
the use of on-board preventive maintenance for maximizing the probability of
successful mission completion of spacecraft with very long mission times. The
necessity of performing preventive maintenance in a safety critical environment
is evident from the example of aging in Patriot’s software [32]. The failure which
resulted in loss of human lives could have been prevented if the computer had been
restarted after every 8 hours of running time. Rejuvenation has been implemented
in various other kinds of systems - transaction processing systems [7], web servers
[46] and cluster servers [8].
Software rejuvenation (preventive maintenance) incurs an overhead (in terms
of performance, cost and downtime) which should be balanced against the loss
incurred due to unexpected outage caused by a failure. Thus, an important
research issue is to determine the optimal times to perform rejuvenation. In this
paper, we present two approaches for analyzing software aging and studying
aging-related failures.
The rest of this paper is organized as follows. Section 2 describes various
analytical models for software aging and for determining optimal times to perform
rejuvenation. Measurement-based models are dealt with in Section 3. The im-
plementation of a software rejuvenation agent in a major commercial server is
discussed in Section 4. Section 5 describes various approaches and methods of
rejuvenation and Section 6 concludes the paper with pointers to future work.
2 Analytic Models for Software Rejuvenation
The aim of the analytic modeling is to determine optimal times to perform re-
juvenation which maximize availability and minimize the probability of loss or
the response time of a transaction (in the case of a transaction processing sys-
tem). This is particularly important for business-critical applications for which
adequate response time can be as important as system uptime. The analysis
is done for different kinds of software systems exhibiting varied failure/aging
characteristics.
The accuracy of a modeling based approach is determined by the assumptions
made in capturing aging. In [12,13,14,25,41] only the failures causing unavail-
ability of the software are considered, while in [34] only a gradually decreasing
service rate of a software which serves transactions is assumed. Garg et al. [15],
however, consider both these effects of aging together in a single model. Mod-
els proposed in [12,13,25] are restricted to hypo-exponentially distributed time

to failure. Those proposed in [14,34,41] can accommodate general distributions
but only for the specific aging effect they capture. Generally distributed time to
failure, as well as the service rate being an arbitrary function of time are allowed
in [15]. It has been noted [40] that transient failures are partly caused by over-
load conditions. Only the model presented by Garg et al. [15] captures the effect
of load on aging. Existing models also differ in the measures being evaluated.
In [14,41], software with a finite mission time is considered. In [12,13,15,25],
measures of interest in transaction-based software intended to run forever are
evaluated.
Bobbio et al. [6] present fine-grained software degradation models, where one
can identify the current degradation level based on the observation of a system
parameter. Optimal rejuvenation policies based on a risk
criterion and an alert threshold are then presented. Dohi et al. [10,11] present
software rejuvenation models based on semi-Markov processes. The models are
analyzed for optimal rejuvenation strategies based on cost as well as steady-state
availability. Given sample data of failure times, statistical non-parametric
algorithms based on the total time on test transform are presented to obtain the
optimal rejuvenation interval.
2.1 Basic Model for Rejuvenation
Figure 2 shows the basic software rejuvenation model proposed by Huang et al.
[25]. The software system is initially in a “robust” working state, 0. As time
progresses, it eventually transits to a “failure-probable” state 1. The system is
still operational in this state but can fail (move to state 2) with a non-zero
probability. The system can be repaired and brought back to the initial state
0. The software system is also rejuvenated at regular intervals from the failure
probable state 1 and brought back to the robust state 0.
[Figure: state transition diagram with states 0, 1, 2 and 3; transitions labeled state change, system failure, rejuvenation, completion of repair, and completion of rejuvenation]
Fig. 2. State transition diagram for rejuvenation
Huang et al. [25] assume that the stochastic behavior of the system can be
described by a simple continuous-time Markov chain (CTMC) [43]. Let Z be the
random time interval when the highly robust state changes to the failure probable
state, having the exponential distribution Pr{Z ≤ t} = F0 (t) = 1 − exp(−t/μ0 )

(μ0 > 0). Just after the state becomes the failure probable state, a system
failure may occur with a positive probability. Without loss of generality, we
assume that the random variable Z is observable during the system operation.
Define the failure time X (from state 1) and the repair time Y , having the
exponential distributions Pr{X ≤ t} = Ff (t) = 1 − exp(−t/λf ) and Pr{Y ≤
t} = Fa (t) = 1 − exp(−t/μa ) (λf > 0, μa > 0). If the system failure occurs
before triggering a software rejuvenation, then the repair is started immediately
at that time and is completed after the random time Y elapses. Otherwise,
the software rejuvenation is started. Note that the software rejuvenation cycle
is measured from the time instant just after the system enters state 1. Define
the distribution functions of the time to invoke the software rejuvenation and
of the time to complete software rejuvenation by Fr (t) = 1 − exp(−t/μr ) and
Fc (t) = 1 − exp(−t/μc ) (μc > 0, μr > 0), respectively. The CTMC is then
analyzed and the expected system downtime and the expected cost per unit
time in the steady state are computed. An optimal rejuvenation interval which
minimizes expected downtime (or expected cost) is obtained.
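The steady-state availability of the four-state CTMC described above can be sketched with a regenerative-cycle argument: each cycle spends a mean time μ0 in state 0, then stays in state 1 until either a failure (mean λf) or the rejuvenation trigger (mean μr) fires, and finally spends a mean time μa in repair or μc in rejuvenation. The code below is an illustration derived from that description, not the formula as printed in [25]; the function name, parameter names and the numerical values are assumptions.

```python
def ctmc_availability(mu0: float, lam_f: float, mu_r: float,
                      mu_a: float, mu_c: float) -> float:
    """Steady-state probability of being in state 0 or 1 (system operational).

    All arguments are MEAN times of exponential distributions, as in the text:
    mu0 (0 -> 1), lam_f (1 -> 2), mu_r (1 -> 3), mu_a (2 -> 0), mu_c (3 -> 0).
    """
    rate_fail = 1.0 / lam_f
    rate_rejuv = 1.0 / mu_r
    # Two competing exponentials leave state 1: mean sojourn and failure probability.
    mean_in_state_1 = 1.0 / (rate_fail + rate_rejuv)
    p_fail = rate_fail * mean_in_state_1
    mean_up = mu0 + mean_in_state_1
    mean_down = p_fail * mu_a + (1.0 - p_fail) * mu_c
    return mean_up / (mean_up + mean_down)


if __name__ == "__main__":
    # Illustrative values in hours (assumed, not taken from the paper).
    for mu_r in (1.0, 24.0, 168.0, 720.0):
        print(mu_r, ctmc_availability(240.0, 720.0, mu_r, 4.0, 0.5))
```

Because every sojourn time is exponential, this availability is a ratio of two linear functions of the mean sojourn time in state 1 and is therefore monotone in μr; in this sketch the best choice is always one of the two extremes (rejuvenate immediately or never), which motivates the more general aging distributions considered next.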
It is not difficult to introduce the periodic rejuvenation schedule and to ex-
tend the CTMC model to the general one. Dohi et al. [10,11] developed semi-
Markov models with the periodic rejuvenation and general transition distribution
functions. More specifically, let Z be the random variable having the common
distribution function Pr{Z ≤ t} = F0 (t) with finite mean μ0 (> 0). Also, let
X and Y be the random variables having the common distribution functions
Pr{X ≤ t} = Ff (t) and Pr{Y ≤ t} = Fa (t) with finite means λf (> 0) and
μa (> 0), respectively. Denote the distribution function of the time to invoke
the software rejuvenation and the distribution of the time to complete software
rejuvenation by Fr (t) and Fc (t) (with mean μc (> 0)), respectively. After com-
pleting the repair or the rejuvenation, the software system becomes as good as
new, and the software age is initiated at the beginning of the next highly ro-
bust state. Consequently, we define the time interval from the beginning of the
system operation to the next one as one cycle, and the same cycle is repeated
again and again. The time to software rejuvenation (the rejuvenation interval)
is a constant, t0 , i.e., Fr (t) = U (t − t0 ), where U (·) is the unit step function.
The underlying stochastic process is a semi-Markov process with four regen-
eration states. If the sojourn times in all states are exponentially distributed,
this model is the CTMC in Huang et al. [25]. Using the renewal theory [36], the
steady-state system availability is computed as
    A(t_0) = \Pr\{\text{software system is operative in the steady state}\}
           = \frac{\mu_0 + \int_0^{t_0} \bar{F}_f(t)\,dt}{\mu_0 + \mu_a F_f(t_0) + \mu_c \bar{F}_f(t_0) + \int_0^{t_0} \bar{F}_f(t)\,dt}
           = S(t_0)/T(t_0),    (1)
where in general \bar{\phi}(\cdot) = 1 - \phi(\cdot). The problem is to derive the optimal software
rejuvenation interval t∗0 which maximizes the system availability in the steady
state A(t0 ). We assume that the mean time to repair is
strictly larger than the mean time to complete the software rejuvenation (i.e.,
μa > μc ). This assumption is quite reasonable and intuitive. The following result
gives the optimal software rejuvenation schedule for the semi-Markov model.
Assume that the failure time distribution is strictly IFR (increasing failure
rate) [43]. Define the following non-linear function:
    q(t_0) = T(t_0) - \{(\mu_a - \mu_c)\, r_f(t_0) + 1\}\, S(t_0),    (2)
where r_f(t) = (dF_f(t)/dt)/\bar{F}_f(t) is the failure rate.

(i) If q(0) > 0 and q(∞) < 0, then there exists a finite and unique optimal
software rejuvenation schedule t∗0 (0 < t∗0 < ∞) satisfying q(t∗0 ) = 0, and
the maximum system availability is
    A(t_0^*) = \frac{1}{(\mu_a - \mu_c)\, r_f(t_0^*) + 1}.    (3)
(ii) If q(0) ≤ 0, then the optimal software rejuvenation schedule is t∗0 = 0, i.e. it
is optimal to start the rejuvenation just after entering the failure probable
state, and the maximum system availability is A(0) = μ0 /(μ0 + μc ).
(iii) If q(∞) ≥ 0, then the optimal rejuvenation schedule is t∗0 → ∞, i.e. it
is optimal not to carry out the rejuvenation, and the maximum system
availability is A(∞) = (μ0 + λf )/(μ0 + μa + λf ).
If the failure time distribution is DFR (decreasing failure rate), then the
system availability A(t0 ) is a convex function of t0 , and the optimal rejuvenation
schedule is t∗0 = 0 or t∗0 → ∞ [10,11].
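The result above can be explored numerically. The following is a minimal sketch, assuming a Weibull failure time with shape 2 (which is strictly IFR) and purely hypothetical values for μ0, μa and μc in hours; none of these numbers come from [10,11]. It evaluates S(t0) and T(t0) from Equation (1) with a trapezoidal rule and locates the root of q(t0) from Equation (2) by bisection, which is justified because q is strictly decreasing under the IFR assumption.

```python
import math


def make_model(mu0, mu_a, mu_c, shape, scale, steps=2000):
    """Build A(t0), q(t0) and r_f(t) for a Weibull failure time.

    All numeric inputs are hypothetical; shape > 1 gives a strictly IFR law.
    """
    def surv(t):
        # \bar F_f(t) = 1 - F_f(t) for the Weibull distribution
        return math.exp(-((t / scale) ** shape))

    def rate(t):
        # failure rate r_f(t) = f_f(t) / \bar F_f(t)
        return (shape / scale) * (t / scale) ** (shape - 1)

    def integral_surv(t0):
        # trapezoidal rule for the integral of \bar F_f over [0, t0]
        if t0 <= 0.0:
            return 0.0
        h = t0 / steps
        inner = sum(surv(i * h) for i in range(1, steps))
        return h * (0.5 * (surv(0.0) + surv(t0)) + inner)

    def S(t0):
        return mu0 + integral_surv(t0)

    def T(t0):
        return (mu0 + mu_a * (1.0 - surv(t0)) + mu_c * surv(t0)
                + integral_surv(t0))

    def A(t0):
        return S(t0) / T(t0)

    def q(t0):
        return T(t0) - ((mu_a - mu_c) * rate(t0) + 1.0) * S(t0)

    return A, q, rate


def optimal_interval(q, lo, hi, tol=1e-6):
    """Bisection for q(t0*) = 0; requires q(lo) > 0 > q(hi), i.e. case (i)."""
    if not (q(lo) > 0.0 > q(hi)):
        raise ValueError("case (i) does not apply on this bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters, in hours.
MU0, MU_A, MU_C = 240.0, 4.0, 0.5
A, q, rate = make_model(MU0, MU_A, MU_C, shape=2.0, scale=800.0)
t_star = optimal_interval(q, 0.0, 5000.0)
print(f"t0* = {t_star:.1f} h, A(t0*) = {A(t_star):.6f}")
```

At the computed root, A(t0*) should agree with the closed form in Equation (3), which gives a convenient consistency check on the numerical integration.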
Garg et al. [12] have developed a Markov Regenerative Stochastic Petri Net
(MRSPN) model where rejuvenation is performed at deterministic intervals as-
suming that the failure probable state 1 is not observable.
2.2 Preventive Maintenance in Transactions Based Software Systems
In [15], Garg et al. consider a transaction-based software system whose macro-
states representation is presented in Figure 3. The state in which the software is
available for service (albeit with decreasing service rate) is denoted as state A.
After failure a recovery procedure is started. In state B the software is recover-
ing from failure and is unavailable for service. Lastly, the software occasionally
undergoes preventive maintenance (PM), denoted by state C. PM is allowed
only from state A. Once recovery from failure or PM is complete, the software
is reset to state A and is as good as new. From this moment, which constitutes
a renewal, the whole process stochastically repeats itself.
[Figure 3: state diagram with three macro-states: A (Available), B (Recovering) and C (Undergoing PM)]
Fig. 3. Macro-states representation of the software behavior
The system consists of a server type software to which transactions arrive
at a constant rate λ. Each transaction receives service for a random period.
The service rate of the software is an arbitrary function measured from the
last renewal of the software (because of aging) denoted by μ(·). Therefore, a
transaction which starts service at time t1 occupies the server for a time whose
distribution is given by 1 − exp(−∫_{t1}^{t} μ(·) dt). If the software is busy processing a
transaction, arriving customers are queued. Total number of transactions that
the software can accommodate is K (including the one being processed) and any
more arriving when the queue is full are lost. The service discipline is FCFS. The
software fails with a rate ρ(·), that is, the CDF of the time to failure X is given
by FX (t) = 1 − exp(−∫_{0}^{t} ρ(·) dt). Times to recover from failure Yf and to perform PM
Yr are random variables with associated general CDFs FYf and FYr respectively.
The model does not require any assumptions on the nature of FYf and FYr . Only
the respective expectations γf = E[Yf ] and γr = E[Yr ] are assumed to be finite.
Any transactions in the queue at the time of failure or at the time of initiation
of PM are assumed to be lost. Moreover, any transactions which arrive while the
software is recovering or undergoing PM are also lost.
The effect of aging in the model may be captured by using decreasing service
rate and increasing failure rate, where the decrease or the increase respectively
can be a function of time, instantaneous load, mean accumulated load or a
combination of the above.
Two policies which can be used to determine the time to perform PM are
considered. Under policy I, which is purely time-based, PM is initiated after a
constant time δ has elapsed since the software was started (or restarted). Under policy
II, which is based on instantaneous load and time, a constant waiting period
δ must elapse before PM is attempted. After this time PM is initiated if and
only if there are no transactions in the system. Otherwise, the software waits
until the queue is empty upon which PM is initiated. The actual PM interval
under Policy II is determined by the sum of PM wait δ and the time it takes
for the queue to get empty from that point onwards B. Since the latter quantity
is dependent on system parameters and cannot be controlled, the actual PM
interval lies in the range [δ, ∞).
Given the above behavioral model the following measures are derived for each
policy: steady state availability of the software ASS , long run probability of loss
of a transaction Ploss , and expected response time of a transaction given that
it is successfully served Tres . The goal is to determine optimal values of δ (PM
interval under policy I and PM wait under policy II) based on the constraints
on one or more of these measures.
According to the model described above at any time t the software can be in
any one of three states: up and available for service (state A), recovering from a
failure (state B) or undergoing PM (state C). Let {Z(t), t ≥ 0} be a stochastic
process which represents the state of the software at time t. Further, let the
sequence of random variables Si , i > 0 represent the times at which transitions
among different states take place. Since the entrance times Si constitute renewal
points, {Z(Si ), i > 0} is an embedded discrete time Markov chain (DTMC) with
a transition probability matrix P given by:
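    P = \begin{bmatrix} 0 & P_{AB} & P_{AC} \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}.    (4)
The steady state probability πi of the DTMC being in state i, i ∈ {A, B, C} is:
    \pi = [\pi_A, \pi_B, \pi_C] = \left[ \tfrac{1}{2},\ \tfrac{1}{2} P_{AB},\ \tfrac{1}{2} P_{AC} \right].    (5)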
The software behavior is modeled via the stochastic process {(Z(t), N (t)) , t ≥
0}. If Z(t) = A, then N (t) ∈ {0, 1, . . . , K} as the queue can accommodate up
to K transactions. If Z(t) ∈ {B, C}, then N (t) = 0, since by assumption all
transactions arriving while the software is either recovering or undergoing PM
are lost. Further, the transactions already in the queue at the transition instant
are also discarded. It can be shown that the process {(Z(t), N (t)) , t ≥ 0} is a
Markov regenerative process (MRGP). Transition to state A from either B or C
constitutes a regeneration instant.
Let U be a random variable denoting the sojourn time in state A, and
denote its expectation by E[U ]. Expected sojourn times of the MRGP in
states B and C are already defined to be γf and γr . The steady state
availability is obtained using the standard formulae from MRGP theory:
    A_{SS} = \Pr\{\text{software is in state A}\}
           = \frac{\pi_A E[U]}{\pi_B \gamma_f + \pi_C \gamma_r + \pi_A E[U]}.    (6)
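Equations (4) to (6) can be checked with a few lines of code. The sketch below is illustrative only: the function names and all numeric inputs are hypothetical, and it uses the fact that the first row of the stochastic matrix P sums to one, so PAC = 1 − PAB.

```python
def embedded_dtmc_steady_state(p_ab):
    """Steady-state vector (pi_A, pi_B, pi_C) of the embedded DTMC, Eq. (5)."""
    p_ac = 1.0 - p_ab  # row A of P must sum to one
    return 0.5, 0.5 * p_ab, 0.5 * p_ac


def steady_state_availability(p_ab, gamma_f, gamma_r, mean_up):
    """A_SS from Eq. (6); mean_up stands for E[U]."""
    pi_a, pi_b, pi_c = embedded_dtmc_steady_state(p_ab)
    up = pi_a * mean_up
    return up / (pi_b * gamma_f + pi_c * gamma_r + up)


# Hypothetical inputs: 30% of the cycles end in a failure, recovery takes
# 0.85 time units, PM takes 0.15, and the mean stay in state A is 100.
print(steady_state_availability(p_ab=0.3, gamma_f=0.85, gamma_r=0.15,
                                mean_up=100.0))
```

In the model itself PAB and E[U] are not free inputs; they follow from the transient analysis of the subordinated process for the chosen policy and value of δ.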
The probability that a transaction is lost is defined as the ratio of expected
number of transactions which are lost in an interval to the expected total number
of transactions which arrive during that interval. Since the evolution of
{(Z(t), N (t)), t ≥ 0} in the intervals comprising successive visits to state A is
stochastically identical, it suffices to consider just one such interval. The number
of transactions lost is given by the summation of three quantities: (1) transactions
in the queue when the system is exiting state A because of the failure or
initiation of PM, (2) transactions that arrive while failure recovery or PM is in
progress, and (3) transactions that are disregarded due to the buffer being full.
The last quantity is of special significance since the probability of buffer being
full will increase due to the degrading service rate. It follows that the probability
of loss is given by
    P_{loss} = \frac{\pi_A E[N_l] + \lambda \left( \pi_B \gamma_f + \pi_C \gamma_r + \pi_A \int_0^{\infty} p_K(t)\,dt \right)}{\lambda \left( \pi_B \gamma_f + \pi_C \gamma_r + \pi_A E[U] \right)}    (7)
where E[Nl ] is the expected number of transactions in the buffer when the system
is exiting state A. Equation 7 is valid only for policy II. Under policy I the sojourn
time in state A is limited by δ, so the upper limit in the integral ∫_{0}^{∞} pK (t) dt is
δ instead of ∞.
Next an upper bound on the mean response time of a transaction given that it
is successfully served, Tres , is derived. The mean number of transactions, denoted
by E, which are accepted for service while the software is in state A is given
by the mean number of transactions which are not accepted due to the buffer
being full, subtracted from the mean total number of transactions which arrive
while the software is in state A, that is,
    E = \lambda \left( E[U] - \int_{t=0}^{\infty} p_K(t)\,dt \right).
Out of these transactions, on the average, E[Nl ] are discarded later because of failure
or initiation of PM. Therefore, the mean number of transactions which actually
receive service given that they were accepted is given by E − E[Nl ]. The mean
total amount of time the transactions spent in the system while the software is in
state A is
    W = \int_{t=0}^{\infty} \sum_i i\, p_i(t)\,dt.
This time is composed of the mean time spent
by the transactions which were served as well as those which were discarded,
denoted as WS and WD , respectively. Therefore, W = WS + WD . The response
time we are interested in is given by Tres = WS /(E − E[Nl ]), which is upper
bounded by
    T_{res} < \frac{W}{E - E[N_l]}.
pi (t) is the probability that there are i transactions queued for service, which
is also the probability of being in state i of the subordinated process at time t.
pi (t) is the probability that the system failed when there were i transactions
queued for service. These transient probabilities for both policies can be obtained
by solving the systems of forward differential-difference equations given in [15]. In
general they do not have a closed-form analytical solution and must be evaluated
numerically. Once these probabilities are obtained, the rest of the quantities PAB ,
PAC , E[U ] and E[Nl ] can be easily computed [15] and then used to obtain the
steady state availability ASS , the probability of transaction loss Ploss and the
upper bound on the response time of a transaction Tres .
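Once the transient probabilities are available on a time grid, Equation (7) and the bound on Tres reduce to numerical quadrature. The following is a minimal sketch under stated assumptions: the probabilities pi(t) are supplied by the caller (the differential-difference equations of [15] are not reproduced here), the integrals are approximated with the trapezoidal rule, and every name and number in the example is hypothetical.

```python
def trapezoid(ts, ys):
    """Trapezoidal approximation of the integral of ys over the grid ts."""
    return sum(0.5 * (ys[k] + ys[k + 1]) * (ts[k + 1] - ts[k])
               for k in range(len(ts) - 1))


def loss_and_response_bound(ts, p, lam, p_ab, gamma_f, gamma_r,
                            mean_up, mean_lost_in_queue):
    """Return (P_loss, upper bound on T_res).

    ts                 : time grid; under policy I it would end at delta
    p[k][i]            : p_i(ts[k]), probability of i queued transactions
    mean_up            : E[U]
    mean_lost_in_queue : E[N_l]
    """
    K = len(p[0]) - 1
    pi_a, pi_b, pi_c = 0.5, 0.5 * p_ab, 0.5 * (1.0 - p_ab)

    full_time = trapezoid(ts, [row[K] for row in p])   # integral of p_K(t)
    W = trapezoid(ts, [sum(i * row[i] for i in range(K + 1)) for row in p])

    down = pi_b * gamma_f + pi_c * gamma_r
    p_loss = ((pi_a * mean_lost_in_queue + lam * (down + pi_a * full_time))
              / (lam * (down + pi_a * mean_up)))

    accepted = lam * (mean_up - full_time)             # the quantity E
    t_res_bound = W / (accepted - mean_lost_in_queue)
    return p_loss, t_res_bound


# Toy input with K = 1 and three grid points (not taken from [15]).
ts = [0.0, 1.0, 2.0]
p = [[1.0, 0.0], [0.5, 0.5], [0.0, 0.0]]
p_loss, bound = loss_and_response_bound(ts, p, lam=2.0, p_ab=1.0,
                                        gamma_f=0.5, gamma_r=0.1,
                                        mean_up=1.5, mean_lost_in_queue=0.25)
print(p_loss, bound)
```

A finer grid, or an adaptive quadrature rule, would be needed in practice because pK(t) changes quickly once the service rate has degraded.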
Examples are presented to illustrate the usefulness of the presented model in
determining the optimum value of δ (PM interval in the case of policy I and PM
wait in the case of policy II). First, the service rate and failure rate are assumed
to be functions of real time, where ρ(t) is defined to be the hazard function
of Weibull distribution, while μ(t) is defined to be a monotone non-increasing
function that approximates the service degradation. Figure 4 shows ASS and Ploss
for both policies plotted against δ for different values of the mean time to perform
PM γr . Under both policies, it can be seen that for any particular value of δ,
the higher the value of γr , the lower the availability and the higher the corresponding
loss probability. It can also be observed that the value of δ which minimizes
probability of loss is much lower than the one which maximizes availability. In
fact, the probability of loss becomes very high at values of δ which maximize
availability. For any specific value of γr , policy II results in a lower minimum in
loss probability than that achieved under policy I. Therefore, if the objective is
to minimize long run probability of loss, such as in the case of telecommunication
switching software, policy II always fares better than policy I.
[Figure 4: two panels, Availability versus δ and Loss Probability versus δ, with one curve per policy (I, II) and per value of γr (0.15, 0.35, 0.55, 0.85)]
Fig. 4. Results for experiment 1
[Figure 5: three panels, Availability, Loss Probability and Response Time versus δ, each with three curves labeled real time, busy time and no failure]
Fig. 5. Results of experiment 2
Figure 5 shows ASS , Ploss and upper bound on Tres plotted against δ under
policy I. Each of the figures contains three curves. μ(·) and ρ(·) in the solid curve
are functions of real time μ(t) and ρ(t), whereas in the dotted curve they are
functions (with the same parameters) of the mean total processing time μ(L(t))
and ρ(L(t)). The dashed curve represents a third system in which no crash/hang
failures occur (ρ(·) = 0), but service degradation is present with μ(·) = μ(t).
This experiment illustrates the importance of making the right assumptions in
capturing aging because as seen from the figure, depending on the forms chosen
for μ(·) and ρ(·), the measures vary in a wide range.
2.3 Software Rejuvenation in a Cluster System
Software rejuvenation has been applied to cluster systems [8,45]. This is a novel
application, which significantly improves cluster system availability and produc-
tivity. The Stochastic Reward Net (SRN) model of a cluster system employing
simple time-based rejuvenation is shown in Figure 6. The cluster consists of n
nodes which are initially in a “robust” working state, Pup . The aging process
is modeled as a 2-stage hypo-exponential distribution (increasing failure rate)
[43] with transitions Tfprob and Tnodefail . Place Pfprob represents a “failure-
probable” state in which the nodes are still operational. The nodes then can
eventually transit to the fail state, Pnodefail1 . A node can be repaired through
the transition Tnoderepair , with a coverage c. In addition to individual node failures,
there is also a common-mode failure (transition Tcmode ). The system is also
considered down when there are a (a ≤ n) individual node failures. The system
is repaired through the transition Tsysrepair .
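The claim that a 2-stage hypo-exponential lifetime has an increasing failure rate can be illustrated directly. The sketch below is an assumption-laden illustration rather than part of the SRN model: it treats the two stages as exponential with means of 240 and 720 hours (the values the analysis in this section assumes for Pup and Pfprob), requires the two means to differ, and uses a hypothetical function name.

```python
import math


def hypoexp_hazard(t, mean1, mean2):
    """Failure rate at time t of a 2-stage hypo-exponential lifetime.

    The stages are exponential with the given means (mean1 != mean2).
    """
    l1, l2 = 1.0 / mean1, 1.0 / mean2
    e1, e2 = math.exp(-l1 * t), math.exp(-l2 * t)
    survival = (l2 * e1 - l1 * e2) / (l2 - l1)
    density = l1 * l2 / (l2 - l1) * (e1 - e2)
    return density / survival


for hours in (0.0, 100.0, 500.0, 2000.0):
    print(hours, hypoexp_hazard(hours, 240.0, 720.0))
```

The hazard starts at zero, because a node must first leave the robust state, and rises toward the smaller of the two stage rates, which is the aging behavior the model intends to capture.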
[Figure 6: SRN diagram; places include Pclock, Pup, Pfprob, Pstartrejuv, Prejuv1, Prejuv2, Prejuved, Pfprobrejuv, Pnodefail1, Pnodefail2 and Psysfail; timed transitions include Trejuvinterval, Tfprob, Tnodefail, Tnoderepair, Tcmode, Tsysrepair, Trejuv1, Trejuv2, Tfprobrejuv and Tnodefailrejuv]
Fig. 6. SRN model of a cluster system employing simple time-based rejuvenation
In the simple time-based policy, rejuvenation is done successively for all the
operational nodes in the cluster, at the end of each deterministic interval. The
transition Trejuvinterval fires every d time units depositing a token in place
Pstartrejuv . Only one node can be rejuvenated at any time (at places Prejuv1
or Prejuv2 ). Weight functions are assigned such that the probability of selecting
a token from Pup or Pfprob is directly proportional to the number of tokens in
each. After a node has been rejuvenated, it goes back to the “robust” working
state, represented by place Prejuved . This is a duplicate place for Pup in order to
distinguish the nodes which are waiting to be rejuvenated from the nodes which
have already been rejuvenated. A node, after rejuvenation, is then allowed to
fail with the same rates as before rejuvenation even when another node is being
rejuvenated. Duplicate places for Pup and Pfprob are needed to capture this.
Node repair is disabled during rejuvenation. Rejuvenation is complete when the
sum of nodes in places Prejuved , Pfprobrejuv and Pnodefail2 is equal to the total
number of nodes, n. In this case, the immediate transition Timmd10 fires, putting
back all the rejuvenated nodes in places Pup and Pfprob . Rejuvenation stops
when there are a−1 tokens in place Pnodefail2 , to prevent a system failure. The
clock resets itself when rejuvenation is complete and is disabled when the system
is undergoing repair. Guard functions (g1 through g7) are assigned to express
complex enabling conditions textually.
In condition-based rejuvenation (Figure 7), rejuvenation is attempted only
when a node transits into the “failure probable” state. In practice, this degraded
state could be predicted in advance by means of analyses of some observable
system parameters [16]. In case of a successful prediction, assuming that no
other node is being rejuvenated at that time, the newly detected node can be
rejuvenated. A node is allowed to fail even while waiting for rejuvenation.
[Figure 7: SRN diagram; places include Pup, Pfprob, Pdetect, Pdetectfail, Prejuv, Pnodefail1, Pnodefail2 and Psysfail; timed transitions include Tfprob, Tnodefail1, Tnodefail2, Tnoderepair, Tcmode, Tsysrepair and Trejuv]
Fig. 7. SRN model of a cluster system employing condition-based rejuvenation
For the analyses, the following values are assumed. The mean times spent in
places Pup and Pfprob are 240 hrs and 720 hrs respectively. The mean times to
repair a node, to rejuvenate a node and to repair the system are 30 min, 10 min
and 4 hrs respectively. In this analysis, the common-mode failure is disabled and
node failure coverage is assumed to be perfect. All the models were solved using
the SPNP (Stochastic Petri Net Package) tool [22]. The measures computed
were expected unavailability and the expected cost incurred over a fixed time
interval. It is assumed that the cost incurred due to node rejuvenation is much
less than the cost of a node or system failure since rejuvenation can be done at
predetermined or scheduled times. In our analysis, we fix the value for costnodefail
at $5,000/hr and costrejuv at $250/hr. The value of costsysfail is computed as
the number of nodes, n, times costnodefail .
Figure 8 shows the plots for an 8/1 configuration (8 nodes including 1 spare)
system employing simple time-based rejuvenation. The upper and lower
plots show the expected cost incurred and the expected downtime (in hours)
respectively in a given time interval, versus rejuvenation interval (time between
successive rejuvenations) in hours. If the rejuvenation interval is close to zero, the
system is always rejuvenating and thus incurs high cost and downtime. As the
rejuvenation interval increases, both expected unavailability and cost incurred
decrease and reach an optimum value. If the rejuvenation interval goes beyond
the optimal value, the system failure has more influence on these measures than
rejuvenation. The analysis was repeated for 2/1, 8/2, 16/1 and 16/2 configu-
rations. For time-based rejuvenation, the optimal rejuvenation interval was 100
hours for the 1-spare clusters, and approximately 1 hour for the 2-spare clus-
ters. In our analysis of condition-based rejuvenation, we assumed 90% prediction
coverage. For systems that have one spare, time-based rejuvenation can reduce
downtime by 26% relative to no rejuvenation. Condition-based rejuvenation does
somewhat better, reducing downtime by 62% relative to no rejuvenation. How-
ever, when the system can tolerate more than one failure at a time, downtime is
reduced by 98% to 95% via time-based rejuvenation, compared to a mere 85%
for condition-based rejuvenation.
[Figure 8: two plots against Rejuvenation Interval (hours), from 0 to 600: Expected Cost (upper plot) and Expected Downtime (lower plot)]
Fig. 8. Results for an 8/1 cluster system employing time-based rejuvenation
3 Measurement Based Models for Software Rejuvenation

While all the analytical models are based on the assumption that the rate of
software aging is known, in the measurement based approach, the basic idea
is to monitor and collect data on the attributes responsible for determining the
health of the executing software. The data is then analyzed to obtain predictions
about possible impending failures due to resource exhaustion.
In this section we describe the measurement-based approach for detection
and validation of the existence of software aging. The basic idea is to periodi-
cally monitor and collect data on the attributes responsible for determining the
health of the executing software, in this case the UNIX operating system. Garg
et al. [16] propose a methodology for detection and estimation of aging in the
UNIX operating system. An SNMP-based distributed resource monitoring tool
was used to collect operating system resource usage and system activity data
from nine heterogeneous UNIX workstations connected by an Ethernet LAN at
the Department of Electrical and Computer Engineering at Duke University. A
central monitoring station runs the manager program which sends get requests
periodically to each of the agent programs running on the monitored work-
stations. The agent programs in turn obtain data for the manager from their
respective machines by executing various standard UNIX utility programs like
pstat, iostat and vmstat. For quantifying the effect of aging in operating system
resources, the metric Estimated time to exhaustion is proposed. The earlier work
[16] uses a purely time-based approach to estimate resource exhaustion times,
whereas the work presented in [44] takes into account the current system
workload as well.
A methodology based on time-series analysis to detect and estimate resource
exhaustion times due to software aging in a web server while subjecting it to
an artificial workload, is proposed in [31]. Avritzer and Weyuker [4] monitor
production traffic data of a large telecommunication system and describe a reju-
venation strategy which increases system availability and minimizes packet loss.
Cassidy et al. [7] have developed an approach to rejuvenation for large online
transaction processing servers. They monitor various system parameters over a
period of time. Using pattern recognition methods, they come to the conclusion
that 13 of those parameters deviate from normal behavior just prior to a crash,
providing sufficient warning to initiate rejuvenation.
3.1 Time-Based Estimation

In the time-based estimation method presented by Garg et al. [16], data was
collected from the UNIX machines at intervals of 15 minutes for about 53 days.
Time-ordered values for each monitored object are obtained, constituting a time
series for that object. The objective is to detect aging or a long term trend
(increasing or decreasing) in the values. Only results for the data collected from
the machine Rossby are discussed here.
First, the trends in operating system resource usage and system activity are
detected using smoothing of observed data by robust locally weighted regression,
proposed by Cleveland [16]. This technique is used to get the global trend be-
tween outages by removing the local variations. Then, the slope of the trend is
estimated in order to do prediction. Figure 9 shows the smoothed data super-
imposed on the original data points from the time series of objects for Rossby.
Amount of real memory free (plot 1) shows an overall decrease, whereas file table
size (plot 2) shows an increase. Plots of some other resources not discussed here
also showed an increase or decrease. This corroborates the hypothesis of aging
with respect to various objects.
[Figure 9: two panels plotted against Time: Real Memory Free and File Table Size]
Fig. 9. Non-parametric regression smoothing for Rossby objects
The seasonal Kendall test [16] was applied to each of these time series to
detect the presence of any global trends at a significance level, α, of 0.05. With
Zα = 1.96, all values are such that the null hypothesis (H0 ) that no trend exists
is rejected for the variables considered. Given that a global trend is present
and that its slope is calculated for a particular resource, the time at which the
resource will be exhausted because of aging only, is estimated. Table 1 refers to
several objects on Rossby and lists an estimate of the slope (change per day) of
the trend obtained by applying Sen’s slope estimate for data with seasons [16].
The values for real memory and swap space are in Kilobytes. A negative slope, as
in the case of real memory, indicates a decreasing trend, whereas a positive slope,
as in the case of file table size, is indicative of an increasing trend. Given the
slope estimate, the table lists the estimated time to failure of the machine due to
aging only with respect to this particular resource. The calculation of the time
to exhaustion is done by using the standard linear approximation y = mx + c.
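The slope estimate and the linear extrapolation can be sketched as follows. This is a simplified illustration, not the procedure of [16]: it implements the plain Sen estimator (the median of all pairwise slopes) rather than the seasonal variant named above, and it assumes that a "free" resource is exhausted when it reaches zero. The function names are hypothetical.

```python
from statistics import median


def sen_slope(times, values):
    """Median of pairwise slopes (plain, non-seasonal Sen estimator)."""
    slopes = [(values[j] - values[i]) / (times[j] - times[i])
              for i in range(len(times))
              for j in range(i + 1, len(times))
              if times[j] != times[i]]
    return median(slopes)


def time_to_exhaustion(initial, slope, limit):
    """Solve limit = slope * x + initial for x, i.e. y = m x + c."""
    if slope == 0:
        return float("inf")
    x = (limit - initial) / slope
    return x if x > 0 else float("inf")  # trend moves away from the limit


# Rossby, real memory free: initial value and slope as listed in Table 1.
print(round(time_to_exhaustion(40814.17, -252.00, 0.0), 2))
```

With the initial value and slope listed for real memory free on Rossby, the extrapolation gives about 162 days, in line with the corresponding entry of Table 1. The median of pairwise slopes is also far less sensitive to isolated spikes in the samples than a least-squares fit.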
A comparative effect of aging on different system resources can be obtained
from the above estimates. Overall, it was found that file table size and process
table size are not as important as used swap space and real memory free since they
have a very small slope and high estimated times to failure due to exhaustion.
Based on such comparisons, we can identify important resources to monitor and
manage in order to deal with aging related software failures. For example, the
resource used swap space has the highest slope and real memory free has the
second highest slope. However, real memory free has a lower time to exhaustion
than used swap space.
Table 1. Estimated slope and time to exhaustion for Rossby, Velum and Jefferson
objects
Resource Name        Initial Value   Max Value   Sen's Slope   95% Confidence       Estimated Time
                                                 Estimation    Interval             to Exh. (days)
Rossby
Real Memory Free     40814.17        84980       -252.00       -287.75 : -219.34    161.96
File Table Size      220             7110        1.33          1.30 : 1.39          5167.50
Process Table Size   57              2058        0.43          0.41 : 0.45          4602.30
Used Swap Space      39372           312724      267.08        220.09 : 295.50      1023.50
Jefferson
Real Memory Free     67638.54        114608      -972.00       -1006.81 : -939.08   69.59
File Table Size      268.83          7110        1.33          1.30 : 1.38          5144.36
Process Table Size   67.18           2058        0.30          0.29 : 0.31          6696.41
Used Swap Space      47148.02        524156      577.44        545.69 : 603.14      826.07
3.2 Time and Workload-Based Estimation
The method discussed in the previous subsection assumes that accumulated use
of a resource over a time period depends only on the elapsed time. However, it
is intuitive that the rate at which a resource is consumed is dependent on the
current workload. In this subsection, we discuss a measurement-based model to
estimate the rate of exhaustion of operating system resources as a function of
both time and the system workload [44]. The SNMP-based distributed resource
monitoring tool described previously was used for collecting operating system
resource usage and system activity parameters (at 10 min intervals) for over 3
months. Only results for the data collected from the machine Rossby are discussed
here. The longest stretch of sample points in which no reboots or failures
occurred was used for building the model. A semi-Markov reward model [42] is
constructed using the data. First, different workload states are identified using
statistical cluster analysis and a state-space model is constructed. Corresponding
to each resource, a reward function based on the rate of resource exhaustion in
the different states is then defined. Finally the model is solved to obtain trends
and the estimated exhaustion rates and time to exhaustion for the resources.
The following variables were chosen to characterize the system workload:
cpuContextSwitch, sysCall, pageIn, and pageOut. Hartigan’s k-means clustering
algorithm [21] was used for partitioning the data points into clusters based on
workload. The statistics for the eleven workload clusters obtained are shown
in Table 2. Clusters whose centroids were relatively close to each other and
those with a small percentage of data points in them, were merged to simplify
computations. The resulting clusters are W1 = {1, 2, 3}, W2 = {4, 5}, W3 = {6},
W4 = {7}, W5 = {8}, W6 = {9}, W7 = {10} and W8 = {11}.
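The clustering step can be illustrated with a small k-means routine. Note that this sketch uses the basic Lloyd iteration with caller-supplied starting centroids, not Hartigan's algorithm [21]; the data points are invented, and in practice the four workload variables would probably need to be standardized first because their magnitudes differ widely (an assumption, not something stated in [44]).

```python
def kmeans(points, centroids, iterations=50):
    """Lloyd's k-means on tuples; returns (centroids, groups)."""
    groups = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, groups


# Invented two-dimensional workload samples with two obvious clusters.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, groups = kmeans(points, [(0, 0), (10, 10)])
print(centers)
```

Merging clusters with nearby centroids or very few points, as described above, would then be a manual post-processing step on the returned centroids and group sizes.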
Table 2. Statistics for the workload clusters
Cluster   Cluster Center                                   % of
No.       cpuConSw     sysCall      pgOut     pgIn         pts.
1         48405.16     94194.66     5.16      677.83       0.98
2         54184.56     122229.68    5.39      81.41        0.76
3         34059.61     193927.00    0.02      136.73       0.93
4         20479.21     45811.71     0.53      243.40       1.89
5         21361.38     37027.41     0.26      12.64        7.17
6         15734.65     54056.27     0.27      14.45        6.55
7         37825.76     40912.18     0.91      12.21        11.77
8         11013.22     38682.46     0.03      10.43        42.87
9         67290.83     37246.76     7.58      19.88        4.93
10        10003.94     32067.20     0.01      9.61         21.23
11        197934.42    67822.48     415.71    184.38       0.93
Transition probabilities from one state to another were computed from data,
resulting in transition probability matrix P of the embedded discrete time
Markov chain. The sojourn time distribution for each of the workload states
was fitted to either 2-stage hyper-exponential or 2-stage hypo-exponential
distribution functions. The fitted distributions were tested using the Kolmogorov-
Smirnov test at a significance level of 0.01.
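Estimating the embedded chain from the measurements amounts to counting transitions between consecutive visits. The sketch below assumes that each sample has already been labeled with its workload state and that samples are equally spaced; it returns empirical transition probabilities and mean sojourn times measured in sampling intervals. It does not fit the hyper- or hypo-exponential distributions mentioned above, and the sample sequence is invented.

```python
from collections import Counter, defaultdict


def embedded_chain(samples):
    """Return (P, mean_sojourn) estimated from a sequence of state labels."""
    # Collapse runs of identical labels into visits: [state, run length].
    visits = []
    for state in samples:
        if visits and visits[-1][0] == state:
            visits[-1][1] += 1
        else:
            visits.append([state, 1])

    # Count transitions between consecutive visits and normalize each row.
    counts = defaultdict(Counter)
    for (src, _), (dst, _) in zip(visits, visits[1:]):
        counts[src][dst] += 1
    P = {src: {dst: n / sum(row.values()) for dst, n in row.items()}
         for src, row in counts.items()}

    # Mean sojourn time per state, in sampling intervals.
    lengths = defaultdict(list)
    for state, n in visits:
        lengths[state].append(n)
    mean_sojourn = {s: sum(v) / len(v) for s, v in lengths.items()}
    return P, mean_sojourn


samples = ["W1", "W1", "W2", "W1", "W3", "W3", "W3", "W1"]
P, mean_sojourn = embedded_chain(samples)
print(P)
print(mean_sojourn)
```

The first and last visits in a trace are censored by the observation window, so a more careful estimate would drop them or treat them separately.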
Two resources, usedSwapSpace and realMemoryFree, are considered for the
analysis, since the previous time-based analysis suggested that they are criti-
cal resources. For each resource, the reward function is defined as the rate of
corresponding resource exhaustion in different states. The true slope (rate of
increase/decrease) of a resource at every workload state is estimated by using
Sen’s non-parametric method [44]. Table 3 shows the slopes with 95% confidence
intervals.
It was observed that slopes in a given workload state for a particular resource
during different visits to that state are almost the same. Further, the slopes across
different workload states are different and, generally, the higher the system activity,
the higher the resource utilization. This validates the assumption that resource
usage does depend on the system workload and the rates of exhaustion vary
with workload changes. It can also be observed from Table 3 that the slopes
for usedSwapSpace in all the workload states are non-negative, and the slopes
for realMemoryFree are non-positive in all the workload states except in one.
It follows that usedSwapSpace increases whereas realMemoryFree decreases over
time which validates the software aging phenomenon.
The semi-Markov reward model was solved using the SHARPE tool [37] de-
veloped by researchers at Duke University. The slope for the workload-based esti-
mation is computed as the expected reward rate in steady state from the model.
The times to resource exhaustion is computed as the job completion time (mean
time to accumulate x amount of reward) of the Markov reward model. Table 4
gives the estimates for the slope and time to exhaustion for usedSwapSpace and
realMemoryFree. It can be seen that workload based estimations gave a lower
time to resource exhaustion than those computed using time based estimations.
Since the machine failures due to resource exhaustion were observed much before
the times to resource exhaustion estimated by the time based method, it follows
that the workload based approach results in better estimations.

Table 3. Slope estimates (in KB/10 min)

            usedSwapSpace                     realMemoryFree
State   Slope Est.   95% Conf. Interval   Slope Est.   95% Conf. Interval
W1      119.3        5.5 - 222.4          -133.7       -137.7 - -133.3
W2      0.57         0.40 - 0.71          -1.47        -1.78 - -1.09
W3      0.76         0.73 - 0.80          -1.43        -2.50 - -0.62
W4      0.57         0.00 - 0.69          -1.23        -1.67 - -0.80
W5      0.78         0.75 - 0.80          0.00         -5.65 - 6.00
W6      0.81         0.64 - 1.00          -1.14        -1.40 - -0.88
W7      0.00         0.00 - 0.00          0.00         0.00 - 0.00
W8      91.8         72.4 - 111.0         91.7         -369.9 - 475.2

Table 4. Estimates for slope (in KB/10 min) and time to exhaustion (in days) for
usedSwapSpace and realMemoryFree

                       usedSwapSpace                          realMemoryFree
Method of        Slope      95% Conf.       Est. Time   Slope      95% Conf.         Est. Time
Estimation       Estimate   Interval        to Exh.     Estimate   Interval          to Exh.
Time based       0.787      0.786 - 0.788   2276.46     -2.806     -3.026 - -2.630   60.81
Workload based   4.647      1.191 - 7.746   490.50      -4.1435    -9.968 - 2.592    41.38
3.3 Time Series and ARMA Models

In this section, a measurement-based approach based on time-series analysis to
detect software aging and to estimate resource exhaustion times due to aging in
a web server is described [31]. The experiments are conducted on an Apache web
server running on the Linux platform. Before carrying out other experiments,
the capacity of the web server is determined so that the appropriate workload
to use in the experiments can be decided. The capacity of the web server was
found to be around 390 requests/sec. In the next part of the experiment, the
web server was run without rejuvenation for a long time until the performance
degraded or until the server crashed. The requests were generated by httperf [33]
to get one of five specified files from the server of sizes 500 bytes, 5KB, 50KB,
500KB and 5MB. The corresponding probabilities that a given file is requested
are: 0.35, 0.5, 0.14, 0.009 and 0.001, respectively. During the period of running,
the performance measured by the workload generator and system parameters
collected by the Linux system monitoring tool, procmon, were recorded.
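The request mix described above can be sketched as a weighted random choice over the five files. This is only an illustration of the stated probabilities; it is not how httperf itself is configured, and the names `FILE_SIZES`, `REQUEST_PROBS` and `sample_requests` are hypothetical.

```python
import random
from collections import Counter

# File sizes and request probabilities as stated in the text.
FILE_SIZES = ["500 bytes", "5KB", "50KB", "500KB", "5MB"]
REQUEST_PROBS = [0.35, 0.5, 0.14, 0.009, 0.001]


def sample_requests(n, seed=None):
    """Draw n requested files according to the stated probabilities."""
    rng = random.Random(seed)
    return rng.choices(FILE_SIZES, weights=REQUEST_PROBS, k=n)


# Empirical frequencies of a sample of 10000 requests.
counts = Counter(sample_requests(10_000, seed=1))
for size in FILE_SIZES:
    print(size, counts[size] / 10_000)
```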
The first data set was collected in a 7-day period with a connection rate of
350 requests/sec. The second set was collected in a 25-day period with a connection
rate of 400 requests/sec. During the experiment, we recorded more than 100
parameters, but for our modeling purposes, six representative parameters per-
taining to system resources were selected (Table 5). In addition to the six system
status parameters, the response time of the web server, recorded by httperf on
the client machine, is also included in the model as a measure of performance of
the web server.
Table 5. Analyzed parameters and their physical meaning

Parameter            Physical meaning
PhysicalMemoryFree   Free physical memory
SwapSpaceUsed        Used swap space
LoadAvg5Min          Average CPU load in the last five minutes
NumberDiskRequests   Number of disk requests in the last five minutes
PageOutCounter       Number of pages paged out in the last five minutes
NewProcesses         Number of newly spawned processes in the last five minutes
ResponseTime         The interval from the time httperf sends out the first byte of
                     request until it receives the first byte of reply

Once collected, the data must be analyzed to determine whether software
aging exists, which is indicated by degradation in performance of the web server
and/or exhaustion of system resources. The performance of the web server is
measured by response time which is the interval from the time a client sends
out the first byte of request until it receives the first byte of reply. Figure 10(a)
shows the plot of the response time in data set I. To identify the trend, the
range of y-axis is magnified (Figure 10(b)). The response time becomes longer
with the running time of the experiment. To determine whether the trend is
just a fluctuation due to noise or an essential characteristic of the data, a linear
regression model is used to fit the time series of the response time. The least
squares solution is r = 15.5655 + 0.027t, where r is the response time in milliseconds
and t is the time in hours from the beginning of the experiment. The 95% confidence interval
for the slope is (0.019, 0.036) ms/hour. Since the slope is positive, it can be
concluded that the performance of the web server is degrading.
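The trend test above can be sketched as an ordinary least-squares fit followed by a confidence interval on the slope. The Python sketch below is an assumption-laden illustration: the function name `fit_trend` is hypothetical, the data are synthetic, and the interval uses a normal quantile as a large-sample stand-in for the Student t quantile, so it is slightly too narrow for short series.

```python
from math import sqrt
from statistics import NormalDist, fmean


def fit_trend(t, y, level=0.95):
    """Fit y = intercept + slope * t and return
    (intercept, slope, (slope_low, slope_high))."""
    n = len(t)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    t_bar, y_bar = fmean(t), fmean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    std_err = sqrt(sse / (n - 2) / sxx)
    # Normal approximation to the t quantile (assumption, see lead-in).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return intercept, slope, (slope - z * std_err, slope + z * std_err)


# Synthetic response times (ms) with a small alternating disturbance.
hours = list(range(100))
resp = [15.5655 + 0.027 * h + (0.01 if h % 2 == 0 else -0.01) for h in hours]
intercept, slope, (low, high) = fit_trend(hours, resp)
print(intercept, slope, low, high)
```

If the whole interval lies above zero, the upward trend is unlikely to be a noise artifact, which is the reasoning applied to the response time series.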
Applying the same analysis to the parameters related to system resources shows
that the available resources are decreasing. Estimated slopes of
some of the parameters using the linear regression model are listed in Table 6.
[Figure 10: two panels, (a) and (b), plotting response time (ms) against time (hours)]
Fig. 10. Response time of the web server

Table 6. Estimated slope of parameters

Data Set  Parameter             Slope             95% confidence interval
I         response time         0.027 ms/hour     (0.019, 0.036) ms/hour
I         free physical memory  -88.472 KB/hour   (-93.337, -83.607) KB/hour
I         used swap space       29.976 KB/hour    (29.290, 30.662) KB/hour
II        response time         0.063 ms/hour     (0.057, 0.068) ms/hour
II        free physical memory  15.183 KB/hour    (14.094, 16.271) KB/hour
II        used swap space       7.841 KB/hour     (7.658, 8.025) KB/hour

The parameters in data set II are used as the modeling objects since the
duration of data set II is longer than that of data set I. In this case, there are seven
parameters to be analyzed. The analysis can be done using two different
approaches: (1) building a univariate model for each of the outputs, or (2) building
only one multivariate model with seven outputs. In this case, seven univariate
models are built and then combined into a single multivariate model. First, the
parameters are analyzed to determine their characteristics and build an
appropriate model with one output and four inputs for each parameter: connection
rate, linear trend, periodic series with a period of one week, and periodic series
with a period of one day. The autocorrelation function (ACF) and the partial
autocorrelation function (PACF) for the output are computed. The ACF and
the PACF help us decide the appropriate model for the data [38]. For example,
from the ACF and PACF of used swap space it can be determined that an
autoregressive model of order 1 [AR(1)] is suitable for this data series. Adding the
inputs to the AR(1) model, we get the ARX(1) model for used swap space:
    Y_t = a Y_{t-1} + b_1 X_t + b_2 L_t + b_3 W_t + b_4 D_t,    (8)
where Y_t is the used swap space, X_t is the connection rate, L_t is the time step
which represents the linear trend, W_t is the weekly periodic series and D_t is the
daily periodic series. After observing the ACF and PACF of all the parameters,
we find that all of the PACFs cut off at certain lags. So all the multiple input
single output (MISO) models are of the ARX type, only with different orders.
This makes it convenient to combine them into a multiple input multiple
output (MIMO) ARX model, which is described later.
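To make equation (8) concrete, the sketch below estimates the coefficients of an ARX(1) model by least squares, the estimation method used in this section. It is a minimal pure-Python illustration, not the authors' implementation: the helper names `solve_linear`, `fit_arx1` and `predict_next` are hypothetical, the normal equations are solved by plain Gaussian elimination, and each row of `inputs` is assumed to hold the exogenous regressors (X_t, L_t, W_t, D_t) aligned with the output sample at the same index.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("regressors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_arx1(y, inputs):
    """Least-squares estimate of [a, b_1, ..., b_k] in
    y[t] = a * y[t-1] + sum_i b_i * inputs[t][i]."""
    if len(y) != len(inputs) or len(y) < 2:
        raise ValueError("y and inputs must be aligned and have length >= 2")
    # Regressor row for time t: previous output followed by current inputs.
    rows = [[y[t - 1], *inputs[t]] for t in range(1, len(y))]
    targets = y[1:]
    k = len(rows[0])
    # Normal equations: (R^T R) theta = R^T targets.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(k)]
    return solve_linear(gram, rhs)


def predict_next(coeffs, y_prev, input_row):
    """One-step-ahead prediction from the fitted coefficients."""
    a, *b = coeffs
    return a * y_prev + sum(bi * xi for bi, xi in zip(b, input_row))
```

A multi-step forecast can be obtained by feeding each prediction back in as `y_prev`, provided the future input rows are known or assumed. For long series or many regressors, a numerically more robust solver such as a QR-based least-squares routine would be preferable to the normal equations used here.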
In order to combine the MISO ARX models into a MIMO ARX model, we
need to choose the order between different outputs. This is done by inspecting
the CCF (cross-correlation function) between each pair of the outputs to find
out the leading relationship between them. If the CCF between parameter A and
B gets its peak value at a positive lag k, we say that A leads B by k steps and
it might be possible to use A to predict B. In our analysis, there are 21 CCFs
that need to be computed, one for each pair of the seven outputs. To reduce the
complexity, we only use
the CCFs that exhibit an obvious leading relationship with lags less than 10 steps.
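The lead/lag inspection can be sketched as follows. The sample cross-correlation at lag k is taken here as the correlation between A at time t and B at time t + k, so a peak at a positive k means that A leads B by k steps, matching the convention above. The function names and the default bound of 9 lags (lags less than 10) are illustrative assumptions.

```python
from math import sqrt
from statistics import fmean


def cross_correlation(a, b, lag):
    """Sample cross-correlation between a[t] and b[t + lag]."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    n = len(a)
    a_bar, b_bar = fmean(a), fmean(b)
    num = sum(
        (a[t] - a_bar) * (b[t + lag] - b_bar)
        for t in range(n)
        if 0 <= t + lag < n  # keep only index pairs inside both series
    )
    den = sqrt(
        sum((x - a_bar) ** 2 for x in a) * sum((x - b_bar) ** 2 for x in b)
    )
    if den == 0:
        raise ValueError("constant series has no defined correlation")
    return num / den


def leading_lag(a, b, max_lag=9):
    """Lag in [-max_lag, max_lag] where |CCF| peaks.
    A positive result k suggests that a leads b by k steps."""
    return max(
        range(-max_lag, max_lag + 1),
        key=lambda k: abs(cross_correlation(a, b, k)),
    )
```

With seven outputs, applying `leading_lag` to each of the 21 unordered pairs (for example via `itertools.combinations`) gives the candidate leading relationships; how pronounced a peak must be to count as "obvious" is left as a judgment call in this sketch.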
The next step after determination of the orders is to estimate the coefficients
of the model by the least squares method. The first half of the data is used to
estimate the parameters and the rest of the data is then used to verify the model.
Figure 11 shows the two-hour-ahead (24-step) predicted used swap space, which
is computed using the established model and the data measured up to two hours
before the predicted time point. From the plots, we can see that the predicted
values are very close to the measured values.

[Figure 11: measured and two-hour predicted used swap space (bytes, x 10^6) against time (hours)]
Fig. 11. Measured and two-hour ahead predicted used swap space
4 Implementation of a Software Rejuvenation Agent
The first commercial version of a software rejuvenation agent (SRA) for the IBM
xSeries line of cluster servers has been implemented with our collaboration [8,
26,45]. The SRA was designed to monitor consumable resources, estimate the
time to exhaustion of those resources, and generate alerts to the management in-
frastructure when the time to exhaustion is less than a user-defined notification
horizon. For Windows operating systems, the SRA acquires data on exhaustible
resources by reading the registry performance counters and collecting parameters
such as available bytes, committed bytes, non-paged pool, paged pool, handles,
threads, semaphores, mutexes, and logical disk utilization. For Linux, the agent
accesses the /proc directory structure and collects equivalent parameters such
as memory utilization, swap space, file descriptors and inodes. All collected pa-
rameters are logged on to disk. They are also stored in memory preparatory to
time-to-exhaustion analysis.
In the current version of the SRA, rejuvenation can be based on elapsed time
since the last rejuvenation, or on prediction of impending exhaustion. When
using Timed Rejuvenation, a user interface is used to schedule and perform re-
juvenation at a period specified by the user. It allows the user to select when
to rejuvenate different nodes of the cluster, and to select “blackout” times dur-
ing which no rejuvenation is to be allowed. Predictive Rejuvenation relies on
curve-fitting analysis and projection of the utilization of key resources, using
recently observed data. The projected data is compared to prespecified upper
and lower exhaustion thresholds, within a notification time horizon. The user
specifies the notification horizon and the parameters to be monitored (some pa-
rameters believed to be highly indicative are always monitored by default), and
the agent periodically samples the data and performs the analysis. The predic-
tion algorithm fits several types of curves to the data in the fitting window. These
different curve types have been selected for their ability to capture different types
of temporal trends. A model-selection criterion is applied to choose the “best”
prediction curve, which is then extrapolated to the user-specified horizon. The
several parameters that are indicative of resource exhaustion are monitored and
extrapolated independently. If any monitored parameter exceeds the specified
minimum or maximum value within the horizon, a request to rejuvenate is sent
to the management infrastructure. In most cases, it is also possible to identify
which process is consuming the preponderance of the resource being exhausted,
in order to support selective rejuvenation of just the offending process or a group
of processes.
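The predictive scheme described above (fit several curves to a window, pick the best by a model-selection criterion, extrapolate to the horizon, compare with thresholds) can be sketched as follows. The text does not specify the SRA's curve types or its selection criterion, so this sketch assumes just two candidates (constant and linear) and an AIC-style criterion; all function names are hypothetical.

```python
from math import log
from statistics import fmean


def fit_constant(t, y):
    """Constant model; returns (predict_function, number_of_parameters)."""
    level = fmean(y)
    return (lambda x: level), 1


def fit_linear(t, y):
    """Least-squares line; returns (predict_function, number_of_parameters)."""
    t_bar, y_bar = fmean(t), fmean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    intercept = y_bar - slope * t_bar
    return (lambda x: intercept + slope * x), 2


def aic(predict, n_params, t, y):
    """AIC-style score: lower is better (assumed criterion)."""
    n = len(y)
    sse = sum((yi - predict(ti)) ** 2 for ti, yi in zip(t, y))
    return n * log(max(sse, 1e-12) / n) + 2 * n_params


def should_rejuvenate(t, y, horizon, lower=None, upper=None):
    """Extrapolate the best-scoring curve to t[-1] + horizon and compare
    the projected value with the exhaustion thresholds.
    Returns (alert, projected_value)."""
    candidates = [fit_constant(t, y), fit_linear(t, y)]
    predict, _ = min(candidates, key=lambda c: aic(c[0], c[1], t, y))
    projected = predict(t[-1] + horizon)
    too_low = lower is not None and projected < lower
    too_high = upper is not None and projected > upper
    return (too_low or too_high), projected


# Hypothetical free-memory samples (KB) falling by 50 KB per step.
steps = list(range(10))
free_kb = [1000.0 - 50.0 * s for s in steps]
print(should_rejuvenate(steps, free_kb, horizon=10, lower=100.0))
```

Only the value at the end of the horizon is checked, which suffices for the monotone curves used here; richer curve families would need the whole projected segment examined. Each monitored parameter would be passed through such a check independently, as described above.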
5 Approaches and Methods of Software Rejuvenation

Software rejuvenation can be divided broadly into two approaches as follows:
– Open-loop approach: In this approach, rejuvenation is performed without
any feedback from the system. Rejuvenation in this case can be
based just on elapsed time (periodic rejuvenation) [25,12] and/or on the
instantaneous/cumulative number of jobs on the system [15].
– Closed-loop approach: In the closed-loop approach, rejuvenation is per-
formed based on information on the system “health”. The system is moni-
tored continuously (in practice, at small deterministic intervals) and data is
collected on the operating system resource usage and system activity. This
data is then analyzed to estimate the time to exhaustion of a resource, which
may lead to degradation or a crash of a component or of the entire system. This
estimation can be based purely on time, independent of workload [16,8], or
can be based on both time and system workload [44].
The closed-loop approach can be further classified based on whether the
data analysis is done off-line or on-line. Off-line data analysis is done based
on system data collected over a period of time (usually weeks or months).
The analysis is done to estimate time to rejuvenation. This off-line analysis
approach is best suited for systems whose behavior is fairly deterministic.
The on-line closed-loop approach, on the other hand, performs on-line anal-
ysis of system data collected at deterministic intervals. Another approach to
estimate the optimal time to rejuvenation could be based on system failure
data [11]. This approach is more suited for off-line data analysis.
This classification of approaches to rejuvenation is shown in Figure 12.
[Figure 12: classification tree]
Software rejuvenation
  Open-loop approach
    Elapsed time (periodic)
    Elapsed time and load
  Closed-loop approach
    Off-line
      Time-based analysis
      Time & workload-based analysis
      Failure data
    On-line
      Time-based analysis
      Time & workload-based analysis
Fig. 12. Approaches to software rejuvenation
Rejuvenation is a very general proactive fault management approach and can
be performed at different levels: the system level or the application level. An
example of system level rejuvenation is a hardware reboot. At the application
level, rejuvenation is performed by stopping and restarting a particular offending
application, process or a group of processes. This is also known as a partial
rejuvenation. The above rejuvenation approaches when performed on a single
node can lead to undesired and often costly downtime. Rejuvenation has been
recently extended for cluster systems, in which two or more nodes work together
as a single system [8,45]. In this case, rejuvenation can be performed with
little or no downtime by failing over applications to a spare node.
6 Conclusions
In this paper, we classified software faults based on an extension of Gray’s
classification and discussed the various techniques to deal with them. Attention was
devoted to software rejuvenation, a proactive technique to counteract the
phenomenon of software aging. Various analytical models of software aging, used to
determine optimal times to perform rejuvenation, were described.
Measurement-based models based on data collected from operating systems were also discussed.
The implementation of a software rejuvenation agent in a major commercial
server was then briefly described. Finally, various approaches to rejuvenation
and rejuvenation granularity were discussed.
In the measurement-based models presented in this paper, only aging due to
each individual resource has been captured. In the future, one could improve the
algorithm used for aging detection to involve multiple parameters simultaneously,
for better prediction capability and reduced false alarms. Dependences between
the various system parameters could be studied. The best statistical data analysis
method for a given system is also yet to be determined.
References
1. E. Adams. Optimizing Preventive Service of the Software Products. IBM Journal
of R&D, 28(1):2-14, January 1984.
2. P. E. Amman and J. C. Knight. Data Diversity: An Approach to Software Fault
Tolerance. In Proc. of 17th Int. Symp. on Fault Tolerant Computing, pages 122-126,
June 1987.
3. A. Avizienis and L. Chen. On the Implementation of N-version Programming for
Software Fault Tolerance During Execution. In Proc. IEEE COMPSAC 77, pp
149-155, November 1977.
4. A. Avritzer and E.J. Weyuker. Monitoring Smoothly Degrading Systems for In-
creased Dependability. Empirical Software Eng. Journal, Vol 2, No. 1, pp 59-77,
1997.
5. L. Bernstein. Text of seminar delivered by Mr. Bernstein. In University Learning
Center, George Mason University, January 29 1996.
6. A. Bobbio, A. Sereno and C. Anglano. Fine Grained Software Degradation Models
for Optimal rejuvenation policies. Performance Evaluation, Vol. 46, pp 45-62, 2001.
7. K. Cassidy, K. Gross and A. Malekpour. Advanced Pattern Recognition for De-
tection of Complex Software Aging in Online Transaction Processing Servers. In
Proc. Dependable Systems and Networks, DSN 2002, Washington D.C., June 2002.
8. V. Castelli, R. E. Harper, P. Heidelberger, S. W. Hunter, K. S. Trivedi, K.
Vaidyanathan and W. Zeggert. Proactive Management of Software Aging. IBM
Journal of R&D, Vol. 45, No.2, March 2001.
9. R. Chillarege, S. Biyani and J. Rosenthal. Measurement of Failure Rate in Widely
Distributed Software. In Proc. of 25th IEEE Int. Symp. on Fault Tolerant Com-
puting, pp 424-433, Pasadena, CA, July 1995.
10. T. Dohi, K. Goševa–Popstojanova and K. S. Trivedi. Analysis of Software Cost
Models with Rejuvenation. In Proc. of the 5th IEEE Int. Symp. on High Assurance
Systems Engineering, HASE 2000, Albuquerque, NM, November 2000.
11. T. Dohi, K. Goševa–Popstojanova and K. S. Trivedi. Statistical Non-Parametric
Algorithms to Estimate the Optimal Software Rejuvenation Schedule. Proc. of the
2000 Pacific Rim Int. Symp. on Dependable Computing, PRDC 2000, Los Angeles,
CA, December 2000.
12. S. Garg, A. Puliafito, M. Telek and K. S. Trivedi. Analysis of Software Rejuvenation
Using Markov Regenerative Stochastic Petri Net. In Proc. of the Sixth Int. Symp.
on Software Reliability Engineering, pp 180-187, Toulouse, France, October 1995.
13. S. Garg, Y. Huang, C. Kintala and K. S. Trivedi. Time and Load Based Soft-
ware Rejuvenation: Policy, Evaluation and Optimality. In Proc. of the First Fault-
Tolerant Symposium, Madras, India, December 1995.
14. S. Garg, Y. Huang, C. Kintala and K. S. Trivedi. Minimizing Completion Time of
a Program by Checkpointing and Rejuvenation. In Proc. 1996 ACM SIGMETRICS,
Philadelphia, PA, pp 252-261, May 1996.
15. S. Garg, A. Puliafito, M. Telek and K. S. Trivedi. Analysis of Preventive Main-
tenance in Transactions Based Software Systems. IEEE Trans. on Computers, pp
96-107, Vol.47, No.1, January 1998.
16. S. Garg, A. van Moorsel, K. Vaidyanathan and K. S. Trivedi. A Methodology for
Detection and Estimation of Software Aging. In Proc. of the Ninth Int. Symp.
on Software Reliability Engineering, pp 282-292, Paderborn, Germany, November
1998.
17. J. Gray. Why do Computers Stop and What Can be Done About it? In Proc. of
5th Symp. on Reliability in Distributed Software and Database Systems, pp 3-12,
January 1986.
18. J. Gray. A Census of Tandem System Availability Between 1985 and 1990. IEEE
Trans. on Reliability, 39:409-418, October 1990.
19. J. Gray and D. P. Siewiorek. High-Availability Computer Systems. IEEE Com-
puter, pages 39-48, September 1991.
20. B. O. A. Grey. Making SDI Software Reliable through Fault-tolerant Techniques.
Defense Electronics, pp 77–80,85–86, August 1987.
21. J. A. Hartigan. Clustering Algorithms. New York:Wiley, 1975.
22. C. Hirel, B. Tuffin and K. S. Trivedi. SPNP: Stochastic Petri Net Package. Version
6.0. B. R. Haverkort et al. (eds.): TOOLS 2000, Lecture Notes in Computer Science
1786, pp 354-357, Springer-Verlag Heidelberg, 2000.
23. J. J. Horning, H. C. Lauer, P. M. Melliar-Smith and B. Randell. A Program
Structure for Error Detection and Recovery. Lecture Notes in Computer Science,
16:177-193, 1974.
24. Y. Huang, P. Jalote and C. Kintala. Two Techniques for Transient Software Error
Recovery. Lecture Notes in Computer Science, Vol. 774, pp 159-170. Springer
Verlag, Berlin, 1994.
25. Y. Huang, C. Kintala, N. Kolettis and N. D. Fulton. Software Rejuvenation:
Analysis, Module and Applications. In Proc. of 25th Symp. on Fault Tolerant
Computing, pp 381-390, Pasadena, CA, June 1995.
26. IBM Netfinity Director Software Rejuvenation - White Paper. IBM Corporation,
Research Triangle Park, NC, January 2001.
27. P. Jalote, Y. Huang and C. Kintala. A Framework for Understanding and Handling
Transient Software Failures. In Proc. 2nd ISSAT Int. Conf. on Reliability and
Quality in Design, Orlando, FL, 1995.
28. J. C. Laprie, J. Arlat, C. Béounes, K. Kanoun and C. Hourtolle. Hardware and Soft-
ware Fault Tolerance: Definition and Analysis of Architectural Solutions. In Proc.
of 17th Symp. on Fault Tolerant Computing, pp 116-121, Pittsburgh, PA,1987.
29. J. C. Laprie (Ed.). Dependability: Basic Concepts and Terminology. Springer-
Verlag, Wien, New York, 1992.
30. I. Lee and R. K. Iyer. Software Dependability in the Tandem GUARDIAN System.
IEEE Trans. on Software Engineering, pp 455-467, Vol. 21, No. 5, May 1995.
31. L. Li, K. Vaidyanathan and K. S. Trivedi. An Approach to Estimation of Soft-
ware Aging in a Web Server. In Proc. of the Int. Symp. on Empirical Software
Engineering, ISESE 2002, Nara, Japan, October 2002 (to appear).
32. E. Marshall. Fatal Error: How Patriot Overlooked a Scud. Science, pp 1347, March
13 1992.
33. D. Mosberger and T. Jin. Httperf - A Tool for Measuring Web Server Performance.
In First Workshop on Internet Server Performance, WISP, Madison, WI, pp 59-67,
June 1998.
34. A. Pfening, S. Garg, A. Puliafito, M. Telek and K. S. Trivedi. Optimal Rejuvenation
for Tolerating Soft Failures. Performance Evaluation, 27 & 28, pp 491-506, October
1996.
35. D. K. Pradhan. Fault-Tolerant Computer System Design. Prentice Hall, Englewood
Cliffs, NJ, 1996.
36. S. M. Ross. Stochastic Processes. John Wiley & Sons, New York, 1983.
37. R. A. Sahner, K. S. Trivedi, A. Puliafito. Performance and Reliability Analysis
of Computer Systems - An Example-Based Approach Using the SHARPE Software
Package. Kluwer Academic Publishers, Norwell, MA, 1996.
38. R. H. Shumway and D. S. Stoffer. Time Series Analysis and Its Applications,
Springer-Verlag, New York, 2000.
39. K. Smith and M. Seltzer. File System Aging - Increasing the Relevance of File
System Benchmarks. In Proc. of ACM SIGMETRICS, June 1997.
40. M. Sullivan and R. Chillarege. Software Defects and Their Impact on System
Availability - A Study of Field Failures in Operating Systems. In Proc. 21st IEEE
Int. Symp. on Fault Tolerant Computing, pages 2–9, 1991.
41. A. T. Tai, S. N. Chau, L. Alkalaj, and H. Hecht. On-board Preventive Mainte-
nance: Analysis of Effectiveness and Optimal Duty Period. In Proc. of 3rd Int.
Workshop on Object-oriented Real-time Dependable Systems, Newport Beach, Cal-
ifornia, February 1997.
42. K. S. Trivedi, J. Muppala, S. Woolet and B. R. Haverkort. Composite Performance
and Dependability Analysis. Performance Evaluation, Vol. 14, Nos. 3-4, pp 197-
216, February 1992.
43. K. S. Trivedi. Probability and Statistics, with Reliability, Queuing and Computer
Science Applications, 2nd edition. John Wiley, 2001.
44. K. Vaidyanathan and K. S. Trivedi. A Measurement-Based Model for Estimation
of Resource Exhaustion in Operational Software Systems. In Proc. of the Tenth
IEEE Int. Symp. on Software Reliability Engineering, pp 84-93, Boca Raton, FL,
November 1999.
45. K. Vaidyanathan, R. E. Harper, S. W. Hunter, K. S. Trivedi. Analysis and Imple-
mentation of Software Rejuvenation in Cluster Systems. In Proc. of the Joint Int.
Conf. on Measurement and Modeling of Computer Systems, ACM SIGMETRICS
2001/Performance 2001, Cambridge, MA, June 2001.
46. http://www.apache.org
47. http://www.software-rejuvenation.com
Performance Validation of Mobile Software Architectures

Vincenzo Grassi (1), Vittorio Cortellessa (2), Raffaela Mirandola (1)

(1) Dipartimento di Informatica, Sistemi e Produzione
Università di Roma “Tor Vergata”, Italy
grassiv@acm.org, mirandola@info.uniroma2.it
(2) Dipartimento di Informatica
Università de L’Aquila, Italy
cortelle@univaq.it

Abstract. Design paradigms based on the idea of code mobility have been
recently introduced, where components of an application may (autonomously
or upon request) move to different locations during the application
execution. Besides, software technologies are readily available (e.g. Java-
based) that provide tools to implement these paradigms. Based on mobile
code paradigms and technologies, different but functionally equivalent
software architectures can be defined, and it is widely recognized that, in
general, the adoption of a particular architecture can have a large impact on
quality attributes such as modifiability, reusability, reliability, and
performance. Hence, validation against specific attributes is necessary and
calls for careful planning of this activity. Within this framework, the goal
of this tutorial is twofold: to provide a general methodology for the
validation of software architectures, where the focus is on the transition from
the modeling of software architectures to the validation of non-functional
requirements; to substantiate this general methodology in the specific case
of software architectures exploiting mobile code.

The pervasive deployment of large-scale networking infrastructures is vastly changing
the architecture of software systems and applications, leading to more and more
applications designed to operate in distributed wide area environments, thus introducing
new challenges to architects of scalable distributed applications. Indeed, the large
number of available hosts with very different capabilities, connected by networks with
varying capacities and loads, implies that the designer is unlikely to know a priori how
to structure the application in a way that best leverages the available infrastructure, and
that any assumption regarding the underlying physical system, which is made early at
the design time, is unlikely to hold later.
This highly heterogeneous and dynamic environment raises problems that could be
considered negligible in local area environments. As a consequence, technologies,
architectures and methodologies traditionally used to develop distributed applications in
local area environments, usually based on the notion of location transparency, exhibit
several limitations in wide area environments, and often fail in providing the desired
quality level. On the contrary, location awareness has been suggested as an innovative
approach in the design of software applications for wide area environments, to deal, from
the early design phases, with the characteristics and constraints of the different locations.
Explicitly considering component location at the application level straightforwardly
leads to exploiting location change as a new dimension in the design and
implementation of distributed applications. Indeed, mobile code design paradigms, based
on the ability of moving code across the nodes of a network, have been recently
introduced. Besides, software technologies are readily available (e.g. Java-based) that
provide tools to implement these paradigms, so that both have become a central part of
the toolset supporting the design of applications for wide area environments.
Code mobility, as it is intended in this perspective, should not be confused with the
well known concept of process migration, even if the adopted mechanisms to
implement them may be similar. Process migration is a (distributed) OS issue, realized
transparently to the application (usually to get load balancing), and hence does not
represent a tool in the hands of the application designer; on the contrary, code mobility
is intended to bring the ability of changing location under the control of the designer,
so representing a new tool he/she can exploit to accomplish quality requirements,
laying the foundation for a new generation of technologies, architectures, models, and
applications.

M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 346−373, 2002.

Using mobile code paradigms and technologies, different but functionally equivalent
software architectures can be designed and implemented, and it is widely recognized that,
in general, the adoption of a particular architecture can have a large impact on quality
attributes of a distributed application such as modifiability, reusability, reliability, and
performance [33]. In particular, with respect to performance, code mobility offers to
application designers new latitudes in using the systems resources. No longer remote
resources must be accessed remotely; instead, (part of) the application can move to use
the resources locally. Under the right circumstances, this can reduce both network traffic
and network protocol overhead, so reducing the total amount of work done by the
system, and improving the performance of the entire system. On the other hand, under
the wrong circumstances, the entire system slows down, e.g. because of excessive
migration traffic, or increased load at already congested nodes. Hence, validation of
mobility-based architectures against specific performance attributes is necessary, and
calls for a careful planning of this activity.
The goal of this tutorial is twofold: to provide a general methodology for the
validation of software architectures, where the focus is on the transition from the
modeling of software architectures to the validation of non-functional requirements; to
show how this general methodology can be substantiated in the specific case of software
architectures exploiting mobile code. We emphasize the former point in section 2,
where we provide a taxonomy of the parameters every approach to software architecture
validation should depend on. Then, for the latter point, we review approaches for the
validation of mobile software architectures, presenting them in the framework of the
above mentioned taxonomy. To provide a basic understanding of the features and
performance related costs of different mobile code styles, we first give in section 3 a
basic taxonomy of these styles, and then, in section 4, we survey methodologies for
performance validation of mobile software architectures. We classify these
methodologies as ad-hoc and general-purpose methodologies. Ad-hoc methodologies
consider code mobility “in isolation”, lacking of features to model a whole software
application. General-purpose methodologies overcome this limitation by embedding
code mobility modeling into some formalism for the specification of software
applications. Finally, section 5 concludes the paper and provides hints for future
research.
2 Non-Functional Requirements Validation

The validation of software architectures can be performed versus functional and/or
non-functional requirements (NFR). Approaches basically differ in the two cases, as the
former are statements of services the software system should provide, how it should
react to particular inputs and behave in particular situations, whereas NFR are
constraints on the services offered by the software system that affect the software
quality.
Although validation is now accepted as a crucial activity in software development,
NFR are still often neglected. Economic reasons (such as short time to market and
special skills required) and practical reasons (non-functional software aspects are often
determined by the latest decisions in the lifecycle, such as the hardware configuration
hosting the software system) contribute to the reluctance of the software development
world to adopt an engineered approach to the validation of NFR. A considerable
effort has been spent in the last few years in order to fill this gap between software
development and validation versus NFR. Beyond every particular approach to the
problem, a common ground to work on can be envisaged in the following two issues:
(i) determining the amount and the type of information to embed in a software
architecture in order to enable its validation versus NFR, (ii) introducing algorithms to
translate architecture description languages/notations (augmented with additional
information) into a model ready to be validated. Various approaches have been recently
introduced for both issues (see [3] for an overview on this topic). Two attributes
appear today to be crucial for any approach to the validation of NFR to be realistically
accepted by the software community: transparency, i.e., minimal impact on
the software notation and the software process adopted (to cope with issue (i)), and
effectiveness, i.e., low complexity algorithms to annotate and transform software
models (to cope with issue (ii)).
We propose here a classification of the parameters that all these approaches have to
deal with (mainly aimed at providing guidelines for a structured approach to NFR
validation), and we show how an NFR validation approach can be seen as an
instance of this framework. From section 4 on we focus on a particular class of
instances, namely the approaches to the performance validation of mobile software
architectures.
The parameters of a methodology to validate a software architecture versus
non-functional requirements can be expressed as follows:
architectural style (AS) – the style, if any, adopted to build the software architecture
(e.g., client-server, mobile code, etc.¹);
original notation (ON) – the architectural description language/notation used to
model the software architecture, as it comes from the software development process (e.g.,
UML, a Process Algebra, etc.);
non-functional attribute (NFA) – the non-functional attribute that is concerned with
the set of requirements that the software architecture must fulfill (e.g., reliability,
performance, safety, etc.);
missing information (MI) – the information that is lacking in the software
architecture description and is crucial for the type of validation that is pursued
(e.g., number of invocations of a component within a certain scenario, mapping of
components to platform sites, etc.²);
collection technique (CT) – the technique adopted to collect the missing information
(e.g., prototype execution, retrieval from a repository of projects);
target model notation (TMN) – the notation adopted for representing the model
whose solution provides the non-functional attribute values useful for the validation
task (e.g., Petri Nets, Queueing Networks, etc.);
solution technique (ST) – the technique adopted to process the target model and
obtain a numerical solution (e.g., simulation, analytical, etc.).
¹ Note that an architectural style is defined as a set of construction rules that a developer has
to follow while building a software architecture. Depending on the style, those rules can
cover different aspects, such as: types of interactions among components, roles of
components, types of connectors, etc. In our case, we focus on architectural styles defined
on the capability of components to move.
Every validation approach can be reduced to an assignment of values to the above
parameters; therefore it can be regarded as an instance of the framework that we are
outlining. For example, in a Bayesian approach to the reliability validation of a
UML-based software architecture, in which the operational profile and the failure
probabilities are missing, the following values for the above parameters may be devised:
architectural style (AS) = “don't care”;
original notation (ON) = Unified Modeling Language;
non-functional attribute (NFA) = reliability;
missing information (MI) = operational profile, failure probabilities;
collection technique (CT) = repository (operational profile), unit testing (failure
probabilities);
target model notation (TMN) = Bayesian stochastic model;
solution technique (ST) = numerical simulation.
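As an illustration only, an assignment of values to the seven parameters can be recorded as a
small data structure. The following Python sketch is not part of any methodology discussed in
this tutorial; the class name, field names and the helper uncollected are hypothetical and
chosen here just to mirror the parameter list and the Bayesian example above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ValidationApproach:
    """One instance of the framework: a value for each of the seven parameters."""
    architectural_style: str        # AS
    original_notation: str          # ON
    non_functional_attribute: str   # NFA
    missing_information: tuple      # MI: names of the missing items
    collection_technique: dict      # CT: missing item -> technique used to collect it
    target_model_notation: str      # TMN
    solution_technique: str         # ST

    def uncollected(self):
        """Missing-information items for which no collection technique is given."""
        return [item for item in self.missing_information
                if item not in self.collection_technique]


# The Bayesian reliability example, transcribed from the list above.
bayesian = ValidationApproach(
    architectural_style="don't care",
    original_notation="Unified Modeling Language",
    non_functional_attribute="reliability",
    missing_information=("operational profile", "failure probabilities"),
    collection_technique={
        "operational profile": "repository",
        "failure probabilities": "unit testing",
    },
    target_model_notation="Bayesian stochastic model",
    solution_technique="numerical simulation",
)

for f in fields(bayesian):
    print(f"{f.name}: {getattr(bayesian, f.name)}")
```

Keeping CT as a mapping from each missing item to its technique makes it easy to check that
every item listed under MI has a way of being collected.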
Obviously the choice of a parameter value is not always independent of the
choices made for the other ones. In many cases the domain of a choice is restricted to a
subset of potential values as a consequence of another parameter assignment. For example,
in case a reliability validation has to be performed (i.e., NFA = reliability), it is quite
inconvenient to choose a Queueing Model as target (i.e., TMN = Queueing Model),
because queues are suitable to represent delays and contention, but work badly for
combining failure probabilities. Therefore, although a potential domain for every
parameter can be defined, in practice the choices, as they are progressively made, may
mutually restrict one another's domains.
In figure 1 we propose a dependency graph, where each node represents one of the
above parameters. Two types of edges (i,j) are introduced, both representing a
dependency between parameters i and j, with the following semantics:
weak dependency (dashed arrow) – it would be better to choose the value of j after
choosing the value of i; this means that the value assigned to i helps the validation
team to better understand which would be the more appropriate choice for j;
strong dependency (continuous arrow) – a value must be assigned to j after assigning
a value to i; this means that, without knowing the value of i, the validation team
cannot perform the choice of j.
² Usually the missing information appears (in the whole approach) either as annotations on
the available software architecture description or as an integration of the description itself
(in the latter case, for example, as an extension of a software connector).
[Figure: dependency graph with nodes original notation (ON), architectural style (AS),
non-functional attribute (NFA), missing information (MI), target model notation (TMN),
collection technique (CT), solution technique (ST)]
Figure 1. Graph of dependencies among parameters for NFR validation
Between ON and MI in figure 1 there is a strong dependency, as the software model
notation determines the set of items and relationships that are available to model the
architecture, and hence also determines the set of missing items and relationships, depending
on the type of validation to perform (NFA). The same type of dependency occurs
between NFA and MI, as the information lacking in the architectural design may
heavily differ depending on the type of non-functional attribute to validate. As seen in
the example above, instead, the dependency between AS and MI can be considered a
weak one, because knowing the architectural style may help to determine the missing
information but, in some cases, may not affect this selection at all (e.g., the architectural
style has almost no relationship with the information to be added to UML diagrams in
order to perform a reliability analysis of a component based software system). The
dependency between AS and ON can be considered weak in a similar way, because the
architectural style may drive software developers to choose the notation that best suits
the constraints of that style.
On the other hand, the dependency between TMN and ST has been represented as a
weak one because a certain type of model can be solved (in almost all cases) by different
techniques with different solution process complexity. Therefore, in this case, it
would be better to delay the choice of a solution technique until after the selection of the
model notation, in order to be able to use the technique with the lowest complexity.
Analogous considerations can be made about the dependency of CT on MI: if we
know what type of information has to be collected, then we can devise a much more
effective technique for the CT task. Finally, there is a weak dependency between NFA and
TMN, derived from the consideration that the same non-functional attribute can be
validated using different types of model (e.g. Petri Nets and Queueing Models are
suitable models for performance evaluation), but the complexity of the validation
process may heavily change when using one notation rather than another, and this depends
on the specific non-functional requirements under validation.
For any pair of parameters (i,j) without a connecting path in the graph of figure 1,
no evident dependency occurs, namely they can be concurrently chosen because one
value does not bring any information on the other one. For example, there is no
dependency between CT and TMN, as the way we collect the missing information is
not affected by the type of target model notation, which affects, instead, the way we
represent that information.
With this classification we have introduced a partial order in the choices that a
validation team has to perform in order to accomplish its validation task. Essentially
three primary parameters have been identified in figure 1, namely AS, NFA and ON
(note that the dependency between AS and ON, being weak, does not always hold). This
matches almost all the practical situations where, starting from an architectural
language/notation ON (whose choice should be influenced by the architectural style
AS), and having in mind a non-functional attribute to validate (NFA), all the remaining
choices about the target model and the software annotations have to be made.
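The partial order just described can be made concrete with a topological sort. The Python
sketch below is only an illustration: the edge list is our reading of the dependencies
discussed in the text (the drawing of figure 1 is not reproduced here), and the names EDGES,
choice_order, depends_on and independent are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

PARAMETERS = ["AS", "ON", "NFA", "MI", "CT", "TMN", "ST"]

# (i, j, kind): the choice of j depends on the choice of i.
EDGES = [
    ("ON", "MI", "strong"),
    ("NFA", "MI", "strong"),
    ("AS", "MI", "weak"),
    ("AS", "ON", "weak"),
    ("MI", "CT", "weak"),
    ("NFA", "TMN", "weak"),
    ("TMN", "ST", "weak"),
]


def choice_order(include_weak=True):
    """One order of choices compatible with the (strong, optionally weak) dependencies."""
    sorter = TopologicalSorter({p: set() for p in PARAMETERS})
    for i, j, kind in EDGES:
        if include_weak or kind == "strong":
            sorter.add(j, i)  # i must be chosen before j
    return list(sorter.static_order())


def depends_on(j, i):
    """True if a directed path leads from i to j."""
    frontier, seen = [i], set()
    while frontier:
        node = frontier.pop()
        for src, dst, _ in EDGES:
            if src == node and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return j in seen


def independent(a, b):
    """True if neither parameter is reachable from the other."""
    return not depends_on(a, b) and not depends_on(b, a)


print(choice_order())
print(independent("CT", "TMN"))
```

Passing include_weak=False keeps only the two strong edges, which is the minimum ordering a
validation team has to respect.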
Once all the parameters have been determined, what remains to do is to introduce a
methodology for the translation of the architecture description language/notation, with
additional information annotated on it, into the target model ready to be validated. In the
framework of the classification scheme of figure 1 such a translation algorithm
can be seen as ON + MI → TMN. Any translation methodology should match the
attributes remarked at the beginning of this section, that is minimal impact on the
software process and effectiveness. Several proposals have been formulated in the last
few years for translating widely used software notations into target models, but there is
still considerable room for further improvements and extensions.
The aim of this tutorial is to look, within the framework introduced in this section,
at the software validation approaches having AS in the domain of mobile software
architectures and NFA in the domain of performance (see section 4).
3 Mobile Code Paradigms
The definition of the software architecture of an application, that is its coarse-grained
organization in terms of components and interactions among them, represents one of
the first and crucial steps in the design stage [5]. Software architectures can be classified
according to the adopted design paradigms (or architectural styles), each of them
characterized by specific architectural abstractions/rules and reference structures, that can
then be instantiated into actual architectures. Client-server is a traditional example of
design style. In this perspective, different mobile code styles can be identified, each
characterized by different interaction patterns among components located at different
sites, and the available technologies for code mobility provide the mechanisms to
instantiate them. A review of these technologies is out of the scope of this paper (see
[13, 17] for a review of notable examples), whereas in this section we present a
taxonomy of mobile code styles, aimed at providing a basic understanding of their
features and performance related costs, that will be exploited in the subsequent
presentation of performance validation methodologies for mobile architectures³.
The taxonomy is largely inspired by the ones presented in [13, 29], and is based on
the decomposition of distributed applications into code components (the know-how to
perform a computation), resource components (references to resources needed to
perform a computation), state components (comprising private data as well as control
information that identifies a thread of execution, such as the call stack and instruction
pointer), interactions (events involving two or more components, like exchanging a
message), and sites (locations where processing takes place).
All the basic mobile code styles included in this taxonomy consider a single
interaction between components residing at two different sites, aimed at carrying out a
given operation. They differ in the distribution of components at the two sites at the
beginning of the interaction, in the interaction pattern, and in the distribution of
components at the end of the interaction, as shown in table 1, where A and B denote
the components that participate in the interaction, C and R denote the code and
resources (parameters) needed to perform the operation, while L1 and L2 are two different
sites.
³ Of course, knowledge of the characteristics of a particular mobile code technology would be
necessary to finely tune, in the late phases of the development cycle, the performance model
of mobile software architecture.
The styles identified are: remote execution (REX), code on demand (COD), and
mobile agent (MA). In all the styles, component A initiates the interaction.
Table 1. Mobile code styles

Style               before interaction      after interaction
                    L1         L2           L1         L2
REX                 A, C, R    B            A, C, R    B, C, R
COD                 A, R       B, C         A, C, R    B, C
(weak/strong) MA    A, C, R    B            -          A, C, B, R
In the REX⁴ style, both the code and the parameters needed to perform the operation
are present at the A site, which ships both of them to the B site, requesting B to perform
the operation on its behalf (also exploiting other code and resource components already
present at site L2). In general, a reply could be sent to A at the completion of the
operation. Java servlets (“push” case) [37] and the REV scheme [34] are
implementations of this style. The COD style is in some sense the complement of REX,
since in this case it is B that owns the code C and ships it to A on its request. Java
applets [36] are an implementation of this style.
In the two styles examined so far only data and “passive” pieces of code are sent
from one location to another, to redirect the location of processing, while the
location of the active components (in particular, their state component that identifies
their thread of execution) remains fixed. On the other hand, in the MA style, an active
component moves itself (i.e. its state component) together with the needed code and
parameters to the B site, to exploit locally the resources of that site. A further
distinction can be made between a weak and a strong MA style, where in the former
only the data state (i.e., private variables) is transferred, while in the latter the
execution state (i.e., instruction pointer and control stack) is also transferred. In the case
of strong MA, the transferred component can immediately resume its execution at the new
site from the exact point where it was stopped, at the expense of freezing, packing and
transferring all the computation state, which could be quite heavy. In the case of weak
MA, the amount of transferred information (and the work done to capture it) may be
much smaller, but some method must be devised to decide, on the basis of the encoded
information, where to restart execution after migration. Several technologies that
implement weak and strong MA paradigms are reviewed in [13, 17].
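The three styles can be summarized operationally as a change in the placement of components
at the two sites. The following Python sketch is only an illustration of the taxonomy: it
encodes the "before interaction" placements of table 1 and derives the "after interaction"
placements from the description of each style. The names BEFORE, after_interaction and
transferred_state are hypothetical.

```python
# Placement of components at sites (L1, L2) before the interaction, as in table 1.
# A, B: interacting components; C: code; R: resources (parameters).
BEFORE = {
    "REX": ({"A", "C", "R"}, {"B"}),
    "COD": ({"A", "R"}, {"B", "C"}),
    "MA": ({"A", "C", "R"}, {"B"}),
}


def after_interaction(style):
    """Return the placement (L1, L2) at the end of a single interaction."""
    if style not in BEFORE:
        raise ValueError(f"unknown mobile code style: {style}")
    l1, l2 = (set(site) for site in BEFORE[style])
    if style == "REX":
        l2 |= {"C", "R"}   # A ships code and parameters to B
    elif style == "COD":
        l1 |= {"C"}        # B ships the code to A on request
    else:                  # MA: A moves itself with code and parameters
        l2 |= l1
        l1 = set()
    return l1, l2


def transferred_state(kind):
    """State moved with a mobile agent: weak MA moves data state only."""
    if kind == "weak":
        return {"data state"}
    if kind == "strong":
        return {"data state", "execution state"}
    raise ValueError(f"unknown MA kind: {kind}")


for style in BEFORE:
    print(style, after_interaction(style))
```

In REX and COD only copies of passive components are shipped, so the sending site keeps
them; in MA the active component itself leaves L1.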
4 Performance Validation of Mobile Software Architectures

In this section we propose a classification of those performance validation approaches
proposed in the literature that apply to mobile software architectures. In the perspective
of the framework in section 2, these are the approaches where AS ranges over the
mobility-based styles and NFR over the performance requirements. All the remaining
parameters may freely vary among different approaches, and we extend this classification
to show, where feasible, the values they assume in each approach instance.
⁴ We prefer to call this style “remote execution (REX)” as in [29] instead of the often used
denomination “remote evaluation (REV)” to avoid ambiguity with the REV scheme proposed
in [34], which is a particular mechanism that implements this style.
The approaches presented here are partitioned into two main classes: ad-hoc and
general-purpose methodologies.
The contribution of ad-hoc methodologies consists of cost models for a single
interaction between components, for different mobile code paradigms. The cost models
are provided either as closed-form analytic expressions, or as dynamic models (Petri nets)
which are numerically evaluated.
General-purpose methodologies can be further classified as methodologies based on
formal specifications of the software behavior (process algebras), and methodologies
based on semi-formal specifications (Unified Modeling Language). For both cases we
highlight the advantages and present some methodologies proposed in the literature to
generate a performance model starting from different notations for mobility-based
architectural models.
4.1 Ad-hoc Models
Ad-hoc models consider code mobility “in isolation”, providing cost models for a single
interaction between components, for different mobile code styles. From our validation
framework viewpoint, they consider NFA ∈ {total network load, total processing
time}, while the adopted TMN consists of either closed-form analytic expressions, or
dynamic models that can be numerically evaluated. Because they lack features to
model a whole application, ad-hoc models cannot be considered as general validation
methodologies. However, they help to clarify the dependencies between (AS,
NFA) and MI, when AS ranges over mobility-based styles, by giving insights about
the quantities that affect the selected NFAs, and hence about the information that should
be collected in any validation methodology for these attributes.
We review ad-hoc models proposed in the literature⁵, presenting all of them in a
unified scenario, consisting of a single “interaction session” between two partners (A
and B) residing at different locations, with A requesting from B the execution of an
operation that can be articulated in N “low level” requests, and corresponding
(intermediate) results.
Closed-Form Models. We present all the models reviewed in this section as
special instances of general closed-form expressions for the average total network load,
and the average total processing time, denoted as L^X and T^X respectively, with
X ∈ {REX, COD, MA}.
C o m m o n p a ra m e te rs u s e d in a ll th e c lo s e d -fo rm s (h e n c e re p re s e n tin g th e m is s in g
in fo rm a tio n M I fo r th e c o n s id e re d m e a s u re s ) a re :
re q : a v e ra g e s iz e (in b y te s ) o f a “ lo w le v e l” o p e ra tio n re q u e st
re p : a v e ra g e s iz e (in b y te s ) o f a “ lo w le v e l” re s u lt
X 6 X
: c o m m u n ic a tio n o v e rh e a d ; B r e q : a v e ra g e s iz e o f a s in g le re q u e s t;
5
M o s t o f th e p a p e rs c o n s id e re d in th is s e c tio n a ls o p re s e n t m o d e ls fo r th e c lie n t-s e rv e r
s ty le , fo r th e s a k e o f c o m p a r is o n w i th c o d e m o b ility s ty le s .
6
T h is c o e ffic ie n t ta k e s in to a c c o u n t th e o v e rh e a d c a u s e d b y a d d itio n a l in fo rm a tio n n e e d e d
X
fo r c o n n e c tio n s e t u p a n d m e s s a g e e n c a p s u la tio n ; in g e n e ra l, th e c o e ffic ie n t, X ∈ { C O D ,
X
B r e p : a v e ra g e s iz e o f a s in g le re p ly ;
X
: n e tw o rk th ro u g h p u t (in b y te s /s e c ); : a v e ra g e n e tw o rk la te n c y ;
X
M : a v e ra g e m a rs h a llin g /u n m a rs h a llin g tim e o f a re q u e s t/re p ly ;
T r Xe q : a v e r a g e p r o c e s s i n g t i m e ( f o r A ) o f a r e q u e s t ;
X
T r e p : a v e ra g e p ro c e s s in g tim e (fo r B ) o f a re p ly ;
X X
: s e m a n tic c o m p re s s io n fa c to r fo r re p lie s (0 < ≤ 1 ).
Other parameters, used only in some closed forms, are listed in the following.

REX style. A assembles the original requests into fewer than N "high-level" operation requests (at most, they are all assembled into a single operation), sends them together with the corresponding code to B, and gets the corresponding replies. Closed-form expressions for the network load and the processing time are:

$$L^{REX} = R^{REX}\,\alpha^{REX}\left(B_{code}^{REX} + \sigma^{REX} B_{rep}^{REX}\right)$$

$$T^{REX} = \frac{1}{\tau^{REX}}\,L^{REX} + R^{REX}\left(\lambda^{REX} + M^{REX} + T_{req}^{REX} + T_{rep}^{REX}\right)$$

where:
$R^{REX}$: number of "high-level" operation requests needed to complete the interaction;
$B_{code}^{REX}$: average size of the code of a high-level operation sent to B for remote evaluation.
Table 2. Proposed parameter instantiations in closed forms for the REX style

Ref. | $R^{REX}$ | $\alpha^{REX}$ | $B_{code}^{REX}$ | $B_{rep}^{REX}$ | $\sigma^{REX}$ | $\lambda^{REX}$ | $M^{REX}$ | $T_{req}^{REX}$ | $T_{rep}^{REX}$
[2] | 1 | > 1 | > 0 | N·rep | 1, 1/N | - | - | - | -
[21]^7 | ≥ 1, < N | > 1 | > 0 (req·N/$R^{REX}$) | > 0 (rep·N/$R^{REX}$) | 1 | 0 | > 0 | > 0 | > 0
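As a rough numerical illustration, the REX closed forms can be evaluated directly. The Python sketch below assumes the reading L = R·α·(B_code + σ·B_rep) and T = L/τ + R·(λ + M + T_req + T_rep); the field names (`alpha`, `sigma`, `tau`, `lam`) and the sample values are assumptions made for this example, not figures taken from the cited papers.

```python
from dataclasses import dataclass


@dataclass
class RexParams:
    """Parameters of the REX closed forms (sizes in bytes, times in seconds)."""
    R: float       # number of high-level operation requests
    alpha: float   # communication overhead coefficient (assumed name)
    B_code: float  # average size of the code of a high-level operation
    B_rep: float   # average size of a single reply
    sigma: float   # semantic compression factor, 0 < sigma <= 1 (assumed name)
    tau: float     # network throughput in bytes/sec (assumed name)
    lam: float     # average network latency (assumed name)
    M: float       # average marshalling/unmarshalling time
    T_req: float   # average processing time of a request
    T_rep: float   # average processing time of a reply


def rex_network_load(p: RexParams) -> float:
    """Average total network load: R * alpha * (B_code + sigma * B_rep)."""
    return p.R * p.alpha * (p.B_code + p.sigma * p.B_rep)


def rex_processing_time(p: RexParams) -> float:
    """Average total time: load / tau + R * (lam + M + T_req + T_rep)."""
    return rex_network_load(p) / p.tau + p.R * (p.lam + p.M + p.T_req + p.T_rep)


# Hypothetical instantiation in the spirit of the [2] row of Table 2:
# a single high-level request (R = 1) whose reply carries N = 10 low-level results.
example = RexParams(R=1, alpha=1.1, B_code=2000, B_rep=10 * 500, sigma=0.1,
                    tau=1e6, lam=0.01, M=0.001, T_req=0.02, T_rep=0.005)
print(rex_network_load(example), rex_processing_time(example))
```

Keeping the parameters in one record makes it easy to compare the instantiations proposed by different papers by changing only the values passed in.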
COD style. The scenario modeled by the following closed forms assumes that A requests from B the execution of fewer than N high-level operations; to model a COD-based interaction, we assume that B executes the operations if the needed code is present at its site, and otherwise fetches the code from some other site. Closed-form expressions for the network load and the processing time are:

$$L^{COD} = R^{COD}\,\alpha^{COD}\left(B_{req}^{COD} + P_{code}^{COD}\left(B_{fetch}^{COD} + B_{code}^{COD}\right) + \sigma^{COD} B_{rep}^{COD}\right)$$

$$T^{COD} = \frac{1}{\tau^{COD}}\,L^{COD} + R^{COD}\left(\lambda^{COD} + M^{COD} + T_{req}^{COD} + T_{rep}^{COD}\right)$$

where:
$R^{COD}$: average number of "high-level" operations needed to complete the interaction;
$P_{code}^{COD}$: probability that the code for a high-level operation is not already present at the location of B;
$B_{fetch}^{COD}$: average size of the request for the code of a high-level operation sent by B;
$B_{code}^{COD}$: average size of the code of a high-level operation.

REX, MA$\}$, may depend on the size of the data exchanged in the communication, that is, the overhead coefficient for data of size Y is $\alpha^X = \alpha^X(Y)$.
^7 The authors of [21] call the REX style the "stationary agent access" (SA) style.
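The effect of the code-fetch probability on the COD network load can be seen with a small calculation. The Python sketch below assumes the reading L = R·α·(B_req + P_code·(B_fetch + B_code) + σ·B_rep) and T = L/τ + R·(λ + M + T_req + T_rep); the argument names and sample values are illustrative assumptions.

```python
def cod_network_load(R, alpha, B_req, P_code, B_fetch, B_code, sigma, B_rep):
    """Average total network load of the COD style.

    The code is fetched (request B_fetch plus code B_code) only with
    probability P_code, i.e. when it is not already present at B's site.
    """
    return R * alpha * (B_req + P_code * (B_fetch + B_code) + sigma * B_rep)


def cod_processing_time(load, R, tau, lam, M, T_req, T_rep):
    """Average total processing time: load / tau + R * (lam + M + T_req + T_rep)."""
    return load / tau + R * (lam + M + T_req + T_rep)


# Hypothetical values: two high-level operations, code missing half of the time.
load_miss = cod_network_load(R=2, alpha=1.0, B_req=100, P_code=0.5,
                             B_fetch=50, B_code=1000, sigma=1.0, B_rep=200)
# Same interaction when the code is always already present at B (P_code = 0).
load_hit = cod_network_load(R=2, alpha=1.0, B_req=100, P_code=0.0,
                            B_fetch=50, B_code=1000, sigma=1.0, B_rep=200)
print(load_miss, load_hit)
```

With P_code = 0 the expression reduces to a plain request/reply exchange, which is why the fetch term is the only part that distinguishes this style from a client-server interaction in the load model.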
Table 3. Proposed parameter instantiations in closed forms for the COD style

Ref. | $R^{COD}$ | $\alpha^{COD}$ | $B_{req}^{COD}$ | $P_{code}^{COD}$ | $B_{code}^{COD}$ | $B_{fetch}^{COD}$ | $B_{rep}^{COD}$ | $\sigma^{COD}$ | $\lambda^{COD}$ | $M^{COD}$ | $T_{req}^{COD}$ | $T_{rep}^{COD}$
[2] | 1 | > 1 | > 0 | ≥ 0, ≤ 1 | > 0 | > 0 | N·rep | 1, 1/N | - | - | - | -
We point out that the model for the processing time of the COD style has been extrapolated from the models for other styles, since no explicit model for the processing time of this style is present in the literature. For this reason no specific instantiation of parameters in the last four columns of Table 3 is given.
MA style. A moves to the B site, to interact locally with B. Then, it can go back to the starting site, or move to some other site, carrying with it the information accumulated at the B site; in the latter case, it can optionally send back the collected information to the starting site. Closed-form expressions for the network load and the processing time are:

$$L^{MA} = \alpha^{MA}\left(\left(P_{code}^{MA} + back \cdot code\right) B_{code}^{MA} + \left(1 + back\right)\left(B_{state}^{MA} + B_{data}^{MA}\right) + \left(1 - back\right) rep \cdot \sigma^{MA} B_{rep}^{MA}\right)$$

$$T^{MA} = \frac{1}{\tau^{MA}}\,L^{MA} + \lambda^{MA} + M^{MA} + T_{req}^{MA} + T_{rep}^{MA}$$

where:
$P_{code}^{MA}$: probability that the code of the mobile agent is not already present at the location of B;
$B_{code}^{MA}$: average size of the mobile agent code;
$B_{state}^{MA}$: average size of the mobile agent execution state;
$B_{data}^{MA}$: average size of the mobile agent data (before the interaction starts);

$$back = \begin{cases} 1 & \text{if the agent goes back to the starting location} \\ 0 & \text{otherwise} \end{cases}$$

$$code\,^{8} = \begin{cases} 1 & \text{if the agent code is not retained at the return location} \\ 0 & \text{otherwise} \end{cases}$$

$$rep = \begin{cases} 1 & \text{if a "high-level" reply is sent to the starting location} \\ 0 & \text{otherwise} \end{cases}$$
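To see how the three indicator flags change the MA network load, the closed form can be coded directly. The Python sketch below assumes the reading L = α·((P_code + back·code)·B_code + (1 + back)·(B_state + B_data) + (1 − back)·rep·σ·B_rep) and T = L/τ + λ + M + T_req + T_rep; argument names and the sample values are illustrative assumptions.

```python
def ma_network_load(alpha, P_code, B_code, B_state, B_data, B_rep, sigma,
                    back, code, rep):
    """Average total network load of the MA style.

    back, code and rep are the 0/1 indicators of the text:
      back = 1 if the agent returns to the starting location,
      code = 1 if the agent code is not retained at the return location,
      rep  = 1 if a high-level reply is sent to the starting location.
    """
    for flag in (back, code, rep):
        if flag not in (0, 1):
            raise ValueError("back, code and rep must be 0 or 1")
    code_term = (P_code + back * code) * B_code        # code shipped out (and back)
    state_term = (1 + back) * (B_state + B_data)       # state and data per trip
    reply_term = (1 - back) * rep * sigma * B_rep      # reply only if agent stays away
    return alpha * (code_term + state_term + reply_term)


def ma_processing_time(load, tau, lam, M, T_req, T_rep):
    """Average total processing time: load / tau + lam + M + T_req + T_rep."""
    return load / tau + lam + M + T_req + T_rep


# Hypothetical agent: 1000 bytes of code, 100 of state, 50 of initial data.
round_trip = ma_network_load(1.0, 1.0, 1000, 100, 50, 400, 0.5,
                             back=1, code=0, rep=1)
one_way = ma_network_load(1.0, 1.0, 1000, 100, 50, 400, 0.5,
                          back=0, code=0, rep=1)
print(round_trip, one_way)
```

Note that with back = 1 the reply term vanishes whatever the value of rep, which matches the remark that a separate reply is only sent when the agent does not return.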
Table 4. Proposed parameter instantiations in closed forms for the MA style

Parameter | [35] | [9] | [2] | [22] | [21]
$\alpha^{MA}$ | 1 | 1 | > 1 | > 1 | > 1
$P_{code}^{MA}$ | ≥ 0, ≤ 1 | ≥ 0, ≤ 1 | 1 | 1 | 1
$B_{code}^{MA}$ | > 0 (≥ N·req) | > 0 | > 0 | > 0 | > 0
$B_{state}^{MA}$ | > 0 | > 0 | > 0 | > 0 | > 0
$B_{data}^{MA}$ | ≥ 0 | ≥ 0 | ≥ 0 | ≥ 0 | 0
$\sigma^{MA}$ | > 0, ≤ 1 | 1 | 1, 1/N | > 0, ≤ 1 | 1
$B_{rep}^{MA}$ | N·rep | N·rep | N·rep | N·rep | > 0
back | 0 | 0, 1 | 0 | 0 | 1
code | - | 1 | - | - | 0
rep | 0, 1 | 0 | 0, 1 | 1 | 1
$\lambda^{MA}$ | (1 + rep)·δ ^9 | - | - | 0 | 0
$M^{MA}$ | 2μ($B_{data}^{MA}$ + $B_{state}^{MA}$ + rep·$B_{rep}^{MA}$) ^10 | - | - | > 0 | > 0
$T_{req}^{MA}$ | 0 | - | - | 0 | > 0
$T_{rep}^{MA}$ | 0 | - | - | > 0 | > 0
In all the considered models for the MA style (with the exception of [22], where it is unspecified) it is assumed that, after the completion of the interaction, the agent data grow as follows: $B_{data}^{MA} \leftarrow B_{data}^{MA} + \sigma^{MA} B_{rep}^{MA}$. This models the (possible) accumulation of information collected by the mobile agent as it visits new sites.

With regard to the parameter instantiations shown in Table 4, it should be noted that the marshalling/unmarshalling overhead of [35] is calculated under the assumption that the agent code is already available in transport format. [22] analyzes a broadcast data filtering application, where the MA paradigm (with filtering at the server) is compared against broadcast filtering at the client (hence no other mobile code style is modeled, while closed forms are presented for network load and processing time in the broadcast case). Moreover, [4] builds on the model presented in [2] to derive closed forms expressing the network load caused by the MA style in conjunction with multicast protocols to deliver the agent to multiple destinations.

^8 If the agent goes back to the starting A location, then code = 0 means that only its data (the original data plus the data collected at the B location) and execution state actually go back, since a copy of the (immutable) code has been retained there; the term (1 − back) that multiplies rep means that a reply can be sent to the starting location only if the agent does not go back there.
^9 δ denotes the average round trip time (in secs).
^10 μ > 0 represents a marshalling/unmarshalling factor (in secs/byte); this factor is multiplied by 2 to take into account both marshalling and unmarshalling of a message.
Dynamic models. The interaction scenario modeled by the models considered in this section is the same as before but, unlike the models of the previous section, here they adopt TMN = {Petri net} and are limited to NFA = {total processing time} [27]. Moreover, these models refer only to AS ∈ {REX, MA}, whereas no model is provided in [27] for the COD style. We do not present details of these models. In any case, the parameters are basically the same as the ones in the previous section, except that many of them are instantiated as random variables with a given probability distribution, rather than as constant (average) values, and can be listed as follows (with $X \in \{REX, MA\}$):

$P_{code}^X = 1$; $\alpha^X = 1$; $B_{rep}^X$: uniformly distributed in $[rep_{min}, rep_{max}]$;
$B_{data}^{MA} = 0$; back = 0; rep = 0 or rep = 1;
$\lambda^X = 0$; $M^X = 0$; $T_{req}^X$ and $T_{rep}^X$: exponentially distributed.

The only remarkable difference from the previous closed-form models concerns the semantic compression, which in [27] is modeled only for the MA paradigm as a constraint on the growth of the $B_{data}^{MA}$ parameter, as follows: $B_{data}^{MA} \leftarrow B_{data}^{MA} + D_{rep}^{MA}$, with $D_{rep}^{MA}$ uniformly distributed in $[rep_{min}, n \cdot rep_{max}]$, where n > 1 is a parameter independent of the agent history. Hence, unlike the closed-form models of the previous section, the agent data do not grow linearly with the number of visited locations, if the agent visits more than one location.
4.2 General-Purpose Formal Models: Process Algebras
The validation methodologies considered in this section are based on the selection of ON = {Process Algebras}, and do not focus on specific NFAs. Hence, the adopted TMN is general enough to allow the evaluation of different NFAs, and consists of TMN = {Stochastic Process Algebras + associated Markov Processes}; correspondingly, possible STs consist of any solution technique suitable for this TMN. It follows from these choices that MI includes at least the (exponential) completion rates of all the activities that are modeled in the adopted Process Algebra.
Process algebras are well-known formalisms for the modeling and analysis of parallel and distributed systems. What makes them attractive as ON for the evaluation of large and complex systems are mainly their compositional and abstraction features, which facilitate building complex system models from smaller ones. Moreover, they are equipped with a formal semantics, which allows an unambiguous system specification, and a calculus that makes it possible to prove rigorously whether some functional properties hold. Stochastic process algebras extend these formalisms with stochastic features for the specification of the duration of system activities, which allow the analysis of quantitative non-functional properties. We refer to the vast available literature for details about the general characteristics of these formalisms (e.g., [14]), and focus in this section on process algebras for the modeling of mobile software architectures. We only provide their (partial) formal syntax and informal descriptions of the corresponding semantics, aimed at illustrating the salient features of different approaches to formal modeling of code mobility. Then, we illustrate a proposed methodology for the translation of this formalism into a TMN for the analysis of NFAs, with TMN consisting of Markov processes (and stochastic process algebras as intermediate TMN).
A process algebra is a formal language, whose syntax basically appears like this:

P ::= 0 | π.P | P + P | P || P | … ^11

where 0 denotes the "null" (terminated) process that cannot perform any action, + and || denote process composition by non-deterministic alternative or parallelism, respectively, and π.P denotes the process that performs action π ∈ Act, and then behaves as P (where Act is a set of possible actions). Process algebras for mobility modeling basically differ in the set Act of actions the defined processes can perform. We group them into two sections, based on the way used to model the location of components.
Example 1. To illustrate some of the modeling approaches reviewed in this and the next section, we will use a simple application example based on a travel agency scenario, where a travel agency periodically contacts K flying companies to get information about the cost of a ticket for some itinerary. The agency exchanges a sequence of N messages with each company, to collect the required information. Using a traditional client-server approach, this means that the agency should explicitly establish N RPCs with each company to complete the task. On the other hand, with a REX approach, the agency could send a code encompassing all the N messages along with some gluing operations, to be executed by each company, getting only the final reply. Within a COD approach, we could think that the agency makes an overall request to each company, and that it is the responsibility of each company to possibly get from somewhere the code needed to fulfill the request. Finally, in an MA approach, the agency could deliver an agent that travels along all the K companies getting the information locally, and then reports it back to the agency. EndOfExample 1
Models with "Indirect" Location Specification. Algebras listed in this section can be considered as a direct derivation from CCS-like algebras [23], and are characterized by the modeling of mobility as a change in the links that connect processes. Before reviewing them, we briefly illustrate a basic CCS-like algebra. In this case, we have π ∈ {τ_i (i = 1, 2, …), in x, out x}, where τ_i denotes a "silent" (internal) action of a process^12, while in and out are input and output actions, respectively, along the link named x, that can be used to synchronize a process with another parallel process that executes their output or input counterparts along the same link. For example, if two processes are specified as follows:

P := out a.P1    Q := in a.Q1

from these definitions we get that P || Q evolves into P1 || Q1, that is, processes P and Q synchronize (i.e., wait for each other) thanks to a communication along link a, and then proceed in parallel (possibly independently of each other, if no other synchronizing communication takes place in their subsequent behavior).

^11 Note that, for the sake of simplicity, this syntax is incomplete, since we are omitting constructs to define abstraction mechanisms, recursive behavior, etc.
^12 Subscript i is used to distinguish different internal actions, which is useful for modeling purposes.
π-calculus [24]. This algebra, besides synchronization between parallel processes, also allows the communication of link names, so that we can change the links a process uses to communicate with other processes. The possible system actions are π ∈ {τ_i (i = 1, 2, …), in x, out x, in x(y), out x(Y)}, where, in addition to the above definitions, Y (y) is a "link name" (link variable), sent (received) over the link named x. For example, with the following specifications:

P1 := out a(b).P3    P2 := out a(c).P3    Q := in a(y).out y.Q1

we get that P1 || Q evolves into P3 || out b.Q1, while P2 || Q evolves into P3 || out c.Q1. In this example, the parallel composition of Q with P1 or P2 gives rise to the evolution of Q into a process that communicates along the link b or c, respectively, and then behaves as process Q1.
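The link-passing step of this example can be mimicked with a few lines of Python. This is only a minimal sketch of one communication step with substitution of the received name: the class names (`Out`, `In`, `Ref`), the `communicate` helper, and the treatment of P3 and Q1 as opaque named continuations are all assumptions made for illustration, and capture-avoiding renaming is deliberately ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """An opaque named process such as P3 or Q1."""
    name: str


@dataclass(frozen=True)
class Out:
    """out link(payload).cont ; payload is None for a pure synchronization."""
    link: str
    payload: Optional[str]
    cont: "Proc"


@dataclass(frozen=True)
class In:
    """in link(var).cont ; var is None for a pure synchronization."""
    link: str
    var: Optional[str]
    cont: "Proc"


Proc = Union[Ref, Out, In]


def substitute(p: Proc, var: str, value: str) -> Proc:
    """Replace free occurrences of link variable var by the link name value."""
    if isinstance(p, Out):
        link = value if p.link == var else p.link
        payload = value if p.payload == var else p.payload
        return Out(link, payload, substitute(p.cont, var, value))
    if isinstance(p, In):
        link = value if p.link == var else p.link
        if p.var == var:  # var is rebound below this input: stop here
            return In(link, p.var, p.cont)
        return In(link, p.var, substitute(p.cont, var, value))
    return p


def communicate(sender: Proc, receiver: Proc) -> Optional[Tuple[Proc, Proc]]:
    """One synchronization step of sender || receiver, or None if impossible."""
    if not (isinstance(sender, Out) and isinstance(receiver, In)):
        return None
    if sender.link != receiver.link:
        return None
    if (sender.payload is None) != (receiver.var is None):
        return None
    if receiver.var is None:
        return sender.cont, receiver.cont
    return sender.cont, substitute(receiver.cont, receiver.var, sender.payload)


P1 = Out("a", "b", Ref("P3"))
P2 = Out("a", "c", Ref("P3"))
Q = In("a", "y", Out("y", None, Ref("Q1")))
print(communicate(P1, Q))
print(communicate(P2, Q))
```

The same receiver Q ends up using link b or link c depending on which sender it meets, which is the sense in which mobility is modeled here as a change of connectivity.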
HOπ-calculus [30]. Besides the operations of the π-calculus, this algebra also allows the communication of process names, so that we can change the behavior of the receiving process. The possible system actions are again π ∈ {τ_i (i = 1, 2, …), in x, out x, in x(y), out x(Y)}, but, in addition to the above definitions, Y (y) may also be a "process name" (process variable) besides a link name (variable), sent (received) over the link named x. For example, with the following specifications:

P1 := out a(R).P3    P2 := out a(S).P3    Q := in a(z).z.Q1

we get that P1 || Q evolves into P3 || R.Q1, while P2 || Q evolves into P3 || S.Q1. In other words, the parallel composition of Q with P1 or P2 gives rise to the evolution of Q into a process that behaves like process R or S, respectively, and then as process Q1.
Example 2^13. Let us consider the system of Example 1 in the case of K = 2 flying companies, with F_i and a_i (i = 1, 2) denoting a company and the channel used to communicate with it, C denoting the overall code corresponding to the N "low-level" interactions, and R_i the overall response collected at company F_i. Using the HOπ-calculus, this application could be modeled as follows in the case of the REX paradigm (where Sys models the overall application):

TravAg = out a_1(C).in a_1(x).out a_2(C).in a_2(x).TravAg
F_i = in a_i(z).z.out a_i(R_i).F_i
Sys = TravAg || F_1 || F_2

EndOfExample 2.
Models with "Direct" Location Specification. The above approaches suggest as ON for the modeling of mobile architectures a process algebra where the location of a process is indirectly defined in terms of its connectivity, i.e. the link names it sees and the identity of the processes it can communicate with at a given instant of time using those links; hence, the location of a process can be changed by changing the links it sees (by sending it new link names, as in the π-calculus, or by sending the process itself, i.e., its name, as in the HOπ-calculus, to a receiving process that has a different location (again, defined by its connectivity)). Other process algebra approaches have been defined where the location of processes is directly and explicitly defined, giving it a first-class status, and so allowing for more direct modeling of, and reasoning about, problems related to locations, such as access rights or code mobility, thus making these algebras somewhat more appealing as ON for mobile architectures. Two of these approaches are the ambient calculus and KLAIM. In the following we briefly outline some of their features. As before, the presentation is far from complete, with the main goal of only giving some flavor of the way they model process location and mobility in a process algebra setting.

^13 Adapted from [25].
Ambient calculus [7]. In this formalism the concept of ambient is added to the basic constructs for process definition and composition described above. An ambient has a name that specifies its identity, and can be thought of as a sort of boundary that encloses a set of running processes. Ambients, denoted as n[P], where n is the ambient name and P is the enclosed process, can be entered, exited or opened (i.e., dissolved) by appropriate operations executed by a process, so allowing movement to be modeled as the crossing of ambient boundaries. Ambients are hierarchically nested, and a process can only enter an ambient which is a sibling of its ambient in the hierarchy, and can exit only into the parent ambient of its ambient; hence, moving to a "far" ambient in the ambient hierarchy requires, in this formalism, the explicit crossing of multiple ambients. The mobility operations for an ambient n[.] are denoted by in_amb n, out_amb n, open n, respectively^14. In general, a process cannot forge them by itself, but receives them thanks to the communication operations in and out. Hence, a process receiving one of these operations through a communication actually receives a capability for it, being allowed to execute the corresponding operation on the named ambient. The (partial) formal syntax of this algebra is then as follows:

P ::= 0 | π.P | P + P | P || P | n[P] | …

with π ∈ {τ_i (i = 1, 2, …), in(x), out(M), in_amb n, out_amb n, open n}, where x is a variable and M stands for either an ambient name (n), or a capability for an ambient (either in_amb n, or out_amb n, or open n). Communication is restricted to be local, i.e. only between processes enclosed in the same ambient. Communication between nonlocal processes requires the definition of some sort of "messenger" agent that explicitly crosses the required ambient boundaries, bringing with it the information to be communicated. Alternatively, a process can move itself to the ambient of its partner before (locally) communicating with it. In both cases, the messenger or the moving process must possess the needed capabilities.
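The sibling/parent restriction on movement can be illustrated with a small tree model. The Python sketch below is an assumption-laden toy, not the calculus itself: ambients are tree nodes, `in_amb`, `out_amb` and `open_amb` are hypothetical helper names mirroring the three mobility operations, and capabilities, communication and enclosed processes are not modeled at all.

```python
class Ambient:
    """A named ambient in a tree of nested ambients."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Names from the root down to this ambient, joined by '/'."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))


def in_amb(mover, target_name):
    """Move mover into a sibling ambient called target_name."""
    if mover.parent is None:
        raise ValueError("the root ambient cannot move")
    for sibling in mover.parent.children:
        if sibling is not mover and sibling.name == target_name:
            mover.parent.children.remove(mover)
            sibling.children.append(mover)
            mover.parent = sibling
            return
    raise ValueError(f"{target_name} is not a sibling of {mover.name}")


def out_amb(mover, target_name):
    """Move mover out of its parent target_name, into the grandparent."""
    parent = mover.parent
    if parent is None or parent.name != target_name or parent.parent is None:
        raise ValueError(f"{mover.name} is not directly inside {target_name}")
    parent.children.remove(mover)
    parent.parent.children.append(mover)
    mover.parent = parent.parent


def open_amb(context, target_name):
    """Dissolve the child ambient target_name of context, keeping its contents."""
    for child in context.children:
        if child.name == target_name:
            context.children.remove(child)
            for grandchild in child.children:
                grandchild.parent = context
                context.children.append(grandchild)
            child.children = []
            child.parent = None
            return
    raise ValueError(f"{target_name} is not inside {context.name}")


# Travel-agency flavored example: the agent leaves the agency and visits F1, then F2.
net = Ambient("net")
agency, f1, f2 = Ambient("agency", net), Ambient("F1", net), Ambient("F2", net)
agent = Ambient("agent", agency)
out_amb(agent, "agency")
in_amb(agent, "F1")
print(agent.path())
out_amb(agent, "F1")
in_amb(agent, "F2")
print(agent.path())
```

Reaching F2 from inside F1 takes two steps (exit F1, then enter F2), which is the explicit crossing of multiple boundaries mentioned in the text.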
KLAIM (Kernel Language for Agents Interaction and Mobility) [11]. This formalism makes it possible to define a net of locations that are basically not nested into each other, with direct communication possible, in principle, between processes located at any location, unlike the ambient calculus (the extension to nested locations is possible, however). Another remarkable difference from the ambient calculus, and from all the previously mentioned algebras, is the adoption of a generative (rather than message passing) style of communication, based on the use of tuple spaces and the communication primitives of the Linda coordination language [8]. Tuple spaces are linked to locations, and interaction between processes located at different locations can happen by putting the appropriate tuple into, or retrieving it from, the tuple space at a given location. Again, the (partial) formal syntax of this algebra is as follows:

P ::= 0 | π.P | P + P | P || P | …

with π ∈ {τ_i (i = 1, 2, …), in_t(t)@l, read_t(t)@l, out_t(t)@l, eval_t(t)@l, newloc(u)}, where the indicated operations are the usual Linda-like operations on a tuple t, restricted to operate on the tuple space associated with the location l^15. Moreover, the newloc(u) operation allows a process to create a new (initially private) location that can be accessed through the name u. Note that the fields of a tuple may be either values, or processes, or localities, or variables of the previous types. This allows a simple modeling of all the mobile code styles (namely REX, COD and MA), as shown in [11].

^14 Note that the ambient operations are named in, out and open in the original paper [7]; we have renamed them to avoid confusion with the names used in this paper to denote the message passing communication operations.
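A toy rendering of location-bound tuple spaces may help to fix the idea of generative communication. The Python sketch below is not KLAIM: the `Net` class and its methods are hypothetical names echoing the operations listed above, `None` is used as a wildcard field in templates, the retrieval operations return `None` instead of blocking, and `eval_t` simply runs a Python callable "at" a location.

```python
class Net:
    """A flat net of locations, each owning its own tuple space."""

    def __init__(self):
        self.spaces = {}  # location name -> list of tuples stored there

    def newloc(self, u):
        """Create a new, initially empty location named u."""
        if u in self.spaces:
            raise ValueError(f"location {u} already exists")
        self.spaces[u] = []

    def out_t(self, t, l):
        """Put tuple t into the tuple space at location l."""
        self.spaces[l].append(tuple(t))

    @staticmethod
    def _matches(template, t):
        # A template field equal to None matches any value in that position.
        return len(template) == len(t) and all(
            f is None or f == v for f, v in zip(template, t))

    def read_t(self, template, l):
        """Return a matching tuple at l without removing it (None if absent)."""
        for t in self.spaces[l]:
            if self._matches(template, t):
                return t
        return None

    def in_t(self, template, l):
        """Remove and return a matching tuple at l (None if absent)."""
        t = self.read_t(template, l)
        if t is not None:
            self.spaces[l].remove(t)
        return t

    def eval_t(self, process, l):
        """Run the callable process at location l (a stand-in for remote spawning)."""
        process(self, l)


# REX-flavored use: the agency ships a piece of code to company F1,
# which evaluates it locally and deposits the result at the agency.
net = Net()
net.newloc("agency")
net.newloc("F1")
net.out_t(("price", "ROM-NYC", 420), "F1")


def quote(net, here):
    found = net.read_t(("price", "ROM-NYC", None), here)
    net.out_t(("quote", here, found[2]), "agency")


net.eval_t(quote, "F1")
print(net.in_t(("quote", "F1", None), "agency"))
```

Because code (here a callable) can itself be shipped to a location, the same primitives can be combined to mimic the other mobile code styles as well.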
Other formal models. Other approaches to mobility modeling (and formal verification of functional requirements for mobile systems) have been proposed that are not based on a process algebra framework: Mobile UNITY [26], Mobadtl [12], COMMUNITY [38]. The former two have a temporal logic based semantics, and the latter has a category theory based semantics. We do not consider these approaches explicitly, since the translation methodology from ON to TMN described in the following section has been presented in the framework of process algebras.
F r o m P r o c e s s A lg e b r a s t o P e r f o r m a n c e M o d e ls . In th is s e c tio n w e p re s e n t a
m e th o d o lo g y fo r th e tra n s la tio n fro m O N to a s u ita b le T M N , w h e n th e a p p lic a tio n
m o d e l is b u ilt u s in g O N = { P ro c e s s A lg e b ra } . T h is a p p ro a c h h a s b e e n p re s e n te d in th e
fra m e w o rk o f π -c a lc u lu s a n d H O π - c a lc u lu s , b u t it is a p p lic a b le to a n y f o r m a lis m w ith
a n o p e ra tio n a l s e m a n tic s , lik e a ll th e p ro c e s s a lg e b ra s p re s e n te d a b o v e . W e s ta rt w ith a
b rie f re v ie w o f o p e ra tio n a l s e m a n tic s , a n d th e n o u tlin e th e a p p ro a c h , p re s e n te d in [2 5 ].
T h e o p e ra tio n a l s e m a n tic s o f a p ro c e s s s p e c ifie d u s in g th e s y n ta x o f a g iv e n p ro c e s s
a lg e b ra is g iv e n b y a la b e le d tr a n s itio n s y s te m , i.e . ( in f o r m a lly ) a g r a p h th a t r e p r e s e n ts
a ll th e p o s s ib le s y s te m e v o lu tio n s ; e a c h n o d e o f th e g ra p h re p re s e n ts a p a rtic u la r
s y s te m s ta te , w h ile a tra n s itio n re p re s e n ts a s ta te c h a n g e , a n d th e la b e l a s s o c ia te d to a
tra n s itio n p ro v id e s in fo rm a tio n a b o u t th e “ a c tiv itie s ” th a t c a u s e th e c o rre s p o n d in g s ta te
c h a n g e . T h e tr a n s itio n r e la tio n ( i.e . w h ic h a r e th e s ta te s r e a c h a b le in o n e s te p f r o m a
g iv e n s ta te , a n d th e a s s o c ia te d la b e ls ) is s p e c ifie d b y g iv in g a s e t o f s y n ta x -d riv e n ru le s ,
in th e s e n s e th a t th e y a re a s s o c ia te d to th e s y n ta c tic ru le s o f th e a lg e b ra . E a c h ru le ta k e s
P r e m is e s
th e fo rm w h o s e m e a n in g is th a t w h e n e v e r th e p re m is e s (th a t c a n b e
C o n c lu s io n
in te rp re te d a s a g iv e n c o m p u ta tio n a l s te p ) o c c u r, th e n th e c o n c lu s io n w ill o c c u r a s w e ll.
(S im p lifie d ) e x a m p le s o f s u c h ru le s a re th e fo llo w in g :
o u tx i n x
P ⎯ ⎯ → P ' P ⎯ ⎯ → P ' P ⎯ ⎯ ⎯ → P ', Q ⎯ ⎯⎯ → Q '
, , ,
.P ⎯ ⎯ → P P + Q ⎯ ⎯ → P ' P || Q ⎯ ⎯ → P ' || Q P || Q ⎯ ⎯ → P '|| Q '
¹⁵ Note that the tuple operations are named in, out, read and eval in the original paper
[11]; we have renamed them to avoid confusion with the names used in this paper to denote
the message passing communication operations.
Note that the third rule specifies a transition relation for parallel independent
processes, while the fourth rule specifies a transition relation for parallel processes that
synchronize through a communication operation¹⁶.
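The following is a minimal Python sketch of how rules of this kind generate a labeled transition system. The tuple encoding of processes, the function names, and the symmetric variants of the choice and parallel rules are assumptions made for illustration; the synchronization step is labeled "tau", as in the standard semantics.

```python
# Processes are encoded as nested tuples (an illustrative encoding, not the paper's).
NIL = ("nil",)

def prefix(action, continuation):
    return ("prefix", action, continuation)

def choice(left, right):
    return ("choice", left, right)

def par(left, right):
    return ("par", left, right)

def is_complementary(a, b):
    """True when a and b are an output and an input on the same channel."""
    return (isinstance(a, tuple) and isinstance(b, tuple)
            and a[1] == b[1] and {a[0], b[0]} == {"out", "in"})

def transitions(process):
    """Return the list of (label, successor) pairs derivable in one step."""
    kind = process[0]
    if kind == "nil":
        return []
    if kind == "prefix":                      # first rule (axiom)
        _, action, continuation = process
        return [(action, continuation)]
    if kind == "choice":                      # second rule, plus its mirror image
        _, left, right = process
        return transitions(left) + transitions(right)
    if kind == "par":
        _, left, right = process
        left_moves, right_moves = transitions(left), transitions(right)
        # third rule: one side moves independently (and its mirror image)
        result = [(a, par(l2, right)) for a, l2 in left_moves]
        result += [(b, par(left, r2)) for b, r2 in right_moves]
        # fourth rule: both sides synchronize on matching in/out actions
        for a, l2 in left_moves:
            for b, r2 in right_moves:
                if is_complementary(a, b):
                    result.append(("tau", par(l2, r2)))
        return result
    raise ValueError(f"unknown process kind: {kind}")

def build_lts(initial):
    """Explore all reachable states and collect the labeled edges."""
    states, edges, todo = {initial}, [], [initial]
    while todo:
        state = todo.pop()
        for label, target in transitions(state):
            edges.append((state, label, target))
            if target not in states:
                states.add(target)
                todo.append(target)
    return states, edges

# A sender and a receiver on a hypothetical channel x.
system = par(prefix(("out", "x"), NIL), prefix(("in", "x"), NIL))
states, edges = build_lts(system)
```

Here `build_lts` plays the role of the graph described in the text: nodes are process terms and each edge carries the label of the rule conclusion that produced it.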
Example 3. The labeled transition system obtained by applying rules such as the ones
specified above to the Sys model of example 2 (using HOπ-calculus) is given by (assuming
C = (μ1 + μ2).0):
[Figure: labeled transition system with states 0 to 5; the labeled transitions shown include μ1 and μ2.]
EndOfExample3
In general, methodologies for the translation of process algebras into a TMN
suitable for the analysis of NFAs are based on the association of a stochastic duration
(which hence represents an MI for these methodologies) with the activities specified in the
process algebra, so obtaining, as a first step, a stochastic process algebra that, in our
framework, can be considered as an intermediate notation toward the final TMN. Then,
starting from a stochastic process algebra model, a stochastic duration can be associated
with each label in the labeled transition system that represents the operational semantics of
the original model. If this duration is exponentially distributed (hence expressed by an
exponential rate), then we get a continuous time Markov chain. A fundamental problem
in making these approaches practically usable is how to give a meaningful value to the
exponential rates associated with the transitions, since, in a realistic system, their number
is very high. The idea of [25] is to associate with each transition a label that does not
merely register the action associated with that transition (e.g., μ1, as in example 3), but
also the inference rules used during the deduction of the transition, so as to keep track of
the “underlying operations” that lead to the execution of that action. For instance, in
example 3 the operation underlying the execution of action μ1 is a selection operation
between the two concurrently enabled operations μ1 and μ2. These “enhanced” labels can
be used to define a systematic way of deriving the rates to be associated with the system
transitions. The enhanced labels are built using symbols from a suitable alphabet (e.g.,
{+, ||, …}), to record the inference rules used during the derivation of the transitions.
For example, the transition rules given above would be rewritten as follows to get
enhanced labels¹⁷:
\[
\frac{}{\mu.P \xrightarrow{\mu} P}, \qquad
\frac{P \xrightarrow{\vartheta} P'}{P + Q \xrightarrow{+\vartheta} P'}, \qquad
\frac{P \xrightarrow{\vartheta} P'}{P \,\|\, Q \xrightarrow{\|\vartheta} P' \,\|\, Q}, \qquad
\frac{P \xrightarrow{\theta\,\mathit{out}\,x} P', \quad Q \xrightarrow{\theta'\,\mathit{in}\,x} Q'}
     {P \,\|\, Q \xrightarrow{\langle \|\theta\,\mathit{out}\,x,\ \|\theta'\,\mathit{in}\,x \rangle} P' \,\|\, Q'}
\]
where an enhanced label is, in general, given by ϑ = θμ, with μ denoting, as before, a
particular system action, and θ ∈ {+, ||, …}* denoting the sequence of inference rules
followed to fire that action.

¹⁶ In the “standard” semantics of process algebras [24], the label of this latter rule is equal to
τ, that is, an invisible action, since the two matching input and output operations “consume”
each other, making them unobservable for an external “observer” (e.g. a process running in
parallel).
¹⁷ Again, we are introducing a simplification: in a complete specification different symbols
should be used to distinguish the selection of the left or right alternative in a parallel or
alternative choice composition (see [25]).
Example 4. The transition system of example 3 would be enhanced as follows:
[Figure: enhanced transition system with states 0 to 5. Transition labels shown:
⟨||out a1(C), ||in a1(z)⟩; +μ1; +μ2; ⟨||in a1(x), ||out a1(R1)⟩; ⟨||out a2(C), ||in a2(z)⟩;
⟨||in a2(x), ||out a2(R2)⟩.]
Using the enhanced labels, the rate of a transition can be calculated by defining
suitable functions, as follows:
\[
\$_b : \mathit{Act} \to \Re^{+}, \qquad
\$_s : \{+, \|, \ldots\} \to \Re^{+}, \qquad
\$ : \{+, \|, \ldots\}^{*} \bullet \mathit{Act} \to \Re^{+}
\]
where • denotes the concatenation operator, $b defines the basic exponential rate of an
action in a reference architecture dedicated only to the execution of that action without
any interference, while $s defines a slowing factor in the execution of an action due to
the execution of some underlying operation in the target architecture where the action is
actually executed. $ is the function that calculates the actual exponential rate of the
transition, taking into account all possible interferences, and can be basically
recursively defined using $b and $s, as follows¹⁸:
\[
\$(\mu) = \$_b(\mu), \qquad
\$(o\,\theta\mu) = \$_s(o) \times \$(\theta\mu), \qquad
o \in \{+, \|, \ldots\},\ \theta \in \{+, \|, \ldots\}^{*}
\]
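As an illustration, the recursive definition can be sketched in Python. The action names, the basic rates and the slowing factors below are hypothetical values chosen only for the example, and an enhanced label is assumed to be encoded as a pair (tags, action).

```python
# Hypothetical basic rates $b of two actions (executions per time unit).
BASIC_RATE = {"mu1": 4.0, "mu2": 2.0}

# Hypothetical slowing factors $s for the inference-rule tags.
SLOWING = {"+": 0.5, "||": 0.8}

def rate(label):
    """Compute $(label) for an enhanced label given as (tags, action)."""
    tags, action = label
    if not tags:
        # Base case: $(mu) = $b(mu)
        return BASIC_RATE[action]
    head, rest = tags[0], tags[1:]
    # Recursive case: the outermost tag slows down the rate of the remaining label
    return SLOWING[head] * rate((rest, action))

rate_choice = rate((("+",), "mu1"))          # label +mu1
rate_nested = rate((("||", "+"), "mu2"))     # label ||+mu2
```

Changing the `SLOWING` table while keeping `BASIC_RATE` fixed corresponds to evaluating the same model on a different target architecture.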
By suitably defining the functions $b and $s, we can limit the problem of
calculating meaningful transition rates to the problem of defining only the cost of the
“primitive” system actions, and of the slowing factors caused by a particular target
architecture. Note that, with respect to our validation framework, this means that the
methodology of [25], besides defining a method for the translation from ON to TMN,
also gives a strong indication about what MI should be collected. Having this
information, the calculation of the actual transition rates can be completely automated
(but it should be remarked that the definition of the above functions is, in general, quite
an ambitious task). It should also be noted that, by changing the definition of $s, we
can also analyze the impact on performance of different target architectures.
Once the rates of all the possible transitions from a given state (representing the
system behaving like process Pi) have been determined, the overall rate from state Pi to
another state Pj which is a one-step successor of state Pi is given by:
\[
q(P_i, P_j) = \sum_{P_i \xrightarrow{\vartheta} P_j} \$(\vartheta)
\]
(note that, in general, more than one transition from state Pi to state Pj may be present
in the graph of the transition system). The process so obtained gives us information
about the rate at which the system actions are performed. In general, to carry out a
performance evaluation, we also need a reward structure that associates with each state of
the process a performance-related reward (or cost), which constitutes another possible MI
that should be collected. In the described approach, these rewards could be calculated
using a procedure similar to the one followed to calculate the transition rates. Besides
being added to the Markov process derived from a process algebra specification, rewards
could also be formally included in a stochastic process algebra, to allow formal
reasoning about them. For a discussion of this topic, which is beyond the scope of
this paper, see [18].

¹⁸ Note that in this presentation we have not explicitly considered the problem of how to
calculate the rate of synchronization (i.e., communication) operations; for a discussion of
this topic, see [18].
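The aggregation of transition rates into q(Pi, Pj) can be sketched as follows. The state names and rate values are hypothetical, and each transition is assumed to be given as a triple (source, target, rate), where the rate is the value of $ on the enhanced label of that transition.

```python
from collections import defaultdict

def aggregate_rates(transitions):
    """Sum the rates of all transitions that share source and target state."""
    q = defaultdict(float)
    for source, target, rate in transitions:
        q[(source, target)] += rate
    return dict(q)

def exit_rates(q):
    """Total rate out of each state, ignoring self-loops."""
    out = defaultdict(float)
    for (source, target), rate in q.items():
        if source != target:
            out[source] += rate
    return dict(out)

# Hypothetical transition system: two distinct transitions lead from P0 to P1.
transitions = [
    ("P0", "P1", 2.0),
    ("P0", "P1", 0.5),
    ("P0", "P2", 1.0),
    ("P1", "P0", 3.0),
]
q = aggregate_rates(transitions)
out = exit_rates(q)
```

Under the exponential assumption, the reciprocal of an exit rate is the mean time spent in the corresponding state, so `1 / out["P0"]` would be the mean sojourn time in P0.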
4.3 General-Purpose Semi-Formal Models: UML
The advantage of using process algebras as ON mainly consists in the possibility of a
rigorous and unambiguous modeling activity. However, the use of these formal
notations has not yet gained widespread acceptance in the practice of software
development. On the contrary, a semi-formal notation, the Unified Modeling Language
(UML) [6], which lacks some of the formal rigor of the notations considered in the
previous section, has quickly become a de-facto standard in the industrial software
development process. UML's recent success is mainly due to the following reasons [1]:
• It allows static and dynamic aspects of the software to be embedded into the model by
  using different diagrams, each representing a different view of the software
  system. Each view captures a different set of concerns and aspects regarding the
  subject. Therefore it is broadly applicable to different types of domains or
  subject areas.
• The same conceptual framework and the same notation can be used from
  specification through design to implementation.
• In UML, more than in classical object oriented approaches, the boundaries
  between analysis, design and implementation are not clearly stated. As a
  consequence, there is more freedom in the software development process, even if the
  Rational Unified Process [19] has been proposed as a guideline for software
  process development based on UML.
• UML is not a proprietary and closed language but an open and fully extensible
  language. The extensibility mechanisms and the potential for annotations of
  UML allow it to be customized and tailored to particular system types, domains,
  and methods/processes. It can be extended to include constructs for working
  within a particular context (e.g., performance requirement validation) where even
  very specialized knowledge can be captured.
• It is widely supported by a broad set of tools. Various tool vendors intend to
  support UML in order to facilitate its application throughout an organization.
  By having a set of tools that support UML, knowledge may be more readily
  captured and manipulated to meet an organization's objectives.
UML consists of two parts: a notation, used to describe a set of diagrams (also
called the syntax of the language) and a metamodel (also called the semantics of the
language) that specifies the abstract integrated semantics of UML modeling concepts.
The UML notation encompasses several kinds of diagrams, most of them belonging to
previous methodologies, that provide specific views of the system. UML diagrams can
be divided into four main types:
1. Static diagrams: Use Case, Class and Object Diagrams
2. Behavioral diagrams: Activity and State Diagrams
3. Interaction diagrams: Sequence and Collaboration Diagrams
4. Implementation diagrams: Component and Deployment Diagrams
“Standard” UML as ON for mobile architecture. Standard UML can be used
as ON for mobile architectures, since UML already provides some mechanisms for this
goal. They are mainly based on the use of a tagged value location within a
component to express its location, and of the copy and become stereotypes to express
the location change of a component. The former stereotype can be used to specify the
creation of an independent component copy at a new location (like in the COD and
REX styles), and the latter to specify a location change of a component that preserves
its identity (like in the MA style). In [6] it is shown how to use these mechanisms
within a Collaboration Diagram to model the location change of a mobile component
interleaved with interactions among components.
Example 5. The travel agency application can be modeled by the following
Collaboration Diagram, based on standard UML, in the case of the MA paradigm.
[Figure: Collaboration Diagram. Objects: a:AGENCY (location = L0); c:COLLECTOR, shown
three times (location = L0, L1 and L2); f1:FLYCOMP (location = L1); f2:FLYCOMP
(location = L2). Messages: 1: start(); 2: «become»; 3.i: req() and 3.i.1: Rep(), for
*[i := 1..N]; 4: «become»; 5.i: req() and 5.i.1: Rep(), for *[i := 1..N]; 6: «become»;
7: end().]
However, this modelling approach presents some drawbacks, since it mixes together
two different views, one concerning the architectural style (e.g. the fact that a
component behaves according to some mobility style), and the other one concerning the
actual sequence of messages exchanged between components during a particular
interaction. Moreover, this approach may lead to a proliferation of objects in the
diagrams that actually represent the same object at different locations. Both these
drawbacks can lead to quite obscure models of the application behavior.
“Extended” UML as ON for mobile architecture. To overcome the drawbacks
of standard UML as ON for mobile architectures, the dependency between the modeled
AS and the chosen ON can be made explicit by adopting a different approach based on
the use of both Collaboration and Sequence Diagrams (CD and SD), with a clear
separation of concerns between them, as proposed in [15]. The SD describes the actual
sequence of interactions between components, which is basically independent of the
adopted style and obeys only the intrinsic logic of the application, while the CD
only models the interaction structure (i.e. who interacts with whom) and style, without
showing the actual sequence of exchanged messages.
The interaction logic is described using the standard UML notation for SD. The
interaction structure is modeled by the links that connect components in the CD, with
arrows specifying unidirectional or bidirectional interactions. For the interaction style,
the main goal is to distinguish a style where component locations are statically
assigned from a style where components do change location to adapt to environment
changes. To this purpose, the standard location tagged value can be used to specify
the component location, while it is necessary to extend the UML semantics by
introducing a new stereotype moveTo that applies to messages in the CD. Where
present, moveTo indicates that the source component moves to the location of its
target before starting a sequence of consecutive interactions with it. If no other
information is present, this style applies to each sequence of interactions shown in the
associated SD between the source and target components of the moveTo message;
otherwise a condition can be added to restrict this style to a subset of the interactions
between two components. It should be noted that this approach appears suitable to
model only mobile architectures where the architecture style is AS = {MA}.
Example 6. According to the adopted modeling framework, the travel agency example
application can be modeled as shown in figure 2.
[Figure 2(a): Sequence Diagram among a:AGENCY, c:COLLECTOR and the two FLYCOMP
components, with messages start(), req(i) and Rep(i) repeated for i = 1..N, and end().
Figure 2(b): Collaboration Diagram in which c:COLLECTOR (location = L?) is linked by
«moveTo» messages to a:AGENCY (location = L0) and to the FLYCOMP components at
locations L1 and L2.]
Figure 2. Travel agency example: (a) interaction logic, (b) architectural style
Figure 2.a shows an SD that describes in detail the “logic” of the interaction, i.e. the
sequence of messages exchanged among the components. In this diagram no
information is present about the adopted style, that is, whether or not some component
changes location during the interactions. This information is provided by the CD in
figure 2.b, which models a style where component mobility is considered. More
precisely, the diagram shows that only c can change location, and according to the
moveTo semantics described above, it moves to the location of a, f1 or f2 before
interacting with them. Note that in figure 2.b the location of c is left unspecified (L?),
since it can dynamically change. In general, it is possible to give it a specified value in
the diagram, which would show the “initial” location of the mobile object in an initial
deployment configuration.
EndOfExample6
In general, there could be uncertainty about the convenience of adopting a mobile
code style in the design of an application. To model this uncertainty about the
architecture (i.e. location and possible mobility of components), a new stereotype
moveTo? has been proposed in [15], which extends the semantics of the moveTo
stereotype described above. When a message between two components in a CD is
labeled with moveTo?, this means that the source component “could” move to the
location of its target at the beginning of a sequence of interactions with it. In a sense,
this means that, based on the information the designer has at that stage, he/she
considers both a static and a mobile architecture acceptable. Hence, a general UML
support to model a (possibly) mobile architectural style consists of a CD where some
messages are unlabeled, some can be labeled with the (possibly constrained) moveTo
stereotype, and some with the moveTo? stereotype. The former two cases correspond
to a situation where the designer feels confident enough to decide about the best
architectural style, while the latter corresponds to a situation where the designer lacks
such confidence. In the next section, we illustrate methodologies for the translation from a
model defined using the extended UML as ON to suitable TMNs for the analysis of
different NFAs.
From extended UML models to performance models. The goal is to build a
stochastic model that describes the system dynamics, whose evaluation provides
information about the performance one can expect by adopting an MA or “static”
architectural style, as described in the previous section. In terms of the extended UML
presented above, the gained insights should allow the designer to substitute the
moveTo? messages in the preliminary CD with (possibly constrained) moveTo
messages, or with no such message at all, if the obtained insights provide evidence that
a static architectural style is more advantageous. Specifically, two different
methodologies have been proposed [15, 16] that, starting from a set of annotated UML
diagrams, derive two different TMNs, namely, a Markov model or a queueing network
model. In the following we briefly present these two methodologies.
The first one is suitable for cases when the NFAs of interest are mainly interaction-related
measures (e.g., generated network traffic) without considering possible
contention with other applications. In this approach, the TMN is a Markov Reward
Process (MRP) [28] when the CDs modeling the architectural style only use the
moveTo stereotype, and a Markov Decision Process (MDP) [28] when the CDs
modeling the architectural style also use the moveTo? stereotype.
The second one, based on the classic SPE technique [32, 33], is suitable for cases where
the NFAs of interest are measures like throughput or response time and we are possibly
interested in considering contention with other applications on the use of system
resources. Two different TMNs are taken into account, namely, Execution Graphs [32]
and Extended Queueing Network Models [20], for NFAs without and with consideration
of the impact of contention, respectively.
In both cases, it is assumed that the diagrams described in the previous section are
augmented with appropriate annotations expressing the “cost” of each interaction with
respect to a given performance measure (see, for example, [31] and papers in [39]), to
represent the needed MI. For example, if we are interested in the generated network
traffic, MI includes at least the size of each exchanged message.
Markov models. In general, an MRP models a state transition system, where the next
state is selected according to a transition probability that only depends on the current
state. Moreover, each time a state is visited or a transition occurs, a reward is
accumulated that depends on the involved state or transition. Typical measures that can
be derived from such a model are the reward accumulated in a given time interval, or the
reward accumulation rate in the long run. An MDP extends the MRP model by
associating with each state a set of alternative decisions, where both the rewards and the
transitions associated with that state are decision dependent. A policy for an MDP consists
of a selection, for each state, of one of the associated decisions, which will be taken each
time that state is visited. Hence, different policies lead to different system behaviors and
to different accumulated rewards. In other words, an MDP defines a family of MRPs, one
for each different policy that can be determined. Algorithms exist to determine the
optimal policy with respect to some optimality criterion (e.g. minimization of the
accumulated reward) [28].
In the translation methodology adopted in [15], an MRP/MDP state corresponds to a
possible configuration of the component locations, while a state transition models the
occurrence of an interaction between components or a location change, and the
associated reward is the cost of that interaction. In the case of an MDP, the decisions
associated with states model the alternative choices of mobility or no mobility as
architectural style, for those components that are the source of a moveTo? message.
The translation method from the extended UML to this TMN consists of the
definition of some elementary generation rules, and then of the use of these rules to
define an MDP generation algorithm [15].
Once the MDP has been generated, it can be solved to determine the optimal policy,
that is, the selection of a decision in each state that optimizes the reward accumulated in
the corresponding MRP. Of course, the optimal policy depends on the values given to
the system parameters (e.g., the size of the messages and of the possibly mobile
component). Different values for these parameters model different scenarios.
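To make the dependence of the optimal policy on the system parameters concrete, the following Python sketch solves a toy MDP by value iteration, minimizing the accumulated network traffic. It is not the generation algorithm of [15]: the three states, the decision names, the cost model (a request and a reply per remote interaction, the component size per move) and all numeric values are assumptions made for illustration.

```python
def value_iteration(mdp, sweeps=100):
    """Minimize the total expected cost.

    mdp[state][decision] = (cost, {next_state: probability}); a state with
    no decisions is absorbing and accumulates no further cost.
    """
    def q_value(cost, successors, value):
        return cost + sum(p * value[s] for s, p in successors.items())

    value = {state: 0.0 for state in mdp}
    for _ in range(sweeps):
        value = {
            state: min((q_value(cost, nxt, value) for cost, nxt in decisions.values()),
                       default=0.0)
            for state, decisions in mdp.items()
        }
    policy = {
        state: min(decisions, key=lambda d: q_value(*decisions[d], value))
        for state, decisions in mdp.items() if decisions
    }
    return value, policy

def mobility_mdp(n_interactions, message_size, component_size):
    """Toy model: interact remotely, or move first and then interact locally."""
    remote_cost = n_interactions * 2 * message_size  # each request and reply crosses the network
    return {
        "remote": {
            "stay": (remote_cost, {"done": 1.0}),
            "move": (component_size, {"co_located": 1.0}),
        },
        "co_located": {"interact": (0.0, {"done": 1.0})},
        "done": {},
    }

# Two hypothetical scenarios that differ only in the size of the mobile component.
small_value, small_policy = value_iteration(mobility_mdp(10, 1.5, 12.0))
large_value, large_policy = value_iteration(mobility_mdp(10, 1.5, 50.0))
```

With these numbers, moving is preferable for the small component and staying is preferable for the large one, which is the kind of scenario comparison described above.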
Queueing Network models. A different methodology for the derivation of performance
models from extended UML diagrams has been proposed in [16], based on SPE
techniques, having queueing network models as basic TMN.
The SPE basic concept is the separation of the software model (SM) from its
execution environment model (i.e., hardware platform model or machinery model,
MM). The SM captures the essential aspects of software behavior, and is usually
represented by means of Execution Graphs (EG). An EG is a graph whose nodes
represent software workload components and whose edges represent transfers of control.
Each node is weighted by a demand vector that represents the resource usage of the node
(i.e., the demand for each resource).
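A minimal Python sketch of how demand vectors can be combined over an EG is given below. The structured node kinds (sequence, loop, probabilistic branch), the resource names and the numbers are assumptions made for illustration; the reduction used (sum over a sequence, multiplication by the repetition count for a loop, probability-weighted average for a branch) is the usual way of obtaining the total demand of a structured graph.

```python
def add_scaled(total, demand, factor=1.0):
    """Accumulate factor * demand into total, resource by resource."""
    for resource, amount in demand.items():
        total[resource] = total.get(resource, 0.0) + factor * amount
    return total

def total_demand(node):
    """Reduce a structured execution graph to a single demand vector."""
    kind = node[0]
    if kind == "basic":                    # ("basic", {resource: demand})
        return dict(node[1])
    if kind == "seq":                      # ("seq", [child, ...])
        total = {}
        for child in node[1]:
            add_scaled(total, total_demand(child))
        return total
    if kind == "loop":                     # ("loop", repetitions, body)
        _, repetitions, body = node
        return add_scaled({}, total_demand(body), repetitions)
    if kind == "branch":                   # ("branch", [(probability, child), ...])
        total = {}
        for probability, child in node[1]:
            add_scaled(total, total_demand(child), probability)
        return total
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical graph: a start step, N = 10 request/reply rounds, an end step.
graph = ("seq", [
    ("basic", {"cpu": 1.0}),
    ("loop", 10, ("basic", {"cpu": 0.5, "net": 2.0})),
    ("basic", {"cpu": 1.0}),
])
demand = total_demand(graph)
```

The resulting vector is the kind of information that is later used to set the service demands in the machinery model.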
The MM models the hardware platform and is based on the Extended Queueing
Network Model (EQNM). To specify an EQNM, we need to define: the components
(i.e., service centers), the topology (i.e., the connections among centers) and some
relevant parameters (such as job classes, job routing among centers, scheduling
discipline at service centers, service demand at service centers). Component and
topology specification is performed according to the system description, while
parameter specification is obtained from information derived from EGs and from
knowledge of resource capabilities. Once the EQNM is completely specified, it can be
analyzed by use of classical solution techniques (simulation, analytical technique,
hybrid simulation) to obtain performance indices such as the mean network response
time or the utilization index.
To cope with mobility, in the methodology proposed in [16], well-known
formalisms such as EG and EQNM have been extended by defining the mob?-EG and
mob?-EQNM formalisms with the goal of modelling code mobility and the uncertainty
about its possible adoption, within a model of the system dynamics.
To include the information about possible component mobility expressed in the
CDs by moveTo? messages, a new kind of EG called mob?-EG is derived [16]. The
mob?-EG modifies the original EG by introducing mv nodes that model the cost of
code mobility. Moreover, the mob?-EG extends the EG formalism by introducing a
new kind of node, called mob?, characterized by two different outcomes, “yes” and “no”,
that can be non-deterministically selected, followed by two possible EGs. The EG
corresponding to branch “yes” models the selection of component mobility style, while
the EG of the branch “no” models the static case.
Example 7. The structure (without labels showing performance related information) of
the mob?-EG derived from the SD and CD of example 6 is illustrated in the following
figure.
[Figure: structure of a mob?-EG built from mob? decision nodes (outcomes “yes” and “no”), mv nodes and N nodes.]
Mob?-EG can be considered by itself as the TMN for a first kind of performance
evaluation corresponding to the special case of a stand-alone application where the
application under study is the only one in the execution environment (therefore there is no
resource contention). In this case performance evaluation can be carried out by standard
graph analysis techniques [32] to associate an overall “cost” to each path in the
mob?-EG as a function of the cost of each node that belongs to that path. Note that each path
in the mob?-EG corresponds to a different mobility strategy, concerning when and
where components move. Hence these results provide an optimal bound on the expected
performance for each strategy, and can help the designer in selecting a subset of the
possible mobility strategies that deserve further investigation in a more realistic setting
of competition with other applications.
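The stand-alone analysis described above can be illustrated with a minimal Python sketch. It assumes that the overall cost of a path is simply the sum of its node costs (the text only says "a function of" them), and the class names, the example graph and all cost values are hypothetical, since the figure of Example 7 carries no performance labels.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    """Ordinary EG node (N) or mobility node (mv) with a scalar cost (assumed)."""
    name: str
    cost: float


@dataclass
class MobChoice:
    """mob? node: no cost of its own, two alternative sub-graphs."""
    yes: list = field(default_factory=list)
    no: list = field(default_factory=list)


def strategies(graph) -> List[Tuple[tuple, float]]:
    """Return (choices, total_cost) for every path through a mob?-EG.

    Each path corresponds to one mobility strategy; `choices` records the
    outcome taken at every mob? node met along the path.
    """
    results = [((), 0.0)]
    for node in graph:
        if isinstance(node, Block):
            results = [(c, cost + node.cost) for c, cost in results]
        else:
            expanded = []
            for label, branch in (("yes", node.yes), ("no", node.no)):
                for sub_choices, sub_cost in strategies(branch):
                    for c, cost in results:
                        expanded.append((c + (label,) + sub_choices, cost + sub_cost))
            results = expanded
    return results


# Hypothetical graph: move at once, move later, or never move.
graph = [
    MobChoice(
        yes=[Block("mv", 2.0), Block("N", 1.0)],
        no=[
            Block("N", 3.0),
            MobChoice(yes=[Block("mv", 2.0), Block("N", 1.0)], no=[Block("N", 4.0)]),
        ],
    )
]
all_paths = strategies(graph)
best = min(all_paths, key=lambda item: item[1])
```

With these made-up costs the cheapest strategy is the one that moves the component immediately; in a real study the costs would come from the demand vectors of the EG nodes.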
The complete application of SPE techniques implies the definition of a target
performance model obtained from the merging of the mob?-EG with a QN modeling the
executing platform. The merging leads to the complete specification of an EQNM by
defining job classes and routing, using information from the blocks and parameters of
the mob?-EG. However, well known translation methodologies [10, 32] are not
sufficient to perform this merging because of the presence of the mob? nodes with
non-deterministic semantics in the mob?-EG; hence it is necessary to give a new translation
rule to cope with this kind of node. To this end an extension of classical EQNMs has
been proposed [16], to be used as TMN when the ON is the extended UML defined in
the previous section. The extension is based on the definition of new service centers,
called r? (outing), that model the possibility, after the visit of a service center (and
therefore the completion of a software block), to choose, in a non-deterministic way,
which is the routing to follow: the one modelling the static strategy or the one
modelling the mobile strategy.
In such a way, a job visiting center r? generates two different mutually exclusive
paths: one path models the job routing when the component changes its location, the
other one models the routing of a static component. Note that, like node mob? in the
EG, nodes r? are characterized by a null service time, since they only represent a routing
selection point. The obtained model is called mob?-EQNM and is characterized by
different routing chains starting from nodes r?. Note that these different routing chains
are mutually exclusive; in other words a mob?-EQNM actually models a family of
EQNMs, one for each different path through the r? nodes, corresponding to a different
mobility policy.
Example 8. The following figure illustrates an example of mob?-EQNM derived from
the mob?-EG of example 7, exploiting also information about the execution platform
(e.g., obtained from a UML Deployment Diagram). The figure highlights the mutually
exclusive routing chains.
[Figure: mob?-EQNM with service centers CPU0, CPU1, CPU2, net01, net02, net12 and two r? nodes (outcomes “yes” and “no”); routing chains labelled path 1, path 2 and path 3.]
When the mob?-EQNM is the TMN, the ST suggested in [16] for contention based
analysis is based on solving the mob?-EQNM through well assessed techniques [20,
32], separately considering each different EQNM belonging to the family modeled by
the mob?-EQNM. When the number of different EQNMs is high, this solution
approach could result in a high computation complexity. This problem can be
alleviated by exploiting results from the stand-alone analysis. However, more efficient
solution methods deserve further investigation. Starting from the obtained results it is
possible to choose the mobility strategy which is optimal according to the selected
criterion, for example the one that minimizes the response time.
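As a rough illustration of solving each member of the family separately and then picking the best one, the Python sketch below assumes that every EQNM is an open, single-class, product-form network of load-independent queueing centers, for which the mean response time is the sum over centers of D_k / (1 - lambda * D_k). This textbook formula is a stand-in chosen for brevity, not necessarily the solution technique used in [16]; the center names echo Example 8, but the service demands and arrival rates are invented.

```python
import math


def response_time(demands, arrival_rate):
    """Mean response time of an open product-form network (assumed model).

    demands maps each service center to its total service demand D_k (seconds
    per job); a center with utilization >= 1 makes the network unstable.
    """
    total = 0.0
    for demand in demands.values():
        utilization = arrival_rate * demand
        if utilization >= 1.0:
            return math.inf
        total += demand / (1.0 - utilization)
    return total


def best_strategy(family, arrival_rate):
    """Solve every EQNM of the family and return the name of the fastest one."""
    return min(family, key=lambda name: response_time(family[name], arrival_rate))


# Hypothetical family: one EQNM per routing chain through the r? nodes.
family = {
    "path1": {"CPU0": 0.15},                                # static strategy
    "path2": {"CPU0": 0.05, "net01": 0.10, "CPU1": 0.05},   # move to CPU1
}

stand_alone_choice = best_strategy(family, arrival_rate=0.0)
contention_choice = best_strategy(family, arrival_rate=4.0)
```

With an arrival rate of zero the formula reduces to the sum of the demands, which is the stand-alone cost of the path. In this made-up case the static strategy wins stand-alone, while the mobile one wins under load because it spreads the demand over more centers.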
The primary goal of this tutorial has been to provide a structured view within the
domain of performance validation of mobile software architectures. The classification of
the approaches presented in section 4 has been supported by a general framework
(section 2) classifying the parameters each approach has to deal with. In table 5 we
summarize the values (in some cases the classes of values) assumed by the parameters
introduced in section 2 in all the types of approaches reviewed in section 4.
Besides ad-hoc models, whose merits and limitations have been outlined in section
4.1, two kinds of approaches for the systematic modeling and analysis of NFA in
mobile software architectures emerge from our review, based on the use of formal or
semi-formal languages as ON. We would like to remark here that our review is
probably not complete, but we believe it is representative of existing approaches.
The merit of formal languages comes primarily from their lack of ambiguity, and
their precise compositional features. However, as can also be inferred from table 5,
their use in NFA validation requires the assignment of reward and exponential duration
to all the modelled actions, which could be quite a difficult task. The method presented in
section 4.2 provides guidelines about how to build meaningful rates, but leaves open
the problem of how to collect the required detailed information. One way to overcome
this drawback could be based on bridging the gap between process algebras and other
formalisms (like UML) used by software designers, to facilitate the extraction of the
detailed information required by process algebras from artifacts produced by the
designers.
On the other hand, the use of UML as ON from which to derive performance models
is not immune from problems either, so the general problem of deriving meaningful
performance models from UML artifacts deserves further investigation by itself. Also,
UML modeling of mobile software architectures still appears not completely
satisfactory, because of the lack of widely accepted models for all the mobile code styles
reviewed in section 3.
Table 5. Summary of parameters instantiation in the reviewed approaches to non
functional requirements validation of mobile software architectures
ON                AS                    NFA                      TMN               MI                       ST
none              REX, COD, MA          processing time,         closed-form model                          analytic
                                        network load
none              REX, MA               processing time          Petri net         parameters               numerical
Formal (P.A.)     “indirect location”   “any”                    stochastic P.A.   transition and reward    mainly numerical
                  REX, “direct                                   and MRP           rates
                  location” COD, MA
Semi-Formal (UML) MA                    interaction related      MRP, MDP          standard performance     mainly numerical
with                                    (stand-alone                               annotations in the SDs
mobility-oriented                       applications)
extensions
                                        throughput, response     Execution Graph,  performance annotations  analytic, numerical,
                                        time (stand-alone and    EQNM              in the UML diagrams      simulation
                                        contention-based
                                        measures)
Besides the primary goal of reviewing non functional requirements validation
approaches for mobile software architectures, another goal of this tutorial has been to
promote the identification of instances of the proposed general validation framework in
other areas, such as reliability validation of real-time systems, just to name one. We do
not claim this classification as being “the right one”, but with this tutorial we rather
would like to solicit feedback, extensions and new instances to come out.
Everybody involved in modern (distributed, embedded, mobile) software systems
design is concerned about “software quality”, but still very few of them take into
account the actual possibility of introducing methodologies/techniques/tools to
systematically improve this attribute. Therefore we believe that in the area of
non-functional requirements validation much work should be done in the direction of
schematization to make it an acceptable activity from the software designer side.
Acknowledgements
Work partially supported by MURST project “SAHARA: Software Architectures for
heterogeneous access network infrastructures”.
Bibliography
1. S. Alhir, “The true value of the Unified Modeling Language”, Distributed Computing,
29-31, July 1998.
2. M. Baldi, G.P. Picco “Evaluating the tradeoffs of mobile code design paradigms in
network management applications” in Proc. 20th Int. Conf. on Software Engineering
(ICSE 98), (R. Kemmerer and K. Futatsugi eds.), Kyoto, Japan, Apr. 1998.
3. S. Balsamo, M. Simeoni “Deriving Performance Models from Software Architecture
Specifications” Res. Rep. CS-2001-04, Dip. di Informatica, Università di Venezia, Feb.
2001; ESM 2001, SCS, European Simulation Multiconference 2001, Prague, 6-9 June
2001.
4. M. Barbeau “Transfer of mobile agents using multicast: why and how to do it on wireless
mobile networks” Tech. Rep. TR-00-05, School of Computer Science, Carleton
University, July 2000.
5. L. Bass, P. Clements, R. Kazman, Software Architectures in Practice, Addison-Wesley,
New York, NY, 1998.
6. G. Booch, J. Rumbaugh, and I. Jacobson, The Unified Modeling Language User Guide,
Addison Wesley, New York, 1999.
7. L. Cardelli, A.D. Gordon “Mobile ambients” Foundations of Software Science and
Computational Structures (M. Nivat ed.), LNCS 1378, Springer-Verlag, 1998, pp.
140-155.
8. N. Carriero, D. Gelernter “Linda in context” Communications of the ACM, vol. 32, no. 4,
1989, pp. 444-458.
9. T.-H. Chia, S. Kannapan “Strategically mobile agents” in Proc. 1st Int. Conf. on Mobile
Agents (MA’97), Springer-Verlag, 1997.
10. V. Cortellessa, R. Mirandola “PRIMA-UML: a performance validation incremental
methodology on early UML diagrams” Science of Computer Programming, Elsevier
Science, vol. 44, no. 1, pp. 101-129, July 2002.
11. R. De Nicola, G. Ferrari, R. Pugliese, B. Venneri “KLAIM: a kernel language for agents
interaction and mobility” IEEE Trans. on Software Engineering, vol. 24, no. 5, May
1998, pp. 315-330.
12. G. Ferrari, C. Montangero, L. Semini, S. Semprini “Mobile agents coordination in
Mobadtl” Proc. of 4th Int. Conf. on Coordination Models and Languages
(COORDINATION’00), (A. Porto and G.-C. Roman eds.), Springer-Verlag, Limassol,
Cyprus, Sept. 2000.
13. A. Fuggetta, G.P. Picco, G. Vigna “Understanding code mobility” IEEE Trans. on
Software Engineering, vol. 24, no. 5, May 1998, pp. 342-361.
14. N. Gotz, U. Herzog, M. Rettelbach “Multiprocessor system design: the integration of
functional specification and performance analysis using stochastic process algebras” in
Performance Evaluation of Computer and Communication Systems (L. Donatiello and R.
Nelson eds.), LNCS 729, Springer-Verlag, 1993.
15. V. Grassi, R. Mirandola, “Modeling and performance analysis of mobile software
architectures in a UML framework” in <<UML2001>> Conference Proceedings, LNCS
2185, Springer Verlag, October 2001.
16. V. Grassi, R. Mirandola, “PRIMAmob-UML: a Methodology for Performance analysis of
Mobile Software Architecture”, in WOSP 2002, Third International Conference on
Software and Performance, ACM, July 2002.
17. R. Gray, D. Kotz, G. Cybenko, D. Rus “Mobile agents: motivations and state-of-the-art
systems” in Handbook of Agent Technology, AAAI/MIT Press, 2001.
18. H. Hermanns, U. Herzog, J.-P. Katoen “Process algebras for performance evaluation”,
Theoretical Computer Science, vol. 274, no. 1-2, 2002, pp. 43-87.
19. I. Jacobson, G. Booch, J. Rumbaugh, The Unified Software Development Process,
Addison-Wesley Object Technology Series, 1999.
20. R. Jain, Art of Computer Systems Performance Analysis, Wiley, New York, 1990.
21. T. Kawamura, S. Joseph, A. Ohsuga, S. Honiden “Quantitative evaluation of pairwise
interactions between agents” in Joint Symp. on Agent Systems and Applications and
Symp. on Mobile Agents (ASA/MA 2000), Springer LNCS 1882, pp. 192-205, 2000.
22. D. Kotz, G. Jiang, R. Gray, G. Cybenko, R.A. Peterson “Performance analysis of mobile
agents for filtering data streams on wireless networks” in 3rd ACM Workshop on
Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM 2000),
Aug. 2000.
23. R. Milner, Communication and Concurrency, Prentice Hall, 1989.
24. R. Milner, Communicating and Mobile Systems: the π-calculus, Cambridge Univ. Press,
1999.
25. C. Nottegar, C. Priami, P. Degano “Performance evaluation of mobile processes via
abstract machines” IEEE Trans. on Software Engineering, vol. 27, no. 10, Oct. 2001, pp.
867-889.
26. G.P. Picco, G.-C. Roman, P.J. McCann “Reasoning about code mobility in Mobile
UNITY” ACM Transactions on Software Engineering and Methodology, vol. 10, no. 3,
July 2001, pp. 338-395.
27. A. Puliafito, S. Riccobene, M. Scarpa “An analytical comparison of the client-server,
remote evaluation and mobile agent paradigms” in 1st Int. Symp. on Agent Systems and
Applications and 3rd Int. Symp. on Mobile Agents (ASA/MA 99), Oct. 1999.
28. M.L. Puterman, Markov Decision Processes, J. Wiley and Sons, 1994.
29. K. Rothermel, F. Hohl, N. Radouniklis “Mobile agent systems: what is missing?” in
Distributed Applications and Interoperable Systems (DAIS 1997), 1997.
30. D. Sangiorgi “Expressing mobility in process algebras: first-order and higher-order
paradigms” PhD thesis, Univ. of Edinburgh, 1992.
31. B. Selic, “Response to the OMG RFP for schedulability, performance and time”, OMG
document number ad/2001-06-14, June 2001.
32. C. Smith, Performance Engineering of Software Systems, Addison-Wesley, Reading,
MA, 1990.
33. C. Smith, L. Williams, Performance solutions: A Practical Guide to Creating
Responsive, Scalable Software, Addison Wesley, 2002.
34. J.W. Stamos, D.K. Gifford “Implementing remote evaluation” IEEE Trans. on Software
Engineering, vol. 16, no. 7, July 1990, pp. 710-722.
35. M. Strasser, M. Schwehm “A performance model for mobile agent systems” in Int.
Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA
97), vol. II, (H.R. Arabnia ed.), Las Vegas 1997, pp. 1132-1140.
36. Sun Microsystems “The Java language”, White Paper, 1994.
37. Sun Microsystems “The Java servlet API”, White Paper, 1997.
38. M. Wermelinger, J.L. Fiadeiro “Connectors for mobile programs” IEEE Trans. on
Software Engineering, vol. 24, no. 5, May 1998, pp. 331-341.
39. WOSP 2000, Proc. of the 2nd Int. Workshop on Software and Performance, ACM, 2000.
Performance Issues of Multimedia Applications
Edmundo de Souza e Silva¹, Rosa M. M. Leão¹, Berthier Ribeiro-Neto², and
Sérgio Campos²
¹ Federal University of Rio de Janeiro
COPPE/PESC, Computer Science Department {rosam, edmundo}@land.ufrj.br
² Federal University of Minas Gerais
Computer Science Department {berthier,scampos}@dcc.ufmg.br
Abstract. The dissemination of Internet technologies, increasing
communication bandwidth and processing speeds, and the growth in demand
for multimedia information have given rise to a variety of applications.
Many of these applications demand the transmission of a continuous
flow of data in real time. As such, continuous media applications may
have high storage requirements, high bandwidth needs and strict delay
and loss requirements. These pose significant challenges to the design
of such systems, especially since the Internet currently provides no QoS
guarantees to the data it delivers. An extensive range of problems has
been investigated in recent years, from issues on how to efficiently store
and retrieve continuous media information in large systems, to issues on
how to efficiently transmit the retrieved information via the Internet.
Although broad in scope, the problems under investigation are tightly
coupled. The purpose of this chapter is to survey some of the techniques
proposed to cope with these challenges.
1 Introduction
The fast development of new technologies for high bandwidth networks, wireless
communication, data compression, and high performance CPUs has made it
technically possible to deploy sophisticated communication infrastructures for
supporting a variety of multimedia applications. Among these we can distinguish,
for instance, quality audio and video on demand (to the home), virtual reality
environments, digital libraries, and cooperative design.
Multimedia objects, such as movies, voice extracts, texts, and pictures, are
usually stored in compressed (encoded) form on the disks of a multimedia server.
Since the encoded objects might be long, the playing of an object should not be
delayed until the whole object is transmitted. Instead, the playing of the object
should be initiated as early as possible.
A common characteristic among multimedia applications is the so-called
continuous nature of their generated data. In continuous media (CM), strict timing
relationships exist that define the schedule by which CM data must be rendered
(e.g., a video displayed, 3D graphics rendered, or audio played out). These
timing relationships coupled with the high aggregate bandwidth needs, the high
individual application bandwidth needs, and the high storage requirements pose
significant challenges to the design of such systems. This is particularly
troublesome in the scenario of the Internet, which is beginning to be used to convey
multimedia data but which was not designed for this purpose.
This work is supported in part by grants from CNPq/ProTeM. E. de Souza e Silva
is also supported by additional grants from CNPq/PRONEX and FAPERJ.
In this work, we discuss the main technical issues involved in the design and
implementation of practical (distributed) multimedia systems. We take a particular
view, which divides the system into three main components: the multimedia
server, the resource sharing techniques for transmitting data across the network,
and methods for improving the utilization of network bandwidth and buffers. We
look at each of these components, reviewing the related literature, introducing
the key underlying technical issues, and providing insights on how each of them
impacts the performance of the multimedia system.
2 The Multimedia Server
The multimedia server is a key component of a distributed multimedia system.
Its performance (in terms of the number of clients supported) affects the overall
cost of the system and might be decisive for determining economic viability.
As a result, studying the performance of multimedia servers is an important
problem which has received wide attention lately [8,42,19,18,17,28,30,34,33,45,
46,54,63]. The server is a computer system containing one or more processors,
a finite amount of memory M , and a number D of disks. The disks are used to
store compressed multimedia objects, which are retrieved by the clients.
Compressed video objects are composed of frames, where a frame is a snap-
shot of the state of all bits in the screen. To decode the frames in a stream,
the client has to store them in memory which requires some level of buffering.
The frames are consumed at a constant rate. Since the number of bits in each
component frame varies, the input bit rate and the output bit rate for the buffer
at the client side are variable (VBR).
It is common to implement the server such that it always sends data to the
client in blocks of fixed size. When it is always possible to send the same number
of blocks per unit of time, we say that the traffic flows at a constant bit
rate (CBR). Keeping a CBR (or nearly CBR) traffic implies that the frame rate
varies at the input of the client buffer. To avoid interruption of the display, a
much larger buffer might be required at the client to compensate for variations
in the frame arrival rate. In Sec. 3 we discuss traffic smoothing techniques to
compensate for these rate variations. Several proposals in the literature are then
based on CBR assumptions [9,14,53,55,67].
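The interplay between CBR delivery and variable frame sizes can be made concrete with a small Python sketch. It assumes an idealized setting that the text does not spell out: the server transmits continuously at a fixed link rate, there is no network jitter, and a frame can be displayed only once all of its bits have arrived. The function names and the frame sizes are hypothetical.

```python
from itertools import accumulate


def min_startup_delay(frame_bits, frame_rate, link_bps):
    """Smallest playout delay (seconds) that avoids client buffer underflow.

    Frame k (0-based) is displayed at time delay + k / frame_rate. Over a CBR
    link the first k + 1 frames have fully arrived at cumulative_bits / link_bps,
    so the delay must cover the worst gap between arrival and display time.
    """
    delay = 0.0
    for k, cumulative in enumerate(accumulate(frame_bits)):
        arrival = cumulative / link_bps
        delay = max(delay, arrival - k / frame_rate)
    return delay


def startup_buffer_bits(frame_bits, frame_rate, link_bps):
    """Bits accumulated in the client buffer when playout starts."""
    return min_startup_delay(frame_bits, frame_rate, link_bps) * link_bps


# Hypothetical VBR stream: one large frame followed by two small ones.
frames = [4000, 1000, 1000]          # bits per frame
delay = min_startup_delay(frames, frame_rate=10, link_bps=20000)
threshold = startup_buffer_bits(frames, frame_rate=10, link_bps=20000)
```

In this toy stream the large first frame alone dictates the delay; a stream whose large frames occur later would need a deeper buffer, which is the effect the smoothing techniques of Sec. 3 aim to reduce.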
Due to disk seek and rotational delays, one or more sectors need to be re-
trieved from the server during each disk access to attain good performance.
The set of disk sectors that the server sends to the client at one time is here
called a data block. Each data block is stored in the buffer of the client and
consumed from there. While the client decodes a data block, other clients can
be served. This way the server is able to multiplex the disk bandwidth among
various clients, which are served concurrently. The approach works because the
total disk bandwidth available at the server far exceeds the display rate with
which each client consumes bytes.
Let Oi be a reference to the ith multimedia object in the server and bi be a
reference to any data block of the object Oi . Consistently with several prototype
implementations, we assume that the data blocks of each object Oi are all of
the same size. The data blocks of distinct objects, however, might be of different
sizes (i.e., size(bi) ≠ size(bj)).
A client makes a request for an object Oi . If this request is admitted into the
system, the server starts sending blocks of the object Oi to the client machine.
The client might have to wait until the buffer fills up to a pre-defined threshold
before starting to play the object. The time interval between the client request
and the beginning of the display is called startup latency. To send the blocks
to the client, the server first retrieves them from disk into main memory. Thus,
buffers are also required at the server side.
A client gets a block of data and starts consuming it. Before consuming all
the data in that block, the client must get the next block of data for the object
it is playing. Otherwise, interruption in the service will occur. In the case of
a movie, this means that the motion picture might suddenly freeze in front of
the user (also called hiccup). Thus, each client must get the blocks of data in a
timely fashion.
2.1 The Size of the Multimedia Server
The size of a multimedia server installation is a direct function of the number
D of disks in the system. Given the server size, the maximum load that can be
imposed on the system is determined, as we now discuss.
The number of disks used in a multimedia server is related to the bandwidth
demand, to the storage requirements, and to the amount of capital available for
investing in the system. Consider, for instance, movie objects encoded in MPEG-2
(320 × 240 screen). The typical bandwidth requirement for such objects is 1.5
Mbps (megabits per second). Thus, to support the concurrent display of 1500 MPEG-2
movie objects, a total net bandwidth of 1500 × 1.5 = 2250 Mbps is required.
Scenario 1 below illustrates this situation.
Scenario 1: SCSI Technology: effective disk bandwidth: 60 MBps = 480 Mbps¹;
disk storage capacity: 73.4 GB; maximum number of concurrent customers: 1500;
bandwidth requirement of 1 MPEG-2 object: 1.5 Mbps; storage requirement of
1 MPEG-2 object: 1 GB; effective server bandwidth required: 1500 × 1.5 = 2250
Mbps; rough number of disks required in the server: ⌈2250/480⌉ = 5;
number of distinct MPEG-2 objects in storage: ⌊5 × 73.4⌋ = 367; number of disks
with 20% redundancy: 6.
¹ Estimated bandwidth in megabits per second (Mbps), including seek time, for
current disk technology.
Thus, to provide 1500 customers with real-time MPEG-2 streams we need a
total of 5 SCSI disks (current technology). This computation is quite rough, since
it does not consider memory and bus bandwidth bottlenecks, and redundancy
for fault tolerance. With a 20% degree of redundancy, a total of 6 disks would
be required.
Besides bandwidth, the storage requirements also need to be taken into account.
Since the storage capacity of each SCSI disk considered above is 73.4 GB
(gigabytes) and each MPEG-2 object of 1 hour and 40 minutes takes roughly 1
GB, with 5 disks we can store up to 367 MPEG-2 objects.
However, 367 is not really the number of movies one would expect to find
in a video store. Typically, at least a few thousand movies should be available.
One alternative is to use cheaper technology, such as IDE, to provide plenty
of storage capacity with good bandwidth delivery. For instance, consider the
scenario 2 immediately below.
Scenario 2: IDE Technology: effective disk bandwidth: 16 MBps = 128 Mbps;
disk storage capacity: 80 GB; maximum number of concurrent customers: 1500;
bandwidth requirement of 1 MPEG-2 object: 1.5 Mbps; storage requirement of
1 MPEG-2 object: 1 GB; effective server bandwidth required: 1500 × 1.5 = 2250
Mbps; rough number of disks required in the server: ⌈2250/128⌉ = 18; number
of distinct MPEG-2 objects in storage: ⌊18 × 80⌋ = 1440; number of disks with
50% redundancy: 27.
Thus, we can now store up to 1440 distinct MPEG-2 objects in 18 IDE disks
of 80 GB each. And this is accomplished while serving up to 1500 concurrent
customers as before. Notice that we now use a degree of redundancy of 50%,
because disks based on IDE technology are not as reliable as disks based on
SCSI technology. Since each IDE disk in scenario 2 costs about 1/6 of a SCSI
disk in scenario 1, the configuration in scenario 2 is either cheaper than or
equivalent in price to the configuration in scenario 1. Most important, replacing IDE
disks is easier because they can be bought everywhere at any time (i.e., IDE
technology is really ubiquitous nowadays).
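The arithmetic behind the two scenarios can be reproduced with a short sketch. The function name size_server and its parameters are hypothetical; the sketch assumes 1 GB = 1000 MB and works in integer units (kbps and MB) to avoid rounding surprises. Like the text, it ignores memory and bus bottlenecks.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def size_server(streams, stream_kbps, disk_kbps, disk_mb, object_mb, redundancy_pct):
    """Rough sizing: (disks for bandwidth, objects stored, disks with redundancy)."""
    demand_kbps = streams * stream_kbps
    disks = ceil_div(demand_kbps, disk_kbps)        # bandwidth-driven disk count
    objects = (disks * disk_mb) // object_mb        # whole objects that fit on those disks
    disks_redundant = ceil_div(disks * (100 + redundancy_pct), 100)
    return disks, objects, disks_redundant


# Scenario 1 (SCSI) and scenario 2 (IDE), with the figures quoted in the text.
scsi = size_server(1500, 1500, 480_000, 73_400, 1_000, 20)
ide = size_server(1500, 1500, 128_000, 80_000, 1_000, 50)
print(scsi)  # (5, 367, 6)
print(ide)   # (18, 1440, 27)
```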
The data block size, contrary to the number of disks, is related more to
the design of the system itself. Given a number of disks and a memory space
for buffers, usually an optimal block size can be determined. The block size
can be chosen to minimize the cost per stream or to maximize the number of
streams that can be supported concurrently. One primary constraint has to be
met: the block size must be large enough to amortize the delays due to seek and
rotational latency. Block sizes ranging from 512 Kbytes to 1 Mbyte are usually
large enough to accomplish this effect.
2.2 The Layout
The blocks which compose the various multimedia objects are laid out across
the disks in the system. A simplistic approach is to store all blocks of the same
object on a single disk. The main advantage of this approach is simplicity and
ease of maintenance. However, there is a considerable disadvantage. If a popular
video is heavily requested, the disk that stores that video will be overloaded.
Thus, severe load imbalance might result, which limits the number of clients
that can be served. More sophisticated strategies involve spreading the blocks of
the same object across multiple disks (the so-called striping techniques).
Layout Using Striping. The key idea of striping is to spread out the data
blocks of each object across the disks of the server. This way, during the service
time of an object, each client request is continuously moved from one disk to
another and shares the bandwidth of all the disks in the system. We say that
the object storage has been decoupled from the disks and call this effect object
decoupling (see Figure 1). Object decoupling provides a load balancing effect
which allows a higher number of clients in the system and a better utilization of
disk bandwidth.
Usually, when a striped layout is used, the server operates in cycles. At
each cycle of duration T, the server retrieves one data block for each client in
the system (this retrieval incurs three delays: seek time, rotational latency,
and transfer time). While that client consumes the block, other clients can be
served. Discontinuities in the service are avoided by guaranteeing that each client
is served in every cycle. When all clients in the system have been served, the
server sleeps if there is still time available in the current cycle of duration T.
To accommodate objects with distinct bandwidth requirements, we can simply
allow the sizes of the storage units to vary. For instance, objects $O_i$ and $O_j$
will have block sizes $b_i$ and $b_j$ ($b_i \neq b_j$), respectively. Each block is stored as
a separate storage unit. For the same object $O_i$, however, the block sizes are all
the same (i.e., $b_i[j] = b_i[j+1]$). At the disk level, one can keep the storage unit
size constant to avoid fragmentation. For an object that has higher bandwidth
requirements, two or more storage units can be combined to compose a data
block, as illustrated in Figure 1. Each data block of the object $O_i$ is composed
of a single storage unit while each data block of the object $O_j$ is composed of
two storage units. We see that two or more disks might now be involved in the
retrieval of a single data block. Since the storage unit size is kept constant,
storage and bandwidth fragmentation problems are minimized.
Random Data Allocation Layout. Striping layouts are good because they
provide object decoupling. However, in general, all striping strategies impose a
tight coupling between the layout itself and the block access pattern as a way
to balance the load among the various disks. To avoid this tight coupling, an al-
ternative is to employ a random data allocation. It can be shown that a random
layout is as good as striping in terms of performance [64], but presents important
advantages as we briefly point out here.
A random data allocation layout uses storage units that are all of the same
size. However, contrary to the striping approach, each storage unit is stored in a
disk position that is determined according to the following procedure: (a) select
a disk at random; (b) within that disk, select a free position at random.
As a result, storage units are placed randomly across the disks of the system.
Objects with higher bandwidth requirements are served by combining several
storage units to form a data block. Also, the physical location of the data blocks
is now independent of the block access pattern.
Fig. 1. Hybrid layout with equal-sized stripe units and block sizes which vary from one
object to the other. [Objects Oi and Oj laid out in equal-sized storage units across disks 1 to D, illustrating object decoupling and block decoupling.]
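The two-step placement procedure, (a) pick a disk at random and (b) pick a free position on it at random, can be sketched as follows. The function place_units and its parameters are hypothetical. The text does not say what happens when the chosen disk is full, so the sketch assumes the random choice is made only among disks that still have a free position.

```python
import random


def place_units(num_units, num_disks, slots_per_disk, seed=0):
    """Return a list of (disk, slot) pairs, one per storage unit."""
    rng = random.Random(seed)
    # free[d] holds the positions still unused on disk d.
    free = [list(range(slots_per_disk)) for _ in range(num_disks)]
    placement = []
    for _ in range(num_units):
        candidates = [d for d in range(num_disks) if free[d]]
        if not candidates:
            raise RuntimeError("no free position left on any disk")
        disk = rng.choice(candidates)                           # step (a)
        slot = free[disk].pop(rng.randrange(len(free[disk])))   # step (b)
        placement.append((disk, slot))
    return placement


layout = place_units(num_units=1000, num_disks=8, slots_per_disk=200)
per_disk = [sum(1 for d, _ in layout if d == disk) for disk in range(8)]
print(per_disk)  # counts close to 125 per disk, illustrating the statistical balance
```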
A random data allocation layout provides the following characteristics: object
decoupling; access pattern decoupling; no disk storage fragmentation; small prob-
ability of prolonged bandwidth fragmentation; good performance. The good per-
formance is attained because the load tends to be statistically balanced among
the various disks. Random data allocation is the only layout scheme that pro-
vides all these features together. Because of this, it simplifies the overall design
and implementation of the system. Therefore, we argue that it is the paradigm
of choice for the design and implementation of multimedia servers in general.
Comparative Performance Analysis: Striping versus Random Layout.
In [64] a detailed comparison between a server based on striping and a server
based on a random layout is presented. The experimental results show that
system performance with a random layout is competitive or superior to the
performance obtained with a striping layout. This is the case not only for un-
predictable access patterns generated by sophisticated interactive applications
such as virtual worlds and scientific visualizations, but also for sequential access
patterns generated by more standard video applications.
To illustrate, let us focus on the case of standard video applications. When
only a small amount of buffer is allowed at the server (say, 1.5 MBytes per
stream), a striping layout performs slightly better than a random layout, providing
an increase in the maximum number of streams sustained of roughly 5%. If the
amount of buffer per stream is allowed to increase to 3.5 MBytes per stream,
both layouts lead to the same overall performance. Additional increments in
buffer space per stream favor the random layout, whose performance becomes
superior.
Assume now that more disk space is made available, such that data blocks
can be replicated. This is useful, for instance, to improve reliability against disk
failure. Consider a 25% degree of replication of video data blocks. This is good
for a random layout because replicated blocks can be used to alleviate the load
of momentarily overloaded disks. In this case, with a buffer space of 3.5 MBytes
per stream, a server based on a random layout presents performance (maximum
number of streams sustained by the server) that is 10-15% higher than the
performance of a server based on a striping layout.
2.3 Staging, Reconfiguration, and Fault Tolerance
In practical installations, there are other important issues that have to be con-
sidered for proper operation of a multimedia server. Among these, we distinguish
the staging of new videos, the reconfiguration of the server to improve perfor-
mance, and fault tolerance against failures of service in the disks of the server.
In this section, we discuss these issues in more detail and compare their relative
performance considering random-based and striping-based servers.
The Staging Mechanism. Since multimedia objects might be quite large (par-
ticularly movie objects), the number of objects that can be stored on the disks
of the system might be quite limited. This implies that the objects in the system
need to be replaced by new ones from time to time. Since the new objects are
usually loaded from tape, we call this process the staging mechanism. This is
an issue which has not received much attention in the specialized literature but
which is critically important in any practical system.
For offline staging, the use of block decoupling provides an efficient solution.
If online staging is desired, the admission control and the scheduling processes
are affected because a new stream has to be admitted and scheduled. This type of
stream might require higher bandwidth because redundant data (such as copies
of the data to support fault tolerance) have also to be updated. For a striped
layout under heavy load, it might be the case that two fragmented pieces of
bandwidth are available but cannot be used for staging because a coalesced
bandwidth is required. For instance, this might be the case of a new stream
which is being fed live to the server. If the new stream is not live, then there is
no problem because the staging can proceed in non real-time mode.
Staging has similar costs both for a striped and for a random layout, when-
ever the staging is offline. For online staging, a random layout is advantageous.
For instance, a random layout makes it easier to deal with the staging of a new
stream which is being fed live. Also, a random layout allows the staging of a
new object at a rate which is different from its playout rate, which is often more
difficult to do with a striped layout.
Disk Reconfiguration. In practical situations, it is reasonable to expect that
the demand on a given server system might eventually exceed its planned ca-
pacity. For instance, it might be the case that the demand for disk bandwidth
exceeds the total disk bandwidth currently available in the server. This problem
can be fixed by adding new disks to the system and copying data blocks (of the
objects already in the system and of new objects) into the new disks. This is
what we call disk reconfiguration. We would like to be able to reconfigure the
system while maintaining the server fully operational.
Consider that we have D disks in the system and that we want to add K
new disks. For simplicity, we consider here that the new disks are of the same
capacity and of the same bandwidth as the disks already installed in the server.
Consider also that no new objects will be added to the system. With current
disk technology, the extra K disks can be “hot” inserted into the system while
it is running. Thus, no interruption in service is required. However, the storage
units need to be remapped to take advantage of the newly available bandwidth.
To exemplify, assume an installation with 8 disks to which 2 new disks are
added. We have that D = 8 and K = 2. In this case, it can be shown that 80%
of all storage units need to be moved if the layout is done with striping, while
only 20% of all storage units need to be moved if the layout is random. Thus, we
conclude that it is much cheaper to reconfigure an installation when the layout
is random.
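The 80% and 20% figures can be checked with a small sketch under two stated assumptions that are not spelled out in the text: the striped layout places storage unit i on disk i mod D (round-robin) and is simply re-striped over D + K disks, and the random layout relocates each unit to a new disk with probability K/(D + K), which is what evens out the load. Function names are hypothetical; math.lcm requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def striped_fraction_moved(d, k):
    """Fraction of units whose disk changes when round-robin striping
    over d disks is redone over d + k disks."""
    period = lcm(d, d + k)  # the (old disk, new disk) pattern repeats with this period
    moved = sum(1 for i in range(period) if i % d != i % (d + k))
    return Fraction(moved, period)


def random_fraction_moved(d, k):
    """Expected fraction of units relocated to the k new disks."""
    return Fraction(k, d + k)


print(striped_fraction_moved(8, 2))  # 4/5
print(random_fraction_moved(8, 2))   # 1/5
```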
Fault Tolerance. Maintaining the integrity of the data and its accessibility are
crucial aspects of a multimedia server. Particularly critical are failures of the
disks of the system. While each individual disk is fairly reliable, a large set of
disks presents a considerably higher likelihood of failure of a component. With
a multimedia server, it is particularly important to provide tolerance to this
type of failure because failure of a single disk might disrupt the service to all
clients in the system. Basically, fault tolerance is provided by the maintenance
of redundant information about the data. Two basic schemes can be used: full
replication and parity encoding.
With parity encoding, the D disks of the system are divided into $n_g$ groups.
Let $g = D/n_g$ be the number of disks per group. For each group, one of the
disks is reserved for storing parity information while the remaining $g - 1$ are
used for storing data. The parity information is computed as the exclusive-or of
the storage units in the $g - 1$ disks. We use storage units instead of data blocks
because, in case block decoupling is used, data blocks are not confined to a single
disk. Let $su_i[k], su_i[k+1], \ldots, su_i[k+g-2]$ be $g - 1$ consecutive storage units
(belonging to data blocks of object $O_i$), each of which appears on a separate disk
(assume this for now). Then, the parity information $p_i[k]$ for this set of storage
units is computed as $p_i[k] = su_i[k] \oplus su_i[k+1] \oplus \ldots \oplus su_i[k+g-2]$. The
set composed of the parity storage unit $p_i[k]$ and of the $g - 1$ storage units from
$su_i[k]$ to $su_i[k+g-2]$ is called a parity group of size g. If the disk that contains
$su_i[k+1]$ is lost, this storage unit can be rebuilt by the following computation:
$su_i[k+1] = su_i[k] \oplus p_i[k] \oplus su_i[k+2] \oplus \ldots \oplus su_i[k+g-2]$. Thus, the disk with the
parity information takes the place of the disk which was lost.
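A minimal sketch of the parity computation and of the rebuild step follows. It assumes equal-sized storage units represented as byte strings; the helper name xor_units and the sample contents are illustrative only.

```python
from functools import reduce


def xor_units(units):
    """Byte-wise exclusive-or of equal-sized storage units."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


# A parity group of size g = 4: three data units plus one parity unit.
data = [b"unit-k..", b"unit-k+1", b"unit-k+2"]
parity = xor_units(data)

# The disk holding data[1] fails: XOR the surviving units with the parity unit.
rebuilt = xor_units([data[0], parity, data[2]])
print(rebuilt == data[1])  # True
```

Because x ⊕ x = 0, every surviving unit cancels out of the parity and only the lost unit remains.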
The idea of fault tolerance with full replication is to use additional space
which is the same size as the space occupied by the whole set of data blocks.
Thus, all data blocks are duplicated. While more expensive in terms of space,
this approach allows recovering from some types of catastrophic failures and
improving the performance of the system. Gains in performance are possible
because any request for a data block can now be served by two different disks
and thus, we can always select the disk with a smaller queue.
Full replication can also be useful in situations where a parity-based scheme
is not the most appropriate one. For instance, consider a distributed server composed
of multiple machines which themselves contain multiple disks. Assume
that we stripe the data across the multiple disks. In case a parity-based scheme
is adopted for fault tolerance, parity groups should be confined to individual
machines to avoid overheads in buffer, networking, and synchronization. This
provides tolerance to a disk failure but not to the failure of a machine. To pro-
vide tolerance to a machine failure, a full replication scheme can be adopted
instead in which each data block and its copy reside on separate machines. This
is in fact the approach adopted in [9].
It can be shown that a random layout allows using a parity-based scheme as
well as a striped layout does. Further, full replication can be better taken advantage
of with the adoption of a random layout (instead of a striped layout). In fact,
the design and implementation of recovery and load balancing algorithms is
simplified because one can rely on the randomness of the data block allocation
to even out the load.
3 Transmitting Information
There are several performance issues that need to be addressed in order to trans-
mit continuous real-time streams over the Internet with acceptable quality. For
instance, real time video encoded in MPEG-2 typically requires an average bandwidth
of approximately 1-4 Mbps, and a voice stream approximately 6-64 Kbps,
depending on the encoding scheme. However, so far the Internet does
not allow bandwidth reservation as needed. In addition, congestion in the network
may cause significant variability in the interval between the arrival of
successive packets (jitter). Since real time streams must be decoded and played
following strict time constraints, large jitter values will cause the playout pro-
cess to be interrupted. Packet losses may also severely degrade the quality of the
multimedia presentation, depending on the loss pattern. Yet another problem is
network heterogeneity and client heterogeneity. Client heterogeneity means that
the receivers have different network requirements, due to different capabilities
to present the received multimedia information. For multicast applications, the
heterogeneity imposes an additional challenge since a stream being transmitted
would have to be multicast through several networks and clients (with possibly
drastically different characteristics) and somehow adapt to the needs of each
client. In this section we discuss a few mechanisms used to mitigate the effects
of random delay and losses in the network.
3.1 How to Cope with Network Jitter and the Rate Variability
We start by considering an audio stream encoded with PCM, say with silence
detection. The audio stream is sampled at 125 μsec intervals and usually 160
samples are collected in a single packet, generating a CBR stream of one IP packet
per 20 msec [47] during each active interval. The client consumes the 160 samples
every 20 msec, and thus it is vulnerable to random delays in the network. If the
expected information does not arrive on time, annoying distortions may occur
in the decoded audio signal. Let T be the packet generation interval and X the
corresponding segment interarrival time. The random variable J = X − T is
called jitter.
One simple mechanism to reduce the jitter is to use a playout buffer at the
client, where a given number of packets are stored. At the beginning of each
active period, where packets are generated, the client stores packets till a given
threshold is reached before starting to decode the received samples. The threshold
value may be fixed at the beginning of the connection or be adjusted dynamically
during the session. Figure 2 illustrates the basic idea.
Fig. 2. The playout buffer. [Two panels plotting number of packets against time, each with curves u(t), a(t), and l(t) and the levels B and H marked; annotations: packet loss (starvation), playout buffer full.]
In that figure, the curve l(t) is equal to the number of packets consumed by the application
by time t. (In this example, PCM encoding is assumed and thus the
packet consumption rate is constant.) The upper curve is simply u(t) = l(t) + B
where B is the playout buffer space. The curve labeled a(t) is equal to the num-
ber of packets that have arrived by time t. Note that the arrival instants are not
equally spaced due to the jitter introduced by the network. In the leftmost part
of Fig. 2, B = 8 and H = 6, and so the decoding starts immediately after the
arrival of the 6-th packet. The amount of packets stored in the playout buffer
as a function of time is a(t) − l(t), while u(t) − a(t) quantifies the buffer space
available at t. Buffer starvation occurs if the middle curve touches the bottom
curve and buffer overflow occurs when the middle curve crosses the top curve.
As shown in the figure, the buffer empties at t = 18. Thus, at t = 19 there is no
packet to be decoded (buffer starvation). When this occurs, some action must be
taken, perhaps re-playing the last information in the buffer, as an approximation
of the data carried by the missing packet at that time. In the right hand part
of Fig. 2 the threshold value H is increased to H = 7. As a consequence, l(t)
is shifted to the right. In this example, this change prevents buffer starvation
during the observation period. The value of B is also decreased to 7, and u(t) is
moved downwards with respect to the preceding curve.
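The effect of the threshold H on starvation can be illustrated with a small sketch. It assumes that playout starts at the arrival of the H-th packet and that packet n (counting from 0) is then consumed exactly n packet intervals later, with no re-synchronization after a late packet. The function name and the arrival trace are made up for illustration; times are in units of the packet interval.

```python
def starved_packets(arrivals, period, threshold):
    """Indices of packets that arrive after their playout instant."""
    start = arrivals[threshold - 1]  # decoding starts at the H-th arrival
    return [n for n, t in enumerate(arrivals) if t > start + n * period]


# Hypothetical arrival instants with a delay spike before packet 7.
arrivals = [0, 1, 2, 3.5, 4, 5, 6, 9.5, 10, 11]
print(starved_packets(arrivals, period=1, threshold=2))  # [7, 8, 9]
print(starved_packets(arrivals, period=1, threshold=4))  # []
```

Raising H from 2 to 4 delays the start of playout by 2.5 intervals, which is enough to absorb the spike in this trace at the price of extra latency.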
It is easy to see that this simple technique eliminates any negative jitter. From
Fig. 2, it is also clear that larger threshold values decrease the jitter variability.
However, latency increases with increasing threshold values, and interactive
applications, such as a live conversation, do not tolerate latencies larger than
200-300 msec. This imposes a constraint of 20-25 packets on H. An issue is the
choice of H and the amount of buffer space necessary to minimize the loss of
packets in case a long burst of packets arrives at the receiver.
Diniz and de Souza e Silva [22] calculate the distribution of the jitter as seen
by the client, when a playout buffer is used. The packet interarrival time is mod-
eled by a phase-type distribution that matches the first and second moments
of this measure obtained from actual network measurements. Packets are con-
sumed at constant rate (PCM), similar to the example of Fig. 2. Silent periods
are included in the model. The goal is to study the tradeoffs between latency and
probability of a positive jitter. It was concluded that the probability of a positive
jitter can be significantly reduced, while maintaining an acceptable latency for
real time traffic.
In addition to the delay variability imposed by the network, compressed
audio/video streams exhibit non-negligible burstiness on several time scales, due
to the encoding schemes and scene variations. Sharp variations in traffic rates
have a negative impact on resource utilization. For instance, more bandwidth
may be necessary to maintain the necessary QoS for the application. The issue
is to develop control algorithms to smooth the CM traffic before transmission to
the clients.
Smoothing techniques can be applied at the traffic source or at another intermediate
node (e.g., a proxy) in the path to the client. Sen et al. [65] address
the issue of online bandwidth smoothing. To better understand the problem, consider
Fig. 3, where it is assumed that there is no variable delay imposed by the
network when a compressed video stream is sent to a client.
Due to the compression encoding, the rate of bit consumption at the client
node varies with time. Video servers, however, read fixed size blocks of information
from the storage server (each block may be fragmented into packets to fit
the network maximum transfer unit (MTU) before transmission). Thus, in Fig. 3,
the interval between the consumption of constant size data blocks at the client
varies with time. The jumps in the y-axis (a block of data) are of constant size.
This contrasts with the usual representation of variable frame size consumed at
fixed intervals of time (e.g. 1/30 sec).
Fig. 3. The concept of smoothing. [Block diagram: server, smoothing algorithm, client, with the streams as(t), ls(t), ac(t), lc(t). Plot of number of packets against time with set 1 (uc(t), ac(t), lc(t)) and set 2 (as(t), us(t), ls(t)); Bs, H, and the instants t0, t1, t'1 are marked.]
In Fig. 3, a controller at the server site schedules the transmission of video
blocks after they are retrieved from the storage server and queued in a FIFO
buffer. Two sets of curves are shown in the figure. In set 1, let $l_c(t)$ be the number
of bits consumed by the client by time t, and $u_c(t) = l_c(t) + B_c$, where $B_c$ is
the size of the playout buffer. Similarly, let $a_s(t)$ (in set 2) be the accumulated
number of bits that are read from the server disks during (0, t) according to the
demand of the client, and $l_s(t)$ be the smoothed stream curve, i.e. the number
of bits effectively dispatched by t. Note that: (a) $a_c(t) = l_s(t - \tau)$ where $\tau$ is a
(assumed constant) network delay from the server to the client; (b) $a_s(t - \tau') = l_c(t)$
where $\tau'$ is the constant network delay plus the delay to fill the playout
buffer; (c) the jumps of $l_c(t)$ occur at the instants of consumption of a block of
data. If we assume that the playout buffer is filled to capacity before the
continuous stream is played back, the number of bits in the playout buffer is
given by the difference between the middle and the bottom curves in set 1.
The server seeks to transmit data to the client as smoothly as possible, that is,
$l_s(t)$ in the figure should resemble a straight line with the smallest possible
slope. Since the shape of $l_c(t)$, and consequently of $u_c(t)$ and $a_s(t)$, is determined
by the encoding algorithm applied to the video to be transmitted, the issue is
how to plan the transmission of the data so that $l_c(t) \le a_c(t) \le u_c(t)$, and
yet the maximum transmission rate is kept as close as possible to the average
consumption rate.
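The feasibility condition on a transmission plan can be sketched as a simple check: at every instant the data delivered to the client must be at least what has been consumed (no underflow) and at most that amount plus the playout buffer size (no overflow). The sketch assumes time is divided into slots and that both curves are given as cumulative amounts per slot; the function names and the sample numbers are illustrative, not taken from the cited papers.

```python
def check_schedule(consumed, delivered, playout_buffer):
    """Return ("ok", None) or the first violation and its slot index."""
    for t, (l, a) in enumerate(zip(consumed, delivered)):
        if a < l:
            return ("underflow", t)   # client would starve in slot t
        if a > l + playout_buffer:
            return ("overflow", t)    # playout buffer would overflow in slot t
    return ("ok", None)


def peak_rate(cumulative):
    """Largest per-slot increment of a cumulative curve."""
    return max(b - a for a, b in zip([0] + cumulative, cumulative))


consumed = [2, 4, 12, 14, 16, 24]   # bursty consumption curve (cumulative)
smooth = [4, 8, 12, 16, 20, 24]     # constant-rate delivery plan (cumulative)

print(check_schedule(consumed, smooth, playout_buffer=4))  # ('ok', None)
print(check_schedule(consumed, smooth, playout_buffer=3))  # ('overflow', 1)
print(peak_rate(consumed), peak_rate(smooth))              # 8 4
```

With enough playout buffer the constant-rate plan is feasible and halves the peak rate in this example; with a smaller buffer the same plan overflows.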
In [62] Salehi et al. obtained an efficient algorithm that can generate a transmission
schedule given the complete knowledge of $l_c(t)$. This is referred to as an
offline algorithm. Roughly, from an initial time $t_i$ (start from i = 0), one should
construct the longest possible line that does not violate the constraints imposed
by $u_c(t)$ and $l_c(t)$ in Fig. 3. Clearly, by construction, this straight line intersects
one of the boundary curves at a time point $t'_{i+1} > t_i$ (and so the rate would
have to be changed at this point), and touches one of these curves at a time
$t_{i+1} < t'_{i+1}$. To avoid sudden rate changes one should vary the previous rate
as soon as possible. Consequently a new starting point is chosen at $t_{i+1}$. The
process is repeated (setting i = i + 1) until the end of the stream is reached, and
$a_c(t)$ is obtained, which determines the transmission schedule.
The set 2 in Fig. 3 shows the arrival and transmission curves at the server.
Note that the server starts its transmission as soon as a threshold H is reached
(in the figure the threshold is equal to 5 blocks). us (t) = ls (t) + Bs , where
Bs is the FIFO buffer available at the server. One should note that, once ls (t)
is determined, Bs and the threshold can be calculated to avoid overflow and
underflow.
We can represent the curves $u_c(t)$, $l_c(t)$ and $a_c(t)$ as vectors u, l, a, respectively,
each with dimension N, where N is the number of data blocks in the video
stream, and the i-th entry in one of the vectors, say vector l, is the amount of
time to consume the i-th block. In [62] majorization is used as a measure of
smoothness of the curves. Roughly, if a vector x is majorized by y ($x \prec y$) then
x represents a smoother curve than y. It is shown in [62] that if $x \prec y$ then
$\mathrm{var}(x) \le \mathrm{var}(y)$, where $\mathrm{var}(x) = \sum_i (x_i - \bar{x})^2$, and since, by definition,
the maximum entry in x is less than or equal to the maximum entry in y, a vector x
that is majorized by y has smaller maximum rate and smaller variance than y.
The schedule algorithm outlined above is shown to be optimum and unique in [62].
Furthermore the optimal schedule minimizes the effective bandwidth requirements.
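The text describes majorization only roughly. The sketch below uses the standard definition, which is an assumption about what [62] relies on: x is majorized by y if both vectors have the same total and, after sorting in decreasing order, every partial sum of x is at most the corresponding partial sum of y. The function names and sample vectors are illustrative.

```python
from itertools import accumulate


def is_majorized_by(x, y):
    """True if x is majorized by y under the standard definition."""
    if len(x) != len(y) or sum(x) != sum(y):
        return False
    px = accumulate(sorted(x, reverse=True))
    py = accumulate(sorted(y, reverse=True))
    return all(a <= b for a, b in zip(px, py))


def sum_sq_dev(x):
    """Sum of squared deviations from the mean, as in var(x) above."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)


x = [4, 4, 4, 4]   # perfectly smooth
y = [2, 8, 2, 4]   # same total, but bursty
print(is_majorized_by(x, y), is_majorized_by(y, x))  # True False
print(max(x) <= max(y), sum_sq_dev(x) <= sum_sq_dev(y))  # True True
```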
This algorithm is the basis for the online smoothing problem [65]. It is as-
sumed that, at any time τ , the server has the knowledge of the time to consume
each of the next P blocks. This is called the lookahead interval and is used to
compute the optimum smoothing schedule using the algorithm of [62]. Roughly,
blocks that are read from the storage server are delayed by w units and passed
to the server buffer that implements the smoothing algorithm which is invoked
every 1 ≤ α ≤ w blocks.
Further work on the topic includes [50] where it is shown that, given a buffer
space of size B at the server queue, a maximum delay jitter J can be achieved
by an off-line algorithm (similar to the above algorithm). Furthermore, an on-
line algorithm can achieve a jitter J using buffer space 2B at the server FIFO
queue. While the smoothing techniques described above are suitable for video ap-
plications, interactive visualization applications pose additional problems. This
subject was studied in [73].
In summary, the amount of buffer space in the playout buffer of Fig. 3 is a
key parameter that determines how smooth the transmission of a CM stream
can be. It also serves to reduce the delay variability introduced by the network.
The queue at the server site in Fig. 3 implements the smoothing algorithm and
the amount of buffer used is also a design parameter. As mentioned above, jitter
reduction and smoothness are achieved at the expense of the amount of buffer space.
But the larger these buffers, the larger the latency to start playing the stream.
3.2 How to Cope with Losses
In the previous subsection we were concerned with random delays introduced by
the network as well as the ability to reduce sudden rate changes in the coded data
stream to lower the demand for network resources. Besides random delays, the
Internet may drop packets mainly due to congestion at the routers. Furthermore,
packet delays may be so large that they may arrive too late to be played at the
receiver, in a real time application.
A common method to recover from a packet loss is retransmission. However,
a retransmitted packet will arrive at the receiver at least one round-trip-time
(RTT) later than the original copy and this delay may be unacceptable for real
time applications. Therefore, retransmission strategies are only useful if the RTT
between the client and the server is very small compared with the amount of time
to empty the playout buffer.
A number of retransmission strategies have been proposed in the context
of multicast streaming. For example, receiver-initiated recovery schemes may
obtain the lost data from neighboring nodes which potentially have short RTTs
with respect to the node that requested the packet [69].
The other methods to cope with packet loss in CM applications are error
concealment, error resilience, interleaving and FEC (forward error correction).
Error resilience and error concealment are techniques closely coupled with the
compression scheme used. Briefly, error resilience schemes attempt to limit the
error propagation due to the loss, for instance via re-synchronization. Error
propagation occurs when the decoder needs the information contained in one
frame to decode other frames. Error concealment techniques attempt to recon-
struct the signal from the available information when part of it is lost. This is
possible if the signal exhibits short term self-similarities. For instance, in voice
applications the decoder could simply re-play the last packet received when the
current packet is not available. Another approach is to interpolate neighboring
signal values. Several other error concealment techniques exist (see [57,74] for
more details and references on the subject).
Interleaving is a useful technique for reducing the effect of loss bursts. The
basic idea is to separate adjacent packets of the original stream by a given
distance, and re-organize the sequence at the receiver. This scheme introduces
no redundancy (and so does not consume extra bandwidth), but introduces
latency to re-order the packets at the receiver.
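A minimal sketch of one possible interleaver is given below. It assumes a simple scheme in which packets whose original indices differ by a fixed stride are sent back to back, so that originally adjacent packets end up several positions apart on the wire; the function names and the choice of stride are illustrative.

```python
def interleave(packets, stride):
    """Send packets 0, stride, 2*stride, ..., then 1, 1+stride, ... and so on."""
    return [p for r in range(stride) for p in packets[r::stride]]


def deinterleave(received, stride):
    """Restore the original order at the receiver."""
    order = interleave(list(range(len(received))), stride)  # original index per sent position
    restored = [None] * len(received)
    for original_index, item in zip(order, received):
        restored[original_index] = item
    return restored


stream = list(range(12))
sent = interleave(stream, stride=4)
print(sent)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]

sent[3:6] = [None, None, None]       # a burst of three consecutive losses
print(deinterleave(sent, stride=4))  # [0, None, 2, 3, 4, None, 6, 7, 8, None, 10, 11]
```

The burst is turned into three isolated losses, which simple concealment (such as re-playing the previous packet) handles better, at the cost of having to wait for the whole block before re-ordering.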
FEC techniques add sufficient redundancy in the CM stream so that the
received bit stream can be reconstructed at the receiver even when packet losses
occur. The main advantage of FEC schemes is the small delay to recover from
losses in comparison with recovery using retransmission. However, this advantage
comes at the expense of increasing the transmission rate. The issue is to develop
FEC techniques that can recover most of the common patterns of losses in the
path from the sender to the receiver without much increase in the bandwidth
requirements to transmit the continuous stream.
A number of FEC techniques have been proposed in the literature [57,74].
The simplest approach is as follows. The stream of packets is divided into groups
of size N − 1, and an XOR operation is performed on the N − 1 packets of each
group. The resulting “parity packet” is transmitted after each group. Clearly, if
a single packet is lost in a block of N packets (the N − 1 data packets plus the
parity packet), the loss can be recovered.
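The simple parity scheme can be sketched in a few lines. This is a minimal illustration that assumes all packets in a group have the same length and that a lost packet is represented by None; the function names are hypothetical.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def parity_packet(group):
    """XOR of the N - 1 data packets of a group."""
    return reduce(xor_bytes, group)

def recover(block):
    """block: N - 1 data packets followed by the parity packet, with at most
    one entry replaced by None. Returns the N - 1 data packets."""
    missing = [i for i, p in enumerate(block) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet cannot repair more than one loss")
    repaired = list(block)
    if missing:
        # The XOR of all surviving packets equals the missing one.
        present = [p for p in block if p is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]

# Example with N = 5: four data packets plus one parity packet.
group = [b"abcd", b"efgh", b"ijkl", b"mnop"]
block = group + [parity_packet(group)]
block[2] = None  # third data packet is lost
```

Calling recover(block) rebuilds the lost packet from the four packets that did arrive, while two losses in the same block raise an error.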
Three issues are evident from this simple scheme. First, since a new packet
is generated for every N − 1 packets, the bandwidth requirement increases by
a fraction 1/(N − 1). As an example, considering voice PCM transmission, if
N = 5 the bandwidth needed to transmit the stream would be 80 Kbps
instead of 64 Kbps. Second, it is necessary to characterize the transmission losses
to evaluate how effective the FEC scheme is. If losses come in bursts of length
B > 1, then this simple scheme is evidently not effective. Third, a block of N
packets must be entirely received in order to recover from a loss. Consequently,
N must be smaller than the receiver playout buffer which is in turn limited
by the latency that can be tolerated in an interactive application. Furthermore,
suppose a loss occurs at position n in a block. Then N −n should be smaller than
the number of packets in the playout buffer at the arrival time of the (n − 1)-th
packet in the block. This indicates that the probability that the queue length of
playout buffer is smaller than N should be small for the method to be effective.
In summary, to avoid a substantial increase in the bandwidth requirements the
block size should be large, but a large block size implies a large playout buffer,
which in turn increases latency.
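The tradeoff between bandwidth and latency can be made concrete with a small calculation. The sketch below assumes equally spaced packets and takes as the worst case a loss in the first position of a block, which cannot be repaired until the remaining N − 1 packets have arrived. The function names and the 20 msec packet interval are illustrative assumptions.

```python
def xor_fec_rate_kbps(source_rate_kbps, n):
    """Rate after adding one parity packet for every n - 1 data packets."""
    return source_rate_kbps * n / (n - 1)

def worst_case_repair_wait_ms(n, packet_interval_ms):
    """Wait for the rest of the block when the first packet of the block is lost
    (assumes equally spaced packets, parity packet included)."""
    return (n - 1) * packet_interval_ms

# 64 Kbps PCM voice with 20 msec packets, for a few block sizes N.
table = [(n, xor_fec_rate_kbps(64, n), worst_case_repair_wait_ms(n, 20))
         for n in (3, 5, 9, 17)]
```

Each row of table holds the block size, the resulting rate in Kbps and the worst-case wait in msec: the rate falls towards 64 Kbps as N grows, while the wait grows linearly with N.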
Measurements in the Internet have shown that loss probabilities are only weakly
sensitive to packet sizes [10,29]. Therefore, another scheme to protect against
losses is to transmit a sample of the audio stream in multiple packets [12]. For
instance, the voice sample carried by packet n can be piggybacked in packet n + 1
as well. This
technique allows the recovery of single losses with minimum latency, but at the
expense of doubling the throughput. A clever way to reduce the bandwidth
requirements is to include in packet n + 1 the original (voice) sample sent in
packet n, but further compressed using a smaller bit rate codec than the primary
encoding. As an example, suppose the primary (voice) sample of each packet is
replicated twice. If we use a PCM codec for the primary sample (64 Kbps), GSM
for the first copy (13.2 Kbps) and LPC for the second copy (≈ 5 Kbps), bursts
of length 2 can be recovered at the expense of increasing the throughput from
64 Kbps to ≈ 82 Kbps. Therefore, with an increase of about 28% in the throughput,
and an increase in latency of 40 msec (2 × 20 msec packet delay), bursts of
length 2 can be recovered in this example.
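The arithmetic of this example can be checked with a short sketch. The codec rates are the ones quoted in the text; the function name and the assumption that each redundant copy adds one packet interval of delay are illustrative.

```python
def redundant_stream(primary_kbps, copies_kbps, packet_interval_ms):
    """Total rate, relative overhead and extra delay when each packet also
    carries lower-rate copies of the preceding packets' samples."""
    total = primary_kbps + sum(copies_kbps)
    overhead = (total - primary_kbps) / primary_kbps
    # Assumption: the i-th copy travels i packets later, so the last copy
    # determines how long the receiver may have to wait.
    extra_delay_ms = len(copies_kbps) * packet_interval_ms
    return total, overhead, extra_delay_ms

# PCM primary (64 Kbps), GSM copy (13.2 Kbps), LPC copy (about 5 Kbps).
total, overhead, extra_delay_ms = redundant_stream(64.0, [13.2, 5.0], 20)
```

With these inputs the redundancy adds 18.2 Kbps to the 64 Kbps primary stream, for a total of about 82 Kbps and an extra delay of 40 msec.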
Packet losses occur mostly due to congestion in the network and so it can be
argued that increasing the throughput rate of a stream transmission to recover
from losses is unfair with respect to flow controlled sessions such as TCP. There-
fore, it is evident that FEC schemes should be optimized for the type of loss
incurred by the network in order to have the smallest impact on the consumed
bandwidth. The work in [29] was aimed at studying the packet loss process in
the Internet and at proposing a new efficient XOR-FEC mechanism extending
previous work. Several measures were calculated from traces between Brazil and
the USA such as: the distribution of the number of consecutive losses and the
distribution of the number of packets received between two losses. These two
measures are particularly important for determining the efficiency of a recovery
XOR-FEC based algorithm. The approach proposed in [29] can be briefly
described as follows. The CM stream is divided into groups, each called a window,
and the packets in a window are subdivided into non-overlapping sets, each
protected by an XOR operation. Figure 4(a) illustrates an example with 6 packets
per window and two subsets, and we call this a 2:6 class of algorithm.
Fig. 4. The FEC scheme of [29]: (a) a 2:6 scheme; (b) overlapped 1:2 and 3:6 schemes
The result
of the XOR operation for a set of packets is sent piggybacked in a packet of the
next window. Clearly we can use codecs of smaller transmission rate as in [12]
for saving in bandwidth. Note that burst errors of size at most equal to 2 packets
can be recovered, and efficiency in bandwidth is gained at the expense of latency
to recover from losses. Furthermore, the scheme in Fig. 4(a) has practically the
same overhead as the simple XOR scheme first described, but can recover from
consecutive losses of size two. Another class of schemes can be obtained by
merging two distinct k:n classes of algorithms, such that all packets belong to at least
two different subsets and therefore are covered by two different XORs. Figure
4(b) illustrates an example where schemes 1:2 and 3:6 are overlapped. In this
case, all losses of size one in the larger window can be recovered. Furthermore,
complex loss patterns can also be protected. For example, if packets 2, 3, 4, and
5 were lost, they can all be recovered. The overhead of this mixed scheme is
clearly obtained by adding the overhead of the individual schemes.
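The following sketch checks which loss patterns such k:n schemes can repair. It assumes that a k:n scheme splits each window of n packets into k interleaved subsets (packet j of the window joins subset j mod k), that each subset is protected by one XOR, and that the XOR results themselves are not lost. This layout is inferred from the examples in the text and is an assumption, not a description taken from [29].

```python
def k_n_sets(k, n, windows=1):
    """Parity sets of an assumed k:n layout, with packets numbered from 1."""
    sets = []
    for w in range(windows):
        base = w * n
        for j in range(k):
            sets.append({base + i + 1 for i in range(j, n, k)})
    return sets

def unrecoverable(parity_sets, lost):
    """Repeatedly repair any set with exactly one missing packet and return
    the packets that remain lost."""
    lost = set(lost)
    progress = True
    while lost and progress:
        progress = False
        for s in parity_sets:
            missing = s & lost
            if len(missing) == 1:
                lost -= missing  # one XOR repairs one missing packet
                progress = True
    return lost

scheme_2_6 = k_n_sets(2, 6)                            # {1,3,5} and {2,4,6}
mixed = k_n_sets(1, 2, windows=3) + k_n_sets(3, 6)     # 1:2 overlapped with 3:6
```

Under these assumptions unrecoverable(scheme_2_6, {3, 4}) is empty, so a burst of two packets is repaired, and unrecoverable(mixed, {2, 3, 4, 5}) is also empty, in agreement with the example given for Fig. 4(b).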
In [29] the algorithm is applied to the real data obtained from measurements and
the efficiency of different FEC-based algorithms is evaluated. The conclusions
show the importance of adapting to different network conditions. (This issue
was also addressed by Bolot et al. [11], in the context of the scheme of [12].)
Furthermore, in all the tests performed, the class of schemes that mixed two
windows provided better results than the class with a single window under the
same overhead.
Altman et al. developed an analytical model to analyze the FEC scheme of [12]
and other related schemes. The loss process was modeled using a simple M/M/K
queue. The aim is to assess the tradeoffs between increased loss protection from
the FEC algorithm and the adverse impact from the resulting increase in the
network resources usage due to the redundancy added to the original stream.
The results show that the scheme studied may not always result in performance
gains, in particular if a non-negligible fraction of the flows implements the same
FEC scheme. It would be interesting to evaluate the performance of other FEC
approaches.
To conclude the subsection we refer to a recently proposed approach to mitigate
the effect of losses. As is evident from the above, short-term loss correlations have
an adverse effect on the efficiency of the recovery algorithms. One way to reduce
the possible correlations is to split the continuous stream sent from a source to a
given destination over distinct paths. For instance, we could split a video stream
in two and send the even packets in the sequence via one path and the odd
packets via another path to the destination. This is called path diversity [76].
In [6] it is assumed that the loss characteristics of a path can be represented
by a 2-state Markov chain (Gilbert model), and a Markov model was developed
to assess the advantages of the approach. Clearly a number of tradeoffs exist,
depending on the loss characteristics of each path, on whether the different paths
share a set of links, etc.
3.3 Characterizing the Packet Loss Process and the Continuous
Media Traffic Stream
Packet loss is one of the main factors that influence the quality of the signal
received. Therefore, understanding and modeling the loss process is imperative
to analyze the performance of the loss recovery algorithms. In general, mea-
surements are obtained from losses seen by packet probes sent according to the
specific traffic under study (for instance at regular intervals of 20 msec), and
models are obtained to match the collected statistics. However, queueing models
with finite buffer have also been used.
Bolot [10] characterizes the burstiness of packet losses by the conditional
probability that a packet is lost given that the previous packet was also lost, and
analyzes data from probes sent at constant intervals over several paths in the
Internet. A simple finite buffer single server queueing model (fed by two streams, one
representing the probes and the other representing the Internet traffic) was used
as the basis for interpreting the results obtained from the measurements.
The most commonly used model for representing error bursts is a 2-state
discrete time Markov chain, usually called the Gilbert model. The Gilbert model
assumes that the number of consecutive losses is a geometric random variable.
However, this model may not capture with accuracy the correlation structure of
the loss process. The work in [75] uses a $2^k$-state Markov chain to model the
loss process, aimed at capturing temporal dependencies in the collected traces.
The accuracy of the models is analyzed against several traces collected
by sending probes at regular intervals. Salamatian and Vaton [61] propose the
use of Hidden Markov models (HMM) to model the loss sequence in the
Internet, due to their capability to represent dependencies in the observed process.
In an HMM each state can output a subset of symbols according to some
distribution. The states and the transitions between states are not observable; only
the output symbols are. In [61] it is shown that HMMs are more appropriate
to represent the loss sequence than the $2^k$-state Markov chain model used
in [75], and need fewer states. However, one disadvantage of the method is the cost
of estimating the Markov chain parameters. More recently, the authors of [43]
compare four models to represent the loss process in wireless channels, including
the $2^k$-state Markov model, an HMM with 5 states and a proposed On-Off model
where the holding times at each state are characterized by a mixture of geometric
phases which are determined by using the Baum-Welch algorithm. The
conclusion indicates that the extended On-Off model better captures first and second
order statistics of the traces studied. A recent study done by Markopoulou et al.
[51] evaluates the quality of voice in the Internet. In this work a methodology
is developed that takes into account delay and loss measurements for assessing
the quality of a call. The measurements were obtained by sending regularly spaced
probes to measurement facilities in different cities. The results indicate the need
for carefully evaluating the CM traffic and properly designing the playout buffers.
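To make the Gilbert model and the burstiness measure of [10] concrete, the sketch below simulates a 2-state loss process and estimates the conditional loss probability and the mean burst length from the resulting trace. The parameter names p and q, the function names and the numerical values are assumptions chosen for illustration. For this chain the stationary loss rate is p/(p + q) and burst lengths are geometric with mean 1/q.

```python
import random

def gilbert_losses(n, p, q, seed=0):
    """Simulate n packets. p = P(lost | previous received),
    q = P(received | previous lost). Returns a list of booleans (True = lost)."""
    rng = random.Random(seed)
    lost, trace = False, []
    for _ in range(n):
        lost = (rng.random() >= q) if lost else (rng.random() < p)
        trace.append(lost)
    return trace

def conditional_loss_probability(trace):
    """Estimate P(packet lost | previous packet lost); needs at least one loss."""
    after_loss = [cur for prev, cur in zip(trace, trace[1:]) if prev]
    return sum(after_loss) / len(after_loss)

def mean_burst_length(trace):
    """Average number of consecutive losses; needs at least one loss."""
    bursts, run = [], 0
    for lost in trace:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts)

trace = gilbert_losses(200_000, p=0.05, q=0.5, seed=1)
```

With p = 0.05 and q = 0.5 the estimates should come out close to a loss rate of 0.09, a conditional loss probability of 0.5 and a mean burst length of 2. Traces with heavier-tailed bursts would not be matched well by any choice of p and q, which is the limitation that motivates the richer models above.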
Traffic characterization is an important topic for understanding the influence
of CM streams on the network resources. The topic is related to loss
characterization and the main goals are to obtain concise descriptions of the flow under
study and to capture in the model relevant statistics of the flow. The objective is
to predict, with sufficient accuracy, the impact of the traffic generated by appli-
cations on the resources being utilized (both in the network and in the servers),
and evaluate the QoS perceived by the applications. The amount of work done
on traffic characterization is sufficiently vast to deserve surveys and books in the
area [31,1,52,56]. Our interest in this chapter is to introduce a few issues on the
topic.
In order to build a model of the traffic load, the proper traffic descriptors that
capture important characteristics of the flows competing for resources have to
be chosen. Examples of traffic descriptors are: the mean traffic rate, the peak-to-
mean ratio, the autocovariance, the index of dispersion and the Hurst parameter.
The issue is to select a set of descriptors such that traces and models with
matching descriptors produce similar performance metrics.
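Some of these descriptors are easy to compute from a trace. The sketch below assumes the trace is a list of per-slot rates (or arrival counts) and uses simple sample estimators; the function names are hypothetical, and the index of dispersion is taken here as the variance-to-mean ratio of the counts accumulated over non-overlapping windows. The Hurst parameter is left out because its estimation is more involved.

```python
from statistics import mean, pvariance

def peak_to_mean(rates):
    """Ratio between the largest per-slot rate and the mean rate."""
    return max(rates) / mean(rates)

def autocovariance(x, lag):
    """Sample autocovariance of the sequence x at the given lag."""
    m = mean(x)
    n = len(x) - lag
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n)) / n

def index_of_dispersion(counts, window):
    """Variance-to-mean ratio of the counts summed over windows of `window` slots."""
    sums = [sum(counts[i:i + window])
            for i in range(0, len(counts) - window + 1, window)]
    return pvariance(sums) / mean(sums)

# A toy trace alternating between a low and a high rate.
rates = [1, 3, 1, 3, 1, 3, 1, 3]
descriptors = {
    "mean": mean(rates),
    "peak_to_mean": peak_to_mean(rates),
    "autocov_lag1": autocovariance(rates, 1),
    "idc_window2": index_of_dispersion(rates, 2),
}
```

For this alternating trace the lag-1 autocovariance is negative and the index of dispersion over windows of two slots is zero, since every window carries the same amount of traffic.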
A large number of models have been proposed in the literature. They include
models in which the autocorrelation function decays exponentially (for instance,
the Markovian models), and models in which the autocorrelation function decays
at a slower rate, that is, hyperbolically (in this case the corresponding stationary
process is called long-range dependent [56]). Although they do not possess the
long-range dependence property, Markov models are attractive for several reasons.
First, they are mathematically tractable. Second, long-range correlations can be
approximately obtained from certain kinds of models. Third, it may be argued
that long-range dependency is not a crucial property for some performance mea-
sures and Markov models can be used to accurately predict performance metrics
(e.g. see [38,37]). Usually, a traffic model is built (or evaluated) by matching the
descriptors calculated from the model against those obtained from measurement
data. For Markovian models it is not difficult to calculate first and second order
statistics [49].
As can be inferred from the above discussion, Markovian models are a useful
tool for the performance evaluation of CM applications. They can be used to
model the traffic stream, generate artificial loads, model the loss process and
evaluate techniques to efficiently transmit CM streams.
4 Resource Sharing Techniques
Conventional multimedia servers provide each client with a separate stream. As
a consequence, the resources available in the multimedia system, particularly
network bandwidth, can be quickly exhausted. Consider, for instance,
scenarios 1 and 2 discussed in Sec. 2. To keep 1500 concurrent streams live,
with a separate bandwidth allocated to each of them, it is necessary to sustain
a total bandwidth of 2.25 Gbps at the output channel of the multimedia server.
While technically feasible, this is quite expensive nowadays and prohibitive from
a commercial point of view.
A common approach for dealing with this problem is to allow several clients to
share a common stream. This is accomplished through mechanisms, here called
resource sharing techniques, which allow the clients to share streams and buffers.
The goal is to reduce the demand for network bandwidth, disk bandwidth, and
storage space. While providing the capability of stream sharing, these techniques
have also to provide QoS to the clients.
As discussed in Sec. 3, client QoS is affected by the server characteristics, such as
latency and available disk bandwidth, and by the network characteristics, such as
bandwidth, loss, jitter, and end-to-end delay. The challenge is to provide very
low startup latency and jitter for all client requests and to be able to serve a
large number of users at minimum cost.
The bandwidth sharing mechanisms proposed in the literature fall into two
categories: client request oriented and periodic broadcast. Client request oriented
techniques are based on the transmission, by the server, of a CM stream in
response to multiple requests for its data blocks from clients. Periodic
broadcast techniques are based on the periodic transmission by the server of the data
blocks.
Besides these sharing mechanisms, one can also use proxy servers to reduce
network load. Proxy servers are an orthogonal technique to bandwidth sharing
protocols, but one which is quite popular because it can be easily implemented
and can be managed at low costs.
We partition our presentation into three topics. We first describe client request
oriented techniques. Then, periodic broadcast mechanisms are presented, followed
by a discussion on proxy-based strategies.
4.1 Client Request Oriented Techniques
The simplest approach for allowing the sharing of bandwidth is to batch new
clients together whenever possible. This is called batching [2,20,21] and works as
follows. Upon the request of a new media stream $s_i$ by an arriving client $c_k$, a
batching window is initiated. Every new client that arrives within the bounds of
this window and requests the stream $s_i$ is inserted in a waiting queue, i.e., it is
batched together with the client $c_k$. When the window expires, a single
transmission of the media stream $s_i$ is initiated. This transmission is shared by all
clients, as in standard broadcast television. Batching policies reduce bandwidth
requirements at the expense of introducing an additional delay to the users (i.e.,
client startup latency increases).
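The batching policy just described can be sketched as follows for a single stream. The function name, the representation of arrivals as a list of times and the fixed window length are assumptions made for illustration.

```python
def batch_requests(arrivals, window):
    """Group request arrival times for one stream into batches.

    The first request of a batch opens a window of length `window`; requests
    arriving before the window expires join the batch. Returns a list of
    (transmission_start_time, [arrival times served]) pairs."""
    batches, current, deadline = [], [], None
    for t in sorted(arrivals):
        if deadline is None or t > deadline:
            if current:
                batches.append((deadline, current))
            current, deadline = [], t + window
        current.append(t)
    if current:
        batches.append((deadline, current))
    return batches

arrivals = [0, 1, 4, 10, 11.5, 30]
batches = batch_requests(arrivals, window=5)
# Startup latency of each client: time from its arrival to the shared transmission.
latencies = [start - t for start, members in batches for t in members]
```

In this example six requests are served by three transmissions, and the price is a startup latency of up to one full window for the client that opens each batch.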
Stream tapping [15], patching [13,40], and controlled multicast [32] were in-
troduced to avoid the latency problems of batching. They are very similar tech-
niques. They can provide immediate service to the clients, while allowing clients
arriving at different instants to share a common stream.
In the basic patching scheme [40], the server maintains a queue with all
pending requests. Whenever a server channel becomes available, the server admits all
the clients that requested a given video at once. These clients compose a new
batch. Assume that this new batch of clients requested a CM stream $s_i$ which
is already being served. Then, all clients in this new batch immediately join the
on-going multicast transmission of $s_i$ and start buffering the arriving data. To
obtain the initial part of the stream $s_i$, which is called a patch because it is no
longer being multicast, a new channel is opened with the multimedia server.
Data arriving through this secondary channel is immediately displayed. Once the
initial part of $s_i$ (i.e., the patch) has been displayed, the client starts consuming
data from its internal buffer. Thus, in this approach, the clients are responsible
for maintaining enough buffer space to allow merging the patch portion of the
stream with its main part. They also have to be able to receive data on two
channels.
Stream tapping, optimal patching [13], and controlled multicast differ from
the basic patching scheme in the following way: they define an optimal patching
window $w_i$ for each CM stream $s_i$. This window is the minimum interval between
the initial instants of two successive complete transmissions of the same stream
$s_i$. The choice of $w_i$ affects the performance of patching. If $w_i$ is set too large,
most of the server channels are used to send patches. On the other hand, if $w_i$ is
too small no stream merging will occur. The patching window size is optimal if
it minimizes the requirements of server and network bandwidth. The algorithm
works as follows. Clients which request the stream $s_i$ prefetch data from an
on-going multicast transmission if they arrive within $w_i$ units of time from the
beginning of the previous complete transmission. Otherwise, a new multicast
transmission of stream $s_i$ is initiated. A mathematical model which captures the
relation between the patching window size and the required server bandwidth
is proposed in [13]. In [32] an expression for the optimal patching window is
obtained.
Figure 5 illustrates batching and patching techniques. In Fig. 5(a), three new
clients requesting the stream $s_i$ arrive within the batching window. They are
served by the same multicast transmission of $s_i$. Figure 5(b) shows the patching
mechanism. We assume that the three requests arrive within a patching window.
The request $r_0$ triggers the initial multicast transmission of stream $s_i$, $r_1$ triggers
the transmission of the patch interval $(t_1 - t_0)$ of $s_i$ for $r_1$, and $r_2$ starts a
transmission of the missing interval $(t_2 - t_0)$ of $s_i$ for $r_2$.
A study of the bandwidth required by optimal patching, stream tapping, and
controlled multicast is presented in [27]. It is assumed that the arrivals of client
requests are Poisson with mean rate equal to $\lambda_i$ for stream $s_i$. The required
server bandwidth for delivery of stream $s_i$ is given by [27] as:
$B_{OP,ST,CM} = (1 + w_i^2 N_i/2)/(w_i + 1/N_i)$,
where $N_i = \lambda_i T_i$, $T_i$ is the total length of stream $s_i$
and $w_i$ is the patching window.
Fig. 5. Batching and patching techniques: (a) batching technique; (b) patching technique.
The expression presented above is very similar to the results obtained in [13,
32]. The value of the optimal patching window can be obtained by differentiating the
expression for $B_{OP,ST,CM}$. It is equal to $(\sqrt{2N_i + 1} - 1)/N_i$. The server bandwidth
for an optimal patching window is given by [32]: $B_{optimal\ window} = \sqrt{2N_i + 1} - 1$.
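These expressions can be checked numerically. The sketch below evaluates the bandwidth formula, the optimal window and the bandwidth at the optimal window. It assumes that the window is expressed as a fraction of the stream length and that bandwidth is measured in units of the playback rate; the function names are hypothetical.

```python
import math

def patching_bandwidth(n_i, w_i):
    """Required server bandwidth for patching window w_i and load N_i = lambda_i * T_i."""
    return (1 + w_i ** 2 * n_i / 2) / (w_i + 1 / n_i)

def optimal_window(n_i):
    """Window that minimizes patching_bandwidth, obtained by setting the
    derivative with respect to w_i to zero: N_i w_i^2 / 2 + w_i - 1 = 0."""
    return (math.sqrt(2 * n_i + 1) - 1) / n_i

def optimal_bandwidth(n_i):
    """Bandwidth at the optimal window."""
    return math.sqrt(2 * n_i + 1) - 1

n_i = 100.0                     # on average 100 requests per stream duration
w_star = optimal_window(n_i)
b_star = patching_bandwidth(n_i, w_star)
```

For N_i = 100 the optimal window is about 13% of the stream length and the required bandwidth is about 13 channels, compared with roughly 100 channels if every request received its own stream.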
A second approach to reduce server and network bandwidth requirements,
called Piggybacking, was introduced in [4,35,48]. The idea is to dynamically
change the display rates of on-going stream transmissions to allow one stream to
catch up and merge with the other. Suppose that stream $s_i$ is currently being
transmitted to a client. If a new request for $s_i$ arrives, then a new transmission
of $s_i$ is started. At this point in time, the server slows down the data rate of
the first transmission and speeds up the data rate of the second transmission of
$s_i$. As soon as the two transmissions reach the same point of the stream, they can
be merged and one of the two channels can be released. One limitation of this
technique is that it requires specialized hardware to support changing the
channel speed dynamically.
In the hierarchical stream merging (HSM) techniques [7,25,27] clients that
request the same stream are hierarchically merged into groups. A client
simultaneously receives two streams: the one triggered by its own request and a
second stream which was initiated by an earlier request from another client. With time,
the client is able to join the latter on-going multicast transmission and the
delivery of the stream initiated by it can be aborted. The merged clients also start
listening on the next most recently initiated stream. Figure 6 shows a scenario
where four client requests arrive for stream $s_1$ during the interval $(t_1, t_3)$. The
server initiates a new multicast transmission of $s_1$ for each new client. At time
$t_2$, client $c_2$, who is listening to the stream for $c_1$, can join this on-going multicast
transmission. At time $t_3$, client $c_4$ joins the transmission of $c_3$. Thus, after $t_3$,
client $c_4$ can listen to the multicast transmission initiated at $t_1$. At $t_5$, client $c_4$
will be able to join the transmission started for client $c_1$. Bandwidth skimming
(BS) [26] is similar to HSM. In this technique, policies are defined to reduce the
user bandwidth requirements to less than twice the stream playback rate.
In [27], an expression for the required bandwidth of a server operating with
the HSM and BS techniques was obtained. Suppose that the transmission rate
needed by a client is equal to b units of the media playback rate (b = 2 for HSM
and b < 2 for bandwidth skimming) and that the request arrivals are Poisson
with mean rate equal to $\lambda_i$ for stream $s_i$. The required server bandwidth for
delivery of stream $s_i$ can be approximated by: $B_{HSM,BS} \approx \eta_b \ln(N_i/\eta_b + 1)$,
where $N_i$ is defined as above and $\eta_b$ is the positive real constant that satisfies:
$\eta_b [1 - (\eta_b/(\eta_b + 1))^b] = 1$
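The constant in this approximation has no closed form for general b, but it is easy to compute numerically. The sketch below solves the defining equation by bisection and evaluates the bandwidth approximation; the function names and the choice of bisection are assumptions made for illustration. For b = 2 the equation reduces to x^2 - x - 1 = 0, so the constant is the golden ratio.

```python
import math

def eta(b, iterations=200):
    """Positive root of x * (1 - (x / (x + 1)) ** b) = 1, found by bisection."""
    if b <= 1:
        raise ValueError("the equation has no positive root for b <= 1")

    def f(x):
        return x * (1.0 - (x / (x + 1.0)) ** b) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) < 0.0:          # grow the bracket until the sign changes
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def hsm_bandwidth(n_i, b=2.0):
    """Approximate server bandwidth (in units of the playback rate) for
    load N_i and client receive rate b."""
    e = eta(b)
    return e * math.log(n_i / e + 1.0)

eta_hsm = eta(2.0)              # golden ratio, about 1.618
b_hsm = hsm_bandwidth(100.0)    # HSM at N_i = 100
b_bs = hsm_bandwidth(100.0, b=1.25)
```

The logarithmic growth in N_i contrasts with the square-root growth of optimal patching, and lowering b below 2 (bandwidth skimming) raises the constant and therefore the server bandwidth.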
Fig. 6. Hierarchical stream merging and skyscraper broadcasting: (a) hierarchical stream merging; (b) skyscraper broadcast.
Most of the studies in the literature have evaluated the required server
bandwidth for the proposed sharing techniques. One key question is how the
performance of a multimedia system is affected when the server bandwidth is limited
to the values given by the equations previously presented. In a recent work [68]
analytical models for HSM, BS and patching techniques are proposed to evaluate
two performance metrics of a multimedia system: the mean time a client request
is delayed if the server is overloaded (called the mean client waiting time) and
the fraction of clients who renege if they are not served with low delay (called
the balking rate). The model assumptions are: client request arrivals are Poisson, each
client requests the entire media stream, and all media streams have the same
length and require the same playout rate. The model proposed to evaluate the
balking rate is a closed two-center queueing network with C clients (C is the
server bandwidth capacity in number of channels). The mean client waiting time
is obtained from a two-center queueing network model with K users (there is
one user per class and each class represents one stream). Each user models the
first request for a stream $s_i$; the other requests that batch with the first are
not represented in the model. Results obtained from the analytical models show
that the client balking rate may be high and the mean client waiting time is
low when the required server bandwidth is defined as in the equations presented
above. Furthermore, the two performance parameters are very sensitive to the
increase in the client load.
4.2 Periodic Broadcast Techniques
The idea behind periodic broadcast techniques is that each stream is divided into
segments that can then be simultaneously broadcast periodically on a set of k
different channels. A channel $c_1$ delivers only the first segment of a given stream
and the other (k − 1) channels deliver the remainder of the stream. When a client
wants to watch a video, it must wait for the beginning of the first segment on
channel $c_1$. A client has a schedule for tuning into each of the (k − 1) channels to
receive the remaining segments of the video. The broadcasting schemes can be
classified into three categories [39]. The first group of periodic broadcast
techniques divides the stream into segments of increasing size and transmits them on
channels of the same bandwidth. Smaller segments are broadcast more frequently
than larger segments and the segments follow a size progression $(l_1, l_2, ..., l_n)$. In
the Pyramid Broadcasting (PB) protocol [70], the sizes of the segments follow
a geometric progression and one channel is used to transmit different streams.
The transmission rate of the channels is high enough to provide on time delivery
of the stream. Thus, client bandwidth and storage requirements are also high.
To address the problem of high resource requirements at the client side, a
technique called Permutation-based Pyramid Broadcasting (PPB) was proposed
in [3]. The idea is to multiplex a channel into k subchannels of lower rate. In the
Skyscraper broadcast technique [41] each segment is continuously transmitted at
the video playback rate on one channel as shown in Fig. 6. The series of segment
sizes is 1,2,2,5,5,12,12,25,25,52,52,... with a largest segment size equal to W.
Figure 6 shows two client request arrival times: one just prior to the third segment
and the other before the 18th segment broadcast on channel 1. The transmission
schedules of both clients are represented by the gray shaded segments. The
schedule is such that a client is able to play out the stream continuously while
receiving data on no more than two channels. The required maximum client buffer space
is equal to the largest segment size.
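The segment size series quoted above can be generated with a simple recurrence. The recurrence used below is an assumption: it is the form usually given for the skyscraper progression and it reproduces the values listed in the text, but it has not been checked against [41]. The cap W and the function name are likewise illustrative.

```python
def skyscraper_series(k, w=None):
    """First k relative segment sizes, optionally capped at a largest size w."""
    sizes = []
    for n in range(1, k + 1):
        if n == 1:
            f = 1
        elif n in (2, 3):
            f = 2
        elif n % 4 == 0:
            f = 2 * sizes[-1] + 1
        elif n % 4 == 2:
            f = 2 * sizes[-1] + 2
        else:                    # n % 4 is 1 or 3: repeat the previous size
            f = sizes[-1]
        sizes.append(f)
    if w is not None:
        sizes = [min(s, w) for s in sizes]
    return sizes

series = skyscraper_series(11)
capped = skyscraper_series(11, w=12)
# With the first segment broadcast back to back on channel 1, a client waits
# at most one unit segment, i.e. this fraction of the video duration.
max_wait_fraction = 1 / sum(capped)
```

Capping the series at W bounds the client buffer, as noted above, at the cost of needing more channels to cover a video of a given length.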
For all techniques described above the required server bandwidth is equal
to the number of channels and is independent of the client request arrival rate.
Therefore, these periodic broadcast techniques are very bandwidth efficient when
the client request arrival rate is high. A dynamic skyscraper technique was
proposed in [24] to improve the performance of the skyscraper technique. It considers the
dynamic popularity of the videos and assumes lower client arrival rates. It
dynamically changes the video that is broadcast on the channels. A set of segments
of a video is delivered in transmission clusters. Each cluster starts every W
slots on channel 1 and broadcasts a different video according to the client
requests. Client requests are scheduled to the next available transmission cluster
using a FIFO discipline, if there is no transmission cluster already assigned
to the required video. A new segment size progression is proposed in [27] to
provide immediate service to clients. Server bandwidth requirements for
transmitting a stream $s_i$ considering Poisson arrivals with mean rate $\lambda_i$ are given by
[27]: $B_{Dyn Sky} = 2U\lambda_i + (K - 2)/(1 + 1/(\lambda_i W U))$, where U is the duration of a
unit-segment, W is the largest segment size and K is the number of segments in
the segment size progression.
Another group, the harmonic broadcast techniques, divides the video into equal
sized segments and transmits them on channels of decreasing bandwidth. The
third group combines the approaches described above, yielding hybrid schemes
of pyramid and harmonic broadcasting.
4.3 Proxy Based Strategies
The use of proxies in the context of CM applications has several advantages.
Server and network bandwidth requirements can be reduced and the client
startup latency can be very low. In Sec. 4.1 and 4.2 the models to evaluate
the scalability of the bandwidth sharing techniques are based on the
assumption that client accesses are sequential (i.e., clients request a stream and
play it out from the beginning to the end). The required server bandwidth for
these techniques varies logarithmically with the client request arrival rate (for stream
merging) and logarithmically with the inverse of the start-up delay (for periodic
broadcasting) [44]. In a recent work [44], tight lower bounds on the required
server bandwidth for multicast delivery techniques when client accesses
are not sequential were derived. The results obtained suggest that for
non-sequential access the scalability of these techniques is not as high as for
sequential access. Thus, the use of proxies is a complementary strategy that can
reduce resource requirements and client latency of large scale CM applications.
Provisioning a multimedia application with proxy servers involves determin-
ing which content should be stored at each proxy. Several studies are based on
the storage of data accessed most frequently. Distinct approaches exist in the lit-
erature. One idea is to divide the compressed video in layers that can be cached
at the proxy. An alternative is to cache a portion of a video file at the proxy.
Recent work combines the use of proxies with bandwidth sharing mechanisms
such as periodic broadcast and client request oriented techniques. In the last
approach, the popularity of the video is not the only factor to consider when deciding
which portion of the stream has to be stored at the proxy. The data staged
at the proxy depends also on the mechanisms used to share the transmission
of data. In most of the bandwidth sharing mechanisms, the server delivers the
initial portion of a stream more frequently than the latter part. Therefore, the
storage of a prefix can reduce the transmission costs more significantly than the
storage of a suffix.
Caching of Video Layers. In a video layer encoding technique, the compressed
video stream is divided into layers: a base layer and enhancement layers. The
base layer contains essential low quality encoding information, while the en-
hancement layers provide optional information that can be used to improve the
video stream quality.
The approach used in [72] is to divide a video stream in two parts and to
store the bursts of the stream in a proxy. A cut-off rate $C_{rate}$ is defined, where
$0 \le C_{rate} \le P_{rate}$ ($P_{rate}$ is the peak rate). The first part of the stream (the upper
part) exceeds the cut-off rate, and the remainder of the stream is the lower part.
The upper part is staged at a proxy server and the lower part is retrieved from
the server. The stream transmitted from the server to the clients approaches
a CBR stream as $C_{rate}$ decreases. Two heuristic algorithms are presented to
determine which video and what percentage of it has to be cached at the proxy.
The first stores hot videos, i.e., popular videos, entirely at the proxy. The second
stores a portion of a video so as to minimize the bandwidth requirements on the
server-proxy path. Results show that the second heuristic performs better than
the first.
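The split at the cut-off rate can be illustrated on a rate trace. The sketch below assumes the stream is given as a list of per-slot rates and that, in each slot, the portion above the cut-off is the upper part and the rest is the lower part; the function name and the toy numbers are hypothetical and are not taken from [72].

```python
def split_at_cutoff(rates, c_rate):
    """Split a variable-rate stream into the part above the cut-off rate
    (staged at the proxy) and the part up to the cut-off (sent by the server)."""
    upper = [max(r - c_rate, 0) for r in rates]
    lower = [min(r, c_rate) for r in rates]
    return upper, lower

def peak_to_mean(rates):
    """Burstiness indicator: 1.0 means a constant bit rate stream."""
    return max(rates) / (sum(rates) / len(rates))

rates = [2, 5, 8, 3, 6, 2]          # toy per-slot rates of a VBR stream
upper, lower = split_at_cutoff(rates, c_rate=4)
proxy_share = sum(upper) / sum(rates)
```

Here the server-side stream is smoother than the original one, and lowering the cut-off rate makes it smoother still while increasing the share of the video that has to be stored at the proxy.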
Another approach is presented in [59]. A mechanism for caching video layers
is used in conjunction with a congestion control and a quality adaptation mecha-
nism. The number of video layers cached at the proxy is based on the popularity
of the video. The more popular a video is, the more layers are stored in the
proxy. Enhancement layers of cached streams are added according to a quality
adaptation mechanism [60]. One limitation of this approach is that it requires
the implementation of a congestion control and of a quality adaptation mecha-
nism in all the transmissions between clients and proxies and between proxies
and servers.
Partial Caching of the Video File. The scheme proposed in [66] is based on
the storage of the initial frames of a CM stream in a proxy cache. It is called
proxy prefix caching. It was motivated by the observation that the performance
of CM applications can be poor due to the delay, throughput and loss character-
istics of the Internet. As presented in Sec. 3 the use of buffers can reduce network
bandwidth requirements and allow the application to tolerate larger variations
in the network delay. However, the buffer size is limited by the maximum startup
latency a client can tolerate. Proxy prefix caching reduces client startup
latency, especially when buffering techniques are used. The scheme works as follows.
When a client requests a stream, the proxy immediately delivers the prefix
to the client and asks the server to initiate the transmission of the remaining
frames of the stream. The proxy uses two buffers during the transmission of
stream si : the prefix buffer Bp and a temporary buffer Bt . Initially, frames are
delivered from the Bp buffer while frames coming from the server are stored in
the Bt buffer.
Caching with Bandwidth Sharing Techniques. When using a proxy server
in conjunction with scalable delivery mechanisms several issues have to be ad-
dressed. The data to be stored at each proxy depends on the relative cost of
streaming a video from the server and from the proxy, the number of proxies,
the client arrival rate and the path from the server to the proxy (unicast or
multicast enabled).
Most of the studies in the literature [23,58,16,36,71,5] define a system cost
function which depends on the fraction of the stream stored at the proxy (wi ),
the bandwidth required for a stream (bi ), the client arrival rate (λi ) and the
length of the stream (Ti ). The cost for delivering a stream si is given by
Ci (wi , bi , λi , Ti ) = Bserver (wi , bi , λi , Ti ) + Bproxy (wi , bi , λi , Ti ) where Bserver is
the cost of the server-proxy required bandwidth and Bproxy is the cost of the
proxy-client required bandwidth. Then, an optimization problem is formulated.
The goal is to minimize the transmission costs subject to bounds on the total
storage and/or bandwidth available at the proxy. The solution of the problem
gives the proxy cache allocation that minimizes the aggregate transmission cost.
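The structure of this optimization problem can be sketched on a toy instance. Everything specific below is an assumption made for illustration: the cited studies each use their own cost functions, whereas this sketch uses a hypothetical linear cost without any stream sharing, a coarse grid of candidate fractions wi, and exhaustive search instead of the solution methods of the papers.

```python
from itertools import product


def delivery_cost(w, b, lam, T, c_server=1.0, c_proxy=0.1):
    """Hypothetical C_i: data served from the proxy cache costs c_proxy per unit,
    data fetched from the server costs c_server per unit (no stream sharing)."""
    return lam * b * T * ((1.0 - w) * c_server + w * c_proxy)


def best_allocation(streams, capacity, fractions=(0.0, 0.5, 1.0)):
    """Exhaustively search the fractions w_i that minimize the total cost
    subject to the proxy storage bound sum_i w_i * b_i * T_i <= capacity."""
    best_w, best_cost = None, float("inf")
    for ws in product(fractions, repeat=len(streams)):
        storage = sum(w * b * T for w, (b, lam, T) in zip(ws, streams))
        if storage > capacity:
            continue  # violates the storage constraint
        cost = sum(delivery_cost(w, b, lam, T) for w, (b, lam, T) in zip(ws, streams))
        if cost < best_cost:
            best_w, best_cost = ws, cost
    return best_w, best_cost


# Two hypothetical streams given as (b_i, lambda_i, T_i); the first is ten times more popular.
streams = [(1.0, 10.0, 100.0), (1.0, 1.0, 100.0)]
best_w, best_cost = best_allocation(streams, capacity=100.0)
```

With room for only one full stream, this toy model gives all the proxy space to the more popular stream. Exhaustive search grows exponentially with the number of streams, so it is only meant to make the formulation concrete.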
The work of [71] combines proxy prefix caching with client request oriented
techniques for video delivery between the proxy and the client. It is assumed that
the transmission between the server and the proxy is unicast and the network
paths from the proxy to the clients are either multicast/broadcast or unicast.
Two scenarios are evaluated: (a) the proxy-client path is unicast and (b) the
proxy-client path is multicast. For scenario (a), two transmission strategies
are proposed. In the first, a batching technique is used to group the client
request arrivals within a window wpi (equal to the length of the prefix stored at
the proxy). Each group of clients is served from the same unicast transmission
from the server to the proxy. The second is an improvement of the first. It is
similar to the patching technique used in the context of unicast. If a client
request arrives at time t after the end of wpi , the proxy schedules a patch for the
transmission of the missing part from the server. The remaining (Ti − t) frames of
the stream (Ti is the length of stream si ) are delivered from the on-going
transmission of stream si . The client receives data from at most two channels:
the patch channel and the on-going transmission channel. For scenario
(b), two transmission schemes are presented: the multicast patching technique
[13] implemented at the proxy, and multicast merging, which is similar to the
stream merging technique [25]. A dynamic programming algorithm is used to
solve the optimization problem. Results show that transmission costs are lower
when a prefix cache is used than when the entire stream is cached, and that
significant transmission savings can be obtained with a small proxy size.
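The batching strategy of scenario (a) can be sketched as follows. This is a simplified illustration, not the algorithm of [71]: it assumes that a window opens at the first request that finds no open window, that a request arriving exactly at the end of the window still joins it, and that each batch triggers one server-to-proxy transmission. The function name batch_requests is hypothetical.

```python
def batch_requests(arrivals, window):
    """Group request arrival times into batches.

    A batch opens at the first request that does not fit into the current
    batch and collects every request arriving within `window` of that opening.
    Each batch corresponds to one unicast server-to-proxy transmission.
    """
    batches = []
    for t in sorted(arrivals):
        if batches and t - batches[-1][0] <= window:
            batches[-1].append(t)   # served by the transmission already scheduled
        else:
            batches.append([t])     # opens a new window and a new transmission
    return batches


# Hypothetical arrival times with a prefix length (window) of 5 time units.
batches = batch_requests([0, 1, 4, 6, 7, 20], window=5)
# batches == [[0, 1, 4], [6, 7], [20]]: three server transmissions for six requests
```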
The work in [36] studies the use of proxy prefix caching with periodic broad-
cast techniques. The authors propose the use of patching to deliver the prefix
from the proxy to the client and periodic broadcast to deliver the remaining
frames (the suffix) from the server to the client. Clients will temporarily receive
both the prefix from the proxy and the suffix from the server. Therefore, the
number of channels a client needs is the sum of the channels to obtain the suffix
and the prefix. A slight modification in periodic broadcast and patching is intro-
duced such that the maximum number of simultaneous channels required by a
client is equal to two. Proxy buffer allocation is based on a three-step algorithm
aimed at minimizing server bandwidth in the path from the server to the proxy.
Results show that the optimal buffer allocation algorithm outperforms a scheme
where the proxy buffer is evenly divided among the streams without considering
the length of each stream.
In [5] the following scenarios are considered: (a) the bandwidth skimming
protocol is used in the server-proxy and proxy-client paths, (b) the server-proxy
path is unicast capable and the bandwidth skimming technique is used in the
proxy-client path, and (c) scenarios (a) and (b) combined with proxy prefix caching.
In scenario (a) the proxy can store an arbitrary fraction of each stream si .
Streams are merged at the proxy and at the server using the closest target
bandwidth skimming protocol [25]. In scenario (b) the server-proxy path
is unicast, thus only streams requested from the same proxy can be merged at
the server. Several results are obtained from a large set of system configuration
parameters. They show that the use of proxy servers is cost effective in the
following cases: the server-proxy path is not multicast enabled or the client
arrival rate is low or the cost to deliver a stream from the proxy to the client is
very small when compared to the cost to deliver a stream from the server to the
client.
5 Conclusions
In this chapter we have surveyed several performance issues related to the design
of real time voice and video applications (such as voice transmission tools and
multimedia video servers). These include issues from continuous media retrieval
to transmission. Since the topic is too broad to be covered in one chapter, we
trade depth of exposition for breadth, in order to cover a wide range of
inter-related problems.
As can be seen in the material covered, an important aspect in the design of
multimedia servers is the storage strategy. We favor the use of the random I/O
technique due to its simplicity of implementation and comparable performance
with respect to other schemes. This technique is particularly attractive when
different types of data are placed in the server, for instance a mixture of voice,
video, transparencies, photos, etc. Furthermore, the same technique can be easily
employed in proxies. To evaluate the performance of the technique, queueing
models constructed from traces of real traffic streams can be used. Accurate
traffic models are clearly important to feed the overall server model.
A multimedia server should try to send the requested streams as smoothly
as possible (or as close as possible to CBR traffic) to minimize the impact of
sudden rate changes in the network resource. Large buffers at the receiver imply
better smoothing, but at the expense of increasing latency to start displaying
a stream. The receiver playout buffer is also used to reduce the packet delay
variability imposed by the network and to help in the recovery process when
a packet loss occurs. We have surveyed a few packet recovery techniques, and
presented the main tradeoffs such as error correction capability and increase in
the transmission rate, efficiency versus latency, etc. Modeling the loss process
is an important problem and many issues remain open. Although some of the
conclusions in the chapter were drawn based on the study of voice traffic the
issues are not different for video traffic.
Due to the high speed of modern disk systems, the bottleneck in
delivering the continuous media stream to clients is presently mainly at the local
network to which the server is attached, and not at the storage server. Therefore, an
issue that has drawn attention in recent years is the development of algorithms
to conserve bandwidth when a large number of clients submit requests to the
server. Since multicast is still far from being widely deployed, we favor schemes
that use unicast transmission from the storage server to proxy servers. Between
the proxy and the clients multicast is more likely to be feasible, and therefore
multicast-based techniques to reduce bandwidth requirements are most likely to
be useful in the path from the proxy to the clients.
To conclude, we stress the importance of developing multimedia applications,
performing tests on prototypes, collecting statistics, and developing models based
on the data obtained. Modeling tools are an important part of the evaluation
process, and this includes not only simulation but also analytical tools, traffic
generators, etc. Several tools and prototypes have been developed in recent years.
Our own tools include: video servers implementing different storage techniques;
a voice transmission tool implementing FEC recovery mechanisms; a distributed
whiteboard (with multicast library); and TANGRAM-II, which includes a modeling
environment with analytical as well as simulation solvers, a traffic modeling
environment, and a traffic generator and analyzer. (Most of the tools can be downloaded
from www.land.ufrj.br and/or www.dcc.ufmg.br.)
References
1. A. Adas. Traffic Models in Broadband Networks. IEEE Communications Magazine,
(7):82–89, 1997.
2. C. C. Aggarwal, J. L. Wolf, and P. S. Wu. On optimal batching policies for video-
on-demand storage server. In Proc. of the IEEE Conf. on Multimedia Systems,
1996.
3. C. C. Aggarwal, J. L. Wolf, and P. S. Wu. A permutation-based pyramid broad-
casting scheme for video-on-demand systems. In Proc. of the IEEE Conf. on Mul-
timedia Systems, 1996.
4. C.C. Aggarwal, J.L. Wolf, and P.S. Wu. On optimal piggyback merging policies.
In Proc. ACM Sigmetrics’96, pages 200–209, May 1996.
5. J. Almeida, D. Eager, M. Ferris, and M. Vernon. Provisioning content distribution
networks for streaming media. In Proc. of IEEE/Infocom’02, June 2002.
6. J. Apostolopoulos, T. Wong, W. Tan, and S. Wee. On multiple description stream-
ing with content delivery networks. In Proc. of IEEE/Infocom’02, NY, June 2002.
7. A. Bar-Noy, G. Goshi, R. E. Ladner, and K. Tam. Comparison of stream merging
algorithms for media-on-demand. In Proc. MMCN’02, January 2002.
8. S. Berson, R.Muntz, S. Ghandeharizadeh, and X. Ju. Staggered striping in multi-
media information systems. In ACM SIGMOD Conference, 1994.
9. W. Bolosky, J.S. Barrera, R. Draves, R. Fitzgerald, G. Gibson, M. Jones, S. Levi,
N. Myhrvold, and R. Rashid. The Tiger video fileserver. In Proc. NOSSDAV’96.
1996.
10. J-C. Bolot. Characterizing end-to-end packet delay and loss in the Internet. In
Proc. ACM Sigcomm’93, pages 289–298, September 1993.
11. J-C. Bolot, S. Fosse-Parisis, and D. Towsley. Adaptive FEC-based error control
for Internet telephony. In Proc. of IEEE/Infocom’99, pages 1453–1460, 1999.
12. J-C. Bolot and A. Vega-Garcı́a. The case for FEC-based error control for packet
audio in the Internet. ACM Multimedia Systems, 1997.
13. Y. Cai, K. Hua, and K. Vu. Optimizing patching performance. In Proc. SPIE/ACM
Conference on Multimedia Computing and Networking, 1999.
14. S. Campos, B. Ribeiro-Neto, A. Macedo, and L. Bertini. Formal verification and
analysis of multimedia systems. In ACM Multimedia Conference. Orlando, Novem-
ber 1999.
15. S. W. Carter and D. D. E. Long. Improving video-on-demand server efficiency
through stream tapping. In Sixth International Conference on Computer Commu-
nications and Networks, pages 200–207, 1997.
16. S.-H.G. Chan and F. Tobagi. Tradeoff between system profit and user delay/loss
in providing near video-on-demand service. IEEE Transactions on Circuits and
Systems for Video Technology, 11(8):916–927, August 2001.
17. E. Chang and A. Zakhor. Cost analyses for VBR video servers. IEEE Multimedia,
3(4):56–71, 1996.
18. A.L. Chervenak, D.A. Patterson, and R.H. Katz. Choosing the best storage system
for video service. In ACM Multimedia Conf., pages 109–119. SF, 1995.
19. T. Chua, J. Li, B. Ooi, and K. Tan. Disk striping strategies for large video-on-
demand servers. In ACM Multimedia Conf., pages 297–306, 1996.
20. A. Dan, D. Sitaram, and P. Shahabuddin. Scheduling policies for an on-demand
video server with batching. In Proc. of the 2nd ACM Intl. Conf. on Multimedia,
pages 15–23, 1994.
21. A. Dan, D. Sitaram, and P. Shahabuddin. Dynamic batching policies for an on-
demand video server. Multimedia Systems, (4):112–121, 1996.
22. M.C. Diniz and E. de Souza e Silva. Models for jitter control at destination. In
Proc., IEEE Intern. Telecomm. Symp., pages 118–122, 1996.
23. D. Eager, M. Ferris, and M. Vernon. Optimized caching in systems with heterogeneous
client populations. Performance Evaluation, (42):163–185, 2000.
24. D. Eager and M. Vernon. Dynamic skyscraper broadcasts for video-on-demand. In
4th International Workshop on Multimedia Information Systems, September 1998.
25. D. Eager, M. Vernon, and J. Zahorjan. Optimal and efficient merging schedules
for video-on-demand servers. In Proc. ACM Multimedia’99, November 1999.
26. D. Eager, M. Vernon, and J. Zahorjan. Bandwidth skimming: A technique for
cost effective video-on-demand. In Proc. Multimedia Computing and Networking,
January 2000.
27. D. Eager, M. Vernon, and J. Zahorjan. Minimizing bandwidth requirements for
on-demand data delivery. IEEE Transactions on Knowledge and Data Engineering,
13(5):742–757, September 2001.
28. F. Fabbrocino, J.R. Santos, and R.R. Muntz. An implicitly scalable, fully interac-
tive multimedia storage server. In DISRT’98, pages 92–101. Montreal, July 1998.
29. D.R. Figueiredo and E. de Souza e Silva. Efficient mechanisms for recovering voice
packets in the Internet. In Proc. of IEEE/Globecom’99, Global Internet Symp.,
pages 1830–1837, December 1999.
30. C.S. Freedman and D.J. DeWitt. The SPIFFI scalable video-on-demand system.
In ACM Multimedia Conf., pages 352–363, 1995.
31. V.S. Frost and B. Melamed. Traffic Modeling for Telecommunications Networks.
IEEE Communications Magazine, 32(3):70–81, 1994.
32. L. Gao and D. Towsley. Supplying instantaneous video-on-demand services using
controlled multicast. In IEEE International Conference on Multimedia Computing
and Systems, pages 117–121, 1999.
33. S. Ghandeharizadeh, R. Zimmermann, W. Shi, R. Rejaie, D. Ierardi, and T.-W.
Li. Mitra: a scalable continuous media server. Multimedia Tools and Applications,
5(1):79–108, July 1997.
34. L. Golubchik, J.C.S. Lui, E. de Souza e Silva, and R. Gail. Evaluation of performance
tradeoffs in scheduling techniques for mixed workload multimedia servers.
Journal of Multimedia Tools and Applications, to appear, 2002.
35. L. Golubchik, J.C.S. Lui, and R. Muntz. Reducing I/O demand in video-on-demand
storage servers. In Proc. ACM Sigmetrics’95, pages 25–36, May 1995.
36. Y. Guo, S. Sen, and D. Towsley. Prefix caching assisted periodic broadcast: Frame-
work and techniques to support streaming for popular videos. In Proc. of ICC’02,
2002.
37. D. Heyman and D. Lucantoni. Modeling multiple IP traffic with rate limits. In J.M.
de Souza, N. da Fonseca, and E. de Souza e Silva, editors, Teletraffic Engineering
in the Internet Era, pages 445–456. 2001.
38. D.P. Heyman and T.V. Lakshman. What are the Implications of Long-Range De-
pendence for VBR-Video Traffic Engineering. IEEE/ACM Transactions on Net-
working, 4(3):301–317, June 1996.
39. Ailan Hu. Video-on-demand broadcasting protocols: A comprehensive study. In
Proc. IEEE Infocom, pages 508–517, 2001.
40. K. A. Hua, Y. Cai, and S. Sheu. Patching: A multicast technique for true
video-on-demand services. In Proceedings of ACM Multimedia, pages 191–200, 1998.
41. K.A. Hua and S. Sheu. Skyscraper broadcasting: a new broadcasting scheme for
metropolitan video-on-demand systems. In Proc. of ACM Sigcomm’97, pages 89–
100. ACM Press, 1997.
42. J.Chien-Liang, D.H.C. Du, S.S.Y. Shim, J. Hsieh, and M. Lin. Design and evalua-
tion of a generic software architecture for on-demand video servers. IEEE Trans-
actions on Knowledge and Data Engineering, 11(3):406–424, May 1999.
43. P. Ji, B. Liu, D. Towsley, and J. Kurose. Modeling frame-level errors in GSM wireless
channels. In Proc. of IEEE/Globecom’02 Global Internet Symp., 2002.
44. S. Jin and A. Bestavros. Scalability of multicast delivery for non-sequential stream-
ing access. In Proc. of ACM Sigmetrics’02, June 2002.
45. K. Keeton and R. Katz. Evaluating video layout strategies for a high-performance
storage server. In ACM Multimedia Conference, pages 43–52, 1995.
46. J. Korst. Random duplicated assignment: An alternative to striping in video
servers. In ACM Multimedia Conference, pages 219–226. Seattle, 1997.
47. J.F. Kurose and K.W. Ross. Computer Networking: A Top-Down Approach Fea-
turing the Internet. Addison-Wesley, 2001.
48. S.W. Lau, J.C.S. Lui, and L. Golubchik. Merging video streams in a multimedia
storage server: Complexity and heuristics. ACM Multimedia Systems Journal,
6(1):29–42, January 1998.
49. R.M.M. Leão, E. de Souza e Silva, and Sidney C. de Lucena. A set of tools for
traffic modelling, analysis and experimentation. In Lecture Notes in Computer
Science 1786 (TOOLS’00), pages 40–55, 2000.
50. Y. Mansour and B Patt-Shamir. Jitter control in QoS networks. IEEE/ACM
Transactions on Networking, 2001.
51. A.P. Markopoulou, F.A. Tobagi, and M.J. Karam. Assessment of VoIP quality
over Internet backbones. In Proc. of IEEE/Infocom’02, June 2002.
52. H. Michiel and K. Laevens. Traffic Engineering in a Broadband Era. Proceedings
of the IEEE, pages 2007–2033, 1997.
53. R.R. Muntz, J.R. Santos, and S. Berson. A parallel disk storage system for real-time
multimedia applications. Intl. Journal of Intelligent Systems, 13(12):1137–1174,
December 1998.
54. B. Ozden, R. Rastogi, and A. Silberschatz. Disk striping in video server environ-
ments. In IEEE Intl. Conference on Multimedia Computing and Systems, 1996.
55. B. Ozden, R. Rastogi, and A. Silberschatz. On the design of a low-cost video-on-
demand storage system. In ACM Multimedia Conference, pages 40–54, 1996.
56. K. Park and W. Willinger. Self-Similar Network Traffic: an Overview, pages 1–38.
John Wiley and Sons, INC., 2000.
57. C. S. Perkins, O. Hodson, and V. Hardman. A survey of packet-loss recovery
techniques for streaming audio. IEEE Network Magazine, pages 40–48, Sep. 1998.
58. S. Ramesh, I. Rhee, and K. Guo. Multicast with cache (mcache): An adaptive
zero-delay video-on-demand service. IEEE Transactions on Circuits and Systems
for Video Technology, 11(3):440–456, March 2001.
59. R. Rejaie, H. Yu, M. Handley, and D. Estrin. Multimedia proxy caching mechanism
for quality adaptive streaming applications in the Internet. In Proc. IEEE Infocom,
pages 980–989, 2000.
60. Reza Rejaie, Mark Handley, and Deborah Estrin. Quality adaptation for congestion
controlled video playback over the Internet. In Proc. ACM Sigcomm’99, pages 189–
200, August 1999.
61. K. Salamatian and S. Vaton. Hidden Markov Modeling for network communica-
tion channels. In Proc. of Sigmetrics/Performance’01, pages 92–101, Cambridge,
Massachusetts, USA, June 2001.
62. J.D. Salehi, Z.L.Zhang, J.F. Kurose, and D. Towsley. Supporting stored video:
reducing rate variability and end-to-end resource requirements through optimal
smoothing. IEEE/ACM Transactions on Networking, 6(4):397–410, 1998.
63. J.R. Santos and R. Muntz. Performance analysis of the RIO multimedia storage
system with heterogeneous disk configurations. In ACM Multimedia Conf., 1998.
64. J.R. Santos, R. Muntz, and B. Ribeiro-Neto. Comparing random data allocation
and data striping in multimedia servers. In Proc. ACM Sigmetrics’00, pages 44–55.
Santa Clara, 2000.
65. S. Sen, J. Rexford, J. Dey, J. Kurose, and D. Towsley. Online smoothing of variable-
bit-rate streaming video. IEEE Transactions on Multimedia, 2000.
66. S. Sen, J. Rexford, and D. Towsley. Proxy prefix caching for multimedia streams.
In Proc. IEEE Infocom, pages 1310–1319, 1999.
67. P.J. Shenoy and H.M. Vin. Efficient striping techniques for multimedia file servers.
In Proc. NOSSDAV’97, pages 25–36. 1997.
68. H. Tan, D. Eager, M. Vernon, and H. Guo. Quality of service evaluations of
multicast streaming protocols. In Proc. of ACM Sigmetrics 2002, June 2002.
69. D. Towsley, J. Kurose, and S. Pingali. A comparison of sender-initiated and
receiver-initiated reliable multicast protocols. IEEE Journal on Selected Areas
in Communications, 15(3):398–406, April 1997.
70. S. Viswanathan and T. Imielinski. Pyramid broadcasting for video on demand
service. In Proc. IEEE Multimedia Computing and Networking, volume 2417, pages
66–77, 1995.
71. B. Wang, S. Sen, M. Adler, and D. Towsley. Optimal proxy cache allocation for
efficient streaming media distribution. In Proc. IEEE Infocom, 2002.
72. Y. Wang, Z. Zhang, D. Du, and D. Su. A network-conscious approach to end-to-
end video delivery over wide area networks using proxy servers. In Proc. of IEEE
Infocom 98, pages 660–667, April 1998.
73. W.R. Wong. On-time Data Delivery for Interactive Visualization Applications.
PhD thesis, UCLA/CS Dept., 2000.
74. D. Wu, Y.T. Hou, and Y. Zhang. Transporting real-time video over the Internet:
Challenges and approaches. Proceedings of the IEEE, 88(12):1855–1875, December
2000.
75. M. Yajnik, S. Moon, J. Kurose, and D. Towsley. Measurement and modeling of the
temporal dependence in packet loss. In Proc. of IEEE/Infocom’99, 1999.
76. Y. J. Liang, E. Steinbach, and B. Girod. Real-time voice communication over the
Internet using packet path diversity. In Proc. ACM Multimedia 2001, Ottawa,
Canada, Sept./Oct. 2001.
Markovian Modeling of Real Data Traffic:
Heuristic Phase Type and MAP Fitting of
Heavy Tailed and Fractal Like Samples
András Horváth and Miklós Telek
Dept. of Telecommunications, Budapest University of Technology and Economics,

{horvath,telek}@webspn.hit.bme.hu
Abstract. In order to support the effective use of telecommunication
infrastructure, the “random” behavior of traffic sources has been studied
since the early days of telephony. Strange new features, like fractal-like
behavior and heavy tailed distributions, were observed in high speed
packet switched data networks in the early ’90s. Since then, a fertile
line of research has aimed to find proper models to describe these strange traffic
features and to establish a robust method to design, dimension and operate
such networks.
In this paper we give an overview of methods that, on the one hand, allow
us to capture important traffic properties like slow decay rate, Hurst
parameter, scaling factor, etc., and, on the other hand, make possible the
quantitative analysis of the studied systems using the effective analysis
approach called the matrix geometric method.
The presentation of this analysis approach is associated with a discussion
on the properties and limits of Markovian fitting of the typical non-
Markovian behavior present in telecommunication networks.
1 Introduction
In the late 80’s, traffic measurements of high speed communication networks
indicated unexpectedly high variability and burstiness over several time scales,
which indicated the need for new modeling approaches capable of capturing the
observed traffic features. The first promising approach, the fractal modeling of
high speed data traffic [28], resulted in a big boom in traffic theory. Since that
time a series of traffic models were proposed to describe real traffic behavior:
fractional Gaussian noises [30,37], traditional [7] and fractional ARIMA processes
[18], fractals and multifractals [49,13], etc.
A significant positive consequence of the new traffic engineering wave is that
the importance of traffic measurement and the proper statistical analysis of
measured datasets became widely accepted and measured datasets of a wide
range of real network configurations became publicly available [52].
In spite of the intensive research activity, there are still open problems asso-
ciated with these new traffic models:

This work is supported by the OTKA-T34972 grant of the Hungarian Research
Fund.

– None of the traffic models is evidently verified by the physical behavior of the
networks. The proposed models allow us to represent some of the features
of data traffic, but some other features are not captured. Which are the
important traffic features?
– The traffic features of measured data are checked via statistical tests and
the traffic features of the models are checked using analysis and simulation
methods. Are these tests correct enough? Is there enough data available for
reliable tests?
– The majority of the proposed traffic models have important asymptotic properties,
but all tests are based on finite datasets. Should we draw conclusions about
the asymptotic properties based on finite datasets? And vice versa, should we
draw conclusions from the asymptotic model behavior about the performance
of finite systems?
– With finite datasets, the asymptotic properties extracted from tests performed
on different time scales often differ. Which is the dominant time scale
to consider?
The above listed questions refer to the correctness of traffic models. There
is an even more important issue which determines the utility of a traffic model,
which is computability. The majority of the mentioned traffic models are not
accompanied by effective analysis tools which would allow us to use them in
practical traffic engineering.
In this paper we discuss the application of Markovian models for traffic
engineering. The most evident advantage of this modeling approach with respect
to the above mentioned ones is that it is supported by a set of effective analysis
techniques called matrix geometric methods [34,35,27,29]. How Markovian models
fare with respect to the questions listed above is subject to discussion. By
their nature, Markovian models cannot capture non-exponential
asymptotic behavior, and hence they are not suitable for
that purpose. However, recent research results show that Markovian models are
able to approximate arbitrary non-Markovian behavior over an arbitrarily wide
range of scales.
The paper summarizes a traffic engineering procedure composed of the following
steps:
– statistical analysis of measured traffic data,
– Markovian approximation of traffic processes,
– analysis of performance parameters based on the Markovian model.
All steps of this procedure are supported by a number of numerical examples,
and the results are verified against simulation and alternative analysis methods.
The paper is organized as follows. Section 2 discusses some relevant characteristics
of traffic processes and describes models that exhibit these features.
Statistical tests for identifying these characteristics in datasets are described in
Section 3. A short introduction to Markovian models is given in Section 4. An overview
of the existing fitting methods with related application examples is given in
Section 5. The survey is concluded in Section 6.
2 Traffic Models and Their Properties
The traffic process at a given point of a telecommunication network is characterized
by the data packet arrival instants (or equivalently by the interarrival
times) and the associated data packet sizes. Either of these two processes can
be composed of dependent or independent samples. In the case of identically
distributed independent samples, the process modeling simplifies to capturing a
distribution, while in the case of dependent samples the whole stochastic process
(with its intrinsic dependency structure) has to be captured as well.
2.1 Heavy Tailed Distributions
One of the important new observations of the intensive traffic measurement of
high speed telecommunication networks is the presence of heavy tailed distributions.
Marginal distributions of specific traffic processes, file size distributions on
HTTP servers, etc., were found to be “heavy tailed”. The random variable $Y$,
with cumulative distribution function (cdf) $F_Y(x)$, is said to be heavy tailed if
$$1 - F_Y(x) = x^{-\alpha} L(x),$$
where $L(x)$ is slowly varying as $x \to \infty$, i.e., $\lim_{x\to\infty} L(ax)/L(x) = 1$ for $a > 0$.
(There are several different naming conventions applied in this field. Heavy
tailed distributions are also called regularly varying or power tail distributions.)
A typical member of this distribution class is the Pareto family.
There is an important qualitative property of the moments of heavy tailed
distributions. If $Y$ is heavy tailed with parameter $\alpha$, then its moments
$E(Y^n)$ are finite for $n < \alpha$ and all its higher moments are infinite.
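As a worked illustration (a standard calculation, not taken from the text), consider the Pareto distribution with shape $\alpha$ and an assumed scale parameter $x_m > 0$, for which $1 - F_Y(x) = (x_m/x)^{\alpha}$ for $x \ge x_m$, so that $L(x) = x_m^{\alpha}$ is constant. Its density is $\alpha x_m^{\alpha} x^{-\alpha-1}$, and therefore
$$E(Y^n) = \int_{x_m}^{\infty} x^n \, \alpha x_m^{\alpha} x^{-\alpha-1} \, dx = \alpha x_m^{\alpha} \int_{x_m}^{\infty} x^{n-\alpha-1} \, dx = \frac{\alpha \, x_m^{n}}{\alpha - n} \quad \text{for } n < \alpha,$$
while the integral diverges for $n \ge \alpha$. For example, with $\alpha = 1.5$ the mean is finite ($E(Y) = 3 x_m$) but the second moment, and hence the variance, is infinite.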
There are other classes of distributions whose tail decays more slowly than the
exponential. The random variable $Y$, with distribution $F_Y(x)$, is said to be long
tailed if
$$\lim_{x\to\infty} e^{\gamma x}\,(1 - F_Y(x)) = \infty, \quad \forall \gamma > 0.$$
The Weibull family ($F(x) = 1 - e^{-(x/a)^c}$) with $c < 1$ is long tailed, even though
all moments of Weibull distributed random variables are finite. The heavy
tailed distributions form a subclass of the long tailed class.
A characteristic property of the heavy tailed class is the asymptotic relation
between the distribution of the sum of $n$ samples, $S_n = Y_1 + \dots + Y_n$, and that
of the maximum of $n$ samples, $M_n = \max_{1 \le i \le n} Y_i$:
$$\Pr(S_n > x) \sim \Pr(M_n > x), \qquad (1)$$
where the notation $g(x) \sim f(x)$ denotes $\lim_{x\to\infty} f(x)/g(x) = 1$. In words, the sum
of heavy tailed random variables is dominated by a single large sample, and the
rest of the samples are negligibly small compared to the dominant one for large
values of $x$. The probability that $S_n$ is dominated by more than one “large”
sample, or that it is obtained as the sum of a number of small samples, is negligible for
“large” values of $S_n$. This interpretation gives an intuitive explanation for a set
of complex results about the waiting time of queueing models with heavy tailed
service time distribution [6].
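Relation (1) can be explored numerically with a small Monte Carlo sketch. The setup is an assumption chosen for illustration: Pareto samples with α = 1.5 as the heavy tailed case, exponential samples as the light tailed contrast, n = 10, and "large sum" taken to mean the largest 1% of the simulated sums. All function names are hypothetical.

```python
import random


def pareto_sample(rng, alpha, xm=1.0):
    """Inverse-transform sampling with P(Y > x) = (xm / x)**alpha for x >= xm."""
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids division by zero
    return xm / u ** (1.0 / alpha)


def mean_max_share_given_large_sum(sampler, n=10, trials=20000,
                                   top_fraction=0.01, seed=1):
    """Average of M_n / S_n over the trials whose sum S_n is among the largest."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        ys = [sampler(rng) for _ in range(n)]
        s = sum(ys)
        results.append((s, max(ys) / s))
    results.sort(key=lambda pair: pair[0], reverse=True)
    k = max(1, int(trials * top_fraction))
    return sum(share for _, share in results[:k]) / k


heavy = mean_max_share_given_large_sum(lambda rng: pareto_sample(rng, alpha=1.5))
light = mean_max_share_given_large_sum(lambda rng: rng.expovariate(1.0))
print(f"share of the largest sample, Pareto: {heavy:.2f}, exponential: {light:.2f}")
```

In the heavy tailed case the largest sample should account for most of a large sum, whereas for exponential samples the sum is spread over many samples of comparable size.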
2.2 Processes with Long Range Dependence

The definition of long range dependence of traffic arrival processes is as follows.
Let us divide the time axis into equidistant intervals of length $\Delta$. The number
of arrivals in the $i$th interval is denoted by $X_i$. $X = \{X_i, i = 0, 1, \dots\}$ is a
stochastic process whose aggregated process is defined as follows:
$$X^{(m)} = \{X_i^{(m)}\} = \left\{ \frac{X_1 + \dots + X_m}{m}, \dots, \frac{X_{km+1} + \dots + X_{(k+1)m}}{m}, \dots \right\}.$$
The autocorrelation function of $X^{(m)}$ is:
$$r^{(m)}(k) = \frac{E\{(X_n^{(m)} - E(X^{(m)})) \cdot (X_{n+k}^{(m)} - E(X^{(m)}))\}}{E\{(X_n^{(m)} - E(X^{(m)}))^2\}}.$$
The process $X$ exhibits long-range dependence (LRD) of index $\beta$ if its autocorrelation
function can be written as
$$r(k) \sim A(k)\,k^{-\beta}, \quad k \to \infty,$$
where $A(k)$ is a slowly varying function.
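The two definitions above translate directly into sample estimators. The following sketch uses plain Python lists; the function names and the sample counts are hypothetical, and the autocorrelation is the usual biased sample estimate rather than the exact expectation.

```python
def aggregate(x, m):
    """Return X^(m): averages over consecutive, non-overlapping blocks of length m."""
    n_blocks = len(x) // m          # an incomplete trailing block is dropped
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def autocorrelation(x, k):
    """Sample estimate of r(k) = E[(X_n - mu)(X_{n+k} - mu)] / E[(X_n - mu)^2]."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    cov = sum((x[i] - mu) * (x[i + k] - mu) for i in range(n - k)) / n
    return cov / var


counts = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical arrivals per interval
x2 = aggregate(counts, 2)           # [2.0, 2.5, 7.0, 4.0]
r1 = autocorrelation(counts, 1)     # lag-1 autocorrelation of the original series
```

Applying autocorrelation to aggregate(x, m) for increasing m gives an empirical view of how r(m)(k) behaves under aggregation.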
Self-similar processes. Using the above definition of the aggregated process,
$X$ is
a) exactly self-similar if $X \stackrel{d}{=} m^{1-H} X^{(m)}$, i.e., if $X$ and $X^{(m)}$ are identical within
a scale factor in the sense of finite dimensional distributions;
b) exactly second-order self-similar if $r^{(m)}(k) = r(k)$, $\forall m, k \ge 0$;
c) asymptotically second-order self-similar if $r^{(m)}(k) \to r(k)$ as $k, m \to \infty$;
where $H$ is the Hurst parameter, also referred to as the self-similarity parameter.
For exactly self-similar processes the scaling behavior, which is characterized
by the Hurst parameter (H), can be checked based on any of the absolute
moments of the aggregated process:

$$\log(E(|X^{(m)}|^q)) = \log(E(|m^{H-1} X|^q)) = q(H - 1)\log(m) + \log(E(|X|^q)). \qquad (2)$$
According to (2), in case of a self-similar process, plotting $\log(E(|X^{(m)}|^q))$
against log(m) for fixed q results in a straight line. The slope of the line is
q(H − 1). Based on the above observations the test is performed as follows.
Having a series of length N, the moments may be estimated as

$$E(|X^{(m)}|^q) = \frac{1}{\lfloor N/m \rfloor} \sum_{i=1}^{\lfloor N/m \rfloor} |X_i^{(m)}|^q,$$

where $\lfloor x \rfloor$ denotes the largest integer smaller than or equal to x. To test
for self-similarity, $\log(E(|X^{(m)}|^q))$ is plotted against log(m) and a straight line is
fitted to the curve. If the straight line shows good correspondence with the curve,
then the process is self-similar and its Hurst-parameter may be calculated from the
slope of the straight line. This approach assumes that the scaling behavior of all
absolute moments (all values of q) is the same and that it is captured by the Hurst-parameter.
If this is the case we talk about mono-fractal behavior. The variance-time plot,
which is widely used to gain evidence of self-similarity, is the special case
with q = 2. It depicts the behavior of the 2nd moments for the centered data.
It is worth pointing out that self-similarity and stationarity imply that either
E(X) = 0, or E(X) = ±∞, or H = 1. But H = 1 implies as well that $X_i = X_j$,
∀i, j almost surely. As a consequence, testing for statistical self-similarity
makes sense only for zero-mean data, i.e., the data has to be centered before
the analysis.
Multi-fractal processes. Statistical tests of self-similarity try to gain evidence
through examining the behavior of the absolute moments $E(|X^{(m)}|^q)$. In
case of monofractal processes the scaling behavior of all absolute moments is
characterized by a single number, the Hurst parameter. Multifractal processes
might exhibit different scaling for different absolute moments. Multifractal analysis
looks at the behavior of $E(|X^{(m)}|^q)$ for different values of q and results in a
spectrum that illustrates the behavior of the absolute moments. This analysis
procedure is detailed in Section 3.3.
Fractional Gaussian noise. So far we have provided the definition of the large
class of self-similar stochastic processes, but we have not provided any specific
member of this class. The two simplest self-similar processes that are often used
in validation of self-similar modeling assumptions are the fractional Gaussian
noise and the ARIMA process.
Fractional Gaussian noise, $X_i$, i ≥ 1, is the increment process of fractional
Brownian motion, B(t), $t \in R^+$:

$$X_i = B(i + 1) - B(i).$$

Fractional Brownian motion with Hurst parameter H (0.5 < H < 1) is
characterized by the following properties: i) B(t) has stationary increments,
ii) E(B(t)) = 0, iii) $E(B^2(t)) = t^{2H}$ (assuming the time unit is such that
$E(B^2(1)) = 1$), iv) B(t) has continuous paths, v) B(t) is a Gaussian process, i.e.,
all of its finite dimensional distributions are Gaussian. The covariance of fractional
Brownian motion is $E(B(t) \cdot B(s)) = \frac{1}{2}(s^{2H} + t^{2H} - |s - t|^{2H})$, and hence,
the auto-covariance function of fractional Gaussian noise, $\gamma(h) = E(X_i X_{i+h}) \sim
H(2H - 1)h^{2H-2}$, is positive and exhibits long-range dependence.
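The asymptotic form of the auto-covariance can be checked numerically. Substituting the fBm covariance into $E(X_i X_{i+h})$ gives the exact expression $\gamma(h) = \frac{1}{2}(|h+1|^{2H} - 2|h|^{2H} + |h-1|^{2H})$ for unit-variance fractional Gaussian noise. The following is a minimal sketch comparing it with the asymptotic form; the function names are illustrative and not part of the original text.

```python
def fgn_autocov(h, H):
    """Exact auto-covariance of unit-variance fractional Gaussian noise at lag h."""
    h = abs(h)
    return 0.5 * ((h + 1) ** (2 * H) - 2 * h ** (2 * H) + abs(h - 1) ** (2 * H))


def fgn_autocov_asymptotic(h, H):
    """Large-lag approximation H(2H-1) h^(2H-2) quoted in the text."""
    return H * (2 * H - 1) * h ** (2 * H - 2)


if __name__ == "__main__":
    H = 0.8  # illustrative Hurst parameter
    for lag in (1, 10, 100, 1000):
        exact = fgn_autocov(lag, H)
        approx = fgn_autocov_asymptotic(lag, H)
        # The ratio approaches 1 as the lag grows.
        print(lag, exact, approx, exact / approx)
```

For H = 0.5 the exact expression vanishes at every non-zero lag, which corresponds to uncorrelated increments.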
ARIMA process. Another simple self-similar process is the fractional
ARIMA(0,d,0) process. It is defined as:

$$X_i = \sum_{j=0}^{\infty} c_j \epsilon_{i-j}$$

where $\epsilon_i$ are i.i.d. standard normal random variables and the $c_j$ coefficients
implement moving average with parameter d according to $c_j = \frac{\Gamma(j+d)}{\Gamma(d)\Gamma(j+1)}$. For
large values of j the coefficients behave as $c_j \sim \frac{j^{d-1}}{\Gamma(d)}$. The asymptotic behavior of the
auto-covariance function is

$$\gamma(h) = E(X_i X_{i+h}) \sim C_d h^{2d-1}$$

with coefficient $C_d = \pi^{-1} \Gamma(1 - 2d) \sin(\pi d)$. For 0 < d < 1/2 the auto-covariance
function has the same polynomial decay as the auto-covariance function of fractional
Gaussian noise with H = d + 1/2.
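Because the fractional ARIMA(0,d,0) process is given by an explicit moving average, approximate samples can be generated by truncating the infinite sum. The sketch below is one possible way to do this; the truncation length `num_terms`, the seed handling and the function names are assumptions made for illustration. The coefficients are computed with the recursion $c_j = c_{j-1}(j - 1 + d)/j$, which follows from the Gamma-function expression.

```python
import math
import random


def farima_coefficients(d, num_terms):
    """Weights c_j = Gamma(j+d) / (Gamma(d) Gamma(j+1)) for j = 0..num_terms-1.

    The recursion c_j = c_{j-1} (j - 1 + d) / j avoids evaluating large Gamma values.
    """
    coeffs = [1.0]  # c_0 = 1
    for j in range(1, num_terms):
        coeffs.append(coeffs[-1] * (j - 1 + d) / j)
    return coeffs


def farima_sample(d, length, num_terms=1000, seed=0):
    """Approximate ARIMA(0,d,0) samples from a moving average truncated at num_terms."""
    rng = random.Random(seed)
    coeffs = farima_coefficients(d, num_terms)
    noise = [rng.gauss(0.0, 1.0) for _ in range(length + num_terms - 1)]
    # X_i = sum_j c_j * eps_{i-j}; indices are shifted so every eps index is valid.
    return [
        sum(c * noise[i + num_terms - 1 - j] for j, c in enumerate(coeffs))
        for i in range(length)
    ]


if __name__ == "__main__":
    c = farima_coefficients(0.3, 501)
    # Compare c_500 with the asymptotic form j^(d-1) / Gamma(d).
    print(c[500], 500 ** (0.3 - 1) / math.gamma(0.3))
    print(farima_sample(0.3, 5, num_terms=200))
```

The truncation discards the slowly decaying tail of the weights, so the generated series only approximates the long-range dependence up to lags comparable to `num_terms`.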
The better choice among these two processes depends on the applied anal-
ysis method. The fractional Gaussian noise is better in exhibiting asymptotic
properties based on finite number of samples, while the generation of fractional
ARIMA process samples is easier since it is based on an explicit expression.
3 Statistical Analysis of Measured Traffic Datasets

3.1 Estimation of the Heavy Tail Index
In this section we discuss methods for identifying the heavy tail index of datasets.
Application of the methods is illustrated on the dataset EPA HTTP which can be
downloaded from [52] and contains a day of HTTP logs with about 40000 entries.
The experimental complementary cumulative distribution function (ccdf) of the
length of the requests is depicted in Figure 1.
Fig. 1. Experimental ccdf of the length of requests arriving to the server (x-axis: length in 100 bytes; y-axis: ccdf)
Fig. 2. The Hill- and the dynamic qq-plot for the EPA trace (x-axis: k; y-axis: estimate; curves: Hill estimator, qq estimator)
Hill estimator. A possible approach to estimate the index of the tail behavior
α is the Hill estimator [20]. This estimator provides the index as a function of
the k largest elements of the dataset and is defined as

$$\alpha_{n,k} = \left( \frac{1}{k} \sum_{i=0}^{k-1} \left( \log X_{(n-i)} - \log X_{(n-k)} \right) \right)^{-1} \qquad (3)$$

where $X_{(1)} \le \dots \le X_{(n)}$ denotes the order statistics of the dataset. In practice,
the estimator given in (3) is plotted against k and if the plot stabilizes to a
constant value this provides an estimate of the index. The Hill-plot (together
with the dynamic qq-plot that will be described later) for the EPA trace is
depicted in Figure 2.
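The Hill estimator and the points of a Hill plot can be sketched in a few lines. The code below is a minimal illustration, not the tool used for the EPA trace; the function names and the synthetic Pareto test data are assumptions made for the example.

```python
import math
import random


def hill_estimate(data, k):
    """Hill estimate of the tail index from the k largest of n positive samples."""
    x = sorted(data)  # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    threshold = math.log(x[n - k - 1])  # log X_(n-k) in 1-based notation
    mean_excess = sum(math.log(x[n - 1 - i]) - threshold for i in range(k)) / k
    return 1.0 / mean_excess


def hill_plot_points(data, ks):
    """(k, estimate) pairs; plotting them against k gives the Hill plot."""
    return [(k, hill_estimate(data, k)) for k in ks]


if __name__ == "__main__":
    rng = random.Random(1)
    alpha = 1.5  # illustrative tail index
    # Synthetic Pareto samples with Pr(X > x) = x^(-alpha), x >= 1 (inverse transform).
    sample = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(20000)]
    for k, est in hill_plot_points(sample, (100, 500, 2000, 5000)):
        print(k, round(est, 3))
```

For exactly Pareto data the estimates stay close to the true index for a wide range of k; for measured data the plot typically stabilizes only over a limited range, if at all.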
The idea behind the procedure and theoretical properties of the estimator
are discussed in [39]. Applicability of the Hill estimator is reduced by the fact
that
– its properties (e.g. confidence intervals) are known to hold only under con-
ditions that often cannot be validated in practice [39],
– the point at which the power-law tail begins must be determined and this
can be difficult because often the datasets do not show a clear border between
the power-law tail and the non-power-law body of the distributions.
By slight modifications in the way the Hill plot is displayed, the uncertainty
of the estimation procedure can be somewhat reduced, see [39,40].
Quantile-quantile regression plot. The above described Hill estimator per-

forms well if the underlying distribution is close to Pareto. With the quantile-
quantile plot (qq-plot), which is a visual tool for assessing the presence of heavy
tails in distributions, one can check this. The qq-plot is commonly used in various
forms, see for example [8,41]. Hereinafter, among the various forms, we follow
the one presented in [25].
Having the order statistics $X_{(1)} \le \dots \le X_{(n)}$, plot

$$\left\{ \left( -\log\left(1 - \frac{j}{k+1}\right), \; \log X_{(n-k+j)} \right), \quad 1 \le j \le k \right\} \qquad (4)$$

for a fixed value of k. (As one can see, only the k upper order statistics are
considered in the plot; the other part of the sample is neglected.) The plot, if the
data is close to Pareto, should be a straight line with slope 1/α. By determining
the slope of the straight line fitted to the points by least squares, we obtain the
so-called qq-estimator [25].
The qq-estimator can be visualized in two different ways. The dynamic qq-
plot, depicted in Figure 2, plots the estimate of α as the function of k (this plot
is similar to the Hill-plot). The static qq-plot, given in Figure 3, depicts (4) for a
fixed value of k and shows its least square fit. As for the Hill-plot, when applying
the qq-estimator, the point at which the tail begins has to be determined.
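The qq-estimator can be sketched as follows. The sketch assumes that the k largest observations are paired with the exponential quantiles $-\log(1 - j/(k+1))$, j = 1, ..., k, in increasing order; this index convention, the function names and the synthetic data are assumptions of the example.

```python
import math
import random


def qq_points(data, k):
    """Points of the static qq-plot built from the k upper order statistics."""
    x = sorted(data)
    n = len(x)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    return [
        (-math.log(1.0 - j / (k + 1.0)), math.log(x[n - k + j - 1]))
        for j in range(1, k + 1)
    ]


def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points."""
    m = len(points)
    mean_x = sum(p[0] for p in points) / m
    mean_y = sum(p[1] for p in points) / m
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    return sxy / sxx


def qq_estimate(data, k):
    """qq-estimate of the tail index: reciprocal of the fitted slope 1/alpha."""
    return 1.0 / least_squares_slope(qq_points(data, k))


if __name__ == "__main__":
    rng = random.Random(2)
    alpha = 1.5  # illustrative tail index
    sample = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(20000)]
    # Evaluating the estimate for several k gives the dynamic qq-plot.
    for k in (200, 1000, 4000):
        print(k, round(qq_estimate(sample, k), 3))
```

Calling `qq_points` for one fixed k yields the static qq-plot, while evaluating `qq_estimate` over a range of k yields the dynamic one.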
Fig. 3. Static qq-plot for the EPA trace (x-axis: quantiles of exponential; y-axis: log sorted data; curves: qq plot, least square fit)
Fig. 4. Complementary distribution function for different aggregation levels for the EPA trace (x-axis: Log10(size - mean); y-axis: Log10(P[X > x]); curves: raw data, 2- to 128-aggregated data, power law fraction)
Estimation based on the scaling properties. Another method is proposed

in [9] which, in contrast to the Hill- and qq-estimator, does not require determining
where the tail begins. The procedure is based on the scaling properties of
sums of heavy tailed distributions. The estimator, which is implemented in the
tool aest, determines the heavy tail index by exploring the complementary distri-
bution function of the dataset at different aggregation levels. For the EPA trace,
the index estimated by aest is 0.97. In order to aid further investigation, the tool
produces a plot of the complementary distribution function of the dataset at
different aggregation levels indicating the segments where heavy tailed behavior
is present. This plot for the considered dataset is depicted in Figure 4.
3.2 Tests for Long Range Dependency
Recently, it has been agreed [28,36,37] that when one studies the long-range
dependence of a traffic trace the most significant parameter to be estimated is
the degree of self-similarity, usually given by the so-called Hurst-parameter. The
aim of the statistical approach, based on the theory of self-similarity, is to find
the Hurst-parameter.
In this section methods for estimating the long-range dependence of datasets
are recalled. Besides the procedures described here, several others can be found in
the literature. See [3] for an exhaustive discussion on this subject.
It is important to note that the introduced statistical tests of self-similarity,
based on a finite number of samples, provide an approximate value of H only
for the considered range of scales. Nothing can be said about the higher scales
and the asymptotic behavior based on these tests.
Throughout the section, we illustrate the application of the estimators on
the first trace of the well-known Bellcore dataset that contains local-area
network (LAN) traffic collected in 1989 on an Ethernet at the Bellcore Morristown
Research and Engineering facility. It may be downloaded from the WEB
site collecting traffic traces [52]. The trace was first analyzed in [16].
Variance-time plot. One of the tests for pseudo self-similarity is the variance-time
plot. It is based on the fact that for self-similar time series $\{X_1, X_2, \dots\}$

$$Var(X^{(m)}) \sim m^{-\beta}, \quad \text{as } m \to \infty, \quad 0 < \beta < 1.$$

The variance-time plot depicts $\log(Var(X^{(m)}))$ versus log(m). For pseudo self-similar
time series, the slope of the variance-time plot, −β, is greater than −1.
The Hurst parameter can be calculated as H = 1 − (β/2). A traffic process is
said to be pseudo self-similar when the empirical Hurst parameter is between 0.5
and 1.
The variance-time plot for the analyzed Bellcore trace is depicted in Figure
5. The Hurst-parameter given by the variance-time plot is 0.83.
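The variance-time estimate of H can be sketched as below. The aggregation uses block means over non-overlapping blocks; the choice of aggregation levels, the function names and the white-noise test series are assumptions of this example and not the procedure used for the Bellcore trace.

```python
import math
import random


def aggregate(series, m):
    """Block means X^(m) over non-overlapping blocks of length m."""
    blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(blocks)]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def hurst_variance_time(series, levels):
    """Estimate H = 1 - beta/2, where -beta is the slope of log Var(X^(m)) vs log m."""
    log_m = [math.log10(m) for m in levels]
    log_var = [math.log10(variance(aggregate(series, m))) for m in levels]
    beta = -fit_slope(log_m, log_var)
    return 1.0 - beta / 2.0


if __name__ == "__main__":
    rng = random.Random(3)
    noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    # Independent samples have beta close to 1, hence H close to 0.5.
    print(hurst_variance_time(noise, [1, 2, 4, 8, 16, 32, 64]))
```

The largest aggregation level should leave enough blocks for a meaningful variance estimate; otherwise the upper end of the plot becomes noisy and distorts the fitted slope.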
Fig. 5. Variance-time plot and its least square fit for the Bellcore trace (x-axis: aggregation level; y-axis: variance)
Fig. 6. R/S plot and its least square fit for the Bellcore trace (x-axis: n; y-axis: log10(R/S(n)))
R/S plot. The R/S method is one of the oldest tests for self-similarity; it is
discussed in detail in [31]. For an interarrival time series, $Z = \{Z_i, i \ge 1\}$, with
partial sum $Y_n = \sum_{i=1}^{n} Z_i$ and sample variance

$$S^2(n) = \frac{1}{n} \sum_{i=1}^{n} Z_i^2 - \frac{1}{n^2} \cdot Y_n^2,$$

the R/S statistic, or the rescaled adjusted range, is given by:

$$R/S(n) = \frac{1}{S(n)} \left[ \max_{0 \le k \le n} \left( Y(k) - \frac{k}{n} Y(n) \right) - \min_{0 \le k \le n} \left( Y(k) - \frac{k}{n} Y(n) \right) \right],$$

where $Y(k) = Y_k$ and $Y(0) = 0$.
R/S(n) is the scaled difference between the fastest and the slowest arrival period
considering n arrivals. For stationary LRD processes $R/S(n) \approx (n/2)^H$. To
determine the Hurst parameter based on the R/S statistic the dataset is divided
into blocks, log[R/S(n)] is plotted versus log n and a straight line is fitted to
the points. The slope of the fitted line is the estimated Hurst parameter.
The R/S plot for the analyzed Bellcore trace is depicted in Figure 6. The
Hurst-parameter determined based on the R/S plot is 0.78.
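The R/S statistic and the slope fit can be sketched as follows. How the dataset is divided into blocks is not detailed in the text, so the sketch assumes disjoint blocks of each size n and averages R/S(n) over them; this averaging, the block sizes and the function names are assumptions of the example.

```python
import math
import random


def rs_statistic(z):
    """Rescaled adjusted range R/S(n) of the block z = [Z_1, ..., Z_n]."""
    n = len(z)
    partial = [0.0]  # Y(0) = 0, Y(k) = Z_1 + ... + Z_k
    for value in z:
        partial.append(partial[-1] + value)
    total = partial[n]
    s2 = sum(v * v for v in z) / n - (total / n) ** 2  # sample variance S^2(n)
    deviations = [partial[k] - k * total / n for k in range(n + 1)]
    return (max(deviations) - min(deviations)) / math.sqrt(s2)


def hurst_rs(series, block_sizes):
    """Slope of log R/S(n) versus log n, with R/S averaged over disjoint blocks."""
    xs, ys = [], []
    for n in block_sizes:
        blocks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        mean_rs = sum(rs_statistic(b) for b in blocks) / len(blocks)
        xs.append(math.log10(n))
        ys.append(math.log10(mean_rs))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(4)
    noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
    # For independent samples the estimate is expected to lie near 0.5.
    print(hurst_rs(noise, [16, 32, 64, 128, 256, 512]))
```

A block with zero sample variance (all values equal) makes the statistic undefined, so such blocks would have to be skipped in practice.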
Whittle estimator. The Whittle estimator is based on the maximum likeli-

hood principle assuming that the process under analysis is Gaussian. The esti-
mator, unlike the previous ones, provides the estimate through a non-graphical
method. This estimation takes more time to perform but it has the advantage
of providing confidence intervals as well. For details see [17,3]. For the Bellcore
trace, the estimated value of the Hurst parameter is 0.82 and its 95% confidence
interval is [0.79, 0.84].
3.3 Multifractal Framework

In this section we introduce two techniques to analyze multifractal processes.
Legendre spectrum. Considering a continuous-time process $Y = \{Y(t), t > 0\}$,
the scaling of the absolute moments of the increments is observed through the
partition function

$$T(q) = \lim_{n \to \infty} \frac{1}{-n} \log_2 E\left[ \sum_{k=0}^{2^n - 1} |Y((k + 1)2^{-n}) - Y(k 2^{-n})|^q \right]. \qquad (5)$$

Then, a multifractal spectrum, the so-called Legendre spectrum, is given as the
Legendre transform of (5):

$$f_L(\alpha) = T^*(\alpha) = \inf_q (q\alpha - T(q)).$$

Since T(q) is always concave, the Legendre spectrum $f_L(\alpha)$ may be found by
simple calculations using that

$$T^*(\alpha) = q\alpha - T(q), \quad \text{and} \quad (T^*)'(\alpha) = q \quad \text{at} \quad \alpha = T'(q). \qquad (6)$$
Let us mention here that there are also other kinds of fractal spectrum defined
in the fractal world (see for example [42]). The Legendre spectrum is the most
attractive one from numerical point of view, and even though in some cases it
is less informative than, for example, the large deviation spectrum, it provides
enough information in the cases considered herein.
In case of a discrete-time process X we assume that we are given the increments
of a continuous-time process. This way, assuming that the sequence we
examine consists of $N = 2^L$ numbers, the sum in (5) becomes

$$S_n(q) = \sum_{k=0}^{N/2^n - 1} |X_k^{(2^n)}|^q, \qquad 0 \le n \le L, \qquad (7)$$

where the expectation is ignored. Ignoring the expectation is accurate for small n,
i.e., for the finer resolution levels. In order to estimate T(q), we plot $\log_2(S_n(q))$
against (L − n), n = 0, 1, ..., L, then T(q) is found by the slope of the straight
line fitted to the curve. If the straight line shows good correspondence with the
curve, i.e., if $\log_2(S_n(q))$ scales linearly with n, then the sequence X can be
considered a multifractal process.

Fig. 7. Scaling of log-moments with linear fits for the interarrival times of the Bellcore pAug trace (curves: q = −3, ..., 4)
Fig. 8. Increments of log-moments for the interarrival times of the Bellcore pAug trace (curves: q = −3, ..., 4)
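The estimation of the partition function and of the Legendre spectrum can be sketched numerically. The sketch rests on several assumptions that are not stated in the text: the level-n quantities $X_k^{(2^n)}$ are taken to be sums over blocks of $2^n$ consecutive samples, all samples are positive (as interarrival times are), the slope is fitted with respect to n, which is the sign convention that yields T(q) = q − 1 for a constant sequence in line with (5), and the derivative in (6) is approximated by central differences. All function names are illustrative.

```python
import math


def log_moments(x, q):
    """log2 S_n(q) for n = 0..L, where level n holds sums of 2^n consecutive samples."""
    size = len(x)
    L = size.bit_length() - 1
    if size == 0 or size != 1 << L:
        raise ValueError("sequence length must be a power of two")
    level = list(x)
    logs = []
    for _ in range(L + 1):
        logs.append(math.log2(sum(abs(v) ** q for v in level)))
        level = [level[2 * k] + level[2 * k + 1] for k in range(len(level) // 2)]
    return logs


def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def partition_function(x, qs, n_min, n_max):
    """T(q) for each q, fitted over the levels n_min <= n <= n_max."""
    levels = list(range(n_min, n_max + 1))
    result = []
    for q in qs:
        logs = log_moments(x, q)
        result.append(fit_slope(levels, [logs[n] for n in levels]))
    return result


def legendre_spectrum(qs, ts):
    """(alpha, f_L(alpha)) pairs with alpha = T'(q) from central differences."""
    points = []
    for i in range(1, len(qs) - 1):
        alpha = (ts[i + 1] - ts[i - 1]) / (qs[i + 1] - qs[i - 1])
        points.append((alpha, qs[i] * alpha - ts[i]))
    return points


if __name__ == "__main__":
    data = [1.0] * 64  # constant sequence: trivial, single-point spectrum
    qs = [-2, -1, 0, 1, 2, 3]
    ts = partition_function(data, qs, 0, 5)
    print(ts)
    print(legendre_spectrum(qs, ts))
```

For the constant sequence the partition function is a straight line and the spectrum collapses to one point, which mirrors the trivial spectrum reported for fractional Brownian motion.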
Figures 7, 8, 9 and 10 illustrate the above described procedure to obtain the
Legendre spectrum of the famous Bellcore pAug traffic trace (the trace may be
found at [52]). Figure 7 depicts the scaling behavior of the log moments calcu-
lated through (7). With q in the range [−3, 4], excluding the finest resolution
levels n = 0, 1 the moments show good linear scaling. For values of q outside the
range [−3, 4] the curves deviate more and more from linearity. As, for example,
in [43] one may look at non-integer values of q as well, but, in general, it does not
provide notably more information on the process. To better visualize the devia-
tion from linearity Figure 8 depicts the increments of the log-moment curves of
Figure 7. Completely horizontal lines would represent linear log-moment curves.
The partition function T(q) is depicted in Figure 9. The three slightly different
curves differ only in the considered range of the log-moments curves, since
different ranges result in different linear fitting. The lower bound of the linear
fitting is set to 3, 5 and 7, while the upper bound is 18 in each case. (In the
rest of this paper the fitting range is 5 - 18 and there are 100 moments evaluated
in the range q ∈ [−5, +5].) Since the partition function varies only a little (its
derivative is in the range [0.8, 1.15]), it is not as informative as its Legendre
transform (Figure 10). According to (6) the Legendre spectrum is as wide
as the range of derivatives of the partition function, i.e., the more the
partition function deviates from linearity the wider the Legendre spectrum is.
The Legendre transform significantly amplifies the scaling information, but it is
also sensitive to the considered range of the log-moments curves.
See [43] for basic principles of interpreting the spectrum. We mention here
only that a curve like the one depicted in Figure 10 reveals a rich multifractal
spectrum. On the contrary, as it was shown in [51], the fractional Brownian
motion (fBm) has a trivial spectrum. The partition function of the fBm is a
straight line which indicates that its spectrum consists of one point, i.e., the
behavior of its log-moments is identical for any q.
Fig. 9. Partition function estimated through the linear fits shown in Figure 7 (x-axis: q; y-axis: T(q); curves: lower fitting bounds 3, 5 and 7)
Fig. 10. The Legendre transform of the partition function (Figure 9) results in the Legendre spectrum
Haar wavelet. Another way to carry out multiscale analysis is the Haar wavelet
transform. The choice of using the unnormalized version of the Haar wavelet
transform is motivated by the fact that it better suits the analysis of the Markovian
point process introduced further on.
The multiscale behavior of the finite sequence $X_i$, $1 \le i \le 2^L$, will be represented
by the quantities $c_{j,k}$, $d_{j,k}$, j = 0, ..., L and k = 1, ..., $2^L/2^j$. The finest
resolution is described by $c_{0,k}$, $1 \le k \le 2^L$, which gives the finite sequence itself,
i.e., $c_{0,k} = X_k$. Then the multiscale analysis based on the unnormalized Haar
wavelet transform is carried out by iterating

$$c_{j,k} = c_{j-1,2k-1} + c_{j-1,2k}, \qquad (8)$$
$$d_{j,k} = c_{j-1,2k-1} - c_{j-1,2k}, \qquad (9)$$

for j = 1, ..., L and k = 1, ..., $2^L/2^j$. The quantities $c_{j,k}$, $d_{j,k}$ are the so-called
scaling and wavelet coefficients of the sequence, respectively, at scale j and position
k. At each scale the coefficients are represented by the vectors $c_j = [c_{j,k}]$
and $d_j = [d_{j,k}]$ with k = 1, ..., $2^L/2^j$. As far as $c_j$ is concerned, the higher j the
lower the resolution level at which we have information on the sequence. The
information that we lose as a result of the step from $c_{j-1}$ to $c_j$ is conveyed by
the sequence of wavelet coefficients $d_j$. It is easy to see that $c_{j-1}$ can be perfectly
reconstructed from $c_j$ and $d_j$. As a consequence the whole $X_i$, $1 \le i \le 2^L$, sequence
can be constructed (in a top to bottom manner) based on a normalizing
constant, $c_L = c_{L,1} = \sum_{i=1}^{2^L} X_i$, and the $d_j$, j = 1, ..., L vectors.
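The iteration (8)-(9) and the top-to-bottom reconstruction can be sketched directly. The function names below are illustrative; the synthesis step uses $c_{j-1,2k-1} = (c_{j,k} + d_{j,k})/2$ and $c_{j-1,2k} = (c_{j,k} - d_{j,k})/2$, which follow from adding and subtracting (8) and (9).

```python
def haar_analysis(x):
    """Unnormalized Haar transform: returns (c_L, [d_1, ..., d_L]) for len(x) = 2^L."""
    size = len(x)
    if size == 0 or size & (size - 1):
        raise ValueError("sequence length must be a power of two")
    c = list(x)  # c_0 is the sequence itself
    details = []
    while len(c) > 1:
        pairs = [(c[2 * k], c[2 * k + 1]) for k in range(len(c) // 2)]
        details.append([a - b for a, b in pairs])  # wavelet coefficients d_j, Eq. (9)
        c = [a + b for a, b in pairs]              # scaling coefficients c_j, Eq. (8)
    return c[0], details


def haar_synthesis(c_top, details):
    """Rebuild the sequence top-down from c_L and the vectors d_L, ..., d_1."""
    c = [c_top]
    for d in reversed(details):
        rebuilt = []
        for total, diff in zip(c, d):
            rebuilt.append((total + diff) / 2)  # c_{j-1,2k-1}
            rebuilt.append((total - diff) / 2)  # c_{j-1,2k}
        c = rebuilt
    return c


if __name__ == "__main__":
    c_top, details = haar_analysis([4, 2, 5, 7])
    print(c_top, details)
    print(haar_synthesis(c_top, details))
```

The normalizing constant returned by `haar_analysis` is the sum of the whole sequence, as stated in the text.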
Taking the expectation of the square of (8) and (9) gives

$$E[c_{j,k}^2] = E[c_{j-1,2k-1}^2] + 2E[c_{j-1,2k-1} c_{j-1,2k}] + E[c_{j-1,2k}^2], \qquad (10)$$
$$E[d_{j,k}^2] = E[c_{j-1,2k-1}^2] - 2E[c_{j-1,2k-1} c_{j-1,2k}] + E[c_{j-1,2k}^2]. \qquad (11)$$

Let us assume that the series we analyze are stationary; then, by summing (10)
and (11) and rearranging the equation, we have

$$E[c_{j-1}^2] = \frac{1}{4} \left( E[d_j^2] + E[c_j^2] \right). \qquad (12)$$
Fig. 11. Haar wavelet transform
Similarly, by consecutive application of (12) from one scale to another, the
$E[d_j^2]$, j = 1, ..., L series completely characterizes the variance decay of the
$X_i$, $1 \le i \le 2^L$, sequence apart from a normalizing constant ($c_L = c_{L,1} = \sum_{i=1}^{2^L} X_i$).
This fact allows us to realize a series with a given variance decay if it is possible
to control the 2nd moment of the scaling coefficient with the chosen synthesis
procedure. In Section 5 we will briefly discuss a method that attempts to capture
the multifractal scaling behavior via the series $E[d_j^2]$, j = 1, ..., L.
4 Markovian Modeling Tools

Markovian modeling tools are stochastic processes whose stochastic behavior
depends only on the state of a “background” Markov chain. The research and
application of these modeling tools over the last 20 years has resulted in a widely
accepted standard notation. The two classes of Markovian processes considered
in this paper are Phase type distributions and Markovian arrival processes. Here,
we concentrate our attention mainly on continuous time Markovian models, but
it is also possible to apply Markovian models in discrete time [33,5,27].
4.1 Phase Type Distribution

Let Z(t) be a continuous time Markov chain with n transient states and one absorbing
state. Its initial probability distribution is α̂ and its generator matrix is B̂. The time
to reach the absorbing state, T, is phase type distributed with representation (α, B),
where α is the sub-vector of α̂ and B is the sub-matrix of B̂ associated with
the transient states. The cumulative distribution function (cdf), the probability
density function (pdf), and the moments of this distribution are:

$$F_T(t) = 1 - \alpha e^{Bt} h, \qquad f_T(t) = -\alpha B e^{Bt} h, \qquad E[T^i] = i! \, \alpha (-B)^{-i} h,$$

where h is the column vector of ones. The number of unknowns in the (α, B)
representation of a PH distribution is $O(n^2)$.
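The cdf of a PH distribution can be evaluated without a general matrix exponential routine. The sketch below uses uniformization, a standard technique that is swapped in here for illustration and is not described in the text: with rate $\rho = \max_i(-B_{ii})$ and $P = I + B/\rho$, the survival function is $\alpha e^{Bt} h = \sum_k e^{-\rho t} (\rho t)^k / k! \; \alpha P^k h$. The function name, the tolerance and the Erlang-2 example are assumptions of the sketch.

```python
import math


def ph_cdf(alpha, B, t, tol=1e-12):
    """F_T(t) = 1 - alpha exp(Bt) h for a PH distribution, via uniformization.

    Intended for moderate values of rate * t; exp(-rate * t) underflows for very large ones.
    """
    n = len(alpha)
    rate = max(-B[i][i] for i in range(n))  # uniformization rate
    # P = I + B / rate is a substochastic matrix.
    P = [[(1.0 if i == j else 0.0) + B[i][j] / rate for j in range(n)] for i in range(n)]
    vec = list(alpha)             # alpha P^k, starting with k = 0
    weight = math.exp(-rate * t)  # Poisson probability of k jumps, starting with k = 0
    survival, mass, k = 0.0, 0.0, 0
    while mass < 1.0 - tol and k < 100000:
        survival += weight * sum(vec)
        mass += weight
        k += 1
        weight *= rate * t / k
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
    return 1.0 - survival


if __name__ == "__main__":
    # Erlang-2 distribution with rate 2, written as a PH distribution.
    alpha = [1.0, 0.0]
    B = [[-2.0, 2.0], [0.0, -2.0]]
    for t in (0.5, 1.0, 2.0):
        closed_form = 1.0 - math.exp(-2.0 * t) * (1.0 + 2.0 * t)
        print(t, ph_cdf(alpha, B, t), closed_form)
```

The Erlang-2 case is convenient for checking because its cdf, $1 - e^{-\lambda t}(1 + \lambda t)$, is known in closed form.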
Fig. 12. Canonical form for Acyclic continuous-time PH distributions
When Z(t) is an acyclic Markov chain the associated PH distribution is
referred to as Acyclic PH (APH) distribution. The popularity of APH distributions
(especially in PH fitting) lies in the fact that all APH distributions can
be uniquely transformed into a canonical form (Figure 12) which has only O(n)
parameters [10], and the flexibility of the PH and the APH class of the same
order is very close. E.g., the 2nd order PH and APH classes exhibit the same
moment bounds [50].
4.2 Markovian Arrival Process

Let Z(t) be an irreducible Markov chain with finite state space of size m and
generator Q. An arrival process is associated with this Markov chain in the
following way:
– while the Markov chain stays in state i, arrivals occur at rate $\lambda_i$,
– when the Markov chain undergoes a state transition from i to j, an arrival occurs
with probability $p_{ij}$.
The standard description of MAPs is given with matrices $D_0$ and $D_1$ of size
(m × m), where $D_0$ contains the transition rates of the Markov chain which are
not accompanied by arrivals and $D_1$ contains the transition rates which are
accompanied by arrivals, i.e.:
– $(D_0)_{ij} = (1 - p_{ij}) Q_{ij}$ for $i \ne j$, and $(D_0)_{ii} = Q_{ii} - \lambda_i$;
– $(D_1)_{ij} = p_{ij} Q_{ij}$ for $i \ne j$, and $(D_1)_{ii} = \lambda_i$.
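The construction of the two MAP matrices from the generator, the state-dependent arrival rates and the jump arrival probabilities can be sketched as below. The function name and the numerical example (a two-state Markov modulated Poisson process, i.e., all jump arrival probabilities equal to zero) are assumptions made for illustration.

```python
def map_matrices(Q, lam, p):
    """Build (D0, D1) from generator Q, arrival rates lam and jump arrival probabilities p."""
    m = len(Q)
    D0 = [[0.0] * m for _ in range(m)]
    D1 = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i == j:
                D0[i][i] = Q[i][i] - lam[i]  # diagonal: leave state i or generate an arrival
                D1[i][i] = lam[i]            # arrival without state change
            else:
                D0[i][j] = (1.0 - p[i][j]) * Q[i][j]  # transition without arrival
                D1[i][j] = p[i][j] * Q[i][j]          # transition with arrival
    return D0, D1


if __name__ == "__main__":
    Q = [[-1.0, 1.0], [2.0, -2.0]]
    lam = [5.0, 0.5]
    p = [[0.0, 0.0], [0.0, 0.0]]  # state jumps never cause arrivals (MMPP)
    D0, D1 = map_matrices(Q, lam, p)
    print(D0)
    print(D1)
```

The off-diagonal entries of the two matrices add up to those of Q, while each row of $D_0 + D_1$ sums to zero once the arrival rate is cancelled between the two diagonals.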
Many familiar arrival processes represent special cases of MAPs:
– the Poisson process (MAP with a single state),
– interrupted Poisson process: a two-state MAP in which arrivals occur only
in one of the states and state jumps do not cause arrival,
– Markov modulated Poisson process: state jumps do not give rise to arrivals.
The class of MAPs is closed under superposition and Markovian splitting.
5 Fitting Markovian Models to Datasets

Fitting a Markovian model to a measured dataset means finding a Markovian model
which exhibits a stochastic behavior as close to that of the measured dataset
as possible. In practice, the order of approximate Markov models should be kept
low, both to have few model parameters to evaluate and to obtain computable
models. The presence of slow decay behavior (heavy tail or long range
correlation) in measured datasets makes the fitting more difficult. Typically a
huge number of samples is needed to obtain a fairly reliable view of the stochastic
behavior over a range of several orders of magnitude, and, of course, the asymptotic
behavior cannot be checked based on finite datasets. A class of fitting
methods approximates the asymptotic behavior based on the reliably known
ranges (e.g., based on $10^6$ i.i.d. samples the cdf can be approximated up to the
$1 - F(x) \sim 10^{-4}$ to $10^{-5}$ limit). The asymptotic methods are based on the
assumption that the dominant parameters (e.g., tail decay, correlation decay) of
the known ranges remain unchanged in the unknown region up to the asymptotic
limit.
Unfortunately, Markovian models can not exhibit any complex asymptotic
behavior. In the asymptotic region Markovian models have exponential tail de-
cay or autocorrelation. Due to this dominant property Markovian models were
not considered for fitting datasets with slow decaying features for a long time.
Recently, in spite of the exponential asymptotic decay behavior, Markovian mod-
els with slow decay behavior for several orders of magnitude were introduced.
These results broaden the attention from asymptotically slow decay models to
models with slow decay in a given predefined range. The main focus of this paper
is on the use of Markovian models with slow decay behavior in applied traffic
engineering.
A finite dataset provides only limited information about the stochastic
properties of traffic processes. In particular, the long range and the asymptotic
behavior cannot be extracted from a finite dataset. To overcome the lack of these
important model properties, the set of information provided by the dataset is
often accompanied by engineering assumptions in practice. One of the most
commonly applied traffic engineering assumptions is that the decay trends of a
known region continue to infinity.
The use of engineering assumptions has a significant role in model fitting as
well. In this respect there are two major classes of fitting methods:
– fitting based on all the samples,
– fitting based on information extracted from the samples.
Naturally, there are methods which combine these two approaches.
The fitting methods based on extracted information find their roots in traffic
engineering assumptions. It is a common goal in traffic engineering to find a
simple (characterized by few parameters), but robust (widely applicable) traffic
model which is based on few representative traffic parameters of network traffic.
The traffic models discussed in Section 2 are completely characterized by very few
parameters. E.g., the tail behavior of a power tail distribution is characterized by
the heavy tail index α, fractional Gaussian noise is characterized by parameter H
and the variance over a natural time unit. Assuming that there is representative
information of the dataset, it is worth completing the model fitting based on
this compact description of the traffic properties instead of using the whole, very
large dataset. Unfortunately, a commonly accepted, accurate and compact traffic
characterization is not available up to now. This way, when the fitting is based on
extracted information, the goodness of fitting strongly depends on the descriptive
power of the selected characteristics to be fitted.
In this section we introduce a selected set of fitting methods from both classes.
The fitting methods that are based on extracted information are composed of
two main steps: the statistical analysis of the dataset to extract representative
properties and the fitting itself based on these properties. The first step of this
procedure is based on the methods presented in the previous section, and only
the second step is considered here.
5.1 PH Fitting
General PH fitting methods minimize a distance measure between the experimental
distribution and the approximate PH one. The most commonly applied
distance measure is the relative entropy: $\int_0^\infty f(t) \log\left( \frac{f(t)}{\hat{f}(t)} \right) dt$, where $f(t)$
and $\hat{f}(t)$ denote the pdf of the distribution to be fitted and that of the fitting
distribution, respectively. The number of parameters to minimize in this proce-
dure depends on the order of the approximate PH model. The required order of
PH models can be approximated based on the dataset [48], but usually small
models are preferred in practice for computational convenience. It is a common
feature of the relative entropy and other distance measures that the distance is
a non-linear function of the PH parameters.
General PH fitting methods might perform poorly in fitting slow decaying tail
behavior [22]. As an alternative, heuristic fitting procedures can be applied that
focus on capturing the tail decay behavior. In case of heuristic fitting methods,
the goal is not to minimize a properly defined distance measure, but to construct
a PH distribution which fulfills a set of heuristic requirements.
According to the above classification of fitting procedures general fitting
methods commonly belong to the fitting based on samples class and heuristic
fitting methods to the fitting to extracted model properties class.
The literature of general PH fitting methods is quite large. A set of methods
with a comparison of their fitting properties are presented in [26]. Here we con-
sider only those methods which were applied for fitting slowly decaying behavior
in [11] and [22]. Among the heuristic methods we discuss the one proposed in
[14] and its extension in [22].
EM method. The expectation maximization (EM) method was proposed for
PH fitting in [2]. It is a statistical method which performs an iterative
optimization over the space of the PH parameters to minimize the relative
entropy. It differs from other relative entropy minimizing methods in the
way it searches for the minimum of the non-linear distance measure. Based on
the fact that hyper-exponential distributions can capture slow decay behavior
([14]), a specialized version of the EM algorithm, which fits the dataset with
hyper-exponential distributions, is applied for fitting measured traffic datasets
in [11].
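Starting from an initial guess $\alpha^{(0)}, \lambda^{(0)}$ and denoting the pdf of the hyper-exponential
distribution with initial probability vector α and intensity vector
λ by $\hat{f}(t|\alpha, \lambda)$, the iterative procedure calculates consecutive hyper-exponential
distributions based on the samples $t_1, \dots, t_N$ as:

$$\alpha_i^{(k+1)} = \frac{1}{N} \sum_{n=1}^{N} \frac{\alpha_i^{(k)} \hat{f}(t_n|e_i, \lambda^{(k)})}{\hat{f}(t_n|\alpha^{(k)}, \lambda^{(k)})}, \qquad
\lambda_i^{(k+1)} = \frac{\dfrac{1}{N} \displaystyle\sum_{n=1}^{N} \frac{\alpha_i^{(k)} \hat{f}(t_n|e_i, \lambda^{(k)})}{\hat{f}(t_n|\alpha^{(k)}, \lambda^{(k)})}}{\dfrac{1}{N} \displaystyle\sum_{n=1}^{N} t_n \frac{\alpha_i^{(k)} \hat{f}(t_n|e_i, \lambda^{(k)})}{\hat{f}(t_n|\alpha^{(k)}, \lambda^{(k)})}}$$

where $e_i$ is the vector of zeros with a one at the ith position.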
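One iteration of this update can be sketched as follows. The sketch assumes that $\hat{f}(t|e_i, \lambda) = \lambda_i e^{-\lambda_i t}$, i.e., the density of the ith exponential branch; the function names, the initial guess and the synthetic two-branch data are assumptions of the example and are not taken from [11].

```python
import math
import random


def hyperexp_pdf(t, alpha, lam):
    """Density of a hyper-exponential distribution: sum_i alpha_i lam_i exp(-lam_i t)."""
    return sum(a * l * math.exp(-l * t) for a, l in zip(alpha, lam))


def log_likelihood(samples, alpha, lam):
    """Log-likelihood of the samples under the hyper-exponential model."""
    return sum(math.log(hyperexp_pdf(t, alpha, lam)) for t in samples)


def em_step(samples, alpha, lam):
    """One EM iteration for the branch probabilities alpha and the rates lam."""
    branches = len(alpha)
    weight_sum = [0.0] * branches  # sum over n of the posterior branch probabilities
    time_sum = [0.0] * branches    # sum over n of t_n times the posterior
    for t in samples:
        total = hyperexp_pdf(t, alpha, lam)
        for i in range(branches):
            post = alpha[i] * lam[i] * math.exp(-lam[i] * t) / total
            weight_sum[i] += post
            time_sum[i] += t * post
    new_alpha = [w / len(samples) for w in weight_sum]
    new_lam = [w / s for w, s in zip(weight_sum, time_sum)]
    return new_alpha, new_lam


def fit_hyperexp(samples, alpha, lam, iterations=10):
    """Run a fixed number of EM iterations from the initial guess (alpha, lam)."""
    for _ in range(iterations):
        alpha, lam = em_step(samples, alpha, lam)
    return alpha, lam


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.expovariate(1.0) if rng.random() < 0.7 else rng.expovariate(0.01)
            for _ in range(2000)]
    print(fit_hyperexp(data, [0.5, 0.5], [2.0, 0.1]))
```

Each pass touches every sample once, which illustrates why the cost of the method grows in proportion to the size of the dataset.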

The computational complexity of this simplified method using hyper-exponential
distributions is much less than the one for the whole PH class.
Nevertheless, a reliable view on the (slow decaying) tail behavior requires a very
large number of samples. The complexity of the simplified fitting method is still
proportional to the size of the dataset, hence the applicability of this approach
is limited by computational complexity ($\sim 10^7$ samples were reported in [11]).
On the other hand, due to the strict structure of hyper-exponential distributions
(e.g., there is no fork in the structure), fewer iterations are required to reach a
reasonable accuracy (5 − 10 iterations were found to be sufficient in [11]).
This simplified EM fitting method is a potential choice for model fitting
when we have a large dataset, but we do not have or do not want to apply any
engineering assumption on the properties of the dataset.
Tail fitting based on the ccdf. The method proposed by Feldmann and Whitt
[14] is a recursive fitting procedure that results in a hyper-exponential distribution
whose complementary cumulative distribution function (ccdf) at a given set of points is
“very close” to the ccdf of the original distribution. This method was successfully
applied to fit Pareto and Weibull distributions.
Combined fitting method. In [22] a PH fitting method is proposed that
handles the fitting of the body and the fitting of the tail separately.
This is done by combining the method proposed by Feldmann and Whitt [14],
applied to the tail, with a general fitting method applied to the body.
The limitation of this combined method comes from the limitation of the
method of Feldmann and Whitt. Their method is applicable only for fitting
distributions with a monotone decreasing density function. Hence the proposed
combined method is applicable when the tail of the distribution has a monotone
decreasing density. In the case of the combined method this restriction is quite
loose, since the border between the main part and the tail of the distribution
is arbitrary; the only requirement is that there exists a positive number C such
that the density of the distribution is monotone decreasing above C.
Fig. 13. Structure of approximate Phase type distribution (the phases are
grouped into a Body part and a Tail part)
Fig. 14. Different parts of the pdf are approximated by different parts of the PH
structure (log-log plot; curves: Orig., CF1 + Hyperexp., CF1, Hyperexp.)
The result of this fitting algorithm is a Phase type distribution of order n+m,
where n is the number of phases used for fitting the body and m is the number
of phases used for fitting the tail. The structure of this Phase type distribution is
depicted in Figure 13, where we have marked the phases used to fit the body and
those used to fit the tail. The parameters β1 , . . . , βm , μ1 , . . . , μm are computed by
considering the tail, while the parameters α1 , . . . , αn , λ1 , . . . , λn are determined
by considering the main part of the distribution.
To illustrate the combined fitting method, we consider the following Pareto-
like distributions [45]:
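\[
\text{Pareto I:}\quad f(t) =
\begin{cases}
\alpha B^{-1} e^{-\frac{\alpha}{B} t} & \text{for } t \le B, \\
\alpha B^{\alpha} e^{-\alpha}\, t^{-(\alpha+1)} & \text{for } t > B,
\end{cases}
\]
\[
\text{Pareto II:}\quad f(t) = \frac{b^{\alpha} e^{-b/t}}{\Gamma(\alpha)}\, t^{-(\alpha+1)}.
\]
For both distributions α is the heavy tail index.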
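As a quick consistency check of the Pareto I density (a worked sketch added for illustration), the exponential branch on t ≤ B and the power-law branch on t > B agree at t = B, and together they integrate to one:

```latex
\begin{align*}
\lim_{t \uparrow B} f(t) &= \alpha B^{-1} e^{-\alpha},
&
\lim_{t \downarrow B} f(t) &= \alpha B^{\alpha} e^{-\alpha} B^{-(\alpha+1)} = \alpha B^{-1} e^{-\alpha}, \\
\int_0^{B} \alpha B^{-1} e^{-\frac{\alpha}{B} t}\, dt &= 1 - e^{-\alpha},
&
\int_B^{\infty} \alpha B^{\alpha} e^{-\alpha}\, t^{-(\alpha+1)}\, dt &= e^{-\alpha}.
\end{align*}
```

Hence the part below B carries probability mass 1 − e^{−α} and the part above B carries e^{−α}, independently of B; for α = 1.5 the latter is approximately 0.223.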

Figure 14 shows how the different parts of the PH structure (Figure 13)
contribute to the pdf when fitting distribution Pareto I with parameters
α = 1.5, B = 4. In this case 8 phases are used to fit the body and 10 to fit
the tail.
Fig. 15. Pareto II distribution and its PH approximation with the combined method
(left: pdf of body; right: pdf of tail; curves: Orig., 8+4 ML, 8+4 AD, 8+10 ML,
8+10 AD)
Fig. 16. Queue length distribution of an M/G/1 queue and its approximate M/PH/1
queue (left: body; right: tail; curves: Orig., 8+10 ML, 8+10 AD)
Figure 15 illustrates the fitting of distribution Pareto II with parameters
α = 1.5, b = 2. In the legend of the figure, ML indicates that the relative
entropy measure was applied to fit the main part (corresponding to the maximum
likelihood principle), while AD stands for the area difference of the pdf. Also
in the legend, X+Y means that X phases were used to fit the body and Y phases to
fit the tail. Figure 16 shows the effect of Phase type fitting on the M/G/1 queue
behaviour with Pareto II service time (utilization is 0.8). The exact result for
the M/G/1 queue was computed with the method of [45].
At this point we take a detour to discrete-time models. The discrete-time
counterpart of the fitting method, i.e., the one in which discrete-time PH
distributions are applied, is given in [24]. We apply discrete PH distributions
to fit the EPA trace. The ccdf of the body and of the tail of the resulting
discrete PH distribution are shown in Figures 17 and 18. Figure 18 also depicts
the polynomial fit of the tail behaviour.
5.2 MAP Fitting Based on Samples

Similarly to the case of PH fitting, MAP fitting methods can be classified as
general and heuristic ones. General methods utilize the data samples directly,
and hence they do not require any additional engineering knowledge. Our numerical
experience shows that MAP fitting is a far more difficult task than PH fitting.
Fig. 17. Body of the approximating distributions (ccdf versus length in 100
bytes; curves: Data set, Fitting I, Fitting II)
Fig. 18. Tail of the approximating distributions (ccdf versus length in 100
bytes; curves: Polynomial tail, Data set, Fitting I, Fitting II)
A simple explanation is that fitting a process is more difficult than fitting a
distribution. Capturing slowly decaying behaviour with general MAP fitting seems
impossible.
Nevertheless, there are numerical methods available for fitting low order MAPs
directly to datasets. In [32,15,46] a fitting method based on maximum likelihood
estimation is presented, and in [47] the EM method is used for maximizing the
likelihood.
Simple numerical tests (like taking a MAP, drawing samples from it, and
fitting a MAP of the same order to these samples) often fail for MAPs of higher
order (≥ 3), and the accuracy of the method does not necessarily improve with
an increasing number of samples.
5.3 Heuristic MAP Fitting
An alternative to general MAP fitting is to extract a set of (hopefully)
dominant properties of the traffic process from the dataset and to create a MAP
(of a particular structure) that exhibits the same properties. Heuristic methods
of this kind fail, by their nature, to satisfy the above-mentioned “self test”,
but if the selected set of parameters is really dominant with respect to the goal
of the analysis we can achieve a “sufficient” fit. Heffes and Lucantoni [19]
proposed to fit the following parameters: mean arrival rate, variance to mean
ratio of arrivals in (0, t), and its asymptotic limit. After the notion of long
range dependence was introduced for traffic processes, the Hurst parameter was
added to this list. The following subsections introduce heuristic fitting methods
that differ in the properties they capture and in the MAP structures they fit.
MAP structures approximating long range dependent behaviour. An
intuitive way to provide long range dependent behaviour over several time scales
with Markovian models is to compose a combined model from small pieces, each
of which represents the model behaviour at a selected range of the time scales.
One of the first models of this kind was proposed in [44]. The same approach
was applied for traffic fitting in [38], but it was recently criticized
in [12]. Renewal processes with heavy tailed interarrival times also exhibit
self-similar properties. Using this fact, the approximate heavy tailed PH
distributions can be used to create a MAP that is a PH renewal process. In [1] a
superposition of two-state MMPPs is used for approximating second-order
self-similarity. The proposed procedure fits the mean arrival rate, the lag-1
correlation, the Hurst parameter and the required range of fitting.
Fitting based on separate handling of long- and short-range dependent
behavior. In [21] a procedure is given to construct a MAP in such a way that some
parameters of the traffic generated by the model match predefined values. The
following parameters are set:
– The fundamental arrival rate describes the expected number of arrivals in a
time unit.
– In order to describe the burstiness of the arrival stream, the index of disper-
sion for counts I(t) = Var(Nt )/E(Nt ) is set for two different values of time:
I(t1 ) and I(t2 ). The choice of these two time points significantly affects the
goodness of fitting.
– A higher order descriptor, the third centralized moment of the number of
arrivals in the interval (0, t3 ), M (t3 ) = E[(N_{t3} − E(N_{t3}))^3], is set.
– The degree of pseudo self-similarity is defined by the Hurst parameter H.
The Hurst parameter is realized in terms of the variance-time behavior of the
resulting traffic, i.e., the straight line fitted by regression to the variance-time
curve in a predefined interval has slope 2(H − 1).
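The second-order descriptors in this list can be estimated from a measured count series as sketched below. This is only an illustration under assumptions: the function names are hypothetical, the series is taken to be the number of arrivals in consecutive windows of equal length, and population (not sample) variances are used. The last function implements the stated relation between the regression slope of the variance-time curve and H.

```python
import math

def index_of_dispersion(counts):
    """I = Var(N) / E(N) for counts observed in windows of equal length."""
    mu = sum(counts) / len(counts)
    var = sum((c - mu) ** 2 for c in counts) / len(counts)
    return var / mu

def aggregated_variance(series, m):
    """Variance of the block means X^(m) of `series` at aggregation level m."""
    blocks = len(series) // m
    means = [sum(series[i * m:(i + 1) * m]) / m for i in range(blocks)]
    mu = sum(means) / blocks
    return sum((x - mu) ** 2 for x in means) / blocks

def hurst_from_variance_time(levels, variances):
    """Least-squares slope of log10(variance) against log10(level).

    The slope equals 2(H - 1), hence H = 1 + slope / 2.
    """
    xs = [math.log10(m) for m in levels]
    ys = [math.log10(v) for v in variances]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1.0 + slope / 2.0
```

In use, `aggregated_variance` would be evaluated for the aggregation levels inside the predefined interval and the resulting points passed to `hurst_from_variance_time`; the estimate depends on the interval chosen, which is why the interval is part of the fitting specification.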
The MAP resulting from the procedure is the superposition of a PH arrival
process and a two-state MMPP. In the following we sketch how to construct a PH
arrival process with pseudo self-similar behavior and describe the superposition
of this PH arrival process with a two-state MMPP. A detailed description of the
procedure is given in [21].
Let us consider an arrival process whose interarrival times are independent
random variables with heavy tail probability density function (pdf) of Pareto
type
\[
f(x) = \frac{c \cdot a^{c}}{(x + a)^{c+1}}, \qquad x \ge 0. \tag{13}
\]
The process Xn (n > 0) representing the number of arrivals in the nth time-slot
is asymptotically second-order self-similar with Hurst parameter H = (3 − c)/2
([49]).
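As a small worked example of this relation (added for illustration; the values of H are those listed in the legend of Figure 19), inverting H = (3 − c)/2 gives the Pareto exponent c needed for a target Hurst parameter:

```latex
\[
c = 3 - 2H:\qquad
H = 0.8 \;\Rightarrow\; c = 1.4, \qquad
H = 0.7 \;\Rightarrow\; c = 1.6, \qquad
H = 0.6 \;\Rightarrow\; c = 1.8 .
\]
```

For 1 < c < 2 the relation yields 1/2 < H < 1.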
Using the method of Feldmann and Whitt [14] one may build an arrival process
whose interarrival times are independent, identically distributed PH random
variables with pdf approximating (13). To check the pseudo self-similarity of
such PH renewal processes, Figure 19 plots Var(X^(m)) of PH arrival processes
whose interarrival time is a 6 phase PH approximation of the pdf given in (13)
for different values of c. As can be observed, Var(X^(m)) is close, through
several orders of magnitude, to the straight line corresponding to the
self-similar case with slope 2(H − 1). The aggregation level where Var(X^(m))
drops compared to the straight line may be increased by changing the parameters
of the PH fitting algorithm.
Fig. 19. Variance-time plot of pseudo self-similar arrival processes with i.i.d.
PH interarrival (variance versus aggregation level; curves: PH renewal with
H = 0.8, H = 0.7 and H = 0.6)
Fig. 20. Superposition of the PH arrival process with an MMPP (variance versus
aggregation level; curves: IPP, PH, Superposed)
The parameters of the two-state MMPP with which the PH arrival process
is superposed are calculated in two steps:
1. First, we calculate the parameters of an Interrupted Poisson Process (IPP).
The IPP is a two-state MMPP that has one of its two arrival rates equal
to 0. The calculated parameters of the IPP are such that the superposition
of the PH arrival process and the IPP results in a traffic source with the
desired first and second order parameters E(N1 ), I(t1 ) and I(t2 ).
2. In the second step, based on the IPP we find a two-state MMPP that has the
same first and second order properties as the IPP has (recalling results from
[4]), and with which the superposition results in the desired third centralized
moment.
If the MMPP is “less long-range dependent” than the PH arrival process,
the pseudo self-similarity of the superposed traffic model will be dominated by
the PH arrival process. This fact is depicted in Figure 20. It can be observed
that, if the Hurst parameter is estimated based on the variance-time plot, the
Hurst parameter of the superposed model is only slightly smaller than the Hurst
parameter of the PH arrival process. In numbers, the Hurst parameter of the
PH arrival process is 0.8 while it is 0.78 for the superposed model (based on the
slope in the interval (10, 10^6)). This behavior is utilized in the fitting method
to approximate the short and long range behavior in a separate manner.
We illustrate the procedure by fitting the Bellcore trace. Variance-time plots
of the traffic generated by the MAPs resulting from the fitting are depicted in
Figure 21. The curve labeled (x1 , x2 ) belongs to the fitting in which the first
(second) time point of fitting the IDC value, t1 (t2 ), is x1 (x2 ) times the expected
interarrival time. R/S plots for both the real traffic trace and the traffic generated
by the approximating MAPs are given in Figure 22. The fitting of the traces
was also tested with a •/D/1 queue. The results are depicted in Figure 23.
The •/D/1 queue was analyzed by simulation with different levels of utilization
of the server. As one may observe, the lower t1 and t2 are, the longer the queue
length distribution follows the original one.
Fig. 21. Variance-time plots of MAPs with different time points of IDC matching
(variance versus aggregation level; curves: original trace, (1,2), (1,5), (2,10),
(2,20))
Fig. 22. R/S plots of MAPs with different time points of IDC matching
(log10(R/S(n)) versus n; same curves as in Fig. 21)
The fitting method provides a MAP some of whose parameters are the same
as (or very close to) those of the original traffic process. Still, the queue length
distribution does not show a good match. This means that the chosen parameters
do not capture all the important characteristics of the traffic trace.
Fig. 23. Queue-length distribution (probability versus queue length for
ρ = 0.2, 0.4, 0.6 and 0.8; curves: (1,2), (1,5), (2,10), (2,20))

MMPP Exhibiting Multifractal Behavior. In [23] a special MMPP structure
is proposed to exhibit multifractal behavior. The background CTMC of the
MMPP has a symmetric n-dimensional cube structure and the arrival intensities
are set according to the variation of the arrival process at the different time
scales. The special choice of the structure is motivated by the generation of the
Haar wavelet transform. Basically, the Haar wavelet transform evaluates the
variation of the dataset at different aggregation levels (time scales), and
similarly, the proposed MMPP structure provides different variation of the
arrival rate at different time scales.
The composition of the proposed MMPP structure is similar to the generation
of the Haar wavelet transform (a procedure for traffic trace generation based on
this transform is introduced in [43]). Without loss of generality, we assume that
the time unit is such that the long term arrival intensity is one. An MMPP of one
state with arrival rate 1 represents the arrival process at the largest (considered)
time scale.
At the next time scale, 1/λ, an MMPP of two states with generator
\[
\begin{pmatrix} -\lambda & \lambda \\ \lambda & -\lambda \end{pmatrix}
\]
and with arrival rates 1 − a1 and 1 + a1 (−1 ≤ a1 ≤ 1) represents the variation
of the arrival process. This composition leaves the long term average arrival rate
unchanged.
In the rest of the composition we perform the same step. We introduce a new
dimension and generate the n-dimensional cube such that the behavior at the
already set time scales remains unchanged. E.g., considering also the 1/(γλ) time
scale, an MMPP of four states with generator
\[
\begin{pmatrix}
\bullet & \lambda & \gamma\lambda & 0 \\
\lambda & \bullet & 0 & \gamma\lambda \\
\gamma\lambda & 0 & \bullet & \lambda \\
0 & \gamma\lambda & \lambda & \bullet
\end{pmatrix}
\]
(where • denotes the diagonal element that makes the row sum zero) and with
arrival rates (1 − a1 )(1 − a2 ), (1 + a1 )(1 − a2 ), (1 − a1 )(1 + a2 ) and
(1 + a1 )(1 + a2 ) (−1 ≤ a1 , a2 ≤ 1) represents the variation of the arrival
process. With this MMPP, parameter a1 (a2 ) determines the variance of the
arrival process at the 1/λ (1/(γλ)) time scale. If γ is large enough (≳ 30) the
process behavior at the 1/λ time scale is independent of a2 . The proposed model
is also applicable with a small γ. In this case, the only difference is that the
model parameters and the process behavior at the different time scales are
interdependent.
A level n MMPP of the proposed structure is composed of 2^n states and it
has n + 2 parameters. Parameters γ and λ define the considered time scales,
and parameters a1 , a2 , . . . , an determine the variance of the arrival process at
the n considered time scales. It can be seen that the ratio of the largest and the
smallest considered time scales is γ^n . Having a fixed n (i.e., a fixed cardinality
of the MMPP), any large ratio of the largest and the smallest considered time
scales can be captured by using a sufficiently large γ.
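The composition can be written down compactly. The following is a minimal sketch, assuming the state numbering of the two- and four-state examples (bit k−1 of the state index selects the factor 1 − a_k or 1 + a_k, and flips at rate γ^(k−1) λ); the function name `cube_mmpp` and the dense list-of-lists representation are choices of this sketch, not of [23].

```python
def cube_mmpp(n, lam, gamma, a):
    """Return (Q, rates) of the 2**n-state cube-structured MMPP.

    n:      number of dimensions (time scales)
    lam:    switching rate of the slowest dimension
    gamma:  ratio between the rates of consecutive dimensions
    a:      list of n variability parameters, each in [-1, 1]
    """
    size = 2 ** n
    Q = [[0.0] * size for _ in range(size)]
    rates = []
    for s in range(size):
        rate = 1.0
        for k in range(n):
            flip = gamma ** k * lam          # switching rate of dimension k+1
            Q[s][s ^ (1 << k)] = flip        # neighbour along dimension k+1
            Q[s][s] -= flip                  # diagonal keeps the row sum zero
            bit = (s >> k) & 1
            rate *= (1.0 + a[k]) if bit else (1.0 - a[k])
        rates.append(rate)
    return Q, rates
```

Because the generator is symmetric, the stationary distribution of the background chain is uniform, so the long term arrival rate is the plain average of `rates`, which equals one for any choice of the a_k. The n + 2 parameters mentioned above are exactly the arguments `lam`, `gamma` and the n entries of `a`.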
A simple numerical procedure can be applied to fit an MMPP of the given
structure to a measured dataset. This heuristic approach consists of
“engineering considerations” based on the properties of the measured dataset and
a parameter fitting method.
First, we fix the value of n. According to our experience a “visible”
multiscaling behavior can be obtained from n = 3-4 upwards. The computational
complexity of the fitting procedure grows exponentially with the dimension of the
MMPP. The response time with n = 6 (MMPP of 64 states) is still acceptable (in
the order of minutes).
Similarly to [43], we set γ and λ based on the inspection of the dataset.
Practically, we define the largest, TM , and the smallest, Tm , considered time
scales and calculate γ and λ from
\[
T_M = \frac{1}{\lambda}; \qquad T_m = \frac{1}{\gamma^{n}\lambda}.
\]
The extreme values of TM and Tm can be set based on simple practical
considerations. For example, when the measured dataset consists of N arrival
instants, TM can be chosen to be less than the mean time of N/4 arrivals, and
Tm can be chosen to be greater than the mean time of 4 arrivals. A similar
approach was applied in [43]. These boundary values can be refined based on a
detailed statistical test of the dataset. E.g., if the scaling behavior disappears
beyond a given time scale, TM can be set to that value.
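Solving the two relations for the model parameters is a one-line computation each. The helper below is a hypothetical sketch (the name `time_scale_parameters` is not from the source); it assumes that both time scales are expressed in the time unit in which the long term arrival intensity is one.

```python
def time_scale_parameters(t_max, t_min, n):
    """Return (lam, gamma) from T_M = 1/lam and T_m = 1/(gamma**n * lam)."""
    lam = 1.0 / t_max
    gamma = (t_max / t_min) ** (1.0 / n)
    return lam, gamma
```

For instance, with n = 5, a smallest time scale of 16 mean interarrival times and a largest one of 16 · 8^5 = 2^19, the helper returns γ ≈ 8.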
Having γ and λ, we apply a downhill simplex method to find the optimal
values of the variability parameters a1 , a2 , . . . , an . The goal function that our
parameter fitting method minimizes is the sum of the relative errors of the second
moment of Haar wavelet coefficients up to a predefined time scale S:
\[
\min_{a_1,\dots,a_n} \sum_{j=1}^{S}
\frac{\left| E(d_j^2) - E(\hat d_j^2) \right|}{E(d_j^2)} .
\]
The goal function can be calculated analytically as it is described in [23].
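The measured side of the goal function can be obtained directly from the data. The sketch below is illustrative only: it assumes the orthonormal Haar transform applied to a series of arrival counts (the normalization used in [23] may differ), the function names are hypothetical, and the model-side moments are taken as given since their analytical computation is described in [23].

```python
import math

def haar_second_moments(series, levels):
    """Empirical E(d_j^2) for j = 1..levels (orthonormal Haar, assumed)."""
    approx = list(series)
    moments = []
    for _ in range(levels):
        pairs = len(approx) // 2
        if pairs == 0:
            break  # series too short for further aggregation
        details = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2.0)
                   for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2.0)
                  for k in range(pairs)]
        moments.append(sum(d * d for d in details) / pairs)
    return moments

def relative_error_sum(measured, model):
    """Sum over j of |E(d_j^2) - E(dhat_j^2)| / E(d_j^2)."""
    return sum(abs(m - h) / m for m, h in zip(measured, model))
```

A simplex search over a1 , . . . , an would then call `relative_error_sum` with the measured moments fixed and the model moments recomputed for each candidate parameter vector. The sketch assumes that every measured moment is strictly positive.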

Application of the fitting procedure is illustrated on the Bellcore trace. We
applied the fitting method with n = 5 and several different predefined settings
of γ and λ. We found that the goodness of the fitting is not very sensitive to the
predefined parameters within a reasonable region. The best “looking” fit is
obtained when Tm is the mean time of 16 arrivals and γ = 8. In this case TM is the
mean time of 16 · 8^5 = 2^19 arrivals, which corresponds to the coarsest time scale
we can analyze in the case of the Bellcore trace. The simplex method minimizing
the sum of the relative errors of the second moments of the Haar wavelet
coefficients over S = 12 time scales resulted in: a1 = 0.144, a2 = 0.184,
a3 = 0.184, a4 = 0.306, a5 = 0.687. The result of fitting the second moment of the
Haar wavelet transform at different aggregation levels is plotted in Figure 24.
At small time scales the fitting seems to be perfect, while at larger time scales
the error increases. The slopes of the curves are almost equal in the depicted
range.
The multiscaling behavior of the obtained MAP and of the original dataset
is illustrated via the log-moment curves in Figure 25. In the figure, the symbols
represent the log-moment curves of the fitting MAP and the solid lines indicate
the corresponding log-moment curves of the Bellcore trace. In the range of n ∈
(3, 19) the log-moment curves of the fitting MAP are very close to the ones of
the original trace. The log-moment curves of the approximate MAP are also very
close to linear in the considered range.
Fig. 24. The second moment of the Haar wavelet transform at different aggregation
levels (curves: original trace, approximating trace)
Fig. 25. Scaling of log-moments of the original trace and the fitting MMPP
(curves for q = −3, . . . , 4)
Fig. 26. Partition function estimated through the linear fits shown in Figure 25
(T(q) versus q; curves: original trace, approximating trace)
Fig. 27. The Legendre transform of the original dataset and the one of the
approximate MMPP (curves: original spectrum, approximating spectrum)
The partition functions of the fitting MAP and of the original trace are
depicted in Figure 26. As mentioned earlier, the visual appearance of
the partition function is not very informative about the multifractal scaling
behavior. Figure 27 depicts the Legendre transform of the partition functions
of the original dataset and the approximating MAP. The visual appearance of
the Legendre transform significantly amplifies the differences of the partition
functions. In Figure 27, it can be seen that both processes exhibit multifractal
behavior, but the original dataset has a somewhat richer multifractal spectrum.
We also compared the queuing behavior of the original dataset with that of
the approximate MAP assuming deterministic service time and different levels of
utilization, ρ. Figure 28 depicts the queue length distribution resulting from the
original and the approximate arrival processes. The queue length distribution
curves show a quite close fit. The probability of an empty queue, which is not
displayed in the figures, is the same for the MAP as for the original trace since
the MAP has the same average arrival intensity as the original trace. The fit is
better with a higher queue utilization, which might mean that different scaling
behaviors play a dominant role at different utilizations, and the ones that are
dominant at high utilization are better approximated by the proposed MAP.
Fig. 28. Queue-length distribution (panels for ρ = 0.2, 0.4, 0.6 and 0.8)
6 Conclusions
This paper collects a set of methods which can be used in practice for
measurement based traffic engineering. The history of traffic theory of high speed
communication networks is summarized together with a short introduction to
the mathematical foundation of the applied concepts. The common statistical
methods for the analysis of data traces and the practical problems of their
application are discussed.
The use of Markovian methods is motivated by the fact that an effective
analysis technique, the matrix geometric method, is available for the evaluation
of Markovian queuing systems. To obtain the Markovian approximation
of measured traffic data a variety of heuristic fitting methods are applied. The
properties and abilities of these methods are also discussed.
The presented numerical examples provide insight into the qualitative
understanding of the strange traffic properties of high speed networks.
References
1. A. T. Andersen and B. F. Nielsen. A markovian approach for modeling packet
traffic with long-range dependence. IEEE Journal on Selected Areas in Commu-
nications, 16(5):719–732, 1998.
2. S. Asmussen and O. Nerman. Fitting Phase-type distributions via the EM algo-
rithm. In Proceedings: ”Symposium i Advent Statistik”, pages 335–346, Copen-
hagen, 1991.
3. J. Beran. Statistics for long-memory processes. Chapman and Hall, New York,
1994.
4. A. W. Berger. On the index of dispersion for counts for user demand modeling. In
ITU, Madrid, Spain, June 1994. Study Group 2, Question 17/2.
5. A. Bobbio, A. Horváth, M. Scarpa, and M. Telek. Acyclic discrete phase type
distributions: Properties and a parameter estimation algorithm. submitted to Per-
formance Evaluation, 2000.
6. S. C. Borst, O. J. Boxma, and R. Nunez-Queija. Heavy tails: The effect of the ser-
vice discipline. In Tools 2002, pages 1–30, London, England, April 2002. Springer,
LNCS 2324.
7. G. E. P. Box, G. M. Jenkins, and C. Reinsel. Time Series Analysis: Forecasting
and Control. Prentice Hall, Englewood Cliffs, N.J., third edition, 1994.
8. E. Castillo. Extreme Value Theory in Engineering. Academic Press, San Diego,
California, 1988.
9. M. E. Crovella and M. S. Taqqu. Estimating the heavy tail index from scaling
properties. Methodology and Computing in Applied Probability, 1(1):55–79, 1999.
10. A. Cumani. On the canonical representation of homogeneous Markov processes
modelling failure-time distributions. Microelectronics and Reliability, 22:583–602,
1982.
11. R. El Abdouni Khayari, R. Sadre, and B. Haverkort. Fitting world-wide web
request traces with the EM-algorithm. In Proc. of SPIE, volume 4523, pages 211–
220, Denver, USA, 2001.
12. R. El Abdouni Khayari, R. Sadre, and B. Haverkort. A validation of the pseudo
self-similar traffic model. In Proc. of IPDS, Washington D.C., USA, 2002.
13. A. Feldmann, A. C. Gilbert, and W. Willinger. Data networks as cascades:
Investigating the multifractal nature of internet WAN traffic. Computer
communication review, 28/4:42–55, 1998.
14. A. Feldmann and W. Whitt. Fitting mixtures of exponentials to long-tail
distributions to analyze network performance models. Performance Evaluation,
31:245–279, 1998.
15. W. Fischer and K. Meier-Hellstern. The Markov-modulated Poisson process
(MMPP) cookbook. Performance Evaluation, 18:149–171, 1992.
16. H. J. Fowler and W. E. Leland. Local area network traffic characteristics, with im-
plications for broadband network congestion management. IEEE JSAC, 9(7):1139–
1149, 1991.
17. R. Fox and M. S. Taqqu. Large sample properties of parameter estimates for
strongly dependent stationary time series. The Annals of Statistics, 14:517–532,
1986.
18. C. W. J. Granger and R. Joyeux. An introduction to long-memory time series and
fractional differencing. Journal of Time Series Analysis, 1:15–30, 1980.
19. H. Heffes and D. M. Lucantoni. A Markov-modulated characterization of packe-
tized voice and data traffic and related statistical multiplexer performance. IEEE
Journal on Selected Areas in Communications, 4(6):856–868, 1986.
20. B. M. Hill. A simple general approach to inference about the tail of a distribution.
The Annals of Statistics, 3:1163–1174, 1975.
21. A. Horváth, G. I. Rózsa, and M. Telek. A map fitting method to approximate real
traffic behaviour. In 8th IFIP Workshop on Performance Modelling and Evaluation
of ATM & IP Networks, pages 32/1–12, Ilkley, England, July 2000.
22. A. Horváth and M. Telek. Approximating heavy tailed behavior with phase type
distributions. In 3rd International Conference on Matrix-Analytic Methods in
Stochastic models, Leuven, Belgium, 2000.
23. A. Horváth and M. Telek. A markovian point process exhibiting multifractal
behaviour and its application to traffic modeling. In Proc. of MAM4, Adelaide,
Australia, 2002.
24. A. Horváth and M. Telek. Phfit: A general phase-type fitting tool. In Proc. of 12th
Performance TOOLS, volume 2324 of Lecture Notes in Computer Science, pages
82–91, Imperial College, London, April 2002.
25. M. Kratz and S. Resnick. The qq–estimator and heavy tails. Stochastic Models,
12:699–724, 1996.
26. A. Lang and J. L. Arthur. Parameter approximation for phase-type distributions.
In S. R. Chakravarty and A. S. Alfa, editors, Matrix-analytic methods in stochastic
models, Lecture notes in pure and applied mathematics, pages 151–206. Marcel
Dekker, Inc., 1996.
27. G. Latouche and V. Ramaswami. Introduction to Matrix-Analytic Methods in
Stochastic Modeling. Series on statistics and applied probability. ASA-SIAM, 1999.
28. W. E. Leland, M. Taqqu, W. Willinger, and D. V. Wilson. On the self-similar
nature of ethernet traffic (extended version). IEEE/ACM Transactions on
Networking, 2:1–15, 1994.
29. D. M. Lucantoni. New results on the single server queue with a batch Markovian
arrival process. Commun. Statist.-Stochastic Models, 7(1):1–46, 1991.
30. B. B. Mandelbrot and J. W. Van Ness. Fractional Brownian motions, fractional
noises and applications. SIAM Review, 10:422–437, 1969.
31. B. B. Mandelbrot and M. S. Taqqu. Robust R/S analysis of long-run serial corre-
lation. In Proceedings of the 42nd Session of the International Statistical Institute,
volume 48, Book 2, pages 69–104, Manila, 1979. Bulletin of the I.S.I.
32. K.S. Meier. A fitting algorithm for Markov-modulated Poisson processes having
two arrival rates. European Journal of Operations Research, 29:370–377, 1987.
33. M. Neuts. Probability distributions of phase type. In Liber Amicorum Prof. Emer-
itus H. Florin, pages 173–206. University of Louvain, 1975.
34. M.F. Neuts. Matrix Geometric Solutions in Stochastic Models. Johns Hopkins
University Press, Baltimore, 1981.
35. M.F. Neuts. Structured stochastic matrices of M/G/1 type and their applications.
Marcel Dekker, 1989.
36. I. Norros. A storage model with self-similar input. Queueing Systems, 16:387–396,
1994.
37. I. Norros. On the use of fractional brownian motion in the theory of connectionless
networks. IEEE Journal on Selected Areas in Communications, 13:953–962, 1995.
38. A. Ost and B. Haverkort. Modeling and evaluation of pseudo self-similar traffic with
infinite-state stochastic petri nets. In Proc. of the Workshop on Formal Methods
in Telecommunications, pages 120–136, Zaragoza, Spain, 1999.
39. S. Resnick. Heavy tail modeling and teletraffic data. The Annals of Statistics,
25:1805–1869, 1997.
40. S. Resnick and C. Starica. Smoothing the hill estimator. Advances in Applied
Probability, 29:271–293, 1997.
41. J. Rice. Mathematical Statistics and Data Analysis. Brooks/Cole Publishing,
Pacific Grove, California, 1988.
42. R. H. Riedi. An introduction to multifractals. Technical report, Rice University,
1997. Available at http://www.ece.rice.edu/~riedi.
43. R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R. G. Baraniuk. A multifractal wavelet
model with application to network traffic. IEEE Transactions on Information
Theory, 45:992–1018, April 1999.
44. S. Robert and J.-Y. Le Boudec. New models for pseudo self-similar traffic.
Performance Evaluation, 30:57–68, 1997.
45. M. Roughan, D. Veitch, and M. Rumsewicz. Numerical inversion of probability
generating functions of power-law tail queues. tech. report, 1997.
46. T. Rydén. Parameter estimation for Markov Modulated Poisson Processes.
Stochastic Models, 10(4):795–829, 1994.
47. T. Rydén. An EM algorithm for estimation in Markov modulated Poisson
processes. Computational statist. and data analysis, 21:431–447, 1996.
48. T. Rydén. Estimating the order of continuous phase-type distributions and markov-
modulated poisson processes. Stochastic Models, 13:417–433, 1997.
49. B. Ryu and S. B. Lowen. Point process models for self-similar network traffic, with
applications. Stochastic models, 14, 1998.
50. M. Telek and A. Heindl. Moment bounds for acyclic discrete and continuous
phase-type distributions of second order. In Proc. of Eighteenth Annual UK
Performance Engineering Workshop (UKPEW), Glasgow, UK, 2002.
51. J. Lévy Véhel and R. H. Riedi. Fractional brownian motion and data traffic model-
ing: The other end of the spectrum. In C. Tricot J. Lévy Véhel, E. Lutton, editor,
Fractals in Engineering, pages 185–202. Springer, 1997.
52. The internet traffic archive. http://ita.ee.lbl.gov/index.html.
Optimization of Bandwidth and Energy
Consumption in Wireless Local Area Networks
Marco Conti and Enrico Gregori
Consiglio Nazionale delle Ricerche
IIT Institute, Via G. Moruzzi, 1
56124 Pisa, Italy
{marco.conti, enrico.gregori}@iit.cnr.it
Abstract. In recent years the proliferation of portable computers, handheld digital
devices, and PDAs has led to rapid growth in the use of wireless technologies for the
Local Area Network (LAN) environment. Beyond supporting wireless connectivity for fixed,
portable and moving stations within a local area, wireless LAN (WLAN) technologies can
provide a mobile and ubiquitous connection to Internet information services. The design
of WLANs has to concentrate on bandwidth consumption because wireless networks deliver
much lower bandwidth than wired networks, e.g., 2-11 Mbps [1] versus 10-150 Mbps [2]. In
addition, the finite battery power of mobile computers represents one of the greatest
limitations to the utility of portable computers [3], [4]. Hence, a relevant
performance-optimization problem is balancing the minimization of battery consumption
against the maximization of channel utilization. In this paper, we study the bandwidth
and energy consumption of the IEEE 802.11 standard, i.e., the most mature technology for
WLANs. Specifically, we derive analytical formulas that relate the protocol parameters to
the maximum throughput and to the minimal energy consumption. These formulas are used to
define an effective method for tuning the protocol parameters at run time.
1 Introduction
In IEEE 802.11 WLANs [1], the Medium Access Control (MAC) protocol is the main element
that determines the efficiency in sharing the limited resources of the wireless channel.
The MAC protocol coordinates the transmissions of the network stations and, at the same
time, manages the congestion situations that may occur inside the network. The congestion
level in the network negatively affects both the channel utilization (i.e., the fraction
of channel bandwidth used by successfully transmitted messages) and the energy consumed
to successfully transmit a message. Specifically, each collision reduces the channel
bandwidth and the battery capacity available for successful transmissions. To decrease
the collision probability, the IEEE 802.11 protocol uses a CSMA/CA protocol based on a
truncated binary exponential backoff scheme that doubles the backoff window after each
collision [1], [2]. However, the time spreading of the accesses that the standard backoff
procedure accomplishes has a negative impact on both the channel utilization and the
energy consumption. Specifically, the time spreading of the accesses can introduce large
delays in the message transmissions, and energy wastage due to carrier sensing.
Furthermore, the IEEE 802.11 policy has to pay the cost of collisions to increase the
backoff time when the network is congested.

M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 435-462, 2002.
© Springer-Verlag Berlin Heidelberg 2002

In [5], [6] and [7], given the binary exponential backoff scheme adopted by the standard,
solutions have been proposed for a more uniform distribution of accesses. The most
promising direction for improving the backoff protocols is to adopt feedback-based tuning
algorithms that exploit the information retrieved from the observation of the channel
status [8], [9], [10]. For the IEEE 802.11 MAC protocol, some authors have proposed an
adaptive control of the network congestion by investigating the number of users in the
system [11], [12], [13]. This investigation can be expensive, difficult to carry out, and
subject to significant errors, especially in high contention situations [12]. Distributed
(i.e., independently executed by each station) strategies for power saving have been
proposed and investigated in [14], [15]. Specifically, in [14] the authors propose a
power controlled wireless MAC protocol based on a fine-tuning of the network interface
transmitting power. [15] extends the algorithm presented in [7] with power saving
features.
This paper presents and evaluates a distributed mechanism for contention control in IEEE
802.11 WLANs that extends the standard access mechanism without requiring any additional
hardware. Our mechanism dynamically adapts the backoff window size to the current network
contention level, and guarantees that an IEEE 802.11 WLAN asymptotically achieves its
optimal channel utilization and/or the minimum energy consumption. For this reason we
named our mechanism Asymptotical Optimal Backoff (AOB). To tune the parameters of our
mechanism we analytically studied the bandwidth and energy consumption of the IEEE 802.11
standard, and we derived closed formulas that relate the protocol backoff parameters to
the maximum throughput and to the minimal energy consumption.
Our analytical study of IEEE 802.11 performance is based on a p-persistent model of the
IEEE 802.11 protocol [11], [15]. This protocol model differs from the standard protocol
only in the selection of the backoff interval. Instead of the binary exponential backoff
used in the standard, the backoff interval of the p-persistent IEEE 802.11 protocol is
sampled from a geometric distribution with parameter p. In [11], it was shown that a
p-persistent IEEE 802.11 closely approximates the standard protocol.
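As an illustration of the geometric backoff used by the p-persistent model, the following
minimal Python sketch samples a backoff interval by letting a station decide, slot by slot,
to transmit with probability p. The function name and the use of Python's `random` module
are assumptions made for this example; they do not come from [11] or from the standard.

```python
import random


def geometric_backoff(p: float, rng: random.Random) -> int:
    """Return the number of idle slots waited before a transmission attempt.

    Hypothetical helper: in every slot the station transmits with
    probability p, so the waiting time is geometrically distributed
    with mean (1 - p) / p slots.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    slots = 0
    # Keep deferring while the per-slot coin flip says "do not transmit".
    while rng.random() >= p:
        slots += 1
    return slots
```

With p = 0.1, for example, the average sampled backoff is about 9 slots, which is the
mean (1 - p) / p of the geometric distribution.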
In this paper, we use the p-persistent model to derive analytical formulas for the IEEE
802.11 protocol capacity and energy consumption. From these formulas we compute the p
value (i.e., the average backoff window size) corresponding to maximum channel utilization
(optimal capacity p, also referred to as $p_{opt}^{C}$), and the p value corresponding to
minimum energy consumption (optimal energy p, also referred to as $p_{opt}^{E}$). The
properties of the optimal operating points (from both the efficiency and the power saving
standpoint) are investigated in depth. In addition, we provide closed formulas for the
optimal p values. These formulas are used by AOB to dynamically tune the WLAN backoff
parameters either to maximize WLAN efficiency, or to minimize WLAN energy consumption.
Via simulation, we compared the performance figures of the IEEE 802.11 protocol with and
without our backoff tuning algorithm. The results indicate that the enhanced protocol
significantly improves on the standard protocol and (in all the configurations analyzed)
produces performance figures very close to the theoretical limits.
2. IEEE 802.11
The IEEE 802.11 standard defines a MAC layer and a Physical Layer for WLANs. The MAC layer
provides both contention-based and contention-free access control on a variety of physical
layers. The standard provides two Physical layer specifications for radio (Frequency
Hopping Spread Spectrum, Direct Sequence Spread Spectrum), operating in the 2,400-2,483.5
MHz band (depending on local regulations), and one for infrared. The Physical Layer
provides the basic rates of 1 Mbit/s and 2 Mbit/s. Two projects are currently ongoing to
develop higher-speed PHY extensions to 802.11 operating in the 2.4 GHz band (project
802.11b, handled by TGb) and in the 5 GHz band (project 802.11a, handled by TGa), see [16]
and [17].
The basic access method in the IEEE 802.11 MAC protocol is the Distributed Coordination
Function (DCF), which is a Carrier Sense Multiple Access with Collision Avoidance
(CSMA/CA) MAC protocol. In addition to the DCF, the IEEE 802.11 also incorporates an
alternative access method known as the Point Coordination Function (PCF), an access method
that is similar to a polling system and uses a point coordinator to determine which
station has the right to transmit. In this section we only present the aspects of the DCF
access method relevant to the scope of this paper. For a detailed explanation of the IEEE
802.11 standard we refer interested readers to [1], [2], [18].
The DCF access method is based on a CSMA/CA MAC protocol. This protocol requires that
every station, before transmitting, performs a Carrier Sensing activity to determine the
state of the channel (idle or busy). If the medium is found to be idle for an interval
that exceeds the Distributed InterFrame Space (DIFS), the station continues with its
transmission. If the medium is busy, the transmission is deferred until the ongoing
transmission terminates, and a Collision Avoidance mechanism is adopted.
The IEEE 802.11 Collision Avoidance mechanism is a Binary Exponential Backoff scheme [1],
[19], [20], [21]. According to this mechanism, a station selects a random interval, named
backoff interval, that is used to initialize a backoff counter. The backoff counter is
decreased as long as the channel is sensed idle, stopped when a transmission is detected
on the channel, and reactivated when the channel is sensed idle again for more than a
DIFS. A station transmits when its backoff counter reaches zero.
When the channel is idle, time is measured in constant-length units (Slot_Time), referred
to as slots in the following. The backoff interval is an integer number of slots and its
value is uniformly chosen in the interval (0, CW_Size-1), where CW_Size is, in each
station, a local parameter that defines the current station Contention Window size.
Specifically, the backoff value is defined by the following expression [1]:

$$\text{Backoff\_Counter} = \mathrm{INT}\big(\mathrm{Rnd}() \cdot \text{CW\_Size}\big),$$

where Rnd() is a function which returns pseudo-random numbers uniformly distributed in
[0..1].
The Binary Exponential Backoff is characterized by the expression that gives the
dependency of the CW_Size parameter on the number of unsuccessful transmission attempts
(N_A) already performed for a given frame. In [1] it is defined that the first
transmission attempt for a given frame is performed adopting CW_Size equal to the minimum
value CW_Size_min (assuming low contention). After each unsuccessful (re)transmission of
the same frame, the station doubles CW_Size until it reaches the maximal value fixed by
the standard, i.e., CW_Size_MAX, as follows:

$$\text{CW\_Size}(N\_A) = \min\big(\text{CW\_Size\_MAX},\; \text{CW\_Size\_min} \cdot 2^{(N\_A - 1)}\big).$$
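The two expressions above can be sketched in a few lines of Python. This is only an
illustrative sketch: the function names are hypothetical, the window limits are the values
listed in Table 1, and N_A is assumed to count attempts starting from 1 for the first
attempt, as the CW_Size expression implies.

```python
import random

CW_SIZE_MIN = 16    # minimum contention window (value from Table 1)
CW_SIZE_MAX = 1024  # maximum contention window (value from Table 1)


def cw_size(n_a: int) -> int:
    """CW_Size(N_A) = min(CW_Size_MAX, CW_Size_min * 2**(N_A - 1))."""
    if n_a < 1:
        raise ValueError("n_a is assumed to start at 1 for the first attempt")
    return min(CW_SIZE_MAX, CW_SIZE_MIN * 2 ** (n_a - 1))


def backoff_counter(n_a: int, rng: random.Random) -> int:
    """Backoff_Counter = INT(Rnd() * CW_Size), expressed in slots."""
    # rng.random() lies in [0, 1), so the result lies in 0 .. CW_Size - 1.
    return int(rng.random() * cw_size(n_a))
```

With these values the window grows as 16, 32, 64, ... and saturates at 1024 from the
seventh attempt onwards.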
Positive acknowledgements are employed to ascertain a successful transmission. This is
accomplished by the receiver (immediately following the reception of the data frame),
which initiates the transmission of an acknowledgement frame (ACK) after a time interval
Short Inter Frame Space (SIFS), which is less than DIFS.
If the transmission generates a collision¹, the CW_Size parameter is doubled for the new
scheduling of the retransmission attempt, thus obtaining a further reduction of
contention.
The increase of the CW_Size parameter value after a collision is the reaction that the
802.11 standard DCF provides to make the access mechanism adaptive to channel conditions.
2.1 IEEE 802.11 Congestion Reaction
Fig. 1 shows simulation data regarding the channel utilization of a standard 802.11 system
running in DCF mode, with respect to the contention level, i.e., the number of active
stations with continuous transmission requirements. The parameters adopted in the
simulation, presented in Table 1, refer to the Frequency Hopping Spread Spectrum
implementation [1].
Fig. 1 plots the channel utilization versus the number of active stations obtained
assuming asymptotic conditions, i.e., all the stations always have a frame to transmit. By
analyzing the behavior of the 802.11 DCF mechanism some problems can be identified.
Specifically, the results presented in the figure show that the channel utilization is
negatively affected by the increase in the contention level.
These results can be explained as follows: in the IEEE 802.11 backoff algorithm, a station
selects the initial size of the Contention Window by assuming a low level of congestion in
the system. This choice avoids long access delays when the load is light. Unfortunately,
this choice causes efficiency problems in bursty arrival scenarios, and in congested
systems, because it concentrates the accesses in a reduced time window, and hence it may
cause a high collision probability.

¹ A collision is assumed whenever the ACK from the receiver is missing.

Table 1: System's physical parameters

Parameter                    Value
Number of Stations (M)       variable from 2 to 200
CW_Size_min                  16
CW_Size_MAX                  1024
Channel transmission rate    2 Mb/s
Payload size                 Geometric distribution (parameter q)
Acknowledgement size         200 μsec (50 Bytes)
Header size                  136 μsec (34 Bytes)
Slot Time (t_slot)           50 μsec
SIFS                         28 μsec
DIFS                         128 μsec
Propagation time             < 1 μsec

[Figure: Channel Utilization of the Standard 802.11 DCF. Channel utilization versus number
of active stations (20 to 200), with one curve each for an average payload of 2.5, 50, and
100 slot times.]
Fig. 1. Channel utilization of the IEEE 802.11 DCF access scheme

In high-congestion conditions each station reacts to the contention on the basis of the
collisions experienced so far while transmitting a frame. Every station performs its
attempts blindly, and reacts to collisions only after they occur (by increasing CW_Size).
Each increase of CW_Size is obtained by paying the cost of a collision. It is worth noting
that, as a collision detection mechanism is not implemented in the IEEE 802.11, a
collision implies that the channel is not available for the time required to transmit the
longest colliding packet. Furthermore, after a successful transmission CW_Size is set
again to the minimum value without maintaining any knowledge of the current contention
level. To summarize, the IEEE 802.11 backoff mechanism has two main drawbacks: i) the
increase of CW_Size is obtained by paying the cost of a collision, and ii) after a
successful transmission no state information indicating the actual contention level is
maintained.
3. Low-Cost Dynamic Tuning of the Backoff Window Size
The drawbacks of the IEEE 802.11 backoff algorithm, explained in the previous section,
indicate the direction for improving the performance of a random access scheme: exploiting
the information on the current network congestion level that is already available at the
MAC level. Specifically, the utilization rate of the slots (Slot Utilization) observed on
the channel by each station is used as a simple and effective estimate of the channel
congestion level. The estimate of the Slot Utilization must be frequently updated. For
this reason, an estimate was proposed in [7] that each station updates in every backoff
interval, i.e., the defer phase that precedes a transmission attempt.
A simple and intuitive definition of the slot utilization estimate is then given by:

$$\text{Slot\_Utilization} = \frac{\text{Num\_Busy\_Slots}}{\text{Num\_Available\_Slots}},$$

where Num_Busy_Slots is the number of slots, in the backoff interval, in which a
transmission attempt starts, hereafter referred to as busy slots (a transmission attempt
can be either a successful transmission or a collision), and Num_Available_Slots is the
total number of slots available for transmission in the backoff interval, i.e., the sum of
idle and busy slots.
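A minimal Python sketch of this estimate is given below. It assumes that carrier sensing
yields one boolean flag per slot of the backoff interval (True when a transmission attempt
starts in that slot); this representation, the function name, and the value returned for
an empty interval are illustrative assumptions, not part of [7].

```python
from typing import Iterable


def slot_utilization(slot_is_busy: Iterable[bool]) -> float:
    """Slot_Utilization = Num_Busy_Slots / Num_Available_Slots.

    slot_is_busy holds one flag per slot observed in the backoff
    interval; idle and busy slots together are the available slots.
    """
    flags = list(slot_is_busy)
    if not flags:
        # Assumption: with no observed slots there is no evidence of
        # congestion, so the estimate defaults to 0.
        return 0.0
    return sum(flags) / len(flags)
```

For instance, a backoff interval in which two of four observed slots carried a
transmission attempt gives an estimate of 0.5.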
In the 802.11 standard mechanism every station performs a Carrier Sensing activity and
thus the proposed slot utilization (S_U) estimate is simple to obtain. The information
required to estimate S_U is already available to an IEEE 802.11 station, and no additional
hardware is required.
The current S_U estimate can be utilized by each station to evaluate (before trying a
"blind" transmission) the opportunity to perform, or to defer, its scheduled transmission
attempt. If a station knows that the probability of a successful transmission is low, it
should defer its transmission attempt. Such a behavior can be achieved in an IEEE 802.11
network by exploiting the DCC mechanism proposed in [7]. According to DCC, each IEEE
802.11 station performs an additional control (beyond carrier sensing and backoff
algorithm) before any transmission attempt. This control is based on a new parameter named
Probability of Transmission P_T(...) whose value depends on the current contention level
of the channel, i.e., S_U. The heuristic formula proposed in [7] for P_T(...) is:

$$P\_T(S\_U, N\_A) = 1 - S\_U^{\,N\_A},$$

where, by definition, S_U assumes values in the interval [0,1], and N_A is the number of
attempts already performed by the station for the transmission of the current frame.
The N_A parameter is used to partition the set of active stations in such a way that each
subset of stations is associated with a different level of privilege to access the
channel. Stations that have performed several unsuccessful attempts have the highest
transmission privilege [7].
The P_T parameter makes it possible to filter the transmission accesses. When, according
to the standard protocol, a station is authorized to transmit (i.e., the backoff counter
is zero and the channel is idle), in the protocol extended with the Probability of
Transmission the station performs a real transmission with probability P_T; otherwise
(i.e., with probability 1-P_T) the transmission is re-scheduled as if a collision had
occurred, i.e., a new backoff interval is sampled.
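The filtering step can be sketched in Python as follows, using the heuristic formula
P_T(S_U, N_A) = 1 - S_U^N_A from [7]. The function names and the use of a seeded
`random.Random` generator are assumptions made for this example.

```python
import random


def p_t(s_u: float, n_a: int) -> float:
    """Heuristic Probability of Transmission: P_T = 1 - S_U ** N_A."""
    if not 0.0 <= s_u <= 1.0:
        raise ValueError("S_U must lie in [0, 1]")
    return 1.0 - s_u ** n_a


def dcc_allows_transmission(s_u: float, n_a: int, rng: random.Random) -> bool:
    """Decide what an authorized station does (hypothetical helper).

    True: the station really transmits (probability P_T).
    False: the attempt is re-scheduled as after a collision, i.e. the
    caller samples a new backoff interval (probability 1 - P_T).
    """
    return rng.random() < p_t(s_u, n_a)
```

With S_U = 0.8, a station at its first attempt (N_A = 1) transmits with probability 0.2,
while a station at its eighth attempt transmits with probability of about 0.83.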
To better understand the relationship between the P_T definition and the network
congestion level, consider Fig. 2.
[Figure: P_T (0 to 1) versus S_U (0 to 1), with one curve each for N_A = 1, 2, 4, and 8.]
Fig. 2: DCC Probability of Transmission
In the figure we show the P_T curves (for users with different N_A) with respect to the
estimated S_U values. Assuming a slot utilization close to zero, we can observe that each
station, independently of its number of performed attempts, obtains a Probability of
Transmission close to 1. This means that the proposed mechanism has no effect on the
system, and each user performs its accesses just like in the standard access scheme,
without any additional contention control. This point is significant as it implies the
absence of overhead introduced in low-load conditions. The differences in the users'
behavior as a function of their levels of privilege (related to the value of the N_A
parameter) appear when the slot utilization grows. For example, assuming a slot
utilization close to 1, say 0.8, we observe that the stations with the highest N_A value
obtain a Probability of Transmission close to 1, while stations at the first transmission
attempt transmit with a probability equal to 0.2.
[Figure: P_T (0 to 1) versus S_U (0 to 1), with one curve each for N_A = 1, 2, 4, and 8;
opt_S_U = 0.80.]
Fig. 3: Generalized Probability of Transmission
It is worth noting a property of the DCC mechanism: the slot utilization of the channel
never reaches the value 1. Assuming S_U near or equal to 1, the DCC mechanism reduces the
Probabilities of Transmission for all stations close to zero, thus reducing the network
contention level. This effect is due to the P_T definition, and in particular to the
explicit presence of the upper bound 1 for the slot utilization estimate. The DCC choice
to use 1 as the asymptotical limit for the S_U is heuristic, and it is induced by the lack
of knowledge of the optimal upper bound for the S_U value (opt_S_U) that guarantees the
maximum channel utilization. It is worth noting that, if opt_S_U is known, the P_T
mechanism can be easily tuned to guarantee that the maximum channel utilization is
achieved. Intuitively, if the slot-utilization boundary value (i.e., the value 1 for DCC)
is replaced by the opt_S_U value, all the probabilities of transmission are reduced to
zero for slot utilization values greater than or equal to opt_S_U. This can be achieved by
generalizing the definition of the Probability of Transmission:

$$P\_T(opt\_S\_U, S\_U, N\_A) = 1 - \min\left(1, \frac{S\_U}{opt\_S\_U}\right)^{N\_A}. \tag{1}$$
Specifically, by applying this definition of the transmission probability we obtain the
P_T curves shown in Fig. 3. These curves have been obtained by applying the generalized
P_T definition with opt_S_U = 0.80. As expected, the curves indicate the effectiveness of
the generalized P_T definition in providing an S_U bounding that changes with the opt_S_U
value.
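Equation (1) can be evaluated with the short Python sketch below; the function name and
the argument checks are assumptions made for this example.

```python
def generalized_p_t(opt_s_u: float, s_u: float, n_a: int) -> float:
    """Equation (1): P_T = 1 - min(1, S_U / opt_S_U) ** N_A."""
    if not 0.0 < opt_s_u <= 1.0:
        raise ValueError("opt_S_U must lie in (0, 1]")
    if not 0.0 <= s_u <= 1.0:
        raise ValueError("S_U must lie in [0, 1]")
    return 1.0 - min(1.0, s_u / opt_s_u) ** n_a
```

Setting opt_S_U = 1 gives back the original DCC formula, while with opt_S_U = 0.80 every
station obtains a zero Probability of Transmission as soon as S_U reaches 0.80, which is
the behavior of the curves in Fig. 3.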
The generalized Probability of Transmission provides an effective tool for controlling, in
an optimal way, the congestion inside an IEEE 802.11 WLAN provided that the opt_S_U value
is known. In the next section we will show how this optimal value can be computed. The
computation is based on the exploitation of the results derived in [11].
4. Protocol Model
Consider a system with M active stations accessing a slotted multiple-access channel. The
random access protocol for controlling this channel can be either an S-Aloha or a
p-persistent CSMA algorithm. In the first case (Aloha) the stations transmit
constant-length messages with length l that exactly fits in a slot. In the latter case
(CSMA), the message length is a random variable L, with average l. Hereafter, to simplify
the presentation we will assume that L values always correspond to an integer number of
slots. In both cases, when a transmission attempt is completed (successfully or with a
collision), each network station with packets ready for transmission (hereafter backlogged
station) will start a transmission attempt with probability p.
To derive a closed formula of the channel utilization, say U, for the p-persistent CSMA
protocols we observe the channel between two consecutive successful transmissions. Let us
denote with $t_i$ the time between the (i-1)th successful transmission and the ith
successful transmission, and with $s_i$ the duration of the ith successful transmission.
Hence, the channel utilization is simply obtained:

$$U = \lim_{n \to \infty} \frac{s_1 + s_2 + \cdots + s_n}{t_1 + t_2 + \cdots + t_n}. \tag{2}$$
By dividing both numerator and denominator of Equation (2) by n, after some algebraic
manipulations, it follows that:

$$U = \frac{E[S]}{E[T]}, \tag{3}$$

where $E[S]$ is the average duration of a successful transmission, and $E[T]$ is the
average time between two successful transmissions. The $E[T]$ formula is obtained in a
straightforward way by considering the behavior of a CSMA protocol. Specifically, before
successfully transmitting a message, a station will experience $E[N_c]$ collisions, on
average. Furthermore, each transmission is preceded by an idle time, say $E[Idle]$, during
which a station listens to the channel. Therefore, we can write:

$$E[T] = \big(E[N_c] + 1\big) \cdot E[Idle] + E[N_c] \cdot E[Coll \mid Coll] + E[S], \tag{4}$$
where $E[Coll \mid Coll]$ is the average duration of a collision, given that a collision
occurs. Finally, the unknown quantities in Equation (4) are derived in Lemma 1 under the
following assumptions: i) the channel idle time is divided into fixed time lengths, say
$t_{slot}$; ii) all the stations adopt a p-persistent CSMA algorithm to access the
channel; iii) all the stations operate in saturation conditions, i.e., they always have a
message waiting to be transmitted; iv) the message lengths, say $l_i$, are independent and
identically distributed random variables.
Lemma 1. In a network with M stations operating according to assumptions i) to iv) above,
by denoting with $\Pr\{L \le h\}$ the probability that the time occupied by the
transmission of the message L is less than or equal to $h \cdot t_{slot}$:

$$E[Idle] = \frac{(1-p)^M}{1 - (1-p)^M}, \tag{5}$$

$$E[N_c] = \frac{1 - (1-p)^M}{M p (1-p)^{M-1}} - 1, \tag{6}$$

$$E[Coll \mid Coll] = \frac{t_{slot}}{1 - (1-p)^M - M p (1-p)^{M-1}} \sum_{h=1}^{\infty} h \Big\{ \big[1 - (1 - \Pr\{L \le h\})\,p\big]^M - \big[1 - (1 - \Pr\{L < h\})\,p\big]^M - M \big[\Pr\{L \le h\} - \Pr\{L < h\}\big]\, p (1-p)^{M-1} \Big\}. \tag{7}$$
Proof. The proof is obtained with standard probabilistic considerations (see [11]). ∎
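As a numerical companion to Lemma 1, the following Python sketch evaluates Equations (5),
(6) and (7) and combines them through Equations (3) and (4). It rests on several
assumptions that are not in the original text: time is measured in slot units
(t_slot = 1), the message length has a finite support given as a dictionary mapping h to
Pr{L = h}, E[S] is taken equal to the mean message length (protocol overheads are
ignored), and M is at least 2 with 0 < p < 1. All function names are hypothetical.

```python
from typing import Dict


def e_idle(p: float, m: int) -> float:
    """Equation (5): average number of idle slots before an attempt."""
    q = (1.0 - p) ** m
    return q / (1.0 - q)


def e_nc(p: float, m: int) -> float:
    """Equation (6): average number of collisions before a success."""
    return (1.0 - (1.0 - p) ** m) / (m * p * (1.0 - p) ** (m - 1)) - 1.0


def e_coll(p: float, m: int, pmf: Dict[int, float], t_slot: float = 1.0) -> float:
    """Equation (7): average collision length, given that a collision occurs."""
    one_tx = m * p * (1.0 - p) ** (m - 1)          # exactly one station transmits
    two_or_more = 1.0 - (1.0 - p) ** m - one_tx    # at least two stations transmit
    total = 0.0
    cdf_prev = 0.0                                  # Pr{L < h}
    for h in sorted(pmf):
        cdf = cdf_prev + pmf[h]                     # Pr{L <= h}
        term = ((1.0 - (1.0 - cdf) * p) ** m
                - (1.0 - (1.0 - cdf_prev) * p) ** m
                - m * pmf[h] * p * (1.0 - p) ** (m - 1))
        total += h * term
        cdf_prev = cdf
    return t_slot * total / two_or_more


def utilization(p: float, m: int, pmf: Dict[int, float]) -> float:
    """Equations (3) and (4), with E[S] assumed equal to the mean length."""
    e_s = sum(h * w for h, w in pmf.items())
    n_c = e_nc(p, m)
    e_t = (n_c + 1.0) * e_idle(p, m) + n_c * e_coll(p, m, pmf) + e_s
    return e_s / e_t
```

When every message lasts exactly one slot (pmf = {1: 1.0}), the sketch gives
E[Coll | Coll] = 1 and U = M p (1-p)^(M-1), which is the slotted-Aloha throughput; values
of h with zero probability contribute nothing to the sum in Equation (7), so iterating
over the support is sufficient.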
$E[S]$ is a constant that is independent of the p value and depends only on the message
length distribution. As it appears from Equation (4) and Lemma 1, the channel utilization
is a function of the protocol parameter p, the number M of active stations and the message
length distribution. The protocol capacity, say $U_{MAX}$, is obtained by finding the p
value, say $p_{opt}^{C}$, that maximizes Equation (3). Since $E[S]$ is a constant, the
utilization-maximization problem is equivalent to the following minimization problem:

$$\min_{p \in [0,1]} \Big\{ \big(E[N_c] + 1\big) \cdot E[Idle] + E[N_c] \cdot E[Coll \mid Coll] \Big\}. \tag{8}$$
For instance, for the Slotted-ALOHA access scheme the $p_{opt}^{C}$ value is calculated by
considering in Equation (8) constant messages with length 1 $t_{slot}$. By solving
Equation (8) we obtain that $p_{opt}^{C} = 1/M$ and $U_{MAX} \to e^{-1}$ for $M \gg 1$.
However, from Equation (8), it is not possible to derive a general exact closed formula
for the $p_{opt}^{C}$ value in the general message-length distribution case.
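The Slotted-ALOHA result can be checked numerically with the Python sketch below, which
minimizes the objective of Equation (8) by a simple grid search. The grid search, the
function names, and the choice of measuring time in slot units are assumptions made for
this example; with one-slot messages every collision lasts exactly one slot, so
E[Coll | Coll] = 1 and E[S] = 1.

```python
import math


def aloha_cost(p: float, m: int) -> float:
    """Objective of Equation (8) when all messages last exactly one slot."""
    q = (1.0 - p) ** m
    success = m * p * (1.0 - p) ** (m - 1)
    if success == 0.0:
        return math.inf          # numerical underflow guard for p close to 1
    idle = q / (1.0 - q)                     # Equation (5)
    n_c = (1.0 - q) / success - 1.0          # Equation (6)
    coll = 1.0                               # Equation (7) with one-slot messages
    return (n_c + 1.0) * idle + n_c * coll


def best_p(m: int, steps: int = 10000) -> float:
    """Grid search for the p value that minimizes Equation (8)."""
    grid = (k / steps for k in range(1, steps))
    return min(grid, key=lambda p: aloha_cost(p, m))


def u_max(m: int) -> float:
    """Equation (3) at the optimum, with E[S] = 1 slot."""
    return 1.0 / (aloha_cost(best_p(m), m) + 1.0)
```

For M = 50 the search returns a value of p close to 1/50 = 0.02 and a capacity of about
0.37, in line with the limit $e^{-1} \approx 0.368$.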
Due to the choice of $l = 1$, collisions and successful transmissions always last a single
$t_{slot}$. However, for a p-persistent CSMA protocol with a general message-length
distribution, the protocol capacity depends on the message length; hence, $p_{opt}^{C}$
also depends on the traffic characteristics. The $p_{opt}^{C}$ value can be computed for
any network configuration by numerically maximizing Equation (3). Hence, if a station has
exact knowledge of the network configuration, it can tune its backoff algorithm to achieve
a channel utilization close to the protocol capacity. Unfortunately, in a real case a
station does not have this knowledge and can only estimate it. Using this estimate, the
station can in principle compute the $p_{opt}^{C}$ value at run time. However, the
$p_{opt}^{C}$ computation obtained by the numerical maximization of Equation (3) is too
complex to be executed at run time. Furthermore, the numerical maximization of Equation
(3) does not provide a closed formula for the optimal working conditions, and hence it
does not provide any insight into the optimal protocol behavior. Hence, Equation (3) can
be adopted to derive the optimal capacity state in an off-line analysis, but it would be
convenient to derive a simpler relationship that provides an approximation of the
$p_{opt}^{C}$ value and guarantees a quasi-optimal capacity state.
4.1 Protocol Capacity: An Approximate Analysis
Equation (8) can be adopted to derive the optimal capacity state in an off-line analysis,
but it is important to derive a simpler relationship to provide an approximation of the
p value corresponding to optimal capacity, i.e. $p_{opt}^{C}$. In [22], the network operating
point in which the time wasted on idle periods is equal to the time spent on collisions
was identified as the condition to obtain the maximum protocol capacity in ALOHA
and CSMA protocols where all messages require one time slot for transmission. In
[12] the same network operating point was proposed to determine a quasi-optimal
capacity state in the p-persistent IEEE 802.11 protocol where the message length was
sampled from a geometric distribution. This condition can be expressed with the
following relationship:

E[Idle\_p] = E[Coll | N_{tr} \ge 1] ,   (9)

where $E[Coll | N_{tr} \ge 1]$ is the average collision length given that at least one
transmission occurs.
Equation (9) was proposed in previous papers using heuristic considerations.
Specifically, it is straightforward to observe that $E[Idle\_p]$ is a decreasing function of
the p value, whereas $E[Coll | N_{tr} \ge 1]$ is an increasing function of the p value.
Therefore, Equation (9) suggests that a quasi-optimal capacity state is achieved when each
station behaves in such a way as to balance these two conflicting costs.
Hereafter, first, we give an analytical justification and a numerical validation of the
above heuristic. Second, we provide closed formulas for $p_{opt}^{C}$, and we show that
the optimal capacity state, given the message length distribution, is characterized by
an invariant figure: the $M \cdot p_{opt}^{C}$ product.
[Figure: relative error (logarithmic axis) vs. message length (time slots); curves (popt) and (capacity) for M=10 and M=100.]
Fig. 4: Relative errors for deterministic message-length distribution
4.2 A Balancing Equation to Derive the Optimal Capacity State
Equation (9) presents a heuristic that was numerically validated. In this section we
analytically investigate its validity. To make tractable the problem defined by
Equation (5) in the general message-length distribution case, we assume that
$E[Coll | Coll] = C = \max\{l_1, l_2\}$ (see footnote 2). According to this assumption,
substituting formulas (5) and (6) in Equation (8), after some algebraic manipulations,
it follows that the $p_{opt}^{C}$ value is calculated by solving:

\min_{p \in [0,1]} \left\{ \frac{C - (1-p)^{M} (C-1)}{M p (1-p)^{M-1}} \right\} = \min_{p \in [0,1]} \{ F(p, M, C) \} .   (10)

As shown by the following lemma, the solution of Equation (10) is often equivalent
to the solution of Equation (9).

2 This approximation is motivated by considering that: i) the collision probability is low
when the network is close to its optimal capacity state, and ii) given a collision, the
probability that more than two stations collide is negligible (as shown in [11] for the
p-persistent IEEE 802.11 protocol).
Lemma 2. For $M \gg 1$, the p value that satisfies Equation (10) can be obtained by
solving the following equation:

E[Idle\_p] = E[Coll | N_{tr} \ge 1] ,   (11)

Proof. The proof is reported in Appendix A. □
Lemma 2 shows that, asymptotically ($M \gg 1$), in p-persistent CSMA protocols the
optimal capacity state is characterized by the balancing between collision durations
and idle times. To verify that this relationship also holds for small and medium M
values, we numerically solved both Equation (4) and Equation (9) for a wide range of M
values and several message length distributions. Specifically, in Fig. 4 and 5 we
show, for several values of the average message length (l), the relative error (see
footnote 3) between the optimal p value and the p value that solves Equation (7)
(curves tagged popt), and the corresponding relative error on the capacity (curves
tagged capacity).
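A comparison of this kind can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the procedure used for Fig. 4 and 5. It takes F(p, M, C) as read from Equation (10), and it assumes that the number of transmitting stations in a slot is binomial, so that the balancing condition becomes $(1-p)^M = C [1 - (1-p)^M - M p (1-p)^{M-1}]$. All function names are hypothetical.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # golden-ratio conjugate, about 0.618


def cost_F(p, M, C):
    """F(p, M, C) as read from Equation (10)."""
    denom = M * p * (1.0 - p) ** (M - 1)
    if denom == 0.0:
        return math.inf  # guards against underflow for p close to 1
    return (C - (1.0 - p) ** M * (C - 1.0)) / denom


def argmin_F(M, C, tol=1e-12):
    """Golden-section search on (0, 1); assumes F is unimodal there."""
    a, b = 0.0, 1.0
    x1 = b - INV_PHI * (b - a)
    x2 = a + INV_PHI * (b - a)
    f1, f2 = cost_F(x1, M, C), cost_F(x2, M, C)
    while b - a > tol:
        if f1 < f2:  # the minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - INV_PHI * (b - a)
            f1 = cost_F(x1, M, C)
        else:  # the minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + INV_PHI * (b - a)
            f2 = cost_F(x2, M, C)
    return 0.5 * (a + b)


def balance_gap(p, M, C):
    """Idle cost minus collision cost, both scaled by P{N_tr >= 1}
    (assumed binomial model for the number of transmitters)."""
    idle = (1.0 - p) ** M                # P{N_tr = 0}
    one = M * p * (1.0 - p) ** (M - 1)   # P{N_tr = 1}
    return idle - C * (1.0 - idle - one)


def balance_root(M, C, iters=100):
    """Bisection: the gap is 1 at p = 0 and -C at p = 1."""
    a, b = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if balance_gap(mid, M, C) > 0.0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


def relative_error_on_p(M, C):
    """Error of the balancing solution, normalized to the exact minimizer."""
    p_exact = argmin_F(M, C)
    return abs(p_exact - balance_root(M, C)) / p_exact


if __name__ == "__main__":
    for C in (2.0, 10.0, 100.0):
        for M in (10, 100):
            print(C, M, relative_error_on_p(M, C))
```

Under these assumptions the two solutions can be compared for any M and C. The golden-section search is one possible choice of minimizer; any one-dimensional method would do.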
[Figure: relative error (logarithmic axis) vs. message length (time slots); curves (popt) and (capacity) for M=10 and M=100.]
Fig. 5: Relative errors for geometric message-length distribution
4.3 Protocol Capacity: A Closed Formula
Equation (9) allows us to evaluate a quasi-optimal p value by simply determining
the p value that balances the $E[Idle\_p]$ and $E[Coll | N_{tr} \ge 1]$ costs. By exploiting
this approach it is possible to perform the channel-utilization maximization at run-time.
Indeed, each station, by exploiting the carrier sensing mechanism, is able to distinguish
idle periods from collisions and from successful transmissions. Furthermore, as shown
in this subsection, from Equation (10) it is easy to derive an approximated closed
formula for the $p_{opt}^{C}$ value. Note that this closed formula generalizes results already
known for S-ALOHA networks [23].

3 The relative error is defined as the error between the exact value and its approximation,
normalized to the exact value.
Lemma 3. In an M-station network that adopts a p-persistent CSMA access scheme,
in which the message lengths {L_i}, normalized to $t_{slot}$, are a sequence of i.i.d. random
variables, if the stations operate in asymptotic conditions, under the condition
$Mp \ll 1$, the $p_{opt}^{C}$ value is (see footnote 4)

p_{opt}^{C} \cong \frac{\sqrt{1 + 2 (C-1) \frac{M-1}{M}} - 1}{(M-1)(C-1)} ,   (12)

where $C = E[\max\{L_1, L_2\}]$.
PROOF. The proof is reported in Appendix B.
M \cdot p_{opt}^{C} \cong \frac{\sqrt{1 + 2 (C-1)} - 1}{C-1} ,   (13)

PROOF. The proof of this proposition is straightforward. Under the condition $M \gg 1$,
Equation (12) can be rewritten as (13) by noting that $(M-1) \approx M$. □

4 The longer the average message length, the more accurate this assumption is. See, for
example, the numerical results reported in Figures 4 and 5.
Remark. In an M-station network that adopts a p-persistent CSMA access scheme,
in which the message lengths {L_i}, normalized to $t_{slot}$, are a sequence of i.i.d. random
variables, under the condition $C \gg 1$ the $p_{opt}^{C}$ value is

p_{opt}^{C} \cong \frac{1}{M \sqrt{C/2}} .   (14)
Results presented in this section indicate that the $M \cdot p_{opt}^{C}$ product can be easily
computed by estimating only the average collision length. The average collision length
estimates are easy to obtain and reliable if the average collision length does not change
frequently and sharply. In addition, these results indicate that the estimation of the
number of active stations is not necessary for driving the system to its optimal
capacity state.
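The closed formulas of this section are cheap to evaluate. The sketch below is an illustration only: it encodes Equations (12), (13) and (14) as reconstructed from the text (the square-root form is our reading of the formulas), and the function names are hypothetical. Equations (12) and (13) are undefined at C = 1 exactly, where the limit of (12) is 1/M.

```python
import math


def p_opt_closed(M: int, C: float) -> float:
    """Equation (12): approximate optimal p for M >= 2 stations and C > 1."""
    k = C - 1.0
    return (math.sqrt(1.0 + 2.0 * k * (M - 1) / M) - 1.0) / ((M - 1) * k)


def mp_invariant(C: float) -> float:
    """Equation (13): the M * p_opt product for M >> 1; depends on C only."""
    k = C - 1.0
    return (math.sqrt(1.0 + 2.0 * k) - 1.0) / k


def p_opt_long_messages(M: int, C: float) -> float:
    """Equation (14): further simplification for C >> 1."""
    return 1.0 / (M * math.sqrt(C / 2.0))


if __name__ == "__main__":
    C = 50.0  # assumed average collision length, in slots
    for M in (10, 100, 200):
        print(M, M * p_opt_closed(M, C), mp_invariant(C),
              M * p_opt_long_messages(M, C))
```

Printing the three products side by side shows how closely the M-independent figure of Equation (13) tracks Equation (12) for a given C.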
5. Energy Consumption in P-Persistent CSMA Access Schemes
As explained in the introduction, the finite battery power represents the other greatest
limitation to the utility of portable computers [25], [26], [27]. The network interface
is one of the main system components from the battery consumption standpoint [28].
Hence, the energy used by the network interface to successfully transmit a message is
the other important figure for a mobile computer. Thus the target for our environment
would be both to maximize the network capacity and to minimize the network-interface
energy consumption. It may appear that these two targets cannot be achieved
together. It seems that to maximize the network capacity the users must be greedy,
i.e. transmit as much as possible. On the other hand, minimization of the
network-interface energy seems to indicate that the network should be lightly loaded, i.e.
accessed only sporadically. In this paper we will show that, for p-persistent
CSMA access schemes, the main performance figures (capacity and energy consumption)
are not orthogonal, i.e. the network state that optimizes one index is not far
from being optimal also for the other one. To show this result, we first need to
study our system from an energy consumption standpoint, i.e. to identify the
relationships that characterize the p value that minimizes the energy consumption.
5.1 An Analytical Model for the Energy Consumption
In this section we develop an analytical model to investigate the energy consumption.
From an energy consumption standpoint, the network interface alternates between two
different phases: the transmitting phase, during which it consumes power to transmit
the message to the physical channel, and the receiving phase, during which it
consumes power to listen to the physical channel. Hereafter, we denote with PTX and
PRX the power consumption (expressed in mW) of the network interface during the
transmitting and receiving phase, respectively.
We assume that the message lengths are i.i.d. random variables. An analytical
model for the energy consumption in the p-persistent IEEE 802.11 MAC protocol was
developed in [15].
Following the same line of reasoning as in Section 4 and focusing on the successful
transmission of a tagged station, we derive an analytical expression for the energy
drained by the network interface of a tagged station to successfully transmit a packet.
Hence, the system efficiency, from the energy standpoint, can be expressed as:

U_{energy} = \frac{PTX \cdot l \cdot t_{slot}}{E[Energy_{virtual\_transmission\_time}]} ,   (15)

where $E[Energy_{virtual\_transmission\_time}]$ is the energy consumed by the tagged station in
a tagged-station virtual transmission time (i.e., the time between two successful
transmissions of the tagged station) and l is the average message transmission time,
normalized to $t_{slot}$.
5.2 Energy Consumption: An Approximate Analysis
As for the throughput maximization, it is desirable to have a simpler relationship than
Equation (15) to provide an approximation for the $p_{opt}^{E}$ value. To this end, in the
following we will investigate the role of the various terms of Equation (15) in
determining the energy consumption. Specifically, in the energy consumption formula
we separate the terms that are increasing functions of the p value from the terms that
are decreasing functions of the p value.
To achieve this, it is useful to introduce the following proposition.
Proposition 2. In an M-station network that adopts a p-persistent CSMA access
scheme, if the stations operate in asymptotic conditions, $U_{energy}$ is equal to

U_{energy} = \frac{PTX \cdot l \cdot t_{slot}}{E[N_{ta}] \cdot E[Energy_{Idle\_p}] + PTX \cdot l \cdot t_{slot} + PRX \cdot (M-1) \cdot l \cdot t_{slot} + (E[N_{ta}] - M) \cdot E[Energy_{Coll} | N_{tr} \ge 1]} .   (16)
PROOF. The proof requires some algebraic manipulations of Equation (15). We first
observe that the number of successful transmissions in a virtual transmission time is
M. Furthermore, in the virtual transmission time there is exactly one successful
transmission of the tagged station, and on average there are (M − 1) successful
transmissions of the other stations. It is also straightforward to derive that the average
number of collisions that we observe within a virtual transmission time is
$(E[N_{ta}] - M)$, i.e., the total number of transmission attempts in a virtual
transmission time less the average number of successful transmissions, i.e. M. According to
these observations, Equation (16) is derived with standard probabilistic considerations. □
[Figure: energy consumption (logarithmic axis) vs. p value; curves E[Idle_p] and E[EnergyColl] for PTX/PRX=1.]
Fig. 6: $p_{opt}^{E}$ approximation with M = 10 and l = 2 $t_{slot}$.
From Proposition 2 it follows that $p_{opt}^{E}$ corresponds to the p value that minimizes
the denominator of Equation (16). It is also worth noting that the second and third
terms of this denominator do not depend on the p value and hence they play no role in
the minimization process. Now our problem reduces to finding

\min_{p} \{ E[N_{ta}] \cdot E[Energy_{Idle\_p}] + (E[N_{ta}] - M) \cdot E[Energy_{Coll} | N_{tr} \ge 1] \} .   (17)
Now let us note that the first term in Equation (17), i.e. $E[N_{ta}] \cdot E[Energy_{Idle\_p}]$,
after some algebraic manipulations, reduces to $[PRX \cdot t_{slot} \cdot (1-p)] / p$, and thus this
first term is a decreasing function of p. On the other hand, the second term in Equation
(17) is an increasing function of p. Hence, following the same arguments that led us to
propose the protocol-capacity approximation (see Equation (9)), we propose to
approximate the $p_{opt}^{E}$ value with the p value that balances the increasing and decreasing
costs of p:

E[N_{ta}] \cdot E[Energy_{Idle\_p}] = (E[N_{ta}] - M) \cdot E[Energy_{Coll} | N_{tr} \ge 1] .   (18)

Equation (18) can be re-written as:

E[Energy_{Idle\_p}] = E[Energy_{Coll} | N_{tr} \ge 1] \cdot P_{Coll | N_{tr} \ge 1} .   (19)

To ease the computation of $E[Energy_{Coll} | N_{tr} \ge 1]$ we subdivide the collisions in two
subsets depending on whether or not they involve the tagged station. Specifically,
Equation (19) can be re-written as:

E[Energy_{Idle\_p}] = E[Energy_{tag\_Coll} | tag\_Coll] \cdot P_{tag\_Coll | N_{tr} \ge 1} + E[Energy_{not\_tag\_Coll} | not\_tag\_Coll] \cdot P_{not\_tag\_Coll | N_{tr} \ge 1} .   (20)

Equation (20) defines a simple but approximate relationship to characterize $p_{opt}^{E}$.
Specifically, in Fig. 6 we have plotted $E[Energy_{Idle\_p}]$ and $E[Energy_{Coll} | N_{tr} \ge 1]$
versus the p value, for various PTX/PRX values. $E[Energy_{Idle\_p}]$ is equal to
$E[Idle\_p]$ due to the assumption that PRX = 1. The p value that corresponds to the
intersection point of the $E[Energy_{Idle\_p}]$ and $E[Energy_{Coll} | N_{tr} \ge 1]$ curves is the
approximation of the $p_{opt}^{E}$ value, as Equation (16) indicates. As the
$E[Energy_{Coll} | N_{tr} \ge 1]$ curve related to PTX/PRX = 1 is equal to the average length of a
collision given a transmission attempt, i.e. $E[Coll | N_{tr} \ge 1]$, the p value that
corresponds to the intersection point of the $E[Idle\_p]$ and $E[Coll | N_{tr} \ge 1]$ curves
provides a good approximation of the $p_{opt}^{C}$ value, as Equation (6) indicates. We note that
by increasing the PTX value also $E[Energy_{Coll} | N_{tr} \ge 1]$ grows, due to the rise in the
energy consumption of tagged-station collisions. However, $E[Energy_{Idle\_p}]$ does not
depend on the PTX value; hence, only a decrease in the $p_{opt}^{E}$ value can balance the
increase in $E[Energy_{Coll} | N_{tr} \ge 1]$.
5.3 Energy Consumption: A $p_{opt}^{E}$ Closed Formula
Similarly to Section 4, we conclude the characterization of our system, from the
energy consumption standpoint, by providing closed (approximate) formulas to identify
the network state that minimizes the energy consumption.
Lemma 4. In an M-station network in which the message lengths {L_i}, normalized
to $t_{slot}$, are a sequence of i.i.d. random variables, under the condition $Mp \ll 1$ the
$p_{opt}^{E}$ value is

p_{opt}^{E} \cong \frac{\sqrt{1 + 2 \frac{M-1}{M} \left[ \frac{M-2}{M} C + \frac{E_{CT}}{PRX} \frac{1}{M} - 1 \right]} - 1}{(M-1) \left[ \frac{M-2}{M} C + \frac{E_{CT}}{PRX} \frac{1}{M} - 1 \right]} ,   (21)

where $C = E[\max\{L_1, L_2\}]$ and $E_{CT} = E[Energy_{tag\_Coll} | tag\_Coll, N_{tr} = 2]$.
PROOF. The proof of this Lemma can be found in [24].
As we have done in Proposition 1, the following proposition provides an analytical
investigation of the $M \cdot p_{opt}^{E}$ product for a large network-size population. This
investigation is useful because it shows how, for a large network-size population, the
$p_{opt}^{E}$ value tends to the $p_{opt}^{C}$ value.
Proposition 3. In a network with a large number of active stations ($M \gg 1$) in
which the message lengths {L_i}, normalized to $t_{slot}$, are a sequence of i.i.d. random
variables, the optimal $M \cdot p_{opt}^{E}$ value is:

M \cdot p_{opt}^{E} \cong \frac{\sqrt{1 + 2 \left( C + \frac{E_{CT}}{PRX} \frac{1}{M} - 1 \right)} - 1}{C + \frac{E_{CT}}{PRX} \frac{1}{M} - 1} \approx \frac{\sqrt{1 + 2 (C-1)} - 1}{C-1} .   (22)

PROOF. The proof of this proposition is straightforward. Under the condition
$M \gg 1$, Equation (21) can be rewritten as (22) by noting that $(M-1) \approx M$ and
$(M-2) \approx M$. □
Equation (22) presents a tight analogy with Equation (13). For large network-size
populations it is straightforward to observe that $M p_{opt}^{E} \approx M p_{opt}^{C}$ (as Equation (22)
and Equation (13) show). Since $E_{CT}$ is divided by M, the contribution of the energy
consumed during tagged-station collisions decreases as the M value increases. Therefore,
the PTX value has no significant impact on the $M \cdot p_{opt}^{E}$ computation for a large
network-size population, as it contributes only to $E_{CT}$. Obviously $p_{opt}^{E} = p_{opt}^{C}$
when PTX = PRX. However, the comparison between the structure of Equation (13) and
Equation (22) shows also that the correspondence between the optimal p values
continues to hold.
6. Effectiveness of the AOB Mechanism
In the remaining part of the paper, by means of discrete event simulation, we
extensively investigate the performance of the IEEE 802.11 protocol enhanced with
the AOB mechanism. This mechanism utilizes the current S_U estimate to evaluate
the opportunity to perform or to defer a transmission attempt authorized by the
standard protocol (see Section 3). Specifically, AOB uses the probability of transmission
defined by Equation (1). As discussed before, Equation (1) requires the knowledge of
the opt_S_U parameter. Below we show how results derived in Sections 4 and 5 can be
used to estimate the opt_S_U value. Specifically, in the previous sections we showed
that the optimal capacity and energy states are characterized by invariant figures:
$M \cdot p_{opt}^{C}$ and $M \cdot p_{opt}^{E}$. Hereafter we investigate the relationship between S_U and
$M \cdot p_{opt}^{x}$, $x \in \{C, E\}$.
We denote with $N_{tr}$ the number of stations that make a transmission attempt in a
slot. Hence, $P\{N_{tr} = i\}$ is the probability that exactly i stations transmit in a slot,
and $P\{N_{tr} = 0\}$ is the probability that a slot remains empty. Let us now observe that
$M \cdot p_{opt}^{x}$ is the average number of stations which transmit in a slot:

M \cdot p_{opt}^{x} = \sum_{i=1}^{M} i \cdot P\{N_{tr} = i\} \ge \sum_{i=1}^{M} P\{N_{tr} = i\} = 1 - P\{N_{tr} = 0\} = S\_U .

The above formula indicates that $M \cdot p_{opt}^{x} \ge S\_U$. By exploiting this result we derive
the opt_S_U parameter value used by AOB: $opt\_S\_U \approx M \cdot p_{opt}^{x}$.
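The inequality between the average number of transmitting stations and the slot utilization can be checked numerically. The sketch below assumes that each of the M stations transmits independently with probability p in a slot, so that $N_{tr}$ is binomial; this model and the function names are assumptions made for illustration.

```python
import math


def binomial_pmf(i: int, M: int, p: float) -> float:
    """P{N_tr = i} when M stations transmit independently with probability p."""
    return math.comb(M, i) * p ** i * (1.0 - p) ** (M - i)


def mean_transmitters(M: int, p: float) -> float:
    """Sum over i of i * P{N_tr = i}; equals M * p for a binomial N_tr."""
    return sum(i * binomial_pmf(i, M, p) for i in range(1, M + 1))


def slot_utilization(M: int, p: float) -> float:
    """S_U = 1 - P{N_tr = 0}: probability that a slot is not empty."""
    return 1.0 - binomial_pmf(0, M, p)


if __name__ == "__main__":
    for M, p in ((10, 0.05), (100, 0.002), (100, 0.02)):
        # Columns: M, p, average number of transmitters, slot utilization
        print(M, p, mean_transmitters(M, p), slot_utilization(M, p))
```

In this model the gap between the two quantities comes from slots with two or more transmitters, which is why it shrinks as $M \cdot p$ decreases.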
6.1 Steady-State Analysis
In this section we analyze the AOB behavior when the network operates under
steady-state conditions. The protocol analysis in transient conditions and the protocol
robustness are studied in the subsequent sections. The physical characteristics and
parameter values of the investigated system are reported in Table 1.
The target of a MAC protocol is to share resources efficiently among several users.
This efficiency can be expressed in terms of capacity [25], [29]. However, from the
user standpoint other performance figures are needed to measure the Quality of Service
(QoS) that can be relied on. The most widely used performance measure is the delay,
which can be defined in several forms, depending on the time instants considered
during its measurement (access delay, queueing delay, propagation delay, etc.). Hereafter,
we will focus on the MAC delay. The MAC delay of a station in a LAN is the time
between the instant at which a packet comes to the head of the station transmission
queue and the end of the packet transmission [29].
To study the protocol performance we ran a set of simulation experiments with
different values of M. Active stations are assumed to operate in asymptotic conditions
(i.e., with continuous transmission requirements). We use a maximum number of 200
active stations because the number of stations expected in the future for such a system
could reach the order of hundreds [30]. For example, let us think of a conference room
in which the participants use mobile devices with a wireless interface.
The effectiveness of the proposed AOB mechanism is shown in Fig. 7. This figure
shows the channel utilization level achieved by adopting the AOB system and
compares this index with the analytically defined optimal utilization levels (OPT curves in
the figure). The results show that the AOB mechanism drives an IEEE 802.11
network very close to its optimal behavior, at least from the channel utilization
viewpoint. Only a little overhead is introduced when only few stations are active. It is
worth noting that, with the AOB mechanism, the channel utilization remains close to
its optimal value even in high-contention situations. In such cases, AOB almost
doubles the channel utilization with respect to the standard protocol.
[Figure: Channel Utilization obtained, AOB vs. optimal values. Channel utilization for 2 to 200 active stations; AOB and OPT. value curves for average payloads of 2.5, 50 and 100 slots.]
Fig. 7: Channel utilization of the IEEE 802.11 protocol with the AOB mechanism vs.
optimal value
[Figure: 99-th percentile of MAC access delay, Standard vs. AOB. Delay in SlotTime units for 2 to 200 active stations; Standard and AOB curves for average payloads of 2.5 and 100 slots.]
Fig. 8: 99-th percentile of MAC delay
In Fig. 8 we report the 99-th percentile of the MAC delay vs. contention level (i.e.
number of active stations) for various average sizes of the transmitted frames.
Simulation results show that the AOB mechanism leads to a great reduction of the tail of the
MAC delay distribution with respect to the standard access scheme alone. By noting
that when the network operates in asymptotic conditions the average MAC delay is
the inverse of the station throughput, we can verify that AOB is really effective in
reducing the tail of the MAC delay. For example, with 100-slot average payload, the
ratio between the 99-th percentile of the MAC delay without and with the AOB
mechanism is about 6, while the ratio between the average MAC delays is only about
2.
[Figure: slot utilization vs. time (slot units); AOB and STD curves for a burst of 10 mixed-tx stations over 10 mixed-tx stations; a vertical bar marks the burst arrival time.]
Fig. 9: A burst of 10 new stations activates when the network is operating in steady-state
conditions with 10 active stations (Slot utilization)
[Figure: channel utilization vs. time (block units); AOB and STD curves for a burst of 10 mixed-tx stations over 10 mixed-tx stations; a vertical bar marks the burst arrival time.]
Fig. 10: A burst of 10 new stations activates when the network is operating in
steady-state conditions with 10 active stations (Channel utilization)
Previous results seem to indicate that AOB, by reducing the average MAC delay, has
also a positive effect on the power consumed by an IEEE 802.11 network interface.
This intuition is further confirmed by the results presented in [15] showing that, by
adopting AOB with current IEEE 802.11 cards, the optimal capacity state closely
approximates the minimum energy consumption state.
6.2 AOB Behavior in Transient Situations
In this section we analyze the protocol promptness to re-tune when the network state
sharply changes. Specifically, we investigate the effectiveness of AOB when there is
an upsurge in the number of active stations. We analyze a network operating in
steady-state conditions with 10 active stations. After 256 block units (see footnote 5),
highlighted by the vertical bar "burst arrival time", 10 additional stations become active.
All stations transmit mixed traffic composed of 50% long messages (1500 bytes)
and 50% short messages (40 bytes). Figs. 9 and 10 show the effectiveness of the AOB
mechanism. In the AOB case, the sharp increase in the number of active stations
produces a negligible effect both in the slot utilization and in the channel utilization.
On the other hand, the standard is negatively affected by this change: the slot
utilization sharply increases (i.e., the network congestion level increases) and as a
consequence the channel utilization decreases.
[Figure: channel utilization (0.75 to 0.8) for 10 to 100 stations; AOB curves with no estimation error and with -50%, +10%, +25% and +50% error, average payload 100 slots.]
Fig. 11: Sensitiveness to errors in the estimation of the average message length (Slot
utilization)
6.3 Protocol Robustness
The AOB mechanism, to tune the backoff algorithm, requires the knowledge of the
network status, which is identified by two parameters: the average message length (or
equivalently the q parameter) and the slot utilization. As the values of these parameters
are obtained through estimations, some errors may occur. Hereafter, we discuss the
sensitiveness of AOB to these possible errors. To this end we compare the channel
utilization in the ideal case (no estimation errors) with the channel utilization when an
error is added to or subtracted from the correct q and S_U estimate. Specifically, in the
following figures the curve tagged with +x% (-x%) error is obtained by multiplying by
1+x/100 (1-x/100) the real estimate of the parameter. Results obtained are summarized
in Fig. 11 (errors on q by assuming an average message length equal to 100 slots,
i.e., q=0.99) and in Fig. 12 (errors on S_U). These results indicate that the AOB
channel utilization is scarcely affected by estimation errors. For example, assuming
constant errors of 50%, the channel utilization fluctuates in a small interval (2-3%)
around the no-error value. It is worth noting that due to the way AOB is defined: i) for
large M errors always have a negative impact (AOB is tuned to optimize asymptotic
performance), ii) for few active stations, underestimation errors generate a channel
utilization which is higher than that obtained in the no-error case. The latter behavior
was expected because the ACL value is too conservative (i.e., it excessively limits the
transmission rate) when there are few active stations. The parameters underestimation
produces the opposite effect thus resulting in an increased channel utilization.

5 A block unit corresponds to 512 slots. The block unit is introduced to smooth the trace.
The smoothing was introduced to reduce the fluctuations and thus increase the figure
readability.
[Figure: channel utilization (0.72 to 0.8) for 10 to 100 stations; AOB curves with no estimation error and with -50%, -25%, -10%, +10%, +25% and +50% Slot Utilization error.]
Fig. 12: Sensitiveness to errors in the estimation of the Slot Utilization
Acknowledgements
This work was partially supported by the NATO CLG.977405 project “Wireless Access
to Internet exploiting the IEEE802.11 Technology” and by the WILMA project
“Wireless Internet and Location Management Architecture” funded by the Province
of Trento, Italy.
References
1. ANSI/IEEE Standard 802.11, “Part 11: Wireless LAN-Medium Access Control (MAC) and
Physical Layer (PHY) Specification”, August 1999.
2. Stallings W., Local & Metropolitan Area Networks, Fifth Edition, Prentice Hall 1996,
pp. 356-383.
3. H. Woesner, J.P. Ebert, M. Schlager, A. Wolisz, “Power-saving mechanisms in emerging
standards for wireless LANs: The MAC level perspective”, IEEE Personal Comm,
1998, pp. 40-48.
4. N. Bambos, “Toward power-sensitive network architectures in wireless communications:
Concepts, issues and design aspects”, IEEE Personal Comm, 1998, pp. 50-59.
5. Weinmiller J., Woesner H., Ebert J.P., Wolisz A., “Analyzing and tuning the Distributed
Coordination Function in the IEEE 802.11 DFWMAC Draft Standard”, Proc. Int.
Workshop on Modeling, MASCOT 96, San Jose, CA.
6. J. Weinmiller, M. Schläger, A. Festag, A. Wolisz, “Performance Study of Access control
in Wireless LANs-IEEE 802.11 DFWMAC and ETSI RES 10 HIPERLAN”, Mobile
Networks and Applications, Vol. 2, 1997, pp. 55-67.
7. L. Bononi, M. Conti, L. Donatiello, “Design and Performance Evaluation of a Distributed
Contention Control (DCC) Mechanism for IEEE 802.11 Wireless Local Area Networks”,
Journal of Parallel and Distributed Computing, Academic Press, Vol. 60, N. 4,
April 2000.
8. M. Gerla, L. Kleinrock, “Closed loop stability control for S-Aloha satellite communications”,
Proc. Fifth Data Communications Symp., Sept. 1977, pp. 2.10-2.19.
9. B. Hajek, T. Van Loon, “Decentralized dynamic control of a multiaccess broadcast channel”,
IEEE Trans Automat. Control, Vol. 27, 1982, pp. 559-569.
10. F. Kelly, “Stochastic Models of computer communications systems”, J. Royal Statist.
Soc., Series B, Vol. 47, 1985, pp. 379-395.
11. F. Cali', Conti M., E. Gregori, “Dynamic Tuning of the IEEE 802.11 Protocol to
Achieve a Theoretical Throughput Limit”, IEEE/ACM Transactions on Networking,
Volume 8, No. 6 (Dec. 2000), pp. 785-799.
12. F. Cali', Conti M., E. Gregori, “Dynamic IEEE 802.11: design, modeling and performance
evaluation”, IEEE Journal on Selected Areas in Communications, 18(9), September
2000, pp. 1774-1786.
13. Bianchi G., Fratta L., Olivieri M., “Performance Evaluation and Enhancement of the
CSMA/CA MAC protocol for 802.11 Wireless LANs”, proceedings of PIMRC 1996,
10/1996, Taipei, Taiwan, pp. 392-396.
14. J.P. Monks, V. Bharghavan, W.W. Hwu, “A Power Controlled Multiple Access Protocol
for Wireless Packet Networks”, in Proc Infocom’01, Anchorage, Alaska (Apr. 2001).
15. L. Bononi, M. Conti, L. Donatiello, “A Distributed Mechanism for Power Saving in
IEEE 802.11 Wireless LANs”, ACM/Kluwer Mobile Networks and Applic. Journal, Vol.
6, N. 3 (2001), pp. 211-222.
16. http://grouper.ieee.org/groups/802/11/main.html
17. K. Bieseker, “The Promise of Broadband Wireless”, IT Pro November/December 2000,
pp. 31-39.
18. R. Bruno, M. Conti, E. Gregori, “WLAN technologies for mobile ad-hoc networks”,
Proc. HICSS-34, Maui, Hawaii, January 3-6, 2001. An extended version can be found in
Chapter 4 of Handbook of Wireless Networks and Mobile Computing (I. Stojmenovic
Editor), John Wiley & Sons, New York, 2001.
19. Goodman J., Greenberg A.G., Madras N., March P., “Stability of Binary Exponential
Backoff”, app. in the Proc. of the 17-th Annual ACM Symp. on Theory of Comp.,
Providence, May 1985.
20. Hammond J.L., O'Reilly P.J.P., Performance Analysis of Local Computer Networks,
Addison-Wesley 1988.
21. Hastad J., Leighton T., Rogoff B., “Analysis of Backoff Protocols for Multiple Access
Channels”, Siam J. Computing, vol. 25, No. 4, 8/1996, pp. 740-774.
22. Gallager R.G., “A perspective on multiaccess channels”, IEEE Trans. Information
Theory, vol. IT-31, No. 2, 3/1985, pp. 124-142.
23. D. Bertsekas, R. Gallager, “Data Networks”, Prentice Hall, 1992.
24. R. Bruno, M. Conti, E. Gregori, “Optimization of Efficiency and Energy Consumption
in p-persistent CSMA-based Wireless LANs”, IEEE Transactions on Mobile
Computing, Vol. 1, N. 1, January 2002.
25. A. Chandra, V. Gumalla, J.O. Limb, “Wireless Medium Access Control Protocols”, IEEE
Communications Surveys, Second Quarter 2000.
26. G.H. Forman, J. Zahorjan, “The challenges of mobile computing”, IEEE Computer,
April 94, pp. 38-47.
27. T. Imielinsky, B.R. Badrinath, “Mobile Computing: Solutions and Challenges in Data
Management”, Communications of ACM, Oct. 94.
28. R. Kravets, P. Krishnan, “Power Management Techniques for Mobile Communication”,
Proceedings of The Fourth Annual ACM/IEEE International Conference on Mobile
Computing and Networking (MOBICOM'98).
29. Conti M., Gregori E., Lenzini L., “Metropolitan Area Networks”, Springer Verlag,
London, 1997.
30. Chen K.C., “Medium Access Control of Wireless LANs for Mobile Computing”, IEEE
Networks, 9-10/1994.
31. W.R. Stevens, TCP/IP Illustrated, Volume 1: The Protocols, Addison-Wesley, Reading,
MA, 1994.
32. M. Stemm, R.H. Katz, “Measuring and Reducing Energy Consumption of Network
Interfaces in Hand-Held Devices”, Proc. 3rd International workshop on Mobile Multimedia
Communications (MoMuC-3), Princeton, NJ, September 1996.
Appendix: A
Lemma 2. For $M \gg 1$, the $p$ value that satisfies (6) can be obtained by solving the
following equation:

$$E[\mathit{Idle\_p}] = E[\mathit{Coll} \mid N_{tr} \ge 1]$$

where $E[\mathit{Coll} \mid N_{tr} \ge 1]$ is the average duration of a collision given that at least a
transmission occurs.

PROOF. Taking the derivative of $F(p, M, C)$ with respect to $p$, and imposing it equal to
0, we obtain the following equation:
^1 p
M
p1 p
M 1
` ( A.1)
p ½ .
>1 @ M 1 1

C ® p1 p 1 p
M 1 M
¾
¯ p¿
The $p_{opt}$ value is the solution of Equation (A.1).

First we analyze the l.h.s. of Equation (A.1). It is easy to observe that the l.h.s. is
equal to $(1-p)^{M-1}$, which for $M \gg 1$ tends to $(1-p)^{M}$, and that
$E[\mathit{Idle}] \cdot P_{N_{tr} \ge 1} = (1-p)^{M}$, where $P_{N_{tr} \ge 1} = 1-(1-p)^{M}$, i.e., the
probability that at least a station is transmitting.

Under the condition $Mp \le 1$ ⁶, the r.h.s. of Equation (A.1) can be expressed as:
M 2 M 1

® p1 p
¯
M 1
>
1 1 p M 1
M
1
@
p ½

¾
p¿ 2

p 2 O Mp
3
( A.2)
By indicating with $P_{\mathit{Coll} \mid N_{tr} \ge 1}$ the collision probability conditioned to have at least a
transmitting station, it also holds that:

$$P_{\mathit{Coll} \mid N_{tr} \ge 1} \cdot P_{N_{tr} \ge 1} = \left[ 1-(1-p)^{M} - Mp(1-p)^{M-1} \right] = \frac{M(M-1)}{2}\,p^{2} + O\left((Mp)^{3}\right) \qquad (A.3)$$

It is worth noting the similarity between the r.h.s. of Equation (A.3) and the r.h.s. of
Equation (A.2). Specifically, the r.h.s. of Equation (A.2) can be written as:
ªM 1 ªM 1
« 2
¬
º
p2 » M 1
¼

O Mp 3 M!!
1 o «
¬ 2
º
p2 » M
¼ .
( A.4)

O Mp
3
PColl | N tr t1 PN tr t1
Hence, Equation (A.1) can be rewritten as:

$$E[\mathit{Idle}] \cdot P_{N_{tr} \ge 1} = C \cdot P_{\mathit{Coll} \mid N_{tr} \ge 1} \cdot P_{N_{tr} \ge 1} . \qquad (A.5)$$

By dividing all the terms in Equation (A.5) by $P_{N_{tr} \ge 1}$, and substituting the $C$
approximation with $E[\mathit{Coll} \mid \mathit{Coll}]$, Equation (A.5) becomes:

$$E[\mathit{Idle\_p}] = E[\mathit{Coll} \mid N_{tr} \ge 1] . \qquad (A.6)$$

This concludes the proof. □
Appendix: B
Lemma 3. In an M-station network that adopts a p-persistent CSMA access scheme,
[...] variables, if the stations operate in asymptotic conditions, under the condition
$Mp \le 1$, the $p_C^{opt}$ value is

$$p_C^{opt} \cong \frac{\sqrt{1 + 2(C-1)\dfrac{M-1}{M}} - 1}{(M-1)(C-1)} , \qquad (B.1)$$

where $C = E[\max\{L_1, L_2\}]$.

⁶ This assumption is always true in a stable system, as the average number of transmitting
stations in an empty slot must be less than one.
PROOF.
In [24] it is shown that if the network stations are operating close to the $p_C^{opt}$ value
and if the stations operate in asymptotic conditions, $E[\mathit{Coll} \mid \mathit{Collision}] \approx C$. The
$E[\mathit{Coll} \mid \mathit{Collision}] \approx C$ assumption indicates that $E[\mathit{Coll} \mid \mathit{Collision}]$ depends only
on the message-length distribution, and Equation (11) can be rewritten as:

$$(1-p)^{M} - C \left\{ 1 - \left[ (1-p)^{M} + Mp(1-p)^{M-1} \right] \right\} = 0 . \qquad (B.2)$$

Under the condition $Mp \le 1$,

$$(1-p)^{M} \approx 1 - Mp + \frac{M(M-1)}{2}\,p^{2} + O\left((Mp)^{3}\right) , \qquad (B.3)$$

$$\left[ 1-(1-p)^{M} - Mp(1-p)^{M-1} \right] \approx \frac{M(M-1)}{2}\,p^{2} + O\left((Mp)^{3}\right) . \qquad (B.4)$$

By substituting (B.3) and (B.4) in (B.2) the following equation is obtained:

$$(C-1)\,\frac{M(M-1)}{2}\,p^{2} + Mp - 1 = 0 . \qquad (B.5)$$

By solving Equation (B.5) we obtain the formula (B.1) and this concludes the proof of
the Lemma. □
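As an illustrative check of Lemma 3 (not part of the original proof), the following Python sketch compares the closed-form approximation (B.1) with a numerical root of the balance condition (B.2). The function names, the bisection solver, and the sample values of M and C are assumptions chosen for illustration; the sketch further assumes M ≥ 2 and C ≥ 2, so that (B.2) changes sign on the interval (0, 1/M].

```python
import math


def p_opt_closed_form(M, C):
    """Closed-form approximation (B.1) of the optimal transmission probability."""
    return (math.sqrt(1.0 + 2.0 * (C - 1.0) * (M - 1.0) / M) - 1.0) / ((M - 1.0) * (C - 1.0))


def balance(p, M, C):
    """Left-hand side of (B.2): (1-p)^M - C * P(collision)."""
    idle = (1.0 - p) ** M
    success = M * p * (1.0 - p) ** (M - 1)
    return idle - C * (1.0 - idle - success)


def p_opt_numeric(M, C, tol=1e-12):
    """Root of (B.2) on (0, 1/M], found by bisection.

    balance() is positive at p = 0 and decreases with p, so the sign of the
    midpoint tells which half of the interval contains the root.
    """
    lo, hi = 0.0, 1.0 / M
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if balance(mid, M, C) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    M, C = 10, 100.0  # hypothetical: 10 stations, collision length of 100 slots
    print(p_opt_closed_form(M, C), p_opt_numeric(M, C))
```

The two values differ slightly because (B.1) drops the third-order terms of (B.3) and (B.4); the gap is expected to shrink as Mp becomes smaller.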
Service Centric Computing – Next Generation
Internet Computing
Jerry Rolia, Rich Friedrich, and Chandrakant Patel
Hewlett Packard Labs, 1501 Page Mill Rd., Palo Alto, CA, USA, 94304
{jerry rolia,rich friedrich,chandrakant patel}@hp.com
www.hpl.hp.com/research/internet
Abstract. In the not-too-distant future, billions of people, places and

things could all be connected to each other and to useful services through
the Internet. In this world scalable, cost-effective information technology
capabilities will need to be provisioned as a service, delivered as a service,
metered and managed as a service, and purchased as a service. We re-
fer to this world as service centric computing. Consequently, processing
and storage will be accessible via utilities where customers pay for what
they need when they need it and where they need it. This tutorial in-
troduces concepts of service centric computing and its relationship to
the Grid. It explains a programmable data center paradigm as a flexible
architecture that helps to achieve service centric computing. Case study
results illustrate performance and thermal issues. Finally, key open re-
search questions pertaining to service centric computing and Internet
computing are summarized.
1 Introduction
In the not-too-distant future, billions of people, places and things could all be
connected to each other and to useful services through the Internet. Re-use
and scale motivate the need for service centric computing. With service centric
computing, application services, for example payroll or tax calculation, may be
composed of other application services and also rely on computing, network-
ing, and storage resources as services. These services will be offered according
to a utility paradigm. They will be provisioned, delivered, metered, managed,
and purchased in a consistent manner when and where they are needed. This
paper explains the components of service centric computing with examples of
performance studies that pertain to resources offered as a service.
Figure 1 illustrates the components of service centric computing. Applications
may be composed of application and resource services via open middleware such
as Web services. Applications discover and acquire access to services via a grid
service architecture. Resource utilities offer computing, network, and storage
resources as services. They may also offer complex aggregates of these resources
with specific qualities of service. We refer to this as Infrastructure on Demand
(IOD).

[Figure 1 layers, top to bottom: Applications; Open Middleware: Web services; Grid Service Architecture; Resource Utilities]
Fig. 1. Service centric computing
Section 2 describes several classes of applications and their requirements on

infrastructure for service centric computing. Web service technologies are also in-
troduced. An example of a Grid resource management system and a Grid Service
Architecture are introduced in Section 3. Section 4 describes resource utilities,
resources as services, and IOD. A programmable data center is introduced as an
example of IOD. Next we describe a broad set of technologies that can be inte-
grated to offer IOD with specific qualities of service. Case studies illustrate key
issues with regard to IOD. A list of research challenges is offered in Section 5.
Concluding remarks are given in Section 6.
2 Applications and Web Services

This section discusses the requirements of applications on service centric com-
puting with a focus on issues that pertain to infrastructure. This is followed by
a brief discussion of Web service technologies as an open middleware for service
interaction.
2.1 Applications
We consider technical, commercial, and ubiquitous classes of applications and
their requirements on service centric computing.
Technical applications are typically processing, data, and/or communications
intensive. Examples of such applications are found in life and material sciences,
manufacturing CAE and CAD, national defense, high-end film and video, elec-
tronic design simulation, weather and climate modeling, geological sciences, and
basic research. These applications typically present batch style jobs with a finite
duration.
Commercial applications may be accessed via Intranet or Internet systems.
They often rely on multi-tier architectures that include firewall, load balancer
and server appliances and exploit concepts of horizontal and vertical scalability.
Examples of applications include enterprise resource management systems, E-

commerce systems, and portals. These applications typically require resources
continuously but may require different quantities of resources depending on fac-
tors such as time of day and day of week.
With ubiquitous computing there is a potential for literally billions of inter-
connected devices each participating in many instances of value added services.
Examples of devices include personal digital assistants, tools used for manufac-
turing or maintenance, and fixtures. Consider the following example of a value
added service for a maintenance process. Instrumentation within tools and an
aircraft under service record the steps of maintenance procedures as they are
completed to help verify that appropriate service schedules are being followed
correctly.
Applications place many requirements on service centric computing. They
include the following.
For technical computing:
– Expensive and/or specialized resources need to be easy to share
– Utilize geographically distributed resources effectively
– Share computing, storage, data, programs, and other resources
– Take advantage of underutilized resources
The above requirements have driven the development of grids for high perfor-
mance computing. The following additional requirements arise for commercial
and ubiquitous computing:
– Automate the deployment and evolution of complex multi-tier applications
and infrastructure
– Support applications that execute continuously
– Provide access to resources with some level of assurance
– Scale: enable the deployment of many small distributed services that would
not otherwise be possible
– Mobility: take computation/data closer to the client(s)
Technical computing applications express demands for numbers of resources
and their capacities. For some applications these demands may constrain the
topology of a supporting grid’s infrastructure – for example requiring the use of
high capacity low latency communication fabrics. However the actual topology of
a grid’s infrastructure is in general deliberately transparent to such applications.
We refer to this as infrastructure transparency.
In contrast multi-tier commercial and ubiquitous computing applications can
require explicit networking topologies that include firewalls and load balancers.
Networking topology and appliance configuration may implement security and
performance policies and as a result can be explicit features of such applications.
As a result such applications are not necessarily infrastructure transparent. They
may require changes in infrastructure in response to changes in workload, device
mobility, or maintenance.
To reduce the time and skills needed to deploy infrastructure, to avoid the
pitfalls of over- or under-provisioning, and to enable large scale and adaptive
service deployment (due to changing workload demands and/or mobility),
commercial and ubiquitous computing applications need automated support for the
deployment and maintenance of both applications and their infrastructure.
Furthermore, such applications need assurances that they will be able to acquire
resources when and where they need them.
2.2 Web Services

Web services [1] are a collection of middleware technologies for interactions be-
tween services in Internet environments. They are platform and application lan-
guage independent. The following Web service technologies are likely to be ex-
ploited by applications to discover and bind with other application services and
infrastructure services.
– XML, descriptive data
– Messaging (SOAP, etc), message formats
– Description of documents (WSDL, etc), interface/data specifications
– Registry (WSIL, UDDI), Directories/lookup
The Extensible Markup Language (XML) [2] describes data. The Simple Object
Access Protocol (SOAP) [3] is a mechanism for framing data for remote
procedure calls. The Web Services Description Language (WSDL) [4] defines type
information for data and interfaces. The Web Services Inspection Language (WSIL) [5]
and Universal Description, Discovery and Integration (UDDI) [6] offer registry
and lookup services.
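To make the notion of framing data for a remote procedure call concrete, the following Python sketch builds and parses a minimal SOAP 1.1 style envelope with the standard library. It is only an illustration under stated assumptions: the operation name, the parameter names, and the service namespace `urn:example:payroll` are hypothetical, and a real deployment would take its message format from a WSDL description rather than from hand-written code.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation, params, service_ns):
    """Frame a remote procedure call as XML text: Envelope > Body > operation."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_request(xml_text):
    """Recover (operation, params) from an envelope produced by build_soap_request."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = list(body)[0]
    operation = call.tag.split("}", 1)[1]          # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


if __name__ == "__main__":
    # Hypothetical tax calculation service, echoing the example in Section 1.
    message = build_soap_request("CalculateTax", {"income": 50000, "year": 2002},
                                 "urn:example:payroll")
    print(message)
    print(parse_soap_request(message))
```

Note that all parameter values travel as text; recovering their types is the role that the interface description (WSDL) plays.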
A business can become a service provider by encapsulating application func-
tionality as a Web service and then offering that service over its Intranet or the
Internet. Another application can reuse that functionality by binding with and
then exploiting an instance of the service. In a world of Web services, applica-
tions will be an integration of locally managed and outsourced services. This is
referred to as service composition. Grid computing has a natural relationship to
the concept of Web services in that it provides resources as a service.
3 Grids and Grid Service Architectures

Today’s grids largely support the needs of the technical computing community.
However there are efforts underway to further develop the notion of grids to
the level of Grid Service Architectures (GSA) that support both technical and
commercial applications. The Global Grid Forum [7] is an organization that pro-
motes grid technologies and architectures. As examples of Grids and GSAs, this
section describes Globus [8] and the Globus Open Grid Service Architecture [9].
3.1 Globus
Globus is a U.S. government sponsored organization formed to develop Grid
solutions for high performance scientific computing applications. The goals are
essentially those listed for technical applications in Section 2.1. The initial Grid
vision for the Globus Grid development was to enable computational grids:
A computational grid is a hardware and software infrastructure that

provides dependable, consistent, pervasive, and inexpensive access to
high-end computing capabilities.
Ian Foster and Carl Kesselman [10]
Figure 2 illustrates the resource management architecture for the Globus

grid infrastructure. The purpose of the infrastructure is to bind applications
with resource schedulers. Additional Grid services provide support for large file
transfer and access control.
From the figure, applications describe their resource requirements using a Re-
source Specification Language (RSL). They submit the RSL to Brokers. Brokers
interact with Information Services to learn about resource availability. An infor-
mation service receives resource availability information from resource scheduler
systems. Brokers may transform RSL iteratively to find a match between the
supply and demand for specific resource types. An application may then use
co-allocators to reserve access to the resources managed by resource schedulers
via an open Globus Resource Allocation Manager (GRAM) interface.
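The matching step performed by a broker can be sketched as follows. This is a simplified illustration and not the Globus implementation: the `Request` and `Offer` records stand in for RSL documents and scheduler reports (they do not use real RSL syntax), and the greedy split across schedulers is an assumed policy chosen only to show how a request might be specialized into per-scheduler allocations for a co-allocator.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Stand-in for an RSL document (hypothetical fields, not real RSL syntax)."""
    cpus: int
    mem_per_cpu_gb: int


@dataclass
class Offer:
    """Availability that a resource scheduler reports to the information service."""
    scheduler: str
    free_cpus: int
    free_memory_gb: int


class InformationService:
    """Keeps the latest availability report from each resource scheduler."""

    def __init__(self):
        self._offers = {}

    def publish(self, offer):
        self._offers[offer.scheduler] = offer

    def query(self):
        return list(self._offers.values())


def broker_match(request, info):
    """Split a request across schedulers; return {scheduler: cpus} or None."""
    plan, remaining = {}, request.cpus
    offers = sorted(info.query(), key=lambda o: o.free_cpus, reverse=True)
    for offer in offers:
        # CPUs usable here are limited by both free CPUs and free memory.
        usable = min(offer.free_cpus, offer.free_memory_gb // request.mem_per_cpu_gb)
        take = min(usable, remaining)
        if take > 0:
            plan[offer.scheduler] = take
            remaining -= take
        if remaining == 0:
            return plan
    return None  # supply cannot meet demand


if __name__ == "__main__":
    info = InformationService()
    info.publish(Offer("LSF", free_cpus=8, free_memory_gb=16))
    info.publish(Offer("Condor", free_cpus=4, free_memory_gb=32))
    print(broker_match(Request(cpus=10, mem_per_cpu_gb=2), info))
```

A returned plan corresponds to the set of specialized requests that a co-allocator would then submit to the individual schedulers.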
[Figure 2 components: Application; Broker (RSL specialization); Information Service (queries/info); Co-allocator; GRAM interfaces to the LSF, Condor, and other Resource Schedulers; RSL is passed between the components]
Fig. 2. Globus Resource Management Architecture
There are many examples of resource schedulers including LSF [12], Con-
dor [13] and Legion [14]. A review of scheduling in Grid environments is given
in reference [11].
Grids help to enable a utility computing paradigm by providing the mecha-

nisms to match demands for resources with the supply of resources.
3.2 Grid Service Architectures

Recently the notion of integrating the Globus Grid with Web services has devel-
oped. The vision for the Grid has evolved to support both e-business and high
performance computing applications:
The Grid integrates services across distributed, heterogeneous, dynamic
virtual organizations formed from the disparate resources within a single
enterprise and/or from external resource sharing and service provider
relationships in both e-business and e-science
Foster, Kesselman, Nick, Tuecke [9]
The integration of Globus and Web services leads to a specification for an Open
Grid Services Architecture (OGSA) [9] that treats both applications and re-
sources uniformly as services. This uniform approach is expected to simplify the
development of more advanced Grid environments.
Within an OGSA Grid, persistent services accept requests for the creation of
service instances. These created instances are referred to as transient services.
An OGSA governs the creation of and interactions between service instances. A
newly defined Grid service interface provides a mechanism for service instance
creation, registration and discovery, the management of state information, noti-
fications, and management.
A Grid architecture based on Grid services helps to support advanced Grid
environments in several ways. For example, service instances associated
with resources may exploit the Grid service interface to implement patterns for
joining and departing from resource pools managed by resource schedulers and
for propagating resource event information to a resource scheduler. Similarly an
application may use the Grid service interface to implement a pattern for asking
a resource scheduler for notification of events that pertain to a specific resource.
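The relationship between persistent services, transient service instances, and notifications can be illustrated with a small Python sketch. The class and method names below are hypothetical and are not taken from the OGSA specification; the sketch only mirrors the pattern described above, in which a persistent service creates and registers instances and interested parties subscribe to state changes.

```python
import itertools


class TransientService:
    """A created service instance with state and change notifications."""

    def __init__(self, handle, state):
        self.handle = handle
        self.state = dict(state)
        self._subscribers = []

    def subscribe(self, callback):
        """Register callback(handle, key, value) for state-change events."""
        self._subscribers.append(callback)

    def update(self, key, value):
        """Change one state item and notify every subscriber."""
        self.state[key] = value
        for callback in self._subscribers:
            callback(self.handle, key, value)


class PersistentFactory:
    """A persistent service that creates and registers transient instances."""

    def __init__(self, name):
        self.name = name
        self._ids = itertools.count(1)
        self.registry = {}

    def create_instance(self, **state):
        handle = f"{self.name}/{next(self._ids)}"
        instance = TransientService(handle, state)
        self.registry[handle] = instance   # registration enables later discovery
        return instance

    def destroy(self, handle):
        self.registry.pop(handle, None)


if __name__ == "__main__":
    events = []
    pool = PersistentFactory("cpu-pool")
    node = pool.create_instance(status="idle")
    # A scheduler asks to be notified of events on this specific resource.
    node.subscribe(lambda handle, key, value: events.append((handle, key, value)))
    node.update("status", "busy")
    print(events)
```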
4 Resource Utilities
In a world of service centric computing, applications may rely on resource utilities
for some or all of their resource needs. Today’s grid environments offer resources
in an infrastructure transparent manner. Yet some styles of applications place
explicit requirements on infrastructure topology and qualities of service. In this
section we describe programmable data centers as resource utilities that can
offer infrastructure on demand. Technologies that contribute to infrastructure
on demand are described along with several examples of research in this area.
4.1 Programmable Data Centers

A programmable data center (PDC) is composed of compute, networking, and
storage resources that are physically wired once but with relationships that can
be virtually wired programmatically. Virtual wiring exploits the existing
virtualization features of resources. Examples of these features are described in the next
subsection.
To support automation and ease of introducing applications into the data
center we introduce the notion of virtual application environments (VAE) [15].
A VAE presents an environment to an application that is consistent with the
application’s configuration requirements. For example, every application has cer-
tain requirements regarding server capacity, network connectivity, appliances,
and storage services. It can have explicit layers of servers and many local area
networks. Network and storage fabrics are configured to make a portion of the
data center’s resources and back end servers appear to an application as a ded-
icated environment. Data center management services create these VAEs by
programming resource virtualization features. This is done in a manner that iso-
lates VAEs from one another. Later, applications can request changes to their
VAEs, for example to add or remove servers in response to changing workload
conditions. A PDC is illustrated in Figure 3.
[Figure 3 elements: Infrastructure Specification; Infrastructure on Demand; programmable network fabrics; processing elements; storage elements; virtual wiring; VAE 1 ... VAE n; Internet]
Fig. 3. Programmable Data Center
A PDC accepts a markup language description of the infrastructure required

by a VAE. It must perform an admission control test to report whether it has
sufficient resources to host the application with its desired qualities of service.
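A minimal sketch of such an admission control test is given below. It assumes, for illustration only, that both the data center capacity and the VAE request are expressed as simple quantities per resource type; the resource names and the `PDC` class are hypothetical, and a real test would also have to consider topology and the qualities of service discussed next.

```python
def admit(capacity, allocated, request):
    """True if every requested quantity fits within the remaining capacity."""
    return all(allocated.get(kind, 0) + amount <= capacity.get(kind, 0)
               for kind, amount in request.items())


class PDC:
    """Tracks capacity and the resources held by each admitted VAE."""

    def __init__(self, capacity):
        self.capacity = dict(capacity)
        self.allocated = {kind: 0 for kind in capacity}
        self.vaes = {}

    def request_vae(self, name, request):
        """Admit the VAE and reserve its resources, or reject it unchanged."""
        if not admit(self.capacity, self.allocated, request):
            return False
        for kind, amount in request.items():
            self.allocated[kind] = self.allocated.get(kind, 0) + amount
        self.vaes[name] = dict(request)
        return True

    def release_vae(self, name):
        """Return the resources of a VAE to the shared pool."""
        for kind, amount in self.vaes.pop(name, {}).items():
            self.allocated[kind] -= amount


if __name__ == "__main__":
    pdc = PDC({"servers": 10, "storage_tb": 50, "internet_mbps": 1000})
    print(pdc.request_vae("vae1", {"servers": 6, "storage_tb": 20}))  # fits
    print(pdc.request_vae("vae2", {"servers": 5}))                    # exceeds capacity
```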
Applications may express requirements for many qualities of service that
include: Internet bandwidth, internal communication and storage bandwidths,
packet latencies and loss rates, server, storage, and network fabric reliability.
They are also likely to require some assurance that it will be possible to acquire
resources when they are needed.
Security and reliability are essential features of PDCs. Because PDCs are
programmable entities, strict controls must be in place to ensure that only valid
changes are made to infrastructure and that VAEs are isolated from one another. Additionally,
many applications may rely on shared resources such as PDC network links so
the consequences of single failures are larger than in environments without such
sharing. For these reasons PDCs must have security and reliability built into
their architectures.
Planetary scale computing relies on notions of PDCs and programmable net-
works. With planetary scale computing PDCs are joined by metro and wide area
networking infrastructures providing network capacity on demand. Applications
may be deployed across multiple PDCs with resources allocated near to their
end-users or where capacity is least expensive. As an application’s compute and
storage requirements change corresponding changes must be made to network
capacities between the PDCs.
4.2 Programmable Infrastructure
This section gives examples of virtualization technologies that can contribute

to planetary scale computing. Subsets of these technologies can be combined to
provide infrastructure as a service with specific qualities of service.
Programmable Networks. Research on programmable networks has provided

a wide range of virtualization technologies. These include:
– Virtual Private Networks (VPN)

• A tunneling mechanism that frames encrypted source packets for trans-
mission to a destination appliance over an insecure network. The en-
crypted packets are treated as data by the appliance, decrypted, and
dropped onto a secure network for delivery to the packet’s true destina-
tion.
• Properties: Security isolation
– Virtual LANs (VLAN)
• Ethernet packets are augmented with a VLAN tag header. Ports on
appropriate Ethernet switches can be programmed to only accept and
forward frames (at line speed) with specific VLAN tags.
• Properties: Security isolation, classes of service
– Multiprotocol Label Switching (MPLS)
• Similar to VLAN tags. However these tags can be used by
switches/routers to identify specific tunnels of data. Resource reserva-
tion, class of service differentiation, and other protocols provide support
for fast recovery in the event of network failures and capacity on demand
for these tunnels.
• Properties: Isolation, reliability, capacity on demand
– Resilient Packet Rings (RPR)
• These rings are being used to replace SONET infrastructure in metro

and wide area networks. They provide a new frame specification that
can carry Ethernet and other traffic, offer security isolation and have
special support for fast recovery and capacity on demand.
• Properties: Security isolation, reliability, capacity on demand
– Optical wavelength switching
• Optical switches can support switching at the abstraction of wavelengths.
In an earlier section we described virtual wiring. Wavelength switching is
best described as virtual wire. With appropriate use of lasers wavelengths
can each support tens of Gbps of bandwidth and fibers can support hun-
dreds of wavelengths. Lightpaths are end-to-end circuits of wavelengths
that provide true performance isolation across optical switching fabrics.
• Properties: Security and performance isolation, reliability, capacity on
demand
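As an illustration of the VLAN mechanism listed above, the following Python sketch reads the IEEE 802.1Q tag of an Ethernet frame and applies a per-port filter. The tag layout (a 0x8100 tag protocol identifier followed by a 16-bit field whose low 12 bits carry the VLAN ID) follows the 802.1Q standard; the helper names and the way a port's allowed VLANs are represented are assumptions made for this sketch, and real switches perform this check in hardware.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an IEEE 802.1Q VLAN tag


def vlan_id(frame):
    """Return the 12-bit VLAN ID of a tagged Ethernet frame, or None if untagged."""
    if len(frame) < 18:  # 6 dst + 6 src + 4 tag + 2 EtherType
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # low 12 bits of the tag control information


def port_accepts(frame, allowed_vlans):
    """A programmed port forwards only frames tagged with an allowed VLAN."""
    vid = vlan_id(frame)
    return vid is not None and vid in allowed_vlans


def tagged_frame(vid, payload=b""):
    """Build a test frame: zero MAC addresses, 802.1Q tag, IPv4 EtherType."""
    dst = src = bytes(6)
    tci = vid & 0x0FFF  # priority and drop-eligible bits left at zero
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return dst + src + tag + struct.pack("!H", 0x0800) + payload


if __name__ == "__main__":
    frame = tagged_frame(42)
    print(vlan_id(frame), port_accepts(frame, {42, 7}))
```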
Programmable Servers. Since the time of early IBM mainframes server vir-
tualization has been an important feature for resource management. With server
virtualization each job or application is isolated within its own logically inde-
pendent system partition. The fraction of system resources associated with each
partition can be dynamically altered, permitting the vertical scaling of resources
associated with an application. Server virtualization is a convenient mechanism
to achieve server consolidation. Examples of systems that support server virtu-
alization include:
– HP: Superdome and mid-range HP-UX servers [16]

– Sun: Sun Fire [17]
– IBM Mainframes [18]
– Intel [20] processor based servers with VMWare [19]
Server virtualization offers: performance isolation for partitions – when the
resource consumption of each partition can be bounded; capacity on demand – when
the fractions of resources associated with partitions can be changed dynamically;
security isolation – depending on the implementation; and can be used to sup-
port high availability solutions with redundant, but idle, application components
residing in partitions on alternative servers.
Programmable Storage. Today’s storage systems are utilities in and of them-

selves. They can contain thousands of physical disks and have sophisticated
management services for backup, self-tuning, and maintenance. Storage virtual-
ization offers the notion of virtual disks (logical units). These virtual disks are
striped across a storage system’s physical disks in a manner that supports the
service level requirements and workload characteristics of application loads.
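The idea of striping a virtual disk across physical disks can be sketched with a simple address mapping. The round-robin layout below is an assumed, RAID-0 style scheme used only for illustration; actual storage systems choose layouts according to service level requirements and typically add redundancy.

```python
def locate_block(logical_block, num_disks, stripe_unit_blocks):
    """Map a virtual-disk block to (physical disk index, block offset on that disk).

    Consecutive stripe units of stripe_unit_blocks blocks are placed on
    consecutive disks in round-robin order.
    """
    stripe_unit = logical_block // stripe_unit_blocks   # which stripe unit
    within = logical_block % stripe_unit_blocks          # offset inside the unit
    disk = stripe_unit % num_disks                       # round-robin disk choice
    row = stripe_unit // num_disks                       # units already on that disk
    return disk, row * stripe_unit_blocks + within


if __name__ == "__main__":
    # Hypothetical virtual disk striped over 4 physical disks, 8 blocks per unit.
    for block in (0, 7, 8, 35):
        print(block, locate_block(block, num_disks=4, stripe_unit_blocks=8))
```

With this layout, a large sequential read touches all disks in turn, which is one way a virtual disk can offer more bandwidth than any single physical disk.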
All major storage vendors support storage virtualization for storage systems
accessed via storage area networks. Storage virtualization mechanisms support
the management of performance, capacity, availability, reliability, and security.
Programmable Cooling Systems. Data centers contain thousands of single
board systems deployed in racks in close proximity, which results in very high heat
density. Thermal management aims to extract heat dissipated by these systems
while maintaining reliable operating temperatures. Air conditioning resources
account for 30% of the energy costs of such installations [32].
Today’s data centers rely on fixed cooling infrastructures. The PDCs of to-
morrow will exploit programmable cooling controls that include:
– Variable capacity air movers
– Variable capacity compressors in air conditioners
– Variable capacity vents and air distribution systems
These features can be used to dynamically allocate cooling resources based on
heat load while operating at the highest possible energy efficiency. This type of
on-demand cooling is expected to reduce cooling costs by 25% over conventional
designs [32].
Programmable Data centers. Programmable data centers provide infrastructure
on demand for complex application infrastructures. They exploit the virtu-
alization features of other resources to render multi-tier virtual infrastructure.
This permits applications to automate the acquisition and removal of resources
in proportion to their time varying workloads.
Examples of programmable data centers include:
– HP Utility Data Center with Controller Software [21], provides integrated
computing, networking, and storage infrastructure as a service
– Terraspring [22], provides integrated computing, networking, and storage
infrastructure as a service
– Think Dynamics [23], provides a scripting environment to enable the imple-
mentation of infrastructure on demand
– IBM Research, The Oceano Project [24], an E-business utility for the support
of multi-tier E-commerce applications
These systems help to provide infrastructure on demand. For the commer-
cial PDCs, solutions must be engineered to offer security isolation and specific
internal network qualities of service that are required by their customers.
4.3 Case Studies on Infrastructure on Demand

This subsection offers several examples of research and results on infrastructure
on demand. We consider server consolidation, presenting some otherwise unpub-
lished results that demonstrate opportunities for resource sharing in a commer-
cial data center environment. Next we illustrate the resource savings offered by
a utility environment to two horizontally scalable Web based applications along
with a mechanism to achieve those savings. We note that commercial applica-
tions are unlikely to rely on utilities unless they can receive some assurance that
resources will be available when they need them. We describe some recent work
on admission control for PDCs. Next, an example of a self-managing storage
system is given.
Planetary scale computing relies on the co-allocation of resources, for example
wide area networking as well as data center resources. Mechanisms to achieve
co-allocation in Grid environments are described. The concepts of programmable
data centers, programmable networks, and co-allocation help to enable wide area
load balancing. Control issues regarding wide area load balancing are introduced.
Finally, we present results that pertain to the thermal management of data
centers. We describe the importance of thermal management on resource relia-
bility and on overall energy costs.
Server consolidation. Figure 4 presents the results of a server consolidation
exercise for 10 identically configured 6-cpu servers from an enterprise data center.
Consolidation is based on cpu utilizations as measured over 5 minute intervals
for nearly 2 months. An off-line integer programming model is used to map work
from the source servers onto as few consolidated target servers as possible such
that the work of only one source server is allocated to a target server or the total
per-interval mean cpu utilization on the target server does not exceed 50%. The
following factors are considered:
– Number of CPUs per server - The number of cpus per target server: 6, 8,
16, 32
– Fast migration - whether the work of a source server may migrate between
target servers at the end of each interval without penalty.
– On-line server migration - whether the pool of target servers can vary in size.
The figure shows the peak number of cpus required with and without fast mi-
gration and the mean number of servers required with fast migration and on-line
server migration. In the figure, FM represents the case with fast migration while
fm represents no fast migration.
We found that fast application migration enables a more effective consolida-
tion but is sensitive to the number of cpus per server. As the number of cpus per
target server increases, fast migration offers no benefit for the system under
study. On-line server migration permits us to reclaim unused capacity for other
purposes. It permits us to meaningfully characterize a system based on its av-
erage number of target servers required because we can use these resources for
another purpose. Last, the figure also shows that for the system under study
many small servers can be nearly as effective as fewer large servers.
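The consolidation constraint described above can be illustrated with a small sketch. The study itself uses an off-line integer programming model; the code below substitutes a much simpler first-fit decreasing heuristic, so it is not the method behind Figure 4. The function name, the trace format, and the example traces are hypothetical. The sketch covers only the case without fast migration: each source server is placed on one target server for the whole trace, and a target either hosts a single source server or keeps its total per-interval utilization at or below the cap.

```python
from typing import List


def consolidate(traces: List[List[float]], source_cpus: int,
                target_cpus: int, cap: float = 0.5) -> List[List[int]]:
    """Assign source servers to target servers (no fast migration).

    traces[i][t] is the cpu utilization (0..1) of source server i in
    interval t. Returns one list of source indices per target server.
    This is a first-fit decreasing heuristic, not an integer program.
    """
    # Express each trace as demand measured in cpus.
    demand = [[u * source_cpus for u in trace] for trace in traces]
    limit = cap * target_cpus
    # Place the servers with the highest peak demand first.
    order = sorted(range(len(demand)),
                   key=lambda i: max(demand[i]), reverse=True)
    targets: List[List[int]] = []
    loads: List[List[float]] = []
    for i in order:
        for k, load in enumerate(loads):
            # The cap must hold in every measurement interval.
            if all(l + d <= limit for l, d in zip(load, demand[i])):
                targets[k].append(i)
                loads[k] = [l + d for l, d in zip(load, demand[i])]
                break
        else:
            # No existing target fits: open a new one. A source that
            # exceeds the cap on its own then runs alone on its target.
            targets.append([i])
            loads.append(list(demand[i]))
    return targets


# Hypothetical two-interval traces for three 6-cpu source servers.
example = [[0.2, 0.1], [0.1, 0.2], [0.9, 0.9]]
small = consolidate(example, source_cpus=6, target_cpus=6)
large = consolidate(example, source_cpus=6, target_cpus=16)
```

With these made-up traces the busy server needs a 6-cpu target to itself while the two lightly loaded servers share a second one; with 16-cpu targets all three fit on a single server.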
Horizontal Scalability. Ranjan et al. characterize the advantages of on-line
server migration for commercial E-commerce and search engine systems using a
trace driven simulation environment [25]. In each case the simulation exploits an
algorithm named Quality of Infrastructure on Demand (QuID) that attempts to
maintain a target cpu utilization for application servers by adding and removing
servers in response to changing workload conditions. The simulation takes into
account the time needed to migrate servers into an application (including server
Fig. 4. Fraction of Original Number of CPUs (plot of Fraction of CPUs, from 0 to 1,
against Number of CPUs per Server, 6, 8, 16, and 32; series: 50% fm Peak, 50% FM Peak,
50% FM Mean)
boot time) and time needed to drain user sessions prior to removing a server. The
results show resource savings, with respect to a static allocation of resources, of
approximately 29%. In both cases the peak to mean ratio for resource demand
was approximately 1.6. A transformed version of the E-commerce trace with
a peak to mean of 5 offered resource savings of 68% with respect to a static
allocation of resources. We note that the resource savings are expected to be
higher in practice since the static cases would have to be further over-provisioned.
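To make the savings figures easier to interpret, the sketch below sizes a server pool so that utilization stays at or below a target, then compares the result with a static allocation sized for the peak. It is only an idealized illustration and not the QuID algorithm of [25]: it ignores server boot time and session draining, and the function names and the demand trace are hypothetical. Under this idealization a workload with peak to mean ratio r can save at most 1 − 1/r, that is 37.5% for r = 1.6 and 80% for r = 5, so the reported savings of 29% and 68% lie below those bounds, as one would expect once migration delays are accounted for.

```python
import math
from typing import List


def servers_needed(demand: float, target_util: float) -> int:
    """Smallest server count that keeps utilization at or below target.

    demand is the offered load measured in fully busy servers.
    """
    return max(1, math.ceil(demand / target_util))


def savings(demands: List[float], target_util: float = 0.5) -> float:
    """Fraction of server-intervals saved relative to static peak sizing.

    Assumes servers can be added and removed instantly (no boot or
    drain time), which the simulation in the text does not assume.
    """
    dynamic = [servers_needed(d, target_util) for d in demands]
    static = max(dynamic)  # a static allocation must cover the peak
    return 1.0 - sum(dynamic) / (static * len(dynamic))


# Hypothetical demand trace, in busy-server equivalents per interval.
trace = [1.0, 2.0, 4.0, 1.0]
saved = savings(trace, target_util=0.5)
```

For this trace the pool sizes are 2, 4, 8, and 2 servers, against a static allocation of 8, so half of the server-intervals are saved.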
Admission control for PDCs. Commercial applications that exploit infrastructure
on demand will expect some assurance that resources will be available
when they need them. An admission control approach is presented in [26]. The
approach characterizes demand profiles for applications based on factors such as
time of day. An aggregate of the demand profiles along with application demand
correlation information are used to estimate the number of resources needed to
satisfy requests for resources with a specific probability θ. Simulations suggest
that the technique is relatively insensitive to correlations in application demands
as long as they are taken into account when estimating the number of resources
required.
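The idea of sizing a resource pool for a probability θ can be sketched as follows. This is a simplified stand-in for the approach in [26], not a reproduction of it: instead of modelling demand profiles and correlations explicitly, it sums hypothetical per-application demand traces interval by interval, which captures their correlation implicitly, and takes an empirical quantile. All names and numbers are illustrative assumptions.

```python
import math
from typing import List


def resources_for_assurance(profiles: List[List[int]], theta: float) -> int:
    """Smallest capacity that covers the aggregate demand in at least
    a fraction theta of the observed intervals.

    profiles[a][t] is the number of resources application a needs in
    interval t; all profiles are assumed to have the same length.
    """
    aggregate = sorted(sum(slot) for slot in zip(*profiles))
    if not aggregate:
        return 0
    index = max(0, math.ceil(theta * len(aggregate)) - 1)
    return aggregate[index]


def admit(current: List[List[int]], candidate: List[int],
          capacity: int, theta: float) -> bool:
    """Admit the candidate only if the pool still meets the assurance."""
    return resources_for_assurance(current + [candidate], theta) <= capacity


# Two perfectly correlated applications versus two complementary ones.
correlated = resources_for_assurance([[1, 2, 3, 4], [1, 2, 3, 4]], 1.0)
complementary = resources_for_assurance([[1, 2, 3, 4], [4, 3, 2, 1]], 1.0)
```

The two calls show why correlation matters: the same pair of per-application profiles needs 8 resources when the peaks coincide and only 5 when they are complementary.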
Automated storage management. Reference [27] describes tools for automated
storage management. Methods are used to automate an initial storage
design – the layout of virtual disks over physical disks. Iterative techniques then
exploit online measurements to improve the design while the system operates.
Experiments showed performance results within 15% of that achieved by expert
administrators.
Co-allocation. A resource co-allocation technique is described in reference [29].
The method extends the Globus resource management architecture with a com-
ponent for acquiring groups of resources from multiple resource schedulers. A
two phase commit protocol can be used to ensure that either all or none of the
required resources are reserved.
Such a co-allocation mechanism is particularly important in the context of
service centric computing as it may be necessary to acquire resources from mul-
tiple resource utilities that include network providers and programmable data
centers.
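The all-or-none property of a two phase commit can be sketched in a few lines. The classes and method names below are hypothetical and are not the Globus interfaces of [29]; the sketch only shows the control flow, in which every scheduler first holds the requested resources tentatively and the holds are made firm only if all schedulers succeed.

```python
from typing import List, Tuple


class Scheduler:
    """Hypothetical resource scheduler with a fixed pool of free resources."""

    def __init__(self, name: str, free: int) -> None:
        self.name = name
        self.free = free
        self.held = 0       # tentatively reserved during phase one
        self.reserved = 0   # firmly reserved after commit

    def prepare(self, count: int) -> bool:
        """Phase one: tentatively hold resources if enough are free."""
        if count > self.free:
            return False
        self.free -= count
        self.held += count
        return True

    def commit(self) -> None:
        """Phase two: turn the tentative hold into a reservation."""
        self.reserved += self.held
        self.held = 0

    def abort(self) -> None:
        """Phase two: release the tentative hold."""
        self.free += self.held
        self.held = 0


def co_allocate(requests: List[Tuple[Scheduler, int]]) -> bool:
    """Reserve all requested resources or none of them."""
    prepared: List[Scheduler] = []
    for scheduler, count in requests:
        if scheduler.prepare(count):
            prepared.append(scheduler)
        else:
            for p in prepared:   # roll back every earlier hold
                p.abort()
            return False
    for p in prepared:
        p.commit()
    return True


network = Scheduler("network provider", free=2)
data_center = Scheduler("programmable data center", free=10)
first = co_allocate([(data_center, 8), (network, 3)])   # network too small
free_after_failure = data_center.free
second = co_allocate([(data_center, 8), (network, 2)])
```

The first request fails because the network provider cannot supply three resources, and the data center's tentative hold is released, so the second, smaller request succeeds.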
Wide area load balancing. A wide area load balancing system is described
in reference [30]. A goal of the work is to balance application demands over
servers within and across utilities using distributed, collaborative, self-control
mechanisms.
Thermal management for PDCs. Programmable cooling is a smart cooling
proposition achieved through modeling, metrology and controls: the real-time
temperature distribution is charted through a distributed sensor network and
the cooling is modulated accordingly. The cooling resources are dynamically
provisioned based on distributed sensing (power, air flow and temperature) in
the data center and numerical modeling [31][32][33]. The capacities of
compressors, condensers, and air moving devices are varied commensurate with
the heat loads present in the data center. For example, high heat loads in a
specific area of the data center prompt an increase in the opening of “smart”
cool air inlet vents and a change in the speed of air movers and compressors in
air conditioners.
In addition to this dynamic variation in cooling, distributed measurements
and thermal resource allocation policies may guide the provisioning of workload
within the data center. As an example we may choose to provision a resource that
results in the most efficient utilization of cooling resources. In yet another exam-
ple, workload allocation policies programmatically move workload and shut some
systems down in response to the failure of certain air conditioning infrastructure
thereby maintaining the reliability of the overall data center. Furthermore, in
the context of Grid computing, workload may be provisioned in a global network
of data centers based on the most cost effective energy available. For example,
workload may follow a diurnal pattern based on climate, running in Bangalore,
India at night to provide a more energy efficient condensing temperature for the
air conditioning vapor compression cycle. Additionally, the economics of energy
production around the
globe may be used to drive the choices for global load distribution.
5 Research Challenges
This section offers several performance related research challenges for service
centric computing. They address issues of data center design and resource man-
agement, resource management for planetary scale computing (federations of
infrastructure providers), and general control and validation for these large scale
systems. We use the term resource broadly to include information technology
and energy.
What is the most efficient, economical data center design?
– What high density, low power, high performance computing architectures
most economically support infrastructure as a service?
– What are the simple building blocks of processing, communications and stor-
age that support dynamic allocation at a data center level of granularity?
– What are the implications of commodity components on multi-system de-
signs?
What are the most effective performance management techniques for utility com-
puting?
– What measurement techniques/metrics are appropriate for large scale dis-
tributed environments?
– What automated techniques are appropriate for creating models of applica-
tions and infrastructure?
– How are models validated for this very dynamic world?
What are the most efficient dynamic resource management techniques?
– What techniques are appropriate for ensuring qualities of service within lay-
ers of shared infrastructures?
– What to do about federations of providers (co-allocation)?
– What techniques are appropriate for making good use of resources?
What control system techniques can be applied effectively to this scale and
dynamism?
– What automated reasoning techniques can eliminate the complexity of con-
trolling large scale systems?
– What control theoretic techniques are applicable to reactive and predictive
events?
– What time scales are appropriate for control?
– How are control measures and decisions coordinated across federated sys-
tems?
What is the science of large scale computing that provides probabilistic assurance
of large scale behavior based on small scale experiments?
– What is the equivalent to the aeronautical engineer’s wind tunnel?
– What behaviors scale linearly?
6 Summary and Remarks

This paper motivates and explains the concept of service centric computing as
an approach for next generation Internet computing. With service centric
computing, applications, resources, and infrastructure are offered as services. This is
essential for the support of commercial and ubiquitous computing applications
as it enables the reuse of application functions, server consolidation, and large
scale deployment.
As an architecture, service centric computing relies on middleware such as
Web services and Grid service architectures as open mechanisms for service
creation, interactions, discovery, and binding. Resource utilities offer access to
resources and infrastructure. Today’s resource utilities in Grid environments
typically offer resources in an infrastructure transparent manner. Commercial
applications can have explicit dependencies on networking topologies and their
relationship with servers, storage, and appliances. These require resource utilities
that offer infrastructure on demand.
We believe that there are many opportunities for research in this area. When
realized, service centric computing will enable new kinds of applications and
reduce barriers to market entry for small and medium sized organizations.
7 Trademarks
Sun and Sun Fire are trademarks of Sun Microsystems Inc., IBM is a trademark
of International Business Machines Corporation, Intel is a trademark of
Intel Corporation, VMware is a trademark of VMware Inc., HP Utility Data
Center with Controller Software is a trademark of Hewlett Packard Company,
Terraspring is a trademark of Terraspring, and Think Dynamics is a trademark
of Think Dynamics.
Acknowledgements. Thanks to Xiaoyun Zhu, Sharad Singhal, Jim Pruyne,
and Martin Arlitt of HP Labs for their helpful comments regarding this tutorial
paper.
References
1. www.webservices.org.
2. www.w3.org/XML.
3. www.w3.org/TR/SOAP.
4. www.w3.org/TR/wsdl.
5. www-106.ibm.com/developerworks/webservices/library/ws-wsilspec.html.
6. www.uddi.org.
7. www.globalgridforum.org.
8. Czajkowski K., Foster I., Karonis N., Kesselman C., Martin S., Smith W., and
Tuecke S.: A Resource Management Architecture for Metacomputing Systems.
JSSPP, 1998, 62-82.
9. Foster I., Kesselman C., Nick J., and Tuecke S.: The Physiology of the
Grid: An Open Grid Services Architecture for Distributed Systems Integration.
www.globus.org, January, 2002.
10. The Grid: Blueprint for a New Computing Infrastructure, Edited by Ian Foster
and Carl Kesselman, July 1998, ISBN 1-55860-475-8.
11. Krauter K., Buyya R., and Maheswaran M.: A taxonomy and survey of grid re-
source management systems for distributed computing. Software-Practice and Ex-
perience, vol. 32, no. 2, 2002, 135-164.
12. Zhou S.: LSF: Load sharing in large-scale heterogeneous distributed systems, Work-
shop on Cluster Computing, 1992.
13. Litzkow M., Livny M. and Mutka M.: Condor - A Hunter of Idle Workstations. Pro-
ceedings of the 8th International Conference on Distributed Computing Systems,
June, 1988, 104-111.
14. Natrajan A., Humphrey M., and Grimshaw A.: Grids: Harnessing Geographically-
Separated Resources in a Multi-Organisational Context. Proceedings of High Per-
formance Computing Systems, June, 2001.
15. Rolia J., Singhal S. and Friedrich R.: Adaptive Internet Data Centers. Proceedings
of the European Computer and eBusiness Conference (SSGRR), L’Aquila, Italy,
July 2000, http://www.ssgrr.it/en/ssgrr2000/papers/053.pdf.
16. www.hp.com.
17. www.sun.com.
18. www.ibm.com.
19. www.vmware.com.
20. www.intel.com.
21. HP Utility Data Center Architecture,
http://www.hp.com/solutions1/infrastructure/solutions/utilitydata/architecture/index.html.
22. www.terraspring.com.
23. www.thinkdynamics.com.
24. Appleby K., Fakhouri S., Fong L., Goldszmidt G. and Kalantar M.: Oceano –
SLA Based Management of a Computing Utility. Proceedings of the IFIP/IEEE
International Symposium on Integrated Network Management, May 2001.
25. Ranjan S., Rolia J., Zu H., and Knightly E.: QoS-Driven Server Migration for
Internet Data Centers. Proceedings of IWQoS 2002, May 2002, 3-12.
26. Rolia J., Zhu X., Arlitt M., and Andrzejak A.: Statistical Service Assurances for
Applications in Utility Grid Environments. HPL Technical Report, HPL-2002-155.
27. Anderson E., Hobbs M., Keeton K., Spence S., Uysal M., and Veitch A.: Hip-
podrome: running circles around storage administration. Conference on File and
Storage Technologies (FAST'02), 175-188, 28-30 January 2002, Monterey,
CA. (USENIX, Berkeley, CA.).
28. Borowsky E., Golding R., Jacobson P., Merchant A., Schreier L., Spasojevic M.,
and Wilkes J.: Capacity planning with phased workloads, WOSP, 1998, 199-207.
29. Foster I., Kesselman C., Lee C., Lindell R., Nahrstedt K., and Roy A.: A Dis-
tributed Resource Management Architecture that Supports Advance Reservations
and Co-Allocation. Proceedings of the International Workshop on Quality of Ser-
vice, 1999.
30. Andrzejak, A., Graupner, S., Kotov, V., and Trinks, H.: Self-Organizing Control
in Planetary-Scale Computing. IEEE International Symposium on Cluster Com-
puting and the Grid (CCGrid), 2nd Workshop on Agent-based Cluster and Grid
Computing (ACGC), May 21-24, 2002, Berlin.
31. Patel C., Bash C., Belady C., Stahl L., and Sullivan D.: Computational Fluid
Dynamics Modeling of High Compute Density Data Centers to Assure System Inlet
Air Specifications. Proceedings of IPACK’01 The Pacific Rim/ASME International
Electronic Packaging Technical Conference and Exhibition July 8-13, 2001, Kauai,
Hawaii, USA.
32. Patel, C.D., Sharma, R.K, Bash, C.E., Beitelmal, A: Thermal Considerations in
Cooling Large Scale High Compute Density Data Centers, ITherm 2002 - Eighth
Intersociety Conference on Thermal and Thermomechanical Phenomena in Elec-
tronic Systems. May 2002, San Diego, California.
33. Sharma, R.K, Bash, C.E., Patel, C.D.: Dimensionless Parameters for Evaluation of
Thermal Design and Performance of Large Scale Data Centers. Proceedings of the
8th ASME/AIAA Joint Thermophysics and Heat Transfer Conf., St. Louis, MO,
June 2002.
European DataGrid Project: Experiences of Deploying a
Large Scale Testbed for E-science Applications
Fabrizio Gagliardi¹, Bob Jones¹, Mario Reale², and Stephen Burke³
On behalf of the EU DataGrid Project
¹ CERN, European Particle Physics Laboratory,
CH-1211 Geneve 23, Switzerland
{Fabrizio.Gagliardi, Bob.Jones}@cern.ch
http://www.cern.ch
² INFN CNAF, Viale Berti-Pichat 6/2,
I-40127 Bologna, Italy
mario.reale@cnaf.infn.it
³ Rutherford Appleton Laboratory,
Chilton, Didcot, Oxon, UK
s.burke@rl.ac.uk
Abstract. The objective of the European DataGrid (EDG) project is to assist
the next generation of scientific exploration, which requires intensive
computation and analysis of shared large-scale datasets, from hundreds of
terabytes to petabytes, across widely distributed scientific communities. We see
these requirements emerging in many scientific disciplines, including physics,
biology, and earth sciences. Such sharing is made complicated by the
distributed nature of the resources to be used, the distributed nature of the
research communities, the size of the datasets and the limited network
bandwidth available. To address these problems we are building on emerging
computational Grid technologies to establish a research network that is
developing the technology components essential for the implementation of a
world-wide data and computational Grid on a scale not previously attempted.
An essential part of this project is the phased development and deployment of a
large-scale Grid testbed.
The primary goals of the first phase of the EDG testbed were: 1) to demonstrate
that the EDG software components could be integrated into a production-
quality computational Grid; 2) to allow the middleware developers to evaluate
the design and performance of their software; 3) to expose the technology to
end-users to give them hands-on experience; and 4) to facilitate interaction and
feedback between end-users and developers. This first testbed deployment was
achieved towards the end of 2001 and assessed during the successful European
Union review of the project on March 1, 2002. In this article we give an
overview of the current status and plans of the EDG project and describe the
distributed testbed.
Advances in distributed computing, high quality networks and powerful and cost-
effective commodity-based computing have given rise to the Grid computing
paradigm [6]. In the academic world, a major driver for Grid development is
collaborative science mediated by the use of computing technology, often referred to
as e-science. While scientists of many disciplines have been using computing
technology for decades (almost pre-dating computing science itself), e-Science
projects present fresh challenges for a number of reasons, such as the difficulty of
co-ordinating the use of widely distributed resources owned and controlled by many
organisations. The Grid introduces the concept of the Virtual Organisation (VO) as a
group of both users and computing resources from a number of real organisations
which is brought together to work on a particular project.
The EU DataGrid (EDG) is a project funded by the European Union
through the Framework V IST R&D programme (see www.eu-datagrid.org). There
are 21 partner organisations from 15 EU countries, with a total participation of over
200 people, for a period of three years starting in January 2001. The objectives of the
project are to support advanced scientific research within a Grid environment,
offering capabilities for intensive computation and analysis of shared large-scale
datasets, from hundreds of terabytes to petabytes, across widely distributed scientific
communities. Such requirements are emerging in many scientific disciplines,
including particle physics, biology, and earth sciences.
The EDG project has now reached its mid-point, since the project started on January
1st 2001 and the foreseen end of the project is on December 31st 2003. At this stage,
very encouraging results have already been achieved in terms of the major goals of
the project, which are the demonstration of the practical use of computational and
data Grids for wide and extended use by the high energy physics, bio-informatics and
earth observation communities.
A production quality testbed has been set up and implemented at a number of EDG
sites, while a separate development testbed addresses the need for rapid testing and
prototyping of the EDG middleware. The EDG production testbed consists currently
of ten sites, spread around Europe: at CERN (Geneva), INFN-CNAF (Bologna),
CC-IN2P3 (Lyon), NIKHEF (Amsterdam), INFN-TO (Torino), INFN-CT (Catania),
INFN-PD (Padova), ESA-ESRIN (Frascati), Imperial College (London), and RAL
(Oxfordshire). The EDG development testbed currently consists of four sites: CERN,
INFN-CNAF, NIKHEF, and RAL. The reference site for the EDG collaboration is at
CERN, where, before any official version of the EDG middleware is released, the
initial testing of the software is performed and the main functionalities are proven,
before distribution to the other development testbed sites.
The EDG collaboration is currently providing free, open source software based on the
Linux Red Hat 6.2 platform. A range of standard machine profiles is supported
(Computing Element, Storage Element, User Interface, Resource Broker, Worker
Node, Network Monitoring Node). The testbed provides a set of common shared
services available to all certified users with valid X.509 PKI certificates issued by a
Certificate Authority trusted by EDG. A set of tools is provided to handle the
automatic update of the grid-map files on all hosts belonging to the testbed sites
which allow users to be authorised to use the resources. A number of VOs have been
defined for the various research groups involved in the project. Each VO has an
authorisation server (using LDAP technology) to define its members, and a Replica
Catalogue to store the location of its files.
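As an illustration of what such an update tool has to produce, the sketch below builds the text of a grid-map file from per-VO membership lists. A Globus grid-map file maps a quoted certificate subject (distinguished name) to a local account, one entry per line. Everything else here is an assumption made for the example: the function name, the in-memory membership lists standing in for the VO authorisation servers, the account names, and the rule that a user who belongs to several VOs is mapped through the first VO in alphabetical order.

```python
from typing import Dict, List


def build_grid_mapfile(vo_members: Dict[str, List[str]],
                       vo_accounts: Dict[str, str]) -> str:
    """Return grid-map file text: one '"<subject DN>" <account>' per line.

    vo_members maps a VO name to the certificate subjects of its members
    (a stand-in for a query to the VO authorisation server).
    vo_accounts maps a VO name to the local account used for its members.
    """
    lines = []
    seen = set()
    for vo in sorted(vo_members):
        for dn in vo_members[vo]:
            if dn in seen:   # keep only the first mapping for a subject
                continue
            seen.add(dn)
            lines.append(f'"{dn}" {vo_accounts[vo]}')
    return "\n".join(lines) + "\n"


# Hypothetical VO membership and account names.
members = {
    "cms": ["/O=Grid/CN=Alice"],
    "atlas": ["/O=Grid/CN=Bob", "/O=Grid/CN=Alice"],
}
accounts = {"cms": "cmsuser", "atlas": "atlasuser"}
mapfile_text = build_grid_mapfile(members, accounts)
```

A real deployment would regenerate this text periodically on every host and replace the existing file atomically, so that changes in VO membership propagate without manual editing.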
In this article, besides this short introduction about the history of the project and its
current status, we give an overview of the technology developed by the project so far.
This is a concrete illustration of the level of maturity reached by Grid technologies to
address the task of high throughput computing for distributed Virtual Organisations.
The work of the project is divided into functional areas: workload management, data
management, grid monitoring and information systems, fabric management, mass
data storage, testbed operation, and network monitoring. In addition there are user
communities drawn from high energy physics, earth observation and biomedical
applications. In November-December 2001 the first testbed was set up at CERN,
merging and collecting the development work performed by the various middleware
developers, and the first release of the EDG software was deployed and successfully
validated. The project has been congratulated “for exceeding expectations” by the
European Union reviewers on March 1st, 2002, during the first official EU review.
2 The European DataGrid Middleware Architecture
The EDG architecture is based on the Grid architecture proposed by Ian Foster and
Carl Kesselman [6], with a reduced number of implemented services.
Sixteen services have been implemented by the middleware developers, based on
original coding for some services and on the usage of the Globus 2 toolkit (see
www.globus.org) for basic Grid infrastructure services: authentication (GSI), secure
file transfer (GridFTP), information systems (MDS), job submission (GRAM) and the
Globus Replica Catalogue. In addition the job submission system uses software from
the Condor-G project [8]. The middleware also relies on general open source software
such as OpenLDAP.
The middleware development is divided into six functional areas: workload
management, data management, Grid Monitoring and Information Systems, fabric
management, mass data storage, and network monitoring. A sketch of the essential
EDG architecture is shown in Figure 1 [1], where the relationship between the
Operating System, Globus tools, the EDG middleware and the applications is shown.
The EDG architecture is therefore a multi-layered architecture. At the lowest level is
the operating system. Globus provides the basic services for secure and authenticated
use of both operating system and network connections to safely transfer files and data
and allow interoperation of distributed services. The EDG middleware uses the
Globus services, and interfaces to the highest layer, the user applications running on
the Grid.
Fig. 1. The schematic layered EDG architecture: the Globus hourglass (layers from top
to bottom: application layer with ALICE, ATLAS, CMS, LHCb, and Other; VO common
application layer with LHC and Other; high level GRID middleware; GLOBUS 2.0 basic
services; OS & Net services)
The multi-layered EDG Grid architecture is shown in Figure 2 and Figure 3, which
show the different layers from bottom to top, namely: the Fabric layer, the underlying
Grid Services, the Collective Services, the Grid Application layer and, at the top, a
local application accessing a remote client machine. Figure 3 groups together and
identifies the main EDG services. At the top of the whole system, the local
application and the local database represent the end user machine, which executes an
application requesting Grid services, either submitting a Grid job or requesting a file
through the interfaces to the list of files stored on the Grid and published in a Replica
Catalogue.
2.1 Workload Management System (WMS)
The goal of the Workload Management System is to implement an architecture for
distributed scheduling and resource management in a Grid environment. It provides to
the Grid users a set of tools to submit their jobs, have them executed on the
distributed Computing Elements (a Grid resource mapped to an underlying batch
system), get information about their status, retrieve their output, and allow them to
access Grid resources in an optimal way (optimizing CPU usage, reducing file
transfer time and cost, and balancing access to resources between users). It deals with
the Job Manager of the Grid Application layer and the Grid Scheduler in the
Collective Services layer. A functional view of the whole WMS system is represented
in Figure 4.
[Figure 2, layers from top to bottom:
Local Application, Local Database;
Grid Application Layer: Job Mgmt., Data Mgmt., Metadata Mgmt., Object to File Map.;
Collective Services: Grid Scheduler, Replica Manager, Information Monitoring;
Underlying Grid Services: SQL Database Server, Comp. Elem. Services, Storage Elem.
Services, Replica Catalog, Author. Authen. and Acc., Service Index;
Fabric services: Resource Mgmt., Config. Mgmt., Monitor. and Fault Tolerance, Node
Installation & Mgmt., Fabric Storage Mgmt.]
Fig. 2. The detailed multi layered EDG GRID architecture
The WMS is currently composed of the following parts:
- User Interface (UI): The access point for the Grid user. A job is defined using the
JDL language (see below), which specifies the input data files, the code to execute,
the required software environment, and lists of input and output files to be
transferred with the job. The user can also control the way in which the broker
chooses the best-matching resource. The job is submitted to the Resource Broker
using a command line interface or a programmatic API; there are also several
groups developing graphical interfaces.
[Figure 3, layers from top to bottom:
Application Areas: Physics Appl. (WP8), Earth Observation Appl. (WP9), Biology Appl. (WP10);
DataGrid Services: Workload Management (WP1), Data Management (WP2), Monitoring Services (WP3);
Core Middleware: Globus Middleware Services (Information, Security, ...);
Physical Fabric: Fabric Management (WP4), Networking (WP7), Mass Storage Management (WP5).]
Fig. 3. The EDG service architecture
- Resource Broker (RB): This performs match-making between the requirements of
a job and the available resources, and attempts to schedule the jobs in an optimal
way, taking into account the data location and the requirements specified by the
user. The information about available resources is read dynamically from the
Information and Monitoring System. The scheduling and match-making algorithms
used by the RB are the key to making efficient use of Grid resources. In
performing the match-making the RB queries the Replica Catalogue, which is a
service used to resolve logical file names (LFN, the generic name of a file) into
physical file names (PFN, which gives the physical location and name of a
particular file replica). The job can then be sent to the site which minimises the
cost of network bandwidth to access the files.
- Job Submission System (JSS): This is a wrapper for Condor-G [8], interfacing the
Grid to a Local Resource Management System (LRMS), usually a batch system
like PBS, LSF or BQS. Condor-G is a Condor-Globus joint project, which
combines the inter-domain resource management protocols of the Globus Toolkit
with the intra-domain resource and job management methods of Condor to allow
high throughput computing in multi-domain environments.
- Information Index (II): This is a Globus MDS index which collects information
from the Globus GRIS information servers running on the various Grid resources,
published using LDAP, and read by the RB to perform the match-making.
Information items are both static (installed software, number of available CPUs
etc) and dynamic (total number of running jobs, current available disk space etc).
The information is cached for a short period to improve performance.
- Logging and Bookkeeping (LB): The Logging and Bookkeeping service stores a
variety of information about the status and history of submitted jobs using a
MySQL database.
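The match-making step performed by the Resource Broker can be illustrated with a minimal Python sketch. It is not the EDG algorithm: the class ComputingElement, the function match, the host names, and the ranking rule (most input files on a close Storage Element first, then most free CPUs) are assumptions made here for illustration, and a plain dictionary stands in for the Replica Catalogue lookup.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    """Hypothetical snapshot of one resource as read from the information index."""
    name: str
    free_cpus: int
    os: str
    close_storage: set = field(default_factory=set)  # Storage Elements local to this CE


def match(job_os, input_lfns, replica_catalogue, resources):
    """Return the Computing Elements that satisfy the job, best first.

    replica_catalogue maps a logical file name to the set of Storage Elements
    holding a replica (a stand-in for the LFN to PFN resolution).
    """
    candidates = []
    for ce in resources:
        # Requirements: the CE must offer the requested OS and at least one free CPU.
        if ce.os != job_os or ce.free_cpus < 1:
            continue
        # Data location: count input files with a replica on a close Storage Element.
        local_files = sum(
            1 for lfn in input_lfns
            if replica_catalogue.get(lfn, set()) & ce.close_storage
        )
        candidates.append((local_files, ce.free_cpus, ce))
    # Rank: most local input files first, then most free CPUs.
    candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)
    return [ce for _, _, ce in candidates]


catalogue = {"lfn:run42.dat": {"se.cern.example"}}
resources = [
    ComputingElement("ce.nikhef.example", 8, "linux", {"se.nikhef.example"}),
    ComputingElement("ce.cern.example", 2, "linux", {"se.cern.example"}),
    ComputingElement("ce.busy.example", 0, "linux", {"se.cern.example"}),
]
ranked = match("linux", ["lfn:run42.dat"], catalogue, resources)
```

In this example the resource with no free CPU is filtered out, and the resource close to the replica is ranked ahead of the one with more free CPUs, which mirrors the idea of sending the job to the site that minimises the cost of accessing its files.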
Job Description Language (JDL)
The JDL allows the various components of the Grid Scheduler to communicate
requirements concerning the job execution. Examples of such requirements are:
- Specification of the executable program or script to be run and arguments to
be passed to it, and files to be used for the standard input, output and error
streams.
- Specification of files that should be shipped with the job via Input and
Output Sandboxes.
- A list of input files and the access protocols the job is prepared to use to read
them.
- Specification of the Replica Catalogue to be searched for physical instances
of the requested input files.
- Requirements on the computing environment (OS, memory, free disk space,
software environment etc) in which the job will run.
- Expected resource consumption (CPU time, output file sizes etc).
- A ranking expression used to decide between resources which match the
other requirements.
The classified advertisements (ClassAds) language defined by the Condor project has
been adopted for the Job Description Language because it has all the required
properties.
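To make the shape of such a job description concrete, the following Python sketch renders a dictionary as ClassAd-style "Name = value;" statements. It is only an illustration: the helper names Expr and to_jdl are invented here, the file names are made up, and the attribute names and the expressions "other.OpSys" and "other.FreeCPUs" are assumptions that have not been checked against the EDG JDL specification or the information schema.

```python
class Expr(str):
    """Marks a value as a raw expression that must not be quoted."""


def _render(value):
    # Expressions are emitted verbatim, strings are quoted, lists become {...}.
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(_render(v) for v in value) + "}"
    return str(value)


def to_jdl(job):
    """Render a dict as one 'Name = value;' statement per line."""
    return "\n".join(f"{name} = {_render(value)};" for name, value in job.items())


job = {
    "Executable": "analyse.sh",                 # program or script to run
    "Arguments": "run42",
    "StdOutput": "job.out",
    "StdError": "job.err",
    "InputSandbox": ["analyse.sh", "cuts.conf"],   # files shipped with the job
    "OutputSandbox": ["job.out", "job.err"],       # files retrieved afterwards
    "Requirements": Expr('other.OpSys == "linux"'),  # assumed attribute name
    "Rank": Expr("other.FreeCPUs"),                  # assumed attribute name
}
jdl_text = to_jdl(job)
print(jdl_text)
```

Keeping requirement and rank expressions unquoted matters because they are evaluated against the attributes published by each resource, whereas quoted values are plain strings.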
In order for a user to have a job correctly executed on a worker node of an
available Computing Element, the user's credentials have to be transmitted by the
creation of a proxy certificate. A user issues a grid-proxy-init command on a user
interface machine to create an X.509 PKI proxy certificate using their locally stored
private key. An authentication request containing the proxy public and private keys
and the user's public key is sent to a server; the server gets the request and creates a
coded message by means of the user's public key, sending it back to the user process
on the User Interface machine. This message is decoded by means of the user's
private key and sent back again to the server (in this case normally the Resource
Broker). When the server gets the correctly decoded message it can be sure about the
user's identity, so that an authenticated channel can be established and the user
credentials can be delegated to the broker.
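The challenge and response exchange described above can be sketched in Python with a toy RSA key pair. This is only an illustration of the principle (a message coded with the public key can be decoded only with the matching private key); it is not the GSI protocol, the function names are invented here, and the textbook-sized numbers offer no security.

```python
import secrets

# Toy RSA key pair (textbook-sized primes, insecure, for illustration only).
P, Q = 61, 53
N = P * Q                              # modulus
E = 17                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (requires Python 3.8+)


def server_make_challenge(public_key):
    """Server side: pick a random message and code it with the user's public key."""
    e, n = public_key
    message = secrets.randbelow(n - 2) + 2   # random value in [2, n - 1]
    return message, pow(message, e, n)


def user_answer(challenge, private_key):
    """User side: decode the challenge with the private key."""
    d, n = private_key
    return pow(challenge, d, n)


def server_verify(expected, answer):
    """Server side: the user is accepted only if the decoded message matches."""
    return expected == answer


expected, challenge = server_make_challenge((E, N))
answer = user_answer(challenge, (D, N))
authenticated = server_verify(expected, answer)
```

Only the holder of the private exponent can return the original message, which is why a correct answer convinces the server of the user's identity.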
Users use Condor ClassAds-like statements inside a JDL (Job Description Language)
file to describe the job they want to be executed by the Grid. This includes a list of
input data residing on Storage Elements (Grid-enabled disk or tape storage), and
places requirements on the features of the compute nodes on which the job will
execute. These can be chosen from the set of information defined by the schema used
by the information system, and include such things as operating system version, CPU
speed, available memory etc.
Fig. 4. The Workload Management System and its components: interaction with other EDG
elements. The future component HLR (Home Location Register) is also shown.
Users can define an Input Sandbox, which is a set of files transferred to a Worker
Node by means of GridFTP by the Resource Broker, so that any file required for the
job to be executed (including the executable itself if necessary) can be sent to the
local disk of the machine where the job will run. Similarly, the user can specify an
Output Sandbox, which is a set of files to be retrieved from the Worker Node after the
job finishes (other files are deleted). The files in the Output Sandbox are stored on the
RB node until the user requests them to be transferred back to a UI machine.
The JDL can also specify a particular required software environment using a set of
user-defined strings to identify particular features of the run-time environment (for
example, locally installed application software).
A special file, called the BrokerInfo file, is created by the Resource Broker to enable a
running job to be aware of the choices made in the matchmaking, in particular about
the Storage Element(s) local to the chosen Computing Element, and the way to access
the requested input files. The BrokerInfo file is transferred to the Worker Node along
with the Input Sandbox, and can be read directly or with an API or command-line
tools.
Users have at their disposal a set of commands to handle jobs by means of a
command line interface installed on a User Interface machine, on which they have a
normal login account and have installed their X509 certificate. They can submit a job,
query its status, get logging information about the job history, cancel a job, be notified
via email of the job's execution, and retrieve the job output. When a job is submitted
to the system the user gets back a Grid-wide unique handle by means of which the job
can be identified in other commands.
2.2 Data Management System (DMS)
The goal of the Data Management System is to specify, develop, integrate and test
tools and middleware to coherently manage and share petabyte-scale information
volumes in high-throughput production-quality grid environments. The emphasis is
on automation, ease of use, scalability, uniformity, transparency and heterogeneity.
The DMS will make it possible to securely access massive amounts of data in a
universal global name space, to move and replicate data at high speed from one
geographical site to another, and to manage synchronisation of distributed replicas of
files or databases. Generic interfaces to heterogeneous mass storage management
systems will enable seamless and efficient integration of distributed resources. The
main components of the EDG Data Management System, currently provided or in
development, are as follows:
- Replica Manager: This is still under development, but it will manage the creation
of file replicas by copying from one Storage Element to another, optimising the use
of network bandwidth. It will interface with the Replica Catalogue service to allow
Grid users to keep track of the locations of their files.
- Replica Catalogue: This is a Grid service used to resolve Logical File Names into
a set of corresponding Physical File Names which locate each replica of a file. This
provides a Grid-wide file catalogue for the members of a given Virtual
Organisation.
- GDMP: The GRID Data Mirroring Package is used to automatically mirror file
replicas from one Storage Element to a set of other subscribed sites. It is also
currently used as a prototype of the general Replica Manager service.
- Spitfire: This provides a Grid-enabled interface for access to relational databases.
This will be used within the data management middleware to implement the
Replica Catalogue, but is also available for general use.
The Replica Manager
The EDG Replica Manager will allow users and running jobs to make copies of files
between different Storage Elements, simultaneously updating the Replica Catalogue,
and to optimise the creation of file replicas by using network performance
information and cost functions, according to the file location and size. It will be a
distributed system, i.e. different instances of the Replica Manager will be running on
different sites, and will be synchronised to local Replica Catalogues, which will be
interconnected by the Replica Location Index. The Replica Manager functionality will
be available both with APIs available to running applications and by a command line
interface available to users. The Replica Manager is responsible for computing the
cost estimates for replica creation. Information for cost estimates, such as network
bandwidth, staging times and Storage Element load indicators, will be gathered from
the Grid Information and Monitoring System.
The Replica Catalogue
The Replica Catalogue has as a primary goal the resolution of Logical File Names
into Physical File Names, to allow the location of the physical file(s) which can be
accessed most efficiently by a job. It is currently implemented using Globus software
by means of a single LDAP server running on a dedicated machine. In future it will be
implemented by a distributed system with a local catalogue on each Storage Element
and a system of Replica Location Indices to aggregate the information from many
sites. In order to achieve maximum flexibility the transport protocol, query
mechanism, and database backend technology will be decoupled, allowing the
implementation of a Replica Catalogue server using multiple database technologies
(such as RDBMSs, LDAP-based databases, or flat files). APIs and protocols between
client and server are required, and will be provided in future releases of the EDG
middleware. The use of mechanisms specific to a particular database is excluded.
Also the query technology will not be tied to a particular protocol, such as SQL or
LDAP. The use of GSI-enabled HTTPS for transport and XML for input/output data
representation is foreseen. HTTPS and XML are among the most widely used industry
standards for this type of system.
The Replica Manager, Grid users and Grid services like the scheduler (WMS) can
access the Replica Catalogue information via APIs. The WMS makes a query to the
RC in the first part of the matchmaking process, in which a target computing element
for the execution of a job is chosen according to the accessibility of a Storage Element
containing the required input files. To do so, the WMS has to convert logical file
names into physical file names. Both logical and physical files can carry additional
metadata in the form of "attributes". Logical file attributes may include items such as
file size, CRC checksum, file type and file creation timestamps.
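The resolution of logical file names into physical file names, and the query made during matchmaking, can be sketched with a small in-memory catalogue in Python. This is a toy model, not the Globus or EDG Replica Catalogue API: the class and function names, the "host/path" form of a PFN, and the example host names are assumptions made for illustration.

```python
class ReplicaCatalogue:
    """In-memory stand-in for a catalogue mapping LFNs to PFNs."""

    def __init__(self):
        self._replicas = {}     # LFN -> list of PFNs
        self._attributes = {}   # LFN -> dict of logical file attributes

    def register(self, lfn, pfn, **attributes):
        """Record one more replica of a logical file, with optional attributes."""
        self._replicas.setdefault(lfn, []).append(pfn)
        self._attributes.setdefault(lfn, {}).update(attributes)

    def resolve(self, lfn):
        """Return every physical file name known for a logical file name."""
        return list(self._replicas.get(lfn, []))

    def attributes(self, lfn):
        return dict(self._attributes.get(lfn, {}))


def storage_host(pfn):
    """Extract the host part of a PFN written as 'host/path' (format assumed here)."""
    return pfn.split("/", 1)[0]


def hosts_with_all_files(catalogue, lfns):
    """Storage hosts holding a replica of every requested logical file."""
    host_sets = [{storage_host(p) for p in catalogue.resolve(lfn)} for lfn in lfns]
    return set.intersection(*host_sets) if host_sets else set()


rc = ReplicaCatalogue()
rc.register("lfn:a", "se1.example/data/a", size=10, filetype="raw")
rc.register("lfn:a", "se2.example/data/a")
rc.register("lfn:b", "se2.example/data/b")
usable = hosts_with_all_files(rc, ["lfn:a", "lfn:b"])
```

A scheduler could then restrict its candidate Computing Elements to those close to one of the returned storage hosts before applying any ranking.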
A centralised Replica Catalogue was chosen for initial deployment, this being the
simplest implementation. The Globus Replica Catalogue, based on LDAP directories,
has been used in the testbed so far. One dedicated LDAP server is assigned to each
Virtual Organisation; four of these reside on a server machine at NIKHEF, two at
CNAF, and one at CERN. Users interact with the Replica Catalogue mainly via the
previously discussed Replica Catalogue and BrokerInfo APIs.
GDMP
The GDMP client-server software system is a generic file replication tool that
replicates files securely and efficiently from one site to another in a Data Grid
environment using several Globus Grid tools. In addition, it manages replica
catalogue entries for file replicas, and thus maintains a consistent view of names and
locations of replicated files. Any file format can be supported for file transfer using
plugins for pre- and post-processing, and for Objectivity database files a plugin is
supplied.
GDMP allows mirroring of uncatalogued user data between Storage Elements.
Registration of user data into the Replica Catalogue is also possible via the Replica
Catalogue API. The basic concept is that client SEs subscribe to a source SE in which
they have interest. The clients will then be notified of new files entered in the
catalogue of the subscribed server, and can then make copies of required files,
automatically updating the Replica Catalogue if necessary.
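The subscription model described above can be sketched in a few lines of Python. This is a toy model rather than the GDMP software: the class StorageElement, its methods, and the "wanted" filter are invented for illustration, a set of file names stands in for the actual transfer, and a dictionary stands in for the Replica Catalogue.

```python
class StorageElement:
    """Toy Storage Element that can publish files and mirror files from others."""

    def __init__(self, name, wanted=lambda filename: True):
        self.name = name
        self.files = set()
        self.subscribers = []
        self.wanted = wanted   # decides which announced files this site copies

    def subscribe(self, client):
        """Register a client SE that wants to hear about new files here."""
        self.subscribers.append(client)

    def publish(self, filename, catalogue):
        """Add a new file locally, record it, and notify every subscriber."""
        self.files.add(filename)
        catalogue.setdefault(filename, set()).add(self.name)
        for client in self.subscribers:
            client.notify(self, filename, catalogue)

    def notify(self, source, filename, catalogue):
        """Copy a newly announced file if it is wanted, then update the catalogue."""
        if self.wanted(filename) and filename in source.files:
            self.files.add(filename)   # stands in for the real file transfer
            catalogue.setdefault(filename, set()).add(self.name)


catalogue = {}
source = StorageElement("se.source.example")
mirror = StorageElement("se.mirror.example", wanted=lambda f: f.endswith(".dat"))
source.subscribe(mirror)
source.publish("run1.dat", catalogue)
source.publish("notes.txt", catalogue)
```

After these calls the catalogue lists two locations for run1.dat and one for notes.txt, since the mirror site only copies the files it requires.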
Spitfire
Spitfire is a secure, Grid-enabled interface to a relational database. Spitfire provides
secure query access to remote databases through the Grid using Globus GSI
authentication.
2.3 Grid Monitoring and Information Systems
The EDG Information Systems middleware implements a complete infrastructure to
enable end-user and administrator access to status and error information in the Grid
environment, and provides an environment in which application monitoring can be
carried out. This permits job performance optimisation as well as allowing for
problem tracing, and is crucial to facilitating high performance Grid computing. The
goal is to provide easy access to current and archived information about the Grid
itself (information about resources - Computing Elements, Storage Elements and the
Network), for which the Globus MDS is a common solution, about job status (e.g. as
implemented by the WMS Logging and Bookkeeping service) and about user
applications running on the Grid, e.g. for performance monitoring. The main
components are as follows:
- MDS: MDS is the Globus Monitoring and Discovery Service, based on soft-state
registration protocols and LDAP aggregate directory services. Each resource runs a
GRIS (Grid Resource Information Server) publishing local information as an
LDAP directory. These servers are in turn registered to a hierarchy of GIISs (Grid
Information Index Servers), which aggregate the information and again publish it
as an LDAP directory.
- Ftree: Ftree is an EDG-developed alternative to the Globus LDAP backend with
improved caching over the code in the Globus 1 toolkit.
- R-GMA: R-GMA is a relational GMA (Grid Monitoring Architecture)
implementation which makes information from producers available to consumers
as relations (tables). It also uses relations to handle the registration of producers.
R-GMA is consistent with GMA principles.
- GRM/PROVE: GRM/PROVE is an application monitoring and visualisation tool of
the P-GRADE graphical parallel programming environment, modified for
application monitoring in the DataGrid environment. The instrumentation library
of GRM is generalised for a flexible trace event specification. The components of
GRM will be connected to R-GMA using its Producer and Consumer APIs.
A number of alternatives, MDS, Ftree and R-GMA, are being considered as the basis
of the final EDG information service. These implementations are being evaluated and
compared using a set of performance, scalability and reliability criteria to determine
which is the most suitable for deployment.
In the current testbed all relevant Grid elements run a GRIS, which carries the
information for that element to an MDS GIIS where the information is collected, to be
queried by the Resource Broker and other Grid servers.
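The flow of information from per-resource servers to an aggregating index can be sketched in Python as follows. This is a simplified model that does not use LDAP or any Globus code: the class names merely echo the roles of a GRIS and a GIIS, and the probe callback, the time-to-live parameter and the attribute name are assumptions made for illustration.

```python
import time


class GRIS:
    """Publishes the current attributes of a single resource."""

    def __init__(self, resource, probe):
        self.resource = resource
        self.probe = probe   # callable returning the resource's attribute dict

    def query(self):
        return {self.resource: self.probe()}


class GIIS:
    """Aggregates registered servers and caches the merged view for a short time."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.children = []
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = None
        self._cached_at = None

    def register(self, child):
        """Register a GRIS, or another GIIS to build a hierarchy."""
        self.children.append(child)
        self._cache = None   # force a refresh on the next query

    def query(self):
        now = self.clock()
        if self._cache is None or now - self._cached_at > self.ttl:
            merged = {}
            for child in self.children:
                merged.update(child.query())
            self._cache, self._cached_at = merged, now
        return dict(self._cache)


index = GIIS(ttl_seconds=30.0)
index.register(GRIS("ce.example", lambda: {"running_jobs": 3}))
snapshot = index.query()
```

Caching trades freshness for speed: a broker querying the index may see dynamic values, such as the number of running jobs, that are up to one time-to-live period old.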
2.4 EDG Fabric Installation and Job Management Tools
The EDG collaboration has developed a complete set of tools for the management of
PC farms (fabrics), in order to make the installation and configuration of the various
nodes automatic and easy for the site managers managing a testbed site, and for the
control of jobs on the Worker Nodes in the fabric. The main tasks are:
User Job Control and Management (Grid and local jobs) on fabric batch and/or
interactive CPU services. There are two branches:
- The Gridification subsystem provides the interface from the Grid to the resources
available inside a fabric for batch and interactive CPU services. It provides the
interface for job submission/control and information publication to the Grid
services. It also provides functionality for local authentication and policy-based
authorisation, and mapping of Grid credentials to local credentials.
- The Resource Management subsystem is a layer on top of the batch and
interactive services (LRMS). While the Grid Resource Broker manages workload
distribution between fabrics, the Resource Management subsystem manages the
workload distribution and resource sharing of all batch and interactive services
inside a fabric, according to defined policies and user quota allocations.
Automated System Administration for the automatic installation and
configuration of computing nodes. These three subsystems are designed for the use
of system administrators and operators to perform system installation, configuration
and maintenance:
- Configuration Management provides the components to manage and store
centrally all fabric configuration information. This includes the configuration of all
EDG subsystems as well as information about the fabric hardware, systems and
services.
- Installation Management handles the initial installation of computing fabric
nodes. It also handles software distribution, configuration and maintenance
according to information stored in the Configuration Management subsystem.
- Fabric Monitoring and Fault Tolerance provides the necessary components for
gathering, storing and retrieving performance, functional, setup and environmental
data for all fabric elements. It also provides the means to correlate that data and
execute corrective actions if problems are identified.
The fabric installation and configuration management tools are based on a remote
install and configuration tool called LCFG (Local Configurator), which, by means of
a server, installs and configures remote clients, starting from scratch, using a network
connection to download the required RPM files for the installation, after using a disk
to load a boot kernel on the client machines.
The basic architectural structure and function of LCFG are represented in Figure 5
and are as follows: abstract configuration parameters are stored in a central repository
located in the LCFG server. Scripts on the host machine (LCFG client) read these
configuration parameters and either generate traditional configuration files, or directly
manipulate various services. A daemon in the LCFG server (mkxprof) polls for
changes in the source files and converts them into XML profiles, one profile per client
node. The XML profiles are then published on a web server. LCFG clients can be
configured to poll at regular intervals, or to receive automatic change notifications, or
they can fetch new profiles in response to an explicit command. A daemon in each
LCFG client (rdxprof) then reads its associated XML profile from the web server and
caches it locally (DBM file). LCFG scripts access the local cache to extract the
configuration values and execute changes accordingly.
[Figure: LCFG server (LCFG configuration files, mkxprof, web server, one XML profile
per client) connected over HTTP to an LCFG client (rdxprof, ldxprof, DBM file,
generic component, LCFG components)]
Fig. 5. LCFG internal operation
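The flow described above (source parameters compiled into one XML profile per node,
then fetched and cached by each client) can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the function names, the XML layout
and the use of plain dictionaries in place of the web server and the DBM cache are
hypothetical and are not taken from LCFG itself.

```python
import xml.etree.ElementTree as ET


def compile_profiles(source):
    """Server side (the role played by mkxprof): build one XML profile per node.

    `source` maps a node name to a dict of configuration parameters.
    The <profile>/<param> layout is an assumption made for this sketch.
    """
    profiles = {}
    for node, params in source.items():
        root = ET.Element("profile", {"node": node})
        for key, value in sorted(params.items()):
            ET.SubElement(root, "param", {"name": key}).text = str(value)
        profiles[node] = ET.tostring(root, encoding="unicode")
    return profiles


def fetch_profile(web_server, node, cache):
    """Client side (the role played by rdxprof): read the profile and cache it.

    `web_server` is a dict standing in for the published profiles and `cache`
    is a dict standing in for the local DBM file. Returns the sorted names of
    parameters that are new or whose value changed since the last fetch.
    """
    root = ET.fromstring(web_server[node])
    new = {p.get("name"): p.text for p in root.findall("param")}
    changed = sorted(k for k in new if cache.get(k) != new[k])
    cache.clear()
    cache.update(new)
    return changed
```

A component script would then act only on the returned parameter names, which mirrors
the idea that LCFG scripts read the local cache and apply changes accordingly.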
2.5 The Storage Element

The Storage Element has an important role in the storage of data and the management
of files in the Grid domain, and EDG is working on its definition, design, software
development, setup and testing.

A Storage Element is a complete Grid-enabled interface to a Mass Storage
Management System, tape or disk based, so that mass storage of files can be almost
completely transparent to Grid users. A user should not need to know anything about
the particular storage system available locally to a given Grid resource, and should
only be required to request that files be read or written using a common
interface. All existing mass storage systems used at testbed sites will be interfaced to
the Grid, so that their use will be completely transparent and the authorisation of users
to use the system will be in terms of general quantities like space used or storage
duration.

The procedures for accessing files are still in the development phase. The main
achievements to date have been the definition of the architecture and design for the
Storage Element, collaboration with Globus on GridFTP/RFIO access, collaboration
with PPDG on a control API, staging from and to the CASTOR tape system at CERN,
and an interface to GDMP. Initially the supported storage interfaces will be UNIX
disk systems, HPSS (High Performance Storage System), CASTOR (through RFIO),
and remote access via the Globus GridFTP protocol. Local file access within a site
will also be available using Unix file access, e.g. with NFS or AFS. EDG is also
developing a grid-aware Unix filing system with ownership and access control based
on Grid certificates rather than local Unix accounts.
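The idea of a single interface hiding the local storage system, with authorisation
expressed as a general quantity such as space used, can be illustrated with a small
Python sketch. All class and method names here are hypothetical and chosen for the
illustration only; they do not reproduce the EDG Storage Element API, and the
tape-like backend is a toy stand-in rather than a model of CASTOR or HPSS.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """What every local storage system must offer to the common interface."""

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, name: str) -> bytes: ...


class DiskBackend(StorageBackend):
    """Plain files under a root directory (stands in for a UNIX disk system)."""

    def __init__(self, root: str) -> None:
        self.root = root

    def write(self, name, data):
        with open(os.path.join(self.root, name), "wb") as f:
            f.write(data)

    def read(self, name):
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()


class StagedBackend(StorageBackend):
    """Toy tape-like store: a file is staged to a disk cache on first read."""

    def __init__(self) -> None:
        self.tape = {}
        self.cache = {}

    def write(self, name, data):
        self.tape[name] = data
        self.cache.pop(name, None)  # drop any stale staged copy

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.tape[name]  # simulated stage-in
        return self.cache[name]


class StorageElement:
    """Common interface; authorisation is expressed as a space quota in bytes."""

    def __init__(self, backend: StorageBackend, quota_bytes: int) -> None:
        self.backend = backend
        self.quota_bytes = quota_bytes
        self.sizes = {}  # file name -> bytes currently stored

    def put(self, name: str, data: bytes) -> None:
        # Space used by all other files; overwriting a file frees its old size.
        used = sum(self.sizes.values()) - self.sizes.get(name, 0)
        if used + len(data) > self.quota_bytes:
            raise PermissionError("space quota exceeded")
        self.backend.write(name, data)
        self.sizes[name] = len(data)

    def get(self, name: str) -> bytes:
        return self.backend.read(name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for backend in (DiskBackend(root), StagedBackend()):
            se = StorageElement(backend, quota_bytes=16)
            se.put("run1.dat", b"events")
            print(type(backend).__name__, se.get("run1.dat"))
```

The caller uses `put` and `get` in the same way for both backends, which is the
transparency property the text describes.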
3 The EDG Testbed

EDG has deployed the middleware on a distributed testbed, which also provides some
shared services. A central software repository provides defined bundles of RPMs
according to machine type, together with LCFG scripts to install and configure the
software.

There are also automatic tools for the creation and update of grid-map files (used to
map Grid certificates to local Unix accounts), needed by all testbed sites to authorise
users to access the testbed resources. A new user subscribes to the EDG Acceptable
Usage Policy by using their certificate, loaded into a web browser, to digitally sign
their agreement. Their certificate Subject Name is then added to an LDAP server
maintained for each Virtual Organisation by a VO administrator. Each site can use
this information, together with local policy on which VOs are supported, to generate
the local map file which authorises the user at that site. This mechanism is sketched in
Fig. 6.
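The generation step just described can be sketched as follows. This is a minimal
illustration, assuming that the VO membership lists have already been read from the
LDAP servers into a dictionary and that local policy maps each supported VO to one
local Unix account. The function name, the account names and the certificate subjects
used in the usage note are hypothetical. Each output line follows the usual
grid-mapfile layout of a quoted certificate subject followed by a local account name.

```python
def make_gridmap(vo_members, supported_vos, account_for_vo):
    """Build grid-mapfile text from VO membership and local site policy.

    vo_members:     {vo_name: [certificate subject names]} (from the VO LDAP servers)
    supported_vos:  VO names this site accepts (local policy)
    account_for_vo: {vo_name: local Unix account} (local policy)
    """
    lines = []
    seen = set()
    for vo in sorted(supported_vos):
        for subject in sorted(vo_members.get(vo, [])):
            if subject in seen:  # a subject keeps its first mapping only
                continue
            seen.add(subject)
            lines.append('"%s" %s' % (subject, account_for_vo[vo]))
    return "".join(line + "\n" for line in lines)
```

For example, a site that supports only the hypothetical VOs `atlas` and `cms` would
call `make_gridmap(members, ["atlas", "cms"], {"atlas": "atlas001", "cms": "cms001"})`
and write the result to its local map file. Members of unsupported VOs are simply left
out, which is how local policy restricts access in this sketch.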
The testbed sites each implement a User Interface machine, a Gatekeeper and a set of
Worker Nodes (i.e. a Grid Computing Element), managed by means of a Local
Resource Management System, and a Storage Element (disk only at most sites, but
with tape storage at CERN, Lyon and RAL). Some sites have also set up a local
Resource Broker. As a reference, Fig. 7 shows a typical site setup in terms of machine
composition, for both development and production testbeds, namely the current
CERN testbed, with production, development and service machines (network time
server, NFS server, LCFG server, monitoring servers).

New sites are welcome to join the testbed, with a well-defined set of rules and
procedures. In addition the EDG middleware is freely available, documented, and
downloadable from a central repository (the exact license conditions are still under
review, but will allow free use). All required RPMs are available for download to an
LCFG server, which is generally the first element to be set up, by means of which all
other components can be easily installed and configured following the EDG
documentation. Alternatively it is possible to install and configure the software by
hand. Only Red Hat Linux version 6.2 is currently supported, but more platforms will
be supported in due course.
Fig. 6. The operation of the makegridmap daemon
Fig. 7. The CERN testbed cluster composition
4 Future Developments

There are a number of important services still in development. The WMS will
introduce an implementation for billing and accounting, advance reservation of
resources, job partitioning and checkpointing. In the data management area there will
be APIs for Replica Selection through a Replica Optimiser. The first full
implementation of R-GMA will be used and compared with the existing information
systems (MDS and Ftree). For fabric management there will be the LCAS (Local
Credentials and Authorization Server) and further tools to replace the use of Gridmap
files to authorise users and introduce effective differentiated authorisation of users
according to the Virtual Organisation they belong to (VOMS, Virtual Organisation
Membership Server). Also a new high level configuration description language will
be provided. Interfaces to Storage Elements will continue to be developed. Network
monitoring information will be used to influence the decisions of the Resource
Broker.

The European DataGrid project has already achieved many of its goals, stated at the
time of the project conception two years ago. A production quality distributed
computing environment has been demonstrated by the EDG testbed. It will now be
enriched in functionality, further improved in reliability and extended both
geographically and in terms of aggregate CPU power and storage capacity. The
community of users has already successfully validated the use of a large set of
applications, ranging from High Energy Physics to Bio-Informatics and Earth
Observation [2, 3, 4]. At the same time development is currently ongoing to extend
the range of functionality covered by the EDG middleware.

All work packages have defined an intense schedule of new research and
development, which will be supported by the progressive introduction of high-speed
scientific networks such as those deployed by RN GEANT. This will increase the
range of possibilities available to the EDG developers. As an example, EDG has
proposed the introduction of a Network Optimiser server, to establish in which cases
it is preferable to access a file from a remote location or to trigger local copying,
according to network conditions in the end-to-end link between the relevant sites. The
development of Differentiated Services and Packet Forwarding policies is strongly
encouraged, in order to make Grid applications cope better with the dynamic network
performance and create different classes of services to be provided to different classes
of applications, according to their requirements in terms of bandwidth, throughput,
delay, jitter etc.
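The kind of decision a Network Optimiser would take (remote access versus local
copying) can be illustrated with a deliberately simple cost comparison. The model
below is an assumption made for this sketch and is not the EDG design: each transfer
is taken to cost one round-trip time plus the payload divided by the measured
bandwidth, and local reads after a copy are treated as free. The function and
parameter names are hypothetical.

```python
def transfer_seconds(n_bytes, bandwidth_bps, rtt_s):
    """Estimated time to move n_bytes over a link: one round trip plus payload time."""
    return rtt_s + 8 * n_bytes / bandwidth_bps


def choose_access(file_bytes, bytes_per_read, expected_reads, bandwidth_bps, rtt_s):
    """Return "remote" or "copy" for a file held at another site.

    remote: every read pays the network cost for the bytes it touches.
    copy:   the whole file is transferred once; later local reads cost nothing here.
    """
    remote_cost = expected_reads * transfer_seconds(bytes_per_read, bandwidth_bps, rtt_s)
    copy_cost = transfer_seconds(file_bytes, bandwidth_bps, rtt_s)
    return "remote" if remote_cost <= copy_cost else "copy"
```

With a 1 GB file, 1 MB reads, a 100 Mbit/s link and a 50 ms round-trip time, a job
that performs ten reads is better served remotely under this model, while a job that
performs two thousand reads is better served by a local copy. A real optimiser would
use monitored end-to-end network conditions rather than fixed numbers.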
The impact of the new Globus features foreseen by the introduction of the OGSA
paradigm suggested by the US Globus developers, where the main accent is on a Web
Services oriented architecture, is being evaluated by EDG, and an evolution of the
current architecture in that direction could be envisaged. This is proposed for future
releases of the EDG middleware and it may be continued with initiatives in the new
EU FP6 framework. An important collaboration has already been established via the
GRIDSTART initiative (www.gridstart.org) with the other ten existing EU funded
Grid projects. In particular, the EU CrossGrid project (www.crossgrid.org) will
exploit DataGrid technologies to support a variety of applications, all demanding
guaranteed quality of service (e.g. real time environment simulation, video streaming
and other applications requiring high network bandwidth).

Collaboration with similar Grid projects in the US, especially PPDG (www.ppdg.net),
GriPhyN (www.griphyn.org) and iVDGL (www.ivdgl.org), is being pursued in
collaboration with the sister project EU DataTAG (www.datatag.org). The main goal
of DataTAG is the establishment of a transatlantic testbed to deploy and test the
software of the EDG project. Interest in the EDG project and its production-oriented
approach has already reached beyond the borders of the European Union: after Russia
and Romania, which have already installed the EDG software, some Asian sites (in
Taiwan and South Korea) have applied to become members of the distributed EDG
testbed, in order to participate in High Energy Physics data challenges for data
production and simulation.
Reference Documents

Note: all official EDG documents are available on the web at the URL:
http://eu-datagrid.web.cern.ch/eu-datagrid/Deliverables/default.htm

[1] DataGrid D12.4: “DataGrid Architecture”
[2] DataGrid D8.1a: “DataGrid User Requirements and Specifications for the DataGrid
Project”
[3] DataGrid D9.1: “Requirements Specification: EO Application Requirements for Grid”
[4] DataGrid D10.1: “WP10 Requirements Document”
[5] DataGrid D8.2: “Testbed1 Assessment by HEP Applications”
[6] “The Anatomy of the Grid”, I. Foster, C. Kesselman, et al. Technical Report, Global
Grid Forum, 2001, http://www.globus.org/research/papers/anatomy.pdf
[7] DataGrid D6.1: “Testbed Software Integration Process”
[8] Condor Project (http://www.cs.wisc.edu/condor/). Jim Basney and Miron Livny,
“Deploying a High Throughput Computing Cluster”, High Performance Cluster
Computing, Rajkumar Buyya, Editor, Vol. 1, Chapter 5, Prentice Hall PTR, May 1999.
Nicholas Coleman, “An Implementation of Matchmaking Analysis in Condor”,
Masters' Project report, University of Wisconsin, Madison, May 2001.
[10] DataGrid Architecture Version 2, G. Cancio, S. Fisher, T. Folkes, F. Giacomini,
W. Hoschek, D. Kelsey, B. Tierney,
http://grid-atf.web.cern.ch/grid-atf/documents.html
[11] EDG Usage Guidelines
(http://marianne.in2p3.fr/datagrid/documentation/EDG-Usage-Guidelines.html)
[12] Software Release Plan DataGrid-12-PLN-333297;
http://edms.cern.ch/document/333297
[13] Project technical annex.
[14] DataGrid D12.3: “Software Release Policy”
DataGrid Publications

Gagliardi, F., Baxevanidis, K., Foster, I., and Davies, H. Grids and Research Networks as
Drivers and Enablers of Future Internet Architectures. The New Internet Architecture (to be
published)

Buyya, R. and Stockinger, H. Economic Models for resource management and scheduling in
Grid computing. The Journal of Concurrency and Computation: Practice and Experience
(CCPE), Special issue on Grid computing environments. 2002

Stockinger, H. Database Replication in World-Wide Distributed Data Grids. PhD thesis,
2002.

Primet, P. High Performance Grid Networking in the DataGrid Project. Terena 2002.

Stockinger, H., Samar, A., Allcock, B., Foster, I., Holtman, K., and Tierney, B. File and
Object Replication in Data Grids. 10th IEEE Symposium on High Performance Distributed
Computing (HPDC 2001). San Francisco, California, August 7-9, 2001.

Hoschek, W., Jaen-Martinez, J., Samar, A., Stockinger, H. and Stockinger, K. Data
Management in an International Data Grid Project. IEEE/ACM International Workshop on Grid
Computing Grid'2000, 17-20 December 2000, Bangalore, India. “Distinguished Paper”
Award.

Balaton, Z., Kacsuk, P. and Podhorszki, N. From Cluster Monitoring to Grid Monitoring
Based on GRM and PROVE. Report of the Laboratory of Parallel and Distributed Systems,
LPDS – 1/2000

Dullmann, D., Hoschek, W., Jaen-Martinez, J., Samar, A., Stockinger, H. and Stockinger, K.
Models for Replica Synchronisation and Consistency in a Data Grid. 10th IEEE Symposium on
High Performance Distributed Computing (HPDC 2001). San Francisco, California,
August 7-9, 2001.

Stockinger, H. Distributed Database Management Systems and the Data Grid. 18th IEEE
Symposium on Mass Storage Systems and 9th NASA Goddard Conference on Mass Storage
Systems and Technologies, San Diego, April 17-20, 2001.

Serafini, L., Stockinger, H., Stockinger, K. and Zini, F. Agent-Based Query Optimisation in
a Grid Environment. IASTED International Conference on Applied Informatics (AI2001),
Innsbruck, Austria, February 2001.

Stockinger, H., Stockinger, K., Schikuta and Willers, I. Towards a Cost Model for
Distributed and Replicated Data Stores. 9th Euromicro Workshop on Parallel and Distributed
Processing PDP 2001, Mantova, Italy, February 7-9, 2001. IEEE Computer Society Press

Hafeez, M., Samar, A. and Stockinger, H. A Data Grid Prototype for distributed Data
Production in CMS. VII International Workshop on Advanced Computing and Analysis
Techniques in Physics Research (ACAT 2000), October 2000.

Samar, A. and Stockinger, H. Grid Data Management Pilot (GDMP): a Tool for Wide Area
Replication. IASTED International Conference on Applied Informatics (AI2001), Innsbruck,
Austria, February 2001.

Ruda, M. Integrating Grid Tools to build a computing resource broker: activities of
DataGrid WP1. Conference in Computing in High Energy Physics (CHEP01), Beijing,
September 3-7, 2001.

Cerello, P. Grid Activities in ALICE. Proceedings of the Conference in Computing in High
Energy Physics (CHEP01), Beijing, September 3-7, 2001.

Harris, F. and Van Herwijnen, E. Moving the LHCb Monte Carlo Production system to the
Grid. Proceedings of the Conference in Computing in High Energy Physics (CHEP01), Beijing,
September 3-7, 2001.

Fisk, I. CMS Grid Activities in the United States. Proceedings of the Conference in
Computing in High Energy Physics (CHEP01), Beijing, September 3-7, 2001.

Grandi, C. CMS Grid Activities in Europe. Proceedings of the Conference in Computing in
High Energy Physics (CHEP01), Beijing, September 3-7, 2001.

Holtman, K. CMS requirements for the Grid. Proceedings of the Conference in Computing
in High Energy Physics (CHEP01), Beijing, September 3-7, 2001.

Malon, D. et al. Grid-enabled Data Access in the ATLAS Athena Framework. Proceedings
of the Conference in Computing in High Energy Physics (CHEP01), Beijing, September 3-7,
2001.
Other Grid Publications

Foster, I., Kesselman, C., Nick, J. M. and Tuecke, S. The Physiology of the Grid: An Open
Grid Services Architecture for Distributed Systems Integration.

Foster, I. The Grid: A new infrastructure for 21st Century Science. Physics Today, 54 (2).
2002

Foster, I. and Kesselman, C. Globus: A Toolkit-Based Grid Architecture. In Foster, I. and
Kesselman, C. eds. The Grid: Blueprint for a New Computing Infrastructure, Morgan
Kaufmann, 1999, 259-278.

Foster, I. and Kesselman, C. (eds.). The Grid: Blueprint for a New Computing
Infrastructure, Morgan Kaufmann, 1999.
Glossary

AFS     Andrew File System
BQS     Batch Queue Service
CE      Computing Element
CVS     Concurrent Versioning System
EDG     European DataGrid
EIP     Experiment Independent Person
Ftree   LDAP-based dynamic directory service
GDMP    Grid Data Mirroring Package
II      Information Index
ITeam   Integration Team
JDL     Job Description Language
JSS     Job Submission Service
LB      Logging and Bookkeeping
LCFG    Automated software installation system
LDAP    Lightweight Directory Access Protocol
LFN     Logical File Name
LSF     Load Sharing Facility
MDS     Globus Metacomputing Directory Service
MS      Mass Storage
NFS     Network File System
PBS     Portable Batch System
RB      Resource Broker
RC      Replica Catalogue
RFIO    Remote File I/O software package
RPM     Red Hat Package Manager
SE      Storage Element
TB1     Testbed 1 (project month 9 release of DataGrid)
UI      User Interface
VO      Virtual Organisation
WN      Worker Node
WP      Workpackage
Acknowledgments. The authors would like to thank the entire EU DataGrid project
for contributing most of the material for this article.
Author Index

Almeida, Virgilio A.F. 142
Andreolini, Mauro 208
Baier, Christel 261
Bernardo, Marco 236
Burke, Stephen 480
Campos, Sérgio 374
Cardellini, Valeria 208
Ciancarini, Paolo 236
Colajanni, Michele 208
Conti, Marco 435
Cortellessa, Vittorio 346
Cremonesi, Paolo 158
Donatiello, Lorenzo 236
Feitelson, Dror G. 114
Fourneau, J.M. 64
Friedrich, Rich 463
Gagliardi, Fabrizio 480
Gelenbe, Erol 1
Grassi, Vincenzo 346
Gregori, Enrico 435
Haverkort, Boudewijn 261
Hermanns, Holger 261
Horváth, András 405
Iyer, Ravishankar K. 290
Jones, Bob 480
Kalbarczyk, Zbigniew 290
Katoen, Joost-Pieter 261
Leão, Rosa M.M. 374
Mirandola, Raffaela 346
Mitrani, Isi 17
Patel, Chandrakant 463
Pekergin, N. 64
Reale, Mario 480
Ribeiro-Neto, Berthier 374
Riska, Alma 36
Rolia, Jerry 463
Serazzi, Giuseppe 158
Silva, Edmundo de Souza e 374
Smirni, Evgenia 36
Telek, Miklós 405
Trivedi, Kishor S. 318
Vaidyanathan, Kalyanaraman 318
Weicker, Reinhold 179
Yao, David D. 89

