3 540 45798 4
Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo
Maria Carla Calzarossa, Salvatore Tucci (Eds.)
Performance Evaluation of Complex Systems:
Techniques and Tools
Performance 2002 Tutorial Lectures
Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands
Volume Editors
Maria Carla Calzarossa
Università di Pavia, Dipartimento di Informatica e Sistemistica
via Ferrata 1, 27100 Pavia, Italy
E-mail: mcc@alice.unipv.it
Salvatore Tucci
Ufficio per l’Informatica, la Telematica e la Statistica
Presidenza del Consiglio dei Ministri
via della Stamperia 8, 00187 Roma, Italy
E-mail: tucci@torvergata.it
Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Performance evaluation of complex systems : techniques and tools ;
performance 2002 tutorial lectures / Maria Carla Calzarossa ; Salvatore
Tucci (ed.). - Berlin ; Heidelberg ; New York ; Hong Kong ; London ; Milan ;
Paris ; Tokyo : Springer, 2002
(Lecture notes in computer science ; Vol. 2459)
ISBN 3-540-44252-9
CR Subject Classification (1998): C.4, C.2, D.2.8, D.4, F.1, H.4
ISSN 0302-9743
ISBN 3-540-44252-9 Springer-Verlag Berlin Heidelberg New York
The fast evolution and the increased pervasiveness of computers and commu-
nication networks have led to the development of a large variety of complex
applications and services which have become an integral part of our daily lives.
Modern society widely relies on information technologies. Hence, the Quality
of Service, that is, the efficiency, availability, reliability, and security of these
technologies, is an essential requirement for the proper functioning of modern
society.
In this scenario, performance evaluation plays a central role. Performance
evaluation has to assess and predict the performance of hardware and software
systems, and to identify and prevent their current and future performance bott-
lenecks.
In the past thirty years, many performance evaluation techniques and tools
have been developed and successfully applied in studies dealing with the con-
figuration and capacity planning of existing systems and with the design and
development of new systems. Recently, performance evaluation techniques have
evolved to cope with the increased complexity of the current systems and their
workloads. Many of the classical techniques have been revisited in light of the
recent technological advances, and novel techniques, methods, and tools have
been developed.
This book is organized around a set of survey papers which provide a com-
prehensive overview of the theories, techniques, and tools for performance and
reliability evaluation of current and new emerging technologies. The papers, by
leading international experts in the field of performance evaluation, are based on
the tutorials presented at the IFIP WG 7.3 International Symposium on Compu-
ter Modeling, Measurement, and Evaluation (Performance 2002) held in Rome
on September 23–27, 2002.
The papers address the state of the art of the theoretical and methodological
advances in the area of performance and reliability evaluation as well as new
perspectives in the major application domains. A broad spectrum of topics is
covered in this book. Modeling and verification formalisms, solution methods,
workload characterization, and benchmarking are addressed from a methodological
point of view. Applications of performance and reliability techniques to
various domains, such as hardware and software architectures, wired and
wireless networks, Grid environments, Web services, and real-time voice and video
applications, are also examined.
This book is intended to serve as a reference for students, scientists, and en-
gineers working in the areas of performance and reliability evaluation, hardware
and software design, and capacity planning.
VI Preface
Finally, as editors of the book, we would like to thank all authors for their
valuable contributions and their effort and cooperation in the preparation of
their manuscripts.
Benchmarking (p. 179)
Reinhold Weicker
Erol Gelenbe
1 Introduction
In this survey and tutorial, we discuss a class of queueing networks, originally
inspired by our work on neural networks, in which customers are either “signals”
or positive customers.
Positive customers enter a queue and receive service as ordinary queueing
network customers; they make up the queue length. A signal may be a “negative
customer” or a “trigger”. Signals do not receive service, and disappear
after having visited a queue. If the signal is a trigger, it transfers
a customer from the queue at which it arrives to some other queue according to a
probabilistic rule. A negative customer, on the other hand, simply removes one
customer from the queue to which it arrives, if the queue is non-empty. One can
also regard a negative customer as a special kind of trigger which simply
sends a customer to the “outside world” rather than transferring it to another
queue. Positive customers which leave a queue to enter another queue can become
signals or remain positive customers.
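To make the three kinds of arrival concrete, here is a minimal sketch of their effect on a vector of queue lengths. It tracks a single customer class only; the function name `apply_arrival` and the explicit `destination` argument (standing in for the probabilistic routing rule) are illustrative assumptions, and it is assumed that a trigger reaching an empty queue disappears without effect, as a negative customer does.

```python
from typing import List, Optional


def apply_arrival(queues: List[int], kind: str, station: int,
                  destination: Optional[int] = None) -> List[int]:
    """Return the queue lengths after one arrival at `station`.

    kind == "positive": the customer joins the queue at `station`.
    kind == "negative": one customer is removed, if the queue is non-empty.
    kind == "trigger":  one customer is moved to `destination`, if the queue
                        is non-empty; destination=None sends it to the
                        outside world, which is exactly a negative customer.
    A signal never joins a queue itself.
    """
    q = list(queues)  # copy, so the caller's state is not mutated
    if kind == "positive":
        q[station] += 1
    elif kind in ("negative", "trigger"):
        if q[station] > 0:  # assumed: a signal reaching an empty queue has no effect
            q[station] -= 1
            if kind == "trigger" and destination is not None:
                q[destination] += 1
    else:
        raise ValueError(f"unknown arrival kind: {kind!r}")
    return q


if __name__ == "__main__":
    state = apply_arrival([2, 0], "trigger", 0, destination=1)
    print(state)                                  # [1, 1]
    print(apply_arrival(state, "negative", 1))    # [1, 0]
    print(apply_arrival([1, 0], "negative", 1))   # [1, 0]: empty queue, no effect
```

Treating the negative customer as a trigger with no destination mirrors the remark in the text that it sends a customer to the “outside world”.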
Additional primitive operations for these networks have also been introduced
in [12]. The computation of numerical solutions to the non-linear traffic equations
of some of these models has been discussed in [6]. Applications to networking
problems are reported in [17]. A model of doubly redundant systems using G-
networks, where work is scheduled on two different processors and then cancelled
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 1–16, 2002.
© Springer-Verlag Berlin Heidelberg 2002
2 E. Gelenbe
With reference to the usual terminology related to the BCMP theorem [2],
we exclude from the present discussion the Type 3 service centers with an in-
finite number of servers since they will not be covered by our results.
Furthermore, in this paper we deal only with exponentially distributed service
times.
In Section 3 we will prove that these multiple class G-Networks, with Type
1, 2 and 4 service centers, have product form.
2 The Model
$$\sum_{j=1}^{n}\sum_{l=1}^{R} P^+[i,j][k,l] + \sum_{j=1}^{n}\sum_{m=1}^{S} P^-[i,j][k,m] + d[i,k] = 1 \qquad (1)$$
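Equation (1) states that, for each station i and positive class k, the probabilities of moving on as a positive customer, turning into a signal, or leaving the network sum to one. A small sketch of a numerical check follows; the nested-list layout `Pp[i][j][k][l]` for P+[i,j][k,l], `Pm[i][j][k][m]` for P−[i,j][k,m] and `d[i][k]` is an assumption made for illustration.

```python
def routing_is_stochastic(Pp, Pm, d, tol=1e-9):
    """Check equation (1) for every station i and positive class k:

        sum_{j,l} P+[i,j][k,l] + sum_{j,m} P-[i,j][k,m] + d[i,k] == 1

    Layout assumed here: Pp[i][j][k][l], Pm[i][j][k][m], d[i][k].
    """
    n = len(d)            # number of stations
    R = len(d[0])         # number of positive-customer classes
    S = len(Pm[0][0][0])  # number of signal classes
    for i in range(n):
        for k in range(R):
            total = d[i][k]
            total += sum(Pp[i][j][k][l] for j in range(n) for l in range(R))
            total += sum(Pm[i][j][k][m] for j in range(n) for m in range(S))
            if abs(total - 1.0) > tol:
                return False
    return True


if __name__ == "__main__":
    # One station, one positive class, one signal class:
    # 30% rejoin as positive customers, 20% become signals, 50% leave.
    print(routing_is_stochastic([[[[0.3]]]], [[[[0.2]]]], [[0.5]]))  # True
```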
We assume that all service centers have exponential service time distribu-
tions. In the three types of service centers, each class of positive customers may
have a distinct service rate $r_{i,k}$. When the service center is of Type 1 (FIFO) we place the following
constraint on the service rate and the movement triggering rate due to incoming
signals:
$$r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} = c_i \qquad (2)$$
Note that this constraint, together with constraint (3) given below, makes a
service center with FIFO discipline equivalent to one with a single class of positive
customers. The following constraints on the movement triggering probability are
assumed to hold. Because service times are exponentially distributed, positive
customers of a given class are indistinguishable with respect to movement
triggering, by the memoryless property of the service time.
– The following constraint must hold for all stations i of Type 1 and classes of
signals m such that $\sum_{j=1}^{n}\sum_{l=1}^{R} P^-[j,i][l,m] > 0$.
This constraint implies that a signal of some class m arriving from the network
does not “distinguish” between the positive customer classes whose movement it
will try to trigger, and that it treats them all in the same manner.
– For a Type 2 server, the probability that any one positive customer of the
queue is selected by the arriving signal is 1/c if c is the total number of
customers in the queue.
For Type 1 service centers, one may consider the following conditions, which
are simpler than (2) and (3):
$$r_{i,a} = r_{i,b}, \qquad K_{i,m,a} = K_{i,m,b} \qquad (4)$$
for all classes of positive customers a and b, and all classes of signals m. Note,
however, that these new conditions are more restrictive, though they do imply
that (2) and (3) hold.
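The sketch below checks constraint (2) numerically for a single FIFO station: it returns the common value c_i when r_{i,k} + Σ_m K_{i,m,k} λ_{i,m} is the same for every class k. The argument names `r_i`, `K_i` (indexed `K_i[m][k]`) and `lam_i` are assumptions for illustration. The second example shows why (4) is more restrictive: (2) can hold with unequal rates.

```python
def fifo_rate_constant(r_i, K_i, lam_i, tol=1e-12):
    """Return c_i if r_i[k] + sum_m K_i[m][k] * lam_i[m] is the same for
    every positive class k (constraint (2)); otherwise return None."""
    R, S = len(r_i), len(lam_i)
    totals = [r_i[k] + sum(K_i[m][k] * lam_i[m] for m in range(S))
              for k in range(R)]
    if max(totals) - min(totals) > tol:
        return None
    return totals[0]


if __name__ == "__main__":
    # Equal rates and triggering probabilities, as in (4): (2) holds.
    print(fifo_rate_constant([2.0, 2.0], [[0.5, 0.5]], [1.0]))  # 2.5
    # Unequal rates that still satisfy (2), although (4) fails.
    print(fifo_rate_constant([2.0, 1.5], [[0.5, 1.0]], [1.0]))  # 2.5
```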
3 Main Theorem
Let P (x) denote the stationary probability distribution of the state of the net-
work. It is given by the following product form result.
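$$q_{i,k} = \frac{\Lambda_{i,k} + \Lambda^+_{i,k}}{r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,[\lambda_{i,m} + \lambda^-_{i,m}]} \qquad (5)$$
$$\begin{aligned}
\Lambda^+_{i,k} ={}& \sum_{j=1}^{n}\sum_{l=1}^{R} P^+[j,i][l,k]\,r_{j,l}\,q_{j,l} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l} \sum_{h=1}^{n}\sum_{m=1}^{S}\sum_{s=1}^{R} P^-[j,h][l,m]\,K_{h,m,s}\,q_{h,s}\,Q[h,i][s,k] \\
&+ \sum_{j=1}^{n}\sum_{m=1}^{S}\sum_{s=1}^{R} \lambda_{j,m}\,K_{j,m,s}\,q_{j,s}\,Q[j,i][s,k] \qquad (6)
\end{aligned}$$
$$\lambda^-_{i,m} = \sum_{j=1}^{n}\sum_{l=1}^{R} P^-[j,i][l,m]\,r_{j,l}\,q_{j,l} \qquad (7)$$
has a solution such that $0 < q_{i,k}$ for each pair i, k and $\sum_{k=1}^{R} q_{i,k} < 1$ for each
station i, then the stationary probability distribution of the network state is:
$$P(x) = G \prod_{i=1}^{n} g_i(x_i) \qquad (8)$$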
G-Networks: Multiple Classes of Positive Customers 5
where each $g_i(x_i)$ depends on the type of service center i. The functions $g_i(x_i)$ in (8)
have the following form:
FIFO. If the service center is of Type 1, then
$$g_i(x_i) = \prod_{n=1}^{|x_i|} q_{i,v_{i,n}} \qquad (9)$$
PS. If the service center is of Type 2, then
$$g_i(x_i) = |x_i|! \prod_{k=1}^{R} \frac{(q_{i,k})^{x_{i,k}}}{x_{i,k}!} \qquad (10)$$
LIFO/PR. If the service center is of Type 4, then
$$g_i(x_i) = \prod_{n=1}^{|x_i|} q_{i,v_{i,n}} \qquad (11)$$
and G is the normalization constant.
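The factors (9) to (11) are easy to evaluate once the q_{i,k} are known. In this sketch the FIFO and LIFO/PR state is given as the list of customer classes in queue order and the PS state as per-class counts; these encodings and the function names are assumptions for illustration. By the multinomial theorem, the PS factors summed over all states with n customers equal (Σ_k q_{i,k})^n, so their total is finite exactly when Σ_k q_{i,k} < 1; the last lines check this numerically for Σ_k q_{i,k} = 0.5.

```python
from math import factorial, prod


def g_fifo_or_lifo(q_i, classes):
    """(9) and (11): product of q_{i,v} over the classes v of the customers
    present, listed in queue order (the product itself is order-free)."""
    return prod(q_i[v] for v in classes)


def g_ps(q_i, counts):
    """(10): |x_i|! * prod_k q_{i,k}**x_{i,k} / x_{i,k}!"""
    value = float(factorial(sum(counts)))
    for qk, xk in zip(q_i, counts):
        value *= qk ** xk / factorial(xk)
    return value


if __name__ == "__main__":
    q = [0.2, 0.3]                        # q_{i,1}, q_{i,2} at one station
    print(g_fifo_or_lifo(q, [0, 1, 1]))   # 0.2 * 0.3 * 0.3, about 0.018
    print(g_ps(q, [1, 2]))                # 3! * 0.2 * 0.3**2 / 2!, about 0.054
    # Sum over all PS states with at most 60 customers: close to 1 / (1 - 0.5)
    total = sum(g_ps(q, [a, b]) for a in range(61) for b in range(61 - a))
    print(total)
```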
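Notice that $\Lambda^+_{i,k}$ may be written as:
$$\Lambda^+_{i,k} = \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\,P^+[j,i][l,k] + \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\,Q[j,i][l,k]\,K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}] \qquad (12)$$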
The conditions that $q_{i,k} > 0$ and that their sum over all classes at each
center be less than 1 simply ensure the existence of the normalizing
constant G in (8) and the stability of the network.
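The traffic equations (5), (7) and (12) are non-linear in the q_{i,k}. As an illustration only, the following sketch solves them by plain successive substitution (start from zero, recompute λ⁻ by (7), Λ⁺ by (12) and q by (5), and repeat); this is not claimed to be the method of [6], and convergence is not guaranteed in general. The nested-list index layout described in the docstring and all function names are assumptions. The last function checks the theorem's conditions on the solution.

```python
def zeros(*shape):
    """Nested list of 0.0 with the given shape, e.g. zeros(2, 3)."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


def solve_traffic(Lam, lam, r, K, Pp, Pm, Q, tol=1e-12, max_iter=10_000):
    """Naive fixed-point iteration for equations (5), (7) and (12).

    Index layout assumed for this sketch:
      Lam[i][k] = Lambda_{i,k}       lam[i][m] = lambda_{i,m}
      r[i][k]   = r_{i,k}            K[i][m][k] = K_{i,m,k}
      Pp[j][i][l][k] = P+[j,i][l,k]  Pm[j][i][l][m] = P-[j,i][l,m]
      Q[j][i][l][k]  = Q[j,i][l,k]
    """
    n, R, S = len(r), len(r[0]), len(lam[0])
    q = zeros(n, R)
    for _ in range(max_iter):
        # (7): signals generated inside the network and routed to station i
        lam_minus = [[sum(Pm[j][i][l][m] * r[j][l] * q[j][l]
                          for j in range(n) for l in range(R))
                      for m in range(S)] for i in range(n)]
        new_q = zeros(n, R)
        for i in range(n):
            for k in range(R):
                # (12): Lambda^+_{i,k}
                plus = sum(r[j][l] * q[j][l] * Pp[j][i][l][k]
                           for j in range(n) for l in range(R))
                plus += sum(q[j][l] * Q[j][i][l][k] * K[j][m][l]
                            * (lam[j][m] + lam_minus[j][m])
                            for j in range(n) for l in range(R)
                            for m in range(S))
                # (5): q_{i,k} = (Lambda + Lambda^+) / (r + sum_m K (lambda + lambda^-))
                denom = r[i][k] + sum(K[i][m][k] * (lam[i][m] + lam_minus[i][m])
                                      for m in range(S))
                new_q[i][k] = (Lam[i][k] + plus) / denom
        change = max(abs(new_q[i][k] - q[i][k])
                     for i in range(n) for k in range(R))
        q = new_q
        if change < tol:
            return q
    raise RuntimeError("fixed-point iteration did not converge")


def product_form_conditions_hold(q):
    """Theorem conditions: every q_{i,k} > 0 and sum_k q_{i,k} < 1 at each i."""
    return all(all(v > 0 for v in row) and sum(row) < 1 for row in q)


if __name__ == "__main__":
    # Two stations in tandem, one class, no signals: station 0 feeds station 1.
    n, R, S = 2, 1, 1
    Pp = zeros(n, n, R, R)
    Pp[0][1][0][0] = 1.0
    q = solve_traffic([[1.0], [0.0]], zeros(n, S), [[2.0], [2.0]],
                      zeros(n, S, R), Pp, zeros(n, n, R, S), zeros(n, n, R, R))
    print(q, product_form_conditions_hold(q))  # [[0.5], [0.5]] True
```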
$$= \sum_{i=1}^{n}\sum_{k=1}^{R} q_{i,k}\,r_{i,k}\,(1 - d[i,k]) + \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\,Q[j,i][l,k]\,K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}]$$
Proof: Consider (12), sum it over all stations and all classes, and exchange
the order of summation on the right-hand side of the equation:
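$$\sum_{i=1}^{n}\sum_{k=1}^{R}\Lambda^+_{i,k} = \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\Big(\sum_{i=1}^{n}\sum_{k=1}^{R} P^+[j,i][l,k]\Big) + \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\,Q[j,i][l,k]\,K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}]$$
and, summing (7) in the same way,
$$\sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m} = \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\Big(\sum_{i=1}^{n}\sum_{m=1}^{S} P^-[j,i][l,m]\Big)$$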
Furthermore:
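$$\begin{aligned}
\sum_{i=1}^{n}\sum_{k=1}^{R}\Lambda^+_{i,k} + \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m}
={}& \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\Big(\sum_{i=1}^{n}\sum_{k=1}^{R} P^+[j,i][l,k] + \sum_{i=1}^{n}\sum_{m=1}^{S} P^-[j,i][l,m]\Big) \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\,Q[j,i][l,k]\,K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}]
\end{aligned}$$
and therefore, using (1):
$$\begin{aligned}
\sum_{i=1}^{n}\sum_{k=1}^{R}\Lambda^+_{i,k} + \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m}
={}& \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\,(1 - d[j,l]) \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\,Q[j,i][l,k]\,K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}]
\end{aligned}$$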
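$$M_{j,l}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)} = r_{j,l}\,q_{j,l} \qquad (13)$$
$$N_{j,l}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)} = \Big(\sum_{m=1}^{S} K_{j,m,l}\,\lambda_{j,m}\Big)\,q_{j,l} \qquad (14)$$
$$Z_{j,l,m}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)} = K_{j,m,l}\,q_{j,l} \qquad (15)$$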
The proof is purely algebraic.
Remark: As a consequence, we have from equations (12), (7) and (13):
$$\Lambda^+_{i,k} = \sum_{j=1}^{n}\sum_{l=1}^{R} M_{j,l}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)}\,P^+[j,i][l,k] + \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\,Q[j,i][l,k]\,K_{j,m,l}\,[\lambda_{j,m} + \lambda^-_{j,m}] \qquad (16)$$
and
$$\lambda^-_{i,m} = \sum_{j=1}^{n}\sum_{l=1}^{R} M_{j,l}(x_j + e_{j,l})\,\frac{g_j(x_j + e_{j,l})}{g_j(x_j)}\,P^-[j,i][l,m] \qquad (17)$$
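$$\Delta_i(x_i) = \sum_{m=1}^{S}\lambda^-_{i,m}\,Y_{i,m}(x_i) - \sum_{k=1}^{R}\big(M_{i,k}(x_i) + N_{i,k}(x_i)\big) + \sum_{k=1}^{R} A_{i,k}(x_i)\,(\Lambda_{i,k} + \Lambda^+_{i,k})\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}$$
Then for the three types of service centers, $1_{\{|x_i|>0\}}\,\Delta_i(x_i) = 1_{\{|x_i|>0\}}\sum_{m=1}^{S}\lambda^-_{i,m}$.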
The proof of Lemma 3 is in a separate subsection at the end of this paper in
order to make the text somewhat easier to follow.
Let us now turn to the proof of Theorem 1. The global balance equation of
the networks under consideration is:
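$$\begin{aligned}
P(x)\Big[\sum_{j=1}^{n}\sum_{l=1}^{R}\big(\Lambda_{j,l} + M_{j,l}(x_j)1_{\{|x_j|>0\}} + N_{j,l}(x_j)1_{\{|x_j|>0\}}\big)\Big]
={}& \sum_{j=1}^{n}\sum_{l=1}^{R} P(x - e_{j,l})\,\Lambda_{j,l}\,A_{j,l}(x_j)\,1_{\{|x_j|>0\}} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R} P(x + e_{j,l})\,N_{j,l}(x_j + e_{j,l})\,D[j,l] \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R} P(x + e_{j,l})\,M_{j,l}(x_j + e_{j,l})\,d[j,l] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} M_{j,l}(x_j + e_{j,l})\,P(x + e_{j,l})\,P^-[j,i][l,m]\,Y_{i,m}(x_i)\,1_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} M_{j,l}(x_j + e_{j,l})\,P(x + e_{j,l})\,P^-[j,i][l,m]\,1_{\{|x_i|=0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R} M_{j,l}(x_j + e_{j,l})\,P(x - e_{i,k} + e_{j,l})\,P^+[j,i][l,k]\,A_{i,k}(x_i)\,1_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R} N_{j,l}(x_j + e_{j,l})\,P(x - e_{i,k} + e_{j,l})\,Q[j,i][l,k]\,A_{i,k}(x_i)\,1_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R}\sum_{m=1}^{S} M_{j,l}(x_j + e_{j,l})\,P(x + e_{i,k} + e_{j,l})\,P^-[j,i][l,m]\,Z_{i,k,m}(x_i + e_{i,k})\,D[i,k] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R}\sum_{m=1}^{S}\sum_{h=1}^{n}\sum_{s=1}^{R} M_{j,l}(x_j + e_{j,l})\,P(x + e_{i,k} + e_{j,l} - e_{h,s})\,P^-[j,i][l,m] \\
&\qquad\qquad \times Z_{i,k,m}(x_i + e_{i,k})\,Q[i,h][k,s]\,A_{h,s}(x_h)\,1_{\{|x_h|>0\}}
\end{aligned}$$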
We divide both sides by P (x), assume that there is a product form solution,
and apply Lemma 2:
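$$\begin{aligned}
\sum_{j=1}^{n}\sum_{l=1}^{R}\big(\Lambda_{j,l} + M_{j,l}(x_j)1_{\{|x_j|>0\}} + N_{j,l}(x_j)1_{\{|x_j|>0\}}\big)
={}& \sum_{j=1}^{n}\sum_{l=1}^{R}\Lambda_{j,l}\,A_{j,l}(x_j)\,1_{\{|x_j|>0\}}\,\frac{g_j(x_j - e_{j,l})}{g_j(x_j)} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S}\lambda_{j,m}\,K_{j,m,l}\,q_{j,l}\,D[j,l] + \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\,d[j,l] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} r_{j,l}\,q_{j,l}\,P^-[j,i][l,m]\,Y_{i,m}(x_i)\,1_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} r_{j,l}\,q_{j,l}\,P^-[j,i][l,m]\,1_{\{|x_i|=0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\,P^+[j,i][l,k]\,A_{i,k}(x_i)\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}\,1_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R}\sum_{m=1}^{S}\lambda_{j,m}\,K_{j,m,l}\,q_{j,l}\,Q[j,i][l,k]\,A_{i,k}(x_i)\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}\,1_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R}\sum_{m=1}^{S} r_{j,l}\,q_{j,l}\,P^-[j,i][l,m]\,K_{i,m,k}\,q_{i,k}\,D[i,k] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{h=1}^{n}\sum_{l=1}^{R}\sum_{k=1}^{R}\sum_{s=1}^{R}\sum_{m=1}^{S} r_{j,l}\,q_{j,l}\,P^-[j,i][l,m]\,K_{i,m,k}\,q_{i,k}\,Q[i,h][k,s] \\
&\qquad\qquad \times A_{h,s}(x_h)\,\frac{g_h(x_h - e_{h,s})}{g_h(x_h)}\,1_{\{|x_h|>0\}}
\end{aligned}$$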
We now apply (7) to the fourth, fifth, eighth and ninth terms of the right-hand
side of the equation:
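$$\begin{aligned}
\sum_{j=1}^{n}\sum_{l=1}^{R}\big(\Lambda_{j,l} + M_{j,l}(x_j)1_{\{|x_j|>0\}} + N_{j,l}(x_j)1_{\{|x_j|>0\}}\big)
={}& \sum_{j=1}^{n}\sum_{l=1}^{R}\Lambda_{j,l}\,A_{j,l}(x_j)\,1_{\{|x_j|>0\}}\,\frac{g_j(x_j - e_{j,l})}{g_j(x_j)} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S}\lambda_{j,m}\,K_{j,m,l}\,q_{j,l}\,D[j,l] + \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\,d[j,l] \\
&+ \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m}\,Y_{i,m}(x_i)\,1_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m}\,1_{\{|x_i|=0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\,P^+[j,i][l,k]\,A_{i,k}(x_i)\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}\,1_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{l=1}^{R}\sum_{m=1}^{S}\lambda_{j,m}\,K_{j,m,l}\,q_{j,l}\,Q[j,i][l,k]\,A_{i,k}(x_i)\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}\,1_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{m=1}^{S}\lambda^-_{i,m}\,K_{i,m,k}\,q_{i,k}\,D[i,k] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{k=1}^{R}\sum_{m=1}^{S}\lambda^-_{j,m}\,K_{j,m,l}\,q_{j,l}\,Q[j,i][l,k]\,A_{i,k}(x_i)\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)}\,1_{\{|x_i|>0\}}
\end{aligned}$$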
We group the first, sixth, seventh and ninth terms of the right-hand side of the
equation, and move the last two terms of the left-hand side to the right-hand side:
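$$\begin{aligned}
\sum_{j=1}^{n}\sum_{l=1}^{R}\Lambda_{j,l}
={}& -\sum_{i=1}^{n}\sum_{k=1}^{R}\big(M_{i,k}(x_i) + N_{i,k}(x_i)\big)\,1_{\{|x_i|>0\}} \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R} A_{i,k}(x_i)\,1_{\{|x_i|>0\}}\,(\Lambda_{i,k} + \Lambda^+_{i,k})\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)} \\
&+ \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m}\,Y_{i,m}(x_i)\,1_{\{|x_i|>0\}} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S}\lambda_{j,m}\,K_{j,m,l}\,q_{j,l}\,D[j,l] + \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\,d[j,l] \\
&+ \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m}\,1_{\{|x_i|=0\}} \\
&+ \sum_{i=1}^{n}\sum_{k=1}^{R}\sum_{m=1}^{S}\lambda^-_{i,m}\,K_{i,m,k}\,q_{i,k}\,D[i,k]
\end{aligned}$$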
We now apply Lemma 3 to the sum of the first three terms of the right-hand side:
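$$\begin{aligned}
\sum_{j=1}^{n}\sum_{l=1}^{R}\Lambda_{j,l}
={}& \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m}\,1_{\{|x_i|>0\}} \\
&+ \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S}\lambda_{j,m}\,K_{j,m,l}\,q_{j,l}\,D[j,l] + \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\,d[j,l] \\
&+ \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m}\,1_{\{|x_i|=0\}} \\
&+ \sum_{j=1}^{n}\sum_{k=1}^{R}\sum_{m=1}^{S}\lambda^-_{j,m}\,K_{j,m,k}\,q_{j,k}\,D[j,k]
\end{aligned}$$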
Now we group the first and fourth terms, and the second and fifth terms, of the
right-hand side of the equation.
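$$\sum_{j=1}^{n}\sum_{l=1}^{R}\Lambda_{j,l} = \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m} + \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\,K_{j,m,l}\,(\lambda_{j,m} + \lambda^-_{j,m})\,D[j,l] + \sum_{j=1}^{n}\sum_{l=1}^{R} r_{j,l}\,q_{j,l}\,d[j,l]$$
$$\begin{aligned}
\sum_{j=1}^{n}\sum_{l=1}^{R}\Lambda_{j,l}
={}& \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m} + \sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\,K_{j,m,l}\,(\lambda_{j,m} + \lambda^-_{j,m}) + \sum_{j=1}^{n}\sum_{l=1}^{R} q_{j,l}\,r_{j,l} \\
&- \Big(\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{k=1}^{R}\sum_{m=1}^{S} q_{j,l}\,K_{j,m,l}\,Q[j,i][l,k]\,(\lambda_{j,m} + \lambda^-_{j,m}) + \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{k=1}^{R} r_{j,l}\,q_{j,l}\,P^+[j,i][l,k]\Big) \\
&- \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{l=1}^{R}\sum_{m=1}^{S} q_{j,l}\,r_{j,l}\,P^-[j,i][l,m]
\end{aligned}$$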
and substituting for $q_{j,l}$ from (5) in the second and third terms and grouping them, we
have:
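$$\sum_{j=1}^{n}\sum_{l=1}^{R}\Lambda_{j,l} = \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m} + \sum_{j=1}^{n}\sum_{l=1}^{R}\Lambda_{j,l} + \sum_{j=1}^{n}\sum_{l=1}^{R}\Lambda^+_{j,l} - \sum_{j=1}^{n}\sum_{l=1}^{R}\Lambda^+_{j,l} - \sum_{i=1}^{n}\sum_{m=1}^{S}\lambda^-_{i,m}$$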
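LIFO/PR. First consider an arbitrary LIFO station and recall the definition of
$\Delta_i$:
$$\begin{aligned}
1_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& 1_{\{|x_i|>0\}}\sum_{k=1}^{R} A_{i,k}(x_i)\,(\Lambda_{i,k} + \Lambda^+_{i,k})\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)} \\
&- 1_{\{|x_i|>0\}}\sum_{k=1}^{R} M_{i,k}(x_i) - 1_{\{|x_i|>0\}}\sum_{k=1}^{R} N_{i,k}(x_i) \\
&+ 1_{\{|x_i|>0\}}\sum_{m=1}^{S}\lambda^-_{i,m}\,Y_{i,m}(x_i)
\end{aligned}$$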
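We substitute the values of $Y_{i,m}$, $M_{i,k}$, $N_{i,k}$ and $A_{i,k}$ for a LIFO station:
$$\begin{aligned}
1_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& 1_{\{|x_i|>0\}}\sum_{k=1}^{R} 1_{\{c_{i,1}=k\}}\,(\Lambda_{i,k} + \Lambda^+_{i,k})/q_{i,k} \\
&- 1_{\{|x_i|>0\}}\sum_{k=1}^{R} 1_{\{c_{i,1}=k\}}\,r_{i,k} - 1_{\{|x_i|>0\}}\sum_{k=1}^{R}\sum_{m=1}^{S} 1_{\{c_{i,1}=k\}}\,K_{i,m,k}\,\lambda_{i,m} \\
&+ 1_{\{|x_i|>0\}}\sum_{m=1}^{S}\sum_{k=1}^{R}\lambda^-_{i,m}\,1_{\{c_{i,1}=k\}}\,(1 - K_{i,m,k})
\end{aligned}$$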
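We use the value of $q_{i,k}$ from equation (5) and some cancellations of terms to
obtain:
$$\begin{aligned}
1_{\{|x_i|>0\}}\,\Delta_i(x_i) &= 1_{\{|x_i|>0\}}\sum_{k=1}^{R} 1_{\{c_{i,1}=k\}}\Big(\sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m} + \sum_{m=1}^{S}\lambda^-_{i,m}\,(1 - K_{i,m,k})\Big) \\
&= 1_{\{|x_i|>0\}}\sum_{m=1}^{S}\sum_{k=1}^{R}\lambda^-_{i,m}\,1_{\{c_{i,1}=k\}}
\end{aligned}$$
and as $1_{\{|x_i|>0\}}\sum_{k=1}^{R} 1_{\{c_{i,1}=k\}} = 1_{\{|x_i|>0\}}$, we finally get the result:
$$1_{\{|x_i|>0\}}\,\Delta_i(x_i) = 1_{\{|x_i|>0\}}\sum_{m=1}^{S}\lambda^-_{i,m} \qquad (20)$$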
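$$\begin{aligned}
1_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& 1_{\{|x_i|>0\}}\sum_{k=1}^{R} A_{i,k}(x_i)\,(\Lambda_{i,k} + \Lambda^+_{i,k})\,\frac{g_i(x_i - e_{i,k})}{g_i(x_i)} \\
&- 1_{\{|x_i|>0\}}\sum_{k=1}^{R} M_{i,k}(x_i) - 1_{\{|x_i|>0\}}\sum_{k=1}^{R} N_{i,k}(x_i) \\
&+ 1_{\{|x_i|>0\}}\sum_{m=1}^{S}\lambda^-_{i,m}\,Y_{i,m}(x_i)
\end{aligned}$$
Similarly, we substitute the values of $Y_{i,m}$, $M_{i,k}$, $N_{i,k}$, $A_{i,k}$ and $q_{i,k}$:
$$\begin{aligned}
1_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& 1_{\{|x_i|>0\}}\sum_{k=1}^{R} 1_{\{c_{i,\infty}=k\}}\Big(r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m}\Big) \\
&- 1_{\{|x_i|>0\}}\sum_{k=1}^{R} 1_{\{c_{i,1}=k\}}\,r_{i,k} - 1_{\{|x_i|>0\}}\sum_{k=1}^{R}\sum_{m=1}^{S} 1_{\{c_{i,1}=k\}}\,K_{i,m,k}\,\lambda_{i,m} \\
&+ 1_{\{|x_i|>0\}}\sum_{m=1}^{S}\sum_{k=1}^{R}\lambda^-_{i,m}\,1_{\{c_{i,1}=k\}}\,(1 - K_{i,m,k})
\end{aligned}$$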
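We separate the last term into two parts, and regroup terms:
$$\begin{aligned}
1_{\{|x_i|>0\}}\,\Delta_i(x_i) ={}& 1_{\{|x_i|>0\}}\sum_{k=1}^{R} 1_{\{c_{i,\infty}=k\}}\Big(r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m}\Big) \\
&- 1_{\{|x_i|>0\}}\sum_{k=1}^{R} 1_{\{c_{i,1}=k\}}\Big(r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m}\Big) \\
&+ 1_{\{|x_i|>0\}}\sum_{m=1}^{S}\sum_{k=1}^{R}\lambda^-_{i,m}\,1_{\{c_{i,1}=k\}}
\end{aligned}$$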
Conditions (2) and (3) imply that the following relation must hold:
$$\sum_{k=1}^{R} 1_{\{c_{i,\infty}=k\}}\Big(r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m}\Big) = \sum_{k=1}^{R} 1_{\{c_{i,1}=k\}}\Big(r_{i,k} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda_{i,m} + \sum_{m=1}^{S} K_{i,m,k}\,\lambda^-_{i,m}\Big)$$
Thus, as $1_{\{|x_i|>0\}}\sum_{k=1}^{R} 1_{\{c_{i,1}=k\}} = 1_{\{|x_i|>0\}}$, we finally get the expected
result:
$$1_{\{|x_i|>0\}}\,\Delta_i(x_i) = 1_{\{|x_i|>0\}}\sum_{m=1}^{S}\lambda^-_{i,m} \qquad (21)$$
Finally we have:
$$1_{\{|x_i|>0\}}\,\Delta_i(x_i) = 1_{\{|x_i|>0\}}\sum_{k=1}^{R}\frac{x_{i,k}}{|x_i|}\sum_{m=1}^{S}\lambda^-_{i,m} \qquad (22)$$
Since $1_{\{|x_i|>0\}}\sum_{k=1}^{R}\frac{x_{i,k}}{|x_i|} = 1_{\{|x_i|>0\}}$, once again we establish the relation we
need. This concludes the proof of Lemma 3.
References
1. Kemeny, J.G., Snell, J.L. “Finite Markov Chains”, Van Nostrand, Princeton,
1965.
2. Baskett F., Chandy K., Muntz R.R., Palacios F.G. “Open, closed and mixed net-
works of queues with different classes of customers”, Journal ACM, Vol. 22, No 2,
pp 248–260, April 1975.
3. Gelenbe E. “Random neural networks with negative and positive signals and prod-
uct form solution”, Neural Computation, Vol. 1, No. 4, pp 502–510, 1989.
4. Gelenbe E. “Product form queueing networks with negative and positive cus-
tomers”, Journal of Applied Probability, Vol. 28, pp 656–663, 1991.
5. Gelenbe E., Glynn P., Sigman K. “Queues with negative customers”, Journal of
Applied Probability, Vol. 28, pp 245–250, 1991.
6. Fourneau J.M. “Computing the steady-state distribution of networks with positive
and negative customers”, Proc. 13-th IMACS World Congress on Computation and
Applied Mathematics, Dublin, 1991.
7. E. Gelenbe, S. Tucci “Performances d’un système informatique dupliqué”,
Comptes-Rendus Acad. Sci., t 312, Série II, pp. 27–30, 1991.
8. Gelenbe E., Schassberger R. “Stability of G-Networks”, Probability in the Engi-
neering and Informational Sciences, Vol. 6, pp 271–276, 1992.
9. Fourneau, J.M., Gelenbe, E. “Multiple class G-networks,” Conference of the ORSA
Technical Committee on Computer Science, Williamsburg, VA, Balci, O. (Ed.),
Pergamon, 1992.
10. Atalay, V., Gelenbe, E. “Parallel algorithm for colour texture generation using the
random neural network model”, International Journal of Pattern Recognition and
Artificial Intelligence, Vol. 6, No. 2 & 3, pp 437–446, 1992.
11. Miyazawa, M. “Insensitivity and product form decomposability of reallocatable
GSMP”, Advances in Applied Probability, Vol. 25, No. 2, pp 415–437, 1993.
12. Henderson, W. “Queueing networks with negative customers and negative queue
lengths”, Journal of Applied Probability, Vol. 30, No. 3, 1993.
13. Gelenbe E. “G-Networks with triggered customer movement”, Journal of Applied
Probability, Vol. 30, No. 3, pp 742–748, 1993.
14. Gelenbe E., “G-Networks with signals and batch removal”, Probability in the En-
gineering and Informational Sciences, Vol. 7, pp 335–342, 1993.
15. Chao, X., Pinedo, M. “On generalized networks of queues with positive and neg-
ative arrivals”, Probability in the Engineering and Informational Sciences, Vol. 7,
pp 301–334, 1993.
16. Henderson, W., Northcote, B.S., Taylor, P.G. “Geometric equilibrium distributions
for queues with interactive batch departures,” Annals of Operations Research, Vol.
48, No. 1–4, 1994.
17. Henderson, W., Northcote, B.S., Taylor, P.G. “Networks of customer queues and
resource queues”, Proc. International Teletraffic Congress 14, Labetoulle, J. and
Roberts, J. (Eds.), pp 853–864, Elsevier, 1994.
18. Gelenbe, E. “G-networks: an unifying model for neural and queueing networks”,
Annals of Operations Research, Vol. 48, No. 1–4, pp 433–461, 1994.
19. Chao, X., Pinedo, M. “Product form queueing networks with batch services, sig-
nals, and product form solutions”, Operations Research Letters, Vol. 17, pp 237–
242, 1995.
20. J.M. Fourneau, E. Gelenbe, R. Suros “G-networks with multiple classes of positive
and negative customers,” Theoretical Computer Science, Vol. 155, pp. 141–156,
1996.
21. Gelenbe, E., Labed A. “G-networks with multiple classes of customers and trig-
gers”, to appear.
Spectral Expansion Solutions for
Markov-Modulated Queues
Isi Mitrani
1 Introduction
(i) There is a threshold M such that the instantaneous transition rates out of
state (i, j) do not depend on j when j ≥ M.
(ii) The jumps of the random variable J are bounded.
When the jumps of the random variable J are of size 1, i.e. when jobs arrive
and depart one at a time, the process is said to be of the Quasi-Birth-and-Death
type, or QBD (the term skip-free is also used, e.g. in Latouche et al. [12]). The
state diagram for this common model, showing some transitions out of state
(i, j), is illustrated in figure 1.
The requirement that all transition rates cease to depend on the size of the job
queue beyond a certain threshold is not too restrictive. Note that we impose no
limit on the magnitude of the threshold M , although it must be pointed out that
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 17–35, 2002.
© Springer-Verlag Berlin Heidelberg 2002
18 I. Mitrani
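[Figure 1: state diagram of the QBD process, showing some transitions out of state (i, j); horizontal axis: phase 0, 1, ..., i, ..., N; vertical axis: queue size 0, 1, ..., j − 1, j, j + 1]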
the larger M is, the greater the complexity of the solution. Similarly, although
jobs may arrive and/or depart in fixed or variable (but bounded) batches, the
larger the batch size, the more complex the solution.
The object of the analysis of a Markov-modulated queue is to determine the
joint steady-state distribution of the environmental phase and the number of
jobs in the system:
That distribution exists for an irreducible Markov process if, and only if, the
corresponding set of balance equations has a positive solution that can be nor-
malized.
The marginal distributions of the number of jobs in the system, and of the
phase, can be obtained from the joint distribution:
$$p_{\cdot,j} = \sum_{i=0}^{N} p_{i,j} \qquad (2)$$
Spectral Expansion Solutions for Markov-Modulated Queues 19
$$p_{i,\cdot} = \sum_{j=0}^{\infty} p_{i,j} \qquad (3)$$
Various performance measures can then be computed in terms of these joint and
marginal distributions.
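In a numerical solution the infinite sum in (3) has to be truncated. The sketch below assumes a joint distribution stored as `p[i][j]` for phases i = 0, ..., N and queue sizes j = 0, ..., J − 1, with J chosen (an assumption) large enough that the neglected mass is negligible; it returns the two marginals (2) and (3), and the mean queue length as one example of a derived performance measure.

```python
def marginals(p):
    """Marginals (2) and (3) from a truncated joint distribution p[i][j],
    i = 0..N (phase), j = 0..J-1 (number of jobs)."""
    n_phases, n_levels = len(p), len(p[0])
    p_queue = [sum(p[i][j] for i in range(n_phases)) for j in range(n_levels)]  # (2)
    p_phase = [sum(row) for row in p]                                             # (3)
    return p_queue, p_phase


def mean_queue_length(p_queue):
    """One example of a performance measure: E[J] = sum_j j * p_{.,j}."""
    return sum(j * pj for j, pj in enumerate(p_queue))


if __name__ == "__main__":
    p = [[0.1, 0.2],   # phase 0, queue sizes 0 and 1
         [0.3, 0.4]]   # phase 1, queue sizes 0 and 1
    p_queue, p_phase = marginals(p)
    print(p_queue, p_phase, mean_queue_length(p_queue))
```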
There are three ways of solving Markov-modulated queueing models exactly.
Perhaps the most widely used one is the matrix-geometric method [18]. This
approach relies on determining the minimal positive solution, R, of a non-linear
matrix equation; the equilibrium distribution is then expressed in terms of pow-
ers of R.
The second method uses generating functions to solve the set of balance
equations. A number of unknown probabilities which appear in the equations for
those generating functions are determined by exploiting the singularities of the
coefficient matrix. A comprehensive treatment of that approach, in the context
of a discrete-time process with an M/G/1 structure, is presented in Gail et al.
[5].
The third (and arguably best) method is the subject of this tutorial. It is
called spectral expansion, and is based on expressing the equilibrium distribution
of the process in terms of the eigenvalues and left eigenvectors of a certain matrix
polynomial. The idea of the spectral expansion solution method has been known
for some time (e.g., see Neuts [18]), but there are rather few examples of its
application in the performance evaluation literature. Some instances where that
solution has proved useful are reported in Elwalid et al. [3], and Mitrani and
Mitra [17]; a more detailed treatment, including numerical results, is presented
in Mitrani and Chakka [16]. More recently, Grassmann [7] has discussed models
where the eigenvalues can be isolated and determined very efficiently. Some
comparisons between the spectral expansion and the matrix-geometric solutions
can be found in [16] and in Haverkort and Ost [8]. The available evidence suggests
that, where both methods are applicable, spectral expansion is faster even if the
matrix R is computed by the most efficient algorithm.
The presentation in this tutorial is largely based on the material in chapter
6 of [13] and chapter 13 of [14].
Before describing the details of the spectral expansion solution, it would be
instructive to show some examples of systems which are modelled as Markov-
modulated queues.
[Figure: model diagram with labels λ_i and μ, ξ, η]
Note that only transition (d) has a rate which depends on j, and that dependency
vanishes when j ≥ N .
Remark. Even if the breakdown and repair processes were more compli-
cated, e.g., if servers could break down and be repaired in batches, or if a
server breakdown triggered a job departure, the queueing process would still
be QBD. The environmental state transitions can be arbitrary, as long as the
queue changes in steps of 1.
$$E(X_t) = \frac{N\eta}{\xi + \eta} \qquad (5)$$
$$\lambda = \sum_{i=0}^{N} p_{i,\cdot}\,\lambda_i \qquad (6)$$
This gives us an explicit condition for stability. The offered load must be less
than the processing capacity:
$$\frac{\lambda}{\mu} < \frac{N\eta}{\xi + \eta} \qquad (7)$$
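A minimal sketch of the stability check (7). It assumes, consistently with the phase-transition matrix A given for this example later in the text, that each operative server breaks down at rate ξ and each broken server is repaired at rate η, independently; the number of operative servers is then binomial with parameters N and η/(ξ + η), whose mean is (5). The per-phase arrival rates `lam_by_phase` and the function names are hypothetical inputs for illustration.

```python
from math import comb


def phase_distribution(N, xi, eta):
    """p_{i,.}: probability that i of the N servers are operative, assuming
    independent breakdowns (rate xi) and repairs (rate eta) per server."""
    up = eta / (xi + eta)
    return [comb(N, i) * up**i * (1 - up)**(N - i) for i in range(N + 1)]


def is_stable(lam_by_phase, mu, N, xi, eta):
    """Condition (7): lambda / mu < N * eta / (xi + eta), where lambda is
    the average arrival rate (6)."""
    p = phase_distribution(N, xi, eta)
    lam = sum(pi * li for pi, li in zip(p, lam_by_phase))  # equation (6)
    return lam / mu < N * eta / (xi + eta)


if __name__ == "__main__":
    # Two servers, equal breakdown and repair rates: on average one is up.
    print(is_stable([1.0, 1.0, 1.0], 2.0, 2, 1.0, 1.0))  # True
    print(is_stable([1.0, 1.0, 1.0], 0.9, 2, 1.0, 1.0))  # False
```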
Consider a network of two nodes in tandem, such as the one in figure 3. Jobs
arrive into the first node in a Poisson stream with rate λ, and join an unbounded
queue. After completing service at node 1 (exponentially distributed with
parameter μ), they attempt to go to node 2, where there is a finite buffer with
room for a maximum of N − 1 jobs (including the one in service). If that transfer
is impossible because the buffer is full, the job remains at node 1, preventing its
server from starting a new service, until the completion of the current service at
node 2 (exponentially distributed with parameter ξ). In this last case, server 1
is said to be ‘blocked’. Transfers from node 1 to node 2 are instantaneous (see
[1,19]).
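[Figure 3: two nodes in tandem; arrival rate λ, service rate μ at node 1, service rate ξ at node 2, node 2 buffer of size N − 1]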
The only dependency on j comes from the fact that transitions (b), (c) and (d)
are not available when j = 0. In this example, the j-independency threshold is
M = 1.
Because the environmental process is coupled with the queueing process, the
marginal distribution of the former (i.e., the number of jobs at node 2) cannot
be determined without finding the joint distribution of $I_t$ and $J_t$. Nor is the
stability condition as simple as in the previous example.
There is a large and useful family of distributions that can be incorporated into
queueing models by means of Markovian environments. Those distributions are
‘almost’ general, in the sense that any distribution function either belongs to
this family or can be approximated as closely as desired by functions from it.
Let $I_t$ be a Markov process with state space {0, 1, ..., N} and generator
matrix Ã. States 0, 1, . . . , N − 1 are transient, while state N , reachable from any
of the other states, is absorbing (the last row of à is 0). At time 0, the process
starts in state i with probability $\alpha_i$ ($i = 0, 1, \ldots, N-1$; $\alpha_0 + \alpha_1 + \cdots + \alpha_{N-1} = 1$).
Eventually, after an interval of length T , it is absorbed in state N . The random
variable T is said to have a ‘phase-type’ (PH) distribution with parameters Ã
and αi (see [18]).
The exponential distribution is obviously phase-type (N = 1). So is the
Erlang distribution, the convolution of N exponentials (exercise 5 in section
2.3). The corresponding generator matrix is
$$\tilde{A} = \begin{bmatrix} -\mu & \mu & & & \\ & -\mu & \mu & & \\ & & \ddots & \ddots & \\ & & & -\mu & \mu \\ & & & & 0 \end{bmatrix}.$$
The PH family is very versatile. It contains distributions with both low and
high coefficients of variation. It is closed with respect to mixing and convolution:
if X1 and X2 are two independent PH random variables with N1 and N2 (non-
absorbing) phases respectively, and c1 and c2 are constants, then c1 X1 + c2 X2
has a PH distribution with N1 + N2 phases.
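The sketch below illustrates two of these facts with exact arithmetic: the Erlang generator shown above (restricted to its transient states), and the convolution case c_1 = c_2 = 1, where X_1 + X_2 is represented with N_1 + N_2 phases by starting X_2 at the moment X_1 is absorbed. The mean of a PH distribution is computed from the standard formula E[T] = α(−T)^{−1}1, with T the transient block of the generator. All function names are hypothetical, and `convolve` assumes that the initial probabilities of X_1 sum to one.

```python
from fractions import Fraction


def erlang_transient_block(N, mu):
    """Transient part T of the Erlang generator: -mu on the diagonal and
    mu on the superdiagonal (the absorbing state N is dropped)."""
    return [[-mu if c == r else (mu if c == r + 1 else 0) for c in range(N)]
            for r in range(N)]


def ph_mean(alpha, T):
    """E[T] = alpha (-T)^{-1} 1, via Gauss-Jordan elimination on exact
    fractions (assumes every transient state eventually absorbs)."""
    n = len(T)
    # Augmented system (-T) m = 1; m[i] is the mean time to absorption from i.
    a = [[Fraction(-T[r][c]) for c in range(n)] + [Fraction(1)] for r in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    m = [a[r][n] / a[r][r] for r in range(n)]
    return sum(Fraction(ai) * mi for ai, mi in zip(alpha, m))


def convolve(alpha1, T1, alpha2, T2):
    """PH representation of X1 + X2 with N1 + N2 phases: on absorption from
    phase i of X1 (exit rate t1[i]), X2 starts in phase k w.p. alpha2[k]."""
    n1, n2 = len(T1), len(T2)
    t1 = [-sum(row) for row in T1]              # exit rates to absorption
    T = [[0] * (n1 + n2) for _ in range(n1 + n2)]
    for r in range(n1):
        T[r][:n1] = T1[r]
        T[r][n1:] = [t1[r] * a for a in alpha2]
    for r in range(n2):
        T[n1 + r][n1:] = T2[r]
    alpha = list(alpha1) + [0] * n2
    return alpha, T


if __name__ == "__main__":
    print(ph_mean([1, 0, 0], erlang_transient_block(3, 2)))  # 3/2
    alpha, T = convolve([1, 0], erlang_transient_block(2, 1), [1], [[-4]])
    print(ph_mean(alpha, T))                                 # 9/4 = 2 + 1/4
```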
A model with a single unbounded queue, where either the interarrival in-
tervals, or the service times, or both, have PH distributions, is easily cast in
the framework of a queue in Markovian environment. Consider, for instance, the
M/PH/1 queue. Its state at time t can be represented as a pair $(I_t, J_t)$, where $J_t$
is the number of jobs present and $I_t$ is the phase of the current service (if $J_t > 0$).
When It has a transition into the absorbing state, the current service completes
and (if the queue is not empty) a new service starts immediately, entering phase
i with probability αi .
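The description above translates directly into level matrices. A minimal sketch, assuming the service distribution is given by its transient block `T` and initial probabilities `alpha`: at queue sizes j ≥ 2 the rates no longer depend on j, so A (phase changes), B (arrivals) and C (completions, after which the next job starts in phase k with probability α_k) are constant there. The boundary levels j = 0, 1 need separate treatment and are not shown; the function name is hypothetical.

```python
def mph1_level_matrices(lam, alpha, T):
    """A, B, C for the M/PH/1 queue at queue sizes j >= 2 (sketch).

      A[i][k] = T[i][k] for i != k   phase change, queue size unchanged
      B       = lam * I              arrival, phase unchanged
      C[i][k] = t[i] * alpha[k]      completion; next job starts in phase k
    """
    n = len(T)
    t = [-sum(row) for row in T]   # completion (absorption) rate from each phase
    A = [[T[i][k] if i != k else 0 for k in range(n)] for i in range(n)]
    B = [[lam if i == k else 0 for k in range(n)] for i in range(n)]
    C = [[t[i] * alpha[k] for k in range(n)] for i in range(n)]
    return A, B, C


if __name__ == "__main__":
    # Erlang-2 service with rate 3 per phase, Poisson arrivals with rate 1.
    A, B, C = mph1_level_matrices(1.0, [1, 0], [[-3, 3], [0, -3]])
    print(A, B, C)
```

A useful sanity check is that, in every phase, the phase-change and completion rates together equal the total rate of leaving that phase, −T[i][i].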
The PH/PH/n queue can also be represented as a QBD process. However,
the state of the environmental variable, $I_t$, now has to indicate the phase of the
current interarrival interval and the phases of the current services at all busy
servers. If the interarrival interval has $N_1$ phases and the service has $N_2$ phases,
the state space of $I_t$ would be of size $N_1 N_2^n$.
The last example is not a QBD process. Consider a system where transactions,
arriving according to a Poisson process with rate λ, are served in FIFO order by
a single server. The service times are i.i.d. random variables distributed exponen-
tially with parameter μ. After N consecutive transactions have been completed,
the system performs a checkpoint operation whose duration is an i.i.d. random
variable distributed exponentially with parameter β. Once a checkpoint is es-
tablished, the N completed transactions are deemed to have departed. However,
both transaction processing and checkpointing may be interrupted by the
occurrence of a fault. Faults arrive according to an independent Poisson process
with rate ξ. When a fault occurs, the system instantaneously rolls back to the
last established checkpoint; all transactions which arrived since that moment
either remain in the queue, if they have not been processed, or return to it,
in order to be processed again (it is assumed that repeated service times are
resampled independently) (see [11,8]).
This system can be modelled as an unbounded queue of (uncompleted) trans-
actions, which is modulated by an environment consisting of completed trans-
actions and checkpoints. More precisely, the two state variables, I(t) and J(t),
are the number of transactions that have completed service since the last check-
point, and the number of transactions present that have not completed service
(including those requiring re-processing), respectively.
The Markov-modulated queueing process X = {[I(t), J(t)] ; t ≥ 0}, has the
following transitions out of state (i, j):
Because transitions (a), resulting from arrivals of faults, cause the queue size
to jump by more than 1, this is not a QBD process.
Let us now turn to the problem of determining the steady-state joint distribu-
tion of the environmental phase and the number of jobs present, for a Markov-
modulated queue. We shall start with the most commonly encountered case,
namely the QBD process, where jobs arrive and depart singly. The starting
point is of course the set of balance equations which the probabilities pi,j , de-
fined in 1, must satisfy. In order to write them in general terms, the following
notation for the instantaneous transition rates will be used.
(a) Phase transitions leaving the queue unchanged: from state (i, j) to state
(k, j) (0 ≤ i, k ≤ N ; i = k), with rate aj (i, k);
(b) Transitions incrementing the queue: from state (i, j) to state (k, j + 1) (0 ≤
i, k ≤ N ), with rate bj (i, k);
(c) Transitions decrementing the queue: from state (i, j) to state (k, j − 1) (0 ≤
i, k ≤ N ; j > 0), with rate cj (i, k).
Spectral Expansion Solutions for Markov-Modulated Queues 25
Aj = A ; Bj = B ; Cj = C , j ≥ M . (8)
Note that transitions (b) may represent a job arrival coinciding with a change
of phase. If arrivals are not accompanied by such changes, then the matrices
Bj and B are diagonal. Similarly, a transition of type (c) may represent a job
departure coinciding with a change of phase. Again, if such coincidences do not
occur, then the matrices Cj and C are diagonal.
By way of illustration, here are the transition rate matrices for some of the
examples in the previous subsection.
Since the phase transitions are independent of the queue size, the matrices Aj
are all equal:
⎡ ⎤
0 Nη
⎢ ξ 0 (N − 1)η ⎥
⎢ ⎥
⎢ .. ⎥
Aj = A = ⎢ ⎢ 2ξ 0 . ⎥ .
⎥
⎢ . . ⎥
⎣ .. .. η ⎦
Nξ 0
Denoting
μi,j = min(i, j)μ ; i = 0, 1, . . . , N ; j = 1, 2, . . . ,
Manufacturing Blocking
Remember that the environment changes phase without changing the queue size
either when a service completes at node 2 and node 1 is not blocked, or when
node 1 becomes blocked (if node 1 is already blocked, then a completion at node
2 changes both phase and queue size). Hence, when j > 0,
⎡ ⎤
0 0
⎢ξ 0 0 ⎥
⎢ ⎥
⎢ .. .. .. ⎥
Aj = A = ⎢ . . . ⎥ ; j = 1, 2, . . . .
⎢ ⎥
⎣ ξ 0 μ ⎦
0 0
When node 1 is empty (j = 0), it cannot become blocked; the state (N, 0)
does not exist and the matrix A0 has only N rows and columns:
⎡ ⎤
0
⎢ξ 0 ⎥
⎢ ⎥
A0 = ⎢ . . ⎥ ;
⎣ .. .. ⎦
ξ 0
Since the arrival rate into node 1 does not depend on either i or j, we have
Bj = B = λI, where I is the identity matrix of order N + 1. The departures
from node 1 (which can occur when i = N − 1) are always accompanied by
environmental changes: from state (i, j) the system moves to state (i + 1, j − 1)
with rate μ for i < N − 1; from state (N, j) to state (N − 2, j − 1) with rate ξ.
Hence, the departure rate matrices do not depend on j and are equal to
⎡ ⎤
0μ
⎢0 0 μ ⎥
⎢ ⎥
⎢ .. .. ⎥
⎢ . . ⎥
Cj = C = ⎢
⎢
⎥ .
⎥
⎢ ..
⎢ . 0 μ ⎥⎥
⎣ 0 0 0⎦
ξ 00
Spectral Expansion Solutions for Markov-Modulated Queues 27
Balance Equations
Using the instantaneous transition rates defined at the beginning of this section,
the balance equations of a general QBD process can be written as
N
pi,j [aj (i, k) + bj (i, k) + cj (i, k)]
k=0
N
= [pk,j aj (k, i) + pk,j−1 bj−1 (k, i) + pk,j+1 cj+1 (k, i)] , (9)
k=0
where pi,−1 = b−1 (k, i) = c0 (i, k) = 0 by definition. The left-hand side of (9)
gives the total average number of transitions out of state (i, j) per unit time (due
to changes of phase, arrivals and departures), while the right-hand side expresses
the total average number of transitions into state (i, j) (again due to changes of
phase, arrivals and departures). These balance equations can be written more
compactly by using vectors and matrices. Define the row vectors of probabilities
corresponding to states with j jobs in the system:
Also, let DjA , DjB and DjC be the diagonal matrices whose i th diagonal element
is equal to the i th row sum of Aj , Bj and Cj , respectively. Then equations (9),
for j = 0, 1, . . ., can be written as:
for j = M + 1, M + 2, . . ..
In addition, all probabilities must sum up to 1:
∞
vj e = 1 , (13)
j=0
Proposition 1 The QBD process has a steady-state distribution if, and only
if, the number of eigenvalues of Q(x) strictly inside the unit disk, each counted
according to its multiplicity, is equal to the number of states of the Markovian
environment, N +1. Then, assuming that the eigenvectors of multiple eigenvalues
are linearly independent, the spectral expansion solution of (12) has the form
N +1
vj = αk uk xj−M
k ; j = M, M + 1, . . . . (20)
k=1
1. Compute the eigenvalues of Q(x), xk , inside the unit disk, and the corre-
sponding left eigenvectors uk . If their number is other than N + 1, stop; a
steady-state distribution does not exist.
2. Solve the finite set of linear equations (11), for j = 0, 1, . . . , M , and (13),
with vM and vM +1 given by (20), to determine the constants αk and the
vectors vj for j < M .
3. Use the obtained solution in order to determine various moments, marginal
probabilities, percentiles and other system performance measures that may
be of interest.
Careful attention should be paid to step 1. The ‘brute force’ approach which
relies on first evaluating the scalar polynomial det[Q(x)], then finding its roots,
may be very inefficient for large N . An alternative which is preferable in most
cases is to reduce the quadratic eigenvalue-eigenvector problem
u[Q0 + Q1 x + Q2 x2 ] = 0 , (21)
30 I. Mitrani
The spectral expansion solution can also be used to provide simple estimates
of performance when the system is heavily loaded. The important observation
in this connection is that when the system approaches instability, the expansion
(19) is dominated by the eigenvalue with the largest modulus inside the unit
disk, xN +1 . That eigenvalue is always real. It can be shown that when the offered
load is high, the average number of jobs in the system is approximately equal to
xN +1 /(1 − xN +1 ).
Defining again the diagonal matrices DA , DBs and DCs , whose i th diagonal
element is equal to the i th row sum of A, Bs and Cs , respectively, the balance
equations for j > M + r1 can be written in a form analogous to (12):
r1
r2
r1
r2
vj [DA + D Bs + D Cs ] = vj−s Bs + vj A + vj+s Cs . (28)
s=1 s=1 s=1 s=1
Similar equations, involving Aj , Bj,s and Cj,s , together with the corresponding
diagonal matrices, can be written for j ≤ M + r1 .
As before, (28) can be rewritten as a vector difference equation, this time of
order r = r1 + r2 , with constant coefficients:
r
vj+ Q = 0 ; j ≥ M . (29)
=0
32 I. Mitrani
r1
r2
Qr1 = A − DA − D Bs − D Cs ,
s=1 s=1
r
Q(x) = Q x . (30)
=0
where xk are the eigenvalues of Q(x) in the interior of the unit disk, uk are the
corresponding left eigenvectors, and αk are constants (k = 1, 2, . . . , c ). These
constants, together with the the probability vectors vj for j < M , are deter-
mined with the aid of the state-dependent balance equations and the normalizing
equation.
There are now (M + r1 )(N + 1) so-far-unused balance equations (the ones
where j < M + r1 ), of which (M + r1 )(N + 1) − 1 are linearly independent, plus
one normalizing equation. The number of unknowns is M (N + 1) +c (the vectors
vj for j = 0, 1, . . . , M − 1), plus the c constants αk . Hence, there is a unique
solution when c = r1 (N + 1).
r1 ∗(N +1)
vj = αk uk xj−M
k ; j = M, M + 1, . . . . (32)
k=1
References
16. I. Mitrani and R. Chakka, Spectral expansion solution for a class of Markov mod-
els: Application and comparison with the matrix-geometric method, to appear in
Performance Evaluation, 1995.
17. I. Mitrani and D. Mitra, A spectral expansion method for random walks on semi-
infinite strips, IMACS Symposium on Iterative Methods in Linear Algebra, Brussels,
1991.
18. M.F. Neuts, Matrix Geometric Solutions in Stochastic Models, John Hopkins Press,
1981.
19. M.F. Neuts, Two queues in series with a finite intermediate waiting room, J. Appl.
Prob., 5, pp. 123–142, 1968.
20. M.F. Neuts and D.M. Lucantoni, A Markovian queue with N servers subject to
breakdowns and repairs, Management Science, 25, pp. 849–861, 1979.
21. N.U. Prabhu and Y. Zhu, Markov-modulated queueing systems, QUESTA, 5, pp.
215–246, 1989.
M/G/1-Type Markov Processes: A Tutorial∗
1 Introduction
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 36–63, 2002.
c Springer-Verlag Berlin Heidelberg 2002
M/G/1-Type Markov Processes: A Tutorial 37
traffic models than the simple Poisson process or the batch Poisson process, as
they can effectively capture dependence and correlation, salient characteristics
of Internet traffic [27,12,33].
In this paper, we focus on the solution techniques for M/G/1-type Markov
chains. Neuts [25] defines various classes of infinite-state Markov chains with a
repetitive structure, whose state space1 is partitioned into the boundary states
(0) (0) (i) (i)
S (0) = {s1 , . . . , sm } and the sets of states S (i) = {s1 , . . . , sn }, for i ≥ 1, that
correspond to the repetitive portion of the chain. For the class of M/G/1-type
Markov chains, the infinitesimal generator QM/G/1 has upper block Hessenberg
form:
⎡ (1) (2) (3) (4) ⎤
L F F F F ···
⎢B F(1) F(2) F(3) ···⎥
⎢ L ⎥
⎢0 B L F(1) F(2) ···⎥
⎢ ⎥
QM/G/1 = ⎢0 0 F(1) ···⎥ . (1)
⎢ B L ⎥
⎢0 0 0 B L ···⎦ ⎥
⎣
.. .. .. .. .. ..
. . . . . .
We use the letters “L”, “F”, and “B” to describe “local”, ‘forward”, and “back-
ward” transition rates, respectively, in relation to a set of states S (i) for i ≥ 1,
and a “” for matrices related to S (0) .
For systems of the M/G/1-type, matrix analytic methods have been pro-
posed for the solution of the basic equation π · QM/G/1 = 0 [26], where π
is the (infinite) stationary probability vector of all states in the chain. Key to
the matrix-analytic methods is the computation of an auxiliary matrix called
G. Traditional solution methodologies for M/G/1-type processes compute the
stationary probability vector with a recursive function based on G. Iterative
algorithms are used to determine G [20,16].
Another class of Markov-chains with repetitive structure that commonly oc-
curs in modeling of computer systems is the class of GI/M/1-type processes,
whose infinitesimal generator QGI/M/1 has a lower block Hessenberg form:
1
We use calligraphic letters to indicate sets (e.g., A), lower case boldface Roman or
Greek letters to indicate row vectors (e.g., a, α), and upper case boldface Roman
letters to indicate matrices (e.g., A). We use superscripts in parentheses or subscripts
to indicate family of related entities (e.g., A(1) , A1 ), and we extend the notation
to subvectors or submatrices by allowing sets of indices to be used instead of single
indices (e.g., a[A], A[A, B]). Vector and matrix elements are indicated using square
brackets (e.g., a[1], A[1, 2]). RowSum(·) indicates the diagonal matrix whose entry in
position (r, r) is the sum of the entries on the rth row of the argument (which can be
a rectangular matrix). Norm(·) indicates a matrix whose rows are normalized. 0 and
1 indicate a row vector or a matrix of 0’s, or a row vector of 1’s, of the appropriate
dimensions, respectively.
38 A. Riska and E. Smirni
⎡ ⎤
L
F 0 0 0 ···
⎢ (1) ⎥
⎢ B L F 0 0 ···⎥
⎢ (2) ⎥
QGI/M/1 =⎢ B
⎢ (3)
B(1) L F 0 ···⎥.
⎥ (2)
⎢B B(2) B(1) L F ···⎥
⎣ ⎦
.. .. .. .. .. . .
. . . . . .
The solution of GI/M/1-type processes is significantly simpler than the solu-
tion of M/G/1-type processes because of the matrix geometric relation [25] that
exists among the stationary probabilities of sets S (i) for i ≥ 1. This property
leads to significant algebraic simplifications resulting in the very elegant matrix-
geometric solution technique that was pioneered by Neuts and that was later
popularized by Nelson in the early ’90s [23,24]. Key to the matrix-geometric so-
lution is a matrix called R which is used in the computation of the steady-state
probability vector and measures of interest.
Quasi-Birth-Death (QBD) processes are the intersection of M/G/1-type and
GI/M/1-type processes and their infinitesimal generator has the structure de-
picted in Eq.(3).
⎡ ⎤
F
L 0 0 0 ···
⎢B ⎥
⎢ L F 0 0 ···⎥
⎢ 0 B L F 0 ···⎥
QQDB = ⎢ ⎥. (3)
⎢ 0 0 B L F ···⎥
⎣ ⎦
.. .. .. .. .. . .
. . . . . .
Since QBDs are special cases of both M/G/1-type processes and GI/M/1-type
processes, either the matrix-analytic method or the matrix-geometric solution
can be used for their analysis. The matrix-geometric solution is the preferable one
because of its simplicity. Both matrices G and R are defined for QBD processes.
We direct the interested reader to [16] for recent advances on the analysis of
QBD processes.
Key to the solution of Markov chains of the M/G/1, GI/M/1, and QBD
types, is the existence of a repetitive structure, as illustrated in Eqs. (1), (2),
and (3), that allows for a certain recursive procedure to be applied for the compu-
tation of the stationary probability vector π (i) corresponding to S (i) for i ≥ 1.
It is this recursive relation that gives elegance to the solution for the case of
GI/M/1 (and consequently QBD) Markov chains, but results in unfortunately
more complicated mathematics for the case of the M/G/1-type.
The purpose of this tutorial is to shed light into the existing techniques
for the analysis of Markov chains of the M/G/1 type that are traditionally
considered not easy to solve. Our intention is to derive from first principles (i.e.,
global balance equations) the repetitive patterns that allow for their solution and
illustrate that the mathematics involved are less arduous than initially feared.
Our stated goals and outline of this tutorial are the following:
– Give an overview of the matrix-geometric solution of GI/M/1 and QBD
processes and establish from first principles why a geometric solution exists
(Section 2).
M/G/1-Type Markov Processes: A Tutorial 39
– Use first principles to establish the most stable recursive relation for the
case of M/G/1-type processes and essentially illustrate the absence of any
geometric relation among the steady state probabilities of sets S (i) , i ≥ 0,
for such chains (Section 3).
– Present an overview of the current state of the art of efficient solutions for
M/G/1-type processes (Section 4).
– State the stability conditions for M/G/1-type processes (Section 5).
– Summarize the features of an existing software tool that can provide M/G/1-
type solutions (Section 6).
Our aim is to make these results more accessible to performance modelers. We
do this by presenting simplified derivations that are often example driven and
by describing an existing tool for the solution of such processes.
and can be computed using iterative numerical algorithms. The above equation
is obtained from the balance equations of the repeating portion of the process,
i.e., starting from the third column of QGI/M/1 . Using Eq.(4) and substituting
in the balance equation that corresponds to the second column of QGI/M/1 , and
together with the normalization condition
∞
π (0) · 1T + π (1) · Ri−1 · 1T = 1 i.e., π (0) · 1T + π (1) · (I − R)−1 · 1T = 1,
i=1
Again, the average queue length is given by the same equation as in the GI/M/1
case.
λ λ λ λ
0 ,0 1 ,1 2 ,1 3,1 .......
0.2μ 0.2μ 0.2μ
0.8μ 0.8μ 0.8μ
γ γ γ
1 ,2
.......
2 ,2 3,2
λ λ λ
(1) (i)
(0) S S
S
λ
λ λ
0 ,0 1 ,1 ....... 0.2μ i,1
0.2μ
0.8μ 0.8μ x
γ γ γ
1 ,2 ....... i,2
λ λ
states and a set with an infinite number of states, respectively. The stochastic
complement of the states in A is a new Markov chain that “skips over” all states
in A. This Markov chain includes states in A only but all transitions out of S (i)
(i.e., the boundary set) to S (i+1) (i.e., the first set in A) need to be “folded back”
to A (see Figure 2). This folding introduces a new direct transition with rate x
that ensures that the stochastic complement of the states in A is a stand-alone
process. Because of the structure of this particular process, i.e., A is entered from
A only through state (i, 1), x is simply equal to λ (see Lemma 1 in Appendix
A). Furthermore, because of the repetitive structure of the original chain, this
rate does not depend on i (which essentially defines the size of set A).
The steady state probability vector π = [π (0) , · · · , π (i) ] of the stochastic
complement of the states in A relates to the steady state probability π A of the
original process with: π = π A /π A · 1T . This implies that if a relation exists
between π (i−1) and π (i) , then the same relation holds for π (i−1) and π (i) .
The flow balance equations for states (i, 1) and (i, 2) in the stochastic com-
plement of A, are:
−μ 0.8μ λ0
π (i) [1], π (i) [2] = − π (i−1) [1], π (i−1) [2] ,
λ −(γ + λ) 0λ
which implies that the relation between π (i−1) and π (i) can be expressed as
Applying Eq.(8) recursively, one can obtain the result of Eq.(4). Observe that
in this particular case an explicit computation of R is possible (i.e., there is no
need to compute R [28] via an iterative numerical procedure as in the general
case). This is a direct effect of the fact that in this example backward transitions
from S (i) to S (i−1) are directed toward a single state only. In Appendix B, we
give details on the cases when matrix R can be explicitly computed.
where
M/G/1-Type Markov Processes: A Tutorial 43
⎡ ⎤ ⎡ ⎤
L F ··· 0 0 0 0 0 0 ···
⎢ (1) ⎥ ⎢ 0 0 0 0 ···⎥
⎢ B L ··· 0 0⎥ ⎢ ⎥
⎢. .. ⎥ ⎢ ⎥
Q[A, A] = ⎢
⎢ ..
..
.
.. ..
. .
⎥
. ⎥, Q[A, A] = ⎢ ... ... ... ... ... ⎥ ,
⎢ (i−1) ⎥ ⎢ ⎥
⎣B B(i−2) · · · L F⎦ ⎣ 0 0 0 0 ···⎦
(i)
B B(i−1) · · · B(1) L F 0 0 0 ···
⎡ ⎤ ⎡ ⎤
(i+1)
B B(i) · · · B(1) L F 0 0 ···
⎢ (i+2) ⎥ ⎢ B(1) 0 ···⎥
⎢B B(i+1) · · · B(2) ⎥ ⎢ (2) L F ⎥
⎢ (i+3) ⎥
Q[A, A] = ⎢ B B(i+2) · · · B(3) ⎥ , Q[A, A] = ⎢
⎢B B(1) L F ···⎥ ⎥.
⎢ (i+4) ⎥ ⎢ B(3)
⎢B B(i+3) ··· B ⎦ (4) ⎥
⎣ B(2) B(1) L ···⎥ ⎦
⎣ .. .. .. .. . .
.. .. .. ..
. . . . . . . . .
(10)
Observe that Q[A, A] is the same matrix for any i > 1. We define its inverse
to be as follows
⎡ ⎤
A0,0 A0,1 A0,2 A0,3 · · ·
⎢ A1,0 A1,1 A1,2 A1,3 · · · ⎥
⎢ ⎥
⎢ ⎥
(−Q[A, A]) = ⎢ A2,0 A2,1 A2,2 A2,3 · · · ⎥ .
−1
(11)
⎢ A3,0 A3,1 A3,2 A3,3 · · · ⎥
⎣ ⎦
.. .. .. .. ..
. . . . .
From the special structure of Q[A, A] we conclude that the second term in the
summation that defines Q is a matrix with all block entries equal to zero except
the very last block row, whose block entries Xj are of the form:
∞
Xj = F · (j+1+k)
A0,k B j=i
k=0
and
∞
Xj = F · A0,k B(j+1+k) , 0 ≤ j < i.
k=0
∞
Note that X0 = F · k=0 A0,k B(1+k) which means that X0 does not depend on
the value of i > 1. The infinitesimal generator Q of the stochastic complement
of states in A is determined as
⎡ ⎤
L
F ··· 0 0
⎢ (1) ⎥
⎢ B L ··· 0 0 ⎥
⎢ ⎥
Q=⎢ ⎢
..
.
..
.
..
.
..
.
..
.
⎥.
⎥ (12)
⎢ (i−1) ⎥
⎣ B B (i−2)
··· L F ⎦
(i) + Xi B(i−1) + Xi−1 · · · B(1) + X1 L + X0
B
Let π be the stationary probability vector of the CTMC with infinitesimal gen-
erator Q and π A the steady-state probability vector of the CTMC of states in
44 A. Riska and E. Smirni
A in the original process, i.e., the process with infinitesimal generator QGI/M/1 .
There is a linear relation between π and π A given in the following equation:
πA
π= . (13)
π A · 1T
π (i) · (L + X0 ) = −π (i−1) · F
implying:
π (i) · (L + X0 ) = −π (i−1) · F.
The above equation holds for any i > 1, because their matrix coefficients do not
depend on i. By applying it recursively over all vectors π (i) for i > 1, we obtain
the following geometric relation
...
S S S S
...
0.25λ 0.25λ
0.25λ
0.5λ 0.5λ 0.5λ
λ λ λ λ
0 ,0 1 ,1 2 ,1 3,1 .......
0.2μ 0.2μ 0.2μ
0.8μ 0.8μ 0.8μ
γ γ γ
1 ,2
.......
2 ,2 3,2
λ λ λ
0.5λ 0.5λ
0.25λ
0.25λ
...
Fig. 3. The CTMC that models a BM AP1 /Cox2 /1 queue.
In the following, we derive the relation between π (i) for i ≥ 1 and the rest of
vectors in π using stochastic complementation, i.e., similarly to the approach
described in Section 2. First we partition the state space S into two partitions
A = ∪j=i
j=0 S
(j)
and A = ∪∞j=i+1 S
(j)
and then we construct the stochastic comple-
ment of states in A. The Markov chain of the stochastic complement of states in
A, (see Figure 4), illustrates how transitions from states (j, 1) and (j, 2) for j ≤ i
and state (0, 0) to states (l, 1) and (l, 2) for l > i are folded back to state (i, 1),
which is the single state to enter A from states in A. These “back-folded” tran-
sitions are marked by xk,h for k ≤ i and h = 1, 2 and represent the “correction”
needed to make the stochastic complement of states in A, a stand-alone process.
Because of the single entry state in A the stochastic complement of states in A
for this example can be explicitly derived (see Lemma 1 in Appendix A) and the
definition of rates xk,h is as follows:
The flow balance equations for states (i, 1) and (i, 2) for the stochastic comple-
ment of states in A are:
(0.2μ + 0.8μ)π (i) [1] = 2λπ (i) [2]
+ 2 · 0.5i−1 λπ (0) [1]
+ 2 · 0.5i−2 λπ (1) [1] + 0.5i−2 λπ (1) [2] + ...
+ 2 · 0.5i−i λπ (i−1) [1] + 0.5i−i λπ (i−1) [2]
and
(2λ + γ)π (i) [2] = 0.8μπ (i) [1] + 0.5i−2 λπ (1) [2] + ... + 0.5i−i λπ (i−1) [2].
46 A. Riska and E. Smirni
1 ,2 .... γ
i,2
λ λ
In the above equalities we group the elements of π (i) on the left and the
rest of the terms on the right in order to express their relation in terms of
the block matrices that describe the infinitesimal generator Q of the stochastic
complement of states in A. By rearranging the terms, the above equations can
be re-written as:
−μπ (i) [1] + 2λπ (i) [2] = −(2 · 0.5i−1 λπ (0) [1]
+ 2 · 0.5i−2 λπ (1) [1] + 0.5i−2 λπ (1) [2] + ...
+ 2 · 0.5i−i λπ (i−1) [1] + 0.5i−i λπ (i−1) [2])
and
0.8μπ (i) [1] − (2λ + γ)π (i) [2] = −(0.5i−2 λπ (1) [2] + ... + 0.5i−i λπ (i−1) [2]).
We can now re-write the above equations in the following matrix equation form:
−μ 0.8μ (0) i−1
(i)
[π [1], π
(i)
[2]] · =− π [1] 2 · 0.5 λ 0
2λ −(2λ + γ)
2 · 0.5i−2 λ 0
+ [π (1) [1], π (1) [2]] + ...
0.5i−2 λ 0.5i−2 λ
2 · 0.5i−i λ 0
+ [π (i−1) [1], π (i−1) [2]] .
0.5i−i λ 0.5i−i λ
By substituting [π (i) [1], π (i) [2]] with π (i) and expressing the coefficient ma-
trices in the above equation in terms of the component matrices of the infinites-
imal generator Q of the stochastic complement of states in A, we obtain4 :
4
Recall that π is the stationary probability vector of the stochastic complement
of states in A and π[A] is the stationary probability vector of states in A in
M/G/1-Type Markov Processes: A Tutorial 47
∞
∞
∞
∞
π (i) ·(L+ F(j) G) = −(π (0) (j) G+π (1)
F F(j) G+...+π (i−1) F(j) G),
j=1 j=i j=i−1 j=1
Note that at this point, we have introduced a new matrix, G, that has an im-
portant probabilistic interpretation. In this specific example, the matrix G can
be explicitly derived [28]. This is a direct outcome of the fact that all states in
set S (i) for i ≥ 1 return to the same single state in set S (i−1) . Equivalently,
the matrix B of the infinitesimal generator QM/G/1 has only a single column
different from zero.
In this section, we investigate the relation between π (i) for i > 1 and π (j) for
0 ≤ j < i for the general case in the same spirit as [34]. We construct the
stochastic complementation of the states in A = ∪ij=0 S (j) (A = S − A). We
obtain
⎡ ⎤ ⎡ (i+1) ⎤
F
L (1) (i−1) F
··· F (i)
F (i+2) F
F (i+3) ···
⎢B L (i−1) ⎥
(i−2) F
··· F ⎢ F(i) F (i+1)
F(i+2) ···⎥
⎢ ⎥ ⎢ ⎥
⎢ . . .. .. .. ⎥ ⎢ . .. .. ⎥
Q[A, A] = ⎢ .. .. ⎥ , Q[A, A] = ⎢ .. . . ···⎥ ,
⎢ . . . ⎥ ⎢ ⎥
⎣0 0 ··· L F ⎦ ⎣ F(2) F(3) F(4) ··· ⎦
0 0 ··· B L F(1) F(2) F(3) ···
⎡ ⎤ ⎡ ⎤
0 0 ··· 0 B L F(1) F(2) F(3) ···
⎢0 0 ··· 0 0 ⎥ ⎢B L F(1) F(2) ···⎥
⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ···⎥
Q[A, A] = ⎢ 0 0 · · · 0 0 ⎥ , Q[A, A] = ⎢ 0 B L F(1) ⎥.
⎢0 0 ··· 0 0 ⎥ ⎢0 0 B L ···⎥
⎣ ⎦ ⎣ ⎦
.. .. .. .. .. .. .. .. .. ..
. . . . . . . . . .
the original M/G/1 process. They relate to each other based on the equation
π = π[A]/(π[A]1T ), which implies that any relation that holds among subvectors
π (j) for j ≤ i would hold for subvectors π (j) for j ≤ i as well
48 A. Riska and E. Smirni
Observe that Q[A, A] is the same matrix for any i ≥ 1. We define its inverse to
be as follows ⎡ ⎤
A0,0 A0,1 A0,2 A0,3 · · ·
⎢ A1,0 A1,1 A1,2 A1,3 · · · ⎥
⎢ ⎥
⎢ ⎥
(−Q[A, A]) = ⎢ A2,0 A2,1 A2,2 A2,3 · · · ⎥ .
−1
(14)
⎢ A3,0 A3,1 A3,2 A3,3 · · · ⎥
⎣ ⎦
.. .. .. .. ..
. . . . .
From the special structure of Q[A, A] we conclude that the second term of the
above summation is a matrix with all block entries equal to zero except the very
last block column, whose block entries Xj are of the form:
∞
Xi = (i+1+k) · Ak,0 · B
F
k=0
and
∞
Xj = F(j+1+k) · Ak,0 · B, 0 ≤ j < i.
k=0
i−1
(i) + Xi ) +
π (i) · (L + X0 ) = −(π (0) · (F π (j) · (F(i−j) + Xi−j )) ∀i ≥ 1
j=1
and
i−1
(i) + Xi ) +
π (i) · (L + X0 ) = −(π (0) · (F π (j) · (F(i−j) + Xi−j )) ∀i ≥ 1. (17)
j=1
The above equation shows that there in no geometric relation between vectors
π (i) for i ≥ 1, however it provides a recursive relation for the computation of
M/G/1-Type Markov Processes: A Tutorial 49
the steady-state probability vector for M/G/1 Markov chains. In the following,
we further work on simplifying the expression of matrices Xj for 0 ≤ j ≤ i.
From the definition of the stochastic complementation (see Appendix A) we
know that an entry [r, c] in (−Q[A, A]−1 · Q[A, A]) 5 represents the probability
that starting from state r ∈ A the process enters A through state c. Since
A is entered from A only through states in S (i) we can use the probabilistic
interpretation of matrix G to figure out the entries in (−Q[A, A]−1 ) · Q[A, A].
An entry [r, c] in Gj for j > 0 represents the probability that starting from
state r ∈ S (i+j) for i > 0 the process enters set S (i) through state c. It is
straightforward now to define
⎡ ⎤
0 0 ··· 0 G
⎢ 0 0 · · · 0 G1 ⎥
⎢ ⎥
⎢ 2⎥
(−Q[A, A] ) · Q[A, A] = ⎢ 0 0 · · · 0 G3 ⎥ .
−1
(18)
⎢0 0 ··· 0 G ⎥
⎣ ⎦
.. .. .. .. ..
. . . . .
..
..
..
..
..
.
.
.
.
(0 ) (1 ) (i) (i+ 1 ) (i+ 2 )
S S S S S
... r ...
j k
h
i− 1 G [k ,h ] G [r,k ]
G [h ,j]
2
G [r,h ]
Observe that the above auxiliary sums represent the last column in the infinites-
imal generator Q defined in Eq.(15). We can express them in terms of matrices
Xi defined in Eq.(19) as follows:
S (i) + Xi , i ≥ 1
(i) = F S(i) = F(i) + Xi , i ≥ 0.
7
Subtractions on these type of formulas present the possibility of numerical instabil-
ity [26,29].
M/G/1-Type Markov Processes: A Tutorial 51
Given the above definition of π (i) for i ≥ 1 and the normalization condition, a
unique vector π (0) can be obtained by solving the following system of m linear
equations, i.e., the cardinality of set S (0) :
⎡ ⎛ ⎞−1 ⎤
∞ ∞
⎢ (0) (1) (0) −1 (i) ⎝ ⎥
π (0) ⎣ L −S S B | 1T − S S(j) ⎠ 1T ⎦ = [0 | 1], (23)
i=1 j=0
where the symbol “ ” indicates that we discard one (any) column of the corre-
sponding matrix, since we added a column representing the normalization condi-
tion. Once π (0) is known, we can then iteratively compute π (i) for i ≥ 1, stopping
when the accumulated probability mass is close to one. After this point, mea-
sures of interest can be computed. Since the relation between π (i) for i ≥ 1 is
not straightforward, computation of measures of interest requires generation of
the whole stationary probability vector.
The major gain of this special case is the fact that G does not need to be either
computed or fully stored.
[19] gives an improved version of Ramaswami’s formula. Once π (0) is known using
Eq.(23), the stationary probability vector is computed using matrix-generating
functions associated with block triangular Toeplitz matrices8 . These matrix-
generating functions are computed efficiently by using fast Fourier transforms
(FFT).
The algorithm of the Fast FFT Ramaswami’s formula is based on the fact
that in practice it is not possible to store an infinite number of matrices to
express the M/G/1-type process. Assuming that only p matrices can be stored
then the infinitesimal generator QM/G/1 has the following structure
8
A Toeplitz matrix has equal elements in each of its diagonals allowing the use of
computationally efficient methods.
52 A. Riska and E. Smirni
⎡ ⎤
F
L (1) (2)
F (3)
F (4)
F ··· F (p) 0 0 ···
⎢B ···⎥
⎢ L F(1) F(2) F(3) · · · F(p−1) F(p) 0 ⎥
⎢ ⎥
⎢0 B L F(1) F(2) · · · F(p−2) F(p−1) F(p) ···⎥
⎢ ⎥
⎢0 0 B L F(1) · · · F(p−3) F(p−2) F(p−1) ···⎥
⎢ . . ⎥
⎢ . . .. .. .. .. .. .. .. .. ⎥
⎢ . .⎥
QM/G/1 =⎢ . . . . . . . .
⎥. (25)
⎢0 0 0 0 0 · · · F(1) F(2) F(3) ···⎥
⎢ ⎥
⎢0 0 0 0 0 ··· L F(1) F(2) ···⎥
⎢ ⎥
⎢0 0 0 0 0 ··· B L F(1) ···⎥
⎢ ⎥
⎢0 0 0 0 0 ··· 0 B L ···⎥
⎣ ⎦
.. .. .. .. .. .. .. .. .. ..
. . . . . . . . . .
Because there are only p matrices of type F (i) and F(i) , there are only p
sums of type S and S to be computed. Therefore, the computation of π (i)
(i) (i)
for i > 0 using Ramaswami’s formula, i.e., Eq.(21), depends only on p vectors
π (j) for max(0, i − p) ≤ j < i. Define
π̃ (1) = [π (1) , ..., π (p) ] and π̃ (i) = [π (p(i−1)+1) , ..., π (pi) ] for i ≥ 2. (26)
Define
⎡ ⎤
S(0) S(1) S(2) · · · S(p−1)
⎢ 0 S(0) S(1) · · · S(p−2) ⎥
⎢ ⎥
⎢ S(0) · · · S(p−3) ⎥ (2) , S
(1) , S (3) , · · · , S
(p) . (29)
Y=⎢ 0 0 ⎥ and b = S
⎢ . .. .. .. .. ⎥
⎣ .. . . . . ⎦
0 0 0 · · · S(0)
M/G/1-Type Markov Processes: A Tutorial 53
The set of equations in Eq.(28) can be written in a compact way by using the
definitions in Eq.(26) and Eq.(29).
4.4 ETAQA-M/G/1
Etaqa[31]
∞ (i) is an aggregation-based technique that computes only π , π and
(0) (1)
´¼µ ´½µ ´¾µ ´¿µ ¾
´¼µ ¿ ¾ ¾ ¾ Ì ´¾µ
¾
´¼µ
´½µ ´¾µ ´¿µ
´¼µ
´½µ
´¾µ
´¿µ
Ì
x · X = [1, 0] (33)
⎢1
T F
L −
(1) ·G
S (i)
( +(
F(i) · G) ⎥
S (i)
⎢ ⎥
⎢ i=3
∞
i=2
∞
i=3
∞ ⎥
⎢ T ⎥
X=⎢ 1
B L − S (i)
· G ( F(i)
+ S (i)
· G)⎥
(34)
⎢ ⎥
⎢ i=2 i=1 i=2 ⎥
⎢ (i)
∞ (i)
∞ (i)
∞ ⎥
⎣ ⎦
1T 0 B− S ·G ( F +L+ S · G)
i=1 i=1 i=1
∞
admits a unique solution x = [π (0) , π (1) , π (∗) ], where π (∗) = i=2 π (i) .
Etaqa approach is in the same spirit as the one presented in [5,4] for the
exact solution of a very limited class of QBD and M/G/1-type Markov chains,
M/G/1-Type Markov Processes: A Tutorial 55
but in distinct contrast to these works, the above theorem does not require any
restriction on the form of the chain’s repeating pattern, thus can be applied to
any type of M/G/1 chain.
Etaqa provides a recursive approach to compute metrics of interest once
π (0) , π (1) , and π (∗) have been computed. We consider measures that can be
expressed as the expected reward rate:
∞
(j) (j)
r= ρi π i ,
j=0 i∈S (j)
(j) (j)
where ρi is the reward rate of state si . For example, to compute the expected
queue length in steady state, where S (j) represents the system states with j
(j)
customers in the queue, we let ρi = j. To compute the second moment of the
(j)
queue length, we let ρi = j 2 . ∞
Since our solution approach obtains π (0) , π (1) , and j=2 π (j) , we rewrite r
as
∞
r = π (0) ρ(0)T + π (1) ρ(1)T + π (j) ρ(j)T ,
j=2
(0) (0) (j) (j)
where ρ(0) = [ρ1 , . . . , ρm ]
and ρ(j) = [ρ1 , . . . , ρn ],
for j ≥ 1. Then, we
must show how to compute the above summation without explicitly using the
(j)
values of π (j) for j ≥ 2. We can do so if the reward rate of state si , for j ≥ 2
and i = 1, . . . , n, is a polynomial of degree k in j with arbitrary coefficients
[0] [1] [k]
ai , ai , . . . , ai :
(j) [0] [1] [k]
∀j ≥ 2, ∀i ∈ {1, 2, . . . , n}, ρi = ai + ai j + · · · + ai j k . (35)
(j)
The definition of ρi illustrates that the set of measures of interest that one can
compute includes any moment of the probability vector π as long as the reward
rate of the ith state in each set S (j) has the same polynomial coefficients for all
j ≥ 2. ∞
We compute j=2 π (j) ρ(j)T as follows
∞ (j) (j) T
∞ T
j=2 π ρ = j=2 π (j) a[0] + a[1] j + · · · + a[k] j k
(36)
= r[0] a[0]T + r[1] a[1]T + · · · + r[k] a[k]T ,
∞
and the problem is reduced to the computation of r[l] = j=2 j l π (j) , for l =
0, . . . , k. We show how r[k] , k > 0, can be computed recursively, starting from
r[0] , which is simply π (∗) .
∞
∞
r[k] · [(B + L + F(j) ) | ( j k F(j) − B)1T ] = [b[k] | c[k] ], (37)
j=1 j=1
k
∞
∞
k
∞
∞
(j) 4λ 0
jF = .
0 4λ
j=1
study, the queue length and queue length distribution, as well as probabilistic
indicators for the queueing model such as the caudal characteristic [16].
MAMSolver provides solutions for both DTMCs and CTMCs. The matrix-
analytic algorithms are defined in terms of matrices, making matrix manipula-
tions and operations the basic elements of the tool. The input to MAMSolver,
in the form of a structured text file, indicates the method to be used for the
solution and the finite set of matrices that accurately describe the process to be
solved. Several tests are performed within the tool to ensure that special cases are treated separately and therefore efficiently.
To address possible numerical problems that may arise during matrix operations, we use well-known and heavily tested routines provided by the LAPACK and BLAS packages¹⁰. Methods such as LU decomposition, GMRES, and BiCGSTAB are used for the solution of systems of linear equations.
The solution of QBD processes starts with the computation of the matrix R using the logarithmic reduction algorithm [16]. However, for completeness, we also provide the classical iterative algorithm. There are cases when G (and R) can be computed explicitly [28]. We check whether the conditions for the explicit computation hold, in order to simplify and speed up the solution. The available solution methods for QBD processes are matrix-geometric and Etaqa.
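The logarithmic-reduction step mentioned above can be sketched as follows, in Python with NumPy (our choice of language; this is not MAMSolver's code). We transcribe the recursion as it is usually stated by Latouche and Ramaswami [16], renaming their blocks to this tutorial's B (one level down), L (within a level) and F (one level up), and then obtain R from the standard QBD relation $R = -F(L + FG)^{-1}$. The stopping rule (row sums of G close to one) assumes a recurrent process; the function names, tolerance, and test blocks are hypothetical.

```python
import numpy as np

def qbd_G_logred(B, L, F, tol=1e-12, max_iter=100):
    """Minimal G of B + L G + F G^2 = 0 by logarithmic reduction.
    B: one level down, L: within a level, F: one level up (CTMC blocks)."""
    n = L.shape[0]
    I = np.eye(n)
    neg_L_inv = np.linalg.inv(-L)
    up = neg_L_inv @ F        # H in Latouche-Ramaswami's notation
    down = neg_L_inv @ B      # their L, renamed to avoid a clash with this L
    G, T = down.copy(), up.copy()
    for _ in range(max_iter):
        M = np.linalg.inv(I - (up @ down + down @ up))
        up, down = M @ up @ up, M @ down @ down   # both use the old up/down
        G = G + T @ down
        T = T @ up
        if np.max(np.abs(1.0 - G.sum(axis=1))) < tol:   # G stochastic: done
            break
    return G

def qbd_R_from_G(L, F, G):
    """Relation between R and G for a QBD: R = -F (L + F G)^{-1}."""
    return -F @ np.linalg.inv(L + F @ G)

# Hypothetical M/E2/1 blocks: arrival rate 1, each Erlang phase has rate 4.
F = np.eye(2)
L = np.array([[-5.0, 4.0], [0.0, -5.0]])
B = np.array([[0.0, 0.0], [4.0, 0.0]])
G = qbd_G_logred(B, L, F)
R = qbd_R_from_G(L, F, G)
```

For these blocks the recursion should return $G = \begin{bmatrix}1 & 0\\1 & 0\end{bmatrix}$ (every busy period ends with the next service starting in phase 1). Because logarithmic reduction converges quadratically, it typically needs far fewer iterations than the classical fixed-point scheme.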
The classic matrix-geometric solution is implemented to solve GI/M/1 processes. The algorithm first goes through the classic iterative procedure to compute R (to our knowledge, no more efficient alternative exists). Then, it computes the boundary part of the stationary probability vector. Since there exists a geometric relation between the vectors $\pi^{(i)}$ for $i \ge 1$, there is no need to compute the whole stationary probability vector.
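A minimal sketch of that procedure, assuming GI/M/1 blocks named F (one level up), L (within a level) and $B^{(j)}$ ($j$ levels down), so that R is the minimal solution of $F + RL + \sum_{j \ge 1} R^{j+1}B^{(j)} = 0$. This block layout is our assumption, since the tutorial's GI/M/1 definitions are outside this section; for a QBD the list holds only B, matching $F + RL + R^2B = 0$. The classical iteration starts from $R = 0$; the second helper then applies the geometric relation $\pi^{(i)} = \pi^{(1)}R^{i-1}$, and the total mass of levels $i \ge 1$ is $\pi^{(1)}(I - R)^{-1}1^T$.

```python
import numpy as np

def gim1_R(F, L, Bs, tol=1e-12, max_iter=100_000):
    """Minimal R of F + R L + sum_j R^(j+1) B^(j) = 0 by the classical iteration
    R <- -(F + sum_j R^(j+1) B^(j)) L^{-1}, from R = 0. Bs[j-1] holds B^(j)."""
    L_inv = np.linalg.inv(L)
    R = np.zeros_like(L, dtype=float)
    for _ in range(max_iter):
        S = F.astype(float)
        for j, Bj in enumerate(Bs, start=1):
            S = S + np.linalg.matrix_power(R, j + 1) @ Bj
        R_next = -S @ L_inv
        if np.max(np.abs(R_next - R)) < tol:
            return R_next
        R = R_next
    raise RuntimeError("R iteration did not converge")

def level_vectors(pi1, R, count):
    """pi^(1), ..., pi^(count) from the geometric relation pi^(i) = pi^(1) R^(i-1)."""
    out, v = [], np.asarray(pi1, dtype=float)
    for _ in range(count):
        out.append(v)
        v = v @ R
    return out

# Hypothetical scalar check: M/M/1 with arrival rate 1, service rate 2, so R = 1/2.
F, L, Bs = np.array([[1.0]]), np.array([[-3.0]]), [np.array([[2.0]])]
R = gim1_R(F, L, Bs)
pi1 = np.array([0.25])                         # pi^(1) of that M/M/1 queue
levels = level_vectors(pi1, R, 3)              # pi^(1), pi^(2), pi^(3)
mass_above_0 = pi1 @ np.linalg.inv(np.eye(1) - R) @ np.ones(1)
```

The iteration converges only linearly, which is why it can be slow when the load is high.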
M/G/1 processes require the computation of matrix G. More effort has been placed on the efficient solution of M/G/1 processes. MAMSolver provides the classic iterative algorithm, the cyclic-reduction algorithm, and the explicit one for special cases. The stationary probability vector is computed recursively using either Ramaswami's formula or its fast FFT-based version. Etaqa is the other available alternative for the solution of M/G/1 processes.
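The classic iterative algorithm for the M/G/1 matrix G can be sketched as a fixed-point iteration started from $G = 0$ for the equation $B + LG + \sum_{j \ge 1} F^{(j)}G^{j+1} = 0$, which reduces to $B + LG + FG^2 = 0$ for a QBD. The sketch assumes only finitely many nonzero blocks $F^{(j)}$; cyclic reduction and the FFT-based Ramaswami recursion are not shown, and the names and test data are hypothetical.

```python
import numpy as np

def mg1_G(B, L, Fs, tol=1e-12, max_iter=100_000):
    """Minimal G of B + L G + sum_j F^(j) G^(j+1) = 0 by the classical iteration
    G <- (-L)^{-1} (B + sum_j F^(j) G^(j+1)), from G = 0. Fs[j-1] holds F^(j)."""
    G = np.zeros_like(L, dtype=float)
    for _ in range(max_iter):
        S = B.astype(float)
        for j, Fj in enumerate(Fs, start=1):
            S = S + Fj @ np.linalg.matrix_power(G, j + 1)
        G_next = np.linalg.solve(-L, S)
        if np.max(np.abs(G_next - G)) < tol:
            return G_next
        G = G_next
    raise RuntimeError("G iteration did not converge")

# Hypothetical scalar queue: service rate 3, single arrivals at rate 1,
# pairs of arrivals at rate 0.5 (mean drift 1 + 2 * 0.5 = 2 < 3, so G = 1).
B = np.array([[3.0]])
L = np.array([[-4.5]])
Fs = [np.array([[1.0]]), np.array([[0.5]])]
G = mg1_G(B, L, Fs)
```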
For a set of input and output examples and the source code of MAMSolver, we point the interested reader to the tool's website http://www.cs.wm.edu/MAMSolver/.
7 Concluding Remarks
In this tutorial, we derived the basic matrix analytic results for the solution of
M/G/1-type Markov processes. Via simple examples and from first principles,
we illustrated why the solution of QBD and GI/M/1-type processes is simpler
than the solution of M/G/1-type processes. We direct the interested reader to the two books of Neuts [25,26] for further reading, as well as to the book of Latouche and Ramaswami [16]. Our target was to present enough material for
¹⁰ Available from http://www.netlib.org.
M/G/1-Type Markov Processes: A Tutorial 59
References
where $(-Q[\bar{A},\bar{A}])^{-1}[r,c]$ represents the mean time spent in state $c \in \bar{A}$, starting from state $r \in \bar{A}$, before reaching any state in $A$, and $((-Q[\bar{A},\bar{A}])^{-1}Q[\bar{A},A])[r,c']$ represents the probability that, starting from $r \in \bar{A}$, we enter $A$ through state $c'$. □
state to reach any of the states in $\bar{A}$, while the $r$th row of $Z$, which sums to one, specifies how this rate should be redistributed over the states in $A$ when the process eventually reenters it.
Lemma 1. (Single entry) If $A$ can be entered from $\bar{A}$ only through a single state $c \in A$, the matrix $Z$ defined in Eq. (42) is trivially computable: it is a matrix of zeros except for its $c$th column, which contains all ones. □
[Figure 7: (a) a Markov chain on states a, b, c, d, e with rates α, β, γ, δ, λ, μ, ν, τ; (b) the stochastic complement of A = {a, b, c}.]
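We choose the simple finite Markov chain depicted in Figure 7(a) to explain the concept of stochastic complementation. The state space of this Markov chain is $S = \{a, b, c, d, e\}$. We construct the stochastic complement of the states in set $A = \{a, b, c\}$ ($\bar{A} = \{d, e\}$), as shown in Figure 7(b). The matrices used in Eq. (41) for this example are:
$$Q[A,A] = \begin{bmatrix} -(\alpha+\nu) & \alpha & 0 \\ \beta & -(\gamma+\beta) & 0 \\ \delta & 0 & -\delta \end{bmatrix}, \quad Q[A,\bar{A}] = \begin{bmatrix} 0 & \nu \\ \gamma & 0 \\ 0 & 0 \end{bmatrix},$$
$$Q[\bar{A},A] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & \tau \end{bmatrix}, \quad Q[\bar{A},\bar{A}] = \begin{bmatrix} -\mu & \mu \\ \lambda & -(\lambda+\tau) \end{bmatrix}.$$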
Observe that in this case one can apply Lemma 1 to trivially construct the stochastic complement, since $A$ is entered from states in $\bar{A}$ only through state $c$. There are only two transitions from states in $A$ to states in $\bar{A}$: the transition with rate $\gamma$ from state $b$ to state $d$, and the transition with rate $\nu$ from state $a$ to state $e$. These two transitions are folded back into $A$ through state $c$, which is the single entry in $A$. The following derivation shows that, because of the special single entry state, the two folded transitions keep their original rates, $\gamma$ and $\nu$ respectively.
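$$Z = \mathrm{Norm}(Q[A,\bar{A}])\,(-Q[\bar{A},\bar{A}])^{-1}\,Q[\bar{A},A] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} \frac{\lambda+\tau}{\mu\tau} & \frac{1}{\tau} \\[2pt] \frac{\lambda}{\mu\tau} & \frac{1}{\tau} \end{bmatrix} \cdot \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & \tau \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$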
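The example can also be checked numerically. The sketch below plugs arbitrary, hypothetical positive values into the rates of Figure 7(a), forms $Z$ as in the derivation above, and builds the complement with the usual generator form of Meyer's stochastic complement, $Q[A,A] + Q[A,\bar{A}](-Q[\bar{A},\bar{A}])^{-1}Q[\bar{A},A]$; we assume this is the form of Eq. (41), which lies outside this excerpt. It also checks the reading of $Z$ given above: each state's exit rate towards $\bar{A}$, redistributed by its row of $Z$, yields the same folded rates.

```python
import numpy as np

# Hypothetical rates for Figure 7(a); any positive values would do.
alpha, beta, gamma, delta = 1.0, 2.0, 0.5, 3.0
mu, lam, tau, nu = 1.5, 0.7, 2.0, 0.4

Q_AA = np.array([[-(alpha + nu), alpha, 0.0],
                 [beta, -(gamma + beta), 0.0],
                 [delta, 0.0, -delta]])
Q_AAbar = np.array([[0.0, nu], [gamma, 0.0], [0.0, 0.0]])   # A -> {d, e}
Q_AbarA = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, tau]])      # {d, e} -> A
Q_AbarAbar = np.array([[-mu, mu], [lam, -(lam + tau)]])

def norm_rows(M):
    """Scale each nonzero row to sum to one; zero rows stay zero."""
    s = M.sum(axis=1, keepdims=True)
    return np.divide(M, s, out=np.zeros_like(M), where=s > 0)

# reentry[r, c]: probability of entering A through state c, starting from r in {d, e}.
reentry = np.linalg.solve(-Q_AbarAbar, Q_AbarA)
Z = norm_rows(Q_AAbar) @ reentry
complement = Q_AA + Q_AAbar @ reentry
```

Whatever rates are chosen, every row of `reentry` is $[0, 0, 1]$, which is exactly the single-entry situation of Lemma 1.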
$$B + LG + FG^2 = 0, \qquad F + RL + R^2 B = 0.$$
$$\pi^{(i)} = \pi^{(i-1)} R,$$
If matrix-analytic is the solution method, then the relation between $\pi^{(i)}$ and $\pi^{(i-1)}$ is based on Ramaswami's recursive formula:
where $S^{(1)} = F$ and $S^{(0)} = (L + FG)$, i.e., the only auxiliary sums (see Subsection 4.1) used in the solution of M/G/1 processes that are defined for a QBD process. The above equations allow the derivation of the fundamental relation between R and G [16, pages 137–138],
Obviously, for the case of QBD processes, knowing G (or R) implies a direct computation of R (or G). Computing G is usually easier than computing R: G's computation is a prerequisite to the computation of R in the logarithmic reduction algorithm, the most efficient algorithm to compute R [16]. If B can be expressed as a product of two vectors,
$$B = \alpha \cdot \beta,$$
where $\alpha$ is a column vector and $\beta$ a row vector normalized so that $\beta 1^T = 1$, then G and R are given explicitly by
$$G = 1^T \cdot \beta, \qquad R = -F(L + F1^T \cdot \beta)^{-1}.$$
Representative examples where the above condition holds are the M/Cox/1, M/Hr/1, and M/Er/1 queues, whose service times follow Coxian, hyperexponential, and Erlang distributions, respectively.
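As an illustration of the explicit case, consider an M/E2/1 queue (two-phase Erlang service) modeled as a QBD with level equal to the number of customers and phase equal to the service phase. Every service completion restarts the next service in phase 1, so B has rank one. The rates are hypothetical and only the repeating blocks are built; the sketch checks that $G = 1^T\beta$ and $R = -F(L + F1^T\beta)^{-1}$ satisfy the two matrix equations.

```python
import numpy as np

lam, mu = 1.0, 2.0                 # hypothetical arrival rate and service rate (lam < mu)
k = 2 * mu                         # rate of each of the two Erlang phases
F = lam * np.eye(2)                                   # an arrival keeps the phase
L = np.array([[-(lam + k), k], [0.0, -(lam + k)]])    # phase 1 -> phase 2
alpha = np.array([[0.0], [k]])     # completions happen from phase 2 ...
beta = np.array([[1.0, 0.0]])      # ... and the next service starts in phase 1
B = alpha @ beta                   # rank one, with beta summing to one

ones_col = np.ones((2, 1))         # the column vector 1^T
G = ones_col @ beta                # G = 1^T beta
R = -F @ np.linalg.inv(L + F @ ones_col @ beta)
```

With these rates, $R = \begin{bmatrix}0.3125 & 0.25\\0.0625 & 0.25\end{bmatrix}$, whose spectral radius is below one, as expected for a stable queue.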
An Algorithmic Approach to Stochastic Bounds
1 Introduction
Since Plateau’s seminal work on composition and compact tensor representation
of Markov chains using Stochastic Automata Networks (SAN), we know how to
model Markov systems with interacting components and large state space [29,30,
31]. The main idea of the SAN approach is to decompose the system of interest
into its components and to model each component separately. Once this is done,
interactions and dependencies among components can be added to complete the
model. The stochastic matrix of the chain is obtained after summations and Kronecker (or tensor) products of local components. The benefit of the SAN approach is twofold. First, each component is much easier to model than the global system. Second, the space required to store the description of components is in general much smaller than the explicit list of transitions, even in a sparse representation. However, using this representation instead of the usual sparse matrix form increases the time required for the numerical analysis of the chains [6,15,37,33]. Note that we are interested in performance indices R defined as reward functions on the steady-state distribution (i.e., $R = \sum_i r(i)\pi(i)$), and we do not try to compute transient measures. Thus the numerical analysis mainly consists of the computation of the steady-state distribution, followed by the summation of the elementary rewards r(i) to obtain R. The first step is in general the most difficult because of its memory and time requirements (see Stewart's book [34] for an overview of the usual numerical techniques for Markov chains). The decomposition and tensor representation have been generalized to other modeling formalisms as well: stochastic Petri nets [13] and stochastic process algebra [20]. So we now have several well-founded methods to model complex systems using Markov chains with large state spaces.
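A minimal sketch of these two steps (solve for the steady state, then sum the rewards), using Python with NumPy and a hypothetical two-component model without any interaction. In that special case the global matrix is the Kronecker product of the local matrices and the steady state factorizes. A real SAN adds synchronizing events and functional rates, and SAN solvers avoid forming the global matrix; we form it here only because the example is tiny. The matrices and the reward are made up for illustration.

```python
import numpy as np

def stationary(P):
    """Stationary vector of a stochastic matrix with a unique one:
    solve pi (P - I) = 0 with one balance equation replaced by sum(pi) = 1."""
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Two hypothetical, independent components that move in the same time slots.
P_a = np.array([[0.9, 0.1], [0.4, 0.6]])
P_b = np.array([[0.5, 0.5, 0.0], [0.2, 0.5, 0.3], [0.0, 0.6, 0.4]])
P = np.kron(P_a, P_b)             # global matrix; state (x, y) has index 3 * x + y

r = np.tile(np.arange(3.0), 2)    # reward r(i): local state y of component b
pi = stationary(P)                # step 1: steady-state distribution
R = r @ pi                        # step 2: R = sum_i r(i) pi(i)
```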
Despite considerable work [7,12,15,37], the numerical analysis of Markov chains is still a very difficult problem when the state space is too large or the
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 64–88, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Thus we get:
Theorem 1 Let X(t) and Y(t) be two DTMCs, and let P and Q be their respective stochastic matrices. Then X(t) <st Y(t), t > 0, if
• X(0) <st Y(0),
• st-monotonicity of at least one of the matrices holds,
• st-comparability of the matrices holds, that is, P_{i,∗} <st Q_{i,∗} ∀i.
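For distributions on the ordered states 1, …, n, p <st q holds exactly when every tail sum $\sum_{k \ge j} p_k$ is at most the matching tail sum of q. The conditions of Theorem 1 therefore reduce to tail-sum checks, sketched below (the helper names are ours; a small tolerance absorbs rounding).

```python
import numpy as np

def tail_sums(x):
    """out[..., j] = sum_{k>=j} x[..., k]; works on a vector or row-wise on a matrix."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x[..., ::-1], axis=-1)[..., ::-1]

def st_leq(p, q, tol=1e-12):
    """p <st q: every tail sum of p is at most the matching tail sum of q."""
    return bool(np.all(tail_sums(p) <= tail_sums(q) + tol))

def st_monotone(P, tol=1e-12):
    """st-monotonicity: P[i, *] <st P[i+1, *] for every i."""
    T = tail_sums(P)
    return bool(np.all(T[:-1] <= T[1:] + tol))

def st_comparable(P, Q, tol=1e-12):
    """st-comparability of Theorem 1: P[i, *] <st Q[i, *] for every i."""
    return bool(np.all(tail_sums(P) <= tail_sums(Q) + tol))
```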
2.2 Algorithms
It is possible to derive a set of equalities instead of inequalities. Once they have been ordered (in increasing order of i and in decreasing order of j in system (2)), these equalities provide a constructive way to design a stochastic matrix that yields a stochastic bound.
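$$\begin{cases} \sum_{k=j}^{n} Q_{1,k} = \sum_{k=j}^{n} P_{1,k} \\[4pt] \sum_{k=j}^{n} Q_{i+1,k} = \max\left(\sum_{k=j}^{n} Q_{i,k},\ \sum_{k=j}^{n} P_{i+1,k}\right) \quad \forall\, i, j \end{cases} \qquad (2)$$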
when we need them, and they can be stored to avoid recomputing them. However, we let them appear as summations to show the relation with inequalities (1).
Even if matrix v(P) is reducible, it has one essential class of states, and the last state belongs to that class. So it is still possible to compute the steady-state distribution for this class. We do not prove the theorem, but we present an example of a matrix P2 such that v(P2) is reducible (i.e., states 0, 1 and 2 are transient in matrix v(P2)).
$$P2 = \begin{bmatrix} 0.5 & 0.2 & 0.1 & 0.2 & 0.0 \\ 0.1 & 0.7 & 0.1 & 0.0 & 0.1 \\ 0.2 & 0.1 & 0.5 & 0.2 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.7 & 0.3 \\ 0.0 & 0.2 & 0.2 & 0.1 & 0.5 \end{bmatrix}, \qquad Q = v(P2) = \begin{bmatrix} 0.5 & 0.2 & 0.1 & 0.2 & 0.0 \\ 0.1 & 0.6 & 0.1 & 0.1 & 0.1 \\ 0.1 & 0.2 & 0.5 & 0.1 & 0.1 \\ 0.0 & 0.0 & 0.0 & 0.7 & 0.3 \\ 0.0 & 0.0 & 0.0 & 0.5 & 0.5 \end{bmatrix}$$
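Algorithm 1 follows directly from system (2): the tail sums of each row of Q are the componentwise maximum of the tail sums of the previous row of Q and of the current row of P, and the entries of Q are differences of consecutive tail sums. A minimal NumPy sketch (the function name v follows the text; everything else is our choice):

```python
import numpy as np

def v(P):
    """Algorithm 1: the smallest st-monotone upper bound, built from Eq. (2).
    Row 0 of Q copies row 0 of P; each next row's tail sums are the max of the
    previous row of Q and the current row of P, column by column."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    TP = np.cumsum(P[:, ::-1], axis=1)[:, ::-1]   # TP[i, j] = sum_{k>=j} P[i, k]
    TQ = np.empty_like(TP)
    TQ[0] = TP[0]
    for i in range(n - 1):
        TQ[i + 1] = np.maximum(TQ[i], TP[i + 1])
    Q = TQ.copy()
    Q[:, :-1] -= TQ[:, 1:]                        # Q[i, j] = TQ[i, j] - TQ[i, j+1]
    return Q

P2 = np.array([[0.5, 0.2, 0.1, 0.2, 0.0],
               [0.1, 0.7, 0.1, 0.0, 0.1],
               [0.2, 0.1, 0.5, 0.2, 0.0],
               [0.0, 0.0, 0.0, 0.7, 0.3],
               [0.0, 0.2, 0.2, 0.1, 0.5]])
print(np.round(v(P2), 3))
```

Applied to P2, it reproduces the matrix Q = v(P2) shown above, including the fact that states 0, 1 and 2 become transient.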
2.3 Properties
Algorithm 1 has several interesting properties, which can be proved using a max-plus formulation [10] that is apparent in Eq. (2).
Theorem 3 Algorithm 1 provides the smallest st-monotone upper bound for a matrix P: i.e., if U is any other st-monotone upper bounding DTMC for P, then Q <st U [1].
However, the bounds on the probability distributions may still be improved: the former theorem only states that Algorithm 1 provides the smallest matrix. We have developed new techniques, based on some transformations of P, to improve the accuracy of the bounds on the steady-state distribution π [10].
We have studied a linear transformation for stochastic matrices, α(P, δ) = (1 − δ)I + δP, for δ ∈ (0, 1). This transformation has no effect on the steady-state distribution, but it has a large influence on the result of Algorithm 1. We have proved in [10] that if the given stochastic matrix is not row diagonally dominant, then the steady-state probability distribution of the optimal st-monotone upper bounding matrix corresponding to the row diagonally dominant transformed matrix is better, in the strong stochastic sense, than the one corresponding to the original matrix. We have also established that the transformation P/2 + I/2 provides the best bound for the family of linear transformations we have considered. More precisely:
Theorem 4 Let P be a DTMC of order n, and let δ1, δ2 ∈ (0, 1) be two different values such that δ1 < δ2. Then $\pi_{v(\alpha(P,\delta_1))}$ <st $\pi_{v(\alpha(P,\delta_2))}$ <st $\pi_{v(P)}$.
70 J.M. Fourneau and N. Pekergin
One may ask if there is an optimal value of δ. When the matrix is row diagonally dominant (RDD), its diagonal serves as a barrier for the perturbation moving from the upper-triangular part to the strictly lower-triangular part in forming v(P).
Definition 6 A stochastic matrix is said to be row diagonally dominant (RDD)
if all of its diagonal elements are greater than or equal to 0.5.
Corollary 1 Let P be a DTMC of order n that is RDD. Then v(P ) and v(α(P ))
have the same steady-state probability distribution.
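The effect of the transformation can be seen on a small, hypothetical 3-state matrix that is not RDD. The sketch writes Algorithm 1 as a running maximum of row tail sums (equivalent to Eq. (2)), checks that α(P, 1/2) has the same steady state as P, and compares the tail sums of the two upper bounds. The matrix and helper names are ours.

```python
import numpy as np

def tails(x):
    """Tail sums along the last axis: out[..., j] = sum_{k>=j} x[..., k]."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x[..., ::-1], axis=-1)[..., ::-1]

def v(P):
    """Algorithm 1: running maximum (down the rows) of the row tail sums."""
    TQ = np.maximum.accumulate(tails(P), axis=0)
    Q = TQ.copy()
    Q[:, :-1] -= TQ[:, 1:]
    return Q

def alpha(P, delta):
    """alpha(P, delta) = (1 - delta) I + delta P."""
    return (1.0 - delta) * np.eye(P.shape[0]) + delta * P

def stationary(P):
    """Stationary vector, assuming it is unique (a single essential class)."""
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Hypothetical matrix that is not RDD.
P = np.array([[0.2, 0.5, 0.3],
              [0.6, 0.1, 0.3],
              [0.3, 0.5, 0.2]])
pi = stationary(P)
bound_plain = stationary(v(P))              # Algorithm 1 applied to P
bound_half = stationary(v(alpha(P, 0.5)))   # Algorithm 1 applied to P/2 + I/2
print(tails(pi), tails(bound_half), tails(bound_plain))
```

For this matrix, α(P, 1/2) is already st-monotone, so its bound coincides with the exact distribution (tail sums about [1, 0.630, 0.273]), whereas v(P) only gives [1, 0.8, 0.3].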
the lower triangle, except the main subdiagonal, is zero). Therefore the resolution by direct elimination is quite simple. In the following, we illustrate this principle with several structures associated with simple resolution methods, and we present algorithms to build structure-based st-monotone bounding stochastic matrices. Most of these algorithms do not assume any particular property or structure for the initial stochastic matrix.
The paradigm for the upper-Hessenberg case is the M/G/1 queue. The resolution by recursion for these matrices requires o(m) operations [34].
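For the upper-Hessenberg structure, the steady state follows from a forward recursion on the columns of πP = π, since column j involves only π_0, …, π_{j+1}. A minimal sketch, assuming the subdiagonal entries P[j+1, j] are positive (0-based indices). The subtractions can lose accuracy on long chains, so this only illustrates the idea; the function name and the test matrix are hypothetical.

```python
import numpy as np

def hessenberg_stationary(P):
    """Stationary vector of an upper-Hessenberg stochastic matrix
    (P[i, j] = 0 for j < i - 1) with P[j+1, j] > 0, by forward recursion:
    column j of x P = x gives x[j+1] from x[0..j]."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    x = np.zeros(n)
    x[0] = 1.0                                  # arbitrary scale, normalized below
    for j in range(n - 1):
        x[j + 1] = (x[j] - x[: j + 1] @ P[: j + 1, j]) / P[j + 1, j]
    return x / x.sum()

# Hypothetical M/G/1-like upper-Hessenberg matrix.
P = np.array([[0.6, 0.3, 0.1, 0.0],
              [0.5, 0.2, 0.2, 0.1],
              [0.0, 0.5, 0.3, 0.2],
              [0.0, 0.0, 0.5, 0.5]])
pi = hessenberg_stationary(P)
```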
3.2 Lumpability
Ordinary lumpability is another efficient technique to combine with stochastic bounds [36]. Unlike the previous algorithms, lumpability implies a state space reduction. The algorithms are based on Algorithm 1 and on the decomposition of the chain into macro-states. Again, we assume that the states are ordered according to the macro-state partition. Let r be the number of macro-states. Let b(k) and e(k) be the indices of the first state and the last state, respectively, of macro-state A_k. First, let us recall the definition of ordinary lumpability.
Definition 9 (ordinary lumpability) Let Q be the matrix of an irreducible finite DTMC, and let {A_k} be a partition of the states of the chain. The chain is ordinary lumpable according to partition {A_k} if and only if, for all states e and f in the same arbitrary macro-state A_i, we have:
$$\sum_{j \in A_k} q_{e,j} = \sum_{j \in A_k} q_{f,j} \qquad \forall \text{ macro-state } A_k$$
So we can subtract, from both sides of relation (3), the partial sums on the macro-states, which are all equal due to ordinary lumpability. Therefore, assuming that a, i and i + 1 are in the same macro-state A_k, we get
$$\sum_{j \ge a,\ j \in A_k} Q(i, j) \le \sum_{j \ge a,\ j \in A_k} Q(i+1, j)$$
The algorithm computes the matrix column by column. Each block needs two steps. The first step is based on Algorithm 1, while the second step modifies the first column of the block to satisfy the ordinary lumpability constraint. More precisely, the first step uses the same relations as Algorithm 1, but it has to take into account that the first row of P and Q may now be different due to the second step. The lumpability constraint is only known at the end of the
first step. Recall that ordinary lumpability is due to a constant row sum for
the block. Thus after the first step, we know how to modify the first column of
the block to obtain a constant row sum. Furthermore due to st-monotonicity,
we know that the maximal row sum is reached for the last row of the block. In
step 2, we modify the first column of the block taking into account the last row
sum. Once a block has been computed, it is now possible to compute the block
on the left.
This algorithm is used in the next section for the analysis of a mechanism for
high speed networks. Most of the algorithms presented here may be applied but
the best results, for this particular problem, were found with this last approach.
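Definition 9 can be checked mechanically, and a lumpable matrix can then be collapsed to its macro-state chain. The small sketch below does only that (it tests and exploits lumpability; it is not the two-step bounding algorithm described above). The macro-states are given as lists of 0-based state indices; names and data are hypothetical.

```python
import numpy as np

def is_ordinary_lumpable(Q, blocks, tol=1e-12):
    """Definition 9: for every pair of macro-states (A_i, A_k), all states of A_i
    have the same total transition probability into A_k."""
    Q = np.asarray(Q, dtype=float)
    for A_i in blocks:
        for A_k in blocks:
            into = Q[np.ix_(A_i, A_k)].sum(axis=1)   # one sum per state of A_i
            if into.max() - into.min() > tol:
                return False
    return True

def lump(Q, blocks):
    """Macro-state matrix of an ordinary lumpable Q (any representative row works)."""
    Q = np.asarray(Q, dtype=float)
    return np.array([[Q[A_i[0], A_k].sum() for A_k in blocks] for A_i in blocks])

Q = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.4, 0.3],
              [0.1, 0.1, 0.8]])
blocks = [[0, 1], [2]]
print(is_ordinary_lumpable(Q, blocks), lump(Q, blocks))
```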
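for j = n − 1, n − 2, ..., 2 do
  $\alpha_j = \max_{2 \le i \le n} \dfrac{\sum_{k=j}^{n} p_{i,k} - \sum_{k=j}^{n} q_{1,k}}{i-1}$
  $g_i = \dfrac{n-1}{n-i}\sum_{k=j}^{n} p_{i,k} - \sum_{k=j+1}^{n} q_{i,k} + \dfrac{i-1}{n-i}\left(\sum_{k=j+1}^{n} q_{n,k} - 1\right)$
  $q_{1,j} = \left[\max_{1 \le i \le n-1} g_i\right]^{+}$
  $c_j = \max\left(\dfrac{-q_{1,j}}{n-1},\ \alpha_j^{+} - \sum_{k=j+1}^{n} c_k\right)$
od
$q_{1,1} = 1 - \sum_{j=2}^{n} q_{1,j}$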
Feinberg and Chiu [14] have studied chains divided into macro-states where every transition entering a macro-state must go through exactly one node. This
node is denoted as the input node of the macro-state. They have developed an
algorithm to efficiently compute the steady-state distribution by decomposition.
It consists of the resolution of the macro-state in isolation and the analysis of
the chain reduced to input nodes. Unlike ordinary lumpability, the assumptions
of the theorem are based on the graph of the transitions and do not take into
account the real transition rates.
It is very easy to modify Algorithm 1 to create a Single Input Macro State
Markov chain. We assume that for every macro state, the input state is the last
state of the macro state. Thus the matrix Q looks like this:
$$Q = \begin{bmatrix} A & [\,0 \cdots 0\ \ast\,] & [\,0 \cdots 0\ \ast\,] \\ [\,0 \cdots 0\ \ast\,] & B & [\,0 \cdots 0\ \ast\,] \\ [\,0 \cdots 0\ \ast\,] & [\,0 \cdots 0\ \ast\,] & C \end{bmatrix}$$
The algorithm is based on the following decomposition into three types of blocks: diagonal blocks, upper triangle and lower triangle. The elements of diagonal blocks are computed using the same equalities as in Algorithm 1:
$$\begin{cases} Q_{1,j} = \sum_{k=j}^{n} P_{1,k} - \sum_{k=j+1}^{n} Q_{1,k} \\[4pt] Q_{i+1,j} = \max\left(\sum_{k=j}^{n} Q_{i,k},\ \sum_{k=j}^{n} P_{i+1,k}\right) - \sum_{k=j+1}^{n} Q_{i+1,k} \end{cases} \qquad (6)$$
The elements of blocks in the upper and lower triangles have the “single input” structure: several columns of zeros followed by a last column which is positive. Furthermore, the lower and upper triangles differ because the elements of the lower triangle of Q must follow inequalities which take into account the diagonal blocks of Q. Let us denote by f(i) the lowest index of the macro-state which contains state i. Then, for all i, j in the upper triangle, we just have to sum up the elements of P (taking care of the lower index f(j) in the summation of the elements of P):
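$$\begin{cases} Q_{1,n} = \sum_{k=f(n)}^{n} P_{1,k} \\[4pt] Q_{1,j} = \sum_{k=f(j)}^{n} P_{1,k} - \sum_{k=j+1}^{n} Q_{1,k} \\[4pt] Q_{i+1,j} = \max\left(\sum_{k=j}^{n} Q_{i,k},\ \sum_{k=f(j)}^{n} P_{i+1,k}\right) - \sum_{k=j+1}^{n} Q_{i+1,k} \end{cases} \qquad (7)$$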
And for all i, j in the lower triangle (here the lower index f(j) is also used in the summation of the elements in the former row of Q):
$$Q_{i+1,j} = \max\left(\sum_{k=f(j)}^{n} Q_{i,k},\ \sum_{k=f(j)}^{n} P_{i+1,k}\right) - \sum_{k=j+1}^{n} Q_{i+1,k} \qquad (8)$$
The derivation of the algorithm is straightforward. Again, let us apply this algorithm to matrix P1, with a partition into two sets of sizes 2 and 3, to obtain matrix Q (we also give the values of f for all the indices):
$$f = (1, 1, 3, 3, 3), \qquad Q = \begin{bmatrix} 0.5 & 0.2 & 0.0 & 0.0 & 0.3 \\ 0.1 & 0.6 & 0.0 & 0.0 & 0.3 \\ 0.0 & 0.3 & 0.4 & 0.0 & 0.3 \\ 0.0 & 0.1 & 0.1 & 0.5 & 0.3 \\ 0.0 & 0.2 & 0.0 & 0.3 & 0.5 \end{bmatrix}$$
This structure has been used by several authors, even though their comparison proofs are usually based on sample-path theorems [19,24,25].
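Equations (6) to (8) translate into a direct sketch that works on tail sums, as for Algorithm 1, choosing the rule by whether column j lies in the diagonal block, the upper triangle or the lower triangle of the row being built. Indices are 0-based, so f below is the text's f minus one. Matrix P1 is not reproduced in this section, so the sketch is exercised on P2 from Section 2.2 with the same partition into sets of sizes 2 and 3; the function name is hypothetical.

```python
import numpy as np

def single_input_bound(P, sizes):
    """Eqs. (6)-(8): st-monotone upper bound whose off-diagonal blocks are zero
    except for their last column (the input state is the last state of each
    macro-state). sizes lists the macro-state sizes in state order."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    block = np.repeat(np.arange(len(sizes)), sizes)      # macro-state of each state
    starts = np.concatenate(([0], np.cumsum(sizes)[:-1]))
    f = starts[block]                                     # first index of j's macro-state
    TP = np.cumsum(P[:, ::-1], axis=1)[:, ::-1]           # TP[i, j] = sum_{k>=j} P[i, k]
    TQ = np.zeros((n, n))                                 # tail sums of Q
    for j in range(n):                                    # first row: Eqs. (6) and (7)
        TQ[0, j] = TP[0, j] if block[j] == block[0] else TP[0, f[j]]
    for i in range(n - 1):
        r = i + 1
        for j in range(n):
            if block[j] == block[r]:                      # diagonal block, Eq. (6)
                TQ[r, j] = max(TQ[i, j], TP[r, j])
            elif block[j] > block[r]:                     # upper triangle, Eq. (7)
                TQ[r, j] = max(TQ[i, j], TP[r, f[j]])
            else:                                         # lower triangle, Eq. (8)
                TQ[r, j] = max(TQ[i, f[j]], TP[r, f[j]])
    Q = TQ.copy()
    Q[:, :-1] -= TQ[:, 1:]
    return Q

P2 = np.array([[0.5, 0.2, 0.1, 0.2, 0.0],
               [0.1, 0.7, 0.1, 0.0, 0.1],
               [0.2, 0.1, 0.5, 0.2, 0.0],
               [0.0, 0.0, 0.0, 0.7, 0.3],
               [0.0, 0.2, 0.2, 0.1, 0.5]])
print(np.round(single_input_bound(P2, [2, 3]), 3))
```

On P2, the off-diagonal blocks come out with a single nonzero column, the last state of the target macro-state, as required.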
[Figure: the Pushout mechanism. Batch Bernoulli arrivals of high- and low-priority cells (H, L), deterministic service time; an arriving high-priority cell "pushes out" a low-priority cell, and the low-priority cell is lost.]
state space (T, H), where T is the total number of packets and H is the number of high-priority packets. The states are ordered according to a lexicographic non-decreasing ordering. It must be clear at this point that the ordering of the states is a very important issue. First, the rewards have to be non-decreasing functions of the state indices. Furthermore, as the st-monotone property is based on the state representation and ordering, the accuracy of the results may depend on this ordering. Here, we are interested in the expected number of lost packets per slot. Let us denote by $R_M^i$ this expectation for type i packets, and let $R_M = R_M^H + R_M^L$. The difficult problem here is the computation of $R_M^H$. Indeed, $R_M$ can be computed with a smaller chain, since the Pushout mechanism does not change the global number of losses: it is sufficient to analyze the global number of packets (i.e., without distinction). Such a chain has only B + 1 states if we use a simple batch arrival process. For realistic values of the buffer size (e.g., 1000), such a chain is very simple to solve with the usual numerical algorithms. However, for the same value of B, the chain of the HOL+Pushout mechanisms has roughly $5 \cdot 10^5$ states. So, we use Algorithm 4 to get a lumpable bounding matrix, and we analyze the macro-state chain. First, let us describe the ordering of the states and the rewards. Let $p_k^H$ be the probability of k arrivals of high-priority packets during one slot.
$$R_M^H = \sum_{(T,H)} \Pi(T,H)\, p_2^H \max\left(0,\ H + 2 - B - 1_{T=H}\right)$$
$$R_M^H = p_2^H \times \Pi(B,B)$$
Clearly, we have to estimate only one probability, and the reward is a non-decreasing function which is zero everywhere except for the last state, where its value is one. For more general arrival processes, the reward function is only slightly different.
The key idea to shorten the state space is to avoid the states with a large number of low-priority packets. So, we bound the matrix with an ordinary lumpable matrix Q with o(B × F) macro-states, where the parameter F allows a trade-off between the computational complexity and the accuracy of the results. More precisely, we define macro-states (T, Y), where Y is constrained to evolve in the range T − F..T. If Y = T − F, then the state (T, Y) is a real macro-state which contains all the states (T, X) such that X ≤ T − F. In this case, Y is an upper bound of the number of high-priority packets in the states which are aggregated. If Y > T − F, then the state contains only one state (T, X), where Y = X, so Y represents exactly the number of high-priority packets in the buffer (see Figure 2). Clearly, if the value of F is large, we do not reduce the state space much, but we expect the bound to be tight.
[Figure 2: the (T, H) state space up to B, showing the states where high-priority cells can be lost, the F macro-states along H = T − F, and the unchanged states.]
In [16] we have analyzed small buffers to check the accuracy of the bound, and large buffers to show the efficiency of the method. Here, we only present a typical comparison of these bounds for a small buffer of size 80 (this small value allows the computation of the exact result for comparison purposes). The load is 0.9, with 2/3 high-priority packets. With a sufficiently large value of F (typically 10), the algorithm gives accurate results. In this example, the exact result for $R_M^H$ is $8.9 \cdot 10^{-13}$, and the bound with F = 10 is $9.5 \cdot 10^{-13}$. Of course, if F is too small, the result is worse and can reach $10^{-6}$ for F = 2. The exact chain has 3321 states, while the bound with F = 10 is computed with a chain of size 798. The number of states is divided by 4, and only a little accuracy is lost. It is worth remarking that a reduction of the state space by one order of magnitude implies a reduction of two or three orders of magnitude in the computation time of the steady-state distribution, and the reduction is much larger when the original chain is bigger. Typically, for a buffer size of 1000 and an aggregation factor F equal to 20, the bounding matrix obtained from Algorithm 5 has roughly 20000 states. The original state space is 25 times larger.
The results shown above are very accurate. We have found several reasons for this property. First, the distribution is skewed: almost all the probability mass is concentrated on the states with a small number of packets. Moreover, the first part of the initial matrix is already st-monotone. This property is due to the ordering of the states we have considered. Again, we have to emphasize that the ordering of the states is a crucial issue for st-bounds [11].
For instance, consider the matrix of the chain for a small buffer of size 4. The chain has 15 states ordered in a lexicographic way: {(0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (3, 0), (3, 1), (3, 2), (3, 3), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)}. Let us denote by p, q and r, respectively, the probabilities of a batch arrival of size 0, 1 or 2. Let a be the probability that an arriving packet is a low-priority one; similarly, b is the probability for a high-priority packet. The probabilities of the packet types in a batch of size 2 are, respectively, c for 2 low-priority packets, e for 2 high-priority packets, and d for a mixed batch. An independence assumption on the types of packets entering the queue leads to an important reduction of the number of parameters (for instance, c = a^2). However, this assumption is not necessary to illustrate the effect of Algorithm 5.
⎛ ⎞
p qa qb rc rd re
⎜p qa qb rc rd re ⎟
⎜ ⎟
⎜p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa qb rc rd re ⎟
⎜ ⎟
P =⎜ ⎜ p qa qb rc rd re ⎟
⎟
⎜ p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa + rc qb + rd re ⎟
⎜ ⎟
⎜ p qa + rc qb + rd re ⎟
⎜ ⎟
⎜ p qa + rc qb + rd re ⎟
⎜ ⎟
⎝ p qa + rc qb + rd + re⎠
p qa + rc qb + rd + re
A careful inspection of matrix P shows that the first 10 rows of the matrix already satisfy the st-monotone property. For a bigger buffer model, this property still holds for the states where the buffer is not full. We assume that F = 2 (the only nontrivial value for such a small example). Thus, we consider two real macro-states: {(3, 2), (3, 3)} and {(4, 2), (4, 3), (4, 4)}. Note that the initial matrix is already lumpable, since the scheduling of service and arrivals implies that some states have similar transitions. For instance, states (0), (1, 0) and (1, 1) can be gathered into one macro-state. We use this property in the resolution algorithm, but we do not develop it here, in order to focus on the bounding algorithm. Algorithm 5 provides a lumpable matrix with the macro-states already defined, which can be aggregated to obtain (f = min(qb, rc) and g = max(qb, rc)):
⎛ ⎞
p qa qb rc rd re
⎜ p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa qb rc rd + re ⎟
⎜ ⎟
⎜ p qa qb rc rd + re ⎟
⎜ ⎟
⎜ p qa qb rc rd re ⎟
⎜ ⎟
⎜ p qa qb rc rd + re ⎟
⎜ ⎟
⎜ p qa + qb r ⎟
⎜ ⎟
⎜ p qa + f g − rc r ⎟
⎜ ⎟
⎝ p qa + f g + rd + re ⎠
p q+r
Stoyan’s proof in Theorem 4.2.5 of [35, p. 65] that the monotonicity and the comparability of transition matrices yield sufficient conditions for chain comparison is not restricted to the “st” ordering. Similarly, the definitions of the monotonicity and the comparison of stochastic matrices are much more general than the statements presented in Section 2. First, let us turn back to the definitions for the “icx” ordering, which is expected to be more accurate than the st-ordering.
For a discrete state space, it is also possible to use a matrix formulation through matrix $K_{icx}$. Let p and q be, respectively, the probability distribution vectors of X and Y. Then X <icx Y if and only if $pK_{icx} \le qK_{icx}$, where $K_{icx}$ is defined as follows:
$$K_{icx} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 2 & 1 & 0 & \cdots & 0 \\ 3 & 2 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n & n-1 & n-2 & \cdots & 1 \end{bmatrix}$$
This can be rewritten as follows:
$$X <_{icx} Y \iff \sum_{k=i}^{n} (k - i + 1)\, p_k \le \sum_{k=i}^{n} (k - i + 1)\, q_k, \quad \forall i \in \{1, \ldots, n\}$$
Similarly, we can define the increasing concave ordering by the set of non-decreasing concave functions. In this case $K_{icv} = -K_{icx}^T$, where $A^T$ denotes the transpose of matrix A.
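The matrix form of the icx comparison is straightforward to evaluate: entry (k, i) of $K_{icx}$ is k − i + 1 for k ≥ i (1-based), so the i-th component of $pK_{icx}$ is exactly the sum in the equivalence above. A small sketch with a hypothetical pair of distributions on {1, 2, 3} having the same mean, where Y is more variable than X (function names are ours):

```python
import numpy as np

def K_icx(n):
    """K_icx[k, i] = k - i + 1 for k >= i (1-based indices), 0 otherwise."""
    k = np.arange(1, n + 1)[:, None]
    i = np.arange(1, n + 1)[None, :]
    return np.where(k >= i, k - i + 1, 0).astype(float)

def icx_leq(p, q, tol=1e-12):
    """X <icx Y iff p K_icx <= q K_icx componentwise."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    K = K_icx(len(p))
    return bool(np.all(p @ K <= q @ K + tol))

p = [0.0, 1.0, 0.0]    # X = 2 with probability 1
q = [0.5, 0.0, 0.5]    # Y = 1 or 3, same mean as X
print(icx_leq(p, q), icx_leq(q, p))    # prints: True False
```

Here $pK_{icx} = [2, 1, 0]$ and $qK_{icx} = [2, 1, 0.5]$, so X <icx Y holds but not the converse, although X and Y are not comparable for the st-ordering.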
Clearly, the icx-comparison and the icx-monotonicity of stochastic matrices are defined in the same manner as for the st-ordering (see Definitions 2 and 3). However, the characterization of the <icx-monotonicity through matrix $K_{icx}$ must take into account the finiteness of matrix P. Indeed, the conditions $K_{icx}^{-1} P K_{icx} \ge 0$ provide sufficient conditions for the <icx-monotonicity. It has long been known that these conditions are also necessary for infinite chains.
For finite chains, the necessary conditions were unknown until recently. Moreover, the conditions $K_{icx}^{-1} P K_{icx} \ge 0$ are very restrictive, and they lead to a chain whose first and last states are absorbing. Thus, it was not possible to develop an algorithmic approach without an efficient necessary and sufficient condition for monotonicity. Recently, in [2], Benmammoun has proved such conditions for the icx-monotonicity of finite chains. This characterization is based on matrix $Z_{icx}$, which is slightly different from matrix $K_{icx}^{-1}$.
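$$K_{icx}^{-1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -2 & 1 & 0 & \cdots & 0 \\ 1 & -2 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 1 & -2 & 1 \end{bmatrix}, \qquad Z_{icx} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -1 & 1 & 0 & \cdots & 0 \\ 1 & -2 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 1 & -2 & 1 \end{bmatrix}$$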
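The banded form of $K_{icx}^{-1}$ can be verified numerically, and the classical sufficient condition $K_{icx}^{-1} P K_{icx} \ge 0$ then becomes a one-line test. The sketch below does only that; it does not implement the necessary and sufficient characterization based on $Z_{icx}$ from [2]. Function names are ours.

```python
import numpy as np

def K_icx(n):
    """K_icx[k, i] = k - i + 1 for k >= i (1-based indices), 0 otherwise."""
    k = np.arange(1, n + 1)[:, None]
    i = np.arange(1, n + 1)[None, :]
    return np.where(k >= i, k - i + 1, 0).astype(float)

def K_icx_inv(n):
    """Banded inverse: 1 on the diagonal, -2 just below it, 1 two rows below."""
    return np.eye(n) - 2.0 * np.eye(n, k=-1) + np.eye(n, k=-2)

def icx_monotone_sufficient(P, tol=1e-12):
    """Sufficient (for finite chains, not necessary) test K^-1 P K >= 0."""
    n = P.shape[0]
    return bool(np.all(K_icx_inv(n) @ P @ K_icx(n) >= -tol))
```

For example, the identity matrix passes the test, while the two-state "swap" chain [[0, 1], [1, 0]] fails it, consistent with its first row (a point mass on state 2) not being icx-smaller than its second row (a point mass on state 1).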
for i = 3, 4, ..., n do
  $q_{i,j} = \max\left(\sum_{k=j}^{n} (k-j+1)\, p_{i,k},\ 2\sum_{k=j}^{n} (k-j+1)\, q_{i-1,k} - \sum_{k=j}^{n} (k-j+1)\, q_{i-2,k}\right) - \sum_{k=j+1}^{n} (k-j+1)\, q_{i,k}$;
od
od
for i = 1, 2, ..., n do $q_{i,1} = 1 - \sum_{j=2}^{n} q_{i,j}$; od
Several heuristics may be used to solve this problem. Further research is still necessary to obtain a simple and efficient algorithm.
cn −cn
if cn < 0 then cn = 0 else cn = cn + const ;
for j = n − 1, n − 2, ..., 2 do
  $f(i, j) = \dfrac{n-1}{n-i}\sum_{k=j}^{n} (k-j+1)\, p_{i,k} - \sum_{k=j+1}^{n} (k-j+1)\, q_{i,k} - \dfrac{i-1}{n-i}\left[1 - \sum_{k=j+1}^{n} q_{n,k}\right]$;
n
1− q1,k −q1,j
+ k=j+1
q1,j = (max1≤i≤n−1 f (i, j)) ; q1,j = q1,j + const ;
  $\alpha_j = \max_{2 \le i \le n} \dfrac{\sum_{k=j}^{n} (k-j+1)\, p_{i,k} - \sum_{k=j+1}^{n} (k-j+1)\, q_{i,k} - q_{1,j}}{i-1}$;
n
−q1,j 1− qn,k −q1,j
k=j+1
cj = max(αj , n−1 ); cj = n−1 ;
  if $c_j < -\sum_{k=j+1}^{n} (k-j+1)\, c_k$ then $c_j = -\sum_{k=j+1}^{n} (k-j+1)\, c_k$
cj −cj
else cj = cj const ;
od
$q_{1,1} = 1 - \sum_{j=2}^{n} q_{1,j}$;
$c_1 = -\sum_{j=2}^{n} c_j$;
6 Conclusions
Strong stochastic bounds are not limited to sample-path proofs. It is now possible to compute bounds of the steady-state distribution directly from the chain. This approach may be especially useful for modeling high-speed networks, where the performance requirements are thresholds. Using the algorithmic approach
we survey in this paper, a sample-path proof is no longer necessary, and these algorithms may be integrated into software performance tools based on Markov chains. Generalizations to other orderings or to the computation of transient measures are still important problems for performance analysis.
References
1. Abu-Amsha O., Vincent J.-M.: An algorithm to bound functionals of Markov chains with large state space. In: 4th INFORMS Conference on Telecommunications, Boca Raton, Florida, (1998)
2. Benmammoun M.: Encadrement stochastiques et évaluation de performances des réseaux. PhD thesis, Université de Versailles St-Quentin en Yvelines, (2002)
3. Benmammoun M., Fourneau J.M., Pekergin N., Troubnikoff A.: An algorithmic and
numerical approach to bound the performance of high speed networks, Submitted,
(2002)
4. Benmammoun M., Pekergin N.: Closed form stochastic bounds on the stationary
distribution of Markov chains. To appear in Probability in the Engineering and
Informational Sciences, (2002)
5. Boujdaine F., Dayar T., Fourneau J.M., Pekergin N., Saadi S., Vincent J.M.: A new
proof of st-comparison for polynomials of a stochastic matrix, Submitted, (2002)
6. Buchholz P.: An aggregation/disaggregation algorithm for stochastic automata networks. In: Probability in the Engineering and Informational Sciences, V 11, (1997) 229–253
7. Buchholz P.: Projection methods for the analysis of stochastic automata networks. In: Proc. of the 3rd International Workshop on the Numerical Solution of Markov Chains, B. Plateau, W. J. Stewart, M. Silva, (Eds.), Prensas Universitarias de Zaragoza, Spain, (1999) 149–168
8. Courtois P.J., Semal P.: Bounds for the positive eigenvectors of nonnegative ma-
trices and for their approximations by decomposition. In: Journal of ACM, V 31
(1984) 804–825
9. Courtois P.J., Semal P.: Computable bounds for conditional steady-state prob-
abilities in large Markov chains and queueing models. In: IEEE JSAC, V4, N6,
(1986)
10. Dayar T., Fourneau J.M., Pekergin N.: Transforming stochastic matrices for
stochastic comparison with the st-order, Submitted, (2002)
11. Dayar T., Pekergin N.: Stochastic comparison, reorderings, and nearly completely decomposable Markov chains. In: Proceedings of the International Conference on the Numerical Solution of Markov Chains (NSMC’99), B. Plateau, W. Stewart (Eds.), Prensas Universitarias de Zaragoza, (1999) 228–246
12. Dayar T., Stewart W. J.: Comparison of partitioning techniques for two-level iterative solvers on large sparse Markov chains. In: SIAM Journal on Scientific Computing V21 (2000) 1691–1705
13. Donatelli S.: Superposed generalized stochastic Petri nets: definition and efficient
solution. In: Proc. 15th Int. Conf. on Application and Theory of Petri Nets,
Zaragoza, Spain, (1994)
14. Feinberg B.N., Chiu S.S.: A method to calculate steady-state distributions of large Markov chains by aggregating states. In: Oper. Res, V 35 (1987) 282–290
15. Fernandes P., Plateau B., Stewart W.J.: Efficient descriptor-vector multiplications in stochastic automata networks. In: Journal of the ACM, V45 (1998) 381–414
16. Fourneau J.M., Pekergin N., Taleb H.: An Application of Stochastic Ordering to the Analysis of the PushOut Mechanism. In: Performance Modelling and Evaluation of ATM Networks, Chapman and Hall, (1995) 227–244
17. Fourneau J.M., Quessette F.: Graphs and Stochastic Automata Networks. In: Pro-
ceedings of the 2nd Int. Workshop on the Numerical Solution of Markov Chains,
Raleigh, USA, (1995)
18. Hébuterne G., Gravey A.: A space priority queueing mechanism for multiplexing
ATM channels. In: ITC Specialist Seminar, Computer Network and ISDN Systems,
V20 (1990) 37–43
19. Golubchik, L. and Lui, J.: Bounding of performance measures for a threshold-based
queuing systems with hysteresis. In: Proceeding of ACM SIGMETRICS’97, (1997)
147–157
20. Hillston J., Kloul L.: An Efficient Kronecker Representation for PEPA Models. In:
PAPM’2001, Aachen Germany, (2001)
21. Keilson J., Kester A.: Monotone matrices and monotone Markov processes. In:
Stochastic Processes and Their Applications, V5 (1977) 231–241
22. Kijima M.: Markov Processes for stochastic modeling. Chapman & Hall (1997)
23. Latouche G., Ramaswami V.: Introduction to Matrix Analytic Methods in Stochas-
tic Modeling. SIAM, (1999)
24. Lui, J. Muntz, R. and Towsley, D.: Bounding the mean response time of the min-
imum expected delay routing policy: an algorithmic approach. In: IEEE Transac-
tions on Computers. V44 N12 (1995) 1371–1382
25. Lui, J. Muntz, R. and Towsley, D.: Computing performance bounds of Fork-Join
parallel programs under a multiprocessing environment. In: IEEE Transactions on
Parallel and Distributed Systems. V9 N3 (1998) 295–311
26. Meyer C.D.: Stochastic complementation, uncoupling Markov chains, and the the-
ory of nearly reducible systems. In: SIAM Review. V31 (1989) 240–272.
27. Pekergin N.: Stochastic delay bounds on fair queueing algorithms. In: Proceedings
of INFOCOM’99 New York (1999) 1212–1220
28. Pekergin N.: Stochastic performance bounds by state reduction. In: Performance
Evaluation V36-37 (1999) 1–17
29. Plateau B.: On the stochastic structure of parallelism and synchronization models
for distributed algorithms. In: Proceedings of the SIGMETRICS Conference on
Measurement and Modeling of Computer Systems, Texas (1985) 147–154
30. Plateau B., Fourneau J.-M., Lee K.-H.: PEPS: A package for solving complex
Markov models of parallel systems. In: Modeling Techniques and Tools for Com-
puter Performance Evaluation, R. Puigjaner, D. Potier (Eds.), Spain (1988) 291–
305
31. Plateau B., Fourneau J.-M.: A methodology for solving Markov models of parallel
systems. In: Journal of Parallel and Distributed Computing. V12 (1991) 370–387.
32. Shaked M., Shantikumar J.G.: Stochastic Orders and Their Applications. In: Aca-
demic Press, California (1994)
33. Stewart W.J., Atif K., Plateau B.: The numerical solution of stochastic automata
networks. In: European Journal of Operational Research V86 (1995) 503–525
34. Stewart W. J.: Introduction to the Numerical Solution of Markov Chains. Princeton
University Press, (1994)
35. Stoyan D.: Comparison Methods for Queues and Other Stochastic Models. John
Wiley & Sons, Berlin, Germany, (1983)
36. Truffet L.: Reduction Technique For Discrete Time Markov Chains on Totally
Ordered State Space Using Stochastic Comparisons. In: Journal of Applied Prob-
ability, V37 N3 (2000)
37. Uysal E., Dayar T.: Iterative methods based on splittings for stochastic automata networks. In: European Journal of Operational Research, V110 (1998) 166–186
38. Van Dijk N.: Error bound analysis for queueing networks. In: Performance’96 Tutorials, Lausanne (1996)
Dynamic Scheduling via Polymatroid
Optimization
David D. Yao
1 Polymatroid
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 89–113, 2002.
© Springer-Verlag Berlin Heidelberg 2002
90 D.D. Yao
since A ∩ B ⊂ A ∪ B. Therefore,
$$f(A \cup B) + f(A \cap B) = \sum_{i \in A \cup B} x_i + \sum_{i \in A \cap B} x_i = \sum_{i \in A} x_i + \sum_{i \in B} x_i \le f(A) + f(B),$$
1.2 Optimization
Assume, without loss of generality, that
$$c_1 \ge c_2 \ge \cdots \ge c_n \ge 0, \qquad (2)$$
since any negative $c_i$ clearly results in the corresponding $x_i = 0$. Let $\pi = (1, 2, \ldots, n)$. Then we claim that the vertex $x^\pi$ in Definition 2 is the optimal solution to (P).
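To verify the claim, we start by writing down the dual problem:
$$
\begin{aligned}
(\mathrm{D})\quad \min\ & \sum_{A \subseteq E} y_A f(A)\\
\text{s.t.}\ & \sum_{A \ni i} y_A \ge c_i, && \text{for all } i \in E,\\
& y_A \ge 0, && \text{for all } A \subseteq E.
\end{aligned}
$$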
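The primal-dual pair (P)/(D) can be checked numerically on a small instance. The Python sketch below is an illustration, not the chapter's own code: it assumes that Definition 2 builds the vertex by the usual prefix-increment rule $x^\pi_{\pi_k} = f(\{\pi_1,\ldots,\pi_k\}) - f(\{\pi_1,\ldots,\pi_{k-1}\})$, and it uses a hypothetical rank function $f(A) = \min(\sum_{i\in A} w_i, \mathrm{cap})$, which is increasing and submodular. As a dual candidate it puts $y_{S(k)} = c_k - c_{k+1}$ (with $c_{n+1} := 0$) on the nested sets $S(k) = \{1,\ldots,k\}$, which is non-negative under (2).

```python
from itertools import combinations


def make_rank(w, cap):
    """Hypothetical increasing, submodular set function f(A) = min(sum_{i in A} w_i, cap)."""
    return lambda A: min(sum(w[i] for i in A), cap)


def greedy_vertex(f, order):
    """Vertex x^pi: x_{pi_k} = f({pi_1..pi_k}) - f({pi_1..pi_{k-1}})."""
    x, prefix = {}, frozenset()
    for i in order:
        x[i] = f(prefix | {i}) - f(prefix)
        prefix = prefix | {i}
    return x


def greedy_dual(c, order):
    """Dual candidate on nested sets S(k): y_{S(k)} = c_{pi_k} - c_{pi_{k+1}}, c_{pi_{n+1}} = 0."""
    y = {}
    for k, i in enumerate(order):
        nxt = c[order[k + 1]] if k + 1 < len(order) else 0
        y[frozenset(order[:k + 1])] = c[i] - nxt
    return y


def feasible(f, x):
    """Primal feasibility: x >= 0 and sum_{i in A} x_i <= f(A) for every nonempty A."""
    E = list(x)
    return all(v >= 0 for v in x.values()) and all(
        sum(x[i] for i in A) <= f(A)
        for m in range(1, len(E) + 1)
        for A in combinations(E, m)
    )


c = {1: 5, 2: 3, 3: 1}                       # cost coefficients, ordered as in (2)
f = make_rank({1: 3, 2: 2, 3: 4}, cap=6)     # hypothetical data
pi = sorted(c, key=lambda i: -c[i])          # pi = (1, 2, 3)
x, y = greedy_vertex(f, pi), greedy_dual(c, pi)
primal = sum(c[i] * x[i] for i in c)
dual = sum(yS * f(S) for S, yS in y.items())
print(x, primal, dual)                       # {1: 3, 2: 2, 3: 1} 22 22
```

Here $\sum_{A \ni i} y_A = c_i$ for every $i$, so the dual constraints hold with equality (complementary slackness), and equal primal and dual objectives, with both sides feasible, certify that $x^\pi$ is optimal.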
2 Conservation Laws
To relate to the last section, here E = {1, 2, ..., n} denotes the set of all job
classes, and x denotes the vector of performance measures of interest. For in-
stance, xi is the (long-run) average delay or throughput of job class i.
The conservation laws defined below were first formalized in Shanthikumar
and Yao [39], where the connection to polymatroid was made. In [39], as well as
subsequent papers in the literature, these laws are termed “strong conservation
laws.” Here, we shall simply refer to these as conservation laws.
Verbally, conservation laws can be summarized in the following two statements:
(i) the total performance (i.e., the sum) over all job classes in E is invariant
under any admissible policy;
(ii) the total performance over any given subset, A ⊂ E, of job classes is mini-
mized (or maximized) by offering priority to job classes in this subset over
all other classes.
As a simple example, consider a system of two job classes. Each job (of either
class) brings a certain amount of “work” (service requirement) to the system.
Suppose the server serves (i.e., depletes work) at unit rate. Then it is not difficult
to see that (i) the total amount of work, summing over all jobs of both classes
that are present in the system, will remain invariant regardless of the actual
policy that schedules the server, as long as it is non-idling; and (ii) if class 1
jobs are given preemptive priority over class 2 jobs, then the amount of work
in system summing over class 1 jobs is minimized, namely, it cannot be further
reduced by any other admissible policy.
We now state the formal definition of conservation laws. For any A ⊆ E, denote by |A| the cardinality of A. Let 𝒜 denote the space of all admissible policies (all non-anticipative and non-idling policies; see more details below), and x^u the performance vector under an admissible policy u ∈ 𝒜. As before, let π denote a permutation of the integers {1, 2, ..., n}. In particular, π = (π1, ..., πn) denotes a priority rule, which is admissible, and in which class π1 jobs are assigned the highest priority, and class πn jobs, the lowest priority.
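Definition 4. (Conservation Laws) The performance vector x is said to satisfy conservation laws if there exists a set function b (or, respectively, f): $2^E \to \mathbb{R}_+$, satisfying
$$b(A) = \sum_{i \in A} x^{\pi}_{i}, \quad \forall \pi: \{\pi_1, \ldots, \pi_{|A|}\} = A,\ \forall A \subseteq E; \qquad (4)$$
or, respectively,
$$f(A) = \sum_{i \in A} x^{\pi}_{i}, \quad \forall \pi: \{\pi_1, \ldots, \pi_{|A|}\} = A,\ \forall A \subseteq E; \qquad (5)$$
(when A = ∅, by definition, b(∅) = f(∅) = 0); such that for all u ∈ 𝒜 the following is satisfied:
$$\sum_{i \in A} x^u_i \ge b(A),\ \forall A \subset E; \qquad \sum_{i \in E} x^u_i = b(E); \qquad (6)$$
or, respectively,
$$\sum_{i \in A} x^u_i \le f(A),\ \forall A \subset E; \qquad \sum_{i \in E} x^u_i = f(E). \qquad (7)$$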
2.2 Examples
Consider a queueing system with n different job classes which are denoted by the
set E. Let u be the control or scheduling rule that governs the order of service
among different classes of jobs. Let 𝒜 denote the class of admissible controls,
which are required to be non-idling and non-anticipative. That is, no server is
allowed to be idle when there are jobs waiting to be served, and the control is
only allowed to make use of past history and current state of the system. Neither
can an admissible control affect the arrival processes or the service requirements
of the jobs. Otherwise we impose no further restrictions on the system. For
instance, the arrival processes and the service requirements of the jobs can be
arbitrary. Indeed, since the control cannot affect the arrival processes and the
service requirements, all the arrival and service data can be viewed as generated
a priori following any given (joint) distribution and with any given dependence
relations. We allow multiple servers, and multiple stages (e.g., tandem queues
or networks of queues). We also allow the control to be either preemptive or
non-preemptive. (Some restrictions will be imposed on individual systems to be
studied below.)
Let xui be a performance measure of class i (i ∈ E) jobs under control u.
This need not be a steady-state quantity or an expectation; it can very well
be a sample-path realization over a finite time interval, for instance, the delay
(sojourn time) of the first m class i jobs, the number of class i jobs in the system
at time t, or the number of class i job completions by time t. Let xu := (xui )i∈E
be the performance vector.
For any given permutation π ∈ Π, let xπ denote the performance vector
under a priority scheduling rule that assigns priority to the job classes according
to the permutation π, i.e., class π1 has the highest priority, ..., class πn has the
lowest priority. Clearly any such priority rule belongs to the admissible class.
In all the queueing systems studied below, the service requirements of the
jobs are mutually independent, and are also independent of the arrival processes.
(One exception to these independence requirements is Example 1 below, where
these independence assumptions are not needed.) No independence assumption,
however, is required for the arrival processes, which can be arbitrary. When a
performance vector satisfies conservation laws, whether its state space is B(b)
(6) or B(f ) (7) depends on whether the performance of a given subset of job
classes is minimized or maximized by giving priority to this subset. This is often
immediately evident from the context.
Example 1 Consider a G/G/1 system that allows preemption. For i ∈ E, let
Vi (t) denote the amount of work (processing requirement) in the system at time
t due to jobs of class i. (Note that for any given t, Vi (t) is a random quantity,
corresponding to some sample realization of the work-load process.) Then it is
easily verified that for any t, x := [Vi (t)]i∈E satisfies conservation laws.
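Example 1 can be checked by brute force in a discrete-time stand-in for the G/G/1 queue. The sketch below rests on simplifying assumptions of our own: time is slotted, work arrives in integer units at the start of a slot, and the server removes one unit per slot from the first non-empty class on the policy's preference list, so every policy is non-idling and preemptive at slot boundaries. It computes b(A) from the priority rules, checks that (4) yields the same value for every π that ranks A first, and then checks (6) for two non-priority policies. The arrival data and the policy functions are hypothetical.

```python
from itertools import combinations, permutations


def workload(arrivals, n, t_end, policy):
    """Remaining work V_i per class after t_end slots of a unit-rate, non-idling server.

    arrivals: list of (slot, cls, work) with integer slot and work.
    policy(slot, work) -> preference list over the classes for that slot.
    """
    work = [0] * n
    for t in range(t_end):
        for slot, cls, w in arrivals:
            if slot == t:
                work[cls] += w
        for k in policy(t, work):      # serve the first non-empty class on the list
            if work[k] > 0:
                work[k] -= 1
                break
    return work


def satisfies_conservation(arrivals, n, t_end, other_policies):
    """Check (4) over all priority rules, then (6) for each policy in other_policies."""
    b = {}
    for pi in permutations(range(n)):
        x = workload(arrivals, n, t_end, lambda t, w, pi=pi: list(pi))
        for m in range(1, n + 1):
            A = frozenset(pi[:m])
            value = sum(x[i] for i in A)
            if b.setdefault(A, value) != value:   # (4): same b(A) for every such pi
                return False
    for policy in other_policies:
        x = workload(arrivals, n, t_end, policy)
        for m in range(1, n):
            for A in combinations(range(n), m):
                if sum(x[i] for i in A) < b[frozenset(A)]:   # (6), inequalities
                    return False
        if sum(x) != b[frozenset(range(n))]:                  # (6), equality
            return False
    return True


def round_robin(t, w):
    """Rotate the preferred class every slot (not a priority rule)."""
    return [(t + j) % 3 for j in range(3)]


def longest_first(t, w):
    """Prefer the class with the most remaining work."""
    return sorted(range(3), key=lambda k: -w[k])


arrivals = [(0, 0, 3), (0, 1, 2), (1, 2, 4), (2, 0, 2), (4, 1, 3)]
print(all(satisfies_conservation(arrivals, 3, T, [round_robin, longest_first])
          for T in range(15)))         # True
```

Giving priority to a set A makes the total work of A behave like a single-class workload fed only by A's arrivals, which is why b(A) does not depend on the order inside A or outside it.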
Example 2 Continue with the last example. For all i ∈ E, let Ni (t) be the
number of class i jobs in the system at time t. When the service times follow
exponential distributions, with mean 1/μi for class i jobs, we have ENi (t) =
μi EVi (t). Let Wi be the steady-state sojourn time in system for class i jobs.
From Little’s Law we have EWi = ENi /λi = EVi /ρi , where λi is the arrival rate
of class i jobs, ρi := λi /μi , Ni and Vi are the steady-state counterparts of Ni (t)
and Vi (t), respectively. Hence, the following x also satisfies conservation laws:
(i) for any given t, x := [ENi (t)/μi ]i∈E ;
(ii) x := [ρi EWi ]i∈E .
Example 3 In a G/M/c (c > 1) system that allows preemption, if all job classes
follow the same exponential service-time distribution (with mean 1/μ), then it
is easy to verify that for any t, x := [ENi (t)]i∈E satisfies conservation laws. In
this case, EVi (t) = ENi (t)/μ and EWi = ENi /λi . Hence, x defined as follows
satisfies conservation laws:
(i) for any given t, x := [ENi (t)]i∈E , x := [EVi (t)]i∈E ;
(ii) x := [λi EWi ]i∈E .
(If the control is restricted to be non-preemptive, the results here still hold true.
See Example 6 below.)
Example 4 The results in Example 3 still hold when the system is a network
of queues, provided all job classes follow the same exponential service-time dis-
tribution and the same routing probabilities at each node (service-time distri-
butions and routing probabilities can, however, be node dependent); (external)
job arrival processes can be arbitrary and can be different among the classes.
We next turn to considering cases where the admissible controls are restricted
to be non-preemptive.
Example 6 Consider the G/G/c system, c ≥ 1. If all job classes follow the same
service-time distribution, then it is easy to see that the scheduling of the servers
will not affect the departure epochs of jobs (in a pathwise sense), although
it will affect the identity (class) of the departing jobs at those epochs. (See
Shanthikumar and Sumita [38], §2, for the G/G/1 case; the results there also
hold true for the G/G/c case.) Hence, for any given t, x := [Ni (t)]i∈E satisfies
conservation laws.
Example 7 Comparing the above with Example 3, we know that the results
there also hold for non-preemptive controls. However, in contrast to the extension
of Example 3 to the network case in Example 4, the above can only be extended
to queues in tandem, where overtaking is excluded. Specifically, the result in
Example 6 also holds for a series of G/G/c queues in tandem, where at each
node all job classes have the same service-time distribution, which, however, can
be node dependent. External job arrival processes can be arbitrary and can be
different among classes. The number of servers can also be node dependent.
Example 8 With non-preemptive control, there is a special case for the G/G/1
system with only two job classes (n = 2) which may follow different service-time
distributions: for any given t, x := [Vi (t)]i∈E satisfies conservation laws.
For steady-state measures, from standard results in GI/G/1 queues (see, e.g., Asmussen [1], Chapter VIII, Proposition 3.4), we have
$$EV_i = \mu_i^{-1}\,[EN_i - \rho_i] + \rho_i \mu_i m_i / 2$$
and
$$EV_i = \rho_i\,[EW_i - \mu_i^{-1} + \mu_i m_i / 2],$$
where $m_i$ is the second moment of the service time of class i jobs. Hence, following the above, we know that $x = [EN_i/\mu_i]_{i\in E}$ and $x = [\rho_i EW_i]_{i\in E}$ also satisfy conservation laws.
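The last step can be spelled out. Using Little's law $EN_i = \lambda_i EW_i$ and $\rho_i = \lambda_i/\mu_i$, the first identity rearranges to

$$
\frac{EN_i}{\mu_i} = \frac{\lambda_i EW_i}{\mu_i} = \rho_i EW_i = EV_i + \frac{\rho_i}{\mu_i} - \frac{\rho_i \mu_i m_i}{2} =: EV_i + d_i .
$$

The shift $d_i$ depends only on $\lambda_i$, $\mu_i$ and $m_i$, which an admissible control cannot change. Adding a control-invariant $d_i$ to every component moves both sides of (6) by $\sum_{i\in A} d_i$, so with $b'(A) = b(A) + \sum_{i\in A} d_i$ the shifted vector still satisfies conservation laws.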
where in both (i) and (ii) α > 0 is a discount rate, and t is any given time.
Finally, note that in all the above examples, with the exception of Exam-
ple 5, whenever [ENi (t)]i∈E satisfies conservation laws, [EDi (t)]i∈E also satisfies
conservation laws, since in a no-loss system the number of departures is the dif-
ference between the number of arrivals (which is independent of the control) and
the number in system.
Evidently, based on the above discussions, the state space of the performance
vectors in each of the examples above is a polymatroid.
where x is a performance measure that satisfies conservation laws, and the cost
coefficients ci (i ∈ E) satisfy, without loss of generality, the ordering in (2).
Then, this optimal control problem can be solved by solving the following linear
program (LP):
$$\max_{x \in B(f)} \sum_{i \in E} c_i x_i \qquad \Big[\text{or } \min_{x \in B(b)} \sum_{i \in E} c_i x_i\Big].$$
The optimal solution to this LP is simply the vertex xπ ∈ B(f ), with π = (1, ..., n)
being the permutation corresponding to the decreasing order of the cost coeffi-
cients in (2). And the optimal control policy is the corresponding priority rule,
which assigns the highest priority to class 1 jobs, and the lowest priority to class
n jobs.
where $c_i$ is the inventory holding cost rate for class i jobs. We then rewrite this objective as
$$\min \sum_{i \in E} c_i \mu_i x_i.$$
(Note that (Ni )i∈E does not satisfy conservation laws; (xi )i∈E does.) Then, we
know from the above theorem that the optimal policy is a priority rule, with the
priorities assigned according to the ci μi values — the larger the value, the higher
the priority. This is what is known as the “cμ-rule”. When all jobs have the same
cost rate, the priorities follow the μi values, i.e., the faster the processing rate (or,
the shorter the processing time), the higher the priority, which is the so-called
SPT (shortest processing time) rule.
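The cμ-rule can be sanity-checked on a two-class M/M/1 queue with preemptive-resume priority, a special case of Example 2. The sketch below is ours, not the chapter's: it plugs exponential service ($E[S_i^2] = 2/\mu_i^2$) into the classical M/G/1 preemptive-resume formula for mean sojourn times, evaluates $\sum_i c_i EN_i$ under both priority orders, and also confirms that $\sum_i EN_i/\mu_i$ is the same under both orders, as conservation laws require. The rates and costs are hypothetical.

```python
from itertools import permutations


def preemptive_priority_sojourn(lam, mu, order):
    """Mean sojourn times E[W_i] in an M/M/1 queue under preemptive-resume priority.

    Classical M/G/1 formula specialised to exponential service (E[S^2] = 2/mu^2):
      E[W_k] = (1/mu_k)/(1 - s_prev) + R_k/((1 - s_prev)(1 - s_k)),
    where s_k is the load of class k and all higher classes and
    R_k = sum over those classes of lam_j E[S_j^2] / 2.
    """
    W, s_prev, R = {}, 0.0, 0.0
    for k in order:
        s_k = s_prev + lam[k] / mu[k]
        R += lam[k] / mu[k] ** 2            # lam_k * E[S_k^2] / 2
        W[k] = (1 / mu[k]) / (1 - s_prev) + R / ((1 - s_prev) * (1 - s_k))
        s_prev = s_k
    return W


def holding_cost(lam, mu, c, order):
    """sum_i c_i E[N_i], with E[N_i] = lam_i E[W_i] by Little's law."""
    W = preemptive_priority_sojourn(lam, mu, order)
    return sum(c[i] * lam[i] * W[i] for i in order)


def c_mu_order(c, mu):
    """Priority order by decreasing c_i * mu_i."""
    return sorted(c, key=lambda i: -c[i] * mu[i])


lam, mu = {1: 0.3, 2: 0.2}, {1: 1.0, 2: 0.5}      # hypothetical rates, total load 0.7
c = {1: 1.0, 2: 1.0}                               # equal costs: c-mu reduces to SPT
for order in permutations(lam):
    W = preemptive_priority_sojourn(lam, mu, order)
    work = sum(lam[i] * W[i] / mu[i] for i in lam)  # sum_i E[N_i] / mu_i
    # cost differs between the two orders; the work column is the same
    print(order, round(holding_cost(lam, mu, c, order), 4), round(work, 4))
print(c_mu_order(c, mu))
```

With equal costs the faster class 1 gets priority and yields the lower cost; raising $c_2$ enough (for example to 3) flips the cμ order, and the cheaper order flips with it.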
Although conservation laws apply to the many examples in the last section, there
are other interesting and important problems that do not fall into this category.
A primary class of such examples includes systems with feedback, i.e., jobs may
come back after service completion. For example, consider the so-called Klimov’s
problem: a multi-class M/G/1 queue in which jobs, after service completion, may
return and switch to another class, following a Bernoulli mechanism. Without
feedback, we know this is a special case of Example 1, and the work in system,
[Vi (t)]i∈E, satisfies conservation laws. With feedback, however, the conservation
laws as defined in Definition 4 need to be modified.
Specifically, with the possibility of feedback, the work of a particular job class,
say class i, should not only include the work associated with class i jobs that are
present in the system, it should also take into account the potential work that
will be generated by feedback jobs, which not only include class i jobs but also
all other classes that may feedback to become class i. With this modification,
the two intuitive principles of conservation laws listed at the beginning of §2.1
will apply.
To be concrete, let us paraphrase here the simple example at the beginning of
§2.1 with two job classes, allowing the additional feature of feedback. As before,
suppose the server serves at unit rate. Then it is not difficult to see that (i) the
total amount of potential work, summing over both classes, will remain invariant
regardless of the actual schedule that the server follows, as long as it is a non-
idling schedule; and (ii) if class 1 jobs are given (preemptive) priority over class 2
jobs, then the amount of potential work due to class 1 jobs is minimized, namely,
it cannot be further reduced by any other scheduling rule. And the same holds
for class 2 jobs, if given priority over class 1 jobs.
Another way to look at this example: let T be the first time there are no class 1
jobs left in the system. Then, T is minimized by giving class 1 jobs (preemptive)
priority over class 2 jobs. In particular, T is no smaller than the potential work
of class 1 generated by class 1 jobs (only); T is equal to the latter if and only if
class 1 jobs are given priority over class 2 jobs.
Therefore, with this modification, the conservation laws in Definition 4 can
be generalized. The net effect, as will be demonstrated in the examples below,
is that the variables $x_i$ in Definition 4 will have to be multiplied with different coefficients $a_i^A$ that depend on both the job classes (i) and the subsets (A). In particular, when $x_i$ is, for instance, the average number of jobs of class i, $a_i^A$ denotes the rate of potential work of those classes in set A that is generated by class i jobs.
We now state the formal definition of generalized conservation laws (GCL), using the same notation wherever possible as in Definition 4.
Definition 5. (Generalized Conservation Laws) The performance vector x is said to satisfy generalized conservation laws (GCL) if there exists a set function b (or, respectively, f): $2^E \to \mathbb{R}_+$, and a matrix $(a_i^S)_{i\in E,\, S\subseteq E}$ (which is in general different for b and f, but we will not make this distinction below for notational simplicity) satisfying
$$b(A) = \sum_{i \in A} a_i^A x^{\pi}_{i}, \quad \forall \pi: \{\pi_1, \ldots, \pi_{|A|}\} = A,\ \forall A \subseteq E; \qquad (8)$$
or, respectively,
$$f(A) = \sum_{i \in A} a_i^A x^{\pi}_{i}, \quad \forall \pi: \{\pi_1, \ldots, \pi_{|A|}\} = A,\ \forall A \subseteq E; \qquad (9)$$
such that for all u ∈ 𝒜 the following is satisfied:
$$\sum_{i \in A} a_i^A x^u_i \ge b(A),\ \forall A \subset E; \qquad \sum_{i \in E} a_i^E x^u_i = b(E); \qquad (10)$$
or, respectively,
$$\sum_{i \in A} a_i^A x^u_i \le f(A),\ \forall A \subset E; \qquad \sum_{i \in E} a_i^E x^u_i = f(E). \qquad (11)$$
It is obvious from the above definition that GCL reduces to the conservation laws if $a_i^A = 1$ for all $i \in A$ and all $A \subseteq E$.
3.2 Examples
Example 11 (Klimov’s problem [32]) This concerns the optimal control of a
system in which a single server is available to serve n classes of jobs. Class i jobs
arrive according to a Poisson process with rate αi , which is independent of other
classes of jobs. The service times for class i jobs are independent and identically
distributed with mean μi . When the service of a class i job is completed, it either
returns to become a class j job, with probability pij, or leaves the system with probability $1 - \sum_j p_{ij}$. Denote α = (αi)i∈E, μ = (μi)i∈E, and P = [pij]i,j∈E.
Consider the class of non-preemptive policies. The performance measure is
where the coefficients are given as $a_i^S = \lambda_i \beta_i^S$, with $\lambda = (\lambda_i)_{i\in E}$ and $\beta^S = (\beta_i^S)_{i\in S}$
obtained as follows:
where PSS and μS are, respectively, the restriction of P and μ to the set S. Note
that here, λi is the overall arrival rate of class i jobs (including both external
arrivals and feedback jobs), βiS is the amount of potential work of the classes in
S generated by a class i job. (Hence, this potential work is generated at rate αi
in the system.) Summing over i ∈ S yields the total amount of potential work
of the classes in S (generated by the same set of jobs), which is minimized when
these jobs are given priority over other classes. This is the basic intuition as to
why x satisfies GCL.
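The quantities $\lambda_i$ and $\beta_i^S$ are described verbally above. A natural reading, adopted here as an assumption, is the pair of linear systems $\lambda = \alpha + P^{\mathsf T}\lambda$ (external plus feedback arrivals) and $\beta^S = \mu_S + P_{SS}\beta^S$ (a class i job's own mean service plus the potential work of its descendants while they stay in S), i.e. $\lambda^{\mathsf T} = \alpha^{\mathsf T}(I-P)^{-1}$ and $\beta^S = (I-P_{SS})^{-1}\mu_S$. The sketch computes $a_i^S = \lambda_i\beta_i^S$ under that assumption with exact rational arithmetic; it assumes $I-P$ and $I-P_{SS}$ are nonsingular (every job eventually leaves), and the small solver is a plain Gauss-Jordan elimination written for the example.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination (A assumed nonsingular)."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(b[r])] for r, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def klimov_coefficients(alpha, P, mu, S):
    """a_i^S = lambda_i * beta_i^S for i in S (classes indexed 0..n-1).

    lambda solves (I - P^T) lambda = alpha   (total arrival rate of each class);
    beta^S solves (I - P_SS) beta^S = mu_S   (potential work in S per class-i job).
    """
    n = len(alpha)
    lam = solve([[(1 if i == j else 0) - P[j][i] for j in range(n)] for i in range(n)], alpha)
    beta = solve([[(1 if i == j else 0) - P[i][j] for j in S] for i in S], [mu[i] for i in S])
    return {i: lam[i] * beta[k] for k, i in enumerate(S)}


half = Fraction(1, 2)
P = [[0, half], [0, 0]]          # hypothetical: class 0 returns as class 1 w.p. 1/2
alpha, mu = [1, 0], [1, 2]       # external arrival rates and mean service times
print(klimov_coefficients(alpha, P, mu, [0, 1]))   # {0: Fraction(2, 1), 1: Fraction(1, 1)}
```

With P = 0 the coefficients reduce to $a_i^S = \alpha_i\mu_i$ for every S, so they no longer depend on the subset.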
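Bertsimas and Niño-Mora [3] showed that $x^u = (x_i^u)_{i\in E}$, as defined above, satisfies the GCL, with coefficients
$$a_i^S = \frac{E\big[\int_0^{T_i^{S^c}} e^{-\alpha t}\,dt\big]}{E\big[\int_0^{v_i} e^{-\alpha t}\,dt\big]}, \qquad i \in S \subseteq E,$$
and
$$b(S) = E\Big[\int_0^{T_m^{E}} e^{-\alpha t}\,dt\Big] - E\Big[\int_0^{T_m^{S^c}} e^{-\alpha t}\,dt\Big].$$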
Intuitively, the GCL here says that the time until all the $S^c$-descendants of all the projects in S are served is minimized by giving project classes in $S^c$ priority over those in S.
An undiscounted version is also available in [3]. (This includes Klimov’s prob-
lem, the last example above, as a special case.) The criterion here is to minimize
the total expected cost incurred under control u during the first busy period (of the server) [0, T], $\sum_{i\in E} c_i x_i^u$, with
$$x_i^u = E_u\Big[\int_0^{\infty} t\, I_i^u(t)\,dt\Big],$$
and
$$b(S) = \tfrac{1}{2}E\big[(T_m^{S^c})^2\big] - \tfrac{1}{2}E\big[(T_m^{E})^2\big] + \sum_{i \in S} b_i(S),$$
where
$$b_i(S) = \frac{E[v_i]\,E[v_i^2]}{2}\left(\frac{E[T_i^{S^c}]}{E[v_i]} - \frac{E[(T_i^{S^c})^2]}{E[v_i^2]}\right), \qquad i \in S.$$
The intuition is similar to the discounted case.
4 Extended Polymatroid
4.1 Equivalent Definitions
Recall that the space of any performance measure that satisfies conservation laws is a polymatroid. Analogously, one can ask what the structure of the performance space under GCL is, i.e., what is the structure of the following polytopes:
$$EP(b) = \Big\{ x \ge 0 : \sum_{i \in S} a_i^S x_i \ge b(S),\ S \subseteq E \Big\}, \qquad (12)$$
$$EP(f) = \Big\{ x \ge 0 : \sum_{i \in S} a_i^S x_i \le f(S),\ S \subseteq E \Big\}. \qquad (13)$$
The most natural route to approach this issue appears to be mimicking Def-
inition 2 of polymatroid (and this is indeed the route taken in [3]). Similar
to the definition of xπ preceding Definition 2, here, given a permutation π (of
{ 1, 2, · · · , n }), we can generate a vertex xπ as follows.
reserved for the f function. For simplicity, we do not make such a distinction
here and below.)
With the above definition for EP, the right hand side functions b and f are
not necessarily increasing and supermodular/submodular. In other words, we do
not have a counterpart of Definition 1 for EP (more on this later). On the other
hand, the counterpart for Definition 3 does apply.
where
$$\tilde f(S) := f(S) - \frac{f(\{1\})}{a_1^{\{1\}}}\, a_1^S.$$
Clearly, since the stated condition in Definition 7 is assumed to hold for EP(f) (the one with k + 1 variables), it also holds for EP(f̃) (the one with k variables), since the equations in question all differ by an amount $f(\{1\})\,a_1^S / a_1^{\{1\}}$ on both sides. Hence, the induction hypothesis confirms that EP(f̃) is an EP. This implies that (xπ2, ..., xπn) ∈ EP(f̃), which is equivalent to xπ = (xπ1, xπ2, ..., xπn) satisfying all the constraints in EP(f) that involve S ⊆ E with 1 ∈ S.
We still need to check that xπ satisfies all the other constraints in EP(f), i.e., those corresponding to S ⊆ E with 1 ∉ S. To this end, consider the following polytope:
$$\Big\{x \ge 0 : \sum_{i \in S} a_i^S x_i \le f(S),\ S \subseteq E \setminus \{1\}\Big\}. \qquad (14)$$
The above is another polytope with k variables. Obviously the stated condition
in Definition 7, which is assumed to hold for the polytope EP(f ), holds for the
above polytope as well (since the defining inequalities in the latter are just part
of those in EP(f )). Hence, based on the induction hypothesis, the polytope in
(14) is also an EP. This implies that (xπ2 , ..., xπn ), and hence xπ , satisfies all the
inequalities involved in (14).
Hence, we have established that given the stated condition in Definition 7,
xπ does satisfy all the constraints in EP(f ), for each permutation π. Therefore,
EP(f ) is an EP.
The above theorem leads immediately to the following:
Corollary 1. If EP(f ) is an extended polymatroid, then
$$EP^{-}(f) := \Big\{x \ge 0 : \sum_{i \in S} a_i^S x_i \le f(S),\ S \subseteq E \setminus E_0\Big\}$$
5 Optimization over EP
Here we consider the optimization problem of maximizing a linear function over
the EP, EP(f ), defined in (13):
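$$
\begin{aligned}
(\mathrm{PG})\quad \max\ & \sum_{i \in E} c_i x_i\\
\text{s.t.}\ & \sum_{i \in A} a_i^A x_i \le f(A), && \text{for all } A \subseteq E,\\
& x_i \ge 0, && \text{for all } i \in E.
\end{aligned}
$$
The dual problem can be written as follows:
$$
\begin{aligned}
(\mathrm{DG})\quad \min\ & \sum_{A \subseteq E} y_A f(A)\\
\text{s.t.}\ & \sum_{A \ni i} y_A a_i^A \ge c_i, && \text{for all } i \in E,\\
& y_A \ge 0, && \text{for all } A \subseteq E.
\end{aligned}
$$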
Let us start with π = (1, 2, · · · , n), and consider xπ , the vertex defined at
the beginning part of the last section. Below we write out the objective function
of (PG) at xπ , and use the expression, along with complementary slackness, to
identify a candidate for the dual solution. From dual feasibility, we then identify
the conditions under which π is the optimal permutation. Collectively, these
steps constitute an algorithm that finds the optimal π.
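For simplicity, write x for $x^\pi$ below. We first write out $x_n$ in the objective function:
$$
\begin{aligned}
\sum_{i=1}^{n} c_i x_i
&= \frac{c_n}{a_n^{\{1,\ldots,n\}}}\Big(f(\{1,\ldots,n\}) - \sum_{i=1}^{n-1} a_i^{\{1,\ldots,n\}} x_i\Big) + \sum_{i=1}^{n-1} c_i x_i\\
&= y_{\{1,\ldots,n\}}\, f(\{1,\ldots,n\}) + \sum_{i=1}^{n-1}\Big(c_i - y_{\{1,\ldots,n\}}\, a_i^{\{1,\ldots,n\}}\Big) x_i,
\end{aligned}
$$
where we set
$$y_{\{1,\ldots,n\}} = c_n / a_n^{\{1,\ldots,n\}}.$$
Next, we write out $x_{n-1}$ in the summation above, and set
$$y_{\{1,\ldots,n-1\}} = \Big(c_{n-1} - y_{\{1,\ldots,n\}}\, a_{n-1}^{\{1,\ldots,n\}}\Big) \Big/ a_{n-1}^{\{1,\ldots,n-1\}},$$
where
$$y_{\{1,\ldots,k\}} = \Big(c_k - \sum_{j=k+1}^{n} y_{\{1,\ldots,j\}}\, a_k^{\{1,\ldots,j\}}\Big) \Big/ a_k^{\{1,\ldots,k\}}, \qquad (16)$$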
satisfying the first set of constraints in (DG). So it suffices to show that the
n non-zero dual variables in (16) are non-negative. To this end, we need to be
specific about the construction of the permutation π = (1, ..., n).
Let us start from the last element in π. Note that from (16), we have
$$y_{\{1,\ldots,n\}} = \frac{c_n}{a_n^{\{1,\ldots,n\}}} \ge 0.$$
Next, to ensure $y_{\{1,\ldots,n-1\}} \ge 0$, the numerator of its expression in (16) must be non-negative, i.e.,
$$\frac{c_{n-1}}{a_{n-1}^{\{1,\ldots,n\}}} \ge y_{\{1,\ldots,n\}} = \frac{c_n}{a_n^{\{1,\ldots,n\}}}.$$
Therefore, the index n has to be:
$$n = \arg\min_{i} \frac{c_i}{a_i^{\{1,\ldots,n\}}}.$$
or
$$\frac{c_{n-2} - y_{\{1,\ldots,n\}}\, a_{n-2}^{\{1,\ldots,n\}}}{a_{n-2}^{\{1,\ldots,n-1\}}} \ge y_{\{1,\ldots,n-1\}}.$$
Hence, the choice of n − 1 has to be:
$$n - 1 = \arg\min_{i \le n-1} \frac{c_i - y_{\{1,\ldots,n\}}\, a_i^{\{1,\ldots,n\}}}{a_i^{\{1,\ldots,n-1\}}}.$$
This procedure can be repeated until all elements of the permutation are determined. In general, the index k is chosen in the order k = n, n − 1, ..., 1, and it has to satisfy:
$$k = \arg\min_{i \le k} \frac{c_i - \sum_{j=k+1}^{n} y_{\{1,\ldots,j\}}\, a_i^{\{1,\ldots,j\}}}{a_i^{\{1,\ldots,k\}}}.$$
Formally, the following algorithm solves the dual problem (DG) in terms
of generating the permutation π, along with the dual solution y π . The optimal
primal solution is then the vertex, xπ , corresponding to the permutation π.
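Following (16) and the argmin rule above, the procedure can be sketched in Python. The function names `ep_greedy` and `ep_vertex`, the callables `a(i, S)` and `f(S)`, and tie-breaking by the smallest class label are our own assumptions. The sketch builds π from the back, records the dual variables on the nested sets S(k) = {π_1, …, π_k}, and then obtains $x^\pi$ from the triangular system $\sum_{i\in S(k)} a_i^{S(k)} x_i = f(S(k))$, k = 1, …, n. Exact fractions keep the comparisons free of rounding.

```python
from fractions import Fraction


def ep_greedy(E, c, a):
    """Build pi = (pi_1..pi_n) from the back and the dual y on S(k) = {pi_1..pi_k}.

    At each step pi_k = argmin_{i in S(k)} reduced_i / a(i, S(k)),
    where reduced_i = c_i - sum_{j>k} y_{S(j)} a(i, S(j)); y_{S(k)} is that minimum.
    """
    remaining = set(E)
    reduced = {i: Fraction(c[i]) for i in E}
    pi, y = [], {}
    while remaining:
        S = frozenset(remaining)
        k = min(sorted(S), key=lambda i: reduced[i] / a(i, S))
        y[S] = reduced[k] / a(k, S)
        for i in S:
            reduced[i] -= y[S] * a(i, S)
        pi.append(k)
        remaining.remove(k)
    pi.reverse()
    return pi, y


def ep_vertex(pi, a, f):
    """Solve sum_{i in S(k)} a(i, S(k)) x_i = f(S(k)) for k = 1..n (triangular)."""
    x = {}
    for k, j in enumerate(pi):
        S = frozenset(pi[:k + 1])
        known = sum(a(i, S) * x[i] for i in pi[:k])
        x[j] = (Fraction(f(S)) - known) / a(j, S)
    return x


# Hypothetical instance: a(i, S) = w_i for every S, f(S) = min(sum_{i in S} v_i, 6).
w = {1: 1, 2: 4, 3: 1}
v = {1: 3, 2: 2, 3: 4}
c = {1: 2, 2: 6, 3: 1}
a = lambda i, S: Fraction(w[i])
f = lambda S: min(sum(v[i] for i in S), 6)

pi, y = ep_greedy(c.keys(), c, a)
x = ep_vertex(pi, a, f)
primal = sum(c[i] * x[i] for i in c)
dual = sum(yS * f(S) for S, yS in y.items())
print(pi, primal == dual, all(yS >= 0 for yS in y.values()))   # [1, 2, 3] True True
```

In this instance class 2 has the largest $c_i$ but still ends up second, because the ratio $c_i/a_i$ rather than $c_i$ drives the choice. With $a \equiv 1$ the procedure reduces to sorting by decreasing $c_i$, as in the polymatroid case.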
Proof. Following the discussions preceding the algorithm, it is clear that we only need to check $y^\pi_{S(k)} \ge 0$, for k = 1, ..., n.
When k = n, following the algorithm, we have S(n) = E, and $y^\pi_{S(n)} = c_{\pi_n} / a_{\pi_n}^{E} \ge 0$.
Inductively, suppose $y^\pi_{S(j)} \ge 0$, for j = k + 1, ..., n, have all been determined. The choice of $\pi_{k+1}$, and hence of $y^\pi_{S(k+1)}$, in the algorithm guarantees
$$c_k - \sum_{j=k+1}^{n} y^\pi_{S(j)}\, a_k^{S(j)} \ge 0,$$
and hence $y^\pi_{S(k)} \ge 0$.
That the optimal solution is generated in O(n²) steps is evident from the description of the algorithm.
To summarize, the two remarks at the end of §1.2 for the polymatroid op-
timization also apply here: (i) primal feasibility is automatic, by way of the
definition of EP; and (ii) dual feasibility, along with complementary slackness,
identifies the permutation π that defines the (primal) optimal vertex.
Furthermore, there is also an analogy to (3), i.e., the sum of dual variables
yields the priority index. To see this, for concreteness consider Klimov’s problem,
with the performance measure xi being the (long-run) average number of class
i jobs in the system. (For this example, we are dealing with a minimization
problem over the EP EP(b). But all of the above discussions, including the
algorithm, still apply, mutatis mutandis, such as changing f to b and max to
min, etc.) The optimal policy is a priority rule corresponding to the permutation
π generated by the above algorithm, with the jobs of class π1 given the highest
priority, and jobs of class πn , the lowest priority. Let y ∗ be the optimal dual
solution generated by the algorithm. Define
$$\gamma_i := \sum_{S \ni i} y^*_S, \qquad i \in E.$$
Then, we have
$$\gamma_{\pi_i} = y^*_{\{\pi_1,\ldots,\pi_i\}} + \cdots + y^*_{\{\pi_1,\ldots,\pi_n\}}, \qquad i \in E. \qquad (17)$$
Note that γπi is decreasing in i, since the dual variables are non-negative. Hence,
the order of γπi ’s is in the same direction as the priority assignment. In other
words, (17) is completely analogous to (3): just like the indexing role played by
the cost coefficients in the polymatroid case, in the EP case here {γi } is also a
set of indices upon which the priorities are assigned: at each decision epoch, the
server chooses to serve, among all waiting jobs, the job class with the highest γ
index.
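Computing the indices from a dual solution takes one pass over the sets with positive dual variables. The sketch below is illustrative only: the dual values are a hypothetical output of the greedy procedure for π = (1, 2, 3), and `index_rule` simply encodes the verbal rule above (serve the waiting class with the largest γ).

```python
from fractions import Fraction


def priority_indices(y):
    """gamma_i = sum of y_S over all sets S that contain i, as in (17)."""
    gamma = {}
    for S, yS in y.items():
        for i in S:
            gamma[i] = gamma.get(i, 0) + yS
    return gamma


def index_rule(waiting, gamma):
    """At a decision epoch, pick the waiting class with the highest gamma index."""
    return max(waiting, key=lambda i: gamma[i])


# Hypothetical dual values on the nested sets {1}, {1, 2}, {1, 2, 3}.
y = {frozenset({1}): Fraction(1, 2), frozenset({1, 2}): Fraction(1, 2),
     frozenset({1, 2, 3}): Fraction(1)}
gamma = priority_indices(y)
print(gamma[1], gamma[2], gamma[3])      # 2 3/2 1
print(index_rule({2, 3}, gamma))         # 2
```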
Finally, we can synthesize all the above discussions on GCL and its con-
nection to EP, and on optimization over an EP, to come up with the following
generalization of Theorem 3.
Theorem 7. Consider the optimal control problem in Theorem 3:
$$\max_{u \in \mathcal{A}} \sum_{i \in E} c_i x_i^u \qquad \Big[\text{or } \min_{u \in \mathcal{A}} \sum_{i \in E} c_i x_i^u\Big].$$
Suppose x is a performance measure that satisfies GCL. Then, this optimal control problem can be solved by solving the following LP:
$$\max_{x \in EP(f)} \sum_{i \in E} c_i x_i \qquad \Big[\text{or } \min_{x \in EP(b)} \sum_{i \in E} c_i x_i\Big].$$
The optimal solution to this LP is simply the vertex xπ ∈ EP(f), with π being the permutation identified by Algorithm 1; and the optimal policy is the corresponding priority rule, which assigns the highest priority to class π1 jobs, and the lowest priority to class πn jobs.
Applying the above theorem to Klimov’s model we can generate the optimal
policy, which is a priority rule dictated by the permutation π, which, in turn, is
generated by Algorithm 1.
The materials presented here are drawn from Chapter 11 of the book by Chen
and Yao [7], to which the reader is also referred for preliminaries in queueing
networks. A standard reference to matroid, as well as polymatroid, is Welsh [47].
The equivalence of the first two definitions of the polymatroid, Definitions 1 and
2, is a classical result; refer to, e.g., Edmonds [13], Welsh [47], and Dunstan and
Welsh [12].
The original version of conservation laws, due to Kleinrock [31], takes the
form of a single equality constraint, $\sum_{i\in E} x_i = b(E)$ or $= f(E)$. In the works
of Coffman and Mitrani [9], and Gelenbe and Mitrani [20], the additional in-
equality constraints were introduced, which, along with the equality constraint,
give a full characterization of the performance space. In a sequence of papers,
Federgruen and Groenevelt [15,16,17] established the polymatroid structure of
the performance space of several queueing systems, by showing that the RHS
(right hand side) functions are increasing and submodular.
Shanthikumar and Yao [39] revealed the equivalence between conservation
laws and the polymatroid nature of the performance polytope. In other words,
the increasingness and submodularity of the RHS functions are not only sufficient
but also necessary conditions for conservation laws. This equivalence is based on
two key ingredients: On the one hand, the polymatroid Definition 2 asserts that
if the “vertex” xπ — generated through a triangular system of n linear equations
(made out of a total of 2^n − 1 inequalities that define the polytope) — belongs to
the polytope (i.e., if it satisfies all the other inequalities), for every permutation,
π, then the polytope is a polymatroid. On the other hand, in conservation laws
the RHS functions that characterize the performance polytope can be defined
in such a way that they correspond to those “vertices”. This way, the vertices
will automatically belong to the performance space, since they are achievable by
priority rules.
The direct implication of the connection between conservation laws and poly-
matroid is the translation of the scheduling (control) problem into an optimiza-
tion problem. In the case of a linear objective, the optimal solution follows im-
mediately from examining the primal-dual pair: primal feasibility is guaranteed
by the polymatroid property — all vertices belong to the polytope, and dual
feasibility, along with complementary slackness, yields the priority indices.
Motivated by Klimov’s problem, Tsoucas [42], and Bertsimas and Niño-Mora
[3] extended conservation laws and the related polymatroid structure to GCL and
EP. The key ingredients in the conservation laws/polymatroid theory of [39]
carry over to GCL/EP. In particular, EP is defined in complete analogy
to the polymatroid Definition 2 mentioned above, via the “vertex” xπ, whereas
GCL is such that for every permutation π, xπ corresponds to a priority rule, which
thereby guarantees its membership in the performance polytope. The equivalent
definitions for EP in Definition 7 are due to Lu [34] and Zhang [52] (also see
[51]).
Dynamic scheduling of a multi-class stochastic network is a complex and
difficult problem that has continued to attract much research effort. A sample
of more recent works shows a variety of different approaches to the problem,
from Markov decision programming (e.g., Harrison [27], Weber and Stidham
[45]), monotone control of generalized semi-Markov processes (Glasserman and
Yao [24,25]), to asymptotic techniques via diffusion limits (Harrison [28], and
Harrison and Wein [29]). This chapter presents yet another approach, which is
Dynamic Scheduling via Polymatroid Optimization 111
References
1. Asmussen, S., Applied Probability and Queues. Wiley, Chichester, U.K., 1987.
2. Bertsimas, D., The Achievable Region Method in the Optimal Control of Queueing
Systems; Formulations, Bounds and Policies. Queueing Systems, 21 (1995), 337–
389.
3. Bertsimas, D. and Niño-Mora, J., Conservation Laws, Extended Polymatroid and
Multi-Armed Bandit Problems: A Unified Approach to Indexable Systems. Math-
ematics of Operations Research, 21 (1996), 257–306.
4. Bertsimas, D., Paschalidis, I.C. and Tsitsiklis, J.N., Optimization of Multiclass
Queueing Networks: Polyhedral and Nonlinear Characterization of Achievable Per-
formance. Ann. Appl. Prob., 4 (1994), 43–75.
5. Baras, J.S., Dorsey, A.J. and Makowski, A.M., Two Competing Queues with Linear
Cost: the μc Rule Is Often Optimal. Adv. Appl. Prob., 17 (1985), 186–209.
6. Buyukkoc, C., Varaiya, P. and Walrand, J., The cμ Rule Revisited. Adv. Appl.
Prob., 30 (1985), 237–238.
7. Chen, H. and Yao, D.D., Fundamentals of Queueing Networks: Performance,
Asymptotics and Optimization. Springer-Verlag, New York, 2001.
8. Chvátal, V., Linear Programming. W.H. Freeman, New York, 1983.
112 D.D. Yao
31. Kleinrock, L., Queueing Systems, Vol. 2. Wiley, New York, 1976.
32. Klimov, G.P., Time Sharing Service Systems, Theory of Probability and Its Appli-
cations, 19 (1974), 532–551 (Part I) and 23 (1978), 314–321 (Part II).
33. Lai, T.L. and Ying, Z., Open Bandit Processes and Optimal Scheduling of Queueing
Networks. Adv. Appl. Prob., 20 (1988), 447–472.
34. Lu, Y., Dynamic Scheduling of Stochastic Networks with Side Constraints. Ph.D.
Thesis, Columbia University, 1998.
35. Meilijson, I. and Weiss, G., Multiple Feedback at a Single-Server Station. Stochastic
Proc. and Appl., 5 (1977), 195–205.
36. Ross, K.W. and Yao, D.D., Optimal Dynamic Scheduling in Jackson Networks.
IEEE Transactions on Automatic Control, 34 (1989), 47–53.
37. Ross, K.W. and Yao, D.D., Optimal Load Balancing and Scheduling in a Dis-
tributed Computer System. Journal of the Association for Computing Machinery,
38 (1991), 676–690.
38. Shanthikumar, J.G. and Sumita, U., Convex Ordering of Sojourn Times in Single-
Server Queues: Extremal Properties of FIFO and LIFO Service Disciplines. J. Appl.
Prob., 24 (1987), 737–748.
39. Shanthikumar, J.G. and Yao, D.D., Multiclass Queueing Systems: Polymatroid
Structure and Optimal Scheduling Control. Operations Research, 40 (1992),
Supplement 2, S293–299.
40. Smith, W.L., Various Optimizers for Single-Stage Production. Naval Research Lo-
gistics Quarterly, 3 (1956), 59–66.
41. Tcha, D. and Pliska, S.R., Optimal Control of Single-Server Queueing Networks
and Multiclass M/G/1 Queues with Feedback. Operations Research, 25 (1977),
248–258.
42. Tsoucas, P., The Region of Achievable Performance in a Model of Klimov. IBM
Research Report RC-16543, IBM Research Division, T.J. Watson Research Center,
Yorktown Hts., New York, NY 10598, 1991.
43. Varaiya, P., Walrand, J. and Buyukkoc, C., Extensions of the Multiarmed Bandit
Problem: The Discounted Case. IEEE Trans. Automatic Control, 30 (1985), 426–
439.
44. Weber, R., On the Gittins Index for Multiarmed Bandits. Annals of Applied Prob-
ability, (1992), 1024–1033.
45. Weber, R. and Stidham, S., Jr., Optimal Control of Service Rates in Networks of
Queues. Adv. Appl. Prob., 19 (1987), 202–218.
46. Weiss, G., Branching Bandit Processes. Probability in the Engineering and Infor-
mational Sciences, 2 (1988), 269–278.
47. Welsh, D., Matroid Theory. Academic Press, London, 1976.
48. Whittle, P., Multiarmed Bandits and the Gittins Index. J. Royal Statistical Society,
Ser. B, 42 (1980), 143–149.
49. Whittle, P., Optimization over Time: Dynamic Programming and Stochastic Con-
trol, vols. I, II, Wiley, Chichester, 1982.
50. Yao, D.D. and Shanthikumar, J.G., Optimal Scheduling Control of a Flexible Ma-
chine. IEEE Trans. on Robotics and Automation, 6 (1990), 706–712.
51. Yao, D.D. and Zhang, L., Stochastic Scheduling and Polymatroid Optimization,
Lecture Notes in Applied Mathematics, 33, G. Ying and Q. Zhang (eds.), Springer-
Verlag, 1997, 333–364.
52. Zhang, L., Reliability and Dynamic Scheduling in Stochastic Networks. Ph.D. The-
sis, Columbia University, 1997.
Workload Modeling for Performance Evaluation
Dror G. Feitelson
1 Introduction
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 114–141, 2002.
© Springer-Verlag Berlin Heidelberg 2002
the arrival process. But if we consider a complete computer system, the problem
becomes more complex [13,11]. For example, a computer program may require a
certain amount of CPU time, memory, and I/O, and these resource requirements
may be interleaved in various ways during its execution. In addition, there are
several levels at which we might model the system: we can study the functional
units used by a stream of instructions, the subsystems used by a job during its
execution, or the requirements of jobs submitted to the system over time. Each
of these scales is relevant for the design and evaluation of different parts of the
system: the CPU, the hardware configuration, or the operating system.
The main domain used as a source of examples in this survey is that of paral-
lel job scheduling. Workloads in this field are interesting due to the combination
of being relatively small and at the same time relatively complex. The size of
typical workloads is tens of thousands of jobs, as opposed to millions of packets
in communication workloads. These workloads are characterized by a large num-
ber of factors, including the job sizes, runtimes, runtime estimates, and arrival
patterns. The complexity derives not only from the multiple factors themselves,
but from various correlations between them. Research on these issues is facili-
tated by the availability of data and models in the Parallel Workloads Archive
[60]. In addition, there are several documented cases of how workload parameters
influence the outcomes of performance evaluation studies [53,57,25].
2 Data Sources
The most readily available source of data is accounting or activity logs. Such
logs are kept by the system for auditing, and record selected attributes of all
activities. For example, many computer systems keep a log of all executed jobs.
In large scale parallel systems, these logs can be quite detailed and are a rich
source of information for workload studies [60]. Another example is web servers,
which are often configured to log all requests.
A good example is provided by the analysis of three months of activity on
the 128-node NASA Ames iPSC/860 hypercube supercomputer. This analysis
provided the following data [29]:
– The distribution of job sizes (in number of nodes) for system jobs, and for
user jobs classified according to when they ran: during the day, at night, or
on the weekend.
– The distribution of total resource consumption (node seconds), for the same
job classifications.
– The same two distributions, but classifying jobs according to their type:
those that were submitted directly, batch jobs, and Unix utilities.
– The changes in system utilization throughout the day, for weekdays and
weekends.
– The distribution of multiprogramming level seen during the day, at night,
and on weekends. This also included the measured down time (the special case
of a multiprogramming level of 0).
– The distribution of runtimes for system jobs, sequential jobs, and parallel
jobs, and for jobs with different degrees of parallelism. This included a con-
nection between common runtimes and the queue time limits of the batch
scheduling system.
– The correlation between resource usage and job size, for jobs that ran during
the day, at night, and over the weekend.
– The arrival pattern of jobs during the day, on weekdays and weekends, and
the distribution of interarrival times.
– The correlation between the time of day a job is submitted and its resource
consumption.
– The activity of different users, in terms of number of jobs submitted, and
how many of them were different.
– Profiles of application usage, including repeated runs by the same user and
by different users, on the same or on different numbers of nodes.
– The dispersion of runtimes when the same application is executed many
times.
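Several of the distributions in this list can be computed directly from an accounting log. The sketch below assumes a hypothetical log format of (submit time, number of nodes, runtime) records, with submit times in seconds from a Monday at midnight, and a hypothetical definition of “day” as 8:00 to 18:00 on weekdays; neither is taken from the NASA Ames analysis [29]. It classifies jobs as day, night, or weekend, collects per-class resource consumption in node seconds, and derives the interarrival times.

```python
from collections import defaultdict
from statistics import quantiles

DAY = 24 * 3600
WEEK = 7 * DAY


def job_class(submit):
    """Hypothetical rule: weekdays 8:00-18:00 are 'day'; times are seconds from Monday 0:00."""
    t = submit % WEEK
    weekday, hour = t // DAY, (t % DAY) // 3600
    if weekday >= 5:  # days 5 and 6 are Saturday and Sunday
        return "weekend"
    return "day" if 8 <= hour < 18 else "night"


def summarize(log):
    """log: list of (submit_seconds, nodes, runtime_seconds) records."""
    usage = defaultdict(list)
    for submit, nodes, runtime in log:
        usage[job_class(submit)].append(nodes * runtime)  # node seconds
    submits = sorted(rec[0] for rec in log)
    gaps = [b - a for a, b in zip(submits, submits[1:])]  # interarrival times
    return usage, gaps


log = [(9 * 3600, 32, 600), (20 * 3600, 1, 3000), (5 * DAY + 3600, 128, 60),
       (DAY + 10 * 3600, 4, 120), (DAY + 11 * 3600, 16, 900)]
usage, gaps = summarize(log)
for cls, values in sorted(usage.items()):
    # Quartiles need at least two values.
    print(cls, len(values), quantiles(values, n=4) if len(values) > 1 else values)
print("interarrival times:", gaps)
```

With a real log, the same grouping can be keyed by job type or size range instead of time of day.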
Note, however, that accounting logs do not always exist at the desired level
of detail. For example, even if all communication on a web server is logged,
this is at the request level, not at the packet level. To obtain packet-level data,
specialized instrumentation is needed.
– In the two-year log of jobs run on the SDSC Paragon parallel machine, there
is a large concentration of short jobs that arrive at 3:30 AM on different
days. This is probably due to periodic invocation of administrative scripts.
– In the two-year log of jobs run on the SDSC SP2 parallel machine, there is
a single hour in which a single user submitted some 580 similar jobs.
3 Workload Modeling
There are two common ways to use a measured workload to analyze or evaluate
a system design [32]: (1) use the traced workload directly to drive a simulation,
or (2) create a model from the trace and use the model for either analysis or
simulation. For example, trace-driven simulations based on large address traces
are often used to evaluate cache designs [45,42]. But models of how applications
traverse their address space have also been proposed, and provide interesting
insights into program behavior [71,72].
that ran on a given system during this year. A synthetic workload can then
be generated according to the model, by sampling from the distributions that
constitute the model.
The question of what exactly to model, and at what degree of detail, is a
hard one. On one hand, we want to fully characterize all important workload
attributes. On the other hand, a parsimonious model is more manageable, as
there are fewer parameters whose values need to be assessed and whose influence
needs to be studied. Also, there is a danger of over-fitting a particular workload
at the expense of generality.
[Figure 1 panels: CTC SP2, Jann model, SDSC SP2, Feitelson model. Each shows cumulative probability vs. runtime [s] (log scale, 1 to 100000) for the job-size ranges serial, 2–4, 5–8, 9–32, and >32.]
Fig. 1. Distributions of runtimes for different ranges of job sizes, in two workload logs
and two models of parallel jobs.
have used hyper-Erlang distributions to create models that match the first 3
moments of a distribution [41]. However, such summaries may be misleading,
because they may not represent the shape of the distribution correctly. Specifi-
cally, in the Jann models, the distributions become distinctly bimodal, whereas
the original data is much more continuous (Figure 1). The Feitelson model,
which uses a three-stage hyper-exponential distribution, more closely resembles
the original data in this respect.
The use of distributions with the right shape is not just an esthetic is-
sue. Some 25 years ago Lazowska showed that using models based on a hyper-
exponential distribution with matching moments to evaluate a simple queueing
system leads to inaccurate results [50], and advocated the use of distributions
with matching percentiles instead. He also noted that a hyper-exponential distri-
bution has three parameters, whereas the mean and standard deviation of data
only define two, so many different hyper-exponential distributions that match
the first two moments are possible — and lead to different results.
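Lazowska's observation is easy to reproduce. The sketch below fixes the probability p of the first branch of a two-stage hyperexponential and solves for the two branch means so that the mean and variance match given targets. The targets (mean 1, variance 4) and the two values of p are arbitrary illustrative choices. Both resulting distributions share the same first two moments but have noticeably different medians.

```python
from math import exp, sqrt


def h2_fit(mean, var, p):
    """Branch means (m1, m2) of a two-stage hyperexponential with branch
    probabilities (p, 1 - p) that match the given mean and variance.
    Requires var > mean**2 (coefficient of variation above 1)."""
    excess = var - mean ** 2
    m1 = mean + sqrt((1 - p) * excess / (2 * p))
    m2 = mean - sqrt(p * excess / (2 * (1 - p)))
    if m2 <= 0:
        raise ValueError("p is too large for these moments")
    return m1, m2


def h2_moments(p, m1, m2):
    first = p * m1 + (1 - p) * m2
    second = 2 * (p * m1 ** 2 + (1 - p) * m2 ** 2)  # E[X^2] of an exponential is 2 m^2
    return first, second - first ** 2


def h2_median(p, m1, m2):
    """Solve survival(x) = 0.5 by bisection."""
    survival = lambda x: p * exp(-x / m1) + (1 - p) * exp(-x / m2)
    lo, hi = 0.0, 100 * max(m1, m2)
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if survival(mid) > 0.5 else (lo, mid)
    return (lo + hi) / 2


for p in (0.1, 0.3):
    m1, m2 = h2_fit(1.0, 4.0, p)
    mean, var = h2_moments(p, m1, m2)
    print(f"p={p}: mean={mean:.3f} var={var:.3f} median={h2_median(p, m1, m2):.3f}")
```

The free parameter p is exactly the third degree of freedom that two moments leave undetermined.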
Table 1. Sensitivity of statistics to the largest data points. Data regarding runtimes
on the CTC SP2 machine from [23] courtesy of Allen Downey.
Another problem with using statistics based on high moments of the data is
that they are very sensitive to rare large samples [23]. Table 1 shows data based
on the runtimes of 50866 parallel jobs from the CTC SP2 machine. Removing just
the top 5 values causes the mean to drop by 2%, and the coefficient of variation
(the standard deviation divided by the mean) to drop by 29%. The median, as
a representative of order statistics, only changes by 0.2%. As the extreme values
observed in a sample are not necessarily representative, this implies that the
model may be largely governed by a small number of unrepresentative samples.
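The same effect can be illustrated without the CTC data. The sketch below uses a deterministic synthetic sample of 50866 values placed at evenly spaced quantiles of a Pareto distribution with shape 1.5 (a stand-in, not the Table 1 data) and compares the mean, coefficient of variation, and median before and after dropping the five largest values.

```python
import math
import statistics


def pareto_quantile_sample(n, a):
    """n values at the quantiles (i + 0.5)/n of a Pareto(a) distribution with x_min = 1."""
    return [(1 - (i + 0.5) / n) ** (-1 / a) for i in range(n)]


def summary(xs):
    """Mean, coefficient of variation (std / mean), and median."""
    n = len(xs)
    mean = math.fsum(xs) / n
    std = math.sqrt(math.fsum((x - mean) ** 2 for x in xs) / n)
    return mean, std / mean, statistics.median(xs)


data = sorted(pareto_quantile_sample(50866, 1.5))
full, trimmed = summary(data), summary(data[:-5])  # drop the 5 largest values
for name, before, after in zip(("mean", "CV", "median"), full, trimmed):
    print(f"{name:6s} drops by {100 * (before - after) / before:5.2f}%")
```

As with the CTC data, the coefficient of variation moves far more than the mean, and the median barely moves at all.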
Finding a distribution that matches given moments is relatively easy, be-
cause it can be done based on inverting equations that relate a distribution’s
parameters to its moments. Finding a distribution that fits a given shape is typ-
ically harder [54]. One possibility is to use a maximum likelihood method, which
finds the parameters that most likely gave rise to the observed data. Another
option is to use an iterative method, in which the goodness of fit at each stage
is quantified using the Chi-square test, the Kolmogorov-Smirnov test, or the
Table 2. Correlation coefficient of runtime and size for different parallel supercom-
puter workloads.
System Correlation
CTC SP2 −0.029
KTH SP2 0.011
SDSC SP2 0.145
LANL CM-5 0.211
SDSC Paragon 0.305
Establishing whether or not a correlation exists is not always easy. The com-
monly used correlation coefficient only yields high values if a strong linear rela-
tionship exists between the variables. In the example of the size and runtime of
parallel jobs, the correlation coefficient is typically rather small (Table 2), and a
scatter plot shows no significant correlation either (Figure 2). However, these two
attributes are actually correlated with each other, as seen from the distributions
for the CTC and SDSC logs in Figure 1. In both of these, the distribution of
runtimes for ranges of larger job-sizes distinctly favors longer runtimes, whereas
smaller job sizes favor short runtimes.¹
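The contrast between a single coefficient and per-range distributions can be computed as follows. The sketch assumes a hypothetical list of (size, runtime) pairs; it computes the Pearson correlation coefficient (the statistic reported in Table 2) and the median runtime within each of the job-size ranges used in Figure 1.

```python
import math
from statistics import median

# Job-size ranges used in Figure 1.
RANGES = [("serial", 1, 1), ("2-4", 2, 4), ("5-8", 5, 8),
          ("9-32", 9, 32), (">32", 33, math.inf)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def median_by_range(jobs):
    """jobs: list of (size, runtime); returns {range label: median runtime}."""
    out = {}
    for label, lo, hi in RANGES:
        runtimes = [r for s, r in jobs if lo <= s <= hi]
        if runtimes:
            out[label] = median(runtimes)
    return out


# Made-up data: runtimes grow with size, except one very long serial job.
jobs = [(1, 10), (1, 30), (1, 50000), (4, 60), (4, 200), (8, 300),
        (16, 900), (16, 2000), (64, 3000), (64, 5000)]
sizes, runtimes = zip(*jobs)
print("Pearson r:", round(pearson(sizes, runtimes), 3))
print("median runtime by size range:", median_by_range(jobs))
```

With this made-up data the single long serial job pushes the coefficient below zero, while the per-range medians (30, 130, 300, 1450, 4000) grow steadily with job size.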
A coarse way to model correlation, which avoids this problem altogether, is to
represent the workload as a set of points in a multidimensional space, and apply
¹ The only exceptions are the serial jobs on the CTC machine, which have very long
runtimes; but this anomaly is unique to the CTC workload.
Fig. 2. The correlation between job sizes and runtimes on parallel supercomputers.
The scatter-plot data is from the SDSC Paragon parallel machine.
clustering [13]. For example, each job can be represented by a tuple including its
runtime, its size, its memory usage, and so on. By clustering we can then select
a small number of representative jobs, and use them as the basis of our workload
model; each such job comes with a certain (representative) combination of values
for the different attributes. However, many workloads do not cluster nicely —
rather, attribute values come from continuous distributions, and many different
combinations are all possible.
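A minimal clustering sketch follows. It uses plain k-means (Lloyd's algorithm) on log-transformed (runtime, size) tuples; both the choice of k-means and the log transform are assumptions for illustration, not necessarily the method of [13]. Each centroid, transformed back, serves as a representative job, weighted by the fraction of jobs in its cluster.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assign


def representative_jobs(jobs, k):
    """jobs: list of (runtime_seconds, size); clusters in log space."""
    pts = [(math.log10(r), math.log2(s)) for r, s in jobs]
    centroids, assign = kmeans(pts, k)
    weights = [assign.count(c) / len(jobs) for c in range(k)]
    reps = [(10 ** lr, 2 ** ls) for lr, ls in centroids]
    return list(zip(reps, weights))


jobs = [(10, 1), (12, 1), (11, 2), (10000, 64), (12000, 128), (11000, 64)]
for (runtime, size), weight in representative_jobs(jobs, 2):
    print(f"runtime {runtime:.0f} s, size {size:.0f}, weight {weight:.2f}")
```

This works well only when the data really form well-separated groups, which is exactly the limitation noted above.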
The direct way to model a correlation between two attributes is to use the
joint distribution of the two attributes. This suffers from two problems. One is
that it may be hard to find an analytical distribution function
that matches the data. The other is that for a large part of the range, the data
may be very sparse. For example, most parallel jobs are small and run for a
short time, so we have a lot of data about small short jobs. But we may not
have enough data about large long jobs to say anything meaningful about the
distribution — we just have a small set of unrelated samples.
The typical solution is therefore to divide the range of one attribute into
sub-ranges, and model the distribution of the other attribute for each such sub-
range. For example, the Jann model of supercomputer workloads divides the job
size scale according to powers of two, and creates an independent model of the
runtimes for each range of sizes [41]. As can be seen in Figure 1, these models
are completely different from each other. An alternative is to use the same model
for all subranges, and define a functional dependency of the model parameters
on the subrange. For example, the Feitelson model first selects the size of each
job according to the distribution of job sizes, and then selects a runtime from a
distribution of runtimes that is conditioned on the selected size [28]. Specifically,
the runtime is selected from a two-stage hyperexponential distribution, and the
probability for using the exponential with the higher mean is linearly dependent
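on the size; equivalently, the exponential with the smaller mean is used with
probability $p = 0.95 - 0.2\,(n/N)$.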
Thus, for small jobs (the job size n is small relative to the machine size N ) the
probability of using the exponential with the smaller mean is 0.95, and for large
jobs this drops to 0.75.
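The conditional sampling step can be sketched as follows. The short-branch probability 0.95 − 0.2·n/N follows the description above (0.95 for tiny jobs, 0.75 when n = N, linear in between). The two branch means, SHORT_MEAN and LONG_MEAN, are hypothetical placeholders because their values are not given here, so this is not the published model's code.

```python
import random

SHORT_MEAN, LONG_MEAN = 100.0, 10_000.0  # hypothetical branch means, in seconds


def p_short(n, machine_size):
    """Probability of the smaller-mean exponential: 0.95 for tiny jobs, 0.75 at n = N."""
    return 0.95 - 0.2 * n / machine_size


def sample_runtime(n, machine_size, rng=random):
    """Two-stage hyperexponential runtime, conditioned on the job size n."""
    mean = SHORT_MEAN if rng.random() < p_short(n, machine_size) else LONG_MEAN
    return rng.expovariate(1.0 / mean)


rng = random.Random(1)
for n in (1, 32, 128):
    rts = [sample_runtime(n, 128, rng) for _ in range(20000)]
    print(f"size {n:3d}: mean runtime {sum(rts) / len(rts):8.1f} s")
```

Larger jobs pick the long branch more often, so the average runtime increases with size while every size class still uses the same distribution family.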
[Figure 3 panels. Left: cumulative percent vs. file size (0 to 1G) for the number of files and for disk space. Right: survival probability vs. file size, both on log scales, for Unix files and a Pareto model with a = 1.25.]
Fig. 3. The distribution of file sizes, from a 1993 survey of 12 million Unix files [40].
Left: 90% of the files are less than 16KB long, and use only some 10% of the total
disk space. Half the disk space is occupied by a very small fraction of large files. Right:
log-log complementary distribution plot, with possible Pareto model of the tail; see
Equation (2).
where F̄ (x) is the survival function (that is, F̄ (x) = 1−F (x)), and ∼ means “has
the same distribution”. This is a very strong statement. Consider an exponential
distribution. The probability of sampling a value larger than say 100 times the
mean is e−100 , which is totally negligible for all intents and purposes. But for a
Pareto distribution with a = 2, this probability is 1/40000: one in every 40000
samples will be bigger than 100 times the mean. While rare, such events can
certainly happen. When the shape parameter is a = 1.1, and the tail is heavier,
this probability increases to one in 2216 samples.
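The figures quoted above can be checked directly. The sketch assumes the Pareto form Pr[X > x] = x^(−a) for x ≥ 1, whose mean is a/(a − 1) when a > 1.

```python
import math


def exp_tail_at(multiple):
    """Pr[X > multiple * mean] for an exponential distribution."""
    return math.exp(-multiple)


def pareto_tail_at(multiple, a):
    """Pr[X > multiple * mean] for Pr[X > x] = x**-a, x >= 1 (requires a > 1)."""
    mean = a / (a - 1)
    return (multiple * mean) ** -a


print(f"exponential: {exp_tail_at(100):.3g}")
for a in (2.0, 1.1):
    print(f"Pareto a={a}: one in {1 / pareto_tail_at(100, a):.0f}")
```

For a = 2 the mean is 2, so the event is X > 200, with probability 200^(−2) = 1/40000; for a = 1.1 the mean is 11 and the threshold is 1100.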
An important characteristic of heavy tailed distributions is that some of their
moments may be undefined. Specifically, using the above definition, if a ≤ 1 the
mean will be undefined, and if a ≤ 2 the variance will be undefined. But what
does this mean? Consider a Pareto distribution with a = 1, whose probability
density is proportional to $x^{-2}$. Trying to evaluate its mean leads to
$$E[x] = \int_1^{\infty} c\,x\,\frac{1}{x^2}\,dx = c\,\ln x\,\Big|_1^{\infty}$$
so the mean is infinite. But for any finite number of samples, the mean obviously
exists. The answer is that the mean grows logarithmically with the number of
observations. However, this statement is misleading, as the running mean does
not actually resemble the log function. In fact, it grows in big jumps every time a
between, are missing from this picture.
[Figure 4: running average vs. sample size (millions, 0 to 100), with a log(x) reference curve.]
Fig. 4. Examples of the running mean of samples from a Pareto distribution. Four
plots using different random number generator seeds are shown.
large observation from the tail of the distribution is sampled, and then it slowly
decays again towards the log function (Figure 4).
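A running-mean experiment in the spirit of Figure 4 needs only a Pareto sampler. The sketch below uses Python's `random.paretovariate` with shape 1 (support starting at 1) and records the running mean at a few checkpoints. The sample sizes are much smaller than those plotted in Figure 4 so that it runs quickly, and the four seeds are arbitrary.

```python
import math
import random


def running_means(n, seed, checkpoints):
    """Running mean of Pareto(a = 1) samples (support x >= 1) at the given sample counts."""
    rng = random.Random(seed)
    total, out = 0.0, {}
    for i in range(1, n + 1):
        total += rng.paretovariate(1.0)
        if i in checkpoints:
            out[i] = total / i
    return out


checkpoints = {10**2, 10**3, 10**4, 10**5}
for seed in range(4):  # four seeds, echoing the four curves of Figure 4
    means = running_means(10**5, seed, checkpoints)
    print(f"seed {seed}:", {k: round(v, 1) for k, v in sorted(means.items())})
print("ln(n) for comparison:", {k: round(math.log(k), 1) for k in sorted(checkpoints)})
```

Different seeds typically disagree noticeably at the same sample size, since each run is dominated by whichever rare large values it happened to draw.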
The definition (1) can also be used to determine if a given data set is heavy
tailed. Taking the log of both sides, we observe that
$$\log \bar{F}(x) \approx -a \log x + \text{const} \qquad (2)$$
So plotting log F̄ (x) (the log of the fraction of observations larger than x) as a
function of log x should lead to a straight line with slope −a (this is sometimes
called a “log-log complementary distribution plot”, or LLCD, see Figure 3).
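The LLCD test translates into a least-squares fit of log F̄(x) against log x. The sketch below assumes distinct sample values, uses the fraction of observations strictly larger than each value as the empirical survival function, and skips the largest value (whose survival fraction is zero). It is checked on an evenly spaced quantile sample from a Pareto distribution with a = 1.25, a synthetic stand-in for real data.

```python
import math


def llcd_points(samples):
    """(log x, log fraction of samples larger than x); assumes distinct values."""
    xs = sorted(samples)
    n = len(xs)
    return [(math.log(x), math.log((n - 1 - i) / n))
            for i, x in enumerate(xs) if i < n - 1]


def fit_slope(points):
    """Ordinary least-squares slope of y on x."""
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


a, n = 1.25, 10_000
sample = [(1 - (i + 0.5) / n) ** (-1 / a) for i in range(n)]
print(f"LLCD slope {fit_slope(llcd_points(sample)):.3f} (expected about -{a})")
```

For measured data, the fit is usually restricted to the tail (values above some threshold), since only the tail is expected to follow the power law.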
This technique can be further improved by aggregating successive observa-
tions (that is, replacing each sequence of k observations by their sum). Distribu-
tions for which such aggregated random variables have the same distribution as
the original are called stable distributions. The Normal distribution is the only
stable distribution with finite variance. Heavy tailed distributions (according to
definition (1)) are also stable, but have an infinite variance. Thus the central
limit theorem does not apply, and the aggregated random variables do not have
a Normal distribution. Rather, they have the same heavy-tailed distribution.
This can be verified by creating LLCD plots of the aggregated samples, and
checking that they too are straight lines with the same slope as the original [19,
18]. If the distribution is not heavy tailed, the aggregated samples will tend to
be Normally distributed (the more so as the level of aggregation increases), and
the slopes of the LLCD plots will increase with the level of aggregation.
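The aggregation check can be sketched by summing non-overlapping blocks of k observations and refitting the LLCD slope to the tail of each aggregated sample. The choices below (Pareto shape 1.25 via `random.paretovariate`, an exponential sample for contrast, 100,000 values, aggregation levels 1, 10 and 100, and fitting only the largest 10% of each aggregated sample) are illustrative assumptions.

```python
import math
import random


def aggregate(xs, k):
    """Replace each run of k consecutive observations by their sum (drop the remainder)."""
    return [sum(xs[i:i + k]) for i in range(0, len(xs) - k + 1, k)]


def tail_llcd_slope(xs, top=0.1):
    """Least-squares LLCD slope over the largest `top` fraction of (distinct) values."""
    xs = sorted(xs)
    n = len(xs)
    pts = [(math.log(x), math.log((n - 1 - i) / n))
           for i, x in enumerate(xs) if n * (1 - top) <= i < n - 1]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    return (sum((x - mx) * (y - my) for x, y in pts)
            / sum((x - mx) ** 2 for x, _ in pts))


rng = random.Random(7)
heavy = [rng.paretovariate(1.25) for _ in range(100_000)]
light = [rng.expovariate(1.0) for _ in range(100_000)]
for k in (1, 10, 100):
    print(f"k={k:3d}  Pareto slope {tail_llcd_slope(aggregate(heavy, k)):7.2f}"
          f"  exponential slope {tail_llcd_slope(aggregate(light, k)):7.2f}")
```

The Pareto slopes should stay roughly constant across aggregation levels, while the exponential slopes become much steeper as k grows.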
Using these and other procedures, the following have been argued to be heavy
tailed:
– Process runtimes on general purpose workstations [51,37]. Note that this
only applies to the tail of the distribution, i.e. to processes longer than a
certain threshold. Measurements show the power to be close to 1.
  Model: $\Pr[T > t] = t^{-k}$
  Source       Tail     k
  [51] (’86)   > 3 s    1.05–1.25
  [37] (’96)   > 1 s    0.78–1.29
– File sizes on a general purpose system (Figure 3), again limited to the tail
of the distribution. There has been some discussion on whether this is best
modeled by a Pareto or a lognormal distribution, but at least some data sets
seem to fit a Pareto model better, and in any case they are highly skewed
[22].
– Various aspects of Internet traffic, specifically [62,69]
• Flow sizes
• FTP data transfer sizes
• TELNET packet interarrival times
– Various aspects of web server load, specifically [18,6]
• The tail of the distribution of file sizes on a server
• The distribution of request sizes
• The popularity of the different files (this is a Zipf distribution — see
below)
• The distribution of off times (between requests)
• The distribution of the number of embedded references in a web page
– The popularity of items (e.g. pages on the web) is often found to follow Zipf’s
Law [77], which is also a power law [7]. Assume a set of items is ordered
according to their popularity counts, i.e. according to how many times each
was selected. Zipf’s Law is that the count y is inversely proportional to the
rank r according to
$$y \approx r^{-b}, \quad b \approx 1 \qquad (3)$$
This means that there are $r \approx y^{-1/b}$ items with count larger than y, or,
normalizing by the number of items,
$$\Pr[Y > y] = C \cdot y^{-a}, \quad a = 1/b$$
the uptime of the computer; a file cannot be larger than the total available disk
space).
One simple option is to postulate a certain upper bound on the distribution,
but this does not really solve the problem because the question of where to
place the bound remains unanswered. Another option is to try fitting alternative
distributions for which all moments converge. For example, there have been
successful attempts to model file sizes using a lognormal distribution rather than
a Pareto distribution [22]. This has the additional benefit of fitting the whole
distribution rather than just the tail.
A more general approach is to use phase-type distributions, which employ
a mixture of exponentials. Consider a simple example, in which N samples are
drawn from an exponential distribution, and one additional sample is a far out-
lier. This can be modeled as a hyperexponential distribution, with probability
N/(N + 1) to sample from the main exponential, and probability 1/(N + 1) to
sample from a second exponential distribution with a mean equal to the outlier
value. In general, it is possible to construct mixtures of exponentials to fit any
observed distribution [9]. This is especially important for analytical modeling, as
distributions with infinite moments cause severe problems for such analysis. For
simulation the exact definition is somewhat less important, as long as significant
mass is concentrated in the tail.
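The outlier example corresponds to a two-branch mixture. The sketch below assumes, as an illustrative choice, that the main branch uses the mean of the N ordinary samples; with that choice, the mixture mean equals the mean of all N + 1 observations. The data values and the outlier are made up.

```python
import random


def outlier_hyperexp(samples, outlier):
    """Branches (probability, mean): the N ordinary samples and one outlier."""
    n = len(samples)
    main_mean = sum(samples) / n
    return [(n / (n + 1), main_mean), (1 / (n + 1), outlier)]


def sample_mixture(branches, rng=random):
    """Pick a branch by its probability, then draw from that exponential."""
    u, acc = rng.random(), 0.0
    for prob, mean in branches:
        acc += prob
        if u < acc:
            return rng.expovariate(1.0 / mean)
    return rng.expovariate(1.0 / branches[-1][1])  # guard against rounding in acc


data = [1.2, 0.4, 2.1, 0.9, 1.4, 0.7, 1.8, 0.5, 1.1]  # hypothetical samples
branches = outlier_hyperexp(data, 250.0)
print(branches, "mixture mean:", sum(p * m for p, m in branches))
```

A random draw from this mixture occasionally produces values near the outlier's magnitude, so simulations still see significant mass in the tail while every moment stays finite.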
Self similarity refers to situations in which a phenomenon has the same general
characteristics at different scales [56,67]. In particular, parts of the whole may
be scaled-down copies of the whole, as in well known fractals such as the Cantor
set and the Sierpiński triangle. In natural phenomena we cannot expect perfect
copies of the whole, but we can expect the same statistical properties. A well
known natural fractal is the coast of Britain [56]. Workloads often also display
such behavior.
The first demonstrations of self similarity in computer workloads were for
Internet traffic, and used a striking visual demonstration. A time series rep-
resenting the number of packets transmitted during successive time units was
recorded. At a fine granularity, i.e. when using small time units, this was seen
to be bursty. But the same bursty behavior persisted also when the time series
was aggregated over several orders of magnitude, by using larger and larger time
units. This contradicted the common Poisson model of packet arrivals, which
predicted that the traffic should average out when aggregated.
Similar demonstrations have since been done for other types of workloads.
Figure 5 gives an example from jobs arriving at a parallel supercomputer. Self
similarity has also been shown in file systems [36] and in web usage [18].
The mathematical description of self similarity is based on the notion of long-
range correlations. Actually, there are correlations at many different time scales:
self similarity implies that the workload at a certain instant is similar to the
workload at other instants at different scales, starting with a short time scale,
[Figure 5: jobs and processes arriving at a parallel supercomputer, counted per 36 sec, 6 min, 1 hr, 10 hr, and 4 days (one pair of panels per aggregation level); x-axis: time.]
through medium time scales, and up to long time scales. But the strength of the
correlation decreases as a power law with the time scale.
A model useful for understanding the correlations leading to self similarity is
provided by random walks. In a one-dimensional random walk, each step is either
to the left or to the right with equal probabilities. It is well known that after n
steps the root-mean-square distance from the origin is √n, or n^{0.5}. But what happens if
the steps are correlated with each other? If each step has a probability higher
than 1/2 of being in the same direction as the previous step, we can expect slightly
longer stretches of steps in the same direction. But this is not enough to change
how the distance from the origin grows with n: it still grows as n^{0.5}. This remains
true also if each step is correlated with all previous steps with exponentially
decreasing weights. In both these cases, the correlation only has a short range,
and the effect of each step decays to zero very quickly.
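These claims about random walks are easy to check by simulation. The sketch below uses a walk in which each step repeats the previous direction with probability q (q = 0.5 is the independent walk, q = 0.75 a short-range correlated one; both values are arbitrary) and estimates the root-mean-square distance after n steps. In both cases quadrupling n roughly doubles the distance, so it still grows as n^0.5; the correlated walk is only farther out by a constant factor.

```python
import math
import random


def rms_distance(n, q, reps=800, seed=0):
    """Root-mean-square |y_n| over `reps` walks of n unit steps, where each step
    repeats the previous direction with probability q."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        step = rng.choice((-1, 1))
        y = step
        for _ in range(n - 1):
            if rng.random() >= q:  # change direction with probability 1 - q
                step = -step
            y += step
        total += y * y
    return math.sqrt(total / reps)


for q in (0.5, 0.75):
    short, long_ = rms_distance(128, q), rms_distance(512, q)
    print(f"q={q}: rms(128)={short:.1f}  rms(512)={long_:.1f}  ratio={long_ / short:.2f}")
```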
$$y_j = \sum_{i=1}^{j} z_i$$
3. The range covered after n steps is the maximum distance that has occurred:
$$R_n = \max_{j=1,\ldots,n} y_j - \min_{j=1,\ldots,n} y_j$$
If the process is indeed self similar, we expect to see a straight line, and the
slope of the line gives H.
If a long time series is given, the calculation for small values of n is repeated
for non-overlapping sub-series of length n each, and the average is used. An
example of the results of doing so is given in Figure 6, based on the data shown
graphically in Figure 5.
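A compact version of the R/s procedure is sketched below. Because the first step of the procedure is not reproduced here, the sketch assumes the usual choice z_i = x_i − x̄ (deviations from the mean of each sub-series) and divides the range by the standard deviation s of the sub-series. It averages R/s over non-overlapping sub-series and fits the slope of log(R/s) against log n. The window sizes and the independent Gaussian test series are arbitrary choices.

```python
import math
import random


def _mean(xs):
    return math.fsum(xs) / len(xs)


def rescaled_range(xs):
    """R/s for one series, taking z_i = x_i - mean (an assumed normalization)."""
    m = _mean(xs)
    y, ys = 0.0, []
    for x in xs:
        y += x - m  # y_j = z_1 + ... + z_j
        ys.append(y)
    s = math.sqrt(_mean([(x - m) ** 2 for x in xs]))
    return (max(ys) - min(ys)) / s


def hurst_rs(series, sizes):
    """Slope of log(average R/s) against log n, averaging over non-overlapping sub-series."""
    pts = []
    for n in sizes:
        chunks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        pts.append((math.log(n), math.log(_mean([rescaled_range(c) for c in chunks]))))
    mx, my = _mean([p[0] for p in pts]), _mean([p[1] for p in pts])
    return (sum((x - mx) * (y - my) for x, y in pts)
            / sum((x - mx) ** 2 for x, _ in pts))


rng = random.Random(5)
noise = [rng.gauss(0, 1) for _ in range(2**14)]  # independent noise: H near 0.5
print("estimated H:", round(hurst_rs(noise, [2**k for k in range(4, 11)]), 2))
```

On short windows R/s is known to be biased upward, so independent noise typically gives an estimate somewhat above 0.5.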
Other ways of checking for self similarity are based on the rate at which the
variance decays as observations are aggregated, or on the decay of the spectral
density, possibly using wavelet analysis [1]. Results of the Variance-time method
are also shown in Figure 6. This is based on aggregating the original time series
(that is, replacing each m consecutive values by their average) and calculating
the variance of the new series. This decays polynomially with a rate of −β,
leading to a straight line with this slope in log-log axes. The Hurst parameter is
then given by
$$H = 1 - \beta/2$$
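The variance-time method can be sketched in a few lines. The sketch averages non-overlapping blocks of m values, fits the slope −β of log variance against log m, and returns H = 1 − β/2. The aggregation levels and the test series (independent Gaussian noise, for which H should come out near 0.5) are illustrative assumptions.

```python
import math
import random


def aggregate_mean(xs, m):
    """Replace each block of m consecutive values by their average."""
    return [sum(xs[i:i + m]) / m for i in range(0, len(xs) - m + 1, m)]


def variance(xs):
    mean = math.fsum(xs) / len(xs)
    return math.fsum((x - mean) ** 2 for x in xs) / len(xs)


def hurst_variance_time(series, levels):
    """Fit log variance vs. log m; the slope is -beta and H = 1 - beta / 2."""
    pts = [(math.log(m), math.log(variance(aggregate_mean(series, m)))) for m in levels]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    beta = -slope
    return 1 - beta / 2


rng = random.Random(11)
noise = [rng.gauss(0, 1) for _ in range(2**15)]
print("H =", round(hurst_variance_time(noise, [2**k for k in range(0, 9)]), 3))
```

For independent data the variance of block averages falls as 1/m, so β is close to 1 and H close to 0.5; long-range dependence makes the variance fall more slowly.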
[Figure 6 panels. Left: average R/s vs. n; right: average variance vs. aggregate size m; both log-log, for SDSC jobs, SDSC processes, and a Poisson reference, annotated with the fitted H values (and β for the variance-time panel).]
Fig. 6. The (R/s)n and variance-time methods for measuring self similarity, applied to
the data in Figure 5. A Poisson process with no self-similarity is included as reference,
as well as linear regression lines.
Heavy tailed distributions and self similarity are intimately tied to each other,
and the modeling of self-similar workloads depends on this. As noted above,
self similarity is a result of long-range correlation in the workload. By using
heavy tailed distributions to create a workload model with the desired long
range correlation, we get a model that also displays self similarity.
The idea is that the workload is not uniform, but rather generated by mul-
tiple on-off processes [75,18,36]. “On” periods are active periods, in which the
workload arrives at the system at a certain rate (jobs per hour, packets per second,
etc.). “Off” periods are inactive periods during which no load is generated.
The complete workload is the result of many such on-off processes.
The crux of the model is the distributions governing the lengths of the on and
off periods. If these distributions are heavy tailed, we get long-range correlation:
if a unit of work arrives at time t, similar units of work will continue to arrive for
the duration d of the on period to which it belongs, leading to a correlation with
subsequent times up to t + d. As this duration is heavy tailed, the correlation
created by this burst will typically be for a short d; but occasionally a long on
period will lead to a correlation over a long span of time. As many different
bursts may be active at time t, what we actually get is a combination of such
correlations for durations that correspond to the distribution of the on periods.
But this is heavy tailed, so we get a correlation that decays polynomially — a
long range dependence.
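The on-off construction can be sketched as follows. This is an illustrative model under assumptions of our own: time is discrete, each source emits one unit of work per step while on, and both on and off period lengths are drawn from a Pareto distribution with shape α = 1.5 and minimum 1. The helper names `pareto` and `on_off_workload` are hypothetical.

```python
import random


def pareto(alpha, xmin, rng):
    """Draw a Pareto(alpha) value >= xmin by inverse-transform sampling."""
    u = 1.0 - rng.random()  # u in (0, 1], avoids division by zero
    return xmin / u ** (1.0 / alpha)


def on_off_workload(n_sources, n_steps, alpha=1.5, seed=0):
    """Aggregate load per time step from n_sources independent on-off sources.

    Each source alternates between 'on' periods (one unit of work per step)
    and 'off' periods (no work); both period lengths are heavy tailed.
    """
    rng = random.Random(seed)
    load = [0] * n_steps
    for _ in range(n_sources):
        t = 0
        on = rng.random() < 0.5  # random initial state
        while t < n_steps:
            length = max(1, round(pareto(alpha, 1.0, rng)))
            if on:
                for step in range(t, min(t + length, n_steps)):
                    load[step] += 1
            t += length
            on = not on
    return load


if __name__ == "__main__":
    load = on_off_workload(n_sources=50, n_steps=10_000)
    print(min(load), max(load), sum(load) / len(load))
```

Applying a Hurst estimator such as the variance-time method to the resulting series should give H clearly above 0.5; the theory for this construction (with on and off periods sharing a tail index α between 1 and 2) predicts H = (3 − α)/2, or 0.75 here. Replacing the Pareto draws with exponential ones should bring the estimate back toward 0.5.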
In some cases, this type of behavior is built in, and a direct result of the
heavy tailed nature of certain workload parameters. For example, given that
web server file sizes are heavy tailed, the distribution of service times will also
be heavy tailed (as the time to serve a file is proportional to its size). During the
time a file is served, data is transmitted at a constant rate. This is correlated
with later transmittals according to the heavy-tailed distribution of sizes and
transmission times, leading to long range correlation and self similarity [18].
134 D.G. Feitelson
The on-off process used for modeling self-similar workloads has another very
important benefit. It provides a mechanism for introducing locality into the
workload, so that not only the statistics will be modeled, but also the dynamics.
The procedure for workload modeling outlined in Section 3.2 was to analyze real
workloads, recover distributions that characterize them, and then sample from
these distributions. The main problem with this procedure is that it loses all
structural information.
A real workload is not a random sampling from a distribution. For example,
the load on a server used by students at a university changes from week to week,
depending on the assignments that are due each time. In each week, everybody
is working on the same task, so the workload is composed of many jobs that are
statistically similar. The next week all the jobs are similar to each other again,
but they are all different from the jobs of the previous week. Over the whole year
we indeed observe a wide distribution with many job types, but at any given
time we do not see a representative sampling of this distribution. Instead, we
only see samples concentrated in a small part of the distribution (Figure 7). The
workload displays a “locality of sampling”³.
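A toy generator makes the idea of locality of sampling concrete. Every parameter here is an assumption for illustration only: a pool of 64 possible job sizes, a new “assignment” each week during which users pick from only 4 of them, 50 jobs per day, and 28-day months. Counting the distinct sizes seen in windows of different lengths reproduces, qualitatively, the growth with observation window shown in Figure 7.

```python
import random


def local_workload(days, jobs_per_day, pool, active_per_week, seed=0):
    """Return a list of days, each a list of job sizes.

    At the start of each week a small random subset of `pool` becomes the
    active set, and every job that week draws its size from that subset.
    """
    rng = random.Random(seed)
    jobs = []
    active = []
    for day in range(days):
        if day % 7 == 0:  # a new assignment starts each week
            active = rng.sample(pool, active_per_week)
        jobs.append([rng.choice(active) for _ in range(jobs_per_day)])
    return jobs


def avg_distinct(jobs, window_days):
    """Average number of distinct job sizes per non-overlapping window."""
    counts = []
    for start in range(0, len(jobs) - window_days + 1, window_days):
        window = jobs[start:start + window_days]
        counts.append(len({size for day in window for size in day}))
    return sum(counts) / len(counts)


if __name__ == "__main__":
    jobs = local_workload(days=364, jobs_per_day=50,
                          pool=list(range(1, 65)), active_per_week=4)
    for name, w in (("day", 1), ("week", 7), ("month", 28), ("all", 364)):
        print(f"{name:>5}: {avg_distinct(jobs, w):.1f} distinct job sizes")
```

Over the whole year almost every size in the pool appears, yet any single day or week shows only a handful of them.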
[Figure 7 plots: average different users (left) and average different job sizes (right) versus observation window (day, week, month, quarter, all), for workload logs from CTC SP2, LANL CM-5, SDSC Paragon, and SDSC SP2.]
Fig. 7. The dynamics of workloads. Left: the active set of users grows with the obser-
vation window. Right: so does the diversity of the workload, in this case represented
by the number of different job sizes observed. Note that the x scale is not linear.
The common way to model workload dynamics is with a user behavior graph
[31]. This is a graph whose nodes represent states. In each state, the user executes a certain job with characteristics drawn from a certain distribution. The
arcs denote the probability of moving from state to state. The graph therefore
encodes a Markovian model of the workload dynamics. A random walk on the
graph, subject to the model’s transition probabilities, creates a random workload
sequence such that the probability of each job matches the limiting probability
of that job’s state, but it also abides by the model of which jobs come after each
other, and how many times a job may be repeated (using self-looping arcs in the
graph) [64]. However, this needs to be adjusted in order to create heavy tailed
distributions.

³ The existence of such local repetitiveness in workloads was suggested to me by Larry Rudolph over ten years ago.
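A random walk on a user behavior graph takes only a few lines. The three-state graph below (states `edit`, `compile`, and `run`, with made-up transition probabilities) is a hypothetical example, not a model fitted to any real workload. Over a long walk the fraction of jobs generated in each state approaches that state's limiting probability (roughly 0.47, 0.29, and 0.24 for this graph), while the order of jobs follows the arcs. A self-loop with probability p yields a geometrically distributed number of consecutive repetitions, which is why the basic model must be adjusted to obtain heavy tails.

```python
import random

# Hypothetical user behavior graph: each state is the kind of job the user
# submits there; arcs give transition probabilities (each row sums to 1).
# Self-loops ("edit" -> "edit", etc.) model repeated jobs.
GRAPH = {
    "edit":    {"edit": 0.5, "compile": 0.5},
    "compile": {"compile": 0.2, "edit": 0.3, "run": 0.5},
    "run":     {"run": 0.4, "edit": 0.6},
}


def random_walk(graph, start, n_jobs, seed=0):
    """Generate a sequence of n_jobs job types by a random walk on graph."""
    rng = random.Random(seed)
    state, sequence = start, []
    for _ in range(n_jobs):
        sequence.append(state)
        targets = list(graph[state])
        weights = [graph[state][t] for t in targets]
        state = rng.choices(targets, weights=weights)[0]
    return sequence


if __name__ == "__main__":
    seq = random_walk(GRAPH, "edit", 100_000)
    for s in GRAPH:
        print(s, round(seq.count(s) / len(seq), 3))
```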
In a university it may be plausible to argue that all students should be
modeled using the same user behavior graph. But in a production environment
one would expect different users, with different levels of activity and different
behaviors. In addition, the active population changes with time (Figure 7) [23].
Thus what we actually need is not one user behavior graph, but a model of the
user population as a whole: how the population of users changes, and what user
behavior graph each one should have. Using such a model has two important
advantages. First, it has built-in support for generating self-similar workloads
(assuming users have long-tailed on and off activity times). Second, it provides a
good way to control load without modifying the underlying distributions: simply
change the number of users [6].
Another aspect of user behavior, which is not captured by the user behav-
ior graph, is the feedback from the system performance to the generation of
new work. Real users are not oblivious to the system’s behavior: They typically
submit additional work only when existing work is finished. Thus, if the user
population is bounded, the system’s current performance modulates the offered
load, automatically reducing it when congestion occurs, and spreading the load
more evenly over time. But adding this integrates the workload model with the
system, and prevents the use of an independent workload model.
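The feedback effect can be illustrated with a closed queueing model in which a fixed population of users alternates between thinking and waiting for one job at a single server. The sketch below solves such a model with exact mean value analysis (MVA), a standard technique that we bring in for illustration; the single-server structure, the parameter names, and the numbers are our assumptions. As the population grows, throughput approaches but never exceeds 1/demand, and the extra users show up as longer response times rather than as additional offered load.

```python
def closed_mva(n_users, demand, think_time):
    """Exact MVA for a closed model: n_users cycle between thinking
    (average think_time) and one queueing server (service demand `demand`).

    Returns (throughput, response_time) for the full population.
    """
    queue = 0.0  # mean queue length with one fewer user
    throughput = response = 0.0
    for n in range(1, n_users + 1):
        # Arrival theorem: an arriving job sees the queue of an n-1 population.
        response = demand * (1.0 + queue)
        # Interactive response time law: X = n / (R + Z).
        throughput = n / (response + think_time)
        # Little's law applied to the server.
        queue = throughput * response
    return throughput, response


if __name__ == "__main__":
    for users in (1, 5, 10, 20, 50):
        x, r = closed_mva(users, demand=1.0, think_time=9.0)
        print(f"{users:3d} users: throughput {x:.3f} jobs/s, response {r:.2f} s")
```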
6 Conclusions
Performance evaluation depends on workload modeling. We have outlined the
conceptual framework of such modeling, starting with simple statistical charac-
terization, continuing with the handling of self similarity, and ending with the
need to also model user behavior. But all this is useless without real measured
data from which distributions and parameters can be learned. One of the most
important tasks is to collect large amounts of high resolution data about the
behavior of workloads, and to share this data to facilitate the creation of better
workload models.
Apart from collecting data, there are also many methodological issues that
beg for additional work. These include techniques to analyze and characterize
workloads, evaluations of the relative importance of different workload parame-
ters, and demonstrations of how workloads affect system performance. In all of
these, emphasis should be placed on the dynamics of workloads. And as with the
workload data, it is important to share the programs that perform the analysis
and implement the models — both to facilitate the dissemination and use of new
techniques, and to help ensure that researchers use compatible methodologies.
References
18. M. E. Crovella and A. Bestavros, “Self-similarity in world wide web traffic: evi-
dence and possible causes”. In SIGMETRICS Conf. Measurement & Modeling of
Comput. Syst., pp. 160–169, May 1996.
19. M. E. Crovella and M. S. Taqqu, “Estimating the heavy tail index from scaling
properties”. Methodology & Comput. in Applied Probability 1(1), pp. 55–79, Jul
1999.
20. R. Cypher, A. Ho, S. Konstantinidou, and P. Messina, “A quantitative study of par-
allel scientific applications with explicit communication”. J. Supercomput. 10(1),
pp. 5–24, 1996.
21. A. B. Downey, “A parallel workload model and its implications for processor allo-
cation”. In 6th Intl. Symp. High Performance Distributed Comput., Aug 1997.
22. A. B. Downey, “The structural cause of file size distributions”. In 9th Modeling,
Anal. & Simulation of Comput. & Telecomm. Syst., Aug 2001.
23. A. B. Downey and D. G. Feitelson, “The elusive goal of workload characterization”.
Performance Evaluation Rev. 26(4), pp. 14–29, Mar 1999.
24. A. Erramilli, U. Narayan, and W. Willinger, “Experimental queueing analysis
with long-range dependent packet traffic”. IEEE/ACM Trans. Networking 4(2),
pp. 209–223, Apr 1996.
25. D. G. Feitelson, Analyzing the Root Causes of Performance Evaluation Results.
Technical Report 2002–4, School of Computer Science and Engineering, Hebrew
University, Mar 2002.
26. D. G. Feitelson, “The forgotten factor: facts”. In EuroPar, Springer-Verlag, Aug
2002. Lect. Notes Comput. Sci.
27. D. G. Feitelson, “Memory usage in the LANL CM-5 workload”. In Job Scheduling
Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 78–94,
Springer-Verlag, 1997. Lect. Notes Comput. Sci. vol. 1291.
28. D. G. Feitelson, “Packing schemes for gang scheduling”. In Job Scheduling Strate-
gies for Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 89–110,
Springer-Verlag, 1996. Lect. Notes Comput. Sci. vol. 1162.
29. D. G. Feitelson and B. Nitzberg, “Job characteristics of a production parallel sci-
entific workload on the NASA Ames iPSC/860”. In Job Scheduling Strategies for
Parallel Processing, D. G. Feitelson and L. Rudolph (eds.), pp. 337–360, Springer-
Verlag, 1995. Lect. Notes Comput. Sci. vol. 949.
30. D. G. Feitelson and L. Rudolph, “Metrics and benchmarking for parallel job
scheduling”. In Job Scheduling Strategies for Parallel Processing, D. G. Feitel-
son and L. Rudolph (eds.), pp. 1–24, Springer-Verlag, 1998. Lect. Notes Comput.
Sci. vol. 1459.
31. D. Ferrari, “On the foundation of artificial workload design”. In SIGMETRICS
Conf. Measurement & Modeling of Comput. Syst., pp. 8–14, Aug 1984.
32. D. Ferrari, “Workload characterization and selection in computer performance
measurement”. Computer 5(4), pp. 18–24, Jul/Aug 1972.
33. K. Ferschweiler, M. Calzarossa, C. Pancake, D. Tessera, and D. Keon, “A commu-
nity databank for performance tracefiles”. In Euro PVM/MPI, Y. Cotronis and
J. Dongarra (eds.), pp. 233–240, Springer-Verlag, 2001. Lect. Notes Comput. Sci.
vol. 2131.
34. R. Gibbons, “A historical application profiler for use by parallel schedulers”. In
Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph
(eds.), pp. 58–77, Springer-Verlag, 1997. Lect. Notes Comput. Sci. vol. 1291.
35. S. D. Gribble and E. A. Brewer, “System design issues for internet middleware
services: deductions from a large client trace”. In Symp. Internet Technologies and
Systems, USENIX, Dec 1997.
1 Introduction
Performance, around-the-clock availability, and security are the most common
indicators of quality of service on the Internet. Management faces a twofold
challenge. On the one hand, it has to meet customer expectations in terms of
quality of service. On the other hand, companies have to keep IT costs under
control to stay competitive. Many possible alternative architectures can be used
to implement a Web service; one has to be able to determine the most cost-
effective architecture and system. This is where the quantitative approach and
capacity planning techniques come into play. This tutorial introduces capacity
planning [19,1] as an essential tool for managing quality of service on the Web
and presents a methodology, where the main steps are: understanding the envi-
ronment, characterizing the workload, modeling the workload, validating and cal-
ibrating the models, predicting the performance, analyzing the cost-performance
plans, and suggesting actions. It provides a framework for planning the capacity
of Web services and understanding their behavior. The tutorial also discusses a
state transition graph called Customer Behavior Model Graph (CBMG), that is
used to describe the behavior of groups of users who exhibit similar navigational
patterns. The rest of the paper is organized as follows. Section two presents the
main steps of the capacity planning methodology. Section three discusses the
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 142–157, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Capacity Planning for Web Services 143
Planning the capacity of Web services requires that a series of steps be followed
in a systematic way. Figure 1 gives an overview of the main steps of the quanti-
tative approach to analyze Web services. The starting point of the process is the
business model and its measurable objectives, which are used to establish service
level goals and to find out the applications that are central to the goals. Once
the business model and its quantitative objectives have been understood, one
is able to go through the quantitative analysis cycle. We now cover the various
steps of the capacity planning process.
The first step entails obtaining an in-depth understanding of the service architec-
ture. This means answering questions such as: What are the system requirements
of the business model? What is the configuration of the site in terms of servers
and internal connectivity? How many internal layers are there in the site? What
types of servers (e.g., HTTP, database, authentication, streaming media) is the
site running? What type of software (e.g., operating system, HTTP server software, transaction monitor, DBMS) is used in each server machine? How reliable
and scalable is the architecture? This step should yield a systematic descrip-
tion of the Web environment, its components, and services. This initial phase of
the process consists of learning what kind of hardware and software resources,
network connectivity, and network protocols are present in the environment. It
also involves the identification of peak usage periods, management structures,
and service-level agreements. This information is gathered by various means in-
cluding user group meetings, audits, questionnaires, help desk records, planning
documents, interviews, and other information-gathering techniques [14].
Table 1 summarizes the main elements of a system that must be catalogued
and understood before the remaining steps of the methodology can be taken.
Element                     Description
Web Server                  Quantity, type, configuration, and function.
Application Server          Quantity, type, configuration, and function.
Database Server             Quantity, type, configuration, and function.
Middleware                  Type (e.g., TP monitors and DBMS).
Application                 Main applications.
Network connectivity        Network connectivity diagram showing LANs, WANs, routers, servers, etc.
Network protocols           List of protocols used.
Service-level agreements    Existing SLAs per application or service.
User Community              Number of potential users, geographic location, etc.
Procurement procedures      Elements of the procurement process, expenditure limits.
production and operations management [12]. For example, one could forecast
number and type of employees, volume and type of production, product de-
mand, volume and destination of products. In the Internet, demand forecasting
is essential for guaranteeing quality of service. It is critical for the operation of
Web services. Let us consider the following scenario [16]. Unprecedented demand
for the newest product slows Web servers to a crawl. The company servers were
overwhelmed on Tuesday as a wave of customers attempted to download the
company’s new software product. Web services, in terms of responsiveness and
speed, started degrading as more and more customers tried to access the service.
And it is clear that many frustrated customers simply stopped trying. This un-
desirable scenario emphasizes the importance of good forecasting and planning
for Web environments.
A good forecast is more than just a single number; it is a set of scenarios and
assumptions. Time plays a key role in the forecasting process. The longer the
time horizon, the less accurate the forecast will be. Forecasting horizons can be
grouped into three classes: short term (e.g., less than three months), intermediate
term (e.g., from three months to one year) and long term (e.g., more than 2
years). Demand forecasting in the Web can be illustrated by typical questions
that come up very often during the course of capacity planning projects. Can
we forecast the number of visitors to the company’s Web site in order to plan
the adequate capacity to support the load? What is the expected workload for
the credit card authorization service during the Christmas season? How will the
number of messages processed by the e-mail servers vary over the next year?
What will be the number of simultaneous users for the streaming media ser-
vices six months from now? Implementation of Web services should rely on a
careful planning process, a planning process that pays attention to performance
and capacity right from the beginning. The goal of this step is to use existing
forecasting methods and techniques to predict future workload for Web services.
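As one concrete instance of such forecasting methods, the sketch below fits a straight-line trend to a history of observations by least squares and extrapolates it. Linear trend extrapolation is our choice for illustration, not a method prescribed here, and the monthly visitor counts are invented. Seasonal effects such as the Christmas peak mentioned above would call for seasonal models, and in keeping with the scenario-based view of forecasting, the output should be treated as one scenario among several.

```python
def linear_trend_forecast(history, horizon):
    """Fit y = a + b*t to history (t = 0, 1, ...) by least squares and
    return forecasts for t = len(history), ..., len(history) + horizon - 1."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


if __name__ == "__main__":
    # Invented monthly visitor counts (thousands), for illustration only.
    visits = [120, 126, 135, 141, 150, 158]
    print(linear_trend_forecast(visits, horizon=6))
```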
the resource usage may be grouped into workload classes. Depending on the way
a given class is processed by a system, it may be classified as one of two types:
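open or closed. Requests of an open class arrive from outside the system at a given rate, whereas a closed class has a fixed number of customers circulating within the system.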
Servers, or service centers, are components of performance models intended
to represent the resources of a system. The first step in specifying a model is
the definition of the servers that make up the model. The scope of the capacity
planning project helps to select which servers are relevant to the performance
model. Consider the case of a Web site composed of Web servers, application and
database servers connected via a LAN. The capacity planner wants to examine
the impact caused on the system by the estimated growth of sales transactions.
The specific focus of the project may be used to define the components of a per-
formance model. For example, the system under study could be well represented
by an open queueing network model consisting of queues, which correspond to
the servers of the site. A different performance model, with other queues, would
be required if the planner were interested in studying the effect of a proxy cache
on the performance of the system.
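For the open model just mentioned, the standard single-class open queueing network formulas give each server's utilization as U_i = λD_i and its residence time as D_i/(1 − U_i), where λ is the transaction arrival rate and D_i the service demand at server i. The sketch below applies them, assuming Poisson arrivals and exponential service at single-server queues (a product-form network); the server names and demand values are hypothetical. Raising the arrival rate to reflect the estimated growth of sales transactions shows which server saturates first.

```python
def open_qn(arrival_rate, demands):
    """Solve a single-class open queueing network of single-server queues.

    arrival_rate: transactions per second (lambda).
    demands: maps server name -> service demand D_i in seconds per transaction.
    Returns (utilizations, response_time).
    """
    utilizations, response_time = {}, 0.0
    for server, d in demands.items():
        u = arrival_rate * d  # utilization law: U_i = lambda * D_i
        if u >= 1.0:
            raise ValueError(f"{server} saturated (U = {u:.2f})")
        utilizations[server] = u
        response_time += d / (1.0 - u)  # residence time at server i
    return utilizations, response_time


if __name__ == "__main__":
    # Hypothetical service demands (seconds per sales transaction).
    demands = {"web": 0.05, "app": 0.08, "db": 0.10}
    for rate in (2.0, 5.0, 8.0):
        util, r = open_qn(rate, demands)
        print(rate, {s: round(u, 2) for s, u in util.items()}, round(r, 3))
```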
[Figure: User Model, Workload Model, and Performance Model, with what-if questions regarding impacts of user behavior, and what-if questions regarding impacts of workload, architecture, and configuration changes; metrics: response time, throughput, availability.]
for investing in the necessary upgrades. The personnel plan determines what
changes in the support personnel size and structure must be made in order to
accommodate changes in the system.
ous e-business functions, and times between access to the various services offered
by the site. A customer model can be useful for navigational and workload pre-
diction.
4 Workload Models
A workload model is a representation that mimics the real workload under study.
Although each system may require a specific approach to characterize and gen-
erate a workload model, there are some general guidelines that apply well to all
types of systems [4]. The common steps to be followed by any workload charac-
terization include: (1) specification of a point of view from which the workload
will be analyzed, (2) choice of the set of parameters that captures the most rel-
evant characteristics of the workload for the purpose of capacity planning, (3)
monitoring the system to obtain the raw performance data, (4) analysis and
reduction of performance data, (5) construction of a workload model, and (6)
verification that the model does capture all the important performance informa-
tion.
Graphs are also used to represent workloads. For example, a graph-based
model can be used to characterize Web sessions and generate information for
constructing workload models. This section concentrates on models that represent the behavior of users (i.e., customers). User models capture elements of
user behavior in terms of navigational patterns, Web service functions used,
frequency of access to the various functions, and times between access to the
various services offered by the site. Two different types of models are commonly
used in the capacity planning methodology.
152 V.A.F. Almeida
[Figure 3: CBMG with states Entry (1), Browse (2), Search (3), Select (4), Add to Cart (5), and Pay (6); arcs are labeled with transition probabilities.]
Consider the CBMG of Figure 3. This CBMG has seven states; the Exit state,
state seven, is not explicitly represented in the figure. Let $V_j$ be the average
number of times that state $j$ of the CBMG is visited for each visit to the e-business site, i.e., for each visit to the state Entry. Consider the Add to Cart
state. We can see that the average number of visits ($V_{\text{Add}}$) to this state is equal
to the average number of visits to the state Select ($V_{\text{Select}}$) multiplied by the
probability (0.2) that a customer will go from Select to Add to Cart. We can then
write the relationship

$$V_{\text{Add}} = V_{\text{Select}} \times 0.2 \tag{1}$$
Consider now the Browse state. The average number of visits ($V_{\text{Browse}}$) to this
state is equal to the average number of visits to state Search ($V_{\text{Search}}$) multiplied
by the probability (0.2) that a customer will go from Search to Browse, plus the
average number of visits to state Select ($V_{\text{Select}}$) multiplied by the probability
(0.30) that a customer will go from Select to Browse, plus the average number
of visits to the state Add to Cart ($V_{\text{Add}}$) multiplied by the probability (0.25)
that a customer will go from Add to Cart to Browse, plus the average number
of visits to the state Browse ($V_{\text{Browse}}$) multiplied by the probability (0.30) that
a customer will remain in the Browse state, plus the average number of visits to the
Entry state multiplied by the probability (0.5) of going from the Entry state to
the Browse state. Hence,

$$V_{\text{Browse}} = V_{\text{Search}} \times 0.2 + V_{\text{Select}} \times 0.30 + V_{\text{Add}} \times 0.25 + V_{\text{Browse}} \times 0.30 + V_{\text{Entry}} \times 0.5 \tag{2}$$

In general, the average number of visits to state $j$ is

$$V_j = \sum_{k=1}^{n-1} V_k \times p_{k,j}, \tag{3}$$
where $p_{k,j}$ is the probability that a customer makes a transition from state $k$
to state $j$. Note that the summation in Eq. (3) does not include state $n$ (the
Exit state) since there are no possible transitions from this state to any other
state. Since $V_1 = 1$ (because state 1 is the Entry state), we can find the average
number of visits $V_j$ by solving the system of linear equations

$$V_1 = 1 \tag{4}$$

$$V_j = \sum_{k=1}^{n-1} V_k \times p_{k,j}, \qquad j = 2, \ldots, n-1. \tag{5}$$

Note that $V_n = 1$ since, by definition, the Exit state is only visited once per
session.
Useful metrics can be obtained from the CBMG. Once we have the average
number of visits ($V_j$) to each state of the CBMG, we can obtain the average
session length as

$$\text{AverageSessionLength} = \sum_{j=2}^{n-1} V_j. \tag{6}$$
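Equations (4) and (5) form a linear system that is easy to solve numerically. The sketch below does so by Gaussian elimination and then applies Eq. (6). Because the full transition matrix of Figure 3 is not reproduced here, it uses a hypothetical four-state CBMG (Entry, Browse, Search, Exit) with made-up probabilities; the function names are ours, and states are indexed from 0, so index 0 corresponds to state 1 (Entry) in the equations.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def visit_ratios(p):
    """Average visits per session from CBMG transition matrix p.

    Index 0 is Entry and index n-1 is Exit. Builds V_0 = 1 (Eq. (4)) and
    V_j - sum_{k < n-1} V_k * p[k][j] = 0 for j >= 1 (Eq. (5), plus Exit).
    """
    n = len(p)
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    a[0][0], b[0] = 1.0, 1.0  # one visit to Entry per session
    for j in range(1, n):
        a[j][j] = 1.0
        for k in range(n - 1):  # Exit state excluded from the sum
            a[j][k] -= p[k][j]
    return solve(a, b)


def average_session_length(v):
    """Eq. (6): sum of visits over all states except Entry and Exit."""
    return sum(v[1:-1])


# Hypothetical CBMG: Entry, Browse, Search, Exit (each row sums to 1).
P = [
    [0.0, 0.5, 0.5, 0.0],  # Entry
    [0.0, 0.3, 0.3, 0.4],  # Browse
    [0.0, 0.2, 0.4, 0.4],  # Search
    [0.0, 0.0, 0.0, 0.0],  # Exit (no outgoing transitions)
]

if __name__ == "__main__":
    v = visit_ratios(P)
    print([round(x, 3) for x in v], average_session_length(v))
```

For this hypothetical graph the visit ratios are 1, 10/9, 25/18, and 1, so the average session length is 2.5; note that the Exit state is visited exactly once, as the text requires.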
For the visit ratios of Fig. 3, the average session length is
5 Performance Models
Performance models represent the way a system’s resources are used by the workload and capture the main factors determining system performance. These models use information provided by workload models and system architecture description. Performance models are used to compute both traditional performance
metrics such as response time, throughput, utilization, and mean queue length
as well as innovative business-oriented performance metrics, such as revenue
throughput or lost-revenue throughput. Basically, performance models can be
grouped into two categories: analytic and simulation models. Performance mod-
els help us understand the quantitative behavior of complex systems, such as
electronic business applications, e-government, and entertainment. Performance
models have been used for multiple purposes in systems.
– In the infrastructure design of Web-based applications, various issues call for
the use of models to evaluate system alternatives. For example, a distributed
Web server system is any architecture consisting of multiple Web server hosts
distributed on a LAN, with some sort of mechanism to distribute incoming
requests among the servers. So, for a specific type of workload, what is the
most effective scheme for load balancing in a certain distributed Web server
system? Models are also useful for analyzing document replacement policies
in caching proxies. Bandwidth capacity of certain network links can also be
estimated by performance models. In summary, performance models are an
essential tool for studying resource allocation problems in the context of Web
services.
– Most Web-based applications operate in multi-tiered environments. Models
can be used to analyze performance of distributed applications running on
three-tiered architectures, composed of Web servers, application servers and
database servers.
– Performance tuning of complex applications is a huge territory. When a
Web-based application presents performance problems, a mandatory step to
solve them is to tune the underlying system. This means measuring the
system and trying to identify the sources of performance problems: application
Parameters for queuing network (QN) models are divided into the following
categories. (1) System parameters specify the characteristics of a system that
affect performance. Examples include load-balancing disciplines for Web server
mirroring, network protocols, maximum number of connections supported by a
Web server, and maximum number of threads supported by the database man-
agement system. (2) Resource parameters describe the intrinsic features of a
resource that affect performance. Examples include disk seek times, latency and
transfer rates, network bandwidth, router latency, and processor speed ratings.
(3) Workload parameters are derived from workload characterization and
are divided into two types: workload intensity and service demand. Workload intensity parameters provide a measure of the load placed on the system, indicated
by the number of units of work that contend for system resources. Examples in-
clude the number of requests/sec submitted to the database server and number
of sales transactions submitted per second to the credit card service. Workload
service demand parameters specify the total amount of service time required by
each basic component at each resource. Examples include the processor time of
transactions at the database server, the total transmission time of replies from
the database server and the total I/O time at the streaming media server.
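Service demands are rarely measured directly. A common way to obtain them from monitoring data is the Service Demand Law, D_i = U_i / X_0, where U_i is the measured utilization of resource i and X_0 the measured system throughput over the same interval; the largest demand also bounds throughput from above, X_0 ≤ 1/max_i D_i. Both are standard operational-analysis results that we bring in here for illustration; the resource names and measurements below are invented.

```python
def service_demands(utilizations, throughput):
    """Service Demand Law: D_i = U_i / X_0.

    utilizations maps each resource to its measured utilization U_i (0..1)
    over an observation interval; throughput is the measured system
    throughput X_0 (transactions per second) over the same interval.
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return {res: u / throughput for res, u in utilizations.items()}


def max_throughput(demands):
    """Upper bound on throughput set by the bottleneck: 1 / max_i D_i."""
    return 1.0 / max(demands.values())


if __name__ == "__main__":
    # Invented measurements for a database server over one interval.
    measured = {"db_cpu": 0.60, "db_disk": 0.35}
    d = service_demands(measured, throughput=20.0)  # 20 transactions/s
    print(d, max_throughput(d))
```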
6 Concluding Remarks
Capacity planning techniques are needed to avoid the pitfalls of inadequate ca-
pacity and to meet users’ performance expectations in a cost-effective manner.
This tutorial provides the foundations required to carry out capacity planning
studies. Planning the capacity of Web services requires that a series of steps be
followed in a systematic way. This paper gives an overview of the main steps of
the quantitative approach to analyze Web services. The main steps are based
on two models: a workload model and a performance model. The two models
can be used in capacity planning projects to answer typical what-if questions,
frequently faced by managers of Web services.
References
1 Introduction
In the last few years, the number of network-based services available on the Inter-
net has grown considerably. Web servers are now used as the ubiquitous interface
for information exchange and retrieval both at enterprise level, via intranets, and
at global level, via the World Wide Web. In spite of the continuous increase
of the network capacity, in terms of investments in new technologies and in new
network components, the Internet still fails to satisfy the needs of a substantial
fraction of users. New network-based applications require interactive response
This work has been supported by MIUR project COFIN 2001: High quality Web
systems.
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 158–178, 2002.
© Springer-Verlag Berlin Heidelberg 2002
End-to-End Performance of Web Services 159
One of the typical misconceptions related to the Internet is that bandwidth
is the only factor limiting the speed of web services, so that the diffusion of
broadband networks in the next few years will guarantee high performance.
This conclusion is clearly wrong.
Indeed, although high bandwidth is necessary for the efficient download of
large files such as video, audio and images, as more and more services are offered
on the Internet, a small end-to-end response time, i.e., the overall waiting time
160 P. Cremonesi and G. Serazzi
A current trend in the infrastructure of the Internet is the increasing complexity of the chain of networks between a client and a server, that is, the path
in both directions between the user browser and the web server, also referred
to as the request path. From the instant a request is issued by a browser, a series
of hardware components and software processes are involved in the delivery of
the request to the server. Hardware components comprise routers, gateways, in-
termediate hosts, proxy cache hosts, firewalls, application servers, etc. Software
processes involved in the delivery of a request refer to the protocol layers (HTTP,
TCP, IP, and those of lower layers), the routing algorithms, the address transla-
tion process, the security controls, etc. In Fig. 1, a simplified model of a request
path between a user browser and a web server is illustrated.
[Figure 1: user browser, Internet Service Provider, National Internet Service Provider, International Backbone, National Internet Service Provider, Internet Service Provider, web server; intermediate elements include routers, gateways, hosts, switches, and firewalls.]
Fig. 1. Simplified model of a request path from a user browser to a web server.
such that the utilization is greater than 80%, the rate of increase of the response
time is extremely high, i.e., the resource is congested and the delay introduced
in the packet flow is huge.
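The knee in the response time curve can be illustrated under the assumption that the resource behaves like an M/M/1 queue, for which R = S/(1 − U) and the rate of increase is dR/dU = S/(1 − U)², with S the service time and U the utilization. The text does not state which model underlies its figure, so this is a sketch of the qualitative behavior only.

```python
def response_time(service_time, utilization):
    """M/M/1-style response time R = S / (1 - U), valid for 0 <= U < 1."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


def response_time_slope(service_time, utilization):
    """Rate of increase dR/dU = S / (1 - U)**2 for the same model."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization) ** 2


if __name__ == "__main__":
    for u in (0.5, 0.8, 0.9, 0.95):
        print(f"U = {u:.2f}: R = {response_time(1.0, u):6.1f}, "
              f"dR/dU = {response_time_slope(1.0, u):6.1f}")
```

With S = 1, the slope is 4 at 50% utilization, about 25 at 80%, and about 400 at 95%, so an increment of load near saturation costs far more delay than the same increment at moderate load.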
[Figure: response time (rate of increase), from 0 to 400, versus utilization, from 0 to 1.]
[Figure 3 plot: prob. of high response time (0 to 1) versus path length (0 to 15) and prob. of comp. congestion (0 to 0.2).]
Fig. 3. Probability of finding one or more congested components along a request path
browser-web server-browser (i.e., of a very high response time) as a function of the
congestion probability p of a single component and the path length n.
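If each of the n components along the request path is congested independently with probability p (an independence assumption of ours; the caption does not give the formula used for the figure), the probability of meeting at least one congested component is 1 − (1 − p)^n. A short sketch:

```python
def prob_congested_path(p, n):
    """Probability that at least one of n independent components is
    congested, each with probability p: 1 - (1 - p)**n."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("need 0 <= p <= 1 and n >= 0")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    for n in (5, 10, 15):
        row = [prob_congested_path(p, n) for p in (0.01, 0.05, 0.1)]
        print(n, ["%.3f" % v for v in row])
```

For example, p = 0.01 and n = 15 give about 0.14: even a small per-component congestion probability becomes significant over a long path.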
The World Wide Web is a more variable system than expected. Several
analyses show that the limited variability notion widely used for several decades
in telecommunication modelling, i.e., the assumption of the Poisson nature of
traffic-related phenomena, has very little in common with Internet reality. Evidence is provided by the fact that the behavior of the aggregated traffic does
not become less bursty as the number of sources increases [12].
More precisely, models in which the exponential distribution of the variables
is assumed are not able to describe Internet conditions in which the variables
(e.g., duration of the sessions, end-to-end response times, size of downloaded
files) show a variability encompassing several time scales. The high temporal
variability in traffic processes is captured by assuming the long-term dependence of
the corresponding variables and the heavy-tailed distribution of their values (i.e.,
a distribution whose tail declines according to a power law).
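A distribution is heavy-tailed (see, e.g., [5]) if its complementary cumulative
distribution $1 - F(x)$ decays slower than the exponential, i.e., if

$$\lim_{x \to \infty} e^{\gamma x}\,\bigl(1 - F(x)\bigr) = \infty$$

for all $\gamma > 0$. One of the simplest heavy-tailed distributions is the Pareto distribution, whose probability density function $f(x)$ and distribution function $F(x)$
are given by (see, e.g., [15]):

$$f(x) = \frac{\alpha k^{\alpha}}{x^{\alpha+1}}, \qquad F(x) = 1 - \left(\frac{k}{x}\right)^{\alpha}, \qquad x \ge k,\ \alpha > 0,\ k > 0.$$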
[Figure 4 plot: f(x) on a log scale (10^0 down to 10^-8) versus x (0 to 20), for Pareto densities with α = 1.1, k = 0.09; α = 1.5, k = 0.33; α = 2, k = 0.5; α = 3, k = 0.66; α = 4, k = 0.75; and an exponential density.]
Fig. 4. Probability density functions (in log scale) of several Pareto random variables,
with different parameters α and k, compared with the probability density function
(dashed line) of an exponential random variable; the mean value of all the functions is
one.
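The contrast between the Pareto and exponential tails can be checked numerically. The sketch assumes the parameterization with shape α and scale k, f(x) = αk^α/x^(α+1) and 1 − F(x) = (k/x)^α for x ≥ k; under it the mean is αk/(α − 1), which is approximately one for each (α, k) pair listed in Figure 4. Sampling uses the inverse-transform method, and the function names are ours.

```python
import math
import random


def pareto_pdf(x, alpha, k):
    """Pareto density f(x) = alpha * k**alpha / x**(alpha + 1) for x >= k."""
    return alpha * k ** alpha / x ** (alpha + 1) if x >= k else 0.0


def pareto_ccdf(x, alpha, k):
    """Complementary CDF 1 - F(x) = (k / x)**alpha for x >= k."""
    return (k / x) ** alpha if x >= k else 1.0


def pareto_mean(alpha, k):
    """Mean alpha*k/(alpha - 1), finite only for alpha > 1."""
    return alpha * k / (alpha - 1) if alpha > 1 else math.inf


def pareto_sample(alpha, k, rng):
    """Inverse-transform sampling: x = k / U**(1/alpha), U uniform on (0, 1]."""
    return k / (1.0 - rng.random()) ** (1.0 / alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha, k = 1.5, 1 / 3  # mean alpha*k/(alpha-1) = 1
    print("mean:", pareto_mean(alpha, k))
    print("P(X > 10), Pareto:", pareto_ccdf(10, alpha, k))
    print("P(X > 10), exponential with mean 1:", math.exp(-10))
    print("samples:", [round(pareto_sample(alpha, k, rng), 3) for _ in range(5)])
```

With α = 1.5 and both means equal to one, P(X > 10) is about 0.006 for the Pareto variable, more than 100 times the exponential value e^(−10) ≈ 4.5 × 10^(−5).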
network resources. It has been shown [16] [17] that, under certain assumptions,
the superposition of many ON/OFF sources generates a process exhibiting the
long-term dependency characteristic. Thus, the corresponding model is able to
capture the self-similar nature of Internet traffic.
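The ON/OFF construction can be illustrated with a toy simulation. This is an assumption-laden sketch, not the model analyzed in [16] [17]: each of `n_sources` independent sources alternates between ON and OFF periods whose lengths (in time slots) are Pareto-distributed with tail index `alpha`, and it emits one unit of traffic per slot while ON. The helper `aggregated_variance` gives a crude variance-time check: for short-range-dependent traffic the variance of block averages falls roughly like 1/m, while long-range dependence makes it fall more slowly.

```python
import random


def pareto_period(alpha: float, rng: random.Random) -> int:
    """Period length in slots: Pareto(alpha) with scale 1, rounded down, at least 1."""
    return max(1, int((1.0 - rng.random()) ** (-1.0 / alpha)))


def onoff_aggregate(n_sources: int, n_slots: int, alpha: float = 1.5, seed: int = 0) -> list[int]:
    """Number of sources that are ON in each slot (one traffic unit per ON source)."""
    rng = random.Random(seed)
    load = [0] * n_slots
    for _ in range(n_sources):
        t = 0
        on = rng.random() < 0.5              # random initial state
        while t < n_slots:
            length = pareto_period(alpha, rng)
            if on:
                for s in range(t, min(t + length, n_slots)):
                    load[s] += 1
            t += length
            on = not on                      # ON and OFF periods alternate
    return load


def aggregated_variance(series: list[int], m: int) -> float:
    """Variance of the means of non-overlapping blocks of m consecutive slots."""
    blocks = [sum(series[i:i + m]) / m for i in range(0, len(series) - m + 1, m)]
    mu = sum(blocks) / len(blocks)
    return sum((b - mu) ** 2 for b in blocks) / len(blocks)


if __name__ == "__main__":
    load = onoff_aggregate(n_sources=100, n_slots=20_000)
    for m in (1, 10, 100, 1000):
        print(m, aggregated_variance(load, m))
```

A single simulated trace is short and noisy, so the variance-time check only hints at long-range dependence; it is not a substitute for the estimators used in the literature.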
Another phenomenon that contributes to the fluctuations of Internet traffic (at a
more macroscopic level than that of a single source) is the degree of correlation
among the sources. Empirical observations suggest the presence of periodic traffic
cycles, of which the daily cycle is the most evident. The existence of such a cycle
is fairly intuitive: it is connected to office working hours and to the availability
periods of some on-line services (e.g., traffic typically peaks during the morning
and afternoon hours). Time-zone differences across the globe may also generate
cycles with different periodicity. Other types of source correlation are generated
by special events (sports competitions, natural disasters, wars, etc.).
As we have seen, the Internet is a network environment where load fluctuations
should be considered physiological rather than exceptional events. The
self-similarity of the load propagates its effects across all the network layers,
from the application layer to the link layer. As a consequence, transient congestion
may occur with non-negligible probability in each of the components along the
browser-server-browser request path (Sect. 2.1). While performance optimization
is relatively straightforward in a network with limited load variability, it
becomes significantly more complex in the Internet because of transient congestion.
The load imbalance among the resources of a request path, usually modeled as an
open network of queues (Fig. 1), will be extreme and will grow as the load
increases. Thus, the probability of finding a component subject to transient
congestion in a relatively long request path, e.g., of about 15 hops, is
substantial (Fig. 3).
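A back-of-the-envelope version of the path argument can be written in a few lines. It assumes (our assumption, which may differ from the model behind Fig. 3) that each of the n components along the path is congested independently with the same probability p, so the probability that at least one of them is congested is 1 − (1 − p)^n.

```python
def path_congestion_probability(p: float, n: int) -> float:
    """P(at least one of n independent components is congested) = 1 - (1 - p)^n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10):
        print(f"p={p:.2f}  n=15  P={path_congestion_probability(p, 15):.3f}")
```

With p = 0.05 and n = 15, for example, the formula gives about 0.54: under these assumptions, more often than not at least one hop of the path is congested.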
When a traffic fluctuation causes congestion in a component (e.g., a router) of
an open network of queues, the performance degradation due to the overload is
huge, since the asymptotes of the performance indices are vertical (Fig. 2): the
response time increases by several orders of magnitude, the throughput reaches
saturation, and the number of customers at the congested component tends to
infinity.
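The blow-up near saturation can be reproduced with the simplest open queueing model, a single M/M/1 queue (an assumption chosen for illustration; the paper models a request path as an open network of queues). With arrival rate λ and mean service time S, the utilization is U = λS, the mean response time is R = S/(1 − U), and the mean number of customers is N = U/(1 − U); both diverge as U approaches 1, while the throughput saturates at 1/S.

```python
def mm1_indices(arrival_rate: float, service_time: float) -> dict[str, float]:
    """Steady-state indices of an M/M/1 queue (rates per second, times in seconds)."""
    u = arrival_rate * service_time                 # utilization U = lambda * S
    if u >= 1.0:
        # Saturation: throughput capped at 1/S, queue and response time unbounded.
        return {"U": 1.0, "X": 1.0 / service_time, "N": float("inf"), "R": float("inf")}
    return {
        "U": u,
        "X": arrival_rate,                          # throughput equals arrival rate
        "N": u / (1.0 - u),                         # mean number of customers
        "R": service_time / (1.0 - u),              # mean response time
    }


if __name__ == "__main__":
    s = 0.01  # hypothetical mean service time: 10 ms per request
    for lam in (50.0, 90.0, 99.0, 99.9):
        idx = mm1_indices(lam, s)
        print(f"lambda={lam:5.1f}/s  U={idx['U']:.3f}  "
              f"R={idx['R'] * 1000:8.1f} ms  N={idx['N']:7.1f}")
```

Going from U = 0.5 to U = 0.999 multiplies the mean response time by 500, which is the kind of jump that a transient congestion episode produces.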
This unexpected increase in response time triggers the congestion control
mechanism implemented in TCP, whose purpose is to prevent the traffic source from
overloading the network. Since the source uses feedback, computed directly from
the network or received from intermediate components, to tune the load it sends
into the network, an increase of the response time (in this context usually
referred to as the round trip time) beyond a threshold value triggers an immediate
reduction of the congestion window size, and thus a reduction of the traffic input
into the network. The throughput decreases suddenly and then increases slowly,
according to the algorithm implemented by the TCP version adopted. The various
versions of TCP implement different congestion control mechanisms, which have
different impacts on network performance [11]. Clearly, this type of
End-to-End Performance of Web Services 167
3.1 Experiments
The monitoring system used to collect the data is WPET (Web Performance
Evaluation Tool), a Java-based tool developed at the Politecnico di Milano. WPET
is composed of a set of agents for the collection of Internet performance data.
Each agent is an automated browser that can be programmed to periodically
download web pages and to measure several performance metrics (e.g., download
times). The agents are connected to the Internet through different connection
types (e.g., ISDN, xDSL, cable, backbone), from different geographical locations
(e.g., Rome, Milan), and through different providers. A WPET agent can navigate a
web site performing a sequence of complex operations, such as filling in a form,
selecting an item from a list, or following a link. An agent can handle the HTTP
and HTTPS protocols, session tracking (URL rewriting and cookies), and plug-ins
(e.g., Flash animations, Java applets, ActiveX controls). For each visited page,
the agent collects performance data for all the objects in the page. For each
object, several performance metrics are measured: DNS lookup time, connection
time, redirect time, HTTPS handshake time, server response time, object download
time, object size, and error conditions. All the data collected by the agents are
stored in a centralized database and analyzed to extract meaningful statistics.
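The per-object metrics listed above map naturally onto a simple record. The sketch below uses a hypothetical layout (WPET's actual data model is not described here): one `ObjectSample` per downloaded object, plus a helper that summarizes each timing phase over the error-free samples, the kind of statistic one might extract from the centralized database.

```python
from dataclasses import dataclass
from statistics import mean, median

# Timing phases measured for every object (field names are hypothetical).
PHASES = ("dns_lookup", "connection", "redirect",
          "https_handshake", "server_response", "download")


@dataclass
class ObjectSample:
    """One downloaded object of a page; all times in seconds."""
    url: str
    dns_lookup: float
    connection: float
    redirect: float
    https_handshake: float
    server_response: float
    download: float
    size_bytes: int
    error: bool = False

    def total_time(self) -> float:
        # Sums the phases as if they were sequential; a browser that fetches
        # several objects in parallel will show a shorter page time.
        return sum(getattr(self, p) for p in PHASES)


def phase_summary(samples: list[ObjectSample]) -> dict[str, dict[str, float]]:
    """Mean and median of each phase over error-free samples (needs at least one)."""
    ok = [s for s in samples if not s.error]
    return {p: {"mean": mean(getattr(s, p) for s in ok),
                "median": median(getattr(s, p) for s in ok)}
            for p in PHASES}
```

Reporting the median next to the mean is a deliberate choice here: with the heavy-tailed download times discussed in this paper, a few very slow samples can dominate the mean.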
[Fig. 5 panels: download time (sec.) of the www.mit.edu home page, 20/03 to 28/03 (upper); log-log plot of 1 − F(x) with α = 3.2 (lower left); Y quantiles vs. X quantiles (lower right)]
Fig. 5. Download times of the home page of the MIT web site (upper part). Log-log
plot of the complementary cumulative distribution 1 − F(x) (lower left).
Quantile-quantile plot of the estimated Pareto distribution vs. the measured
distribution (lower right).
While the log-log complementary distribution plot provides solid evidence of a
Pareto distribution in a given data set, the method described above for estimating
α is prone to errors. To confirm the estimated parameter α, we can use the
quantile-quantile plot method (lower right part of Fig. 5). The purpose of this
plot is to determine whether two samples come from the same type of distribution:
if they do, the plot will be approximately linear. The quantile-quantile plot in
Fig. 5 shows the quantiles of the measured data set (x axis) versus the quantiles
of a Pareto distribution with tail parameter α = 3.2 (y axis). The plot confirms
the correctness of the estimate.
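Both checks can be sketched in a few lines. The estimation method the authors actually used is described in a part of the paper not included here, so `estimate_alpha` is a generic alternative: a least-squares fit of the straight tail of the log-log complementary plot, regressing log x on log(1 − F(x)) over the largest `tail_fraction` of the samples (an assumed value of 10%), where a Pareto tail gives slope −1/α. `pareto_qq` pairs sample quantiles with Pareto quantiles, and `log_correlation` measures how close the QQ plot is to a straight line; the scale k must be supplied by the user, and all names are illustrative.

```python
import math
import random


def pareto_samples(alpha: float, k: float, n: int, seed: int = 0) -> list[float]:
    """Synthetic Pareto data (inverse transform), useful for testing the estimator."""
    rng = random.Random(seed)
    return [k * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def estimate_alpha(data: list[float], tail_fraction: float = 0.1) -> float:
    """Least-squares fit of the straight tail of the log-log complementary plot.

    log10(x) is regressed on log10(1 - F(x)) for the largest `tail_fraction`
    of the (positive) samples; for a Pareto tail the slope is -1/alpha.
    """
    xs = sorted(data)
    n = len(xs)
    start = int(n * (1.0 - tail_fraction))
    ys = [math.log10((n - i) / n) for i in range(start, n)]   # empirical 1 - F
    lx = [math.log10(xs[i]) for i in range(start, n)]
    my, mx = sum(ys) / len(ys), sum(lx) / len(lx)
    slope = (sum((y - my) * (x - mx) for y, x in zip(ys, lx))
             / sum((y - my) ** 2 for y in ys))
    return -1.0 / slope


def pareto_qq(data: list[float], alpha: float, k: float) -> list[tuple[float, float]]:
    """(sample quantile, Pareto quantile) pairs at plotting positions (i + 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    return [(x, k * (1.0 - (i + 0.5) / n) ** (-1.0 / alpha)) for i, x in enumerate(xs)]


def log_correlation(pairs: list[tuple[float, float]]) -> float:
    """Pearson correlation of the log quantiles; near 1 when the QQ plot is linear."""
    a = [math.log(p) for p, _ in pairs]
    b = [math.log(q) for _, q in pairs]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

The log-x-on-log-CCDF direction is chosen because the plotting positions (n − i)/n are exact, so all the sampling noise sits in the regressed variable; the correlation is computed on log quantiles so that a few extreme values do not dominate it.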
Figures 6 and 7 extend the analysis by comparing the download times of the
home pages of four web servers:
– Massachusetts Institute of Technology (www.mit.edu)
– Stanford University (www.stanford.edu)
– Google (www.google.com)
– Altavista (www.altavista.com).
The four plots in each figure show the log-log complementary cumulative
distributions (continuous lines), together with the approximating Pareto
distributions (dashed lines). The measurements of Fig. 6 were collected with a
WPET agent running on a system directly connected to a backbone. The measurements
of Fig. 7 were collected with an agent connected to the Internet via an ADSL line.
Both agents were located in Milan. All the figures confirm the heavy-tail property
of end-to-end download times.
[Fig. 6 panels: log-log plots of 1 − F(x) vs. download time (sec.) for www.mit.edu (α = 3.2), www.stanford.edu (α = 2.66), www.google.com (α = 2.33), and www.altavista.com (α = 3.11)]
Fig. 6. Log-Log complementary plots of the home page download times distribution of
four web sites measured from a backbone Internet connection. The real data distribution
(continuous line) and the approximated Pareto distribution (dashed line) are shown.
The estimated tail index α is reported on each plot.
It is interesting to observe that all the plots in Fig. 7 (ADSL connection) have
a lower value of α than the corresponding plots in Fig. 6 (backbone connection).
Recall that lower values of α mean higher variability. This suggests that slow
client connections are characterized by high variability, because (i) the source
of congestion is in the network, not in the client connection, and (ii) the
overhead of retransmissions is higher for slower client connections.
[Fig. 7 panels: log-log plots of 1 − F(x) vs. download time (sec.) for www.mit.edu (α = 1.94), www.stanford.edu (α = 1.84), www.google.com (α = 2.13), and www.altavista.com (α = 2.61)]
Fig. 7. Log-log complementary plots of the home page download time distributions
of the same four web servers as in Fig. 6, measured from an ADSL Internet
connection. The real data distribution (continuous line) and the approximating
Pareto distribution (dashed line) are shown. The estimated tail index α is
reported on each plot.
[Fig. 8 plot: wavelet energy vs. time scale (30 min, 1 h, 2 h, 4 h, 8 h, 16 h); H = 0.95]
Fig. 8. Scaling analysis of the download times of MIT web site home page. The wavelet
energy (continuous line) is approximated with a straight line (dashed line) with slope
0.90.
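The text that explains Fig. 8 is not part of this excerpt. The sketch below shows one common way to obtain such a plot: a log-scale diagram in the spirit of [1], here using the Haar wavelet for simplicity (an assumption; [1] also allows other wavelets). The mean squared detail coefficient at octave j is plotted as log2 energy against j; for long-range-dependent data the slope is approximately 2H − 1, so H = (1 + slope)/2, and a slope of 0.90 corresponds to H = 0.95, consistent with Fig. 8.

```python
import math
import random


def haar_energies(x: list[float], n_octaves: int) -> list[float]:
    """Mean squared Haar detail coefficient at octaves j = 1..n_octaves."""
    approx = list(x)
    energies = []
    for _ in range(n_octaves):
        m = len(approx) // 2
        if m == 0:
            raise ValueError("series too short for the requested number of octaves")
        details = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(m)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(m)]
        energies.append(sum(d * d for d in details) / m)
    return energies


def hurst_from_energies(energies: list[float]) -> float:
    """Least-squares slope of log2(energy) vs octave j; H = (1 + slope) / 2."""
    js = list(range(1, len(energies) + 1))
    ys = [math.log2(e) for e in energies]
    mj, my = sum(js) / len(js), sum(ys) / len(ys)
    slope = (sum((j - mj) * (y - my) for j, y in zip(js, ys))
             / sum((j - mj) ** 2 for j in js))
    return (1.0 + slope) / 2.0


if __name__ == "__main__":
    # Sanity check on white noise, which has no long-range dependence (H near 0.5).
    rng = random.Random(3)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
    print(round(hurst_from_energies(haar_energies(noise, 6)), 2))
```

Each octave doubles the time scale, which is why the horizontal axis of Fig. 8 runs from 30 minutes to 16 hours in powers of two; the unweighted fit used here is simpler than the weighted regression usually recommended for this estimator.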
User-perceived response time has a strong impact on how long users stay at a web
site and on how often they return to it. Acceptable response times are difficult
to determine because people's expectations differ from situation to situation.
Users seem willing to wait varying amounts of time for different types of
interactions [13]. The amount of time a user is willing to wait appears to be a
function of the perceived complexity of the request. For example, people will
wait longer:
– for requests that they think are hard or time-consuming for the web site to
perform (e.g., search engines);
– when there are no simple or valid alternatives to the visited web site (e.g.,
the overhead required to move a bank account increases the tolerance of
home banking users).
Conversely, users are less tolerant of long delays for web tasks that they
consider simple, or when they know there are valid alternatives to the web site.
Selvidge, Chaparro and Bender [14] conducted a study to examine the effect of
download delays on user performance. They used delays of 1, 30, and 60 seconds.
They found that users were less frustrated with the one-second delay, but that
their satisfaction was not affected by the 30-second response times.
According to Nielsen, download times greater than 10 seconds cause user
discomfort [10]. According to a study presented by IBM researchers, a download
time longer than 30 seconds is considered too slow [7].
Studies on how long users will wait for the complete download of a web page have
been performed by Bouch, Kuchinsky and Bhatti [3]. They reported good ratings
for pages with latencies up to 5 seconds, and poor ratings for pages with delays
over 10 seconds. In a second study, they applied incremental loading of web pages
(banner first, text next, and graphics last). Under these conditions, users were
much more tolerant of longer latencies and rated the delay as “good” with
latencies up to 30 seconds. In a third study, they observed that as users interact
more with a web site, their frustration with download delays seems to accumulate.
In general, the longer a user interacts with a site (i.e., the longer the
navigation path), the less delay he or she will tolerate.
In Fig. 9 we have integrated the results of these studies in order to identify two
thresholds that define user satisfaction. The thresholds are functions of the
navigation step:
[Fig. 9 plot: download time (sec.), 0 to 30, vs. navigation step (1, 2, 3, 4, 5, 10, 20); regions marked acceptable and unacceptable]
Fig. 9. User satisfaction as a function of the navigation steps. Users are always satisfied
with web pages whose download time is below the lower threshold (continuous line).
Users will not tolerate latencies longer than the upper threshold (dashed line).
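The two-threshold rule of Fig. 9 can be written as a small classifier. The threshold values themselves have to be read from the figure, so none are hard-coded here. The sketch assumes, hypothetically, that each threshold curve is given at a few navigation steps and interpolated linearly in between; the label "intermediate" for the zone between the two curves is a name chosen here, not one used in the source.

```python
from bisect import bisect_right


def interpolate(steps: list[float], values: list[float], step: float) -> float:
    """Piecewise-linear threshold curve through (steps[i], values[i]); flat outside."""
    if step <= steps[0]:
        return values[0]
    if step >= steps[-1]:
        return values[-1]
    i = bisect_right(steps, step)            # steps[i - 1] <= step < steps[i]
    x0, x1 = steps[i - 1], steps[i]
    y0, y1 = values[i - 1], values[i]
    return y0 + (y1 - y0) * (step - x0) / (x1 - x0)


def satisfaction(download_time: float, step: float, steps: list[float],
                 lower: list[float], upper: list[float]) -> str:
    """Apply the two-threshold rule; both threshold curves are supplied by the caller."""
    if download_time <= interpolate(steps, lower, step):
        return "satisfied"            # below the lower threshold
    if download_time <= interpolate(steps, upper, step):
        return "intermediate"         # between the two thresholds (label chosen here)
    return "unacceptable"             # beyond the upper threshold
```

Keeping the curves as data rather than code makes it easy to update them as new user studies become available.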
[Fig. 10 plot: download time (sec.), 0 to 60, split into connection, response and transfer components, 23-Dec to 20-Jan]
Fig. 10. End-to-end response time for the download of the home page of a web site
with network problems. The three basic components, the TCP/IP connection time, the
server response time and the page transfer time are shown.
time is always lower than 30 seconds but higher than 10 seconds. Connection time
is always around 1 second. However, there is still room for optimization. In fact,
the average response time, which measures the time required for the web server
to load the page from disk (or to generate the page dynamically), is about
10 seconds in most cases. By adding new hardware or improving the web
application, the response time should be reduced to 1–2 seconds.
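Figures 10 and 11 suggest a simple diagnostic: split each end-to-end measurement into connection, server response, and transfer time and see which component dominates. The sketch assumes a hypothetical record layout (one dict per download, times in seconds), and the mapping from the dominant component to a likely cause is a rough heuristic reading of Figs. 10 and 11, not a rule stated by the authors.

```python
from statistics import mean

COMPONENTS = ("connection", "response", "transfer")

# Heuristic reading of Figs. 10 and 11 (an assumption, not a rule from the text).
LIKELY_CAUSE = {
    "connection": "network path or server connection handling",
    "response": "server side: disk access or dynamic page generation",
    "transfer": "network path (bandwidth, congestion) or page size",
}


def diagnose(samples: list[dict[str, float]]) -> tuple[str, dict[str, float]]:
    """Return the component with the largest share of mean end-to-end time, plus all shares."""
    means = {c: mean(s[c] for s in samples) for c in COMPONENTS}
    total = sum(means.values())
    shares = {c: means[c] / total for c in COMPONENTS}
    dominant = max(shares, key=shares.get)
    return dominant, shares


if __name__ == "__main__":
    week = [  # hypothetical measurements, seconds
        {"connection": 1.0, "response": 10.5, "transfer": 3.0},
        {"connection": 1.1, "response": 9.8, "transfer": 2.5},
    ]
    comp, shares = diagnose(week)
    print(comp, LIKELY_CAUSE[comp], {c: round(v, 2) for c, v in shares.items()})
```

Because download times are heavy-tailed, a per-day median instead of the mean may give a more stable picture; the mean is used here only to keep the sketch short.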
[Fig. 11 plot: download time (sec.), 0 to 30, split into connection, response and transfer components, 23-Dec to 20-Jan]
Fig. 11. End-to-end response time for the download of the home page of a web site
with server problems. The three basic components, the TCP/IP connection time, the
server response time and the page transfer time are shown.
[Fig. 12 plot: download time (sec., bars, 0 to 25) and object size (KByte, line, 0 to 16) for each object in the page]
Fig. 12. Page components plot. Vertical bars represent the download times of all
the objects in the page. The line indicates the size of each object. The object
pointed to by the arrow is a banner.
[Fig. 13 plot: download time (sec., bars, 0 to 25) and object size (KByte, line, 0 to 30) for each object in the page]
Fig. 13. Page components plot. Vertical bars represent the download times of all
the objects in the page. The line indicates the size of each object.
5 Conclusions
In this paper we have analyzed the origins of the high fluctuations in web
traffic. The sources of these fluctuations lie in the characteristics of the
applications, the complexity of the network path connecting the web user to the
web server, the self-similarity of web traffic (file sizes and user think times),
and the congestion control mechanism of the TCP/IP protocol. Empirical evidence
of self-similar and heavy-tail features in measured end-to-end web site
performance is provided. We have integrated this technical knowledge with the
results of recent studies aimed at determining the effects of long download delays
on user satisfaction. We have shown that user satisfaction can be modelled with
two thresholds. Simple guidelines for the detection of web performance problems
and for their optimization are also presented.
References
1. Abry, P., Veitch, D.: Wavelet Analysis of Long-Range-Dependent Traffic. IEEE
Trans. on Information Theory 44 (1998) 2–15.
2. Barford, P., Bestavros, A., Bradley, A., Crovella, M.E.: Changes in Web Client
Access Patterns: Characteristics and Caching Implications. World Wide Web Journal
2 (1999) 15–28.
3. Bhatti, N., Bouch, A., Kuchinsky, A.: Integrating User-Perceived Quality into Web
Server Design. Proc. of the 9th International World Wide Web Conference. Elsevier
(2000) 1–16.
4. Crovella, M.E., Bestavros, A.: Self-Similarity in World Wide Web Traffic: Evidence
and Possible Causes. IEEE/ACM Trans. on Networking 5 (1997) 835–846.
5. Feldmann, A., Whitt, W.: Fitting Mixtures of Exponentials to Long-Tail
Distributions to Analyze Network Performance Models. Performance Evaluation 31
(1998) 245–279.
6. Haverkort, B.R.: Performance of Computer Communication Systems: A Model-Based
Approach. Wiley, New York (1998).
7. IBM: Designed for Performance.
http://www.boulder.ibm.com/wsdd/library/techarticles/hvws/perform.html
8. Jackson, J.R.: Networks of Waiting Lines. Oper. Res. 5 (1957) 518–521.
9. Leland, W.E., Taqqu, M.S., Willinger, W., Wilson, D.V.: On the Self-Similar
Nature of Ethernet Traffic. IEEE/ACM Trans. on Networking 2 (1994) 1–15.
10. Nielsen, J.: Designing Web Usability. New Riders (2000).
11. Park, K., Kim, G., Crovella, M.E.: On the Effect of Traffic Self-Similarity on
Network Performance. Proc. of the SPIE International Conference on Performance
and Control of Network Systems (1997) 296–310.
12. Paxson, V., Floyd, S.: Wide Area Traffic: The Failure of Poisson Modeling.
IEEE/ACM Trans. on Networking 3 (1995) 226–244.
13. Ramsay, J., Barbesi, A., Preece, J.: A Psychological Investigation of Long
Retrieval Times on the World Wide Web. Interacting with Computers 10 (1998) 77–86.
14. Selvidge, P.R., Chaparro, B., Bender, G.T.: The World Wide Wait: Effects of
Delays on User Performance. Proc. of the IEA 2000/HFES 2000 Congress (2000)
416–419.
15. Trivedi, K.S.: Probability and Statistics with Reliability, Queueing and Computer
Science Applications. Wiley, New York (2002).
16. Willinger, W., Paxson, V., Taqqu, M.S.: Self-Similarity and Heavy Tails:
Structural Modeling of Network Traffic. In: Adler, R., Feldman, R., Taqqu, M.
(eds.): A Practical Guide to Heavy Tails: Statistical Techniques and Applications.
Birkhäuser, Boston (1998) 27–53.
17. Willinger, W., Taqqu, M.S., Sherman, R., Wilson, D.V.: Self-Similarity Through
High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level.
IEEE/ACM Trans. on Networking 5 (1997) 71–86.
Benchmarking
Reinhold Weicker
Fujitsu Siemens Computers, 33094 Paderborn, Germany
reinhold.weicker@fujitsu-siemens.com
A panel discussion on the occasion of the ASPLOS-III symposium in 1989 had the
title “Fair Benchmarking – An Oxymoron?” The event looked somewhat strange
among all the other, technically oriented presentations at the conference. This
anecdotal evidence indicates the unique status that benchmarking has in the computer
area: Benchmarking is, on the one hand, a highly technical endeavor. But it also is,
almost by definition, related to computer marketing.
We see that benchmarks are used in three areas:
1. Computer customers as well as the general public use benchmark results to
compare different computer systems, be it for a vague perception of performance
or for a specific purchasing decision. Consequently, the marketing departments of
hardware and software vendors drive to a large degree what happens in
benchmarking. After all, the development of good benchmarks is neither easy nor
cheap, and it is mostly the computer vendors’ marketing departments that, in the
end, directly or indirectly, pay the bill.
2. Developers in hardware or software departments use benchmarks to optimize
their products and to compare them with alternative or competitive designs.
Quite often, design decisions are made on the basis of such comparisons.
3. Finally, benchmarks are heavily used in computer research papers because
they are readily available, and they can be expected to be easily portable.
Therefore, when quantitative claims are made in research papers, they are
often based on benchmarks.
Although the first usage of benchmarks (comparison of existing computers) is for
many the primary usage, this paper tries to touch on all three aspects.
What makes a program a benchmark? We can say that a benchmark is a standardized
program (or detailed specification of a program) designed or selected to be run on
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 179–207, 2002.
© Springer-Verlag Berlin Heidelberg 2002
180 R. Weicker
2 Overview, Classification of Benchmarks
The benchmarks that have been or are widely used can be classified according to
several criteria. Perhaps the best starting point is “Who owns/administers a
benchmark?” It turns out that this also roughly corresponds to a chronological order in
the history of benchmarking.
2.1 Classification by Benchmark Ownership, History of Benchmarks
In earlier years, benchmarks were administered by single authors; then industry
associations took over. In recent years, benchmarks administered by major software
vendors have become popular.
Benchmarking 181
2.1.1 Individual Authors, Complete Programs
Table 1. Single-author complete benchmarks
Name and author(s)           Year  Language            Code size (bytes)  Characterization
Whetstone (Curnow/Wichmann)  1976  ALGOL 60 / Fortran  2,120              Synthetic; numerical code, FP-intensive
Linpack (J. Dongarra)        1976  Fortran             230 (inner loop)   Package; linear algebra; numerical code, FP-intensive
Dhrystone (R. Weicker)       1984  Ada / Pascal / C    1,040              Synthetic; system-type code, integer only
A detailed overview of these three benchmarks, written at about the peak of their
popularity, can be found in [11]. Among these three benchmarks, only Linpack has
retained a certain importance, mainly through the popular “Top500” list
(www.top500.org). It must be stated, and the author, Jack Dongarra, has
acknowledged, that it represents just “one application” (solution of a system of linear
equations with a dense matrix), resulting in “one number”. On the other hand, this
weakness can turn into a strength: There is hardly any computer system in the world
for which this benchmark has not been run; therefore many results are available. This
has led to systems being compared on this basis even though they will never run
scientific-numeric code like Linpack in real life.
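To make the "one application, one number" point concrete, here is a toy Linpack-style measurement in pure Python: solve a dense random system Ax = b by Gaussian elimination with partial pivoting and convert the elapsed time into Mflop/s. The operation count 2n³/3 + 2n² is the one conventionally used by the classic Linpack benchmark (stated here as an assumption; other variants use a different lower-order term). This is not the official benchmark code, and the absolute numbers measure the Python interpreter rather than the floating-point hardware.

```python
import random
import time


def lu_solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))   # pivot row
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                           # back substitution
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def linpack_like_mflops(n: int = 100, seed: int = 0) -> tuple[float, float]:
    """Time one dense solve; return (Mflop/s using 2n^3/3 + 2n^2 ops, max error)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]              # right-hand side for the solution x = 1
    t0 = time.perf_counter()
    x = lu_solve(a, b)
    elapsed = time.perf_counter() - t0
    max_error = max(abs(xi - 1.0) for xi in x)
    ops = 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2
    return ops / elapsed / 1e6, max_error
```

Calling `linpack_like_mflops(100)` returns a single rate plus an accuracy check; like the real benchmark, that one number says nothing about how the machine handles integer, I/O, or database workloads.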
2.1.2 Individual Authors, Microbenchmarks
The term “microbenchmark” is used for program pieces that have intentionally been
constructed to test only one particular feature of the system under test; they do not
claim to be representative of a whole application area. However, the feature that they
test is important enough that such a specialized test is interesting. The best-known
example, and an often-used one, is probably John McCalpin’s “Stream” benchmark
(www.cs.virginia.edu/stream). It measures “sustainable memory bandwidth and the
corresponding computation rate for simple vector kernels” [7]. The once popular
“lmbench” benchmark (www.bitmover.com/lmbench), consisting of various small
programs executing individual Unix system calls, seems to be no longer actively
pursued by its author.
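The Triad kernel is one of the simple vector kernels in Stream: a[i] = b[i] + q·c[i]. The Python sketch below only illustrates the kernel and the bandwidth accounting, counting 3 × 8 bytes per iteration (two loads and one store of 64-bit values) and keeping the best of several runs; the real benchmark is written in C and Fortran, and a Python loop measures interpreter overhead rather than memory bandwidth. Function names and defaults are illustrative.

```python
import time
from array import array


def triad(a: array, b: array, c: array, q: float) -> None:
    """Triad kernel: a[i] = b[i] + q * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bandwidth(n: int = 1_000_000, repeats: int = 3, q: float = 3.0) -> float:
    """Best-of-`repeats` rate in MB/s, counting 3 * 8 bytes moved per iteration."""
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        triad(a, b, c, q)
        best = min(best, time.perf_counter() - t0)
    return 3 * 8 * n / best / 1e6
```

The arrays must be much larger than the caches for the real benchmark to measure main-memory bandwidth; with a small `n`, even a compiled Triad loop would mostly measure cache bandwidth.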
2.1.3 Benchmarks Owned and Administered by Industry Associations
After the success of some small single-author benchmarks in the 1980s, it became
evident that small single-author benchmarks like Dhrystone were insufficient to
characterize the emerging larger and more sophisticated systems. For example,
benchmarks with a small working set cannot adequately measure the effect of
memory hierarchies (multi-level caches, main memory). Also, small benchmarks can
easily be subject to targeted compiler optimizations. To satisfy the need for
benchmarks that are larger and cover a broader area, technical representatives from
various computer manufacturers founded industry associations that define
benchmarks, set rules for measurements, and review and publish results.
Table 2. Benchmarks owned by industry associations
Benchmark group     Since  URL                   Language of benchmarks  Application area           Systems typically tested
GPC / SPEC GPC      1986   www.spec.org/gpc      C                       Graphics programs          Workstations
Perfect / SPEC HPG  1987   www.spec.org/hpg      Fortran                 Numerical programs         Supercomputers
SPEC CPU            1988   www.spec.org/osg/cpu  C, C++, Fortran         Mixed programs             Workstations, servers
TPC                 1988   www.tpc.org           Specification only      OLTP programs              Database servers
BAPCo               1991   www.bapco.com         Object code             PC applications            PCs
SPEC System         1992   www.spec.org/osg      C (driver)              Selected system functions  Servers
EEMBC               1997   www.eembc.org         C                       Mixed programs             Embedded processors
Among these benchmarking groups, SPEC seems to be a kind of role model: Some
later associations (BAPCo, EEMBC, Storage Performance Council) have followed,
to a smaller or larger degree, SPEC’s approach in the development and
administration of benchmarks. Also, some older benchmarking groups like the
“Perfect” or GPC groups decided to use SPEC’s established infrastructure for result
publication and to continue their efforts as a subgroup of SPEC. The latest example
is the ECPerf benchmarking group, which is currently continuing its effort for a “Java
Application Server” benchmark within the SPEC Java subcommittee.
2.1.4 Benchmarks Owned and Administered by Major Software Vendors
During the last decade, benchmarks developed by some major software vendors
became popular. Often, these vendors are asked about sizing decisions: How many
users will be supported on a given system, running a specific application package?
Therefore, the vendor typically combined one or more of its application packages
with a fixed input and defined this as a benchmark. Later, the major system vendors
who run the benchmark began to cooperate with the software vendor in the evolution
of the benchmark, and there is usually some form of organized cooperation around a
software vendor’s benchmark. Still, the responsibility for result publication typically
lies with the software vendor. The attractiveness of these benchmarks for computer
customers lies in the fact that they immediately have a feeling for the programs that
are executed during the benchmark measurement: Ideally, they are the same programs
that customers run in their daily operations.
Table 3. Benchmarks administered by major software vendors
Software vendor      Since  URL                             Benchmarks covered                      Systems tested
SAP                  1993   www.sap.com/benchmark/          ERP software                            Servers
Lotus                1996   www.notesbench.org              Domino and Lotus software, mainly mail  Servers
Oracle Applications  1999   www.oracle.com/apps_benchmark/  ERP software                            Servers
Other software vendors have also created their own benchmarks, again often as a
by-product of sizing considerations; among them are Baan, PeopleSoft, Siebel, and
others. Table 3 only lists those whose external use as a benchmark has become
more important than just sizing.
Note that in this group, there is no column “Source Language”: Although the
application package typically has been developed in a high-level language, the
benchmark code is the binary code generated by the software vendor and/or the
system vendor for a particular platform.
serve a need of those who just want a short answer to the non-trivial question of
performance ranking: “Give me a simple list ranking all systems according to their
overall performance”, or “Just give me the top 10 systems, without all the details”.
Often the media, or high-level managers hard pressed for time, want such a ranking.
However, the inevitable danger of such condensed result presentations is that
important caveats and important details of a particular result get lost.
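A small example shows how a single-number summary can hide exactly such details. SPEC CPU, for instance, condenses its results into the geometric mean of per-benchmark ratios against a reference machine. In the hypothetical data below, systems X and Y get the same geometric mean, although Y is twice as fast as X on two benchmarks and twice as slow on the other two; the benchmark names and times are invented for illustration.

```python
import math


def ratios(reference: dict[str, float], measured: dict[str, float]) -> dict[str, float]:
    """Per-benchmark ratio: reference time / measured time (higher is faster)."""
    return {name: reference[name] / measured[name] for name in reference}


def geometric_mean(values) -> float:
    """exp of the mean of the logs; assumes all values are positive."""
    vals = list(values)
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


if __name__ == "__main__":
    ref = {"int_a": 100.0, "int_b": 100.0, "fp_a": 100.0, "fp_b": 100.0}   # hypothetical
    sys_x = ratios(ref, {"int_a": 25.0, "int_b": 25.0, "fp_a": 25.0, "fp_b": 25.0})
    sys_y = ratios(ref, {"int_a": 12.5, "int_b": 12.5, "fp_a": 50.0, "fp_b": 50.0})
    for name, r in (("X", sys_x), ("Y", sys_y)):
        print(name, round(geometric_mean(r.values()), 3), r)
```

A customer whose workload resembles only the first two benchmarks would care a great deal about the factor-of-two difference that the single summary value conceals.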
2.2.1 Benchmark Definition
There are basically three classes of benchmark definitions, with an important
additional special case:
1 . B e n c h m a rk s th a t a re d e fin e d in s o u rc e c o d e fo rm : T h is is th e “ c la s s ic a l”
fo rm o f b e n c h m a rk s . T h e y re q u ire a c o m p ile r fo r th e s y s te m u n d e r te s t b u t
th is is u s u a lly n o p ro b le m . S P E C , E E M B C , a n d a ll o ld e r s in g le -a u th o r
b e n c h m a rk s lis te d h e re b e lo n g to th is g ro u p .
2. Benchmarks that are defined as binary codes: By definition, these benchmarks cover a limited range of systems only. However, the market of Intel/Windows compatible PCs is large enough that benchmarks covering only this area can be quite popular. The BAPCo benchmarks and other benchmarks often used by the popular PC press (not covered here) belong to this category.
3. Benchmarks that are defined as specifications only: The TPC benchmarks are defined by a specification document; TPC does not provide source code. Because of the need to provide a level playing field in the absence of source code, and to prevent loopholes, these specification documents are quite voluminous. For example, as of 2002, the current TPC-C definition contains 137 pages, and the TPC-W definition as many as 199 pages.
4. Benchmarks administered by a software vendor are a somewhat special case: The code running on the system under test is machine code, but it usually is the code sold by the software vendor to its customers, and there is typically a version for every major instruction set architecture / operating system combination. The only problem can be that in the case of a small system vendor, where fewer systems are sold, the software vendor may not have tuned the code (compilation, use of specific OS features) as well as in the case of a big system vendor, where many copies of the software are sold. On the one hand, the software systems sold to customers will also have this property; so one can say that the situation represents real life. On the other hand, the feeling remains that this is somewhat unfair, penalizing smaller system vendors.
In all cases, even in the case where the code executed on the system is given in source or binary form, a benchmark is defined not only by the code that is executed but also by input data and by a document, typically called “Run and Reporting Rules”; it describes the rules and requirements for the measurement environment.
2.2.2 Price/Performance
The value of pricing in benchmarks is often subject to debate, both within the benchmark organizations themselves and in the press. Arguments for pricing are:
- Customers naturally are interested in prices, and prices determined according to uniform pricing rules set by the benchmark organization have a chance to be more uniform than, say, prices published by a magazine.
- There is always a tendency among system vendors to aim for the top spot in the performance list. The requirement to provide price information can be a useful corrective against benchmark configurations that degrade into sheer battles of material: If a benchmark scales well, e.g. for clusters, then whoever can accumulate enough hardware in the benchmark lab wins the competition for performance. The requirement to quote the price of the configuration may prevent such useless battles.
On the other hand, there are the arguments against pricing:
- In the case of systems that are larger than just a single workstation, prices are difficult to determine and have many components: hardware, software, maintenance. It is hard to find uniform criteria for all components, in particular for maintenance; different companies may have different business models.
- In the computer business, prices get outdated very fast. It is tempting but misleading to compare a price published today with a price published a year ago.
- For the sake of realism, some pricing rules (e.g. TPC’s rules) allow discounts, provided that they are generally available. On the other hand, this allows system vendors to grant such discounts just for configurations that have been selected with an eye on important benchmark results, making the price less realistic than it appears.
- Experience in benchmark organizations like TPC shows that a large percentage of result challenges have to do with pricing. This diverts energy of the member organizations that could be better spent on improving the benchmarks.
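To make the “battle of material” argument concrete, here is a minimal Python sketch. It assumes that a price/performance figure is simply the total price of the measured configuration divided by its throughput (the general shape of TPC’s metrics, such as price per tpmC; the actual pricing rules are not modeled). The class and function names and all numbers are hypothetical. The sketch shows how a large configuration can win the performance ranking while losing the price/performance ranking.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BenchmarkResult:
    """One published result; all figures are hypothetical."""
    config: str
    throughput: float   # work completed per unit of time, e.g. transactions per minute
    total_price: float  # total price of the measured configuration

    @property
    def price_performance(self) -> float:
        # Price per unit of throughput: lower is better.
        return self.total_price / self.throughput


def by_performance(results: List[BenchmarkResult]) -> List[BenchmarkResult]:
    # Highest throughput first: the ranking a "battle of material" aims at.
    return sorted(results, key=lambda r: r.throughput, reverse=True)


def by_price_performance(results: List[BenchmarkResult]) -> List[BenchmarkResult]:
    # Lowest price per unit of throughput first.
    return sorted(results, key=lambda r: r.price_performance)


results = [
    BenchmarkResult("4-node cluster", throughput=40_000, total_price=2_000_000),
    BenchmarkResult("single server", throughput=15_000, total_price=450_000),
]

print([r.config for r in by_performance(results)])        # ['4-node cluster', 'single server']
print([r.config for r in by_price_performance(results)])  # ['single server', '4-node cluster']
```

Here the cluster delivers the most throughput, but at 50 price units per unit of throughput versus 30 for the single server, which is exactly the information a pure performance list hides.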
2.2.3 Areas Covered by Benchmarks
Finally, an important classification of benchmarks is related to the area the benchmarks intend to cover. One such classification is shown in figure 1.
[Figure: benchmarks positioned by testing range (CPU, cache, memory, multi-CPU, disk I/O, LAN, DBMS; operating system, compiler, libraries; application and benchmark kit) and by category: CPU (SPEC CPU2000), CPUs (SPEC CPU2000 Capacity), Java VM (SPECjbb2000), Web server (SPECweb99), OLTP (TPC-C), DSS (TPC-H, TPC-R), e-Commerce (TPC-W), ERP (SAP), ERP (Oracle Applications)]
Fig. 1. Areas covered by some major benchmarks (from [10]).
Of course, the benchmark coverage also has some relation to the cost of benchmarking. For example, if networking is involved, the benchmark setup typically includes one or more servers and several (often many) clients. It is not surprising that the number of results for such benchmarks, where measurements take weeks or months, is much smaller than for those that involve only one system.
3 A Closer Look at the More Popular Benchmarks
3.1 SPEC CPU Benchmarks, Measurement Methods
Since 1989, SPEC has replaced the CPU benchmarks three times, with CPU92, CPU95, and CPU2000. Currently, the SPEC CPU subcommittee is working on CPU2004, intended to replace the current suite CPU2000. On the other hand, the principle of SPEC CPU benchmarking has been remarkably consistent:
- A number of programs are contributed by the SPEC member companies, by the open source community, or by researchers in the academic community. For the CPU2000 suite, and again for CPU2004, SPEC has initiated an award program to encourage such contributions.
- The member companies that are active in the CPU subcommittee port the benchmarks to their various platforms; dependency on I/O or operating system activity is removed, if necessary. Care is taken that all changes are performance-neutral across platforms. If possible, the cooperation of the original program’s author(s) is sought for all these activities.
- The benchmarks are tested in a tool harness provided by SPEC. Compilation and execution of the benchmarks are automated as much as possible. The tester supplies a “configuration file” with system-specific parameters (e.g. location of the C compiler, compilation flags, libraries, description of the system under test).
Table 4. SPEC’s CPU benchmarks over the years

                                     CPU89     CPU92     CPU95   CPU2000
Integer programs                         4         6         8        12
Floating-point programs                  6        14        10        14
Total source lines, integer         77,100    85,500   275,000   389,300
Total source lines, FP              24,200    44,000    20,600   158,300
Source languages                    C, F77    C, F77    C, F77   C, C++, F77, F90
Number of results                      191      1292      1881      1043
  through Q1/2002
In its official statements, SPEC emphasizes the advice “Look at all the numbers”. This means that a customer interested in a particular application area should weight benchmarks from this area higher than the rest. Experts can draw even more interesting conclusions, correlating, for example, the working sets of particular benchmarks with the test system’s cache architecture and cache sizes [5]. However, such evaluations have remained, to a large degree, an area for experts only; customers rarely look at more than the number achieved in the overall metric.
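As an illustration of how one overall number condenses the individual results, and how a customer might instead weight one application area, here is a minimal sketch. It assumes that each benchmark result is the ratio of a reference machine’s run time to the measured run time and that the overall figure is the geometric mean of these ratios (the pattern this chapter later describes for SPECjvm98); any scaling constant a particular SPEC suite applies is omitted. Benchmark names, times, and weights are hypothetical.

```python
import math
from typing import Dict, Iterable


def ratio(reference_seconds: float, measured_seconds: float) -> float:
    # Higher is better: how many times faster than the reference machine.
    return reference_seconds / measured_seconds


def geometric_mean(values: Iterable[float]) -> float:
    # n-th root of the product, computed via logarithms to avoid overflow.
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


def weighted_geometric_mean(values: Dict[str, float], weights: Dict[str, float]) -> float:
    # A customer may weight the benchmarks of their own application area higher.
    total = sum(weights[name] for name in values)
    return math.exp(sum(weights[name] * math.log(v) for name, v in values.items()) / total)


# Hypothetical reference and measured run times in seconds.
reference = {"bench_a": 1400.0, "bench_b": 1800.0, "bench_c": 1100.0}
measured = {"bench_a": 350.0, "bench_b": 900.0, "bench_c": 275.0}

ratios = {name: ratio(reference[name], measured[name]) for name in reference}
print(ratios)                                     # {'bench_a': 4.0, 'bench_b': 2.0, 'bench_c': 4.0}
print(round(geometric_mean(ratios.values()), 3))  # 3.175

# Hypothetical weights for a customer whose workload resembles bench_b.
weights = {"bench_a": 1, "bench_b": 3, "bench_c": 1}
print(round(weighted_geometric_mean(ratios, weights), 3))  # 2.639
```

The overall figure (3.175) hides that this system is only twice as fast as the reference on bench_b; a customer who weights that benchmark higher arrives at a noticeably lower number.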
3.2 Evolution of the SPEC CPU Benchmarks, Issues
Despite the consistency of the measurement method, a number of new elements were brought into the suite over the years. Both the SPEC-provided tool harness and the Run Rules regulating conformant executions of the suite grew in complexity. The most important single change was the introduction of the “baseline” metric in 1994. The idea is now generally accepted that it makes sense to have, in addition to the “everything goes (except source code changes)” of the “peak” results, a “baseline” result. However, the details are often controversial; they emerge as a compromise in SPEC’s CPU subcommittee. The question “What is the philosophy behind baseline?” may generate different answers if different participants are asked. A possible answer could be:
Baseline rules serve to form a “baseline” of performance that takes into account
- reasonable ease of use for developers,
- correctness and safety of the generated code,
- recommendations of the compiler vendor for good performance,
- representativity of the compilation/linkage process for what happens in the production of important software packages.
Again, it cannot be disputed that in some cases, individual points may contradict each other: What if the vendor of a popular compiler sets, for performance reasons, the default behavior to a mode that does not implement all features required by the language definition? This usage mode may be very common (most users lack the expertise to recognize such cases anyway), but correctness of the generated code, as defined by the language standard, is not guaranteed. Since SPEC releases the CPU benchmarks in source code form, correct implementation of the programming language as defined by the standard is a necessary requirement for a fair comparison.
3.3 SPEC System Benchmarks
SPEC OSG started with CPU benchmarks but very soon also developed benchmarks that measured system performance. The areas in which SPEC chose to put effort were determined by a perception of the market demands as seen by the SPEC OSG member companies. For example, when the Internet and Java gained popularity, SPEC OSG soon developed its Web and JVM benchmarks. Currently, Java on servers and mail servers are seen as hot topics; therefore, Java server benchmarks and mail server benchmarks are areas where SPEC member companies invest considerable effort in the development of new benchmarks. There are also areas where SPEC’s efforts were unsuccessful: SPEC worked for some time on an I/O benchmark but finally could not find a practical middle way between raw device measurements and system-specific I/O library calls. (These efforts apparently are now being taken up by a separate industry organization, the “Storage Performance Council”, see www.storageperformance.org.) The only area intentionally left out by SPEC is transaction processing, the traditional domain of the sister benchmarking organization TPC.
It is important to realize that SPEC’s system benchmarks are both broader and narrower than the component (CPU) benchmarks:
- They are broader than the CPU benchmarks because they test more components of the system, typically including the OS, networking, and, for some benchmarks, the I/O subsystem.
- They are narrower than the CPU benchmarks because they test the system when it is executing specific, specialized tasks only, e.g. acting as a file server, a web server or a mail server.
This narrower scope of some system benchmarks is not unrelated to real-life practice: Many computers are used exclusively as file servers, web servers, database servers, mail servers, etc. Therefore it makes sense to test them in such a limited scenario only.
Most system benchmarks present results in the form of a table or a curve, giving, for example, the throughput correlated with the response time. This corresponds to SPEC’s philosophy “Look at all the numbers”, just as the CPU benchmark suites give the results for each individual benchmark. However, SPEC is realistic enough to know that the market often demands a “single figure of merit”, and has defined such a number for each benchmark (e.g. maximum throughput, throughput at or below a specific response time). Table 5 lists SPEC’s current system benchmarks, with the number of publications over the years.
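The two kinds of single figures of merit mentioned above can both be derived from a measured throughput/response-time curve. This is a sketch only: the curve points, the units, and the 10 ms bound are hypothetical, and each real benchmark defines its own metric precisely in its run rules.

```python
from typing import List, Optional, Tuple

# (throughput in operations per second, average response time in ms); hypothetical data.
Curve = List[Tuple[float, float]]


def max_throughput(curve: Curve) -> float:
    # Figure of merit 1: the highest throughput reached at any load level.
    return max(throughput for throughput, _ in curve)


def throughput_within(curve: Curve, max_response_ms: float) -> Optional[float]:
    # Figure of merit 2: the highest throughput whose response time meets the bound.
    eligible = [throughput for throughput, rt in curve if rt <= max_response_ms]
    return max(eligible) if eligible else None


curve = [(1000, 2.1), (2000, 2.8), (3000, 4.5), (3800, 9.7), (4100, 31.0)]
print(max_throughput(curve))           # 4100
print(throughput_within(curve, 10.0))  # 3800
```

The two numbers differ because the last load point reaches higher throughput only at an unacceptable response time; which of them a benchmark reports is a design decision of its run rules.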
Table 5. SPEC OSG system benchmark results over the years

              SDM      SFS     SPEC     SPEC     SPEC     SPEC     SPEC
                              web96    web99    jvm98  jbb2000 mail2001
1991           51
1992           17
1993            5       21
1994            6       18
1995           15       21
1996                    14       22
1997                    36       50
1998                    19       80                21
1999                    96       61        5       26
2000                    62       12       49       23       22
2001                 25+34                57        4       57        6
Q1/2002                 21                13        3       21        3
Overall        94      215      225      124       77      100        9
Some general differences from the CPU benchmarks are:
- The workload is, almost inevitably, synthetic. Whereas SPEC avoids synthetic benchmarks for the CPU suites, workloads for file server or web server benchmarks cannot be derived from “real life” without extensive trace capturing/replay capabilities that make use in benchmarks impractical. On the other hand, properties that are important for system benchmarks, e.g. file sizes, can be modeled more easily in synthetic workloads.
- The results are more difficult to understand; therefore the benchmarks are possibly not as well known and not as popular as the CPU benchmarks.
3.3.1 SDM Benchmark
SPEC’s first system benchmark suite, released in 1991, was SDM (System Development Multiuser). It consists of two individual benchmarks that differ in some aspects (workload ingredients, think time) but have many properties in common: Both have a mix of general-purpose Unix commands (ls, cd, find, cc, ...) as their workload. Both use scripts or simulated concurrent users that put more and more stress on the system, until the system becomes overloaded and the addition of users results in a decrease in throughput.
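The SDM measurement principle of adding users until throughput drops can be sketched as a simple ramp loop. The `toy_model` function is a hypothetical stand-in for running the workload with a given number of simulated users; a real harness would measure the system under test instead, and the step size and limit are arbitrary choices.

```python
from typing import Callable, List, Tuple


def ramp_until_overload(measure: Callable[[int], float], start: int = 10,
                        step: int = 10, limit: int = 1000) -> List[Tuple[int, float]]:
    """Add simulated users step by step until throughput drops (or the limit is hit)."""
    points: List[Tuple[int, float]] = []
    users = start
    while users <= limit:
        throughput = measure(users)
        points.append((users, throughput))
        if len(points) >= 2 and throughput < points[-2][1]:
            break  # more users, less throughput: the system is overloaded
        users += step
    return points


def toy_model(users: int) -> float:
    # Hypothetical system: scales linearly up to 80 users, then thrashes.
    return min(users * 20, 1600) - max(0, users - 80) * 15


points = ramp_until_overload(toy_model)
peak = max(points, key=lambda p: p[1])
print(peak)        # (80, 1600)
print(points[-1])  # (90, 1450)
```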
SDM is a good example of the experience that a good performance measurement tool is not necessarily a good benchmark: An in-house tool needs no provisions against unintended optimizations (“cheating”); the user would only cheat himself or herself. A benchmark whose results are used in marketing must have additional qualities: It must be tamper-proof against the substitution of fast, special-case, benchmark-specific components where normally other software components would be used.
3.3.2 SFS Benchmark
The SFS benchmark (System benchmark, File Server, first release 1993) is SPEC’s first client-server benchmark and has established a method that was later used for other benchmarks also: The benchmark code runs solely on the clients; they generate NFS requests like lookup, getattr, read, write, etc. Unix is required for the clients, but the server, the system under test, can be any server capable of accepting NFS
Despite the large investment necessary for the test sponsor (the setup for a large server may include as many as 480 disks for the server, and as many as 20 load-generating clients), there has been a steady flow of result publications, and SFS is the established benchmark in its area. In 1997, SFS 1.0 was replaced by SFS 2.0. The newer benchmark covers the NFS protocol version 3, and it has a newly designed mix of operations.
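The client side of an SFS-style benchmark draws each NFS request from a predefined operation mix. The sketch below assumes a hypothetical mix (the percentages are not those of SFS 2.0) and only generates operation names; it does not send any NFS traffic. It uses a seeded pseudo-random generator, which also hints at why the quality of the random process matters: the delivered mix is only as faithful as the generator that draws it.

```python
import random
from collections import Counter
from typing import Dict, List

# Hypothetical operation mix in percent; NOT the actual SFS 2.0 mix.
OP_MIX: Dict[str, int] = {"lookup": 30, "getattr": 20, "read": 25, "write": 15, "readdir": 10}


def generate_requests(n: int, mix: Dict[str, int], seed: int = 1) -> List[str]:
    # A seeded generator makes runs reproducible; the quality of this
    # random process determines how closely the delivered mix matches `mix`.
    rng = random.Random(seed)
    ops = list(mix)
    return rng.choices(ops, weights=[mix[op] for op in ops], k=n)


requests = generate_requests(100_000, OP_MIX)
counts = Counter(requests)
for op in OP_MIX:
    print(f"{op:8s} {100 * counts[op] / len(requests):5.1f}%")
```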
In summer 2001, prompted by observations during result reviews, SPEC discovered significant defects in its SFS97 benchmark suite: Certain properties built into the benchmark (periodic changes between high and low file system activity, distribution of file accesses, numeric accuracy of the random process selecting the files that are accessed) were no longer guaranteed with today’s fast processors. As a consequence, SPEC suspended sales of the SFS97 (2.0) benchmark and replaced it with SFS97 R1 V3.0. Result submissions had to start over, since results measured with the defective benchmark cannot be used without considerable care in interpretation.
3.3.3 SPECweb Benchmarks
The SPECweb benchmark (Web server benchmark, first release 1996) was derived from the SFS benchmark, and it has many properties in common with SFS:
- It measures the performance of the server and tries to do this, as much as possible, independently of the clients’ performance.
- A synthetic load is generated on the clients, generating HTTP requests. In the case of SPECweb96, these were static GET requests only, the most common type of HTTP requests at that time. SPECweb99 added dynamic requests (dynamic GET, dynamic POST, simulation of an ad rotation scheme based on cookies).
- The file size distribution is based on logs from several large web sites; the file set size is required to scale with the performance.
Unlike SFS, the benchmark code can run on NT client systems as well as on Unix client systems. As with SFS, the server can be any system capable of serving HTTP requests. As is apparent from the large number of result publications (see Table 5), SPEC’s web benchmarks have become very popular.
When it became evident that electronic commerce now constitutes a large percentage of WWW usage, and that this typically involves use of a “Secure Sockets Layer” (SSL) mechanism, SPEC responded with a new Web benchmark, SPECweb99_SSL. In the interest of a fast release of the benchmark, SPEC did not change the workload but just added SSL usage to the existing SPECweb99 benchmark. A new SSL handshake is required whenever SPECweb99 terminates a “keep alive” connection (on average, every tenth request). Of course, it makes no sense to compare SPECweb99_SSL results with (non-SSL) SPECweb99 results. An exception can be results that have been obtained for the same system; they can help to evaluate the performance impact of SSL encryption on the server.
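The relation between keep-alive termination and handshake frequency can be checked with a small simulation. It assumes, as a simplification, that the server closes a keep-alive connection after each request independently with probability 0.1, which gives connections an average length of ten requests; SPECweb99’s actual connection model is not reproduced here.

```python
import random


def count_handshakes(n_requests: int, close_probability: float = 0.1, seed: int = 42) -> int:
    """Count SSL handshakes for a stream of requests over keep-alive connections.

    Simplifying assumption: after each request the server closes the
    connection independently with probability `close_probability`.
    """
    rng = random.Random(seed)
    handshakes = 0
    connected = False
    for _ in range(n_requests):
        if not connected:
            handshakes += 1  # full handshake to (re)establish the connection
            connected = True
        if rng.random() < close_probability:
            connected = False  # keep-alive connection terminated after this request
    return handshakes


n = 100_000
h = count_handshakes(n)
print(f"{h} handshakes for {n} requests, one per {n / h:.1f} requests")
```

Because each handshake involves expensive public-key operations, this roughly one-in-ten ratio largely determines how much SSL costs the server in such a benchmark.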
3.3.4 SPEC Java Benchmarks
In 1997/1998, when it became clear that Java performance was a hot topic in industry, SPEC felt compelled to produce, within a relatively short time, a suite for Java benchmarking. Previous benchmark collections in this area (CaffeineMark etc.) had known weaknesses that made them vulnerable to specific optimizations. The first SPEC Java benchmark suite followed the pattern of the established CPU benchmarks: a collection of programs, taken from real applications where possible, individual benchmark performance ratios, and the geometric mean over all benchmarks as a “single figure of merit”. In the design of the benchmark suite and the run rules for the suite, several new aspects had to be dealt with:
- Garbage collection performance, even though it may occur at unpredictable times, is important for Java performance.
- Just-In-Time (JIT) compilers are typically used with Java virtual machines, in particular for systems for which the manufacturer wants to show good performance.
- During the first years, Java was typically used on small client systems (e.g. within web browsers), and memory size is an important issue for such systems.
- Finally, SPEC had to decide whether benchmark execution needed to follow the strict rules of the official Java definition, or whether some sort of offline compilation should be allowed.
SPEC decided to start with a suite that requires strict compliance with the Java Virtual Machine model. A novel feature of SPECjvm98 results is their grouping into three categories according to the memory size: under 48 MB, 48-256 MB, over 256 MB. The experience seems to show that the test sponsors (mostly hardware manufacturers) followed SPEC’s suggestion and produced results not only for big machines but also for “thin” clients.
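Grouping results into the three SPECjvm98 memory categories is straightforward. The sketch below assumes that each result is tagged with the system’s memory size in MB and that systems with exactly 48 MB or exactly 256 MB fall into the middle category (the text does not settle the boundaries). System names and scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def memory_category(memory_mb: float) -> str:
    # Assumption: exactly 48 MB and exactly 256 MB both fall into the middle category.
    if memory_mb < 48:
        return "under 48 MB"
    if memory_mb <= 256:
        return "48-256 MB"
    return "over 256 MB"


def group_results(results: List[Tuple[str, float, float]]) -> Dict[str, List[Tuple[str, float]]]:
    # results: (system name, memory in MB, score); all hypothetical.
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for name, memory_mb, score in results:
        groups[memory_category(memory_mb)].append((name, score))
    # Rank results within each category, best score first.
    return {cat: sorted(entries, key=lambda e: e[1], reverse=True) for cat, entries in groups.items()}


results = [
    ("thin client A", 32, 11.8),
    ("desktop B", 128, 24.5),
    ("desktop C", 64, 19.0),
    ("server D", 1024, 41.2),
]
for category, entries in group_results(results).items():
    print(category, entries)
```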
During the last years, Java became more popular as a programming language not only for small (client) systems, but also for servers. SPEC responded with the SPECjbb2000 Java Business Benchmark, which has become quite popular. Manufacturers of server systems now use it to demonstrate the capabilities of their high-end servers. The Java code executed in SPECjbb2000 resembles TPC’s TPC-C benchmark (warehouses, processing of orders; see section 3.4), but the realization is different: Instead of real database objects stored on disks, Java objects (in memory) are used to represent the benchmark’s data; therefore Java memory administration and garbage collection play an important role for performance. Overall, the intention is to model the middle tier in three-tier software systems; such software is now often written in Java.
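The architectural idea behind SPECjbb2000, business data held as objects in memory rather than as rows in a database on disk, can be illustrated with a minimal sketch. SPECjbb2000 itself is written in Java; this Python version, its class names, and its order-processing logic are hypothetical and only show why object allocation and memory management, rather than disk I/O, dominate such a workload.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Order:
    order_id: int
    lines: Dict[int, int]  # item id -> quantity ordered


@dataclass
class Warehouse:
    """Hypothetical in-memory stand-in for what TPC-C keeps in database tables."""
    stock: Dict[int, int]  # item id -> quantity on hand
    orders: List[Order] = field(default_factory=list)
    next_order_id: int = 1

    def new_order(self, lines: Dict[int, int]) -> Order:
        # Validate every line first so that a rejected order leaves stock untouched.
        for item, qty in lines.items():
            if self.stock.get(item, 0) < qty:
                raise ValueError(f"insufficient stock for item {item}")
        for item, qty in lines.items():
            self.stock[item] -= qty
        order = Order(self.next_order_id, dict(lines))
        self.orders.append(order)  # new objects accumulate in memory, not on disk
        self.next_order_id += 1
        return order


w = Warehouse(stock={1: 10, 2: 5})
w.new_order({1: 3, 2: 1})
print(w.stock)        # {1: 7, 2: 4}
print(len(w.orders))  # 1
```

Every processed order creates new objects that stay reachable until they are discarded, so in a Java implementation the heap size and the garbage collector, not the disks, set the pace.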
3.4 TPC Benchmarks
Like SPEC, the Transaction Processing Performance Council (TPC), founded in 1988, is a non-profit corporation with the mission to deliver objective performance evaluation standards to the industry. However, TPC focuses on transaction processing and database benchmarks. Another important difference from the SPEC OSG benchmarks is the fact that TPC, from its beginning, included a price/performance metric and made price reporting mandatory (see section 2.2). The most widely used benchmark of this organization, and therefore the classic system benchmark, is TPC-C, published in 1992, an OLTP benchmark simulating an order-entry business scenario.
Since TPC does not define its benchmarks via source code, the database vendors, often in cooperation with the major system vendors, typically produce a “benchmark kit” which implements the benchmark for a specific hardware/software configuration. It is then the task of the TPC-certified auditor to check that the kit implements the benchmark correctly, in addition to verifying the performance data (throughput, response times). An important requirement checked by the auditor, and one that is unique to TPC, is compliance with the so-called “ACID properties”:
- Atomicity: The entire sequence of actions must be either completed or aborted.
- Consistency: Transactions take the resources from one consistent state to another consistent state.
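Atomicity and consistency can be demonstrated with any transactional database. The following sketch uses Python’s built-in `sqlite3` module (not a TPC benchmark kit); the `account` table and its non-negative balance rule are hypothetical. A transfer first credits the target account and then debits the source; if the debit violates the consistency rule, the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Credit dst, then debit src, as one transaction."""
    with conn:  # commits on success; rolls back if an exception escapes the block
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))


conn = sqlite3.connect(":memory:")
# Consistency rule (hypothetical): balances may never become negative.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)       # succeeds: both updates are committed
try:
    transfer(conn, 1, 2, 500)  # the debit violates the CHECK constraint ...
except sqlite3.IntegrityError:
    pass                       # ... so the credit to account 2 is rolled back too

print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())  # [(1, 70), (2, 80)]
```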
Over the years, TPC-C underwent several revisions; the current major revision is 5.0. It differs from revision 3.5 in some aspects only (revised pricing rules); the algorithms remained unchanged. An earlier attempt (“revision 4.0”) to make the benchmark easier to handle (e.g. fewer disks required), and at the same time more realistic (e.g. more complex transactions), failed and did not get the necessary votes within TPC. It can be assumed that the investment that companies had made in the existing results (which would then lose their value for comparisons) played a role in this decision.
were more interested in the “baseline” version of decision support benchmarks.
Realizing that the split into two benchmarks was a temporary solution only, TPC is currently working on a common successor benchmark for both TPC-H and TPC-R.
TPC's latest benchmark, TPC-W, covers an important new area; it has been designed to simulate the activities of a business-oriented transactional Internet Web server, as it might be used in electronic commerce. Correspondingly, the application portrayed by the benchmark is a retail store on the Internet with a customer "browse and order" scenario. The figure of merit computed by the benchmark is "Web Interactions Per Second" (WIPS), for a given scale factor (overall item count). The initial problem of TPC-W seems to be its complexity, since there are many components that can influence the result:
- Web server, application server, image server, database software, all of which can come from different sources
- SSL implementation
- TCP/IP realization
- Route balancing
- Caching
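At its core, the WIPS figure of merit is a rate. The following minimal sketch shows only that arithmetic. The function name `wips` and the numbers are hypothetical. The actual TPC-W specification imposes further conditions, for instance on the interaction mix and on response times, that are not modeled here.

```python
def wips(completed_interactions: int, measurement_interval_s: float) -> float:
    """Web interactions completed per second of the measurement interval."""
    if measurement_interval_s <= 0:
        raise ValueError("measurement interval must be positive")
    return completed_interactions / measurement_interval_s

# Hypothetical run: 540,000 interactions completed in a 30-minute interval.
print(wips(540_000, 30 * 60))  # 300.0
```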
It could be due to this complexity that there are still relatively few TPC-W results (currently, as of June 2002, 13 results), far fewer than for TPC-C (79 active results for version 5). This may be an indication that, in the necessary tradeoff between representativity (realistic scenarios) and ease of use, TPC might have been too ambitious and might have designed a benchmark that is too costly to measure. Also, the benchmark does not have an easy-to-understand, intuitive result metric. SPEC's "HTTP ops/sec" (in SPECweb96) may be unrealistic if one looks closer at the definition, but it at least appears more intuitive than TPC-W's "WIPS" (Web Interactions Per Second). In addition, when the first results were submitted, it became clear that the benchmark needs more rules with respect to the role of the various software layers (Web server, application server, database server).
3.5 SAP Benchmarks
The SD and ATO business scenario is that of a supply chain: A customer order is placed, the delivery of goods is scheduled and initiated, and an invoice is written. In SD, an order comprises five simple and independent items from a warehouse. In ATO, an individually configured and assembled PC is ordered, which explains the differences in complexity. The sequence of SAP transactions consists of a number of dialog steps or screen changes. By means of a benchmark driver, the benchmarks simulate concurrent users passing through the respective sequence with 10 seconds of think time after each dialog step. After all virtual users have logged into the SAP system and started working in a ramp-up phase, the users repeat the sequence as many times as necessary to provide a steady-state measurement window of at least 15 minutes. The average response time of the dialog steps is required to be less than two seconds. In the case of SD, the number of users, the response time, and the throughput expressed in SAPS are the main performance metrics. For ATO, where the complete sequence of dialog steps is called a fully business-processed assembly order, only the throughput in terms of assembly orders per hour is reported.
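The benchmark driver described above forms a closed system: each simulated user alternates between waiting for a dialog step and thinking for 10 seconds. Seen this way, the interactive response time law of operational analysis, X = N / (R + Z), relates the number of users N, the average response time R and the think time Z to the dialog-step throughput X. The sketch below applies the law with hypothetical user counts and response times. It is an illustration only, not SAP's definition of SAPS.

```python
def closed_loop_throughput(users: int, think_time_s: float, response_time_s: float) -> float:
    """Interactive response time law for a closed system: X = N / (R + Z)."""
    return users / (response_time_s + think_time_s)

THINK_TIME_S = 10.0        # think time after each dialog step (from the text)
MAX_AVG_RESPONSE_S = 2.0   # average dialog response time must stay below this

users = 1200               # hypothetical number of simulated users
avg_response_s = 1.5       # hypothetical measured average response time

if avg_response_s >= MAX_AVG_RESPONSE_S:
    raise RuntimeError("run invalid: average dialog response time is not below 2 s")

steps_per_s = closed_loop_throughput(users, THINK_TIME_S, avg_response_s)
print(f"{steps_per_s:.1f} dialog steps/s, {steps_per_s * 3600:,.0f} per hour")
```

The 10-second think time dominates, so each user contributes fewer than 0.1 dialog steps per second even as R approaches zero. Higher throughput therefore requires more simulated users, up to the point where the average response time reaches the two-second limit.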
[Figure: Three-tier SAP benchmark configuration. Benchmark driver: PRIMEPOWER 800; R/3 central instance: PRIMERGY N800; R/3 update servers: PRIMERGY 400/N400; R/3 dialog servers: PRIMERGY H400/N400; 160 R/3 application servers in total; 64-CPU database server: PRIMEPOWER 2000 with EMC storage.]
The sponsor of a three-tier SAP benchmark is free to decide how to accommodate the processing needs of the application layer. For example, in another SAP measurement (IBM, with the p680 as database server), the application layer also consisted of large p680 systems. The benchmarker is free to choose between configurations, based, for example, on ease of handling or on the desire to mirror a typical setup for real-life client/server application environments.
It may be of interest to provide some insight into the workload on the database server, the most important factor in the three-tier case. There is a similarity to TPC-C because both benchmarks deal with database server performance. But while disk I/O is at the center of interest in TPC-C, in the SAP benchmark it is network I/O. In this regard, the two workloads complement each other very nicely. In the benchmark setup summarized in Figure 4, there were 65,000 network roundtrips, or 130,000 network I/Os, per second. Five out of the 64 CPUs of the database server were assigned to handle the interrupt processing for this traffic between application layer and database. The network stack was standard TCP/IP, a mature industry standard.
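The figures quoted for the database server can be checked with a few lines of arithmetic. The sketch assumes two network I/Os per roundtrip (one receive and one send), as the text implies. For illustration only, it also assumes that the interrupt load is spread evenly over the five dedicated CPUs.

```python
ROUNDTRIPS_PER_S = 65_000
IOS_PER_ROUNDTRIP = 2        # assumption: one receive plus one send per roundtrip
DB_SERVER_CPUS = 64
INTERRUPT_CPUS = 5

network_ios_per_s = ROUNDTRIPS_PER_S * IOS_PER_ROUNDTRIP
ios_per_interrupt_cpu = network_ios_per_s / INTERRUPT_CPUS   # assumes an even spread
interrupt_cpu_share = INTERRUPT_CPUS / DB_SERVER_CPUS

print(f"{network_ios_per_s:,} network I/Os per second")
print(f"{ios_per_interrupt_cpu:,.0f} I/Os per second per interrupt-handling CPU")
print(f"{interrupt_cpu_share:.1%} of the database server's CPUs")
```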
4 Benchmarking and Computer Research
It is well known that several benchmarks, in particular the SPEC CPU benchmarks, are used extensively in the manufacturers' development labs; we shall discuss this aspect in Section 5. A critical look at recent computer science conferences or journals, especially in the area of computer architecture and compiler optimization, shows that this phenomenon is not restricted to manufacturers; it appears in academic research also. Table 6 shows a snapshot for several major conferences in 2000/2001, listing how often benchmarks were used in conference papers (note that some papers use more than one benchmark program collection).
Table 6. Use of benchmark collections in conference papers

                                        ASPLOS      SIGPLAN     SIGARCH
                                        Nov. 2000   June 2001   June 2001
Overall number of papers                   24          30          24
SPEC CPU (92, 95, 2000)                     8           4          17
SPEC JVM98                                  1           4           1
SPLASH benchmarks (parallel systems)        2           -           -
Olden benchmarks (pointer, memory)          -           -           3
OLTP / TPC                                  1           -           1
Various other program collections          11          13           8
No benchmark used                           6           9           3
Looking at Table 6, we can say:
- Benchmarks that are composed of several individual components (SPEC CPU, SPEC JVM98, Olden, SPLASH) are particularly popular. These are also the benchmarks that are relatively easy to run, compared with others.
- For specific research topics, some benchmark collections that emphasize a particular aspect are popular: SPEC JVM98 for Java, the parallelizable SPLASH codes for research on parallel processing, and the Olden benchmark collection of pointer- and memory-intensive programs for research on pointers and memory hierarchy.
- The SPEC CPU benchmarks are the most popular benchmark collection overall.
- Very few papers base quantitative data on OLTP workloads.
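The claim that the SPEC CPU benchmarks are the most popular collection can be checked directly against Table 6. The sketch below sums the named collections across the three conferences and treats "-" as zero. It leaves out the aggregate rows "Various other program collections" and "No benchmark used". The variable names are illustrative.

```python
# Rows of Table 6: (ASPLOS Nov. 2000, SIGPLAN June 2001, SIGARCH June 2001); "-" -> 0.
PAPERS = (24, 30, 24)
NAMED_COLLECTIONS = {
    "SPEC CPU (92, 95, 2000)": (8, 4, 17),
    "SPEC JVM98": (1, 4, 1),
    "SPLASH benchmarks": (2, 0, 0),
    "Olden benchmarks": (0, 0, 3),
    "OLTP / TPC": (1, 0, 1),
}

totals = {name: sum(counts) for name, counts in NAMED_COLLECTIONS.items()}
all_papers = sum(PAPERS)
most_used = max(totals, key=totals.get)

for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<25} {total:3d} papers ({total / all_papers:.0%})")
print("most used:", most_used)
```

SPEC CPU appears in 29 of the 78 papers (about 37%). The next named collection, SPEC JVM98, appears in 6.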
Given the importance of OLTP, TPC- and SAP-type workloads in commercial computing, one can ask whether academic research neglects to give guidance to computer developers as far as these environments are concerned. Fortunately, specialized workshops like the "Workshop on Computer Architecture Evaluation using Commercial Workloads", held in conjunction with the "Symposium on High Performance Computer Architecture" (www.hpcaconf.org), fill this gap.
Such a statement is considered proof that the idea has value and is relevant for successor projects, research grants, and academic promotion.
This tendency, both in manufacturers' development labs and in academic research, places a responsibility on benchmark development groups that can be frightening: Sometimes, aspects of benchmarks become important for design optimizations that were not yet thought of, or never discussed, when the benchmarks were selected. Suddenly, the benchmarks influence not only the comparison of today's computers but also the design of tomorrow's computers. For example, when the earlier CPU benchmark suites were put together, SPEC looked mainly at the source codes. Now, with techniques like feedback optimization and value prediction becoming more popular in research and possibly also in state-of-the-art compilers, one also has to look much more closely at the input data that are used with the benchmarks: Are they representative of typical problems in the area covered by the benchmarks? Do they encourage optimization techniques that have an over-proportional effect on benchmarks, as opposed to normal programs?
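A deliberately simple last-value predictor shows why input data matter for such techniques. It is one basic form of value prediction: it guesses that an instruction produces the same value as last time. The sketch is a toy model, not any particular research proposal or compiler, and both input sequences are hypothetical. The predictor is identical in both runs, yet its hit rate is determined entirely by the data.

```python
from typing import Sequence

def last_value_hit_rate(values: Sequence[int]) -> float:
    """Share of values a last-value predictor guesses correctly.

    The predictor always predicts the previous value; the first value has
    no history and counts as a misprediction."""
    if not values:
        return 0.0
    hits = sum(1 for prev, cur in zip(values, values[1:]) if prev == cur)
    return hits / len(values)

# The same predictor, fed two hypothetical input data sets.
repetitive_input = [7, 7, 7, 7, 7, 7, 7, 3]
varied_input = [7, 2, 9, 4, 1, 8, 5, 3]
print(last_value_hit_rate(repetitive_input))  # 0.75
print(last_value_hit_rate(varied_input))      # 0.0
```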
This author once attended an ASPLOS (Architectural Support for Programming Languages and Operating Systems) conference and made a critical remark about some papers that relied, in his opinion, too much on the SPEC CPU benchmarks. He was asked: "Do you mean that we should not use the SPEC benchmarks?" This, of course, would mean to "throw out the baby with the bathwater". The result was a short contribution in a newsletter widely read by academic computer architects [12]. It asked researchers to continue using the SPEC CPU benchmarks, but to use all of them if possible, and to report all relevant conditions (e.g., compiler flags). The main request was not to take benchmarks blindly as given, but to include a critical discussion of those properties of the benchmarks that may be relevant for the architectural feature under study. For example, [14] shows that the program "HEALTH", one of the often-used "Olden Benchmarks" supposedly representative of linked-list data structures, is algorithmically so inefficient that any performance evaluations based on it are highly questionable. Surprisingly few research papers discuss the value of SPEC's benchmarks from an independent perspective; [2] is one of them (however, it discusses the old CPU92 benchmarks). Occasionally, when surprising new results come up, online discussion groups are full of statements like "Benchmarks, and in particular the SPEC CPU benchmarks, are bogus anyway". But it is hard to find good, constructive criticism of benchmark programs. More papers that compare benchmarks with other heavily used programs would be particularly useful. In one such paper [6], the authors compare the execution characteristics of popular Windows-based desktop applications with those of selected SPEC CINT95 benchmarks. They find similarities (e.g., data cache behavior) as well as differences (e.g., indirect branches). Benchmarks can have a large influence, direct or indirect, even in academic research. It would therefore benefit both computer development and computer science research if the benchmarks got the attention and critical scrutiny they deserve.
One has to acknowledge that the three usage areas for benchmarks mentioned in the introduction can call for different selection criteria:
1. Customer information and marketing, goal: Compare today's computers on a fair basis;
2. Design in manufacturers' labs, goal: Build better computers;
3. Computer science research, goal: Develop design ideas for the long-term future.
For goal 1, it makes sense to have representatives of today's programs in a benchmark suite, including instances of "dusty deck" code. For goals 2 and 3, it would make much more sense to have only programs of tomorrow: programs with good coding style, possibly programs of a type that rarely exists today. For example, it has been observed [6] that object-oriented programs and programs that make frequent use of dynamically linked libraries (DLLs) have different execution characteristics than the classical C and Fortran programs in today's SPEC CPU suites.
5 Use of Benchmarks, Issues, and Opportunities
5.1 Benchmarks as Drivers of Optimization
It is well known, and intended and encouraged by benchmark organizations, that good benchmarks drive industry and technology forward. Examples are:
- Advances in compiler optimization encouraged by the SPEC CPU benchmarks.
- Advances in database software encouraged by the TPC benchmarks.
- Optimizations in Web servers, like caching for static requests, encouraged by the SPECweb benchmarks.
It must be emphasized that such advances in performance are a welcome and intended effect of benchmarks, and that the benchmarking organizations can take pride in these developments. However, some developments can also be counterproductive:
- Compiler writers may concentrate on optimizations that are allowed for the SPEC CPU benchmarks but that are not used by 80 or 90% of software developers: Then, the benchmarks lead to a one-sided and problematic allocation of resources. SPEC has tried to counter this with the requirement to publish "baseline" results. However, this can only be
Benchmarking 203
Let us look at the example of the SPEC CPU benchmarks, where the component benchmarks are given in source code form. Suppose some aspect of a benchmark turns out to be particularly relevant for the measured performance, but this property is not shared by important state-of-the-art programs. A particular optimization can then suggest to the naive reader a performance improvement that is unreal, i.e., that is based on the specific properties of a particular benchmark only. An issue that seems to come up repeatedly with the SPEC CPU benchmarks is a "staircase" effect of certain single benchmarks with respect to caches. Some programs, together with their input data sets, have a critical working set size: If the working set fits into a cache, performance is an order of magnitude better than when the working set size makes many memory accesses necessary, possibly combined with "cache thrashing" effects. Several SPEC CPU benchmarks (030.matrix300 in CPU89, 023.eqntott in CPU92, 173.art in CPU2000) apparently had such "magical" working set size parameters. Consequently, they showed sudden increases in their performance through compiler optimizations that managed to push the working set size below a critical, cache-related boundary. Such optimizations typically generated heated discussions inside and outside of SPEC: "Can such an increase in performance be real?" The optimizations themselves can be "legal", i.e., general enough that they are applicable to other programs also. But the specific performance gain may not be representative; it may come from a particular programming style that SPEC overlooked in its benchmark selection process.
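The "staircase" effect can be reproduced with a minimal cache simulation. The sketch assumes a fully associative cache with LRU replacement and a loop that touches its working set cyclically. Real caches, with set associativity, other replacement policies and prefetching, behave less abruptly. The sketch does not model the SPEC programs named above.

```python
from collections import OrderedDict

def lru_hit_rate(capacity_blocks: int, working_set_blocks: int, passes: int) -> float:
    """Hit rate of a fully associative LRU cache for a loop that touches
    `working_set_blocks` distinct blocks cyclically, `passes` times."""
    cache = OrderedDict()   # keys ordered from least to most recently used
    hits = accesses = 0
    for _ in range(passes):
        for block in range(working_set_blocks):
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)        # now most recently used
            else:
                cache[block] = None
                if len(cache) > capacity_blocks:
                    cache.popitem(last=False)   # evict least recently used
    return hits / accesses

for ws in (6, 7, 8, 9, 10):
    print(f"working set {ws:2d} blocks -> hit rate {lru_hit_rate(8, ws, 100):.2f}")
```

With a capacity of eight blocks, the hit rate stays at 0.99 for working sets of six to eight blocks and drops to zero at nine. Cyclic access always evicts the block that will be needed next. An optimization that shrinks the working set by a single block can therefore turn a miss-dominated run into a hit-dominated one.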
The question of representativity is not relevant only for compilers and CPU benchmarks. System benchmarks often test the performance of a layered software system. The seven-layer ISO reference model for network architectures is a good example: In the interest of good software engineering practices (modularity, encapsulation, maintainability), software is often organized in layers, each performing a specific task. On the other hand, it is well known that shortcuts through these layers usually result in better performance; the various Web server caches now used in most SPECweb results are a good example. Where should benchmark specifications draw the line when such layer-transgressing optimizations occur for the first time, on the occasion of a new benchmark result? What will be seen as an advance in technology, and what will be seen as a "benchmark special", a construct that is outlawed in most benchmarks' rules? Everyone will agree on extreme cases:
Other aspects of representativity have so far barely been touched by benchmark designers: In [9], Shubhendu Mukherjee points out that future processors might have several modes of operation in relation to cosmic rays: A "fault tolerant" mode where
5.2 Conflicting Goals for Benchmarks
The discussions in the previous sections can be subsumed under the title "Representativity". An obvious way towards achieving this goal is the introduction of new benchmarks (like SPEC's sequence of CPU benchmarks, from CPU89 to CPU2000), or at least the introduction of new rules for existing benchmarks (like TPC's sequence of "major revisions" for its TPC-C benchmark). New benchmarks can make over-aggressive optimizations irrelevant, and the expectation that benchmarks will be retired after a few years can discourage special-case optimizations in the first place. On the other hand, marketing departments and end users want benchmarks to be "stable", to be valid over many years. This leads, for example, to one of the most frequently asked questions to SPEC: "Can I convert SPECint95 ratings to SPECint2000 ratings?" SPEC's official answer, "You cannot – the programs are different", is not satisfying for marketing but necessary from a technical point of view.
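The technical reason behind SPEC's answer can be illustrated with the kind of metric the CPU suites use: a geometric mean of per-program ratios of reference time to measured time. The sketch below omits SPEC's scale factors and run rules. The two suites, their reference times and the two machines are hypothetical. Two machines with the same score on one suite obtain different scores on a suite with different programs, so no single conversion factor can exist.

```python
from statistics import geometric_mean
from typing import Mapping

def suite_score(ref_times_s: Mapping[str, float], run_times_s: Mapping[str, float]) -> float:
    """Geometric mean over the suite's programs of reference time / measured time."""
    return geometric_mean([ref_times_s[p] / run_times_s[p] for p in ref_times_s])

# Hypothetical suites (different programs, different reference times).
OLD_SUITE_REF = {"p1": 100.0, "p2": 400.0}
NEW_SUITE_REF = {"q1": 1000.0, "q2": 1000.0}

# Hypothetical measured run times of two machines on all four programs.
MACHINES = {
    "X": {"p1": 25.0, "p2": 100.0, "q1": 100.0, "q2": 250.0},
    "Y": {"p1": 50.0, "p2": 50.0, "q1": 500.0, "q2": 62.5},
}

for name, runs in MACHINES.items():
    old, new = suite_score(OLD_SUITE_REF, runs), suite_score(NEW_SUITE_REF, runs)
    print(f"machine {name}: old suite {old:.2f}, new suite {new:.2f}")
```

Both machines score 4.00 on the old suite, but 6.32 (X) and 5.66 (Y) on the new one.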
5.3 More Than Just Generators of Single Numbers: Benchmarks as Useful Tools for the Interested User
What would be the result of a TPC or SAP benchmark measurement if the CPU utilization of the server were at a level recommended for everyday use, and not as close to 100% as possible?
Manufacturers typically do not publish such measurements because they are afraid of unfair comparisons. Any manufacturer who does not use a performance-relevant optimization that is legal under the benchmark's rules risks having the result make his system appear inferior to competing systems, for which only highly optimized results are published.
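The effect of running at a lower utilization can be estimated with an elementary queueing model. The sketch assumes a single M/M/1 server with a hypothetical mean service demand of 10 ms per transaction. Real TPC or SAP configurations are networks of queues, so the numbers only show the shape of the trade-off. Throughput falls linearly with utilization (U = X * S), while the mean response time R = S / (1 - U) grows sharply as U approaches 100%.

```python
def mm1_operating_point(service_time_s: float, utilization: float):
    """Throughput and mean response time of an M/M/1 server at a given utilization."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    throughput = utilization / service_time_s             # utilization law: U = X * S
    response_time = service_time_s / (1.0 - utilization)  # M/M/1 mean response time
    return throughput, response_time

SERVICE_TIME_S = 0.010  # hypothetical mean service demand per transaction

for u in (0.60, 0.70, 0.90, 0.99):
    x, r = mm1_operating_point(SERVICE_TIME_S, u)
    print(f"U = {u:4.0%}: X = {x:5.1f} tx/s, R = {r * 1000:6.1f} ms")
```

In this model, moving from 99% to 70% utilization costs about 29% of the throughput. In exchange, the mean response time falls from 1 s to about 33 ms.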
At least for easy-to-use benchmarks like SPEC's CPU and SPECjbb2000 benchmarks, valuable insights could be gained by publishing more results than those promoted for marketing purposes. With appropriate documentation, readers could find answers to questions like:
- What is the performance gain for a new processor architecture if programs are not recompiled? This often happens with important ISV codes.
- What is the performance loss if a PC uses RAM components that are slower but cheaper?
- For different CPU architectures, how much do they depend on sophisticated compilation techniques, and how well can they deliver acceptable performance in environments that do not include such techniques?
It would be unrealistic to expect answers to such questions from vendor-published benchmark results; vendors cannot afford to publish anything but the best results. But if benchmarks are well designed, representative, and portable, as they should be, then, with some effort from informed user organizations, researchers, or magazines, they could be used for much more than what is visible today.
References
1 Introduction
The explosive growth in size and usage of the Web is causing enormous strain on users, network service providers, and content providers. Sophisticated software components have been implemented for the provision of critical services through the Web. Consequently, many research efforts have been directed toward improving the performance of Web-based services through caching and replication solutions. A large variety of novel content delivery architectures, such as distributed Web-server systems, cooperative proxy systems, and content distribution networks, have been proposed and implemented [35].
One of the key issues is the evaluation of the performance and scalability of these systems under realistic workload conditions. In this tutorial, we focus on the use of benchmarking models and tools during the design, testing, and comparison of alternative locally and geographically distributed systems for highly accessed Web sites. We discuss the properties that a benchmarking tool should provide in terms of various parameters: applicability to distributed Web-server systems, realism of the workload, and significance of the output results. The analysis is also influenced by the availability of the source code and the customizability of the workload model. We analyze popular products that are free or available at nominal cost and provide source code: httperf [32], SPECweb99 (including the version supporting SSL encryption/decryption) [38,39], SURGE [7,8], S-Clients [6], TPC-W [41], WebBench [45], Web Polygraph [42], and WebStone [30]. For this reason, we do not consider commercial tools (e.g., Technovations' Websizr [40], Neal Nelson's Web Server Benchmark [34]), which are more expensive and typically unavailable to the academic community, although they provide richer functionality. Other benchmarking tools that come from research (e.g., Flintstone [15], WAGON [24]) have not been included because they are not publicly available.
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 208–235, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Benchmarking Models and Tools for Distributed Web-Server Systems 209
We can anticipate that none of the examined tools is specifically oriented to testing distributed Web-server systems, and only a minority of them reproduce the load imposed by a modern user session. Many existing benchmarks prefer to test the maximum capacity of a Web server by requesting objects as quickly as possible or at a constant rate. Others reproduce user session behavior more realistically (multiple requests for Web pages separated by think times), but they cover the request and delivery of static content only. This result is rather surprising if we consider that the variety and complexity of the offered Web-based services require system structures that are quite different from the typical browser/server solutions of the early days of the Web. The increasing demand for dynamic requests, multimedia services, e-commerce transactions, and security is typically met by multi-tier distributed systems. These novel architectures have considerably complicated the user and client interactions with a Web system, which range from simple browsing to elaborate sessions involving queries to application and database servers. A user request can also be subjected to many manipulations, from cookie-based identification to tunneling, caching, and redirection. Moreover, an increasing number of Web services and content are subject to security restrictions and secure communication channels involving strong authentication, which is becoming common practice in the e-business world. Since distributed Web-server systems typically provide dynamic and secure services, a modern benchmarking tool should model and monitor the complex interactions occurring between clients and servers. No such tool seems to be publicly available to the academic community.
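The difference between the two load models can be made concrete with a small schedule generator. The sketch is hypothetical and is not taken from any of the tools listed above. One function issues requests at a constant rate. The other produces user sessions whose page requests are separated by exponentially distributed think times. For simplicity, it ignores server response times, whereas a real session-based generator would start each think time only after the previous response has arrived.

```python
import random
from typing import List, Tuple

def constant_rate_schedule(rate_per_s: float, n_requests: int) -> List[float]:
    """Start times (s) of requests issued at a fixed rate."""
    return [i / rate_per_s for i in range(n_requests)]

def session_schedule(n_sessions: int, pages_per_session: int,
                     mean_think_s: float, mean_session_gap_s: float,
                     seed: int = 42) -> List[Tuple[int, float]]:
    """(session id, start time) of every page request, sorted by time.

    Sessions start with exponential inter-arrival times; within a session,
    consecutive page requests are separated by exponential think times.
    Server response times are ignored (simplification)."""
    rng = random.Random(seed)
    requests = []
    session_start = 0.0
    for sid in range(n_sessions):
        session_start += rng.expovariate(1.0 / mean_session_gap_s)
        t = session_start
        for _ in range(pages_per_session):
            requests.append((sid, t))
            t += rng.expovariate(1.0 / mean_think_s)
    return sorted(requests, key=lambda r: r[1])

print(constant_rate_schedule(rate_per_s=10.0, n_requests=5))
for sid, t in session_schedule(3, 4, mean_think_s=7.0, mean_session_gap_s=2.0)[:6]:
    print(f"session {sid}: page request at {t:6.2f} s")
```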
We illustrate in Fig. 1 the basic structure of a benchmarking tool for distributed Web-server systems. We assume that it is based on six main components (benchmarking goal and scope, workload characterization, content mapping on servers, workload generation, data collection, data analysis and report), which will be analyzed in detail in the following sections. The clear identification of the characteristics to be evaluated is the basis of any serious benchmarking study, which cannot expect to achieve multiple goals. Starting from this choice, the workload representation phase takes as its input the set of parameters representing a given workload configuration and produces an unambiguous Web workload specification. In the case of a distributed Web-server system, the content is not always replicated among all the servers; hence, it is important that the content mapping phase decides the assignment of the Web content among multiple front-end and back-end servers. The workload generation engine of a benchmark analyzes the workload specification and produces the offered Web workload, issuing the necessary number of requests to the Web system and handling the server responses. The component responsible for data collection considers the metrics of interest that have been chosen in the first phase of the benchmarking study and stores the corresponding measurements. Often, the whole set of measurements must be aggregated and processed in order to present meaningful results to the benchmark user. The output analysis and report component of a benchmark takes the collected data set, computes the desired statistics, and presents them to the benchmark user in a readable form.
210 M. Andreolini, V. Cardellini, and M. Colajanni
Fig. 1. Structure of a benchmarking tool for distributed Web-server systems:
scope and goals; workload characterization (configuration parameters); content
mapping (Web content and services on servers); workload generation (requests to
and responses from the distributed Web system); data collection (metrics,
measurements); data analysis and report (collected data, statistics).
After a brief description in Sect. 2 of the main architectures for locally and
geographically distributed Web-server systems, the remaining sections of this
tutorial follow the components outlined in Fig. 1. Finally, Sect. 9 concludes the
paper and summarizes some open issues for future research.
publicized with one site name to provide a single interface to users at least at
the site name level.
Figure: Web cluster for www.website.com. Components shown: clients, the Internet,
the local DNS server, the authoritative DNS server for www.website.com, the Web
switch (135.64.56.20), a LAN connecting Web servers 1 to N, and back-end servers
1 to M; static requests and responses (served from cache or disk) and dynamic
requests and responses (involving the back-end servers).
the chosen Web server. The selection of the target server can be based on the
Web service or content requested, such as URL content, SSL identifiers, and cookies.
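As an illustration of content-aware selection, the following Python sketch picks a target server from the cookies, the SSL session identifier, and the URL of a request. It is a minimal sketch under stated assumptions: the cookie name `SERVERID`, the prefix rules, and the server names are hypothetical, and a real layer-7 Web switch must first accept the TCP connection and parse the HTTP request before it can apply such rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Request:
    url: str
    cookies: Dict[str, str] = field(default_factory=dict)
    ssl_session_id: Optional[str] = None


# Hypothetical partitioning rules: URL prefix -> servers holding that content.
PREFIX_RULES: Dict[str, List[str]] = {
    "/cgi-bin/": ["ws2", "ws3"],  # dynamic services (servers with back-end access)
    "/video/": ["ws4"],           # streaming media on a specialized node
}
DEFAULT_SERVERS = ["ws1", "ws2", "ws3"]

ssl_affinity: Dict[str, str] = {}  # SSL session id -> server holding that session
_rr_counter = 0                    # shared round-robin counter


def select_server(req: Request) -> str:
    """Content-aware (layer-7) choice of the target Web server."""
    global _rr_counter
    # 1. Cookie-based affinity (hypothetical cookie name SERVERID).
    if "SERVERID" in req.cookies:
        return req.cookies["SERVERID"]
    # 2. SSL session reuse: send the client back to the server owning its session.
    if req.ssl_session_id is not None and req.ssl_session_id in ssl_affinity:
        return ssl_affinity[req.ssl_session_id]
    # 3. URL content: the longest matching prefix selects the candidate servers.
    candidates, best = DEFAULT_SERVERS, ""
    for prefix, servers in PREFIX_RULES.items():
        if req.url.startswith(prefix) and len(prefix) > len(best):
            candidates, best = servers, prefix
    # Round robin within the candidate set.
    server = candidates[_rr_counter % len(candidates)]
    _rr_counter += 1
    if req.ssl_session_id is not None:
        ssl_affinity[req.ssl_session_id] = server
    return server
```

The priority order (cookie, then SSL session, then URL) is one plausible choice; a layer-4 switch could apply none of these rules, since it sees only TCP/IP headers.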
Another important classification concerns the mechanism used by the Web
cluster to route outbound packets to the clients. In two-way architectures, both
inbound and outbound traffic pass through the Web switch. In one-way architec-
tures, only inbound packets flow through the Web switch, while outbound pack-
ets use a separate high-bandwidth network connection. A detailed description of
request routing mechanisms and dispatching algorithms for locally distributed
architectures can be found in [10].
A locally distributed system is a powerful and robust architecture from the server
point of view, but it does not solve the problems related to network delivery, such
as first- and last-mile connectivity, router overload, and peering points. An
alternative solution is to distribute the server nodes over the Internet. Compared
with clusters of nodes that reside at a single location, geographically distributed
Web-server systems can reduce the network delays experienced by clients, and also
provide high availability in the face of network failures and congestion.
For performance and availability reasons, the distribution typically takes place
at the granularity of Web clusters, that is, each geographically distributed node
consists of a cluster of servers like the one described in the previous section. We
refer to this architecture as a Web multi-cluster. It maintains one hostname
visible to the outside, as in the Web cluster case, but now each Web cluster has
a visible IP address. Hence, the request assignment process can occur in two or
more steps. The first (inter-cluster) request assignment is typically carried out by
the authoritative Domain Name Server (DNS) of the Web site, which selects the IP
address of the target Web cluster during the address lookup of the client request.
The second (intra-cluster) dispatching level is executed by the Web switch of the
target cluster, which distributes the requests reaching the cluster among the local
Web server nodes. A third (extra-cluster) dispatching level based on some request
re-routing technique may be integrated with the previous two mechanisms [11, 35].
Benchmarking Models and Tools for Distributed Web-Server Systems 213
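The two dispatching levels of a Web multi-cluster can be sketched in Python as follows. The region-to-cluster table, the documentation-range IP addresses, and the least-connections policy of the Web switch are illustrative assumptions, not mechanisms prescribed by the text or by [11, 35].

```python
from typing import Dict, List, Tuple

# Hypothetical table kept by the authoritative DNS: client region -> cluster IP.
CLUSTER_BY_REGION: Dict[str, str] = {
    "europe": "192.0.2.10",
    "america": "198.51.100.10",
}
FALLBACK_CLUSTER = "192.0.2.10"


class WebSwitch:
    """Intra-cluster dispatcher using a least-connections policy (assumed)."""

    def __init__(self, servers: List[str]) -> None:
        self.active = {s: 0 for s in servers}

    def dispatch(self) -> str:
        # Fewest active connections; ties go to the first server listed.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


SWITCHES: Dict[str, WebSwitch] = {
    "192.0.2.10": WebSwitch(["eu-ws1", "eu-ws2"]),
    "198.51.100.10": WebSwitch(["us-ws1", "us-ws2", "us-ws3"]),
}


def dns_lookup(client_region: str) -> str:
    """First (inter-cluster) level: chosen during address resolution."""
    return CLUSTER_BY_REGION.get(client_region, FALLBACK_CLUSTER)


def route(client_region: str) -> Tuple[str, str]:
    """Second (intra-cluster) level: the switch of the chosen cluster picks a server."""
    cluster_ip = dns_lookup(client_region)
    return cluster_ip, SWITCHES[cluster_ip].dispatch()
```

In a real deployment, caching of DNS answers along the resolution path means that one first-level decision may apply to many subsequent requests, which limits the control the authoritative DNS has over the load.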
4 Workload Characterization
The characterization of the workload generated by a Web benchmarking tool
is a central aspect of benchmarking and a distinguishing core feature of existing
tools, since any attempt to mimic the real-world traffic patterns observed by
Web-server systems rests on it. The generation of synthetic Web traffic is not a
trivial task, because it aims to reproduce as accurately as possible the charac-
teristics of real traffic patterns, which exhibit some unusual features such as
burstiness and self-similarity [4,12]. On the other hand, real-world workloads
are inherently irreproducible, since it is impossible to replicate the overall
conditions under which the performance testing was originally performed.
In this section, we identify the main properties on which the specification of
the workload characterization is based. Moreover, we analyze the requirements
that are specific to the benchmarking of distributed Web-server systems, compare
the identified approaches, and discuss how the existing benchmarks realize these
properties, also providing directions that we feel should be considered in the
design of benchmarking tools specific to distributed Web-server systems.
Name                      Meaning
Web page                  A collection of objects constituting a multipart document
                          intended to be rendered simultaneously; the base object is
                          fetched first from the server, then it is parsed, and all
                          embedded objects are subsequently requested
User session              A sequence of requests for Web pages (clicks) issued by the
                          same user during an entire visit to the Web site
Session length            The number of Web pages constituting a user session
Session interarrival rate The rate at which new user sessions are generated
User think time           The time between two consecutive Web page retrievals
Object sizes              The sizes of the collection of objects stored on the Web
                          system
Request sizes             The sizes of the objects transferred from the Web system
Object popularity         The relative frequency of requests made to individual
                          objects
Embedded objects          The number of objects (not counting the base object)
                          composing a single Web page
Temporal locality         How likely it is that a requested object will be requested
                          again in the near future
well as on the mapping of the synthetic content on the Web-server system that
will be analyzed in Sect. 5. As shown in Fig. 3, the generation of the stream of
Web requests falls into four main approaches.
Fig. 3. Web request stream specification (approaches: trace-based, filelist,
analytical distribution-driven, hybrid).
allows the benchmark tool to mimic the user behavior in a realistic way. However,
the conclusions drawn from the experiments depend on the representativeness of
the trace, as a trace can present workload properties that are strictly peculiar
to it and do not have general validity. Furthermore, it can be hard to adjust the
workload to imitate future conditions or varying demands.
It should also be remarked that, unlike in the early days of the Web, server
access logfiles are becoming a precious source of business and marketing infor-
mation. As a consequence, companies and organizations are not willing to give
away their traces for free (or even at all), except after years, when the realism
of these traces is doubtful at best. A further issue of the trace-based approach
concerns the reconstruction of user sessions from the trace logs, which is not a
trivial task [1]. For example, as sessions are identified through the client IP
address, clients behind the same proxy may be considered as coming from the
same machine, which may lead to an improper characterization of the Web
workload. Another issue that may complicate the reconstruction of user sessions,
especially for highly accessed Web systems, concerns the coarse time resolution
at which requests are recorded in server access logs [20].
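A minimal sketch of the session reconstruction problem follows, assuming log records already parsed into (client IP, timestamp, URL) tuples and a hypothetical 30-minute inactivity timeout (a common heuristic, not a value taken from [1]).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed inactivity threshold; 30 minutes is a common heuristic, not a standard.
SESSION_TIMEOUT = 30 * 60  # seconds


def reconstruct_sessions(
        records: List[Tuple[str, float, str]]) -> Dict[str, List[List[str]]]:
    """Group (client_ip, timestamp, url) records into sessions per client IP.

    A new session starts when two consecutive requests from the same IP are
    more than SESSION_TIMEOUT seconds apart.
    """
    by_ip: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for ip, ts, url in records:
        by_ip[ip].append((ts, url))

    sessions: Dict[str, List[List[str]]] = {}
    for ip, reqs in by_ip.items():
        # With 1-second log resolution, requests in the same second cannot be
        # ordered reliably; sorting breaks such ties by URL, which is arbitrary.
        reqs.sort()
        out: List[List[str]] = []
        current: List[str] = []
        last_ts = None
        for ts, url in reqs:
            if last_ts is not None and ts - last_ts > SESSION_TIMEOUT:
                out.append(current)
                current = []
            current.append(url)
            last_ts = ts
        if current:
            out.append(current)
        sessions[ip] = out
    return sessions
```

Because sessions are keyed by IP address, two users behind the same proxy end up merged into one session, which is exactly the mischaracterization described above.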
In the filelist-based approach, the tool provides a list of Web objects with their
access frequencies. The object sizes are typically based on the analysis of logs
from several Web sites. During the workload generation phase, the next object to
be retrieved is chosen on the basis of its access frequency. Time characteristics are
typically not taken into account, hence the stream of requests depends only on
the filelist, while the request inter-arrival time is fixed. The filelist approach lacks
flexibility with respect to the workload specification, and also ignores the
concept of user sessions. As discussed in [4,3,8,12], Web traffic is bursty, session-
oriented, and characterized by heavy-tailed distributions, which have high or
even infinite variance and therefore show extreme variability on all time scales.
To emulate these workload characteristics, it is not sufficient to mimic user
activity by requesting a set of files as quickly as possible; it is necessary to
provide some support for modeling the session-oriented nature of Web traffic.
As a consequence, a benchmark that uses just a filelist is not able to reproduce
a realistic Web workload. When using filelists, the only feasible alternative is
to provide some support for defining the characteristics of a user session (such
as user think times); otherwise the workload generator will not be able to emulate
a realistic load. Furthermore, the overall size of the file set being used should be
checked to ensure that the server caching mechanism is fully exercised.
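The filelist-based generation step can be sketched as follows; the filelist entries and the fixed inter-arrival time are hypothetical. The sketch makes the limitation discussed above visible: the output contains neither sessions nor think times, only a frequency-weighted sequence of URLs at constant spacing.

```python
import random
from typing import List, Tuple

# Hypothetical filelist: (URL, relative access frequency).
FILELIST: List[Tuple[str, float]] = [
    ("/index.html", 0.50),
    ("/img/logo.gif", 0.30),
    ("/docs/report.pdf", 0.15),
    ("/video/intro.mpg", 0.05),
]


def filelist_requests(n: int, interarrival: float,
                      seed: int = 0) -> List[Tuple[float, str]]:
    """Return n (send_time, url) pairs.

    URLs are drawn according to their access frequency; requests are spaced by
    a fixed inter-arrival time, with no notion of user sessions.
    """
    rng = random.Random(seed)
    urls = [u for u, _ in FILELIST]
    weights = [w for _, w in FILELIST]
    chosen = rng.choices(urls, weights=weights, k=n)
    return [(i * interarrival, url) for i, url in enumerate(chosen)]
```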
In the analytical distribution-driven approach, the Web workload characteris-
tics are specified by means of mathematical distributions. The requests are issued
according to the parameters of the workload model. The probability distributions
may be used to generate random values that reproduce all the characteristics of
the request stream during the execution of the benchmarking test. An alternative
is to pre-generate all user sessions and the resulting sequence of requests, and
to store them in a trace file that is then used by the workload generator. The
analytical distribution-driven approach allows a tool to define a detailed Web
workload characterization, because all features are specified through mathemat-
ical models. One may question the realism and accuracy of the workload
characterization, but changing the parameters of a distribution, or the distribution
itself, to evaluate the performance under different conditions is a really easy task.
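To show how the analytical distribution-driven approach works, the following Python sketch generates user sessions from probability distributions. The distribution families and all parameters are illustrative assumptions, not the model of any particular benchmark; Pareto distributions with shape below 2 are used because they have infinite variance, matching the heavy-tailed behavior mentioned above.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageRequest:
    start: float               # seconds from session start
    base_size: int             # bytes
    embedded_sizes: List[int]  # bytes, one entry per embedded object


def generate_session(rng: random.Random) -> List[PageRequest]:
    """One user session; all shapes and scales below are assumed values."""
    n_pages = int(rng.paretovariate(1.5))          # session length, >= 1
    t, pages = 0.0, []
    for _ in range(n_pages):
        base = int(1000 * rng.paretovariate(1.2))  # heavy-tailed size, infinite variance
        n_emb = int(rng.paretovariate(2.0)) - 1    # embedded objects, >= 0
        emb = [int(1000 * rng.paretovariate(1.2)) for _ in range(n_emb)]
        pages.append(PageRequest(t, base, emb))
        t += 5.0 * rng.paretovariate(1.4)          # user think time
    return pages


def generate_workload(n_sessions: int, session_rate: float,
                      seed: int = 1) -> List[Tuple[float, List[PageRequest]]]:
    """Sessions start as a Poisson process (an assumption) of the given rate."""
    rng = random.Random(seed)
    t, workload = 0.0, []
    for _ in range(n_sessions):
        t += rng.expovariate(session_rate)
        workload.append((t, generate_session(rng)))
    return workload
```

Evaluating a different scenario only requires changing a shape parameter or replacing one distribution call, which is the flexibility this approach offers. The same generator could also write its output to a trace file, as in the pre-generation alternative.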
The hybrid approach is a mix of the filelist and analytical techniques.
For example, the objects to be accessed may be specified through a filelist,
while session-oriented parameters, such as session lengths and user think times,
are modeled through analytical distributions. In the hybrid method, parame-
ters shaping the main characteristics of session-oriented workload are modeled
through stochastic models.
In this subsection we analyze how the selected Web benchmarks specify their
workload. We note that most benchmark tools allow us to customize and
extend the workload model in order to test different scenarios. On the other
hand, the workload configuration options of the SPECweb and TPC-W bench-
marks are quite limited, because their goal is to measure the performance of
different systems in a well-defined and standardized scenario. Obviously, we do
not penalize these benchmarks for a limitation that is intrinsic to their design.
Httperf supports two approaches to generating the request stream, namely hybrid
and trace-based [32]. Both methods enable a session-oriented workload charac-
terization and requests for both static and dynamic services. In the hybrid
approach, single or multiple URL sequences may be specified, together with
some session-oriented parameters, such as user think times. In the trace-based
approach, user sessions are defined in a trace file. The requests are issued ac-
cording to an open model. Both the HTTP/1.0 and HTTP/1.1 protocols are fully
supported, including cookies (although only one cookie per user session). Basic
SSL support is provided, including the possibility of specifying session reuse,
which is an important feature as it avoids a handshake for every client request.
Httperf also allows the specification of some realistic browser characteristics,
such as the use of multiple concurrent connections.
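The difference between the open model used by httperf and a closed model can be illustrated with a short Python sketch. It is not httperf's implementation; the rates, response times, and think times are hypothetical. In the open model the arrival schedule is fixed in advance, whereas in the closed model a slower server automatically reduces the offered load.

```python
import random
from typing import List


def open_model_starts(rate: float, duration: float, seed: int = 0) -> List[float]:
    """Open model: session start times form a Poisson process of the given rate,
    independent of how fast the server answers."""
    rng = random.Random(seed)
    starts, t = [], rng.expovariate(rate)
    while t < duration:
        starts.append(t)
        t += rng.expovariate(rate)
    return starts


def closed_model_starts(n_users: int, response_time: float, think_time: float,
                        duration: float) -> List[float]:
    """Closed model: each emulated user sends its next request only after the
    previous response plus a think time, so a slower server means fewer requests."""
    starts = []
    for _ in range(n_users):
        t = 0.0
        while t < duration:
            starts.append(t)
            t += response_time + think_time
    return sorted(starts)
```

The open model is the harder test for an overloaded server, because requests keep arriving at the specified rate even when responses slow down.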
SURGE relies on an analytically generated workload aimed at dealing with
the self-similarity of Web traffic [7,8]. The workload model derives from an
empirical analysis of Web server usage to mimic real-world traffic properties.
In SURGE, the workload is measured in terms of User Equivalent,
the basis of the system architectures, because the content is not always replicated
among all the servers.
Let us first examine the problem of mapping the Web site content onto
the Web servers in the case of a single Web server, for which we identify three
alternatives: full support, partial support, and no support.
The most attractive option for the benchmark user is full support, that is,
once the benchmark user provides the specification of the entire Web site
content (the tree of static documents as well as the set of data to be placed on
the back-end servers), the content is automatically generated and uploaded onto
the Web and back-end server disks. Partial support means that only a portion
of the Web site content (that is, static documents) is put on the server disk,
while the remaining content (that is, dynamic services) is left to the benchmark
user. If the benchmark does not provide any support for content generation and
mapping, the content must be generated and uploaded manually onto the server.
Manual generation is error-prone and often unfeasible due to the large number
of files involved. Thus, the presence of a mapping component is strongly encouraged.
Webstone provides partial support for Web content creation [30]. It is
possible to specify and generate a set of static files with given sizes, while dy-
namic content creation is left to the user. The other Web benchmarking tools,
although in some cases providing predefined Web content (SPECweb99,
WebBench), neither map the content across different Web servers nor install
it. Every decision is left to the benchmark user.
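The partial support described for Webstone, generating static files of given sizes, can be sketched in Python as follows. The file names, sizes, and filler content are hypothetical, and dynamic content is deliberately left out, as in the partial-support case.

```python
import os
import tempfile
from typing import Dict


def generate_static_content(root: str, file_sizes: Dict[str, int]) -> None:
    """Create static files of the given sizes (bytes) under root.

    Keys are relative paths such as "dir1/a.html"; content is filler bytes.
    """
    for rel_path, size in file_sizes.items():
        path = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(b"x" * size)


def demo() -> Dict[str, int]:
    """Generate a hypothetical file set in a temporary directory and report sizes."""
    spec = {"dir1/a.html": 500, "dir2/sub/b.gif": 5000, "c.html": 0}
    with tempfile.TemporaryDirectory() as root:
        generate_static_content(root, spec)
        return {p: os.path.getsize(os.path.join(root, p)) for p in spec}
```

Uploading the generated tree to the server disk, and anything related to dynamic services, remains a manual step in this sketch.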
The benchmarking of a distributed Web-server system has an additional
requirement, because the site content may be fully replicated, partially replicated,
or partitioned among the multiple server nodes. The last two configurations are
typically used to increase secondary storage scalability [10,44] or to enhance
the capabilities of specialized server nodes that provide dynamically generated
content or streaming media files. It is also important to observe that full
replication can easily be avoided only if we use a layer-7 Web switch that can
take content-aware dispatching decisions. An alternative is to use a layer-4 Web
switch combined with a distributed file system, because any selected server node
must be able to respond to client requests for any part of the Web site content.
We can easily observe that none of the selected benchmarking tools includes
any utility for full or partial content replication among multiple servers.
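A content mapping utility of the kind missing from these tools could start from a placement like the one sketched below. The round-robin placement with a configurable replication factor is an assumption chosen for simplicity; it covers partitioning (one replica), partial replication, and full replication (one replica per server).

```python
from typing import Dict, List


def map_content(files: List[str], servers: List[str],
                replicas: int) -> Dict[str, List[str]]:
    """Assign each file to `replicas` distinct servers in round-robin fashion.

    replicas == 1 partitions the content; replicas == len(servers) replicates it fully.
    """
    if not 1 <= replicas <= len(servers):
        raise ValueError("replicas must be between 1 and the number of servers")
    return {f: [servers[(i + k) % len(servers)] for k in range(replicas)]
            for i, f in enumerate(files)}


def per_server_lists(placement: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the mapping: the list of files to upload onto each server."""
    out: Dict[str, List[str]] = {}
    for f, holders in placement.items():
        for s in holders:
            out.setdefault(s, []).append(f)
    return out
```

The resulting placement is the information a content-aware (layer-7) Web switch would need; with a layer-4 switch, every server must instead be able to reach every file.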
requirements that are specific to distributed Web systems, and discuss how the
selected Web benchmarking tools behave with respect to the identified features
and requirements.
The two main features of a workload generator are the engine architecture, which
denotes the computational units used to generate Web traffic (processes or threads)
and their mutual interactions, and the coordination scheme, which defines how the
execution of the computational units is configured and synchronized.
Figure: Workload generation engine: single node (centralized) versus multiple
nodes (distributed). The diagrams show a master node coordinating client nodes
1 to K that load the distributed Web system, each client node running several
clients, without and with a collector on each client node.
In this section we analyze how the selected benchmarking tools generate the load
offered to the Web system.
Httperf generates the specified workload through a single process, implementing
an event-driven approach with non-blocking I/O [32]. As a consequence, the
workload generator keeps a single CPU constantly busy, so it is recommended
not to run more than one httperf process per CPU. Furthermore, the maximum
number of concurrent sessions is bounded by typical process limits, such as the
maximum number of open descriptors. As there is no coordination scheme,
several instances of httperf must be executed manually on distinct nodes to
scale to the desired workload; a helper utility can be used to automate this
task [29]. The workload generation engine of httperf is adequate for the
performance evaluation of distributed Web systems.
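The event-driven, non-blocking style of load generation attributed to httperf can be illustrated with Python's asyncio, where one thread multiplexes many concurrent connections. This is a sketch under assumptions, not httperf's code: the in-process stand-in server, the URL paths, and the HTTP/1.0 request format are hypothetical, and the demo runs entirely on the loopback interface.

```python
import asyncio
from typing import List


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Stand-in Web server: read the request headers, send a tiny HTTP/1.0 reply."""
    await reader.readuntil(b"\r\n\r\n")
    writer.write(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch(host: str, port: int, path: str) -> bytes:
    """One emulated request; awaiting I/O never blocks the other connections."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
    await writer.drain()
    data = await reader.read()  # read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    return data.split(b"\r\n", 1)[0]  # status line


async def run(n_requests: int) -> List[bytes]:
    """Issue n_requests concurrent requests from a single thread and event loop."""
    server = await asyncio.start_server(handle, "127.0.0.1", 0)  # port 0: any free port
    port = server.sockets[0].getsockname()[1]
    async with server:
        return list(await asyncio.gather(
            *(fetch("127.0.0.1", port, f"/obj{i}") for i in range(n_requests))))
```

For example, `asyncio.run(run(50))` issues 50 concurrent requests from one thread. As with httperf, the number of simultaneous connections is ultimately bounded by the per-process descriptor limit.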
In SURGE the client activity is modeled through User Equivalents, each of which
is represented by a thread [7,8]. A benchmarking experiment is started by
invoking a master, which spawns a predefined number of client processes. Each
client process generates a predefined number of client threads (i.e., User
Equivalents). Therefore, the SURGE architecture can be described as centralized,
multiple-process, and multiple-thread. The coordination scheme is master-
collector-client, although on a single node. Since no support is provided to
automatically distribute clients among multiple nodes, several instances of the
SURGE master have to be activated manually on distinct client nodes in order
to scale the workload.
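The master-collector-client coordination can be sketched in Python as below. For portability the sketch simplifies SURGE's structure: client processes are replaced by groups of threads within one process, and each User Equivalent only sleeps briefly instead of issuing real HTTP requests. All names are hypothetical.

```python
import queue
import threading
import time
from typing import Dict, List


def user_equivalent(uid: int, n_requests: int, results: queue.Queue) -> None:
    """One emulated user (a thread); reports one measurement per request."""
    for i in range(n_requests):
        start = time.perf_counter()
        time.sleep(0.001)  # placeholder for sending a request and awaiting the reply
        results.put((uid, i, time.perf_counter() - start))


def client(first_uid: int, n_threads: int, n_requests: int,
           results: queue.Queue) -> None:
    """Stands in for a SURGE client process: starts and waits for its users."""
    users = [threading.Thread(target=user_equivalent,
                              args=(first_uid + k, n_requests, results))
             for k in range(n_threads)]
    for u in users:
        u.start()
    for u in users:
        u.join()


def master(n_clients: int, threads_per_client: int,
           n_requests: int) -> Dict[str, float]:
    """Master: launches the clients, waits for them, then acts as collector."""
    results: queue.Queue = queue.Queue()
    clients = [threading.Thread(target=client,
                                args=(c * threads_per_client, threads_per_client,
                                      n_requests, results))
               for c in range(n_clients)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    samples: List[float] = []
    while not results.empty():
        samples.append(results.get()[2])
    return {"requests": len(samples), "mean_s": sum(samples) / len(samples)}
```

As in SURGE, everything here runs on one node; scaling out would require starting further masters by hand on other client machines.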
The workload generator of S-Clients is executed by a single process on one
client node [6]. The engine aims at generating excess load by using non-blocking
connects and closing the socket if no connection was established within a given
interval. There is no means to automatically start different workload generators
on distinct nodes; this operation has to be performed manually. Furthermore,
since timers are implemented using the rdtsc primitive [14], the ability to
generate connections at a specified rate depends heavily on the CPU speed of
the client and on the CPU type, which should be a Pentium. The most interesting
feature of S-Clients for the benchmarking of distributed Web systems is the use
The most common metrics for Web system performance are reported in Ta-
ble 2 [28].
Name                      Meaning
Throughput                The rate at which data is sent through the network
Connection rate           The number of open connections per second
Request rate              The number of client requests per second
Reply rate                The number of server responses per second
Error rate                The percentage of errors of a given type
DNS lookup time           The time to translate the hostname into the IP address
Connect time              The time interval between the initial SYN and the final
                          ACK sent by the client to establish the TCP connection
Latency time              The time interval between the sending of the last byte
                          of a client request and the receipt of the first byte
                          of the corresponding response
Transfer time             The time interval between the receipt of the first
                          response byte and the last response byte
Web object response time  The sum of latency time and transfer time
Web page response time    The sum of the Web object response times pertaining to
                          a single Web page, plus the connect time
Session time              The sum of all Web page response times and think times
                          in a user session
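The timing metrics of Table 2 can be derived from a few client-side timestamps. The following Python sketch applies the table's definitions literally (including a single connect time per page); the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectTiming:
    """Client-side timestamps (seconds) for one Web object."""
    request_last_byte: float    # last byte of the request sent
    response_first_byte: float  # first byte of the response received
    response_last_byte: float   # last byte of the response received

    @property
    def latency(self) -> float:
        return self.response_first_byte - self.request_last_byte

    @property
    def transfer(self) -> float:
        return self.response_last_byte - self.response_first_byte

    @property
    def response_time(self) -> float:
        # Web object response time = latency time + transfer time
        return self.latency + self.transfer


def page_response_time(connect_time: float, objects: List[ObjectTiming]) -> float:
    # Web page response time = sum of its object response times + connect time
    return connect_time + sum(o.response_time for o in objects)


def session_time(page_times: List[float], think_times: List[float]) -> float:
    # Session time = sum of page response times and think times
    return sum(page_times) + sum(think_times)
```

Note that summing object response times matches the definition only when objects are fetched one after another; with parallel connections, the wall-clock page time would be shorter than this sum.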
meaningless about peaks due to heavy load. This holds for throughput and re-
sponse times (specifically, object and page response times), which may exhibit
high variations from the mean value.
These performance statistics require more expensive or more sophisticated
data collection strategies, because measurements must be collected and stored
to allow the later creation of histograms. The alternative is to implement
techniques that dynamically calculate the median and other percentiles without
storing all observations [17]. Let us analyze the main approaches to the
collection strategy (that is, record storage, data set processing, and hybrid)
and to output analysis, which are closely related.
In the record storage approach, every record is stored. The generation of
meaningful statistics is entirely delegated to the output analysis. This technique
allows us to easily compute histograms and percentiles, but it requires an
enormous amount of memory. The main memory is often not sufficient, and the
use of secondary memory introduces other problems, such as delays and possible
interference with the experiment. Moreover, processing large amounts of data
tends to be resource-intensive, even if done post-mortem. Actually, a complete
collection and processing of all measurements is seldom necessary, and sampling
techniques are the best alternative when we want to use the record storage
approach.
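One sampling technique that bounds memory in the record storage approach is reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size from a stream of unknown length. The sketch below is one possible choice, not a technique prescribed by the text; the sample size and the nearest-rank percentile definition are assumptions.

```python
import math
import random
from typing import List


class ReservoirSampler:
    """Uniform random sample of at most k measurements (Algorithm R)."""

    def __init__(self, k: int, seed: int = 0) -> None:
        self.k = k
        self.seen = 0
        self.sample: List[float] = []
        self.rng = random.Random(seed)

    def add(self, value: float) -> None:
        self.seen += 1
        if len(self.sample) < self.k:
            self.sample.append(value)          # fill the reservoir first
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen)
            if j < self.k:
                self.sample[j] = value         # kept with probability k / seen

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile of the stored sample, 0 < p <= 100."""
        ordered = sorted(self.sample)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]
```

Memory stays at k values regardless of the experiment length, at the cost of percentile estimates that carry sampling error.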
In the data set processing approach, measurements are not stored directly
in a repository, but are used to keep the data set of relevant statistics up to
date. Data set processing does not use large amounts of system resources such
as CPU or memory. This is the standard way of computing performance indexes
that do not require sophisticated statistics, such as minimum, maximum, and
mean values. It would also be possible to implement techniques that dynamically
calculate the median and other percentiles without storing all observations [17],
but even these more complex computations may interfere with the experiment.
The data set may or may not coincide with the set of parameters presented as
final statistics. When they do not coincide, the generation of useful statistics is
partially delegated to the output analysis component, which processes the data
set at the end of the benchmarking test.
None of the previous techniques is clearly the best. However, we can observe
that sophisticated statistics are really necessary only for those metrics that are
subject to high variance. In many other cases, min, max, and mean values are
acceptable. For this reason, we also consider the hybrid approach, which is a mix
of the previous two techniques. Each measurement may be stored, processed to
keep a data set up to date, or both. This approach leads to a better trade-off
between main memory utilization and usefulness of the collected data. The per-
formance indexes that do not require sophisticated statistics may be computed
at run time, while for the other indexes we can store the corresponding
measurements and postpone their evaluation to the output analysis after the
experiment.
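A minimal Python sketch of the hybrid approach follows. Every metric is processed at run time into min, max, and mean, while metrics declared as high-variance (a choice the benchmark user must make; the metric names are hypothetical) are also stored for post-mortem percentile computation.

```python
import math
from typing import Dict, List, Set


class RunningStats:
    """Data set processing: constant-memory min, max, and mean."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.min = math.inf
        self.max = -math.inf

    def add(self, x: float) -> None:
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental mean update
        self.min = min(self.min, x)
        self.max = max(self.max, x)


class HybridCollector:
    """Running statistics for every metric; raw storage only for `store_raw`."""

    def __init__(self, store_raw: Set[str]) -> None:
        self.stats: Dict[str, RunningStats] = {}
        self.raw: Dict[str, List[float]] = {m: [] for m in store_raw}

    def record(self, metric: str, value: float) -> None:
        self.stats.setdefault(metric, RunningStats()).add(value)
        if metric in self.raw:
            self.raw[metric].append(value)

    def report(self, metric: str, percentile: float = 95.0) -> Dict[str, float]:
        s = self.stats[metric]
        out = {"n": s.n, "min": s.min, "max": s.max, "mean": s.mean}
        if metric in self.raw:  # post-mortem output analysis on stored values
            ordered = sorted(self.raw[metric])
            rank = max(1, math.ceil(percentile * len(ordered) / 100))
            out[f"p{percentile:g}"] = ordered[rank - 1]
        return out
```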
When multiple client emulators are used, it is necessary to use the data sets
and samples stored by each of them to compute the final metrics which are
with clients prior to assigning requests to the appropriate servers. On the other
hand, the latency time includes both the Web switch and the server delays, since
every client TCP segment directed to a server passes through the Web switch. In
this case, the latency time does not give sufficient information to localize a
possibly overloaded Web system component.
While one-way architectures allow an approximate evaluation of component per-
formance, this estimation is practically impossible in two-way architectures, since
both packet flows pass through the switch. Hence, the above-mentioned proce-
dure may lead to gross evaluation errors. In general, there is no way of measuring
the performance of the Web switch and of the individual servers through client
measurements. Therefore, the right approach is to enable logging at every system
component and to analyze the resulting logs at the end of the test. Monitoring
facilities and a log analyzer are required for this purpose. They should be highly
configurable, because different applications may have different logfile formats.
Analyzing log outputs may require integrating or modifying the network
application software, because the standard logs have too coarse a granularity
(e.g., 1 second in the Apache server). Moreover, the statistics obtained by
the internal monitors must be integrated with those of the benchmark reports.
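As an example of the configurable log analysis required here, the following Python sketch parses Apache's Common Log Format and computes per-second request counts and the error rate. The regular expression covers only the Common Log Format; other servers or custom formats would need other patterns. The 1-second bucketing reflects the granularity limit mentioned above.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, Iterator, Match

# Apache Common Log Format: host ident user [time] "request" status size
CLF = re.compile(r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
                 r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)')


def parse(lines: Iterable[str]) -> Iterator[Match[str]]:
    """Yield regex matches for well-formed lines, silently skipping others."""
    for line in lines:
        m = CLF.match(line)
        if m:
            yield m


def requests_per_second(lines: Iterable[str]) -> Counter:
    """Count requests per 1-second bucket, the finest resolution the format offers."""
    counts: Counter = Counter()
    for m in parse(lines):
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        counts[int(ts.timestamp())] += 1
    return counts


def error_rate(lines: Iterable[str]) -> float:
    """Fraction of parsed requests answered with a 4xx or 5xx status."""
    statuses = [int(m.group("status")) for m in parse(lines)]
    return sum(s >= 400 for s in statuses) / len(statuses) if statuses else 0.0
```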
For geographically distributed Web systems, it is necessary to measure the
time taken by the request routing mechanism, such as DNS lookup and request
redirection times.
The TPC-W specification recommends a report including graphs for the follow-
ing metrics: CPU utilization, memory utilization, page/swap activity, system
activity, and Web server statistics (number of requests and error rates per
second). No session-oriented statistics are required, but the provided graphs
should give an idea of the load conditions of the system.
from the difficulty in changing the network parameters of interest for different
test scenarios. Furthermore, it may be hard to generate a high workload using
it as discussed in [9], in which SURGE clients have been spread among different
network locations.
The majority of currently available Web benchmarking tools that operate in
high-speed LAN environments ignore the emulation of WAN conditions. Some
efforts in this direction have been made in some of the benchmarking tools
already considered (S-Clients [6], WebPolygraph [42], and SPECweb99 [38],
although quite limited in the latter) and also in WASPclient [33].
There are two main approaches that aim to emulate WAN conditions in a LAN
environment, that is, centralized and distributed. In the centralized approach,
one machine acting as a WAN emulator is interposed between the client machines
and the Web-server system to model WAN delays and packet losses by dropping
and delaying packets. S-Clients follows this approach by putting a router between
the S-Client machines and the server system, aimed at introducing an artificial
delay and dropping packets at a controlled rate [6].
In the distributed approach, each client acts as a WAN emulator by directly
delaying and dropping packets. WASPclient implements an interesting distributed
approach [33], using an extended Dummynet layer in the protocol stack of the
client machines to drop and delay packets [36]. The centralized approach is
transparent to the operating systems of both client and server machines; however,
its scalability is limited [33]. In contrast, the distributed approach provides
higher scalability, but it requires modifications to the operating system of the
client machines.
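The per-client behavior of the distributed approach can be simulated in Python as below. This is only a model of the effect (each packet is dropped with a given probability or delayed by a fixed delay plus uniform jitter); it does not configure Dummynet, and the delay, jitter, and loss values are hypothetical.

```python
import random
from typing import List, Optional, Tuple


class ClientWanEmulator:
    """Per-client WAN emulation: drop with probability `loss_rate`, otherwise
    delay by `delay` plus uniform jitter in [0, jitter] seconds."""

    def __init__(self, delay: float, jitter: float, loss_rate: float,
                 seed: int = 0) -> None:
        self.delay, self.jitter, self.loss_rate = delay, jitter, loss_rate
        self.rng = random.Random(seed)

    def transmit(self, send_time: float) -> Optional[float]:
        """Arrival time of a packet sent at send_time, or None if it is dropped."""
        if self.rng.random() < self.loss_rate:
            return None
        return send_time + self.delay + self.rng.uniform(0.0, self.jitter)


def emulate(n_packets: int, spacing: float,
            emu: ClientWanEmulator) -> Tuple[int, List[float]]:
    """Send n_packets at fixed spacing; return (lost count, arrival times)."""
    arrivals = [emu.transmit(i * spacing) for i in range(n_packets)]
    delivered = [a for a in arrivals if a is not None]
    return n_packets - len(delivered), delivered
```

Because each client owns its own emulator instance, adding clients adds emulation capacity, which is the scalability argument made above for the distributed approach.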
9 Conclusions
This study leads us to conclude that many Web benchmarking tools work fine when
used to analyze a single-server system, but none of them is able to address all
the issues related to the analysis of distributed Web-server systems. Many popular
tools, such as SURGE and Webstone, suffer from aging problems, as they do not
support dynamic requests and more recent protocols. Very few of them consider
application-level routing of requests, such as DNS-based routing, HTTP
redirection, and URL rewriting. In summary, we notice the lack of ability to
sustain realistic Web traffic under critical load conditions, the difficulty or
impossibility of emulating realistic dynamic and secure Web services, and the
poor support for analyzing collected statistics other than min, max, and mean
values. Hence, we can conclude that there is a lot of room for further research
and implementation in this area.
References
[1] M. Arlitt. Characterizing Web user sessions. ACM Performance Evaluation Re-
view, 28(2):50–63, Sept. 2000.
[2] M. Arlitt, D. Krishnamurthy, and J. Rolia. Characterizing the scalability of a
large Web-based shopping system. ACM Trans. on Internet Technology, 1(1):44–
69, Sept. 2001.
[3] M. F. Arlitt and T. Jin. A workload characterization study of the 1998 World
Cup Web site. IEEE Network, 14(3):30–37, May/June 2000.
[4] M. F. Arlitt and C. L. Williamson. Internet Web servers: Workload characteriza-
tion and performance implications. IEEE/ACM Trans. on Networking, 5(5):631–
645, Oct. 1997.
[5] H. Balakrishnan, V. Padmanabhan, S. Seshan, M. Stemm, and R. Katz. TCP
behavior of a busy Internet server: Analysis and improvements. In Proc. of IEEE
Infocom 1998, pages 252–262, San Francisco, CA, Mar. 1998.
[6] G. Banga and P. Druschel. Measuring the capacity of a Web server under realistic
loads. World Wide Web, 2(1-2):69–89, May 1999.
[7] P. Barford and M. E. Crovella. Generating representative Web workloads for net-
work and server performance evaluation. In Proc. of ACM Performance 1998/Sig-
metrics 1998, pages 151–160, Madison, WI, July 1998.
[8] P. Barford and M. E. Crovella. A performance evaluation of Hyper Text Transfer
Protocols. In Proc. of ACM Sigmetrics 1999, pages 188–197, Atlanta, May 1999.
[9] P. Barford and M. E. Crovella. Critical path analysis of TCP transactions.
IEEE/ACM Trans. on Networking, 9(3):238–248, June 2001.
[10] V. Cardellini, E. Casalicchio, M. Colajanni, and P. S. Yu. The state of the art in
locally distributed Web-server systems. ACM Computing Surveys, 34(2):263–311,
June 2002.
[11] V. Cardellini, M. Colajanni, and P. S. Yu. Geographic load balancing for scalable
distributed Web systems. In Proc. of IEEE MASCOTS 2000, pages 20–27, San
Francisco, CA, Aug./Sept. 2000.
[12] M. E. Crovella and A. Bestavros. Self-similarity in World Wide Web traffic:
Evidence and possible causes. IEEE/ACM Trans. on Networking, 5(6):835–846,
Dec. 1997.
[13] R. T. Fielding, J. Gettys, J. C. Mogul, H. F. Frystyk, L. Masinter, P. J. Leach,
and T. Berners-Lee. Hypertext Transfer Protocol – HTTP/1.1. RFC 2616, June
1999.
[14] Intel Corp. Using the RDTSC instruction for performance monitoring, July 1998.
http://cedar.intel.com/software/idap/media/pdf/rdtscpm1.pdf.
[15] A. K. Iyengar, M. S. Squillante, and L. Zhang. Analysis and characterization
of large-scale Web server access patterns and performance. World Wide Web,
2(1-2):85–100, Mar. 1999.
[16] R. Jain. The Art of Computer Systems Performance Analysis: Techniques for Ex-
perimental Design, Measurement, Simulation, and Modeling. Wiley-Interscience,
1991.
[17] R. Jain and I. Chlamtac. The P-Square algorithm for dynamic calculation of
percentiles and histograms without storing observations. Communications of the
ACM, 28(10), Oct. 1985.
[18] D. Kegel. The C10K problem, 2002. http://www.kegel.com/c10k.html.
[19] B. Krishnamurthy, J. C. Mogul, and D. M. Kristol. Key differences between
HTTP/1.0 and HTTP/1.1. Computer Networks, 31(11-16):1737–1751, 1999.
[20] B. Krishnamurthy and J. Rexford. Web Protocols and Practice: HTTP/1.1, Net-
working Protocols, Caching, and Traffic Measurement. Addison-Wesley, Reading,
MA, 2001.
[21] B. Krishnamurthy and C. E. Wills. Analyzing factors that influence end-to-end
Web performance. Computer Networks, 33(1-6):17–32, 2000.
[22] D. Krishnamurthy and J. Rolia. Predicting the QoS of an electronic commerce
server: Those mean percentiles. In Proc. of Workshop on Internet Server Perfor-
mance, Madison, WI, June 1998.
1 Introduction
Many computing systems consist of a possibly huge number of components that
not only work independently but also communicate with each other. Examples of
such systems are communication protocols, operating systems, embedded control
systems for automobiles, airplanes, and medical equipment, banking systems,
automated production systems, control systems of nuclear and chemical plants,
railway signaling systems, air traffic control systems, distributed systems and
algorithms, computer architectures, and integrated circuits.
The catastrophic consequences – loss of human lives, environmental damage,
and financial losses – of failures in many of these critical systems have compelled
computer scientists and engineers to develop techniques for ensuring that these
systems are designed and implemented correctly despite their complexity.
The need for formal methods in developing complex systems is becoming widely
accepted. Formal methods seek to introduce mathematical rigor into each stage
of the design process in order to build more reliable systems.
The need for formal methods is even more urgent when planning and
implementing concurrent and distributed systems. In fact, such systems require a huge
amount of detail to be taken into account (e.g., interconnection and synchroniza-
tion structure, allocation and management of resources, real time constraints,
performance requirements) and involve many people with different skills in the
project (designers, implementors, debugging experts, performance and quality
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 236–260, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Stochastic Process Algebra 237
another action of the same type. In this approach, patient synchronizations are
considered, i.e. the rate of the action resulting from the synchronization of two
equally typed actions of E1 and E2 is given by the minimum of the two total
rates with which E1 and E2 can execute actions of the considered type, multiplied
by the local execution probabilities of the two synchronizing actions. Following
the terminology of [36], in [26] a generative-reactive synchronization discipline
complying with the bounded capacity assumption is adopted, which is based on
the systematic use of prioritized, weighted passive actions of the form <a, ∗_{l,w}>.
The idea is that the nonpassive actions probabilistically determine the type of
action to be executed at each step, while the passive actions of the determined
type probabilistically react in order to identify the subterms taking part in the
synchronization. In order for two equally typed actions to synchronize, in this
approach one of them must be passive and the rate of the resulting action is
given by the rate of the nonpassive action multiplied by the local execution
probability of the passive action. Finally, in [40] equal instantaneous activities
can synchronize, while time passages cannot. Therefore, when both E1 and E2
can let time pass, in this approach the overall time passage is the maximum of
the two local, exponentially distributed time passages.
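The two rate formulas described above can be checked on small numbers. The following Python sketch is a hypothetical illustration (the function and parameter names are ours, not taken from [26] or [40]): it computes the rate of a synchronized action under the patient discipline (minimum of the two total rates, times the local execution probabilities) and under the generative-reactive discipline (nonpassive rate, times the local execution probability of the passive action). For the passive side we assume, for simplicity, that all reacting passive actions share the same priority level.

```python
def patient_sync_rate(rates1, rates2, i, j):
    """Rate of the synchronization of action i of E1 with action j of E2
    under the patient discipline.

    rates1, rates2: rates of all actions of the considered type that E1 and
    E2 can currently execute. The result is the minimum of the two total
    rates multiplied by the local execution probabilities of the two
    synchronizing actions.
    """
    total1, total2 = sum(rates1), sum(rates2)
    p1 = rates1[i] / total1          # local execution probability in E1
    p2 = rates2[j] / total2          # local execution probability in E2
    return min(total1, total2) * p1 * p2


def generative_reactive_sync_rate(active_rate, passive_weights, j):
    """Rate of the synchronization of a nonpassive action with passive
    action j under the generative-reactive discipline.

    passive_weights: weights of the passive actions of the same type that
    the reacting subterm can execute (assumed to share one priority level).
    The result is the nonpassive rate multiplied by the local execution
    probability of passive action j.
    """
    return active_rate * passive_weights[j] / sum(passive_weights)


# E1 offers two a-actions with rates 1 and 3, E2 one a-action with rate 2.
print(patient_sync_rate([1.0, 3.0], [2.0], 0, 0))       # 2 * 1/4 * 1
print(patient_sync_rate([1.0, 3.0], [2.0], 1, 0))       # 2 * 3/4 * 1
print(generative_reactive_sync_rate(3.0, [1.0, 1.0], 0))
```

In the patient case the two synchronized rates (0.5 and 1.5) add up to 2, i.e. the minimum of the two total rates 4 and 2, so the composition never runs faster than its slower participant.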
side or the right hand side summand of the alternative compositions. As far as
the parallel composition operator is concerned, the related inference rules must
embody the desired synchronization discipline.
The resulting LTS is an interleaving semantic model, which means that every
parallel computation is represented through a choice between all the sequential
computations that can be obtained by interleaving the execution of the
actions of the subterms composed in parallel. As an example, the parallel term
<a, λ>.0 ‖∅ <b, μ>.0 and the sequential term <a, λ>.<b, μ>.0 + <b, μ>.<a, λ>.0
are given the same LTS up to state names:
[Figure: a diamond-shaped LTS. From the initial state, an <a, λ>-transition is followed by a <b, μ>-transition, and a <b, μ>-transition is followed by an <a, λ>-transition; both paths end in the same final state.]
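The interleaving construction can be made concrete with a small sketch. The Python code below is a hypothetical illustration, not the inference rules of any particular SPA: it only handles two sequential processes composed with an empty synchronization set, each written as a list of (action, rate) prefixes ending in 0, and it represents a state of the composition by the pair of prefix counts executed so far. For <a, λ>.0 ‖∅ <b, μ>.0 it produces exactly the four-state diamond described above.

```python
from collections import deque

def interleave(proc1, proc2):
    """Interleaving LTS of P1 ||_∅ P2 for two sequential processes.

    Each process is a list of (action, rate) prefixes ending in 0,
    e.g. [("a", "λ")] stands for <a, λ>.0. A state (i, j) records how
    many prefixes of each process have already been executed.
    Returns the set of reachable states and the labelled transitions.
    """
    start = (0, 0)
    states, transitions = {start}, []
    queue = deque([start])
    while queue:
        i, j = queue.popleft()
        moves = []
        if i < len(proc1):          # left subterm moves, right one stays
            moves.append((proc1[i], (i + 1, j)))
        if j < len(proc2):          # right subterm moves, left one stays
            moves.append((proc2[j], (i, j + 1)))
        for label, target in moves:
            transitions.append(((i, j), label, target))
            if target not in states:
                states.add(target)
                queue.append(target)
    return states, transitions


states, transitions = interleave([("a", "λ")], [("b", "μ")])
for src, (act, rate), dst in transitions:
    print(src, f"--<{act}, {rate}>-->", dst)
```

Both orders of execution appear as separate branches that rejoin in the state (1, 1), which is why the parallel term and the sequential sum term cannot be told apart by their LTSs.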
each of them takes one time unit so that the performance model turns out to be
a discrete-time Markov chain (DTMC). CTMCs and DTMCs can then be analyzed
through standard techniques [69], mainly based on rewards [52], to derive
performance measures.
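As a small reward-based example, consider again the term <a, λ>.0 ‖∅ <b, μ>.0 with exponential rates λ and μ. Its LTS yields a four-state CTMC, and attaching reward rate 1 to every state turns the expected accumulated reward until absorption into the mean time until both actions have completed. The sketch below is ours, not a tool from the literature: state names are hypothetical, and the recursive formula only terminates because the chain is assumed to be acyclic (a general CTMC requires solving a linear system instead). The result should agree with the mean of the maximum of two independent exponential delays, 1/λ + 1/μ − 1/(λ + μ).

```python
from functools import lru_cache

def expected_reward_to_absorption(rates, reward, start):
    """Expected reward accumulated before absorption in an acyclic CTMC.

    rates:  dict state -> dict successor -> rate ({} for absorbing states).
    reward: dict state -> reward earned per time unit spent in the state.
    Uses m(s) = reward(s)/E(s) + sum_{s'} R(s, s')/E(s) * m(s').
    """
    @lru_cache(maxsize=None)
    def m(s):
        out = rates[s]
        exit_rate = sum(out.values())
        if exit_rate == 0:          # absorbing state: nothing more is earned
            return 0.0
        return reward[s] / exit_rate + sum(
            r / exit_rate * m(t) for t, r in out.items())
    return m(start)


lam, mu = 2.0, 3.0
diamond = {
    "s0": {"sa": lam, "sb": mu},   # a (rate λ) or b (rate μ) completes first
    "sa": {"s1": mu},              # after a, only b is left
    "sb": {"s1": lam},             # after b, only a is left
    "s1": {},                      # both actions done: absorbing
}
ones = {s: 1.0 for s in diamond}   # reward 1 per time unit = elapsed time
mean = expected_reward_to_absorption(diamond, ones, "s0")
print(mean, 1 / lam + 1 / mu - 1 / (lam + mu))
```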
is known that in some cases the Markovian testing equivalence produces a more
compact exact aggregation.
multimedia stream [23], adaptive mechanisms for transmitting voice over IP [21,
3], ATM switches [2], replicated web services [11], Lehmann-Rabin randomized
algorithm for dining philosophers [10], and comparison of six mutual exclusion
algorithms [13].
SPA supports the compositional modeling of complex systems via algebraic op-
erators. However, this feature has not been exploited yet to enforce an easier
and more controlled way of describing systems that makes SPA technicalities
transparent to the designer. As an example, if a system is made out of a certain
number of components, with SPA the system is simply described as the parallel
composition of a certain number of subterms, each representing the behavior of
a single component, with suitable synchronization sets to represent the compo-
nent interactions. It is desirable to be able to describe the same system at a
higher level of abstraction, where the parallel composition operators and the re-
lated synchronization sets do not come into play. It is more natural to separately
define the behavior of each type of component, to indicate the actions through
which each component type interacts with the others, to declare the instances of
each component type that form the system, and to specify the way in which the
interacting actions are attached to each other in order to make the component
instances interact. This view brings the advantage that the system components
and the component interactions are clearly elucidated, with the synchronization
mechanism being hidden (e.g. interacting actions need not have the
same type). Another strength is the capability of defining the behavior – possibly
parametrized w.r.t. action rates – and the interactions of a component type
just once and subsequently reusing it as many times as there are instances of
that component type in the system. Additionally, it is desirable that composite
systems can be described in a hierarchical way, and that graphical support is
provided for the whole modeling process.
Besides this useful syntactical sugar, checks are needed to detect possible
mismatches when assembling components together and to identify the compo-
nents that cause such mismatches. A typical example is deadlock freedom. If
we put together some components that we know to be deadlock free, we would
like their combination to be deadlock free as well. In order to investigate that,
we need suitable checks that allow deadlock to be quickly detected and some
diagnostic information to be obtained for localizing the source of deadlock. As
another example, in order to evaluate the performance of a system, its model
must be performance closed. In this case, a check at the syntax level is helpful
to easily detect and pinpoint possible violations of the performance closure.
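The deadlock problem can be illustrated with a naive, brute-force check; this is not the architectural check referred to in the text, but it shows what such a check must detect and what diagnostic information is useful. The Python sketch below (all names are hypothetical) explores the composition of two LTSs in which actions in a synchronization set must be performed jointly, and reports every reachable state without outgoing transitions together with a witness action sequence. The example uses two components that are each deadlock free in isolation but wait for each other forever once composed.

```python
from collections import deque

def find_deadlocks(lts1, init1, lts2, init2, sync):
    """Explore the composition of two LTSs and report deadlocked states.

    lts1, lts2: dict state -> list of (action, next_state).
    Actions in `sync` must be performed jointly by both components;
    all other actions interleave. Returns (state, witness) pairs, where
    witness is the action sequence leading to the deadlocked state.
    """
    start = (init1, init2)
    parent = {start: None}                     # BFS tree for witnesses
    queue = deque([start])
    deadlocks = []
    while queue:
        state = queue.popleft()
        s1, s2 = state
        succ = []
        for a, t1 in lts1[s1]:
            if a not in sync:                  # independent move of P1
                succ.append((a, (t1, s2)))
            else:                              # joint move, needs a partner
                succ.extend((a, (t1, t2)) for b, t2 in lts2[s2] if b == a)
        for b, t2 in lts2[s2]:
            if b not in sync:                  # independent move of P2
                succ.append((b, (s1, t2)))
        if not succ:
            deadlocks.append((state, witness(parent, state)))
        for a, nxt in succ:
            if nxt not in parent:
                parent[nxt] = (state, a)
                queue.append(nxt)
    return deadlocks


def witness(parent, state):
    """Actions along the BFS tree from the initial state to `state`."""
    actions = []
    while parent[state] is not None:
        state, a = parent[state]
        actions.append(a)
    return actions[::-1]


# Two components, each deadlock free on its own, that wait for each other.
A = {"A0": [("a", "A1")], "A1": [("b", "A0")]}
B = {"B0": [("b", "B1")], "B1": [("a", "B0")]}
print(find_deadlocks(A, "A0", B, "B0", {"a", "b"}))   # deadlock at the start
print(find_deadlocks(A, "A0", B, "B0", set()))        # pure interleaving: none
```

Such an exploration builds the whole composed state space, which is exactly the cost that checks working at the level of the architectural description try to avoid.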
In this section we show how SPA can be enhanced to work at the architectural
level of design. Based on ideas contained in [4,16,19,20], we illustrate
how SPA can be turned into a fully fledged ADL for the modeling, functional
verification, and performance evaluation of complex systems. Since the
transformation is largely independent of the specific SPA, we concentrate on
246 M. Bernardo, L. Donatiello, and P. Ciancarini
EMPAgr [26] – which includes prioritized, weighted immediate and passive ac-
tions and the generative-reactive synchronization discipline – and we exhibit the
resulting SPA based ADL called Æmilia [15,9]. The description of a system with
Æmilia can be done in a compositional, hierarchical, graphical and controlled
way. First, we have to define the behavior of the types of components in the
system and their interactions with the other components. The functional and
performance aspects of the behavior are described through a family of EMPAgr
terms or the invocation of the specification of a previously modeled system,
while the interactions are described through actions occurring in the behavior.
Second, we have to declare the instances of each type of component present in
the system and the way in which their interactions are attached to each other
in order to allow the instances to communicate. This process is supported by
a graphical notation. Then, the whole behavior of the system is a family of
EMPAgr terms transparently obtained by composing in parallel the behavior of
the declared instances according to the specified attachments. From the whole
behavior, integrated, functional and performance semantic models can be
automatically derived, which can then undergo the analysis techniques mentioned
in Sect. 2. In addition to that, Æmilia comes equipped with some architectural
checks for ensuring deadlock freedom and performance closure.
tural description in Æmilia, the boxes denote the AEIs, the black circles denote
the local interactions, the white squares denote the architectural interactions,
and the directed edges denote the attachments. As an example, the architec-
tural type PipeFilter can be pictorially represented through the flow graph of
Fig. 1. From a methodological viewpoint, when modeling an architectural type
with Æmilia, it is convenient to start with the flow graph representation of the
architectural type and then to textually specify the behavior of each AET.
[Fig. 1: flow graph of the architectural type PipeFilter. The filter F0 : FilterT (interactions accept_item, serve_item) has its serve_item attached to accept_item of the pipe P : PipeT, whose forward_item1 and forward_item2 are attached to accept_item of the filters F1 : FilterT and F2 : FilterT, respectively; F1 and F2 each also have a serve_item interaction.]
The interested reader is referred to [9,16] for a formal definition of the
translation semantics.
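The idea of obtaining the whole behavior by composing the declared instances according to their attachments can be sketched as follows. This Python code is only a hypothetical illustration of the idea and not the translation semantics of [9,16]: it encodes the PipeFilter flow graph as data (attachment directions read off the flow graph), maps the two ends of every attachment to one shared fresh action name (which is why interacting actions need not have the same type), collects the synchronization set of each instance, and lists the interactions left unattached. The naming scheme for fresh actions is our own invention.

```python
# Hypothetical encoding of the PipeFilter flow graph: AEI -> (AET, interactions).
AEIS = {
    "F0": ("FilterT", ["accept_item", "serve_item"]),
    "P":  ("PipeT",   ["accept_item", "forward_item1", "forward_item2"]),
    "F1": ("FilterT", ["accept_item", "serve_item"]),
    "F2": ("FilterT", ["accept_item", "serve_item"]),
}
# Attachments: (from AEI, output interaction) -> (to AEI, input interaction).
ATTACHMENTS = [
    (("F0", "serve_item"), ("P", "accept_item")),
    (("P", "forward_item1"), ("F1", "accept_item")),
    (("P", "forward_item2"), ("F2", "accept_item")),
]


def translate(aeis, attachments):
    """Map every attached interaction to a shared fresh action name.

    Returns (relabel, sync_sets, free): relabel maps (AEI, interaction)
    to the action name used in the flat parallel composition, sync_sets
    lists the fresh names each AEI must synchronize on, and free lists
    the interactions that are not attached to anything.
    """
    relabel, sync_sets = {}, {name: set() for name in aeis}
    for k, (src, dst) in enumerate(attachments):
        fresh = f"sync{k}_{src[1]}_{dst[1]}"    # hypothetical naming scheme
        for aei, interaction in (src, dst):
            relabel[(aei, interaction)] = fresh
            sync_sets[aei].add(fresh)
    free = [(aei, i) for aei, (_, ints) in aeis.items()
            for i in ints if (aei, i) not in relabel]
    return relabel, sync_sets, free


relabel, sync_sets, free = translate(AEIS, ATTACHMENTS)
print(free)
print(sorted(sync_sets["P"]))
```

The unattached interactions (accept_item of F0 and serve_item of F1 and F2) are the ones that could be exported as architectural interactions of the whole system.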
type to be performance closed, the basic condition to check is that no AET be-
havior contains a passive action whose type is not an interaction, and that every
set of attached local interactions contains one interaction whose associated rate
is exponential or immediate.
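The basic performance-closure condition above is purely syntactic, so it can be sketched as a simple scan. The Python code below is a literal, hypothetical reading of the two conditions and not the Æmilia implementation: each action in an AET behavior is summarized by its type and a rate kind ("exp" for exponential, "inf" for immediate, "passive"), and each set of attached local interactions is given as a set of (AEI, action type) pairs. The client/server names used in the example are invented for illustration.

```python
def closure_violations(behaviors, interactions, instance_of, attached_groups):
    """Check the two syntactic performance-closure conditions.

    behaviors:       dict AET -> list of (action_type, kind) with kind in
                     {"exp", "inf", "passive"} ("inf" = immediate action).
    interactions:    dict AET -> set of action types declared as interactions.
    instance_of:     dict AEI -> AET.
    attached_groups: list of sets of (AEI, action_type), one per set of
                     local interactions attached to each other.
    Returns diagnostic strings; an empty list means both conditions hold.
    """
    problems = []
    # Condition 1: every passive action type must be an interaction.
    for aet, actions in behaviors.items():
        for action_type, kind in actions:
            if kind == "passive" and action_type not in interactions[aet]:
                problems.append(f"{aet}: passive {action_type} is not an interaction")
    # Condition 2: each attached group needs an exponential or immediate rate.
    for group in attached_groups:
        kinds = {kind
                 for aei, action_type in group
                 for a, kind in behaviors[instance_of[aei]] if a == action_type}
        if not kinds & {"exp", "inf"}:
            problems.append(f"{sorted(group)}: no exponential or immediate rate")
    return problems


# Hypothetical client/server description: the server's reply is left passive.
behaviors = {
    "ClientT": [("send", "exp"), ("receive", "passive")],
    "ServerT": [("accept", "passive"), ("reply", "passive")],
}
interactions = {"ClientT": {"send", "receive"}, "ServerT": {"accept", "reply"}}
instance_of = {"C": "ClientT", "S": "ServerT"}
groups = [{("C", "send"), ("S", "accept")}, {("S", "reply"), ("C", "receive")}]
print(closure_violations(behaviors, interactions, instance_of, groups))
```

The diagnostic pinpoints the attachment between reply and receive, where neither side supplies a rate, which is the kind of localized feedback the text asks for.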
We conclude by referring the interested reader to [9,16] for a precise definition
and examples of application of the architectural checks outlined in this section.
[Figure: flow graph of a client-server system: C : ClientT (generate_request, accept_outcome), two network instances Nr : NetworkT and No : NetworkT (receive, forward), and S : ServerT (accept_request, generate_outcome).]
[Figure: flow graph of PipeFilter with instances F0 : FilterT, P : PipeT, F1 : FilterT and F2 : FilterT, attached as in Fig. 1.]
The most complete form of architectural invocation is the one in which both
actual AETs and an actual topology are passed that are different from the
corresponding formal AETs and formal topology, respectively. In this case, we
have to additionally make sure that the actual topology conforms to the formal
topology. There are three kinds of admitted topological extensions, all of which
preserve the compatibility, interoperability, and performance closure properties
under some general conditions.
[Figure: attachments of kind uniconn − uniconn, uniconn − andconn, and uniconn − orconn.]
[Figure: an extended PipeFilter flow graph in which the serve_item interactions of F1 : FilterT and F2 : FilterT are attached to the accept_item interactions of two further pipes, P' : PipeT and P'' : PipeT, each having forward_item1 and forward_item2 interactions.]
[Figure: ring of stations IS : InitStationT, S1 : StationT, S2' : StationT, S2'' : StationT and S3 : StationT, connected through their send and receive interactions.]
the first station allowed to send a message. Suppose that the Æmilia description
declares one instance of the initial station and three instances of the normal
station. Every instance of the architectural type, say Ring, can thus admit a
single initial station and three normal stations connected to form a ring, whereas
it would be desirable to be able to express by means of that architectural type
any ring system with an arbitrary number of normal stations. E.g., the flow graph
in Fig. 5 should be considered as a legal extension of the architectural type Ring.
The idea behind the endogenous extensions is that of replacing a set of AEIs with
a set of new instances of the already defined AETs, in a way that follows the
prescribed topology. In this case, we consider the frontier of the architectural
type w.r.t. one of the replaced AEIs to be the set of interactions previously
attached to the local interactions of the replaced AEI. On the other hand, all
the replacing AEIs that will be attached to the frontier of the architectural type
w.r.t. one of the replaced AEIs must be of the same type as the replaced AEI.
We conclude by referring the interested reader to [9,16,19,20] for a precise
definition of the behavioral and topological conformity checks outlined in this
section.
4 Conclusion
In this paper we have recalled the basic notions and the main achievements in
the field of SPA and we have stressed its current transformation into a fully
fledged ADL for the compositional, graphical, hierarchical and controlled mod-
eling of complex systems as well as their functional verification and performance
evaluation. Such a transformation eases the modeling process and provides an
added value given by some architectural checks for detecting deadlock as well as
performance underspecification, which scale over families of architectures.
Concerning future work in the area of SPA based ADLs, first of all we mention
the importance of devising additional architectural checks on the performance
side, that provide diagnostic information like in the case of the compatibility and
References
1. M. Ajmone Marsan, G. Balbo, G. Conte, S. Donatelli, G. Franceschinis, “Modelling with Generalized Stochastic Petri Nets”, John Wiley & Sons, 1995
2. A. Aldini, M. Bernardo, R. Gorrieri, “An Algebraic Model for Evaluating the Performance of an ATM Switch with Explicit Rate Marking”, in Proc. of the 7th Int. Workshop on Process Algebra and Performance Modelling (PAPM 1999), Prensas Universitarias de Zaragoza, pp. 119-138, Zaragoza (Spain), 1999
3. A. Aldini, M. Bernardo, R. Gorrieri, M. Roccetti, “Comparing the QoS of Internet Audio Mechanisms via Formal Methods”, in ACM Trans. on Modeling and Computer Simulation 11:1-42, 2001
4. R. Allen, D. Garlan, “A Formal Basis for Architectural Connection”, in ACM Trans. on Software Engineering and Methodology 6:213-249, 1997
5. J.C.M. Baeten, W.P. Weijland, “Process Algebra”, Cambridge University Press, 1990
6. C. Baier, B. Haverkort, H. Hermanns, J.-P. Katoen, “On the Logical Characterisation of Performability Properties”, in Proc. of the 27th Int. Coll. on Automata, Languages and Programming (ICALP 2000), LNCS 1853:780-792, Geneve (Switzerland), 2000
7. C. Baier, B. Haverkort, H. Hermanns, J.-P. Katoen, “Model Checking Continuous-Time Markov Chains by Transient Analysis”, in Proc. of the 12th Int. Conf. on Computer Aided Verification (CAV 2000), LNCS 1855:358-372, Chicago (IL), 2000
8. C. Baier, J.-P. Katoen, H. Hermanns, “Approximate Symbolic Model Checking of Continuous Time Markov Chains”, in Proc. of the 10th Int. Conf. on Concurrency Theory (CONCUR 1999), LNCS 1664:146-162, Eindhoven (The Netherlands), 1999
9. S. Balsamo, M. Bernardo, M. Simeoni, “Combining Stochastic Process Algebras and Queueing Networks for Software Architecture Analysis”, to appear in Proc. of the 3rd Int. Workshop on Software and Performance (WOSP 2002), Rome (Italy), 2002
10. M. Bernardo, “Theory and Application of Extended Markovian Process Algebra”, Ph.D. Thesis, University of Bologna (Italy), 1999
11. M. Bernardo, “A Simulation Analysis of Dynamic Server Selection Algorithms for Replicated Web Services”, in Proc. of the 9th Int. Symp. on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2001), IEEE-CS Press, pp. 371-378, Cincinnati (OH), 2001
31. P. D’Argenio, “Algebras and Automata for Timed and Stochastic Systems”, Ph.D. Thesis, University of Twente (The Netherlands), 1999
32. R. De Nicola, M.C.B. Hennessy, “Testing Equivalences for Processes”, in Theoretical Computer Science 34:83-133, 1983
33. D. Ferrari, “Considerations on the Insularity of Performance Evaluation”, in IEEE Trans. on Software Engineering 12:678-683, 1986
34. S. Gilmore, “The PEPA Workbench User Manual”, http://www.dcs.ed.ac.uk/pepa/tools.html, 2001
35. S. Gilmore, J. Hillston, D.R.W. Holton, M. Rettelbach, “Specifications in Stochastic Process Algebra for a Robot Control Problem”, in Journal of Production Research 34:1065-1080, 1996
36. R.J. van Glabbeek, S.A. Smolka, B. Steffen, “Reactive, Generative and Stratified Models of Probabilistic Processes”, in Information and Computation 121:59-80, 1995
37. R.J. van Glabbeek, F.W. Vaandrager, “Petri Net Models for Algebraic Theories of Concurrency”, in Proc. of the Conf. on Parallel Architectures and Languages Europe (PARLE 1987), LNCS 259:224-242, Eindhoven (The Netherlands), 1987
38. N. Götz, “Stochastische Prozeßalgebren – Integration von funktionalem Entwurf und Leistungsbewertung Verteilter Systeme”, Ph.D. Thesis, University of Erlangen (Germany), 1994
39. P.G. Harrison, J. Hillston, “Exploiting Quasi-Reversible Structures in Markovian Process Algebra Models”, in Computer Journal 38:510-520, 1995
40. H. Hermanns, “Interactive Markov Chains”, Ph.D. Thesis, University of Erlangen (Germany), 1998
41. H. Hermanns, U. Herzog, J. Hillston, V. Mertsiotakis, M. Rettelbach, “Stochastic Process Algebras: Integrating Qualitative and Quantitative Modelling”, Tech. Rep. 11/94, University of Erlangen (Germany), 1994
42. H. Hermanns, U. Herzog, V. Mertsiotakis, “Stochastic Process Algebras as a Tool for Performance and Dependability Modelling”, in Proc. of the 1st IEEE Int. Computer Performance and Dependability Symp. (IPDS 1995), IEEE-CS Press, pp. 102-111, Erlangen (Germany), 1995
43. H. Hermanns, J.-P. Katoen, “Automated Compositional Markov Chain Generation for a Plain-Old Telephone System”, in Science of Computer Programming 36:97-127, 2000
44. H. Hermanns, J. Meyer-Kayser, M. Siegle, “Multi Terminal Binary Decision Diagrams to Represent and Analyse Continuous Time Markov Chains”, in Proc. of the 3rd Int. Workshop on the Numerical Solution of Markov Chains (NSMC 1999), Zaragoza (Spain), 1999
45. H. Hermanns, M. Rettelbach, “Syntax, Semantics, Equivalences, and Axioms for MTIPP”, in Proc. of the 2nd Int. Workshop on Process Algebra and Performance Modelling (PAPM 1994), pp. 71-87, Erlangen (Germany), 1994
46. U. Herzog, “Formal Description, Time and Performance Analysis – A Framework”, in Entwurf und Betrieb verteilter Systeme, Informatik Fachberichte 264, Springer, 1990
47. U. Herzog, “EXL: Syntax, Semantics and Examples”, Tech. Rep. 16/90, University of Erlangen (Germany), 1990
48. J. Hillston, “A Compositional Approach to Performance Modelling”, Cambridge University Press, 1996
49. J. Hillston, N. Thomas, “Product Form Solution for a Class of PEPA Models”, in Performance Evaluation 35:171-192, 1999
Abstract. Markov chains (and their extensions with rewards) have been
widely used to determine performance, dependability and performability
characteristics of computer communication systems, such as throughput,
delay, mean time to failure, or the probability to accumulate at least a
certain amount of reward in a given time.
Due to the rapidly increasing size and complexity of systems, Markov
chains and Markov reward models are difficult and cumbersome to specify
by hand at the state-space level. Therefore, various specification
formalisms, such as stochastic Petri nets and stochastic process algebras,
have been developed to facilitate the specification of these models at a
higher level of abstraction. Until now, however, the specification of the
measure-of-interest has often been done in an informal and relatively
unstructured way. Furthermore, some measures-of-interest cannot be expressed
conveniently at all.
In this tutorial paper, we present a logic-based specification technique
to specify performance, dependability and performability measures-of-interest
and show how for a given finite Markov chain (or Markov reward model)
such measures can be evaluated in a fully automated way.
Particular emphasis will be given to so-called path-based measures and
hierarchically-specified measures. For this purpose, we extend so-called
model checking techniques to reason about discrete- and continuous-time
Markov chains and their rewards. We also report on the use of techniques
such as (compositional) model reduction and measure-driven state-space
generation to combat the infamous state space explosion problem.
1 Introduction
Over the last decades many techniques have been developed to specify and solve
performance, dependability and performability models. In many cases, the
models addressed possess a continuous-time Markov chain as their associated
stochastic process. To avoid the specification of performance models directly at the state
Corresponding author; phone: +31 53 489-4661.
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 261–289, 2002.
© Springer-Verlag Berlin Heidelberg 2002
262 C. Baier et al.
level, high-level specification methods have been developed, most notably those
based on stochastic Petri nets, stochastic process algebras, and stochastic
activity networks. With appropriate tools supporting these specification methods,
such as, for instance, provided by TIPPtool [36], the PEPA workbench [23],
GreatSPN [13], UltraSAN [56] or SPNP [14], it is relatively easy to
specify performance models of which the associated CTMCs have millions of states.
In combination with state-of-the-art numerical means to solve the resulting linear
system of equations (for steady-state measures) or the linear system of
differential equations (for time-dependent or transient measures) a good workbench is
available to construct and solve dependability models of complex systems.
However, whereas the specification of performance and dependability
models has become very convenient, the specification of the measures of interest
has most often remained fairly cumbersome. In particular, most often only
simple state-based measures can be defined with relative ease.
In contrast, in the area of formal methods for system verification, in
particular in the area of model checking, very powerful logic-based methods have been
developed to express properties of systems specified as finite state automata
(note that we can view a CTMC as a special type of such an automaton). Not
only are suitable means available to express state-based properties, a logic like
CTL [16] (Computation Tree Logic; see below) also allows one to express
properties over state sequences. Such capabilities would also be welcome in specifying
performance and dependability measures.
To fulfil this aim, we have introduced the so-called continuous stochastic
logic (CSL) that provides ample means to specify state- as well as path-based
performance measures for CTMCs in a compact and flexible way [1,2,3,4,5].
Moreover, due to the formal syntax and semantics of CSL, we can exploit the
structure of CSL-specified measures in the subsequent evaluation process, such
that typically the size of the underlying Markov chains that need to be evaluated
can be reduced considerably.
To further strengthen the applicability of the stochastic model checking
approach we recently considered Markov models involving costs or rewards, as they
are often used in the performability context. We extended the logic CSL to the
continuous stochastic reward logic CSRL in order to specify steady-state,
transient and path-based measures over CTMCs extended with a reward structure
(Markov reward models) [4]. We showed that well-known performability
measures, most notably also the performability distribution introduced by Meyer
[51,52,53], can be specified using CSRL. Moreover, CSRL allows for the specification
of new measures that have not yet been addressed in the performability
literature. For instance, when rewards are interpreted as costs, we can express the
probability that, given a starting state, a certain goal state is reached within
t time units, thereby deliberately avoiding or visiting certain intermediate states,
and with a total cost (accumulated reward) below a certain threshold.
We have introduced CSL and CSRL (including their complete syntax and formal
semantics) in a much more theoretical context than we do in this tutorial paper
(cf. [2,3,4,5,33]).
Automated Performance and Dependability Evaluation 263
[Figure: the system is modelled as a model of the system, and the desired performance is formalised as a measure specification; both are evaluated by a dependability checker, whose solution yields numerical results.]
This section recalls the basic concepts of discrete- and continuous-time Markov
chains with finite state space. The presentation is focused on the concepts needed
for the understanding of the rest of this paper; for a more elaborate treatment
we refer to [21,43,47,48,59]. We slightly depart from the standard notations by
representing a Markov chain as an ordinary finite transition system where the
edges are equipped with probabilistic information, and where states are labelled
with atomic propositions, taken from a set AP. Atomic propositions identify
specific situations the system may be in, such as “acknowledgement pending”,
“buffer empty”, or “variable X is positive”.
state s are possible and are modelled by having R(s, s) > 0. We thus allow the
system to occupy the same state before and after taking a transition.
Let $E(s) = \sum_{s' \in S} R(s, s')$ be the total rate at which any transition emanating
from state s is taken.¹ More precisely, E(s) specifies that the probability of
leaving s within t time-units (for positive t) is $1 - e^{-E(s) \cdot t}$. The probability of
eventually moving from state s to s', denoted P(s, s'), is determined by the
probability that the delay of going from s to s' finishes before the delays of
other outgoing edges from s; formally, $P(s, s') = R(s, s')/E(s)$ (except if s is an
absorbing state, i.e. if E(s) = 0; in this case we define P(s, s') = 0). The matrix
P describes an embedded DTMC of the CTMC.
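The definitions of E(s) and P can be turned directly into code. The following Python sketch is a minimal illustration under our own assumptions (a dense list-of-lists rate matrix and hypothetical function names): it computes the exit rates and the embedded DTMC, including the self-loop case R(s, s) > 0 and the all-zero row for absorbing states, plus the probability of leaving a state within t time units.

```python
import math

def exit_rates_and_embedded_dtmc(R):
    """Return E(s) = sum_{s'} R(s, s') and P(s, s') = R(s, s')/E(s).

    R is a square rate matrix (list of lists); self-loops R[s][s] > 0 are
    allowed. For absorbing states (E(s) = 0) the row of P is all zeros.
    """
    n = len(R)
    E = [float(sum(row)) for row in R]
    P = [[R[s][t] / E[s] if E[s] > 0 else 0.0 for t in range(n)]
         for s in range(n)]
    return E, P


def prob_leave_within(E_s, t):
    """Probability 1 - e^{-E(s) t} of leaving a state with exit rate E(s) by time t."""
    return 1.0 - math.exp(-E_s * t)


# Hypothetical 3-state CTMC with a self-loop in state 1 and absorbing state 2.
R = [[0.0, 2.0, 1.0],
     [0.0, 0.5, 1.5],
     [0.0, 0.0, 0.0]]
E, P = exit_rates_and_embedded_dtmc(R)
print(E)                       # [3.0, 2.0, 0.0]
print(P)
print(prob_leave_within(E[0], 1.0))
```

Note that the self-loop of state 1 contributes to E(1), so the embedded DTMC returns to state 1 with probability 0.25.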
Steady-state and transient measures. For CTMCs, two major types of state
probabilities are normally considered: steady-state probabilities where the sys-
tem is considered “in the long run”, i.e., when an equilibrium has been reached,
and transient probabilities where the system is considered at a given time instant
t. Formally, the transient probability
$$\pi(s, s', t) = \Pr\{\sigma \in Path(s) \mid \sigma@t = s'\},$$
stands for the probability to be in state s' at time t given the initial state s. We
denote with π(s, t) the vector of state probabilities (ranging over states s') at
time t, when the starting state is s. The transient probabilities are then computed
from a system of linear differential equations:
$$\frac{d}{dt}\pi(s, t) = \pi(s, t) \cdot Q,$$
which can be solved by standard numerical methods or by specialised methods
such as uniformisation [45,26,25]. With uniformisation, the transient
probabilities of a CTMC are computed via a uniformised DTMC which characterises the
CTMC at discrete state transition epochs. Steady-state probabilities are defined
as
$$\pi(s, s') = \lim_{t \to \infty} \pi(s, s', t).$$
This limit always exists for finite CTMCs. In case the steady-state distribution
does not depend on the starting state s we often simply write π(s') instead of
π(s, s'). For $S' \subseteq S$, $\pi(s, S') = \sum_{s' \in S'} \pi(s, s')$ denotes the steady-state
probability for the set of states S'. In this case, steady-state probabilities are computed
from a system of linear equations:
$$\pi(s) \cdot Q = 0 \quad \text{with} \quad \sum_{s'} \pi(s, s') = 1,$$
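Both computations can be sketched in a few lines. The Python code below is a minimal, unoptimised illustration under stated assumptions: Q is the generator matrix (as usual, Q = R − diag(E)) given as a dense list of lists; transient probabilities use uniformisation with the uniformised DTMC P = I + Q/Λ for Λ ≥ max |Q(s, s)|, truncating the Poisson series once its accumulated weight reaches 1 − ε; steady-state probabilities solve π · Q = 0 with the normalisation condition by Gaussian elimination, assuming an irreducible CTMC. Production tools use more robust Poisson weights (e.g. the Fox-Glynn method) and sparse solvers.

```python
import math

def transient(Q, pi0, t, eps=1e-10, max_terms=100_000):
    """pi(t) for generator Q and initial distribution pi0, by uniformisation.

    P = I + Q/lam is the uniformised DTMC for any lam >= max_s |Q[s][s]|;
    pi(t) = sum_k Poisson(k; lam*t) * pi0 * P^k. The series is cut once the
    accumulated Poisson weight reaches 1 - eps. For very large lam*t the
    weight e^{-lam*t} underflows; robust tools use e.g. Fox-Glynn weights.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n)) or 1.0
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)                  # Poisson weight for k = 0
    term = list(pi0)                             # pi0 * P^k, starting at k = 0
    result = [weight * x for x in term]
    acc, k = weight, 0
    while acc < 1.0 - eps and k < max_terms:
        k += 1
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        acc += weight
        result = [r + weight * x for r, x in zip(result, term)]
    return result


def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 (irreducible finite CTMC assumed).

    The system is transposed to Q^T pi^T = 0, its last equation is replaced
    by the normalisation condition, and Gaussian elimination with partial
    pivoting is applied.
    """
    n = len(Q)
    A = [[float(Q[j][i]) for j in range(n)] for i in range(n)]   # A = Q^T
    b = [0.0] * n
    A[-1], b[-1] = [1.0] * n, 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [x - f * y for x, y in zip(A[r], A[col])]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


# Two-state CTMC: state 0 -> 1 with rate 2, state 1 -> 0 with rate 1.
Q = [[-2.0, 2.0], [1.0, -1.0]]
print(transient(Q, [1.0, 0.0], 1.0))   # first entry: 1/3 + 2/3 * e^{-3}
print(steady_state(Q))                 # approximately [1/3, 2/3]
```

For this two-state chain the closed form π(0, 0, t) = 1/3 + (2/3)·e^{−3t} is known, which makes it a convenient sanity check; as t grows, the transient vector approaches the steady-state vector.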
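[Figure: the model checking approach: the requirements are formalised into a property specification, the system is modelled as a system model, and model checking decides whether the model satisfies the property.]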
it allows one to express, for instance, the temporal ordering of events. Note that the
term “temporal” is meant in a qualitative sense, not in a quantitative sense. An
important logic for which efficient model checking algorithms exist is CTL [16]
(Computation Tree Logic). This logic allows one to state properties over states
and over paths using the following syntax:
State-formulas
Φ ::= a | ¬Φ | Φ ∨ Φ | ∃ϕ | ∀ϕ
a : atomic proposition
∃ϕ : there Exists a path that fulfils ϕ
∀ϕ : All paths fulfil ϕ
Path-formulas
ϕ ::= X Φ | Φ U Φ
XΦ : the neXt state fulfils Φ
Φ U Ψ : Φ holds along the path, Until Ψ holds
◇Φ : true U Φ, i.e., eventually Φ
□Φ : ¬◇¬Φ, i.e., invariantly Φ
The meaning of atomic propositions, negation (¬) and disjunction (∨) is
standard; note that using these operators, other boolean operators such as
conjunction (∧), implication (⇒), and so forth, can be defined. The state-formula ∃ϕ
is valid in state s if there exists some path starting in s and satisfying ϕ. The
formula ∃◇deadlock, for example, expresses that for some system run eventually
a deadlock can be reached (potential deadlock). On the contrary, ∀ϕ is valid if
all paths satisfy ϕ; ∀◇deadlock thus means that a deadlock is inevitable. A path
satisfies an until-formula Φ U Ψ if the path has an initial finite prefix (possibly
only containing state s) such that Φ holds at all states along the path until a
state for which Ψ holds is encountered along the path.
Example 2. Considering the TMR system example as a finite-state automaton,
some properties one can express with CTL are:
– up3 ⇒ ∃◇down:
if the system is fully operational, it may eventually go down.
– up3 ⇒ ∀X (up2 ∨ down):
if the system is fully operational, any next step involves the failure of a
component.
– ∃□¬down:
it is possible that the voter never fails.
– ∃((up3 ∨ up2) U down):
it is possible to have two or three processors continuously working until the
voter fails.
Model checking CTL. A model, i.e., a finite-state automaton where states are
labelled with atomic propositions, is said to satisfy a property if and only if all
its initial states satisfy this property. In order to check whether a model satisfies
a property Φ, the set Sat(Φ) of states that satisfy Φ is computed recursively,
after which it is checked whether the initial states belong to this set. For atomic
propositions this set is directly obtained from the above-mentioned labelling
of the states; Sat(Φ ∧ Ψ ) is obtained by computing Sat(Φ) and Sat(Ψ ), and
then intersecting these sets; Sat(¬Φ) is obtained by taking the complement of
the entire state space with respect to Sat(Φ). The algorithms for the temporal
operators are slightly more involved. For instance, for Sat(∃X Φ) we first compute
the set Sat(Φ) and then compute those states from which one can move to this
set by a single transition. Sat(∃(Φ U Ψ )) is computed in an iterative way: (i) as
a precomputation we determine Sat(Φ) and Sat(Ψ ); (ii) we start the iteration
with Sat(Ψ ) as these states will surely satisfy the property of interest; (iii) we
extend this set by the states in Sat(Φ) from which one can move to the already
computed set by a single transition; (iv) if no new states have been added in step
(iii), we have found the required set, otherwise we repeat (iii). As the number of
states is finite, this procedure is guaranteed to terminate. The worst-case time
complexity of this algorithm (after an appropriate treatment of the ∃□-operator
[16]) is linear in the size of the formula and the number of transitions in the
model.
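The Sat-set computation can be sketched in Python as follows. This is a minimal sketch, assuming the automaton is given as a dictionary `succ` mapping each state to its set of successors; the function names and the TMR transition structure in the usage lines are our own illustrative assumptions, not taken from [16]. Instead of repeating step (iii) over the whole set, `sat_eu` searches backwards from Sat(Ψ), which adds exactly the states the iteration (i)-(iv) would add while visiting each transition once. `sat_eg` computes Sat(∃□Φ) by a simple greatest-fixpoint iteration, not by the linear-time treatment referred to above.

```python
from collections import deque


def sat_ex(succ, sat_phi):
    """Sat(∃X Φ): states with at least one successor in Sat(Φ)."""
    return {s for s, targets in succ.items() if targets & sat_phi}


def sat_eu(succ, sat_phi, sat_psi):
    """Sat(∃(Φ U Ψ)): start from Sat(Ψ) and add Φ-states that can move into
    the set computed so far (backward search over predecessors)."""
    pred = {s: set() for s in succ}
    for s, targets in succ.items():
        for t in targets:
            pred[t].add(s)
    result = set(sat_psi)
    queue = deque(result)
    while queue:
        t = queue.popleft()
        for s in pred[t]:
            if s in sat_phi and s not in result:
                result.add(s)
                queue.append(s)
    return result


def sat_eg(succ, sat_phi):
    """Sat(∃□Φ): repeatedly drop Φ-states without a successor in the
    candidate set (greatest fixpoint, not the linear-time variant)."""
    result = set(sat_phi)
    while True:
        keep = {s for s in result if succ[s] & result}
        if keep == result:
            return result
        result = keep


# Hypothetical TMR automaton: processors fail or are repaired one at a time,
# the voter can fail in every up-state, and 'down' is repaired to 's3'.
succ = {
    "s3": {"s2", "down"},
    "s2": {"s1", "s3", "down"},
    "s1": {"s0", "s2", "down"},
    "s0": {"s1", "down"},
    "down": {"s3"},
}
states = set(succ)
up3, up2, down = {"s3"}, {"s2"}, {"down"}

print(sorted(sat_eu(succ, states, down)))      # ∃◇down: ['down', 's0', 's1', 's2', 's3']
print(sorted(sat_eg(succ, states - down)))     # ∃□¬down: ['s0', 's1', 's2', 's3']
print(sorted(sat_eu(succ, up3 | up2, down)))   # ∃((up3 ∨ up2) U down): ['down', 's2', 's3']
```

Under these assumptions the initial state s3 lies in all three sets, so the automaton satisfies the corresponding properties of Example 2.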
Syntax. CSL extends CTL with two probabilistic operators that refer to the
steady-state and transient behaviour of the system being studied. Whereas the
steady-state operator refers to the probability of residing in a particular set of
states (specified by a state-formula) in the long run, the transient operator allows
us to refer to the probability of the occurrence of particular paths in the CTMC.
In order to express the time-span of a certain path, the path-operators until U
and next X are extended with a parameter that specifies a time interval. Let I
be an interval on the real line, p a probability and ⊴ a comparison operator, i.e.,
⊴ ∈ {≤, ≥}. The syntax of CSL now becomes:
State-formulas
Φ ::= a | ¬Φ | Φ ∨ Φ | S_{⊴p}(Φ) | P_{⊴p}(ϕ)
S_{⊴p}(Φ) : prob. that Φ holds in steady state ⊴ p
P_{⊴p}(ϕ) : prob. that a path fulfils ϕ ⊴ p
Path-formulas
ϕ ::= X^I Φ | Φ U^I Φ
X^I Φ : the next state is reached at time t ∈ I and fulfils Φ
Φ U^I Ψ : Φ holds along the path until Ψ holds at time t ∈ I
The state-formula S_{⊴p}(Φ) asserts that the steady-state probability for the set
of Φ-states meets the bound ⊴ p. The operator P_{⊴p}(.) replaces the usual CTL
path quantifiers ∃ and ∀. In fact, in most cases ∃ϕ can be written as P_{>0}(ϕ)
and ∀ϕ as P_{≥1}(ϕ). These rules are not generally applicable due to fairness
considerations [6]. P_{⊴p}(ϕ) asserts that the probability measure of the paths
satisfying ϕ meets the bound ⊴ p. Temporal operators like ◇, □ and their real-time
variants ◇^I or □^I can be derived, e.g., P_{⊴p}(◇^I Φ) = P_{⊴p}(true U^I Φ) and
P_{⊴p}(□^I Φ) = P_{⊴̄ 1−p}(◇^I ¬Φ), where ⊴̄ denotes the reversed comparison (≥ for ≤
and vice versa). The untimed next- and until-operators are obtained by
XΦ = X^I Φ and Φ1 U Φ2 = Φ1 U^I Φ2 for I = [0, ∞).
More specifically, P_{⊴p}(◇^{[t,t]} in(s′)) is valid in state s if the transient probability
at time t to be in state s′ satisfies the bound ⊴ p. For instance, P_{≤0.2}(◇^{[t,t]} in(s_{2,1}))
is valid in state s_{0,0} if the transient probability of state s_{2,1} at time t is at most
0.2 when starting in state s_{0,0}. In a similar way as done for steady-state measures,
the formula P_{≥0.99}(◇^{[t,t]} (up3 ∨ up2)) requires that the probability to have 3 or
2 processors running at time t is at least 0.99. For specification convenience, a
transient-state operator
$$T^{@t}_{\unlhd p}(\Phi) = P_{\unlhd p}(\Diamond^{[t,t]}\, \Phi)$$
could be defined. It states that the probability for a Φ-state at time t meets the
bound ⊴ p.
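Checking T^{@t}_{⊴p}(Φ) thus needs the transient distribution at time t. The following is a minimal sketch using uniformisation, assuming the CTMC is given as a dictionary of outgoing rates `R[s][s′]`; the function names and the `bound_ok` predicate (standing for "⊴ p") are hypothetical. The Poisson weights are computed naively from e^{−Λt}, which underflows for large Λt (roughly above 700); robust tools use schemes such as the Fox–Glynn algorithm instead.

```python
import math


def transient(R, init, t, eps=1e-10):
    """Transient distribution pi(t) of a CTMC started in `init`, via
    uniformisation: pi(t) = sum_k Poisson(k; Λt) * pi0 * P^k, P = I + Q/Λ."""
    states = list(R)
    exit_rate = {s: sum(R[s].values()) for s in states}
    lam = max(exit_rate.values()) or 1.0        # uniformisation rate Λ
    vec = {s: 0.0 for s in states}
    vec[init] = 1.0                             # pi0 * P^k, starting at k = 0
    result = {s: 0.0 for s in states}
    weight = math.exp(-lam * t)                 # Poisson probability for k = 0
    if weight == 0.0:
        raise ValueError("Λt too large for this naive sketch")
    covered, k = 0.0, 0
    while True:
        for s in states:
            result[s] += weight * vec[s]
        covered += weight
        if covered >= 1.0 - eps:                # truncate the Poisson sum
            return result
        # one step of the uniformised DTMC: self-loop 1 - E(s)/Λ, else R/Λ
        nxt = {s: vec[s] * (1.0 - exit_rate[s] / lam) for s in states}
        for s in states:
            for s2, rate in R[s].items():
                nxt[s2] += vec[s] * rate / lam
        vec = nxt
        k += 1
        weight *= lam * t / k


def check_transient(R, init, t, sat_phi, bound_ok):
    """T^{@t}(Φ) in state `init`: does the mass on Sat(Φ) meet the bound?"""
    pi = transient(R, init, t)
    return bound_ok(sum(pi[s] for s in sat_phi))


# Two-state example: 'a' moves to the absorbing state 'b' at rate 1.
R = {"a": {"b": 1.0}, "b": {}}
print(transient(R, "a", 1.0)["a"])                              # close to e^{-1}
print(check_transient(R, "a", 1.0, {"a"}, lambda q: q <= 0.4))  # T_{≤0.4}^{@1}(in(a))
```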
Path-based measures. The standard transient measures on (sets of) states are
expressed using a specific instance of the P-operator. However, since this
operator allows an arbitrary path-formula as argument, much more general
measures can be described. An example is the probability of reaching a certain
set of states provided that all paths to these states obey certain properties. For
instance,
P_{≤0.01}((up3 ∨ up2) U^{[0,10]} down)
is valid for those states where the probability of the system going down within
10 time-units after having continuously operated with at least 2 processors is at
most 0.01.
is valid for those states that with probability at least 0.5 will reach, between
10 and 20 time-units, a state s which guarantees that the system is operational
with at least 2 processors when the system is in equilibrium. Moreover, prior to
reaching state s the system must be continuously operational.
In a nutshell, we believe that there are two main benefits of using CSL for
specifying constraints on measures of interest. First, the specification is
Thus, to check whether state s satisfies S_{⊴p}(Φ), a standard steady-state analysis
has to be carried out, i.e., a system of linear equations has to be solved.
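For a strongly connected (ergodic) CTMC this linear system is πQ = 0 together with Σ_s π(s) = 1. A minimal sketch, assuming rates are given as `R[s][s′]` and using plain dense Gaussian elimination with partial pivoting for illustration (the function names and the `bound_ok` predicate are hypothetical; larger models are typically solved with sparse iterative methods):

```python
def steady_state(R):
    """Solve pi Q = 0, sum(pi) = 1 for an ergodic CTMC with rates R[s][s']."""
    states = list(R)
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    # Row j holds the balance equation sum_i pi(i) Q(i, j) = 0.
    A = [[0.0] * n for _ in range(n)]
    for s in states:
        i = idx[s]
        for s2, rate in R[s].items():
            if s2 != s:                   # self-loops do not affect pi
                A[idx[s2]][i] += rate     # inflow into s2
                A[i][i] -= rate           # outflow from s
    b = [0.0] * n
    A[n - 1] = [1.0] * n                  # replace one (redundant) equation
    b[n - 1] = 1.0                        # by the normalisation sum(pi) = 1
    for col in range(n):                  # forward elimination
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n                         # back substitution
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return {s: x[idx[s]] for s in states}


def check_steady(R, sat_phi, bound_ok):
    """S(Φ) for an ergodic CTMC: the verdict is the same in every state."""
    pi = steady_state(R)
    return bound_ok(sum(pi[s] for s in sat_phi))


R = {"a": {"b": 2.0}, "b": {"a": 1.0}}
print(steady_state(R))                              # approximately pi(a) = 1/3, pi(b) = 2/3
print(check_steady(R, {"b"}, lambda q: q >= 0.6))   # S_{≥0.6}(in(b))
```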
In case the CTMC M is not strongly connected, the approach is to determine
the so-called bottom strongly connected components (BSCCs) of M, i.e., the
set of strongly connected components that cannot be left once they are reached.
Then, for each BSCC (which is an ergodic CTMC) the steady-state probability
of a Φ-state (determined in the standard way) and the probability to reach any
BSCC B from state s are determined. To check whether state s satisfies S_{⊴p}(Φ)
it then suffices to verify
$$\sum_{B} \Big( \mathit{Prob}(s, \Diamond B) \cdot \sum_{s' \in B \cap \mathit{Sat}(\Phi)} \pi^{B}(s') \Big) \;\unlhd\; p,$$
i.e., the probability to leave state s in the interval I times the probability
to reach a Φ-state in one step. Thus, in order to compute the set Sat(X^I Φ)
we first recursively compute Sat(Φ) and add state s to Sat(X^I Φ) if it fulfils
(1); this check boils down to a matrix-vector multiplication.
– Time-Bounded Until: For the sake of simplicity, we only treat the case
I = [0, t]; the general case is a bit more involved, but can be treated in a
similar way [3]. The probability Prob(s, Φ U^{[0,t]} Ψ) is the least solution of the
following set of equations: (i) 1, if s ∈ Sat(Ψ), (ii) 0, if s ∉ Sat(Φ) ∪ Sat(Ψ),
and
$$\mathit{Prob}(s, \Phi\, \mathcal{U}^{[0,t]}\, \Psi) = \int_0^t \sum_{s' \in S} R(s,s') \cdot e^{-E(s)\cdot x} \cdot \mathit{Prob}(s', \Phi\, \mathcal{U}^{[0,t-x]}\, \Psi)\, dx \qquad (2)$$
otherwise. The first two cases are self-explanatory; the last equation is
explained as follows. If s satisfies Φ but not Ψ, the probability of reaching a
Ψ-state from s within t time-units equals the probability of reaching some
direct successor state s′ of s within x time-units (x ≤ t), multiplied by the
probability to reach a Ψ-state from s′ in the remaining time-span t − x.
It is easy to check that for the untimed until-operator (i.e., I = [0, ∞))
equation (2) reduces to
$$\mathit{Prob}(s, \Phi\, \mathcal{U}\, \Psi) = \sum_{s' \in S} \mathbf{P}(s,s') \cdot \mathit{Prob}(s', \Phi\, \mathcal{U}\, \Psi).$$
Thus, for the standard until-operator, we can check whether a state satisfies
P_{⊴p}(Φ U Ψ) by first computing recursively the sets Sat(Φ) and Sat(Ψ),
followed by solving a linear system of equations.
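The next-operator check and the untimed until equations above can be sketched in a few lines. This is a minimal sketch, assuming rates `R[s][s′]` and I = [a, b]. For the next operator we use the closed form Prob(s, X^{[a,b]} Φ) = (e^{−E(s)·a} − e^{−E(s)·b}) · Σ_{s′∈Sat(Φ)} P(s, s′) with P(s, s′) = R(s, s′)/E(s), i.e., "the probability to leave state s in the interval I times the probability to reach a Φ-state in one step"; we assume this is what the referenced equation (1) states. For the untimed until, instead of calling a direct linear solver, the sketch approximates the least solution by fixpoint iteration starting from 0, a simpler technique swapped in for illustration. All function names are hypothetical.

```python
import math


def prob_next(R, s, sat_phi, a, b):
    """Prob(s, X^[a,b] Φ): leave s within [a, b], then jump into Sat(Φ)."""
    E = sum(R[s].values())
    if E == 0.0:
        return 0.0                                # absorbing: no next state
    leave = math.exp(-E * a) - math.exp(-E * b)   # b may be math.inf
    return leave * sum(r for s2, r in R[s].items() if s2 in sat_phi) / E


def prob_until(R, sat_phi, sat_psi, tol=1e-12, max_iter=100_000):
    """Least solution of x(s) = 1 (Ψ), 0 (neither Φ nor Ψ),
    sum_s' P(s, s') x(s') otherwise, approximated by iteration from 0."""
    x = {s: (1.0 if s in sat_psi else 0.0) for s in R}
    for _ in range(max_iter):
        new, delta = dict(x), 0.0
        for s in R:
            if s in sat_psi or s not in sat_phi:
                continue                          # cases (i) and (ii) stay fixed
            E = sum(R[s].values())
            v = sum(r * x[t] for t, r in R[s].items()) / E if E > 0 else 0.0
            delta = max(delta, abs(v - x[s]))
            new[s] = v
        x = new
        if delta < tol:
            break
    return x


R = {"a": {"b": 1.0, "d": 1.0}, "b": {}, "c": {}, "d": {"a": 1.0, "c": 1.0}}
print(prob_next(R, "a", {"b"}, 0.0, math.inf))    # 0.5
print(prob_until(R, {"a", "d"}, {"b"})["a"])      # approx. 2/3
```

Here x(a) = 1/2 + x(d)/2 and x(d) = x(a)/2, whose solution x(a) = 2/3 the iteration approaches geometrically.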
$$T^{@t}_{\unlhd p}(\Phi) = P_{\unlhd p}(\mathit{true}\; \mathcal{U}^{[t,t]}\; \Phi)$$
Thus, for computing Prob(s, true U^{[t,t]} Φ) standard transient analysis techniques
can be exploited. This raises the question whether we might be able to reduce
the general case, i.e., Prob(s, Φ U^{[0,t]} Ψ), to an instance of transient analysis as
well. This is indeed possible: the idea is to transform the CTMC M under
consideration into another CTMC M′ such that checking ϕ = Φ U^{[0,t]} Ψ on
M amounts to checking ϕ′ = true U^{[t,t]} Ψ on M′; a transient analysis of M′
(for time t) then suffices. The question then is: how do we transform M into
M′? Two simple observations form the basis for this transformation. First, we
observe that once a Ψ-state in M has been reached (along a Φ-path) before
time t, we may conclude that ϕ holds, regardless of which states will be visited
after having reached Ψ. Thus, as a first transformation we make all Ψ-states
absorbing. Second, we observe that ϕ is violated once a state has been reached
that satisfies neither Φ nor Ψ. Again, this is regardless of the states that are
visited after having reached ¬(Φ ∨ Ψ). Thus, as a second transformation, all the
¬(Φ ∨ Ψ)-states are made absorbing. It then suffices to carry out a transient
analysis on the resulting CTMC M′ for time t and collect the probability mass
to be in a Ψ-state (note that M′ typically is smaller than M, since the absorbing
states of each kind can be amalgamated):
$$\mathit{Prob}^{M}(s, \Phi\, \mathcal{U}^{[0,t]}\, \Psi) = \mathit{Prob}^{M'}(s, \mathit{true}\; \mathcal{U}^{[t,t]}\; \Psi).$$
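The transformation can be sketched as follows. This is a minimal sketch, assuming rates `R[s][s′]`; the helper names are hypothetical, the transient solver is a naive uniformisation routine included so that the block is self-contained, and the absorbing states are not amalgamated.

```python
import math


def transient(R, init, t, eps=1e-10):
    """Naive uniformisation: distribution at time t, starting in `init`."""
    E = {s: sum(R[s].values()) for s in R}
    lam = max(E.values()) or 1.0
    vec = {s: float(s == init) for s in R}
    result = {s: 0.0 for s in R}
    weight, covered, k = math.exp(-lam * t), 0.0, 0
    if weight == 0.0:
        raise ValueError("Λt too large for this naive sketch")
    while covered < 1.0 - eps:
        for s in R:
            result[s] += weight * vec[s]
        covered += weight
        nxt = {s: vec[s] * (1.0 - E[s] / lam) for s in R}
        for s in R:
            for s2, rate in R[s].items():
                nxt[s2] += vec[s] * rate / lam
        vec, k = nxt, k + 1
        weight *= lam * t / k
    return result


def prob_bounded_until(R, sat_phi, sat_psi, s, t):
    """Prob(s, Φ U^[0,t] Ψ) = Prob^{M'}(s, true U^[t,t] Ψ) on the model M'
    in which Ψ-states and ¬(Φ ∨ Ψ)-states are made absorbing."""
    absorbing = {x for x in R if x in sat_psi or x not in sat_phi}
    R_prime = {x: ({} if x in absorbing else dict(R[x])) for x in R}
    pi = transient(R_prime, s, t)
    return sum(pi[x] for x in sat_psi)


# 'a' is a Φ-state; 'b' is the Ψ-state (which in M may fall back to 'a');
# 'c' satisfies neither Φ nor Ψ.
R = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 5.0}, "c": {}}
print(prob_bounded_until(R, {"a"}, {"b"}, "a", 1.0))  # 0.5 * (1 - e^{-2})
```

Making 'b' absorbing is what removes the rate-5 transition back to 'a': once Ψ is reached the path formula is already satisfied, so the mass collected in 'b' at time t is exactly the until-probability.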
In fact, by similar observations it turns out that verifying the general
U^I-operator can also be reduced to instances of (nested) transient analysis [3]. As
mentioned above, the transient probability distribution can be computed via a
uniformised DTMC which characterises the CTMC at discrete state transition
epochs. A direct application of uniformisation to compute Prob^M(s, Φ U^{[0,t]} Ψ)
requires performing this procedure for each state s. An improvement suggested
in [46] computes the entire vector Prob^M(Φ U^I Ψ) for all states simultaneously.
For a single operator U^I this yields a time complexity of O(|R|·N_ε), where
|R| is the number of non-zero entries in R, and N_ε is the number of iterations
within the uniformisation algorithm needed to achieve a given accuracy ε. The
value N_ε can be computed a priori; it depends linearly on the maximal exit rate
E_max (the largest absolute diagonal entry of the generator matrix) and on the
maximal time bound t_max occurring in Φ.
In total, the time complexity to decide the validity of a CSL formula Φ on a
CTMC (S, R, L) is O(|Φ|·(|R|·E_max·t_max + |S|^{2.81})), and the space complexity
is O(|R|) [5].
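How N_ε grows with E_max·t_max can be illustrated by computing the right truncation point of the Poisson distribution used in uniformisation. A minimal sketch, assuming the simple criterion "stop once the Poisson probabilities for k = 0..N sum to at least 1 − ε" (the name `poisson_right_truncation` is hypothetical; robust tools use schemes such as Fox–Glynn to avoid underflow of e^{−Λt}):

```python
import math


def poisson_right_truncation(lam_t, eps):
    """Smallest N with sum_{k=0}^{N} e^{-Λt} (Λt)^k / k! >= 1 - eps."""
    term = math.exp(-lam_t)
    if term == 0.0:
        raise ValueError("Λt too large for this naive sketch")
    total, n = term, 0
    while total < 1.0 - eps:
        n += 1
        term *= lam_t / n
        total += term
    return n


for lam_t in (10, 50, 100, 200):
    print(lam_t, poisson_right_truncation(lam_t, 1e-6))
```

The printed truncation points grow roughly like Λt plus a term of order √(Λt), i.e., essentially linearly in E_max·t_max.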
276 C. Baier et al.
5.1 Introduction
Example 3. For the TMR example, the reward structure can be instantiated in
different ways so as to specify a variety of performability measures. The simplest
reward structure (leading to an availability model) divides the states into
operational and non-operational ones: ρ1(s_{0,0}) = 0 and ρ1(s_{i,1}) = 1 for the remaining
states. A reward structure in which varying levels of trustworthiness are repre-
sented is for instance based on the number of operational processors: ρ2 (s0,0 ) =
0 and ρ2 (si,1 ) = i. As a third reward structure, one may consider the mainte-
nance costs of the system, by setting: ρ3 (s0,0 ) = c2 and ρ3 (si,1 ) = c1 · (3 − i),
where c1 is the cost to replace a processor, and c2 the cost to renew the entire
system. As a fourth option (which we do not further consider here) one can also
imagine a reward structure quantifying the power consumption in each state.
to time t can be formalised as follows. For t = Σ_{j=0}^{k−1} t_j + t′ with t′ ≤ t_k we
define
$$y(\sigma, t) = \sum_{j=0}^{k-1} t_j \cdot \rho(s_j) + t' \cdot \rho(s_k).$$
For finite paths ending at time point t the cumulated reward definition is
slightly adapted, basically replacing t′ by t − Σ_{j=0}^{l−1} t_j.
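The definition of y(σ, t) translates directly into code. A minimal sketch, assuming a timed path is given as a list of (state, sojourn time) pairs and that a finite path ends in a state with infinite sojourn; the function name and the example path are hypothetical, while the rewards follow ρ2 of Example 3 (number of operational processors):

```python
import math


def cumulative_reward(path, rho, t):
    """y(sigma, t) = sum_{j<k} t_j * rho(s_j) + t' * rho(s_k),
    where t = sum_{j<k} t_j + t' and t' <= t_k."""
    total, elapsed = 0.0, 0.0
    for state, sojourn in path:
        if t <= elapsed + sojourn:            # time t falls into state s_k
            return total + (t - elapsed) * rho[state]
        total += sojourn * rho[state]         # full sojourn t_j in s_j
        elapsed += sojourn
    raise ValueError("t lies beyond the end of the path")


# rho_2 from Example 3: reward i in s_{i,1}, reward 0 in s_{0,0}.
rho2 = {"s3,1": 3, "s2,1": 2, "s1,1": 1, "s0,1": 0, "s0,0": 0}
# Hypothetical path: 2 time units with 3 processors, 1.5 with 2, then down.
sigma = [("s3,1", 2.0), ("s2,1", 1.5), ("s0,0", math.inf)]
print(cumulative_reward(sigma, rho2, 3.0))   # 2*3 + 1*2 = 8.0
print(cumulative_reward(sigma, rho2, 5.0))   # 2*3 + 1.5*2 + 1.5*0 = 9.0
```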
for intervals I, J ⊆ ℝ≥0. In a similar way as before, we define ◇^I_J Φ = true U^I_J Φ
and P_{⊴p}(□^I_J Φ) = P_{⊴̄ 1−p}(◇^I_J ¬Φ), with ⊴̄ the reversed comparison. Interval I
can be considered as a timing constraint whereas J represents a bound for the
cumulative reward. The path-formula X^I_J Φ asserts that a transition is made to a
Φ-state at time point t ∈ I such that the cumulative reward r earned until time t
meets the bounds specified by J, i.e., r ∈ J. The semantics of Φ1 U^I_J Φ2 is as for
Φ1 U^I Φ2 with the additional constraint that the cumulative reward r earned at the
time of reaching some Φ2-state lies in J, i.e., r ∈ J.
Example 4. As an example property for the TMR system, P_{≥0.95}(◇^{[60,60]}_{[0,200]} true)
denotes that with probability of at least 0.95 the cumulative reward, e.g., the
incurred costs of the system for reward structure ρ3, at time instant 60 is at most
200. Given that the reward of a state indicates the power consumed per time-unit,
property P_{<0.08}(up3 U^{[0,30]}_{[7,∞)} (down ∨ up2)) expresses that with probability
less than 0.08 within 30 time units at least 7 units of power have been consumed
in full operational mode before some component fails. A simpler property, that
only refers to reward accumulation, P_{>0.5}(◇^{[0,∞)}_{[0,10]} down) would say that it is
likely (probability > 0.5) to spend at most 10 units of energy before a voter failure.
For the X^I_J case, the definition refines the one for CSL by demanding that the
reward accumulated during the time δ(σ, 0) of staying in the first state of the
path lies in J, while for U^I_J the reward accumulated until the time t when
touching a Φ2-state must be in J.
X^I Φ = X^I_{[0,∞)} Φ and Φ1 U^I Φ2 = Φ1 U^I_{[0,∞)} Φ2.
Similarly, we can identify a new logic CRL (continuous reward logic) in case
I = [0, ∞) for all sub-formulas. In CRL it is only possible to refer to the cumulation
of rewards, but not to the advance of time. The formula P_{>0.5}(◇^{[0,∞)}_{[0,10]} down)
is an example property of the CRL subset of CSRL. The CRL logic will play
a special role when describing the model checking of CSRL, and therefore we
will first discuss how model checking CRL can be performed, before turning our
attention to CSRL. Before doing so, we list in Table 1 a variety of standard
performance, dependability, and performability measures and how they can be
phrased in CSRL. Here F is a generic formula playing the role of an identifier
of the failed system states of the model under study (in the TMR example, F
would be down ∨ up0). These measures correspond to basic formulas in the logic,
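and it is worth highlighting that much more involved and nested measures are
easily expressible in CSRL, such as
$$S_{>0.3}\Big( P_{<0.3}\big(\Diamond^{[0,85]}_{[3,5]}\, \mathit{up2}\big) \Rightarrow P_{>0.1}\big((\neg \mathit{down})\; \mathcal{U}^{[5,\infty)}\; \mathit{up3}\big) \Big).$$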
This section discusses how model checking can be performed for CRL properties,
i.e., formulas that refer only to the cumulation of rewards, but not to the
advance of time. We will explain how a duality result can be used to reduce
model checking of such formulas to the CSL model checking algorithm described
above.
The basic strategy is the same as for CSL, and only the path operators X_J and
U_J need specific considerations. To calculate the probability of satisfying such
a path formula we rely on a general duality result for MRMs and CSRL [4].
– rescaling the transition rates by the reward of their originating state, i.e.,
R′(s, s′) = R(s, s′)/ρ(s), and
– inverting the reward structure, i.e., ρ′(s) = 1/ρ(s).
reward in M are reversed in M^{−1}. In terms of the logic CSRL, this corresponds
to swapping reward and time intervals inside a CSRL formula, and allows one
to establish that
$$\mathit{Prob}^{M}(s, X^{I}_{J}\,\Phi) = \mathit{Prob}^{M^{-1}}(s, X^{J}_{I}\,\Phi), \quad\text{and}$$
$$\mathit{Prob}^{M}(s, \Phi_1\, \mathcal{U}^{I}_{J}\, \Phi_2) = \mathit{Prob}^{M^{-1}}(s, \Phi_1\, \mathcal{U}^{J}_{I}\, \Phi_2).$$
As a consequence, one can obtain the set Sat^M(Φ) (comprising the states in M
satisfying Φ) by computing instead Sat^{M^{−1}}(Φ^{−1}), i.e.,
$$\mathit{Sat}^{M}(\Phi) = \mathit{Sat}^{M^{-1}}(\Phi^{-1}),$$
where Φ^{−1} is defined as Φ where for each sub-formula in Φ of the form X^I_J
or U^I_J the intervals I and J are swapped. For the TMR example, for
Φ = P_{⊴0.9}(¬F U^{[50,50]}_{[10,∞)} F) we have Φ^{−1} = P_{⊴0.9}(¬F U^{[10,∞)}_{[50,50]} F). We refer to [4] for a
proof of this property, and for extensions of this result to some cases with zero
rewards. Note that we excluded zero rewards here, since otherwise the model
inversion would imply divisions by zero.
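Constructing the dual MRM and the dual formula is a purely syntactic step. A minimal sketch, assuming rates `R[s][s′]`, strictly positive rewards `rho[s]`, and a hypothetical tuple encoding of CSRL formulas in which ("X", I, J, Φ) stands for X^I_J Φ and ("U", I, J, Φ1, Φ2) for Φ1 U^I_J Φ2 (time interval I first, reward interval J second); the function names are our own:

```python
import math


def dual_mrm(R, rho):
    """M^{-1}: R'(s, s') = R(s, s') / rho(s) and rho'(s) = 1 / rho(s)."""
    if any(r <= 0 for r in rho.values()):
        raise ValueError("duality requires strictly positive rewards")
    R_dual = {s: {t: rate / rho[s] for t, rate in R[s].items()} for s in R}
    rho_dual = {s: 1.0 / r for s, r in rho.items()}
    return R_dual, rho_dual


def swap_intervals(f):
    """Phi^{-1}: swap the time and the reward interval of every X and U."""
    tag = f[0]
    if tag == "ap":
        return f
    if tag == "not":
        return ("not", swap_intervals(f[1]))
    if tag == "or":
        return ("or", swap_intervals(f[1]), swap_intervals(f[2]))
    if tag in ("P", "S"):               # (tag, comparison, bound, argument)
        return (tag, f[1], f[2], swap_intervals(f[3]))
    if tag == "X":
        return ("X", f[2], f[1], swap_intervals(f[3]))
    if tag == "U":
        return ("U", f[2], f[1], swap_intervals(f[3]), swap_intervals(f[4]))
    raise ValueError(f"unknown formula tag {tag!r}")


# TMR formula from the text; the comparison ">=" is chosen only for illustration.
F = ("ap", "F")
phi = ("P", ">=", 0.9, ("U", (50, 50), (10, math.inf), ("not", F), F))
print(swap_intervals(phi))
# ('P', '>=', 0.9, ('U', (10, inf), (50, 50), ('not', ('ap', 'F')), ('ap', 'F')))
```

Both transformations are involutions: applying `dual_mrm` twice returns the original rates and rewards, and applying `swap_intervals` twice returns the original formula.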
The duality result is the key to model checking CRL on MRMs (satisfying the
above restriction), since swapping the formula turns X_J into X^J and U_J into
U^J. Hence, any CRL formula corresponds to a CSL formula interpreted on the
dual MRM. As a consequence, model checking CRL can proceed via the algorithm
for CSL, with some overhead (linear in the model size plus the formula length)
needed to swap the model and swap the formula.
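[Figure: two-dimensional process (X_t, Y_t): "CTMC dimension" with state groups Sat(Ψ), Sat(Φ ∧ ¬Ψ) and Sat(¬Φ ∧ ¬Ψ); accumulated reward as column height, growing at rate ρ(1) in state 1; absorbing barrier at height r.]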
all ¬(Φ ∨ Ψ)-states are made absorbing, and have reward 0 assigned to them.
The intuitive justification is as in the CSL setting. The rewards are set to 0 since
once a path reaches a Ψ-state at time t′ < t, while not having accumulated more
than r reward, it suffices to be trapped in that state until time t provided no
reward will be earned anymore, i.e., ρ(s) = 0 for Ψ-state s. Note that we can
amalgamate all states satisfying Ψ into a single state, and likewise all states
satisfying ¬(Φ ∨ Ψ), thereby making the MRM considerably smaller.
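The transformation, including the amalgamation, can be sketched as follows. A minimal sketch, assuming rates `R[s][s′]` and rewards `rho[s]`; the function name and the names of the two amalgamated states, 'PSI' and 'FAIL', are hypothetical and assumed not to clash with existing state names:

```python
def transform_mrm(R, rho, sat_phi, sat_psi):
    """Make Ψ-states and ¬(Φ ∨ Ψ)-states absorbing with reward 0 and
    amalgamate each group into a single state ('PSI' and 'FAIL')."""
    def rep(s):
        if s in sat_psi:
            return "PSI"
        if s not in sat_phi:
            return "FAIL"
        return s

    R_new = {"PSI": {}, "FAIL": {}}        # absorbing: no outgoing rates
    rho_new = {"PSI": 0.0, "FAIL": 0.0}    # and no further reward
    for s in R:
        if rep(s) != s:
            continue                       # merged into PSI or FAIL
        rho_new[s] = rho[s]
        out = {}
        for t, rate in R[s].items():       # redirect rates into merged states
            out[rep(t)] = out.get(rep(t), 0.0) + rate
        R_new[s] = out
    return R_new, rho_new


R = {"a": {"b": 1.0, "c": 2.0, "d": 3.0}, "b": {"a": 1.0},
     "c": {"a": 4.0}, "d": {"c": 1.0}}
rho = {"a": 2.0, "b": 1.0, "c": 5.0, "d": 1.0}
print(transform_mrm(R, rho, sat_phi={"a"}, sat_psi={"b", "d"}))
```

In the example, the rates from 'a' into the Ψ-states 'b' and 'd' are summed into a single rate 4.0 towards 'PSI', and the rate into the ¬(Φ ∨ Ψ)-state 'c' becomes a rate 2.0 towards 'FAIL'.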
Thus, we can restrict our attention to the computation of
Prob(s, true U^{[t,t]}_{[0,r]} Ψ). This probability, in turn, can be derived from the
transient accumulated reward distribution of the MRM. (Compare this to the
transient distribution used in the CSL case at this point.) To explain why this
is the case, we consider a two-dimensional stochastic process ((X_t, Y_t), t ≥ 0) on
S × ℝ≥0, as illustrated in Figure 3. Informally speaking, this stochastic process
has a discrete component that describes the transition behaviour in the original
MRM, combined with a continuous component that describes the accumulated
reward gained over time. For t = 0 we have Y_t = 0, and for t > 0 the value of Y_t
increases continuously with rate ρ(X_t). Hence, the discrete states of the original
CTMC become "columns" of which the height models the accumulated reward.
To take into account the reward bound (≤ r), we introduce an absorbing barrier
for the transformed MRM described above. This allows us to decide the
satisfaction of time- and reward-bounded until formulas via numerical recipes for
calculating Pr{Y_t ≤ r, X_t ∈ S′} on the two-dimensional stochastic process (X_t, Y_t).
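As a cross-check for small models, Pr{Y_t ≤ r, X_t ∈ S′} can also be estimated by simulating the MRM. This Monte Carlo estimator is a swapped-in technique, not one of the numerical methods discussed here, and it yields only a statistical estimate without an a priori error bound. A minimal sketch, assuming rates `R[s][s′]` and rewards `rho[s]`; the function name and parameters are hypothetical:

```python
import random


def estimate_joint(R, rho, init, t, r, target, runs=100_000, seed=1):
    """Estimate Pr{Y_t <= r, X_t in target} by simulating `runs` paths."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        state, clock, reward = init, 0.0, 0.0
        while True:
            E = sum(R[state].values())
            stay = rng.expovariate(E) if E > 0 else float("inf")
            if clock + stay >= t:            # still in `state` at time t
                reward += (t - clock) * rho[state]
                break
            reward += stay * rho[state]      # Y grows at rate rho(X)
            clock += stay
            u = rng.random() * E             # next state with prob. R/E
            for nxt, rate in R[state].items():
                u -= rate
                if u < 0:
                    break
            state = nxt
        if reward <= r and state in target:
            hits += 1
    return hits / runs


# 'a' earns reward 1 per time unit and moves to 'b' (reward 0) at rate 1,
# so Pr{Y_1 <= 0.5, X_1 = b} = Pr{T <= 0.5} = 1 - e^{-0.5}.
R = {"a": {"b": 1.0}, "b": {}}
rho = {"a": 1.0, "b": 0.0}
print(estimate_joint(R, rho, "a", 1.0, 0.5, {"b"}, runs=20_000))
```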
It is worth remarking that similar processes (with mixed discrete-continuous
state spaces) also emerge in the analysis of non-Markovian stochastic Petri nets
(when using the supplementary variable approach, cf. [22]), Markov-regenerative
stochastic Petri nets [9], and fluid-stochastic Petri nets [42]. Here we briefly
sketch three other approaches to compute Pr{Y_t ≤ r, X_t ∈ S′}, which are more
directly applicable to the problem.
where d is chosen such that the probability of more than one transition in the
MRM in an interval of length d is negligible. The algorithm allows only natural
number rewards, but this is no severe restriction since rational rewards can be
scaled to yield natural numbers.
The time complexity of this method is O(|S|·t·|t−r|/d^2) and the space
complexity is O(|S|·r/d). As the computational effort is proportional to d^{−2},
the computation time grows rapidly when a higher accuracy is required.
Occupation time distributions. In 2000, Sericola [57] derived a result for the
distribution of occupation times in CTMCs prior to a given point in time t. The
approach is based on uniformisation, and (as with uniformisation) it is possible
to calculate an a priori error bound for the computed values. The distribution
of this occupation time can be used to derive Pr{Y_t ≤ r, X_t ∈ S′}, based on
the observation that if O(s, t) is the occupation time of state s prior to t then
ρ(s)·O(s, t) is the accumulated reward for this state prior to t. Summing over
all states leads to the accumulated reward required.
The computation of the occupation time distribution is an iterative procedure,
which in each iteration updates a linearly growing set of matrices. The
computational and storage requirements of the approach are therefore considerable.
If we truncate after the N_ε-th iteration, we obtain an overall time complexity of
O(N_ε^3 |S|^3) and an overall space complexity of O(N_ε^2 |S|^3). In contrast to the
Erlangian approximation, N_ε determines the accuracy of the entire computation
procedure in this approach.
[Figure: CTMC of the TMR system with individually modelled components; states 0000 to 1111 grouped as up3, up2, up1, up0 and down, with transition rates λ, μ, ν and δ.]
It is well known that the measures derived from M and its quotient M/R are
strongly related if R is a bisimulation. Without going into details, it is possible
References
1. A. Aziz, K. Sanwal, V. Singhal, and R. Brayton. Model checking continuous time
Markov chains. ACM Transactions on Computational Logic, 1(1): 162–170, 2000.
2. C. Baier, J.-P. Katoen, and H. Hermanns. Approximate symbolic model checking
of continuous-time Markov chains. In Concurrency Theory, LNCS 1664: 146–162,
Springer-Verlag, 1999.
3. C. Baier, B.R. Haverkort, H. Hermanns, and J.-P. Katoen. Model checking
continuous-time Markov chains by transient analysis. In Computer Aided Veri-
fication, LNCS 1855: 358–372, Springer-Verlag, 2000.
4. C. Baier, B.R. Haverkort, H. Hermanns, and J.-P. Katoen. On the logical character-
isation of performability properties. In Automata, Languages, and Programming,
LNCS 1853: 780–792, Springer-Verlag, 2000.
5. C. Baier, B.R. Haverkort, H. Hermanns, and J.-P. Katoen. Model checking al-
gorithms for continuous-time Markov chains. Technical report TR-CTIT-02-10.
Centre for Telematics and Information Technology, University of Twente. 2001.
6. C. Baier and M. Kwiatkowska. On the verification of qualitative properties of
probabilistic processes under fairness constraints. Information Processing Letters,
66(2): 71–79, 1998.
7. M.D. Beaudry. Performance-related reliability measures for computing systems.
IEEE Transactions on Computers, C-27: 540–547, 1978.
8. B. Bérard, M. Bidoit, A. Finkel, F. Laroussinie, A. Petit, L. Petrucci, and Ph.
Schnoebelen. Systems and Software Verification. Springer-Verlag, 2001.
9. A. Bobbio and M. Telek. Markov regenerative SPN with non-overlapping activity
cycles. In Proc. Int’l IEEE Performance and Dependability Symposium: 124–133,
1995.
10. P. Buchholz. Exact and ordinary lumpability in finite Markov chains. Journal of
Applied Probability, 31: 59–75, 1994.
11. P. Buchholz, J.-P. Katoen, P. Kemper, and C. Tepper. Model checking large struc-
tured Markov chains. Journal of Logic and Algebraic Programming, to appear,
2001.
12. CCITT Blue Book, Fascicle III.1, International Telecommunication Union,
Geneva, 1989.
13. G. Chiola, G. Franceschinis, R. Gaeta, and M. Ribaudo. GreatSPN 1.7: graphical
editor and analyzer for timed and stochastic Petri nets. Performance Evaluation,
24(1–2): 47–68, 1995.
14. G. Ciardo, J.K. Muppala, and K.S. Trivedi. SPNP: stochastic Petri net package.
In Proc. 3rd Int. Workshop on Petri Nets and Performance Models: 142–151,
IEEE CS Press, 1989.
15. G. Clark, S. Gilmore, and J. Hillston. Specifying performance measures for PEPA.
In Formal Methods for Real-Time and Probabilistic Systems, LNCS 1601: 211–227,
Springer-Verlag, 1999.
37. H. Hermanns, J.-P. Katoen, J. Meyer-Kayser, and M. Siegle. A Markov chain model
checker. In Tools and Algorithms for the Construction and Analysis of Systems,
LNCS 1785: 347–362, Springer-Verlag, 2000.
38. H. Hermanns, J.-P. Katoen, J. Meyer-Kayser, and M. Siegle. Towards model check-
ing stochastic process algebra. In Integrated Formal Methods, LNCS 1945: 420–439,
Springer-Verlag, 2000.
39. H. Hermanns and M. Siegle. Bisimulation algorithms for stochastic process alge-
bras and their BDD-based implementation. In Formal Methods for Real-Time and
Probabilistic Systems, LNCS 1601: 244–265, Springer-Verlag, 1999.
40. J. Hillston. A Compositional Approach to Performance Modelling. Cambridge
University Press, 1996.
41. G.J. Holzmann. The model checker Spin. IEEE Transactions on Software Engi-
neering, 23(5): 279–295, 1997.
42. G. Horton, V. Kulkarni, D. Nicol, and K. Trivedi. Fluid stochastic Petri nets: Theory,
application and solution techniques. Eur. J. Oper. Res., 105(1): 184–201, 1998.
43. R.A. Howard. Dynamic Probabilistic Systems; Volume 1: Markov Models. John
Wiley & Sons, 1971.
44. G.G. Infante-Lopez, H. Hermanns, and J.-P. Katoen. Beyond memoryless distri-
butions: Model checking semi-Markov chains. In Process Algebra and Probabilistic
Methods, LNCS 2165: 57–70, Springer-Verlag, 2001.
45. A. Jensen. Markov chains as an aid in the study of Markov processes. Skandinavisk
Aktuarietidskrift 36: 87–91, 1953.
46. J.-P. Katoen, M.Z. Kwiatkowska, G. Norman, and D. Parker. Faster and symbolic
CTMC model checking. In Process Algebra and Probabilistic Methods, LNCS 2165:
23–38, Springer-Verlag, 2001.
47. J.G. Kemeny and J.L. Snell. Finite Markov Chains. Van Nostrand, 1960.
48. V.G. Kulkarni. Modeling and Analysis of Stochastic Systems. Chapman & Hall,
1995.
49. M.Z. Kwiatkowska, G. Norman, and A. Pacheco. Model checking CSL until for-
mulae with random time bounds. In Process Algebra and Probabilistic Methods,
LNCS 2399, Springer-Verlag, 2002.
50. K.L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
51. J.F. Meyer. On evaluating the performability of degradable computing systems.
IEEE Transactions on Computers, 29(8): 720–731, 1980.
52. J.F. Meyer. Closed-form solutions of performability, IEEE Transactions on Com-
puters, 31(7): 648–657, 1982.
53. J.F. Meyer. Performability: a retrospective and some pointers to the future. Per-
formance Evaluation, 14(3-4): 139–156, 1992.
54. W.D. Obal II and W.H. Sanders. State-space support for path-based reward vari-
ables. Performance Evaluation, 35: 233–251, 1999.
55. D. Peled. Software Reliability Methods. Springer-Verlag, 2001.
56. W.H. Sanders, W.D. Obal II, M.A. Qureshi, and F.K. Widjanarko. The UltraSAN
modeling environment. Performance Evaluation, 24: 89–115, 1995.
57. B. Sericola. Occupation times in Markov processes. Stochastic Models, 16(5): 339–
351, 2000.
58. E. de Souza e Silva and H.R. Gail. Performability analysis of computer systems:
from model specification to solution. Performance Evaluation, 14: 157–196, 1992.
59. W.J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton
University Press, 1994.
60. H.C. Tijms and R. Veldman. A fast algorithm for the transient reward distribution in
continuous-time Markov chains. Operations Research Letters, 26: 155–158, 2000.
Measurement-Based Analysis of System Dependability
Using Fault Injection and Field Failure Data
Ravishankar K. Iyer and Zbigniew Kalbarczyk
Center for Reliable and High-Performance Computing
University of Illinois at Urbana-Champaign
1308 W. Main St., Urbana, IL 61801-2307
{iyer, kalbar}@crhc.uiuc.edu
Abstract. The discussion in this paper focuses on the issues involved in
analyzing the availability of networked systems using fault injection and the
failure data collected by the logging mechanisms built into the system. In
particular we address: (1) analysis in the prototype phase using physical fault
injection on an actual system. We use the example of fault injection-based
evaluation of a software-implemented fault tolerance (SIFT) environment (built
around a set of self-checking processes called ARMORs) that provides error
detection and recovery services to spaceborne scientific applications; and (2)
measurement-based analysis of systems in the field. We use the example of a LAN
of Windows NT-based computers to present methods for collecting and analyzing
failure data to characterize network system dependability. Both fault injection
and failure data analysis enable us to study naturally occurring errors and to
provide feedback to system designers on potential availability bottlenecks. For
example, the study of failures in a network of Windows NT machines reveals
that most of the problems that lead to reboots are software related and that,
though the average availability evaluates to over 99%, a typical machine, on
average, provides acceptable service only about 92% of the time.
1 Introduction
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 290–317, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Measurement-Based Analysis of System Dependability 291
2.1 SIFT Environment for REE
SIFT Architecture
An ARMOR is a multithreaded process internally structured around objects called
elements that contain their own private data and provide elementary functions or
services (e.g., detection and recovery for remote ARMOR processes, internal self-
checking mechanisms, or checkpointing support). Together, the elements constitute
the functionality that defines an ARMOR's behavior. All ARMORs contain a basic
set of elements that provide a core functionality, including the ability to (1)
implement reliable point-to-point message communication between ARMORs, (2)
communicate with the local daemon ARMOR process, (3) respond to heartbeats from
the local daemon, and (4) capture ARMOR state. Specific ARMORs extend this core
functionality by adding extra elements.
Types of ARMORs. The SIFT environment for REE applications consists of four
kinds of ARMOR processes: a Fault Tolerance Manager (FTM), a Heartbeat
ARMOR, daemons, and Execution ARMORs.
294 R.K. Iyer and Z. Kalbarczyk
Fault Tolerance Manager (FTM). A single FTM executes on one of the nodes and
is responsible for recovering from ARMOR and node failures as well as interfacing
with the external Spacecraft Control Computer (SCC).
Heartbeat ARMOR. The Heartbeat ARMOR executes on a node separate from the
FTM. Its sole responsibility is to detect and recover from failures in the FTM
through periodic polling for liveness.
Daemons. Each node on the network executes a daemon process. Daemons are the
gateways for ARMOR-to-ARMOR communication, and they detect failures in the
local ARMORs.
Execution ARMORs. Each application process is directly overseen by a local
Execution ARMOR.
E r r o r D e te c tio n H ie r a r c h y
T h e to p -d o w n e rro r d e te c tio n h ie ra rc h y c o n s is ts o f:
N o d e a n d d a e m o n e r r o r s . T h e F T M p e rio d ic a lly e x c h a n g e s h e a rtb e a t m e s s a g e s
w ith e a c h d a e m o n (e v e ry 1 0 s in o u r e x p e rim e n ts ) to d e te c t n o d e c ra s h e s a n d
h a n g s . If th e F T M d o e s n o t re c e iv e a re s p o n s e b y th e n e x t h e a rtb e a t ro u n d , it
a s s u m e s th a t th e n o d e h a s fa ile d . A d a e m o n fa ilu re is tre a te d a s a n o d e fa ilu re .
ARMOR errors. Each ARMOR contains a set of assertions on its internal state, including range checks, validity checks on data (e.g., a valid ARMOR ID), and data structure integrity checks. Other internal self-checks available to the ARMORs include preemptive control flow checking, I/O signature checking, and deadlock/livelock detection [4]. To limit error propagation, the ARMOR kills itself when an internal check detects an error. The daemon detects crash failures in the ARMORs on its node via operating system calls. To detect hang failures, the daemon periodically (every 10 s in the experiments) sends "Are-you-alive?" messages to its local ARMORs.
REE applications. All application crash failures are detected by the local Execution ARMOR. Crash failures in the MPI process with rank 0 can be detected by its Execution ARMOR through operating system calls (i.e., waitpid). The other Execution ARMORs periodically check that their MPI processes (ranks 1 through n) are still in the operating system's process table; if a process is missing, its Execution ARMOR concludes that the application has crashed. An application process notifies the local Execution ARMOR through its communication channel before exiting normally so that the ARMOR does not misinterpret this exit as an abnormal termination.
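The two crash-detection paths described above can be sketched with the standard POSIX calls exposed by Python's os module. This is a minimal illustration, not the ARMOR implementation: it assumes that the rank 0 process is a child of its Execution ARMOR (so waitpid can report its termination) while ranks 1 through n are not, and the helper names child_has_terminated, process_in_table, and classify_exit are hypothetical.

```python
# Sketch of the two crash-detection paths of an Execution ARMOR (POSIX only).
# Hypothetical helper names; assumes rank 0 is a child process of its ARMOR.
import os


def child_has_terminated(pid: int) -> bool:
    """Rank 0 path: a non-blocking waitpid() reports whether the child has exited."""
    done_pid, _status = os.waitpid(pid, os.WNOHANG)  # (0, 0) while still running
    return done_pid == pid


def process_in_table(pid: int) -> bool:
    """Ranks 1..n path: probe the process table with signal 0 (no signal is sent)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False              # no such process: it has disappeared
    except PermissionError:
        return True               # the process exists but belongs to another user
    return True


def classify_exit(gone: bool, notified_normal_exit: bool) -> str:
    """A process that vanishes without first notifying its ARMOR counts as crashed."""
    if not gone:
        return "running"
    return "normal-exit" if notified_normal_exit else "crashed"
```

The explicit normal-exit notification matters because both probes only observe that a process is gone; without the notification, a clean exit and a crash look identical to the ARMOR.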
Measurement-Based Analysis of System Dependability 295
[Figure: SIFT environment on the REE testbed. Nodes 1 to 4 are connected by a network; each node runs a daemon and a SIFT interface. Node 1 and Node 2 host the Rover process (rank 0 and rank 1); Node 3 and Node 4 host the OTIS process (rank 0 and rank 1). Legend: heartbeats, progress indicators, recovery.]
Error Recovery
Nodes. The FTM migrates the ARMOR and application processes that were executing on the failed node to other working nodes in the SIFT environment.
ARMORs. ARMOR state is recovered from a checkpoint. To protect the ARMOR state against process failures, a checkpointing technique called microcheckpointing is used [30]. Microcheckpointing leverages the modular element composition of the ARMOR process to incrementally checkpoint state on an element-by-element basis.
REE Applications. On detecting an application failure, the Execution ARMOR notifies the FTM to initiate recovery. The version of MPI used on the REE testbed precludes individual MPI processes from being restarted within an application; therefore, the FTM instructs all Execution ARMORs to terminate their MPI processes before restarting the application. The application executable binaries must be reloaded from the remote disk during recovery.
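Microcheckpointing is described here only at a high level. The following minimal sketch assumes that each element keeps a dirty flag and that a checkpoint is a set of per-element state copies, so that only elements modified since the last checkpoint are saved. Element, ArmorProcess, microcheckpoint, and restore are hypothetical names, and the element names are borrowed from Table 4 purely for illustration.

```python
# Sketch of element-by-element incremental checkpointing (microcheckpointing).
# Assumption: each element marks itself dirty on modification; only dirty
# elements are copied into the checkpoint.
import copy


class Element:
    def __init__(self, name, **state):
        self.name = name
        self.state = dict(state)
        self.dirty = True            # a new element must be checkpointed once

    def update(self, key, value):
        self.state[key] = value
        self.dirty = True            # record that this element changed


class ArmorProcess:
    def __init__(self, elements):
        self.elements = {e.name: e for e in elements}
        self.checkpoint = {}         # element name -> last saved state

    def microcheckpoint(self):
        """Save only elements modified since the last checkpoint; return their names."""
        saved = []
        for name, elem in self.elements.items():
            if elem.dirty:
                self.checkpoint[name] = copy.deepcopy(elem.state)
                elem.dirty = False
                saved.append(name)
        return saved

    def restore(self):
        """After a process failure, rebuild every element from its saved state."""
        for name, saved in self.checkpoint.items():
            self.elements[name].state = copy.deepcopy(saved)
            self.elements[name].dirty = False
```

Because each checkpoint touches only the changed elements, the cost per checkpoint scales with how much state was modified rather than with the total ARMOR state.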
2.2 Injection Experiments
Error Models
The error models used in the injection experiments represent a combination of those employed in several past experimental studies and those proposed by JPL engineers.
SIGINT/SIGSTOP. These signals were used to mimic "clean" crash and hang failures, as described in the introduction.
Register and text-segment errors. Fault analysis has predicted that the most prevalent faults in the targeted spaceborne environment will be single-bit memory and register faults, although shrinking feature sizes have raised the likelihood of clock errors and multiple-bit flips in future technologies. Because a single injection was unlikely to cause an immediate failure, several error injections were distributed uniformly within each run, and only the most frequently used registers and functions in the text segment were targeted for injection.
Heap errors. Heap injections were used to study the effects of error propagation. One error was injected per run into non-pointer data values only, and the effects of the error were traced through the system.
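The register and text-segment error model can be sketched as follows, assuming 32-bit registers: one bit is inverted per injection, injection times are drawn uniformly over the run, and only the k most frequently used targets are eligible. flip_bit, injection_schedule, most_used, the register names, and the usage counts are hypothetical; the injector actually used in the experiments is not described here.

```python
# Sketch of a single-bit-flip error model with injections spread uniformly
# over a run. Register width, run length, and usage counts are assumptions.
import random


def flip_bit(value: int, bit: int, width: int = 32) -> int:
    """Return value with one bit inverted (models a single-bit upset)."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside register width")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def injection_schedule(run_length_s: float, n_injections: int, rng: random.Random):
    """Draw injection times uniformly over the run, returned in firing order."""
    return sorted(rng.uniform(0.0, run_length_s) for _ in range(n_injections))


def most_used(usage_counts: dict, k: int) -> list:
    """Restrict targets to the k most frequently used registers or functions."""
    return sorted(usage_counts, key=usage_counts.get, reverse=True)[:k]
```

For example, most_used({"r1": 80, "r3": 50, "r9": 5}, 2) keeps only r1 and r3 as candidate targets, and each scheduled injection would then call flip_bit on a randomly chosen bit of one of them.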
Errors were not injected into the operating system, since our experience has shown that kernel injections typically lead to a crash, lead to a hang, or have no impact. Madeira et al. [18] used the same REE testbed to examine the impact of transient errors on LynxOS.
Definitions and Measurements
System, experiment, and run. We use the term system to refer to the REE cluster and associated software (i.e., the SIFT environment and applications). The system does not include the radiation-hardened SCC or the communication channel to the ground. An error injection experiment targeted a specific process (application process, FTM, Execution ARMOR, or Heartbeat ARMOR) using a particular error model. For each process/error model pair, a series of runs was executed in which one or more errors were injected into the target process.
Activated errors and failures. An injection causes an error to be introduced into the system (e.g., corruption at a selected memory location or corruption of the value in a register). An error is said to be activated if program execution accesses the erroneous value. A failure refers to a process deviating from its expected (correct) behavior as determined by a run without fault injection. The application can also fail by producing output that falls outside acceptable tolerance limits as defined by an external application-provided verification program.
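These definitions can be turned into a small run classifier. This is a sketch under stated assumptions: the fault-free reference output (golden) stands in for the expected behavior, the relative tolerance stands in for the external application-provided verification program, and a crashed or hung run is represented by having no output. classify_run and rel_tol are hypothetical names.

```python
# Sketch: classifying an injection run against a fault-free reference run.
# rel_tol is an assumed stand-in for the application's verification program.
import math
from typing import Optional, Sequence


def classify_run(error_accessed: bool,
                 observed: Optional[Sequence[float]],
                 golden: Sequence[float],
                 rel_tol: float = 1e-3) -> str:
    if not error_accessed:
        return "not activated"             # erroneous value never read
    if observed is None:                   # crash or hang: no output produced
        return "failure"
    if len(observed) != len(golden):
        return "failure"
    within = all(math.isclose(o, g, rel_tol=rel_tol)
                 for o, g in zip(observed, golden))
    return "activated, no failure" if within else "failure"
```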
[Fig. 2. Perceived vs. Actual Execution Time. Timeline: user submits app job → set up the environment → app starts → app ends → ARMORs uninstalled → user notified of termination. The actual application execution time spans from app start to app end; the perceived application execution time spans from job submission to user notification.]
                  Perceived (s)    Actual (s)
Without SIFT      75.71 ± 0.65     75.71 ± 0.65
With SIFT         77.97 ± 0.48     75.74 ± 0.48
2.3 Crash and Hang Failures
Application Recovery
Hangs are the most expensive application failures in terms of lost processing time. Application hangs are detected using a polling technique in which the Execution ARMOR executes a thread that wakes up every 20 seconds to check the value of a counter incremented by progress indicator messages sent by the application. Because the counter is polled at fixed intervals, the error detection latency for hangs can be up to twice the checking period: if the application sends its last progress message just after one poll and then hangs, the next poll still sees the counter change, and the hang is detected only at the poll after that.
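The latency bound can be checked with a small simulation of the progress-counter polling. This sketch assumes that the application sends a progress message every msg_period seconds until it hangs at hang_at, and that msg_period is shorter than the 20 s polling period (otherwise a healthy application would trigger false alarms). detection_time and its parameters are hypothetical names.

```python
# Sketch: progress-counter polling for hang detection and its detection latency.
import math


def detection_time(hang_at: float, msg_period: float,
                   poll_period: float = 20.0, max_polls: int = 100_000) -> float:
    """Return the time of the first poll that sees an unchanged progress counter."""
    def counter(t: float) -> int:
        # Messages arrive at msg_period, 2*msg_period, ... until the hang.
        return math.floor(min(t, hang_at) / msg_period)

    previous = counter(0.0)
    for k in range(1, max_polls + 1):
        t = k * poll_period
        current = counter(t)
        if current == previous:
            return t              # no progress since the last poll: declare a hang
        previous = current
    raise RuntimeError("no hang detected within max_polls")
```

For example, detection_time(21.0, 1.0) returns 60.0: the poll at 40 s still sees the message received at 21 s, so the hang is declared 39 s after it occurred, close to the 2 × 20 s bound.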
Table 2. SIGINT/SIGSTOP Injection Results

                                Successful   App. Exec. Time (s)               Recovery
Target             Failures     Recoveries   Perceived         Actual          Time (s)
SIGINT
Baseline           -            -            74.78 ± 0.55      72.68 ± 0.49    -
Application        100          100          89.80 ± 1.50      87.88 ± 1.50    0.48 ± 0.05
FTM                81           81           79.60 ± 1.61      73.89 ± 0.25    0.64 ± 0.16
Execution ARMOR    100          100          77.91 ± 1.01      75.98 ± 1.00    0.61 ± 0.07
Heartbeat ARMOR    97           97           75.26 ± 0.92      74.39 ± 0.96    0.47 ± 0.12
SIGSTOP
Baseline           -            -            71.96 ± 0.32      70.03 ± 0.27    -
Application        84           84           112.21 ± 1.87     110.21 ± 1.87   0.47 ± 0.05
FTM                97           97           76.20 ± 1.94      70.09 ± 0.88    0.79 ± 0.15
Execution ARMOR    98           98           85.01 ± 4.41      82.21 ± 4.28    0.63 ± 0.15
Heartbeat ARMOR    77           77           71.88 ± 0.24      70.24 ± 0.24    0.56 ± 0.21
SIFT Environment Recovery
FTM recovery. The perceived execution time for the application is extended if (1) the FTM fails while setting up the environment before the application execution begins or (2) the FTM fails while cleaning up the environment and notifying the Spacecraft Control Computer that the application has terminated. The application is decoupled from the FTM's execution after starting, so failures in the FTM do not affect it. The only overhead in actual execution time originates from network contention during the FTM's recovery, which lasts only 0.6 to 0.7 s.
An FTM-application correlated failure. The error injections also revealed a correlated failure in which the FTM failure caused the application to restart in 2 of the 178 runs (see [32] for a description of correlated failure scenarios). The SIFT environment is able to recover from this correlated failure because the components performing the detection (the Heartbeat ARMOR detecting FTM failures and the Execution ARMOR detecting application failures) are not affected by the failures.
Execution ARMOR. Of the 198 crash/hang errors injected into the Execution ARMORs, 175 required recovery only in the Execution ARMOR. For these runs, the application execution overhead was negligible. The overhead reported in Table 2 (up to 10% for hang failures) resulted from the remaining 23 cases in which the application was forced to restart.
An Execution ARMOR-application correlated failure. If the application process attempted to contact the Execution ARMOR (e.g., to send progress indicator updates or to notify the Execution ARMOR that it was terminating normally) while the ARMOR was recovering, the application process blocked until the Execution ARMOR had completely recovered. Because the MPI processes are tightly coupled, a correlated failure is possible if the Execution ARMOR overseeing the other MPI process diagnosed the blocking as an application hang and initiated recovery.
This correlated failure occurred most often when the Execution ARMOR hung (i.e., due to SIGSTOP injections): 22 correlated failures were due to SIGSTOP injections, as opposed to 1 correlated failure resulting from an ARMOR crash (i.e., due to SIGINT injections). This is because an Execution ARMOR crash failure is detected immediately by the daemon through operating system calls, making the Execution ARMOR unavailable for only a short time. Hangs, however, are detected via a 10-second heartbeat, so a hung ARMOR remains unavailable longer.
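Round-based heartbeat detection, as used by the FTM toward the daemons (and, in the same pattern, by each daemon's "Are-you-alive?" polling of its local ARMORs), can be sketched as follows. The sketch assumes one heartbeat per round and declares a target failed if its previous heartbeat is still unanswered when the next round starts; HeartbeatMonitor, start_round, and on_reply are hypothetical names.

```python
# Sketch of round-based heartbeat failure detection (hypothetical names).
# Assumption: one heartbeat per round (10 s in the experiments); a target whose
# previous heartbeat is still unanswered at the next round is declared failed.
from dataclasses import dataclass, field

HEARTBEAT_PERIOD_S = 10.0


@dataclass
class HeartbeatMonitor:
    period: float = HEARTBEAT_PERIOD_S
    outstanding: dict = field(default_factory=dict)  # target -> time heartbeat sent
    failed: set = field(default_factory=set)

    def start_round(self, now: float, targets) -> list:
        """Send a new round of heartbeats; return targets that missed the last one."""
        newly_failed = []
        for target in targets:
            if target in self.failed:
                continue
            if target in self.outstanding:
                # No reply since the previous round: treat as a crash or hang.
                del self.outstanding[target]
                self.failed.add(target)
                newly_failed.append(target)
            else:
                self.outstanding[target] = now
        return newly_failed

    def on_reply(self, target) -> None:
        self.outstanding.pop(target, None)
```

With this scheme a silent target is detected between one and two periods after it stops answering, whereas a crash reported directly by the operating system needs no round at all, which is consistent with the shorter unavailability observed for crashed Execution ARMORs.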
2.4 Register and Text-Segment Injections
injected errors that corrupted an ARMOR's checkpoint or propagated outside the injected process.
Text-segment errors were more likely than register errors to lead to system failures. This was because values in registers typically had a shorter lifetime (i.e., they were either never used or quickly overwritten) than information stored in the text segment.
Table 3 summarizes the results of approximately 6,000 register injections and 3,000 text-segment injections into both the application and ARMOR processes. Failures are classified into four categories: segmentation faults, illegal instructions, hangs, and errors detected via assertions. The second column in Table 3 gives the number of successful recoveries vs. the number of failures for each set of experiments. Errors that were not successfully recovered led to system failures (4 due to FTM failures, 5 due to Execution ARMOR failures, and 2 due to Heartbeat ARMOR failures).
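The assertion category corresponds to the ARMOR self-checks listed under ARMOR errors in the error detection hierarchy (range checks, validity checks such as a valid ARMOR ID, and data structure integrity checks), combined with the fail-stop rule that an ARMOR kills itself when a check fires. The sketch below assumes a dictionary-based element state; the field names, the legal ID range VALID_ARMOR_IDS, and MAX_RETRIES are invented for illustration.

```python
# Sketch: ARMOR-style internal assertions with fail-stop behavior.
# Hypothetical: the state fields, VALID_ARMOR_IDS, and MAX_RETRIES are made up.
import os
import sys

VALID_ARMOR_IDS = range(1, 65)   # assumed set of legal ARMOR IDs
MAX_RETRIES = 5                  # assumed legal upper bound for a counter


class AssertionViolation(Exception):
    pass


def check_state(state: dict) -> None:
    """Range, validity, and structural-integrity checks on an element's state."""
    if not 0 <= state["retries"] <= MAX_RETRIES:              # range check
        raise AssertionViolation("retries out of range")
    if state["armor_id"] not in VALID_ARMOR_IDS:              # validity check
        raise AssertionViolation("invalid ARMOR ID")
    if len(state["children"]) != state["num_children"]:      # integrity check
        raise AssertionViolation("child list inconsistent with count")


def run_checks_or_die(state: dict) -> None:
    """Fail-stop: terminate rather than let a corrupted state propagate."""
    try:
        check_state(state)
    except AssertionViolation as err:
        sys.stderr.write(f"ARMOR self-check failed: {err}\n")
        os._exit(1)   # the local daemon then sees a crash and triggers recovery
```

Turning a detected error into a crash lets the daemon's ordinary crash detection and checkpoint-based recovery handle it; as the FTM results show, this only helps if the checkpoint itself was not corrupted before the check fired.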
FTM recovery. Table 3 shows that the FTM successfully recovered from all register injections. Two text-segment injections were detected through assertions on the FTM's internal data structures, and both of these errors were recovered. Table 3 also shows that the FTM could not recover from four text-segment errors. In each case, the error corrupted the FTM's checkpoint before the FTM crashed. Because the checkpoint was corrupted, the FTM crashed shortly after being recovered. This cycle of failure and recovery repeated until the run timed out.
There were seven cases of a correlated failure in which the FTM failed during the application's initialization: three from text-segment injections and four from register injections. Both the FTM and the application recovered from all seven correlated failures.
                   Recoveries/  Seg.   Illegal                      App. Exec. Time (s)              Recovery
Target             failures     fault  instr.   Hang  Assertion    Perceived        Actual          Time (s)
Heartbeat ARMOR    77/77        62     6        8     1            73.00 ± 0.22     70.66 ± 0.21    0.31 ± 0.04
Text-segment Injections
Application        82/82        41     23       18    0            89.47 ± 2.87     87.49 ± 2.88    1.05 ± 0.33
FTM                84/88        53     28       5     2            76.47 ± 2.87     71.00 ± 2.31    0.51 ± 0.05
Execution ARMOR    93/95        45     31       11    8            77.48 ± 1.93     74.83 ± 1.86    0.43 ± 0.04
Heartbeat ARMOR    95/97        53     33       11    0            73.23 ± 0.37     71.21 ± 0.36    0.30 ± 0.01
2.5 Heap Injections
Table 4. System Failures Observed Through Heap Injections
Legend (effect on system): (A) unable to register daemons, (B) unable to install Execution ARMORs, (C) unable to start applications, (D) unable to uninstall Execution ARMORs after application completes.
Legend (system failure/assertion check classification): (2) system failure without assertion firing, (3) system failure with assertion firing, (4) successful recoveries after assertion fired.

                    Effect on system              System failures
Element             A    B    C    D    Total     #2     #3          #4
mgr_armor_info      4    1    5    4    14        6      8           19
exec_armor_info     0    0    5    4    9         4      5           9
app_param           0    0    0    0    0         0      0           2
agr_app_detect      0    0    0    0    0         0      0           4
node_mgmt           0    14   0    0    14        0      14          3
TOTAL               4    15   10   8    37        10     27          37

Element descriptions:
mgr_armor_info. Stores information about subordinate ARMORs such as location and element composition.
exec_armor_info. Stores information about each Execution ARMOR such as the status of its subordinate application.
app_param. Stores information about the application such as executable name, command-line arguments, and number of times the application restarted.
agr_app_detect. Used to detect that all processes of the MPI application have terminated and to initiate recovery if necessary.
node_mgmt. Stores information about the nodes, including the resident daemon and hostname.
2.6 Lessons Learned
SIFT overhead should be kept small. System designers must be aware that SIFT solutions have the potential to degrade the performance and even the dependability of the applications they are intended to protect. Our experiments show that the functionality in SIFT can be distributed among several processes throughout the network so that the overhead imposed by the SIFT processes is insignificant while the application is running.
SIFT recovery times should be kept small. Minimizing the SIFT process recovery time is desirable from two standpoints: (1) recovering SIFT processes have the potential to affect application performance by contending for processor and network resources, and (2) applications requiring support from the SIFT environment are affected when SIFT processes become unavailable. Our results indicate that fully recovering a SIFT process takes approximately 0.5 s. The mean overhead seen by the application from SIFT recovery is less than 5%; this figure takes into account the 10 out of roughly 800 failures from register, text-segment, and heap injections that caused the application to block or restart because a SIFT process was unavailable. The overhead from recovery is insignificant when these 10 cases are neglected.
SIFT/application interfaces should be kept simple. In any multiprocess SIFT design, some SIFT processes must be coupled to the application in order to provide error detection and recovery. The Execution ARMORs play this role in our SIFT environment. Because of this dependency, it is important to make the Execution ARMORs as simple as possible. All recovery actions and those operations that affect the global system (e.g., job submission and detecting remote node failures) are delegated to a remote SIFT process that is decoupled from the application's execution. This strategy appears to work, as only 5 of 373 observed Execution ARMOR failures led to system failures.
SIFT availability impacts the application. Low recovery time and aggressive checkpointing of the SIFT processes help minimize SIFT environment downtime, keeping the environment available for processing application requests and for recovering from application failures.
System failures are not necessarily fatal. Only 11 of the 10,000 injections resulted in a system failure in which the SIFT environment could not recover from the error. These system failures did not affect an executing application.
304 R.K. Iyer and Z. Kalbarczyk
3 Error and Failure Analysis of a LAN of Windows NT-Based Servers
3.1 Error Logging in Windows NT
Table 5. Breakup of Reboots Based on Prominent Events

Category                                                     Frequency   Percentage
Total reboots                                                1100        100
Hardware or firmware problems                                105         9
Connectivity problems                                        241         22
Crucial application failures                                 152         14
Problems with a software component                           42          4
Normal shutdowns                                             63          6
Normal reboots/power-off (no indication of any problems)     178         16
Unknown                                                      319         29
Normal shutdowns: This category covers reboots that are not preceded by warnings or error messages. Additionally, the log contains events indicating the shutdown of critical application software and some system components (e.g., the BROWSER). These represent shutdowns for maintenance or for correcting problems not captured in the event logs.
Normal reboots/power-off: This category covers reboots that are typically not preceded by shutdown events but do not appear to be caused by any problems either. No warnings or error messages appear in the event log before the reboot.
Based on the data in Table 5, the following observations can be made about the failures:
1. 29% of the reboots cannot be categorized. Such reboots are indeed preceded by events of severity 2 or lesser, but there is not enough information available to decide (a) whether the events were severe enough to force a reboot of the machine or (b) the nature of the problem that the events reflect.
2. A significant percentage (22%) of the reboots have reported connectivity problems. Connectivity problems suggest that there could be propagated failures in the domain. Furthermore, it is possible that the machines functioning as the master browser and the Primary Domain Controller (PDC)², respectively, are potential reliability bottlenecks of the domain.
3. Only a small percentage (10%) of the reboots can be traced to a system hardware component. Most of the identifiable problems are software related.
4. Nearly 50% of the reboots are abnormal reboots (i.e., the reboots were due to a problem with the machine rather than to a normal shutdown).
5. In nearly 15% of the cases, severe problems with a crucial mail server application force a reboot of the machine.
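The percentages behind these observations can be recomputed directly from the Table 5 frequencies. The sketch assumes that observation 4 sums the four problem categories (hardware/firmware, connectivity, crucial application, software component); keeping one decimal place makes the rounding visible, e.g., 105/1100 = 9.5%.

```python
# Recomputing the Table 5 percentages from the reboot frequencies.
REBOOTS = {
    "Hardware or firmware problems": 105,
    "Connectivity problems": 241,
    "Crucial application failures": 152,
    "Problems with a software component": 42,
    "Normal shutdowns": 63,
    "Normal reboots/power-off": 178,
    "Unknown": 319,
}
# Assumption: the categories counted as abnormal reboots in observation 4.
ABNORMAL = [
    "Hardware or firmware problems",
    "Connectivity problems",
    "Crucial application failures",
    "Problems with a software component",
]

total = sum(REBOOTS.values())
pct = {name: round(100 * n / total, 1) for name, n in REBOOTS.items()}
abnormal_share = round(100 * sum(REBOOTS[k] for k in ABNORMAL) / total, 1)

if __name__ == "__main__":
    for name, p in pct.items():
        print(f"{name:40s} {p:5.1f}%")
    print(f"{'Abnormal reboots':40s} {abnormal_share:5.1f}%")
```

The abnormal share comes out at 49.1% (540 of 1100 reboots), matching the "nearly 50%" in observation 4.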
3.3 Analysis of Failure Behavior of Individual Machines
² In the analyzed network, the machines belonged to a common Windows NT domain. One of the machines was configured as the Primary Domain Controller (PDC). The rest of the machines functioned as Backup Domain Controllers (BDCs).
Item                  Machine Uptime Statistics    Machine Downtime Statistics
Number of entries     616                          682
Maximum               85.2 days                    15.76 days
Minimum               1 hour                       1 second
Average               11.82 days                   1.97 hours
Median                5.54 days                    11.43 minutes
Standard Deviation    15.656 days                  15.86 hours
As the table shows, 50% of the downtimes last about 12 minutes or less (median 11.43 minutes). This is probably too short a period to replace hardware components and reconfigure the machine. The implication is that the majority of the problems are software related (memory leaks, misloaded drivers, application errors, etc.). The maximum downtime (15.76 days) is unrealistic and might have been due to the machine being temporarily taken off-line and put back in after a fortnight.
Since the machines under consideration are dedicated mail servers, bringing down one or more of them would potentially disrupt storage, forwarding, reception, and delivery of mail. The disruption can be prevented if explicit rerouting is performed to avoid the machines that are down, but it is not clear whether such rerouting was done or can be done. In this context, the following observations would be causes for concern: (1) the average downtime measured was nearly 2 hours, and (2) 50% of the measured uptime samples were about 5 days or less.
Availability
Having estimated machine uptime and downtime, we can estimate the availability of each machine. The availability is evaluated as the ratio:
Availability (%) = [average uptime / (average uptime + average downtime)] × 100
Table 7 summarizes the availability measurements. As the table depicts, the majority of the machines have an availability of 99.7% or higher. Also, there is not a large variation among the individual values. This is surprising considering the rather large degree of variation in the average uptimes. It follows that machines with smaller average uptimes also had correspondingly smaller average downtimes, so that the ratios are not very different. Hence, the domain has two types of machines: those that reboot often but recover quickly and those that stay up relatively longer but take longer to recover from a failure.
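The availability ratio can be evaluated directly. The sketch below applies it to the pooled averages from the uptime/downtime statistics table (converted to hours), which gives about 99.31%; this pooled figure need not equal the 99.35% in Table 7, which averages per-machine availabilities. It also uses two hypothetical machines to show how a short-uptime/short-downtime machine and a long-uptime/long-downtime machine can have identical availability. availability_percent is a hypothetical helper name.

```python
# Availability from average uptime and downtime (both in hours).
def availability_percent(avg_uptime_h: float, avg_downtime_h: float) -> float:
    """Availability = average uptime / (average uptime + average downtime) x 100."""
    return 100.0 * avg_uptime_h / (avg_uptime_h + avg_downtime_h)


# Pooled averages from the uptime/downtime statistics table.
AVG_UPTIME_H = 11.82 * 24      # 11.82 days
AVG_DOWNTIME_H = 1.97          # 1.97 hours
pooled = availability_percent(AVG_UPTIME_H, AVG_DOWNTIME_H)

# Two hypothetical machines: one reboots often but recovers quickly; the other
# stays up ten times longer but also takes ten times longer to recover.
frequent = availability_percent(2 * 24, 0.2)
rare = availability_percent(20 * 24, 2.0)
```

Both hypothetical machines come out at about 99.59%, because scaling uptime and downtime by the same factor leaves the ratio unchanged.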
Table 7. Machine Availability

Item                   Value
Number of machines     66
Maximum                99.99%
Minimum                89.39%
Median                 99.76%
Average                99.35%
Standard Deviation     1.52%
Modeling Machine Behavior
To obtain more accurate estimates of machine availability, we modeled the behavior of a typical machine in terms of a finite state model. The model was based on the events that each machine logs. In the model, each state represents a level of functionality of the machine. A machine is either in a fully functional state, in which it logs events that indicate normal activity, or in a partially functional state, in which it logs events that indicate problems of a specific nature.
Selection and assignment of states to a machine was performed as follows. The logs were split into time windows of one hour each. For each such window, the machine was assigned a state, which it occupied throughout the duration of the window. The assignment was based on the events that the machine logged in the window. Table 8 describes the states identified for the model.
Table 8. Machine States

State Name               Main Events (id/source/severity)    Explanation
Reboot                   6005/EventLog/4                     Machine logs reboot and other initialization events
Functional               5715/NETLOGON/4                     Machine logs successful communication with PDC
                         1016/MSExchangeISPrivate/8
Connectivity problems    3096/NETLOGON/1                     Problems locating the PDC
                         5719/NETLOGON/1
Startup problems         7000/Service Control Manager/1      Some system component or application failed to start up
                         7001/Service Control Manager/1
MTA problems             2206/MSExchangeMTA/2                Message Transfer Agent has problems with some internal databases
                         2207/MSExchangeMTA/2
Adapter problems         4105/CpqNF3/1                       The NetFlex Adapter driver reports problems
                         4106/CpqNF3/1
Temporary MTA problems   9322/MSExchangeMTA/4                Message Transfer Agent reports problems of a temporary (or less severe) nature
                         9277/MSExchangeMTA/2
                         3175/MSExchangeMTA/2
                         1209/MSExchangeMTA/2
Server problems          2006/Srv/1                          Server component reports having received badly formatted requests
BROWSER problems         8021/BROWSER/2                      Browser reports inability to contact the master browser
                         8032/BROWSER/1
Disk problems            11/Cpq32fs2/1                       Disk drivers report problems
                         5/Cpq32fs2/1
                         9/Cpqarray/1
                         11/Cpqarray/1
Tape problems            15/dlttape/1                        Tape driver reports problems
Snmpelea problems        3006/Snmpelea/1                     Snmp event log agent reports error while reading an event log record
Shutdown                 8033/BROWSER/4                      Application/machine shutdown in progress
                         1003/MSExchangeSA/4
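The hourly-window state assignment described above can be sketched with a lookup from (event id, source) pairs to states, using a subset of the events in Table 8. The text does not say how a window containing events of several states is resolved, so the precedence order below is an assumption made for this sketch, as is treating windows with no mapped events as Functional. EVENT_STATE, PRECEDENCE, and assign_states are hypothetical names.

```python
# Sketch: assigning one state per one-hour window from logged events.
# Assumptions: mapping is a subset of Table 8; PRECEDENCE (invented here)
# resolves windows with several states; unmapped windows count as Functional.
from collections import defaultdict

EVENT_STATE = {
    (6005, "EventLog"): "Reboot",
    (5715, "NETLOGON"): "Functional",
    (1016, "MSExchangeISPrivate"): "Functional",
    (3096, "NETLOGON"): "Connectivity problems",
    (5719, "NETLOGON"): "Connectivity problems",
    (2206, "MSExchangeMTA"): "MTA problems",
    (2207, "MSExchangeMTA"): "MTA problems",
    (9322, "MSExchangeMTA"): "Temporary MTA problems",
    (8021, "BROWSER"): "BROWSER problems",
    (8033, "BROWSER"): "Shutdown",
}
PRECEDENCE = ["Reboot", "Shutdown", "MTA problems", "Connectivity problems",
              "BROWSER problems", "Temporary MTA problems", "Functional"]
WINDOW_S = 3600  # one-hour windows


def assign_states(events):
    """events: iterable of (timestamp_s, event_id, source) -> {window_index: state}."""
    seen = defaultdict(set)
    for ts, event_id, source in events:
        state = EVENT_STATE.get((event_id, source))
        if state is not None:
            seen[int(ts // WINDOW_S)].add(state)
    if not seen:
        return {}
    first, last = min(seen), max(seen)
    return {
        w: next((s for s in PRECEDENCE if s in seen.get(w, ())), "Functional")
        for w in range(first, last + 1)
    }
```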
Fig. 4. State Transitions of a Typical Machine
Over 11% of the transitions out of the Temporary MTA problems state are to the Browser problems state. We suspect that there was a local problem that caused RPCs to time out or fail and caused problems for the MTA and BROWSER. Another possibility is that, in both cases, it was the same remote machine that could not be contacted. Based on the available data, it was not possible to determine the real cause of the problem.
To view the transitions from a different perspective, we computed the weight of each outgoing edge as a fraction of all the transitions in the finite state machine. Such a computation provided some interesting insights, which are enumerated below:
1. Nearly 10% of all the transitions are between the Functional and Temporary MTA problems states. These MTA problems are typically problems with some RPC calls (either failing or being canceled).
2. About 0.5% (1 in 200) of all transitions are to the Reboot state.
3. The majority of the transitions into the MTA problems state are from the Reboot state. Thus, MTA problems are primarily problems that occur at startup. In contrast, the majority of the transitions into the Server problems state and the Browser problems state (excluding the self loops) are from the Functional state. So, these problems (or at least a significant fraction of them) typically appear after the machine is functional.
4. About 92% of all transitions are into the Functional state. This figure is approximately a measure of the average time the hypothetical machine spends in the functional state. Hence it is a measure of the average availability of a typical machine. In this case, availability measures the ability of the machine to provide service, not just to stay alive.
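The two edge weightings used above (per-source fractions, as in the 11% figure for transitions out of Temporary MTA problems, and fractions of all transitions, as in the enumerated list) can be sketched as follows. This is a minimal illustration that assumes, hypothetically, that each machine's log has already been reduced to a time-ordered sequence of state names and that the typical-machine model pools transitions across machines; the example sequences are made up.

```python
from collections import Counter, defaultdict

def count_transitions(sequences):
    """Pool (source, destination) transition counts over per-machine state sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts

def per_source_fractions(counts):
    """Weight of each edge as a fraction of the transitions leaving its source state."""
    out_totals = defaultdict(int)
    for (src, _dst), c in counts.items():
        out_totals[src] += c
    return {(src, dst): c / out_totals[src] for (src, dst), c in counts.items()}

def global_fractions(counts):
    """Weight of each edge as a fraction of all transitions in the state machine."""
    total = sum(counts.values())
    return {edge: c / total for edge, c in counts.items()}

def fraction_into(weights, state):
    """Sum of the weights of edges entering `state` (self loops included)."""
    return sum(w for (_src, dst), w in weights.items() if dst == state)

# Hypothetical per-machine state sequences (not data from the study).
machines = [
    ["Reboot", "MTA problems", "Functional", "Temporary MTA problems",
     "Functional", "Functional"],
    ["Functional", "BROWSER problems", "Functional", "Reboot"],
]
counts = count_transitions(machines)
print(fraction_into(global_fractions(counts), "Functional"))   # 0.5
print(per_source_fractions(counts)[("Functional", "Reboot")])  # 0.25
```

The same counts feed both views; only the normalizing denominator changes.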
3.4 Modeling Domain Behavior
Analyzing system behavior from the perspective of the whole domain (1) provides a macroscopic view of the system rather than a machine-specific view, (2) helps to characterize the nature of interactions in the network, and (3) aids in identifying potential reliability bottlenecks and suggests ways to improve resilience to operational faults.
Inter-reboot Times. An important characteristic of the domain is how often reboots occur within it. To examine this, the whole domain is treated as a black box, and every reboot of every machine in the domain is considered to be a reboot of the black box. Table 9 shows the statistics of such inter-reboot times measured across the whole domain.
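The black-box view amounts to pooling all machines' reboot timestamps into one sorted sequence and taking consecutive differences. A minimal sketch, with hypothetical machine names and timestamps (in hours):

```python
import statistics

def domain_inter_reboot_times(reboots_by_machine):
    """Treat the domain as a black box: pool every machine's reboot times and
    return the gaps between consecutive reboots of the pooled sequence."""
    pooled = sorted(t for times in reboots_by_machine.values() for t in times)
    return [later - earlier for earlier, later in zip(pooled, pooled[1:])]

# Hypothetical reboot timestamps in hours (not data from the study).
reboots = {"pdc": [0.0, 50.0], "bdc1": [10.0, 70.0], "mail1": [30.0]}
gaps = domain_inter_reboot_times(reboots)
print(gaps)                                            # [10.0, 20.0, 20.0, 20.0]
print(statistics.mean(gaps), statistics.median(gaps))  # 17.5 20.0
```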
Finite State Model of the Domain
The proper functioning of the domain relies on the proper functioning of the PDC and its interactions with the Backup Domain Controllers (BDCs). Thus it would seem useful to represent the domain in terms of how many BDCs are alive at any given moment and also in terms of the PDC being functional or not. Accordingly, a finite state model was constructed as follows:
1. The data collection period was broken up into time windows of a fixed length,
2. For each such time window, the state of the domain was computed, and
3. A transition diagram was constructed based on the state information.
The state of the domain during a given time window was computed by evaluating the number of machines that rebooted during that time window. More specifically, the states were identified as shown in Table 10. Fig. 5 shows the transitions in the domain. Each time window was one hour long.
Table 10. Domain States and their Interpretation
State Name | Meaning
PDC | Primary Domain Controller (PDC) rebooted
BDC | Backup Domain Controller (BDC) rebooted
MBDC | Many BDCs rebooted
PDC+BDC | PDC and one BDC rebooted
PDC+MBDC | PDC and many BDCs rebooted
F | Functional (no reboots observed)
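The three construction steps above can be sketched as follows: bucket reboots into one-hour windows, map each window to a Table 10 state, and count consecutive-window transitions. This is a hypothetical illustration, not the authors' tooling. It assumes every non-PDC machine is a BDC and counts a BDC at most once per window; the log entries are invented.

```python
import math
from collections import Counter

def domain_state(pdc_rebooted, bdcs_rebooted):
    """Map one window's reboots to a state name from Table 10."""
    if pdc_rebooted:
        if bdcs_rebooted == 0:
            return "PDC"
        return "PDC+BDC" if bdcs_rebooted == 1 else "PDC+MBDC"
    if bdcs_rebooted == 0:
        return "F"
    return "BDC" if bdcs_rebooted == 1 else "MBDC"

def window_states(reboots, pdc, horizon_hours, window_hours=1.0):
    """reboots: iterable of (time_in_hours, machine_name) pairs."""
    n_windows = math.ceil(horizon_hours / window_hours)
    pdc_flags = [False] * n_windows
    bdc_sets = [set() for _ in range(n_windows)]
    for t, machine in reboots:
        w = int(t // window_hours)
        if 0 <= w < n_windows:
            if machine == pdc:
                pdc_flags[w] = True
            else:
                bdc_sets[w].add(machine)  # each BDC counted at most once per window
    return [domain_state(pdc_flags[w], len(bdc_sets[w])) for w in range(n_windows)]

# Hypothetical reboot log: (hours since start, machine).
log = [(0.5, "pdc"), (0.7, "bdc1"), (2.2, "bdc1"), (2.4, "bdc2"), (2.9, "bdc3")]
states = window_states(log, pdc="pdc", horizon_hours=4)
transitions = Counter(zip(states, states[1:]))
print(states)  # ['PDC+BDC', 'F', 'MBDC', 'F']
print(transitions)
```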
Fig. 5. Domain State Transitions
Fig. 5 reveals some interesting insights.
314 R.K. Iyer and Z. Kalbarczyk
4 Conclusions
The discussion in this paper focused on the issues involved in analyzing the availability of networked systems using fault injection and the failure data collected by the logging mechanisms built into the system. To achieve accurate and comprehensive system dependability evaluation, the analysis must span the three phases of system life: the design phase, the prototype phase, and the operational phase.
For example, the presented fault injection study of the ARMOR-based SIFT environment demonstrated that:
1. Structuring the fault injection experiments to progressively stress the error detection and recovery mechanisms is a useful approach to evaluating performance and error propagation.
2. Even though the probability of correlated failures is small, their potential impact on application availability is significant.
3. The SIFT environment successfully recovered from all correlated failures involving the application and a SIFT process because the processes performing error detection and recovery were decoupled from the failed processes.
4. Targeted injections into dynamic data on the heap were useful in further investigating system failures brought about by error propagation. Assertions within the SIFT processes were shown to reduce the number of system failures from data error propagation by up to 42%.
Similarly, analysis of failure data collected in a network of Windows NT machines provides insights into network system failure behavior:
1. Most of the problems that lead to reboots are software related. Only 10% are attributable to specific hardware components.
2. Rebooting the machine does not appear to solve the problem in many cases. In about 60% of the reboots, the rebooted machine reported problems within an hour or two of the reboot.
Measurement-Based Analysis of System Dependability 315
3. Though the average availability evaluates to over 99%, a typical machine in the domain, on average, provides acceptable service only about 92% of the time.
4. About 1% of the reboots indicate memory leaks in the software.
5. There are indications of propagated or correlated failures. Typically, in such cases, multiple machines exhibit identical or similar problems at almost the same time.
Moreover, the failure data analysis also provides insights into the error logging mechanism. For example, event-logging features that are absent, but desirable, in Windows NT can be suggested:
1. The presence of a Windows NT shutdown event would improve the accuracy of identifying the causes of reboots. It would also lead to better estimates of machine availability.
2. Most of the events observed in the logs were due either to applications or to high-level system components, such as file-system drivers. It is not evident whether this is due to a genuine absence of problems at the lower levels or just because the lower-level system components log events sparingly or resort to other means to report events. If the latter is true, then improved event logging by the lower-level system components (protocol drivers, memory managers) can enhance the value of event logs in diagnosis.
Acknowledgments. This manuscript is based on research supported in part by NASA under grant NAG-1-613, in cooperation with the Illinois Computer Laboratory for Aerospace Systems and Software (ICLASS), by Tandem Computers, in part by a NASA/JPL contract 961345, and by NSF grants CCR 00-86096 ITR and CCR 99-02026.
Software Reliability and Rejuvenation: Modeling
and Analysis
Abstract. Several recent studies have established that most system outages are due to software faults. Given the ever-increasing complexity of software and the well-developed techniques and analysis for hardware reliability, this trend is not likely to change in the near future. In this paper, we classify software faults and discuss various techniques to deal with them in the testing/debugging phase and the operational phase of the software. We discuss the phenomenon of software aging and a preventive maintenance technique, called software rejuvenation, that deals with this problem. Stochastic models to evaluate the effectiveness of preventive maintenance in operational software systems and to determine optimal times to perform rejuvenation for different scenarios are described. We also present measurement-based methodologies to detect software aging and estimate its effect on various system resources. These models are intended to help develop software rejuvenation policies. An automated online measurement-based approach has been used in the software rejuvenation agent implemented in a major commercial server.
1 Introduction
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 318–345, 2002.
© Springer-Verlag Berlin Heidelberg 2002
to deal with software aging, explaining its various approaches and methods in
practice.
Popular and widely used software like the web browser Netscape is known to suffer from serious memory leaks, which lead to occasional crashes or hangs of the application. This problem is particularly pronounced in systems with low swap space. The newsreader software xrn also experiences problems due to memory leaks. Software aging has been observed not only in software used on a mass scale but also in specialized software used in high-availability and safety-critical applications. This phenomenon has been observed in general-purpose UNIX applications [25]. The applications experienced a crash/hang failure over time, which resulted in unplanned and expensive downtime. Avritzer and Weyuker [4] report aging manifesting as gradual performance degradation in an industrial telecommunication software system. They deal with soft failures, i.e., a type of failure where the system may enter a faulty state in which it is still available for service but has degraded to unacceptable performance levels, losing users or packets. A similar kind of gradual performance degradation in file systems leading to a soft failure is discussed by Smith and Seltzer [39]. Their study shows that in a file system degraded by normal usage and the filling up of storage space, the read throughput may be as much as 40% lower than in an empty file system. The reason behind this is the fragmentation of storage space over time, which results in non-sequential allocation of blocks. The most glaring example of software aging in recent times is reported by Marshall [32]. In this case, software aging resulted in loss of human life. The software system in the US Patriot missiles deployed during the Gulf War accumulated numerical roundoff error. This led to the interpretation of an incoming Iraqi Scud missile as a false alarm, which cost the lives of 28 US soldiers.
We designate faults attributed to software aging, which are quite different
from Bohrbugs and Heisenbugs, as aging-related faults. These faults are similar to
Heisenbugs in that they are activated under certain conditions (for example, lack
of OS resources) which may not be easily reproducible. However, as discussed
later, their modes and methods of recovery differ significantly. Figure 1 shows
our extended classification and treatment strategies for each class.
Techniques for tolerating faults in software have been divided into three classes:
examples of what cleaning the internal state of a software system might involve. An extreme but well-known example of rejuvenation is a hardware reboot. It has been implemented in the real-time system collecting billing data for most telephone exchanges in the United States [5]. A very similar technique, called software capacity restoration, has been used by Avritzer and Weyuker in large telecommunications switching software [4], where the switching computer is occasionally rebooted, which restores its service rate to the peak value. Grey [20] proposed performing operations solely for fault management in SDI (Strategic Defense Initiative) software, invoked whether or not a fault exists, and called this operational redundancy. Tai et al. [41] have proposed and analyzed the use of on-board preventive maintenance for maximizing the probability of successful mission completion of spacecraft with very long mission times. The necessity of performing preventive maintenance in a safety-critical environment is evident from the example of aging in the Patriot's software [32]. The failure, which resulted in the loss of human lives, could have been prevented if the computer had been restarted after every 8 hours of running time. Rejuvenation has been implemented in various other kinds of systems: transaction processing systems [7], web servers [46] and cluster servers [8].
Software rejuvenation (preventive maintenance) incurs an overhead (in terms of performance, cost and downtime), which should be balanced against the loss incurred by an unexpected outage caused by a failure. Thus, an important research issue is to determine the optimal times to perform rejuvenation. In this paper, we present two approaches for analyzing software aging and studying aging-related failures.
The rest of this paper is organized as follows. Section 2 describes various analytical models for software aging and for determining optimal times to perform rejuvenation. Measurement-based models are dealt with in Section 3. The implementation of a software rejuvenation agent in a major commercial server is discussed in Section 4. Section 5 describes various approaches and methods of rejuvenation, and Section 6 concludes the paper with pointers to future work.
The aim of the analytic modeling is to determine optimal times to perform rejuvenation which maximize availability and minimize the probability of loss or the response time of a transaction (in the case of a transaction processing system). This is particularly important for business-critical applications, for which adequate response time can be as important as system uptime. The analysis is done for different kinds of software systems exhibiting varied failure/aging characteristics.
The accuracy of a modeling-based approach is determined by the assumptions made in capturing aging. In [12,13,14,25,41] only the failures causing unavailability of the software are considered, while in [34] only a gradually decreasing service rate of software that serves transactions is assumed. Garg et al. [15], however, consider both these effects of aging together in a single model. Mod-
Figure 2 shows the basic software rejuvenation model proposed by Huang et al.
[25]. The software system is initially in a “robust” working state, 0. As time
progresses, it eventually transits to a “failure-probable” state 1. The system is
still operational in this state but can fail (move to state 2) with a non-zero
probability. The system can be repaired and brought back to the initial state
0. The software system is also rejuvenated at regular intervals from the failure
probable state 1 and brought back to the robust state 0.
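[Figure 2: state-transition diagram with states 0 (robust), 1 (failure probable), 2 (system failure) and 3 (rejuvenation); transitions labeled state change, system failure, rejuvenation, completion of repair and completion of rejuvenation.]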
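Written as a four-state CTMC (states 0 to 3 as in Figure 2), the model's steady-state availability can be computed numerically. This is a minimal sketch, not Huang et al.'s own formulation: the rate names and all values (240 h mean robust time, 720 h mean time to failure from state 1, 4 h repair, 0.5 h rejuvenation) are hypothetical, and the regular rejuvenation interval is approximated by an exponential rate out of state 1, as a CTMC requires.

```python
import numpy as np

def ctmc_steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 (the chain must have one recurrent class)."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def rejuvenation_ctmc(r_age, lam, r_repair, r_rejuv, r_rejuv_done):
    """Generator for states 0 robust, 1 failure probable, 2 failed, 3 rejuvenating."""
    Q = np.zeros((4, 4))
    Q[0, 1] = r_age          # 0 -> 1: state change
    Q[1, 2] = lam            # 1 -> 2: system failure
    Q[1, 3] = r_rejuv        # 1 -> 3: rejuvenation starts
    Q[2, 0] = r_repair       # 2 -> 0: completion of repair
    Q[3, 0] = r_rejuv_done   # 3 -> 0: completion of rejuvenation
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def availability(r_rejuv, r_age=1/240, lam=1/720, r_repair=1/4, r_rejuv_done=2.0):
    pi = ctmc_steady_state(rejuvenation_ctmc(r_age, lam, r_repair, r_rejuv, r_rejuv_done))
    return pi[0] + pi[1]     # the system is up in states 0 and 1

for r in (0.0, 0.001, 0.01, 0.1):
    print(f"rejuvenation rate {r:>5}: availability {availability(r):.6f}")
```

With exponential times, availability is monotone in the rejuvenation rate, so the best choice is either never or as often as possible. An interior optimum needs an aging (IFR) time-to-failure distribution, which is where a semi-Markov model comes in.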
Huang et al. [25] assume that the stochastic behavior of the system can be
described by a simple continuous-time Markov chain (CTMC) [43]. Let Z be the
random time interval when the highly robust state changes to the failure probable
324 K.S. Trivedi and K. Vaidyanathan
state $A(t_0)$. We assume that the mean time to repair is strictly larger than the mean time to complete the software rejuvenation (i.e., $\mu_a > \mu_c$). This assumption is quite reasonable and intuitive. The following result gives the optimal software rejuvenation schedule for the semi-Markov model.
Assume that the failure time distribution is strictly IFR (increasing failure
rate) [43]. Define the following non-linear function:
$$q(t_0) = T(t_0) - \left[ (\mu_a - \mu_c)\, r_f(t_0) + 1 \right] S(t_0), \tag{2}$$
(ii) If $q(0) \le 0$, then the optimal software rejuvenation schedule is $t_0^* = 0$, i.e., it is optimal to start the rejuvenation just after entering the failure probable state, and the maximum system availability is $A(0) = \mu_0/(\mu_0 + \mu_c)$.
(iii) If $q(\infty) \ge 0$, then the optimal rejuvenation schedule is $t_0^* \to \infty$, i.e., it is optimal not to carry out the rejuvenation, and the maximum system availability is $A(\infty) = (\mu_0 + \lambda_f)/(\mu_0 + \mu_a + \lambda_f)$.
If the failure time distribution is DFR (decreasing failure rate), then the system availability $A(t_0)$ is a convex function of $t_0$, and the optimal rejuvenation schedule is $t_0^* = 0$ or $t_0^* \to \infty$ [10,11].
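Equation (2) uses the expected up time per cycle $S(t_0)$ and the expected cycle length $T(t_0)$, whose definitions fall outside this excerpt. The sketch below assumes the standard forms $S(t_0) = \mu_0 + \int_0^{t_0} \bar F(t)\,dt$ and $T(t_0) = S(t_0) + \mu_a F(t_0) + \mu_c \bar F(t_0)$ with $A(t_0) = S(t_0)/T(t_0)$. These forms reproduce the $A(0)$ and $A(\infty)$ stated above, and under them $dA/dt_0 = 0$ exactly where $q(t_0) = 0$. The Weibull failure-time distribution and all numbers are hypothetical. When neither (ii) nor (iii) applies, the sketch bisects for the zero of $q$, assuming $q$ is decreasing (the IFR case).

```python
import math

# Hypothetical parameters in hours (not from the paper).
MU0 = 240.0                  # mean time spent in the robust state
MU_A = 4.0                   # mean time to repair after a failure
MU_C = 0.5                   # mean time to complete rejuvenation (MU_A > MU_C)
SHAPE, SCALE = 2.0, 500.0    # Weibull time to failure; SHAPE > 1 makes it strictly IFR

def survival(t):
    """F-bar(t): probability of no failure within t hours of entering state 1."""
    return math.exp(-((t / SCALE) ** SHAPE))

def hazard(t):
    """r_f(t) = f(t) / F-bar(t) for the Weibull distribution."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)

def integral_survival(t0, steps=2000):
    """Trapezoid-rule approximation of the integral of F-bar over [0, t0]."""
    if t0 <= 0:
        return 0.0
    h = t0 / steps
    total = 0.5 * (survival(0.0) + survival(t0))
    total += sum(survival(i * h) for i in range(1, steps))
    return total * h

def S(t0):
    """Assumed expected up time per cycle."""
    return MU0 + integral_survival(t0)

def T(t0):
    """Assumed expected cycle length: up time plus repair or rejuvenation time."""
    fbar = survival(t0)
    return S(t0) + MU_A * (1.0 - fbar) + MU_C * fbar

def availability(t0):
    return S(t0) / T(t0)

def q(t0):
    """Eq. (2)."""
    return T(t0) - ((MU_A - MU_C) * hazard(t0) + 1.0) * S(t0)

def optimal_schedule(t_hi=5000.0, tol=1e-6):
    """Apply cases (ii)/(iii); otherwise bisect for the root of q on [0, t_hi]."""
    if q(0.0) <= 0:
        return 0.0                 # case (ii)
    if q(t_hi) >= 0:
        return math.inf            # case (iii), judged on the searched horizon only
    lo, hi = 0.0, t_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t_star = optimal_schedule()
print(f"t0* = {t_star:.1f} h, A(t0*) = {availability(t_star):.6f}")
print(f"A(0) = {availability(0.0):.6f}, A(5000) = {availability(5000.0):.6f}")
```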
Garg et al. [12] have developed a Markov Regenerative Stochastic Petri Net (MRSPN) model where rejuvenation is performed at deterministic intervals, assuming that the failure probable state 1 is not observable.
[Figure: state diagram with states A (available), B (recovering) and C (undergoing PM).]
According to the model described above, at any time t the software can be in any one of three states: up and available for service (state A), recovering from a failure (state B) or undergoing PM (state C). Let $\{Z(t), t \ge 0\}$ be a stochastic process which represents the state of the software at time t. Further, let the sequence of random variables $S_i, i > 0$, represent the times at which transitions among different states take place. Since the entrance times $S_i$ constitute renewal points, $\{Z(S_i), i > 0\}$ is an embedded discrete-time Markov chain (DTMC) with a transition probability matrix P given by:
$$P = \begin{bmatrix} 0 & P_{AB} & P_{AC} \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}. \tag{4}$$
The steady-state probability $\pi_i$ of the DTMC being in state i, $i \in \{A, B, C\}$, is:
$$\pi = [\pi_A, \pi_B, \pi_C] = \left[ \frac{1}{2}, \frac{1}{2} P_{AB}, \frac{1}{2} P_{AC} \right]. \tag{5}$$
The software behavior is modeled via the stochastic process $\{(Z(t), N(t)), t \ge 0\}$. If $Z(t) = A$, then $N(t) \in \{0, 1, \ldots, K\}$, as the queue can accommodate up to K transactions. If $Z(t) \in \{B, C\}$, then $N(t) = 0$, since by assumption all transactions arriving while the software is either recovering or undergoing PM are lost. Further, the transactions already in the queue at the transition instant are also discarded. It can be shown that the process $\{(Z(t), N(t)), t \ge 0\}$ is a Markov regenerative process (MRGP). Transition to state A from either B or C constitutes a regeneration instant.
Let U be a random variable denoting the sojourn time in state A, and denote its expectation by E[U]. Expected sojourn times of the MRGP in states B and C are already defined to be $\gamma_f$ and $\gamma_r$. The steady-state availability is obtained using the standard formulae from MRGP theory:
$$A_{SS} = \Pr\{\text{software is in state } A\} = \frac{\pi_A E[U]}{\pi_B \gamma_f + \pi_C \gamma_r + \pi_A E[U]}. \tag{6}$$
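Equations (5) and (6) can be checked numerically. This minimal sketch uses hypothetical values of $P_{AB}$, $E[U]$, $\gamma_f$ and $\gamma_r$; in the model, $P_{AB}$ and $E[U]$ come from the transient analysis. It verifies that the vector of Eq. (5) is stationary for the matrix of Eq. (4) and then evaluates Eq. (6). The factor 1/2 cancels, so $A_{SS} = E[U]/(P_{AB}\gamma_f + P_{AC}\gamma_r + E[U])$.

```python
def embedded_dtmc_steady_state(p_ab):
    """Eq. (5): stationary vector [pi_A, pi_B, pi_C], with P_AC = 1 - P_AB."""
    p_ac = 1.0 - p_ab
    return [0.5, 0.5 * p_ab, 0.5 * p_ac]

def is_stationary(pi, p_ab, tol=1e-12):
    """Check pi = pi P for the transition matrix P of Eq. (4)."""
    p_ac = 1.0 - p_ab
    P = [[0.0, p_ab, p_ac],
         [1.0, 0.0, 0.0],
         [1.0, 0.0, 0.0]]
    pi_next = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
    return all(abs(a - b) < tol for a, b in zip(pi, pi_next)) and abs(sum(pi) - 1.0) < tol

def steady_state_availability(p_ab, expected_u, gamma_f, gamma_r):
    """Eq. (6)."""
    pi_a, pi_b, pi_c = embedded_dtmc_steady_state(p_ab)
    up = pi_a * expected_u
    return up / (pi_b * gamma_f + pi_c * gamma_r + up)

# Hypothetical inputs (hours): P_AB = 0.2, E[U] = 300, gamma_f = 2, gamma_r = 0.25.
print(is_stationary(embedded_dtmc_steady_state(0.2), 0.2))  # True
print(steady_state_availability(0.2, 300.0, 2.0, 0.25))
```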
The probability that a transaction is lost is defined as the ratio of the expected number of transactions which are lost in an interval to the expected total number of transactions which arrive during that interval. Since the evolution of $\{(Z(t), N(t)), t > 0\}$ in the intervals comprising successive visits to state A is stochastically identical, it suffices to consider just one such interval. The number of transactions lost is given by the sum of three quantities: (1) transactions in the queue when the system is exiting state A because of failure or initiation of PM, (2) transactions that arrive while failure recovery or PM is in progress, and (3) transactions that are disregarded due to the buffer being full. The last quantity is of special significance, since the probability of the buffer being full will increase due to the degrading service rate. It follows that the probability of loss is given by
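$$P_{loss} = \frac{\pi_A E[N_l] + \lambda \left( \pi_B \gamma_f + \pi_C \gamma_r + \pi_A \int_0^{\infty} p_K(t)\,dt \right)}{\lambda \left( \pi_B \gamma_f + \pi_C \gamma_r + \pi_A E[U] \right)} \tag{7}$$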
where $E[N_l]$ is the expected number of transactions in the buffer when the system is exiting state A. Equation 7 is valid only for policy II. Under policy I, the sojourn time in state A is limited by $\delta$, so the upper limit in the integral $\int_0^{\infty} p_K(t)\,dt$ is $\delta$ instead of $\infty$.
Next, an upper bound on the mean response time of a transaction given that it is successfully served, $T_{res}$, is derived. The mean number of transactions, denoted by E, which are accepted for service while the software is in state A is given by the mean total number of transactions which arrive while the software is in state A minus the mean number of transactions which are not accepted due to the buffer being full, that is,
$$E = \lambda \left( E[U] - \int_{t=0}^{\infty} p_K(t)\,dt \right).$$
Out of these transactions, on average, $E[N_l]$ are discarded later because of failure or initiation of PM. Therefore, the mean number of transactions which actually receive service given that they were accepted is $E - E[N_l]$. The mean total amount of time the transactions spend in the system while the software is in state A is
$$W = \int_{t=0}^{\infty} \sum_i i\, p_i(t)\,dt.$$
This time is composed of the mean time spent by the transactions which were served as well as those which were discarded, denoted as $W_S$ and $W_D$, respectively. Therefore, $W = W_S + W_D$. The response time we are interested in is given by $T_{res} = W_S/(E - E[N_l])$, which is upper bounded by
$$T_{res} < \frac{W}{E - E[N_l]}.$$
$p_i(t)$ is the probability that there are i transactions queued for service, which is also the probability of being in state i of the subordinated process at time t. $p_i(t)$ is the probability that the system failed when there were i transactions queued for service. These transient probabilities for both policies can be obtained by solving the systems of forward differential-difference equations given in [15]. In general, they do not have a closed-form analytical solution and must be evaluated numerically. Once these probabilities are obtained, the remaining quantities $P_{AB}$, $P_{AC}$, $E[U]$ and $E[N_l]$ can be easily computed [15] and then used to obtain the steady-state availability $A_{SS}$, the probability of transaction loss $P_{loss}$ and the upper bound on the response time of a transaction $T_{res}$.
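The numerical route described above can be sketched for policy I. This is a minimal illustration under stated assumptions, not the equations of [15]. It treats the subordinated process in state A as a birth-death queue with arrival rate $\lambda$, service rate $\mu(t)$ and a failure rate $\rho(t)$ acting in every queue state, and integrates the forward equations with fixed-step RK4 up to $\delta$. Running integrals for $E[U]$, $P_{AB}$, $E[N_l]$, $\int p_K\,dt$ and $W$ are carried along, and the results are fed into Eqs. (5) to (7) and the $T_{res}$ bound. The forms of $\mu(t)$ and $\rho(t)$ and every parameter value are hypothetical.

```python
import math

# Hypothetical parameters (hours); not taken from the paper.
LAM = 10.0                    # transaction arrival rate
K = 10                        # buffer capacity
DELTA = 200.0                 # policy I: PM starts DELTA hours after entering state A
GAMMA_F, GAMMA_R = 2.0, 0.5   # mean failure-recovery time and mean PM time

def mu(t):
    """Assumed monotone non-increasing service rate."""
    return 8.0 + 7.0 * math.exp(-t / 150.0)

def rho(t):
    """Assumed failure rate: Weibull hazard with shape 2 and scale 500 h."""
    shape, scale = 2.0, 500.0
    return (shape / scale) * (t / scale) ** (shape - 1)

def derivs(t, y):
    """y holds p_0..p_K followed by five running integrals."""
    p = y[:K + 1]
    m, r = mu(t), rho(t)
    dp = []
    for i in range(K + 1):
        inflow = (LAM * p[i - 1] if i > 0 else 0.0) + (m * p[i + 1] if i < K else 0.0)
        rate_out = (LAM if i < K else 0.0) + (m if i > 0 else 0.0) + r
        dp.append(inflow - rate_out * p[i])
    alive = sum(p)                                  # P{still in state A at t}
    queued = sum(i * pi for i, pi in enumerate(p))  # sum_i i p_i(t)
    # integrands of E[U], P_AB, queue lost at failure, int p_K dt, and W
    return dp + [alive, r * alive, r * queued, p[K], queued]

def rk4_step(t, y, h):
    k1 = derivs(t, y)
    k2 = derivs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)])
    k3 = derivs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)])
    k4 = derivs(t + h, [a + h * b for a, b in zip(y, k3)])
    return [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]

def policy_one_measures(delta=DELTA, h=0.025):
    y = [1.0] + [0.0] * K + [0.0] * 5       # enter state A with an empty queue
    for n in range(round(delta / h)):
        y = rk4_step(n * h, y, h)
    p_end = y[:K + 1]
    e_u, p_ab, lost_at_failure, int_pk, w = y[K + 1:]
    p_ac = sum(p_end)                        # still up at delta, so PM starts
    e_nl = lost_at_failure + sum(i * pi for i, pi in enumerate(p_end))
    pi_a, pi_b, pi_c = 0.5, 0.5 * p_ab, 0.5 * p_ac            # Eq. (5)
    cycle = pi_b * GAMMA_F + pi_c * GAMMA_R + pi_a * e_u
    a_ss = pi_a * e_u / cycle                                  # Eq. (6)
    p_loss = (pi_a * e_nl + LAM * (pi_b * GAMMA_F + pi_c * GAMMA_R
                                   + pi_a * int_pk)) / (LAM * cycle)  # Eq. (7), limit delta
    accepted = LAM * (e_u - int_pk)                            # E
    t_res_bound = w / (accepted - e_nl)                        # W / (E - E[N_l])
    return {"E[U]": e_u, "P_AB": p_ab, "P_AC": p_ac, "E[N_l]": e_nl,
            "A_SS": a_ss, "P_loss": p_loss, "T_res_bound": t_res_bound}

for name, value in policy_one_measures().items():
    print(f"{name:12s} {value:.6g}")
```

Re-running with different `delta` values traces curves of the kind shown in Figure 4, though these made-up numbers bear no relation to the paper's experiment.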
Examples are presented to illustrate the usefulness of the presented model in determining the optimum value of $\delta$ (the PM interval in the case of policy I and the PM wait in the case of policy II). First, the service rate and failure rate are assumed to be functions of real time, where $\rho(t)$ is defined to be the hazard function of the Weibull distribution, while $\mu(t)$ is defined to be a monotone non-increasing function that approximates the service degradation. Figure 4 shows $A_{SS}$ and $P_{loss}$ for both policies plotted against $\delta$ for different values of the mean time to perform PM, $\gamma_r$. Under both policies, it can be seen that for any particular value of $\delta$, the higher the value of $\gamma_r$, the lower the availability and the higher the corresponding loss probability. It can also be observed that the value of $\delta$ which minimizes the probability of loss is much lower than the one which maximizes availability. In fact, the probability of loss becomes very high at values of $\delta$ which maximize availability. For any specific value of $\gamma_r$, policy II results in a lower minimum loss probability than that achieved under policy I. Therefore, if the objective is to minimize the long-run probability of loss, as in the case of telecommunication switching software, policy II always fares better than policy I.
[Figure 4: two panels plotting availability and loss probability against δ; curves labeled by policy (I or II) and γ_r = 0.15, 0.35, 0.55, 0.85.]
Fig. 4. Results for experiment 1
[Figure 5: three panels plotting availability, loss probability and response time against δ; curves labeled real time, busy time and no failure.]
Figure 5 shows $A_{SS}$, $P_{loss}$ and the upper bound on $T_{res}$ plotted against $\delta$ under policy I. Each panel contains three curves. In the solid curve, $\mu(\cdot)$ and $\rho(\cdot)$ are functions of real time, $\mu(t)$ and $\rho(t)$, whereas in the dotted curve they are functions (with the same parameters) of the mean total processing time, $\mu(L(t))$ and $\rho(L(t))$. The dashed curve represents a third system in which no crash/hang failures occur, $\rho(\cdot) = 0$, but service degradation is present with $\mu(\cdot) = \mu(t)$. This experiment illustrates the importance of making the right assumptions in capturing aging: as seen from the figure, depending on the forms chosen for $\mu(\cdot)$ and $\rho(\cdot)$, the measures vary over a wide range.
Software rejuvenation has been applied to cluster systems [8,45]. This is a novel application, which significantly improves cluster system availability and productivity. The Stochastic Reward Net (SRN) model of a cluster system employing simple time-based rejuvenation is shown in Figure 6. The cluster consists of n nodes, which are initially in a "robust" working state, Pup. The aging process
[Figure 6: SRN model of the cluster with simple time-based rejuvenation. Places: Pclock, Pup, Pstartrejuv, Pfprob, Pnodefail1, Pnodefail2, Psysfail, Prejuv1, Prejuv2, Prejuved, Pfprobrejuv. Timed transitions: Trejuvinterval, Tfprob, Tnodefail, Tnoderepair, Tsysrepair, Tcmode, Trejuv1, Trejuv2, Tnodefailrejuv, Tfprobrejuv. Immediate transitions Timmd1 to Timmd15, guards g1 to g7, coverage c.]
In the simple time-based policy, rejuvenation is done successively for all the operational nodes in the cluster at the end of each deterministic interval. The transition Trejuvinterval fires every d time units, depositing a token in place Pstartrejuv. Only one node can be rejuvenated at any time (at places Prejuv1 or Prejuv2). Weight functions are assigned such that the probability of selecting a token from Pup or Pfprob is directly proportional to the number of tokens in each. After a node has been rejuvenated, it goes back to the "robust" working state, represented by place Prejuved. This is a duplicate place for Pup, used to distinguish the nodes which are waiting to be rejuvenated from the nodes which have already been rejuvenated. A node, after rejuvenation, is then allowed to fail with the same rates as before rejuvenation, even when another node is being rejuvenated. Duplicate places for Pup and Pfprob are needed to capture this. Node repair is disabled during rejuvenation. Rejuvenation is complete when the sum of nodes in places Prejuved, Pfprobrejuv and Pnodefail2 is equal to the total number of nodes, n. In this case, the immediate transition Timmd10 fires, putting all the rejuvenated nodes back in places Pup and Pfprob. Rejuvenation stops when there are a−1 tokens in place Pnodefail2, to prevent a system failure. The clock resets itself when rejuvenation is complete and is disabled when the system is undergoing repair. Guard functions (g1 through g7) are assigned to express complex enabling conditions textually.
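Two pieces of the enabling logic described above can be illustrated outside the SRN formalism. The first is the weight functions that pick the next node from Pup or Pfprob in proportion to their token counts. The second is the completion condition under which Timmd10 fires. This is a hypothetical sketch in plain Python with made-up function names, not SPNP syntax; in the actual model these are marking-dependent weights and guards evaluated by the solver.

```python
import random

def pick_place_for_rejuvenation(tokens_up, tokens_fprob, rng):
    """Choose the place that supplies the next node to rejuvenate, with
    probability proportional to its token count."""
    total = tokens_up + tokens_fprob
    if total == 0:
        return None                      # no operational node left to rejuvenate
    return "Pup" if rng.random() * total < tokens_up else "Pfprob"

def rejuvenation_round_complete(prejuved, pfprobrejuv, pnodefail2, n):
    """The round ends once every node sits in Prejuved, Pfprobrejuv or Pnodefail2."""
    return prejuved + pfprobrejuv + pnodefail2 == n

rng = random.Random(42)
draws = [pick_place_for_rejuvenation(6, 2, rng) for _ in range(100_000)]
print(draws.count("Pup") / len(draws))            # close to 6 / 8 = 0.75
print(rejuvenation_round_complete(5, 2, 1, n=8))  # True
```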
In condition-based rejuvenation (Figure 7), rejuvenation is attempted only
when a node transits into the “failure probable” state. In practice, this degraded
state could be predicted in advance by means of analyses of some observable
system parameters [16]. In case of a successful prediction, assuming that no
other node is being rejuvenated at that time, the newly detected node can be
rejuvenated. A node is allowed to fail even while waiting for rejuvenation.
[Figure 7: SRN model of the cluster with condition-based rejuvenation. Places: Pup, Pdetect, Pfprob, Prejuv, Pnodefail1, Pnodefail2, Pdetectfail, Psysfail. Timed transitions: Tfprob, Trejuv, Tnodefail1, Tnodefail2, Tnoderepair, Tsysrepair, Tcmode. Immediate transitions Timmd1 to Timmd11, guards g1 to g5, coverage factors c1 and c2.]
For the analyses, the following values are assumed. The mean times spent in
places Pup and Pfprob are 240 hrs and 720 hrs, respectively. The mean times to
repair a node, to rejuvenate a node and to repair the system are 30 min, 10 min
and 4 hrs, respectively. In this analysis, common-mode failure is disabled and
node failure coverage is assumed to be perfect. All the models were solved using
the SPNP (Stochastic Petri Net Package) tool [22]. The measures computed
were the expected unavailability and the expected cost incurred over a fixed time
interval. The cost incurred due to node rejuvenation is assumed to be much
less than the cost of a node or system failure, since rejuvenation can be done at
predetermined or scheduled times. In our analysis, we fix costnodefail
at $5,000/hr and costrejuv at $250/hr. The value of costsysfail is computed as
the number of nodes, n, times costnodefail.
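As a worked illustration of these parameters, the sketch below converts the mean times into hourly rates and builds the cost rates. For the 8/1 configuration it gives costsysfail = 8 × $5,000/hr = $40,000/hr. Two assumptions are made here and are not stated in the text: timed transitions are exponentially distributed (rate = 1/mean), and the expected cost is the sum of the expected hours spent in each cost-incurring condition multiplied by that condition's hourly cost. The dictionary keys and function names are illustrative, not SPNP output.

```python
# Mean times from the text, expressed in hours.
MEAN_TIMES_H = {
    "time_in_Pup": 240.0,
    "time_in_Pfprob": 720.0,
    "node_repair": 30 / 60,
    "node_rejuvenation": 10 / 60,
    "system_repair": 4.0,
}

def rates_per_hour(mean_times):
    """Assumes exponential timings, so each firing rate is 1 / mean time."""
    return {name: 1.0 / t for name, t in mean_times.items()}

def cost_rates(n, cost_nodefail=5000.0, cost_rejuv=250.0):
    """Hourly cost of each condition; costsysfail = n * costnodefail."""
    return {"nodefail": cost_nodefail, "rejuv": cost_rejuv, "sysfail": n * cost_nodefail}

def expected_cost(hours_in_condition, rates):
    """Assumed cost model: expected hours in each condition times its hourly cost."""
    return sum(hours * rates[cond] for cond, hours in hours_in_condition.items())
```

For example, 2 hours of node downtime, 0.5 hours of system downtime and 4 hours of rejuvenation in an 8-node cluster would cost 10,000 + 20,000 + 1,000 = $31,000 under this assumed model.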
Figure 8 shows the plots for an 8/1 configuration (8 nodes including 1 spare)
system employing simple time-based rejuvenation. The upper and lower
plots show, respectively, the expected cost incurred and the expected downtime
(in hours) in a given time interval, versus the rejuvenation interval (time between
successive rejuvenations) in hours. If the rejuvenation interval is close to zero, the
332 K.S. Trivedi and K. Vaidyanathan
system is always rejuvenating and thus incurs high cost and downtime. As the
rejuvenation interval increases, both expected unavailability and cost incurred
decrease and reach an optimum value. If the rejuvenation interval goes beyond
the optimal value, system failures have more influence on these measures than
rejuvenation does. The analysis was repeated for 2/1, 8/2, 16/1 and 16/2
configurations. For time-based rejuvenation, the optimal rejuvenation interval
was 100 hours for the 1-spare clusters and approximately 1 hour for the 2-spare
clusters. In our analysis of condition-based rejuvenation, we assumed 90% prediction
coverage. For systems that have one spare, time-based rejuvenation can reduce
downtime by 26% relative to no rejuvenation. Condition-based rejuvenation does
considerably better, reducing downtime by 62% relative to no rejuvenation.
However, when the system can tolerate more than one failure at a time, downtime is
reduced by 95% to 98% via time-based rejuvenation, compared to a mere 85%
for condition-based rejuvenation.
[Figure 8: expected cost (upper plot, scale ×10^4) and expected downtime (lower plot) versus rejuvenation interval (hours)]
Garg et al. [16] propose a methodology for detection and estimation of aging in the
UNIX operating system. An SNMP-based distributed resource monitoring tool
was used to collect operating system resource usage and system activity data
from nine heterogeneous UNIX workstations connected by an Ethernet LAN at
the Department of Electrical and Computer Engineering at Duke University. A
central monitoring station runs the manager program, which periodically sends get
requests to each of the agent programs running on the monitored workstations.
The agent programs in turn obtain data for the manager from their
respective machines by executing various standard UNIX utility programs such as
pstat, iostat and vmstat. For quantifying the effect of aging in operating system
resources, the metric "estimated time to exhaustion" is proposed. The earlier work
[16] uses a purely time-based approach to estimate resource exhaustion times,
whereas the work presented in [44] also takes the current system
workload into account.
A methodology based on time-series analysis to detect and estimate resource
exhaustion times due to software aging in a web server, while subjecting it to
an artificial workload, is proposed in [31]. Avritzer and Weyuker [4] monitor
production traffic data of a large telecommunication system and describe a
rejuvenation strategy that increases system availability and minimizes packet loss.
Cassidy et al. [7] have developed an approach to rejuvenation for large online
transaction processing servers. They monitor various system parameters over a
period of time. Using pattern recognition methods, they conclude
that 13 of those parameters deviate from normal behavior just prior to a crash,
providing sufficient warning to initiate rejuvenation.
[Figure: real memory free and file table size plotted versus time]
is rejected for the variables considered. Given that a global trend is present
and that its slope has been calculated for a particular resource, the time at which the
resource will be exhausted because of aging alone is estimated. Table 1 refers to
several objects on Rossby and lists an estimate of the slope (change per day) of
the trend obtained by applying Sen's slope estimate for data with seasons [16].
The values for real memory and swap space are in kilobytes. A negative slope, as
in the case of real memory, indicates a decreasing trend, whereas a positive slope,
as in the case of file table size, indicates an increasing trend. Given the
slope estimate, the table lists the estimated time to failure of the machine due to
aging alone with respect to this particular resource. The time to exhaustion
is calculated using the standard linear approximation y = mx + c.

The estimates above allow the effect of aging on different system resources to be
compared. Overall, file table size and process table size were found to be less
important than used swap space and real memory free, since they
have very small slopes and high estimated times to failure due to exhaustion.
Based on such comparisons, we can identify important resources to monitor and
manage in order to deal with aging-related software failures. For example, the
resource used swap space has the highest slope and real memory free has the
second highest slope. However, real memory free has a lower time to exhaustion
than used swap space.
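The two steps above, a seasonal slope estimate followed by linear extrapolation, can be sketched as follows. One common form of Sen's estimator for seasonal data takes the median of all pairwise slopes computed only between observations in the same season; the extrapolation then solves y = mx + c for the time at which y reaches a limit. This is a minimal sketch under assumptions: the data are already grouped by season (for example by hour of day), the exhaustion limit is supplied by the user, and the function names are hypothetical rather than the tool used in [16].

```python
from itertools import combinations
from statistics import median

def seasonal_sen_slope(series_by_season):
    """Median of all pairwise slopes taken within each season.

    series_by_season maps a season label (e.g. hour of day) to a list of
    (time, value) observations belonging to that season.
    """
    slopes = []
    for points in series_by_season.values():
        for (t1, y1), (t2, y2) in combinations(points, 2):
            if t2 != t1:
                slopes.append((y2 - y1) / (t2 - t1))
    if not slopes:
        raise ValueError("need at least two distinct time points in some season")
    return median(slopes)

def time_to_exhaustion(current, slope, limit):
    """Solve limit = slope * t + current for t (linear model y = m x + c)."""
    if slope == 0:
        return float("inf")  # flat trend never reaches the limit
    t = (limit - current) / slope
    return t if t >= 0 else float("inf")  # trend moves away from the limit
```

With a slope in units per day, the result is in days; for instance, a resource at 86 units falling by 2 units per day reaches 0 after 43 days.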
Table 1. Estimated slope and time to exhaustion for Rossby, Velum and Jefferson
objects
The method discussed in the previous subsection assumes that accumulated use
of a resource over a time period depends only on the elapsed time. However, it
is intuitive that the rate at which a resource is consumed depends on the
current workload. In this subsection, we discuss a measurement-based model to
estimate the rate of exhaustion of operating system resources as a function of
both time and the system workload [44]. The SNMP-based distributed resource
monitoring tool described previously was used for collecting operating system
resource usage and system activity parameters (at 10 min intervals) for over 3
months. Only results for the data collected from the machine Rossby are
discussed here. The longest stretch of sample points in which no reboots or failures
occurred was used for building the model. A semi-Markov reward model [42] is
constructed using the data. First, different workload states are identified using
statistical cluster analysis and a state-space model is constructed. Corresponding
to each resource, a reward function based on the rate of resource exhaustion in
the different states is then defined. Finally, the model is solved to obtain trends
and the estimated exhaustion rates and time to exhaustion for the resources.
The following variables were chosen to characterize the system workload:
cpuContextSwitch, sysCall, pageIn and pageOut. Hartigan's k-means clustering
algorithm [21] was used to partition the data points into clusters based on
workload. The statistics for the eleven workload clusters obtained are shown
in Table 2. Clusters whose centroids were relatively close to each other, and
those with a small percentage of data points in them, were merged to simplify
computations. The resulting clusters are W1 = {1, 2, 3}, W2 = {4, 5}, W3 = {6},
W4 = {7}, W5 = {8}, W6 = {9}, W7 = {10} and W8 = {11}.
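The clustering step can be illustrated with a small k-means sketch followed by the merge listed above. Note that this swaps in the simple Lloyd iteration (alternate nearest-centroid assignment and mean update) rather than Hartigan's algorithm [21], which refines clusters differently. Initial centroids are passed in explicitly so the result is deterministic. The sketch does not rescale the variables, although the magnitudes in Table 2 differ widely between sysCall and pgOut; whether the original analysis standardized them is not stated. The `MERGED_STATES` mapping reproduces the merge given in the text; all names are illustrative.

```python
from math import dist

# Merge of the eleven clusters into workload states, as listed in the text.
MERGED_STATES = {1: "W1", 2: "W1", 3: "W1", 4: "W2", 5: "W2", 6: "W3",
                 7: "W4", 8: "W5", 9: "W6", 10: "W7", 11: "W8"}

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd-style k-means; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Cluster indices returned here start at 0, so a label `k` would map to workload state `MERGED_STATES[k + 1]` if the eleven clusters were numbered as in Table 2.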
Transition probabilities from one state to another were computed from the data,
resulting in the transition probability matrix P of the embedded discrete-time
Markov chain. The sojourn time distribution for each of the workload states
was fitted to either a 2-stage hyper-exponential or a 2-stage hypo-exponential
distribution function. The fitted distributions were tested using the
Kolmogorov-Smirnov test at a significance level of 0.01.

Table 2. Statistics for the eleven workload clusters

Cluster            Cluster center                  % of
No.      cpuConSw     sysCall   pgOut     pgIn     pts.
 1       48405.16    94194.66    5.16   677.83     0.98
 2       54184.56   122229.68    5.39    81.41     0.76
 3       34059.61   193927.00    0.02   136.73     0.93
 4       20479.21    45811.71    0.53   243.40     1.89
 5       21361.38    37027.41    0.26    12.64     7.17
 6       15734.65    54056.27    0.27    14.45     6.55
 7       37825.76    40912.18    0.91    12.21    11.77
 8       11013.22    38682.46    0.03    10.43    42.87
 9       67290.83    37246.76    7.58    19.88     4.93
10       10003.94    32067.20    0.01     9.61    21.23
11      197934.42    67822.48  415.71   184.38     0.93
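Estimating the embedded chain from a sampled trace amounts to collapsing consecutive samples in the same workload state into one visit, counting transitions between distinct states, and recording the length of each visit. The sketch below assumes the trace is a list of workload-state labels sampled every 10 minutes, as in the measurements. As a rough heuristic (an assumption, not the authors' procedure), it also reports whether the coefficient of variation of the sojourn times is above 1, which points towards a hyper-exponential fit, or below 1, which points towards a hypo-exponential fit. It does not perform the actual fitting or the Kolmogorov-Smirnov test; function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def embedded_chain(states, sample_interval):
    """Estimate embedded-DTMC transition probabilities and per-state sojourn times.

    Consecutive identical samples form one visit, so self-transitions do not
    appear in the embedded chain. Returns (P, sojourns) as nested dicts.
    """
    if not states:
        return {}, {}
    counts = defaultdict(lambda: defaultdict(int))
    sojourns = defaultdict(list)
    current, run = states[0], 1
    for s in states[1:]:
        if s == current:
            run += 1
        else:
            counts[current][s] += 1
            sojourns[current].append(run * sample_interval)
            current, run = s, 1
    sojourns[current].append(run * sample_interval)  # last visit is right-censored
    P = {i: {j: c / sum(row.values()) for j, c in row.items()} for i, row in counts.items()}
    return P, dict(sojourns)

def suggest_two_stage_family(samples):
    """Heuristic only: CV > 1 suggests hyper-exponential, otherwise hypo-exponential."""
    cv = pstdev(samples) / mean(samples)
    return "hyper-exponential" if cv > 1 else "hypo-exponential"
```

The last visit in a trace is cut off by the end of the measurement window, so in practice it may be better to drop it before fitting.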
Two resources, usedSwapSpace and realMemoryFree, are considered for the
analysis, since the previous time-based analysis suggested that they are criti-
cal resources. For each resource, the reward function is defined as the rate of
corresponding resource exhaustion in different states. The true slope (rate of
increase/decrease) of a resource at every workload state is estimated by using
Sen’s non-parametric method [44]. Table 3 shows the slopes with 95% confidence
intervals.
It was observed that the slopes in a given workload state for a particular resource
are almost the same across different visits to that state. Further, the slopes
differ across workload states and, generally, the higher the system activity,
the higher the resource utilization. This validates the assumption that resource
usage depends on the system workload and that the rates of exhaustion vary
with workload changes. It can also be observed from Table 3 that the slopes
for usedSwapSpace in all the workload states are non-negative, and the slopes
for realMemoryFree are non-positive in all the workload states except one.
It follows that usedSwapSpace increases whereas realMemoryFree decreases over
time, which confirms the software aging phenomenon.
Table 3. Slope estimates and 95% confidence intervals per workload state

        usedSwapSpace                 realMemoryFree
State   Slope    95% conf. interval   Slope     95% conf. interval
W1      119.3    [5.5, 222.4]         -133.7    [-137.7, -133.3]
W2      0.57     [0.40, 0.71]         -1.47     [-1.78, -1.09]
W3      0.76     [0.73, 0.80]         -1.43     [-2.50, -0.62]
W4      0.57     [0.00, 0.69]         -1.23     [-1.67, -0.80]
W5      0.78     [0.75, 0.80]          0.00     [-5.65, 6.00]
W6      0.81     [0.64, 1.00]         -1.14     [-1.40, -0.88]
W7      0.00     [0.00, 0.00]          0.00     [0.00, 0.00]
W8      91.8     [72.4, 111.0]         91.7     [-369.9, 475.2]

The semi-Markov reward model was solved using the SHARPE tool [37] developed
by researchers at Duke University. The slope for the workload-based estimation
is computed as the expected reward rate in steady state from the model.
The time to resource exhaustion is computed as the job completion time (mean
time to accumulate x amount of reward) of the Markov reward model. Table 4
gives the estimates for the slope and time to exhaustion for usedSwapSpace and
realMemoryFree. Workload-based estimation gave lower times to resource
exhaustion than time-based estimation. Since machine failures due to
resource exhaustion were observed well before the times to resource exhaustion
estimated by the time-based method, it follows that the workload-based approach
results in better estimates.
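The steady-state slope mentioned above can be illustrated directly. For a semi-Markov reward model, the long-run reward rate weights each state's reward rate r_i by the fraction of time spent in that state, π_i h_i / Σ_j π_j h_j, where π is the stationary vector of the embedded chain P and h_i is the mean sojourn time in state i. The sketch computes this with plain Gauss-Jordan elimination. It assumes an irreducible embedded chain (otherwise the linear system is singular) and does not compute the mean time to accumulate a given reward, for which SHARPE was used; function names are hypothetical.

```python
def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(P)
    # Row i encodes sum_j pi_j * P[j][i] - pi_i = 0 ...
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # ... and the last (redundant) balance equation is replaced by normalisation.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        b[col] /= scale
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return b

def steady_state_reward_rate(P, mean_sojourn, reward_rate):
    """Long-run reward per unit time of a semi-Markov reward model."""
    pi = stationary_distribution(P)
    time_weight = [p * h for p, h in zip(pi, mean_sojourn)]
    return sum(w * r for w, r in zip(time_weight, reward_rate)) / sum(time_weight)
```

With the per-state slopes of Table 3 as reward rates, together with an estimated P and mean sojourn times, this would yield a workload-weighted slope in the same units as the table.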
Table 4. Estimates for slope (in KB/10 min) and time to exhaustion (in days) for
usedSwapSpace and realMemoryFree
The first data set was collected over a 7-day period with a connection rate of
350 requests/sec. The second set was collected over a 25-day period with a
connection rate of 400 requests/sec. During the experiment, we recorded more than 100
parameters, but for our modeling purposes, six representative parameters
pertaining to system resources were selected (Table 5). In addition to these six system
status parameters, the response time of the web server, recorded by httperf on
the client machine, is also included in the model as a measure of the performance
of the web server.
[Figure: web server response time (ms) versus time (hours), panels (a) and (b)]
propriate model with one output and four inputs for each parameter: connection
rate, linear trend, periodic series with a period of one week, and periodic series
with a period of one day. The autocorrelation function (ACF) and the partial
autocorrelation function (PACF) of the output are computed. The ACF and
the PACF help us decide on the appropriate model for the data [38]. For example,
from the ACF and PACF of used swap space it can be determined that an
autoregressive model of order 1 [AR(1)] is suitable for this data series. Adding the
inputs to the AR(1) model, we get the ARX(1) model for used swap space:

$$Y_t = a\,Y_{t-1} + b_1 X_t + b_2 L_t + b_3 W_t + b_4 D_t, \qquad (8)$$

where Yt is the used swap space, Xt is the connection rate, Lt is the time step,
which represents the linear trend, Wt is the weekly periodic series and Dt is the
daily periodic series. After examining the ACF and PACF of all the parameters,
we find that all of the PACFs cut off at certain lags. Hence all the multiple input
single output (MISO) models are of the ARX type, differing only in their orders.
This makes it convenient to combine them into a multiple input multiple
output (MIMO) ARX model, which is described later.
In order to combine the MISO ARX models into a MIMO ARX model, we
need to choose the order between different outputs. This is done by inspecting
the CCF (cross-correlation function) between each pair of outputs to find
the leading relationship between them. If the CCF between parameters A and
B peaks at a positive lag k, we say that A leads B by k steps, and
it might be possible to use A to predict B. In our analysis, 21 CCFs
need to be computed, one for each pair of the seven outputs (the six system
parameters and the response time). To reduce the complexity, we only use
the CCFs that exhibit an obvious leading relationship with lags of less than 10 steps.
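The lead-lag test above can be sketched with the usual sample cross-correlation, here defined between a[t] and b[t + k] and normalised by n. Under this convention, a peak at a positive lag k means that a leads b by k steps. The search is limited to lags within ±10, matching the cutoff above. As an assumption of this sketch, the peak is taken on the absolute value so that strong negative correlations (for example, free memory against used swap) are also detected. Function names are hypothetical.

```python
from statistics import mean, pstdev

def cross_correlation(a, b, lag):
    """Sample CCF between a[t] and b[t + lag], normalised by n."""
    n = len(a)
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if lag >= 0:
        pairs = zip(a[:n - lag], b[lag:])
    else:
        pairs = zip(a[-lag:], b[:n + lag])
    return sum((x - ma) * (y - mb) for x, y in pairs) / (n * sa * sb)

def leading_lag(a, b, max_lag=10):
    """Lag in [-max_lag, max_lag] with the largest |CCF|; positive means a leads b."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: abs(cross_correlation(a, b, k)))
```

In practice, one would also check that the peak is clearly above the noise level (roughly 2/√n for white noise) before treating it as an "obvious" leading relationship.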
The next step after determination of the orders is to estimate the coefficients
of the model by the least squares method. The first half of the data is used to
estimate the parameters and the rest of the data is then used to verify the model.
Figure 11 shows the two-hour-ahead (24-step) predicted used swap space, which
is computed using the established model and the data measured up to two hours
before the predicted time point. From the plots, we can see that the predicted
values are very close to the measured values.

[Plot: used swap space (bytes, ×10^6) versus time (hours); curves: measured, two-hour predicted]
Fig. 11. Measured and two-hour ahead predicted used swap space
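A least-squares fit of an ARX(1) model of the form of Eq. (8), followed by iterating the fitted model several steps ahead, can be sketched with NumPy. The following are assumptions of this sketch rather than details given in the text: the exogenous inputs are passed as a matrix whose columns play the roles of Xt, Lt, Wt and Dt; future input values are known when forecasting (true for the trend and periodic series); and the multi-step forecast simply feeds each prediction back into the model. Twenty-four steps in two hours implies 5-minute samples. Function names are hypothetical.

```python
import numpy as np

def fit_arx1(y, inputs):
    """Least-squares estimate of a and b in y[t] = a*y[t-1] + inputs[t] @ b."""
    y = np.asarray(y, dtype=float)
    U = np.asarray(inputs, dtype=float)
    if U.ndim == 1:
        U = U[:, None]  # single exogenous input
    # Regression row for time t holds y[t-1] and the inputs at time t.
    A = np.column_stack([y[:-1], U[1:]])
    coef, *_ = np.linalg.lstsq(A, y[1:], rcond=None)
    return coef[0], coef[1:]

def predict_ahead(y_now, future_inputs, a, b):
    """Iterate the fitted model over future input rows, feeding back its predictions."""
    U = np.asarray(future_inputs, dtype=float)
    if U.ndim == 1:
        U = U[:, None]
    y_hat = float(y_now)
    for u in U:
        y_hat = a * y_hat + float(u @ b)
    return y_hat
```

Following the validation scheme described earlier, one would fit on the first half of the series and call `predict_ahead(y[t], inputs[t + 1:t + 25], a, b)` to produce the two-hour-ahead value for time t + 24 in the second half.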
The first commercial version of a software rejuvenation agent (SRA) for the IBM
xSeries line of cluster servers has been implemented in collaboration with us [8,
26, 45]. The SRA was designed to monitor consumable resources, estimate the
time to exhaustion of those resources, and generate alerts to the management
infrastructure when the time to exhaustion is less than a user-defined notification
horizon. For Windows operating systems, the SRA acquires data on exhaustible
resources by reading the registry performance counters and collecting parameters
such as available bytes, committed bytes, non-paged pool, paged pool, handles,
threads, semaphores, mutexes, and logical disk utilization. For Linux, the agent
accesses the /proc directory structure and collects equivalent parameters such
as memory utilization, swap space, file descriptors and inodes. All collected
parameters are logged to disk. They are also stored in memory in preparation for
time-to-exhaustion analysis.
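On Linux, the memory and swap figures mentioned above are exposed in /proc/meminfo as lines such as `MemFree:  1234567 kB`. The sketch below parses that format and derives the two quantities analysed earlier. It is an illustration only, not the SRA's code. The output keys reuse the names realMemoryFree and usedSwapSpace from the previous sections, the parser is tested on an inline sample so that it does not depend on the host, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ("Name:   value kB") into {name: value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

def resource_sample(meminfo):
    """Derive the quantities analysed earlier (values in kB)."""
    return {
        "realMemoryFree": meminfo["MemFree"],
        "usedSwapSpace": meminfo["SwapTotal"] - meminfo["SwapFree"],
    }

def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file on a Linux host."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling `resource_sample(read_meminfo())` periodically and appending the result to a log would give the kind of time series that the time-to-exhaustion analysis consumes.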
In the current version of the SRA, rejuvenation can be based on elapsed time
since the last rejuvenation, or on prediction of impending exhaustion. When
using Timed Rejuvenation, a user interface is used to schedule and perform re-
juvenation at a period specified by the user. It allows the user to select when
to rejuvenate different nodes of the cluster, and to select “blackout” times dur-
ing which no rejuvenation is to be allowed. Predictive Rejuvenation relies on
curve-fitting analysis and projection of the utilization of key resources, using
recently observed data. The projected data is compared to prespecified upper
and lower exhaustion thresholds, within a notification time horizon. The user
specifies the notification horizon and the parameters to be monitored (some pa-
rameters believed to be highly indicative are always monitored by default), and
the agent periodically samples the data and performs the analysis. The predic-
tion algorithm fits several types of curves to the data in the fitting window. These
different curve types have been selected for their ability to capture different types
of temporal trends. A model-selection criterion is applied to choose the “best”
prediction curve, which is then extrapolated to the user-specified horizon. The
parameters that are indicative of resource exhaustion are monitored and
extrapolated independently. If any monitored parameter exceeds the specified
minimum or maximum value within the horizon, a request to rejuvenate is sent
to the management infrastructure. In most cases, it is also possible to identify
which process is consuming the preponderance of the resource being exhausted,
in order to support selective rejuvenation of just the offending process or a group
of processes.
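The text does not name the SRA's curve families or its model-selection criterion, so the sketch below substitutes assumptions to illustrate the predictive-rejuvenation loop. It fits polynomial trends of degree 1 and 2, selects between them using the Akaike information criterion (AIC) for least-squares fits, and checks whether the selected curve leaves the [lower, upper] band anywhere within the notification horizon. Function names and parameters are hypothetical.

```python
import numpy as np

def best_trend(t, y, degrees=(1, 2)):
    """Fit each polynomial degree by least squares and keep the lowest-AIC fit."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(t)
    best_aic, best_coef = np.inf, None
    for deg in degrees:
        coef = np.polyfit(t, y, deg)
        rss = float(np.sum((np.polyval(coef, t) - y) ** 2))
        # Gaussian least-squares AIC; the floor avoids log(0) on a perfect fit.
        aic = n * np.log(max(rss, 1e-12) / n) + 2 * (deg + 1)
        if aic < best_aic:
            best_aic, best_coef = aic, coef
    return best_coef

def request_rejuvenation(t, y, horizon, lower=None, upper=None, points=100):
    """True if the projected curve leaves [lower, upper] within the horizon."""
    coef = best_trend(t, y)
    future = np.linspace(t[-1], t[-1] + horizon, points)
    projected = np.polyval(coef, future)
    if lower is not None and projected.min() < lower:
        return True
    if upper is not None and projected.max() > upper:
        return True
    return False
```

Checking the whole projected path, rather than only its endpoint, matters for curved fits that may cross a threshold and turn back before the horizon ends.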
[Figure: classification of software rejuvenation approaches. Open-loop approach: elapsed time (periodic); elapsed time and load. Closed-loop approach: off-line; on-line.]
6 Conclusions
References
1. E. Adams. Optimizing Preventive Service of the Software Products. IBM Journal
of R&D, 28(1):2-14, January 1984.
2. P. E. Ammann and J. C. Knight. Data Diversity: An Approach to Software Fault
Tolerance. In Proc. of 17th Int. Symp. on Fault Tolerant Computing, pages 122-126,
June 1987.
3. A. Avizienis and L. Chen. On the Implementation of N-version Programming for
Software Fault Tolerance During Execution. In Proc. IEEE COMPSAC 77, pp
149-155, November 1977.
4. A. Avritzer and E.J. Weyuker. Monitoring Smoothly Degrading Systems for In-
creased Dependability. Empirical Software Eng. Journal, Vol 2, No. 1, pp 59-77,
1997.
5. L. Bernstein. Text of seminar delivered by Mr. Bernstein. In University Learning
Center, George Mason University, January 29 1996.
6. A. Bobbio, A. Sereno and C. Anglano. Fine Grained Software Degradation Models
for Optimal rejuvenation policies. Performance Evaluation, Vol. 46, pp 45-62, 2001.
7. K. Cassidy, K. Gross and A. Malekpour. Advanced Pattern Recognition for De-
tection of Complex Software Aging in Online Transaction Processing Servers. In
Proc. Dependable Systems and Networks, DSN 2002, Washington D.C., June 2002.
8. V. Castelli, R. E. Harper, P. Heidelberger, S. W. Hunter, K. S. Trivedi, K.
Vaidyanathan and W. Zeggert. Proactive Management of Software Aging. IBM
Journal of R&D, Vol. 45, No.2, March 2001.
9. R. Chillarege, S. Biyani and J. Rosenthal. Measurement of Failure Rate in Widely
Distributed Software. In Proc. of 25th IEEE Int. Symp. on Fault Tolerant Com-
puting, pp 424-433, Pasadena, CA, July 1995.
10. T. Dohi, K. Goševa–Popstojanova and K. S. Trivedi. Analysis of Software Cost
Models with Rejuvenation. In Proc. of the 5th IEEE Int. Symp. on High Assurance
Systems Engineering, HASE 2000, Albuquerque, NM, November 2000.
11. T. Dohi, K. Goševa–Popstojanova and K. S. Trivedi. Statistical Non-Parametric
Algorithms to Estimate the Optimal Software Rejuvenation Schedule. Proc. of the
2000 Pacific Rim Int. Symp. on Dependable Computing, PRDC 2000, Los Angeles,
CA, December 2000.
12. S. Garg, A. Puliafito, M. Telek and K. S. Trivedi. Analysis of Software Rejuvenation
Using Markov Regenerative Stochastic Petri Net. In Proc. of the Sixth Int. Symp.
on Software Reliability Engineering, pp 180-187, Toulouse, France, October 1995.
13. S. Garg, Y. Huang, C. Kintala and K. S. Trivedi. Time and Load Based Soft-
ware Rejuvenation: Policy, Evaluation and Optimality. In Proc. of the First Fault-
Tolerant Symposium, Madras, India, December 1995.
14. S. Garg, Y. Huang, C. Kintala and K. S. Trivedi. Minimizing Completion Time of
a Program by Checkpointing and Rejuvenation. In Proc. 1996 ACM SIGMETRICS,
Philadelphia, PA, pp 252-261, May 1996.
15. S. Garg, A. Puliafito, M. Telek and K. S. Trivedi. Analysis of Preventive Main-
tenance in Transactions Based Software Systems. IEEE Trans. on Computers, pp
96-107, Vol.47, No.1, January 1998.
16. S. Garg, A. van Moorsel, K. Vaidyanathan and K. S. Trivedi. A Methodology for
Detection and Estimation of Software Aging. In Proc. of the Ninth Int. Symp.
on Software Reliability Engineering, pp 282-292, Paderborn, Germany, November
1998.
17. J. Gray. Why do Computers Stop and What Can be Done About it? In Proc. of
5th Symp. on Reliability in Distributed Software and Database Systems, pp 3-12,
January 1986.
18. J. Gray. A Census of Tandem System Availability Between 1985 and 1990. IEEE
Trans. on Reliability, 39:409-418, October 1990.
19. J. Gray and D. P. Siewiorek. High-Availability Computer Systems. IEEE Com-
puter, pages 39-48, September 1991.
20. B. O. A. Grey. Making SDI Software Reliable through Fault-tolerant Techniques.
Defense Electronics, pp 77–80, 85–86, August 1987.
21. J. A. Hartigan. Clustering Algorithms. New York:Wiley, 1975.
22. C. Hirel, B. Tuffin and K. S. Trivedi. SPNP: Stochastic Petri Net Package. Version
6.0. B. R. Haverkort et al. (eds.): TOOLS 2000, Lecture Notes in Computer Science
1786, pp 354-357, Springer-Verlag Heidelberg, 2000.
23. J. J. Horning, H. C. Lauer, P. M. Melliar-Smith and B. Randell. A Program
Structure for Error Detection and Recovery. Lecture Notes in Computer Science,
16:177-193, 1974.
24. Y. Huang, P. Jalote and C. Kintala. Two Techniques for Transient Software Error
Recovery. Lecture Notes in Computer Science, Vol. 774, pp 159-170. Springer
Verlag, Berlin, 1994.
25. Y. Huang, C. Kintala, N. Kolettis and N. D. Fulton. Software Rejuvenation:
Analysis, Module and Applications. In Proc. of 25th Symp. on Fault Tolerant
Computing, pp 381-390, Pasadena, CA, June 1995.
26. IBM Netfinity Director Software Rejuvenation - White Paper. IBM Corporation,
Research Triangle Park, NC, January 2001.
27. P. Jalote, Y. Huang and C. Kintala. A Framework for Understanding and Handling
Transient Software Failures. In Proc. 2nd ISSAT Int. Conf. on Reliability and
Quality in Design, Orlando, FL, 1995.
28. J. C. Laprie, J. Arlat, C. Béounes, K. Kanoun and C. Hourtolle. Hardware and Soft-
ware Fault Tolerance: Definition and Analysis of Architectural Solutions. In Proc.
of 17th Symp. on Fault Tolerant Computing, pp 116-121, Pittsburgh, PA, 1987.
29. J. C. Laprie (Ed.). Dependability: Basic Concepts and Terminology. Springer-
Verlag, Wien, New York, 1992.
30. I. Lee and R. K. Iyer. Software Dependability in the Tandem GUARDIAN System.
IEEE Trans. on Software Engineering, pp 455-467, Vol. 21, No. 5, May 1995.
31. L. Li, K. Vaidyanathan and K. S. Trivedi. An Approach to Estimation of Soft-
ware Aging in a Web Server. In Proc. of the Int. Symp. on Empirical Software
Engineering, ISESE 2002, Nara, Japan, October 2002 (to appear).
32. E. Marshall. Fatal Error: How Patriot Overlooked a Scud. Science, pp 1347, March
13 1992.
33. D. Mosberger and T. Jin. Httperf - A Tool for Measuring Web Server Performance.
In First Workshop on Internet Server Performance, WISP, Madison, WI, pp 59-67,
June 1998.
34. A. Pfening, S. Garg, A. Puliafito, M. Telek and K. S. Trivedi. Optimal Rejuvenation
for Tolerating Soft Failures. Performance Evaluation, 27 & 28, pp 491-506, October
1996.
35. D. K. Pradhan. Fault-Tolerant Computer System Design. Prentice Hall, Englewood
Cliffs, NJ, 1996.
36. S. M. Ross. Stochastic Processes. John Wiley & Sons, New York, 1983.
37. R. A. Sahner, K. S. Trivedi, A. Puliafito. Performance and Reliability Analysis
of Computer Systems - An Example-Based Approach Using the SHARPE Software
Package. Kluwer Academic Publishers, Norwell, MA, 1996.
38. R. H. Shumway and D. S. Stoffer. Time Series Analysis and Its Applications,
Springer-Verlag, New York, 2000.
39. K. Smith and M. Seltzer. File System Aging - Increasing the Relevance of File
System Benchmarks. In Proc. of ACM SIGMETRICS, June 1997.
40. M. Sullivan and R. Chillarege. Software Defects and Their Impact on System
Availability - A Study of Field Failures in Operating Systems. In Proc. 21st IEEE
Int. Symp. on Fault Tolerant Computing, pages 2–9, 1991.
41. A. T. Tai, S. N. Chau, L. Alkalaj, and H. Hecht. On-board Preventive Mainte-
nance: Analysis of Effectiveness and Optimal Duty Period. In Proc. of 3rd Int.
Workshop on Object-oriented Real-time Dependable Systems, Newport Beach, Cal-
ifornia, February 1997.
42. K. S. Trivedi, J. Muppala, S. Woolet and B. R. Haverkort. Composite Performance
and Dependability Analysis. Performance Evaluation, Vol. 14, Nos. 3-4, pp 197-
216, February 1992.
43. K. S. Trivedi. Probability and Statistics, with Reliability, Queuing and Computer
Science Applications, 2nd edition. John Wiley, 2001.
44. K. Vaidyanathan and K. S. Trivedi. A Measurement-Based Model for Estimation
of Resource Exhaustion in Operational Software Systems. In Proc. of the Tenth
IEEE Int. Symp. on Software Reliability Engineering, pp 84-93, Boca Raton, FL,
November 1999.
45. K. Vaidyanathan, R. E. Harper, S. W. Hunter, K. S. Trivedi. Analysis and Imple-
mentation of Software Rejuvenation in Cluster Systems. In Proc. of the Joint Int.
Conf. on Measurement and Modeling of Computer Systems, ACM SIGMETRICS
2001/Performance 2001, Cambridge, MA, June 2001.
46. http://www.apache.org
47. http://www.software-rejuvenation.com
Performance Validation of Mobile Software Architectures

Vincenzo Grassi¹, Vittorio Cortellessa², Raffaela Mirandola¹

¹ Dipartimento di Informatica, Sistemi e Produzione
Università di Roma "Tor Vergata", Italy
grassiv@acm.org, mirandola@info.uniroma2.it

² Dipartimento di Informatica
Università de L'Aquila, Italy
cortelle@univaq.it
Abstract. Design paradigms based on the idea of code mobility have recently been
introduced, in which components of an application may (autonomously or upon
request) move to different locations during the application execution. Moreover,
software technologies (e.g. Java-based ones) that provide tools to implement these
paradigms are readily available. Based on mobile code paradigms and technologies,
different but functionally equivalent software architectures can be defined, and it
is widely recognized that, in general, the adoption of a particular architecture can
have a large impact on quality attributes such as modifiability, reusability,
reliability, and performance. Hence, validation against specific attributes is
necessary and calls for careful planning of this activity. Within this framework,
the goal of this tutorial is twofold: to provide a general methodology for the
validation of software architectures, where the focus is on the transition from the
modeling of software architectures to the validation of non-functional requirements;
and to substantiate this general methodology in the specific case of software
architectures exploiting mobile code.
1 Introduction
The pervasive deployment of large-scale networking infrastructures is vastly changing
the architecture of software systems and applications, leading to more and more
applications designed to operate in distributed wide area environments, and thus
introducing new challenges for architects of scalable distributed applications. Indeed,
the large number of available hosts with very different capabilities, connected by
networks with varying capacities and loads, implies that the designer is unlikely to
know a priori how to structure the application in a way that best leverages the
available infrastructure, and that any assumption about the underlying physical
system made early at design time is unlikely to hold later.
This highly heterogeneous and dynamic environment raises problems that could be
considered negligible in local area environments. As a consequence, technologies,
architectures and methodologies traditionally used to develop distributed applications
in local area environments, usually based on the notion of location transparency,
exhibit several limitations in wide area environments and often fail to provide the
desired quality level. By contrast, location awareness has been suggested as an
innovative approach to the design of software applications for wide area environments,
in order to deal with the characteristics and constraints of the different locations
from the early design phases. Explicitly considering component location at the
application level naturally leads to exploiting location change as a new dimension in
the design and implementation of distributed applications. Indeed, mobile code design
paradigms, based on the ability to move code across the nodes of a network, have
recently been introduced. Moreover, software technologies (e.g. Java-based ones) that
provide tools to implement these paradigms are readily available, so that both have
become a central part of the toolset supporting the design of applications for wide
area environments.

M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 346–373, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Code mobility, as it is intended in this perspective, should not be confused with the well-known concept of process migration, even if the mechanisms adopted to implement them may be similar. Process migration is a (distributed) OS issue, realized transparently to the application (usually to achieve load balancing), and hence does not represent a tool in the hands of the application designer; on the contrary, code mobility is intended to bring the ability to change location under the control of the designer, thus representing a new tool he/she can exploit to accomplish quality requirements, and laying the foundation for a new generation of technologies, architectures, models, and applications.
Using mobile code paradigms and technologies, different but functionally equivalent software architectures can be designed and implemented, and it is widely recognized that, in general, the adoption of a particular architecture can have a large impact on quality attributes of a distributed application such as modifiability, reusability, reliability, and performance [33]. In particular, with respect to performance, code mobility offers application designers new latitude in using system resources. Remote resources no longer need to be accessed remotely; instead, (part of) the application can move to use the resources locally. Under the right circumstances, this can reduce both network traffic and network protocol overhead, thus reducing the total amount of work done by the system and improving the performance of the entire system. On the other hand, under the wrong circumstances, the entire system slows down, e.g., because of excessive migration traffic or increased load at already congested nodes. Hence, validation of mobility-based architectures against specific performance attributes is necessary, and calls for careful planning of this activity.
The goal of this tutorial is twofold: to provide a general methodology for the validation of software architectures, where the focus is on the transition from the modeling of software architectures to the validation of non-functional requirements; and to show how this general methodology can be substantiated in the specific case of software architectures exploiting mobile code. We emphasize the former point in section 2, where we provide a taxonomy of the parameters that every approach to software architecture validation should depend on. Then, for the latter point, we review approaches for the validation of mobile software architectures, presenting them in the framework of the above mentioned taxonomy. To provide a basic understanding of the features and performance-related costs of different mobile code styles, we first give in section 3 a basic taxonomy of these styles, and then, in section 4, we survey methodologies for performance validation of mobile software architectures. We classify these methodologies as ad-hoc and general-purpose methodologies. Ad-hoc methodologies consider code mobility "in isolation", lacking features to model a whole software application. General-purpose methodologies overcome this limitation by embedding code mobility modeling into some formalism for the specification of software applications. Finally, section 5 concludes the paper and provides hints for future research.
1
Note that an architectural style is defined as a set of construction rules that a developer has to follow while building a software architecture. Depending on the style, those rules can spread over different aspects, such as: types of interactions among components, roles of components, types of connectors, etc. In our case, we focus on architectural styles defined on the capability of components to move.
2
Usually the missing information appears (in the whole approach) either as annotations on the available software architecture description or as an integration of the description itself (in the latter case, for example, as an extension of a software connector).
350 V. Grassi, V. Cortellessa, and R. Mirandola
[Figure 1: nodes original notation (ON), architectural style (AS), non-functional attribute (NFA), missing information (MI), target model notation (TMN), collection technique (CT), solution technique (ST)]
Figure 1. Graph of dependencies among parameters for NFR validation
3 Mobile Code Paradigms
The definition of the software architecture of an application, that is, its coarse-grained organization in terms of components and interactions among them, represents one of the first and crucial steps in the design stage [5]. Software architectures can be classified according to the adopted design paradigms (or architectural styles), each of them characterized by specific architectural abstractions/rules and reference structures that can then be instantiated into actual architectures. Client-server is a traditional example of design style. In this perspective, different mobile code styles can be identified, each characterized by different interaction patterns among components located at different sites, and the available technologies for code mobility provide the mechanisms to instantiate them. A review of these technologies is out of the scope of this paper (see [13, 17] for a review of notable examples of them); in this section we instead present a taxonomy of mobile code styles, aimed at providing a basic understanding of their features and performance-related costs, which will be exploited in the subsequent presentation of performance validation methodologies for mobile architectures³.
The taxonomy is largely inspired by the ones presented in [13, 29], and is based on the decomposition of distributed applications into code components (the know-how to perform a computation), resource components (references to resources needed to perform a computation), state components (comprising private data as well as control information that identifies a thread of execution, such as the call stack and instruction pointer), interactions (events involving two or more components, like exchanging a message), and sites (locations where processing takes place).
All the basic mobile code styles included in this taxonomy consider a single interaction between components residing at two different sites, aimed at carrying out a given operation. They differ in the distribution of components at the two sites at the beginning of the interaction, in the interaction pattern, and in the distribution of components at the end of the interaction, as shown in table 1, where A and B denote the components that participate in the interaction, C and R denote the code and
3
Of course, knowledge of the characteristics of a particular mobile code technology would be necessary to fine-tune, in the late phases of the development cycle, the performance model of a mobile software architecture.
4
We prefer to call this style "remote execution (REX)" as in [29] instead of the often used denomination "remote evaluation (REV)" to avoid ambiguity with the REV scheme proposed in [34], which is a particular mechanism that implements this style.
4.1 Ad-hoc Models
Ad-hoc models consider code mobility "in isolation", providing cost models for a single interaction between components, for different mobile code styles. From our validation framework viewpoint, they consider NFA ∈ {total network load, total processing time}, while the adopted TMN consists of either closed-form analytic expressions or dynamic models that can be numerically evaluated. Because they lack features to model a whole application, ad-hoc models cannot be considered general validation methodologies. However, they help clarify the dependencies between (AS, NFA) and MI when AS spreads over mobility-based styles, by giving insights about the quantities that affect the selected NFAs, and hence about the information that should be collected in any validation methodology for these attributes.
We review ad-hoc models proposed in the literature⁵, presenting all of them in a unified scenario, consisting of a single "interaction session" between two partners (A and B) residing at different locations, with A requesting B to execute an operation that can be articulated in N "low level" requests, and corresponding (intermediate) results.
Closed-Form Models. We present all the models reviewed in this section as special instances of general closed-form expressions for the average total network load and the average total processing time, denoted as $L^X$ and $T^X$ respectively, with X ∈ {REX, COD, MA}.
Common parameters used in all the closed forms (hence representing the missing information MI for the considered measures) are:
req: average size (in bytes) of a "low level" operation request;
rep: average size (in bytes) of a "low level" result;
$\alpha^X$⁶: communication overhead; $B_{req}^X$: average size of a single request;
5
Most of the papers considered in this section also present models for the client-server style, for the sake of comparison with code mobility styles.
6
This coefficient takes into account the overhead caused by additional information needed for connection setup and message encapsulation; in general, the coefficient $\alpha^X$, X ∈ {COD,
$B_{rep}^X$: average size of a single reply;
$\tau^X$: network throughput (in bytes/sec); $\lambda^X$: average network latency;
$M^X$: average marshalling/unmarshalling time of a request/reply;
$T_{req}^X$: average processing time (for A) of a request;
$T_{rep}^X$: average processing time (for B) of a reply;
$\sigma^X$: semantic compression factor for replies ($0 < \sigma^X \le 1$).
Other parameters, used only in some closed forms, are listed in the following.
REX style. A assembles the original requests into fewer than N "high-level" operation requests (at most, they are all assembled into a single operation), sends them together with the corresponding code to B, and gets the corresponding replies. Closed-form expressions for the network load and the processing time are:
$$L^{REX} = R^{REX}\,\alpha^{REX}\left(B_{code}^{REX} + \sigma^{REX} B_{rep}^{REX}\right)$$

$$T^{REX} = \frac{1}{\tau^{REX}}\,L^{REX} + R^{REX}\left(\lambda^{REX} + M^{REX} + T_{req}^{REX} + T_{rep}^{REX}\right)$$

where: $R^{REX}$: number of "high level" operation requests needed to complete the interaction;
$B_{code}^{REX}$: average size of the code of a high level operation sent to B for remote evaluation.
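To make the dependence of the REX measures on the MI parameters concrete, the two closed forms can be evaluated directly. The following is a minimal Python transcription, assuming our own parameter names (alpha for the communication overhead, tau for throughput, lam for latency, sigma for the semantic compression factor) and treating the overhead coefficient as a constant rather than a function of message size; the numeric values are illustrative only and are not taken from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class RexParams:
    """Parameters of the REX closed forms (names chosen for this sketch)."""
    R: float            # number of "high level" operation requests in the session
    alpha: float        # communication overhead coefficient
    B_code: float       # avg size (bytes) of the code of one high level operation
    B_rep: float        # avg size (bytes) of a single reply
    sigma: float        # semantic compression factor for replies, 0 < sigma <= 1
    tau: float          # network throughput (bytes/sec)
    lam: float = 0.0    # avg network latency (sec)
    M: float = 0.0      # avg marshalling/unmarshalling time of a request/reply (sec)
    T_req: float = 0.0  # avg processing time of a request (sec)
    T_rep: float = 0.0  # avg processing time of a reply (sec)


def rex_load(p: RexParams) -> float:
    """Average total network load: L = R * alpha * (B_code + sigma * B_rep)."""
    return p.R * p.alpha * (p.B_code + p.sigma * p.B_rep)


def rex_time(p: RexParams) -> float:
    """Average total processing time: T = L / tau + R * (lam + M + T_req + T_rep)."""
    return rex_load(p) / p.tau + p.R * (p.lam + p.M + p.T_req + p.T_rep)


# Illustrative values shaped like the [2] row of Table 2: a single high level
# operation (R = 1), B_rep = N * rep, no semantic compression (sigma = 1).
N, rep = 10, 500
p = RexParams(R=1, alpha=1.25, B_code=2000, B_rep=N * rep, sigma=1.0, tau=1e6)
print(rex_load(p), rex_time(p))
```

Grouping the parameters in a dataclass keeps each row of Table 2 expressible as one object, so different instantiations can be compared side by side.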
Table 2. Proposed parameter instantiations in the closed forms for the REX style

      | $R^{REX}$ | $\alpha^{REX}$ | $B_{code}^{REX}$ | $B_{rep}^{REX}$ | $\sigma^{REX}$ | $\lambda^{REX}$ | $M^{REX}$ | $T_{req}^{REX}$ | $T_{rep}^{REX}$
[2]   | 1 | > 1 | > 0 | N·rep | 1, 1/N | - | - | - | -
[21]⁷ | ≥ 1, < N | > 1 | > 0 ($req \cdot N / R^{REX}$) | > 0 ($rep \cdot N / R^{REX}$) | 1 | 0 | > 0 | > 0 | > 0
$$L^{COD} = R^{COD}\,\alpha^{COD}\left(B_{req}^{COD} + P_{code}^{COD}\left(B_{fetch}^{COD} + B_{code}^{COD}\right) + \sigma^{COD} B_{rep}^{COD}\right)$$

$$T^{COD} = \frac{1}{\tau^{COD}}\,L^{COD} + R^{COD}\left(\lambda^{COD} + M^{COD} + T_{req}^{COD} + T_{rep}^{COD}\right)$$
REX, MA} may depend on the size of the data exchanged in the communication, that is, the overhead coefficient for data of size Y is $\alpha^X = \alpha^X(Y)$.
7
The authors in [21] call the REX style the "stationary agent access" (SA) style.
where:
$R^{COD}$: average number of "high level" operations needed to complete the interaction;
$P_{code}^{COD}$: probability that the code for a high level operation is not already present at the location of B;
$B_{fetch}^{COD}$: average size of the request for the code of a high level operation sent by B;
$B_{code}^{COD}$: average size of the code of a high level operation.
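[Table 3, parameter instantiations for the COD style; columns: $R^{COD}$, $\alpha^{COD}$, $B_{req}^{COD}$, $P_{code}^{COD}$, $B_{code}^{COD}$, $B_{fetch}^{COD}$, $\sigma^{COD}$, $B_{rep}^{COD}$, $\lambda^{COD}$, $M^{COD}$, $T_{req}^{COD}$, $T_{rep}^{COD}$]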
We point out that the model for the processing time of the COD style has been extrapolated from the models for the other styles, since no explicit model for the processing time of this style is present in the literature. For this reason, no specific instantiation of parameters in the latter four columns of table 3 is given.
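The COD load differs from REX mainly through the code-fetch term weighted by the probability that the code is not already at B. The sketch below evaluates the COD closed forms in Python and shows how that probability (a proxy for code caching at B) changes the load; as in the text, the processing-time form is only the one extrapolated from the other styles. Parameter names are our own labels and the numbers are illustrative.

```python
def cod_load(R, alpha, B_req, P_code, B_fetch, B_code, sigma, B_rep):
    """Average total network load of the COD style:
    L = R * alpha * (B_req + P_code * (B_fetch + B_code) + sigma * B_rep).
    The code fetch (request of B_fetch bytes plus code of B_code bytes)
    is paid only with probability P_code, i.e. when the code is not yet at B."""
    return R * alpha * (B_req + P_code * (B_fetch + B_code) + sigma * B_rep)


def cod_time(R, alpha, B_req, P_code, B_fetch, B_code, sigma, B_rep,
             tau, lam, M, T_req, T_rep):
    """Processing time in the form extrapolated from the other styles:
    T = L / tau + R * (lam + M + T_req + T_rep)."""
    load = cod_load(R, alpha, B_req, P_code, B_fetch, B_code, sigma, B_rep)
    return load / tau + R * (lam + M + T_req + T_rep)


# Effect of code availability at B: always fetch, fetch half the time, never fetch.
for p_code in (1.0, 0.5, 0.0):
    print(p_code, cod_load(R=10, alpha=1.0, B_req=100, P_code=p_code,
                           B_fetch=50, B_code=2000, sigma=1.0, B_rep=400))
```

With these values the load drops from 25500 to 5000 bytes as P_code goes from 1 to 0, which is the kind of insight into MI that ad-hoc models are meant to provide.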
MA style. A moves to the B site to interact locally with B. Then, it can go back to the starting site, or move to some other site, carrying with it the information accumulated at the B site; in the latter case, it can optionally send the collected information back to the starting site. Closed-form expressions for the network load and the processing time are:
$$L^{MA} = \alpha^{MA}\Big(\big(P_{code}^{MA} + \mathit{back}\cdot\mathit{code}\big)\,B_{code}^{MA} + (1 + \mathit{back})\big(B_{state}^{MA} + B_{data}^{MA}\big) + (1 - \mathit{back})\,\mathit{rep}\,\sigma^{MA} B_{rep}^{MA}\Big)$$

$$T^{MA} = \frac{1}{\tau^{MA}}\,L^{MA} + \lambda^{MA} + M^{MA} + T_{req}^{MA} + T_{rep}^{MA}$$

where: $P_{code}^{MA}$: probability that the code of the mobile agent is not already present at the location of B;
$B_{code}^{MA}$: average size of the mobile agent code;
$B_{state}^{MA}$: average size of the mobile agent execution state;
$B_{data}^{MA}$: average size of the mobile agent data (before the interaction starts);
back: $\mathit{back} = 1$ if the agent goes back to the starting location, $\mathit{back} = 0$ otherwise;
code⁸: $\mathit{code} = 1$ if the agent code is not retained at the return location, $\mathit{code} = 0$ otherwise;
rep: $\mathit{rep} = 1$ if a "high level" reply is sent to the starting location, $\mathit{rep} = 0$ otherwise.
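The MA closed forms combine three 0/1 indicators with size parameters, so it is easy to mis-evaluate them by hand. The following Python sketch transcribes them directly, assuming our own names (alpha, tau, lam, sigma as in the other styles) and checking that back, code and rep are indicators; the two scenarios and their numbers are illustrative only.

```python
def ma_load(alpha, P_code, back, code, B_code, B_state, B_data, rep, sigma, B_rep):
    """Average total network load of the MA style:
    L = alpha * ((P_code + back*code) * B_code
                 + (1 + back) * (B_state + B_data)
                 + (1 - back) * rep * sigma * B_rep)"""
    for flag in (back, code, rep):
        if flag not in (0, 1):
            raise ValueError("back, code and rep are 0/1 indicators")
    return alpha * ((P_code + back * code) * B_code
                    + (1 + back) * (B_state + B_data)
                    + (1 - back) * rep * sigma * B_rep)


def ma_time(load, tau, lam, M, T_req, T_rep):
    """T = L / tau + lam + M + T_req + T_rep (a single move, so no factor R)."""
    return load / tau + lam + M + T_req + T_rep


# Round trip with the code retained at A (code = 0): only state and data travel back.
round_trip = ma_load(alpha=1.0, P_code=1.0, back=1, code=0, B_code=5000,
                     B_state=200, B_data=300, rep=0, sigma=1.0, B_rep=1000)
# One-way move that sends a compressed "high level" reply to A instead.
one_way = ma_load(alpha=1.0, P_code=1.0, back=0, code=0, B_code=5000,
                  B_state=200, B_data=300, rep=1, sigma=0.5, B_rep=2000)
print(round_trip, one_way)
```

Note that when back = 1 the reply term vanishes, matching the remark in footnote 8 that a reply can be sent only if the agent does not go back.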
                  | [35] | [9] | [2] | [22] | [21]
$\alpha^{MA}$     | 1 | 1 | > 1 | > 1 | > 1
$P_{code}^{MA}$   | ≥ 0, ≤ 1 | ≥ 0, ≤ 1 | 1 | 1 | 1
$B_{code}^{MA}$   | > 0 (≥ N·req) | > 0 | > 0 | > 0 | > 0
$B_{state}^{MA}$  | > 0 | > 0 | > 0 | > 0 | > 0
$B_{data}^{MA}$   | ≥ 0 | ≥ 0 | ≥ 0 | ≥ 0 | 0
$\sigma^{MA}$     | > 0, ≤ 1 | 1 | 1, 1/N | > 0, ≤ 1 | 1
$B_{rep}^{MA}$    | N·rep | N·rep | N·rep | N·rep | > 0
back              | 0 | 0, 1 | 0 | 0 | 1
code              | - | 1 | - | - | 0
rep               | 0, 1 | 0 | 0, 1 | 1 | 1
$\lambda^{MA}$    | $(1 + \mathit{rep})\,\delta$ ⁹ | - | - | 0 | 0
$M^{MA}$          | $2\mu\,(B_{data}^{MA} + B_{state}^{MA} + \mathit{rep}\,B_{rep}^{MA})$ ¹⁰ | - | - | > 0 | > 0
$T_{req}^{MA}$    | 0 | - | - | 0 | > 0
$T_{rep}^{MA}$    | 0 | - | - | > 0 | > 0
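Plugging the [35] column of Table 4 into the MA processing time gives a model that depends only on sizes, the round trip time δ and the marshalling factor μ. The Python sketch below does this substitution; since the source uses "rep" both for the size of a low-level result and for the reply indicator, the sketch separates them as rep_size and rep_flag (our names). All other names and the sample values are assumptions for illustration.

```python
def ma_time_col35(B_code, B_state, B_data, N, rep_size, rep_flag, sigma, P_code,
                  tau, delta, mu):
    """MA processing time with the [35] column of Table 4 substituted:
    alpha = 1, back = 0, latency = (1 + rep) * delta,
    M = 2 * mu * (B_data + B_state + rep * B_rep), T_req = T_rep = 0,
    B_rep = N * rep_size (rep_size: avg size of a "low level" result)."""
    B_rep = N * rep_size
    # Network load with alpha = 1 and back = 0: the code term reduces to P_code * B_code.
    load = P_code * B_code + (B_state + B_data) + rep_flag * sigma * B_rep
    # Latency instantiation (1 + rep) * delta, delta being the average round trip time.
    latency = (1 + rep_flag) * delta
    # Marshalling excludes the code, assumed already available in transport format.
    marshalling = 2 * mu * (B_data + B_state + rep_flag * B_rep)
    return load / tau + latency + marshalling


with_reply = ma_time_col35(B_code=4000, B_state=100, B_data=0, N=10, rep_size=50,
                           rep_flag=1, sigma=0.5, P_code=1.0,
                           tau=1000.0, delta=0.01, mu=1e-5)
without_reply = ma_time_col35(B_code=4000, B_state=100, B_data=0, N=10, rep_size=50,
                              rep_flag=0, sigma=0.5, P_code=1.0,
                              tau=1000.0, delta=0.01, mu=1e-5)
print(with_reply, without_reply)
```

Keeping the column as a separate function makes it explicit which terms the [35] instantiation sets to constants and which remain free MI parameters.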
In all the considered models for the MA style (with the exception of [22], where it is unspecified) it is assumed that, after the completion of the interaction, the agent data grow as follows: $B_{data}^{MA} \leftarrow B_{data}^{MA} + \sigma^{MA} B_{rep}^{MA}$. In this way, the (possible) accumulation of information collected by the mobile agent as it visits new sites is modeled.
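The growth rule matters when an agent visits several sites, because every subsequent move carries the data collected so far. As an illustration only, the Python sketch below chains one-way MA moves (back = 0, rep = 0) over a hypothetical itinerary and applies the growth rule after each visit; chaining the single-session formula this way is our assumption, not a model taken from the cited papers.

```python
def itinerary(hops, alpha, P_code, B_code, B_state, B_data0, sigma, B_rep):
    """Per-hop network load of an agent visiting `hops` sites in sequence.

    Each hop is priced like a one-way MA move with back = 0 and rep = 0,
    i.e. alpha * (P_code * B_code + B_state + B_data); after each visit
    the agent data grow as B_data <- B_data + sigma * B_rep."""
    loads, B_data = [], B_data0
    for _ in range(hops):
        loads.append(alpha * (P_code * B_code + B_state + B_data))
        B_data += sigma * B_rep  # information collected at the site just visited
    return loads, B_data


loads, final_data = itinerary(hops=3, alpha=1.0, P_code=1.0, B_code=1000,
                              B_state=100, B_data0=0, sigma=0.5, B_rep=200)
print(loads, final_data)   # [1100.0, 1200.0, 1300.0] 300.0
```

The per-hop load grows linearly with the number of visited sites, which is why the semantic compression factor σ becomes important for long itineraries.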
With regard to the parameter instantiations shown in table 4, it should be noted that the marshalling/unmarshalling overhead of [35] is calculated under the assumption that the agent code is already available in transport format. [22] analyzes a broadcast
8
If the agent goes back to the starting location A, then code = 0 means that only its data (the original data plus the data collected at location B) and its execution state actually go back, since a copy of the (immutable) code has been retained there; the term (1 - back) that multiplies rep means that a reply can be sent to the starting location only if the agent does not go back there.
9
δ denotes the average round trip time (in secs.).
10
μ > 0 represents a marshalling/unmarshalling factor (in secs/byte); this factor is multiplied by 2 to take into account both marshalling and unmarshalling of a message.
4.2 General-Purpose Formal Models: Process Algebras
The validation methodologies considered in this section are based on the selection of ON = {Process Algebras}, and do not focus on specific NFAs. Hence, the adopted TMN is general enough to allow the evaluation of different NFAs, and consists of TMN = {Stochastic Process Algebras + associated Markov Processes}; correspondingly, possible STs consist of any solution technique suitable for this TMN. From these choices it follows that MI includes at least the (exponential) completion rates of all the activities that are modeled in the adopted Process Algebra.
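As a minimal illustration of how such completion rates turn into performance measures, assume a process whose behaviour is a single cyclic sequence of activities with exponentially distributed durations. The associated Markov process is then a simple cycle, and its steady state has a closed form. The Python sketch below computes it and checks global balance; the three activities and their rates are hypothetical and not taken from a specific stochastic process algebra tool.

```python
def cyclic_ctmc(rates):
    """Steady state of a CTMC cycling through activities 0 -> 1 -> ... -> n-1 -> 0,
    where activity i completes at exponential rate rates[i].

    Global balance pi[i] * rates[i] = pi[i-1] * rates[i-1] gives pi[i]
    proportional to the mean duration 1 / rates[i]; one full cycle
    completes at rate 1 / sum(1 / r)."""
    if any(r <= 0 for r in rates):
        raise ValueError("rates must be positive")
    mean = [1.0 / r for r in rates]
    cycle = sum(mean)
    pi = [m / cycle for m in mean]
    return pi, 1.0 / cycle


def balance_residual(pi, rates):
    """Largest violation of pi Q = 0 for the cyclic generator Q."""
    n = len(rates)
    return max(abs(pi[i] * rates[i] - pi[i - 1] * rates[i - 1]) for i in range(n))


# Hypothetical loop: ship code (rate 2/s), remote work (4/s), return reply (2/s).
pi, throughput = cyclic_ctmc([2.0, 4.0, 2.0])
print(pi, throughput)
```

For more general behaviours the generator is no longer a cycle, and a numerical solver for πQ = 0 would replace the closed form.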
Process algebras are well-known formalisms for the modeling and analysis of parallel and distributed systems. What makes them attractive as ON for the evaluation of large and complex systems are mainly their compositional and abstraction features, which facilitate building complex system models from smaller ones. Moreover, they are equipped with a formal semantics, which allows a non-ambiguous system specification, and with a calculus that allows one to prove rigorously whether some functional properties hold. Stochastic process algebras are an extension of these formalisms with stochastic features for the specification of the duration of system activities, which allow the analysis of quantitative non-functional properties. We defer to the vast available literature for details about the general characteristics of these formalisms (e.g., [14]), and focus in this section on process algebras for the modeling of mobile software architectures. We
from these definitions we get that P || Q evolves into P1 || Q1, that is, processes P and Q synchronize (i.e., wait for each other) thanks to a communication along link a, and
11
Note that, for the sake of simplicity, this syntax is incomplete, since we are omitting constructs to define abstraction mechanisms, recursive behavior, etc.
12
Subscript i is used to distinguish different internal actions, which is useful for modeling purposes.
then proceed in parallel (possibly independently of each other, if no other synchronizing communication takes place in their subsequent behavior).
π-calculus [24]. This algebra, besides synchronization between parallel processes, also allows the communication of link names, so that we can change the links a process uses to communicate with other processes. The possible system actions are π ∈ {τ_i (i = 1, 2, …), in x, out x, in x(y), out x(Y)}, where, in addition to the above definitions, Y (y) is a "link name" (link variable) sent (received) over the link named x. For example, with the following specifications:
P1 := out a(b).P3    P2 := out a(c).P3    Q := in a(y).out y.Q1
we get that P1 || Q evolves into P3 || out b.Q1, while P2 || Q evolves into P3 || out c.Q1. In this example, the parallel composition of Q with P1 or P2 gives rise to the evolution of Q into a process that communicates along the link b or c, respectively, and then behaves as process Q1.
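The link-passing step in this example can be mimicked with a few lines of code. The following Python sketch is a toy, written only for this illustration: a process is a sequence of prefixes plus a continuation name, and one communication removes the matching prefixes and substitutes the received link name for the bound variable (name shadowing and choice are ignored).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Act:
    kind: str            # "out" or "in"
    link: str            # link name x
    obj: Optional[str]   # name sent (out) / variable bound (in); None for plain in x, out x


Proc = Tuple[Tuple[Act, ...], str]   # (sequence of prefixes, continuation process name)


def show(p: Proc) -> str:
    acts, cont = p
    return ".".join([f"{a.kind} {a.link}" + (f"({a.obj})" if a.obj else "")
                     for a in acts] + [cont])


def communicate(sender: Proc, receiver: Proc) -> Tuple[Proc, Proc]:
    """out x(v).P || in x(y).Q  ->  P || Q{v/y} (toy version: no shadowing handled)."""
    (s_acts, s_cont), (r_acts, r_cont) = sender, receiver
    out, inp = s_acts[0], r_acts[0]
    if not (out.kind == "out" and inp.kind == "in" and out.link == inp.link):
        raise ValueError("no matching out/in on a shared link")

    def sub(name: str) -> str:
        # replace the bound variable by the link name actually received
        return out.obj if name == inp.obj else name

    rest = tuple(Act(a.kind, sub(a.link), sub(a.obj) if a.obj else None)
                 for a in r_acts[1:])
    return (s_acts[1:], s_cont), (rest, r_cont)


Q: Proc = ((Act("in", "a", "y"), Act("out", "y", None)), "Q1")
for sent in ("b", "c"):
    P: Proc = ((Act("out", "a", sent),), "P3")
    p, q = communicate(P, Q)
    print(show(p), "||", show(q))   # P3 || out b.Q1, then P3 || out c.Q1
```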
HOπ-calculus [30]. Besides the operations of the π-calculus, this algebra also allows the communication of process names, so that we can change the behavior of the receiving process. The possible system actions are again π ∈ {τ_i (i = 1, 2, …), in x, out x, in x(y), out x(Y)}, but, in addition to the above definitions, Y (y) may also be a "process name" (process variable), besides a link name (variable), sent (received) over the link named x. For example, with the following specifications:
P1 := out a(R).P3    P2 := out a(S).P3    Q := in a(z).z.Q1
we get that P1 || Q evolves into P3 || R.Q1, while P2 || Q evolves into P3 || S.Q1. In other words, the parallel composition of Q with P1 or P2 gives rise to the evolution of Q into a process that behaves like process R or S, respectively, and then as process Q1.
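The higher-order step differs only in what is substituted: the received name now stands for a whole process placed in the receiver's control flow. A toy Python sketch of this, assuming a process is a tuple of tokens where a token is either a process name/variable or an (kind, link, object) prefix; it is an illustration, not a HOπ implementation.

```python
from typing import Tuple, Union

Prefix = Tuple[str, str, str]   # ("out", link, sent) or ("in", link, variable)
Token = Union[str, Prefix]      # a process name/variable, or a prefix
Proc = Tuple[Token, ...]        # tokens composed sequentially with "."


def show(p: Proc) -> str:
    return ".".join(t if isinstance(t, str) else f"{t[0]} {t[1]}({t[2]})" for t in p)


def communicate(sender: Proc, receiver: Proc) -> Tuple[Proc, Proc]:
    """out x(R).P || in x(z).Q  ->  P || Q{R/z}: the transmitted process name
    replaces the process variable z wherever z appears as a token of Q."""
    out, inp = sender[0], receiver[0]
    if not (isinstance(out, tuple) and isinstance(inp, tuple)
            and out[0] == "out" and inp[0] == "in" and out[1] == inp[1]):
        raise ValueError("no matching out/in on a shared link")
    sent, var = out[2], inp[2]
    return sender[1:], tuple(sent if t == var else t for t in receiver[1:])


Q: Proc = (("in", "a", "z"), "z", "Q1")
for proc_name in ("R", "S"):
    p, q = communicate((("out", "a", proc_name), "P3"), Q)
    print(show(p), "||", show(q))   # P3 || R.Q1, then P3 || S.Q1
```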
Example 2¹³. Let us consider the system of example 1 in the case of K = 2 airline companies, with $F_i$ and $a_i$ (i = 1, 2) denoting a company and the channel used to communicate with it, C denoting the overall code corresponding to the N "low level" interactions, and $R_i$ the overall response collected at company $F_i$. Using the HOπ-calculus, this application could be modeled as follows, in the case of the REX paradigm (where Sys models the overall application):
TravAg = out a1(C).in a1(x).out a2(C).in a2(x).TravAg
F_i = in a_i(z).z.out a_i(R_i).F_i
Sys = TravAg || F1 || F2
EndOfExample 2.
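To see how Sys evolves, the sketch below replays it with a small untimed interpreter written only for this illustration (it is not a HOπ tool): names found in the hypothetical DEFS table are unfolded on demand, any other name in process position (here the received code C) is executed as one opaque internal step, and a communication substitutes the transmitted process name for the bound variable. The scheduler simply fires the first enabled out/in pair.

```python
from typing import Dict, List, Tuple, Union

Token = Union[str, Tuple[str, str, str]]   # process name/variable, or (kind, link, obj)
Proc = Tuple[Token, ...]

# Example 2 (REX style): TravAg ships code C to each company and collects R1, R2.
DEFS: Dict[str, Proc] = {
    "TravAg": (("out", "a1", "C"), ("in", "a1", "x"),
               ("out", "a2", "C"), ("in", "a2", "x"), "TravAg"),
    "F1": (("in", "a1", "z"), "z", ("out", "a1", "R1"), "F1"),
    "F2": (("in", "a2", "z"), "z", ("out", "a2", "R2"), "F2"),
}


def unfold(p: Proc, trace: List[str]) -> Proc:
    """Expand defined names; any other leading name (e.g. the received code C)
    is executed as one opaque internal step and recorded in the trace."""
    while p and isinstance(p[0], str):
        if p[0] in DEFS:
            p = DEFS[p[0]] + p[1:]
        else:
            trace.append(f"run {p[0]}")
            p = p[1:]
    return p


def step(procs: List[Proc], trace: List[str]) -> bool:
    """Perform the first enabled out/in synchronisation; False if none exists."""
    procs[:] = [unfold(p, trace) for p in procs]
    for i, s in enumerate(procs):
        for j, r in enumerate(procs):
            if (i != j and s and r and s[0][0] == "out" and r[0][0] == "in"
                    and s[0][1] == r[0][1]):
                _, link, sent = s[0]
                var = r[0][2]
                trace.append(f"{link}: {sent}")
                procs[i] = s[1:]
                procs[j] = tuple(sent if t == var else t for t in r[1:])
                return True
    return False


def simulate(steps: int) -> List[str]:
    procs: List[Proc] = [("TravAg",), ("F1",), ("F2",)]   # Sys = TravAg || F1 || F2
    trace: List[str] = []
    for _ in range(steps):
        if not step(procs, trace):
            break
    return trace


print(simulate(4))   # ['a1: C', 'run C', 'a1: R1', 'a2: C', 'run C', 'a2: R2']
```

One iteration of TravAg takes four synchronisations, and the trace reproduces the REX pattern of shipping C to each company and collecting R1 and R2.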
Models with "Direct" Location Specification. The above approaches suggest, as ON for the modeling of mobile architectures, a process algebra where the location of a process is indirectly defined in terms of its connectivity, i.e., the link names it sees and the identity of the processes it can communicate with at a given instant of time using those links; hence, the location of a process can be changed by changing the links it sees (by sending it new link names, as in the π-calculus, or by sending the process itself, i.e., its name, as in the HOπ-calculus, to a receiving process that has a
13
Adapted from [25].
Ambient calculus [7]. In this formalism, the concept of ambient is added to the basic constructs for process definition and composition described above. An ambient has a name that specifies its identity, and can be thought of as a sort of boundary that encloses a set of running processes. Ambients, denoted as n[P], where n is the ambient name and P is the enclosed process, can be entered, exited or opened (i.e., dissolved) by appropriate operations executed by a process, thus allowing movement to be modeled as the crossing of ambient boundaries. Ambients are hierarchically nested, and a process can only enter an ambient which is a sibling of its ambient in the hierarchy, and can only exit into the parent ambient of its ambient; hence, moving to a "far" ambient in the ambient hierarchy requires, in this formalism, the explicit crossing of multiple ambients. The mobility operations for an ambient n[.] are denoted by inamb n, outamb n, and open n, respectively¹⁴. In general, a process cannot forge them by itself, but receives them through the communication operations in and out. Hence, a process receiving one of these operations through a communication actually receives a capability for it, being allowed to execute the corresponding operation on the named ambient. The (partial) formal syntax of this algebra is then as follows:
P ::= 0 | π.P | P + P | P || P | n[P] | …
with π ∈ {τ_i (i = 1, 2, …), in(x), out(M), inamb n, outamb n, open n}, where x is a variable and M stands for either an ambient name (n), or a capability for an ambient (either inamb n, or outamb n, or open n). Communication is restricted to be local, i.e., only between processes enclosed in the same ambient. Communication between non-local processes requires the definition of some sort of "messenger" agent that explicitly crosses the required ambient boundaries, bringing with itself the information to be communicated. Alternatively, a process can move itself to the ambient of its partner before (locally) communicating with it. In both cases, the messenger or the moving process must possess the needed capabilities.
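The structural effect of the three mobility operations can be illustrated on a tree of named ambients. The Python sketch below models only the ambient hierarchy (no processes, no capability passing), assuming hypothetical ambient names net, siteA, siteB, agent and msg; open_amb is named so to avoid shadowing Python's built-in open. It shows that reaching a non-sibling ambient takes an explicit outamb followed by an inamb.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)   # identity semantics: two ambients with the same name stay distinct
class Amb:
    name: str
    children: List["Amb"] = field(default_factory=list)
    parent: Optional["Amb"] = field(default=None, repr=False)

    def add(self, child: "Amb") -> "Amb":
        child.parent = self
        self.children.append(child)
        return child


def _child(host: Amb, name: str, exclude: Optional[Amb] = None) -> Amb:
    for c in host.children:
        if c.name == name and c is not exclude:
            return c
    raise ValueError(f"no ambient {name!r} inside {host.name!r}")


def inamb(m: Amb, n: str) -> None:
    """m[inamb n.P | Q] | n[R]  ->  n[m[P | Q] | R]: m enters a sibling named n."""
    if m.parent is None:
        raise ValueError(f"{m.name!r} has no siblings")
    target = _child(m.parent, n, exclude=m)
    m.parent.children.remove(m)
    target.add(m)


def outamb(m: Amb, n: str) -> None:
    """n[m[outamb n.P | Q] | R]  ->  m[P | Q] | n[R]: m leaves its parent n."""
    parent = m.parent
    if parent is None or parent.name != n or parent.parent is None:
        raise ValueError(f"{m.name!r} is not inside an ambient {n!r} with a parent")
    parent.children.remove(m)
    parent.parent.add(m)


def open_amb(host: Amb, n: str) -> None:
    """open n.P | n[Q]  ->  P | Q: dissolve the boundary of child n of host."""
    target = _child(host, n)
    host.children.remove(target)
    for c in target.children:
        host.add(c)
    target.children.clear()
    target.parent = None


def path(a: Amb) -> str:
    names, node = [], a
    while node is not None:
        names.append(node.name)
        node = node.parent
    return "/".join(reversed(names))


net = Amb("net")
siteA, siteB = net.add(Amb("siteA")), net.add(Amb("siteB"))
agent = siteA.add(Amb("agent"))
agent.add(Amb("msg"))
outamb(agent, "siteA")     # cross the boundary of siteA ...
inamb(agent, "siteB")      # ... then enter the sibling siteB
print(path(agent))         # net/siteB/agent
open_amb(siteB, "agent")   # dissolve the agent boundary inside siteB
print([c.name for c in siteB.children])   # ['msg']
```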
KLAIM (Kernel Language for Agents Interaction and Mobility) [11]. This formalism allows the definition of a net of locations that are basically not nested into each other, with direct communication possible, in principle, between processes located at any location, unlike the ambient calculus (the extension to nested locations is possible, anyway). Another remarkable difference with respect to the ambient calculus, and to all the previously mentioned algebras, is the adoption of a generative (rather than message passing) style of communication, based on the use of tuple spaces and the
14 Note that the ambient operations are named in, out and open in the original paper [7]; we have renamed them to avoid confusion with the names used in this paper to denote the message passing communication operations.
Performance Validation of Mobile Software Architectures 361
15 Note that the tuple operations are named in, out, read and eval in the original paper [11]; we have renamed them to avoid confusion with the names used in this paper to denote the message passing communication operations.
362 V. Grassi, V. Cortellessa, and R. Mirandola
EndOfExample3
In general, methodologies for the translation of process algebras into a TMN suitable for the analysis of NFAs are based on the association of a stochastic duration (which hence represents an MI for these methodologies) with the activities specified in the process algebra, thus obtaining, as a first step, a stochastic process algebra that, in our framework, can be considered as an intermediate notation toward the final TMN. Then, starting from a stochastic process algebra model, a stochastic duration can be associated with each label in the labeled transition system that represents the operational semantics of the original model. If this duration is exponentially distributed (hence expressed by an exponential rate), then we get a continuous time Markov chain. A fundamental problem in making these approaches practically usable is how to give a meaningful value to the exponential rates associated with the transitions since, in a realistic system, their number is very high. The idea of [25] is to associate with each transition a label that records not merely the action associated with that transition (e.g., τ1, as in Example 3), but also the inference rules used during the deduction of the transition, so as to keep track of the “underlying operations” that lead to the execution of that action. For instance, in Example 3 the operation underlying the execution of action τ1 is a selection operation between the two concurrently enabled operations τ1 and τ2. These “enhanced” labels can be used to define a systematic way of deriving the rates to be associated with the system transitions. The enhanced labels are built using symbols from a suitable alphabet (e.g., {+, ||, …}) to record the inference rules used during the derivation of the transitions. For example, the transition rules given above would be rewritten as follows to get enhanced labels¹⁷:
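$$\pi.P \xrightarrow{\;\pi\;} P \qquad
\frac{P \xrightarrow{\;\vartheta\;} P'}{P + Q \xrightarrow{\;+\vartheta\;} P'} \qquad
\frac{P \xrightarrow{\;\vartheta\;} P'}{P \parallel Q \xrightarrow{\;\parallel\vartheta\;} P' \parallel Q} \qquad
\frac{P \xrightarrow{\;\theta\,\mathit{out}\,x\;} P', \quad Q \xrightarrow{\;\theta'\,\mathit{in}\,x\;} Q'}{P \parallel Q \xrightarrow{\;\langle \parallel\theta\,\mathit{out}\,x,\ \parallel\theta'\,\mathit{in}\,x \rangle\;} P' \parallel Q'}$$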
16 In the “standard” semantics of process algebras [24], the label of this latter rule is equal to τ, that is, an invisible action, since the two matching input and output operations “consume” each other, making them unobservable for an external “observer” (e.g. a process running in parallel).
17 Again, we are introducing a simplification: in a complete specification, different symbols should be used to distinguish the selection of the left or right alternative in a parallel or alternative choice composition (see [25]).
where an enhanced label is, in general, given by $\vartheta = \theta\pi$, with $\pi$ denoting, as before, a particular system action, and $\theta \in \{+, \parallel, \ldots\}^*$ denoting the sequence of inference rules followed to fire that action.
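The way enhanced labels accumulate rule symbols can be sketched for the prefix, choice and parallel rules. In the following minimal Python sketch, an enhanced label is represented as a pair (theta, action), where theta is a tuple of rule symbols; the term classes and the action names (tau1, tau2, tau3) are our own, the left and right alternatives are not distinguished (the simplification of footnote 17), and the communication rule is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Nil:                      # 0
    pass


@dataclass(frozen=True)
class Prefix:                   # pi.P
    action: str
    cont: Process


@dataclass(frozen=True)
class Choice:                   # P + Q
    left: Process
    right: Process


@dataclass(frozen=True)
class Par:                      # P || Q
    left: Process
    right: Process


Process = Union[Nil, Prefix, Choice, Par]


def transitions(p):
    """Yield (theta, action, successor); theta lists the inference rules used."""
    if isinstance(p, Prefix):
        # Prefix axiom: pi.P --pi--> P, with an empty rule sequence.
        yield (), p.action, p.cont
    elif isinstance(p, Choice):
        # Choice rule: prepend '+' (left and right are not distinguished here).
        for side in (p.left, p.right):
            for theta, a, succ in transitions(side):
                yield ("+",) + theta, a, succ
    elif isinstance(p, Par):
        # Parallel rule, applied to either component: prepend '||'.
        for theta, a, succ in transitions(p.left):
            yield ("||",) + theta, a, Par(succ, p.right)
        for theta, a, succ in transitions(p.right):
            yield ("||",) + theta, a, Par(p.left, succ)
    # Nil has no transitions; the communication rule is not modelled.


# (tau1.0 + tau2.0) || tau3.0
P = Par(Choice(Prefix("tau1", Nil()), Prefix("tau2", Nil())), Prefix("tau3", Nil()))
for theta, action, _ in transitions(P):
    print("".join(theta) + action)   # prints ||+tau1, then ||+tau2, then ||tau3
```

The labels ||+tau1 and ||+tau2 record that these actions were selected by a choice nested inside a parallel composition, which is exactly the information that the rate functions discussed next can exploit.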
Example 4. The transition system of Example 3 would be enhanced as follows:
By suitably defining the functions $b and $s, we can reduce the problem of calculating meaningful transition rates to the problem of defining only the cost of the “primitive” system actions and the slowing factors caused by a particular target architecture. Note that, with respect to our validation framework, this means that the methodology of [25], besides defining a method for the translation from ON to TMN, also gives a strong indication about which MI should be collected. Given this information, the calculation of the actual transition rates can be completely automated (although it should be remarked that the definition of the above functions is, in general, quite an ambitious task). It should also be noted that, by changing the definition of $s, we can analyze the impact of different target architectures on performance.
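Since the exact form of $b and $s is a design decision of the methodology in [25], the following Python sketch only illustrates the idea under an explicit assumption: the rate of a transition with enhanced label θπ is taken to be the basic rate $b(π) of the primitive action, multiplied by a slowing factor $s(r) for every inference rule r recorded in θ. This multiplicative form, the numbers and the names make_cost, basic_rate and slowdown are hypothetical. Swapping the slowdown table mimics the change of target architecture mentioned above.

```python
from math import prod


def make_cost(basic_rate, slowdown):
    """Return a cost function $ for enhanced labels (theta, action).

    Assumption (illustrative only): $(theta pi) = $b(pi) * product of $s(r)
    over the inference rules r in theta; basic_rate plays the role of $b and
    slowdown the role of $s on one target architecture.
    """
    def cost(theta, action):
        return basic_rate[action] * prod(slowdown[r] for r in theta)
    return cost


basic_rate = {"tau1": 2.0, "tau2": 4.0, "tau3": 1.0}   # hypothetical $b
arch_a = make_cost(basic_rate, {"+": 0.5, "||": 0.5})  # hypothetical $s, arch. A
arch_b = make_cost(basic_rate, {"+": 1.0, "||": 0.8})  # hypothetical $s, arch. B

label = (("||", "+"), "tau1")            # enhanced label ||+tau1
print(arch_a(*label), arch_b(*label))    # 0.5 1.6
```

A transition derived by the prefix axiom alone has an empty rule sequence, so its rate is just the basic rate of its action on every architecture.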
Once the rates of all the possible transitions from a given state (representing the system behaving like process P_i) have been determined, the overall rate from state P_i to another state P_j which is a one-step successor of state P_i is given by:
$$q(P_i, P_j) = \sum_{P_i \xrightarrow{\vartheta} P_j} \$(\vartheta)$$
(note that, in general, more than one transition from state P_i to state P_j may be present in the graph of the transition system). The process so obtained gives us information about the rate at which the system actions are performed. In general, to carry out a performance evaluation, we also need a reward structure, which associates with each state of the process a performance-related reward (or cost); this reward constitutes another possible MI that should be collected. In the described approach, these rewards could be calculated using a procedure similar to the one followed to calculate the transition rates. Besides being added to the Markov process derived from a process algebra specification, rewards could also be formally included in a stochastic process algebra, to allow formal reasoning about them. For a discussion of this topic, which is beyond the scope of this paper, see [18].
18 Note that in this presentation we have not explicitly considered the problem of how to calculate the rate of synchronization (i.e., communication) operations; for a discussion of this topic, see [18].
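Once the rates are known, the formula for q(P_i, P_j) and a per-state reward structure are enough to compute a steady-state performance measure. The Python sketch below assumes that states are numbered 0..n-1, that each transition is given as (i, j, rate) with its rate $(ϑ) already computed, and that the resulting continuous time Markov chain is irreducible; it builds the generator by summing parallel transitions, solves pQ = 0 with the probabilities p summing to 1 by Gaussian elimination, and weights hypothetical per-state rewards by the steady-state probabilities. The function names are illustrative.

```python
def generator(n, transitions):
    """CTMC generator: q(i, j) sums the rates $(theta) of all i -> j transitions."""
    q = [[0.0] * n for _ in range(n)]
    for i, j, rate in transitions:
        if i != j:                       # a self-loop does not change the state
            q[i][j] += rate
    for i in range(n):
        q[i][i] = -sum(q[i])             # rows of a generator sum to zero
    return q


def steady_state(q):
    """Solve p Q = 0 with sum(p) = 1 (Gauss-Jordan with partial pivoting)."""
    n = len(q)
    # Row i of the augmented system is the balance equation (p Q)_i = 0 ...
    a = [[q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    a[-1] = [1.0] * n + [1.0]            # ... except the last: normalization
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def reward_rate(probs, reward):
    """Long-run reward rate: per-state rewards weighted by steady-state probabilities."""
    return sum(p * r for p, r in zip(probs, reward))


# Two states; two distinct transitions lead from P0 to P1 (rates 1.0 and 2.0).
trans = [(0, 1, 1.0), (0, 1, 2.0), (1, 0, 1.0)]
Q = generator(2, trans)                  # q(P0, P1) = 3.0
probs = steady_state(Q)                  # [0.25, 0.75]
print(reward_rate(probs, [4.0, 0.0]))    # 1.0 (hypothetical rewards per state)
```

Dense elimination is adequate only for small chains; the state spaces mentioned above would call for sparse or iterative solvers.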
4.3 General-Purpose Semi-Formal Models: UML
The advantage of using process algebras as ON mainly consists in the possibility of a rigorous and nonambiguous modeling activity. However, the use of these formal notations has not yet gained widespread acceptance in the practice of software development. On the contrary, a semi-formal notation, the Unified Modeling Language (UML) [6], which lacks some of the formal rigor of the notations considered in the previous section, has quickly become a de facto standard in the industrial software development process. The recent success of UML is mainly due to the following reasons [1]:
• It allows static and dynamic aspects of the software to be embedded into the model by using different diagrams, each representing a different view of the software system. Each view captures a different set of concerns and aspects regarding the subject. Therefore it is broadly applicable to different types of domains or subject areas.
• The same conceptual framework and the same notation can be used from specification through design to implementation.
• In UML, more than in classical object-oriented approaches, the boundaries between analysis, design and implementation are not clearly stated. As a consequence, there is more freedom in the software development process, even if the Rational Unified Process [19] has been proposed as a guideline for UML-based software development processes.
• UML is not a proprietary and closed language but an open and fully extensible language. The extensibility mechanisms and the potential for annotations of UML allow it to be customized and tailored to particular system types, domains, and methods/processes. It can be extended to include constructs for working within a particular context (e.g., performance requirement validation) where even very specialized knowledge can be captured.
• It is widely supported by a broad set of tools. Various tool vendors intend to support UML in order to facilitate its application throughout an organization. By having a set of tools that support UML, knowledge may be more readily captured and manipulated to meet an organization's objectives.
UML consists of two parts: a notation, used to describe a set of diagrams (also called the syntax of the language), and a metamodel (also called the semantics of the language) that specifies the abstract integrated semantics of UML modeling concepts. The UML notation encompasses several kinds of diagrams, most of them inherited from previous methodologies, which provide specific views of the system. UML diagrams can be distinguished into four main types:
1. Static diagrams: Use Case, Class and Object Diagrams
2. Behavioral diagrams: Activity and State Diagrams
3. Interaction diagrams: Sequence and Collaboration Diagrams
4. Implementation diagrams: Component and Deployment Diagrams
“Standard” UML as ON for mobile architecture. Standard UML can be used as ON for mobile architectures, since UML already provides some mechanisms for this goal. They are mainly based on the use of a tagged value location within a component to express its location, and of the copy and become stereotypes to express the location change of a component. The former stereotype can be used to specify the creation of an independent copy of a component at a new location (as in the COD and REX styles), and the latter to specify a location change of a component that preserves its identity (as in the MA style). In [6] it is shown how to use these mechanisms within a Collaboration Diagram to model the location change of a mobile component interleaved with interactions among components.
EndOfExample5
However, this modelling approach presents some drawbacks, since it mixes together two different views: one concerning the architectural style (e.g. the fact that a component behaves according to some mobility style), and the other concerning the actual sequence of messages exchanged between components during a particular interaction. Moreover, this approach may lead to a proliferation of objects in the diagrams that actually represent the same object at different locations. Both these drawbacks can lead to quite obscure models of the application behavior.
“Extended” UML as ON for mobile architecture. To overcome the drawbacks of standard UML as ON for mobile architectures, the dependency between the modeled AS and the chosen ON can be made explicit by adopting a different approach based on the use of both Collaboration and Sequence Diagrams (CD and SD), with a clear separation of concerns between them, as proposed in [15]. The SD describes the actual sequence of interactions between components, which is basically independent of the adopted style and obeys only the intrinsic logic of the application, while the CD only models the interaction structure (i.e. who interacts with whom) and style, without showing the actual sequence of exchanged messages.
The interaction logic is described using the standard UML notation for SD. The interaction structure is modeled by the links that connect components in the CD, with arrows specifying unidirectional or bidirectional interactions. As for the interaction style, the main goal is to distinguish a style where the location of components is statically assigned from a style where components do change location to adapt to environment changes. To this purpose, the standard location tagged value can be used to specify the component location, while it is necessary to extend the UML semantics by introducing a new stereotype moveTo that applies to messages in the CD. Where present, moveTo indicates that the source component moves to the location of its target before starting a sequence of consecutive interactions with it. If no other information is present, this style applies to each sequence of interactions shown in the associated SD between the source and target components of the moveTo message; otherwise a condition can be added to restrict this style to a subset of interactions between the two components. It should be noted that this approach appears suitable to model only mobile architectures where the architectural style is AS = {MA}.
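The moveTo semantics can be read as a rule for replaying the messages of an SD against the locations declared in a CD. The Python sketch below is one possible reading, under stated assumptions: a «moveTo» pair (mover, host) applies to every message between the two components, in either direction, and the mover joins the host just before such a message whenever the two are not already co-located; conditions restricting the style are not modelled. The function name apply_move_to, the message ordering used for the travel agency (N requests to f1, then N to f2) and the initial location L0 for c are hypothetical.

```python
def apply_move_to(locations, messages, move_to):
    """Replay SD messages under the moveTo style.

    locations: component -> initial location (copied, never modified)
    messages:  ordered (source, target, name) triples taken from the SD
    move_to:   set of (mover, host) pairs stereotyped <<moveTo>> in the CD
    Returns the list of location changes (mover, from, to) and the number of
    messages exchanged between components placed at different locations.
    """
    loc = dict(locations)
    moves, remote = [], 0
    for src, dst, _name in messages:
        for mover, host in move_to:
            # The mover joins the host before interacting with it.
            if {mover, host} == {src, dst} and loc[mover] != loc[host]:
                moves.append((mover, loc[mover], loc[host]))
                loc[mover] = loc[host]
        if loc[src] != loc[dst]:
            remote += 1
    return moves, remote


# Hypothetical travel agency run with N = 2 requests per flight company.
N = 2
locations = {"a": "L0", "c": "L0", "f1": "L1", "f2": "L2"}
sd = [("a", "c", "start()")]
for f in ("f1", "f2"):
    for i in range(1, N + 1):
        sd += [("c", f, f"req({i})"), (f, "c", f"Rep({i})")]
sd.append(("c", "a", "end()"))

mobile = apply_move_to(locations, sd, {("c", "a"), ("c", "f1"), ("c", "f2")})
static = apply_move_to(locations, sd, set())
print(mobile)  # ([('c', 'L0', 'L1'), ('c', 'L1', 'L2'), ('c', 'L2', 'L0')], 0)
print(static)  # ([], 8)
```

The same SD yields three migrations and no remote messages in the mobile style, and eight remote messages in the static one, which illustrates how the CD alone changes the architectural style without touching the interaction logic.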
Example 6. According to the adopted modeling framework, the travel agency example application can be modeled as shown in Figure 2.
Figure 2. Travel agency example: (a) interaction logic, (b) architectural style
[Figure: (a) sequence diagram with lifelines a:AGENCY, c:COLLECTOR, f1:FLYCOMP. and f2:FLYCOMP., messages start(), req(i) and Rep(i) repeated *(i=1..N), and end(); (b) collaboration diagram linking c:COLLECTOR to the other components through «moveTo» messages, with location tagged values L0, L1, L2 and L? (the latter for c)]
Figure 2.a shows an SD that describes in detail the “logic” of the interaction, i.e. the sequence of messages exchanged among the components. In this diagram no information is present about the adopted style, that is, whether or not some component changes location during the interactions. This information is provided by the CD in Figure 2.b, which models a style where component mobility is considered. More precisely, the diagram shows that only c can change location and, according to the moveTo semantics described above, it moves to the location of a, f1 or f2 before interacting with them. Note that in Figure 2.b the location of c is left unspecified (L?), since it can dynamically change. In general, it is possible to give it a specified value in the diagram, which would show the “initial” location of the mobile object in an initial deployment configuration. EndOfExample6
In general, there could be uncertainty about the convenience of adopting a mobile code style in the design of an application. To model this uncertainty about the architecture (i.e. location and possible mobility of components), a new stereotype moveTo? has been proposed in [15], which extends the semantics of the moveTo stereotype described above. When a message between two components in a CD is labeled with moveTo?, this means that the source component “could” move to the location of its target at the beginning of a sequence of interactions with it. In a sense, this means that, based on the information the designer has at that stage, he/she considers both a static and a mobile architecture acceptable. Hence, a general UML support to model a (possibly) mobile architectural style consists of a CD where some messages are unlabeled, some can be labeled with the (possibly constrained) moveTo stereotype, and some with the moveTo? stereotype. The former two cases correspond to a situation where the designer feels confident enough to decide about the best architectural style, while the latter corresponds to a situation where the designer lacks such a
Markov models. In general, an MRP models a state transition system, where the next state is selected according to a transition probability that only depends on the current state. Moreover, each time a state is visited or a transition occurs, a reward is accumulated that depends on the involved state or transition. Typical measures that can be derived from such a model are the reward accumulated in a given time interval, or the reward accumulation rate in the long run. An MDP extends the MRP model by associating with each state a set of alternative decisions, where both the rewards and the transitions associated with that state are decision dependent. A policy for an MDP consists of a selection, for each state, of one of the associated decisions, which will be taken each time that state is visited. Hence, different policies lead to different system behaviors and to different accumulated rewards. In other words, an MDP defines a family of MRPs, one for each different policy that can be determined. Algorithms exist to determine the optimal policy with respect to some optimality criterion (e.g. minimization of the accumulated reward) [28].
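To make the notion of policy concrete, the following Python sketch solves a small MDP by value iteration, using as optimality criterion the minimization of the expected cost accumulated until a terminal state is reached (only one of several possible criteria; see [28] for the general algorithms). The function name value_iteration and the states, decisions, probabilities and costs are hypothetical; each decision maps to a list of (probability, next state, cost) outcomes.

```python
def value_iteration(mdp, terminal, tol=1e-9, max_iter=10_000):
    """Minimize the expected cost accumulated until a terminal state.

    mdp: state -> decision -> list of (probability, next_state, cost)
    Returns (values, policy), where policy picks one decision per state.
    """
    v = {s: 0.0 for s in list(mdp) + list(terminal)}

    def q(s, d):  # expected cost of taking decision d in state s
        return sum(p * (c + v[n]) for p, n, c in mdp[s][d])

    for _ in range(max_iter):
        delta = 0.0
        for s in mdp:
            best = min(q(s, d) for d in mdp[s])
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    policy = {s: min(mdp[s], key=lambda d: q(s, d)) for s in mdp}
    return v, policy


# Hypothetical MDP: in s0 the designer may keep a component static or move it.
mdp = {
    "s0": {"static": [(1.0, "s1", 4.0)], "move": [(1.0, "s2", 1.0)]},
    "s1": {"go": [(1.0, "end", 4.0)]},
    "s2": {"go": [(0.5, "end", 1.0), (0.5, "s2", 1.0)]},
}
v, policy = value_iteration(mdp, terminal={"end"})
print(policy["s0"], round(v["s0"], 6))  # move 3.0
```

Fixing the policy turns the MDP into a plain MRP, which is the “family of MRPs” view described above: here the static decision costs 8 and the move decision 3 in expectation.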
In the translation methodology adopted in [15], an MRP/MDP state corresponds to a possible configuration of the components' locations, while a state transition models the occurrence of an interaction between components or a location change, and the associated reward is the cost of that interaction. In the case of an MDP, the decisions associated with states model the alternative choices of mobility or no mobility as architectural style, for those components that are the source of a moveTo? message. The translation method from the extended UML to this TMN consists of the definition of some elementary generation rules, and then of the use of these rules to define an MDP generation algorithm [15].
Once the MDP has been generated, it can be solved to determine the optimal policy, that is, the selection of a decision in each state that optimizes the reward accumulated in the corresponding MRP. Of course, the optimal policy depends on the values given to the system parameters (e.g., the size of the messages and of the possibly mobile component). Different values for these parameters model different scenarios.
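The dependence of the optimal choice on the system parameters can be illustrated with a deliberately simple, deterministic cost model, which is not the MDP generation algorithm of [15]. In the Python sketch below we assume that a message of size s costs a fixed latency plus s divided by the bandwidth, that local interactions are free, and that in the mobile style the collector migrates once to each remote site and once back home; the names transfer_cost and best_style and all numbers are hypothetical.

```python
def transfer_cost(size_kb, latency_s=0.05, bandwidth_kb_s=100.0):
    """Hypothetical cost (s) of sending size_kb kilobytes between two sites."""
    return latency_s + size_kb / bandwidth_kb_s


def best_style(n_requests, req_kb, rep_kb, code_kb, n_sites=2):
    """Compare static and mobile collectors for n_sites remote companies.

    static: every request/reply pair crosses the network;
    mobile: the collector migrates to each site and back home, and all its
            interactions become local (assumed free).
    """
    static = n_sites * n_requests * (transfer_cost(req_kb) + transfer_cost(rep_kb))
    mobile = (n_sites + 1) * transfer_cost(code_kb)
    return ("mobile" if mobile < static else "static"), static, mobile


# Same message and code sizes, different numbers of requests per company.
print(best_style(10, 1.0, 5.0, 50.0)[0])  # mobile
print(best_style(1, 1.0, 5.0, 50.0)[0])   # static
```

With ten requests per company the migration cost (1.65 s) is amortized against 3.2 s of remote traffic, while with a single request the static style wins; an MDP captures the same trade-off state by state, with stochastic transitions.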
Queueing Network models. A different methodology for the derivation of performance models from extended UML diagrams has been proposed in [16], based on SPE techniques, with queueing network models as the basic TMN.
The basic SPE concept is the separation of the software model (SM) from its execution environment model (i.e., hardware platform model or machinery model, MM). The SM captures the essential aspects of software behavior and is usually represented by means of Execution Graphs (EG). An EG is a graph whose nodes represent software workload components and whose edges represent transfers of control. Each node is weighted by a demand vector that represents the resource usage of the node (i.e., the demand for each resource).
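The demand vectors of an EG are usually combined with the standard SPE reduction rules: the demands of a sequence are added, a loop multiplies the demand of its body by the number of repetitions, and a branch weights each alternative by its probability. The Python sketch below applies these rules recursively; the function names, the tuple encoding of the nodes and the resource names cpu and net are hypothetical.

```python
def _add(*vectors):
    """Component-wise sum of demand vectors (dict resource -> demand)."""
    total = {}
    for vec in vectors:
        for res, d in vec.items():
            total[res] = total.get(res, 0.0) + d
    return total


def demand(node):
    """Reduce an EG node to its total demand vector."""
    kind = node[0]
    if kind == "basic":                      # ("basic", {resource: demand})
        return dict(node[1])
    if kind == "seq":                        # ("seq", [node, ...])
        return _add(*(demand(n) for n in node[1]))
    if kind == "loop":                       # ("loop", repetitions, body)
        return {res: node[1] * d for res, d in demand(node[2]).items()}
    if kind == "branch":                     # ("branch", [(probability, node), ...])
        return _add(*({res: p * d for res, d in demand(n).items()} for p, n in node[1]))
    raise ValueError(f"unknown EG node kind: {kind}")


eg = ("seq", [
    ("basic", {"cpu": 2.0}),
    ("loop", 3, ("basic", {"cpu": 1.0, "net": 4.0})),
    ("branch", [(0.25, ("basic", {"cpu": 8.0})), (0.75, ("basic", {"cpu": 4.0}))]),
])
print(demand(eg))  # {'cpu': 10.0, 'net': 12.0}
```

The reduced vector is what a stand-alone (no contention) analysis uses directly, and what later becomes the service demand of each center in the machinery model.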
The MM models the hardware platform and is based on the Extended Queueing Network Model (EQNM). To specify an EQNM, we need to define: the components (i.e., service centers), the topology (i.e., the connections among centers) and some relevant parameters (such as job classes, job routing among centers, scheduling discipline at service centers, service demand at service centers). Component and topology specification is performed according to the system description, while parameter specification is obtained from information derived from EGs and from knowledge of resource capabilities. Once the EQNM is completely specified, it can be analyzed by means of classical solution techniques (simulation, analytical techniques, hybrid simulation) to obtain performance indices such as the mean network response time or the utilization index.
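As an example of a classical analytical technique, the following Python sketch implements exact Mean Value Analysis (MVA) for a single-class closed network. It assumes that the EQNM can be approximated by a product-form network of load-independent queueing centers, an assumption that an EQNM with several job classes, blocking or non-standard scheduling need not satisfy; the function name mva, the demands D_k (which could, for instance, come from the EG demand vectors) and the population are hypothetical.

```python
def mva(demands, n_jobs, think_time=0.0):
    """Exact single-class MVA for a closed product-form network.

    demands: service demand D_k (s) of one job at each queueing center k
    Returns (throughput X, mean response time R, utilizations U_k).
    """
    queue = [0.0] * len(demands)          # mean queue lengths with n - 1 jobs
    x = r = 0.0
    for n in range(1, n_jobs + 1):
        # Arrival theorem: an arriving job sees the queues of the network with n - 1 jobs.
        resid = [d * (1.0 + q) for d, q in zip(demands, queue)]
        r = sum(resid)
        x = n / (r + think_time)          # response time law
        queue = [x * rk for rk in resid]  # Little's law at each center
    return x, r, [x * d for d in demands]


x, r, u = mva([0.2, 0.3], n_jobs=2)      # hypothetical CPU and network demands
print(round(r, 3), round(x, 3))          # 0.76 2.632
```

The utilizations returned are the “utilization index” mentioned above, and the response time R corresponds to the mean network response time for one job.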
To cope with mobility, in the methodology proposed in [16], well-known formalisms such as EG and EQNM have been extended by defining the mob?-EG and mob?-EQNM formalisms, with the goal of modelling code mobility, and the uncertainty about its possible adoption, within a model of the system dynamics.
To include the information about possible component mobility expressed in the CDs by moveTo? messages, a new kind of EG called mob?-EG is derived [16]. The mob?-EG modifies the original EG by introducing mv nodes that model the cost of code mobility. Moreover, the mob?-EG extends the EG formalism by introducing a new kind of node, called mob?, characterized by two different outcomes, “yes” and “no”, that can be non-deterministically selected, followed by two possible EGs. The EG corresponding to the “yes” branch models the selection of the component mobility style, while the EG of the “no” branch models the static case.
Example 7. The structure (without labels showing performance-related information) of the mob?-EG derived from the SD and CD of Example 6 is illustrated in the following figure.
[Figure: mob?-EG structure with nested mob? nodes, each branching into “yes” and “no” outcomes, and further nodes labeled mv and N]
EndOfExample7
The mob?-EG can be considered by itself as the TMN for a first kind of performance evaluation, corresponding to the special case of a stand-alone application, where the application under study is the only one in the execution environment (therefore there is no resource contention). In this case, performance evaluation can be carried out by standard graph analysis techniques [32] to associate an overall “cost” with each path in the mob?-EG as a function of the cost of each node that belongs to that path. Note that each path in the mob?-EG corresponds to a different mobility strategy, concerning when and where components move. Hence these results provide an optimal bound on the expected performance of each strategy, and can help the designer in selecting a subset of the possible mobility strategies that deserve further investigation in a more realistic setting of competition with other applications.
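The stand-alone analysis can be sketched as an enumeration of the paths of a mob?-EG. In the Python sketch below, a mob?-EG is a list of nodes executed in sequence, where ("block", c) and ("mv", c) are nodes with a fixed cost c and ("mob?", yes, no) forks into two sub-graphs; every path is returned together with its sequence of yes/no outcomes (the mobility strategy) and its total cost. The function name strategies, the encoding and the costs, loosely inspired by the travel agency example, are hypothetical.

```python
from itertools import product


def strategies(graph):
    """Return (choices, cost) for every path through a mob?-EG node sequence."""
    results = [((), 0.0)]
    for node in graph:
        if node[0] == "mob?":                # ("mob?", yes_branch, no_branch)
            branch = [(("yes",) + c, k) for c, k in strategies(node[1])]
            branch += [(("no",) + c, k) for c, k in strategies(node[2])]
        else:                                # ("block", cost) or ("mv", cost)
            branch = [((), node[1])]
        # Every path so far may be followed by every path through this node.
        results = [(c1 + c2, k1 + k2) for (c1, k1), (c2, k2) in product(results, branch)]
    return results


eg = [
    ("block", 1.0),                                               # start()
    ("mob?", [("mv", 5.0), ("block", 2.0)], [("block", 10.0)]),   # c and f1
    ("mob?", [("mv", 5.0), ("block", 2.0)], [("block", 4.0)]),    # c and f2
    ("block", 1.0),                                               # end()
]
paths = strategies(eg)
best = min(paths, key=lambda path: path[1])
print(best)  # (('yes', 'no'), 13.0)
```

Here the best strategy moves the collector to the first company only; since contention is ignored, such costs are optimistic bounds that can be used to prune the strategies worth a full queueing analysis.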
The complete application of SPE techniques implies the definition of a target performance model obtained from the merging of the mob?-EG with a QN modeling the executing platform. The merging leads to the complete specification of an EQNM by defining job classes and routing, using information from the blocks and parameters of the mob?-EG. However, well-known translation methodologies [10, 32] are not sufficient to perform this merging because of the presence of the mob? nodes with non-deterministic semantics in the mob?-EG; hence it is necessary to give a new translation rule to cope with this kind of node. To this end an extension of classical EQNMs has been proposed [16], to be used as TMN when the ON is the extended UML defined in the previous section. The extension is based on the definition of new service centers, called r? (for routing), that model the possibility, after the visit of a service center (and therefore the completion of a software block), to choose in a non-deterministic way which routing to follow: the one modelling the static strategy or the one modelling the mobile strategy.
In this way, a job visiting center r? generates two different, mutually exclusive paths: one path models the job routing when the component changes its location, the other one models the routing of a static component. Note that, like the mob? nodes in the EG, the r? nodes are characterized by a null service time, since they only represent a routing selection point. The obtained model is called mob?-EQNM and is characterized by different routing chains starting from the r? nodes. Note that these different routing chains
370 V. Grassi, V. Cortellessa, and R. Mirandola
[Figure: mob?-EQNM of Example 8, with CPU centers CPU0, CPU1, CPU2, network centers net01, net02, net12, and r? routing nodes with yes/no branches defining path 1, path 2, and path 3.]
End of Example 8
When the mob?-EQNM is the TMN, the ST suggested in [16] for contention-based analysis consists of solving the mob?-EQNM through well-established techniques [20, 32], separately considering each different EQNM belonging to the family modeled by the mob?-EQNM. When the number of different EQNMs is high, this solution approach could result in a high computational complexity. This problem can be alleviated by exploiting results from the stand-alone analysis. However, more efficient solution methods deserve further investigation. Starting from the obtained results, it is possible to choose the mobility strategy that is optimal according to the selected criterion, for example the one that minimizes the response time.
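The following is a minimal sketch of this exhaustive approach, under assumptions of ours: every combination of r? choices yields one EQNM, and each EQNM is solved as an open, single-class product-form network with R = Σ_k D_k / (1 − λ·D_k), standing in for the "well-established techniques" of [20, 32]. The service demands, center names, and arrival rate are hypothetical. The 2^n growth of the loop with n r? nodes is the complexity issue mentioned above.

```python
import itertools
from typing import Dict, List, Tuple

# Hypothetical service demands (s/job) contributed by each r? node's routing options.
R_NODES: List[Dict[str, Dict[str, float]]] = [
    {"static": {"CPU0": 0.05, "net01": 0.08},
     "mobile": {"net01": 0.03, "CPU1": 0.04}},
    {"static": {"CPU1": 0.06, "net12": 0.07},
     "mobile": {"net02": 0.02, "CPU2": 0.05}},
]
ARRIVAL_RATE = 5.0  # jobs/s, assumed open workload

def response_time(demands: Dict[str, float], lam: float) -> float:
    """Open product-form QN: sum of D_k / (1 - lam * D_k); inf if any center saturates."""
    total = 0.0
    for d in demands.values():
        u = lam * d
        if u >= 1.0:
            return float("inf")
        total += d / (1.0 - u)
    return total

def evaluate_strategies(lam: float) -> List[Tuple[Tuple[str, ...], float]]:
    """Build and solve one EQNM per combination of r? routing choices."""
    results = []
    for choice in itertools.product(["static", "mobile"], repeat=len(R_NODES)):
        demands: Dict[str, float] = {}
        for node, option in zip(R_NODES, choice):
            for center, d in node[option].items():
                demands[center] = demands.get(center, 0.0) + d
        results.append((choice, response_time(demands, lam)))
    return results

if __name__ == "__main__":
    for choice, r in sorted(evaluate_strategies(ARRIVAL_RATE), key=lambda x: x[1]):
        print(choice, f"R = {r:.4f} s")
```

With these made-up numbers the fully mobile strategy minimizes the response time. Changing the demands or λ can change the ranking, which is the point of solving every member of the family.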
5 Conclusions
The primary goal of this tutorial has been to provide a structured view of the domain of performance validation of mobile software architectures. The classification of the approaches presented in section 4 has been supported by a general framework (section 2) classifying the parameters each approach has to deal with. In table 5 we summarize the values (in some cases the classes of values) assumed by the parameters introduced in section 2 in all the types of approaches reviewed in section 4.
Besides ad-hoc models, whose merits and limitations have been outlined in section 4.1, two kinds of approaches for the systematic modeling and analysis of NFA in mobile software architectures emerge from our review, based on the use of formal or semi-formal languages as ON. We would like to remark here that our review is probably not complete, but we believe it is representative of existing approaches.
The merit of formal languages comes primarily from their lack of ambiguity and their precise compositional features. However, as can also be inferred from table 5, their use in NFA validation requires the assignment of rewards and exponential durations to all the modelled actions, which could be quite a difficult task. The method presented in
Performance Validation of Mobile Software Architectures 371
Table 5.
ON | AS | NFA | TMN | MI | ST
none | REX, COD, MA | processing time, network load | - | - | closed-form analytic model
none | REX, MA | processing time | Petri net | Petri net parameters | numerical
Formal (P.A.) | REX, COD, MA, "indirect location", "direct location" | "any" | stochastic P.A. and MRP | transition and reward rates | mainly numerical
Semi-Formal (UML) with mobility-oriented extensions | MA | standard interaction related (stand-alone applications) | MRP, MDP | performance annotations in the SDs | mainly numerical
Semi-Formal (UML) with mobility-oriented extensions | MA | throughput, response time (stand-alone and contention-based measures) | Execution Graph, EQNM | performance annotations in the UML diagrams | analytic, numerical, simulation
Acknowledgements
Work partially supported by MURST project "SAHARA: Software Architectures for heterogeneous access network infrastructures".
Bibliography
1. S. Alhir, "The true value of the Unified Modeling Language", Distributed Computing, 29-31, July 1998.
2. M. Baldi, G.P. Picco, "Evaluating the tradeoffs of mobile code design paradigms in network management applications", in Proc. 20th Int. Conf. on Software Engineering (ICSE 98) (R. Kemmerer and K. Futatsugi eds.), Kyoto, Japan, Apr. 1998.
3. S. Balsamo, M. Simeoni, "Deriving Performance Models from Software Architecture Specifications", Res. Rep. CS-2001-04, Dip. di Informatica, Università di Venezia, Feb. 2001; ESM 2001, SCS, European Simulation Multiconference 2001, Prague, 6-9 June 2001.
4. M. Barbeau, "Transfer of mobile agents using multicast: why and how to do it on wireless mobile networks", Tech. Rep. TR-00-05, School of Computer Science, Carleton University, July 2000.
5. L. Bass, P. Clements, R. Kazman, Software Architecture in Practice, Addison-Wesley, New York, NY, 1998.
6. G. Booch, J. Rumbaugh, and I. Jacobson, The Unified Modeling Language User Guide, Addison-Wesley, New York, 1999.
7. L. Cardelli, A.D. Gordon, "Mobile ambients", in Foundations of Software Science and Computational Structures (M. Nivat ed.), LNCS 1378, Springer-Verlag, 1998, pp. 140-155.
8. N. Carriero, D. Gelernter, "Linda in context", Communications of the ACM, vol. 32, no. 4, 1989, pp. 444-458.
9. T.-H. Chia, S. Kannapan, "Strategically mobile agents", in Proc. 1st Int. Conf. on Mobile Agents (MA'97), Springer-Verlag, 1997.
10. V. Cortellessa, R. Mirandola, "PRIMA-UML: a performance validation incremental methodology on early UML diagrams", Science of Computer Programming, Elsevier Science, vol. 44, no. 1, July 2002, pp. 101-129.
11. R. De Nicola, G. Ferrari, R. Pugliese, B. Venneri, "KLAIM: a kernel language for agents interaction and mobility", IEEE Trans. on Software Engineering, vol. 24, no. 5, May 1998, pp. 315-330.
12. G. Ferrari, C. Montangero, L. Semini, S. Semprini, "Mobile agents coordination in Mobadtl", in Proc. of 4th Int. Conf. on Coordination Models and Languages (COORDINATION'00) (A. Porto and G.-C. Roman eds.), Springer-Verlag, Limassol, Cyprus, Sept. 2000.
13. A. Fuggetta, G.P. Picco, G. Vigna, "Understanding code mobility", IEEE Trans. on Software Engineering, vol. 24, no. 5, May 1998, pp. 342-361.
14. N. Gotz, U. Herzog, M. Rettelbach, "Multiprocessor system design: the integration of functional specification and performance analysis using stochastic process algebras", in Performance Evaluation of Computer and Communication Systems (L. Donatiello and R. Nelson eds.), LNCS 729, Springer-Verlag, 1993.
15. V. Grassi, R. Mirandola, "Modeling and performance analysis of mobile software architectures in a UML framework", in «UML 2001» Conference Proceedings, LNCS 2185, Springer-Verlag, October 2001.
16. V. Grassi, R. Mirandola, "PRIMAmob-UML: a Methodology for Performance analysis of Mobile Software Architecture", in WOSP 2002, Third International Conference on Software and Performance, ACM, July 2002.
17. R. Gray, D. Kotz, G. Cybenko, D. Rus, "Mobile agents: motivations and state-of-the-art systems", in Handbook of Agent Technology, AAAI/MIT Press, 2001.
18. H. Hermanns, U. Herzog, J.-P. Katoen, "Process algebras for performance evaluation", Theoretical Computer Science, vol. 274, no. 1-2, 2002, pp. 43-87.
19. I. Jacobson, G. Booch, J. Rumbaugh, The Unified Software Development Process, Addison-Wesley Object Technology Series, 1999.
20. R. Jain, The Art of Computer Systems Performance Analysis, Wiley, New York, 1990.
1 Introduction
The fast development of new technologies for high bandwidth networks, wireless
communication, data compression, and high performance CPUs has made it
technically possible to deploy sophisticated communication infrastructures for
supporting a variety of multimedia applications. Among these we can distinguish,
for instance, quality audio and video on demand (to the home), virtual reality
environments, digital libraries, and cooperative design.
Multimedia objects, such as movies, voice extracts, texts, and pictures, are
usually stored in compressed (encoded) form on the disks of a multimedia server.
Since the encoded objects might be long, the playing of an object should not be
delayed until the whole object is transmitted. Instead, the playing of the object
should be initiated as early as possible.
A common characteristic among multimedia applications is the so-called continuous nature of their generated data. In continuous media (CM), strict timing relationships exist that define the schedule by which CM data must be rendered (e.g., a video displayed, 3D graphics rendered, or audio played out). These timing relationships, coupled with the high aggregate bandwidth needs, the high individual application bandwidth needs, and the high storage requirements, pose significant challenges to the design of such systems. This is particularly troublesome in the scenario of the Internet, which is beginning to be used to convey multimedia data but which was not designed for this purpose.
This work is supported in part by grants from CNPq/ProTeM. E. de Souza e Silva is also supported by additional grants from CNPq/PRONEX and FAPERJ.
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 374–404, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Performance Issues of Multimedia Applications 375
In this work, we discuss the main technical issues involved in the design and
implementation of practical (distributed) multimedia systems. We take a particular view, which divides the system into three main components: the multimedia
server, the resource sharing techniques for transmitting data across the network,
and methods for improving the utilization of network bandwidth and buffers. We
look at each of these components, reviewing the related literature, introducing
the key underlying technical issues, and providing insights on how each of them
impacts the performance of the multimedia system.
be served. This way the server is able to multiplex the disk bandwidth among
various clients, which are served concurrently. The approach works because the
total disk bandwidth available at the server far exceeds the display rate with
which each client consumes bytes.
Let $O_i$ be a reference to the $i$th multimedia object in the server and $b_i$ be a reference to any data block of the object $O_i$. Consistent with several prototype implementations, we assume that the data blocks of each object $O_i$ are all of the same size. The data blocks of distinct objects, however, might be of different sizes (i.e., $\mathrm{size}(b_i) \neq \mathrm{size}(b_j)$).
A client makes a request for an object $O_i$. If this request is admitted into the system, the server starts sending blocks of the object $O_i$ to the client machine.
The client might have to wait until the buffer fills up to a pre-defined threshold
before starting to play the object. The time interval between the client request
and the beginning of the display is called startup latency. To send the blocks
to the client, the server first retrieves them from disk into main memory. Thus,
buffers are also required at the server side.
A client gets a block of data and starts consuming it. Before consuming all
the data in that block, the client must get the next block of data for the object
it is playing. Otherwise, an interruption in the service will occur. In the case of a movie, this means that the motion picture might suddenly freeze in front of the user (an effect also called a hiccup). Thus, each client must get the blocks of data in a
timely fashion.
The blocks which compose the various multimedia objects are laid out across
the disks in the system. A simplistic approach is to store all blocks of the same
object on a single disk. The main advantage of this approach is simplicity and
ease of maintenance. However, there is a considerable disadvantage. If a popular video is heavily requested, the disk that stores that video will be overloaded. Thus, severe load imbalance might result, which limits the number of clients that can be served. More sophisticated strategies involve spreading the blocks of the same object across multiple disks (the so-called striping techniques).
378 E. de Souza e Silva et al.
Layout Using Striping. The key idea of striping is to spread out the data
blocks of each object across the disks of the server. This way, during the service
time of an object, each client request is continuously moved from one disk to
another and shares the bandwidth of all the disks in the system. We say that
the object storage has been decoupled from the disks and call this effect object
decoupling (see Figure 1). Object decoupling provides a load balancing effect
which allows a higher number of clients in the system and a better utilization of
disk bandwidth.
Usually, when a striped layout is used, the server operates in cycles. At
each cycle of duration T , the server retrieves one data block for each client in
the system (this retrieval incurs three delays: seek time, rotational latency,
and transfer time). While that client consumes the block, other clients can be
served. Discontinuities in the service are avoided by guaranteeing that each client
is served in every cycle. When all clients in the system have been served, the
server sleeps if there is still time available in the current cycle of duration T .
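The per-cycle constraint can be turned into a simple admission test. This is a minimal sketch under assumptions of ours: the cycle length T equals the time a client takes to consume one block, every retrieval costs a fixed worst-case seek time plus rotational latency plus a transfer time of block size divided by transfer rate, and striping spreads the retrievals evenly over the disks. All parameter values are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Disk:
    seek_s: float        # assumed worst-case seek time (s)
    rotation_s: float    # assumed worst-case rotational latency (s)
    transfer_Bps: float  # assumed sustained transfer rate (bytes/s)

def cycle_length(block_bytes: float, display_Bps: float) -> float:
    """Cycle duration T: the time a client takes to consume one block."""
    return block_bytes / display_Bps

def max_clients_per_disk(disk: Disk, block_bytes: float, display_Bps: float) -> int:
    """Largest n such that n retrievals (seek + rotation + transfer) fit in T."""
    t_block = disk.seek_s + disk.rotation_s + block_bytes / disk.transfer_Bps
    return math.floor(cycle_length(block_bytes, display_Bps) / t_block)

def admit(active_clients: int, num_disks: int, disk: Disk,
          block_bytes: float, display_Bps: float) -> bool:
    """Admit a new client only if every client can still be served each cycle,
    assuming the striped layout balances retrievals evenly over the disks."""
    capacity = num_disks * max_clients_per_disk(disk, block_bytes, display_Bps)
    return active_clients + 1 <= capacity

if __name__ == "__main__":
    d = Disk(seek_s=0.008, rotation_s=0.004, transfer_Bps=20e6)
    # 500 KB blocks consumed at 500 KB/s give T = 1 s and 37 ms per retrieval.
    print(max_clients_per_disk(d, 500_000, 500_000))  # 27
    print(admit(107, 4, d, 500_000, 500_000))         # True: 108 <= 4 * 27
```

Using worst-case seek and rotation keeps the test conservative; a real server could admit more clients by ordering the retrievals of each cycle to shorten seeks.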
To accommodate objects with distinct bandwidth requirements, we can simply allow the sizes of the storage units to vary. For instance, objects $O_i$ and $O_j$ will have block sizes $b_i$ and $b_j$ ($b_i \neq b_j$), respectively. Each block is stored as a separate storage unit. For the same object $O_i$, however, the block sizes are all the same (i.e., $b_i[j] = b_i[j+1]$). Alternatively, at the disk level, one can keep the storage unit size constant to avoid fragmentation. For an object that has higher bandwidth requirements, two or more storage units can be combined to compose a data block, as illustrated in Figure 1: each data block of the object $O_i$ is composed of a single storage unit, while each data block of the object $O_j$ is composed of two storage units. We see that two or more disks might now be involved in the retrieval of a single data block. Since the storage unit size is kept constant, storage and bandwidth fragmentation problems are minimized.
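The mapping from data blocks to disks under block decoupling can be sketched as follows. We assume, for illustration only, that the storage units of each object are laid out round-robin over the D disks starting at a chosen disk. The function name and the starting-disk parameter are hypothetical.

```python
from typing import List

def disks_for_block(block_index: int, units_per_block: int, num_disks: int,
                    start_disk: int = 0) -> List[int]:
    """Disks holding data block `block_index` (0-based) of an object whose
    equal-sized storage units are striped round-robin over `num_disks` disks."""
    first_unit = block_index * units_per_block
    return [(start_disk + first_unit + u) % num_disks
            for u in range(units_per_block)]

if __name__ == "__main__":
    D = 5
    # O_i: one storage unit per block; O_j: two storage units per block.
    print([disks_for_block(b, 1, D) for b in range(3)])  # [[0], [1], [2]]
    print([disks_for_block(b, 2, D) for b in range(3)])  # [[0, 1], [2, 3], [4, 0]]
```

Consecutive blocks move from disk to disk (object decoupling), and each block of the higher-bandwidth object $O_j$ is retrieved from two disks at once (block decoupling).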
Random Data Allocation Layout. Striping layouts are good because they
provide object decoupling. However, in general, all striping strategies impose a
tight coupling between the layout itself and the block access pattern as a way
to balance the load among the various disks. To avoid this tight coupling, an al-
ternative is to employ a random data allocation. It can be shown that a random
layout is as good as striping in terms of performance [64], but presents important
advantages as we briefly point out here.
A random data allocation layout uses storage units that are all of the same
size. However, contrary to the striping approach, each storage unit is stored in a
disk position that is determined according to the following procedure: (a) select
a disk at random; (b) within that disk, select a free position at random.
As a result, storage units are placed randomly across the disks of the system. Objects with higher bandwidth requirements are served by combining several storage units to form a data block. Also, the physical location of the data blocks is now independent of the block access pattern.
Fig. 1. Hybrid layout with equal-sized stripe units and block sizes which vary from one object to the other. [Figure panels: object decoupling of objects O_i and O_j across disks 1 to D; block decoupling onto equal-sized storage units.]
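A minimal sketch of the two-step random placement procedure (select a disk at random, then a free position on that disk at random) is shown below. We add one assumption the text does not spell out: step (a) only considers disks that still have free positions. Disk and slot counts are hypothetical.

```python
import random
from typing import Dict, List, Tuple

def random_layout(num_units: int, num_disks: int, slots_per_disk: int,
                  rng: random.Random) -> Dict[int, Tuple[int, int]]:
    """Place each storage unit: (a) pick a disk at random among those with
    free space; (b) pick a free slot on that disk at random.
    Returns a mapping unit -> (disk, slot)."""
    free: List[List[int]] = [list(range(slots_per_disk)) for _ in range(num_disks)]
    placement: Dict[int, Tuple[int, int]] = {}
    for unit in range(num_units):
        candidates = [d for d in range(num_disks) if free[d]]
        if not candidates:
            raise ValueError("not enough free slots")
        disk = rng.choice(candidates)                              # step (a)
        slot = free[disk].pop(rng.randrange(len(free[disk])))     # step (b)
        placement[unit] = (disk, slot)
    return placement

if __name__ == "__main__":
    layout = random_layout(1000, 8, 200, random.Random(42))
    per_disk = [0] * 8
    for disk, _ in layout.values():
        per_disk[disk] += 1
    print(per_disk)  # roughly 125 units per disk: the load is statistically balanced
```

Because placement ignores the order in which blocks are read, any access pattern (sequential or interactive) sees, on average, the same balanced load.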
A random data allocation layout provides the following characteristics: object
decoupling; access pattern decoupling; no disk storage fragmentation; small prob-
ability of prolonged bandwidth fragmentation; good performance. The good per-
formance is attained because the load tends to be statistically balanced among
the various disks. Random data allocation is the only layout scheme that pro-
vides all these features together. Because of this, it simplifies the overall design
and implementation of the system. Therefore, we argue that it is the paradigm
of choice for the design and implementation of multimedia servers in general.
Comparative Performance Analysis: Striping versus Random Layout.
In [64] a detailed comparison between a server based on striping and a server
based on a random layout is presented. The experimental results show that
system performance with a random layout is competitive or superior to the
performance obtained with a striping layout. This is the case not only for un-
predictable access patterns generated by sophisticated interactive applications
such as virtual worlds and scientific visualizations, but also for sequential access
patterns generated by more standard video applications.
To illustrate, let us focus on the case of standard video applications. When
only a small amount of buffer is allowed at the server (say, 1.5 MBytes per
stream), a striping layout performs slightly better than a random layout, sustaining a maximum number of streams roughly 5% higher. If the
amount of buffer per stream is allowed to increase to 3.5 MBytes per stream,
both layouts lead to the same overall performance. Additional increments in
buffer space per stream favor the random layout, whose performance becomes
superior.
Assume now that more disk space is made available, such that data blocks
can be replicated. This is useful, for instance, to improve reliability against disk
failure. Consider a 25% degree of replication of video data blocks. This is good
for a random layout because replicated blocks can be used to alleviate the load
of momentarily overloaded disks. In this case, with a buffer space of 3.5 MBytes
per stream, a server based on a random layout presents performance (maximum
number of streams sustained by the server) that is 10-15% higher than the
performance of a server based on a striping layout.
In practical installations, there are other important issues that have to be con-
sidered for proper operation of a multimedia server. Among these, we distinguish
the staging of new videos, the reconfiguration of the server to improve performance, and fault tolerance against the failure of disks in the server.
In this section, we discuss these issues in more detail and compare their relative
performance considering random-based and striping-based servers.
The Staging Mechanism. Since multimedia objects might be quite large (par-
ticularly movie objects), the number of objects that can be stored on the disks
of the system might be quite limited. This implies that the objects in the system
need to be replaced by new ones from time to time. Since the new objects are
usually loaded from tape, we call this process the staging mechanism. This is
an issue which has not received much attention in the specialized literature but
which is critically important in any practical system.
For offline staging, the use of block decoupling provides an efficient solution. If online staging is desired, the admission control and the scheduling processes are affected because a new stream has to be admitted and scheduled. This type of stream might require higher bandwidth because redundant data (such as copies of the data to support fault tolerance) must also be updated. For a striped layout under heavy load, it might be the case that two fragmented pieces of bandwidth are available but cannot be used for staging because a coalesced bandwidth is required. For instance, this might be the case of a new stream that is being fed live to the server. If the new stream is not live, then there is no problem because the staging can proceed in non-real-time mode.
Staging has similar costs for a striped and for a random layout whenever the staging is offline. For online staging, a random layout is advantageous. For instance, a random layout makes it easier to deal with the staging of a new stream that is being fed live. Also, a random layout allows the staging of a new object at a rate different from its playout rate, which is often more difficult to do with a striped layout.
Disk Reconfiguration. In practical situations, it is reasonable to expect that
the demand on a given server system might eventually exceed its planned ca-
pacity. For instance, it might be the case that the demand for disk bandwidth
exceeds the total disk bandwidth currently available in the server. This problem
can be fixed by adding new disks to the system and copying data blocks (of the
objects already in the system and of new objects) into the new disks. This is
what we call disk reconfiguration. We would like to be able to reconfigure the
system while maintaining the server fully operational.
Consider that we have D disks in the system and that we want to add K
new disks. For simplicity, we consider here that the new disks are of the same
capacity and of the same bandwidth as the disks already installed in the server.
Consider also that no new objects will be added to the system. With current
disk technology, the extra K disks can be “hot” inserted into the system while
it is running. Thus, no interruption in service is required. However, the storage
units need to be remapped to take advantage of the newly available bandwidth.
To exemplify, assume an installation with 8 disks to which 2 new disks are
added. We have that D = 8 and K = 2. In this case, it can be shown that 80%
of all storage units need to be moved if the layout is done with striping, while
only 20% of all storage units need to be moved if the layout is random. Thus, we
conclude that it is much cheaper to reconfigure an installation when the layout
is random.
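The 80% and 20% figures can be reproduced under two assumptions of ours: the striped layout is plain round-robin (storage unit u on disk u mod n) and is re-striped round-robin over the D + K disks, while the random layout is rebalanced by moving only enough randomly chosen units to give each disk an equal share, i.e. a fraction K/(D + K). The sketch uses `math.lcm`, which needs Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm

def striping_moved_fraction(D: int, K: int) -> Fraction:
    """Fraction of storage units that change disk when round-robin striping
    over D disks (unit u on disk u mod D) becomes striping over D + K disks."""
    period = lcm(D, D + K)  # the placement pattern repeats with this period
    stay = sum(1 for u in range(period) if u % D == u % (D + K))
    return 1 - Fraction(stay, period)

def random_moved_fraction(D: int, K: int) -> Fraction:
    """Units moved when a random layout is rebalanced by filling the K new
    disks with an equal share of units taken at random from the old ones."""
    return Fraction(K, D + K)

if __name__ == "__main__":
    print(striping_moved_fraction(8, 2), random_moved_fraction(8, 2))  # 4/5 1/5
```

For D = 8 and K = 2 only units 0 to 7 of every 40 stay in place under striping, hence 32/40 = 80% must move, against 2/10 = 20% for the random layout.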
Fault Tolerance. Maintaining the integrity of the data and its accessibility are
crucial aspects of a multimedia server. Particularly critical are failures of the
disks of the system. While each individual disk is fairly reliable, a large set of
disks presents a considerably higher likelihood of failure of a component. With
a multimedia server, it is particularly important to provide tolerance to this
type of failure because failure of a single disk might disrupt the service to all
clients in the system. Basically, fault tolerance is provided by the maintenance
of redundant information about the data. Two basic schemes can be used: full
replication and parity encoding.
With parity encoding, the $D$ disks of the system are divided into $n_g$ groups. Let $g = D/n_g$ be the number of disks per group. For each group, one of the disks is reserved for storing parity information while the remaining $g-1$ are used for storing data. The parity information is computed as the exclusive-or of the storage units in the $g-1$ disks. We use storage units instead of data blocks because, in case block decoupling is used, data blocks are not confined to a single disk. Let $su_i[k], su_i[k+1], \ldots, su_i[k+g-2]$ be $g-1$ consecutive storage units (belonging to data blocks of object $O_i$) which appear each in a separate disk (assume this for now). Then, the parity information $p_i[k]$ for this set of storage units is computed as
$$p_i[k] = su_i[k] \oplus su_i[k+1] \oplus \cdots \oplus su_i[k+g-2].$$
The set composed of the parity storage unit $p_i[k]$ and of the $g-1$ storage units from $su_i[k]$ to $su_i[k+g-2]$ is called a parity group of size $g$. If the disk that contains $su_i[k+1]$ is lost, this storage unit can be rebuilt by the following computation:
$$su_i[k+1] = su_i[k] \oplus p_i[k] \oplus su_i[k+2] \oplus \cdots \oplus su_i[k+g-2].$$
Thus, the disk with the parity information takes the place of the disk which was lost.
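The parity computation and the rebuild of a lost storage unit can be sketched directly with bytewise exclusive-or. This is a minimal illustration in which storage units are small byte strings of equal size. The sample contents are made up.

```python
from functools import reduce
from typing import List

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive-or of two equal-sized storage units."""
    if len(a) != len(b):
        raise ValueError("storage units must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))

def parity(units: List[bytes]) -> bytes:
    """p_i[k] = su_i[k] xor su_i[k+1] xor ... xor su_i[k+g-2]."""
    return reduce(xor_bytes, units)

def rebuild(surviving: List[bytes], p: bytes) -> bytes:
    """Recover the single lost unit of a parity group from p_i[k] and the
    g - 2 surviving data units."""
    return reduce(xor_bytes, surviving, p)

if __name__ == "__main__":
    group = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # g - 1 = 3 data units (g = 4)
    p = parity(group)
    print(p.hex())                                     # ee22
    print(rebuild([group[0], group[2]], p) == group[1])  # True
```

Since x ⊕ x = 0, xor-ing the parity with all surviving units cancels them out and leaves exactly the missing one; this is why a single parity disk per group tolerates one disk failure in that group.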
The idea of fault tolerance with full replication is to use additional space equal in size to the space occupied by the whole set of data blocks. Thus, all data blocks are duplicated. While more expensive in terms of space, this approach allows recovering from some types of catastrophic failures and improving the performance of the system. Gains in performance are possible because any request for a data block can now be served by two different disks, and thus we can always select the disk with the shorter queue.
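The replica-selection rule can be sketched as a dispatcher that sends each block request to whichever of its two copies sits on the disk with the shorter queue. For illustration we assume a hypothetical mirror assignment, model queues as plain counters (no service completions), and break ties in favor of the primary copy.

```python
from typing import Dict, List, Tuple

def dispatch(requests: List[int], replicas: Dict[int, Tuple[int, int]],
             queue_len: List[int]) -> List[int]:
    """Send each block request to the replica disk with the shorter queue
    (ties go to the primary copy). Updates queue_len; returns chosen disks."""
    chosen = []
    for block in requests:
        primary, mirror = replicas[block]
        disk = primary if queue_len[primary] <= queue_len[mirror] else mirror
        queue_len[disk] += 1
        chosen.append(disk)
    return chosen

if __name__ == "__main__":
    replicas = {0: (0, 1), 1: (0, 2), 2: (1, 2)}  # block -> (primary, mirror)
    queues = [0, 0, 0]
    print(dispatch([0, 0, 1, 2], replicas, queues), queues)  # [0, 1, 2, 1] [1, 2, 1]
```

Even repeated requests for the same block are spread over both copies, which is how replication relieves momentarily overloaded disks.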
3 Transmitting Information
There are several performance issues that need to be addressed in order to transmit continuous real-time streams over the Internet with acceptable quality. For instance, real-time video encoded in MPEG2 typically requires an average bandwidth of approximately 1-4 Mbps, and a voice stream approximately 6-64 Kbps, depending on the encoding scheme. However, so far the Internet does not allow bandwidth reservation as needed. In addition, congestion in the network may cause significant variability in the interval between the arrival of successive packets (jitter). Since real-time streams must be decoded and played following strict time constraints, large jitter values will cause the playout process to be interrupted. Packet losses may also severely degrade the quality of the multimedia presentation, depending on the loss pattern. Yet another problem is network and client heterogeneity. Client heterogeneity means that the receivers have different network requirements, due to different capabilities to present the received multimedia information. For multicast applications, heterogeneity imposes an additional challenge, since a stream being transmitted would have to be multicast through several networks and clients (with possibly drastically different characteristics) and somehow adapt to the needs of each client. In this section we discuss a few mechanisms used to mitigate the effects of random delay and losses in the network.
3.1 How to Cope with Network Jitter and the Rate Variability
We start by considering an audio stream encoded with PCM, say, with silence detection. The audio stream is sampled at 125 μsec intervals, and usually 160 samples are collected in a single packet, generating a CBR stream of one IP packet per 20 msec [47] during each active interval. The client consumes the 160 samples every 20 msec, and thus it is vulnerable to random delays in the network. If the expected information does not arrive on time, annoying distortions may occur in the decoded audio signal. Let T be the packet generation interval and X the corresponding segment interarrival time. The random variable J = X − T is called jitter.
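The packet interval and the jitter definition translate directly into code. The short sketch below checks that 160 samples at 125 μsec give T = 20 msec and computes J = X − T for each pair of consecutive arrivals. The arrival timestamps are made up for illustration.

```python
from typing import List

SAMPLE_INTERVAL_S = 125e-6   # PCM sampling interval (8 kHz)
SAMPLES_PER_PACKET = 160
T = SAMPLE_INTERVAL_S * SAMPLES_PER_PACKET  # packet generation interval, 20 msec

def jitter(arrivals: List[float], period: float) -> List[float]:
    """J = X - T for each interarrival time X between consecutive packets."""
    return [(b - a) - period for a, b in zip(arrivals, arrivals[1:])]

if __name__ == "__main__":
    arrivals = [0.000, 0.020, 0.045, 0.060]  # seconds, hypothetical
    print([round(j * 1000, 3) for j in jitter(arrivals, T)])  # msec: [0.0, 5.0, -5.0]
```

A positive value means the packet arrived later than the sender's spacing would suggest; a negative one means it arrived early, bunched behind a delayed predecessor.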
One simple mechanism to reduce the jitter is to use a playout buffer at the client, where a given number of packets are stored. At the beginning of each active period, during which packets are generated, the client stores packets until a given threshold is reached before starting to decode the received samples. The threshold value may be fixed at the beginning of the connection or be adjusted dynamically during the session. Figure 2 illustrates the basic idea.
[Fig. 2: number of packets versus time (0 to 20) for two settings, showing the arrival curve a(t), the consumption curve l(t), and the upper curve u(t), with the buffer space B and the threshold H marked, and annotations for packet loss (starvation) and playout buffer full.]
In that figure, the curve l(t) is equal to the number of packets consumed by the application by time t. (In this example, PCM encoding is assumed and thus the packet consumption rate is constant.) The upper curve is simply u(t) = l(t) + B, where B is the playout buffer space. The curve labeled a(t) is equal to the number of packets that have arrived by time t. Note that the arrival instants are not equally spaced due to the jitter introduced by the network. In the leftmost part of Fig. 2, B = 8 and H = 6, and so the decoding starts immediately after the arrival of the 6th packet. The number of packets stored in the playout buffer as a function of time is a(t) − l(t), while u(t) − a(t) quantifies the buffer space available at t. Buffer starvation occurs if the arrival curve a(t) touches the consumption curve l(t), and buffer overflow occurs when a(t) crosses the upper curve u(t). As shown in the figure, the buffer empties at t = 18. Thus, at t = 19 there is no packet to be decoded (buffer starvation). When this occurs, some action must be taken, perhaps re-playing the last information in the buffer as an approximation of the data carried by the missing packet at that time. In the right-hand part of Fig. 2 the threshold value H is increased to H = 7. As a consequence, l(t)
384 E. de Souza e Silva et al.
is shifted to the right. In this example, this change prevents buffer starvation
during the observation period. The value of B is also decreased to 7, and u(t) is
moved downwards with respect to the preceding curve.
It is easy to see that this simple technique eliminates any negative jitter. From
Fig. 2, it is also clear that larger threshold values decrease the jitter variability.
However, latency increases with increasing threshold values, and interactive
applications, such as a live conversation, do not tolerate latencies larger than
200-300 msec. With one packet generated every 20 msec, this limits H to roughly
10-15 packets. An issue is the choice of H and of the amount of buffer space
necessary to minimize the loss of packets when a long burst of packets arrives
at the receiver.
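The behavior illustrated in Fig. 2 can be reproduced with a small simulation sketch. It rests on these assumptions:

- packets arrive in sequence order;
- one packet is consumed every `period` time units, starting when the H-th packet arrives;
- a packet that arrives after its playout instant is simply discarded ("late");
- a packet that finds B packets already stored is "dropped" (overflow).

The function name and the status labels are illustrative and do not come from the text.

```python
def simulate_playout(arrivals, period, threshold, capacity):
    """Classify each packet as 'played', 'late' (buffer starvation) or 'dropped' (overflow).

    arrivals:  arrival times of packets 1..n, in sequence order (assumed nondecreasing)
    period:    constant consumption interval T (PCM-like decoding)
    threshold: H, number of packets that must arrive before decoding starts
    capacity:  B, playout buffer size in packets
    """
    start = arrivals[threshold - 1]          # decoding starts when the H-th packet arrives
    play = [start + k * period for k in range(len(arrivals))]  # scheduled playout instants
    status = []
    for k, arrival in enumerate(arrivals):
        if arrival > play[k]:
            status.append("late")            # the buffer had nothing to decode at play[k]
            continue
        # Packets held in the buffer at this arrival: accepted earlier, not yet played.
        stored = sum(1 for j in range(k) if status[j] == "played" and play[j] > arrival)
        status.append("dropped" if stored >= capacity else "played")
    return status


arrivals = [0, 20, 40, 150, 160]
print(simulate_playout(arrivals, period=20, threshold=2, capacity=8))
# ['played', 'played', 'played', 'late', 'late']
print(simulate_playout(arrivals, period=20, threshold=4, capacity=8))
# ['played', 'played', 'played', 'played', 'played']
```

Raising H from 2 to 4 avoids starvation in this trace, at the cost of starting playback at t = 150 instead of t = 20. Running the same H = 4 case with `capacity=2` drops two packets to overflow, which shows why B has to grow together with H.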
Diniz and de Souza e Silva [22] calculate the distribution of the jitter as seen
by the client, when a playout buffer is used. The packet interarrival time is mod-
eled by a phase-type distribution that matches the first and second moments
of this measure obtained from actual network measurements. Packets are consumed
at a constant rate (PCM), similar to the example of Fig. 2. Silent periods
are included in the model. The goal is to study the tradeoffs between latency and
the probability of a positive jitter. It was concluded that the probability of a positive
jitter can be significantly reduced, while maintaining an acceptable latency for
real time traffic.
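The model in [22] fits a phase-type distribution to the first two moments of the measured interarrival times. The exact phase-type form used there is not given here. One common choice, which is an assumption in this sketch, is a two-phase hyperexponential with balanced means. It matches any mean together with any squared coefficient of variation c² ≥ 1. The sketch fits this distribution and evaluates P(X > t). With t = T, that is the probability of a positive jitter when no playout buffer is used. All function names are illustrative.

```python
import math


def fit_h2_balanced(mean, scv):
    """Balanced-means two-phase hyperexponential matching a mean and a
    squared coefficient of variation scv >= 1. Returns (p1, rate1, rate2)."""
    if scv < 1:
        raise ValueError("a hyperexponential fit needs scv >= 1")
    r = math.sqrt((scv - 1) / (scv + 1))
    p1 = (1 + r) / 2
    p2 = 1 - p1
    # Balanced means: each phase contributes half of the mean (p_i / rate_i = mean / 2).
    return p1, 2 * p1 / mean, 2 * p2 / mean


def h2_mean_scv(p1, rate1, rate2):
    """Mean and squared coefficient of variation of the fitted distribution."""
    p2 = 1 - p1
    m1 = p1 / rate1 + p2 / rate2
    m2 = 2 * p1 / rate1**2 + 2 * p2 / rate2**2
    return m1, m2 / m1**2 - 1


def prob_exceeds(p1, rate1, rate2, t):
    """P(X > t) for the fitted interarrival time X."""
    return p1 * math.exp(-rate1 * t) + (1 - p1) * math.exp(-rate2 * t)


params = fit_h2_balanced(mean=20.0, scv=4.0)   # e.g. 20 ms mean, bursty arrivals
print(h2_mean_scv(*params))                      # approximately (20.0, 4.0)
print(prob_exceeds(*params, t=20.0))             # P(J > 0) without a playout buffer
```

With scv = 1, the fit degenerates to an exponential distribution, which gives a simple sanity check of the formulas.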
In addition to the delay variability imposed by the network, compressed
audio/video streams exhibit non-negligible burstiness on several time scales, due
to the encoding schemes and scene variations. Sharp variations on traffic rates
have a negative impact on resource utilization. For instance, more bandwidth
may be needed to maintain the required QoS for the application. The issue
is to develop control algorithms to smooth the CM traffic before transmission to
the clients.
Smoothing techniques can be applied at the traffic source or at an intermediate
node (e.g., a proxy) on the path to the client. Sen et al. [65] address
the issue of online bandwidth smoothing. To better understand the problem,
consider Fig. 3, where it is assumed that the network imposes no variable delay
when a compressed video stream is sent to a client.
Due to the compression encoding, the rate of bit consumption at the client
node varies with time. Video servers, however, read fixed-size blocks of information
from the storage server (each block may be fragmented into packets to fit
the network maximum transfer unit (MTU) before transmission). Therefore, in Fig. 3,
the interval between the consumption of constant-size data blocks at the client
varies with time: the jumps along the y-axis (one block of data each) are of constant size.
This contrasts with the usual representation of variable-size frames consumed at
fixed intervals of time (e.g., 1/30 sec).
In Fig. 3, a controller at the server site schedules the transmission of video
blocks after they are retrieved from the storage server and queued in a FIFO
buffer. Two sets of curves are shown in the figure. In set 1, let lc(t) be the number
of bits consumed by the client by time t, and uc(t) = lc(t) + Bc, where Bc is
the size of the playout buffer. Similarly, let as(t) (in set 2) be the accumulated
number of bits that are read from the server disks during (0, t) according to the
demand of the client, and let ls(t) be the smoothed stream curve, i.e., the number
of bits effectively dispatched by t. Note that: (a) ac(t) = ls(t − τ), where τ is an
(assumed constant) network delay from the server to the client; (b) as(t − τ') =
lc(t), where τ' is the constant network delay plus the delay to fill the playout
buffer; (c) the jumps of lc(t) occur at the instants of consumption of a block of
data. If we assume that the playout buffer is filled to capacity before the
continuous stream is played back, the number of bits in the playout buffer is
given by the difference between the top and the middle curves in set 1.

[Fig. 3: A server smoothing algorithm feeding a client; number of packets versus time (0 to 40). Set 1 (client): uc(t), ac(t), lc(t). Set 2 (server): as(t), us(t), ls(t), with server buffer Bs and threshold H. Times t0, t1, and t'1 are marked.]
The server seeks to transmit data to the client as smoothly as possible; that is,
ls(t) in the figure should resemble a straight line whose slope (the transmission
rate) changes as little as possible. Since the shape of lc(t), and consequently of
uc(t) and as(t), is determined by the encoding algorithm applied to the video to be
transmitted, the issue is how to plan the transmission of the data so that
lc(t) ≤ ac(t) ≤ uc(t), and yet the maximum transmission rate is kept as close as
possible to the average consumption rate.
In [62], Salehi et al. obtained an efficient algorithm that generates a transmission
schedule given complete knowledge of lc(t). This is referred to as an
offline algorithm. Roughly, from an initial time $t_i$ (starting with i = 0), one
constructs the longest possible line that does not violate the constraints imposed
by uc(t) and lc(t) in Fig. 3. By construction, this straight line intersects
one of the boundary curves at a time point $t_{i+1} > t_i$ (so the rate would
have to be changed at this point), and touches one of these curves at a time
$t'_{i+1} < t_{i+1}$. To avoid sudden rate changes, the previous rate should be changed
as early as possible. Consequently, a new starting point is chosen at $t'_{i+1}$. The
process is repeated (setting i = i + 1) until the end of the stream is reached,
yielding ac(t), which determines the transmission schedule.
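The construction above can be sketched in discrete time. This is a simplified illustration, not the exact algorithm of [62]. It rests on these assumptions:

- network delay is ignored;
- time is divided into slots k = 0..n, with a given cumulative consumption curve as the lower bound lc;
- the client buffer has size B, so the upper bound is uc = lc + B;
- all data must be delivered by the last slot.

From the current starting point, the sketch extends a constant-rate line while some rate still fits between the two bounds. When none does, it restarts at the last slot where the line touched a boundary, which plays the role of $t'_{i+1}$ above. The names `smooth_schedule` and `cumulative` are hypothetical.

```python
from fractions import Fraction


def smooth_schedule(consumed, buffer_size):
    """Piecewise-constant-rate schedule S with consumed[k] <= S(k) <= consumed[k] + buffer_size.

    consumed[k]: cumulative data the client must have received by slot k (nondecreasing).
    Returns a list of (start_slot, end_slot, rate) segments.
    """
    n = len(consumed) - 1
    lower = [Fraction(c) for c in consumed]
    upper = [c + buffer_size for c in lower]
    upper[n] = lower[n]                      # the whole stream is delivered by slot n
    segments = []
    t, s = 0, lower[0]                       # current starting point (t_i, S(t_i))
    while t < n:
        lo = hi = None                       # feasible rate interval for a line from (t, s)
        lo_k = hi_k = t                      # slots where lo and hi were last tightened
        bend = None
        for k in range(t + 1, n + 1):
            lo_new = (lower[k] - s) / (k - t)    # minimum rate avoiding starvation at k
            hi_new = (upper[k] - s) / (k - t)    # maximum rate avoiding overflow at k
            if hi is not None and lo_new > hi:   # no rate stays under u(hi_k) and reaches l(k)
                bend = (hi_k, hi)
                break
            if lo is not None and hi_new < lo:   # no rate stays over l(lo_k) and under u(k)
                bend = (lo_k, lo)
                break
            if lo is None or lo_new >= lo:
                lo, lo_k = lo_new, k
            if hi is None or hi_new <= hi:
                hi, hi_k = hi_new, k
        if bend is None:                     # a single line reaches the end of the stream
            bend = (n, (lower[n] - s) / (n - t))
        end, rate = bend
        segments.append((t, end, rate))
        s += rate * (end - t)
        t = end
    return segments


def cumulative(segments, start=0):
    """Expand segments into the cumulative amount sent at each slot."""
    sent = [Fraction(start)]
    for t0, t1, rate in segments:
        for _ in range(t0, t1):
            sent.append(sent[-1] + rate)
    return sent


# A burst of 12 units must be consumed by slot 3, with a 4-unit client buffer.
plan = smooth_schedule([0, 0, 0, 12, 12, 12], buffer_size=4)
print([(a, b, int(r)) for a, b, r in plan])   # [(0, 2, 2), (2, 3, 8), (3, 5, 0)]
```

The peak rate of this plan is 8 units per slot. That is the minimum possible here: at most 4 units can be buffered by slot 2, so at least 8 units must be sent during the next slot. Exact `Fraction` arithmetic avoids spurious constraint violations caused by rounding.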
Set 2 in Fig. 3 shows the arrival and transmission curves at the server.
Note that the server starts its transmission as soon as a threshold H is reached
(in the figure, the threshold is equal to 5 blocks). The upper curve is us(t) = ls(t) + Bs,
where Bs is the FIFO buffer space available at the server. Note that, once ls(t)
is determined, Bs and the threshold can be calculated to avoid overflow and
underflow.
We can represent the curves uc(t), lc(t), and ac(t) as vectors u, l, and a,
respectively, each of dimension N, where N is the number of data blocks in the video
stream, and the i-th entry of one of the vectors, say vector l, is the amount of
time to consume the i-th block. In [62], majorization is used as a measure of
the smoothness of the curves. Roughly, if a vector x is majorized by y (x ≺ y), then
x represents a smoother curve than y. It is shown in [62] that if x ≺ y, then
$\mathrm{var}(x) \le \mathrm{var}(y)$, where $\mathrm{var}(x) = \sum_i (x_i - \bar{x})^2$;
and since, by definition, the maximum entry in x is less than or equal to the
maximum entry in y, a vector x that is majorized by y has a maximum rate and a
variance no larger than those of y. The scheduling algorithm outlined above is
shown in [62] to be optimal and unique. Furthermore, the optimal schedule minimizes
the effective bandwidth requirements.
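The majorization test and the variance measure defined above can be checked directly. The sketch assumes vectors of equal length with exact (integer or rational) entries, so that the equal-sum condition can be tested exactly. The example vectors are made up for illustration.

```python
def is_majorized(x, y):
    """True if x ≺ y: same total, and for every k the sum of the k largest
    entries of x is at most the sum of the k largest entries of y."""
    if len(x) != len(y) or sum(x) != sum(y):
        return False
    partial_x = partial_y = 0
    for a, b in zip(sorted(x, reverse=True), sorted(y, reverse=True)):
        partial_x += a
        partial_y += b
        if partial_x > partial_y:
            return False
    return True


def var(v):
    """var(v) as defined in the text: sum of squared deviations from the mean."""
    mean = sum(v) / len(v)
    return sum((e - mean) ** 2 for e in v)


smooth, bursty = [3, 3, 3, 3], [6, 2, 2, 2]
print(is_majorized(smooth, bursty), is_majorized(bursty, smooth))  # True False
print(max(smooth), var(smooth), max(bursty), var(bursty))          # 3 0.0 6 12.0
```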
This algorithm is the basis for the online smoothing problem [65]. It is assumed
that, at any time τ, the server knows the time to consume
each of the next P blocks. This is called the lookahead interval and is used to
compute the optimal smoothing schedule using the algorithm of [62]. Roughly,
blocks that are read from the storage server are delayed by w units and passed
to the server buffer that implements the smoothing algorithm, which is invoked
every α blocks, with 1 ≤ α ≤ w.
Further work on this topic includes [50], where it is shown that, given a buffer
space of size B at the server queue, a maximum delay jitter J can be achieved
by an offline algorithm (similar to the above algorithm). Furthermore, an online
algorithm can achieve a jitter J using buffer space 2B at the server FIFO
queue. While the smoothing techniques described above are suitable for video
applications, interactive visualization applications pose additional problems. This
subject was studied in [73].
In summary, the amount of buffer space in the playout buffer of Fig. 3 is a
key parameter that determines how smooth the transmission of a CM stream
can be. It also serves to reduce the delay variability introduced by the network.
The queue at the server site in Fig. 3 implements the smoothing algorithm, and
the amount of buffer it uses is also a design parameter. As mentioned above, jitter
reduction and smoothness are achieved at the expense of buffer space.
But the larger these buffers, the larger the latency to start playing the stream.
packet delays may be so large that packets arrive too late to be played at the
receiver in a real-time application.
A common method to recover from a packet loss is retransmission. However,
a retransmitted packet will arrive at the receiver at least one round-trip time
(RTT) later than the original copy, and this delay may be unacceptable for
real-time applications. Therefore, retransmission strategies are only useful if the RTT
between the client and the server is very small compared with the amount of time
needed to empty the playout buffer.
A number of retransmission strategies have been proposed in the context
of multicast streaming. For example, receiver-initiated recovery schemes may
obtain the lost data from neighboring nodes which potentially have short RTTs
with respect to the node that requested the packet [69].
Other methods to cope with packet loss in CM applications are error
concealment, error resilience, interleaving, and FEC (forward error correction).
Error resilience and error concealment are techniques closely coupled with the
compression scheme used. Briefly, error-resilience schemes attempt to limit the
error propagation due to the loss, for instance via re-synchronization. Error
propagation occurs when the decoder needs the information contained in one
frame to decode other frames. Error-concealment techniques attempt to reconstruct
the signal from the available information when part of it is lost. This is
possible if the signal exhibits short-term self-similarities. For instance, in voice
applications the decoder could simply re-play the last packet received when the
current packet is not available. Another approach is to interpolate neighboring
signal values. Several other error-concealment techniques exist (see [57,74] for
more details and references on the subject).
Interleaving is a useful technique for reducing the effect of loss bursts. The
basic idea is to separate adjacent packets of the original stream by a given
distance and to re-order the sequence at the receiver. This scheme introduces
no redundancy (and so consumes no extra bandwidth), but it introduces
latency to re-order the packets at the receiver.
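A minimal sketch of interleaving follows. It assumes a simple block interleaver in which packets are sent in the order 0, d, 2d, ..., then 1, 1 + d, ..., for an interleaving depth d. The function names are illustrative. A burst of consecutive losses on the network then removes packets that are far apart in the original sequence.

```python
def interleave_order(n, depth):
    """Sequence numbers in transmission order: 0, depth, 2*depth, ..., then 1, 1 + depth, ..."""
    return [seq for first in range(depth) for seq in range(first, n, depth)]


def reorder_at_receiver(received):
    """The receiver buffers the surviving packets and restores sequence order."""
    return sorted(received)


order = interleave_order(12, depth=4)
print(order)                  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
lost = order[3:6]             # a burst of three consecutive losses on the network
print(sorted(lost))           # [1, 5, 9]: isolated gaps in the original stream
survivors = order[:3] + order[6:]
print(reorder_at_receiver(survivors))  # [0, 2, 3, 4, 6, 7, 8, 10, 11]
```

The price is latency: before playing packets in order, the receiver may have to wait for most of an interleaving block to arrive.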
FEC techniques add sufficient redundancy to the CM stream so that the
received bit stream can be reconstructed at the receiver even when packet losses
occur. The main advantage of FEC schemes is the small delay to recover from
losses in comparison with recovery using retransmission. However, this advantage
comes at the expense of increasing the transmission rate. The issue is to develop
FEC techniques that can recover most of the common patterns of losses in the
path from the sender to the receiver without much increase in the bandwidth
requirements to transmit the continuous stream.
A number of FEC techniques have been proposed in the literature [57,74].
The simplest approach is as follows. The stream of packets is divided into groups
of size N − 1, and an XOR operation is performed on the N − 1 packets of each
group. The resulting "parity packet" is transmitted after each group. Clearly, if
a single packet is lost in a group of N packets (including the parity packet), the
loss can be recovered.
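The simple parity scheme can be sketched as follows. The sketch assumes that all packets in a group have the same length; shorter packets would have to be padded. The names `parity` and `recover` are illustrative.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(group):
    """Parity packet for a group of N - 1 data packets."""
    return reduce(xor_bytes, group)


def recover(received, parity_packet):
    """Rebuild the one missing packet (marked None) of a group from the rest and the parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("single-parity XOR repairs exactly one loss per group")
    present = [p for p in received if p is not None]
    repaired = list(received)
    repaired[missing[0]] = reduce(xor_bytes, present, parity_packet)
    return repaired


group = [b"abcd", b"efgh", b"ijkl"]            # N - 1 = 3 data packets
p = parity(group)
print(recover([b"abcd", None, b"ijkl"], p))     # [b'abcd', b'efgh', b'ijkl']
```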
Three issues are evident from this simple scheme. First, since a new packet
is generated for every N − 1 packets, the bandwidth requirement increases by
a factor of N/(N − 1).

[Fig. 4: XOR-based FEC schemes: (a) packets 1 to 14, with XOR parities computed over subsets of each window; (b) packets 1 to 12, protected by two overlapping schemes.]
per window and two subsets, and we call this a 2:6 class of algorithms. The result
of the XOR operation for a set of packets is sent piggybacked in a packet of the
next window. Clearly, codecs with a lower transmission rate can be used, as in [12],
to save bandwidth. Note that bursts of at most 2 packets
can be recovered, and bandwidth efficiency is gained at the expense of the latency
to recover from losses. Furthermore, the scheme in Fig. 4(a) has practically the
same overhead as the simple XOR scheme first described, but it can recover from
consecutive losses of size two. Another class of schemes can be obtained by merging
two distinct k:n classes of algorithms, such that all packets belong to at least
two different subsets and are therefore covered by two different XORs. Figure
4(b) illustrates an example where schemes 1:2 and 3:6 are overlapped. In this
case, all losses of size one in the larger window can be recovered. Furthermore,
complex loss patterns can also be protected. For example, if packets 2, 3, 4, and
5 were lost, they could all be recovered. The overhead of this mixed scheme is
simply the sum of the overheads of the individual schemes.
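The recovery behavior of these schemes can be checked with a small peeling decoder. The sketch rests on these assumptions, which are our reading of Fig. 4 and are not stated explicitly in this excerpt:

- a k:n scheme splits each window of n packets into k subsets by position modulo k and sends one XOR per subset;
- the parity packets themselves arrive;
- a subset with exactly one missing packet can rebuild it, and a rebuilt packet may then unlock other subsets.

The function names are illustrative.

```python
def kn_scheme(total, n, k):
    """XOR groups of a k:n scheme over packets 1..total: each window of n packets
    is split into k subsets by position modulo k."""
    groups = []
    for first in range(1, total + 1, n):
        last = min(first + n, total + 1)
        groups.extend(list(range(first + j, last, k)) for j in range(k))
    return groups


def unrecovered(lost, groups):
    """Peeling decoder: repeatedly repair any group that is missing exactly one packet."""
    missing = set(lost)
    progress = True
    while progress:
        progress = False
        for g in groups:
            gone = [p for p in g if p in missing]
            if len(gone) == 1:
                missing.discard(gone[0])
                progress = True
    return missing


print(unrecovered({3, 4}, kn_scheme(12, 6, 2)))     # set(): a burst of two is repaired
print(unrecovered({3, 4, 5}, kn_scheme(12, 6, 2)))  # {3, 5}: subset {1, 3, 5} lost two
mixed = kn_scheme(12, 2, 1) + kn_scheme(12, 6, 3)   # overlapping 1:2 and 3:6 schemes
print(unrecovered({2, 3, 4, 5}, mixed))             # set()
```

The last case reproduces the example in the text: the 1:2 groups repair packets 2 and 5, after which the 3:6 groups repair packets 3 and 4.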
In [29], the algorithm is applied to real data obtained from measurements, and
the efficiency of different FEC-based algorithms is evaluated. The conclusions
show the importance of adapting to different network conditions. (This issue
was also addressed by Bolot et al. [11], in the context of the scheme of [12].)
Furthermore, in all the tests performed, the class of schemes that mixed two
windows provided better results than the class with a single window under the
same overhead.
Altman et al. developed an analytical model to analyze the FEC scheme of [12]
and other related schemes. The loss process was modeled using a simple M/M/K
queue. The aim is to assess the tradeoffs between increased loss protection from
the FEC algorithm and the adverse impact from the resulting increase in the
network resources usage due to the redundancy added to the original stream.
The results show that the scheme studied may not always result in performance
gains, in particular if a non-negligible fraction of the flows implements the same
FEC scheme. It would be interesting to evaluate the performance of other FEC
approaches.
To conclude the subsection, we refer to a recently proposed approach to mitigate
the effect of losses. As is evident from the above, short-term loss correlations have
an adverse effect on the efficiency of the recovery algorithms. One way to reduce
these correlations is to split the continuous stream sent from a source to a
given destination across distinct paths. For instance, we could split a video stream
in two and send the even packets in the sequence via one path and the odd
packets via another path to the destination. This is called path diversity [76].
In [6], it is assumed that the loss characteristics of a path can be represented
by a 2-state Markov chain (Gilbert model), and a Markov model was developed
to assess the advantages of the approach. Clearly, a number of tradeoffs exist,
depending on the loss characteristics of each path, whether or not the different
paths share a set of links, etc.
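To get a feel for this tradeoff, the sketch below simulates a Gilbert loss channel, whereas [6] solves a Markov model analytically. In the channel, the state moves from Good to Bad with probability p and from Bad to Good with probability q, and packets sent in the Bad state are lost. The sketch compares a single path with an even/odd split over two paths. The independence of the two paths and the parameter values are assumptions made for illustration.

```python
import random


def gilbert_losses(n, p, q, rng):
    """n packets through a 2-state Gilbert channel; True marks a lost packet."""
    bad = rng.random() < p / (p + q)        # start from the stationary distribution
    lost = []
    for _ in range(n):
        lost.append(bad)
        bad = (rng.random() >= q) if bad else (rng.random() < p)
    return lost


def split_over_two_paths(n, p, q, rng):
    """Even-numbered packets take path A, odd-numbered packets take path B."""
    path_a = gilbert_losses((n + 1) // 2, p, q, rng)
    path_b = gilbert_losses(n // 2, p, q, rng)
    return [path_a[i // 2] if i % 2 == 0 else path_b[i // 2] for i in range(n)]


def mean_burst_length(lost):
    """Average length of runs of consecutive losses."""
    bursts, run = [], 0
    for is_lost in lost + [False]:          # sentinel closes a trailing burst
        if is_lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    return sum(bursts) / len(bursts) if bursts else 0.0


rng = random.Random(1)
single = gilbert_losses(200_000, p=0.05, q=0.5, rng=rng)
split = split_over_two_paths(200_000, p=0.05, q=0.5, rng=rng)
for name, trace in (("single path", single), ("two paths", split)):
    print(name, sum(trace) / len(trace), mean_burst_length(trace))
```

Both traces should lose about p/(p + q) ≈ 9% of the packets. The single path averages about 1/q = 2 packets per burst, while the split trace should show noticeably shorter bursts, which is what makes single-loss repair schemes such as XOR parity more effective.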
network bandwidth, can be quickly exhausted. Consider, for instance,
scenarios 1 and 2 discussed in Sec. 2. To keep 1500 concurrent streams live,
with a separate bandwidth allocated to each of them, it is necessary to sustain
a total bandwidth of 2.25 Gbps at the output channel of the multimedia server.
While technically feasible, this is quite expensive today and prohibitive from
a commercial point of view.
A common approach for dealing with this problem is to allow several clients to
share a common stream. This is accomplished through mechanisms, here called
resource sharing techniques, which allow the clients to share streams and buffers.
The goal is to reduce the demand for network bandwidth, disk bandwidth, and
storage space. While providing the capability of stream sharing, these techniques
must also provide QoS to the clients.
As discussed in Sec. 3, client QoS is affected by the server characteristics, such as
latency and available disk bandwidth, and by the network characteristics, such as
bandwidth, loss, jitter, and end-to-end delay. The challenge is to provide very
short startup latency and low jitter for all client requests and to be able to serve a
large number of users at minimum cost.
The bandwidth sharing mechanisms proposed in the literature fall into two
categories: client request oriented and periodic broadcast. Client request oriented
techniques are based on the transmission, by the server, of a CM stream in
response to multiple requests for its data blocks from the client. Periodic broad-
cast techniques are based on the periodic transmission by the server of the data
blocks.
Besides these sharing mechanisms, one can also use proxy servers to reduce
network load. Proxy servers are an orthogonal technique to bandwidth sharing
protocols, but one which is quite popular because it can be easily implemented
and can be managed at low costs.
We divide our presentation into three topics. We first describe client-request-oriented
techniques. Then, periodic broadcast mechanisms are presented, followed
by a discussion of proxy-based strategies.
The simplest approach for allowing the sharing of bandwidth is to batch new
clients together whenever possible. This is called batching [2,20,21] and works as
follows. Upon the request of a new media stream si by an arriving client ck, a
batching window is initiated. Every new client that arrives within the bounds of
this window and requests the stream si is inserted in a waiting queue, i.e., it is
batched together with the client ck. When the window expires, a single transmission
of the media stream si is initiated. This transmission is shared by all
clients, as in standard broadcast television. Batching policies reduce bandwidth
requirements at the expense of introducing an additional delay to the users (i.e.,
client startup latency increases).
Stream tapping [15], patching [13,40], and controlled multicast [32] were introduced
to avoid the latency problems of batching. They are very similar techniques.
They can provide immediate service to the clients, while allowing clients
arriving at different instants to share a common stream.
In the basic patching scheme [40], the server maintains a queue with all pending
requests. Whenever a server channel becomes available, the server admits at once
all the clients that requested a given video. These clients compose a new
batch. Assume that this new batch of clients requested a CM stream si that
is already being served. Then, all clients in this new batch immediately join this
ongoing multicast transmission of si and start buffering the arriving data. To
obtain the initial part of the stream si, which is called a patch because it is no
longer being multicast, a new channel is opened with the multimedia server.
Data arriving through this secondary channel is immediately displayed. Once the
initial part of si (i.e., the patch) has been displayed, the client starts consuming
data from its internal buffer. Thus, in this approach, the clients are responsible
for maintaining enough buffer space to allow merging the patch portion of the
stream with its main part. They also have to be able to receive data on two
channels.
Stream tapping, optimal patching [13], and controlled multicast differ from
the basic patching scheme in the following way: they define an optimal patching
window wi for each CM stream si. This window is the minimum interval between
the initial instants of two successive complete transmissions of the same stream
si. The choice of wi affects the performance of patching. If wi is set too large,
most of the server channels are used to send patches. On the other hand, if wi is
too small, no stream merging will occur. The patching window size is optimal if
it minimizes the server and network bandwidth requirements. The algorithm
works as follows. Clients that request the stream si prefetch data from an
ongoing multicast transmission if they arrive within wi units of time from the
beginning of the previous complete transmission. Otherwise, a new multicast
transmission of stream si is initiated. A mathematical model that captures the
relation between the patching window size and the required server bandwidth
is proposed in [13]. In [32], an expression for the optimal patching window is
obtained.
Figure 5 illustrates batching and patching techniques. In Fig. 5(a), three new
clients requesting the stream si arrive within the batching window. They are
served by the same multicast transmission of si . Figure 5(b) shows the patching
mechanism. We assume that the three requests arrive within a patching window.
The request r0 triggers the initial multicast transmission of stream si , r1 triggers
the transmission of the patch interval (t1 − t0 ) of si for r1 , and r2 starts a
transmission of the (t2 − t0 ) missing interval of si for r2 .
A study of the bandwidth required by optimal patching, stream tapping, and
controlled multicast is presented in [27]. It is assumed that the arrivals of client
requests are Poisson with mean rate equal to λi for stream si. The required
server bandwidth for delivery of stream si is given by [27] as
$B_{OP,ST,CM} = (1 + w_i^2 N_i/2)/(w_i + 1/N_i)$, where $N_i = \lambda_i T_i$,
$T_i$ is the total length of stream si, and $w_i$ is the patching window.
[Fig. 5: (a) Batching technique: requests r0, r1, and r2 arriving within the batching window share one multicast of si. (b) Patching technique: r0 at t0 triggers the multicast of si, while r1 (at t1) and r2 (at t2) receive patch multicasts within the patching window.]
The expression presented above is very similar to the results obtained in [13,32].
The value of the optimal patching window can be obtained by differentiating the
expression for $B_{OP,ST,CM}$; it is equal to $(\sqrt{2N_i + 1} - 1)/N_i$. The server
bandwidth for an optimal patching window is given by [32]:
$B_{\text{optimal window}} = \sqrt{2N_i + 1} - 1$.
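The closed-form optimum can be checked numerically. In this sketch, the patching window w is measured as a fraction of the stream length Ti, which is what makes the formula dimensionless (our reading). Bandwidth is expressed in units of the stream playback rate. The function names are illustrative.

```python
import math


def patching_bandwidth(w, n):
    """B_{OP,ST,CM} = (1 + w^2 N / 2) / (w + 1/N), with N = lambda_i * T_i."""
    return (1 + w * w * n / 2) / (w + 1 / n)


def optimal_window(n):
    """Positive root of dB/dw = 0, i.e. of N w^2 / 2 + w - 1 = 0."""
    return (math.sqrt(2 * n + 1) - 1) / n


n = 100                                  # e.g. 100 requests per stream length
w_star = optimal_window(n)
print(w_star, patching_bandwidth(w_star, n), math.sqrt(2 * n + 1) - 1)
# A coarse grid search agrees with the closed form.
best = min((patching_bandwidth(i / 10_000, n), i / 10_000) for i in range(1, 10_001))
print(best)
```

With N = 100, the optimal window is about 13% of the stream length, and the server needs about 13.2 playback-rate channels. For comparison, one unicast stream per client would need about N = 100 concurrent streams on average (by Little's law).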
A second approach to reduce server and network bandwidth requirements,
called Piggybacking, was introduced in [4,35,48]. The idea is to change dynamically
the display rates of ongoing stream transmissions to allow one stream to
catch up and merge with the other. Suppose that stream si is currently being
transmitted to a client. If a new request for si arrives, then a new transmission
of si is started. At this point in time, the server slows down the data rate of
the first transmission and speeds up the data rate of the second transmission of
si. As soon as the two transmissions reach the same position in the stream, they
can be merged and one of the two channels can be released. One limitation of this
technique is that it requires specialized hardware to support changing the channel
speed dynamically.
In the hierarchical stream merging (HSM) techniques [7,25,27], clients that
request the same stream are hierarchically merged into groups. The client
simultaneously receives two streams: the one triggered by its own request and a
second stream that was initiated by an earlier request from another client. With time,
the client is able to join the latter on-going multicast transmission and the de-
livery of the stream initiated by it can be aborted. The merged clients also start
listening on the next most recently initiated stream. Figure 6 shows a scenario
where four client requests arrive for stream s1 during the interval (t1 , t3 ). The
server initiates a new multicast transmission of s1 for each new client. At time
t2 , client c2 , who is listening to the stream for c1 , can join this on-going multicast
transmission. At time t3 , client c4 joins the transmission of c3 . Thus, after t3 ,
client c4 can listen to the multicast transmission initiated at t1 . At t5 , client c4
will be able to join the transmission started for client c1 . Bandwidth skimming
(BS) [26] is similar to HSM. In this technique, policies are defined to reduce the
user bandwidth requirements to less than twice the stream playback rate.
In [27], an expression for the required bandwidth of a server operating with
the HSM and BS techniques was obtained. Suppose that the transmission rate
needed by a client is equal to b units of the media playback rate (b = 2 for HSM
and b < 2 for bandwidth skimming) and that the request arrivals are Poisson
with mean rate equal to λi for stream si. The required server bandwidth for
delivery of stream si can be approximated by $B_{HSM,BS} \approx \eta_b \ln(N_i/\eta_b + 1)$,
where Ni is defined as above and ηb is the positive real constant that satisfies
$\eta_b [1 - (\eta_b/(\eta_b + 1))^b] = 1$.
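The constant ηb has no closed form in general. For b = 2, the equation reduces to ηb² − ηb − 1 = 0, so ηb is the golden ratio, about 1.618. The sketch below finds ηb by bisection and evaluates the approximation for B_HSM,BS. It assumes b > 1 so that a positive root exists. The names are illustrative.

```python
import math


def eta(b, tol=1e-12):
    """Positive root of eta * (1 - (eta / (eta + 1)) ** b) = 1, found by bisection (b > 1)."""
    if b <= 1:
        raise ValueError("b must be greater than 1")

    def g(e):
        return e * (1 - (e / (e + 1)) ** b) - 1

    lo, hi = 1e-9, 1.0
    while g(hi) < 0:                 # widen the bracket until the sign changes
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hsm_bandwidth(n, b=2.0):
    """Approximate server bandwidth eta_b * ln(N / eta_b + 1), with N = lambda_i * T_i."""
    e = eta(b)
    return e * math.log(n / e + 1)


print(eta(2.0), (1 + math.sqrt(5)) / 2)   # both about 1.618
print(hsm_bandwidth(100, b=2.0))           # about 6.7 playback-rate channels for N = 100
```

For N = 100, this is about half the optimal-patching bandwidth √(2N + 1) − 1 ≈ 13.2. Lowering b (bandwidth skimming) increases ηb and hence the server bandwidth, in exchange for a lower client receive rate.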
[Fig. 6: (a) Hierarchical stream merging: clients c1 to c4 arriving between t1 and t5 (c2 joins c1, c4 joins c3, c3 joins c1, c4 joins c1). (b) Skyscraper broadcast across several channels.]
Most of the studies in the literature have evaluated the required server band-
width for the proposed sharing techniques. One key question is how the perfor-
mance of a multimedia system is affected when the server bandwidth is limited
by the equations previously presented. In a recent work [68], analytical models
for HSM, BS, and patching techniques are proposed to evaluate two performance
metrics of a multimedia system: the mean time a client request is delayed if the
server is overloaded (called the mean client waiting time) and the fraction
of clients who renege if they are not served with low delay (called the balking
rate). The model assumptions are: client request arrivals are Poisson, each
client requests the entire media stream, and all media streams have the same
length and require the same playout rate. The model proposed to evaluate the
balking rate is a closed two-center queueing network with C clients (C is the
server bandwidth capacity in number of channels). The mean client waiting time
is obtained from a two-center queueing network model with K users (there is
one user per class, and each class represents one stream). Each user models the
first request for a stream si; the other requests that batch with the first are
not represented in the model. Results obtained from the analytical models show
that the client balking rate may be high and the mean client waiting time is
low when the required server bandwidth is defined as in the equations presented
above. Furthermore the two performance parameters are very sensitive to the
increase in the client load.
The idea behind periodic broadcast techniques is that each stream is divided into
segments that can then be simultaneously broadcast periodically on a set of k
different channels. A channel c1 delivers only the first segment of a given stream,
and the other (k − 1) channels deliver the remainder of the stream. When a client
wants to watch a video, it must wait for the beginning of the first segment on
channel c1. A client has a schedule for tuning into each of the (k − 1) channels to
receive the remaining segments of the video. The broadcasting schemes can be
classified into three categories [39]. The first group of periodic broadcast techniques
divides the stream into segments of increasing size and transmits them on
channels of the same bandwidth. Smaller segments are broadcast more frequently
than larger segments, and the segments follow a size progression (l1, l2, ..., ln). In
the Pyramid Broadcasting (PB) protocol [70], the sizes of the segments follow
a geometric progression, and one channel is used to transmit different streams.
The transmission rate of the channels is high enough to provide on-time delivery
of the stream. Thus, client bandwidth and storage requirements are also high.
To address the problem of high resource requirements at the client side, a
technique called Permutation-based Pyramid Broadcasting (PPB) was proposed
in [3]. The idea is to multiplex a channel into k subchannels of lower rate. In the
Skyscraper broadcast technique [41], each segment is continuously transmitted at
the video playback rate on one channel, as shown in Fig. 6. The series of segment
sizes is 1, 2, 2, 5, 5, 12, 12, 25, 25, 52, 52, ..., capped at a largest segment size W.
Figure 6 shows two client request arrival times: one just prior to the third segment
and the other before the 18th segment broadcast on channel 1. The transmission
schedules of both clients are represented by the gray shaded segments. The
schedule is such that a client can play out the stream continuously while receiving
data on no more than two channels. The required maximum client buffer space
is equal to the largest segment size.
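The segment-size series quoted above can be generated by a simple recurrence. The rule below reproduces the listed values 1, 2, 2, 5, 5, 12, 12, 25, 25, 52, 52 and caps every size at W. We believe it matches the rule given in [41], but here it is checked only against those listed values.

```python
def skyscraper_series(count, width):
    """First `count` segment sizes, in units of the first segment, each capped at `width` (W)."""
    sizes, prev = [], 0
    for n in range(1, count + 1):
        if n == 1:
            size = 1
        elif n in (2, 3):
            size = 2
        elif n % 4 == 0:
            size = 2 * prev + 1
        elif n % 4 == 2:
            size = 2 * prev + 2
        else:                        # n % 4 == 1 or 3: repeat the previous size
            size = prev
        prev = size
        sizes.append(min(size, width))
    return sizes


print(skyscraper_series(11, width=10**9))  # [1, 2, 2, 5, 5, 12, 12, 25, 25, 52, 52]
print(skyscraper_series(11, width=12))     # [1, 2, 2, 5, 5, 12, 12, 12, 12, 12, 12]
```

Because the first segment has size 1 and is repeated continuously on channel 1, a client waits at most one unit segment before playback starts. The cap W bounds the client buffer, consistent with the text.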
For all techniques described above, the required server bandwidth is equal
to the number of channels and is independent of the client request arrival rate.
Therefore, these periodic broadcast techniques are very bandwidth efficient when
the client request arrival rate is high. A dynamic skyscraper technique was proposed
in [24] to improve the performance of skyscraper broadcasting. It considered the
dynamic popularity of the videos and assumed lower client arrival rates. It dynamically
changes the video that is broadcast on the channels. A set of segments
of a video is delivered in transmission clusters. Each cluster starts every W
slots on channel 1 and broadcasts a different video according to the client requests.
Client requests are scheduled to the next available transmission cluster
using a FIFO discipline, if no transmission cluster has already been assigned
to the requested video. A new segment size progression is proposed in [27] to
provide immediate service to clients. Server bandwidth requirements for transmitting
a stream si with Poisson arrivals of mean rate λi are given by
[27]: $B_{\text{Dyn Sky}} = 2U\lambda_i + (K - 2)/(1 + 1/(\lambda_i W U))$, where U is the duration of a
unit segment, W is the largest segment size, and K is the number of segments in
the segment size progression.
Another group, the harmonic broadcast techniques, divides the video into equal-sized
segments and transmits them on channels of decreasing bandwidth. The
third group combines the approaches described above: these are hybrid schemes
of pyramid and harmonic broadcasting.
determine which video and what percentage of it has to be cached at the proxy.
The first stores hot (i.e., popular) videos entirely at the proxy. The second
stores a portion of a video so as to minimize the bandwidth requirements on the
server-proxy path. Results show that the second heuristic performs better than
the first.
Another approach is presented in [59]. A mechanism for caching video layers
is used in conjunction with a congestion control and a quality adaptation mecha-
nism. The number of video layers cached at the proxy is based on the popularity
of the video. The more popular a video is, the more layers are stored in the
proxy. Enhancement layers of cached streams are added according to a quality
adaptation mechanism [60]. One limitation of this approach is that it requires
the implementation of a congestion control and of a quality adaptation mecha-
nism in all the transmissions between clients and proxies and between proxies
and servers.
Partial Caching of the Video File. The scheme proposed in [66] is based on
the storage of the initial frames of a CM stream in a proxy cache. It is called
proxy prefix caching. It was motivated by the observation that the performance
of CM applications can be poor due to the delay, throughput and loss character-
istics of the Internet. As presented in Sec. 3 the use of buffers can reduce network
bandwidth requirements and allow the application to tolerate larger variations
in the network delay. However, the buffer size is limited by the maximum startup
latency a client can tolerate. Proxy prefix caching reduces client startup latency, especially when buffering techniques are used. The scheme works as follows. When a client requests a stream, the proxy immediately delivers the prefix to the client and asks the server to initiate the transmission of the remaining frames of the stream. The proxy uses two buffers during the transmission of stream $s_i$: the prefix buffer $B_p$ and a temporary buffer $B_t$. Initially, frames are delivered from $B_p$ while frames coming from the server are stored in $B_t$.
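A toy model shows why the prefix hides server latency. The sketch below assumes a constant server-to-proxy delay $d$, a server that streams at the playback rate, and zero proxy-to-client delay; none of these assumptions comes from [66], and the function name is hypothetical. Under them, caching a prefix of $P$ frames cuts the startup delay from $d$ to $\max(0, d - P)$ frame times.

```python
def min_startup_delay(prefix_frames: int, server_delay: float, total_frames: int) -> float:
    """Smallest startup delay (in frame times) that avoids playout stalls.

    Hypothetical model: frames [0, prefix_frames) sit in the prefix buffer B_p and
    are available at once; the suffix is requested at time 0, its first frame
    reaches the temporary buffer B_t after `server_delay` frame times, and one
    more frame arrives per frame time. Proxy-to-client delay is ignored.
    """
    worst = 0.0
    for i in range(total_frames):
        if i < prefix_frames:
            arrival = 0.0                                   # served from B_p
        else:
            arrival = server_delay + (i - prefix_frames)    # arrives via B_t
        # frame i is played at (startup + i), so it must have arrived by then
        worst = max(worst, arrival - i)
    return worst

for P in (0, 3, 10):
    print(P, min_startup_delay(P, server_delay=5, total_frames=100))
```

The loop form makes the role of each buffer explicit; the closed form $\max(0, d-P)$ follows because every suffix frame has the same slack.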
Caching with Bandwidth Sharing Techniques. When a proxy server is used in conjunction with scalable delivery mechanisms, several issues have to be addressed. The data to be stored at each proxy depends on the relative cost of streaming a video from the server and from the proxy, the number of proxies, the client arrival rate and the path from the server to the proxy (unicast or multicast enabled).
Most of the studies in the literature [23,58,16,36,71,5] define a system cost function which depends on the fraction of the stream stored at the proxy ($w_i$), the bandwidth required for a stream ($b_i$), the client arrival rate ($\lambda_i$) and the length of the stream ($T_i$). The cost for delivering a stream $s_i$ is given by
$$C_i(w_i, b_i, \lambda_i, T_i) = B_{\mathrm{server}}(w_i, b_i, \lambda_i, T_i) + B_{\mathrm{proxy}}(w_i, b_i, \lambda_i, T_i),$$
where $B_{\mathrm{server}}$ is the cost of the bandwidth required on the server-proxy path and $B_{\mathrm{proxy}}$ is the cost of the bandwidth required on the proxy-client path. An optimization problem is then formulated: the goal is to minimize the transmission costs subject to bounds on the total storage and/or bandwidth available at the proxy. Its solution gives the proxy cache allocation that minimizes the aggregate transmission cost.
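The optimization can be sketched as a knapsack-style dynamic program over discrete storage units. The linear cost model below (server-proxy cost proportional to the uncached fraction $1 - w_i$, proxy-client cost independent of $w_i$) and all names and weights are illustrative assumptions, not the cost functions of the cited studies; the dynamic program itself only needs each stream's cost as a function of the storage it receives.

```python
import math

def stream_cost_table(lam: float, b: float, T: float, unit: float,
                      c_server: float = 1.0, c_proxy: float = 0.1) -> list[float]:
    """Hypothetical cost C_i(w) = lam*b*T*(c_server*(1 - w) + c_proxy) for a
    stream whose cached fraction w uses k storage units, k = 0..ceil(b*T/unit)."""
    size = b * T                                   # storage needed for the whole stream
    table = []
    for k in range(math.ceil(size / unit) + 1):
        w = min(1.0, k * unit / size)              # fraction of the stream held at the proxy
        table.append(lam * b * T * (c_server * (1 - w) + c_proxy))
    return table

def allocate_cache(tables: list[list[float]], capacity: int) -> tuple[float, list[int]]:
    """Minimize the sum of per-stream costs using at most `capacity` storage
    units in total (multiple-choice knapsack solved by dynamic programming)."""
    INF = float("inf")
    best = [0.0] * (capacity + 1)    # best[s]: min cost of streams seen so far with <= s units
    picks = []                       # picks[i][s]: units given to stream i in that optimum
    for table in tables:
        new, pick = [INF] * (capacity + 1), [0] * (capacity + 1)
        for s in range(capacity + 1):
            for k in range(min(s, len(table) - 1) + 1):
                cand = best[s - k] + table[k]
                if cand < new[s]:
                    new[s], pick[s] = cand, k
        best = new
        picks.append(pick)
    alloc, s = [0] * len(tables), capacity      # walk back through the stored choices
    for i in range(len(tables) - 1, -1, -1):
        alloc[i] = picks[i][s]
        s -= alloc[i]
    return best[capacity], alloc

popular = stream_cost_table(lam=10.0, b=1.0, T=4.0, unit=1.0)
unpopular = stream_cost_table(lam=1.0, b=1.0, T=4.0, unit=1.0)
print(allocate_cache([popular, unpopular], capacity=4))  # all 4 units go to the popular stream
```

With a linear model a greedy rule would suffice; the dynamic program is kept because it also handles non-linear per-stream cost curves.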
The work of [71] combines proxy prefix caching with client request oriented
techniques for video delivery between the proxy and the client. It is assumed that
the transmission between the server and the proxy is unicast and the network
paths from the proxy to the clients are either multicast/broadcast or unicast.
Two scenarios are evaluated: (a) the proxy-client path is unicast and (b) the
proxy-client path is multicast. For scenario (a) two transmission strategies are proposed. In the first, a batching technique is used to group the client request arrivals within a window $w_{p_i}$ (equal to the length of the prefix stored at the proxy). Each group of clients is served from the same unicast transmission from the server to the proxy. The second is an improvement of the first. It is similar to the patching technique used in the context of unicast. If a client request arrives at time $t$ after the end of $w_{p_i}$, the proxy schedules a patch for the transmission of the missing part from the server. The remaining $T_i - t$ frames of the stream ($T_i$ is the length of stream $s_i$) are delivered from the on-going transmission of stream $s_i$. The client will receive data from at most two channels: the patch channel and the on-going transmission channel. For scenario (b), two transmission schemes are presented: the multicast patching technique [13] implemented at the proxy and multicast merging, which is similar to the stream merging technique [25]. A dynamic programming algorithm is used to solve the optimization problem. Results show that the transmission costs when a prefix cache is used are lower than when the entire stream is cached, and that significant transmission savings can be obtained with a small proxy size.
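The two unicast strategies of scenario (a) can be contrasted by counting server-to-proxy traffic for one stream. This is a simplified sketch, not the exact policies of [71]: arrival times are given up front, the window $w$ (the prefix length) and the stream length $T$ share one time unit, $t$ is measured from the start of the latest full transmission (which is how the $T_i - t$ remaining frames work out), and the `threshold` for starting a new full transmission is a hypothetical parameter.

```python
from typing import Optional

def batching_server_traffic(arrivals: list[float], w: float, T: float) -> float:
    """Suffix batching: requests arriving within w of a batch's first request
    share one server-to-proxy unicast transmission of the suffix (length T - w)."""
    traffic, batch_start = 0.0, None
    for a in sorted(arrivals):
        if batch_start is None or a - batch_start >= w:
            batch_start = a          # this request opens a new batch
            traffic += T - w         # one suffix transmission for the whole batch
    return traffic

def patching_server_traffic(arrivals: list[float], w: float, T: float,
                            threshold: Optional[float] = None) -> float:
    """Patching at the proxy: a request arriving t after the latest full
    transmission started gets frames [0, w) from the prefix, frames [w, t) as a
    server patch of length max(0, t - w), and frames [t, T) from the on-going
    transmission. A new full transmission starts once t >= threshold."""
    if threshold is None:
        threshold = T                # hypothetical default: patch while the full stream runs
    traffic, full_start = 0.0, None
    for a in sorted(arrivals):
        if full_start is None or a - full_start >= threshold:
            full_start = a
            traffic += T - w         # new full suffix transmission
        else:
            traffic += max(0.0, (a - full_start) - w)  # patch for frames [w, t)
    return traffic

arrivals = [0.0, 1.0, 5.0, 12.0]
print(batching_server_traffic(arrivals, w=2.0, T=10.0),
      patching_server_traffic(arrivals, w=2.0, T=10.0))  # 24.0 19.0
```

In this example the request at time 5 costs a 3-unit patch instead of a fresh 8-unit suffix, which is where patching gains over plain batching.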
The work in [36] studies the use of proxy prefix caching with periodic broad-
cast techniques. The authors propose the use of patching to deliver the prefix
from the proxy to the client and periodic broadcast to deliver the remaining
frames (the suffix) from the server to the client. Clients will temporarily receive
both the prefix from the proxy and the suffix from the server. Therefore, the
number of channels a client needs is the sum of the channels to obtain the suffix
and the prefix. A slight modification in periodic broadcast and patching is intro-
duced such that the maximum number of simultaneous channels required by a
client is equal to two. Proxy buffer allocation is based on a three-step algorithm
aimed at minimizing server bandwidth in the path from the server to the proxy.
Results show that the optimal buffer allocation algorithm outperforms a scheme
where the proxy buffer is evenly divided among the streams without considering
the length of each stream.
In [5] the following scenarios are considered: (a) the bandwidth skimming
protocol is used in the server-proxy and proxy-client paths, (b) the server-proxy
path is unicast capable and the bandwidth skimming technique is used in the
proxy-client path, and (c) scenarios (a) and (b) combined with proxy prefix caching. In scenario (a) the proxy can store an arbitrary fraction of each stream $s_i$. Streams are merged at the proxy and at the server using the closest target bandwidth skimming protocol [25]. In scenario (b) the server-proxy path is unicast, thus only streams requested from the same proxy can be merged at the server. Results obtained over a large set of system configuration parameters show that the use of proxy servers is cost effective in the following cases: the server-proxy path is not multicast enabled, the client arrival rate is low, or the cost to deliver a stream from the proxy to the client is very small compared to the cost to deliver a stream from the server to the client.
5 Conclusions
In this chapter we have surveyed several performance issues related to the design
of real time voice and video applications (such as voice transmission tools and
multimedia video servers). These include issues from continuous media retrieval
to transmission. Since the topic is too broad to be covered in one chapter, we trade depth of exposition for breadth, in order to cover a wide range of inter-related problems.
As can be seen in the material covered, an important aspect in the design of
multimedia servers is the storage strategy. We favor the use of the random I/O
technique due to its simplicity of implementation and comparable performance
with respect to other schemes. This technique is particularly attractive when
different types of data are placed in the server, for instance a mixture of voice, video, transparencies, photos, etc. Furthermore, the same technique can be easily employed in proxies. To evaluate the performance of the technique, queueing models constructed from real traffic stream traces can be used. This makes clear the importance of accurate traffic models to feed the overall server model.
A multimedia server should try to send the requested streams as smoothly as possible (or as close as possible to CBR traffic) to minimize the impact of sudden rate changes on network resources. Large buffers at the receiver imply
better smoothing, but at the expense of increasing latency to start displaying
a stream. The receiver playout buffer is also used to reduce the packet delay
variability imposed by the network and to help in the recovery process when
a packet loss occurs. We have surveyed a few packet recovery techniques, and
presented the main tradeoffs such as error correction capability and increase in
the transmission rate, efficiency versus latency, etc. Modeling the loss process
is an important problem and many issues remain open. Although some of the
conclusions in the chapter were drawn based on the study of voice traffic, the issues are no different for video traffic.
Due to the high speed of modern disk systems, the bottleneck in delivering continuous media streams to clients is presently mainly in the local network to which the server is attached, and not at the storage server. Therefore, an issue that has drawn attention in recent years is the development of algorithms to conserve bandwidth when a large number of clients submit requests to the server. Since multicast is still far from being widely deployed, we favor schemes
that use unicast transmission from the storage server to proxy servers. Between
the proxy and the clients multicast is more likely to be feasible, and therefore
multicast-based techniques to reduce bandwidth requirements are most likely to
be useful in the path from the proxy to the clients.
References
1. A. Adas. Traffic Models in Broadband Networks. IEEE Communications Magazine,
(7):82–89, 1997.
2. C. C. Aggarwal, J. L. Wolf, and P. S. Wu. On optimal batching policies for video-
on-demand storage server. In Proc. of the IEEE Conf. on Multimedia Systems,
1996.
3. C. C. Aggarwal, J. L. Wolf, and P. S. Wu. A permutation-based pyramid broad-
casting scheme for video-on-demand systems. In Proc. of the IEEE Conf. on Mul-
timedia Systems, 1996.
4. C.C. Aggarwal, J.L. Wolf, and P.S. Wu. On optimal piggyback merging policies.
In Proc. ACM Sigmetrics’96, pages 200–209, May 1996.
5. J. Almeida, D. Eager, M. Ferris, and M. Vernon. Provisioning content distribution
networks for streaming media. In Proc. of IEEE/Infocom’02, June 2002.
6. J. Apostolopoulos, T. Wong, W. Tan, and S. Wee. On multiple description stream-
ing with content delivery networks. In Proc. of IEEE/Infocom’02, NY, June 2002.
7. A. Bar-Noy, G. Goshi, R. E. Ladner, and K. Tam. Comparison of stream merging
algorithms for media-on-demand. In Proc. MMCN’02, January 2002.
8. S. Berson, R.Muntz, S. Ghandeharizadeh, and X. Ju. Staggered striping in multi-
media information systems. In ACM SIGMOD Conference, 1994.
9. W. Bolosky, J.S. Barrera, R. Draves, R. Fitzgerald, G. Gibson, M. Jones, S. Levi,
N. Myhrvold, and R. Rashid. The Tiger video fileserver. In Proc. NOSSDAV’96.
1996.
10. J-C. Bolot. Characterizing end-to-end packet delay and loss in the Internet. In
Proc. ACM Sigcomm’93, pages 289–298, September 1993.
11. J-C. Bolot, S. Fosse-Parisis, and D. Towsley. Adaptive FEC-based error control
for Internet telephony. In Proc. of IEEE/Infocom’99, pages 1453–1460, 1999.
12. J-C. Bolot and A. Vega-Garcı́a. The case for FEC-based error control for packet
audio in the Internet. ACM Multimedia Systems, 1997.
13. Y. Cai, K. Hua, and K. Vu. Optimizing patching performance. In Proc. SPIE/ACM
Conference on Multimedia Computing and Networking, 1999.
14. S. Campos, B. Ribeiro-Neto, A. Macedo, and L. Bertini. Formal verification and
analysis of multimedia systems. In ACM Multimedia Conference. Orlando, Novem-
ber 1999.
15. S. W. Carter and D. D. E. Long. Improving video-on-demand server efficiency
through stream tapping. In Sixth International Conference on Computer Commu-
nications and Networks, pages 200–207, 1997.
16. S.-H.G. Chan and F. Tobagi. Tradeoff between system profit and user delay/loss
in providing near video-on-demand service. IEEE Transactions on Circuits and
Systems for Video Technology, 11(8):916–927, August 2001.
17. E. Chang and A. Zakhor. Cost analyses for VBR video servers. IEEE Multimedia,
3(4):56–71, 1996.
18. A.L. Chervenak, D.A. Patterson, and R.H. Katz. Choosing the best storage system
for video service. In ACM Multimedia Conf., pages 109–119. SF, 1995.
19. T. Chua, J. Li, B. Ooi, and K. Tan. Disk striping strategies for large video-on-
demand servers. In ACM Multimedia Conf., pages 297–306, 1996.
20. A. Dan, D. Sitaram, and P. Shahabuddin. Scheduling policies for an on-demand
video server with batching. In Proc. of the 2nd ACM Intl. Conf. on Multimedia,
pages 15–23, 1994.
21. A. Dan, D. Sitaram, and P. Shahabuddin. Dynamic batching policies for an on-
demand video server. Multimedia Systems, (4):112–121, 1996.
22. M.C. Diniz and E. de Souza e Silva. Models for jitter control at destination. In
Proc., IEEE Intern. Telecomm. Symp., pages 118–122, 1996.
23. D. Eager, M. Ferris, and M. Vernon. Optimized caching in systems with heterogeneous
client populations. Performance Evaluation, (42):163–185, 2000.
24. D. Eager and M. Vernon. Dynamic skyscraper broadcasts for video-on-demand. In
4th International Workshop on Multimedia Information Systems, September 1998.
25. D. Eager, M. Vernon, and J. Zahorjan. Optimal and efficient merging schedules
for video-on-demand servers. In Proc. ACM Multimedia’99, November 1999.
26. D. Eager, M. Vernon, and J. Zahorjan. Bandwidth skimming: A technique for
cost effective video-on-demand. In Proc. Multimedia Computing and Networking,
January 2000.
27. D. Eager, M. Vernon, and J. Zahorjan. Minimizing bandwidth requirements for
on-demand data delivery. IEEE Transactions on Knowledge and Data Engineering,
13(5):742–757, September 2001.
28. F. Fabbrocino, J.R. Santos, and R.R. Muntz. An implicitly scalable, fully interac-
tive multimedia storage server. In DISRT’98, pages 92–101. Montreal, July 1998.
29. D.R. Figueiredo and E. de Souza e Silva. Efficient mechanisms for recovering voice
packets in the Internet. In Proc. of IEEE/Globecom’99, Global Internet Symp.,
pages 1830–1837, December 1999.
30. C.S. Freedman and D.J. DeWitt. The SPIFFI scalable video-on-demand system.
In ACM Multimedia Conf., pages 352–363, 1995.
31. V.S. Frost and B. Melamed. Traffic Modeling for Telecommunications Networks.
IEEE Communications Magazine, 32(3):70–81, 1994.
32. L. Gao and D. Towsley. Supplying instantaneous video-on-demand services using
controlled multicast. In IEEE International Conference on Multimedia Computing
and Systems, pages 117–121, 1999.
33. S. Ghandeharizadeh, R. Zimmermann, W. Shi, R. Rejaie, D. Ierardi, and T.-W.
Li. Mitra: a scalable continuous media server. Multimedia Tools and Applications,
5(1):79–108, July 1997.
34. L. Golubchik, J.C.S. Lui, E. de Souza e Silva, and R. Gail. Evaluation of performance
tradeoffs in scheduling techniques for mixed workload multimedia servers.
Journal of Multimedia Tools and Applications, to appear, 2002.
35. L. Golubchik, J.C.S. Lui, and R. Muntz. Reducing I/O demand in video-on-demand
storage servers. In Proc. ACM Sigmetrics’95, pages 25–36, May 1995.
36. Y. Guo, S. Sen, and D. Towsley. Prefix caching assisted periodic broadcast: Frame-
work and techniques to support streaming for popular videos. In Proc. of ICC’02,
2002.
37. D. Heyman and D. Lucantoni. Modeling multiple IP traffic with rate limits. In J.M.
de Souza, N. da Fonseca, and E. de Souza e Silva, editors, Teletraffic Engineering
in the Internet Era, pages 445–456. 2001.
38. D.P. Heyman and T.V. Lakshman. What are the Implications of Long-Range De-
pendence for VBR-Video Traffic Engineering. IEEE/ACM Transactions on Net-
working, 4(3):301–317, June 1996.
39. Ailan Hu. Video-on-demand broadcasting protocols: A comprehensive study. In
Proc. IEEE Infocom, pages 508–517, 2001.
40. K. A. Hua, Y. Cai, and S. Sheu. Patching: A multicast technique for true
video-on-demand services. In Proceedings of ACM Multimedia, pages 191–200, 1998.
41. K.A. Hua and S. Sheu. Skyscraper broadcasting: a new broadcasting scheme for
metropolitan video-on-demand systems. In Proc. of ACM Sigcomm’97, pages 89–
100. ACM Press, 1997.
42. J.Chien-Liang, D.H.C. Du, S.S.Y. Shim, J. Hsieh, and M. Lin. Design and evalua-
tion of a generic software architecture for on-demand video servers. IEEE Trans-
actions on Knowledge and Data Engineering, 11(3):406–424, May 1999.
43. P. Ji, B. Liu, D. Towsley, and J. Kurose. Modeling frame-level errors in GSM wireless
channels. In Proc. of IEEE/Globecom’02 Global Internet Symp., 2002.
44. S. Jin and A. Bestavros. Scalability of multicast delivery for non-sequential stream-
ing access. In Proc. of ACM Sigmetrics’02, June 2002.
45. K. Keeton and R. Katz. Evaluating video layout strategies for a high-performance
storage server. In ACM Multimedia Conference, pages 43–52, 1995.
46. J. Korst. Random duplicated assignment: An alternative to striping in video
servers. In ACM Multimedia Conference, pages 219–226. Seattle, 1997.
47. J.F. Kurose and K.W. Ross. Computer Networking: A Top-Down Approach Fea-
turing the Internet. Addison-Wesley, 2001.
48. S.W. Lau, J.C.S. Lui, and L. Golubchik. Merging video streams in a multimedia
storage server: Complexity and heuristics. ACM Multimedia Systems Journal,
6(1):29–42, January 1998.
49. R.M.M. Leão, E. de Souza e Silva, and Sidney C. de Lucena. A set of tools for
traffic modelling, analysis and experimentation. In Lecture Notes in Computer
Science 1786 (TOOLS’00), pages 40–55, 2000.
50. Y. Mansour and B Patt-Shamir. Jitter control in QoS networks. IEEE/ACM
Transactions on Networking, 2001.
51. A.P. Markopoulou, F.A. Tobagi, and M.J. Karam. Assessment of VoIP quality
over Internet backbones. In Proc. of IEEE/Infocom’02, June 2002.
52. H. Michiel and K. Laevens. Traffic Engineering in a Broadband Era. Proceedings
of the IEEE, pages 2007–2033, 1997.
53. R.R. Muntz, J.R. Santos, and S. Berson. A parallel disk storage system for real-time
multimedia applications. Intl. Journal of Intelligent Systems, 13(12):1137–1174,
December 1998.
54. B. Ozden, R. Rastogi, and A. Silberschatz. Disk striping in video server environ-
ments. In IEEE Intl. Conference on Multimedia Computing and Systems, 1996.
55. B. Ozden, R. Rastogi, and A. Silberschatz. On the design of a low-cost video-on-
demand storage system. In ACM Multimedia Conference, pages 40–54, 1996.
56. K. Park and W. Willinger. Self-Similar Network Traffic: an Overview, pages 1–38.
John Wiley and Sons, Inc., 2000.
57. C. S. Perkins, O. Hodson, and V. Hardman. A survey of packet-loss recovery
techniques for streaming audio. IEEE Network Magazine, pages 40–48, Sep. 1998.
58. S. Ramesh, I. Rhee, and K. Guo. Multicast with cache (Mcache): An adaptive
zero-delay video-on-demand service. IEEE Transactions on Circuits and Systems
for Video Technology, 11(3):440–456, March 2001.
59. R. Rejaie, H. Yu, M. Handley, and D. Estrin. Multimedia proxy caching mechanism
for quality adaptive streaming applications in the Internet. In Proc. IEEE Infocom,
pages 980–989, 2000.
60. Reza Rejaie, Mark Handley, and Deborah Estrin. Quality adaptation for congestion
controlled video playback over the Internet. In Proc. ACM Sigcomm’99, pages 189–
200, August 1999.
61. K. Salamatian and S. Vaton. Hidden Markov Modeling for network communica-
tion channels. In Proc. of Sigmetrics/Performance’01, pages 92–101, Cambridge,
Massachusetts, USA, June 2001.
62. J.D. Salehi, Z.L.Zhang, J.F. Kurose, and D. Towsley. Supporting stored video:
reducing rate variability and end-to-end resource requirements through optimal
smoothing. IEEE/ACM Transactions on Networking, 6(4):397–410, 1998.
63. J.R. Santos and R. Muntz. Performance analysis of the RIO multimedia storage
system with heterogeneous disk configurations. In ACM Multimedia Conf., 1998.
64. J.R. Santos, R. Muntz, and B. Ribeiro-Neto. Comparing random data allocation
and data striping in multimedia servers. In Proc. ACM Sigmetrics’00, pages 44–55.
Santa Clara, 2000.
65. S. Sen, J. Rexford, J. Dey, J. Kurose, and D. Towsley. Online smoothing of variable-
bit-rate streaming video. IEEE Transactions on Multimedia, 2000.
66. S. Sen, J. Rexford, and D. Towsley. Proxy prefix caching for multimedia streams.
In Proc. IEEE Infocom, pages 1310–1319, 1999.
67. P.J. Shenoy and H.M. Vin. Efficient striping techniques for multimedia file servers.
In Proc. NOSSDAV’97, pages 25–36. 1997.
68. H. Tan, D. Eager, M. Vernon, and H. Guo. Quality of service evaluations of
multicast streaming protocols. In Proc. of ACM Sigmetrics 2002, June 2002.
69. D. Towsley, J. Kurose, and S. Pingali. A comparison of sender-initiated and
receiver-initiated reliable multicast protocols. IEEE Journal on Selected Areas
in Communications, 15(3):398–406, April 1997.
70. S. Viswanathan and T. Imielinski. Pyramid broadcasting for video on demand
service. In Proc. IEEE Multimedia Computing and Networking, volume 2417, pages
66–77, 1995.
71. B. Wang, S. Sen, M. Adler, and D. Towsley. Optimal proxy cache allocation for
efficient streaming media distribution. In Proc. IEEE Infocom, 2002.
72. Y. Wang, Z. Zhang, D. Du, and D. Su. A network-conscious approach to end-to-
end video delivery over wide area networks using proxy servers. In Proc. of IEEE
Infocom 98, pages 660–667, April 1998.
73. W.R. Wong. On-time Data Delivery for Interactive Visualization Applications.
PhD thesis, UCLA/CS Dept., 2000.
74. D. Wu, Y.T. Hou, and Y. Zhang. Transporting real-time video over the Internet:
Challenges and approaches. Proceedings of the IEEE, 88(12):1855–1875, December
2000.
75. M. Yajnik, S. Moon, J. Kurose, and D. Towsley. Measurement and modeling of the
temporal dependence in packet loss. In Proc. of IEEE/Infocom’99, 1999.
76. Y.J. Liang, E. Steinbach, and B. Girod. Real-time voice communication over the
Internet using packet path diversity. In Proc. ACM Multimedia 2001, Ottawa,
Canada, Sept./Oct. 2001.
Markovian Modeling of Real Data Traffic:
Heuristic Phase Type and MAP Fitting of
Heavy Tailed and Fractal Like Samples
1 Introduction
In the late 80’s, traffic measurements of high speed communication networks indicated unexpectedly high variability and burstiness over several time scales, which pointed to the need for new modeling approaches capable of capturing the observed traffic features. The first promising approach, the fractal modeling of high speed data traffic [28], resulted in a big boom in traffic theory. Since then, a series of traffic models have been proposed to describe real traffic behavior: fractional Gaussian noises [30,37], traditional [7] and fractional ARIMA processes [18], fractals and multifractals [49,13], etc.
A significant positive consequence of the new traffic engineering wave is that
the importance of traffic measurement and the proper statistical analysis of
measured datasets became widely accepted, and measured datasets of a wide
range of real network configurations became publicly available [52].
In spite of the intensive research activity, there are still open problems asso-
ciated with these new traffic models:
This work is supported by the OTKA-T34972 grant of the Hungarian Research Fund.
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 405–434, 2002.
© Springer-Verlag Berlin Heidelberg 2002
406 A. Horváth and M. Telek
– None of the traffic models is clearly validated by the physical behavior of the networks. The proposed models allow us to represent some features of data traffic, but other features are not captured. Which traffic features are important?
– The traffic features of measured data are checked via statistical tests and
the traffic features of the models are checked using analysis and simulation
methods. Are these tests correct enough? Is there enough data available for
reliable tests?
– The majority of the proposed traffic models have important asymptotic properties, but all tests are based on finite datasets. Can we draw conclusions about the asymptotic properties based on finite datasets? And vice versa, can we draw conclusions from the asymptotic model behavior about the performance of finite systems?
– With finite datasets, the asymptotic properties extracted from tests performed on different time scales often differ. Which is the dominant time scale to consider?
The questions listed above refer to the correctness of traffic models. There is an even more important issue that determines the utility of a traffic model: computability. The majority of the mentioned traffic models are not accompanied by effective analysis tools that would allow us to use them in practical traffic engineering.
In this paper we discuss the application of Markovian models for traffic engineering. The most evident advantage of this modeling approach with respect to the above mentioned ones is that it is supported by a set of effective analysis techniques called matrix geometric methods [34,35,27,29]. The other features of Markovian models, with respect to the questions listed above, are subject to discussion. By the nature of Markovian models, non-exponential asymptotic behavior cannot be captured, and hence, they are not suitable for that purpose. However, recent research results show that Markovian models are able to approximate arbitrary non-Markovian behavior over an arbitrarily wide range of scales.
The paper summarizes a traffic engineering procedure composed of the following steps:
– statistical analysis of measured traffic data,
– Markovian approximation of traffic processes,
– analysis of performance parameters based on the Markovian model.
All steps of this procedure are supported by a number of numerical examples, and the results are verified against simulation and alternative analysis methods.
The paper is organized as follows. Section 2 discusses some relevant characteristics of traffic processes and describes models that exhibit these features. Statistical tests for identifying these characteristics in datasets are described in Section 3. A short introduction to Markovian models is given in Section 4. An overview of the existing fitting methods, with related application examples, is given in Section 5. The survey is concluded in Section 6.
Markovian Modeling of Real Data Traffic 407
The Weibull family ($F(x) = 1 - e^{-(x/a)^c}$) with $c < 1$ is long tailed, even though all moments of Weibull distributed random variables are finite. The heavy tailed distributions form a subclass of the long tailed class.
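The claim "long tailed, yet all moments finite" can be checked numerically. The sketch assumes the usual definition of a long-tailed distribution, $P(X > x + y)/P(X > x) \to 1$ as $x \to \infty$ for every fixed $y > 0$ (the definition is not reproduced in this excerpt), and uses the standard Weibull moments $E[X^k] = a^k\,\Gamma(1 + k/c)$. Function names are illustrative.

```python
import math

def tail_ratio(x: float, y: float, a: float, c: float) -> float:
    """P(X > x + y) / P(X > x) for a Weibull(a, c) variable, evaluated in log
    space so that tiny tail probabilities do not underflow."""
    return math.exp((x / a) ** c - ((x + y) / a) ** c)

def weibull_moment(k: int, a: float, c: float) -> float:
    """E[X**k] = a**k * Gamma(1 + k/c): finite for every k, also when c < 1."""
    return a ** k * math.gamma(1 + k / c)

for x in (10.0, 1e3, 1e6):
    # c = 0.5: ratio creeps towards 1 (long tail); c = 1: exponential, fixed e^(-y/a)
    print(x, tail_ratio(x, 1.0, a=1.0, c=0.5), tail_ratio(x, 1.0, a=1.0, c=1.0))
print([weibull_moment(k, a=1.0, c=0.5) for k in range(1, 4)])
```

For $c = 1$ (the exponential case) the ratio stays at $e^{-y/a}$ for every $x$, which is exactly what the long-tail definition excludes.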
A characteristic property of the heavy tailed class is the asymptotic relation of the distribution of the sum of $n$ samples, $S_n = Y_1 + \dots + Y_n$, and the maximum of $n$ samples, $M_n = \max_{1 \le i \le n} Y_i$:
$$P(S_n > x) \sim P(M_n > x), \quad x \to \infty.$$
$X_i = B(i+1) - B(i)$,
where $\epsilon_i$ are i.i.d. standard normal random variables and the coefficients $c_j$ implement a moving average with parameter $d$ according to $c_j = \frac{\Gamma(j+d)}{\Gamma(d)\,\Gamma(j+1)}$. For large values of $j$ the coefficients behave as $c_j \sim \frac{j^{d-1}}{\Gamma(d)}$. The asymptotic behavior of the auto-covariance function is
$$r(k) \sim C_d\, k^{2d-1}, \quad k \to \infty,$$
with coefficient $C_d = \pi^{-1}\,\Gamma(1-2d)\sin(\pi d)$. For $0 < d < 1/2$ the auto-covariance function has the same polynomial decay as the auto-covariance function of fractional Gaussian noise with $H = d + 1/2$.
The better choice between these two processes depends on the applied analysis method. Fractional Gaussian noise is better at exhibiting asymptotic properties based on a finite number of samples, while generating fractional ARIMA samples is easier since it is based on an explicit expression.
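The explicit expression is the moving average with coefficients $c_j$. The sketch below computes them with the recursion $c_0 = 1$, $c_j = c_{j-1}(j - 1 + d)/j$, which follows from the Gamma-function form above and avoids overflowing $\Gamma$, and generates an approximate path assuming the standard moving-average form $X_i = \sum_{j \ge 0} c_j \epsilon_{i-j}$, truncated after a hypothetical number of terms `n_terms`. The truncation, function names and seeding are assumptions of this sketch.

```python
import math
import random

def farima_coefficients(d: float, n_terms: int) -> list[float]:
    """c_j = Gamma(j+d) / (Gamma(d) Gamma(j+1)) for j = 0..n_terms-1,
    via the recursion c_j = c_{j-1} * (j - 1 + d) / j."""
    c = [1.0]
    for j in range(1, n_terms):
        c.append(c[-1] * (j - 1 + d) / j)
    return c

def farima_sample(n: int, d: float, n_terms: int = 1000, seed: int = 0) -> list[float]:
    """Approximate fARIMA(0,d,0) path X_i = sum_{j < n_terms} c_j * eps_{i-j},
    eps i.i.d. standard normal (the infinite moving average is truncated)."""
    rng = random.Random(seed)
    c = farima_coefficients(d, n_terms)
    # eps[k + n_terms - 1] plays the role of eps_k, so indices down to
    # k = -(n_terms - 1) are available for the first outputs
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + n_terms - 1)]
    return [sum(c[j] * eps[i + n_terms - 1 - j] for j in range(n_terms))
            for i in range(n)]

path = farima_sample(200, d=0.3, n_terms=500, seed=1)
print(len(path), farima_coefficients(0.3, 5))
```

Because $c_j$ decays only like $j^{d-1}$, the truncation length matters: a short `n_terms` cuts off exactly the slowly decaying correlations that make the process long-range dependent.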
[Fig. 1. Experimental ccdf of the length of requests arriving to the server (ccdf vs. length in 100 bytes). Fig. 2. The Hill- and the dynamic qq-plot for the EPA trace (estimate vs. k).]
Hill estimator. A possible approach to estimate the index $\alpha$ of the tail behavior is the Hill estimator [20]. This estimator provides the index as a function of the $k$ largest elements of the dataset and is defined as
$$\alpha_{n,k} = \left( \frac{1}{k} \sum_{i=0}^{k-1} \left( \log X_{(n-i)} - \log X_{(n-k)} \right) \right)^{-1} \qquad (3)$$
where $X_{(1)} \le \dots \le X_{(n)}$ denotes the order statistics of the dataset. In practice, the estimator given in (3) is plotted against $k$, and if the plot stabilizes to a constant value this provides an estimate of the index. The Hill-plot (together with the dynamic qq-plot that will be described later) for the EPA trace is depicted in Figure 2.
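Estimator (3) translates directly into code. The sketch below (function names are illustrative) sorts the sample once, indexes the order statistics exactly as in (3), and returns the points of a Hill plot; it assumes strictly positive data so that the logarithms are defined.

```python
import math

def hill_estimator(data: list[float], k: int) -> float:
    """alpha_{n,k} of (3): inverse of the mean log-excess of the k largest
    observations over X_(n-k), the (k+1)-th largest one."""
    x = sorted(data)                     # x[m-1] is the order statistic X_(m)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    log_threshold = math.log(x[n - 1 - k])                 # log X_(n-k)
    mean_excess = sum(math.log(x[n - 1 - i]) - log_threshold
                      for i in range(k)) / k               # i = 0..k-1 uses X_(n-i)
    return 1.0 / mean_excess

def hill_plot(data: list[float], ks) -> list[tuple[int, float]]:
    """Points (k, alpha_{n,k}) of the Hill plot."""
    return [(k, hill_estimator(data, k)) for k in ks]

# Sanity check on deterministic Pareto(alpha = 1.5) quantiles.
n, alpha = 10_000, 1.5
pareto = [(1 - i / (n + 1)) ** (-1 / alpha) for i in range(1, n + 1)]
print(hill_plot(pareto, [100, 1000, 3000]))
```

In practice one would call `hill_plot(data, range(10, len(data) // 2, 10))` and look for a range of $k$ over which the estimates level off.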
The idea behind the procedure and theoretical properties of the estimator
are discussed in [39]. Applicability of the Hill estimator is reduced by the fact
that
– its properties (e.g. confidence intervals) are known to hold only under con-
ditions that often cannot be validated in practice [39],
– the point at which the power-law tail begins must be determined, and this can be difficult because datasets often do not show a clear border between the power-law tail and the non-power-law body of the distribution.
By slight modifications in the way the Hill plot is displayed, the uncertainty
of the estimation procedure can be somewhat reduced, see [39,40].
for a fixed value of $k$. (Only the $k$ upper order statistics are considered in the plot; the rest of the sample is neglected.) If the data is close to Pareto, the plot should be a straight line with slope $1/\alpha$. By determining the slope of the straight line fitted to the points by least squares, we obtain the so-called qq-estimator [25].
The qq-estimator can be visualized in two different ways. The dynamic qq-plot, depicted in Figure 2, plots the estimate of $\alpha$ as a function of $k$ (this plot is similar to the Hill-plot). The static qq-plot, given in Figure 3, depicts (4) for a fixed value of $k$ and shows its least-squares fit. As for the Hill-plot, when applying the qq-estimator the point at which the tail begins has to be determined.
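Since expression (4) is not reproduced in this excerpt, the sketch assumes the standard form of the static qq-plot: for the $k$ upper order statistics it pairs the exponential quantiles $-\log(1 - j/(k+1))$ with $\log X_{(n-k+j)}$, $j = 1, \dots, k$, and returns the reciprocal of the least-squares slope as the estimate of $\alpha$. Names are illustrative.

```python
import math

def qq_points(data: list[float], k: int) -> list[tuple[float, float]]:
    """Static qq-plot points for the k upper order statistics (assumed form of (4))."""
    x = sorted(data)                         # x[m-1] is X_(m)
    n = len(x)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    return [(-math.log(1 - j / (k + 1)), math.log(x[n - k + j - 1]))
            for j in range(1, k + 1)]

def qq_estimator(data: list[float], k: int) -> float:
    """alpha estimate = 1 / (least-squares slope of the static qq-plot)."""
    pts = qq_points(data, k)
    mx = sum(p for p, _ in pts) / k
    my = sum(q for _, q in pts) / k
    sxy = sum((p - mx) * (q - my) for p, q in pts)
    sxx = sum((p - mx) ** 2 for p, _ in pts)
    return sxx / sxy                         # slope = sxy / sxx, alpha = 1 / slope

n, alpha = 10_000, 1.5
pareto = [(1 - i / (n + 1)) ** (-1 / alpha) for i in range(1, n + 1)]
dynamic_qq = [(k, qq_estimator(pareto, k)) for k in (100, 1000, 5000)]
print(dynamic_qq)
```

Evaluating `qq_estimator` over a range of $k$, as in the last lines, gives the dynamic qq-plot; plotting `qq_points` for one $k$ gives the static one.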
[Fig. 3. Static qq-plot for the EPA trace (log sorted data vs. quantiles of exponential, with least-squares fit). Fig. 4. Complementary distribution function for different aggregation levels for the EPA trace (Log10(P[X > x]) vs. Log10(size − mean); raw data and 2-, 4-, 8-, 16-, 32-, 64- and 128-aggregated data, with the power-law fraction).]
Recently, it has been agreed [28,36,37] that when one studies the long-range
dependence of a traffic trace the most significant parameter to be estimated is
the degree of self-similarity, usually given by the so-called Hurst-parameter. The
aim of the statistical approach, based on the theory of self-similarity, is to find
the Hurst-parameter.
In this section methods for estimating the long-range dependence of datasets are recalled. Besides the procedures described here, several others can be found in the literature; see [3] for an exhaustive discussion on this subject.
It is important to note that the introduced statistical tests of self-similarity, being based on a finite number of samples, provide an approximate value of H only for the considered range of scales. Nothing can be said about higher scales and the asymptotic behavior based on these tests.
Throughout the section, we illustrate the application of the estimators on the first trace of the well-known Bellcore dataset, which contains local-area network (LAN) traffic collected in 1989 on an Ethernet at the Bellcore Morristown Research and Engineering facility. It may be downloaded from the Web site collecting traffic traces [52]. The trace was first analyzed in [16].
Variance-time plot. One of the tests for pseudo self-similarity is the variance-time plot. It is based on the fact that for self-similar time series $\{X_1, X_2, \dots\}$ the variance of the aggregated series $X^{(m)}$, obtained by averaging the original series over non-overlapping blocks of size $m$, decays as
$$\mathrm{Var}(X^{(m)}) \sim m^{-\beta}, \quad m \to \infty.$$
The variance-time plot depicts $\log(\mathrm{Var}(X^{(m)}))$ versus $\log(m)$. For pseudo self-similar time series, the slope $-\beta$ of the variance-time plot is greater than $-1$. The Hurst parameter can be calculated as $H = 1 - \beta/2$. A traffic process is said to be pseudo self-similar when the empirical Hurst parameter is between 0.5 and 1.
The variance-time plot for the analyzed Bellcore trace is depicted in Figure 5. The Hurst parameter given by the variance-time plot is 0.83.
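The variance-time procedure can be sketched in a few lines: aggregate by block means, fit a least-squares line to $\log_{10}\mathrm{Var}(X^{(m)})$ against $\log_{10} m$, and convert the slope $-\beta$ with $H = 1 - \beta/2$. Function names and the dyadic choice of aggregation levels are illustrative; the i.i.d. Gaussian input is only a sanity check, for which $\beta$ should be close to 1 and $H$ close to 0.5.

```python
import math
import random

def aggregate(x: list[float], m: int) -> list[float]:
    """Block means X^(m)_k over non-overlapping blocks of size m (a partial
    last block is dropped)."""
    nb = len(x) // m
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(nb)]

def sample_variance(x: list[float]) -> float:
    mu = sum(x) / len(x)
    return sum((v - mu) ** 2 for v in x) / (len(x) - 1)

def variance_time_hurst(x: list[float], levels: list[int]) -> float:
    """Least-squares slope of log10 Var(X^(m)) vs log10 m gives -beta;
    returns H = 1 - beta/2."""
    pts = [(math.log10(m), math.log10(sample_variance(aggregate(x, m)))) for m in levels]
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    slope = (sum((p - mx) * (q - my) for p, q in pts)
             / sum((p - mx) ** 2 for p, _ in pts))
    beta = -slope
    return 1 - beta / 2

rng = random.Random(42)
iid = [rng.gauss(0.0, 1.0) for _ in range(2 ** 16)]
print(variance_time_hurst(iid, [2 ** e for e in range(9)]))  # near 0.5 for i.i.d. data
```

The largest aggregation level is kept small enough (here 256, leaving 256 block means) that the variance estimate at that level is still reasonably stable.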
[Fig. 5. Variance-time plot and its least-squares fit for the Bellcore trace (variance vs. aggregation level n). Fig. 6. R/S plot and its least-squares fit for the Bellcore trace (log10(R/S(n)) vs. n).]
R/S plot. The R/S method is one of the oldest tests for self-similarity; it is discussed in detail in [31]. For an interarrival time series $Z = \{Z_i, i \ge 1\}$, the partial sums are $Y_n = \sum_{i=1}^{n} Z_i$ and the sample variance is
$$S^2(n) = \frac{1}{n}\sum_{i=1}^{n} Z_i^2 - \frac{1}{n^2}\, Y_n^2 .$$
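The definition of the R/S statistic itself does not survive in this excerpt. The sketch below assumes the standard rescaled adjusted range, $R(n) = \max_{0\le k\le n}(Y_k - \tfrac{k}{n}Y_n) - \min_{0\le k\le n}(Y_k - \tfrac{k}{n}Y_n)$ with $Y_0 = 0$, divided by $S(n)$ from the formula above and averaged over non-overlapping blocks of length n; the slope of $\log_{10}(R/S(n))$ against $\log_{10} n$ then estimates H. The function names are hypothetical.

```python
import numpy as np

def rs_statistic(z):
    """R(n)/S(n) for one block z_1..z_n (standard rescaled adjusted range)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    y = np.concatenate(([0.0], np.cumsum(z)))   # Y_0 = 0, Y_1, ..., Y_n
    k = np.arange(n + 1)
    dev = y - k / n * y[-1]                      # Y_k - (k/n) Y_n
    r = dev.max() - dev.min()                    # adjusted range R(n)
    s2 = np.mean(z ** 2) - (y[-1] / n) ** 2     # S^2(n) as defined above
    return r / np.sqrt(s2)                       # undefined for a constant block

def rs_hurst(z, block_sizes):
    """Slope of log10 of the mean R/S(n) over blocks versus log10 n."""
    z = np.asarray(z, dtype=float)
    log_n, log_rs = [], []
    for n in block_sizes:
        blocks = len(z) // n
        vals = [rs_statistic(z[i * n:(i + 1) * n]) for i in range(blocks)]
        log_n.append(np.log10(n))
        log_rs.append(np.log10(np.mean(vals)))
    slope, _intercept = np.polyfit(log_n, log_rs, 1)
    return slope
```

For i.i.d. data the estimate tends to come out somewhat above 0.5 at small block sizes, a known small-sample bias of R/S, so the range of n matters here as much as for the variance-time plot.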
Since T (q) is always concave, the Legendre spectrum fL (α) may be found by
simple calculations using that
Note that other kinds of fractal spectra are also defined in the literature (see, for example, [42]). The Legendre spectrum is the most attractive one from a numerical point of view, and even though in some cases it is less informative than, for example, the large deviation spectrum, it provides enough information for the cases considered here.
In the case of a discrete-time process X, we assume that we are given the increments of a continuous-time process. Assuming that the sequence we examine consists of $N = 2^L$ numbers, the sum in (5) becomes
$$S_n(q) = \sum_{k=0}^{N/2^n - 1} \left|X_k^{(2^n)}\right|^q, \qquad 0 \le n \le L, \tag{7}$$
where the expectation is ignored. Ignoring the expectation is accurate for small n, i.e., for the finer resolution levels. In order to estimate T(q), we plot $\log_2(S_n(q))$ against $(L - n)$, $n = 0, 1, \ldots, L$; T(q) is then found from the slope of the straight line fitted to the curve. If the fitted line shows good correspondence with the curve, i.e., if $\log_2(S_n(q))$ scales linearly with n (the logarithm of the block size $2^n$), then the sequence X can be considered a multifractal process.

Fig. 7. Scaling of log-moments with linear fits for the interarrival times of the Bellcore pAug trace (curves for q = −3, ..., 4)
Fig. 8. Increments of log-moments for the interarrival times of the Bellcore pAug trace
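The estimation of T(q) from (7) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the series length is exactly $2^L$, $X_k^{(2^n)}$ is taken to be the sum of the k-th block of $2^n$ consecutive values (definition (5) is not reproduced in this excerpt), all block sums are non-zero so that negative q is defined, and T(q) is returned as minus the slope against $(L - n)$, i.e. the slope against n. With this sign T is concave with $T(0) = -1$ and has a positive derivative, as in the range [0.8, 1.15] reported for Figure 9, but the sign convention is our assumption. The function names are hypothetical.

```python
import numpy as np

def log_moments(x, qs):
    """Return (log2 S_n(q) for each q and n = 0..L, L) following (7)."""
    x = np.asarray(x, dtype=float)
    L = int(round(np.log2(len(x))))
    if len(x) != 2 ** L:
        raise ValueError("the series length must be a power of two")
    out = np.empty((len(qs), L + 1))
    for n in range(L + 1):
        # |X_k^(2^n)|: absolute sums of consecutive blocks of 2^n values
        blocks = np.abs(x.reshape(-1, 2 ** n).sum(axis=1))
        for i, q in enumerate(qs):
            out[i, n] = np.log2(np.sum(blocks ** q))
    return out, L

def partition_function(x, qs, n_min, n_max):
    """Estimate T(q) by a straight-line fit over levels n_min..n_max."""
    log_s, L = log_moments(x, qs)
    ns = np.arange(n_min, n_max + 1)
    T = []
    for row in log_s:
        slope, _intercept = np.polyfit(L - ns, row[ns], 1)  # slope against (L - n)
        T.append(-slope)  # assumed sign convention (see the paragraph above)
    return np.array(T)

# Usage: a constant series is monofractal, giving T(q) = q - 1 exactly.
T_const = partition_function(np.ones(2 ** 10), [0.0, 1.0, 2.0], 0, 10)
```

For a real trace one would pass the interarrival times and a fitting range such as the 5–18 range used in the text.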
Figures 7, 8, 9 and 10 illustrate the procedure described above for obtaining the Legendre spectrum of the famous Bellcore pAug traffic trace (the trace may be found at [52]). Figure 7 depicts the scaling behavior of the log-moments calculated through (7). With q in the range [−3, 4], and excluding the finest resolution levels n = 0, 1, the moments show good linear scaling. For values of q outside the range [−3, 4], the curves deviate more and more from linearity. As, for example, in [43], one may consider non-integer values of q as well, but in general this does not provide notably more information on the process. To better visualize the deviation from linearity, Figure 8 depicts the increments of the log-moment curves of Figure 7. Perfectly horizontal lines would correspond to linear log-moment curves.
The partition function T(q) is depicted in Figure 9. The three slightly different curves differ only in the range of the log-moment curves used for the fit, since different ranges result in different linear fits. The lower bound of the linear fit is set to 3, 5 and 7, while the upper bound is 18 in each case. (In the rest of this paper the fitting range is 5–18, and 100 moments are evaluated in the range q ∈ [−5, +5].) Since the partition function is nearly linear (its derivative stays in the range [0.8, 1.15]), it is not as informative as its Legendre transform (Figure 10). According to (6), the Legendre spectrum is as wide as the range of derivatives of the partition function, i.e., the more the partition function deviates from linearity, the wider the Legendre spectrum. The Legendre transform significantly amplifies the scaling information, but it is also sensitive to the range of the log-moment curves used for the fit.
See [43] for the basic principles of interpreting the spectrum. We mention here only that a curve like the one depicted in Figure 10 reveals a rich multifractal spectrum. In contrast, as shown in [51], fractional Brownian motion (fBm) has a trivial spectrum. The partition function of fBm is a straight line, which indicates that its spectrum consists of a single point, i.e., its log-moments exhibit the same scaling behavior for every q.
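Numerically, the Legendre spectrum can be read off a sampled partition function. The sketch below assumes the usual definition $f_L(\alpha) = \inf_q (q\alpha - T(q))$; for a concave, differentiable T the infimum is attained where $\alpha = T'(q)$, so each grid point q yields the pair $(T'(q),\; qT'(q) - T(q))$. The derivative is approximated with `numpy.gradient`; the function name and the example partition functions are ours.

```python
import numpy as np

def legendre_spectrum(qs, T):
    """Return (alpha, f_L(alpha)) from samples of a concave partition function."""
    qs = np.asarray(qs, dtype=float)
    T = np.asarray(T, dtype=float)
    alpha = np.gradient(T, qs, edge_order=2)  # alpha = T'(q), second-order accurate
    f = qs * alpha - T                        # f_L(alpha) = q*alpha - T(q)
    return alpha, f

qs = np.linspace(-5.0, 5.0, 101)

# fBm-like case: a straight-line T(q) collapses to the single point (0.9, 1).
alpha_lin, f_lin = legendre_spectrum(qs, 0.9 * qs - 1.0)

# A concave T(q) spreads the spectrum over a range of alpha values.
alpha_cc, f_cc = legendre_spectrum(qs, -1.0 + qs - 0.05 * qs ** 2)
```

The two examples mirror the remarks above: the width of the spectrum in $\alpha$ equals the range of $T'(q)$, and a linear partition function gives a one-point spectrum.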
416 A. Horváth and M. Telek
Fig. 9. Partition function estimated through the linear fits shown in Figure 7 (T(q) versus q for q ∈ [−5, 5]; curves for lower fitting bounds 3, 5 and 7)
Fig. 10. The Legendre transform of the partition function (Figure 9) results in the Legendre spectrum (f_L(α) versus α)
Haar wavelet. Another way to carry out multiscale analysis is the Haar wavelet transform. We use the unnormalized version of the Haar wavelet transform because it is better suited to the analysis of the Markovian point process introduced further on.
The multiscale behavior of the finite sequence $X_i$, $1 \le i \le 2^L$, will be represented by the quantities $c_{j,k}$, $d_{j,k}$, $j = 0, \ldots, L$ and $k = 1, \ldots, 2^L/2^j$. The finest resolution is described by $c_{0,k}$, $1 \le k \le 2^L$, which gives the finite sequence itself, i.e., $c_{0,k} = X_k$. The multiscale analysis based on the unnormalized Haar wavelet transform is then carried out by iterating
$$c_{j,k} = c_{j-1,2k-1} + c_{j-1,2k}. \tag{8}$$
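The iteration (8) is straightforward to implement. In the sketch below the detail coefficients are computed as $d_{j,k} = c_{j-1,2k-1} - c_{j-1,2k}$, which is the usual unnormalized Haar detail but is our assumption, since its definition is not shown in this excerpt. Python lists are 0-based, so `c[j][k-1]` stores $c_{j,k}$; the function name is hypothetical.

```python
import numpy as np

def unnormalized_haar(x):
    """Return lists c, d with c[j][k-1] = c_{j,k} and d[j][k-1] = d_{j,k}, j = 0..L."""
    c = [np.asarray(x, dtype=float)]     # c_{0,k} = X_k
    size = len(c[0])
    if size == 0 or size & (size - 1):
        raise ValueError("the sequence length must be a power of two")
    d = [np.array([])]                   # no detail coefficients at the finest level
    while len(c[-1]) > 1:
        prev = c[-1]
        first, second = prev[0::2], prev[1::2]  # c_{j-1,2k-1} and c_{j-1,2k}
        c.append(first + second)                # (8)
        d.append(first - second)                # assumed detail definition
    return c, d

c, d = unnormalized_haar([1.0, 2.0, 3.0, 4.0])
```

Because the transform is unnormalized, $c_{j,k}$ is simply the sum of the k-th block of $2^j$ consecutive samples.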
Fig. 12. Canonical form for Acyclic continuous-time PH distributions
low, both for having few model parameters to evaluate and for obtaining computable models. The presence of slow decay behavior (heavy tail or long range correlation) in measured datasets makes the fitting more difficult. Typically a huge number of samples is needed to obtain a fairly reliable view of the stochastic behavior over a range of several orders of magnitude, and, of course, the asymptotic behavior cannot be checked based on finite datasets. A class of fitting methods approximates the asymptotic behavior based on the reliably known ranges (e.g., based on $10^6$ i.i.d. samples the cdf can be approximated up to the $1 - F(x) \sim 10^{-4}$–$10^{-5}$ limit). The asymptotic methods are based on the assumption that the dominant parameters (e.g., tail decay, correlation decay) of the known ranges remain unchanged in the unknown region up to the asymptotic limit.
Unfortunately, Markovian models cannot exhibit any complex asymptotic behavior: in the asymptotic region, Markovian models have exponentially decaying tails and autocorrelations. Because of this dominant property, for a long time Markovian models were not considered for fitting datasets with slowly decaying features. Recently, in spite of their exponential asymptotic decay, Markovian models exhibiting slow decay over several orders of magnitude have been introduced. These results shift the attention from models with asymptotically slow decay to models with slow decay in a given, predefined range. The main focus of this paper is on the use of Markovian models with slow decay behavior in applied traffic engineering.
A finite dataset provides only limited information about the stochastic properties of traffic processes. In particular, the long range and the asymptotic behavior cannot be extracted from a finite dataset. To overcome the lack of these important model properties, in practice the information provided by the dataset is often complemented by engineering assumptions. One of the most commonly applied traffic engineering assumptions is that the decay trends of a known region continue to infinity.
The use of engineering assumptions plays a significant role in model fitting as well. In this respect there are two major classes of fitting methods:
– fitting based on all the samples,
– fitting based on information extracted from the samples.
Naturally, there are methods which combine these two approaches.
The fitting methods based on extracted information find their roots in traffic engineering assumptions. It is a common goal in traffic engineering to find a simple (characterized by few parameters) but robust (widely applicable) traffic model which is based on a few representative parameters of network traffic. The traffic models discussed in Section 2 are completely characterized by very few parameters. E.g., the tail behavior of a power tail distribution is characterized by the heavy tail index α, and fractional Gaussian noise is characterized by the parameter H and the variance over a natural time unit. Assuming that such representative information about the dataset is available, it is worth performing the model fitting based on this compact description of the traffic properties instead of using the whole, very large dataset. Unfortunately, a commonly accepted, accurate and compact traffic characterization is not yet available. Consequently, when the fitting is based on extracted information, the goodness of the fit strongly depends on the descriptive power of the selected characteristics.
In this section we introduce a selected set of fitting methods from both classes. The fitting methods that are based on extracted information are composed of two main steps: the statistical analysis of the dataset to extract representative properties, and the fitting itself based on these properties. The first step of this procedure is based on the methods presented in the previous section, and only the second step is considered here.
5.1 PH Fitting
Tail fitting based on the ccdf. The method proposed by Feldmann and Whitt [14] is a recursive fitting procedure that results in a hyper-exponential distribution whose complementary cumulative distribution function (ccdf) at a given set of points is “very close” to the ccdf of the original distribution. This method was successfully applied to fit Pareto and Weibull distributions.
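A minimal sketch of a recursive hyperexponential fit in the spirit of [14] follows. Assumptions: the points satisfy $c_1 > c_2 > \dots > c_k$ with ratios well above the factor b; each step matches the still unexplained part of the ccdf at $c_i$ and $b c_i$ with one exponential term; and the last term takes the remaining probability mass and is matched at $c_k$ only. This closing step and all names are our choices, not necessarily those of [14], and the Pareto-type ccdf $(1+t)^{-1.5}$ in the usage lines is only an example.

```python
import math

def fw_hyperexponential_fit(ccdf, points, b=2.0):
    """Return [(p_i, lambda_i)] with sum_i p_i * exp(-lambda_i * t) ~ ccdf(t)."""
    comps = []

    def residual(t):
        # part of the ccdf not yet explained by the terms fitted so far
        return ccdf(t) - sum(w * math.exp(-r * t) for w, r in comps)

    for i, c in enumerate(points):
        if i < len(points) - 1:
            r1, r2 = residual(c), residual(b * c)
            if not 0.0 < r2 < r1:
                raise ValueError("residual ccdf must be positive and decreasing")
            lam = math.log(r1 / r2) / ((b - 1.0) * c)  # matches r1 at c and r2 at b*c
            p = r1 * math.exp(lam * c)
        else:
            p = 1.0 - sum(w for w, _ in comps)         # remaining probability mass
            r1 = residual(c)
            if not 0.0 < r1 < p:
                raise ValueError("cannot close the fit at the last point")
            lam = math.log(p / r1) / c                 # matches r1 at c_k
        comps.append((p, lam))
    return comps

def hyperexp_ccdf(comps, t):
    """ccdf of the fitted hyperexponential distribution."""
    return sum(p * math.exp(-lam * t) for p, lam in comps)

def pareto_ccdf(t):
    return (1.0 + t) ** -1.5

fit = fw_hyperexponential_fit(pareto_ccdf, [1e4, 1e3, 1e2, 1e1, 1.0])
```

Each step fits the slowest remaining component first, so the terms fitted at large $c_i$ describe the tail while the later, faster terms only correct the body.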
Fig. 14. Different parts of the pdf are approximated by different parts of the PH structure (pdf on log-log scale: original, CF1, hyperexponential, and CF1 + hyperexponential)
The result of this fitting algorithm is a phase type distribution of order n + m, where n is the number of phases used for fitting the body and m is the number of phases used for fitting the tail. The structure of this phase type distribution is depicted in Figure 13, where we have marked the phases used to fit the body and those used to fit the tail. The parameters $\beta_1, \ldots, \beta_m$, $\mu_1, \ldots, \mu_m$ are computed by considering the tail, while the parameters $\alpha_1, \ldots, \alpha_n$, $\lambda_1, \ldots, \lambda_n$ are determined by considering the main part of the distribution.
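To illustrate the combined fitting method, we consider the following Pareto-like distributions [45]:
$$\text{Pareto I:}\quad f(t) = \begin{cases} \alpha B^{-1} e^{-\frac{\alpha}{B} t} & \text{for } t \le B,\\ \alpha B^{\alpha} e^{-\alpha}\, t^{-(\alpha+1)} & \text{for } t > B, \end{cases}$$
$$\text{Pareto II:}\quad f(t) = \frac{b^{\alpha} e^{-b/t}\, t^{-(\alpha+1)}}{\Gamma(\alpha)}.$$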
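For experimenting with the combined method, the two densities can be coded directly from the formulas above. The sketch also gives the Pareto I ccdf, which follows by integrating the density ($e^{-\alpha t/B}$ for $t \le B$ and $e^{-\alpha}(B/t)^{\alpha}$ for $t > B$); Pareto II is evaluated in log space for numerical stability. The function names and the parameter values used in any experiment are illustrative.

```python
import math

def pareto1_pdf(t, alpha, B):
    """Pareto I: exponential body up to B, power tail beyond B."""
    if t < 0:
        return 0.0
    if t <= B:
        return alpha / B * math.exp(-alpha * t / B)
    return alpha * B ** alpha * math.exp(-alpha) * t ** (-(alpha + 1))

def pareto1_ccdf(t, alpha, B):
    """Closed-form ccdf obtained by integrating pareto1_pdf."""
    if t <= 0:
        return 1.0
    if t <= B:
        return math.exp(-alpha * t / B)
    return math.exp(-alpha) * (B / t) ** alpha

def pareto2_pdf(t, alpha, b):
    """Pareto II: b^alpha * exp(-b/t) * t^-(alpha+1) / Gamma(alpha), for t > 0."""
    if t <= 0:
        return 0.0
    log_f = (alpha * math.log(b) - b / t
             - (alpha + 1) * math.log(t) - math.lgamma(alpha))
    return math.exp(log_f)
```

Both Pareto I pieces meet at $t = B$ with the value $\alpha e^{-\alpha}/B$, so the density is continuous; the tail of either distribution decays as $t^{-(\alpha+1)}$.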
Fig. 15. Pareto II distribution and its PH approximation with the combined method (body and tail; original, 8+4 ML, 8+4 AD, 8+10 ML, 8+10 AD)
Fig. 16. Queue length distribution of an M/G/1 queue and its approximate M/PH/1 queue (body and tail panels; original, 8+10 ML, 8+10 AD)
Fig. 17. Body of the approximating distributions (ccdf versus length in 100 bytes; dataset, Fitting I, Fitting II)
Fig. 18. Tail of the approximating distributions (ccdf versus length in 100 bytes; dataset, polynomial tail, Fitting I, Fitting II)
A simple explanation is that fitting a process is more difficult than fitting a distribution. Capturing slowly decaying behavior with general MAP fitting seems impossible.
Nevertheless, there are numerical methods available for fitting low order MAPs directly to datasets. In [32,15,46] a fitting method based on maximum likelihood estimation is presented, and in [47] the EM algorithm is used to maximize the likelihood.
Simple numerical tests (like taking a MAP, drawing samples from it, and
fitting a MAP of the same order to these samples) often fail for MAPs of higher
order (≥ 3) and the accuracy of the method does not necessarily improve with
increasing number of samples.
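The first step of such a test, drawing samples from a given MAP, can be sketched as below. The sketch assumes the usual $(D_0, D_1)$ representation (rows of $D_0 + D_1$ sum to zero; off-diagonal entries of $D_0$ are transitions without arrival, entries of $D_1$ are transitions with an arrival) and simulates the underlying Markov chain jump by jump. The fitting step itself (ML or EM, as in [32,15,46,47]) is not shown, and `sample_map` is a hypothetical name.

```python
import numpy as np

def sample_map(D0, D1, n, rng, state=0):
    """Return n consecutive interarrival times of the MAP (D0, D1) and the final state."""
    D0 = np.asarray(D0, dtype=float)
    D1 = np.asarray(D1, dtype=float)
    m = D0.shape[0]
    times = np.empty(n)
    for k in range(n):
        t = 0.0
        while True:
            rate = -D0[state, state]                 # total outflow rate of the state
            t += rng.exponential(1.0 / rate)         # sojourn time in the state
            w = np.concatenate((D0[state], D1[state]))
            w[state] = 0.0                           # drop the diagonal of D0
            j = rng.choice(2 * m, p=w / rate)        # which transition fires
            if j >= m:                               # transition with an arrival
                state = j - m
                break
            state = j                                # hidden transition, keep waiting
        times[k] = t
    return times, state

# Example: two-state MMPP with arrival rates 3 and 1 and switching rates 0.5;
# its mean arrival rate is (3*0.5 + 1*0.5) / (0.5 + 0.5) = 2.
D0 = [[-3.5, 0.5], [0.5, -1.5]]
D1 = [[3.0, 0.0], [0.0, 1.0]]
rng = np.random.default_rng(7)
samples, _ = sample_map(D0, D1, 20000, rng)
```

Feeding such samples back into a fitting routine of the same order, and comparing the fitted matrices with $(D_0, D_1)$, is exactly the kind of consistency check described above.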
in [12]. Renewal processes with heavy tailed interarrival times also exhibit self-similar properties. Using this fact, the heavy tailed PH approximations can be used to create a MAP that is a PH renewal process. In [1], superpositions of two-state MMPPs are used for approximating second-order self-similarity. The proposed procedure fits the mean arrival rate, the lag-1 correlation, the Hurst parameter and the required range of fitting.
The process Xn (n > 0) representing the number of arrivals in the nth time-slot
is asymptotically second-order self-similar with Hurst parameter H = (3 − c)/2
([49]).
Using the method of Feldmann and Whitt [14], one may build an arrival process whose interarrival times are independent, identically distributed PH random variables with pdf approximating (13). To check the pseudo self-similarity of such PH renewal processes, Figure 19 plots $\mathrm{Var}(X^{(m)})$ of PH arrival processes whose interarrival time is a 6-phase PH approximation of the pdf given in (13), for different values of c. As can be observed, over several orders of magnitude $\mathrm{Var}(X^{(m)})$ stays close to the straight line with slope $2(H - 1)$ corresponding to the self-similar case. The aggregation level at which $\mathrm{Var}(X^{(m)})$ drops below the straight line may be increased by changing the parameters of the PH fitting algorithm.
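As a worked example of these relations, combining $H = (3 - c)/2$ with the variance-time slope $-\beta = 2(H - 1)$ gives the value of c and the reference slope implied by each target H of Figure 19 (assuming the curves are labeled by that target):
$$c = 3 - 2H, \qquad 2(H - 1) = 1 - c,$$
$$H = 0.8:\ c = 1.4,\ \text{slope} = -0.4; \qquad H = 0.7:\ c = 1.6,\ \text{slope} = -0.6; \qquad H = 0.6:\ c = 1.8,\ \text{slope} = -0.8.$$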
Fig. 19. Variance-time plot of pseudo self-similar arrival processes with i.i.d. PH interarrival times (variance versus aggregation level; PH renewal with H = 0.8, 0.7, 0.6)
Fig. 20. Superposition of the PH arrival process with an MMPP (variance versus aggregation level; IPP, PH, superposed)
The parameters of the two-state MMPP with which the PH arrival process is superposed are calculated in two steps:
1. First, we calculate the parameters of an Interrupted Poisson Process (IPP). The IPP is a two-state MMPP with one of its two arrival rates equal to 0. The parameters of the IPP are chosen such that the superposition of the PH arrival process and the IPP results in a traffic source with the desired first and second order parameters $E(N_1)$, $I(t_1)$ and $I(t_2)$.
2. In the second step, based on the IPP we find a two-state MMPP that has the same first and second order properties as the IPP (recalling results from [4]) and whose superposition with the PH arrival process yields the desired third centralized moment.
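Step 1 relies on two standard facts: the IDC of a two-state MMPP (and hence of an IPP) has the closed form
$$I(t) = 1 + \frac{2(\lambda_1-\lambda_2)^2 r_1 r_2}{(r_1+r_2)^2(\lambda_1 r_2 + \lambda_2 r_1)}\left(1 - \frac{1 - e^{-(r_1+r_2)t}}{(r_1+r_2)t}\right),$$
and for independent superposed processes both the means and the variances of the counts add. The sketch below encodes these two facts only. The symbols $r_1$ (switching rate from state 1 to 2) and $r_2$ (from 2 to 1) and the function names are ours, and actually solving for the IPP parameters that hit $E(N_1)$, $I(t_1)$ and $I(t_2)$ (and the third-moment step based on [4]) is not shown.

```python
import math

def idc_mmpp2(lam1, lam2, r1, r2, t):
    """I(t) = Var N(t) / E N(t) for a two-state MMPP; an IPP has lam2 = 0."""
    s = r1 + r2
    weighted = lam1 * r2 + lam2 * r1          # (r1 + r2) times the mean arrival rate
    if weighted <= 0 or t <= 0:
        raise ValueError("need a positive mean rate and t > 0")
    i_inf = 2.0 * (lam1 - lam2) ** 2 * r1 * r2 / (s ** 2 * weighted)  # I(inf) - 1
    x = s * t
    # 1 - (1 - exp(-x)) / x, written with expm1 to stay accurate for small x
    return 1.0 + i_inf * (1.0 + math.expm1(-x) / x)

def idc_superposition(rates, idcs):
    """IDC of independent superposed sources: means and variances of counts add."""
    return sum(r * i for r, i in zip(rates, idcs)) / sum(rates)
```

For example, an IPP with rate 5 in its on state and $r_1 = r_2 = 1$ has an IDC that grows from 1 at small t to 3.5 as $t \to \infty$; combining such curves with the PH renewal IDC through `idc_superposition` gives the quantity matched at $t_1$ and $t_2$.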
If the MMPP is “less long-range dependent” than the PH arrival process, the pseudo self-similarity of the superposed traffic model will be dominated by the PH arrival process. This is illustrated in Figure 20. It can be observed that, if the Hurst parameter is estimated from the variance-time plot, the Hurst parameter of the superposed model is only slightly smaller than that of the PH arrival process: it is 0.8 for the PH arrival process and 0.78 for the superposed model (based on the slope in the interval $(10, 10^6)$). This behavior is exploited in the fitting method to approximate the short and the long range behavior separately.
We illustrate the procedure by fitting the Bellcore trace. Variance-time plots of the traffic generated by the MAPs resulting from the fitting are depicted in Figure 21. The curve labeled $(x_1, x_2)$ belongs to the fit in which the first (second) time point of fitting the IDC value, $t_1$ ($t_2$), is $x_1$ ($x_2$) times the expected interarrival time. R/S plots for both the real traffic trace and the traffic generated by the approximating MAPs are given in Figure 22. The fitted models were tested with a •/D/1 queue as well; the results are depicted in Figure 23. The •/D/1 queue was analyzed by simulation with different levels of server utilization. As one may observe, the lower $t_1$ and $t_2$ are, the longer (i.e., up to larger queue lengths) the queue length distribution of the model follows the original one.

Fig. 21. Variance-time plots of MAPs with different time points of IDC matching (original trace, (1,2), (1,5), (2,10), (2,20))
Fig. 22. R/S plots of MAPs with different time points of IDC matching (log10(R/S(n)) versus n)
The fitting method provides a MAP some of whose parameters are the same as (or very close to) those of the original traffic process. Still, the queue length distribution does not show a good match. This means that the chosen parameters do not capture all the important characteristics of the traffic trace.
Figure 23 panels: queue length distribution (probability versus queue length) with the original trace and the MAPs (1,2), (1,5), (2,10), (2,20) as input, for ρ = 0.2, 0.4, 0.6 and 0.8.
Fig. 24. The second moment of the Haar wavelet transform at different aggregation levels (original trace and approximating trace)
Fig. 25. Scaling of log-moments of the original trace and the fitting MMPP (q = −3, ..., 4)
represent the log-moment curves of the fitting MAP and the solid lines indicate
the corresponding log-moment curves of the Bellcore trace. In the range of n ∈
(3, 19) the log-moment curves of the fitting MAP are very close to the ones of
the original trace. The log-moment curves of the approximate MAP are also very
close to linear in the considered range.
Fig. 26. Partition function estimated through the linear fits shown in Figure 25 (T(q) versus q; original trace and approximating trace)
Fig. 27. The Legendre transform of the original dataset and the one of the approximate MMPP (original spectrum and approximating spectrum)
The partition functions of the fitting MAP and of the original trace are depicted in Figure 26. As mentioned earlier, the visual appearance of the partition function is not very informative about the multifractal scaling behavior. Figure 27 depicts the Legendre transforms of the partition functions of the original dataset and of the approximating MAP. The Legendre transform significantly amplifies the differences between the partition functions. Figure 27 shows that both processes exhibit multifractal behavior, but the original dataset has a somewhat richer multifractal spectrum.
We also compared the queuing behavior of the original dataset with that of the approximate MAP, assuming deterministic service time and different levels of utilization ρ. Figure 28 depicts the queue length distribution resulting from the original and the approximate arrival processes. The queue length distribution curves show quite a close fit. The probability of an empty queue, which is not displayed in the figures, is the same for the MAP as for the original trace, since the MAP has the same average arrival intensity as the original trace. The fit is better at higher utilization, which might mean that different scaling behaviors play a dominant role at different utilizations, and the ones that are dominant at high utilization are better approximated by the proposed MAP.

Figure 28 panels: queue length distribution (probability versus queue length) for the original trace and the approximating trace, for ρ = 0.2, 0.4, 0.6 and 0.8.
6 Conclusions
This paper collects a set of methods which can be used in practice for measurement-based traffic engineering. The history of traffic theory of high speed communication networks is summarized together with a short introduction to the mathematical foundation of the applied concepts. The common statistical methods for the analysis of data traces and the practical problems of their application are discussed.
The use of Markovian methods is motivated by the fact that an effective
analysis technique, the matrix geometric method, is available for the evalua-
tion of Markovian queuing systems. To obtain the Markovian approximation
of measured traffic data a variety of heuristic fitting methods are applied. The
properties and abilities of these methods are also discussed.
References
1. A. T. Andersen and B. F. Nielsen. A Markovian approach for modeling packet traffic with long-range dependence. IEEE Journal on Selected Areas in Communications, 16(5):719–732, 1998.
2. S. Asmussen and O. Nerman. Fitting Phase-type distributions via the EM algorithm. In Proceedings of the "Symposium i Anvendt Statistik", pages 335–346, Copenhagen, 1991.
3. J. Beran. Statistics for long-memory processes. Chapman and Hall, New York,
1994.
4. A. W. Berger. On the index of dispersion for counts for user demand modeling. In
ITU, Madrid, Spain, June 1994. Study Group 2, Question 17/2.
5. A. Bobbio, A. Horváth, M. Scarpa, and M. Telek. Acyclic discrete phase type
distributions: Properties and a parameter estimation algorithm. submitted to Per-
formance Evaluation, 2000.
6. S. C. Borst, O. J. Boxma, and R. Nunez-Queija. Heavy tails: The effect of the ser-
vice discipline. In Tools 2002, pages 1–30, London, England, April 2002. Springer,
LNCS 2324.
7. G. E. P. Box, G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, N.J., third edition, 1994.
8. E. Castillo. Extreme Value Theory in Engineering. Academic Press, San Diego,
California, 1988.
9. M. E. Crovella and M. S. Taqqu. Estimating the heavy tail index from scaling
properties. Methodology and Computing in Applied Probability, 1(1):55–79, 1999.
10. A. Cumani. On the canonical representation of homogeneous Markov processes
modelling failure-time distributions. Microelectronics and Reliability, 22:583–602,
1982.
11. R. El Abdouni Khayari, R. Sadre, and B. Haverkort. Fitting world-wide web
request traces with the EM-algorithm. In Proc. of SPIE, volume 4523, pages 211–
220, Denver, USA, 2001.
12. R. El Abdouni Khayari, R. Sadre, and B. Haverkort. A validation of the pseudo self-similar traffic model. In Proc. of IPDS, Washington D.C., USA, 2002.
13. A. Feldmann, A. C. Gilbert, and W. Willinger. Data networks as cascades: Investigating the multifractal nature of Internet WAN traffic. Computer Communication Review, 28(4):42–55, 1998.
14. A. Feldmann and W. Whitt. Fitting mixtures of exponentials to long-tail distributions to analyze network performance models. Performance Evaluation, 31:245–279, 1998.
15. W. Fischer and K. Meier-Hellstern. The Markov-modulated Poisson process
(MMPP) cookbook. Performance Evaluation, 18:149–171, 1992.
16. H. J. Fowler and W. E. Leland. Local area network traffic characteristics, with im-
plications for broadband network congestion management. IEEE JSAC, 9(7):1139–
1149, 1991.
17. R. Fox and M. S. Taqqu. Large sample properties of parameter estimates for
strongly dependent stationary time series. The Annals of Statistics, 14:517–532,
1986.
38. A. Ost and B. Haverkort. Modeling and evaluation of pseudo self-similar traffic with infinite-state stochastic Petri nets. In Proc. of the Workshop on Formal Methods in Telecommunications, pages 120–136, Zaragoza, Spain, 1999.
39. S. Resnick. Heavy tail modeling and teletraffic data. The Annals of Statistics,
25:1805–1869, 1997.
40. S. Resnick and C. Starica. Smoothing the hill estimator. Advances in Applied
Probability, 29:271–293, 1997.
41. J. Rice. Mathematical Statistics and Data Analysis. Brooks/Cole Publishing,
Pacific Grove, California, 1988.
42. R. H. Riedi. An introduction to multifractals. Technical report, Rice University, 1997. Available at http://www.ece.rice.edu/~riedi.
43. R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R. G. Baraniuk. A multifractal wavelet
model with application to network traffic. IEEE Transactions on Information
Theory, 45:992–1018, April 1999.
44. S. Robert and J.-Y. Le Boudec. New models for pseudo self-similar traffic. Performance Evaluation, 30:57–68, 1997.
45. M. Roughan, D. Veitch, and M. Rumsewicz. Numerical inversion of probability generating functions of power-law tail queues. Technical report, 1997.
46. T. Rydén. Parameter estimation for Markov Modulated Poisson Processes.
Stochastic Models, 10(4):795–829, 1994.
47. T. Rydén. An EM algorithm for estimation in Markov modulated Poisson processes. Computational Statistics and Data Analysis, 21:431–447, 1996.
48. T. Rydén. Estimating the order of continuous phase-type distributions and Markov-modulated Poisson processes. Stochastic Models, 13:417–433, 1997.
49. B. Ryu and S. B. Lowen. Point process models for self-similar network traffic, with applications. Stochastic Models, 14, 1998.
50. M. Telek and A. Heindl. Moment bounds for acyclic discrete and continuous phase-type distributions of second order. In Proc. of the Eighteenth Annual UK Performance Engineering Workshop (UKPEW), Glasgow, UK, 2002.
51. J. Lévy Véhel and R. H. Riedi. Fractional Brownian motion and data traffic modeling: The other end of the spectrum. In J. Lévy Véhel, E. Lutton, and C. Tricot, editors, Fractals in Engineering, pages 185–202. Springer, 1997.
52. The Internet Traffic Archive. http://ita.ee.lbl.gov/index.html.
Optimization of Bandwidth and Energy
Consumption in Wireless Local Area Networks
1 Introduction
time spreading of the accesses that the standard backoff procedure accomplishes has a negative impact on both the channel utilization and the energy consumption. Specifically, the time spreading of the accesses can introduce large delays in the message transmissions, and energy wastage due to the carrier sensing. Furthermore, the IEEE 802.11 policy has to pay the cost of collisions to increase the backoff time when the network is congested.
In [5], [6] and [7], given the binary exponential backoff scheme adopted by the standard, solutions have been proposed for a more uniform distribution of accesses. The most promising direction for improving the backoff protocols is to adopt feedback-based tuning algorithms that exploit the information retrieved from the observation of the channel status [8], [9], [10]. For the IEEE 802.11 MAC protocol, some authors have proposed an adaptive control of the network congestion by investigating the number of users in the system [11], [12], [13]. This investigation could prove expensive, difficult to obtain, and subject to significant errors, especially in high contention situations [12]. Distributed (i.e. independently executed by each station) strategies for power saving have been proposed and investigated in [14], [15]. Specifically, in [14] the authors propose a power controlled wireless MAC protocol based on a fine-tuning of the network interface transmitting power. [15] extends the algorithm presented in [7] with power saving features.
This paper presents and evaluates a distributed mechanism for the contention control in IEEE 802.11 WLANs that extends the standard access mechanism without requiring any additional hardware. Our mechanism dynamically adapts the backoff window size to the current network contention level, and guarantees that an IEEE 802.11 WLAN asymptotically achieves its optimal channel utilization and/or the minimum energy consumption. For this reason we named our mechanism Asymptotical Optimal Backoff (AOB). To tune the parameters of our mechanism we analytically studied the bandwidth and energy consumption of the IEEE 802.11 standard, and we derived closed formulas that relate the protocol backoff parameters to the maximum throughput and to the minimal energy consumption.
Our analytical study of IEEE 802.11 performance is based on a p-persistent model of the IEEE 802.11 protocol [11], [15]. This protocol model differs from the standard protocol only in the selection of the backoff interval. Instead of the binary exponential backoff used in the standard, the backoff interval of the p-persistent IEEE 802.11 protocol is sampled from a geometric distribution with parameter p. In [11], it was shown that a p-persistent IEEE 802.11 closely approximates the standard protocol. In this paper, we use the p-persistent model to derive analytical formulas for the IEEE 802.11 protocol capacity and energy consumption. From these formulas we compute the p value (i.e., the parameter that determines the average backoff window size) corresponding to maximum channel utilization (optimal capacity p, also referred to as $p_{opt}^{C}$), and the p value corresponding to minimum energy consumption (optimal energy p, also referred to as $p_{opt}^{E}$). The properties of the optimal operating points (both from the efficiency and the power saving standpoint) are investigated in depth. In addition, we also provide closed formulas for the optimal p values. These formulas are used by AOB to dynamically tune the WLAN backoff parameters either to maximize WLAN efficiency, or to minimize WLAN energy consumption.
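A minimal Python sketch of the p-persistent backoff follows. It assumes, as an illustration rather than the authors' simulator, that a station with a frame to send uses each slot independently with probability p; the backoff is then the number of idle slots before that happens, which is geometrically distributed with mean (1 - p)/p slots. The helper names `sample_backoff` and `mean_backoff` are hypothetical.

```python
import random

def sample_backoff(p: float, rng: random.Random) -> int:
    """Idle slots before transmitting when each slot is used with probability p.

    Illustrative p-persistent backoff: the result is geometrically distributed.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    slots = 0
    while rng.random() >= p:  # slot not used (probability 1 - p): keep waiting
        slots += 1
    return slots

def mean_backoff(p: float) -> float:
    """Analytical mean of the geometric backoff, (1 - p) / p slots."""
    return (1.0 - p) / p

rng = random.Random(42)
p = 0.1
samples = [sample_backoff(p, rng) for _ in range(100_000)]
print(sum(samples) / len(samples), mean_backoff(p))
```

The empirical mean should be close to `mean_backoff(p)`; a smaller p gives longer average backoffs, which is how tuning p plays the role that sizing the backoff window plays in the standard protocol.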
Optimization of Bandwidth and Energy Consumption 437
2. IEEE 802.11
$$\text{Backoff\_Counter} = \mathrm{INT}\big(\mathrm{Rnd}() \cdot \text{CW\_Size}\big),$$
where Rnd() is a function which returns pseudo-random numbers uniformly distributed in [0..1].
The Binary Exponential Backoff is characterized by the expression that gives the dependency of the CW_Size parameter on the number of unsuccessful transmission attempts (N_A) already performed for a given frame. In [1] it is defined that the first transmission attempt for a given frame is performed adopting CW_Size equal to the minimum value CW_Size_min (assuming low contention). After each unsuccessful (re)transmission of the same frame, the station doubles CW_Size until it reaches the maximal value fixed by the standard, i.e. CW_Size_MAX, as follows:
$$\text{CW\_Size}(N\_A) = \min\left(\text{CW\_Size\_MAX},\; \text{CW\_Size\_min} \cdot 2^{\,N\_A - 1}\right).$$
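The two formulas above can be sketched in Python as follows. The defaults CW_Size_min = 16 and CW_Size_MAX = 1024 are taken from the simulation parameter table later in this section, and the function names are illustrative. Two assumptions are made explicit in the code: N_A = 1 denotes the first attempt (so that the first attempt uses CW_Size_min, as stated above), and `random.random()` returns values in [0, 1), so the counter lies in {0, ..., CW_Size - 1}.

```python
import random

CW_SIZE_MIN = 16    # from the simulation parameter table
CW_SIZE_MAX = 1024  # from the simulation parameter table

def cw_size(n_a: int, cw_min: int = CW_SIZE_MIN, cw_max: int = CW_SIZE_MAX) -> int:
    """CW_Size(N_A) = min(CW_Size_MAX, CW_Size_min * 2**(N_A - 1)).

    Assumption: n_a = 1 is the first attempt, which therefore uses cw_min.
    """
    if n_a < 1:
        raise ValueError("n_a must be >= 1")
    return min(cw_max, cw_min * 2 ** (n_a - 1))

def backoff_counter(n_a: int, rng: random.Random) -> int:
    """Backoff_Counter = INT(Rnd() * CW_Size), in slot units."""
    return int(rng.random() * cw_size(n_a))

rng = random.Random(1)
for attempt in range(1, 9):
    print(attempt, cw_size(attempt), backoff_counter(attempt, rng))
```

Under this indexing, each failed attempt doubles the window until it saturates at 1024 slots from the seventh attempt onward.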
Positive acknowledgements are employed to ascertain a successful transmission. This is accomplished by the receiver (immediately following the reception of the data frame), which initiates the transmission of an acknowledgement frame (ACK) after a time interval Short Inter Frame Space (SIFS), which is less than DIFS.
If the transmission generates a collision¹, the CW_Size parameter is doubled for the new scheduling of the retransmission attempt, thus obtaining a further reduction of contention.
The increase of the CW_Size parameter value after a collision is the reaction that the 802.11 standard DCF provides to make the access mechanism adaptive to channel conditions.
congested systems, because it concentrates the accesses in a reduced time window, and hence it may cause a high collision probability.
Parameter                      Value
Number of stations (M)         variable from 2 to 200
CW_Size_min                    16
CW_Size_MAX                    1024
Channel transmission rate      2 Mb/s
Payload size                   Geometric distribution (parameter q)
Acknowledgement size           200 μsec (50 Bytes)
Header size                    136 μsec (34 Bytes)
Slot Time (t_slot)             50 μsec
SIFS                           28 μsec
DIFS                           128 μsec
Propagation time               < 1 μsec
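As a consistency check on the table, the acknowledgement and header durations follow from their sizes at the 2 Mb/s channel rate. The short Python sketch below (illustrative helper names) converts them, together with SIFS and DIFS, into slot units; it only restates the table's numbers.

```python
RATE_BITS_PER_US = 2.0  # 2 Mb/s channel = 2 bits per microsecond
T_SLOT_US = 50.0        # slot time t_slot from the table

def airtime_us(size_bytes: int) -> float:
    """Transmission time, in microseconds, of size_bytes at 2 Mb/s."""
    return size_bytes * 8 / RATE_BITS_PER_US

def in_slots(duration_us: float) -> float:
    """Express a duration in units of t_slot."""
    return duration_us / T_SLOT_US

ack_us = airtime_us(50)     # acknowledgement, 50 bytes
header_us = airtime_us(34)  # header, 34 bytes
for name, us in [("ACK", ack_us), ("Header", header_us), ("SIFS", 28.0), ("DIFS", 128.0)]:
    print(f"{name}: {us:g} us = {in_slots(us):g} slots")
```

At 2 Mb/s an ACK thus occupies exactly 4 slots, while DIFS corresponds to 2.56 slots.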
[Figures: P_T for N_A = 4 and N_A = 8 versus the number of active stations (up to 200), and P_T versus S_U; one P_T versus S_U panel marks opt_S_U = 0.80.]
4. Protocol Model
sage waiting to be transmitted; iv) the message lengths, say $l_i$, are random variables identically and independently distributed.
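$$E[Idle] = \frac{(1-p)^M}{1-(1-p)^M}, \qquad (5)$$
$$E[N_c] = \frac{1-(1-p)^M}{Mp(1-p)^{M-1}} - 1, \qquad (6)$$
$$E[Coll \mid Coll] = \frac{t_{slot}}{1-(1-p)^M - Mp(1-p)^{M-1}} \sum_{h=1}^{\infty} h \Big\{ \big[1-(1-\Pr\{L \le h\})p\big]^M - \big[1-(1-\Pr\{L < h\})p\big]^M - M\big[\Pr\{L \le h\} - \Pr\{L < h\}\big]\, p(1-p)^{M-1} \Big\}. \qquad (7)$$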
Proof. The proof is obtained with standard probabilistic considerations (see [11]). □
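Equations (5), (6) and (7) can be evaluated directly. The Python sketch below assumes that message lengths are given in slot units as a finite probability mass function (a dict from length to probability, a hypothetical input format), so the sum in (7) can stop at the longest possible length: beyond it, $\Pr\{L \le h\} = \Pr\{L < h\} = 1$ and every term vanishes. An unbounded distribution, such as the geometric payload of the table, would have to be truncated first.

```python
from typing import Dict

def expected_idle(p: float, m: int) -> float:
    """Eq. (5): mean number of idle slots before a transmission attempt."""
    q_m = (1.0 - p) ** m
    return q_m / (1.0 - q_m)

def expected_collisions(p: float, m: int) -> float:
    """Eq. (6): mean number of collisions per successful transmission."""
    return (1.0 - (1.0 - p) ** m) / (m * p * (1.0 - p) ** (m - 1)) - 1.0

def expected_collision_length(p: float, m: int, pmf: Dict[int, float],
                              t_slot: float = 1.0) -> float:
    """Eq. (7): mean collision length given a collision.

    pmf maps a message length in slots (>= 1) to its probability.
    """
    p_coll = 1.0 - (1.0 - p) ** m - m * p * (1.0 - p) ** (m - 1)
    total = 0.0
    cdf_lt = 0.0  # Pr{L < h}
    for h in range(1, max(pmf) + 1):
        cdf_le = cdf_lt + pmf.get(h, 0.0)  # Pr{L <= h}
        # Pr{collision and the longest colliding message lasts h slots}
        term = ((1.0 - (1.0 - cdf_le) * p) ** m
                - (1.0 - (1.0 - cdf_lt) * p) ** m
                - m * (cdf_le - cdf_lt) * p * (1.0 - p) ** (m - 1))
        total += h * term
        cdf_lt = cdf_le
    return t_slot * total / p_coll
```

A quick sanity check: with a deterministic length of 4 slots every collision lasts 4 slots, so `expected_collision_length(p, M, {4: 1.0})` returns 4 t_slot for any p in (0, 1) and M > 1.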
$E[S]$ is a constant: it does not depend on the p value, only on the message length distribution. As appears from Equation (4) and Lemma 1, the channel utilization is a function of the protocol parameter p, the number M of active stations and the message length distribution. The protocol capacity, say $U_{MAX}$, is obtained by finding the p value, say $p_{opt}^{C}$, that maximizes Equation (3). Since $E[S]$ is a constant, the utilization-maximization problem is equivalent to the following minimization problem:
$$\min_{p \in [0,1]} \left\{ \frac{C - (1-p)^M (C-1)}{Mp(1-p)^{M-1}} \right\} = \min_{p \in [0,1]} \{F(p, M, C)\}. \qquad (10)$$

[Figure: relative error versus message length (time slots), for $p_{opt}$ and for capacity, with M = 10 and M = 100.]
formula for the $p_{opt}^{C}$ value. Note that this closed formula generalizes results already known for S-ALOHA networks [23].
$$p_{opt}^{C} \approx \frac{\sqrt{1 + 2(C-1)\frac{M-1}{M}} - 1}{(M-1)(C-1)}, \qquad (12)$$
where $C = E[\max\{L_1, L_2\}]$.
PROOF. The proof is reported in Appendix B.
$$M\, p_{opt}^{C} \approx \frac{\sqrt{1 + 2(C-1)} - 1}{C-1}, \qquad (13)$$
$$p_{opt}^{C} \approx \frac{1}{M\sqrt{C/2}}. \qquad (14)$$
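To see how close the closed forms are to the true optimum, the Python sketch below minimizes $F(p, M, C)$ of Equation (10) by brute-force grid search and compares the result with Equations (12) and (14). C is treated as a given number of slots greater than 1 (Equation (12) divides by C - 1); the grid step, the search range (0, 0.5] and the helper names are arbitrary choices of this sketch, not values from the paper.

```python
import math

def objective_f(p: float, m: int, c: float) -> float:
    """F(p, M, C) of Equation (10)."""
    return (c - (1.0 - p) ** m * (c - 1.0)) / (m * p * (1.0 - p) ** (m - 1))

def p_opt_numeric(m: int, c: float, step: float = 1e-5, p_max: float = 0.5) -> float:
    """Minimiser of F over the grid step, 2*step, ..., p_max (brute force)."""
    n = int(round(p_max / step))
    return min((k * step for k in range(1, n + 1)),
               key=lambda p: objective_f(p, m, c))

def p_opt_closed(m: int, c: float) -> float:
    """Equation (12); requires M > 1 and C > 1."""
    return (math.sqrt(1.0 + 2.0 * (c - 1.0) * (m - 1) / m) - 1.0) / ((m - 1) * (c - 1.0))

def p_opt_asymptotic(m: int, c: float) -> float:
    """Equation (14), which agrees with (13) when C is large."""
    return 1.0 / (m * math.sqrt(c / 2.0))

for m, c in [(10, 2.0), (10, 10.0), (100, 10.0)]:
    numeric, closed = p_opt_numeric(m, c), p_opt_closed(m, c)
    print(f"M={m} C={c}: numeric={numeric:.5f} closed={closed:.5f} "
          f"rel.err={abs(closed - numeric) / numeric:.3f}")
```

A grid search is used only because it is easy to check by hand; any one-dimensional minimizer would serve, since F is a smooth function of p on (0, 1).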
As for the throughput maximization, it is desirable to have a simpler relationship than Equation (15) to provide an approximation for the $p_{opt}^{E}$ value. To this end, in the following we investigate the role of the various terms of Equation (15) in determining the energy consumption. Specifically, in the energy consumption formula we separate the terms that are increasing functions of the p value from the terms that are decreasing functions of the p value.
To achieve this, it is useful to introduce the following proposition.
$$\ldots\; + P_{RX}(M-1)\, l\, t_{slot} + E[N_{ta}]\, M\, E[Energy_{Coll} \mid N_{tr} \ge 1] \big\}$$
PROOF. The proof requires some algebraic manipulations of Equation (15). We first observe that the number of successful transmissions in a virtual transmission time is M. Furthermore, in the virtual transmission time there is exactly one successful transmission of the tagged station, and on average there are (M - 1) successful transmissions of the other stations. It is also straightforward to derive that the average number of collisions that we observe within a virtual transmission time is
[Figure: energy consumption (logarithmic scale) versus p value (0 to 0.1); curves for $E[Idle_p]$ and for $E[Energy_{Coll}]$ with $P_{TX}/P_{RX}$ = 1, 2 and 10.]
Fig. 6: $p_{opt}^{E}$ approximation with M = 10 and l = 2 $t_{slot}$.
From Proposition 2 it follows that $p_{opt}^{E}$ corresponds to the p value that minimizes the denominator of Equation (16). It is also worth noting that the second and third terms of this denominator do not depend on the p value, and hence they play no role in the minimization process. Our problem therefore reduces to finding
$$\min_{p} \Big\{ E[N_{ta}]\, E[Energy_{Idle\_p}] + E[N_{ta}]\, M\, E[Energy_{Coll} \mid N_{tr} \ge 1] \Big\}. \qquad (17)$$
$$E[Energy_{Idle\_p}] \approx E[Energy_{tag\_Coll} \mid tag\_Coll]\; P_{tag\_Coll \mid N_{tr} \ge 1}. \qquad (20)$$
Equation (20) defines a simple but approximate relationship to characterize $p_{opt}^{E}$. Specifically, in Fig. 6 we have plotted $E[Energy_{Idle\_p}]$ and $E[Energy_{Coll} \mid N_{tr} \ge 1]$ versus the p value, for various $P_{TX}/P_{RX}$ values. $E[Energy_{Idle\_p}]$ is equal to $E[Idle_p]$ due to the assumption that $P_{RX} = 1$. The p value that corresponds to the intersection point of the $E[Energy_{Idle\_p}]$ and $E[Energy_{Coll} \mid N_{tr} \ge 1]$ curves is the approximation of the $p_{opt}^{E}$ value, as Equation (16) indicates. As $E[Energy_{Coll} \mid N_{tr} \ge 1]$ for $P_{TX}/P_{RX} = 1$ is equal to the average length of a collision given a transmission attempt, i.e., $E[Coll \mid N_{tr} \ge 1]$, the p value that corresponds to the intersection point of the $E[Idle_p]$ and $E[Coll \mid N_{tr} \ge 1]$ curves provides a good approximation of the $p_{opt}^{C}$ value, as Equation (6) indicates. We note that, as the $P_{TX}$ value increases, $E[Energy_{Coll} \mid N_{tr} \ge 1]$ also grows, due to the rise in the energy consumption of tagged-station collisions. However, $E[Energy_{Idle\_p}]$ does not depend on the $P_{TX}$ value; hence, only a decrease in the $p_{opt}^{E}$ value can balance the increase in $E[Energy_{Coll} \mid N_{tr} \ge 1]$.
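The intersection argument can be checked numerically for the $P_{TX}/P_{RX} = 1$ case. The Python sketch below makes two assumptions that are not formulas stated here: $E[Idle_p]$ has the same form as Equation (5), and every collision lasts a constant C slots, so that $E[Coll \mid N_{tr} \ge 1] = C \cdot \Pr\{\text{collision} \mid N_{tr} \ge 1\}$. It then locates the crossing point of the two curves by bisection; the helper names are hypothetical.

```python
def e_idle(p: float, m: int) -> float:
    """Assumed E[Idle_p]: mean idle slots per transmission attempt (form of Eq. (5))."""
    q_m = (1.0 - p) ** m
    return q_m / (1.0 - q_m)

def e_coll_given_tx(p: float, m: int, c: float) -> float:
    """Assumed E[Coll | N_tr >= 1] = C * Pr{collision | N_tr >= 1},
    i.e. every collision is taken to last a constant C slots."""
    q_m = (1.0 - p) ** m
    p_coll = 1.0 - q_m - m * p * (1.0 - p) ** (m - 1)
    return c * p_coll / (1.0 - q_m)

def intersection(m: int, c: float, lo: float = 1e-6, hi: float = 1.0 - 1e-6) -> float:
    """Bisection on e_idle - e_coll_given_tx, which is positive near p = 0
    (long idle periods) and negative near p = 1 (almost every slot collides)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if e_idle(mid, m) > e_coll_given_tx(mid, m, c):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_star = intersection(10, 10.0)
print(p_star, e_idle(p_star, 10), e_coll_given_tx(p_star, 10, 10.0))
```

Under these assumptions the crossing point lands within a few percent of the closed form (12) for large M and C, which illustrates why the intersection is a useful approximation of $p_{opt}^{C}$.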
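$$p_{opt}^{E} \approx \frac{\sqrt{1 + 2\,\frac{M-1}{M}\left[C\,\frac{M-2}{M} + \frac{E_{CT}}{P_{RX}}\,\frac{1}{M} - 1\right]} - 1}{(M-1)\left[C\,\frac{M-2}{M} + \frac{E_{CT}}{P_{RX}}\,\frac{1}{M} - 1\right]}, \qquad (21)$$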
where $C = E[\max\{L_1, L_2\}]$ and $E_{CT} = E[Energy_{tag\_Coll} \mid tag\_Coll, N_{tr} = 2]$.
PROOF. The proof of this Lemma can be found in [24].
As we have done in Proposition 1, the following proposition provides an analytical investigation of $M\,p_{opt}^{E}$ for a large network-size population. This investigation is useful because it shows how, for a large network-size population, the $p_{opt}^{E}$ value tends to the $p_{opt}^{C}$ value.
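$$M\, p_{opt}^{E} \approx \frac{\sqrt{1 + 2\left(C + \frac{E_{CT}}{P_{RX}}\,\frac{1}{M} - 1\right)} - 1}{C + \frac{E_{CT}}{P_{RX}}\,\frac{1}{M} - 1} \approx \frac{\sqrt{1 + 2(C-1)} - 1}{C-1}. \qquad (22)$$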
PROOF. The proof of this proposition is straightforward. Under the condition $M \gg 1$, Equation (21) can be rewritten as (22) by noting that $(M-1) \approx M$ and $(M-2) \approx M$. □
lation as it contributes only to $E_{CT}$. Obviously, $p_{opt}^{E} = p_{opt}^{C}$ when $P_{TX} = P_{RX}$. However, the comparison between the structure of Equation (13) and Equation (22) also shows that the correspondence between the optimal p values continues to hold.
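Equations (21) and (22) can be compared numerically. In the Python sketch below, C, $E_{CT}$ and $P_{RX}$ are arbitrary illustrative values, not taken from the paper's experiments, and the helper names are hypothetical. The printout shows $M\,p_{opt}^{E}$ from (21) approaching the right-most expression of (22) as M grows, which is the sense in which $p_{opt}^{E}$ tends to $p_{opt}^{C}$ for large populations.

```python
import math

def p_opt_energy(m: int, c: float, e_ct: float, p_rx: float) -> float:
    """Closed-form approximation of p_opt^E, Equation (21)."""
    k = c * (m - 2) / m + e_ct / (p_rx * m) - 1.0  # bracketed term of (21)
    return (math.sqrt(1.0 + 2.0 * (m - 1) / m * k) - 1.0) / ((m - 1) * k)

def m_p_opt_limit(c: float) -> float:
    """Right-most expression of Equation (22), the large-M limit of M * p_opt."""
    return (math.sqrt(1.0 + 2.0 * (c - 1.0)) - 1.0) / (c - 1.0)

# Illustrative values only: C = 5 slots, E_CT = 10, P_RX = 1.
for m in (10, 100, 1000, 10000):
    print(m, m * p_opt_energy(m, c=5.0, e_ct=10.0, p_rx=1.0), m_p_opt_limit(5.0))
```

The energy term $E_{CT}/(P_{RX} M)$ shrinks as 1/M, so its influence on $M\,p_{opt}^{E}$ fades for large populations, whatever (fixed) value $E_{CT}$ takes.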
[Figure: two panels plotted versus the number of active stations (2 to 200).]
[Figure: channel utilization traces plotted against time in slot units and in block units (0 to 4000).]
5 A block unit corresponds to 512 slots. The block unit is introduced to smooth the trace, reducing the fluctuations and thus increasing the figure's readability.
458 M. Conti and E. Gregori
[Figure: channel utilization versus number of active stations (10 to 100) for AOB with no estimation error and with slot-utilization estimation errors of -50%, -25%, -10%, +10%, +25% and +50%.]
Acknowledgements
References
Appendix: A
^1 p
M
p1 p
M 1
` ( A.1)
p ½ .
>1 @ M 1 1
C ® p1 p 1 p
M 1 M
¾
¯ p¿
$$P_{Coll \mid N_{tr} \ge 1}\, P_{N_{tr} \ge 1} = 1 - (1-p)^M - Mp(1-p)^{M-1} \approx \frac{M(M-1)}{2}\,p^2 - O\big((Mp)^3\big). \qquad (A.3)$$
Appendix: B
6 This assumption is always true in a stable system as the average number of transmitting stations in an empty slot must be less than one.
$$p_{opt}^{C} \approx \frac{\sqrt{1 + 2(C-1)\frac{M-1}{M}} - 1}{(M-1)(C-1)}, \qquad (B.1)$$
where $C = E[\max\{L_1, L_2\}]$.
PROOF.
In [24] it is shown that, if the network stations are operating close to the $p_{opt}^{C}$ value and in asymptotic conditions, $E[Coll \mid Collision] \approx C$. The assumption $E[Coll \mid Collision] \approx C$ indicates that $E[Coll \mid Collision]$ depends only on the message-length distribution, and Equation (11) can be rewritten as:
$$(1-p)^M - C\big[1 - (1-p)^M - Mp(1-p)^{M-1}\big] = 0. \qquad (B.2)$$
Under the condition $Mp \ll 1$,
$$(1-p)^M \approx 1 - Mp + \frac{M(M-1)}{2}\,p^2 - O\big((Mp)^3\big), \qquad (B.3)$$
$$\big[1 - (1-p)^M - Mp(1-p)^{M-1}\big] \approx \frac{M(M-1)}{2}\,p^2 - O\big((Mp)^3\big). \qquad (B.4)$$
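One way to see how (B.1) follows from (B.2) is numerical. The Python sketch below (illustrative helper names) checks that the second-order expansions (B.3) and (B.4) differ from the exact expressions by terms of order $(Mp)^3$ when Mp is small, and that substituting them into (B.2), which gives the quadratic $(C-1)\frac{M(M-1)}{2}p^2 + Mp - 1 = 0$, yields as positive root exactly the closed form (B.1). Dropping the third-order terms is the same approximation the appendix relies on.

```python
import math

def taylor_errors(m: int, p: float) -> tuple:
    """Exact minus second-order approximation, for (B.3) and (B.4)."""
    exact_idle = (1.0 - p) ** m
    approx_idle = 1.0 - m * p + m * (m - 1) / 2.0 * p ** 2
    exact_coll = 1.0 - (1.0 - p) ** m - m * p * (1.0 - p) ** (m - 1)
    approx_coll = m * (m - 1) / 2.0 * p ** 2
    return exact_idle - approx_idle, exact_coll - approx_coll

def p_from_quadratic(m: int, c: float) -> float:
    """Positive root of (C-1) M (M-1)/2 p^2 + M p - 1 = 0, i.e. (B.2)
    with (B.3) and (B.4) substituted and the O((Mp)^3) terms dropped."""
    a = (c - 1.0) * m * (m - 1) / 2.0
    return (-m + math.sqrt(m * m + 4.0 * a)) / (2.0 * a)

def p_opt_b1(m: int, c: float) -> float:
    """Closed form (B.1)."""
    return (math.sqrt(1.0 + 2.0 * (c - 1.0) * (m - 1) / m) - 1.0) / ((m - 1) * (c - 1.0))

print(taylor_errors(10, 0.001))
print(p_from_quadratic(50, 8.0), p_opt_b1(50, 8.0))
```

The two roots agree up to floating-point rounding, because multiplying numerator and denominator of the quadratic formula by 1/M turns one expression into the other.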
Hewlett Packard Labs, 1501 Page Mill Rd., Palo Alto, CA, USA, 94304
{jerry rolia,rich friedrich,chandrakant patel}@hp.com
www.hpl.hp.com/research/internet
1 Introduction
In the not-too-distant future, billions of people, places and things could all be connected to each other and to useful services through the Internet. Re-use and scale motivate the need for service centric computing. With service centric computing, application services (for example, payroll or tax calculation) may be composed of other application services and may also rely on computing, networking, and storage resources as services. These services will be offered according to a utility paradigm. They will be provisioned, delivered, metered, managed, and purchased in a consistent manner when and where they are needed. This paper explains the components of service centric computing with examples of performance studies that pertain to resources offered as a service.
Figure 1 illustrates the components of service centric computing. Applications
may be composed of application and resource services via open middleware such
as Web services. Applications discover and acquire access to services via a grid
service architecture. Resource utilities offer computing, network, and storage
resources as services. They may also offer complex aggregates of these resources
with specific qualities of service. We refer to this as Infrastructure on Demand
(IOD).
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 463–479, 2002.
© Springer-Verlag Berlin Heidelberg 2002
464 J. Rolia, R. Friedrich, and C. Patel
[Figure 1: components of service centric computing, shown as layers: Applications; Open Middleware: Web services; Grid Service Architecture; Resource Utilities.]
2.1 Applications
We consider technical, commercial, and ubiquitous classes of applications and
their requirements on service centric computing.
Technical applications are typically processing, data, and/or communications intensive. Examples of such applications are found in life and material sciences, manufacturing CAE and CAD, national defense, high-end film and video, electronic design simulation, weather and climate modeling, geological sciences, and basic research. These applications typically present batch style jobs with a finite duration.
Commercial applications may be accessed via Intranet or Internet systems.
They often rely on multi-tier architectures that include firewall, load balancer
and server appliances and exploit concepts of horizontal and vertical scalability.
Service Centric Computing – Next Generation Internet Computing 465
3.1 Globus
Globus is a U.S. government sponsored organization formed to develop Grid solutions for high performance scientific computing applications. Its goals are essentially those listed for technical applications in Section 2.1. The initial Grid vision for the Globus Grid development was to enable computational grids:
[Figure: resource management architecture with components Application, Broker (RSL specialization), Info Service (queries/info), Co-allocator, three GRAM instances, and the resource schedulers LSF, Condor and Other; components exchange RSL requests.]
There are many examples of resource schedulers including LSF [12], Con-
dor [13] and Legion [14]. A review of scheduling in Grid environments is given
in reference [11].
4 Resource Utilities
In a world of service centric computing, applications may rely on resource utilities
for some or all of their resource needs. Today’s grid environments offer resources
in an infrastructure transparent manner. Yet some styles of applications place
explicit requirements on infrastructure topology and qualities of service. In this
section we describe programmable data centers as resource utilities that can
offer infrastructure on demand. Technologies that contribute to infrastructure
on demand are described along with several examples of research in this area.
[Figure: Infrastructure on Demand, with processing elements, storage elements, programmable network fabrics, an infrastructure specification, virtual wiring, VAE 1 to VAE n, and the Internet.]
Programmable Servers. Since the time of early IBM mainframes, server virtualization has been an important feature for resource management. With server
virtualization each job or application is isolated within its own logically inde-
pendent system partition. The fraction of system resources associated with each
partition can be dynamically altered, permitting the vertical scaling of resources
associated with an application. Server virtualization is a convenient mechanism
to achieve server consolidation. Examples of systems that support server virtu-
alization include:
Server virtualization offers: performance isolation for partitions – when the resource consumption of each partition can be bounded; capacity on demand – when
the fractions of resources associated with partitions can be changed dynamically;
security isolation – depending on the implementation; and can be used to sup-
port high availability solutions with redundant, but idle, application components
residing in partitions on alternative servers.
[Figure: fraction of CPUs versus number of CPUs per server (6, 8, 16, 32), for the series 50% fm Peak, 50% FM Peak and 50% FM Mean.]
boot time) and time needed to drain user sessions prior to removing a server. The
results show resource savings, with respect to a static allocation of resources, of
approximately 29%. In both cases the peak-to-mean ratio for resource demand was approximately 1.6. A transformed version of the E-commerce trace with a peak-to-mean ratio of 5 offered resource savings of 68% with respect to a static
allocation of resources. We note that the resource savings are expected to be
higher in practice since the static cases would have to be further over-provisioned.
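The peak-to-mean ratios quoted above bound what a demand-following allocation can save. If an allocation tracked demand exactly and had no overheads, its resource use relative to a static allocation sized for the peak would be mean/peak, so the saving would be 1 - 1/(peak-to-mean ratio): 37.5% for a ratio of 1.6 and 80% for a ratio of 5. The reported 29% and 68% lie below these ideal values, which is consistent with the overheads mentioned above, such as boot time and session draining. The minimal Python sketch below (hypothetical helper names, assuming demand is measured in fractional resource units per interval) computes this ideal saving.

```python
from typing import Sequence

def ideal_savings(demand: Sequence[float]) -> float:
    """Saving of an allocation that tracks demand exactly, relative to a
    static allocation sized for the peak: equals 1 - mean/peak."""
    static = max(demand) * len(demand)
    dynamic = sum(demand)
    return 1.0 - dynamic / static

def savings_bound(peak_to_mean: float) -> float:
    """Same quantity expressed through the peak-to-mean ratio."""
    return 1.0 - 1.0 / peak_to_mean

print(savings_bound(1.6))  # ideal saving for the traces with peak/mean of about 1.6
print(savings_bound(5.0))  # ideal saving for the transformed trace
print(ideal_savings([1, 1, 2, 4]))
```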
exploit online measurements to improve the design while the system operates.
Experiments showed performance results within 15% of that achieved by expert
administrators.
Wide area load balancing. A wide area load balancing system is described
in reference [30]. A goal of the work is to balance application demands over
servers within and across utilities using distributed, collaborative, self-control
mechanisms.
5 Research Challenges
This section offers several performance related research challenges for service
centric computing. They address issues of data center design and resource man-
agement, resource management for planetary scale computing (federations of
infrastructure providers), and general control and validation for these large scale
systems. We use the term resource broadly to include information technology
and energy.
What is the most efficient, economical data center design?
– What high density, low power, high performance computing architectures
most economically support infrastructure as a service?
– What are the simple building blocks of processing, communications and stor-
age that support dynamic allocation at a data center level of granularity?
– What are the implications of commodity components on multi-system de-
signs?
What are the most effective performance management techniques for utility com-
puting?
– What measurement techniques/metrics are appropriate for large scale dis-
tributed environments?
– What automated techniques are appropriate for creating models of applica-
tions and infrastructure?
– How are models validated for this very dynamic world?
What are the most efficient dynamic resource management techniques?
– What techniques are appropriate for ensuring qualities of service within lay-
ers of shared infrastructures?
– What to do about federations of providers (co-allocation)?
– What techniques are appropriate for making good use of resources?
What control system techniques can be applied effectively to this scale and
dynamism?
– What automated reasoning techniques can eliminate the complexity of con-
trolling large scale systems?
– What control theoretic techniques are applicable to reactive and predictive
events?
– What time scales are appropriate for control?
– How are control measures and decisions coordinated across federated sys-
tems?
What is the science of large scale computing that provides probabilistic assurance
of large scale behavior based on small scale experiments?
– What is the equivalent to the aeronautical engineer’s wind tunnel?
– What behaviors scale linearly?
7 Trademarks
Sun and Sun Fire are trademarks of Sun Microsystems Inc., IBM is a
trademark of International Business Machines Corporation, Intel is a trademark of
Intel Corporation, VMware is a trademark of VMware Inc., HP Utility Data
Center with Controller Software is a trademark of Hewlett Packard Company,
Terraspring is a trademark of Terraspring, and Think Dynamics is a trademark
of Think Dynamics.
References
1. www.webservices.org.
2. www.w3.org/XML.
3. www.w3.org/TR/SOAP.
4. www.w3.org/TR/wsdl.
5. www-106.ibm.com/developerworks/webservices/library/ws-wsilspec.html.
6. www.uddi.org.
7. www.globalgridforum.org.
8. Czajkowski K., Foster I., Karonis N., Kesselman C., Martin S., Smith W., and
Tuecke S.: A Resource Management Architecture for Metacomputing Systems.
JSSPP, 1998, 62-82.
9. Foster I., Kesselman C., Nick J., and Tuecke S.: The Physiology of the
Grid: An Open Grid Services Architecture for Distributed Systems Integration.
www.globus.org, January, 2002.
10. The Grid: Blueprint for a New Computing Infrastructure, Edited by Ian Foster
and Carl Kesselman, July 1998, ISBN 1-55860-475-8.
11. Krauter K., Buyya R., and Maheswaran M.: A taxonomy and survey of grid re-
source management systems for distributed computing. Software-Practice and Ex-
perience, vol. 32, no. 2, 2002, 135-164.
12. Zhou S.: LSF: Load sharing in large-scale heterogeneous distributed systems, Work-
shop on Cluster Computing, 1992.
13. Litzkow M., Livny M. and Mutka M.: Condor - A Hunter of Idle Workstations. Proceedings of the 8th International Conference on Distributed Computing Systems,
June, 1988, 104-111.
14. Natrajan A., Humphrey M., and Grimshaw A.: Grids: Harnessing Geographically-
Separated Resources in a Multi-Organisational Context. Proceedings of High Per-
formance Computing Systems, June, 2001.
15. Rolia J., Singhal S. and Friedrich R.: Adaptive Internet Data Centers. Proceedings
of the European Computer and eBusiness Conference (SSGRR), L'Aquila, Italy,
July 2000, http://www.ssgrr.it/en/ssgrr2000/papers/053.pdf.
16. www.hp.com.
17. www.sun.com.
18. www.ibm.com.
19. www.vmware.com.
20. www.intel.com.
21. HP Utility Data Center Architecture,
http://www.hp.com/solutions1/infrastructure/solutions/utilitydata/architecture/index.html.
22. www.terraspring.com.
23. www.thinkdynamics.com.
24. Appleby K., Fakhouri S., Fong L., Goldszmidt G. and Kalantar M.: Oceano –
SLA Based Management of a Computing Utility. Proceedings of the IFIP/IEEE
International Symposium on Integrated Network Management, May 2001.
25. Ranjan S., Rolia J., Zu H., and Knightly E.: QoS-Driven Server Migration for
Internet Data Centers. Proceedings of IWQoS 2002, May 2002, 3-12.
26. Rolia J., Zhu X., Arlitt M., and Andrzejak A.: Statistical Service Assurances for
Applications in Utility Grid Environments. HPL Technical Report, HPL-2002-155.
27. Anderson E., Hobbs M., Keeton K., Spence S., Uysal M., and Veitch A.: Hippodrome: running circles around storage administration. Conference on File and
Storage Technologies (FAST'02), 175-188, 28-30 January 2002, Monterey,
CA. (USENIX, Berkeley, CA.).
28. Borowsky E., Golding R., Jacobson P., Merchant A., Schreier L., Spasojevic M.,
and Wilkes J.: Capacity planning with phased workloads, WOSP, 1998, 199-207.
29. Foster I., Kesselman C., Lee C., Lindell R., Nahrstedt K., and Roy A.: A Dis-
tributed Resource Management Architecture that Supports Advance Reservations
and Co-Allocation. Proceedings of the International Workshop on Quality of Ser-
vice, 1999.
30. Andrzejak, A., Graupner, S., Kotov, V., and Trinks, H.: Self-Organizing Control
in Planetary-Scale Computing. IEEE International Symposium on Cluster Com-
puting and the Grid (CCGrid), 2nd Workshop on Agent-based Cluster and Grid
Computing (ACGC), May 21-24, 2002, Berlin.
31. Patel C., Bash C., Belady C., Stahl L., and Sullivan D.: Computational Fluid
Dynamics Modeling of High Compute Density Data Centers to Assure System Inlet
Air Specifications. Proceedings of IPACK’01 The Pacific Rim/ASME International
Electronic Packaging Technical Conference and Exhibition July 8-13, 2001, Kauai,
Hawaii, USA.
32. Patel, C.D., Sharma, R.K, Bash, C.E., Beitelmal, A: Thermal Considerations in
Cooling Large Scale High Compute Density Data Centers, ITherm 2002 - Eighth
Intersociety Conference on Thermal and Thermomechanical Phenomena in Elec-
tronic Systems. May 2002, San Diego, California.
33. Sharma, R.K, Bash. C.E., Patel, C.D.: Dimensionless Parameters for Evaluation of
Thermal Design and Performance of Large Scale Data Centers. Proceedings of the
8th ASME/AIAA Joint Thermophysics and Heat Transfer Conf., St. Louis, MO,
June 2002.
European DataGrid Project: Experiences of Deploying a
Large Scale Testbed for E-science Applications
Fabrizio Gagliardi¹, Bob Jones¹, Mario Reale², and Stephen Burke³
On behalf of the EU DataGrid Project
1 CERN, European Particle Physics Laboratory,
CH-1211 Geneve 23, Switzerland
{Fabrizio.Gagliardi,Bob.Jones}@cern.ch
http://www.cern.ch
2 INFN CNAF, Viale Berti-Pichat 6/2,
I-40127 Bologna, Italy
mario.reale@cnaf.infn.it
3 Rutherford Appleton Laboratory,
Chilton, Didcot, Oxon, UK
s.burke@rl.ac.uk
Abstract. The objective of the European DataGrid (EDG) project is to assist the next generation of scientific exploration, which requires intensive computation and analysis of shared large-scale datasets, from hundreds of terabytes to petabytes, across widely distributed scientific communities. We see these requirements emerging in many scientific disciplines, including physics, biology, and earth sciences. Such sharing is made complicated by the distributed nature of the resources to be used, the distributed nature of the research communities, the size of the datasets and the limited network bandwidth available. To address these problems we are building on emerging computational Grid technologies to establish a research network that is developing the technology components essential for the implementation of a world-wide data and computational Grid on a scale not previously attempted. An essential part of this project is the phased development and deployment of a large-scale Grid testbed.
M.C. Calzarossa and S. Tucci (Eds.): Performance 2002, LNCS 2459, pp. 480–499, 2002.
© Springer-Verlag Berlin Heidelberg 2002
European DataGrid Project 481
1 Introduction
The EU DataGrid (EDG) is a project funded by the European Union with 0 through the Framework V IST R&D programme (see www.eu-datagrid.org). There are 21 partner organisations from 15 EU countries, with a total participation of over 200 people, for a period of three years starting in January 2001. The objectives of the project are to support advanced scientific research within a Grid environment, offering capabilities for intensive computation and analysis of shared large-scale datasets, from hundreds of terabytes to petabytes, across widely distributed scientific communities. Such requirements are emerging in many scientific disciplines, including particle physics, biology, and earth sciences.
The EDG project has now reached its mid-point, since the project started on January 1st 2001 and the foreseen end of the project is on December 31st 2003. At this stage, very encouraging results have already been achieved in terms of the major goals of the project, which are the demonstration of the practical use of computational and data Grids for wide and extended use by the high energy physics, bio-informatics and earth observation communities.
A production quality testbed has been set up and implemented at a number of EDG sites, while a separate development testbed addresses the need for rapid testing and prototyping of the EDG middleware. The EDG production testbed currently consists of ten sites spread around Europe: at CERN (Geneva), INFN-CNAF (Bologna), CC-IN2P3 (Lyon), NIKHEF (Amsterdam), INFN-TO (Torino), INFN-CT (Catania), INFN-PD (Padova), ESA-ESRIN (Frascati), Imperial College (London), and RAL (Oxfordshire). The EDG development testbed currently consists of four sites: CERN, INFN-CNAF, NIKHEF, and RAL. The reference site for the EDG collaboration is at CERN, where, before any official version of the EDG middleware is released, the initial testing of the software is performed and the main functionalities are proven, before distribution to the other development testbed sites.
2 The European DataGrid Middleware Architecture
Sixteen services have been implemented by the middleware developers, based on original coding for some services and on the usage of the Globus 2 toolkit (see www.globus.org) for basic Grid infrastructure services: authentication (GSI), secure file transfer (GridFTP), information systems (MDS), job submission (GRAM) and the Globus Replica Catalogue. In addition, the job submission system uses software from the Condor-G project [8]. The middleware also relies on general open source software such as OpenLDAP.
The middleware development is divided into six functional areas: workload
management, data management, Grid Monitoring and Information Systems, fabric
management, mass data storage, and network monitoring. A sketch of the essential
EDG architecture is shown in Figure 1 [1], which depicts the relationship between the
Operating System, the Globus tools, the EDG middleware and the applications.
The EDG architecture is thus a multi-layered architecture. At the lowest level is
the operating system. Globus provides the basic services for secure and authenticated
use of both operating system and network connections, to transfer files and data
safely and to allow interoperation of distributed services. The EDG middleware uses
the Globus services and interfaces to the highest layer, the user applications running
on the Grid.
European DataGrid Project 483
[Figure: layered EDG architecture. Application layer: ALICE, ATLAS, CMS, LHCb, Other; VO common application layer: LHC, Other; GRID middleware: high level GRID middleware; GLOBUS 2.0: basic services; OS & Net services]
Fig. 1. The schematic layered EDG architecture: the Globus hourglass
2.1 Workload Management System (WMS)
The goal of the Workload Management System is to implement an architecture for
distributed scheduling and resource management in a Grid environment. It provides
Grid users with a set of tools to submit their jobs, have them executed on the
distributed Computing Elements (a Grid resource mapped to an underlying batch
system), get information about their status and retrieve their output, and it allows
them to access Grid resources in an optimal way (optimizing CPU usage, reducing file
transfer time and cost, and balancing access to resources between users). It deals with
the Job Manager of the Grid Application layer and the Grid Scheduler in the
Collective Services layer. A functional view of the whole WMS is represented
in Figure 4.
484 F. Gagliardi et al.
[Figure: layered Grid architecture. Local Application, Local Database; Grid Application Layer: Job Mgmt., Data Mgmt., Metadata Mgmt., Object to File Map.; Collective Services: Grid Scheduler, Replica Manager, Information Monitoring; Underlying Grid Services: SQL Database Server, Comp. Elem. Services, Storage Elem. Services, Replica Catalog, Author. Authen. and Acc., Service Index; Fabric services: Resource Mgmt., Config. Mgmt., Monitor. and Fault Tolerance, Node Installation & Mgmt., Fabric Storage Mgmt.]
[Figure: Application Areas: Physics Appl. (WP8), Earth Observation Appl. (WP9), Biology Appl. (WP10); Data Grid Services: Workload Management (WP1), Data Management (WP2), Monitoring Services (WP3); Core Middleware: Globus Middleware Services (Information, Security, ...); Physical Fabric: Fabric Management (WP4), Networking (WP7), Mass Storage Management (WP5)]
Fig. 3. The EDG service architecture
- Resource Broker (RB): This performs match-making between the requirements of
a job and the available resources, and attempts to schedule the jobs in an optimal
way, taking into account the data location and the requirements specified by the
user. The information about available resources is read dynamically from the
Information and Monitoring System. The scheduling and match-making algorithms
used by the RB are the key to making efficient use of Grid resources. In
performing the match-making, the RB queries the Replica Catalogue, a service
used to resolve logical file names (LFN, the generic name of a file) into
physical file names (PFN, which gives the physical location and name of a
particular file replica). The job can then be sent to the site that minimises the
cost of network bandwidth to access the files.
- Job Submission System (JSS): This is a wrapper for Condor-G [8], interfacing the
Grid to a Local Resource Management System (LRMS), usually a batch system
like PBS, LSF or BQS. Condor-G is a joint Condor-Globus project, which
combines the inter-domain resource management protocols of the Globus Toolkit
with the intra-domain resource and job management methods of Condor to allow
high throughput computing in multi-domain environments.
- Logging and Bookkeeping (LB): The Logging and Bookkeeping service stores a
variety of information about the status and history of submitted jobs using a
MySQL database.
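The match-making and data-aware ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Resource Broker's actual algorithm: the `ComputingElement` fields, the requirement check (at least one free CPU and a sufficient maximum CPU time) and the cost model (bytes of input data divided by the bandwidth from the Storage Element holding a replica to the candidate CE, and zero for the CE's close SE) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str           # hypothetical CE identifier
    free_cpus: int
    max_cpu_time: int   # minutes; illustrative resource attribute
    close_se: str       # Storage Element local to this CE


def match_and_rank(job, ces, replicas, bandwidth):
    """Return the names of CEs that satisfy the job, cheapest data access first.

    job:       {"cpu_time": minutes, "input_lfns": [LFN, ...]}
    replicas:  LFN -> [(se_name, size_in_bytes), ...], i.e. the PFNs
               resolved through a replica catalogue
    bandwidth: (se_name, ce_name) -> bytes/s; missing pairs are unreachable
    """
    ranked = []
    for ce in ces:
        # Match-making: drop CEs that cannot satisfy the job requirements.
        if ce.free_cpus < 1 or ce.max_cpu_time < job["cpu_time"]:
            continue
        cost = 0.0
        for lfn in job["input_lfns"]:
            # Cheapest reachable replica of this file, as seen from this CE.
            best = min(
                (0.0 if se == ce.close_se else size / bandwidth[(se, ce.name)]
                 for se, size in replicas.get(lfn, [])
                 if se == ce.close_se or (se, ce.name) in bandwidth),
                default=None,
            )
            if best is None:  # no reachable replica: this CE cannot run the job
                break
            cost += best
        else:
            ranked.append((cost, ce.name))
    return [name for _, name in sorted(ranked)]
```

For example, with one 8 GB input file replicated only at `se.ral` and a 10 MB/s link from `se.ral` to `ce.cern`, `ce.ral` ranks first (cost 0), `ce.cern` second (800 s), and any CE with no reachable replica or too little CPU time is excluded.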
Job Description Language (JDL)
The JDL allows the various components of the Grid Scheduler to communicate
requirements concerning the job execution. Examples of such requirements are:
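As an illustration of the format, EDG's JDL follows the Condor ClassAd syntax: a list of `Attribute = value;` assignments, where strings are quoted, lists are written in braces, and `Requirements` and `Rank` are expressions evaluated against the attributes of candidate resources through the `other.` prefix. The sketch below renders such a description from a Python dictionary. The helper names (`Expr`, `to_jdl`) are hypothetical, and the resource attributes used in the example (`other.MaxCPUTime`, `other.FreeCPUs`) are assumptions for illustration, not a reference to the EDG information schema.

```python
class Expr(str):
    """Marks a value as a ClassAd expression, emitted without quotes."""


def _render(value):
    if isinstance(value, Expr):  # checked first: Expr is also a str
        return str(value)
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(_render(v) for v in value) + "}"
    raise TypeError(f"unsupported JDL value: {value!r}")


def to_jdl(attributes):
    """Render a dict of JDL attributes, one `Name = value;` line each."""
    return "\n".join(f"{name} = {_render(value)};" for name, value in attributes.items())


if __name__ == "__main__":
    print(to_jdl({
        "Executable": "/bin/hostname",
        "StdOutput": "hostname.out",
        "StdError": "hostname.err",
        "OutputSandbox": ["hostname.out", "hostname.err"],
        "Requirements": Expr("other.MaxCPUTime > 60"),
        "Rank": Expr("other.FreeCPUs"),
    }))
```

Keeping expressions as a separate `Expr` type avoids guessing whether a string is a literal value or a condition on the resource.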
In order for a user to have a job correctly executed on a worker node of an
available Computing Element, the user's credentials have to be transmitted by
creating a proxy certificate. A user issues a grid-proxy-init command on a User
Interface machine to create an X.509 PKI proxy certificate using their locally stored
private key. An authentication request containing the proxy public and private keys
and the user's public key is sent to a server; the server receives the request and creates
a coded message by means of the user's public key, sending it back to the user process
on the User Interface machine. This message is decoded by means of the user's
private key and sent back again to the server (in this case normally the Resource
Broker). When the server receives the correctly decoded message it can be sure of the
user's identity, so that an authenticated channel can be established and the user
credentials can be delegated to the broker.
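The challenge-response step described above can be illustrated with textbook RSA. This is a toy sketch for exposition only: the key sizes are tiny and insecure, the function names are hypothetical, and real GSI authentication runs over SSL/TLS with X.509 certificates rather than hand-rolled arithmetic like this. `pow(e, -1, phi)` requires Python 3.8 or later.

```python
import secrets


def toy_keypair(p, q, e=17):
    """Textbook RSA key pair from two primes. Toy sizes: NOT secure."""
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)     # modular inverse of e (Python 3.8+)
    return (n, e), (n, d)   # (public key, private key)


def make_challenge(public_key, nonce=None):
    """Server side: encrypt a random nonce with the user's public key."""
    n, e = public_key
    if nonce is None:
        nonce = secrets.randbelow(n - 2) + 2  # random value in [2, n - 1]
    return nonce, pow(nonce, e, n)


def answer_challenge(private_key, ciphertext):
    """User side: only the holder of the private key can recover the nonce."""
    n, d = private_key
    return pow(ciphertext, d, n)


def verify(nonce, response):
    """Server side: a correct answer proves possession of the private key."""
    return response == nonce
```

With the classic textbook primes 61 and 53 (`e = 17`, hence `d = 2753`), the user's decrypted answer matches the nonce, while a party that merely echoes the ciphertext back fails verification.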
Users have at their disposal a set of commands to handle jobs by means of a
command line interface installed on a User Interface machine, on which they have a
normal login account and have installed their X.509 certificate. They can submit a job,
query its status, get logging information about the job history, cancel a job, be notified
via e-mail of the job's execution, and retrieve the job output. When a job is submitted
to the system, the user gets back a Grid-wide unique handle by means of which the job
can be identified in other commands.
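The job-handling commands described above can be modelled with a small in-memory service that hands out Grid-wide unique handles. Everything here is hypothetical: the class and method names, the simplified states (`SUBMITTED`, `DONE`, `CANCELLED`), the handle format (a Resource Broker URL plus a random UUID), and the `finish` method, which only simulates a Computing Element completing the job. E-mail notification is not modelled.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobRecord:
    jdl: str
    status: str = ""
    history: List[str] = field(default_factory=list)  # returned by logging_info()
    output: Optional[str] = None


class ToyJobService:
    def __init__(self, broker_url):
        self.broker_url = broker_url
        self._jobs = {}

    def _set_status(self, handle, status):
        record = self._jobs[handle]
        record.status = status
        record.history.append(status)

    def submit(self, jdl):
        """Return a handle that identifies the job in all later calls."""
        handle = f"{self.broker_url}/{uuid.uuid4().hex}"
        self._jobs[handle] = JobRecord(jdl)
        self._set_status(handle, "SUBMITTED")
        return handle

    def status(self, handle):
        return self._jobs[handle].status

    def logging_info(self, handle):
        return list(self._jobs[handle].history)

    def cancel(self, handle):
        if self.status(handle) in ("DONE", "CANCELLED"):
            raise ValueError(f"job {handle} has already finished")
        self._set_status(handle, "CANCELLED")

    def finish(self, handle, output):
        """Simulates the CE completing the job (not a user command)."""
        self._jobs[handle].output = output
        self._set_status(handle, "DONE")

    def get_output(self, handle):
        record = self._jobs[handle]
        if record.status != "DONE":
            raise RuntimeError(f"no output yet for {handle}")
        return record.output
```

Every call after `submit` takes only the handle, which is why the handle must be unique across the whole Grid rather than per User Interface machine.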
2.2 Data Management System (DMS)
The goal of the Data Management System is to specify, develop, integrate and test
tools and middleware to coherently manage and share petabyte-scale information
volumes in high-throughput production-quality grid environments. The emphasis is
on automation, ease of use, scalability, uniformity, transparency and heterogeneity.
The DMS will make it possible to securely access massive amounts of data in a
universal global namespace, to move and replicate data at high speed from one
geographical site to another, and to manage synchronisation of distributed replicas of
files or databases. Generic interfaces to heterogeneous mass storage management
systems will enable seamless and efficient integration of distributed resources. The
main components of the EDG Data Management System, currently provided or in
development, are as follows:
The Replica Manager
The EDG Replica Manager will allow users and running jobs to make copies of files
between different Storage Elements, simultaneously updating the Replica Catalogue,
and to optimise the creation of file replicas by using network performance
information and cost functions, according to the file location and size. It will be a
distributed system, i.e. different instances of the Replica Manager will be running on
different sites, and will be synchronised to local Replica Catalogues, which will be
The Replica Catalogue
The primary goal of the Replica Catalogue is the resolution of Logical File Names
into Physical File Names, so that the physical file(s) that can be accessed most
efficiently by a job can be located. It is currently implemented using Globus software
by means of a single LDAP server running on a dedicated machine. In future it will be
implemented as a distributed system, with a local catalogue on each Storage Element
and a system of Replica Location Indices to aggregate the information from many
sites. In order to achieve maximum flexibility, the transport protocol, query
mechanism and database back-end technology will be decoupled, allowing a Replica
Catalogue server to be implemented using multiple database technologies
(such as RDBMSs, LDAP-based databases, or flat files). APIs and protocols between
client and server are required, and will be provided in future releases of the EDG
middleware. The use of mechanisms specific to a particular database is excluded.
The query technology will also not be tied to a particular protocol, such as SQL or
LDAP. The use of GSI-enabled HTTPS for transport and XML for input/output data
representation is foreseen; HTTPS and XML are the most widely used industry
standards for this type of system.
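The decoupling of the database back end from the query interface, and the use of XML for results, can be sketched as follows. This is an illustration under assumptions, not the EDG Replica Catalogue API: the abstract `CatalogueBackend` interface, the two example back ends, the tab-separated flat-file layout and the `<replicas>`/`<pfn>` XML element names are all hypothetical, and the GSI-enabled HTTPS transport is not shown.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class CatalogueBackend(ABC):
    """Storage-neutral interface: callers never see backend specifics."""

    @abstractmethod
    def add(self, lfn, pfn): ...

    @abstractmethod
    def lookup(self, lfn): ...


class InMemoryBackend(CatalogueBackend):
    def __init__(self):
        self._replicas = {}

    def add(self, lfn, pfn):
        self._replicas.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        return list(self._replicas.get(lfn, []))


class FlatFileBackend(CatalogueBackend):
    """One tab-separated 'LFN<TAB>PFN' line per replica in a plain text file."""

    def __init__(self, path):
        self.path = path

    def add(self, lfn, pfn):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{lfn}\t{pfn}\n")

    def lookup(self, lfn):
        try:
            with open(self.path, encoding="utf-8") as f:
                rows = [line.rstrip("\n").split("\t", 1) for line in f if "\t" in line]
        except FileNotFoundError:
            return []
        return [pfn for key, pfn in rows if key == lfn]


def lookup_as_xml(backend, lfn):
    """Build the XML body of a lookup reply; transport (e.g. HTTPS) is not shown."""
    root = ET.Element("replicas", lfn=lfn)
    for pfn in backend.lookup(lfn):
        ET.SubElement(root, "pfn").text = pfn
    return ET.tostring(root, encoding="unicode")
```

Swapping `InMemoryBackend` for `FlatFileBackend` (or for a class backed by an RDBMS or an LDAP server) leaves `lookup_as_xml` and its clients unchanged, which is the point of the decoupling.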
The Replica Manager, Grid users and Grid services like the scheduler (WMS) can
access the Replica Catalogue information via APIs. The WMS queries the
RC in the first part of the matchmaking process, in which a target Computing Element
for the execution of a job is chosen according to the accessibility of a Storage Element
containing the required input files. To do so, the WMS has to convert logical file
names into physical file names. Both logical and physical files can carry additional
metadata in the form of "attributes". Logical file attributes may include items such as
file size, CRC checksum, file type and file creation timestamps.
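The first matchmaking step described above, resolving an LFN to PFNs and keeping only the Computing Elements whose Storage Element holds a replica, can be sketched with a toy catalogue. The `ReplicaCatalogue` class, the convention that the host part of a PFN URL names the Storage Element, and the CE-to-SE mapping are assumptions for illustration; logical-file attributes such as size or CRC checksum are stored as free-form keyword arguments.

```python
from urllib.parse import urlparse


class ReplicaCatalogue:
    """Toy LFN -> PFN catalogue with logical-file attributes (hypothetical API)."""

    def __init__(self):
        self._replicas = {}    # LFN -> list of PFNs
        self._attributes = {}  # LFN -> dict of metadata

    def register(self, lfn, pfns, **attributes):
        self._replicas.setdefault(lfn, []).extend(pfns)
        self._attributes.setdefault(lfn, {}).update(attributes)

    def resolve(self, lfn):
        return list(self._replicas.get(lfn, []))

    def attributes(self, lfn):
        return dict(self._attributes.get(lfn, {}))


def storage_element_of(pfn):
    # Assumption: the host part of a PFN URL names the Storage Element.
    return urlparse(pfn).hostname


def candidate_ces(catalogue, lfn, close_se):
    """First matchmaking step: CEs whose close SE holds a replica of `lfn`.

    close_se: CE name -> name of the Storage Element accessible from it.
    """
    holding = {storage_element_of(pfn) for pfn in catalogue.resolve(lfn)}
    return sorted(ce for ce, se in close_se.items() if se in holding)
```

`urlparse(...).hostname` drops any port number, so `gsiftp://se.ral.example:2811/...` and `gsiftp://se.ral.example/...` map to the same Storage Element.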
GDMP
The GDMP client-server software system is a generic file replication tool that
replicates files securely and efficiently from one site to another in a Data Grid
environment using several Globus Grid tools. In addition, it manages replica
catalogue entries for file replicas, and thus maintains a consistent view of names and
locations of replicated files. Any file format can be supported for file transfer using
Spitfire
Spitfire is a secure, Grid-enabled interface to a relational database. Spitfire provides
secure query access to remote databases through the Grid using Globus GSI
authentication.
2.3 Grid Monitoring and Information Systems
2.4 EDG Fabric Installation and Job Management Tools
The EDG collaboration has developed a complete set of tools for the management of
PC farms (fabrics), in order to make the installation and configuration of the various
nodes automatic and easy for the site managers managing a testbed site, and for the
control of jobs on the Worker Nodes in the fabric. The main tasks are:
User Job Control and Management (Grid and local jobs) on fabric batch and/or
interactive CPU services. There are two branches:
- The Resource Management subsystem is a layer on top of the batch and
interactive services (LRMS). While the Grid Resource Broker manages workload
distribution between fabrics, the Resource Management subsystem manages the
workload distribution and resource sharing of all batch and interactive services
inside a fabric, according to defined policies and user quota allocations.
- Configuration Management provides the components to centrally manage and store
all fabric configuration information. This includes the configuration of all
EDG subsystems as well as information about the fabric hardware, systems and
services.
- Fabric Monitoring and Fault Tolerance provides the necessary components for
gathering, storing and retrieving performance, functional, setup and environmental
data for all fabric elements. It also provides the means to correlate that data and
execute corrective actions if problems are identified.
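As a sketch of quota-driven sharing inside a fabric, the function below orders queued jobs so that the user furthest below their share runs next. This is a simple fair-share heuristic chosen for illustration, not the EDG Resource Management subsystem's policy engine; the data layout, the tie-break by user name and the rule of charging one unit per dispatched job are assumptions.

```python
from collections import deque


def fair_share_order(queue, quota, used):
    """Return job ids in dispatch order under a simple fair-share rule.

    queue: [(user, job_id), ...] in arrival order
    quota: user -> share of the fabric (positive weights)
    used:  user -> resources already consumed, in arbitrary units
    At each step the waiting job whose owner has the lowest used/quota
    ratio is dispatched (ties broken by user name), and that owner is
    charged one unit.
    """
    pending = {}
    for user, job_id in queue:
        pending.setdefault(user, deque()).append(job_id)
    usage = dict(used)  # do not mutate the caller's accounting
    order = []
    while pending:
        user = min(pending, key=lambda u: (usage.get(u, 0) / quota[u], u))
        order.append(pending[user].popleft())
        usage[user] = usage.get(user, 0) + 1
        if not pending[user]:
            del pending[user]
    return order
```

With equal shares and no prior usage, one user's burst of three jobs is interleaved with another user's single job (`a1, b1, a2, a3`) instead of running strictly first come, first served.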
[Figure: LCFG server (mkxprof, one XML profile per client, Web server) delivering profiles over http to the LCFG client (rdxprof, ldxprof, DBM file, generic component, LCFG components)]
Fig. 5. LCFG internal operation
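Reading the labels of Fig. 5, LCFG works as a pipeline: on the server, `mkxprof` compiles one XML profile per client and publishes it through a Web server; on the client, `rdxprof` fetches the profile over HTTP and `ldxprof` loads it into a DBM file from which the components read their configuration. The sketch below mimics the server and client steps with Python's standard library. The XML layout, the `component.resource` key naming and the function bodies are hypothetical and do not reproduce LCFG's real profile format; the HTTP fetch is left out, so the profile string is passed directly.

```python
import dbm.dumb
import xml.etree.ElementTree as ET


def mkxprof(node_resources):
    """Server side (hypothetical): compile one node's resources into an XML profile."""
    root = ET.Element("profile")
    for component, resources in sorted(node_resources.items()):
        comp = ET.SubElement(root, "component", name=component)
        for key, value in sorted(resources.items()):
            ET.SubElement(comp, "resource", name=key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def ldxprof(xml_profile, dbm_path):
    """Client side (hypothetical): load a fetched profile into a DBM file."""
    db = dbm.dumb.open(dbm_path, "c")
    try:
        for comp in ET.fromstring(xml_profile).iter("component"):
            for res in comp.iter("resource"):
                db[f"{comp.get('name')}.{res.get('name')}"] = res.text or ""
    finally:
        db.close()


def read_resource(dbm_path, component, key):
    """What a component would do: read its own setting from the DBM file."""
    db = dbm.dumb.open(dbm_path, "r")
    try:
        return db[f"{component}.{key}"].decode("utf-8")  # dbm values are bytes
    finally:
        db.close()
```

Keeping the compiled profile in a local DBM file means the components on a node can read their configuration without contacting the LCFG server each time.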
2.5 The Storage Element
The Storage Element has an important role in the storage of data and the management
of files in the Grid domain, and EDG is working on its definition, design, software
development, setup and testing.
A Storage Element is a complete Grid-enabled interface to a Mass Storage
Management System, tape or disk based, so that mass storage of files can be almost
completely transparent to Grid users. A user should not need to know anything about
the particular storage system available locally to a given Grid resource, and should
only be required to request that files be read or written using a common
interface. All existing mass storage systems used at testbed sites will be interfaced to
the Grid, so that their use will be completely transparent and the authorisation of users
to use the system will be in terms of general quantities like space used or storage
duration.
The procedures for accessing files are still in the development phase. The main
achievements to date have been the definition of the architecture and design for the
Storage Element, collaboration with Globus on GridFTP/RFIO access, collaboration
with PPDG on a control API, staging from and to the CASTOR tape system at CERN,
and an interface to GDMP. Initially the supported storage interfaces will be UNIX
disk systems, HPSS (High Performance Storage System), CASTOR (through RFIO),
and remote access via the Globus GridFTP protocol. Local file access within a site
will also be available using Unix file access, e.g. with NFS or AFS. EDG is also
developing a grid-aware Unix filing system with ownership and access control based
on Grid certificates rather than local Unix accounts.
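To make the transparency and space-based authorisation described above concrete, the sketch below puts one front end in front of several storage back ends, chosen by URL scheme, and checks each write against a per-user space quota. It is an illustrative assumption, not the EDG Storage Element interface: `MemoryBackend` merely stands in for disk, CASTOR (via RFIO) or HPSS, the class and method names are hypothetical, only space used (not storage duration) is enforced, and overwriting a file is charged again for simplicity.

```python
from urllib.parse import urlparse


class MemoryBackend:
    """Stand-in for one mass storage system (disk, tape, ...); in memory only."""

    def __init__(self):
        self._blobs = {}

    def put(self, path, data):
        self._blobs[path] = data

    def get(self, path):
        return self._blobs[path]


class StorageElement:
    def __init__(self, quotas):
        self._quotas = dict(quotas)  # user -> bytes the user may store
        self._used = {}              # user -> bytes stored so far
        self._backends = {}          # URL scheme -> backend

    def attach(self, scheme, backend):
        self._backends[scheme] = backend

    def _route(self, url):
        parsed = urlparse(url)
        try:
            backend = self._backends[parsed.scheme]
        except KeyError:
            raise ValueError(f"no backend for scheme {parsed.scheme!r}") from None
        return backend, parsed.netloc + parsed.path

    def write(self, user, url, data):
        # Authorisation in terms of space used, independent of the backend.
        used = self._used.get(user, 0)
        if used + len(data) > self._quotas.get(user, 0):
            raise PermissionError(f"{user} would exceed the space quota")
        backend, key = self._route(url)
        backend.put(key, data)
        self._used[user] = used + len(data)

    def read(self, url):
        backend, key = self._route(url)
        return backend.get(key)
```

The user only ever calls `write` and `read`; which storage system sits behind a given URL scheme is a site configuration choice made through `attach`.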
3 The EDG Testbed
EDG has deployed the middleware on a distributed testbed, which also provides some
shared services. A central software repository provides defined bundles of RPMs
according to machine type, together with LCFG scripts to install and configure the
software.
The testbed sites each implement a User Interface machine, a Gatekeeper and a set of
Worker Nodes (i.e. a Grid Computing Element), managed by means of a Local
Resource Management System, and a Storage Element (disk only at most sites, but
with tape storage at CERN, Lyon and RAL). Some sites have also set up a local
Resource Broker. As a reference, Fig. 7 shows a typical site setup in terms of machine
composition, for both development and production testbeds, namely the current
CERN testbed, with production, development and service machines (network time
server, NFS server, LCFG server, monitoring servers).
Fig. 6. The operation of the makegridmap daemon
Fig. 7. The CERN testbed cluster composition
4 Future Developments
5 Conclusions
All work packages have defined an intense schedule of new research and
development, which will be supported by the progressive introduction of high-speed
scientific networks such as those deployed by RN GEANT. This will increase the
range of possibilities available to the EDG developers. As an example, EDG has
proposed the introduction of a Network Optimiser server, to establish in which cases
it is preferable to access a file from a remote location or to trigger local copying,
according to network conditions in the end-to-end link between the relevant sites. The
development of Differentiated Services and Packet Forwarding policies is strongly
encouraged, in order to help Grid applications cope better with dynamic network
performance and to create different classes of service for different classes
of applications, according to their requirements in terms of bandwidth, throughput,
delay, jitter etc.
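A toy version of the Network Optimiser decision described above can be written as a cost comparison. The model is an assumption for illustration: the job reads `bytes_read_per_pass` bytes of the file `passes` times; remote access pays WAN latency and bandwidth on every pass, while local copying transfers the whole file once and then reads it from local storage. A real optimiser would feed in measured end-to-end network metrics; the function and parameter names are hypothetical.

```python
def prefer_local_copy(file_size, bytes_read_per_pass, passes,
                      wan_bandwidth, local_bandwidth, wan_latency=0.0):
    """Return True if copying the file first is estimated to be faster.

    Sizes in bytes, bandwidths in bytes/s, latency in seconds per WAN operation.
    """
    # Remote access: every pass goes over the wide-area link.
    remote = passes * (wan_latency + bytes_read_per_pass / wan_bandwidth)
    # Local copy: one full WAN transfer, then all passes hit local storage.
    copy = (wan_latency + file_size / wan_bandwidth
            + passes * bytes_read_per_pass / local_bandwidth)
    return copy < remote
```

For a 1 GB file read in full three times over a 10 MB/s WAN link with 100 MB/s local storage, copying wins (130 s against 300 s); if the job reads only 10 MB once, remote access wins.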
The impact of the new Globus features foreseen with the introduction of the OGSA
paradigm proposed by the US Globus developers, where the main emphasis is on a Web
Services oriented architecture, is being evaluated by EDG, and an evolution of the
current architecture in that direction could be envisaged. This is proposed for future
releases of the EDG middleware and may be continued with initiatives in the new
EU FP6 framework. An important collaboration has already been established via the
GRIDSTART initiative (www.gridstart.org) with the other ten existing EU-funded
Grid projects, in particular the EU CrossGrid project (www.crossgrid.org), which will
exploit DataGrid technologies to support a variety of applications, all demanding
guaranteed quality of service (e.g. real-time environment simulation, video streaming
and other applications requiring high network bandwidth).
Reference Documents
Note: all official EDG documents are available on the web at the URL:
http://eu-datagrid.web.cern.ch/eu-datagrid/Deliverables/default.htm
[1] DataGrid D12.4: "DataGrid Architecture"
[2] DataGrid D8.1a: "DataGrid User Requirements and Specifications for the DataGrid
Project"
[3] DataGrid D9.1: "Requirements Specification: EO Application Requirements for Grid"
[4] DataGrid D10.1: "WP10 Requirements Document"
[5] DataGrid D8.2: "Testbed1 Assessment by HEP Applications"
[6] "The Anatomy of the Grid", I. Foster, C. Kesselman, et al. Technical Report, Global
Grid Forum, 2001, http://www.globus.org/research/papers/anatomy.pdf
[7] DataGrid D6.1: "Testbed Software Integration Process"
[8] Condor Project (http://www.cs.wisc.edu/condor/). Jim Basney and Miron Livny,
"Deploying a High Throughput Computing Cluster", High Performance Cluster
Computing, Rajkumar Buyya, Editor, Vol. 1, Chapter 5, Prentice Hall PTR, May 1999.
Nicholas Coleman, "An Implementation of Matchmaking Analysis in Condor",
Masters' Project report, University of Wisconsin, Madison, May 2001.
[10] DataGrid Architecture Version 2, G. Cancio, S. Fisher, T. Folkes, F. Giacomini,
W. Hoschek, D. Kelsey, B. Tierney,
http://grid-atf.web.cern.ch/grid-atf/documents.html
[11] EDG Usage Guidelines (http://marianne.in2p3.fr/datagrid/documentation/EDG-Usage-Guidelines.html)
[12] Software Release Plan DataGrid-12-PLN-333297;
http://edms.cern.ch/document/333297
[13] Project technical annex.
[14] DataGrid D12.3: "Software Release Policy"
DataGrid Publications
Gagliardi, F., Baxevanidis, K., Foster, I., and Davies, H. Grids and Research Networks as
Drivers and Enablers of Future Internet Architectures. The New Internet Architecture (to be
published).
Buyya, R., Stockinger, H. Economic Models for resource management and scheduling in
Grid computing. The Journal of Concurrency and Computation: Practice and Experience
(CCPE), Special issue on Grid computing environments, 2002.
Stockinger, H. Database Replication in World-Wide Distributed Data Grids. PhD thesis,
2002.
Primet, P. High Performance Grid Networking in the DataGrid Project. Terena 2002.
Stockinger, H., Samar, A., Allcock, B., Foster, I., Holtman, K., and Tierney, B. File and
Object Replication in Data Grids. 10th IEEE Symposium on High Performance Distributed
Computing (HPDC 2001). San Francisco, California, August 7-9, 2001.
Hoschek, W., Jaen-Martinez, J., Samar, A., Stockinger, H., and Stockinger, K. Data
Management in an International Data Grid Project. IEEE/ACM International Workshop on Grid
Computing Grid'2000, 17-20 December 2000, Bangalore, India. "Distinguished Paper"
Award.
Other Grid Publications
Foster, I., Kesselman, C., Nick, J. M., and Tuecke, S. The Physiology of the Grid: An Open
Grid Services Architecture for Distributed Systems Integration.
Foster, I. The Grid: A new infrastructure for 21st Century Science. Physics Today, 54(2),
2002.
Foster, I. and Kesselman, C. Globus: A Toolkit-Based Grid Architecture. In Foster, I. and
Kesselman, C. (eds.), The Grid: Blueprint for a New Computing Infrastructure, Morgan
Kaufmann, 1999, 259-278.
Foster, I. and Kesselman, C. (eds.). The Grid: Blueprint for a New Computing
Infrastructure, Morgan Kaufmann, 1999.
Glossary
AFS Andrew File System
BQS Batch Queue Service
CE Computing Element
CVS Concurrent Versions System
EDG European DataGrid
EIP Experiment Independent Person
Ftree LDAP-based dynamic directory service
GDMP Grid Data Mirroring Package
II Information Index
ITeam Integration Team
JDL Job Description Language
JSS Job Submission Service
LB Logging and Bookkeeping
LCFG Automated software installation system
LDAP Lightweight Directory Access Protocol
LFN Logical File Name
LSF Load Sharing Facility
MDS Globus Metacomputing Directory Service
MS Mass Storage
NFS Network File System
PBS Portable Batch System
RB Resource Broker
RC Replica Catalogue
RFIO Remote File I/O software package
RPM Red Hat Package Manager
SE Storage Element
TB1 Testbed 1 (project month 9 release of DataGrid)
UI User Interface
VO Virtual Organisation
WN Worker Node
WP Workpackage