DISCOUNT BAYESIAN MODELS AND FORECASTING

JAMAL R. M. AMEEN

PH.D.

DEPARTMENT OF STATISTICS
TABLE OF CONTENTS

1- CHAPTER ONE : INTRODUCTION
   1.1 Status
   1.2 Outline of the Thesis

2- CHAPTER TWO : DISCOUNT WEIGHTED ESTIMATION
   2.1 Introduction
   2.2 Exponentially weighted regression
   2.2.1 The model
   2.2.2 EWR and time series
   2.2.3 Some comments on EWR
   2.3 The simultaneous adaptive forecasting
   2.4 Discount weighted estimation
   2.4.1 The model
   2.4.2 DWE for time series
   2.5 Applications
   2.5.1 A simple linear growth model
   2.5.2 A practical example: The U.S. Air Passenger data set
   2.6 Summary

3- CHAPTER THREE : DYNAMIC LINEAR MODELS
   3.1 Introduction
   3.2 The DLM's
   3.3 Relation between DLM's and DWE's
   3.4 Some limitations and drawbacks
   3.5 Summary

4- CHAPTER FOUR : NORMAL DISCOUNT BAYESIAN MODELS
   4.1 Introduction
   4.2 Normal Weighted Bayesian Models
   4.3 Normal Discount Bayesian Models

   7.4 Normal weighted Bayesian multiprocess models
   7.5 Multiprocess models with CUSUM's
   7.6 Summary

   8.5 Summary
ACKNOWLEDGMENTS

My thanks go to the Department of Statistics of the University. Finally, I thank the Ministry of High Education and Scientific Research-Iraq for financial support.

To those I love so much, I owe so much.
SUMMARY

This thesis is concerned with Bayesian forecasting models and the concept of parsimony. Multiple discounting is introduced in order to achieve parsimony and invariance to the scale of the independent variables. The class of Normal Discount Bayesian Models (NDBM's) is introduced; this overcomes many of the drawbacks of the Normal Dynamic Linear Model, which uses a system variance matrix. Facilities for parameter learning and multiprocess modelling are provided, and many limiting results are easily obtained for NDBM's. A general class, which includes the class of NWBM's and which operates by Management by Exception, is introduced. A number of illustrative applications is given.
CHAPTER ONE

INTRODUCTION
1.1. STATUS :

The study of processes that are subject to sequential developments has long occupied scientists. Indeed, it is one of the most active topics in statistics. As information arrives sequentially in time, its plausible pattern is assessed according to visible and hidden characteristics, in order to obtain more reliable predictions and control of the future.

Bayesian and non-Bayesian procedures have been used to analyse time series. The passive procedure seems to be through model construction. Models can broadly be classified into two different categories. Social and political organisations provide structures of members of the first class. The other class are called Scientific Models; these aim to build structures which fit specific environmental characteristics as closely as possible. An important subclass, which is the concern of this thesis, contains models that include elements of chance: the aim is to build models and measure environments in order to obtain a deeper understanding of the causal mechanism governing the environment. This subclass of Scientific Models is called Statistical Models, with mathematics and statistics as its principal tools. Throughout the thesis, models are of this kind.
Observed series typically contain a random component, often caused by measurement errors. Before the appearance of computers, the so-called Moving Average procedures, through least squares, were among the common criteria for constructing short term forecasting functions; these are reviewed in Kendall, among other references. With the development of computers, widely used models in forecasting included Brown's Exponentially Weighted Moving Averages (EWMA), the ICI forecasting package MULDO, and the DOUBTS method; see Brown (1963). These are reviewed in Chapter 2, since they underlie much of the research described in this thesis. Another well known and widely used class of models is the Autoregressive Integrated Moving Average (ARIMA) models of Box and Jenkins (1970).
Given a series of observations {y_t} and uncorrelated random residuals {a_t}, having a fixed distribution, usually assumed Normal with zero mean and a constant variance, an ARIMA(p, d, q) model may be written as

    phi(B) (1 - B)^d y_t = theta(B) a_t ,

where B is the backward shift operator and phi(B) and theta(B) are polynomials of degrees p and q respectively; p, q and d are constants whose values belong to a known domain, and are to be estimated from the available data (parameters in a non-Bayesian sense).
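As a concrete illustration of the model class just described (not taken from the thesis), the following sketch simulates an ARIMA(1,1,1) series under the sign convention theta(B) = 1 - theta B; the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_arima_111(n, phi, theta, sigma=1.0, seed=0):
    """Simulate an ARIMA(1,1,1) series: the first difference w_t of y_t
    follows w_t = phi*w_{t-1} + a_t - theta*a_{t-1}, a_t iid N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sigma, n + 1)
    w = np.zeros(n)
    for t in range(n):
        prev_w = w[t - 1] if t > 0 else 0.0
        w[t] = phi * prev_w + a[t + 1] - theta * a[t]
    return np.cumsum(w)          # integrate once, since d = 1

y = simulate_arima_111(200, phi=0.6, theta=0.3)
```

The values p = q = d = 1 are chosen only to keep the sketch short; in practice these orders are themselves estimated from the data.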
Despite the existence of a vast amount of literature, these models are not recommended here. They depend on a large number of unknown constants that are often difficult to interpret, since they do not have natural descriptive meanings. Further, they require a considerable amount of data for estimation. Moreover, intervention, for example in the face of discontinuities, is not easily accommodated.
State Space representations and the works of Kalman (1963) have gained considerable ground regarding fast computation and computer storage problems. However, a natural recipe would be a representation through which uncertainty about the future is expressed in terms of probability. Bayesian statistics provides such a foundation.
Initially, some terminology. A time series process {Y_t | theta_t} is defined to be a parameterised joint probability distribution for all t, possessing a complete prior information structure that is incorporated as it becomes available. The notation is adopted from Chapter 3 onwards: vectors are written in lower case bold letters, while capital bold face letters are used for matrices, except for the random vector Y_t.

1.2. OUTLINE OF THE THESIS :

In Chapter 2, the Exponentially Weighted Regression (EWR) method of Brown (1963) and the Simultaneous Adaptive Forecasting method of Harrison (1967) are reviewed with some critical comments. The EWR method is then exploited, using the discount principle, to introduce the general Discount Weighted Estimation (DWE) technique. DWE allows different discount factors to be assigned to different model components and provides a preparatory ground for introducing the discount principle into Bayesian modelling. The method is then applied to the U.S. Air Passenger data set and the results are compared with those of DOUBTS and Box-Jenkins. The Dynamic Linear Models (DLM's) of Harrison and Stevens (1976) are reviewed in Chapter 3, together with their prior assumptions and forecast function. Some limitations and drawbacks of the DLM's are also pointed out.
The discount principle is introduced into Bayesian modelling in Chapter 4, through the Normal Weighted Bayesian Models (NWBM's). This class includes the DLM's as a special case. Other important and parsimonious subclasses, like the Normal Discount Bayesian Models (NDBM's), Modified NDBM's and Extended NDBM's, are also introduced.

Practical procedures are introduced in Chapter 5 for observation variances that have some known pattern or move slowly with time.

Chapter 6 is devoted to reparameterisations of any NDBM, transformations for calculating canonical forms, and limiting results. Given the eigenstructure of the transition matrix, limiting results similar to the limiting adaptive matrices are obtained; useful limiting state variance and precision matrices can be calculated independently of each other. Limiting NWBM predictors are compared with those of some canonical formulations of ARIMA models.

In Chapter 7, the principle of multiprocess modelling is sketched. In Bayesian forecasting, the backward Cumulative Sum (CUSUM) and departures are reintroduced, and multiprocess models based on the discount characteristics, called Multiprocess NDBM's, are introduced. A number of efficient applications having different characteristics is given in Chapter 8.

Finally, a general discussion is given in Chapter 9. Attention is directed to work in progress and to possible directions for future research.
CHAPTER TWO

DISCOUNT WEIGHTED ESTIMATION

2.1. INTRODUCTION:
Operational simplicity and parsimony are among the desirable properties in model construction. The term 'parsimony' is used here in the sense of Roberts and Harrison: the order of parsimony is the number of unknown constants involved in the model. Brown (1963) developed the Exponentially Weighted Regression (EWR) method, which minimises the 'discounted' sum of squares of the one step ahead forecasting errors. Since it uses a single discount factor, it has parsimony of order 1. It is typical of methods that assume that information about the future state of the process decays with age, using discount factors. The discount concept is a key issue of the thesis and will be exploited in this and the later chapters.
In this chapter, Exponentially Weighted Regression is reviewed in Section 2.2, with the emphasis being on time series construction. The DOUBTS method is reviewed in 2.3, and in 2.4 the Discount Weighted Estimation method of Ameen and Harrison (1983a) is introduced. This generalisation of EWR uses discount matrices and provides simple updating formulas. In Section 2.5, a simple linear growth model and a practical seasonal DWE application are given using the U.S. Air Passenger data series, with comparisons against DOUBTS and Box-Jenkins.
2.2. EXPONENTIALLY WEIGHTED REGRESSION

2.2.1. The Model
One general locally linear representation of a time series process at any time t, with future outcomes Y_{t+k}, is

    Y_{t+k} = f_{t+k} theta_{t,k} + e_{t+k} ,   e_{t+k} ~ [0, V]     (2.1)

where f_{t+k} = (f_{t+k}(1), f_{t+k}(2), ..., f_{t+k}(n)) are known functions of time, theta_{t,k} is a vector of unknowns, with the subscripts t, k indicating that estimates are based on the data available up to and including time t, and e_{t+k} is a random error term with variance V. Usually theta and V are called the parameters of the model and, in a Bayesian sense, they have associated prior distributions. However, in EWR models these are assumed constant and, given the past data D_t = {(y_t, f_t), ..., (y_1, f_1)}, theta_{t,0} is estimated by minimising the discounted sum of squares

    S_t = SUM_{j>=0} beta^j (y_{t-j} - f_{t-j} theta)^2     (2.2)

for a single discount factor beta. Differentiating (2.2) with respect to theta and equating to zero gives

    SUM_{j>=0} beta^j f'_{t-j} (y_{t-j} - f_{t-j} theta) = 0.     (2.3)

Now, define

    Q_t = SUM_{j>=0} beta^j f'_{t-j} f_{t-j}     (2.4)

    h_t = SUM_{j>=0} beta^j f'_{t-j} y_{t-j}     (2.5)

Assuming that Q_t^{-1} is the generalised inverse of Q_t, it can be seen from (2.3) that

    m_t = Q_t^{-1} h_t     (2.6)

is the estimate of theta_{t,0}. This relationship can be put in recursive form: with e_t = y_t - f_t m_{t-1}, the k steps ahead point forecast is given by f_{t+k} m_t.

2.2.2. EWR and Time Series

For the polynomial models constructed by Brown (1963), define f_{t+k} = f G^k, with G a fixed transition matrix of dimension n. Using this notation, the recursive updating equations become

    m_t = G m_{t-1} + a_t e_t     (2.7)

where

    Q_t = beta G'^{-1} Q_{t-1} G^{-1} + f' f

and a_t = Q_t^{-1} f'.
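The EWR recursion can be sketched in code. The following is a minimal illustration, not taken from the thesis: it implements the static-regressor case (G = I), in which the information recursion reduces to Q_t = beta Q_{t-1} + f'_t f_t, and checks the recursive estimate against the direct minimiser of the discounted sum of squares (2.2). The function name and the weak initial setting Q_0 = 10^{-6} I are illustrative assumptions.

```python
import numpy as np

def ewr_filter(y, F, beta, Q0=None, m0=None):
    """Recursive Exponentially Weighted Regression with a single discount
    factor beta, static-regressor case (G = I):
        Q_t = beta*Q_{t-1} + f_t' f_t,   m_t = m_{t-1} + Q_t^{-1} f_t' e_t."""
    T, n = F.shape
    Q = np.eye(n) * 1e-6 if Q0 is None else Q0.copy()   # weak initial information
    m = np.zeros(n) if m0 is None else m0.copy()
    for t in range(T):
        f = F[t]
        Q = beta * Q + np.outer(f, f)        # discounted information matrix (2.4)
        e = y[t] - f @ m                     # one step ahead forecast error
        m = m + np.linalg.solve(Q, f) * e    # adaptive update
    return m, Q

# check the recursion against the direct minimiser of (2.2)
rng = np.random.default_rng(1)
T, beta = 60, 0.9
F = rng.normal(size=(T, 2))
y = F @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=T)
m, _ = ewr_filter(y, F, beta)
w = beta ** np.arange(T - 1, -1, -1)                    # weights beta^j
Q_dir = (F * w[:, None]).T @ F + beta**T * 1e-6 * np.eye(2)
m_dir = np.linalg.solve(Q_dir, (F * w[:, None]).T @ y)
```

The agreement of `m` and `m_dir` is exact apart from rounding, since the recursion is an algebraic rearrangement of the direct solution.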
2.2.3. Some Comments on EWR
In order to get some insight into the terms and equations obtained in Sections 2.2.1 and 2.2.2, consider the minimisation of (2.2) again. Note that, given that e_t is a Normal random variable, the same estimates of theta can be obtained by maximising

    L = exp{ -S_t / (2V) } ,

so that L is proportional to a discounted likelihood of theta at time t: the information contributed by an observation j steps back is discounted by beta^j. This, together with the convergence of (2.2), restricts the values of beta to the range 0 < beta < 1. Thus the role of the 'discount factor' beta is to describe the rate at which the information about the model parameters decays with time. Moreover, given distinct eigenvalues lambda_1, lambda_2, ..., lambda_n of G, the convergence of (2.6) requires that 0 < |beta^{1/2} lambda_i^{-1}| < 1 for each i. This can be seen on rewriting (2.6) as

    Q_t = SUM_{j=0}^{t-1} beta^j G'^{-j} f' f G^{-j} + beta^t G'^{-t} Q_0 G^{-t}.

Combining the restrictions on beta, the convergence of the adaptive vector a_t follows from the convergence of Q_t.

2.3. THE SIMULTANEOUS ADAPTIVE FORECASTING:

A time series can often be decomposed into three different components: the trend, the seasonal component and the random variation. Suppose that the seasonal component changes relatively very slowly, so that the greater percentage of the predictive variation is attributable to trend and random changes (the data analysed at the end of this chapter is of this type). EWR assumes that the loss of information with age occurs at the same rate for both the trend and the seasonal components, whereas we know that the information on the seasonal component is more durable; a slower rate of decay is hence more appropriate for it than that which is applied to the trend.

This led Harrison (1967) to propose an alternative to EWR, which considered a simple multiplicative linear growth and seasonal forecasting model of parsimony 2. That work led to the development of DOUBTS, or the method of Simultaneous Adaptive Forecasting, the basis of which is the short term forecasting system of Harrison and Scott (1965), with some connections with Whittle (1965). The following is a short review.

The k steps ahead point forecast F_t(k) is

    F_t(k) = (m_t + k b_t) S_t(k)

where

    m_t = m_{t-1} + b_{t-1} + (1 - beta_1^2) e_t
    b_t = b_{t-1} + (1 - beta_1)^2 e_t
    e_t = y_t - F_{t-1}(1)

and beta_1 is the trend discount factor. The seasonal component for k periods ahead is given by

    S_t(k) = 1 + SUM_s { a_s(t) cos(H_s z k) - b_s(t) sin(H_s z k) } ,

with H_s taking the appropriate harmonic values. Writing C = diag{C_1, C_2, ..., C_n} with

    C_k = [  cos(zk)   sin(zk) ]
          [ -sin(zk)   cos(zk) ]

the vector of harmonic coefficients a_t is updated as

    a_t = beta_2 C a_{t-1} + alpha e'_t ,

where e'_t is a seasonal error based on the ratio y_t / m_t and beta_2 is a seasonal discount factor. More details can be found in Harrison (1965).

Although it is not intended here to proceed with the generalisation of this method, by the end of this chapter it will be evident that higher degree polynomials with more economical but still efficient seasonal components can be accommodated. However, like its predecessors, the method is limited and suffers from weaknesses in both its theoretical and logical justifications: it is purely a point estimator. Unlike Holt's seasonal forecasting method, the seasonal effects are included in the trend updating equations, while the seasonal contribution is removed in updating m_t. Other means of constructing the trend and seasonal components could be used, through stochastic adaptive equations with a well defined evolution of the parameters.
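The trend recursions above can be sketched as follows. This is an illustrative fragment, not from the thesis: the multiplicative seasonal part is omitted for brevity, and the function name and starting values are assumptions.

```python
import numpy as np

def saf_trend(y, beta1, m0=0.0, b0=0.0):
    """Discounted trend recursions of the Simultaneous Adaptive Forecasting
    (DOUBTS) method, seasonal part omitted:
        m_t = m_{t-1} + b_{t-1} + (1 - beta1**2) * e_t
        b_t = b_{t-1} + (1 - beta1)**2 * e_t
    with e_t the one step ahead forecast error."""
    m, b = m0, b0
    forecasts = []
    for obs in y:
        f = m + b                        # one step ahead point forecast
        forecasts.append(f)
        e = obs - f
        m = m + b + (1 - beta1**2) * e   # level update
        b = b + (1 - beta1)**2 * e       # growth update
    return np.array(forecasts), m, b

# on a noiseless straight line, started on the line, the forecasts are exact
y = 2.0 + 0.5 * np.arange(1, 41)
fc, m, b = saf_trend(y, beta1=0.8, m0=2.0, b0=0.5)
```

The straight-line check illustrates that the recursions reproduce a deterministic linear growth without error; with noisy data the discount factor beta1 governs how quickly old information is forgotten.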
2.4. DISCOUNT WEIGHTED ESTIMATION

2.4.1. The Model

In this section, EWR is generalised to Discount Weighted Estimation (DWE), using different discount factors for different model components. Given the data D_t = {(y_t, f_t), ..., (y_1, f_1)}, theta_{t,0} is estimated by m_t at time t.

DEFINITION A DWE model is given by

    E[Y_{t+k} | D_t, f_{t+k}] = f_{t+k} m_t     (2.9)

with

    e_t = y_t - f_t m_{t-1}     (2.10)
    m_t = m_{t-1} + a_t e_t ,   a_t = Q_t^{-1} f'_t
    Q_t = B Q_{t-1} B + f'_t f_t     (2.11)

where B = diag{beta_1^{1/2}, beta_2^{1/2}, ..., beta_n^{1/2}} and 0 < beta_i <= 1, i = 1, 2, ..., n.     (2.12)

The EWR model is retained when B = beta^{1/2} I, where I is an identity matrix. Notice that only Q_t^{-1}, and not Q_t, needs to be calculated to obtain m_t. Although the technique has been known and used by practitioners for some time (Henderson et al.; Lindley and Smith (1972)), matrix inversions are avoided by writing

    R_t = B^{-1} Q_{t-1}^{-1} B^{-1}     (2.14)
    a_t = R_t f'_t (1 + f_t R_t f'_t)^{-1}     (2.15)
    Q_t^{-1} = (I - a_t f_t) R_t.

It can be seen from (2.11) that any initial value for Q_0, and hence m_0, will be dominated after a small number (around n, the dimension of theta) of iterations. In cases of complete prior ignorance, the default settings Q_0^{-1} = a I and m_0 = 0, where a is a large number, say 10^5, are adequate for operation. However, in most cases there is at least a rough idea of the size of the elements of theta, which will give a better value of m_0. From 2.2.3 we know that Q_0 = V C_0^{-1}, where C_0 represents the prior variance of theta; choosing liberal values c_1, c_2, ..., c_n for the prior variances then suggests the setting Q_0^{-1} = diag{c_1, c_2, ..., c_n}/V. These ideas are illustrated by an example in 2.5.
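The inversion-free recursions (2.14)-(2.15) can be sketched as follows. The block is illustrative rather than definitive: it takes B = diag{beta_i^{1/2}} as reconstructed above, carries P = Q^{-1} throughout so only a rank-one (Sherman-Morrison) correction is needed, and verifies that with all discount factors equal DWE reproduces the direct discounted least squares (EWR) solution.

```python
import numpy as np

def dwe_filter(y, F, betas, P0=None, m0=None):
    """Discount Weighted Estimation with one discount factor per component,
    in the inversion-free form (2.14)-(2.15)."""
    T, n = F.shape
    Binv = np.diag(1.0 / np.sqrt(np.asarray(betas, dtype=float)))
    P = np.eye(n) * 1e6 if P0 is None else P0.copy()    # P0 = Q0^{-1}, vague start
    m = np.zeros(n) if m0 is None else m0.copy()
    for t in range(T):
        f = F[t]
        R = Binv @ P @ Binv                  # R_t = B^{-1} Q_{t-1}^{-1} B^{-1}  (2.14)
        e = y[t] - f @ m                     # one step ahead forecast error
        a = (R @ f) / (1.0 + f @ R @ f)      # adaptive vector a_t               (2.15)
        m = m + a * e
        P = R - np.outer(a, f @ R)           # Q_t^{-1} = (I - a_t f_t) R_t
    return m, P

# with equal discount factors DWE reduces to EWR: compare with direct
# discounted least squares
rng = np.random.default_rng(2)
T, beta = 80, 0.9
F = np.column_stack([np.ones(T), rng.normal(size=T)])
y = F @ np.array([3.0, 1.5]) + 0.05 * rng.normal(size=T)
m, _ = dwe_filter(y, F, betas=[beta, beta], P0=np.eye(2) * 1e6)
w = beta ** np.arange(T - 1, -1, -1)
Q_dir = (F * w[:, None]).T @ F + beta**T * 1e-6 * np.eye(2)
m_dir = np.linalg.solve(Q_dir, (F * w[:, None]).T @ y)
```

With unequal discount factors the same code runs unchanged; only the direct-solution check, which relies on a single beta, no longer applies.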
2.4.2. DWE for Time Series

The principle of superposition states that any linear combination of linear models is a linear model. Model builders often use this in reverse, decomposing a linear model into a set of component models, and the principle extends to statistical models of normal random vectors. Hence, in practice, component models can often be built separately and combined to obtain a complete model.

Assume that the n-square non singular matrix G is decomposed into G = diag{G_1, G_2, ..., G_r}, with G_i of dimension n_i.

DEFINITION The method of DWE, for time series, is given by the forecast function

    E[Y_{t+k} | D_t] = f G^k m_t ,   k > 0,

with

    e_t = y_t - f G m_{t-1}
    m_t = G m_{t-1} + a_t e_t
    a_t = R_t f' (f R_t f' + 1)^{-1}
    Q_t = B G'^{-1} Q_{t-1} G^{-1} B + f' f     (2.16)

where R_t = B^{-1} G Q_{t-1}^{-1} G' B^{-1}, Q_t^{-1} = (I - a_t f) R_t, and B = diag{beta_1^{1/2} I_1, ..., beta_r^{1/2} I_r} with 0 < beta_i <= 1.
THEOREM For the DWE method defined above, if lambda_{i,1}, lambda_{i,2}, ..., lambda_{i,n_i} are the non-zero eigenvalues of G_i, i = 1, 2, ..., r, with |beta_i^{1/2} lambda_{i,j}^{-1}| < 1 for all i and j, and Q_0 is bounded, then lim Q_t = Q exists.

PROOF

Using (2.16), we have

    Q_t = SUM_{k=0}^{t-1} (B G'^{-1})^k f' f (G^{-1} B)^k + (B G'^{-1})^t Q_0 (G^{-1} B)^t ,

since B and G, being conformably block diagonal, commute. Under the stated eigenvalue restrictions the second term tends to zero and the series converges, so the limit exists and is independent of Q_0.
A proper and statistically sound way of introducing temporal adaptivity into the forecasts is dealt with later, through discounting the prior information. Under the above assumptions, the recursive formulas converge considerably fast to a limiting form. Apart from computational benefits, these limiting forms provide, as will be seen later, justifications of many commonly used forecasting structures in the literature. This also uncovers, in spirit, the partial success achieved by some classical models like the ARIMA models.

2.5. APPLICATIONS:
2.5.1. A Simple Linear Growth Seasonal Model

By the principle of superposition, a time series model can be constructed using a linear combination of a linear growth, seasonal and random components. The linear growth model may be described by the pair

    { f_1 ; G_1 } = { (1, 0) ; [ 1  1 ] }
                              [ 0  1 ]

This is evident since f_1 G_1^k = (1, k). Then, if m_t and b_t are the present estimates of the level and growth rate, the forecast function of this component is f_1 G_1^k m_t = m_t + k b_t, which is the familiar Holt-Winters linear growth function.
Any additive seasonal pattern S(1), ..., S(T) of period T, for which SUM_{j=1}^{T} S(j) = 0, can be written as

    S(k) = SUM_{i=1}^{n} ( a_i cos(kwi) - b_i sin(kwi) ) ,

where w = 2 pi / T and n is the integer part of (T + 1)/2. An alternative seasonal component model, which gives an identical performance to that previously discussed, is {f_2, G_2} with

    f_2 = (f_{2,1}, f_{2,2}, ..., f_{2,n}) ,   f_{2,k} = (1, 0) ,

    G_2 = diag{G_{2,1}, G_{2,2}, ..., G_{2,n}} ,   G_{2,k} = [  cos(kw)   sin(kw) ]
                                                             [ -sin(kw)   cos(kw) ]

Each block of this representation carries one harmonic. A useful economy occurs when an adequate representation exists in terms of a limited number of harmonics. Jenkins (1970) examined the mean monthly temperature for Central England in 1964 and demonstrated that over 96% of the variation can be described by a single first harmonic, the rest of the variation being well described as random. In this case the seasonal pattern is captured by two unknowns rather than eleven as in the full monthly seasonal description.
In applying DWE it is generally advisable to associate one discount factor with the linear growth component but a separate one with the seasonal description, rather than using a single factor for the whole trend and seasonal model.
The full linear growth seasonal model is then

    { (f_1, f_2) ; diag{G_1, G_2} ; diag{beta_1 I_2, beta_2 I_2n} }

where I_k, k = 2, 2n, is the identity matrix of dimension k.
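A sketch of how such a model may be assembled by superposition is given below. The function names are illustrative assumptions; the block simply builds {f, G, B} for a linear growth component plus a chosen number of seasonal harmonics, and the usage check confirms that the seasonal blocks return to the identity after one full period.

```python
import numpy as np

def harmonic_block(omega):
    """2x2 rotation block carrying one seasonal harmonic of frequency omega."""
    c, s = np.cos(omega), np.sin(omega)
    return np.array([[c, s], [-s, c]])

def linear_growth_seasonal(period, n_harmonics, beta_trend, beta_seas):
    """Assemble {f, G, B} for a linear growth plus seasonal DWE model:
    G is block diagonal with a 2x2 linear growth block followed by one
    rotation block per harmonic; f = (1,0,1,0,...,1,0)."""
    blocks = [np.array([[1.0, 1.0], [0.0, 1.0]])]        # linear growth block G_1
    blocks += [harmonic_block(2 * np.pi * j / period)
               for j in range(1, n_harmonics + 1)]
    n = 2 * (1 + n_harmonics)
    G = np.zeros((n, n))
    for i, blk in enumerate(blocks):
        G[2*i:2*i + 2, 2*i:2*i + 2] = blk
    f = np.array([1.0, 0.0] * (1 + n_harmonics))
    B = np.diag([beta_trend] * 2 + [beta_seas] * 2 * n_harmonics)
    return f, G, B

# monthly data: trend plus five harmonics, separate discount factors
f, G, B = linear_growth_seasonal(period=12, n_harmonics=5,
                                 beta_trend=0.8, beta_seas=0.95)
```

The forecast function f G^k m then gives m + k b from the trend block plus the harmonic contributions rotated k steps ahead.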
2.5.2. A Practical Example: The U.S. Air Passenger Data Series

The data from 1951 to 1960 are analysed. The series is a favourite with analysts since it has strong trend and seasonal components with very little randomness. However, it is not a strong test of a forecasting method. Harrison (1965) showed that the EWR method proposed by Brown cannot achieve a Mean Absolute Deviation (MAD) of less than 3%, since it insists upon using a single, inadequate discount factor in a case in which the trend and seasonal components require significantly different discount factors. He stated that if, on this data, a method cannot achieve a MAD of less than 3%, then that method can be regarded as suspect. Harrison analysed the data using the DOUBTS method described in 2.3. In this section the DWE model {f, G, B} is applied to the logarithms of the data, using f = (1, 0, 1, 0, ..., 1, 0) and G = diag{G_1, G_2, ..., G_5}.
-16
"A ( for discount factors l the trend relating to C1 ) and p2 for pair of was used with
(, no9QO1=(
0}
of no seasonal pattern
a 95
interval [ 80 ; 280 1 and a monthly growth of between 4c and W 'c per month.
Hence
this is a very weak prior although it does not assume complete ignorance. Fig. 1 presents
the one-step-ahead point predictions For comparison, with the observations. errors over the last six years were
MAD DWE 2.3% achieved. performance of obtained and a 'Another Writing book data is in Box Jenkins. known the the given of and of analysis well of the t' observation and aj as the corresponding one step
logarithm the as z
=e-ie-1+ze-12
This is 61. 4 is 9=. 41 also of and method =. minimised when where the mean square error parametric following 2 the and parsimony table indicates the comparability of the
(0.84 DWE discount DOUBTS that the pair with same of and performance with that of 0.93) as described in Harrison (1965) and with the discount pair (0.76 , 0.91) which MAD z the the errors. of reduces
Forecast Errors

    Year    DWE (.84, .93)    DWE (.76, .91)
    1955         7.7               7.0
    1956         5.4               5.4
    1957         5.5               5.6
    1958        14.7              13.7
    1959        11.5               9.8
    1960        11.5              12.1

The mean errors were: DOUBTS 9.4, DWE (.84, .93) 9.4, DWE (.76, .91) 8.9, and B&J 9.3. Clearly, in this example, there is no significant difference in the results.
However, intervention can easily be accommodated in the phase of sudden changes, and these depend on a small number of easily assessed discount factors. The following table illustrates the sensitivity of the results to the choice of the discount pair (beta_1, beta_2):

    beta_1 \ beta_2     0.8       0.9       1.0
    0.6                 9.71      9.39     15.36
    0.7                 9.45      9.08     15.3
    0.8                10.41      9.12     14.15
    0.9                16.56     11.91     13.77
    1.0                32.58     22.59     16.44
[Fig. 1. One-step-ahead point predictions and observations: U.S. Air Passengers (*1000).]
2.6. SUMMARY:

The methods of EWR and DOUBTS are reviewed, and some general comments, drawbacks and limitations are pointed out. The estimation procedure of DWE is introduced as a fruitful extension of EWR. To provide an illustration, DWE is applied to the U.S. Air Passenger data set, and the results are encouraging in comparison with previous existing ones.
CHAPTER THREE

DYNAMIC LINEAR MODELS

3.1. INTRODUCTION :
One of the main contributions to Bayesian Statistics, both in theory and applications, is the Dynamic Linear Model of Harrison and Stevens (1976). The DLM's provide a way of combining the views of experts with known forecasting procedures, with justifications going back to Holt (1957) and an extensive range of applications of the Kalman Filter, Kalman (1963). The facilities introduced have widened the understanding of the phenomena of on-line random variance learning, intervention and multiprocess modelling, and have relaxed the assumption of stationarity.

In this chapter, DLM's are reviewed in 3.2 and relationships with DWE methods are discussed in 3.3. The DLM recursive formulas for the parameter estimation problem are attractive for computer storage. However, some limitations and drawbacks remain, and these are discussed in 3.4.

3.2. THE DLM's :
The class of DLM's, as defined by Harrison and Stevens (1976), constitutes quadruples {F, G, V, W}_t with proper dimensionality. A particular parameterised process {Y_t | theta_t} can be modelled using this class of models if the following linear relations hold:

    i)  Y_t = F_t theta_t + v_t ,       v_t ~ N[0; V_t]     (3.1)
    ii) theta_t = G_t theta_{t-1} + w_t ,   w_t ~ N[0; W_t]     (3.2)

The first of these equations is called the observation equation, relating the observable vector Y_t to an unobservable state parameter vector theta_t through F_t and an additive, Normally distributed error v_t. The errors v_t and w_t are assumed to be uncorrelated, with variance matrices V_t and W_t respectively. Given an initial prior (theta_0 | D_0) ~ N[m_0; C_0], and using Bayes theorem with D_t = {y_t, D_{t-1}}, it follows that

    (Y_t | D_{t-1}) ~ N[yhat_t; Y_t]     (3.3)
    (theta_t | D_t) ~ N[m_t; C_t]     (3.4)

where:

    yhat_t = F_t G_t m_{t-1} ;   Y_t = F_t R_t F'_t + V_t     (3.5)
    m_t = G_t m_{t-1} + A_t e_t     (3.6)
    e_t = y_t - yhat_t     (3.7)
    R_t = G_t C_{t-1} G'_t + W_t     (3.8)
    A_t = R_t F'_t Y_t^{-1}     (3.9)
    C_t = (I - A_t F_t) R_t     (3.10)

When {F, G, V, W} are all known and are not dependent on time, the DLM is then called a constant DLM.
3.3. RELATION BETWEEN DLM's AND DWE's

To give the relationship between estimation using DWE and estimation using DLM's, we first give the following definition.

DEFINITION For any DWE {f, G, B} with initial setting (m_0, Q_0), a corresponding DLM is {f, G, V, W}_t with initial prior (theta_0 | D_0) ~ N[m_0; C_0 = Q_0^{-1} V], where

    W_t = ( B^{-1} G Q_{t-1}^{-1} G' B^{-1} - G Q_{t-1}^{-1} G' ) V

is nonnegative definite, and Q_{t-1}^{-1} represents a generalised inverse in the case of Q_{t-1} being singular.

THEOREM 3.1. The corresponding DLM produces a forecast function identical with that of the DWE. Further, (theta_t | D_t) ~ N[m_t; C_t], where m_t is the estimate given by DWE and C_t = Q_t^{-1} V.

PROOF: From the initial settings, the theorem is true for t = 1. Using induction, suppose it is true at time t - 1. Since

    C_t^{-1} = R_t^{-1} + f'_t f_t / V

we have

    C_t^{-1} V = B G'^{-1} Q_{t-1} G^{-1} B + f'_t f_t = Q_t ,

so that C_t = Q_t^{-1} V, and

    A_t = C_t f'_t / V = Q_t^{-1} f'_t = a_t ,

the DWE adaptive vector; hence m_t = G m_{t-1} + A_t e_t coincides with the DWE estimate and

    E[Y_{t+k} | D_t] = f G^k m_t ,

the DWE forecast function.

COROLLARY For t > 0, C_t = Q_t^{-1} V.

In DLM terms, the above setting for W_t is unusual in its dependence upon C_{t-1}, the uncertainty of the observer concerning theta_{t-1} given D_{t-1}. The concept that the observer's view of the future development of the process depends also upon his current information is adopted in Entropy Forecasting, Souza (1978), and in Smith (1979).
3.4. SOME LIMITATIONS AND DRAWBACKS

Time series processes are often best described using parametric statistical models. In this case, efficient intervention can be performed at various stages of the analysis. Within the Bayesian framework, DLM's are often used for this purpose. However, the latter require experience in the representation of innovations using Normal probability distributions. The specification of the associated system variance matrices has proved a major obstacle. Practical problems arise because of the non uniqueness of V and W, and because a lack of familiarity with such matrices causes application difficulties and leads practitioners to other methods. Even experienced people find that they have little natural quantitative feel for the elements of these matrices. Their ambiguity is illustrated by the following example.
For example, consider the steady model

    Y_t = theta_t + v_t ,   v_t ~ N[0; V]
    theta_t = lambda theta_{t-1} + w_t ,   w_t ~ N[0; W] ,

so that, for an infinite history, it can be written in a form for which

    Var(Y_t) = V + W / (1 - lambda^2)

and

    Cov(Y_t, Y_{t+k}) = lambda^k W / (1 - lambda^2).

The same process can be represented with correlated errors: the pair (v_t, w_t) may be given the joint variance matrix

    U = [ V - aS        aS                  ]
        [ aS            W - a(1 - lambda^2)S ]

for a scale S and any a. Provided that U is a covariance matrix, the joint distribution of Y_t, Y_{t+1}, Y_{t+2}, ..., Y_{t+k} does not depend on a; i.e., for infinitely many values of the variances of the v's and w's, the same forecast distribution results. Methods exist for estimating V, W and G using sample autocovariances. The concern here, however, is with a procedure that is both easy to understand and simple to operate.
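The non-uniqueness above can be checked numerically. The following sketch (illustrative, not from the thesis) builds the exact joint covariance matrix of Y_1, ..., Y_n under the correlated-error representation, by writing each Y_t as a linear combination of the independent sources (theta_0, v_1, w_1, ..., v_n, w_n), and confirms that it does not depend on a. The parameter values are arbitrary choices for which U is a proper covariance matrix.

```python
import numpy as np

def y_covariance(lam, V, W, S, a, n=6):
    """Exact covariance of (Y_1,...,Y_n) for Y_t = theta_t + v_t,
    theta_t = lam*theta_{t-1} + w_t, with Var(v) = V - a*S,
    Var(w) = W - a*(1-lam**2)*S, Cov(v, w) = a*S, theta_0 stationary."""
    var_theta0 = W / (1 - lam**2) - a * S        # stationary variance of theta
    U = np.array([[V - a * S, a * S],
                  [a * S, W - a * (1 - lam**2) * S]])
    dim = 1 + 2 * n
    Sigma = np.zeros((dim, dim))                 # covariance of the sources
    Sigma[0, 0] = var_theta0
    for j in range(n):
        Sigma[1 + 2*j:3 + 2*j, 1 + 2*j:3 + 2*j] = U
    A = np.zeros((n, dim))                       # Y = A @ sources
    for t in range(1, n + 1):
        A[t - 1, 0] = lam**t                     # lam^t * theta_0
        A[t - 1, 1 + 2*(t - 1)] = 1.0            # v_t
        for j in range(1, t + 1):
            A[t - 1, 2 + 2*(j - 1)] = lam**(t - j)   # lam^{t-j} * w_j
    return A @ Sigma @ A.T

lam, V, W, S = 0.8, 1.0, 0.5, 0.5
cov0 = y_covariance(lam, V, W, S, a=0.0)
cov1 = y_covariance(lam, V, W, S, a=0.3)
```

The two covariance matrices agree exactly, and every diagonal entry equals V + W/(1 - lambda^2), illustrating that a is unidentifiable from the forecast distribution.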
3.5. SUMMARY

The class of DLM's is presented in 3.2, and its relation with DWE estimates is given in 3.3, where it is shown that, given a DWE model, there exists a DLM having the same forecast function. Limitations and drawbacks of DLM's are discussed in 3.4.
CHAPTER FOUR

NORMAL DISCOUNT BAYESIAN MODELS

4.1. INTRODUCTION:
Two desirable properties of applied mathematical models are ease of application and conceptual parsimony. Chapter 2 promoted the use of discount factors in methods of sequential estimation, through the method of DWE which generalises the Exponentially Weighted Regression method of Brown (1963). In EWR, if information is now worth M units then its worth decays geometrically with age. However, if a system has numerous characteristics, particular components may be associated with different discount values. The DWE method provides a means of doing this, but it is strictly a point estimation method.

The major objective of this work is to build a full Bayesian theory upon the discount concept. The concept has been applied in the ICI forecasting package MULDO, in the forecasting method of Harrison (1965) and in Harrison and Scott (1965), which the latter generalises, and in Godolphin (1975). The use of DLM-type models has involved practitioners in specifying a system variance matrix W whose elements are difficult to assess.

This chapter is concerned with a class of Normal Discount Bayesian Models (NDBM's), in which the system variance matrix is replaced by discounting, with possibly different discount factors for different components: at each time, a posterior precision P_{t-1} is discounted to a prior precision. The term precision is used in its Bayesian sense, but may also be thought of as a Fisherian measure of information. The use of the discount matrix overcomes the major disadvantages of the system variance matrix W: the ambiguity is removed, the discount matrix is invariant (up to a linear transformation) to the scale on which the variables are measured, and the methods are easily applied. Areas of operation in which the models are anticipated to be useful include time series analysis, the detection of changes in dynamic behaviour, sequentially performed regression, forecasting, quality control, and modelling where the observations are measured sequentially or ordered according to some index.

In this chapter, Normal Weighted Bayesian Models (NWBM's) are introduced in 4.2 and their relation with DLM's is pointed out. Particular emphasis is given to a subclass of models called NDBM's, and the possibility of retaining model coherency is discussed in 4.3. Other practically important subclasses of models, like the Modified NDBM's and the Extended NDBM's, are discussed in 4.4; these extend the capability of the models in dealing with cases of sudden changes in the process behaviour and correlated observations. Some comments on the NDBM's are given in 4.5, and finally a short summary of the chapter is given in 4.6.
4.2. NORMAL WEIGHTED BAYESIAN MODELS :

Consider a parameterised process {Y_t | theta_t}, with theta_t the unobservable vector of state parameters containing certain characteristics of interest of the process. Y_t may, for example, be the demand for a product at time t, with the corresponding component of theta_t representing its level of demand at that time. Each of Y_t and theta_t are random vectors and so have probability distributions. Although no distributional assumption is made for theta_t by the discount principle discussed in Chapter 2, introducing an initial probability distribution for theta_t provides an operationally simple updating method through the joint distribution and Bayes theorem. What must be stated is the way that the parameters evolve with time and the amount of precision lost at each transition stage. The relevant model assumptions for the DLM are stated in (3.1) and (3.2). In order to introduce the discount principle, the class of NWBM's is defined as follows.
DEFINITION For a parameterised process {Y_t | theta_t}, t > 0, a NWBM is given by a quadruple {F, G, V, H}_t, where the observation probability distribution is

    (Y_t | theta_t) ~ N[F_t theta_t; V_t]     (4.1)

and, given the posterior

    (theta_{t-1} | D_{t-1}) ~ N[m_{t-1}; C_{t-1}]     (4.2)

the prior at time t is

    (theta_t | D_{t-1}) ~ N[G_t m_{t-1}; R_t] ;   R_t = H_t C_{t-1} H'_t.     (4.3)

THEOREM 4.1. For the NWBM defined above, the one step ahead forecasting distribution and the updating posterior distribution are

    (Y_t | D_{t-1}) ~ N[yhat_t; Y_t] ,   (theta_t | D_t) ~ N[m_t; C_t]     (4.5)

where

    yhat_t = F_t G_t m_{t-1} ;   Y_t = F_t R_t F'_t + V_t     (4.6)
    m_t = G_t m_{t-1} + A_t e_t ;   C_t = (I - A_t F_t) R_t ;   e_t = y_t - yhat_t     (4.7)
    A_t = R_t F'_t Y_t^{-1} = C_t F'_t V_t^{-1}.     (4.8)
PROOF

The proof is standard in normal Bayesian theory. The results can be obtained from the identity

    f(y_t | theta_t) f(theta_t | D_{t-1}) = f(y_t | D_{t-1}) f(theta_t | y_t, D_{t-1}) ,

where the f(.)'s are density functions of the appropriate random variables. Similarly, by rearranging the quadratic terms

    (y_t - F_t theta_t)' V_t^{-1} (y_t - F_t theta_t) + (theta_t - G_t m_{t-1})' R_t^{-1} (theta_t - G_t m_{t-1}) ,

m_t and C_t are obtained as defined in the theorem, the conditional distributions being normal.

An NDBM {F, G, V, B}_t is the NWBM obtained with the setting H_t = B_t G_t, where B_t = diag{beta_1^{-1/2}, ..., beta_n^{-1/2}} is a discount matrix, so that R_t = B_t G_t C_{t-1} G'_t B_t is nonnegative definite.
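One step of the NDBM recursions of Theorem 4.1, with the prior variance formed by discounting rather than by a system variance matrix W, can be sketched as follows. The setting H = B G with B = diag{beta_i^{-1/2}} follows the reconstruction above, and the function name and numerical values are illustrative.

```python
import numpy as np

def ndbm_step(y, m, C, F, G, V, betas):
    """One step of the NDBM recursions: the prior variance is obtained by
    discounting, R_t = B G C_{t-1} G' B with B = diag(1/sqrt(betas)), so
    no system variance matrix W need be specified."""
    B = np.diag(1.0 / np.sqrt(np.asarray(betas, dtype=float)))
    a = G @ m
    R = B @ G @ C @ G.T @ B             # discounted prior variance    (4.3)
    yhat = float(F @ a)
    Y = float(F @ R @ F.T + V)          # one step forecast variance   (4.6)
    e = y - yhat
    A = (R @ F.T) / Y                   # adaptive vector              (4.8)
    m_new = a + (A * e).ravel()
    C_new = R - A @ (F @ R)             # posterior variance           (4.7)
    return m_new, C_new, yhat, Y

# with all discount factors equal to one, this is the DLM with W = 0
m, C, yhat, Y = ndbm_step(2.0, np.zeros(1), np.array([[4.0]]),
                          np.array([[1.0]]), np.array([[1.0]]), 1.0, betas=[1.0])
```

In the scalar example the prior variance stays at 4, the forecast variance is 5, and the posterior mean and variance follow directly from the adaptive coefficient 4/5.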
Different discount factors can thus be associated with these different model components. Before introducing practically more efficient and parsimonious NDBM settings, it is interesting to point out some relations with other well known models.

THEOREM 4.2. Given a NDBM {F, G, V, B}_t with non singular G_t for all t, and initial setting (m_0; C_0):

i-  If B_t = beta^{-1/2} I and V_t = V, the NDBM forecast function with the setting (m_0; Q_0 = C_0^{-1} V) is identical to that of EWR.

ii- If B_t = I, the forecast function is identical to that of the constant DLM {F, G, V, 0}.

PROOF

The proof is by induction. From the assumptions, the claims hold for t = 1 in all the cases. Now, assuming that the theorem is true at time t - 1, we show that it is true for time t. From the NDBM results, for time t, we have

    m_t = G m_{t-1} + A_t e_t ,
    C_t^{-1} = R_t^{-1} + F'_t F_t / V ,
    R_t^{-1} = beta G'^{-1} C_{t-1}^{-1} G^{-1}.

This gives

    C_t^{-1} V = beta G'^{-1} (C_{t-1}^{-1} V) G^{-1} + F'_t F_t = Q_t as for EWR.

Hence

    A_t = C_t F'_t / V = Q_t^{-1} F'_t = a_t for the EWR,

and the two forecast functions coincide. For part two, the two models are identical since, for the NDBM with B_t = I,

    R_t = G C_{t-1} G' , so that W_t = G C_{t-1} G' - G C_{t-1} G' = 0.
Furthermore, it can be seen from the Theorem in Section 3.3 that the limiting forecast distribution of a constant NDBM {F, G, V, B} is identical to that of a constant DLM {F, G, V, W} with

    W = (H C H' - G C G') V

being nonnegative definite, where C is the limiting value of Q_t^{-1}. However, for the time series models defined in Section 2.4.2, attention is restricted to canonically structured NDBM's for which G = diag{G_1, G_2, ..., G_r} and B = diag{beta_1^{-1/2} I_1, ..., beta_r^{-1/2} I_r}, with I_i the identity matrix of dimension n_i.
4.3.2. Forecasting and Updating

Given the posterior parameter distribution at time t - 1 as in (4.2), the prior, the one step ahead forecast and the posterior parameter distributions at time t are given by (4.3) and Theorem 4.1. For k steps ahead, with

    theta_{t+k} = G_{t+k} theta_{t+k-1} + w_{t,k} ,

the forecast function is

    F_t(k) = E[Y_{t+k} | D_t] = F_{t+k} G_{t+k} G_{t+k-1} ... G_{t+1} m_t ,

and, defining

    R_{t,1} = H_{t+1} C_t H'_{t+1} ,   R_{t,k+1} = H_{t+k+1} (I - A_{t,k} F_{t+k}) R_{t,k} H'_{t+k+1}     (4.9)

with

    A_{t,k} = R_{t,k} F'_{t+k} Y_{t,k}^{-1} ,   Y_{t,k} = F_{t+k} R_{t,k} F'_{t+k} + V_{t+k} ,

where, for a univariate series, Y_{t,k} is a scalar quantity.

4.3.3. Coherency
THEOREM 4.3. Given D_t, the joint distribution of the future observation and the state vectors (Y_{t+1}, theta_{t+1}, theta_t) under the NWBM is Normal. In particular, the posterior distribution (theta_{t+1} | D_{t+1}) has variance

    C_{t+1} = R - R F' Y^{-1} F R ,

while coherent learning about theta_t from y_{t+1} would replace C_t by C_t - C_t G' F' Y^{-1} F G C_t, which is smaller than C_t. The discount principle, which sets (theta_t | D_{t+1}) = (theta_t | D_t), so that C_{t,k} = C_t, will in this sense be incoherent. The recursion (4.9) can be extended to establish the equivalence of the DLM and the NDBM for Y_{t+k}. The above theorem ensures the testability of particular NWBM's on the same lines as the DLM. However, given a starting prior [m_0; C_0] and F, G, V and B, the noted difference is that the DLM uses the set of equations (3.1) and (3.2), in which W_t is assumed known, while the NWBM starts with [m_0; C_0] and the transition discount matrix B; the dependence of the prior on the current information is acknowledged.
4.3.4. Sequential Experiments :

Experiments are often performed sequentially and are subject to slow dynamic changes, perhaps due to some uncontrollable quality problems in the production environment. Static models are hardly justified for a sequential experiment of this kind, as noted by Harrison. However, such an experiment can be accommodated. For example, a 2x2 experiment can be represented by

    Y_{ijt} = mu_t + theta_{ijt} + e_{ijt} ,   i, j = 1, 2 ,

where mu_t represents the collective block effect and the theta_{ijt} represent the treatment effects at any stage t, summing to zero. Usually an orthogonal partition into the overall effect, the two treatment effects and their interaction is performed as follows, writing theta_{1t}, ..., theta_{4t} for these effects and e for a random error:

    i-   Y_{11t} = theta_{1t} + theta_{2t} + theta_{3t} + theta_{4t} + e_{11t}
    ii-  Y_{12t} = theta_{1t} + theta_{2t} - theta_{3t} - theta_{4t} + e_{12t}
    iii- similarly for Y_{21t} and Y_{22t} ,

so that, stacking the four observations,

    F = [ 1  1  1  1 ]
        [ 1  1 -1 -1 ]     (4.10)
        [ 1 -1  1 -1 ]
        [ 1 -1 -1  1 ]

G is taken as the identity matrix to indicate a steady parameter evolution with time.

4.4. OTHER IMPORTANT SPECIAL CASES:
4.4.1. THE MODIFIED NDBM:

In modelling discontinuities, it is often desirable to protect particular model components against unwanted interaction with others, so that components believed to be unchanged remain largely unaffected. This is possible since the occurrence of discontinuities in the data need not require a complete respecification of the model, Jaynes (1983).

DEFINITION

Let (θ_{t-1}|D_{t-1}) ~ N[m_{t-1}; C_{t-1}], G = diag{G_1, G_2, ..., G_r} and B = diag{β_1 I_1, β_2 I_2, ..., β_r I_r}, and let C_{t-1} = {C_{ij}} and R_t = {R_{ij}} be partitioned conformably, for i, j = 1, 2, ..., r. A modified NDBM is a NDBM {F, G, V, B} such that

R_{ii} = G_i C_{ii} G_i' / β_i        (4.11)

R_{ij} = G_i C_{ij} G_j'   for i ≠ j .        (4.12)
In statistical time series processes it may be possible to classify the types of change that are common. Such changes can be in level, growth and/or seasonal components, and can be modelled by increasing the uncertainty of only the corresponding components, so that the uncertainty of the other components is not affected. In DLM's the only relevant factor is the system error vector w_t, while for NDBM's the future uncertainty of the state is controlled by the discount matrix. It can be observed from the definition of the NDBM that extra uncertainty introduced to a particular block will be transmitted to other blocks through the correlation between them. The modified NDBM is introduced to prevent this.

Moreover, a major disturbance on a particular block may be signaled with a lead time of N periods, and the relevant intervention applied using a reduced discount factor for that block over the N periods. Although the modified NDBM temporarily loses invariance to the scale of the independent variables, it localises the added uncertainty. These ideas are used in the examples that are given in Chapter 8, and Migon and Harrison (1983) have applied modified NDBM's in their models.
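The blockwise construction of R_t in (4.11)-(4.12) can be sketched as follows. This Python fragment is an illustrative sketch assuming block-diagonal G; the block sizes and discount values are hypothetical.

```python
import numpy as np

def modified_ndbm_prior_cov(C, G_blocks, betas):
    """Prior covariance R for a modified NDBM (sketch of (4.11)-(4.12)).

    Diagonal blocks are discounted, R_ii = G_i C_ii G_i' / beta_i,
    while off-diagonal blocks are left undiscounted,
    R_ij = G_i C_ij G_j' for i != j, so that added uncertainty is not
    transmitted across blocks through their correlations.
    """
    sizes = [g.shape[0] for g in G_blocks]
    idx = np.cumsum([0] + sizes)
    G = np.zeros((idx[-1], idx[-1]))
    for k, g in enumerate(G_blocks):
        G[idx[k]:idx[k+1], idx[k]:idx[k+1]] = g
    R = G @ C @ G.T                      # undiscounted evolution G C G'
    for k, b in enumerate(betas):        # inflate diagonal blocks only
        R[idx[k]:idx[k+1], idx[k]:idx[k+1]] /= b
    return R

C = np.eye(3)
R = modified_ndbm_prior_cov(C, [np.eye(2), np.eye(1)], [0.8, 0.95])
```

With an identity posterior covariance, only the two diagonal blocks are inflated (by 1/0.8 and 1/0.95), and the zero cross-covariances are preserved.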
The above definition can be modified to include more general transition matrices. That is, writing the structure of G as {G_{ij}} and that of G C_{t-1} G' = E as {E_{ij}}, the requirement (4.12) becomes

R_{ij} = E_{ij}   for i ≠ j .
4.4.2. THE EXTENDED NDBM:

When high frequency effects or some type of stochastic transfer response is present, it is convenient to retain an additive system disturbance. The extended NDBM is defined by the quintuple {F, G, V, B, W}: given (θ_{t-1}|D_{t-1}) ~ N[m_{t-1}; C_{t-1}], this defines

(Y_t|θ_t) ~ N[F_t θ_t; V_t] ,
(θ_t|D_{t-1}) ~ N[G_t m_{t-1}; R_t] ,

where

R_t = B_t G_t C_{t-1} G_t' B_t' + W_t .

Often, in regression and the design of experiments, a constant additive variance is appropriate. This may be the case with the example in 4.3.4 if the block effects are stable or exchangeable, with (θ_{1t}|D_{t-1}) ~ N[μ_t; σ²] where σ² is unknown and subject to slow decay. The design matrix is then augmented, F_1 = (F, 0), with G extended conformably to link the block effects to their common mean.

Similarly, correlated observation errors, such as those generated by a stationary second order autoregressive process v_t = (1 - φ_1 B)^{-1} (1 - φ_2 B)^{-1} δ_t, can be modelled by augmenting the state vector with the error terms and using a block discount matrix of the type diag{B_1, 1, 1}. This extends the Generalised EWR of Harrison and Akram (1983), which considers stationary observation errors.
4.5. SUMMARY:

The discount concept is introduced into Bayesian modelling and forecasting via the NDBM's. Their updating and forecasting formulas are derived, and the modified and extended NDBM's are defined for structured intervention and correlated observation errors.
CHAPTER FIVE

ON-LINE VARIANCE LEARNING
5.1. INTRODUCTION:
One of representation that the consequences of the conceptual differences between the Bayesian is
process and its non Bayesian formulation structure for the variance
the former
dynamic
V, of the of
observation Kalman
known be is to that assumed constant a often error v, and ARLMA techniques. of it VV is important is crucial models. for
in the formulations
Filtering
estimation but
application governs
of the
forecasting
in multiprocess
since it
different the of
practitioners
have little
forecast variance
Y' ( ) ii Theorem 6.4. is discount )lim (; is ; the V=11 /X; part of where the relationship ,
i=1
e-Z
factor associated with the iih parameter 0, with associated eigenvalue k,. If required the be V derive to may acknowledged and used marginal extra uncertainty associated with forecast distributions Bayesian manner. A number of approaches based on the idea of De Groot(1970) have been adopted for
estimating univariate the observation variance V. Smith (1977) has discussed the problem for
A proper Bayesian procedure is given in 5.2. Non Bayesian elaborations of Harrison and Stevens (1975) and Harrison and Johnston (1983) are briefly reviewed, and a generalisation of the latter is given in 5.3. A new procedure called the power law is given in 5.4, Ameen and Harrison (1983 b).

5.2. THE BAYESIAN APPROACH:
Let φ = 1/V denote the observation precision, with

(φ|D_{t-1}) ~ Γ[ n_{t-1}/2 ; d_{t-1}/2 ] ,        (5.3)

i.e. with density proportional to

exp{ ((n_{t-1}/2) - 1) log φ - (d_{t-1}/2) φ } .        (5.4)

The transition (5.6) carries this distribution forward in time through a choice of transition parameters. A special choice is introduced later in Section 5.4; other forms are dealt with in Ameen (1983 b), defined either through posterior entropies or through specific applications like advertising awareness. However, it follows from (5.1)-(5.6), using Bayes theorem, that the recurrence relationships for m_t and C_t are exactly as in (4.6)-(4.8) with the setting V = 1, and
(Y_t|D_{t-1}, φ) ~ N[ŷ_t; Y_t/φ] ,   (θ_t|D_{t-1}, φ) ~ N[a_t; R_t/φ] ,

so that, integrating out φ, the marginal one step ahead forecast distribution for (Y_t|D_{t-1}) is Student t, centred at ŷ_t, with n_{t-1} degrees of freedom. The method is operationally elegant and is properly Bayesian. It is not easy, however, to impose a meaningful time structure on the observation variance V.
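The accumulation of degrees of freedom and weighted squared errors in this scheme can be sketched as follows; a minimal illustration assuming no discounting of past information.

```python
# Conjugate variance learning sketch: with V = 1/phi and
# (phi | D_{t-1}) ~ Gamma(n/2, d/2), each standardised observation
# updates n -> n + 1 and d -> d + e^2 / Y, where Y is the one-step
# forecast variance computed with the setting V = 1.
def variance_learning(errors, Ys, n0=1.0, d0=1.0):
    n, d = n0, d0
    for e, Y in zip(errors, Ys):
        n += 1.0
        d += e * e / Y
    return d / n        # point estimate of V

V_hat = variance_learning([2.0, -2.0], [1.0, 1.0], n0=0.0, d0=0.0)
```

With two standardised errors of size 2 and a flat starting prior, the point estimate is the mean squared error, V_hat = 4.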
5.3. NON BAYESIAN PROCEDURES:

The procedure proposed by Harrison and Stevens (1975) assumes that

V_t = P L_t + Q S_t ,

where L_t and S_t are the level and seasonal components and P, Q are proportionality constants. The constants are obtained from the median of a prespecified set of N ordered constants with corresponding probabilities, which are updated on line using the data information. Such devices are theoretically not profound and do not generalise easily. Another on-line estimation procedure, Harrison and Johnston (1983), uses an exponentially weighted average of squared forecast errors,

V*_t = (1 - α) V*_{t-1} + α e_t² ,   0 < α < 1 ,

or, more generally, a discounted average of functions of the standardised errors.
5.4. THE POWER LAW:

For a univariate time series, an efficient and robust procedure can be described using the power law

V_t = a f_t^{2b} ,        (5.8)

where f_t is the one step ahead forecast mean, b is a known index and a is a scale constant, the accuracy of whose estimate is expressed in terms of degrees of freedom or of equivalent observations. From the analysis of 5.2 it is seen that, if required, forecasts can be produced as in 5.2 using a Student t-distribution with n_t degrees of freedom. In practice it may be wise to protect against outliers and major disturbances, which make the error distributions heavy tailed. Mixture distributions offer one route, O'Hagan [4,6]. However, one simple and effective practical device is to replace the squared error by

min( e_t² , K Y_t ) ,

where K is a suitably chosen constant, larger errors being suspected of corresponding to disturbances. This procedure is easily applied, and experience with both pure time series and regression type models is encouraging. However, because of the skew distribution of the squared errors, it is wise to choose the associated discount factor δ with 0.95 < δ < 1. Further, if the prior of the parameter vector θ is vague, it is recommended that the estimation commences at time n+1, where n is the dimension of the state vector, with positive observations. In stock control the power law is commonly used with b = 0.75. An estimate â_t of a is then derived as

â_t = Z_t / n_t ,   where   Z_t = Z_{t-1} + min( e_t² , K Y_t ) / f_t^{2b}

and n_t counts the equivalent number of observations.
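A minimal sketch of such a power-law estimator follows. The recursions below (equal-weight averaging and the clipping constant K) are assumed illustrative forms, not the thesis's exact definitions.

```python
# Sketch of the power-law variance estimate V_t = a * f_t**(2*b): the
# scale constant a is estimated from clipped squared errors normalised
# by the power-law factor of the forecast mean.
def power_law_estimate(errors, forecasts, b=0.75, K=9.0):
    Z, n, a_hat = 0.0, 0.0, 1.0
    for e, f in zip(errors, forecasts):
        s = f ** (2 * b)                 # power-law scale factor
        u = min(e * e / s, K)            # clip to guard against outliers
        Z, n = Z + u, n + 1.0
        a_hat = Z / n                    # running estimate of a
    return a_hat

# constant forecasts of 1: the estimate reduces to a clipped MSE
a = power_law_estimate([1.0, -1.0, 2.0], [1.0, 1.0, 1.0])
```

With forecasts fixed at 1 the scale factor is 1, so the estimate is simply the clipped mean squared error, here (1 + 1 + 4)/3 = 2.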
A more general procedure for accommodating stochastic scale parameters is as follows; see Ameen (1983 c). Let φ_t be a scale parameter with posterior probability density function (pdf) at time t-1 given by (5.3) and prior pdf for time t given by (5.6). Moreover, let s_{t-1}, m_{t-1} and ŷ_t be the modes of the random variables with pdf's f(φ_{t-1}|D_{t-1}), f(θ_{t-1}|D_{t-1}) and f(y_t|D_{t-1}) respectively. The prior for time t is formed through the power relationship

f(φ_t|D_{t-1}) ∝ { f(φ_{t-1}|D_{t-1}) }^b ,   0 < b <= 1 ,        (5.9)

and, after receiving y_t, the posterior follows from Bayes theorem,

f(φ_t|D_t) ∝ f(y_t|φ_t, D_{t-1}) f(φ_t|D_{t-1}) ,        (5.10)

the comparison with the posterior mode giving the updating in terms of the modes defined above. The formulas (5.9) and (5.10) are exact for normal random vectors and are in contrast with those of (5.1) and (5.5). However, the formulation above goes well beyond the exponential family of distributions and has the key for introducing a constructive dynamic evolution of location and scale parameters in generalised dynamic models.
5.5. SUMMARY:

A proper Bayesian on line estimation procedure for the observation variance is described in 5.2. Approximate non Bayesian procedures are reviewed and generalised in 5.3. The power law is described in 5.4. Outlines are given for a general model in which stochastic scale parameters can be accommodated.
CHAPTER SIX

LIMITING RESULTS

6.1. INTRODUCTION:

There has been a continued interest in deriving limiting variance and adaptive vector equations for constant DLM's {F, G, V, W}. For constant NDBM's {F, G, V, B} these values can be obtained directly. Convergence is often fast and, in order to achieve conceptual and computational economy, previous efforts have been devoted to determining constant DLM's which have limiting representations, Harrison and Akram (1983) and Roberts and Harrison (1984). Similar models and the method of transforming from one similar model to another are defined. Limiting results for the state covariance matrix C_t and the adaptive vector a_t are stated first for particular structured models and then for general constant NDBM's. A limiting representation of the one step ahead forecast errors is obtained for NDBM's.
6.2. SIMILAR MODELS AND REPARAMETRISATION:

One of the objectives of theoretical developments is to obtain unified results that remain meaningful over wide fields of applications. By looking at the most economical and meaningful parametrisations, NDBM's can be brought into canonical forms that ease the task of practitioners in structuring DLM's.
DEFINITION

A constant NWBM {F, G, V, H} is called observable if the observability matrix

T = [ F ; FG ; FG² ; ... ; FG^{n-1} ]

is of full rank.

The observability depends only on the pair (F, G), and hence on the parametrisation of the state vector. Two observable constant models M_1 = {F_1, G_1, V, H_1} and M_2 = {F_2, G_2, V, H_2} are said to be similar if they produce identical forecast distributions.

The importance of finding similar models arises in practice since real life problems are rather complicated in their primary, physically meaningful formulation. Apart from the computational benefits, the primary form provides statistically meaningful relationships among variables, like growth and seasonal components.

THEOREM 6.1.

If M_1 and M_2 are similar observable models, their state vectors are related by the reparametrisation θ^(2) = L θ^(1) with

L = T_2^{-1} T_1 .

PROOF

Since M_1 and M_2 are similar, it follows that F_2 G_2^k L = F_1 G_1^k for every k >= 0, i.e.

T_2 L = T_1 .

From observability, T_2 is nonsingular, and the result follows. That is, similarity produces the reparametrisations θ^(2) = L θ^(1); the specifications of F and G play a physical role in the model specification, and canonical forms are obtained from these ideas.
THEOREM 6.2.

Let λ_1, λ_2, ..., λ_n be the eigenvalues of G, and 0 < β^{1/2}/|λ_i| < 1, i = 1, 2, ..., n. If the constant NDBM {f, G, V, βI} is observable, then

lim {C, R, Y, a}_t = {C, R, Y, a}

uniquely exists, with C and R nonsingular.

PROOF

From the NDBM results, C_t^{-1} = β G'^{-1} C_{t-1}^{-1} G^{-1} + f'f V^{-1}, so that Q_t = C_t^{-1} V satisfies

Q_t = β G'^{-1} Q_{t-1} G^{-1} + f'f = Σ_{j=0}^{t-1} β^j G'^{-j} f'f G^{-j} + β^t G'^{-t} Q_0 G^{-t} .        (6.1)

Hence, using the assumptions 0 < β^{1/2}/|λ_i| < 1, lim Q_t = Q exists,

Q = Σ_{j>=0} β^j G'^{-j} f'f G^{-j} ,

and Q is nonsingular since the observability of {f, G} implies that of {f, G^{-1}}. For uniqueness, assume

S = β G'^{-1} S G^{-1} + f'f .

Therefore S - Q = β G'^{-1} (S - Q) G^{-1}, and successive applications give

S - Q = β^k G'^{-k} (S - Q) G^{-k} → 0   as k → ∞ ,        (6.2)

so that S = Q. Moreover, C = Q^{-1} V and the limits

R = β^{-1} G C G' ,        (6.3)

Y = f R f' + V ,   a = R f' Y^{-1} = C f' V^{-1}        (6.4)

uniquely exist, with C and R nonsingular.
THEOREM 6.3.

Let G = diag{G_1, G_2, ..., G_r} and B = diag{β_1 I_1, β_2 I_2, ..., β_r I_r} with Σ n_i = n, and suppose 0 < β_i^{1/2}/|λ| < 1 for every eigenvalue λ of G_i, i = 1, 2, ..., r. If the NDBM {f, G, V, B} is observable, then lim {C, R, Y, a}_t uniquely exists.

PROOF

The proof is similar to that of Theorem (6.2), knowing that the observability of {f, G} is inherited by the blocks.
In order to have some insight into the sensitivity of these models, consider a DLM

Y_t = θ_t + v_t ,   v_t ~ N[0; V]
θ_t = θ_{t-1} + w_t ,   w_t ~ N[0; W] .

Given the prior (θ_0|D_0) ~ N[m_0; C_0], the posterior state variance C_t and the adaptive coefficient A_t converge to C and A respectively, with C = AV and C = ((W² + 4WV)^{1/2} - W)/2. Now consider a NDBM with the same prior settings and with discount factor

β = 1 - ((1 + 4V/W)^{1/2} - 1) W/(2V) .

This guarantees that the NDBM and the DLM both have the same limiting distribution. However, given any common posterior variance C_{t-1}, if C_{t-1} > C then C_t converges to the limit faster for the NDBM than for the DLM, while if C_{t-1} < C then C_t converges faster for the DLM than for the NDBM. This generalises to higher dimensions.
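This matching can be illustrated numerically. The sketch below uses the assumed values V = 100 and W = 5; both recursions are iterated from a common vague prior.

```python
# Sketch: match a steady DLM {1, 1, V, W} with the NDBM whose discount
# factor gives the same limiting posterior variance C = A*V.
V, W = 100.0, 5.0
A = ((1 + 4 * V / W) ** 0.5 - 1) * W / (2 * V)   # limiting adaptive coeff.
C = A * V
beta = 1 - A                                     # matching NDBM discount

# iterate both recursions from a common prior variance
Cd = Cn = 1000.0
for _ in range(200):
    Rd = Cd + W;       Cd = Rd * V / (Rd + V)    # DLM recursion
    Rn = Cn / beta;    Cn = Rn * V / (Rn + V)    # NDBM recursion
```

Here A = 0.2, C = 20 and beta = 0.8, and both recursions converge to the same limiting posterior variance of 20.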
6.3. A COMMON CANONICAL REPRESENTATION

One of the most common and yet simple canonical forms for observable models is that with system matrix G = diag{λ_1, λ_2, ..., λ_n}, the eigenvalues λ_1, λ_2, ..., λ_n distinct, and f = (1, 1, ..., 1). For this form the following theorem holds.

THEOREM 6.4.

Let {f, G, V, B} be a constant NDBM with G = diag{λ_1, λ_2, ..., λ_n}, all λ_i distinct, B = diag{β_1, β_2, ..., β_n}, and 0 < u_i < 1 where u_i = β_i^{1/2}/λ_i, i = 1, 2, ..., n. Then

i)  lim a_t = a = [a_1, a_2, ..., a_n]' ,   with

a_i = (1 - u_i²) Π_{j≠i} u_j (1 - u_i u_j)/(u_j - u_i) ;

ii)  lim Y_t = Y = V / Π_{i=1}^{n} u_i² ;

iii)  lim C_t = C = {c_ij} = V Q^{-1} ,   where Q = { 1/(1 - u_i u_j) } ;

iv)  W = HCH' - GCG' = {w_ij} ,   with H = B^{-1/2} G , so that

w_ij = c_ij (1 - (β_i β_j)^{1/2}) / (u_i u_j) .
PROOF

From Theorem 6.2 or 6.3, lim {C, R, Y, a}_t = {C, R, Y, a} exists, with

C = (I - af) R        (6.5)

and Q = C^{-1} V satisfying

Q = B^{1/2} G'^{-1} Q G^{-1} B^{1/2} + f'f ,   Q = {q_ij} .        (6.6)

Since G and B are diagonal, q_ij = u_i u_j q_ij + 1, so that

q_ij = 1/(1 - u_i u_j) .        (6.7)

From (6.4), a = C f' V^{-1} = Q^{-1} f', so that a = [a_1(n), a_2(n), ..., a_n(n)]' solves

Q a(n) = 1 .        (6.8)

For n = 1, (6.8) gives a_1(1) = 1 - u_1². For n >= 2, eliminating between the rows of (6.8) and substituting back gives, by induction on n,

a_i(n) = (1 - u_i²) Π_{j≠i} u_j (1 - u_i u_j)/(u_j - u_i) ,   i = 1, 2, ..., n ,

which establishes i).

ii)  From (6.4), V = Y - f R f' = (1 - fa) Y, and since

1 - Σ_{i=1}^{n} a_i = Π_{i=1}^{n} u_i² ,

the result follows.

iii)  follows from (6.7), since C = Q^{-1} V, and iv) is easily derived from the definition of W.
COROLLARY 6.4.1.

If β_k = β for all k, then (i) reduces to the EWR result of Dobbie (1963).

The theorem is of practical interest mainly for periodic models with distinct real or complex eigenvalues lying inside or on the unit circle. For a real observation series, a similar model with real G would be adopted and the corresponding limiting values easily derived. For example, if

G = [ cos ω  -sin ω ;  sin ω  cos ω ] ,

an alternative NDBM can be considered with

G* = [ e^{iω}  0 ;  0  e^{-iω} ] .
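The limiting formulas of Theorem 6.4 can be checked numerically by iterating the NDBM recursions. In this sketch the eigenvalues, discount factor and V are arbitrary assumed values.

```python
import numpy as np

# Numerical check of Theorem 6.4 for G = diag(l1, l2), f = (1, 1) and a
# common discount beta: iterate the NDBM recursions to the limit and
# compare with the closed forms in u_i = sqrt(beta)/l_i.
l = np.array([1.0, 0.9]); beta, V = 0.8, 1.0
G = np.diag(l); f = np.ones((1, 2))
C = np.eye(2) * 1e3
for _ in range(2000):
    R = G @ C @ G.T / beta            # R = B G C G' B with B = beta**-0.5 I
    Y = (f @ R @ f.T).item() + V
    A = R @ f.T / Y
    C = R - (A @ A.T) * Y
u = np.sqrt(beta) / l
Y_limit = V / np.prod(u) ** 2
a1 = (1 - u[0] ** 2) * u[1] * (1 - u[0] * u[1]) / (u[1] - u[0])
```

After convergence, the iterated one-step forecast variance agrees with Y = V / (u1 u2)² and the first adaptive coefficient with the product formula of part i).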
THEOREM 6.5.

A constant NWBM and any model similar to it have identical limiting characteristic polynomials det( (I - aF)G - λI ).

PROOF:

Since lim C_t = C is nonsingular, from the NWBM properties we have

C = (I - AF) R ,   R = HCH' .

This gives

C H'^{-1} C^{-1} H^{-1} = (I - AF) .        (6.13)

The result follows from (6.13), since similarity transformations preserve the characteristic polynomial.

The above results can be used to calculate the limiting adaptive coefficients for any observable NWBM whose state covariance matrix converges to a nonsingular limit. In particular, they apply for NDBM's {f, G, V, βI} such that 0 < β^{1/2}/|λ_i| < 1.
COROLLARY 6.5.1.

Let G = J(λ), a single Jordan block with eigenvalue λ, and 0 < β^{1/2} < λ.

PROOF

It is easily seen that the above NDBM is observable. Since 0 < β^{1/2} < λ, using Theorem (6.2), lim {C, a}_t = {C, a} both uniquely exist and C is nonsingular. From Theorem (6.5), the eigenvalues of (I - af)G are all equal to β/λ, so that

det( xI - (I - af)G ) = (x - β/λ)^n .        (6.14)

The limiting adaptive coefficients a_i, i = 1, 2, ..., n, follow from the comparison of the coefficients of each power of x in this equation.

COROLLARY 6.5.2.

If G is the upper triangular matrix with entries 1 on and above the diagonal, f = (1, 0, ..., 0) and 0 < β < 1, then, following the steps as in the proof of Corollary 6.5.1, the alternative form of (6.14) in the variable z is

(1-z)^n - a_1 (1-z)^{n-1} + a_2 z (1-z)^{n-2} + ... + (-1)^n a_n z^{n-1} = (β - z)^n ,        (6.15)

and comparison of the coefficients of each power of z gives the a_i; in particular, for n = 2, a_1 = 1 - β² and a_2 = (1 - β)².

COROLLARY 6.5.3.

In Corollary 6.5.2, if f is replaced by f = (1, 1, ..., 1), then similar calculations give the alternative form of (6.15) as

Σ_{k=0}^{n} (-1)^k z^k (1-z)^{n-k} Q_k = (β - z)^n ,

where the Q_k are linear combinations of 1, a_1, ..., a_n, and the a_i again follow by comparison of coefficients.

COROLLARY 6.5.4.

If G = [ 0  1 ;  1  0 ] and f = (1, 0), with 0 < β < 1, then a_1 = 1 - β² and a_2 = 0.
Now, for a given NDBM, the restriction that λ_1, λ_2, ..., λ_n be distinct is not necessary. Denote the limiting state covariance matrix by C and the limiting adaptive vector by a.

THEOREM 6.6.

Given any constant observable NWBM {f, G, V, H} for which lim C_t = C is positive definite,

lim { Π_{i=1}^{n} (1 - λ_i B) y_t - Π_{i=1}^{n} (1 - ρ_i B) e_t } = 0 ,

where B here denotes the backward shift operator, λ_i are the eigenvalues of G and ρ_i those of (I - af)G.

PROOF.

Since lim C_t = C is positive definite, lim {R, a}_t = {R, a} exists and R is nonsingular. Let ρ_i, i = 1, 2, ..., n, be the eigenvalues of C R^{-1} G = (I - af)G.
A direct application of Bayes theorem in updating (4.1) and (4.3), with univariate observations, gives as t → ∞

m_t = G m_{t-1} + a e_t        (6.16)

or

m_t = (I - af)G m_{t-1} + a y_t .        (6.17)

With e_{t+1} = y_{t+1} - f G m_t, we have

e_{t+1} = y_{t+1} - f G (I - B(I - af)G)^{-1} a B y_{t+1} ,

hence

e_{t+1} = ( 1 - B f G (I - B(I - af)G)^{-1} a ) y_{t+1} .        (6.18)

Note that det(I - BG) and det(I - B(I - af)G) are

Π_{i=1}^{n} (1 - λ_i B)   and   Π_{i=1}^{n} (1 - ρ_i B)        (6.19)

respectively. Clearing the inverse in (6.18) gives polynomial operators P_1(B) and P_2(B) with

P_1(B) e_{t+1} = Π_{i=1}^{n} (1 - λ_i B) y_{t+1} ,        (6.20)

Π_{i=1}^{n} (1 - ρ_i B) e_{t+1} = P_2(B) y_{t+1} ,        (6.21)

and

Π (1 - λ_i B) Π (1 - ρ_i B) = P_1(B) P_2(B) .

The result follows using the factorisation theorem. The same result was obtained through special DLM's by Harrison and Stevens (1975, 1976), that is with B = I. The restriction can be relaxed, since (6.16) and (6.17) can also be obtained under minimum variance linear unbiased estimation; hence the results may be extended beyond the Bayesian formulation.
6.4. A GENERAL LIMITING THEOREM:

Any constant observable NWBM {f, G, V, B}, with B = diag{β_1, β_2, ..., β_n}, is similar to the canonical constant model M* = {f*, G*, V, B} in which, writing u_i = β_i^{1/2}/λ_i,

f*_j = 1 if j = 1 ,   f*_j = 0 otherwise ,

and G* = {a_ij} is upper triangular with a_ij = u_i for j >= i and a_ij = 0 otherwise. It follows, as in Theorem 6.3, that lim C_t = C exists, and that Q = C^{-1} V satisfies the equation

Q = H'^{-1} Q H^{-1} + f*' f* ,   Q = {q_ij} ,

with H = B^{-1/2} G*, whose elements q_ij can be obtained recursively in terms of the factors (u_i u_j - 1). It follows that

i)  C = Q^{-1} V ;

ii)  a = Q^{-1} f*' ,   with   a_1 = 1 - Π_{i=1}^{n} u_i² ;

iii)  Y = V/(1 - a_1) = V / Π_{i=1}^{n} u_i² ;

iv)  W = HCH' - GCG' .
6.5. RELATIONS WITH BOX-JENKINS MODELS:

From Theorem 6.6, in the limit the realised series satisfies

Π_{i=1}^{n} (1 - λ_i B) y_t = Π_{i=1}^{n} (1 - ρ_i B) e_t ,

with lim a_t = a, so that, applying the appropriate Dynamic Model {f, G, V, B} to the realised series, the Box-Jenkins forecast function is equivalent to that of the Dynamic Model. For an unbalanced ARIMA process

Π_{i=1}^{p} (1 - λ_i B) y_t = Π_{i=1}^{q} (1 - ρ_i B) a_t ,

let n = max{p, q}. Setting n = p (or n = q) by taking p - q of the ρ_i (or q - p of the λ_i) to be zero, the forecast functions of all ARIMA processes can be reproduced. If a vague prior variance is taken as the original prior variance, then the forecast functions can be identical to those of ARIMA models all the way through the sequential analysis. However, as stated earlier, the NWBM incorporates the parameter information in a sensible way. This simplifies explaining and controlling the process and model behaviour.
6.6. SUMMARY:

This chapter is concerned with the derivation of some interesting limiting results for the adaptive coefficients, the state covariance matrices and the forecast error variance. In particular, some well known and simple canonical forms are obtained within classes of similar transformations. The limiting relationship with Box-Jenkins predictors is discussed in 6.5.
CHAPTER SEVEN

MULTIPROCESS MODELS WITH CUSUMS

7.1. INTRODUCTION:
:
data sets are based on the assumptions collected and well behaved. properties that the input in practice, Often the
Many: analyses of statistical data is free from exceptions, it is hard to believe that data contains missing
properly
However,
can be guaranteed.
values,
the occurrence of any of these events in sequential and damages the available prior information as In This various occur
breakdown
These events call for model revision and amendments. ' Management producing by Exception routine ' is widely applied. by
forecasting, constitutes
mathematical
methods
forecasts
required
decision makers.
These forecasts
circumstances
of a major
change arising from the use of reliable market (1967)) Harrison and or, to the occurrence of
and Scott(1965)
of demand which causes unusual forecasting errors A flowchart of the principle is given in Fig. 7.2.
A Management by Exception Forecasting System (Fig. 7.2)

[Fig. 7.2: regular data and market information flow to the market department, which vets routine forecasts and issues forecasts and exception signals to user departments, e.g. stock control, production planning and purchasing, market planning, budgeting and control.]
In this chapter, efficient statistical models are introduced to deal automatically with exceptions, Ameen and Harrison (1983 c). Section 2 reviews the historical background and developments. The backward CUSUM is described in Section 3 and discounting in Section 4. In Section 5, the ideas from the backward CUSUM and the multiprocess models of Harrison and Stevens, together with the Modified NDBM's, are combined to provide economical and efficient models called multiprocess models with CUSUM's, Ameen (1983 a).
7.2. HISTORICAL BACKGROUND AND DEVELOPMENTS:

CUSUM tests to detect changes in process behaviour were given by Page (1954), Barnard (1959), Woodward and Goldsmith (1964) and Ewan (1963). Harrison and Davies (1964) used CUSUM's for forecasting; these are reviewed in more detail in Section 3, where the data storage problems are also reduced. More on the CUSUM statistic can be found in Van Dobben De Bruyn (1968) and Bissell (1969). (For general sequential tests see Wald (1947)).

Previously, having detected a change, ad hoc intervention procedures were applied. The first routine computer forecasting systems for stock control and production planning employed Exponentially Weighted Moving Averages (EWMA) and Holt's linear growth model, with or without seasonal components. All the methods used limiting predictors, which assume a reasonably long history of well behaved data. The occurrence of a major change means that, in some respects, the current data does not reflect a well behaved process and that there is greater uncertainty than usual about the future. Hence the next data points will be very informative in removing much of this increased uncertainty, and should be given more weight than they would be allocated by the limiting predictor.
Consider, for example, the limiting predictor

m_t = m_{t-1} + 0.2 e_t ,   e_t = y_t - m_{t-1} ,

where y_t is the demand observed for period t. Suppose that in the limiting case m_t = 100, with one step ahead forecast variance V(e_t) = Var(Y_t|D_{t-1}) = 125, and that there is an associated CUSUM signal. The Marketing department asserts that the level is not now 100 but 150, and that the variance associated with this estimate is not 25 but 300. In the past there was no formal way of dealing with this; classical time series methods, derived under assumptions of stationarity, are inappropriate. One early ad hoc response was to boost the adaptive coefficient temporarily,

a_{t+i} = (9-i)/10   if i = 1, 2, ..., 6 ;   a_{t+i} = 0.2   if i > 6 .

This approach is not very satisfactory, nor does it generalise well in dealing with other kinds of change. The DLM's of Harrison and Stevens (1971, 1976) and the NWBM's introduced in Chapter 4 provide a formal way of combining subjective judgements and data. In the example above the adopted DLM is

Y_t = θ_t + v_t ;   v_t ~ N[0; 100]
θ_t = θ_{t-1} + w_t ;   w_t ~ N[0; W_t]

where θ_t represents the underlying market level at time t, usually with W_t = 5, so that in the limit (θ_t|D_t) ~ N[m_t; 20] and (Y_{t+1}|D_t) ~ N[m_t; 125]; the limiting recurrence relationship is m_t = m_{t-1} + 0.2 e_t. When the market information arrives it is communicated as w_{t+1} ~ N[50; 280], giving (θ_{t+1}|D_t) ~ N[150; 300] and (Y_{t+1}|D_t) ~ N[150; 400]. Immediately afterwards the adaptive coefficient returns towards its limiting value of 0.2. Clearly, the same results can be achieved using an NDBM with discount factor β = 0.8, where at the intervention time its value is temporarily reduced to β = 0.066, together with adjusting the state prior mean from 100 to 150. Other related works are those of Kalman (1963), Smith (1979) and Harrison and Akram (1983).
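The numbers in this example can be reproduced directly; a small sketch, with the routine and intervention discount factors as given above.

```python
# Steady example sketch: routine discount 0.8 (C = 20, R = 25, V = 100);
# at intervention the discount is cut to about 0.066 for one step, so the
# prior variance jumps to roughly 300, matching the market assessment.
C, V = 20.0, 100.0
R_routine = C / 0.8                       # 25
R_interv = C / 0.066                      # about 303
A_routine = R_routine / (R_routine + V)   # limiting adaptive coeff. 0.2
A_interv = R_interv / (R_interv + V)      # about 0.75 for one step
```

The intervention step therefore weights the next observation about 0.75 instead of 0.2, after which the discount reverts to 0.8 and the scheme settles back to its limiting behaviour.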
Bayesian forecasting provides a means of dealing with specified types of major changes. These forms of change are modelled so that the forecasting system deals with them in a prescribed way. The initial implementation of the resulting multiprocess models is described in Harrison and Stevens (1971, 1975, 1976). However, they involve considerable computation, and have limitations and drawbacks in distinguishing between competing state models. Restricting these models to monitoring is one route to success: Gathercole and Smith (1983) reduce the computation by discarding redundant models according to some prespecified rules. For another development see Smith and Makov (1983). In general practice, CUSUM methods remain a simple and effective monitoring device:
Control charts provide simple visual checks for detecting departures from expected behaviour, and are widely used in quality control. Cumulative Sum charts further indicate the amount and direction of any change in level; on the use of the Average Run Length to assess the sensitivity of such schemes, see Barnard (1959), Woodward and Goldsmith (1964) and Ewan and Kemp (1960).

Given a target value T and the observed process y_t, write e_t = y_t - T. The CUSUM statistic S_t is defined for each time t as

S_t = Σ_{i<=t} e_i ,

and visual inspection is aided by a V-mask: a piece of cardboard with a V-shaped hole cut out, placed on the graph with the vertex pointed horizontally at a distance (L_0/a) + 1 from the leading point, the two arms being apart with angle 2ψ where tan ψ = a. No change is signaled while the curve remains inside the V-mask. Harrison and Davies (1964) used the scheme for monitoring forecasts of product demand: the target is the one step ahead point forecast, so that the e_t series is that of the one step ahead forecast errors. A simple and economical algorithm avoiding the storage of the whole CUSUM is the backward CUSUM. Define

a_{t+1} = min{L_0, a_t} + a - e_{t+1}/Y_{t+1}^{1/2}        (7.1)

b_{t+1} = min{L_0, b_t} + a + e_{t+1}/Y_{t+1}^{1/2}        (7.2)

where Y_{t+1} is the one step ahead forecast variance. A change is signaled if and only if min{a_{t+1}, b_{t+1}} < 0. Initially a_0 = b_0 = L_0. In choosing L_0 and a, the following facts may be used as guide lines.

THEOREM 7.1.

Given that the V-mask has not signaled a change at time t, then for time t+1:

i)  a change is signaled by (7.1) and (7.2) if and only if it is signaled by the V-mask;

ii)  if no change is signaled, substituting in each of (7.1) and (7.2), it follows that a_{t+1} > 0 and b_{t+1} > 0, so that the scheme may be continued without storing the full CUSUM.
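The backward CUSUM recursions can be sketched as follows; the slack constant, reset rule and test series below are illustrative assumptions.

```python
# Backward-CUSUM sketch (assumed form of recursions (7.1)-(7.2)):
# two one-sided sums of standardised forecast errors, reflected at L0;
# a change is signalled when either side drops below zero.
def cusum_monitor(errors, Ys, L0=5.0, slack=0.5):
    a = b = L0
    signals = []
    for e, Y in zip(errors, Ys):
        z = e / Y ** 0.5                 # standardised forecast error
        a = min(L0, a) + slack - z       # guards against upward drift
        b = min(L0, b) + slack + z       # guards against downward drift
        signals.append(min(a, b) < 0)
        if signals[-1]:
            a = b = L0                   # reset after a signal
    return signals

sig = cusum_monitor([0.0] * 5 + [3.0] * 5, [1.0] * 10, L0=2.0, slack=0.5)
```

On this series the monitor stays quiet over the five on-target errors and signals on each of the five three-sigma errors.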
7.4. NORMAL WEIGHTED BAYESIAN MULTIPROCESS MODELS:

Following the approach proposed by Harrison and Stevens, at each time t the process is represented by one of N models M_t^(1), ..., M_t^(N), where each M_t^(i) is a NWBM. Let p_t^(i) be the posterior probability of model i at time t, and let π^(i) = (π_1^(i), ..., π_N^(i)) be the model transition probability vector, with Σ_i π_i^(j) = 1 and

π_i^(j) = P[ M_t^(i) | M_{t-1}^(j) ] ,

that is the probability that model i operates at time t given that model j was operational at time t-1. Initially the π^(i)'s and p_0^(i)'s are assumed known, although in practice the π's were estimated on-line. Given the N posterior distributions at time t-1, the conditional one step ahead forecasts are:

(Y_t | M_t^(i), M_{t-1}^(j), D_{t-1}) ~ N[ ŷ_t^(ij); Y_t^(ij) ]
where ŷ_t^(ij) = F^(i) G^(i) m_{t-1}^(j) and Y_t^(ij) = F^(i) R_t^(ij) F^(i)' + V^(i). The one step ahead forecast distribution is thus expressed as a mixture of N² normals,

(Y_t|D_{t-1}) ~ Σ_{i,j} π_i^(j) p_{t-1}^(j) N[ ŷ_t^(ij); Y_t^(ij) ] .

Also, given the N posterior models at time t-1, N² prior models are produced for time t; the N² posterior models for time t follow, given the data at time t and using Bayes theorem, with probabilities

p_t^(ij) ∝ L( y_t | M_t^(i), M_{t-1}^(j), D_{t-1} ) π_i^(j) p_{t-1}^(j) ,

where L(·) denotes the appropriate normal likelihood. In practice, in order to keep the computations manageable, the same collapsing procedure as defined by Harrison and Stevens is used to complete the cycle, calculating
p_t^(i) = Σ_{j=1}^{N} p_t^(ij) ,

m_t^(i) = Σ_{j=1}^{N} p_t^(ij) m_t^(ij) / p_t^(i) ,

C_t^(i) = Σ_{j=1}^{N} p_t^(ij) { C_t^(ij) + (m_t^(ij) - m_t^(i))(m_t^(ij) - m_t^(i))' } / p_t^(i) .

As in partitioned models, Harrison and Stevens' NWB multiprocess models are applied interactively.
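The collapsing step can be sketched as follows; the probabilities, means and covariances below are hypothetical values used only to exercise the formulas.

```python
import numpy as np

# Harrison-Stevens collapsing step: reduce the N^2 posterior mixture
# components (i = current model, j = previous model) to N, preserving
# each current-model marginal mean and covariance.
def collapse(p_ij, m_ij, C_ij):
    N = p_ij.shape[0]
    p = p_ij.sum(axis=1)                          # P(M_t^(i) | D_t)
    m, C = [], []
    for i in range(N):
        w = p_ij[i] / p[i]
        mi = sum(w[j] * m_ij[i][j] for j in range(N))
        Ci = sum(w[j] * (C_ij[i][j]
                 + np.outer(m_ij[i][j] - mi, m_ij[i][j] - mi))
                 for j in range(N))
        m.append(mi); C.append(Ci)
    return p, m, C

p_ij = np.array([[0.3, 0.3], [0.2, 0.2]])
m_ij = [[np.array([0.0]), np.array([2.0])],
        [np.array([1.0]), np.array([1.0])]]
C_ij = [[np.eye(1), np.eye(1)]] * 2
p, m, C = collapse(p_ij, m_ij, C_ij)
```

For the first model, the two equally weighted components at 0 and 2 collapse to mean 1 with variance 2 (the unit component variance plus the spread of the means), while the second model's identical components collapse unchanged.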
Two classes of multiprocess model can be distinguished. Class I is used on-line for model discrimination, estimation and hypothesis testing; Class II models are used for modelling some prescribed alternative cases, such as outliers and changes in level, slope and seasonality, as explained earlier for multiprocess applications with NWBM's. The former has worked well in many applications. However, sharp changes in slope have not generally been modelled as well as would be desired, since slope changes are sometimes small compared with the other components; Brown (1983) has also commented on this problem, as have Cook (1980) and Harrison and Davies in related contexts.

A further criticism of these multiprocess models is that they involve unnecessary computations. These problems are overcome by introducing a new class of multiprocess models, using a combination of Class I and Class II models with the Backward CUSUM statistic as a control device for shifting model operations from Class I to Class II models. Class I is retained while the CUSUM gives no signal.
7.5. MULTIPROCESS MODELS WITH CUSUMS:

In this scheme there is a preferred model M^(1), called the mother model, which describes the process when it behaves in an expected way. The analogy with quality control is clear, Ameen and Harrison (1983). The other models M^(i), i >= 2, describe particular significant departures, such as outliers or mavericks and significant changes in the model components.

In the new approach the mother model M^(1) is represented by a NWBM {F, G, V, H}. This model produces forecasts which are used unless a departure from normal is signaled by the CUSUM scheme, which operates on the one-step-ahead forecast errors. Then, starting with the latest observation which helped to trigger the signal, the competing models are operated in a multiprocess Class II way, with a vector of transition probabilities; once the change is identified, the mother model is reinstated and the CUSUM scheme is reset.

For example, consider G = diag{G_1, G_2} in which G_1 represents a trend component and G_2 a seasonal component. Let model M_2 model outliers, M_3 model major changes in trend, and M_4 model major changes in seasonality.
The priors for time t+1 are then formed from the posteriors as follows. For M^(i),

(θ_{t+1} | M_{t+1}^(i), D_t) ~ N[ G m_t^(1); R_{t+1}^(i) ] ,

where the diagonal blocks of R_{t+1}^(i) are

R_k^(i) = G_k C_k G_k' / β_k^(i) ,   k = 1, 2 ,

with the discount factors chosen so that

0 < β_k^(i) < β_k^(1) < 1

for the component that model i represents as changing, and β_k^(i) = β_k^(1) otherwise. The components corresponding to the change are thus given increased uncertainty, while the estimates of the unchanged components are kept as stable as possible, in order to prepare good estimates of the new structure and to avoid violent fluctuations between the model components. In addition, a set of preparatory transition probabilities is specified so that the marginal posterior model probabilities characterising the change continue to be taken from the mother model. In order to exercise control over the response of the models to exceptional events, a guard variance is set for the alternative models: the one-step-ahead forecast variances are fixed at prespecified multiples of Y^(1), from which R^(2) and V^(2) can be calculated. The outlier model, for instance, takes Y^(2) as a prescribed multiple of Y^(1), and this gives Y^(3) = Y^(2) and Y^(4) = Y^(2) in the example above.

If, during the single model phase, the CUSUM signals a change, the required observation variance V may be estimated on-line using either of the methods described in Chapter 5. During the multiprocess phase, the forecast distribution is the mixture of Normals, summarised following the approach introduced by Lindley (1976).
7.6. SUMMARY:

The principle of Management by Exception in forecasting systems is discussed, and a historical background is given in Section 2. The backward CUSUM statistic is reviewed in Section 3 and the NWB multiprocess models in Section 4, in the light of the Harrison and Stevens multiprocess models. Finally, Section 5 introduces the multiprocess models with CUSUM's, illustrated with a trend and seasonal model.
CHAPTER EIGHT

APPLICATIONS

8.1. INTRODUCTION:

This chapter is devoted to applications of the developed theory in a variety of situations. The superposition principle is used throughout, since any multivariate model can be decomposed into a combination of component models, with each block of the state vector associated with a meaningful random quantity.
The models take G = diag{G_1, G_2, ..., G_r} and B = diag{β_1 I_1, β_2 I_2, ..., β_r I_r}, each G_i with distinct eigenvalues and I_i of the corresponding dimensionality. For cyclical processes, where the eigenvalues of G occur in complex conjugate pairs, the adopted form is

G_i = λ [ cos ω  -sin ω ;  sin ω  cos ω ] .

This could represent a damped sine wave of period 2π/ω, and would have a single associated discount factor β, 0 < β <= 1. The discount factors used here are not chosen according to any optimisation criterion.
Trigg and Leach (1967) have attempted to make the smoothing constant a specific function of the sign and absolute size of the one step ahead forecast errors. For exponential discounting with factor β, 0 < β < 1, β^k is the weight corresponding to a data point that is k periods old. Comparing the average age of the data used by an N-point moving average,

(1/N) Σ_{i=0}^{N-1} i = (N-1)/2 ,

to that of the exponentially weighted model, Montgomery and Johnson (1976) have obtained the relationship

β = (N-1)/(N+1) .

Using this relation, Agnew (1982) suggested that 0.33 <= β <= 0.78. Clearly, such low values of β give highly adaptive models with large lead-time forecast variances, which would be totally unsuitable in applications such as stock control and production planning, where data history is informative over longer periods. This leads to higher β values, and the discount factors here are chosen more conservatively.
the discontinuity
Experience shows model robustness against this choice. In modelling protect model discontinuity periods, the Modified from NDBM 's are used in order to
components
information
unwanted
interactions
For a straightforward
application of a single NDBM to a data series which exhibits US data the air passengers see set which is analysed in
Chapter 2. Other selected series considered here , are : i) A simulated seasonal series with trend, level, seasonal changes , outliers and in is This the to the these performance examine phase of missing observations. discontinuities and major changes knowing the true underlying model. The
prescription
multiprocess models.
iii)
For a typical data set with an unknown and variable observation variance, the Road Death Series is chosen and the CUSUM multiprocessor is applied with an the variance. observation of estimation on-line
8.2. SIMULATED SERIES

In order to examine model performance in the phase of major changes and impulses, with a minimum risk of misspecification, artificial data is generated. The data is analysed using both an automatic method and intervention. For an automatic way of dealing with these changes, a multiprocess model is used. Automation in analysing statistical data sets may not be a desirable property to aim for from a Bayesian point of view; however, multiprocess applications are valuable in the automatic detection of exceptions.
8.2.1. Simulation
The artificial data is simulated by the superposition of three component series. These are an independent random noise v_t, a linear trend component θ'_{1,t} = (w_1, w_2)_t and a cyclic component of period 12, represented by a single harmonic, θ'_{2,t} = (w_3, w_4)_t. The simulation is carried out using the model:
Y_t = f'θ_t + v_t,
where the components evolve via the transition matrix C = diag{C_1, C_2}, with
C_1 = [1 1; 0 1]
and C_2 the rotation matrix of a single harmonic of period 12.
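The superposition just described can be sketched as follows; this is a hypothetical reconstruction, not the thesis's program, and the component values (level, growth, amplitude, noise standard deviation) are arbitrary illustrative choices:

```python
# Simulate a series as: linear trend + single harmonic of period 12 + noise,
# i.e. Y_t = f' theta_t + v_t with transition C = diag{C1, C2}.
import math
import random

omega = 2 * math.pi / 12                      # one harmonic, period 12
C1 = [[1.0, 1.0], [0.0, 1.0]]                 # linear growth block
C2 = [[math.cos(omega), math.sin(omega)],     # harmonic rotation block
      [-math.sin(omega), math.cos(omega)]]

def step(M, v):
    """Multiply a 2x2 block by a state vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def simulate(n=120, level=200.0, growth=2.0, amplitude=50.0, sd=20.0, seed=1):
    rng = random.Random(seed)
    trend, cycle = [level, growth], [amplitude, 0.0]
    ys = []
    for _ in range(n):
        # f' = (1, 0, 1, 0): observe level plus harmonic plus noise v_t
        ys.append(trend[0] + cycle[0] + rng.gauss(0.0, sd))
        trend, cycle = step(C1, trend), step(C2, cycle)
    return ys

series = simulate()
print(len(series))   # 120 monthly values
```

The injected discontinuities of the text (level jumps, outliers, missing values) would then be superimposed on such a base series.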
Accordingly, the following discontinuities are introduced:
i) an observation change at t=32,
...
iv) a change following t=85.
8.2.2.1 INTERVENTION:
Intervention involves changing a routine or existing probability model, often by introducing changes through transfer distributions. In Bayesian Dynamic Models, intervention is achieved through distributions which not only introduce an expected effect but also introduce an extra uncertainty associated with the change. The object of structuring a model is to enable changes to be made to particular model components in such a way that leaves other components largely unaffected. In the following example, a useful way in which additional uncertainty can be specified through the discount factors is illustrated, replacing the role of the state random noise w_t.
For the data simulated in 8.2.1, the NDBM {f, C, V, B_t} is applied, with f and C as defined there and B = diag{β_1, β_2, β_3, β_4} apart from at intervention times. The values β_1 = β_2 = 0.9 and β_3 = β_4 = 0.95 are used. No attempts are made to search for 'optimal' discount factors; these values are assumed to be appropriate, bearing in mind that it is usually preferable to err on the conservative side when choosing discount factors, Harrison (1967). Initially the same values are adopted, but at the times where changes (i) to (v) are assumed to occur, a vague prior covariance matrix 2000 I is used.
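A minimal sketch of this mechanism for a one-parameter model follows (illustrative values throughout; the routine discount prior R_t = C_{t-1}/β is simply replaced by the vague variance 2000 at a declared intervention time):

```python
# One NDBM prior-to-posterior cycle for a single (level) parameter:
# routinely the prior variance is the discounted posterior variance,
# R = C/beta; at an intervention time it is inflated to a vague 2000.

def ndbm_step(m, C, y, V=400.0, beta=0.9, intervene=False):
    """Return updated (mean, variance, one step ahead error)."""
    R = 2000.0 if intervene else C / beta   # discounting or vague prior
    e = y - m                               # one step ahead forecast error
    Q = R + V                               # one step ahead forecast variance
    A = R / Q                               # adaptive coefficient
    return m + A * e, R * V / Q, e

m, C = 0.0, 2000.0
for t, y in enumerate([10.0, 11.0, 9.5, 30.0]):
    m, C, e = ndbm_step(m, C, y, intervene=(t == 3))  # declare a change at t=3
print(round(m, 1))
```

Because the vague prior makes the adaptive coefficient close to one, the posterior mean moves most of the way towards the discrepant observation, which is exactly the intended effect of intervention.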
Foreknowledge of the changes may be available only as imprecise descriptions of what the model components may provide, such as the sharp growth jump at t=61. At these intervention times a Modified NDBM is used, as described earlier.
The simultaneous sudden change in trend and seasonality is also accommodated. The three missing observations for t = 113, 114 and 115 are dealt with by taking (θ_t | D_112) as the prior parameter distribution, with the corresponding one step ahead forecast distributions following in the usual way. Since V = 400, the limiting Mean Absolute Deviation of the one step ahead forecast errors, without discontinuities, would be about 18.7.
The performance in terms of the yearly one step ahead MAD's is:
YEAR  1    2    3    4    5    6    7    8    9    10
MAD  39.5 20.3 20.5 20.3 15.0 17.2 17.4 20.1 21.5 15.8
[Figure: the simulated series observations]
8.2.3. Multiprocess Models
For an automatic way of dealing with the major changes in the series, it is assumed that the series is monthly with an additive linear growth and one harmonic seasonal component. Given the possible changes in the series, which could be trend changes, seasonal changes and/or outliers, combinations of them (2^3) should in principle be considered. However, the computational load grows with the number of models considered. This suggests that a fewer number of alternative models should be considered. For a rough comparison with the intervention results, an NDB multiprocess model is constructed.
A basic or mother model is defined as in the basic intervention case, together with a trend change model, an outlier model with discount factor β_2 = 0.21, and a variance controlling factor. The model prior probabilities are taken as 0.8, 0.095, 0.1 and 0.005 respectively. Given the same initial settings as in the intervention case, the data is then analysed using NDB multiprocess models with and without a CUSUM controller. In the latter case, a threshold L_0 = 2.0 and a = 0.5 are taken, with a probability of 0.8 for switching back to single model operation.
As is to be expected, the multiprocess model, unlike the intervention model, takes time to recognise changes and to make proper model adjustments. The errors observed for years 3, 4 and 6 are associated with the level and growth changes, while trend change and seasonal change are modelled together in the selection of the alternative models.
A summary of the model selection is given in Fig. 4, and Fig. 5 presents that with CUSUM's. After removing the errors at the missing observations, the performance in terms of MAD for each year in both cases is given in the following table for comparison.
Multiprocess Models - Simulated Data (with and without CUSUM's)
YEAR             1    2    3    4    5    6    7    8    9    10
without CUSUM's 32.8 21.3 34.7 40.4 15.6 22.5 16.5 26.3 20.9 19.3
with CUSUM's    26.9 17.9 23.9 29.1 13.9 28.0 14.9 32.9 22.4 17.5
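For reference, the MAD figures in these tables are simply yearly averages of the absolute one step ahead forecast errors:

```python
# Mean Absolute Deviation of one step ahead forecast errors.

def mad(errors):
    return sum(abs(e) for e in errors) / len(errors)

print(mad([3.0, -1.0, 2.0, -2.0]))   # 2.0
```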
It can be seen that the multiprocess model with CUSUM's is to be preferred in terms of both performance and computer storage and running time. Moreover, it is interesting to note that the order of preference among the available models is in accordance with the amount of information to be used for the model construction. The intervention models would be the best choice when a preferred model is known, with the multiprocess models with CUSUM's a second choice. Finally, if no information about a preferred model, the types of change and their occurrence times is known, the multiprocess model without the CUSUM statistic would be the candidate.
[Fig. 4 and Fig. 5: observations and multiprocess model selection for the simulated series, without and with CUSUM's]
8.3.1. The Data:
A medical data set giving the number of prescriptions for five years from 1966. A constant observation variance is reasonably assumed. Changes in prescription charges in June 1968 caused a major change, and an influenza epidemic in 1970 'caused' an outlier. In multiprocess modelling, these unobserved events are not known in advance and are detected automatically. The data is analysed using Modified NDB multiprocess models, without and with CUSUM's, together with the initial prior information:
22 0; 000
10 00 00.25
0 251,1
Two types of major disturbances are considered, namely sharp trend changes and outliers.
8.3.2. NDBM Multiprocess Models - Known Observation Variance:
Here the routine model has a linear growth and full seasonal components with discount factors β_1 = 0.95 and β_2 = 0.975. The discount factors for the corresponding trend change model are (β_1, β_2) = (0.02, 0.975). The model prior probabilities are constant throughout, so that
π_2 = P{M_{t+1}(2) | M_t(j), D_t} = 0.05,  π_3 = P{M_{t+1}(3) | M_t(j), D_t} = 0.025,  j = 1, 2, 3,
where the M_t(j)'s denote the Mother, Trend change and Outlier models. The observation variance is estimated as 0.36. The model information and the distribution (θ_t | M_t(j), D_t), that at time t model j operates, are updated according to the NDBM rules.
In order to control the response of the models to the outliers, it is assumed that γ_2 = γ_3. Point predictions are obtained using the approximation introduced by Lindley, and are plotted along with the observations and the percentage one step ahead forecast errors. The first 12 observations are used for seasonal pattern recognition. At month 24 the model has recognised a minor change; the change in prescription charges at month 30 has caused a negative error, classified between an outlier and a sharp trend change; and the influenza epidemic at month 48 is properly identified as an outlier. The performance in terms of MAD, for the last four years, is tabulated in Section 8.3.3.
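The model selection step behind this multiprocess analysis can be sketched as follows (illustrative numbers only): each model j carries a prior probability π_j and a Normal one step ahead forecast, and the posterior model probabilities are proportional to π_j times the predictive density of the observation.

```python
import math

def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def model_probabilities(y, priors, means, variances):
    """Posterior model probabilities p(j) proportional to pi_j * N(y; f_j, Q_j)."""
    w = [p * normal_pdf(y, m, q) for p, m, q in zip(priors, means, variances)]
    total = sum(w)
    return [x / total for x in w]

# Mother, trend change and outlier models; the outlier model inflates the
# one step ahead variance, so a discrepant y shifts weight towards it.
# The prior weights echo the small pi_2 = 0.05, pi_3 = 0.025 of the text.
post = model_probabilities(
    y=30.0,
    priors=[0.925, 0.05, 0.025],
    means=[23.0, 23.0, 23.0],
    variances=[0.36, 4.0, 40.0],
)
print([round(p, 3) for p in post])
```

Despite its small prior weight, the outlier model dominates the posterior for this discrepant observation, which is how the automatic classification works.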
[Figure: number of prescriptions (000's), with one step ahead point predictions and errors]
8.3.3. NDB Multiprocess Models with CUSUM's:
The same model specifications described in 8.3.2 are resumed here, with a Backward CUSUM statistic with initial values L_0 = 2.0 and a = 0.3. The CUSUM's interactively monitor the Mother model performance until the CUSUM signals a change, after which control is regained for the Mother model. During the Mother model performance, the other models run in parallel, as described in 7.5, as preparatory arrangements for coming changes. The performance is summarised in Fig. 7 together with the observations. It can be seen that all the changes are properly identified; the performance is slightly better than that of 8.3.2, while the processing time is reduced compared with that of the multiprocess models without CUSUM's, since for much of the time the CUSUM controller operates with only two models.
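The monitoring idea can be sketched as a two-sided CUSUM on the standardised one step ahead errors; this is an illustrative stand-in for the Backward CUSUM controller, with threshold L0 and drift a as above:

```python
# Two-sided CUSUM monitor: accumulate drift-corrected errors and signal
# when either sum crosses the threshold L0, then restart.

def cusum_signals(errors, L0=2.0, a=0.3):
    """Return (time, direction) pairs at which the CUSUM signals a change."""
    signals = []
    hi = lo = 0.0
    for t, e in enumerate(errors):
        hi = max(0.0, hi + e - a)   # sensitive to upward change
        lo = min(0.0, lo + e + a)   # sensitive to downward change
        if hi > L0 or lo < -L0:
            signals.append((t, "up" if hi > L0 else "down"))
            hi = lo = 0.0           # restart monitoring after the signal
    return signals

errs = [0.1, -0.2, 0.0, 1.5, 1.4, 0.2, -0.1]
print(cusum_signals(errs))   # the sustained positive run triggers one signal
```

Isolated small errors are absorbed by the drift correction; only a sustained run accumulates past the threshold, which is what distinguishes a genuine change from noise.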
Prescription Data - MAD's
YEAR             1    2    3    4
without CUSUM's 0.63 0.52 1.79 0.47
with CUSUM's    0.57 0.46 0.70 0.34
[Fig. 7: number of prescriptions (000's), multiprocess models with CUSUM's]
8.4. ROAD DEATH SERIES
8.4.1. The Data:
The data set represents quarterly road deaths in the U.K. for the years 1960-1969. It can be seen from Fig. 8 that the main observed discontinuities in the series are the outlying observation in the first quarter of 1963, due to a cold icy winter preventing traffic from using many roads, and the trend change in 1967 due to the introduction of the breathalyser. Generally, a high variation of the observation error can be observed: traffic activity falls during a slump, and more traffic is on the roads in a boom period. An on-line estimation of the observation variance in a model would therefore lead to more reliable predictions.
8.4.2. Modified NDB Multiprocess Models with CUSUM's:
In this analysis, Modified NDB multiprocess models are used with CUSUM's. The alternative models assumed are: trend change, outlier and seasonal changes. The main model uses discount factors of 0.9; these figures are lower than those used earlier, reflecting the quarterly data. The trend change and the other alternative models are defined as in 7.5. The observation variance for the main model and the alternative models' variance control rule is estimated on-line, assuming it to be proportional to the expected number of deaths, i.e.
V_t = a E{Y_t | D_{t-1}},
where a is estimated on-line using a power law as defined in 5.4, with initial value X_0 = 10.
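A sketch of this variance rule follows; the on-line learning of a here is a simple recursive moment estimate standing in for the power-law recursion of Section 5.4 (which is not reproduced), and all numbers are illustrative:

```python
# Observation variance proportional to the one step ahead forecast:
# V_t = a * E{Y_t | D_{t-1}}, with a updated from squared forecast errors
# (since Var{Y_t} = a * f_t implies e_t^2 / f_t estimates a on average).

def variance_rule(forecasts, observations, a0=1.0, n0=10.0):
    """Return the per-step variances V_t and the final estimate of a."""
    a, n = a0, n0
    variances = []
    for f, y in zip(forecasts, observations):
        variances.append(a * f)          # V_t = a * E{Y_t | D_{t-1}}
        n += 1.0
        a += ((y - f) ** 2 / f - a) / n  # recursive estimate of a
    return variances, a

vs, a_hat = variance_rule([600.0, 620.0, 650.0], [614.0, 653.0, 629.0])
print([round(v, 1) for v in vs], round(a_hat, 3))
```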
The initial model probabilities are π_0 = (0.7899, 0.1, 0.11). The CUSUM parameters are L_0 = 2.0 and a = 0.5, in the same order as in 8.2.3, and the threshold return probability is 0.8. For this model:
f' = (1, 0, 1, 0, 1) and C = diag{C_1, C_2},
where
C_1 = [1 1; 0 1] and C_2 = [0 1 0; -1 0 0; 0 0 -1],
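As a quick illustrative check, the quarterly seasonal block C2 above is periodic with period 4, so the implied seasonal forecast function repeats every four quarters:

```python
# C2 combines a rotation through 90 degrees (the quarterly harmonic) with
# the Nyquist term -1; its fourth power is the identity, so the implied
# seasonal pattern has period 4.

C2 = [[0, 1, 0],
      [-1, 0, 0],
      [0, 0, -1]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
for _ in range(4):
    P = mat_mul(P, C2)

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(P == identity)   # True: period 4
```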
and (θ_0 | D_0) ~ N[(0, 90, -50, 30, ...)'; diag{1000, 100, 1000, ...}].
Finally, a summary of the results is presented in Fig. 8, showing that all the prescribed changes are dealt with successfully. The CUSUM summary statistic shows the periods of model breakdown and the different directions of change at the breakdown points.
[Fig. 8: road deaths per quarter, with the model and CUSUM summary]
8.5. SUMMARY:
A number of applications of the Normal Bayesian Models based on discounting are presented. Intervention is made by Exception, through examining CUSUM's. In 8.3, the models are applied to a real data set concerning the number of prescriptions, where an appropriate observation variance is assumed, and in 8.4 to the quarterly Road Death series with on-line variance estimation.
CHAPTER NINE
DISCUSSION AND FURTHER RESEARCH
In modelling in statistics, models are applied sequentially on-line and must accommodate any environmental or external effects that are not anticipated. The pioneering work of Harrison and Stevens (1976) has provided statisticians with Bayesian models whose parameters are not static with time, together with limiting results and applications, Harrison and Stevens (1975). However, system covariance matrices are not scale invariant and are ambiguous in the sense of Harrison and Roberts (1984), and the difficulties in the estimation problem have diverted considerable effort to the use of other, less constructive models.
The principal aim of this study is to replace the state error variance matrix by a small number of discount factors. This gives models which enjoy the principle of parsimony within the Bayesian framework. Discount factors are scale invariant and parsimonious, and they have considerable potential in practical applications, in particular in dealing with various types of discontinuity. The connections of discounting with classical point estimation, with DOUBTS and with ARIMA models are pointed out, replacing the classical framework of estimation and testing by Bayesian comparison. The limiting properties of well-known classical models are retained, in the sense that they have the same limiting forecast functions as special constant NDBM's. However, the Bayesian facilities are more extensive.
Two methods of on-line estimation of the observation variance are given, avoiding high stochastic complexity and computer time problems.
A simple and easy way of communicating with the models in phases of major disturbances is provided. Almost all the types of major disturbance that are common in time series processes are detected successfully in series of that kind. However, in the analysis of real data sets, information on major disturbances is often missing in advance, in which case it is useful to adopt the resulting analysis from the multiprocess models. Clearly, as is to be expected, the analysis shown in Fig. 4 is less successful than in the intervention models, since no information on the disturbances is fed into the multiprocess models. Apart from the missing observations, the multiprocess models with CUSUM's, where a little more information is provided on the existence of a particular model representing certain types of discontinuity, are more efficient. Real data sets are used to demonstrate this, both when the data is fairly stable and also when high observation noise is present; the latter causes a delay in the task of recognising changes. The results are promising. In particular, the CUSUM statistic is recommended for efficiency and economy in protecting model components when major disturbances, influences and misspecifications are present. More applications can be found in Ameen and Harrison (1983 a, b, c). In all cases the underlying model parameters have been given physical meanings, and facilities are provided to transfer information from or to other models of interest. It can be argued that the amount of further development and exploitation of the models is proportional to the amount of effort spent in developing them.
The following lists a number of suggestions for further research:
i- The models rest on Normality assumptions, so that successive estimates are obtained using Kalman Filter type recurrence relations for processes defined on the entire real line. However, in many real life problems, processes have restricted sample spaces that do not cover the real line, and estimates outside their sample spaces may result. These points demand attention. Smith (1979) and Souza and Harrison (1979) have extended the DLM's to include non-Normal Steady State models, and these ideas are combined with the discount principle in Ameen (1983 b).
ii- The forecast functions are specified using the design and transition matrices. It is important to develop methods that provide more automation in model identification, and a proper Bayesian on-line parameter learning procedure will improve the performance. Some considerable success has been achieved by Migon and Harrison (1983) in considering non-linearity and non-Normality of the processes.
iii- The choice of discount factors is left to the modellers, and further work is needed here. The generalised EWR was examined by Godolphin and Stone (1980) for the DLM's with particular transition matrices. Also, lead time forecast distributions explode rapidly with lower discount factors.
iv- ...
v- The limiting results obtained are mostly based on specified canonical representations, and more general results are possible.
vi- In a general context, more applications of the theory in different fields of interest, when the process is subject to a dynamic development, would lead to an overall improvement.
APPENDIX
U.S. AIR PASSENGERS DATA
YEAR JAN FEB MAR APR MAY JUN JUL
SIMULATED DATA
YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
1    189 108  93  77  42  52  67  75 155 236 320 270
2    270 261 192 201 166 150 193 244 255 300 343 382
3    392 318 335 283 276 253 282  86 387 388 501 482
4    221 179 185 114 122  80 143 148 205 292 307 331
5    318 317 269 234 198 185 239 237 314 343 409 408
6    411 404 347 267 238 216 187 193 242 269 272 306
7    273 267 202 176 146  84 108 132 157 187 206 207
8    233 192 163 106  59  69 108 130 201 268 319 375
9    377 340 246 202 185 175 222 254 338 402 478 467
10   449 444 377 338   -   -   - 373 433 483 567 617
PRESCRIPTION DATA
YEAR                                              ... NOV  DEC
1966 23.1 21.4                                    ... 22.8 23.1
1967 23.9                                         ... 23.8 26.6
1968 25.9                                         ... 21.7 23.4
1969 23.1 24.3 22.3 23.6 22.3                     ... 21.0 28.6
1970 23.3 23.3 21.8 24.4 25.2 23.6 22.2 23.8 22.4 ... 22.4 23.7
ROAD DEATH DATA
YEAR  Q1  Q2  Q3  Q4
1960 486 514 614 710
1961 516 546 587 653
1962 501 499 587 650
1963 400 547 619 742
1964 570 582 664 790
1965 592 648 660 751
1966 578 604 658 822
1967 610 542 659 629
1968 518 499 603 650
1969 518 541
REFERENCES
[1] AGNEW, R. A. (1982). Econometric forecasting via discounted least squares. Vol. 29, No. 3, 291-302.
[2-5] AMEEN, J. R. M. and HARRISON, P. J. ... contribution to discussion of the paper by E. T. Jaynes, Valencia, Spain, Sept. 1983; ... Bayesian entropy ... models and forecasting.
[6] AMEEN, J. R. M. and HARRISON, P. J. (1983 b). Normal discount Bayesian models. Invited paper for the Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[7] AMEEN, J. R. M. and HARRISON, P. J. (1983 c). Discount Bayesian multiprocess modelling with CUSUM's. Proceedings of International Time Series Conference, Nottingham (O. D. Anderson, ed.), 1983. North Holland.
[8] ANDERSON, O. D. (1977). A commentary on "A survey of Time Series".
... Soc. B, 21, 239-270.
[11] BISSELL, A. F. (1969). Cusum techniques for quality control (with discussion). J. R. Statist. Soc. C, 18, 1-30.
[12] BOX, G. E. P. and JENKINS, G. M. (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden Day.
[13] BOX, G. E. P. and TIAO, G. C. (1975). Intervention analysis with applications to economic and environmental problems. J. Amer. Statist. Ass., 70, 70-79.
[14] BROWN, R. G. (1963). Smoothing, Forecasting and Control. San Francisco: Holden Day.
[15] BROWN, R. G. (1983). The balance of effort in forecasting. J. of Forecasting, Vol. 1, No. 1, 49-53.
[16] CANTARELIS, N. and JOHNSTON, F. R. (1983). On-line variance estimation for the steady state Bayesian forecasting model. J. Time Series An., Vol. 3, No. 4, 225-234.
[17] CHATFIELD, C. (1978). The Holt-Winters forecasting procedure. App. Statist., 27, No. 3, 264-279.
[18] De GROOT, M. H. (1970). Optimal Statistical Decisions. New York: McGraw-Hill.
[19] DOBBIE, J. M. (1963). Forecasting periodic trends by exponential smoothing. Opns. Res., 11, 908-918.
[20] EWAN, W. D. (1963). When and how to use Cusum charts. Technometrics, 5, 1-22.
[21] EWAN, W. D. and KEMP, K. W. (1960). Sampling inspection of continuous processes with no autocorrelation between successive results. Biometrika, 47, 363-371.
[22] GATHERCOLE, R. B. and SMITH, J. Q. (1983). A dynamic forecasting model for a time series. Proceedings of International Time Series Conference, Nottingham (O. D. Anderson, ed.). North Holland.
[23] GELB, A. (ed.) (1974). Applied Optimal Estimation. MIT Press, Cambridge.
[24] GODOLPHIN, E. J. and HARRISON, P. J. (1975). Equivalence theorems for polynomial-projecting predictors. J. R. Statist. Soc. B.
[25] GODOLPHIN, E. J. and STONE, J. M. (1980). ... J. R. Statist. Soc. B, 42, 35-46.
[26] HARRISON, P. J. (1965). Short-term sales forecasting. J. R. Statist. Soc. C (Appl. Statist.), 15, 102-139.
[27] HARRISON, P. J. (1967). Exponential smoothing and short-term sales forecasting. ... Conference held at ...
[29] HARRISON, P. J. and DAVIES, O. L. (1964). The use of Cumulative Sum (Cusum) techniques for the control of routine forecasts of product demand. Oper. Res. (J.O.R.S.A.), 12, 325-333.
[30] HARRISON, P. J. and JOHNSTON, F. R. (1983). A regression method with non-stationary parameters. Warwick Res. Rep. 35. (Submitted to J. O. R.)
[31] HARRISON, P. J., LEONARD, T. and GAZARD, T. N. (1977). An application of multivariate hierarchical forecasting. ...
[32] HARRISON, P. J. and PEARCE, S. F. (1972). The use of trend curves as an aid to market forecasting. Ind. Mark. Manage., 2, 149-170.
[33] HARRISON, P. J. and SCOTT, F. A. (1965). A development system for use in short-term forecasting investigations.
[34] HARRISON, P. J. and STEVENS, C. F. (1976). Bayesian forecasting (with discussion). J. R. Statist. Soc. B, 38, 205-247.
[35] HARRISON, P. J. and STEVENS, C. F. (1971). A Bayesian approach to short-term forecasting. Oper. Res. Quart., 22, 341-362.
... Biometrika, ...
[39] HOLT, C. C. ...
[40] ... inference. Invited paper for the Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[41] KALMAN, R. E. (1963). New methods in Wiener filtering theory. In Proceedings of the First Symposium on Engineering Applications of Random Function Theory and Probability (J. L. BOGDANOFF and F. KOZIN, eds.). New York: Wiley.
[42] KALMAN, R. E. and BUCY, R. S. (1961). New results in linear filtering and prediction theory. J. Basic Eng., 83, 95-108.
[47] MAKOV, U. E. ... models in the presence of jumps. The Statistician, 32, 207-213.
[48] MAYBECK, P. S. (1982). Stochastic Models, Estimation and Control. Vol. 2.
[51] MONTGOMERY, D. C. and JOHNSON, L. A. (1976). Forecasting and Time Series Analysis. McGraw-Hill.
[52] MUTH, ... (1981). Forecasting ... updating method. Paper to First Inter. Symp. on Forecasting, Quebec City, Canada.
[53] O'HAGAN, A. (1979). On outlier rejection phenomena in Bayes inference. J. R. Statist. Soc. B, 41, 358-367.
[54] PAGE, E. S. (1954). Continuous inspection schemes. Biometrika, 41, 100-115.
[55] PRIESTLEY, M. B. (1980). State-dependent models: A general approach to non-linear time series analysis. J. Time Series An., 1, 47-71.
[57] SMITH, A. F. M. and COOK, D. G. (1980). Straight lines with a change point: A Bayesian analysis of some renal transplant data. Applied Statist., 29, 180-189.
[58] SMITH, J. Q. (1977). Problems in Bayesian statistics relating to the Catastrophe Phenomenon. ...
[60] SMITH, J. Q. (1979). A generalisation of the Bayesian steady forecasting model. J. R. Statist. Soc. B, 41, 378-387.
[61] SMITH, J. Q. (1983). Forecasting accident claims for an assurance company. The Statistician, 32, 109-115.
[62] ... approach to forecasting: The multistate ... North Holland, 535-542.
[65] SOUZA, R. C. and HARRISON, P. J. (1979). Steady state system forecasting: A Bayesian entropy approach. Warwick Res. Rep. 33.
[66] STEVENS, C. F. (1974). On the variability of demand for families of items. Oper. Res. Quart., 25, 411-420.
... Soc. A, 132, ...
... WINTERS, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Man. Sci., 6, 324.
[74] WOODWARD, R. H. and GOLDSMITH, P. L. (1964). Cumulative Sum Techniques. (First edition.)
... YOUNG, P. C. (1971). Recursive ... 10, 209-224.
... ZELLNER, A. (1971). An Introduction to Bayesian Inference in Econometrics. New York: Wiley.