DML 2

Download as pdf or txt
Download as pdf or txt
You are on page 1of 117

University of Warwick institutional repository: http://go.warwick.ac.uk/wrap A Thesis Submitted for the Degree of PhD at the University of Warwick http://go.warwick.ac.

uk/wrap/3955 This thesis is made available online and is protected by original copyright. Please scroll down to view the document itself. Please refer to the repository record for this item for information to help you to cite it. Our policy information is available from the repository home page.

DISCOUNT BAYESIAN
AND FORECASTING

MODELS

JAMAL RASUL MOHAMMAD

AMEEN

Statistics, t5epartmenF of Warwick, University of 7AL CV4 Coventry .,.,


; ..,.--''

DISCOUNT

BAYESIAN AND

MODELS

FORECASTING

JAMAL

R. M. AMEEN

PH.

D.

DEPARTMENT OF STATISTICS

UNIVERSITYOF WARWICK MAY 1984

TABLE OF CONTENTS

page
1- CHAPTER 1.1 Status 1.2 Outline of the Thesis ONE : INTRODUCTION

t
3

2- CHAPTER TWO : DISCOUNT WEIGHTED ESTIMATION 2.1 Introduction


2.2 Exponential Weighted Regression
5

2.2.1 The model

2.2.2 EWR and time series 2.2.3 Some comments on EWR 2.3 The simultaneous adaptive forecasting 2.4 Discount weighted estimation 2.4.1 The model 2.4.2 DWE for time series 2.5 Applications 2.5.1 A simple linear growth model 2.5.2 A practical example: The U.S. Air Passenger data set 2.6 Summary

7 7 8 10 10 11 13 13 15 19

3- CHAPTER THREE : -YNAMIC LINEAR MODELS 3.1 Introduction 3.2 The DLM's
3.3 Relation between DLM's and DWE's 3.4 Some limitations and drawbacks

20 20 22 23 26

3.5 Summary

4- CHAPTER FOUR : NORMAL DISCOUNT BAYESIAN'MODELS

4.1 Introduction

27

4.2 Normal Weighted Bayesian Models 4.3 Normal Discount Bayesian Models

28 31

4.3.1 The model


4.3.2 Forecasting and updating with NDBM's 4.3.3 Coherency
4.3.4 Sequential analysis of designed experiments 4.4 Other important special cases

31
33 33
35 36

4.4.1 The modified NDBM

36

4.4.2 Extended NDBLI's


4.5 Summary 5- CHAPTER FIVE : ON-LINE VARIANCE
5.1 Introduction 5.2 The Bayesian approach

38
40 LEARNING
41 42

5.3 Non Bayesian methods :A short review


5.4 The power law 5.5 Summary

43
44 47

6- CHAPTER SIX: LIMITING RESULTS


6.1 Introduction 6.2 Similar models and reparameterisation 6.3 A common canonical representation 6.4 A general limiting theorem 6.5 Relations with ARIMA models 6.6 Summary 7- CHAPTER SEVEN : MULTIPROCESS MODELS WITH CUSUMS 7.1 Introduction 7.2 Historical background and developments 64 66 48 48 53 60 62 63

7.3 The backward CUSUM

68

7.4 Normal weighted Bayesian multiprocess models 7.5 Multiprocess models with CUSUM's 7.6 Summary

70 73 76

8- CHAPTER EIGHT : APPLICATIONS


8.1 Introduction
8.2 A simulated series of the data

77
79 79 80

8.2.1 Simulation 8.2.2 Intervention

8.2.3 Multiprcess models - The artificial data


8.3 The prescription series

83
87

8.3.1 The data


8.3.2 NDB- multiprocess 8.3.3 The CUSUM models : Known observation : Known observation variance variance

87
87 90

multiprocessor

8.4 The road death series 8.4.1 The data


8.4.2 The NDB multiprocess model with CUSUM's

91 91
91

8.5 Summary

95

9- CHAPTER NINE : DISCUSSIONAND FURTHER RESEARCH 10-APPENDIX


11-REFERENCES

100
102

ACKNOWLEDGMENTS

I would like to acknowledge my great indebtedness and convey an expression of


P. J. Harrison for his Professor to thanks assistance, guidance, many throughout the preparation of this work. Thanks and encouragement

also to the members of staff and my of Warwick for many valuable

fellow students at the Department discussions and the Computer

Statistics, of

University

Unit for their helpful assistance and facilities.

Finally

I would like to thank the University of Sulaimaniyah ( Salahuddin at

Research-Iraq for financial High Education Scientific Ministry the the of and present) and
support.

To those
I love so much

I owe so much

SUMMARY

This thesis is concerned with Bayesian forecasting concept of multiple parsimony. discounting In addition, is introduced

and sequential estimation. to achieve parametric

The and

in order

conceptual Dynamic drawbacks

this overcomes many of the drawbacks of the Normal which uses a system variance matrix. to the scale of independent (NDBM) is introduced These A

Linear Model (DLM) specification involve ambiguity Discount

and invariance

variables.

class of Normal difficulties.

Bayesian Models learning

to overcome

these

Facilities

for parameter

and multiprocess

modelling

are provided.

Unlike the DLM's, Normal of DLM's Weighted

many limiting

results are easily obtained for NDBMM's. A general class is introduced. This includes the class of NWBM's and for A

Bayesian Models (NWBM)

Other important case. special a as These are particularly according

subclasses of Extended and Modified useful in modelling discontinuities

introduced. also are systems which

operates

to the principle are given.

of Management

by Exception.

illustrative of number

applications

CHAPTER

ONE

INTRODUCTION

1.1. STATUS
The study scientists Indeed,

:
of processes that are subject to sequential developments, has occupied

for a long time and is currently in the majority often

one of the most active topics in statistics. arrives sequentially its plausible pattern according to and hidden more reliable and control

of real life problems, information desired detect it is to and to facilitate control,

some index, characteristics estimates engineering In the past,

time,

in order

reduce noise and obtain quality control

future and

predictions.

The areas of economics, See Whittle

are full of such examples.

(1969), Astrom (1970), Young(1974).

( ) Bayesian have been used to analyse time series non procedures passive procedure seem to be through categories. model construction. Models

processes. The most popular

into different broadly two be classified can Social models provide structures

one of these is called Social Models. behaves. Social or

which govern the way the environment

The be Scientific this of members class. other class may are called political organisations Models, These aim to build structures which fit specific environmental characteristics as important An closely as possible. subclass which is the concern of this thesis concerns

' The is build that to chance. aim of elements models and measure contain environments 'a deeper in to their adequacy understanding of the causal mechanism obtain order Scientific Models is Statistical Models This the of subclass called environment. governing Throughout its tools. the thesis, models are principle and as statistics with mathematics

Statistical Models be specified to otherwise. unless meant


In the classical sense, a time series is a sequential series of observations on a Wold(1954), time. phenomenon which evolves with suggested that a time series process can be decomposed into deterministic components like trend and seasonality with -a

-2random component among the common criterion Kendall, caused by measurement short term forecasting functions errors. Before the appearance of computers, the so called Moving Average

procedures, through

was used to fit polynomial, Stuart and Ord (1983). of computers,

least squares.

This is reviewed in, references. With the, during the late and

See also Anderson (1977) for further

development

the most widely Weighted Moving into

used models in forecasting Averages (EWMA) the ICI and forecasting Brown's

50's were the Exponential

and Holts growth method, Exponential in Chapter

seasonal, model -which ; later embodied in the computer

developed package -

DOUBTS, Weighted

of MULDO

Regression,, (EWR), stimulated

Brown(1963).

These models are reviewed

2 since they

much of the research described in this thesis. well known and widely used class of models is the Autoregressive models of Box and Jenkins (1970). Integrated

Another Moving

Average (ARIMA)

'Given 'a series of observations {y} and uncorrelated random residuals {a, }, having a. fixed "distribution, usually assme`dNormal, with` zero mean and a constant variance, an

ARIMA(p, d,q) is defined in the 'notation of Box and Jenkins by:


(i +d +( 1B+... PB)(1-B)dyt=(1+01B+... +9,8')e

where B is the backward shift operator, B y1 = y&_1,and 44D

of ;p, ; A1.......

q and

d are constants whose values belong to a known domain, and are to be estimated from the available data ( parameters in a non Bayesian sense).
Despite the existence of a vast amount of literature, number of unknown natural descriptive constants, meanings.. that are often difficult Further, these models depend on a large since they do not have the recommended Moreover, mean the and. '

to interpret using

for, estimation

square error criterion, resulting make

a considerable They, of

amount

is data required. of past

models are not robust. in the form

demand stationarity subjective information

or derived stationarity difficult. For

intervention,

example, -

discontinuities

the estimates. all can ruin and sudden changes

-3State Space representations , (1963) have gained considerable and the works of Kalman grounds -regarding

and Bucy (1961) and Kalman performance and reduced Bayesian

fast

computer

storage

problems.

However,

a natural about

recipe would

have a fully quantities

representation through

in which the uncertainty distributions. provide

all the unknown

is expressed

probability

The Bayesian Dynamic This

Linear Models of Harrison and is reviewed in Chapter 3 and a

Stevens (1971,1976) number of limitations In Bayesian

such a foundation.

and drawbacks

are pointed out. series process is defined to be a parameterised joint probability distribution for all t. Initially,

terminology,

a time

process I Y, 10,1 possessing a complete there is available. prior information the parameter are represented

( that is incorporated

in the process analysed ) about

definition This 0,. vector

is adopted from Chapter 3 and onwards. Vectors while capital bold phase letters are used for

by bold phase small letters

matrices except the random vector Y,. 1.2. OUTLINE OF THE THESIS : Weighted Regression (EWR)

In Chapter 2 the one discount factor Exponential

Simultaneous (1967) Adaptive Forecasting Harrison Brown(1963) the and of are method of reviewed with some critical comments. discount principle technique. The EWR method is then exploited using the

to introduce the general Discount Weighted Estimation ( DWE ) allows different discount factors to be

This includes the EWR method,

different for introducing to and provides model components a preparatory ground assigned
the discount principle into Bayesian Modelling. The method is then applied to the U. S.

Air

Passengers series and the results are compared with


Integrated of Harrison Moving Average ( ARIMA

those of DOUBTS
The Dynamic

and

Autoregressive Models (DLM's) initial

) models.

Linear

Stevens(1976) and

are reviewed in Chapter 3.

Given some forecast

prior assumptions,

for each DWE model,

there is a DLM with an identical

function. Some limitations and drawbacks of the DLM's are also pointed out.

-4-

'4 into Bayesian introduced is in Chapter The discount -, through principle modelling

Normal ;Weighted Bayesian Models ( NWBM's ). This includes the class of DLM's as a
case.,., Other. important, special Bayesian Models - (NDBM's), introduced. and parsimonious subclasses like Normal Modified NDBM's and Extended NDBM's Discount are also

The possibility of including time series models with correlated observations,,


on the coherency of these models and their A short outline of the existing on-line relation estimation with the of the for

and some brief comments previous models are given. variance

observation

is given. in Chapter

3 and practical

procedures

are introduced

variances that have some known pattern or move slowly with time.

r, ",.

Chapter

6, is devoted to ; reparameterisations
of, any NDBM,, transformations for calculating

and limiting
canonical adaptive

results.

Given the
A

eigenstructure

to similar

forms are available.

direct procedure is provided

the limiting

factors with no reference state distribution is

to the state precision or covariance often quickly reached. This

matrices.

In practice, results

a limiting useful

limiting such makes

and saves unnecessary

computations such as the adaptive vector.

Given the adaptive factors, limiting

state

for independent Results and matrices be of each other. calculated variance can precision ''given. NWBM's Limiting predictors are compared formulations somecannical are also of ARIMAmodels. with those
In Chapter sketched.

-'This leads to generalisations of some previous results.


Management of by Exception is its in forecasting use and models had largely replaced for errors However, models

7, the principle

In' Bayesian

forecasting,

the use of multiprocess

the backward

Cumulative

Sum (CUSUM)

of the one step ahead forecasting specific targets.

detecting changes CUSUM's

departures and

level from the process of

are reintroduced

to forecasting principle. Models

systems and operate with These provide with CUSUM's. .

multiprocess

discount the based on which are models called Multiprocess characteristics NDB

both economical A number

and efficient

of applications

having different

are considered in Chapter 8.

Chapter in is 9. Attention discussion Finally a general given directions for future to in possible research. progress, and work

is also paid to further

-5-

CHAPTER

TWO

DISCOUNT

WEIGHTED

ESTIMATION

2.1.

INTRODUCTION:
Operational simplicity and parsimony are among the desirable properties in model

constructions, (1984). model (EWR), errors.

The word - parsimony

' is used here in the sense of Roberts and Harrison of unknown constants involved in the

The order of parsimony construction. minimising Brown

is the number

(1963) developed

the Exponentially

Weighted

Regression

the ` discounted

' sum of squares of the one step ahead forecasting factor, it has parsimony of order 1. It methods, that

As the method depends on one discount

in be the comming sections, evident will the information with

as is the case in many forecasting

content of past observations

about the future state of the process decays factors. The discount concept is a

its age and this is accomplished

using discount

key issue of the thesis and will be exploited in this and the later chapters.
In this chapter, Exponential Weighted Regression is reviewed in Section 2.2 with the The DOUBTS is introduced method is reviewed in 2.3 and in 2.4., matrices Ameen and Harrison and provides simple

being on time series construction. emphasis the Discount (1983 Weighted Estimation method

a). This

generalisation formulas.

EWR of

uses discount

recurrence updating is constructed

In Section 2.5 a simple linear growth seasonal DWE model application is given using the U. S.Air passenger data series. and Box-Jenkins.

and a practical

The results are compared with those of DOUBTS

2.2. EXPONENTIAL
2.2.1. The Model

WEIGHTED

REGRESSION

One general locally linear representation of a time series process at any time, t future is Ytt, outcomes : with t

-6-

Yc+k

-/O+kO$,

k +t+k

Ec+k

V1 -[O,

(2.1)

where the components

[f )Jj+k independent (2),.,., (n (1), / / /L+k are of, = variables or known

functions of time 0'i k =A(1),A(2),... A(n))i, k are unknown with the subscript t, k indicating , that their, estimates are based on the data available up to and including is a random error term ( ee+k variance V ). Usually,, .,, O,, and V are called the parameters of the model and in a Bayesian sense, k However, (yi, f in EWR models, these are assumed 1'. (0 ,V] time t, and Ei+k

is short hand for the mean of E$+k being 0 and the

they have associated prior distributions. as constants for the past data De ={(ye, f),... O k =0 is estimated &, '

Given a discount factor 0< 1)}.

by m, as the value of 0 that minimises the discounted sum of squares: (2.2)

Se

'(ye-; -/e-i0) i. o, equating the result to 0, and

Differentiating

(2.2)'with

respect to Oat 0=m,,

(2.3) . -0

Now, define
q c-1

(2.4)
(2.5) i, o
'Assuming that Q, -1 is the generalised inverse of q, it can be seen from (2.3), , that

This

'and following (2.5), (2.4) the gives , with

relationship

on r.,

where as = Q1-1/'&

is by, forecast k The given point ahead steps mj_1" yt -fg and e=

-7-

I+k mt.

2.2.2. EWR, and Time Series


In time series processes, the form of the forecast to a reasonable degree of approximation. General function can often be specified up predictors can be

polynomial

constructed

through specifications of the design vector,

A simple and efficient way, where Cis a non singular matrix

(1963), by Brown into define fg, je Ck, as presented k= with dimension n. Therefore, f=j using the notations

and the criterion

of Section 2.2.1, with

being independent of time the alternative , Qe=IBC. -iQt-1C-t_

forms of (2.4) and (2.5) are : f If (2.6)

and

Le=C-'kt-,

+f 'yt

(2.7)

The current estimate of 0 at time t, is then given by

where
and a, =Q, - j'

2.2.3.

Some Comments

on EWR

In order to get some insight into the terms and equations obtained in Sections 2.2.1 and 2.2.2, consider the minimisation by maximising (2.2) of again. Note that the same estimates of 0 Given that E,+, is a Normal in (2.4)

can be obtained

L=exp{-S=/(2V)}

for 0. function Information

random variable ,L

can be called the ` likelihood to the so called Fisher's

' of 0 at time t.

(2.6) is proportional and second derivative Bayesian sense.

( matrix about ni minus the ' matrix (2.6), in a Qj is and

of L with The /'/,

respect to 0 at 0=rag) constant content is from

' precision the or , V. Moreover, in

proportionality the information

decomposed into PG'`'Qi_1G-i,

the most recent observation

the information

contributed

from the past data,

discounted by P. This,

8. together with the convergenceof (2.2), restricts the values of to the range, 0<0<1. Thus the role of the ` discount factor ', describes the rate at which the information about Moreover, distinct X2,... X1, X. time. given as eigenvalues changes with model parameters of'C the convergence of ( 2.6 )'requires that 0< (/X;'I < 1. This can be seen on

(2.6) as rewriting

Q, R`
Combining

Q0C-e
i0

R,C, -: J "fC

the restrictions on ,

IX112,1x212.... IxI2}. have 0<<min{1, we

The

from follows the convergence of Q;. the vector adaptive st convergence of 2.3. THE SLMULTANEOUS ADAPTIVE FORECASTING: can be decomposed into
variation.

Consider a time series process that


components of trend, seasonality,

three different
the seasonal

and random

Suppose that

component changes relatively very slowly, `so that the greater percentage of the predictive ( data in the is trend to analysed at variation and random changes variation attributable the end of this chapter, is of this type). E%VRassumes that the loss of information with for both the trend the and seasonal components, whereas we rate same age, occurs at know that the information on the seasonal component is more durable, and hence, more appropriately
appropriate

discounted using a much higher discount factor

than that

is which

for the trend component.

This led Harrison (1967) to propose an alternative linear growth and seasonal method

EWR to which considered a simple multiplicative procedure 2. model of parsimony, DOUBTS That work led to the development Forecasting,

forecasting the of

Simultaneous or,

Adaptive

C. I. I. basis is the the short of which (1965). Scott and of DOUBTS Whittle , with (1965) some

term forecasting, computer examined comments. .: the method.

MULDO. package following

Harrison

The

is a short

review

The k-steps predictor

FF(k) is Ft(k)=(mc+kbe)Sj(k)

9.

where
mm , -t+6i_1--(1-031 6=bi-i-(1-431)2e s)e

e =y1-Ft-itl)

factor. discount is trend the i31 and The seasonal component for k periods ahead is given by ,
n

Se(k)=17
=t

{a$(t)cos(H,

zk)-b.

(t)sin(Hlzk)},

where there are n significant the range 1 to T/2,

harmonics,

with H, taking

the appropriate

integer values in 27rk T

(t), and a,

and 6, (t) are the harmonic coefficients at time t. xk=

length T is the of the seasonal period. where Given


a&= a1, b1b,. I&

c= diag{c1, cz,.,.,. c* }

Ck

cos(zk)

si(zk)

-ain(zk)

cos(zk)

Then
at=iia1-1+aet

(Yt/mt) e'e"

is Brown's adaptive constant vector, where a

whose elements are functions of the

be details More found in Harrison (1965). factor Scott discount (3z. can on and a seasonal Although it is not intended here to proceed with the generalisation of this method, by the end of this chapter it will be evident that higher degree and parsimonious

to polynomials with more economical but still efficient seasonal components can be

accommodated. However,

like its predecessors, the method is limited and suffers from It is purely a point estimator. Unlike Holts

both theoretical and logical justifications. seasonal forecasting method, equations while
components . Other means of constructing used through stochastic adaptive

the seasonal effects are included in the trend updating contribution in is removed updating mi the seasonal

the trend

vectors for sequential estimation Gelb (1974).

purposes are that are

approximation, in any statistical irrespective

These provide estimates

not necessarily optimal convergence (1982). properties

sense. They possess some desirable, uncertainties.

well defined

of the parameter

See also Maybeck

2.4. ` DISCOUNT
In this section,

WEIGHTED

ESTIMATION

the idea of ` discounting

', as discussed in Section 2.3, is generalised

to ` multiple discounting '.

This provides a new class of models called Discount Weighted

Estimation (DWE), using different discount factors for different model components.

2.4.1. The Model


Let a time series process be represented by (2.1), Eg+k (k>0) be independent of

be by D, time t. Oi, data Dt_1}, f1), D4 estimated the at =0 given mg and ={(yi, k , ,

DEFINITION
A DWE model is given by
E(YI+k IDe, Ij+k]-le+x i,

where : ='ne-i Aae -I qt . + aeee (2.8)

(2.9) t

11 -

et'Yi-finit-i

(2.10)
+I 'tft

Qg =BIQt-lB1,

(2.11)

B=diag{R1, 2,... {3}

; 0<13, <1

9-

i=1,2,... n-

(2.12)

The EWR model is retained when B ={31 where I is an identity Notice that inversion only Qi-' and not Q, needs to be calculated Q,-' has been around to obtain

matrix mi.

of order it. the

Although

method

for obtaining Lindley

for a long time

Henderson et

later al(1959) and appreciated

and Smith(1972),

even now it does not seem to be generally recursion which avoids

by practitioners

even in E%VR case. A more attractive

inversions matrix

and their associated problems is to replace (2.9) - (2.11), by :-'


1=(I-a, j,, )Rt 2.13)

R, =B

"Qe-t-iB-"

(2.14)

at =Re!

'e(1+IeR,

f'e)-1

(2.15)

It can be seen from (2.11), that any initial value for Q. and hence =o be will , ,
dominated ( a around after small number an , the dimension of 0) of iterations. In cases

ignorance initial the of ( say usually are ,

default settings Qo-1=a1, and no is a large number, 105 where a , adequate for operation. However in most cases, there is at least a

vague impression that f

of the size of the elements of 0 which will give a better value of no so From 2.2.3 know that we , Qo= VCo-1, where, Co represents

is close to yl. no matrix

Fisher's Information upper limit for V,

about 0 at time t=0.

Then

1 Q0 can be set by assuming an marginal value c; for 0; and

the variance of c; choosing a liberal c2,.,.,. cJ/V These ideas are illustrated .

setting Q-1=dia9{cl,

by an example in 2.5.

2.4.2. DWE for Time Series The principle of superposition states that any linear combination of linear models is a linear model. Model builders often use this in reverse, decomposing a linear model and

12.

extending

the principle

to statistical

models using the fact that a normal

random vector The major point linearly to Ck diag

may be decomposed into a set of components of normal random vectors. is that obtain the component a complete k>0, models can be built Hence in practice, separately

and then combined

model. often

given a time series for which j, _k =f r components and written C=

for all t,

C is decomposed into

(C1, C2,.,.,. C, }, where C is non singular. assuming that the n, square matrix n; n.

The case of special interest

by be covered will factor and

C; has a single associated discount

DEFINITION The method of DWE, for time series, is given by the forecast function A; =fC Mg

E[Ye+itIDI]=fI+knag

by is calculated recursively where Mt


MtCn g-t+atet

dL=4e

1j=Rej(/Ril,

+l)-i

_ (l - s j )Re
B-ti

2.16

_B-yCQ`-1-'C,

<B=dia9{, 0<; } 1,,... ,!, h, , where and order n,"

<1,

and Ii is an identity matrix of

THEOREM

2.1. r , are non

For the DWE method defined above, if X0, X;,2,...X; *i=1,2,... Q0, bounded then' for limQ, G; all and =Q exists :._ zero eigenvalues of e-=

-13-

IX, 1ki 12} 0<3 <min{1, 1121I\,. 2I2,...

PROOF
Using (2.16), we have,
Qt=B Is , -` Qv1p " ,r

t- t t2ttt2k2 Gl-

QOG
k=0

, -k

kk2

since B' and C-I

commute.

The convergence of (2.17), gives that

)21 O <1 ;i=i, 2,.... ; 1=1,2 <IR;/(A;,; n, ,...


The result is obtained by combining this with the conditions 0<; < 1.

Some modellers have suggested to move beyond these assumptions in order to


increase models adaptivity, Muth (1981). Clearly such models produce highly unreliable

A introducing forecasts. temporal adaptivity proper way of un sounded and statistically is dealt with later through discounting the prior information. Under the above assumptions, the recursive formulas converges considerably fast to benefits, be later, form. Apart from limiting these provide as will seen computational a limiting justifications of many commonly used forecasting structures in literature. in spirit, uncovers models. 2.5. APPLICATIONS:
linear

It also

the partial success achieved by some classical models like the ARLMA

2.5.1.

A simple

growth

seasonal

model for normal random variables that suppose a ,

Using the principle

of superposition

time series model can be constructed using a linear combination of a linear growth,

-14-

seasonal and random components. The linear growth model may be described by the pair 10 1 ]} . This

is evident since I1Ci =i1, k]. Then if m9 and b&are the present estimates of the level and growth rate, the forecast function of this component is j1 GkMt = m, - kb,, which is the familiar Holt-Winters linear growth, function.
Any o additive seasonal pattern T ._. S(1) .... ,. S(T), for which n is the integer part of

T+ 1) /2

and such that


j=1

S(j)=0

be expressed in terms of harmonic functions as can .

S(k)=(a,

co.-(kwi)-b;,

-in(kwi))

where w =2ir/ T.

Equation '(2.18) scan be represented equivalently as


, ... _, S(k)=[i k a; cosies sinlw `-siniw Cosiw 6.

An alternative seasonal component model which gives an identical performance to that previously discussedis

I2 [/2,1'f2,2'
C2 = diag {C2 11 C2 2,.,.,.

2, n]
02.0

/2
of this

k=0J
harmonic

and C2 k=
representation

[CO3
sin (kw) occurs

,; siwhere cos (kw) when there

k-

1,2,... n. The practical an economical . For example,

benefit seasonal Box and

exists

representation

in terms of a limited

number

of harmonics

Jenkins (1970)examined the mean monthly temperature for Central England in 1964 and demonstrated that over 96 % of the variation can be described by a single first harmonic the rest of the variation being well described as random . In this case the seasonal pattern is captured by two unknowns rather than eleven as in the full monthly seasonal

-15description. In applying DWE it is generally advisable to associate a discount factor t with the linear growth
description.

component but

have a higher discount factor, z for the seasonal


is more stable than the

This is due to the fact that often the seasonal pattern

trend.
The full linear growth seasonal model is then
{f 1"f 2i; diag(C, C2i; diag{(31I2, (32I2n}}

where Ik 2.5.2.

k=2,2n,

is the identity matrix of dimension k. Example: The U. S.Air Passenger Data Series
data from 1951 to 1960 is The series is a

A practical

For comparison analyzed.

with other methods the ten years monthly

The data is given in Brown(1963)

and Box and Jenkins (1970).

favourite with analysts since it has strong trend and seasonal components with very little
randomness . However, it is not a strong test of a forecasting method. Harrison (1965)

EWR by Brown the that method proposed cannot achieve a Mean Absolute showed Deviation (MAD) of less than 3% since it insists upon using a single inadequate discount factor in a case in which the trend and seasonal components require significantly different discount factors. He stated that if, on this data, a method cannot achieve a MAD of less than 3%, then that method can be regarded as suspect. Harrison analyzed the data using the DOUBTS method described in 2.3. In this section the DWE model { J, C, B } is applied to the logarithms of the data
using: j=(1,0,1,0,.,., 1,01, C=diag{CI, CZ,.,.,. C5}

(1 11 I-jin(kw) toa(kw) f or the trend and Ck C1 __ l0 1= 1J Oh harmonic the description

sin(kw)l for k= 1,2,.,... 5 as representing J eos(kw) w=7r/6 . The point forecasts

of the seasonal pattern with of the DWE results.

where obtained as the exponential

-16
"A ( for discount factors l the trend relating to C1 ) and p2 for pair of was used with

the seasonal block. Initially the' prior specifications was


.100 001 . .1,0 0000.021

(, no9QO1=(

0}

which corresponds to a specification

of no seasonal pattern

!. a level lying within

a 95

interval [ 80 ; 280 1 and a monthly growth of between 4c and W 'c per month.

Hence

this is a very weak prior although it does not assume complete ignorance. Fig. 1 presents
the one-step-ahead point predictions For comparison, with the observations. errors over the last six years were

the one step ahead forecast

MAD DWE 2.3% achieved. performance of obtained and a 'Another Writing book data is in Box Jenkins. known the the given of and of analysis well of the t' observation and aj as the corresponding one step

logarithm the as z

ahead errors, their predictions

difference the equation: are obtained using Zt_13+at-13"0 -i-'ac-iz+Awae-ia

=e-ie-1+ze-12

This is 61. 4 is 9=. 41 also of and method =. minimised when where the mean square error parametric following 2 the and parsimony table indicates the comparability of the

(0.84 DWE discount DOUBTS that the pair with same of and performance with that of 0.93) as described in Harrison (1965) and with the discount pair (0.76 , 0.91) which MAD z the the errors. of reduces

-17-

The Mean Absolute

Forecast

Errors

For The Year 1955-1960

DOUBTS
Year 1955 1956 1957 1958 1959
1960

DWE
(. 84,. 93) 7.7 5.4 5.5 14.7 11.5
11.5

DWE
(. 76,. 91) 7.0 5.4 5.6 13.7 9.8
12.1

B&J

(.84,.93) 9.4 4.3 5.5 15.2 11.8


11.0

8.0 4.5 6.1 14.0 8.7


14.2

OZEAN
Clearly,

9.4

9.4
difference

8.9

9.3

in this example,

no significant

is observed between the above

results.

However,

DWE has the properties of being more general, parsimonious,

intervention can easily be accommodated in the phase of sudden changes and these depend
on a small number discount assessed easily of factors. The following table illustrates

MAD. discount in for different terms of of pairs selection models sensitivity

P2 P,
0.6 0.7 0.8 0.9 1.0

0.8
9.71 9.45 10.41 16.56 32.58

0.9
9.39 9.08 9.12 11.91 22.59

1.0
15.36 15.3 14.15 13.77 16.44

Passengers
Iv W

(*1000)
to 00 00 014

OOOO OOOO O

o..

co

3. -

01 0
rf

co r

o'

0 co

N O

"19

2.6. SUMMARY:
The methods of EWR and DOUBTS are reviewed and some general comments ,
drawbacks introduced introducing and limitations are pointed extension out. The estimation procedure of DWE is for

as a fruitful

of EWR

to provide

and prepare solid grounds

the discount concept into the Bayesian Forecasting.

The method is applied to when compared with

the U. S.air passenger data set and the results previous existing ones.

are encouraging

-20-

CHAPTER

THREE

DYNAMIC

LINEAR

MODELS

3.1. INTRODUCTION
One applications, intelligence of the main

:
contributions to Bayesian Statistics, both in theory and

is Bayesian forecasting. with the information

This provides a natural by the data. The method

way of combining

experts and

provided

The DLM's

of Harrison

Stevens (1971,1976) for many well

provide such means. classical

also gives limiting Brown(1963),

justifications Holt(1957),

known

forecasting

procedures,

Winters(1960), Priestley engineering

and Box and Jenkins(1970). In particular,

State space representations amount of literature

can be found in is available on has

(1980).

an extensive Filter,

applications

of the Kalman

Kalman(1963).

Bayesian Statistics

introduced facilities the the and phenomenon of on-line of random understanding widened variance learning, intervention of stationarity. has relaxed the assumption and modelling multiprocess ,

The method provides forecast distributions rather than point estimates.

In this chapter, DLM's are reviewed in 3.2 and relationships with DWE methods are
discussed in 3.3. for The DLM recursive formulas in the parameter estimation are attractive problem.

the ease of elaboration

and reduce considerably

the computer

storage

However,

the method is not free from drawbacks. This,

The specification

of the system matrix is discussed in

W has caused problems in practice. 3.4. A short summary

together with other difficulties,

in is 3.5 the the given chapter of contents of

3.2. THE DLM's

The class of DLM's as defined by Harrison and Stevens (1976) constitute quadruples {F, C, V, W}1 with proper dimensionality. A particular parametrised process { YY!0, } can

be modelled using this class of models if the following linear relations hold:

21 -

i)

YY=PtCt-"j

ii)

Oi=CeOe-i+'re

we..'Ni0+We1

(3.2)

The first of these equations is called the observation observable vector structure Yt to an unobservable state parameter distributed

equation vector

which combines the error The

and an additive

Normally be is to assumed which

with mean 0 and variance V. with time.

describes the evolution second equation, stated, the random vectors

of the state vector

Unless otherwise with known

v, and w, are assumed

to be uncorrelated

WW Vi respectively. and variance matrices Given an initial it follows that (YY IDe-t) -N[1e; ft 1 (Oe(De)` N[me+Ce 1 (3.3) prior (03IDo)-N'mo; C0;, using Bayes theorem with Di={1&, De_1;,

(3.4)

where:
li=PjGjm<_1 ; Yj=PjRjp, 4+VJ (3.5)

wi ==&-i+A&ej

(3.6)

CC =(I-AeFe)R1 R =C CC_,C't+W Al=R5F'5 Tt-1= C5F'I V, -1

(3.7)

(3.8)

(3.9)

cc=It -

(3.10)

When {P, C, V, W} are all known and are not dependent on time DLM is then the , DLM called a constant

-22-

3.3. RELATION The updating

BETWEEN

DLM's

AND DWE's

: suggests a connection between In order to establish this

(3.5)-(3.10), recurrence relations

estimation using DWE and the estimation the ` we first following give relationship,
DEFINITION For any DWE {j, C. B}, definite, .a -, corresponding -e Qt-i-1 with initial DLM

using DLM's.

setting (mo; Qo), where Q0 is nonnegative is given by (f , G, V, W}i where

!V, =(gl Qti-Lfl In case of. Q, _i

C't) V0 is nonnegative definite and H =B, -'Ci. Q, -, -t represents a generalised inverse of Q_i.

being singular.,

THEOREM

3.1.
the corresponding , Further, DLM produce a forecast function (Oi ID1)--N[m,; C,; where mi is the

For the DWE {f, C, B}i identical to that obtained

by DWE. -1VV.

DWE estimate and Cj=Q,

PROOF: From the initial settings, the theorem is true for t=1. Using induction, suppose

true for {t minas 1 }. From the DLM results, we have:

'

4'

Bi = Ci Qi-i-1 G 'g Vi + WV=8, Qj_1-1H' V,

and since
CC-1=Rg-t+ f'tfo Ve_1=Lff lQe-ige 1+1 'ifjJ/VV

Q1/V .
we have -'
-l

Ce = Q,

Ve

-23

)
E; OD=G m ei=m _1+a ai=Cif'e/Vt=Qi-lj'i, the DWE estimate, since

the DWE s,

iii)

The forecast function is


k

E; YY+kIDe]-fe+.,

H CI.
i=1

$m=

wm for the DIVE.

COROLLARY

3.1.1. the DAVE if gives a forecast function identical to that of

For t>0.

the DLM {f , C,, V,O}.


PROOF Obvious since from the definition WW =0 for all t.

is unusual in its dependence upon C, the In DLM terms the above setting for WW _1, The D, that the 0, the given concept observers concerning observer of uncertainty _1. depends his information is development the future the process upon current of also view of Souza(1978) and in Smith (1979). Forecasting, Entropy in adopted

3.4. SOME LIMITATIONS

AND DRAWBACKS

Time series processesare often best described reasonably using parametric statistical intervention be in In this can various stages of performed case, efficient model models. framework, Bayesian Within the the analysis. DLM's are often used for this purpose.

However, the latter require experience in the representation of innovations using Normal distributions. probability The specification of the associated system variance matrices has

Practical because V the arise problems of non uniqueness obstacle. of major a proved lack familiarity because the W of and and with such matrices causes application

difficulties and lead practitioners to other methods. Even experienced people find that

-24
have little feel for

they

natural

quantitative

the elements

of these matrices.

Their

ambiguity

arise since there exists an uncountable

number of time shift reparametrisations

which have identical forecast distributions.

For example:

The constant Normal DLM


Yt=et+ve

of =, \9t-t

+ W9

v` with Wt

--N'O; U=

V-as as

aS W-a(1-X

Zj )S

can

be

represented

as

Yt -x Ye-1= vj -, v, _14-wi. loss of generality,

This is a stationary series,

process provided that III I <1, equivalently as

and without

for an infinite

it can be written

x Y, =vt+- "Z x` we_i


i=0

so that
Var(Y,
z

)=V+W/(1-X2)

and
>. "v

;t... t
Cov(YY, Yt+k)-kk W/(1-X2).

Provided

that

is

covariance

matrix,

the

joint

distribution

of

Y, does not depend on a. i. e; for infinitely many values of the variances Y,, YI+l, Y: +2,.,., +k of the u's and'w's, the same forecast distribution .
V, W and C using sample autocovariances, see

is obtained This generalises easily to .

higher dimensional DLM's


Attempts

are made to-estimate the ambiguity

Lee(1980), " however, constraints

in these systems is always evident variance is also not invariant

unless further to the scale on Ameen The

'added !. The system error are variables

independent the which and 'Harrison

are measured. this

To overcome these difficulties, matrix by a discount matrix.

(1983 b) have replaced

system

is the 'understand this to concern Of is both to elaborate and simple and procedure easy

-25the coming chapters.

-26-

3.5. SUMMARY

The class of DLM's is represented in 3.2 and its relation with DWE estimates is DLM having DWE is there the same It in 3.3 that a exists model, given a shown given .
forecast function. Limitations discussed in 3.4. DLM drawbacks s are of and

CHAPTER

FOUR

NORMAL

DISCOUNT

BAYESIAN

MODELS

4.1. INTRODUCTION:
Two desirable properties conceptual estimation. estimation parsimony. Chapter of applied mathematical models are ease of application and

Hence the attraction 2 was concerned with

of discount factors in methods of sequential the method of DWE which generalises the by Brown(1963). In is

Exponentially of method

Weighted

Regression promoted

the simplest situation

a single discount factor

3 describes the rate at which information

lost with time so that if the current with

information

is now worth M units then its worth , However if a system has numerous particular components may be

is is k M I3 to units. ahead steps a period respect then the discount factor

characteristics

associated with

different values. The DWE method provides a means of doing this but it take to required is strictly a point estimation method.

The Bayesian approach to statistics is a logical and profound method. In forecasting


it provides information consequently). to provide as probability distributions ( support (likelihood) functions follow

These are essential to decision makers. Forecasting methods founded

The major objective of this work is upon the discount concept. This

Bayesian

ICI forecasting in been the has applied concept Harrison(1965) forecasting method and Harrison Scott(1965), and

package MULDO

as described in 2.3 , Hierarchical is a particular and other

ICI Multivariate the and The former The

package, does

Harrison, Leonard not easily

Cazard(1977). and Whittle(1965). Dynamic

which

generalise,

latter

applications have limiting Harrison

have been based upon Constant forecast functions Harrison equivalent

Linear Models ( DLM's

). which and

to those derived using EWR,

Godolphin

(1975),

Akram(1983). and variance

The use of such models has involved matrix W which has elements that are

practitioners

specifying

a system

-28proportional
discounting. ( NDBM's) introduced components. time

to functions of a discount factor.


This chapter

They are thus indirect applications of


Discount Bayesian Models

is concerned with a class of Normal the system variance matrix possibly different discount

which eliminates which associates

W. Instead a discount matrix is, factors with different model _1 at-

Such a discount precision

factor converts the component's Pi =43P, _i for time t.

posterior

precision P,

t-1 to a prior

The term

precision

is used in its

Bayesian sense but may also be thought The use of -the discount variance W, ' since ambiguity matrix

of as a Fisherian measure of information. overcomes the major the discount matrix disadvantages is invariant of the system ( up to a linear

is removed.

transformation')

to the scale on which both the independent

and dependent variables are and ease in

the methods 'and are easily applied. measured ,: of operation -it' is 'anticipated dynamic behaviour, performed regression, forecasting quality control that

Because of conceptual simplicity will

the NDBM1 approach

find many applications

time series analysis , the detection of changes in process , modelling where the observations are

and in general statistical

sequentially

or c rdered according

to some index.

In this chapter Normal Weighted Bayesian Models (NWBM) are introduced in 4.2 Particular is DLM's is their emphasis given to a subclass pointed out. relations with' and is discussed NDBM's'and mdels'called the the retaining of model coherency possibility of in 4.3. ' Other practically important like Modified NDBM's and the subclasses of models

in These discussed in NDBM's 4.4. the the Extended extend capability of models the are behaviour in in dealing the and process with cases pof sudden changes correlated

NDBM's in finally 4.5 Some the are given and a short on comments observations. in is 4.6. chapter the given summary of

4.2. NORMAL

WEIGHTED

BAYESIAN

MODELS

:,

Consider a parameterised

is I0, is 0, }, YY (Y, and vector an observable where process

the

unobservable

vector

of

state

parameters

containing

certain

process state

interest. of characteristics

For example each component of Y, may represent the sales of

-29-

a product

at time t

with

the corresponding

component

of 04 representing

its level of

demand at that time. distributions. operationally made for 0,. probability distribution

Each of the YY and 0, are random, vectors and so, have probability the discount principle discussed in Chapter no distributional by introducing for 2 provides assumption an initial updating an is

Although

simple and efficient This drawback for

method

of estimation, simply using

can be overcome Yo and Oo and

joint the

distribution

Bayes theorem

of the parameters as new data arrives.

This requires a model which describes

the way that the parameters evolve with transition DLL[. In order to introduce the discount stage. The relevant

time and the amount of precision lost at each are stated in (3.1) and (3.2) for the

model assumptions

principle

into the Bayesian approach,

the class of

Normal Weighted Bayesian Models ( NWBMi ) is defined

DEFINITION For a parameterised process { Y1!04} ,t>0, where the observation probability distribution is a NWBM is given by a quadruple

(Y Io1) ril - ,viFtog;


distribution the parameter posterior at time t-1 and given (ae-1IDe-i)
the prior parameter distribution
(Ot IDe-i) -

(4.1)

N[m

C1_1] 9 _1;

(4.2)

at time t, is
N[Gg mt-t; Rt) ;R =H CC-, H't (4.3)

Note that, R. is a variance matrix provided that c,


definition. by matrices

_,

is. C,

and V are variance _1

THEOREM

4.1. the one step ahead forecasting distribution and the updating

For any NWBM,

-30distributions by: are given parameter


(Y IDo-1) (4.4)

(01 IDe) where D, G,, D,

N; me+L'ei

(4.5)

,,

_1}

and

IiPgGirat_i

Y=FjRjP'tVt

(4.6)

'

mt=Cimi_t-Ater Ct=(I-AeFt)Re

, ec-7e-re

(4.7)

A. =RtF'tke-1=CeF'tvt-'. .

(4.8)

PROOF
The proof is standered in normal Bayesian theory. obtained from the identity f(Ye1Oe)f(OeIDe-t)-I(YjJoe-jt(ocI functions the where variables. Similarly, f (. ) 's are density functions terms However, the results can be

Yt, De-j)
of the appropriate random

by rearranging

the quadratic

)+(0j-Cjmt-t)'Rt-1(09-C&=$-t)

as ( , e_1e)'ie-1(Yc-1e)+(0, where ,j, NWBM's for W1 which = kip -m1), C`-i(0e-w1)

defined in Cs the theorem. and are as m, linear and non-linear are normal. distribution models If

form an extensive class of models containing 'the prior and posterior

distributions definite, to provide

H1C1_1H', -C1C, _1G'j -

is nonnegative

the conditional

(01 [ 0j_1) distributions. hand,

N[C101_1; W] may be introduced Thus, under this condition

forecast lead time coherent On the other normal DLM

NWBM , any it is

is a normal DLM. that any

setting

i, C'& W1)"C1 H1 = (G C1_1 +

evident

3Z
these different model components. Before introducing practically more efficient and parsimonious NDBM settings, it is interesting to point out some relations with other well known models. THEOREM 4.2.
{F. C. V, B},. with non singular Ci for all t, and initial setting

Given a NDBM
(m0; C0),

If Bt =l,

function NDBM1 forecast V, VV the and = setting (rno; Q0= Co-' V) is identical to that

is identical

to that of of

EV4 R {P,, Ci, j with initial iiThe NDBM initial iiiforecast

function

of DWE {F, C. 8;,

with

(i) in with setting as the joint posterior

V, = V. parameter distributions settings. are identical to those of

If B4 =1,

initial the DLM {F, C, V, O},, same with the

PROOF
The proof is by induction. From the assumptions in all the cases, since , Now, assuming that

is for true t=0. the theorem G, taken, Fi are and common

it is for that time t, true is t-1 time true show we the theorem at , iGiven from the NDBM results, for time t, we have
M, =GOwt_ltaitt ,

Ce-i_8e-i+F,

eV

spe

Re 1-RCe lCe iCe 1

This gives
Ce 1V=PG'e 1qt-iCc i+F', F, =Q, for EWR.

Hence,
at=Ci 1F'jV=QtF't

as for the EWR.

k.

32'-

these different model components. Before introducing practically more efficient and parsimonious NDBM settings, it is

interesting to point out some relations with other well known models. THEOREM
Given
( 0;

4.2.
a NDBM {F. C, V, B},, with non singular CC for all t. and initial setting

CO),

i-

If Bt=I,

NDBM function forecast is identical to that of of Vi V, the and =

EWR {F, C,R} with initial setting (mo; Q0=C0-i V)


ii, The NDBM initial forecast function is identical to that of DWE {F, C. B, with

setting as in (i) with the joint posterior

Vi = V. parameter distributions settings. are identical to those of

If Bt, =I,

the DLL1 {F, G, V, O},, with the same initial

PROOF The proof is by induction. From the assumptions since in all the cases, , Now, assuming that

is for true C taken, the theorem t=0. FF are and commo,

the theorem is true at time t-1 , we show that it is true for time t, iGiven from the NDBM results, for time t, we have
1Rt=is1A_1<<C

Ct-1=R6_1+A

1Fg gV

Ri

1=G'1

1Ce 1Ge

This gives
c lV=pGe 1Qe_1c 1+P'iF=Qi for EWR.

Hence,
1F't V= Q4F'j C, at = a+ for the EWR.

-33iiiii`=B''C'c-`C The proof is similar to part one, with Rj The two models are identical, since, for the tCg `B NDBM with B =!,

Bt=CA_iC't=CeCt_tC'i-0

for the DLM1 with

WW=O.

Furthermore it can be seen from the Theorem in Section 3.3, that the limiting
forecast distribution DLM of a constant NDBM {F. C, V, B}. is identical to that of a constant definite. However, as

{F, C, V, WI with

W=(HCH'-CCC')V

being nonnegative

with the DWE specifications is given emphasis with

for the time series models defined in Section 2.3.2, particular structured NDBL1's for which I. C, =diag{C1, C,,... C, }i is the identity matrix of

to canonically

C, of full rank n, . Bt =diag{I311t. {3I_,... I3, I, } where 0<0; and n, ts:1

dimension

4.3.2.

Forecasting

Updating and
parameter

NDBM's with
distribution at time t-1 as in (4.2), the one step at time t,

Given the posterior distribution forecast ahead

and the updating with R, =Bl

posterior

parameter

distributions

(4.5), (4.4) by and are given

" C, Cj_1 C'1B'-".

The k-steps ahead forecast function FF(k), k>0,

is given by
Hc4.1 1M

Ft(k)=E{Yt+kIDei-Pe+k

4.3.3.

Coherency

For the NWBM

joint forecast distributions be derived NDBM's coherent may and

DLM using a corresponding


Yt{k='$+AOt+k+lot+k+
et+k-, V'o; t-k

at +k -GC+k0ttk -

-1

+wt,

k+

t, k

1,0; --N[O+Wt, k' .'

Defining

,
Re, i-g&+iCcH'+1

-34-

Wt k ={w; 1} and Rt, k

is derived from the recursive relationships WW, k * wo =(1-APj)


Rt. -At H,

(4.9)

k+l=Ht+k,.

l(I

kFt+k)Rt.

t+k+1

At,

pit+ki =Rt. k k
8t_k

Vt-k - FC+k Rt, kF't_k)-1


=B -Ct_k

For univariate nt does of W&., t

series,

Ve-k

Fe-k Re

kFg.

is a scalar quantity.

Hence, the calculation

require matrix inversions.

THEOREM

4.3.

Given no missing observations, the NDBM {F, C, V, B} is coherent. PROOF


Let (0, ID, ) Define

$=CC C'+Wi+,=B-"CC C'a'-''


Q= CRC'+ WW+2=B-4CRC'B'-" I'=FRF'+Y Z=FRF'+Y.
is on the basis that is missing. In the

,
.2

Note that the calculation above relations,

of W,

jr,,,

for convenience. some subscripts are removed

Using the formal DLM relations,


ye+x y6+l ( at oe+l 09+2 IDS) " - N( FG 'm FGmt m, Gm,
2 G

Z FGRF' Y

FG2C, FGC& Ct

FGR FR GC1 R

FQ FRG' G 2C6 GR
Q

-35It can be seen that the posterior distribution Cmt+RF'1' But (oo+2I, Now Q-GRF'Y be incoherent. inequality will De) e+1, V C_1; Q-GRF'Y-PRC" showing that by the discount since IDe+1) (O, is of +i C +i=R-RF'C'-1FR!

1(fe+i-FGne);

'FRG'*B-'CC, _1C'B'-' However, not occur for the

principle Ct k= Ci_,

would 1, this the

6Vt,, 's defined

(4.9),

and the above procedure

be can extended

to establish

NDBM for DLMt Y, the the of and equivalence _k. The above theorem ensures the testability the DLM. However, given a starting distributions initial N VBMI's the of on the same lines with with F, C, V and B, particular

prior

[mo; Co; together

successive predictive NWBM's.

can be used to generate data sets following

The noted difference is that the DLM uses the set of equations (3.1) and (3.2) go is known while, the NWBM starts B. with [ma; Co, and the transition

assuming that uncertainty

is acknowledged

by the discount matrix

4.3.4.

Sequential

analysis of designed experiments


are often performed

:
and are subject to slow

Statistical movements variation. well

experiments

sequentially

as well as sharp changes, In such cases, static

perhaps due to some uncontrollable ) models are hardly justified for a sequential by Harrison.

sources of and may analysis of the

( non sequential DLM's

lead to false conclusions. characteristic

have been adopted of Nylon Polymer

quality problems

in the production

However,

already discussed regarding the W covariance these problems. For example, a

matrix

often arise. The NDBM's randomised sequential

overcome experiment

22 completely

by be can represented
Y; =A1. +Q,, `erj1 1i e i, j =1,2

.,

block 0,, the Qjjj the represents and represents effect where collective & YE9, time t. at =0 any stage with ji ii

treatments

effect

-36Now, contrast in order

to partition

the variation by partitioning

among the treatments, O,,, to 9s 1,93

an orthogonal

need to. be constructed

A.,, where Al. and j e 1 and 2 respectively

represents the block effect , AZ i and A,,, is the interaction ieffect.

and A3.& as the effects of treatment Usually this is performed as follows: treatments and their interaction

the effect in presence of both 'treatments 'and 2+ 1

sum of the

random error.
"C =8t. t -A2t -eat -e4I -Elt

ii-

main effect of treatment without treatment 1+

1x

sum of the terms with treatment

I- sum of the terms

random error.
Y121 -A16 +029 -031 -010+620

iii-

similarly,

for the main effect of treatment 2, we have


Y2l& -Q1 -029 +039 -41 +3j

The orthogonality condition suggests that


Y22L= +E4e +040 -029 -03t 1*

--'In

information the collecting above

with

Y'I =[Y11, YI2, Y21, Y22J,, an appropriate

NDBM may be {F, I, V, B} where


1111

F1 .1

11 -1 11 -1 1 -1

(4.10)

C is taken as the identity matrix to indicate a steady parameter evolution with time. 4.4. OTHER IMPORTANT SPECIAL CASES:

4.4.1.

The Modified

NDBM and changes in different characteristics of processes

In modelling discontinuities

it is intervention often advisable to operate a system that using or multiprocess models

-37-

protects arising

the information from

on unchanged that

components

against

unwanted This

interaction

effects and the of the

those components

have been disrupted.

is possible

discontinuities occurrence of

in the data need not require complete respecification such as Jaynes (1983).

by been has statisticians suggested often parameters as

DEFINITION Let (Oi_1lDi_1)--N[me_1; Ct_li ,G= diag { C1, C2,.,.,. C, }B= diag

{B1,BZ,.,.,. B, } {Rj j}t ,

be {C,., }, let C_1 be Ri the structures of partitioned and and of , _1 , for i, j = 1,2...... r. A modified NDBM is a NDB.M {F. C. Y, B, such that
(4.11

R; j=C, C;,j C'j,


The occurrence of sudden structural

for i* j

(4.12)

changes in the state of sequential

statistical

it be In is time processes, series may possible to classify the types of processes common. level, Such in into be and/or growth seasonal components. changes changes can change the increasing to by only corresponding components so that, uncertainty modelled components uncertainty is not effected. In DLM's, other

this is performed by increasing

for blocks. For NDBM's, the the the only vector, relevant error w,, state uncertainty of future uncertainty is controlled by the discount matrix. It can be observed from the

definition of NDBM, that the future uncertainty introduced to a particular block will be transmitted to other blocks through their correlation between them. The modified NDBM
is introduced to prevent this. intervention relevant Moreover, a major disturbance factor N, on a particular block may

be signaled with

using a discount

where N is chosen to age the This can be performed linear transformations

history effect of past even within blocks.

to that component this

by N periods. under

Although,

loses invariance

temporarily,

but enables to introduce

uncertainty

into a desired component of that block.

These ideasare usedin the examplesthat are given in Chapter 8 and Migon and Harrison
(1983) have applied modified NDBM's in their models.

-38The above definition That is, can be modified to include more general transition form of G as {G;, } and that of GC, _1G'=E C matrices. as {Eji}

given the partitioned

(4.11) and (4.12) are replaced by R,; =B 'Eis B, -' r and

R, -12

fori'

j.

4.4.2. Extended NDBM's


In, many applications an NDBM provides an adequate model but other applications This is particularly the rase hen C is singular and

may require a more general NWBM.

when high frequencies and some type of stochastic The extended NDBM is defined by the quintuple N[^-,; Ct-tJ , this defines

transfer responses are to be modelled

{F, C, Y, B, W}, where given (0c-, IDe-i) ,

(Y 1Oc)-N(FFOt; VV)

(OeIDc-i)-N[Gi

], R me-i;

where
- C Ri =Bt Often with in regression and design of- experiments a constant variance
It %

the variables are fairly of some ,

stable

be it advisable not may and

to subject their precision to an

exponential independent either static (4.10),

decay.

This may be the case with the example in 4.3.4, if the block effects are so that ) (A1, IDt_, )-Ne5, 95,, is unknown where u2 i; g With the design matrix and

or exchangeable

or subject to a very slow movement.

P defined by

this may be modelled using an extended NDBM with

F1=(FOJ

C=

0 (0,0,0,1)
0I,
4

1=b and W={W; 1} with W1.1=cr2its only non zero element.

-39-

Similarly in modelling correlated observation errors such as those generated by a (1-e1B)(1-e2B)St process vi = stationary second order autoregressive
1bt {(1,0.01; 0 10 b2 Fi+2

type models of ,

00C be preferred. may estimated

I;

0;

diag{{1,1. B}, W}

The only non zero element of W is 'W'22=V

and this can be easily models. For a particular errors see

(1971) for Zellner See line autoregressive also on .

EWR Generalised EWR to which considers stationary extension of Harrison and Akram (1983).

observation

-40-

4.5.. SUMMARY:
The'discount concept is introduced models. into Bayesian modelling and forecasting via a

NWBM general class of NDBM's attention , Modified

Some special important and the Extended Neat updating

and parsimonious are introduced. for the location

subclasses of Particular vector and

NDBM's

NDBM's formulas

is given to the NDBM's.

derived together matrices are scale _.

with their forecast distributions.

41 -

CHAPTER

FIVE

ON-LINE

VARIANCE

LEARNING

5.1. INTRODUCTION:
One of representation that the consequences of the conceptual differences between the Bayesian is

Markovian time as a series of a allows for a genuine

process and its non Bayesian formulation structure for the variance

the former

dynamic

V, of the of

observation Kalman

known be is to that assumed constant a often error v, and ARLMA techniques. of it VV is important is crucial models. for

in the formulations

Filtering

On-line Bayesian likelihoods intuitive

estimation but

a successful practical modelling

application governs

of the

forecasting

in multiprocess

since it

different the of

Experience has shown that

practitioners

have little

feeling for the size of this variance. Y.

It is often confused with the one step ahead

forecast variance

However for single constant NDBM cases, in the period of stability, w

Y' ( ) ii Theorem 6.4. is discount )lim (; is ; the V=11 /X; part of where the relationship ,
i=1

e-Z

factor associated with the iih parameter 0, with associated eigenvalue k,. If required the be V derive to may acknowledged and used marginal extra uncertainty associated with forecast distributions Bayesian manner. A number of approaches based on the idea of De Groot(1970) have been adopted for
estimating univariate the observation variance V. Smith (1977) has discussed the problem for

This is easily done since (0, .

V) is jointly estimated in a neat

steady state models.

The case of heavy tailed error distributions introduced

is given by

West (1982). The Bayesian Harrison (1983 c). Other

procedure non

in 5.2 can also be seen in Ameen and introduced by Harrison and

Bayesian

elaborations

Stevens(1975),

Harrison

and Pearce(1972) and Cantarelis

and Johnston(1983)

are briefly

-42-

reviewed and a generalisation of the latter is given in 5.3. A new procedure called the power law is given in 5.4 , Ameen and Harrison (1983 b). 5.2. THE BAYESIAN APPROACH:

It is assumed that the variance VV= Ibn -1 where d is unknown.


The observation distribution is

The posterior state distributions are


(5.2)

ID, -, with this Gamma pdf having a kernel

ie

)- r(

-t

rte

-t

(5.3)

22

exp{((*le_1/2)-1)log4b,

-i-(a,

-i/2)ee-t}

(5.4)

Defining the prior pdf's as


(0e (D eieee _ I )-N(Crs -i 1) i ehe _g (5.5

(,0t Da-ifo)"r where Ii represents and the information

(ae-i)2; ile-i)2) . in the posterior to prior

(5.6) transition

required

R, =Ht C_ H', posterior

are feasible functions respectively.

( such that (5.6) is well defined ) of the

parameters

The functions and X can play an important

role in both theory and applications.

A special choice is introduced later in Section 5.4. Other forms are dealt with in Ameen (1983 b). These are specific functions either defined through posterior entropies or like advertising awareness. However, it follows

accommodate some external information

from (5.1)-(5.6) and using Bayes theorem, that the recurrence relationships for m, C, i, k, (4.6)-(4.8) and are exactly as with the setting V=1 , and

-43t

(YeDe-tlie)"Yfet

41

(Ot ID,

-i,

bt )`

-1, 'wt+Ct6t -' .

ID,)-r(a,; 2; (4, '2) -n,


where distribution and ac =(ac_1)+cc' Yc ec As usual cc = i -c. The joint

( Yc+1,6&+1ID1 ) is readily case

obtained

and ( Yc_LIDc ) is derived by integrating

In the univariate 1 out .

Ye+i- ye+i
(1T-

to
t

the student t -distribution This

with T16degrees of freedom . elegant and is properly Bayesian. It is not easy to structure of V,

method is operationally

retain the elegancy when generalising is unknown

to many cases where the correlation Consequently practitioners

or where V1 is not a constant.

may prefer the

robust variance estimation

method discussed in 5.4

SHORT REVIEW: METHODS BAYESIAN NON 5.3. -A


In addition to the method mentioned in 5.2 ,a number of approaches Harrison have been

for estimating adopted fitting six point curve point by

the observation

variance

V.

and Pearce (1972) used a

data point. around each that of

Denoting the value of the curve at that VV x yt 20, by they chose the and

yt and assuming likelihood

J Vi N(0; where y1- yi P. Another method

maximum

estimate

proposed

Harrison

Stevens(1975)

assumes that

PSQ % L,

where L1,SS are the level and seasonality constant with C( say ) is

known Q the P, while constants proportionality are components and obtained from the median of a pre specified N ordered constants

corresponding

probabilities theoretically

data line information. the using on updated which are do not generalise easily. and profound Another

These methods are not method and

on-line estimation

based on the limiting

steady the state DLM's is suggested by Cantarelis properties of

Johnston (1983). This is described as follows

-44-

1-1 tt

) Ve-t;

1-a e

where a= lima, , a, being the adaptive coefficient.


A direct generalisation
" x r. e =t ,

of this method can be given as


t-1 1Z Ve_i (1-Je"e)ee

or more generally
1'

5.4.

THE POWER

LAW:
efficient and robust procedure can be described using the

A more general but simple, relationship Vj=(I-PFAj)Yj

For, a univariate

time series,

2 define d =(1- f1a, )e, . In parallel with the Bayesian

approach described in Section 5.2, the estimate of VV may be given by vt =Xe/tie

where Xj =Xt-i+de (5.7)

lit -*1e-1+1 Initially (Xo, -vla) may be chosen such that

(5.8)

Vo=X0/110 is a point estimate for Vo and rho is In

the accuracy expressed in terms of degrees of freedom or of equivalent observations. , the analysis of 5.2 it is seen that Vi IDi (4 J. =1/E Hence, if required,

forecasts can be

produced as in 5.2 using a student t-distribution may be wise to protect distributions, one simple O'Hagan effective

with -9, degrees of freedom. In practice it Outlierdisturbances. and major prone Using mixture d, distributions. However

the estimate from outliers (1979) can be introduced method

practical

is to define [4,6).

(1- f, ajmin(e,

KY } where in ,

general the constant

K belongs to the interval

In those cases in which it is

-45

suspected that

VV varies slowly over time a discount factor may be introduced by

(5.8) by (5.7) and replacing


Xi=13X_1+d2 9, and

'I1 -13l1e_1`1

This procedure is easily applied and experience with both pure time series and regression type models is encouraging. 0.95 <<1. choose then it is recommended dimension empirical of the state However, because of the skew distribution of di it is wise to

Further that

if the initial learning

prior of the parameter vector 0 is vague commences at time n+1 where n is the . with positive observations. an

variance .

vector
2b

In stock

control

, law V variance =ay,

with b=0.75

is often used. Stevens(1974).

An estimate

;, of a is then derived as
a, =ZZ/fig, where

Zt=Ze-t+desfYet .s

, It-07le-i+l V, V Future estimates of are; =aI{E(Yj+, +,, k ID, ))1'6

A more general procedure for accommodating stochastic scale parameters is as follows. See Ameen ( 1983 c). Let Oj be a scale parameter with posterior probability density function (pdf) at time t-1 given by (5.3) and prior pdf for time t, be given by (5.6). Moreover, let s& , me-1 -1) ID, (0t-, and f (0

I0, (YY ) be the f k the random variables with pdf's of modes and ,/
De-1) respectively. e

Define the link between YY, 0, and c, as follows


iYeOe) x 4tf, 1-6 (Ytlxt)f (Yilot) (5.9)

f (Qek1 De-t) ,

x 4i fl-m (ht IDi-i)/(oe

1Dt-i)

(5.10)

46 Combining ( 5.6) with' ( 5.9 (jrc, for is (0c, Dc_1) pdf Oc


1i )2 -ib (n

( ), 5.10 the approximate kernal of the posterior and

e
n)2-. b 4

_)2

1-b

! (itIse)/(AcDc-i)]

(1(re1Oc)/(oejDe-i)

. -.

). 2

'i/(fczc)/(keI

Di-t)I/(meI IDt_, )

De)] '/

1-. b

'(mcIDc)f

(01IDe)

Ain

),z
exp{-[ .

/(fase)/(k
(a 1)t2ln .. /( rac ID) t

1.6t/2}/

I-, b'(rnc

ti jDc) , Dc )/ IO,

where Mc is the posterior pdf

I Dt. Oc mode of

In comparison

with the approximate

posterior

s_

/(Oc', e`Dc)..

_'d ''

, 1' 2

'

r'}'

(OIDe)/

'(m

IDe)

we have

Ize)f (De-i)/I ID&)} (ye (ke (m& a -W(ae-t)+21n{I The formulas (5.9) and (5.10) are exact for normal random vectors and are in contrast with those of (5.1) and (5.5). However, the formulation above goes well beyond the exponential family of distributions and has the key for introducing a constructive dynamic evolution of location and scale parameters in generalised dynamic models.
., e. _ar

cr

-47-

5.5. SUMMARY:
A proper Bayesian on line estimation described in 5.2 .A procedure for the observation variance is

Bayesian techniques is given in 5.3. the non existing short review of

The power law is described in 5.4 Outlines for a general model , for which stochastic . is be given. accommodated, can scale parameters

; i'i ".

xv

-48

CHAPTER

SIX

LIMITING
6.1. INTRODUCTION:
There variance has been a continued interest

RESULTS

in deriving

limiting

values for the parameter DLM1's but

CC and the adaptive in solving Riccati

vector

a, associated with has restricted

observable constant progress.

the difficulty NDBM's the

equations

However for constant

{j, C, V, B} these values can be obtained directly. constant DL11's {j, C, V, W} NDBM's which have

Hence the results also apply to limiting, forecast distributions

set of

equivalent convergence parsimony,

to those of constant

These results are relevant . simplicity

to practice since and parametric DLM's which

is often fast and, in order to achieve conceptual previous efforts have been devoted to determining

constant

have limiting

forecast functions equivalent to those obtained by the application of EWR.

Harrison and Akram (1983) and Roberts and Harrison (1984). Similar models and the method of transforming from one similar model to another
are defined. Limiting for the state covariance matrix results CC and the adaptive vector. matrix C' and

for first models similar a, are stated then for general constant NDBM's.

to a model with a diagonal transition The limiting relationship

between the observations This leads to the

and the one step ahead prediction establishment of a relationship

errors is obtained

for NDBM's.

between the ARIMA

models and the constant NDBM's.

8.2. SIMILAR

MODELS

AND REPARAMETRISATION
objectives of theoretical

:
is to obtain unified

One of the desirable results that may

developments

be used in different representations,

fields of applications.

By looking

at the most In

meaningful NDBM's

economical

this eases the understanding

of practitioners.

this leads to canonical representations

of categorised models. their similarity

The properties of with the canonical within the class of

other more complicated models. Harrison

be studied through can models

discussed (1983) have Akram reparametrisations and

-49

DLM's.
DEFINITION
F FC FC2 A constant NWBM {F, C. V. H} is called observable if is of full rank.

FC"The observability
parameter

condition for NWBM's

is to ensure the estimability

of state

from finite time a number of observations. stage vectors at any

DEFINITION Two NWBM's M; ={F;, C;, V, H, } i=1,2


non singular transformation L such that

are said to be similar if there exists a

{F1L-1, LC1L-1, V, LII1L-1}={F3,

C2, V, H2}

The importance of finding similar models arise in practice since, real life problems
are rather complicated benefits, in this their `primary' physically form statistical meaningful formulation. relationships Apart among from the

computational primary

provides

variables

and those of the canonical

like growth,

level and seasonality

components.

THEOREM

6.1.
L= Ts' T

If Mi and M2 are two observable similar NWBM's then 1 where


Tj

PROOF
Since Ml and M2 are similar, it follows that F2=F1L and C2k=LCI 1, L-

-50k= 1,2,.,.,. {n minas 1}. This gives


F2 F2 C2 F1 F1 at L-1

F`^n-I

FiG'

i. e:
T2= TIC I

From observability

it follows that

T, and TZ are invertible L=T2-'T1

This gives .

The above result introduces the first functions eigenvalues specifically

similar

reparametrisations. #=L9

That

is if 0 is the state vector for the model M2. As forecast the

model then the reparametrisation are characterised

produces of P and

by the specifications

G and in particular canonical

of C plays an important useful in demonstrating

role in that

specification,

forms are

these ideas.

THEOREM

6.2.
3'/X1 <1; i=1,2 If the n. ....

be the eigenvalues of G, and 0< Let X1, X. X21 ... is NDBM {/, V, G, p1} observable, then constant lim{C, R, i', &}, ={C, R, Y, s} tx being C, R non singular. and uniquely exists, with

PROOF
From the NDBM results,

-V

Q, =C, -1Q, _1C-1+/,!

, tip

51

i-1

-e C, QoC-e QQ -' -o
Hence, using the assumptions 0< 3'/X <

pi C. -: f. JC-;
1, IimQ, c-= exist.

(s. 1)

=Q

To show that Q is positive definite, converges to zero, and


YY

consider (6.1) as i--x, the first term

Q= 8-0

" G. -' j, jG-i G/,.

-r3$C. 8-0

- u T. T., G-n

where T=j',

f',..,.. "-

From observability, To show that Q

T=G-(n-t)! is unique,

G '"-1f

RC"-Z /""""R "-1/'1 that there exists

is non singular. 8 such that

assume

s=C, -Esc-1+I, /.
Therefore,

s-Q=,
Successive applications have Z-Q Since, a, = R, j' unique. Ya-=C, Moreover, =O. Q5=C1-1V, V-', the limiting

-1(s-Q)c-,
C'-k(s-Q)C-k, and as k-x,

(6.2)
we

(6.2) gives f-Q=k of

R5=-'CCtC', forms for CC, R5, Yt : and a,

and

all exist and

Q _C, -iQC-i+

f, f =C-i f, y-i

,R

=R-1000'

(6.3)

Y= fRJ'+V

, s=Rf'Y

'=Cj'V-1

(6.4)

THEOREM
Let,

6.3.
n, =n, i=1 C=diag{C1, C2,... C, } and B=diag{(31I1, p2I2,... 43,1, } with

0<i3; <min{1, IX;, z lJ;, z 1;, Is}where 11 , Z1 ,... ,

are the eigenvalues of G;

dimension with n;.

-52-

If the constant NDBM {f, G, V, B} is observable, then lim{C, R, Y, a}t ={C, R, Y, a}


Moreover, C and R are non singular.

uniquely exists.

PROOF
The proof is similar to that of Theorem (6.2), knowing that the observability

of the model gives the observability

in each model component block.

In order to have some ensight into the sensitivity of these models, consider a DLMI
Yt = At + v, . Vt ^- NO; V}
N_0; W]

A =Ai_1 T wi

Given the prior (60IDa) -- N[mo; C0], it can be seen that the posterior state variance Cg and the -adaptive and C =A V= ((W2+4 WV)".- W)/2. Now, both A,, converge to C and A respectively coefficient

discount NDBM takethe the same prior settings and with consider a

factor as 1-((1+4 V/ W)`*- 1) W/(2 V). This guaranties that the NDBM and the DLM both have the same limiting variance C, _1 distribution. However, given any common posterior

DLM for is (W) A, the the adaptive coefficient at time t-1,


Aj(W)=1/(1+V1(CC_1+W))

is NDBM form the under while the alternative


Ae(R)=1/(1- V/CC_1)

NDBM-to the faster DLM the for than if t, CC the Therefore, >C converges all faster limit NDBM the for to t, then if Cj the limit. ` -However, <C converges all than the DLM. This generalises to higher -dimensions.

53 -

8.3. A COMMON

CANONICAL

REPRESENTATION

One of the most common and yet simple canonical forms for observable models have distinct C which with system matrices C=dia9{X1, X2,.,.,.X. } and I=11.1,1,..... following theorem holds eigenvalues , Kl, Xz,.,.,.k. that is

11. For such an observable NDBM the

THEOREM 6.4.
Let if , C, V, B} be a constant NDBM. where with 0<; /= < 'I'l I],

C=diag{X1, X2,... X. } and B=diag{I31, (3z......i3j i=1,2.... Xi u, =pi', , n all distinct. Then
" i) Jim&, =a=jal,
&"'o

1-u,
ui

a2,... a"j',

with

a, =(1-u.

)ll
f;

u2 ,

1-uI

ii)

limYt=Y=V/fl

u?

" limC1=C={cij},
&'s

1-uku, f
Uk Ui fi

"

1-u1uj
Up, uj

c; j=V(1-uiuj)fl
kj

iv)

lim WW =W= B-GCC'

B'-`'-

CCG' _ {w;1},

ell u; u1

PROOF
From Theorem 6.2 or 6.3, lim{C, R, Y, a}j={C,
: -z

R, Y, a} all exist, unique,

Moreover, 0, R non singular. are and

}=C=(I-af)R {c;;
Q_B''C'-'QC'-'B"+! '! , Q=C-'V={vr; }

(6.5)
(6.6)

-54-

a=a(n)=Q-i , where a'(n)=(al(n), From (6.6),


1 1- u' u Now, multiplying (6.7) by Q. gives

fI

(6.7)

az(n),... a. (n)J.

(6.8)

Qa(n)=I For n=l, (6.8) and (6.9) gives al(t)= For n2t2, i=1,2.... n ;' from (6.9), we have

(6.9

tO) t6.

i=1,2,.,.,.

(6.11)

Therefor,
w-1 awn)(i' 9. ai (n)y4..

substituting

for a. (n) in the firs n-1 equations of (6.11),


*-1 Qww 4ij ' TII 9, w

ah(n)=1 j-1 9ww'4w

This gives
'Si 1-s-t u

yjjaj(n)=1 1-u1 u.

(6.12)

Since a (n) is unique and (6.11) is true for all n,


1-u1u
ah(n)=-aj(n-1); uI 1 =1,2, ",.,. a-1

1- us

-55-

(i). (6.10) This proves together with ,

ii)
From (6.4) V=Y-JRJ'

=(1-js)Y

(1'=t

ajY

nn

but from ( i) it follows that

V a, = I- [-[ u?

iii)

From (6.5) we have esj =(e; j /uj u, )-a; a Y =ujuja1e1Y/(1-ujuj)

(ii) (i) from follows and the result iv) Easily derived from the definition of W

COROLLARY

6.4.1.
for all k then (i) reduces to the EWR result of Dobbie (1963).

If k =0

The theorem is of practical interest mainly for periodic models with distinct For inside the lying unit circle. a real observation series a or on complex eigenvalues limiting be the C adopted and corresponding values would similar model with real
derived. are easily For example, if G=,, cosw -sinw sines cosw an alternative NDBM can be

G= with considered

`V 0e

0i

A more general procedure for finding the adaptive coefficientscan be deduced


following the using

:.
THEOREM 8.5.

-56-

Let if A= CF' Y-'.

G, Y, H} be a constant , Then the , two

NWBM

with limC, = C, e-= (I -AF

non singular, )H and H-'

and have

transformations

identical characteristic If H=l,

polynomials. and C-`, and (I-AF)C and PC-' have

then (3-(I-AF)C polynomials.

identical characteristic

PROOF:
Since limC1= 0 is non singular. from the N VBM properties, we have

C=([-AP)R

R=UCH'

This gives
CH'-10-I= (I-AF)H (6.13)

The result follows from (6.13). The above results can be used to calculate the limiting adaptive coefficients for its NDBM if state covariance matrix converges to a non singular any observable limit. In particular, for {f, G, V, 3I} NDBM's such that 0<
NDBM's

2 /x; <1,

where

in following C the the as of are eigenvalues in used practice

that are commonly

apart from the one given in Theorem 6.4.

COROLLARY

0.5.1.
has Jordan form and 0<< J(X) where, a X2.

If /=[1,0,0,... 0] and G=J(X) Then


AG Xa, +a1+i= s where p= A s'=[al,

(n)p

i=1,2,... n ,

a2,... a,,; and a. +1=0.

-57-

PROOF
It is easily seen that the above NDBM is observable. Since 0<0<
(6.2), Theorem using lim{C, a}, ={C, a} both uniquely exist. Moreover

X2,
C is non

singular.
From Theorem (6.5) det(G H=-lG for all r.

-i1)det((I-aj)G-il)

This gives
tx-r1 -r2 1000 a1..

det{
-r. 000a

}=(/a-=

where, a=x-z

and r; =Xa; +a; +1, a. +1=0, i=1,2,... n. Hence,


fite i-0 (P-Q)w

(6.14)

The result follows from the comparison of the coefficients of each power of a in this equation. COROLLARY 6.5.2.
matrix with entries 1 and 0<p

IF f =[1,0,0,.,. 0] and a is an upper triangular

<1,

then
i=1,2,.,.,. i-: n.

ai=

PROOF Following the steps as in the proof of Corollary 6.5.1, the alternative form of (6.14) in the variable x is

58.

(1-z)*

-a1(1-z)*-'+a3S(1-z)-2t...

+(-1)"a.

z*-1"((3-z)'

(6.15)

Writting

this in powers of (1 -x)

and collecting the terms in the coefficient of

(1-z)', from both sides of (6.15), gives


itk
k k-0

-1

)actk=

()l-l)

_n

i=1,2....

a.

The values of ai can be found successively from the above equation.

COROLLARY
In Corollary

6.5.3.
6.5.2, if j is replaced by /=1.1.1......
=1. ......

1. then

ai_ i

F'

PROOF
Similar calculations In a. )+ (-1)kzk(1-z)w-kQk-(-z)e give the alternative form of (6.15) as

The result follows from comparison with the terms of


(13-zY= 3(1-z)-z(1-)l*-

-lk k,
k-0

ik

COROLLARY

6.5.4.
010, Jwith 0<R<1, then a=1+13' and a; =0

If I= [1,0,0......0] C= , ; otherwise. PROOF

Similar calculations shows that the alternative form of (6.15) is


z"+a2Z"-1+... +a,, z+a, -1-zA+0 0

The result follows.

59

Now,

( distinct X. that X1, X2,.,.,. not necessary given NDBM, the restrictions Denoting

) are the eigenvalues of C

for a constant proper

on B ensures the existence of limC, =C as a e-s the eigenvalues of CR-`C or equivalently

covariance

matrix.

(I -a f) C by p, ;i=1,2,.,.,.

n we have the following

.' THEOREM

6.6.
any constant observable NWBM {f . C. V, H . for which limCC =C e-=

Given

is positive definite,
nn

B)ye-

B)e,. =0

backward B is the where


error.

shift operator

and ei is the one step ahead forecast

PROOF.
Since limCC =0 is positive definite, lim{R, a}1={R, a} exists and R is non

singular.
Let p,, i=1,2,.. n be the eigenvalues of I=CR-1C=(I-af )C.

(4.1) A direct application of the Bayes theorem in updating (4.3) using , with univariate observations, gives as t ---,
wt= Gw-1 aee (6.16)

Cm1-11. V 1j'y,

or
(6.17)

mi =xmt-1-+'sye

From (6.16) and the identity

et,+1-

yt+t

-f

Cm%

we have

ee+i-ye+i-fC(I-BC)-ise,
Hence,

-60-

(1+BIC(I-BC)-1a)et+i-yeti The same identity with (6.17) gives, e +jye+j-/C(I-B%)i . yt or

(6.18)

ee+i=(1-BfG(I-B%)-la)Y41
n

(6.19)

Nte

that,

det(I-BC)

and

det(I-B%)

are

fl(1-7,

B)

and

fJ(1-p;

B)

respectively. (6.18) and (6.19) can be rewritten as

PI(B)ee+t-

n fl (1-XiB)yi+t : -1

(6.20)

(1-P. B)re+t-P2(B)yj+1 : -i

(6.21)

Where P1(B) andP3(B)

are polynomials of degree n in B.

From ( 6.20) and (6.21),


n

H(1-X1B)

II (1-P1B); --Pi(B)P2(B)

The result follows using the factorisation This result result obtained

theorem. (1976) and hence the same through special DLM The

includes the EWR result of McKenzie by Godolphin and Harrison

(1975)

formulations. Normality using

These are obtained assumption

for scalar discount matrices.

That is B=I.

can be relaxed since (2.16) and (2.17) can also be obtained unbiased models. linear estimation. Hence the results may be

minimum

variance

Normal beyond the extended

8.4. A GENERAL
Any constant observable NWBM

LIMITING
constant M={f,

THEOREM :
NWBM {/, G, V, B} , is similar to the canonical

G, V, B}

where

writting

B=diag{(31, 2,.,.,. %} ,

-61-

j=

0) and H= [h,, J such that

h. =-h,.:. and 0<,

1=X1/(R1)"=u1

if

i=1,2,...

n.

h,

j=0

otherwise,

', ;

<

j =! a,, with

a,, =ui

for jzi

and

a;j =0 otherwise.

It follows from Theorem

6.3,

that

limC= =C=

Q-i V exists where the precision

Liaponov be to the give rearranged recursion can

equation

H'Q_QH-1+g,

I'I
of Q=(9,, } :.

This allows an easy sequential term by term evaluation


2 Ut 4t1= 2i 412= 9t1

ui -1)

(uiu2-1)

q1,: qt; =ulu;

-i

_1 (u; ul-1)

for

i>2

and
qi-1, ui k 5i, k

Uk

where
Sr, k `qjj

k=9.,

k'

and

i-i

It follows that

i)

C=Q-'V
AA

ii)

s=Q-lj'

a1=1-

flu; : -i

and a. =(-1),

+i(1-al)fl(uu, ;. 1

-I)

Va

Y=
(1-41)

fl =V u?
i

-I

-62-

iv)

W=HCH'-CCC'

B.S. RELATIONS

WITH ARIMA MODELS :

Let Y, be a random time series generated according to an ARLMA model


nn

B) Y,

1< I <1 0 <Ix; <Ip; where -1 ,0 ,i=1.2


E and (12 a, a( .k=0 for all k>0.

is E Eat2 0. n and that at such = a, = ......


The appropriate error e', Box-Jenkins (1970) predictor

replaces a, by the one step ahead prediction

and it is well known that

lime', =a,. Applying the appropriate Dynamic Model ; j. C. v..: to the realised series

(1-piB)et}=O
j-X i=1 i-l

Hence limle, -eI c-z

=0

and with probability

one , the limiting

Box-Jenkins forecast

function is equivalent to that of the Dynamic Model. For an unbalanced ARLMA process
pv fl(1-A1B)yt i-i fl (1-P: B)ae = i-t

Let n=

{p, max

q}.

Then given any e>0 if n= ,

(or p

n=q

) by taking p-q of

(or the p i'8

q-p of the X; 'a) approximately

I Ie, is lim to close <E zero, assumed. - e', es

Thus, in the sense of limiting modelled by constant NWBM's.

forecast functions,

all ARIMA

processes can be

In fact if the limiting

posterior state variance is

taken as the original prior variance then, the forecast functions can be identical to that of ARIMA models all the way through the sequential analysis. However, as

informations NWBM in a sensible way. This the parameter stated earlier, provides simplifies explaining and controlling the process and models behaviour.

63 -

6.6. SUMMARY
This chapter

:
is concerned with the derivation matrices, some well of some interesting limiting

results regarding the posterior parsimonious representations. transfer

covariance using a simple

the adaptive coefficients and the known and simple within canonical similar the The

functions

In particular

transformation

procedure the forecasting

in 6.2. discussed is models adaptive link with 6.5. vector, precision

Limiting

results regarding matrices

variance.

and covariance

are given in 6.3 and 6.4.

Box-Jenkins

ARI. %1A models in terms of forecast functions

is discussed in

rr

CHAPTER

SEVEN

MULTIPROCESS

MODELS

WITH CUSUMS

7.1.

INTRODUCTION

:
data sets are based on the assumptions collected and well behaved. properties that the input in practice, Often the

Many: analyses of statistical data is free from exceptions, it is hard to believe that data contains missing

properly

However,

all these smoothness outliers that

can be guaranteed.

values,

and sudden structural

changes in the process

behaviour. procedures pointed

It is then believed causes model

the occurrence of any of these events in sequential and damages the available prior information as In This various occur

breakdown

out by Jaynes(1983). the principle of

These events call for model revision and amendments. ' Management producing by Exception routine ' is widely applied. by

forecasting, constitutes

mathematical

methods

forecasts

required

decision makers.

These forecasts

are acted upon unless exceptional

circumstances

due either to the anticipation information, (see Harrison

of a major

change arising from the use of reliable market (1967)) Harrison and or, to the occurrence of

and Scott(1965)

some unforeseen change in the pattern and consequently a model breakdown.

of demand which causes unusual forecasting errors A flowchart of the principle is given in Fig. 7.2.

A Management

by Exception

Forecasting

System (Fig. 2)

Regular Data

Routine Mathematical Forecasting Method


Interventioniby Exception Error Control scheme ( e.g.

Market Information

'*vAARKETDEPARTMENT: ', information to provide forecasting system. routine Vet forecasts and issue USER, DEPARTMENTS: e.g. Stock Control, Production planning and purchasing systems. Market planning, budgeting and control

Exception Signals

Forecasts

,.

-66In this chapter, efficient statistical models are introduced to deal automatically with exceptions. Ameen and Harrison (1983 c). Section 2 reviews the historical background and

j$;;;

developments. The backward Cumulative Sum (CUSUM) Statistic is reviewed in Section


3. The Multiprocess model approach of Harrison and Stevens is reviewed in the light of

discounting in Section 4. In Section 5, the ideas from the backward CUSUM and the
multiprocess combined to models of Harrison provide both Stevens together and and with the Modified NDB. %I's are models called

economical

efficient

multiprocess

Multiprocess Models with CUSUM1's. These eliminate many unnecessary computations


involved in the existing multiprocess unchanged structurally models and protect prior information on components

in other components, when changes occur

Ameen(1983 a).

7.2. HISTORICAL
Woodward

BACKGROUND

AND DEVELOPMENTS

:
tests to detect

and Goldsmith

(1964) have employed Backward

CUSUM

in demand. unanticipated_ changes Ewan and Kemp(1960) controlling

The procedure is given by Page (1954), Barnard(1959), Harrison and Davies(1964) used CUSUM's for

and Ewan(1963).

routine forecasts of product

demand and provided simple recursion formulas to -

details in Section More 3. These data on the reviewed are storage problems. reduce CUSUM statistic can be found in Van Dobben De Bruyn (1968) and Bissell(1969). (For Wald(1947)). tests see general sequential Previously having detected a change,, ad hoc intervention procedures were applied. The first routine computer forecasting systems for stock control and production planning, linear ( Holts ) Moving Averages Weighted EWMA Exponential growth and employed limiting forecasting All the methods used model, with or without seasonal components. The behaved data. long history occurrence of well predictors which assume a reasonably that, means change a major of in some respects, the current data does not reflect a well

behaved process and that there is greater uncertainty than usual about the future. Hence the next data points will be very informative in removing much of this increased

the by limiting be be than they allocated would given more weight uncertainty and should
I

-67-

predictor.

For example, consider the use of EW.MA with forecast function


Ft(k)=mi where ,

and

e, =y, -'n,

-t

a, =0.2

This may be written as


mt=0.8mt_t-0.2yt

for demand is the period t. observed where y, Suppose that in the limiting that mi = 100 with department case, the variance variance V(c1) = Var ( f1IDt_1 )= of a CUSUM 125 and signal.

an associated

of 20. As a result by stating

Marketing

intervene to may wish

that their best estimate of the

level is not now 100 but 150 and that their variance associated with this estimate is not 25 but 300. In the past there was no formal based upon the assumptions way of dealing with this. derived of stationarity Classical time

series methods Typically

are inappropriate.

introduce done to was what was 0.2. One procedure put

a change in the adaptive coefficient a which

here is originally

r* 9-i/10 a'+'0.2

if i=1,2,.. 6 if i>6

This approach is not very satisfactory nor does it generalise well in dealing with other kinds of change. The DLM's of Harrison and Stevens(1971,1976) and the NWBMI's introduced in Chapter 4 provide a formal way of combining subjective judgements and data. In the DLM is the adopted example above
Y9=99+v9 ; vi--N[0; 100]

0t 0j_l+w$

w, -N(we+W

-681,, A, represents the underlying market limit (Ai ID, )-N(m,; 20j. The limiting level at time t, W' =5 recurrence relationship IID, and usually w, = 0. In the is mj =m, _t+0.2e, and the

limiting provides

forecast distribution one step ahead the limiting point forecasts.

is (YY;

)-N(mi;

125j. hence the EWMA market forecast the

In the

example 280i. (YY,, Now

(B, ID, )-N(100; 201, the the one step ahead

information distribution

is communicated is not

as we+l-N50;

(Yt_, ID, )-N"100; 1251 but

400!. 1IDj)--N(150;

Immediately .

recursive equation departs from its limit future interventions limiting

and becomes mt+1=m, -50-0.75et., coefficient

Provided to its with

do not occur the adaptive Note that

a, +; returns fairly quickly

value of 0.2.

the same results can be achieved using an NDBM time its value is reduced to ii

discount factor =0.8 where at the intervention adjusting the state prior mean from 100 to 150. related works are those of Kalman

=0.066 with

Other

(1963),

Smith(1979)

and Harrison

and

Akram (1983).

Bayesian forecasting provides a means of dealing with specified types of major deals forecasting forms These that the change modelled with of. are so system changes.,,
them in a prescribed way. the initial implementation of the resulting multiprocess models

is described in Harrison and Stevens (1971,1975,1976). In addition they involve to the limitations drawbacks and

These are reviewed in Section 4. in Chapter 3,

of single DLM's mentioned and

unnecessary computations. Kidney transplants.

Smith

West (1983) have applied the models to steady

these state

models

to monitoring

Restricting

Normal to these non are generalised processes,

models by Souza(1981). Limited

is success

(1983) Smith in Gathercole the by reducing computation and achieved redundant Makov(1983). models according to some pre the specified rules.

by removing efforts attempt see

For another

In general

practice

development

and existence of these

methods

replaced the control chart techniques.

7.3. THE BACKWARD

CUSUM

:
6

-69-

Control
departures

charts provide

simple

and effective tools for detecting


valuable

changes and
control. In fact Further

from specific target

values and are particularly in detecting

in quality

Page(1954) used Cumulative

Sum charts

changes in process level. of these. changes.

they can be used for detecting developments

the amount

and direction

Length Run Average increase to the this topic the sensitivity use and of on Woodward and Goldsmith(1964) and Ewan value, the

Barnard(1959), in found be the signals can of and Kemp(1960). CUSUM statistic Given

T the and as a target observed value as process yt where e, =y, - T. Then choosing

S, is defined for each time t as Ytvisual , S, is Normal inspection with zero mean.

Since (e IDi)-N[O. constants Lo and a,

two positive

can be carried out with

the graph of (t, S, ) and

from hole V-shaped the and placed a piece of cardboard on cut out graph with the a using vertex point. V-mask the of pointed horizontally with a distance (Lo/a)+1 from the leading No change is and Davies The target

The edges of the V-mask

being apart with angle 241 where tan4, =a. , curve remains inside the V-mask. for monitoring forecasts of-product Harrison demand.

CUSUM long the as signaled as (1964 ) developed the method

is the one step ahead point value forecast errors. ahead economical algorithm Define

forecasts so that the e, series is that of the one step simple and

In order to reduce computer storage, a conventionally was employed.

ae+1-min{Lo, at}+a-ce+i/Ye+,

",

and

(7.1)

min(L

}+ate

kt+l 4

Yt+1 forecast A is is if the ahead variance. step change signaled one and only if where
Initially 0. 4g} < min{a,, lines. guide used as THEOREM 7.1. do=dO=Lo. In choosing Lo and a, the following facts may be

Given that the V-mask has not signaled a change at time t, for time t+l,

-70-

i)
x
11.

A change will be signaled, if


(ec+i/k4+t I>Lo+a

(7.3)

.3

11)

Change will not be signaled if ,


'<a

(7.4)

PROOF i) From (7.1) ,


ac+i=min{La+a-e, l/Yt+1'; a -a-ee+i' Ye-i }

Hence " given'(7.3), ai-

< 0.

ii)
^F

Substituting (7.4) in each of (7.1) and (7.2), it follows that a, >0 +l >0. ,
;

and d, +i

4i

7.4. ; NORMAL

WEIGHTED

BAYESIAN as

MULTIPROCESS proposed by

MODELS Harrison and

: Stevens

DLM The ` -'- multiprocess ,' models

(1971,1975,1976) are reviewed in the light of the NWBM's introduced in Chapter 3.


The s; et'' {(MI', tl),:,... (MIN), r(N))} for i=1,2,. is a multiprocess C(`), VI`I, II'1 NWBM with N model

com Sonents such that: p

'.. N " A`)={F(`),

represents a NWBM

where
N

P('); LOis the posterior probability of model i at time t, the model transition probability vector such that

P, ')=1

i, ,

l'1=(aitl,.... 4j

is

'Tkjl'1=PAMIMI'il

j,

that is the

time k the t-1 that t time that at operational model given operates model at probability in 's known 's M') Initially the that the M{'l. practice are assume p(') and was , although let At ') be N t-1 time 's there v posterior the on-line. conditional are estimated , distributions forecasts are : (YIM'
t-1, ' M'

(0,

_)1, Dr_1)--N[mt_1('); Ct_1(')] . The N2 conditional one step ahead _1IM,

D_1)-N[j,

(ii)

71 -

where
`(ii)_F(i)C(i)wat_1(`) and it, ('i)_F(i)RI(i)F, (il, y (i) , with

(i) R(1i) = 8t Ci-(I) ig9

li) ,

The one step ahead forecast distribution


Y

is expressed as a mixture
.V
)

of N2 Tormals

Ye De-t)'_''

PC

-i

(i) "V f. (ij); f, W) 7*

Also. t,

N posterior given

models at time t-1 , ,V2 prior models are produced for time the .V2 posterior models

for which given the data at time t and using Bayes theorem,

for time t are : (i)+Mi-i+)+De)"N(rsli)C("")1 IM, (Ot ; id =1,2,... N

where

el

Ae(+)=81
(ii)_

(li)d-t )
('1)

11-1,

likelihood the : and given

(')1, D, ) L(MM')IM,

yg(a)I I

))-ie(1)} exp {-'fice(): e(!

the associated N2 posterior probabilities Pe() xD

are :

I)n1i)Pt-1

In practice, in order to keep the computations manageable the same collapsing , Stevens Harrison is defined by used to complete the cycle. calculating and as procedure
as

-72N

Ptti) i-i
N

p9(+i)

Cti)_

N P+i)iC`a)+(me")__ . =t

As in partitioned

Harrison

and

Stevens(1976) in which ' ,,. equal to

models,

the

NWB

multiprocess

models are

into Class I models, but

there is no transition zero,

between models, that is in which such

r` has all - elements transitions

Class II models, and The former tests.

exists and models operate

interactively.

class is used on-line for Class II models are used models. As explained

model discrimination, for modelling in Chapter multiprocess applications. as outliers cases slope Brown(1983) generally

model estimation

and hypothesis

some prescribed

types of disturbances DLM's

and alternative

4, all the normal normal DLM

can be formulated can be

as NWBM's. as

In this sense, the NWBM such

applications

counted

multiprocess

The former

has worked

well in analysing

processes with disturbances components. However,

in level and seasonality and sharp changes changes have not been modelled

in many

as successfully This is largely variation

as would

be desired.

also commented

on this problem.

because such changes are

small compared

to both the random identified (1964)

level. in to process and changes Smith difficulty and in

Hence slope changes are sometimes Cook(1980), distinguishing Harrison and Davies

as a series of level changes. also commented analysis. on the

level and slope changes in CUSUM

A further

criticism

of these

These involve is that they problems are multiprocess models unnecessary computations. Class by introducing of a new class of multiprocess models using a combination overcome I and Class II models with the Backward CUSUM statistic as a control device for shifting model operations-from Class I to Class II models. Class I is retained when one of the
probability limit.

11 threshold Class some prespecified attains members of

-73-

7.5. MULTIPROCESS

MODELS WITH

CUSUMS :
there is a preferred model tif(1) called the .
The analogy with quality in an expected way. control is

In most multiprocess Class II applications,


` mother that M' ' model by Gathercole and Smith

(1983).

describes the data as long as it is behaving

The other In

i>_2 M'l; model some generally significant models , particular outliers or mavericks and significant

type of departure

from the norm.

changes in the trend are often modelled.

In the new approach the mother model M(l) is represented by a NWBM {j, C, V, H}i
This model forecasts which produces are used unless a departure from normal is

CUSUM by the scheme which operates signaled Then, starting with the latest observation which

on the one- step-ahead forecast errors. helped to trigger in a multiprocess the signal, the other with a

models are applied. high probability posterior MP)

All the models then operate to the ' mother'

Class II way,

of transition Pt'

model ( NO

), until such a time that When this happens,

the

probability

of model MD) exceeds a given value.

model is

begins to operate alone and the CUSUM although

scheme is reset.

When one model

operating,

based forecasts upon model AP), are all phase.

the competing

models are

being prepared in readiness for the multiprocess

and

in C3} C=diag{C1, C1 represents a trend component For example which consider , d Cz a seasonal component. Let model major changes in trend, .1M3 model and model M4 model major changes in seasonality. vectors respectively. For M 'l

outliers,

Let O1 and Oz, be the trend and seasonal parameter

i=1,2,3,4, let the posterior state distributions at time t, be given as


W oz ee ms(s) CW C. s(: W ) 0M

The priors for time t+1 are then formed from the posteriors as follows: r02 lG1MIW8 ('))--N[ (D eMe
e+i
1(x) 83 (x)

+ (0) U) () G zs R. R2 wa

-74-

where $k'=
I,. " " -, Cl, Ck(1) C'k/P( k1)

k =1,2 ;and

2 if k=1 l=4 1 if k=2

and i=2 and i=4

otherwise

R3$)

= C1 C'311)C'2"(01

02

0<01

)<011)

=0i3)i4)

<1

<z <<4', -R -Rz <1


The general . characterising mother principle is to derive the marginal prior for the parameter block

({)

(2)-

(3)

the change,

from the posterior

but to take all other information on other components Otherwise

from the

model., This aims at keeping the information

as stable as

possible in order. to, prepare good estimates structure estimates between the model components

for the changes. might produce

the covariance in the model

violent

fluctuations

of the presumed stable

components.

In addition so that )P .1

a set of preparatory

probabilities

is calculated using Bayes theorem, p' x L(ygIAf

When the CUSUM

signals

a change all these preparatory phase. However,

values are used as starting other than In

values for the NWB multiprocess that marginal characterising order to exercise control

generally all information

the change continues to be taken from the mother model. over the response of models to exceptional for alternative events,

a guard

procedure on the observation by choosing models-Ai')

variance

models is used. This is performed for the and 0, V(3)

V'3) and (') ; i>1 i> 1, are equal.

that so , P& example,

the one-step-ahead forecast variances with prespecified i1) , ll , The outlier (2)

Le: given B(i),

01), R(1) and 12) can be calculated. (2) and '. This gives j(3)= Yi2)

variance

i= Yip)j&1 be v3 that then can set so

-75'). Y4')= ]

Since defining r, =f
(

, C1 C;;,
(4) ) -1

(') can be chosen so that

In particular,

&_1C',

I ';,
(1) (4)

i=1,2 and r12= f1 CI C1,,, _1C'2J'2,


) -'s r12+ I)_7 -9'(i (1) P2 (1) - )12-'-(P2 Z1)

we have
1r2+ )

r2+2(1

This is a second degree equation and can be solved to ((3(4))-'.

This operates when

during If first the single model phase, the CUSUM required signals a change. the Chapter described in 5. be V methods using either on-line estimated may variance During the multiprocess phase the forecast distribution
the mixture function of Normals.

is often multimodal being


Normal loss

Point forecasts are then derived using a conjugate Further

introduced

by Lindley (1976).

discussion of the use of such loss functions

Smith, Harrison in found be can

Harrison Smith (1980). Zeeman(1981) and and and

-767.8. SUMMARY :
The principle forecasting the of of Management systems by Exception in Section is discussed and a historical 2. The backward multiprocess Finally CUSUM background statistic is in

is given

NWB Section Section in in 3 4, the and reviewed the light Harrison of and Stevens multiprocess

models are introduced

models.

all the above ideas are

Section in 5 to give multiprocess combined some detail for a linear growth

models with CUSUMIS and this is explained in

seasonal model.

t,

,.:

", z

-77-

CHAPTER

EIGHT

APPLICATIONS

8.1. INTRODUCTION
This situations. since any chapter

:
of the developed theory using the principle can be decomposed vectors. with This in a variety of

is devoted to applications

The NDBM's multivariate

{f, G, VV, BJ are constructed Normal random vector Norniai G,

of superposition into a linear

combination

of component

multivariate block

random

suggests that model

C =diag{C1, C2,.,.,. C, } where- the component. proper Accordingly, /=(jl,

is associated

a meaningful

I2,.,.,. f, 1 and B=diag{131I1, 212...... , I, } each with G; =X1 's being distinct X, the , eigenvalues is, fora are pair

dimensionality. of C.

This includes the case where

eigenvalues concerned, of conjugate G_r

But for real observation

processes,

where complex

it is usual to consider conjugate

pairs in the same block. of multiplicity one,

That

(Xeiv eigenvalues complex 'ke-")

the adopted

form is

1, sines cosw This could represent a damped sine wave of period 21 and would L sines cosw W have a single associated discount factor 00<0The discount criterion. factors used

typically throughout for further

are not chosen according to any optimisation Trigg here. research Leach(1967) and

There might be room to redefine Brown's

have attempted

discount

factor as specific functions of sign and absolute one step ahead error forecasts. with 0<P<1,
This gives

For an EWR {F=1, G=1,0},


adaptive coefficient is 1-. a,

it is easily seen that in the limit,


s-i m1=mo+with s-o (1_)k

the

as

the

weight

corresponding

to a data

point

that

is k periods

old. N-1.

Comparing

the average

N in data the period moving an age of

average,

(1/N) 2 (N-1-i)=(N-1)/2, . -0

to the

(1-)Ti3', age average +-0

based on the above EWR

model,

Montgomery

and Johnson

-78(1976) have obtained the relationship =(N1) /(N+1). Using this relation,

Agnew (1982) suggested that 0.33 <_13 0.78 Clearly, such low values of give highly .
adaptive unsuitable encouraging models with large lead-. time forecast variances , which would be totally more where data

for such purposes as stock control suggestion is that of Harrison That is

and production

planning.

A rather 3N-1 34V-1

(1983) Johnston and given by =

N represents age to half effect. point to halve in value. Apart from

,N

is the time for the weight of a particular

This leads to higher li values. periods. the discount factors here are chosen more

the discontinuity

close to I such that

the more stable the component

the closer its discount factor is to 1.

Experience shows model robustness against this choice. In modelling protect model discontinuity periods, the Modified from NDBM 's are used in order to

components

information

unwanted

interactions

and the guard

is described in 7.5 the employed. variance observation, procedure on

For a straightforward

application of a single NDBM to a data series which exhibits US data the air passengers see set which is analysed in

no major changes and outliers,

Chapter 2. Other selected series considered here , are : i) A simulated seasonal series with trend, level, seasonal changes , outliers and in is This the to the these performance examine phase of missing observations. discontinuities and major changes knowing the true underlying model. The

data is analysed using both intervention and multiprocess NDBM 's.


ii) In order to test the performance of the multiprocess multiprocessor, data set concerning a medical NDBM 's vis the CUSUM charges is chosen.

prescription

'The data was previously

Harrison-Stevens using analysed

multiprocess models.

iii)

For a typical data set with an unknown and variable observation variance, the Road Death Series is chosen and the CUSUM multiprocessor is applied with an the variance. observation of estimation on-line

-79-

All the data sets are provided in the appendix.

8.2. SIMULATED

SERIES

In order to examine model performance in phase of major changes and impulses with
a minimum risk of misspecifications, artificial data is generated. The data is analysed

Intervention. both an automatic method and using using For an automatic way of dealing with these changes, a multiprocess model is used.
Automation from in analysing point statistical of view. data sets may not be a desirable property However. multiprocess to aim for

a Bayesian

models have a wide range of

applications

in areas other than prediction and classification Data the of

of future outcomes and are especially valuable

in the detection

in different types changes process components. of of

8.2.1.

Simulation

The artificial

data is simulated by the superposition of three component series. =rwl, w3Jt and

These are an independent random noise v,, a linear trend component r'1

by harmonic '2,9=(W31w419 The 12. represented a single of period component cyclic 0 a , is carried out using the model : simulation Yt=f O'+v'

0= [ cOS Cz= cwith ,

where

C=diap{C1,

C21

C1= I

1,,

v, -N[0; 400], w', =[w'l, i, w'2 0'a=[75,9,85,11.

]-N[O; diag{13.175,0.017,1.14,1.14}) and

Accordingly,

a series of 120 monthly observations is generated and the following

major impulses and events are imposed:

i)

200 is subtracted from the intermediate simulate an outlier;

observation at t=32,

in order to

-80ii) immediately after t=36


to give a jump iii) following t=60, the linear growth is reversed in sign from roughly 8 units per

deseasonalised ' ' level is reduced by 270 the process ,

period to -8 , giving a slope change ;

iv)

following t=85

linear growth is again reversed in sign and simultaneously the ,


is increased by 50 and 115 are eliminated ; to give a period of three missing

the seasonal amplitude v) data points observations. 113,114

8.2.2.1 INTERVENTION:
Intervention involves changing a routine or existing probability model often by

introducing through

subjective information. functions

In classical time series, interventions are specified' In Bayesian Dynamic Models, distributions which not only

transfer

( Box and Tiao (1975)). transfer probability

intervention

is achieved through

introduce an expected effect but also introduce an extra uncertainty associated with the change. The object of structuring a model, is to enable changes to be made to particularmodel components in such a way that leaves other components largely unaffected In the following example, a useful way in which additional uncertainty can be specified through the discount factors is illustrated. role of the state random noise w,. V B, } is applied with f For the data simulated in 8.2.1, the NDBM {J, C, VV, ,G and from Apart intervention s, 3, i34}9. times B=diag{i31, defined at of there and as
values 1=s=0.9 and 3=i=0.95 for `optimal' No are used. attempts discount factors.

As explained earlier, the discount factors replace the

the

to improve model performance are rounded

are made by looking figures thought

The values chosen,

to be appropriate

bearing in mind that it is usually preferable to err on , factors, Harrison (1967). Initially , the same When the

the : side, of ' underestimating starting major

discount

but for go with a vague covariance are adopted values it is that (v) (i) to to assumed occur about are changes

matrix

2000 I.

the type of forthcoming

81

is no available information but known there is that event

on the size or even the sign of the updating procedure


information from

known is is it Since, that y32 an outlier, the coming change.


treats it as a missing observation. misspecifications t=37 is signaled This is to protect

model components

that the outlier observation

may provide.

Foreknowledge growth

of the jump at 61 by

by (1, Z)=(0.940,1) and for the trend

change at t=

distributions instance In the updating (l31, 2)=(1,0.98) . each ,


modified NDBM from since, the in practice effect of , it is desirable to protect of

are obtained using a


information changes on model in other in

components components. 4.4.1.

imprecise

descriptions

sharp

At these intervention

times a Modified

N'DB. NI is applied at t=86

as described

The simultaneous

sudden change in trend and seasonality . The three missing gY observations of 0, for t= 113

is signaled by with simply b

ao so Baa = diag{1,0.9 ,0.9 I} taking the posterior

are dealt

distributions .

114, and 115 as the prior parameter , and the corresponding one step Since Mean

distribution

(Oi ID112)

Fig-3 shows the observations

ahead expectations in the routine Absolute

in order to demonstrate V=

the power of the intervention major disturbances

method.

data generation ( MAD)

400, without

the limiting

Deviation Y=

forecast be the errors would step ahead one about 18.7 = of performance in terms of the

0.8 1's, where MAD

400. / ( i2) ( Theorem 6.4). The overall the outlier

is found to be 20.76 after omitting and ellb'

jump the e32,

e37 and the three

missing errors e113e11, below. given

The performance

in terms of the MAD for each year is

YEAR
rs

1 39.5

2 20.3

3 20.5

4 20.3

5 15.0

6 17.2

7 17.4

8 20.1

9 21.5

10 15.8

MAD

observations 00 00 a 0 0

It $ 3 -

I I 0 :
(D

T -+ 1. ,W
1

(!)

U7 (n

4*
tD

Z 1rn

o o

CD (D 1 4) < a !V .. `O rt 0 CO

a T

13

x >

1 -0 1IIrn 17

v)

et -o.
W

I :z 1-a IM
C

O3<_.

' Iz 0% 1-4 et
I-.

Irn

o o"- 1 3 lo

-;

Iz
1-4

'

In

I l- I

Ir
10
O

CD
Ln Irn

,, >.

-83Multiprocess Models - The Artificial Data :

8.2.3.

For an automatic way of dealing with the major changes in the series , it is assumed
that the series is monthly Given with an additive linear growth and one harmonic seasonal could be

component.

this structure, outlier,

the possible changes in the series which seasonal changes and/or However, changes, combinations

considered are trend change, possibilities

of them ( 23

) 'alongside of the mother model. model the combined

successive operations of the main if any. Clearly, the computer

changes may reasonably storage and running

time increases exponentially

with the number of models considered. models should be considered. results. an NDB multiprocess model is

This suggests that a fewer number of alternative For a rough comparison constructed factor with intervention

with four models comprising. factor

A basic or mother

model with trend discount variance V= 400,

j31=0.9 and seasonal discount

1 2=0.95 and observation model, a trend

all' as specified factors ' :,; change (31=0.02 discount

in the basic intervention and I3 2=021 an outlier factor and the variance

change model with

discount

and a seasonal change model. for the outlier model

The seasonal from the

are found

observation probabilities

variance controlling

rule for the alternative

models defined in 7.5. Transition trend, outlier and seasonal

from models at time t to each of the mother,

0.8,0.095,0.1 are taken t+1 as time change models at Given the same initial NDB multiprocess f_ using 'case, intervention in the as settings

and 0.005 respectively. case, the data is then analysed controller. In the latter of 0.8 for

models without

and with the CUSUM

0.5 threshold 2.0 with probability taken respectively a and L,, and a are as back to single model operation. intervention the unlike model, multiprocess

switching

As is to be expected,

models need more

adjustments. model proper make and to changes recognise time 3,4,6 for the years errors observed

The high forecasting

and 8, may partly be due to that and partly due to

Level in the and growth models. changes the alternative are combined of selection the This growth together. is change seasonal and are while, modelled to model, change trend

84 -

keep the process of alternative models performance the CUSUM . without

model selection

vague and simple.

A summary

of the

a CUSUM is presented The overall

in Fig. 4 and Fig. 5 presents that with after removing the errors

statistic.

MAD's

e32, e37, e38, e39, eiis+elil

and ells were found to be 25.03 and 22.74 respectively.

The

performance in terms of MAD for each year in both cases is given in the following table for comparison.

Multiprocess

Models

Simulated -

Data

( with

and

without

CUSUMS)

YEAR

1a1i689

10

without
CUSUM's 32.8 21.3 34.7 40 4 15.6 22.5 16.5 263 209 19.3

with CUSUM's 26.9 17.9 23.9 29.1 13.9 28.0 14.9 32.9 22.4 17.5

It is easily observed from the above table,

that

the multiprocess model with

CUSUM's is to be preferred in both terms of performance and computer storage and it is interesting Moreover, to note that, time. running the order of preference among the available to be used for the

information is in the amount of accordance with models, model construction information . The intervention

model would be the best choice when all the

about the structural

The known. their times are occurrence changes and

CUSUM's be the best a preferred models with when multiprocess choice would second ,
known. is model Finally, if no information about a preferred model and the types of the CUSUM

occurrence is known, times their change and statistic be the candidate. would

the multiprocess

model without S

[Fig. 4: Simulated data - multiprocess models without CUSUM's; observations and one step ahead point predictions.]

[Fig. 5: Simulated data - multiprocess models with CUSUM's; observations, one step ahead point predictions and the CUSUM statistic.]

8.3. THE PRESCRIPTION SERIES

8.3.1. The Data :

This is a monthly medical data set giving the number of prescriptions for five years starting from March 1966. The figures taken are normalised according to the number of effective working days in the month. This is to compare the results with the analysis of Harrison and Stevens, who previously used multiprocess models. The data is strongly seasonal, and a constant observation variance is reasonably assumed. It is observed that increased prescription charges in June 1968 caused a major change in the level, and an influenza epidemic in 1970 'caused' an outlier. However, for the purpose of demonstrating multiprocess modelling, it is assumed that these events are not known: they are not anticipated, but are dealt with automatically together with other unobserved changes. Consequently, the data is analysed using Modified NDB multiprocess models without and with the CUSUM statistic, using the same initial prior information

given as

(θ0 | D0) ~ N[ (22, 0, 0, ..., 0)' ; diag{10, 0.25, 0.25, ..., 0.25} ]

signifying a weak prior with no growth and no seasonal pattern.

In both cases only two

types of major disturbances are considered, namely sharp trend changes and outliers.

8.3.2. NDBM Multiprocess Models - Known Observation Variance :

Here the routine model has a linear growth and full seasonal components with discount factors β1 = 0.95 and β2 = 0.975. The discount factors for the corresponding trend change model are (β1, β2) = (0.02, 0.975), and the model transition probabilities are constant throughout, so that

π2 = P{Mt+1(2) | Mt(i)} = 0.05,   π3 = P{Mt+1(3) | Mt(i)} = 0.025,   π1 = 1 - π2 - π3,   i = 1, 2, 3,

where the Mt(i)'s (i = 1, 2, 3) are the Mother, Outlier and Trend change models respectively, and the model operation is as given in 7.4. The observation variance is estimated as 0.36.

Given the posterior model information at time t-1 as {(mi, Ci); Pi}, the probability that model i operated at time t-1 and that model j operates at time t is πj Pi, and the corresponding prior parametric distribution is

(θt | Mt(j), Mt-1(i), Dt-1) ~ N[ mi ; Rij ],

where the Rij's are calculated according to the Modified NDBM rules. In order to control the response of the models to the outliers and jumps, the outlier variance is chosen so that V2 = V3.
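The role of the Rij's can be sketched as follows. This is a hedged illustration only: it uses the general NDBM discount construction R = B (G C G') B with B = diag(1/sqrt(β)), whereas the Modified NDBM rule of 4.4.1 applies the discounts in a slightly different way:

    import numpy as np

    def prior_covariance(C_post, G, betas):
        """Illustrative discount construction of a prior covariance,
        R = B (G C G') B with B = diag(1/sqrt(beta)).  A small discount
        factor (e.g. the trend change model's beta1 = 0.02) inflates the
        corresponding prior variance, opening that component to change."""
        B = np.diag(1.0 / np.sqrt(np.asarray(betas, dtype=float)))
        return B @ G @ C_post @ G.T @ B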

Point predictions are obtained using the conjugate Normal loss function introduced by Lindley (1976) and plotted along with the observations in Fig. 6, with the percentage one step ahead forecast errors. The first 12 observations are used for trend and seasonal pattern recognition. At month 24 the model has recognised a minor unobserved shift. The increase of prescription charges at month 30 has caused a negative error of -15%, and is followed by an error of -5% as the uncertainty between an outlier and a sharp trend change is resolved. The influenza epidemic at month 48 is properly identified as an outlier. The performance in terms of MAD, for the last four years, is tabulated in Section 8.3.3. This is for a direct comparison with the results obtained using multiprocess models with CUSUM's.

[Fig. 6: Prescription data - NDBM multiprocess model; observations and one step ahead point predictions (No. of prescriptions, 000's), with percentage one step ahead prediction errors.]

8.3.3. The CUSUM Multiprocessor - Known Observation Variance :

The same model specifications described in 8.3.2 are resumed here, with the Backward CUSUM statistic given the initial values L0 = 2.0 and a = 0.3. Predictions are based on the Mother model performance until the CUSUM monitor signals a change, at which time all the three models start operating interactively until a threshold probability of 0.98 is regained for the Mother model. During the Mother model performance, the other models run in parallel as described in 7.5, as preparatory arrangements for coming changes. The model performance is summarised in Fig. 7 together with the upper and lower Backward CUSUM monitors. It can be seen that all the changes are properly identified, and that the performance is slightly better than that of 8.3.2 while the process time is reduced by nearly 2/3. In order to compare the performance with that of the multiprocess models without the CUSUM statistic, the MAD for the last four years is tabulated below for the two models.

YEAR               1     2     3     4
without CUSUM's  0.63  0.52  1.79  0.47
with CUSUM's     0.57  0.46  0.70  0.34
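The gating described above can be sketched as follows. This is illustrative only: it uses the standard forward two-sided CUSUM recursion on standardised one step ahead errors, whereas the Backward CUSUM of 7.3 is computed over the most recent errors; L0 plays the role of the decision interval and a of the reference value:

    def cusum_signals(std_errors, L0=2.0, a=0.3):
        """Yield the times at which a two-sided CUSUM on standardised one
        step ahead errors signals.  Between signals only the Mother model
        predicts; after a signal all the models operate until the Mother
        model's probability regains the threshold (0.98 in the text)."""
        hi, lo = 0.0, 0.0
        for t, e in enumerate(std_errors):
            hi = max(0.0, hi + e - a)
            lo = min(0.0, lo + e + a)
            if hi > L0 or -lo > L0:
                yield t
                hi, lo = 0.0, 0.0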

[Fig. 7: Prescription data - CUSUM multiprocess model; observations and one step ahead point predictions (No. of prescriptions, 000's), with the upper and lower Backward CUSUM monitors.]

8.4. ROAD DEATH SERIES

8.4.1. The Data :

This is a series of 38 observations representing quarterly road deaths in the U.K. for the years 1960-1969. It can be seen from Fig. 8 that the main observed discontinuities in the series are the outlying observation in the first quarter of 1963, due to a cold icy winter preventing traffic using many roads, and the trend change in 1967 due to the introduction of the breathalyser. Generally, a high variation of the observation error can be observed. This suggests that an on-line estimation of the observation variance is more appropriate than a fixed and global estimate. Using longer data sets, a relation between the number of road deaths and industrial activity is evident: the death rate rises during boom periods, when more traffic is on the roads, and falls during slump periods. This effect could be accommodated in a model and would lead to more reliable predictions. However, the analysis here is for demonstration purposes and no attempt is made to relate road deaths to industrial production.

8.4.2. The NDB Multiprocess Models with CUSUM's :

In this analysis, Modified NDB multiprocess models are used with CUSUM's. The alternative models assumed are: trend change, outlier and seasonal changes. The main model discount factors are β1 = 0.8 and β2 = 0.9. These figures are lower than the ones used in the previous example and reflect the precision of the quarterly data. The trend change model discount factors are (0.04, β2), the seasonal discount being unchanged. As before, the discount factors for the seasonal change model and the observation variance for the outlier model are found using the variance control rule defined in 7.5. The observation variance for the alternative models is found assuming it to be proportional to the expected number of deaths, i.e.

Vt = α E{Yt | Dt-1},

while the observation variance of the main model is estimated on-line using the power law defined in 5.4 with the initial values X0 = 10 and η0 = 20. The model transition probabilities are 0.7899, 0.1, 0.11 and 0.0001, in the same order as in 8.2.3, with the CUSUM parameters L0 = 2.0 and a = 0.5, where the threshold return probability is 0.8.
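The variance treatment just described can be sketched as follows (a hedged illustration: the proportional rule is the one quoted above, while the recursive update of α is only a stand-in for the power law of 5.4, which is not reproduced here):

    def obs_variance(forecast_mean, alpha):
        # V_t = alpha * E{Y_t | D_{t-1}}: observation variance
        # proportional to the expected number of deaths.
        return alpha * forecast_mean

    def update_alpha(alpha, error, forecast_mean, discount=0.95):
        """Illustrative on-line learning of alpha: discount the old value
        towards the latest moment estimate error^2 / forecast_mean."""
        return discount * alpha + (1.0 - discount) * error ** 2 / forecast_mean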
For this model, F' = (1, 0, 1, 0, 1) and G = diag{G1, G2}, where

G1 = | 1  1 |        G2 = |  0  1  0 |
     | 0  1 |             | -1  0  0 |
                          |  0  0 -1 |

A weak prior distribution was set initially as

(θ0 | D0) ~ N[ (..., 0, 90, -50, 30)' ; diag{1000, 100, 1000 I3} ]
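A small sketch, assuming the block layout reconstructed above is correct, that builds F and G and verifies that the seasonal block repeats every four quarters:

    import numpy as np

    F = np.array([1.0, 0.0, 1.0, 0.0, 1.0])      # design vector
    G1 = np.array([[1.0, 1.0],
                   [0.0, 1.0]])                  # linear growth block
    G2 = np.array([[ 0.0, 1.0,  0.0],
                   [-1.0, 0.0,  0.0],
                   [ 0.0, 0.0, -1.0]])           # full quarterly seasonal block
    G = np.block([[G1, np.zeros((2, 3))],
                  [np.zeros((3, 2)), G2]])
    # Full quarterly seasonality: the seasonal block has period 4.
    assert np.allclose(np.linalg.matrix_power(G2, 4), np.eye(3))
    # k step ahead forecast function: yhat(k) = F' G^k m, m the posterior mean.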

Finally, a summary of the results is presented in Fig. 8, showing that all the prescribed changes are dealt with successfully. The summary of the CUSUM statistic shows the different periods of changes and the direction of the model operation at the breakdown points.

[Fig. 8: Road death data - NDB multiprocess model with CUSUM's; observations and one step ahead point predictions (deaths per quarter), with the Backward CUSUM statistic.]

8.5. SUMMARY :

In this chapter a number of applications of the Normal Discount Bayesian Models, based on the principle of parsimony through discounting and on Management by Exception, are presented. The first application is made by examining an artificially generated data set using intervention at the points of discontinuity; this is presented in 8.2.2. In 8.2.3 the same data is analysed using multiprocess models both with and without CUSUM's. In Section 8.3, the models are applied to a real data set concerning prescription charges, where a constant observation variance is assumed. For an on-line estimation of the observation variance, a quarterly data set concerning the number of road deaths in the U.K. is chosen in Section 8.4. Appropriate figures are presented in each case to summarise the model performance.

CHAPTER NINE

DISCUSSION AND FURTHER RESEARCH

The Bayesian framework is the most promising and logical among existing statistical techniques. In modelling and predicting future outcomes, it requires supervision and interaction of the modellers to accommodate, on-line, any environmental or external effects that are not anticipated.

The pioneering work of Harrison and Stevens (1976) has provided applied statisticians with a base for analysing series arriving sequentially with time, away from static and limiting predictor models. The DLM's have seen a number of successful applications by Harrison and Stevens (1975), Smith (1983) and Smith and West (1983). However, as pointed out in Chapter 3 of this study, both the observation and state covariance matrices are ambiguous, not scale invariant and not parsimonious in the sense of Roberts and Harrison (1984). These problems have caused practitioners considerable difficulties in the estimation problem and diverted them to the use of other, less constructive, models.

The principal aim of this study is to replace the state error variance by a small number of discount factors within the Bayesian framework. This gives models which enjoy the principle of parsimony. Discount factors can more easily be set; they are invariant under linear transformations of scale and not ambiguous, and models based on the discount principle are generally parsimonious and robust.

The study is also aimed at demonstrating and publicising the potential of Bayesian modelling in practical applications, in particular of processes with certain types of discontinuity. After providing the basic principle of discounting within the classical point estimation framework and testing its efficiency through comparison with DOUBTS and ARIMA model applications, the drawbacks and limitations of the Bayesian DLM's are pointed out. The discount principle is then carried out to construct NWBM's replacing the DLM's.

This class of models has many interesting subclasses, and many existing and well known classical models are retained in the sense that they have the same limiting forecast functions as special constant NDBM's; however, the Bayesian facilities are more extensive. Two methods are given for on-line estimation of the observation variance. This is essential especially for processes with high stochastic variations and those that exhibit sudden changes and outliers.

In these cases, multiprocess models are advisable. The controlling rule for the observation variances is advisable since the variance governs the model likelihood ratios. The use of the CUSUM statistic provides an overall improvement in the efficiency, computer storage and running time problems. The NDBM's allow a simple and easy way of communication and intervention in phases of major disturbances.

Almost all types of major disturbances that are common in time series processes are present in the artificially generated data set. Even less disturbed series of that kind are often avoided by statisticians; see Chatfield (1978). The discount principle has simplified intervention with different components, as the example in 8.2.2 demonstrates. All types of change are detected successfully.

However, in the analysis of real data sets, advance information on major disturbances is often missing, in which case it is useful to adopt multiprocess models. Clearly, as is to be expected, the resulting analysis from the multiprocess models shown in Fig. 4, in which no information on the disturbances is fed into the models, is less successful than that from the intervention models. Apart from the missing observations, more efficient results are obtained using the multiprocess models with CUSUM's, where a little more information is provided by assuming the existence of a particular model representing the process.

The models are also applied to real data sets to demonstrate their efficiency in dealing with certain types of discontinuity occurring when the data is fairly stable and also when high observation noise is present. The latter causes a delay in the task of recognising changes. The results from all the applications dealt with are promising. In particular, when multiprocess models are called for, the CUSUM statistic is recommended for efficiency and economy, and Modified NDBM's protect the components when major influences, misspecifications and disturbances are present. More applications can be found in Ameen and Harrison (1983 a, b, c).
In all cases the underlying model parameters have been given physical meanings, and simple transformations are provided to transfer information from or to other practical applications of interest. It can be argued that the amount of further development and exploitation of these models is proportional to the amount of effort spent in developing the existing and less profound models.

The following lists a number of suggestions for further research:

i- The models deal with processes defined only on the entire real line, with Normality assumptions, so that successive estimates are obtained using Kalman Filter recurrence relations. However, in many real life problems, processes are well defined on bounded sample spaces and do not cover the real line sensibly, in which case these models may provide estimates outside their feasible region. This point seems to be the most promising and demands exploitation.
Smith (1979) and Souza and Harrison (1979) have extended the DLM's to include non Normal Steady State models. These ideas are combined with discount principle, models. iiAmeen (1983 b), the

to provide Generalised Bayesian Entropy .. `

The forecast functions are specified using the design and transition matrices. It is important to develop methods that provide more automation in model identification and a proper Bayesian on-line parameter learning procedure will
improve the performance. Some considerable success has been achieved by

Migon and Harrison (1983) considering non linearity and non Normality of the
processes. iii-The discount choice of factors is left to the modellers and work needs to be The generalised EWR and

done in developing methods for on-line estimation.

have in 6 ARIMA limiting restricted parameters as chapter the models obtained

-99Godolphin by out and Stone transition (1980) for the DLM's matrices. forecast Also, with in which lower explode they

pointed

suggest the use of singular factors, providing the uncertainty

discount rapidly

of lead time

distributions

less reliable long term predictions.

iv-

Generalising the models to include more correlation structures will provide a


wider range of applications.

v-

The

limiting

results

obtained

are

mostly

based

on

specified

canonical

representations and more general results are possible. viIn a general context. interest more applications when the of the theory in different process is subject to fields of

are needed especially as almost always

a dynamic

development

is the case. The NDBM's

replace the popular in the analysis.

classical regression models and provide

an overall improvement

Some applications on this topic are given by Harrison and Johnston(1983).

APPENDIX

U.S. AIR PASSENGERS DATA

YEAR  JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
1951  145  150  178  163  172  178  199  199  184  162  146  166
1952  171  180  193  181  183  218  230  242  209  191  172  194
1953  196  196  236  235  229  243  264  272  237  211  180  201
1954  204  188  235  227  234  264  302  293  259  229  203  229
1955  242  233  267  269  270  315  364  347  312  274  237  278
1956  284  277  317  313  318  374  413  405  355  306  271  306
1957  315  301  356  348  355  422  465  467  404  347  305  336
1958  340  318  362  348  363  435  491  505  404  359  310  337
1959  360  342  406  396  420  472  548  559  463  407  362  405
1960  417  391  419  461  472  535  622  606  508  461  390  432

SIMULATED DATA

YEAR  JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
1     189  108   93   77   42   52   67   75  155  236  320  270
2     270  261  192  201  166  150  193  244  255  300  343  382
3     392  318  335  283  276  253  282   86  387  388  501  482
4     221  179  185  114  122   80  143  148  205  292  307  331
5     318  317  269  234  198  185  239  237  314  343  409  408
6     411  404  347  267  238  216  187  193  242  269  272  306
7     273  267  202  176  146   84  108  132  157  187  206  207
8     233  192  163  106   59   69  108  130  201  268  319  375
9     377  340  246  202  185  175  222  254  338  402  478  467
10    449  444  377  338    -    -  373  433  483  567  617    -


PRESCRIPTION DATA

YEAR   JAN   FEB   MAR   APR   MAY   JUN   JUL   AUG   SEP   OCT   NOV   DEC
1966     -     -  23.1  21.4  21.1  20.8  19.8  18.8  20.2  21.9  22.8  23.1
1967  23.9  24.3  23.3  23.6  22.7  22.4  20.8  19.6  21.4  22.7  23.8  26.6
1968  25.9  22.3  21.8  22.2  23.5  20.5  19.0  18.1  19.9  21.3  21.7  23.4
1969  23.1  23.6  24.4  23.8  21.3  21.3  19.8  18.7  20.8  21.5  21.0  28.6
1970  23.3  22.3  25.2  22.4  22.6  21.7  20.5  19.4  21.4  22.3  22.4  23.7

ROAD DEATH DATA

YEAR    1    2    3    4
1960  486  514  614  710
1961  516  546  587  653
1962  501  499  587  650
1963  400  547  619  742
1964  570  582  664  790
1965  592  648  660  751
1966  578  604  658  822
1967  610  542  659  629
1968  518  499  603  650
1969  518  541


REFERENCES

[1] AGNEW, R. A. (1982). Econometric forecasting via discounted least squares. Naval Research Logistics Quart., Vol. 29, No. 3, 291-302.
[2] AMEEN, J. R. M. (1983 a). Contribution to discussion of the paper by E. T. Jaynes. Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[3] AMEEN, J. R. M. (1983 b). Generalised Bayesian entropy models and forecasting. Warwick Res. Rep. 37.
[4] AMEEN, J. R. M. (1983 c). Contribution to discussion of the paper by M. West. Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[5] AMEEN, J. R. M. and HARRISON, P. J. (1983 a). Discount weighted estimation. J. of Forecasting (to appear).
[6] AMEEN, J. R. M. and HARRISON, P. J. (1983 b). Normal discount Bayesian models. Invited paper for the Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[7] AMEEN, J. R. M. and HARRISON, P. J. (1983 c). Discount Bayesian multiprocess modelling with CUSUM's. Proceedings of International Time Series Conference, Nottingham (O. D. Anderson, ed.), 1983, North Holland.
[8] ANDERSON, O. D. (1977). A commentary on "A survey of Time Series". International Statist. Rev., 45, 273-297.
[9] ASTROM, K. J. (1970). Introduction to Stochastic Control Theory. Academic Press, Inc., New York.
[10] BARNARD, G. A. (1959). Control charts and stochastic processes. J. R. Statist. Soc. B, 21, 239-270.
[11] BISSELL, A. F. (1969). Cusum techniques for quality control (with discussion). J. R. Statist. Soc. C, 18, 1-30.
[12] BOX, G. E. P. and JENKINS, G. M. (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden Day.
[13] BOX, G. E. P. and TIAO, G. C. (1975). Intervention analysis with applications to economic and environmental problems. J. Amer. Statist. Ass., 70, 70-79.
[14] BROWN, R. G. (1963). Smoothing, Forecasting and Control. San Francisco: Holden Day.
[15] BROWN, R. G. (1983). The balance of effort in forecasting. J. of Forecasting, Vol. 1, No. 1, 49-53.
[16] CANTARELIS, N. and JOHNSTON, F. R. (1983). On-line variance estimation for the steady state Bayesian forecasting model. J. Time Series An., Vol. 3, No. 4, 225-234.
[17] CHATFIELD, C. (1978). The Holt-Winters forecasting procedure. App. Statist., 27, No. 3, 264-279.
[18] De GROOT, M. H. (1970). Optimal Statistical Decisions. New York: McGraw-Hill.
[19] DOBBIE, J. M. (1963). Forecasting periodic trends by exponential smoothing. Opns. Res., 11, 908-918.
[20] EWAN, W. D. (1963). When and how to use Cusum charts. Technometrics, 5, 1-22.
[21] EWAN, W. D. and KEMP, K. W. (1960). Sampling inspection of continuous processes with no autocorrelation between successive results. Biometrika, 47, 363-371.
[22] GATHERCOLE, R. B. and SMITH, J. Q. (1983). A dynamic forecasting model for a general class of discontinuous time series. Proceedings of International Time Series Conference, Nottingham (O. D. Anderson, ed.), North Holland.
[23] GELB, A. (ed.) (1974). Applied Optimal Estimation. MIT Press, Cambridge.
[24] GODOLPHIN, E. J. and HARRISON, P. J. (1975). Equivalence theorems for polynomial projecting predictors. J. R. Statist. Soc. B, 37, 205-215.
[25] GODOLPHIN, E. J. and STONE, J. M. (1980). On the structural representation for polynomial projecting models based on the Kalman filter. J. R. Statist. Soc. B, 42, 35-46.
[26] HARRISON, P. J. (1965). Short-term sales forecasting. J. R. Statist. Soc. C (Appl. Statist.), 15, 102-139.
[27] HARRISON, P. J. (1967). Exponential smoothing and short-term sales forecasting. Man. Sci., 13, 821-842.
[28] HARRISON, P. J. and AKRAM, M. (1983). Generalised exponentially weighted regression and parsimonious dynamic linear models. Proceedings of the International Conference held at Valencia (O. D. Anderson, ed.), North Holland.
[29] HARRISON, P. J. and DAVIES, O. L. (1964). The use of cumulative sum (Cusum) techniques for the control of routine forecasts of produce demand. Oper. Res. (J.O.R.S.A.), 12, 325-333.
[30] HARRISON, P. J. and JOHNSTON, F. R. (1983). A regression method with non-stationary parameters. Warwick Res. Rep. 35 (submitted to J. O. R.).
[31] HARRISON, P. J., LEONARD, T. and GAZARD, T. N. (1977). An application of multivariate hierarchical forecasting. Paper to R. Statist. Soc. Ind. Appl. Section Conference, Manchester. Also Warwick Res. Rep. 15.
[32] HARRISON, P. J. and PEARCE, S. F. (1972). The use of trend curves as an aid to market forecasting. Ind. Mark. Manage., 2, 149-170.
[33] HARRISON, P. J. and SCOTT, F. A. (1965). A development system for use in short-term sales forecasting investigations. Paper to Ann. Conf. O. R. Soc. and Special O. R. Soc. Meeting. Also Warwick Res. Rep. No. 26.
[34] HARRISON, P. J. and SMITH, J. Q. (1980). Discontinuity, decision and conflict. Proc. of the First International Meeting on Bayesian Statistics, Valencia, Spain (Bernardo et al., eds.), May 1979.
[35] HARRISON, P. J. and STEVENS, C. F. (1971). A Bayesian approach to short-term forecasting. Oper. Res. Quart., 22, 341-362.
[36] HARRISON, P. J. and STEVENS, C. F. (1975). Bayesian forecasting in action: Case studies. Warwick Res. Rep. 14.
[37] HARRISON, P. J. and STEVENS, C. F. (1976). Bayesian forecasting (with discussion). J. R. Statist. Soc. B, 38, 205-247.
[38] HENDERSON, C. R., KEMPTHORNE, O., SEARLE, S. R. and KROSIGK, C. M. (1959). The estimation of environmental and genetic trends from records subject to culling. Biometrics, 15, 192-218.
[39] HOLT, C. C. (1957). Forecasting seasonals and trends by exponentially weighted moving averages. Carnegie Inst. Technol., Res. Memo. No. 32 (NONR 760(01)).
[40] JAYNES, E. T. (1983). Highly informative priors: the effect of multiplicity of inference. Invited paper for the Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[41] KALMAN, R. E. (1963). New methods in Wiener filtering theory. In Proceedings of the First Symposium on Engineering Applications of Random Function Theory and Probability (J. L. BOGDANOFF and F. KOZIN, eds.). New York: Wiley.
[42] KALMAN, R. E. and BUCY, R. S. (1961). New results in linear filtering and prediction theory. J. Basic Eng., 83, 95-108.
[43] KENDALL, M., STUART, A. and ORD, J. K. (1983). The Advanced Theory of Statistics. Vol. 3, 4th ed., Charles Griffin & Company Limited.
[44] LEE, T. T. (1980). A direct approach to identify the noise covariances of Kalman filtering. IEEE Trans. Automatic Control, Vol. AC-25, 841-842.
[45] LINDLEY, D. V. (1976). A class of utility functions. Ann. Statist., 4, 1-10.
[46] LINDLEY, D. V. and SMITH, A. F. M. (1972). Bayes estimates for the linear model. J. R. Statist. Soc. B, 34, 1-41.
[47] MAKOV, U. E. (1983). Approximate Bayesian procedures for dynamic linear models in the presence of jumps. The Statistician, 32, 207-213.
[48] MAYBECK, P. S. (1982). Stochastic Models, Estimation and Control. Vol. 2, Academic Press, New York.
[49] McKENZIE, E. (1976). An analysis of general exponential smoothing. Oper. Res., 24, 131-140.
[50] MIGON, H. S. and HARRISON, P. J. (1983). An application of non-linear Bayesian forecasting to television advertising. Contributed paper to the Second International Meeting on Bayesian Statistics, Valencia, Spain, Sept. 1983.
[51] MONTGOMERY, D. C. and JOHNSON, L. A. (1976). Forecasting and Time Series Analysis. McGraw-Hill, New York.
[52] MUTH, J. E. (1981). Forecasting with polynomials by a generalised recursive updating method. Paper to First Inter. Symp. on Forecasting, Quebec City, Canada.
[53] O'HAGAN, A. (1979). On outlier rejection phenomena in Bayes inference. J. R. Statist. Soc. B, 41, 358-367.
[54] PAGE, E. S. (1954). Continuous inspection schemes. Biometrika, 41, 100-115.
[55] PRIESTLEY, M. B. (1980). State-dependent models: A general approach to non-linear time series analysis. J. Time Series Analysis, 1, 47-71.
[56] ROBERTS, S. A. and HARRISON, P. J. (1984). Parsimonious modelling and forecasting of seasonal time series. Eur. J. Oper. Res., 16, 365-377.
[57] SMITH, A. F. M. and COOK, D. G. (1980). Straight lines with a change point: A Bayesian analysis of some renal transplant data. Applied Statist., 29, 180-189.
[58] SMITH, A. F. M. and WEST, M. (1983). Monitoring renal transplants: An application of the multiprocess Kalman filter. Biometrics, 39, No. 4, 867-878.
[59] SMITH, J. Q. (1977). Problems in Bayesian Statistics Relating to Discontinuous Phenomena, Catastrophe Theory and Forecasting. Ph.D. Thesis, Univ. of Warwick.
[60] SMITH, J. Q. (1979). A generalisation of the Bayesian steady forecasting model. J. R. Statist. Soc. B, 41, 378-387.
[61] SMITH, J. Q. (1983). Forecasting accident claims for an assurance company. The Statistician, 32, 109-115.
[62] SMITH, J. Q., HARRISON, P. J. and ZEEMAN, E. C. (1981). The analysis of some discontinuous decision processes. Eur. J. Oper. Res., 7, 30-43.
[63] SOUZA, R. C. (1978). A Bayesian Entropy Approach to Forecasting. Ph.D. Thesis, Univ. of Warwick.
[64] SOUZA, R. C. (1981). A Bayesian entropy approach to forecasting: The multistate model. In Time Series Analysis (Houston, Tex.), North Holland, 535-542.
[65] SOUZA, R. C. and HARRISON, P. J. (1979). Steady state system forecasting: A Bayesian entropy approach. Warwick Res. Rep. 33.
[66] STEVENS, C. F. (1974). On the variability of demand for families of items. Oper. Res. Quart., 25, 411-420.
[67] TRIGG, D. W. and LEACH, G. A. (1967). Exponential smoothing with adaptive response rate. Oper. Res. Quart., 18, 53-64.
[68] VAN DOBBEN DE BRUYN, C. S. (1968). Cumulative Sum Tests: Theory and Practice. London: Griffin.
[69] WALD, A. (1947). Sequential Analysis. John Wiley & Sons, New York.
[70] WEST, M. (1982). Aspects of Recursive Bayesian Estimation. Ph.D. Thesis, Univ. of Nottingham.
[71] WHITTLE, P. (1965). Recursive relations for predictors of non-stationary processes. J. R. Statist. Soc. B, 27, 523-532.
[72] WHITTLE, P. (1969). A view of stochastic control theory. J. R. Statist. Soc. A, 132, 320-334.
[73] WINTERS, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Man. Sci., 6, 324-342.
[74] WOODWARD, R. H. and GOLDSMITH, P. L. (1964). Cumulative Sum Techniques. ICI Monograph No. 3, Oliver & Boyd, Edinburgh.
[75] WOLD, H. (1954). A Study in the Analysis of Stationary Time Series. Almqvist & Wiksell, Stockholm (first edition 1938).
[76] YOUNG, P. C. (1971). Recursive approaches to time series analysis. Bull. Inst. Maths. and Applications, 10, 209-224.
[77] ZELLNER, A. (1971). An Introduction to Bayesian Inference in Econometrics. Wiley.
