Time Series Forecasting With Multilayer Perceptrons and Elman Neural Networks
ABSTRACT
A multilayer perceptron (MLP) network, a FIR neural network and an Elman neural network were compared in
four different time series prediction tasks. The time series include the load in an electric network, fluctuations
in a far-infrared laser, a numerically generated series and the behaviour of sunspots. The FIR neural network
was trained with the temporal backpropagation learning algorithm. The results show that the efficiency of the
learning algorithm is a more important factor than the network model used. The Elman network models the
electric network load series better than the MLP network, and in the other prediction tasks it performs
similarly to the MLP network. The FIR network performs adequately, but not as well as the Elman network.
1. Introduction
In this paper we study neural network architectures that are capable of learning temporal features in data for
time series prediction. The feedforward multilayer perceptron (MLP) network is used frequently in time
series prediction. The MLP network, however, has the major limitation that it can only learn a static input-output
mapping [5]. Thus it can be used to perform a nonlinear prediction of a stationary time
series. A time series is said to be stationary when its statistics do not change with time. In many real-world
problems, however, the time at which a certain feature appears in the data carries important information. More
specifically, the interpretation of a feature in the data may depend strongly on the earlier features and the times
at which they appeared. A common example of this phenomenon is speech.
A conventional way of modelling a stationary time series with an MLP network is presented in Fig. 1. The
input vector to the network consists of past samples of the time series as follows: x = [x(n-1), x(n-2), ...,
x(n-p)]^T. Here the parameter p is the prediction order. The scalar output y(n) of the MLP network equals the one-
step prediction, y(n) = x̂(n). The actual value x(n) of the series represents the desired output. The net-
work tries to model time by giving it a spatial representation. It is not, however, able to deal with time-var-
ying sequences. A better solution is to let time have an effect on the network's response rather than to
represent time by an additional input dimension. This can be achieved when the network has dynamic proper-
ties such that it responds to temporal sequences.
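To make this spatial representation of time concrete, the following minimal sketch (in Python with NumPy rather than the MATLAB toolbox used in this study; the function name is ours) builds the lagged input vectors x = [x(n-1), ..., x(n-p)]^T and the one-step targets x(n):

    import numpy as np

    def make_one_step_dataset(series, p):
        """Build input vectors [x(n-1), ..., x(n-p)] and one-step targets x(n)."""
        X, d = [], []
        for n in range(p, len(series)):
            X.append(series[n - p:n][::-1])  # the p most recent past samples, newest first
            d.append(series[n])              # desired output: the actual value x(n)
        return np.array(X), np.array(d)

    # Example with prediction order p = 4 on a toy series
    X, d = make_one_step_dataset(np.sin(0.1 * np.arange(200)), p=4)

Each row of X is then fed to the MLP as one input pattern, and the corresponding entry of d is the desired output.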
In the FIR (finite impulse response) neural network, each neuron is extended to process temporal
features by replacing the synaptic weights with finite impulse response filters. The general structure of a FIR filter
is shown in Fig. 2. A multilayer feedforward network is then built using these neurons, as shown in Fig. 3.
The network's input layer consists of FIR filters feeding the data into the neurons of the hidden layer. The network may
have one or several hidden layers. The output layer consists of neurons which receive their inputs from the previ-
ous hidden layer. At each time increment, one new value is fed to the input filters, and the output neuron produces
one scalar value. In effect this structure has the same functional properties as the Time Delay Neural Network
(TDNN) [6]. However, the FIR network is more clearly interpreted as a vectorial and temporal extension of the
MLP. This interpretation also leads to the temporal backpropagation learning algorithm, which is used to
train the network [7].
Fig. 1: Multilayer perceptron network used as one-step predictor of a time series.
Fig. 2: Finite impulse response filter.
Fig. 3: FIR neural network with one hidden layer, one input and one output.
Fig. 4: Elman neural network with one input and one output.
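As a rough illustration of the FIR neuron (our own sketch, not the implementation of [7]), each synapse holds a tapped delay line of its past input values, the synapse output is an FIR convolution over those taps, and the neuron applies the usual static nonlinearity to the summed synapse outputs:

    import numpy as np

    def fir_neuron_output(x_history, weights, bias=0.0):
        """One FIR neuron; every synapse j is an FIR filter of length T.

        x_history: array (inputs, T), column k holding the delayed input x_j(n-k)
        weights:   array (inputs, T) of filter coefficients w_j(k)
        """
        # each synapse contributes sum_k w_j(k) * x_j(n-k); the neuron sums all synapses
        s = np.sum(weights * x_history) + bias
        return np.tanh(s)  # static nonlinearity, as in an ordinary MLP neuron

    # Example: 3 inputs, FIR filters of length 5
    rng = np.random.default_rng(0)
    y = fir_neuron_output(rng.normal(size=(3, 5)), rng.normal(size=(3, 5)))

Setting the filter length T to one recovers an ordinary MLP neuron, which is why the FIR network can be viewed as a temporal extension of the MLP.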
Another training algorithm, which is guaranteed to converge to a solution, has been proposed in [1]. The FIR net-
work is stable and it has a high-resolution, low-depth memory: in effect, the network is unable to learn tem-
poral features that are longer than its summed filter lengths. Consequently, the selection of the lengths
of the FIR filters is quite critical in achieving good prediction performance.
In the Elman network, feedback is used to construct a memory in the network, as shown in Fig. 4 [4].
The network has input, hidden and output layers. Special units called context units save the previous output
values of the hidden layer neurons. The context unit values are then fed back, fully connected, to the hidden layer neu-
rons, and thus they serve as additional inputs to the network. The values of the network's output layer are not fed back
to the network. The Elman network has a high-depth, low-resolution memory, since the context units keep an
exponentially decreasing trace of past hidden neuron output values. The difference to the FIR network is that
the memory in the network has no rigid limit, and that information concerning recent data is
preserved with better resolution than information concerning more distant data in the past.
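The role of the context units can be sketched as follows (our illustration with hypothetical weight names, not the toolbox code): at each step the hidden layer sees the current input together with a copy of its own previous activations, and that copy becomes the context for the next step:

    import numpy as np

    def elman_step(x, context, W_in, W_ctx, W_out, b_h, b_o):
        """One time step of an Elman network with a linear output neuron."""
        # hidden layer receives the current input and the saved previous hidden values
        h = np.tanh(W_in @ x + W_ctx @ context + b_h)
        y = W_out @ h + b_o          # linear output neuron
        return y, h                  # h is stored as the context for the next step

    # Example: 1 input, 4 hidden neurons (hence 4 context units), 1 output
    rng = np.random.default_rng(1)
    W_in, W_ctx = rng.normal(size=(4, 1)), rng.normal(size=(4, 4))
    W_out, b_h, b_o = rng.normal(size=(1, 4)), np.zeros(4), np.zeros(1)
    context = np.zeros(4)
    for x_n in np.sin(0.1 * np.arange(50)):
        y, context = elman_step(np.array([x_n]), context, W_in, W_ctx, W_out, b_h, b_o)

Because the context is overwritten at every step, older inputs survive only through repeated passes of the recurrent weights W_ctx, which gives the exponentially decaying trace mentioned above.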
Simulations were done with an MLP network having one hidden layer and one nonlinear output neuron, an
Elman network having one linear output neuron, and a FIR neural network having one hidden layer and
one nonlinear output neuron. Different combinations of the prediction order and the number of neurons in the hidden
layer were tried in an effort to find the architecture that would model the data most effectively. For the FIR net-
works, the length of the FIR filters feeding the output neuron was also a free parameter. The MATLAB Neural Network
Toolbox training functions trainlm and trainelm were used for training the MLP and Elman networks, respec-
tively [3]. For the FIR networks, the temporal backpropagation algorithm [7] was implemented in MATLAB.
Fig. 5: Load in an electrical net (series 1) and fluctuations in a far-infrared laser (series 2).
Fig. 6: Numerically generated series (series 3) and behaviour of sunspots (series 4).
Normalized mean square error (NMSE) was used as the performance measure. In Eq. (1), x_i is the network
prediction, d_i is the desired output, σ² is the variance of the desired outputs and N is the number of patterns.
NMSE = \frac{1}{N \sigma^2} \sum_{i=1}^{N} (x_i - d_i)^2    (1)
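Eq. (1) corresponds to the following computation (a minimal Python sketch; the function name is ours):

    import numpy as np

    def nmse(x, d):
        """Normalized mean square error of predictions x against desired outputs d."""
        x, d = np.asarray(x), np.asarray(d)
        return np.sum((x - d) ** 2) / (len(d) * np.var(d))

An NMSE of 1 corresponds to predicting the mean of the desired outputs, so a useful predictor should score well below 1.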
Table 2 shows the architectures which gave the lowest NMSE on the test data for each time series and for each neu-
ral network model. The number of inputs equals the prediction order p for the MLP and Elman networks. For the
FIR networks, the lengths of the FIR filters are shown. For the Elman networks, the number of context units equals
the number of neurons in the hidden layer.
The Elman network performs best on time series 1, which has a low-frequency trend. The network predicts
the slope of the trend at the end of the testing data more accurately than the MLP network. In the other prediction
tasks the Elman network performs nearly as well as the MLP.
Table 2: Architectures which gave the lowest NMSE for the test set for each time series and neural network model.

Time Series | Network | Number of Inputs | Neurons in Hidden Layer | NMSE for Training Set | NMSE for Test Set
Since all prediction tasks were one-step predictions, the MLP network performs well. The learning algorithm
used is quite an important factor in the network's performance. In our present studies we are considering multistep
prediction tasks. Preliminary results indicate that in multistep prediction tasks the temporal extensions in the
FIR and Elman neural networks allow these architectures to perform better than the MLP network.
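One common way to pose the multistep task (sketched below as an illustration; this is not necessarily the exact procedure of our ongoing experiments) is to iterate the one-step predictor, feeding each prediction back into the input window:

    import numpy as np

    def iterated_prediction(predict_one_step, history, steps, p):
        """Multistep prediction by feeding one-step predictions back as inputs."""
        window = list(history[-p:])        # the last p observed values, oldest first
        forecasts = []
        for _ in range(steps):
            x = np.array(window[::-1])     # input vector [x(n-1), ..., x(n-p)]
            x_hat = predict_one_step(x)    # the network's one-step prediction
            forecasts.append(x_hat)
            window = window[1:] + [x_hat]  # the prediction becomes part of the next input
        return np.array(forecasts)

Prediction errors accumulate over the iterated steps, which is one reason the temporal memory in the FIR and Elman networks can become more valuable in this setting.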
4. Conclusions
The results show that the efficiency of the learning algorithm is a more important factor than the neural net-
work model used. The learning algorithms used with the Elman and FIR neural networks were unable to fully
exploit the richer structure of these networks. Training of the Elman networks was three to ten times
slower than that of the MLP, depending on the training data size and the number of network parameters. For the FIR
network, the training time was five to twenty times longer than for the MLP. The Elman network models series 1 best,
and its performance in the other tasks is similar to the MLP network's performance. For the FIR networks trained with
temporal backpropagation, adequate performance was reached in two prediction tasks.
Acknowledgements
This study was financially supported by the Academy of Finland.
References
[1] A.D. Back and A.C. Tsoi, "FIR and IIR synapses, a new neural network architecture for time series modeling", Neural Computation, Vol. 3, pp. 375-385, 1991.
[2] B. de Vries and J. Principe, "The gamma model: A new neural net model for temporal processing", Neural Networks, Vol. 5, pp. 565-576, 1992.
[3] H. Demuth and M. Beale, Neural Network Toolbox for Use with MATLAB, The MathWorks Inc., April 1993.
[4] J.L. Elman, "Finding structure in time", Cognitive Science, Vol. 14, pp. 179-211, 1990.
[5] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company, New York, 1994.
[6] K. Lang, A. Waibel and G. Hinton, "A time-delay neural network architecture for isolated word recognition", Neural Networks, Vol. 3, pp. 23-43, 1990.
[7] E. Wan, Finite Impulse Response Neural Networks with Applications in Time Series Prediction, Ph.D. Dissertation, Stanford University, November 1993.
[8] A. Weigend and G. Gershenfeld (eds.), Time Series Prediction: Forecasting the Future and Understanding the Past, Addison-Wesley, Reading, 1994.
[9] A. Weigend, B. Huberman and D. Rumelhart, "Predicting the future: A connectionist approach", Int. Journal of Neural Systems, Vol. 1, pp. 193-209, 1990.