
Adaptive Neural Networks and Adaptive Neuro-Fuzzy Inference System for Short-Term

Load Forecasting: A Review and Concept


MR. VAIBHAV S. TELRANDHE1, PROF. V. R. INGALE2
Department of Electronics Engineering
S.D. College of Engineering1, B. D. College of Engineering2
Sewagram, Wardha, Maharashtra, INDIA
[email protected], [email protected]
Abstract—Load forecasting has become in recent years one of the major areas of research in electrical engineering, and most traditional forecasting models and artificial intelligence techniques have been tried out in this task. Adaptive neural networks (NNs) have lately received much attention, and a great number of papers have reported successful experiments and practical tests with them. Nevertheless, some authors remain skeptical, and believe that the advantages of using NNs in forecasting have not been systematically proved yet. In order to investigate the reasons for such skepticism, this review examines a collection of papers (published between 1991 and 1999) that report the application of NNs to short-term load forecasting. Our aim is to help to clarify the issue, by critically evaluating the ways in which the NNs proposed in these papers were designed and tested.

Index Terms—Load forecasting, multilayer perceptrons, neural network applications, neural networks, overfitting.

Manuscript received August 24, 1999. H. S. Hippert was supported by a Ph.D. scholarship granted by the Brazilian Foundation for the Co-ordination of Higher Education and Graduate Training (PICDT-CAPES). H. S. Hippert is with the Department of Statistics, Universidade Federal de Juiz de Fora, Brazil. C. E. Pedreira and R. C. Souza are with the Department of Electrical Engineering, Pontificia Universidade Catolica do Rio de Janeiro, Brazil. Publisher Item Identifier S 0885-8950(01)02306-9.

I. INTRODUCTION

THE FORECASTING of electricity demand has become one of the major research fields in electrical engineering. The supply industry requires forecasts with lead times that range from the short term (a few minutes, hours, or days ahead) to the long term (up to 20 years ahead). Short-term forecasts, in particular, have become increasingly important since the rise of the competitive energy markets. Many countries have recently privatized and deregulated their power systems, and electricity has been turned into a commodity to be sold and bought at market prices. Since the load forecasts play a crucial role in the composition of these prices, they have become vital for the supply industry.

Load forecasting is however a difficult task. First, because the load series is complex and exhibits several levels of seasonality: the load at a given hour is dependent not only on the load at the previous hour, but also on the load at the same hour on the previous day, and on the load at the same hour on the day with the same denomination in the previous week. Secondly, because there are many important exogenous variables that must be considered, specially weather-related variables. It is relatively easy to get forecasts with about 10% mean absolute percent error (MAPE); however, the costs of the error are so high that research that could help reduce it by a few percentage points would be amply justified. An often quoted estimate in [10] suggests that an increase of 1% in the forecasting error would imply (in 1984) a £10 million increase in operating costs per year (recent studies on the economic aspects of load forecasting are [9], [26], [41], [76]).
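Computing the MAPE quoted above is straightforward; the sketch below uses made-up load values, purely for illustration:

```python
def mape(actual, forecast):
    """Mean absolute percent error, in percent."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

loads = [100.0, 120.0, 90.0]   # observed hourly loads (illustrative)
preds = [110.0, 114.0, 99.0]   # corresponding forecasts
print(round(mape(loads, preds), 2))  # 8.33
```

Because each error is divided by the actual load, MAPE weights a 1 MW miss more heavily at low-load hours than at peak hours.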
Most forecasting models and methods have already been tried out on load forecasting, with varying degrees of success. They may be classified as time series (univariate) models, in which the load is modeled as a function of its past observed values, and causal models, in which the load is modeled as a function of some exogenous factors, specially weather and social variables. Some models of the first class suggested in recent papers are multiplicative autoregressive models [60], dynamic linear [27] or nonlinear [80] models, threshold autoregressive models [43], and methods based on Kalman filtering [46], [69], [81]. Some of the second class are Box and Jenkins transfer functions [34], [47], ARMAX models [91], [92], optimization techniques [94], nonparametric regression [11], structural models [36], and curve-fitting procedures [85]. Despite this large number of alternatives, however, the most popular causal models are still the linear regression ones [30], [35], [67], [74], [82], and the models that decompose the load, usually into basic and weather-dependent components [10], [31], [45], [69]. These models are attractive because some physical interpretation may be attached to their components, allowing engineers and system operators to understand their behavior. However, they are basically linear devices, and the load series they try to explain are known to be distinctly nonlinear functions of the exogenous variables.

In recent times, much research has been carried out on the application of artificial intelligence techniques to the load forecasting problem. Expert systems have been tried out [39], [73], and compared to traditional methods [62]. Fuzzy inference [64] and fuzzy-neural models [7], [65] have also been tried out. However, the models that have received the largest share of attention are undoubtedly the artificial neural networks (NNs). The first reports on their application to the load forecasting problem were published in the late 1980's and early 1990's [21]. Since then, the number of publications has been growing steadily. Judging from the number of papers, NN-based forecasting systems have not turned into a "passing fad," as it was feared they might [12]. It seems that they have been well accepted in practice, and that they are used by many utilities [50].

Nevertheless, the reports on the performance of NNs in forecasting have not entirely convinced the researchers in this area, and the skepticism may be partly justified. Recent reviews and textbooks on forecasting argue that there is little systematic evidence as yet that NNs might outperform standard forecasting methods [13], [58]. Reviews of NN-based forecasting systems have concluded that much work still needs to be done before they are accepted as established forecasting techniques [33], [38], [96], and that they are promising, but that "a significant portion of the NN research in forecasting and prediction lacks validity" [1]. How could this skeptical attitude adopted by some experts be reconciled with the apparent success enjoyed by the NNs in load forecasting? In order to investigate this matter we reviewed 40 papers that reported the application of NNs to short-term load forecasting. These papers were selected from those published in the leading journals in electrical engineering between 1991 and 1999 (conference proceedings were not considered).

We found that, on the whole, two major shortcomings detract from the credibility of the results. First, most of the papers proposed NN architectures that seemed to be too large for the data samples they intended to model, i.e., there seemed to be too many parameters to be estimated from comparatively too few data points. These NNs apparently overfitted their data and one should, in principle, expect them to
yield poor out-of-sample forecasts. Secondly, in most papers the models were not systematically tested, and the results of the tests were not always presented in an entirely satisfactory manner.

This paper is organized as follows. In Section II we give a short introduction to NN modeling. In Section III we briefly compare the approaches taken by each paper to the load forecasting problem, and we outline the main features of the multilayer perceptrons they proposed. In Section IV we summarize the choices and procedures reported in each paper for data pre-processing, NN design, implementation and validation. In Section V we focus on the problems of overfitting and model validation of the proposed models. In Section VI we briefly review papers that suggest NN architectures other than the multilayer perceptron, or some ways to combine them with linear methods. Section VII is the conclusion.

II. A SHORT INTRODUCTION TO NEURAL NETWORKS

In this section we provide a short introduction to neural networks (a complete treatment of the subject may be found in [8], [37]). Artificial neural networks are mathematical tools originally inspired by the way the human brain processes information. Their basic unit is the artificial neuron, schematically represented in Fig. 1. The neuron receives (numerical) information through a number of input nodes (four, in this example), processes it internally, and puts out a response. The processing is usually done in two stages: first, the input values are linearly combined, then the result is used as the argument of a nonlinear activation function. The combination uses the weights attributed to each connection, and a constant bias term, represented in the figure by the weight of a connection with a fixed input equal to 1. The activation function must be a nondecreasing and differentiable function; the most common choices are either the identity function, or bounded sigmoid (s-shaped) functions, such as the logistic one.

Fig. 1. An artificial neuron.
Fig. 2. A two-layer feed-forward neural network.

The neurons are organized in a way that defines the network architecture. The one we shall be most concerned with in this paper is the multilayer perceptron (MLP) type, in which the neurons are organized in layers. The neurons in each layer may share the same inputs, but are not connected to each other. If the architecture is feed-forward, the outputs of one layer are used as the inputs to the following layer. The layers between the input nodes and the output layer are called the hidden layers. Fig. 2 shows an example of a network with four input nodes, two layers (one of which is hidden), and two output neurons. The parameters of this network are the weight matrix of the hidden layer (containing the weights that connect the input nodes to the hidden neurons), the weight matrix of the output layer, and the bias vectors (the bias connections have not been represented in the figure). If logistic functions are used for the activation of the hidden layer, and linear functions used for the output layer, this network is equivalent to a nonlinear regression model on the inputs, which shows how complex and flexible even a small network can be.

The estimation of the parameters is called the "training" of the network, and is done by the minimization of a loss function (usually a quadratic function of the output error).

46 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 1, FEBRUARY 2001

Many optimization methods have been adapted for this task. The first training algorithm to be devised was the back-propagation one, which uses a steepest-descent technique based on the computation of the gradient of the loss function with respect to the network parameters (that is the reason why the activation functions must be differentiable). Many other training algorithms, though, are now available.
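The forward pass just described can be sketched as follows, with the layer sizes of the Fig. 2 example (four inputs, one hidden layer, two linear outputs); the weights here are random placeholders, not trained values:

```python
import numpy as np

def logistic(z):
    # Logistic activation: bounded, s-shaped, and differentiable.
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a two-layer feed-forward MLP:
    logistic hidden layer, linear output layer."""
    h = logistic(W1 @ x + b1)   # stage 1: linear combination + activation
    return W2 @ h + b2          # stage 2: linear output neurons

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                 # four input nodes
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)   # three hidden neurons
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)   # two output neurons
print(mlp_forward(x, W1, b1, W2, b2).shape)  # (2,)
```

Training would adjust W1, b1, W2, b2 by gradient descent on a quadratic loss, as described above.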
In load forecasting applications, this basic form of multilayer feed-forward architecture shown above is still the most popular. Nevertheless, there is a large number of other designs, which might be suitable for other applications.

Artificial NNs have been developed and extensively applied since the mid-1980's. There are many reports of successful applications [25], particularly in pattern recognition and classification [42], and in nonlinear control problems, where they may have some advantages over the traditional techniques. Since quantitative forecasting is based on extracting patterns from observed past events and extrapolating them into the future, one should expect NNs to be good candidates for this task. In fact, NNs are very well suited for it, for at least two reasons. First, it has been formally demonstrated that NNs are able to approximate numerically any continuous function to the desired accuracy (see [37], [96] for references). In this sense, NNs may be seen as multivariate, nonlinear and nonparametric methods, and they should be expected to model complex nonlinear relationships much better than the traditional linear models that still form the core of the forecaster's methodology. Secondly, NNs are data-driven methods, in the sense that it is not necessary for the researcher to postulate tentative models and then estimate their parameters. Given a sample of input and output vectors, the NNs are able to automatically map the relationship between them; they "learn" this relationship, and store this learning into their parameters. As these two characteristics suggest, NNs should prove to be particularly useful when one has a large amount of data, but little a priori knowledge about the laws that govern the system that generated the data.

In terms of theoretical research in forecasting, NNs have progressed from computing point estimates to computing both confidence intervals [89] and conditional probability densities [44], [90]. In terms of practical applications in forecasting, the success of NNs seems to depend on the kind of problem under consideration. An overview of the application of NNs to forecasting may be found in [88].

III. AN OVERVIEW OF THE PROPOSED NN-BASED FORECASTING SYSTEMS

Most of the papers under review proposed multilayer perceptrons that might be classified into two groups, according to the number of output nodes. In the first group are the ones that have only one output node, used to forecast next hour's load, next day's peak load or next day's total load. In the second group are the ones that have several output nodes to forecast a sequence of hourly loads. Typically, they have 24 nodes, to forecast next day's 24 hourly loads (this series of hourly loads is called the "load profile").

We start with the first group. Reference [68] used three small-sized NNs to forecast hourly loads, total loads and peak loads (only one of these NNs is included in the Tables I–III).

TABLE I
INPUT CLASSIFICATION
(2) and (3): C: the day's position in the calendar (weekday/weekend/holiday, month, or season), L: load, T: temperature, H: humidity, W: weather variables (other than T and H), f(T): nonlinear functions of T, LP: load parameters. (4) There are as many sets of classes as classification criteria. Cells marked with "...": values were not reported in the paper.

Reference [40] used a NN to forecast next day's peak load, which is needed as an input to the expert system in [39] that forecasts next day's profile. [14] suggested a non-fully connected network, in
order to reduce the number of weights. Reference [70] proposed two NNs, one of which included a linear neuron among the sigmoidal ones in the hidden layer. Reference [23] experimented with feed-forward and recurrent NNs to forecast hourly loads, and was the only paper to report that linear models actually performed better than those NNs.

These NNs with only one output neuron were also used to forecast profiles, in either of two ways. The first way was by repeatedly forecasting one hourly load at a time, as in [28], [29]. The second way was by using a system with 24 NNs in parallel, one for each hour of the day: [61] compared the results of such a system to those of a set of 24 regression equations; [18] considered the load as a stationary process with level changes and outliers, filtered these out by a Kalman filter, and modeled the remaining load by a NN system; [86] considered the load as the output of a dynamic system, and modeled it by a set of 24 recurrent NNs.

HIPPERT et al.: NEURAL NETWORKS FOR SHORT-TERM LOAD FORECASTING: A REVIEW AND EVALUATION 47

TABLE II
NN ARCHITECTURES AND IMPLEMENTATION
(3) input/hidden/output layers. Some papers reported ranges for the number of neurons; they are indicated by colons. (4) L: linear, S: sigmoidal, Sin: sinusoidal. (5) cv: cross-validation, tol: training was carried on until a specified tolerance (in-sample error) was reached, # iter: training was carried on for a fixed number of iterations. Cells marked with "...": values were not reported in the paper.

Most of the profile forecasting, however, was done with the NNs of the second group (the ones with several output nodes). Reference [55] divided a day into 3 periods, and had a large NN to forecast for each period. Reference [57] experimented with three NNs to model data from two utilities, and concluded that NNs are "system dependent," i.e., must be tailored for each specific utility. Reference [66] included nonlinear functions of temperature among the inputs, and also suggested a procedure to improve the forecasting on holidays. Reference [6] improved on it, specially for forecasting sequences of holidays. Reference [52] trained a NN with data from a small utility, and found it necessary to smooth their sample data by a manual pre-processing procedure. Reference [15] used a fractional factorial experiment to find out the "quasioptimal" network design parameters (number of neurons and layers, activation functions, stopping criteria, etc.) and came up with a rather unusual architecture: a recurrent NN, with sinusoidal activation functions in the hidden layers. In [16], the same NN was trained by a weighted least squares procedure in which the weights were the marginal energy costs. Since the load series are often nonstationary,
[17] suggested that NNs could be used to model the first differences of the series, as nonlinear extensions to the ARIMA models. Other authors dealt with the problem of nonstationarity by detrending the series [63], [83], [84] or by filtering it with a Kalman filter [18].

Some papers suggested systems in which a number of NNs worked together to compute the forecasts. Reference [2] used a small NN that pre-processed some of the data and produced estimates of peak load, valley load, and total load, which were fed, together with some other data, into a very large NN that computed next day's profile. Reference [54] suggested a system of 12 NNs, one for each month of the year. In order to improve the forecast for "anomalous" days (holidays), the daily load profiles were classified by a Kohonen self-organized map. Reference [51] proposed a system in which the results of hourly, daily and weekly modules (38 NNs in total) were linearly combined. This system was replaced in [49] by a smaller one, composed of 24 NNs, one for each hour of the day. Later, some of these authors proposed a system with only two NNs [50]. One of them was trained to produce a first forecast of tomorrow's profile. The other one was trained to estimate tomorrow's load changes with respect to today's loads; these changes, added to today's loads, made up a second forecast of tomorrow's profile. The forecasts produced by both methods were linearly combined. It is argued that the second NN allowed the system to adapt more quickly to abrupt changes in temperature. In [63] the hourly loads were classified according to the season (seven classes), to the day of the week (three classes) and to the period of the day (five classes), and each of these classes was modeled by one of the independent NNs that made up a very large system. Reference [72] used a neural-gas network (a kind of NN used for vector quantization [59]) to classify the data into two groups, summer and winter, which were modeled by separate feed-forward NNs. The forecasts were combined by a fuzzy module.

TABLE III
NUMBER OF PARAMETERS AND SAMPLE SIZES
(2) Some papers reported ranges for the number of neurons in their MLPs; these ranges are indicated by colons; (3) The numbers marked with (?) represent our best guesses, since the actual numbers were not clearly reported in the papers; (4) The ranges for the number of weights correspond to the ranges for the number of neurons. Cells marked with "...": values were not reported in the paper.

Fuzzy logic, another artificial intelligence technique, has also been tried in combination with NNs in some of the most recent papers.
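The first-differences idea attributed to [17] above amounts to modeling the changes in the load rather than the load itself; a minimal sketch, with made-up load values:

```python
import numpy as np

# A nonstationary load series (illustrative values).
load = np.array([100.0, 104.0, 110.0, 109.0, 115.0])
diff = np.diff(load)          # first differences: the series a NN would model
print(diff)                   # [ 4.  6. -1.  6.]

# A forecast of the next difference is turned back into a load forecast
# by adding it to the last observed load.
next_diff = 3.0               # hypothetical model output
print(load[-1] + next_diff)   # 118.0
```

Differencing removes a trend in the level of the series, which is why it helps with the nonstationarity discussed above.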
Reference [83] included a "front-end fuzzy processor" that received quantitative and qualitative data, and put out four fuzzy numbers that measured the expected load change in each of the four periods into which the target day had been divided; these numbers, together with some temperature data, were fed to the NN that computed the forecasted profile. The fuzzy pre-processing reduced the number of NN inputs and allowed the system to work on qualitative data. Reference [53] placed the fuzzy engine after the NN. The NN provided a "provisional forecast" based on past loads, which was afterwards modified by the fuzzy engine on the basis of the temperature and type of day (regular or holiday). Particular attention was given to the modeling of holidays. Reference [22] classified the data into 48 fuzzy subsets according to temperature and humidity; each subset was modeled by a separate NN. Reference [72], already mentioned above, used a fuzzy module to combine results from two separate NNs. Reference [48] forecasted the demand of a residential area by decomposing it into a normal load and a weather-sensitive load. The normal load was modeled by three NNs (for week days, Saturdays and Sundays); the weather-sensitive load was modeled by a fuzzy engine, on the basis of weather data.

The only paper that dealt with very short-term forecast was [56]. The authors compared auto-regressive, fuzzy and NN models in the minute-by-minute forecasting of the load in the next half-hour.

We discuss these papers more fully in Sections IV and V, focusing on the way the MLP models were designed and tested. We are not concerned with the fuzzy modules; fuzzy neural networks [7], [65] are also outside the scope of this paper. A few papers suggested NN architectures other than the MLP, sometimes combined with traditional linear methods; we review these briefly in Section VI.

IV. ISSUES IN DESIGNING A NN-BASED FORECASTING SYSTEM

Neural networks are such flexible models that the task of designing a NN-based forecasting system for a particular application is far from easy. There is a large number of choices that have to be made, but very few guidelines to help the designer through them. Some recent theoretical contributions can be found in [3], [78], [79], [87], but they have not been much tested in practice as yet.

The design tasks can be roughly divided into four headings:
A. Data pre-processing;
B. NN designing;
C. NN implementation;
D. Validation.
This subdivision is somewhat artificial, as these stages in practice tend to overlap, but it is useful in the organization of what
is to follow. In most of the papers we reviewed, the authors made their choices guided by empirical tests and simulations. In the next four sub-sections we summarize these choices; in Section V, we discuss their consequences and implications.

A. Data Pre-Processing

Before data are ready to be used as input to a NN, they may be subjected to some form of pre-processing, which usually intends to make the forecasting problem more manageable. Pre-processing may be needed to reduce the dimension of the input vector, so as to avoid the "curse of dimensionality" (the exponential growth in the complexity of the problem that results from an increase in the number of dimensions). Pre-processing may also be needed to "clean" the data, by removing outliers, missing values or any irregularities, since NNs are sensitive to such defective data. References [52], [72] devised heuristics to regularize their data; [18] filtered out the irregularities with a Kalman filter.

Pre-processing also frequently means partitioning the input space so that a "local" model may be designed for each subspace. These models will be simpler and will require less data than a "global" model. In load forecasting, this is usually done by classifying the input data (past load profiles or weather data), and then using separate NNs to model data from each class.

The most important factor to determine the shape of the load profile is the calendar date (see Table I); the week day profiles are typically very different from the weekend profiles. Thus, the basic classification is into two groups: week days and weekend days. References [17], [68] dealt only with the week day profiles, discarding the weekends and holidays. Reference [57] ignored such distinction, but as a consequence got poor results on weekends and holidays. Sometimes the profiles of the days just before Saturdays or after Sundays may be disturbed by the weekend, so that special classes may be needed for Mondays, Fridays, or even Thursdays [2], [63], [70]. Weekend days may be further classified according to the social customs and working patterns in the country. If this process is continued, the number of classes may rise to eleven [40], though the usual number is seven (one class for each day of the week). The typical within-week profiles may change from season to season, and they should then be further classified according to month or season [63], [72].

Holidays pose a special problem. Some authors group them with the weekend days; others reserve them special classes, or devise heuristics to deal with them [6], [50], [51], [66].

The second most important factor to affect the load profile is the weather. Because of this, the days may be classified according to the weather conditions, by statistical measures of similarity [28], [29], [70], by fuzzy engines [22], and by neural-gas networks [72].
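A minimal sketch of the kind of data "cleaning" mentioned above; the rule used here (replace values beyond k standard deviations with the series median) and the threshold k are arbitrary illustrative choices, not the heuristics used in the reviewed papers:

```python
import numpy as np

def clean(series, k=1.5):
    """Replace values more than k standard deviations from the mean with
    the series median -- a crude stand-in for the heuristics papers used
    to "clean" defective data before training."""
    x = np.asarray(series, dtype=float).copy()
    outliers = np.abs(x - x.mean()) > k * x.std()
    x[outliers] = np.median(x)
    return x

print(clean([10.0, 11.0, 9.0, 10.0, 500.0]))  # [10. 11.  9. 10. 10.]
```

In practice the replacement value would more likely come from neighboring hours or from the same hour on similar days, rather than the global median.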
In [54] the profiles themselves (not the temperatures or the calendar dates) were classified by a Kohonen self-organized map. The classes were then interpreted by the system operator, so that the class to which the target day would belong could be predicted.

These classification procedures give a class label to each of the profiles in the training sample. These labels will make up a new variable that must be included in the model. This may be done in either of two ways. The first one is by coding these class labels and using the codes as input variables. That may mean adding to the NN as many input nodes as the number of classes, and having each class represented by a dummy variable [6], [14], [15], [49], [50], [52], [57], [66]; or, alternatively, numbering the classes (normally on a binary basis), and feeding these numbers to the NN through a few input nodes, so that a large number of classes may be represented by a comparatively small number of input nodes [2], [22], [54]. The second way to use this information is by building separate NN models for each class (or a common NN, with a separate set of weights for each class) [22], [28], [51], [54], [55], [57], [63], [72]. However, if the data are subdivided into too many classes, there will not be enough profiles left in each class to permit network training. Extreme instances of this subdivision are found in [22], [63].

B. Designing the Neural Network

Selecting an appropriate architecture is in general the first step to take when designing a NN-based forecasting system. Many types of architecture have already been used in forecasting applications. In all the papers discussed in this chapter, however, the authors used the NN work-horse, the multilayer perceptron, usually a feed-forward one (the exceptions were the recurrent networks in [15], [16], [86]). Most of them were fully-connected (the exception was in [14]).

Having chosen this type of architecture, one must then decide on the number of hidden layers, of input nodes, of neurons per layer, and on the type of activation functions (see Table II). The number of hidden layers is not difficult to determine. It has been shown that one hidden layer is enough to approximate any continuous function, although two layers may be useful in some circumstances (see [37] for references). The papers we reviewed used either one or two hidden layers. Defining the activation function for the hidden neurons is also not difficult. These functions must be differentiable and nondecreasing; most papers used either the logistic or the hyperbolic tangent functions, and it is not clear whether the choice has any effect on the forecasting accuracy [96].
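The two coding schemes described above can be sketched as follows, assuming a hypothetical seven-class (day-of-the-week) labeling:

```python
N_CLASSES = 7  # e.g., one class per day of the week

def one_hot(label):
    """First scheme: one input node per class (dummy variables)."""
    code = [0] * N_CLASSES
    code[label] = 1
    return code

def binary_code(label, bits=3):
    """Second scheme: number the classes on a binary basis, so that
    7 classes need only 3 input nodes instead of 7."""
    return [(label >> i) & 1 for i in reversed(range(bits))]

print(one_hot(2))      # [0, 0, 1, 0, 0, 0, 0]
print(binary_code(5))  # [1, 0, 1]
```

The binary coding saves input nodes, at the cost of imposing an arbitrary numerical structure on the classes that the network must learn to disentangle.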
In the following sub-sections we report how the number of output neurons, input nodes, and hidden neurons were defined in the papers under review.

1) Selecting the Number of Output Neurons: Six of the papers listed in Table II used one-output NNs to produce one-step-ahead forecasts: forecasts for next day's peak load or total load, or forecasts for hourly loads (that is, given the load series up to a given hour, forecasts for the load at the next hour). Most of the reviewed papers however were concerned with forecasting profiles. They did this in one of three ways: a) iterative forecasting; b) multi-model forecasting; c) single-model multivariate forecasting.

a) Iterative Forecasting: This is done by forecasting one hourly load at a time and then aggregating this load to the series, so that the forecasts for the later hours will be based on the forecasts for the earlier ones. If the model is an ARIMA, it may be shown that the forecasts will eventually converge to the series average. However, it is not clear what happens if the model is a MLP. Reference [38] studied the accuracy of multi-step forecasts obtained iteratively by a MLP on the M-competition time series, and found that this MLP outperformed the statistical models; however, [19], [20] reported that the MLP outputs may behave chaotically. Among the papers we reviewed, [28], [29] used only this iterative method, whereas [55], [57] experimented with both methods a) and c), and found that their results were roughly equivalent.

b) Multi-Model Forecasting: This is a common method for load forecasting with regression models: using 24 different models, one for each hour of the day. Among the papers we reviewed, [18], [61], [86] used systems of 24 NNs in parallel. The advantage of this method is that the individual networks are relatively small, and so they are not likely to be overfitted.

c) Single-Model Multivariate Forecasting: This is done by using a multivariate method to forecast all the loads at once, so that each profile is represented by a 24-dimensional vector. This method was used by most of the researchers, who designed MLPs with 24 neurons in the output layer. (The exceptions were [2], that forecast loads at each half-hour and so needed 48 output neurons; and [55], that divided the profile into three parts, forecast by separate MLPs.) This method, however, has two serious drawbacks. The first one is that the MLPs must be very large in order to accommodate 24 output neurons ([56], doing minute-by-minute forecasting for the next half-hour using this method, tried to reduce the number of output nodes by compressing the 30-dimension output vector into 16 nodes, using a transformation technique).
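The iterative scheme of a) above can be sketched as follows; the one-step model here is a toy stand-in for a trained NN:

```python
def iterative_forecast(history, one_step_model, horizon):
    """Profile forecasting by iteration: forecast one hourly load at a
    time, feeding each forecast back into the series so that later
    forecasts are based on the earlier ones."""
    series = list(history)
    profile = []
    for _ in range(horizon):
        y = one_step_model(series)   # one-step-ahead forecast
        profile.append(y)
        series.append(y)             # aggregate the forecast to the series
    return profile

# Toy one-step model: the mean of the last three values.
model = lambda s: sum(s[-3:]) / 3.0
print(iterative_forecast([10.0, 12.0, 14.0], model, horizon=3))
```

With this toy averaging model the iterated forecasts settle toward a constant level, much as ARIMA forecasts converge to the series average; with a nonlinear MLP in its place, as the text notes, the iterated outputs need not be so well behaved.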
If the loads of one or two previous days are used as inputs, 24 or 48 input nodes will be required, and the number of MLP parameters will very likely run into the thousands.

The second drawback is that treating each day as a vector means that one year of data will yield only 365 data points, which seems to be too few for the large MLPs required. Trying to increase the sample size by aggregating data from years back in the past may not be feasible, because in most places the load series show a very clear upward trend.

2) Selecting the Number of Input Nodes: After selecting the number of layers and the number of output neurons required, one must choose the inputs. There are very few theoretical considerations to help in this decision; usually, one must have some a priori knowledge about the behavior of the system under study, and of the factors that condition the output of that system.

The first variable to be used is almost certainly the load itself, as the load series is strongly autocorrelated. If the MLP forecasts the profiles, treating them as 24-dimensional vectors, the researcher does not have much choice: either he uses data from one past day [2], [17], [51] or from two [6], [15], [16]. No experiments with three or more past days were reported, because they would imply having 72 or more input nodes for the loads only. If, however, the MLP is forecasting hourly loads, the problem becomes more complex, since the researcher then needs to select what lagged load values should be used as inputs.

Some authors tried to adapt the Box and Jenkins methodology for fitting ARIMA models, and selected the lags by the analysis of the autocorrelation functions (ACF) and the partial autocorrelation functions (PACF) [14], [57]. In doing so, however, they may run the risk of discarding lagged variables that showed no significant linear correlation to the load, but which were strongly nonlinearly correlated to it. Reference [28] used phase-space embedding, a technique that represents a system by one variable and its lagged versions, to help determine which lagged values of the load series should be used as inputs.

As the short-term load forecasting problem has been intensively studied for decades, there are some empirical guidelines that may help in selecting among the candidate exogenous variables. The main variable to be included is the air temperature, since it has been known since the 1930's that the demand rises on cold days because of the use of electric space- and water-heating devices, and on hot days because of air conditioning. The function that relates the temperature to the load is clearly nonlinear; that is, of course, one of the main motivations to use NNs in this context, since NNs can easily deal with nonlinear relationships.
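The linear lag screening described above can be sketched with the sample ACF alone (the PACF screen is analogous); the 1.96/sqrt(n) significance band is the usual rule of thumb, and the sinusoidal toy series with period 24, mimicking daily seasonality, is an illustrative assumption.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelation function for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    return {k: sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / var
            for k in range(1, max_lag + 1)}

def significant_lags(x, max_lag):
    """Keep lags whose autocorrelation leaves the 1.96/sqrt(n) band."""
    band = 1.96 / math.sqrt(len(x))
    return [k for k, r in acf(x, max_lag).items() if abs(r) > band]

# Toy hourly series with a strong period-24 component (illustrative only).
series = [10.0 + 5.0 * math.sin(2.0 * math.pi * t / 24.0) for t in range(240)]
lags = significant_lags(series, max_lag=30)
```

A screen like this keeps lag 24 for the toy series, as expected; the caveat in the text remains, since the ACF is blind to purely nonlinear dependence.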
However, since this function seems to be U-shaped in many countries (see for example the graphs relating peak load to maximum temperature in [34], [35], [95]), some authors used piecewise linear-quadratic functions of the temperature as input variables [52], [66]. The idea was borrowed from earlier papers that modeled the components of linear regression models by such piecewise functions [67] or by polynomial functions [34]. Some authors experimented with other variables, such as relative humidity or wind speed, since they have a strong effect on the human sensation of thermal discomfort and may help explain the use of heating and cooling devices [2], [17], [22], [50], [51]. Others concluded that the only significant weather variable was the temperature [6], [53], [66]. In most cases, however, the authors did not have much choice, as data on weather variables other than temperature were simply unavailable.

As the MLPs were used in these papers as nonlinear regression models, load forecasts required weather forecasts. Most authors ran their simulations using observed weather values instead of forecasted ones, which is standard practice in load forecasting; however, one should keep in mind that the forecasting errors in practice will be larger than those obtained in simulations, because of the added weather forecast uncertainty (for some studies on the effect of this uncertainty see [27], [75]).

3) Selecting the Number of Hidden Neurons: Determining the number of neurons in the hidden layer may be more difficult than determining the size of the input or the output layers. There is again little theoretical basis for the decision, and very few successful heuristics have been reported [96]. The issue may be roughly compared to that of choosing the number of harmonics to be included in a Fourier model to approximate a function; if they are too few, the model will not be flexible enough to model the data well; if they are too many, the model will overfit the data. In most papers, authors chose this number by trial and error, selecting a few alternative numbers and then running simulations to find out the one that gave the best fitting (or predictive) performance. Some of the papers reported that variations in the number of hidden neurons did not significantly affect forecasting accuracy [6], [49].

HIPPERT et al.: NEURAL NETWORKS FOR SHORT-TERM LOAD FORECASTING: A REVIEW AND EVALUATION

C. Neural Network Implementation

After an MLP has been designed, it must be trained (that is, its parameters must be estimated). One must select a “training algorithm” for this task. The most common in use is the backpropagation algorithm, based on a steepest-descent method that performs stochastic gradient descent on the error surface, though many alternatives to it have been proposed in recent years. Since these algorithms are iterative, some criteria must be defined to stop the iterations.
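A minimal steepest-descent training loop of the kind just described, reduced to a single linear neuron so that the error gradient fits in two lines (a real MLP backpropagates the same error signal through its hidden layers); the learning rate, epoch count, and noiseless training data are illustrative assumptions.

```python
def train_linear_neuron(data, lr=0.05, epochs=1000):
    """Stochastic steepest descent on squared error for y = w*x + b:
    one gradient step per training pattern, repeated for `epochs` passes."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x  # dE/dw = err * x
            b -= lr * err      # dE/db = err
    return w, b

# Illustrative noiseless data generated from y = 2x + 1.
data = [(x / 2.0, 2.0 * (x / 2.0) + 1.0) for x in range(5)]
w, b = train_linear_neuron(data)
```

Here the iterations simply run for a fixed number of epochs, which is exactly the kind of stopping criterion the next paragraph criticizes.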
In most of the papers reviewed, training was stopped after a fixed number of iterations, or after the error decreased below some specified tolerance (see Table II). These criteria are not adequate, as they ensure that the model fits closely to the training data, but do not guarantee good out-of-sample performance; they may lead to overfitting of the model (this point is further discussed in Section V).

Lastly, the training samples must be appropriately selected. Since NNs are “data-driven” methods, they typically require large samples in training. References [28], [29], [70] trained their MLPs on small subsets that included data from only a few past days selected through statistical measures of similarity. That resulted in samples that were very homogeneous, but also very small.

D. Neural Network Validation

The final stage is the validation of the proposed forecasting system. It is well known that goodness-of-fit statistics are not enough to predict the actual performance of a method, so most researchers test their models by examining their errors in samples other than the one used for parameter estimation (out-of-sample errors, as opposed to in-sample errors). Some of the papers reviewed did not clearly specify whether the results they reported had been obtained in-sample or out-of-sample.

V. DISCUSSION

In this section we discuss the implications and consequences of the choices made in the papers under review on the issues of design, implementation and validation of NN models. We shall use some of the guidelines proposed in [1] for the evaluation of these choices.

A. Evaluating the Neural Network Design—The Problems of Overfitting and Overparameterization

The problem of overfitting is frequently mentioned in the NN literature. It seems, however, that different authors give this word different meanings. For instance, [1] remark that the usual MLPs trained by backpropagation “are known to be seriously prone to overfitting” and that this could be prevented by avoiding excessive training. On the other hand, [33], [96] remark that MLPs are prone to overfit sample data because of the large number of parameters that must be estimated. These authors mean two different things, and we had better start by defining our terms.

“Overfitting” usually means estimating a model that fits the data so well that it ends by including some of the error randomness in its structure, and then produces poor forecasts. In MLPs, as the remarks above imply, this may come about for two reasons: because the model was overtrained, or because it was too complex.

One way to avoid overtraining is by using cross-validation. The sample set is split into a training set and a validation set.
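The early-stopping rule built on this split can be sketched abstractly: iterate training, check the validation error every few iterations, and return the last parameter set computed before that error begins to rise. The counter-style `step` and the parabolic stand-in for a validation-error curve (minimum at iteration 60) are illustrative assumptions.

```python
def train_with_early_stopping(step, val_error, max_iters=10000, check_every=5):
    """Stop training once validation error deteriorates; return the
    last parameter set computed before the deterioration."""
    params = step(None)                  # initialize parameters
    best, best_err = params, val_error(params)
    for it in range(1, max_iters):
        params = step(params)            # one training iteration
        if it % check_every == 0:
            err = val_error(params)
            if err > best_err:           # validation error rising: overfitting
                return best
            best, best_err = params, err
    return best

# Illustrative stand-ins: "training" just advances a counter, and the
# validation error is a parabola whose minimum sits at iteration 60.
step = lambda p: 0 if p is None else p + 1
val_error = lambda p: (p - 60) ** 2
chosen = train_with_early_stopping(step, val_error)
```

Here training halts shortly after iteration 60 and returns the parameters from the best checkpoint, rather than the overfitted ones.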
The NN parameters are estimated on the training set, and the performance of the model is tested, every few iterations, on the validation set. When this performance starts to deteriorate (which means the NN is overfitting the training data), the iterations are stopped, and the last set of parameters to be computed is used to produce the forecasts.

Another way is by using regularization techniques. This involves modifying the cost function to be minimized, by adding to it a term that penalizes for the complexity of the model. This term might, for example, penalize for excessive curvature in the model by considering the second derivatives of the output with respect to the inputs. Relatively simple and smooth models usually forecast better than complex ones. Overfitted NNs may assume very complex forms, with pronounced curvature, since they attempt to “track down” every single data point in the training sets; their second derivatives are therefore very large, and the regularization term grows with respect to the error term. Keeping the total error low, therefore, means keeping the model simple. None of the papers we reviewed, however, used regularization techniques.

Overfitting, however, may also be a consequence of overparameterization, that is, of the excessive complexity of the model. The problem is very common in MLP-based models; since they are often (and improperly) used as “black-box” devices, the users are sometimes tempted to add to them a large number of variables and neurons, without taking into account the number of parameters to be estimated. Many methods have been suggested to “prune” the NN, i.e., to reduce the number of its weights, either by shedding some of the hidden neurons or by eliminating some of the connections [77]. However, the adequate ratio between the number of sample points required for training and the number of weights in the network has not yet been clearly defined; it is difficult to establish, theoretically, how many parameters are too many for a given sample size.

Returning now to the load forecasting problem, Table III compares the sizes of the parameter sets to the sizes of the training sets in the papers under review. (The number of parameters was not explicitly reported in any of the papers. We computed it by piecing together the information about the number of MLPs and the number of neurons per MLP in the forecasting systems.) It may be seen, by comparing columns (4) and (5), that most of the proposed MLPs, especially the ones that forecasted profiles, had more parameters than training points. This is a consequence of the way they were designed; using 24 output neurons implies that the MLPs will be large, and the samples, small. One should, in principle, expect these MLPs to fit their training data very well (in fact, to overfit them), but one should not expect them to produce good forecasts.

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 1, FEBRUARY 2001
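The parameter counts behind Table III follow from simple arithmetic. The sketch below counts weights and biases for a hypothetical profile-forecasting MLP with 48 inputs (two days of hourly loads), 24 hidden neurons, and 24 outputs; the architecture is an illustrative assumption, not one taken from a specific reviewed paper.

```python
def mlp_param_count(layer_sizes):
    """Number of weights plus biases in a fully connected MLP,
    e.g. [48, 24, 24] -> (48 + 1)*24 + (24 + 1)*24."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

params = mlp_param_count([48, 24, 24])
training_days = 365  # one profile, i.e. one data point, per day of data
```

This hypothetical network already has 1776 parameters against the 365 data points that a year of daily profiles provides, exactly the imbalance discussed above.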
B. Evaluating the Neural Network Implementation

The next stage in modeling is the implementation of the NN, that is, the estimation of its parameters. The guidelines proposed in [1] for evaluating “effectiveness of implementation” were based on the question: was the NN properly trained and tested, so that its performance was the best it could possibly achieve? According to these authors, the NN can be said to have been properly implemented if:

i) it was well fitted to the data (the errors in the training sample must be reported);
ii) its performances in the training sample and in the test samples were comparable;
iii) its performances across different test samples were coherent.

Few of the papers under review reported any in-sample results [52], [61], so implementation may not be examined further.

C. Evaluating the Neural Network Validation

Guidelines for evaluating the “effectiveness of validation” were also proposed in [1]. The evaluation is based on the question: was the performance of the proposed method fairly compared to that of some well-accepted method? A method is considered to have been properly validated if: i) its performance was compared to that of well-accepted methods; ii) the comparison was based on the performance on test samples; iii) the size of the test samples was adequate, so that some inference might be drawn.

Item i) may be interpreted in two ways. First, the proposed method may be compared to some “naïve” method, which provides an (admittedly low) benchmark. The proposed method must be noticeably better than the naïve method, otherwise there would be no point in adopting it. The naïve forecast is also useful, besides, to show the reader how difficult a forecasting problem is; as the size and load profiles of the utilities concerned vary greatly across the reviewed papers, it would have been interesting to see how difficult the problems were in each case (no paper reported them, however). Second, the performance of the proposed method may be compared to that of a good standard method. The proposed method may be not much more accurate than the standard one, but it must then have some other advantage (it may be easier to use, for instance).

It is difficult to find a good standard for comparison in a problem like load forecasting. ARMAX or regression models would probably be a good choice, but it must be admitted that fitting them would require as much hard work as fitting the MLPs they must test.
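The naïve benchmark mentioned above is easy to state precisely. A common choice is the seasonal-naïve forecaster, which simply repeats the last full seasonal cycle; the toy profiles below, with a period of 4 instead of 24 to keep the example short, are illustrative assumptions.

```python
def naive_forecast(history, season):
    """Seasonal-naive benchmark: repeat the last full season of loads."""
    return history[-season:]

def mape(actual, forecast):
    """Mean Absolute Percent Error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

history = [100.0, 120.0, 110.0, 90.0] * 6   # toy load series, period 4
tomorrow = [102.0, 118.0, 113.0, 90.0]      # toy out-of-sample actuals
benchmark = naive_forecast(history, season=4)
benchmark_mape = mape(tomorrow, benchmark)
```

A proposed model is only worth adopting if its out-of-sample error is noticeably below `benchmark_mape`.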
That is perhaps the reason why most papers did not make any kind of comparison (only [14], [15], [53], [55], [56], [61], [63], [66], [68], [83] reported comparisons to standard linear models). However, if there are no comparisons, the reports on the performance of a proposed method are difficult to interpret. We do not believe that comparisons to other NNs or to fuzzy engines are valid, as these models are not yet considered “standard” or “well accepted” methods.

We should like to add yet another item to the guidelines suggested above: that iv) the results of these comparisons should be thoroughly examined by means of the standard techniques used in forecasting, and reported as fully as possible. In most papers, the forecasting errors were not examined in detail. Most reported only the Mean Absolute Percent Error (MAPE); few also reported the standard deviation of the errors [2], [17], [55], [70], [86]. Although MAPE has become somewhat of a standard in the electricity supply industry, it is clearly not enough in this context. The choice of error measures to help compare forecasting methods has been much discussed, as a consequence of the many competitions that were started in the 1980's [4], [5], [32]. Most authors agree that the loss function associated with the forecasting errors, if known, should be used in the evaluation of a method. MAPE would be an adequate error measure if the loss function were linear (and linear in percentage, not in absolute error); however, recent studies [41], [76] and the experience of system operators indicate that the loss function in the load forecasting problem is clearly nonlinear, and that large errors may have disastrous consequences for a utility. Because of this, measures based on squared error are sometimes suggested, as they penalize large errors (Root Mean Square Error was suggested in [4], Mean Square Percentage Error in [5]). Also, it is generally recognized that error measures should be easy to understand and closely related to the needs of the decision-makers. Some papers reported that the utilities would rather evaluate forecasting systems by the absolute errors produced [63], [66], and this suggests that Mean Absolute Errors could be useful (they were reported in [6], [63], [66]).

In any case, error measures are only intended as summaries of the error distribution. This distribution is usually expected to be normal white noise in a forecasting problem, but it will probably not be so in a complex problem like load forecasting (especially if this distribution is seen as multivariate, conditioned on lead time). No single error measure could possibly be enough to summarize it. The shape of the distribution should be suggested.
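The error measures discussed above are easily placed side by side; the small error vector, with one large error among small ones, is an illustrative assumption chosen to show how squared-error measures penalize outliers.

```python
import math

def mae(errors):
    """Mean Absolute Error."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root Mean Square Error; weights large errors more heavily."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def max_abs_error(errors):
    """Maximum absolute error, as reported in some of the papers."""
    return max(abs(e) for e in errors)

errors = [1.0, -1.0, 1.0, -5.0]  # one disastrous error among small ones
```

For this vector MAE is 2.0 while RMSE is about 2.65 and the maximum error is 5.0: the summaries weight the same forecasts differently, which is why reporting a single measure hides the shape of the error distribution.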
A few papers included graphs of the cumulative distribution of the errors [22], [63], [70], [86]; others suggested this distribution by reporting the percentage of errors above some critical values [52], [63], [66], percentiles [70], [86], or the maximum errors [2], [28], [52]. A histogram of the errors was included in [63]. The possibility of serial correlation should be investigated by graphical means (scatterplots and correlograms) and portmanteau tests [58], [61]. Most of the papers reviewed, however, largely bypassed these standard forecasting practices.

VI. SOME PROPOSED ALTERNATIVES

A few papers have proposed architectures other than the MLP, or new ways to combine MLPs with standard statistical methods. In [95], the variables were grouped into a few more or less homogeneous groups, and sorted according to an index that measured how much each variable was (nonlinearly) correlated to the load. The ones which were most correlated were fed to a network that implemented a projection pursuit regression model. Twenty-four such NNs were used to forecast a profile; each one of them had a single hidden layer, with 5 neurons that were grouped into “sub-nets.”

Reference [71] dealt with the load series in the frequency domain, using signal-processing techniques. The series was decomposed into three components of different frequencies, which were forecasted by separate Adaline neurons (a kind of linear neuron, see [37]). The series of forecasting errors was also decomposed into three components: an autoregressive and a weather-dependent one, both forecast by Adalines, and random noise.

Reference [24] used a functional-link network that had only one neuron. The inputs were a set of sinusoids, past forecasting errors, and temperatures. The neuron had a linear activation function, and so this network may be interpreted as a linear model that decomposed the load into a weather-independent component (modeled by a Fourier series), a weather-dependent component (modeled by polynomial functions and by “functional links”), and random noise.

Self-organizing NNs were also used. Reference [84] trained Kohonen networks to find “typical” profiles for each day of the week, and then used a fuzzy engine to compute corrections to those profiles, based on the weather variables and on the day types. Reference [93] proposed an unusual self-organizing NN model, in which the neurons were split into two clusters; one of them received past load data, the other received temperature data. Reference [20] used a Kohonen NN to classify the normalized sample profiles and to identify the “typical” profiles of each class. When forecasting for a Tuesday in October (for example), one checked in which classes Tuesdays fell in the past Octobers, and averaged the typical profiles of those classes.
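The normalize, average, and de-normalize cycle used with these “typical” profiles can be sketched as follows; the two normalized profiles and the mean and standard deviation used for de-normalization are illustrative assumptions (in [20] those two parameters were themselves forecasted by a linear method).

```python
def denormalize(profile, mean, std):
    """Inverse transform: multiply by the standard deviation, add the mean."""
    return [v * std + mean for v in profile]

def average_profiles(profiles):
    """Average several normalized typical profiles, point by point."""
    return [sum(vals) / len(vals) for vals in zip(*profiles)]

# Two illustrative normalized typical profiles (3 points instead of 24).
typical = [[-1.0, 0.0, 1.0], [-0.6, 0.2, 0.8]]
average = average_profiles(typical)
forecast = denormalize(average, mean=100.0, std=20.0)
```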
This average profile was then “de-normalized” in order to produce the final forecast (i.e., it was multiplied by the standard deviation and added to the mean; both parameters must have been previously forecasted by some linear method).

VII. CONCLUSION

The application of artificial NNs to forecasting problems has been much studied in recent times, and a great number of papers that report successful experiments and practical tests have been published. However, not all the authors and researchers in forecasting have been convinced by those reports, and some believe that the advantages of using NNs have not been systematically proved yet. The aim of this review is to contribute to clarifying the reasons for this skepticism, by examining a collection of recent papers on NN-based load forecasting, and by critically evaluating the ways the systems they proposed had been designed and tested.

This examination led us to highlight two facts that may be among the reasons why some authors are still skeptical:

a) Most of the proposed models, especially the ones designed to forecast profiles, seemed to have been overparameterized. Many were based on single-model multivariate forecasting, i.e., they regarded the profiles as vectors with 24 components that should be forecasted simultaneously by a single NN with 24 output neurons. This approach led to the use of very large NNs, which might have hundreds of parameters to be estimated from very small data sets. One would expect these NNs to have overfitted their training data, and one would not, in principle, expect them to produce good out-of-sample forecasts.

b) The results of the tests performed on these NNs, though apparently good, were not always very convincing. All those systems were tested on real data; nevertheless, in most cases the tests were not systematically carried out: the systems were not properly compared to standard benchmarks, and the analysis of the errors did not make use of the available graphical and statistical tools (e.g., scatterplots and correlograms).

In short, most of those papers presented seemingly misspecified models that had been incompletely tested. Taken by themselves, they are not very convincing; however, the sheer number of similar papers published in reputable journals, and the fact that some of the models they propose have been reportedly very successful in everyday use, seem to suggest that those large NN-based forecasters might work, after all, and that we still do not properly understand how overparameterization and overfitting affect them.

We believe, in conclusion, that more research on the behavior of these large neural networks is needed before definite conclusions are drawn; also, that more rigorous standards should be adopted in the reporting of the experiments and in the analysis
of the results, so that the scientific community could have more solid results on which to base the discussion about the role played by NNs in load forecasting.

ACKNOWLEDGMENT

The authors would like to thank Prof. R. R. Bastos (Univ. Federal de Juiz de Fora, Brazil) for his careful reading of the first version of the manuscript, and also Prof. D. Bunn (London Business School, UK) for many useful suggestions on the second version.

REFERENCES

[1] M. Adya and F. Collopy, “How effective are neural networks at forecasting and prediction? A review and evaluation,” J. Forecast., vol. 17, pp. 481–495, 1998.
[2] A. S. AlFuhaid, M. A. El-Sayed, and M. S. Mahmoud, “Cascaded artificial neural networks for short-term load forecasting,” IEEE Trans. Power Systems, vol. 12, no. 4, pp. 1524–1529, 1997.
[3] U. Anders and O. Korn, “Model selection in neural networks,” Neural Networks, vol. 12, pp. 309–323, 2000.
[4] J. S. Armstrong and F. Collopy, “Error measures for generalizing about forecasting methods: Empirical comparisons,” Int. J. Forecast., vol. 8, pp. 69–80, 1992.
[5] J. S. Armstrong and R. Fildes, “Correspondence on the selection of error measures for comparisons among forecasting methods,” J. Forecast., vol. 14, pp. 67–71, 1995.
[6] A. G. Bakirtzis, V. Petridis, S. J. Kiartzis, M. C. Alexiadis, and A. H. Maissis, “A neural network short term load forecasting model for the Greek power system,” IEEE Trans. Power Systems, vol. 11, no. 2, pp. 858–863, 1996.
[7] A. G. Bakirtzis, J. B. Theocharis, S. J. Kiartzis, and K. J. Satsios, “Short-term load forecasting using fuzzy neural networks,” IEEE Trans. Power Systems, vol. 10, no. 3, pp. 1518–1524, 1995.
[8] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford: Clarendon Press, 1997.
[9] D. W. Bunn, “Forecasting loads and prices in competitive power markets,” Proc. IEEE, vol. 88, no. 2, pp. 163–169, 2000.
[10] D. W. Bunn and E. D. Farmer, Eds., Comparative Models for Electrical Load Forecasting. John Wiley & Sons, 1985.
[11] W. Charytoniuk, M. S. Chen, and P. Van Olinda, “Nonparametric regression based short-term load forecasting,” IEEE Trans. Power Systems, vol. 13, no. 3, pp. 725–730, 1998.
[12] C. Chatfield, “Neural networks: Forecasting breakthrough or passing fad?,” Int. J. Forecast., vol. 9, pp. 1–3, 1993.
[13] C. Chatfield, “Forecasting in the 1990’s,” in Proc. V Annual Conf. of Portuguese Soc. of Statistics, Curia, 1997, pp. 57–63.
[14] S. T. Chen, D. C. Yu, and A. R. Moghaddamjo, “Weather sensitive short-term load forecasting using nonfully connected artificial neural network,” IEEE Trans. Power Systems, vol. 7, no. 3, pp. 1098–1105, 1992.
[15] M. H. Choueiki, C. A. Mount-Campbell, and S. C. Ahalt, “Building a ‘Quasi Optimal’ neural network to solve the short-term load forecasting problem,” IEEE Trans. Power Systems, vol. 12, no. 4, pp. 1432–1439, 1997.
[16] M. H. Choueiki, C. A. Mount- short-term load forecasting system using
Campbell, and S. C. Ahalt, “Implementing functional link network,” IEEE
a weighted least squares procedure in Trans. Power Systems, vol. 12, no. 2, pp.
training a neural network to solve 675–680, 1997.
the short-term load forecasting problem,” [25] T. Dillon, P. Arabshahi, and R. J.
IEEE Trans. Power Systems, Marks II, “Everyday applications of
vol. 12, no. 4, pp. 1689–1694, 1997. neural networks,” IEEE T. Neural Nets., vol.
[17] T. W. S. Chow and C. T. Leung, 8, no. 4, pp. 825–826, 1997.
“Neural network based short-term load [26] A. P. Douglas, A. M. Breipohl, F. N.
forecasting using weather compensation,” Lee, and R. Adapa, “Risk due to
IEEE Trans. Power Systems, load forecast uncertainty in short term power
vol. 11, no. 4, pp. 1736–1742, 1996. system planning,” IEEE
[18] J. T. Connor, “A robust neural network Trans. Power Systems, vol. 13, no. 4, pp.
filter for electricity demand prediction,” 1493–1499, 1998.
J. Forecast., vol. 15, no. 6, pp. 437–458, [27] , “The impact of temperature forecast
1996. uncertainty on bayesian load
[19] M. Cottrell, B. Girard, Y. Girard, M. forecasting,” IEEE Trans. Power Systems,
Mangeas, and C. Muller, “Neural vol. 13, no. 4, pp. 1507–1513,
modeling for time series: A statistical 1998.
stepwise method for weight elimination,” [28] I. Drezga and S. Rahman, “Input
IEEE T. Neural Nets., vol. 6, no. 6, pp. variable selection for ANN-based
1355–1364, 1995. short-term load forecasting,” IEEE Trans.
[20] M. Cottrell, B. Girard, and P. Rousset, Power Systems, vol. 13, no.
“Forecasting of curves using a 4, pp. 1238–1244, 1998.
Kohonen classification,” J. Forecast., vol. [29] , “Short-term load forecasting with
17, pp. 429–439, 1998. local ANN predictors,” IEEE
[21] T. Czernichow, A. Piras, K. Imhof, P. Trans. Power Systems, vol. 14, no. 3, pp.
Caire, Y. Jaccard, B. Dorizzi, and 844–850, 1999.
A. Germond, “Short term electrical load [30] R. F. Engle, C. Mustafa, and J. Rice,
forecasting with artificial neural “Modeling peak electricity demand,”
networks,” Engineering Intelligent Syst., J. Forecast., vol. 11, pp. 241–251, 1992.
vol. 2, pp. 85–99, 1996. [31] J.Y. Fan and J. D. McDonald, “A real-
[22] M. Daneshdoost, M. Lotfalian, G. time implementation of short-term
Bumroonggit, and J. P. Ngoy, “Neural load forecasting for distribution power
network with fuzzy set-based classification syst.,” IEEE Trans. Power Systems,
for short-term load forecasting,” vol. 9, no. 2, pp. 988–994, 1994.
IEEE Trans. Power Systems, vol. 13, no. 4, [32] R. Fildes, “The evaluation of
pp. 1386–1391, extrapolative forecasting methods,” Int. J.
1998. Forecast., vol. 8, pp. 81–98, 1992.
[23] G. A. Darbellay and M. Slama, [33] W. L. Gorr, “Research prospective on
“Forecasting the short-term demand for neural network forecasting,” Int.
electricity—Do neural networks stand a J. Forecast., vol. 10, pp. 1–4, 1994.
better chance?,” Int. J. Forecast., [34] M. T. Hagan and S. M. Behr, “The time
vol. 16, pp. 71–83, 2000. series approach to short term
[24] P. K. Dash, H. P. Satpathy, A. C. Liew, load forecasting,” IEEE Trans. Power
and S. Rahman, “A real-time Systems, vol. PWRS-2, no. 3, pp.
785–791, 1987. [43] S. R. Huang, “Short-term load
[35] T. Haida and S. Muto, “Regression forecasting using threshold autoregressive
based peak load forecasting using models,” IEE Proc.—Gener. Transm.
a transformation technique,” IEEE Trans. Distrib., vol. 144, no. 5, pp.
Power Systems, vol. 9, no. 4, 477–481, 1997.
pp. 1788–1794, 1994. [44] D. Husmeier and J. G. Taylor, Eds.,
[36] A. Harvey and S. J. Koopman, Neural Networks for Conditional
“Forecasting hourly electricity demand Probability Estimation: Forecasting Beyond
using time-varying splines,” J. American Point Predictions (Perspectives
Stat. Assoc., vol. 88, no. 424, in Neural Computing): Springer-Verlag,
pp. 1228–1236, 1993. 1999.
[37] S. Haykin, Neural Networks—A [45] O. Hyde and P. F. Hodnett, “An
Comprehensive Foundation, 2nd. adaptable automated procedure for
ed. Upper Saddle River, NJ: Prentice Hall, short-term electricity load forecasting,”
1999. IEEE Trans. Power Systems,
[38] T. Hill, L. Marquez, M. O’Connor, and vol. 12, no. 1, pp. 84–93, 1997.
W. Remus, “Artificial neural [46] D. G. Infield and D. C. Hill, “Optimal
network models for forecasting and decision smoothing for trend removal in
making,” Int. J. Forecast., short term electricity demand forecasting,”
vol. 10, pp. 5–15, 1994. IEEE Trans. Power Systems,
[39] K. L. Ho, Y. Y. Hsu, C. F. Chen, T. E. vol. 13, no. 3, pp. 1115–1120, 1998.
Lee, C. C. Liang, T. S. Lai, and [47] G. M. Jenkins, “Practical experiences
K. K. Chen, “Short term load forecasting of with modeling and forecast,” Time
Taiwan power system using Series., 1979.
a knowledge-based expert system,” IEEE [48] H. R. Kassaei, A. Keyhani, T.Woung,
Trans. Power Systems, vol. 5, and M. Rahman, “A hybrid fuzzy,
no. 4, pp. 1214–1221, 1990. neural network bus load modeling and
[40] K. L. Ho, Y. Y. Hsu, and C. C. Yang, predication,” IEEE Trans. Power
“Short term load forecasting using Systems, vol. 14, no. 2, pp. 718–724, 1999.
a multilayer neural network with an adaptive [49] A. Khotanzad, R. Afkhami-Rohani, T.
learning algorithm,” IEEE L. Lu, A. Abaye, M. Davis, and
Trans. Power Systems, vol. 7, no. 1, pp. D. J. Maratukulam, “ANNSTLF—A neural-
141–149, 1992. network-based electric load
[41] B. F. Hobbs, S. Jitprapaikulsarn, S. forecasting system,” IEEE T. Neural Nets.,
Konda, V. Chankong, K. A. Loparo, vol. 8, no. 4, pp. 835–846,
and D. J. Maratukulam, “Analysis of the 1997.
value for unit commitment of [50] A. Khotanzad, R. Afkhami-Rohani, and
improved load forecasting,” IEEE Trans. Power Systems, vol. 14, no. 4, pp. 1342–1348, 1999.
[42] L. Holmstrom, P. Koistinen, J. Laaksonen, and E. Oja, “Neural and statistical classifiers—Taxonomy and two case studies,” IEEE Trans. Neural Networks, vol. 8, no. 1, pp. 5–17, 1997.
D. Maratukulam, “ANNSTLF—Artificial neural network short-term load forecaster—Generation three,” IEEE Trans. Power Systems, vol. 13, no. 4, pp. 1413–1422, 1998.
[51] A. Khotanzad, R. C. Hwang, A. Abaye, and D. Maratukulam, “An adaptive modular artificial neural network hourly load forecaster and its implementation at electric utilities,” IEEE Trans. Power Systems, vol. 10, no. 3, pp. 1716–1722, 1995.
[52] S. J. Kiartzis, C. E. Zoumas, J. B. Theocharis, A. G. Bakirtzis, and V. Petridis, “Short-term load forecasting in an autonomous power system using artificial neural networks,” IEEE Trans. Power Systems, vol. 12, no. 4, pp. 1591–1596, 1997.
[53] K. H. Kim, J. K. Park, K. J. Hwang, and S. H. Kim, “Implementation of hybrid short-term load forecasting system using artificial neural networks and fuzzy expert systems,” IEEE Trans. Power Systems, vol. 10, no. 3, pp. 1534–1539, 1995.
[54] R. Lamedica, A. Prudenzi, M. Sforna, M. Caciotta, and V. O. Cencelli, “A neural network based technique for short-term forecasting of anomalous load periods,” IEEE Trans. Power Systems, vol. 11, no. 4, pp. 1749–1756, 1996.
[55] K. Y. Lee, Y. T. Cha, and J. H. Park, “Short-term load forecasting using an artificial neural network,” IEEE Trans. Power Systems, vol. 7, no. 1, pp. 124–132, 1992.
[56] K. Liu, S. Subbarayan, R. R. Shoults, M. T. Manry, C. Kwan, F. L. Lewis, and J. Naccari, “Comparison of very short-term load forecasting techniques,” IEEE Trans. Power Systems, vol. 11, no. 2, pp. 877–882, 1996.
[57] C. N. Lu, H. T. Wu, and S. Vemuri, “Neural network based short term load forecasting,” IEEE Trans. Power Systems, vol. 8, no. 1, pp. 336–342, 1993.
[58] S. Makridakis, S. C. Wheelwright, and R. J. Hyndman, Forecasting—Methods and Applications, 3rd ed. New York: John Wiley & Sons, 1998.
[59] T. M. Martinetz, S. G. Berkovich, and K. J. Schulten, “‘Neural-gas’ network for vector quantization and its application to time-series prediction,” IEEE Trans. Neural Networks, vol. 4, no. 4, pp. 558–568, 1993.
[60] G. A. N. Mbamalu and M. E. El-Hawary, “Load forecasting via suboptimal seasonal autoregressive models and iteratively reweighted least squares estimation,” IEEE Trans. Power Systems, vol. 8, no. 1, pp. 343–348, 1993.
[61] J. S. McMenamin and F. A. Monforte, “Short-term energy forecasting with neural networks,” Energy J., vol. 19, no. 4, pp. 43–61, 1998.
[62] I. Moghram and S. Rahman, “Analysis and evaluation of five short-term load forecasting techniques,” IEEE Trans. Power Systems, vol. 4, no. 4, pp. 1484–1491, 1989.
HIPPERT et al.: NEURAL NETWORKS FOR SHORT-TERM LOAD FORECASTING: A REVIEW AND EVALUATION 55
[63] O. Mohammed, D. Park, R. Merchant, T. Dinh, C. Tong, A. Azeem, J. Farah, and C. Drake, “Practical experiences with an adaptive neural network short-term load forecasting system,” IEEE Trans. Power Systems, vol. 10, no. 1, pp. 254–265, 1995.
[64] H. Mori and H. Kobayashi, “Optimal fuzzy inference for short-term load forecasting,” IEEE Trans. Power Systems, vol. 11, no. 1, pp. 390–396, 1996.
[65] S. E. Papadakis, J. B. Theocharis, S. J. Kiartzis, and A. G. Bakirtzis, “A novel approach to short-term load forecasting using fuzzy neural networks,” IEEE Trans. Power Systems, vol. 13, no. 2, pp. 480–492, 1998.
[66] A. D. Papalexopoulos, S. Hao, and T. M. Peng, “An implementation of a neural network based load forecasting model for the EMS,” IEEE Trans. Power Systems, vol. 9, no. 4, pp. 1956–1962, 1994.
[67] A. D. Papalexopoulos and T. C. Hesterberg, “A regression-based approach to short-term system load forecasting,” IEEE Trans. Power Systems, vol. 5, no. 4, pp. 1535–1547, 1990.
[68] D. C. Park, M. A. El-Sharkawi, R. J. Marks II, L. E. Atlas, and M. J. Damborg, “Electric load forecasting using an artificial neural network,” IEEE Trans. Power Systems, vol. 6, no. 2, pp. 442–449, 1991.
[69] J. H. Park, Y. M. Park, and K. Y. Lee, “Composite modeling for adaptive short-term load forecasting,” IEEE Trans. Power Systems, vol. 6, no. 2, pp. 450–457, 1991.
[70] T. M. Peng, N. F. Hubele, and G. G. Karady, “Advancement in the application of neural networks for short-term load forecasting,” IEEE Trans. Power Systems, vol. 7, no. 1, pp. 250–257, 1992.
[71] ——, “An adaptive neural network approach to one-week ahead load forecasting,” IEEE Trans. Power Systems, vol. 8, no. 3, pp. 1195–1203, 1993.
[72] A. Piras, A. Germond, B. Buchenel, K. Imhof, and Y. Jaccard, “Heterogeneous artificial neural network for short term electrical load forecasting,” IEEE Trans. Power Systems, vol. 11, no. 1, pp. 397–402, 1996.
[73] S. Rahman and O. Hazim, “A generalized knowledge-based short-term load-forecasting technique,” IEEE Trans. Power Systems, vol. 8, no. 2, pp. 508–514, 1993.
[74] R. Ramanathan, R. Engle, C. W. J. Granger, F. Vahid-Araghi, and C. Brace, “Short-run forecasts of electricity loads and peaks,” Int. J. Forecast., vol. 13, pp. 161–174, 1997.
[75] D. K. Ranaweera, G. G. Karady, and R. G. Farmer, “Effect of probabilistic inputs in neural network-based electric load forecasting,” IEEE Trans. Neural Networks, vol. 7, no. 6, pp. 1528–1532, 1996.
[76] ——, “Economic impact analysis of load forecasting,” IEEE Trans. Power Systems, vol. 12, no. 3, pp. 1388–1392, 1997.
[77] R. Reed, “Pruning algorithms—A survey,” IEEE Trans. Neural Networks, vol. 4, no. 5, pp. 740–747, 1993.
[78] A. P. N. Refenes and A. D. Zapranis, “Neural model identification, variable selection and model accuracy,” J. Forecast., vol. 18, pp. 299–332, 1999.
[79] ——, Principles of Neural Model Identification, Selection and Adequacy—With Applications to Financial Econometrics. Springer-Verlag, 1999.
[80] R. Sadownik and E. P. Barbosa, “Short-term forecasting of industrial electricity consumption in Brazil,” J. Forecast., vol. 18, pp. 215–224, 1999.
[81] S. Sargunaraj, D. P. Sen Gupta, and S. Devi, “Short-term load forecasting for demand side management,” IEE Proc.—Gener. Transm. Distrib., vol. 144, no. 1, pp. 68–74, 1997.
[82] S. A. Soliman, S. Persaud, K. El-Nagar, and M. E. El-Hawary, “Application of least absolute value parameter estimation based on linear programming to short-term load forecasting,” Elect. Power & Energy Syst., vol. 19, no. 3, pp. 209–216, 1997.
[83] D. Srinivasan, A. C. Liew, and C. S. Chang, “Forecasting daily load curves using a hybrid fuzzy-neural approach,” IEE Proc.—Gener. Transm. Distrib., vol. 141, no. 6, pp. 561–567, 1994.
[84] D. Srinivasan, S. S. Tan, C. S. Chang, and E. K. Chan, “Parallel neural network-fuzzy expert system strategy for short-term load forecasting: System implementation and performance evaluation,” IEEE Trans. Power Systems, vol. 14, no. 3, pp. 1100–1106, 1999.
[85] J. W. Taylor and S. Majithia, “Using combined forecasts with changing weights for electricity demand profiling,” J. Oper. Res. Soc., vol. 51, no. 1, pp. 72–82, 2000.
[86] J. Vermaak and E. C. Botha, “Recurrent neural networks for short-term load forecasting,” IEEE Trans. Power Systems, vol. 13, no. 1, pp. 126–132, 1998.
[87] J. P. Vila, V. Wagner, and P. Neveu, “Bayesian nonlinear model selection and neural networks: A conjugate prior approach,” IEEE Trans. Neural Networks, vol. 11, no. 2, pp. 265–278, 2000.
[88] A. S. Weigend and N. A. Gershenfeld, Eds., Time Series Prediction: Forecasting the Future and Understanding the Past. Reading, MA: Addison-Wesley, 1994.
[89] A. S. Weigend and D. A. Nix, “Prediction with confidence intervals (local error bars),” in Proc. Int. Conf. Neural Info. Processing (ICONIP’94), Seoul, Korea, 1994, pp. 847–852.
[90] A. S. Weigend and A. N. Srivastava, “Predicting conditional probability distributions: A connectionist approach,” Int. J. Neural Syst., vol. 6, pp. 109–118, 1995.
[91] H. T. Yang and C. M. Huang, “A new short-term load forecasting approach using self-organizing fuzzy ARMAX models,” IEEE Trans. Power Systems, vol. 13, no. 1, pp. 217–225, 1998.
[92] H. T. Yang, C. M. Huang, and C. L. Huang, “Identification of ARMAX model for short term load forecasting: An evolutionary programming approach,” IEEE Trans. Power Systems, vol. 11, no. 1, pp. 403–408, 1996.
[93] H. Yoo and R. L. Pimmel, “Short term load forecasting using a self-supervised adaptive neural network,” IEEE Trans. Power Systems, vol. 14, no. 2, pp. 779–784, 1999.
[94] Z. Yu, “A temperature match based optimization method for daily load prediction considering DLC effect,” IEEE Trans. Power Systems, vol. 11, no. 2, pp. 728–733, 1996.
[95] J. L. Yuan and T. L. Fine, “Neural-network design for small training sets of high dimension,” IEEE Trans. Neural Networks, vol. 9, no. 2, pp. 266–280, 1998.
[96] G. Zhang, B. E. Patuwo, and M. Y. Hu, “Forecasting with artificial neural networks: The state of the art,” Int. J. Forecast., vol. 14, pp. 35–62, 1998.

Henrique S. Hippert is an Assistant Professor at the Department of Statistics, Universidade Federal de Juiz de Fora, Brazil. He received the D.Sc. degree from the Pontificia Universidade Catolica do Rio de Janeiro, Brazil. His main research interests are forecasting and neural networks.

Carlos E. Pedreira received the Ph.D. degree in 1987 from Imperial College, University of London (EE Department). He has been an Associate Professor at the Pontificia Universidade Catolica do Rio de Janeiro since 1993 (Assistant Professor, 1987–1993). Dr. Pedreira is the Founding President of the Brazilian Neural Networks Council (1992–1994). He has published papers in the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, Intern. J. Neural Systems, Mathematical Programming, and J. Computational Intelligence in Finance, and holds patents for medical devices registered in Brazil. He has been a member of the editorial board of the J. Comp. Intell. in Finance since 1997, and of the Computational Finance Program Committee since 1994. His hobbies are art photography, gourmet cooking, and wine tasting.

Reinaldo Castro Souza received the Ph.D. degree in 1979 from Warwick University, Coventry, UK (Statistics Department), and afterwards spent a period (1986–1987) as a Visiting Fellow at the Statistics Department of the London School of Economics. His major research interests are in the field of time series analysis and forecasting. He has been an Associate Professor at the Pontificia Universidade Catolica do Rio de Janeiro since 1990. He is a member of the IIF (International Institute of Forecasters) and has been President of the Brazilian Operations Research Society since 1994. He has published papers in international journals such as J. Forecast., Intern. J. Forecast., J. Applied Meteorology, Latin-American O. R. J., and Stadistica. His hobbies are sports (tennis, volleyball, soccer), arts, and French literature.