Adaptive Neural Networks and Adaptive Neuro-Fuzzy Inference System for short-term load forecasting: a review and concept. Load forecasting has become a commodity to be sold and bought at market prices; in order to investigate the reasons for skepticism about neural-network forecasting, this review examines a collection of related papers.
Adaptive Neural Networks and Adaptive Neuro-Fuzzy Inference System for Short-Term
Load Forecasting: A Review and Concept
MR. VAIBHAV S. TELRANDHE¹, PROF. V. R. INGALE²
Department of Electronics Engineering
S.D. College of Engineering¹, B. D. College of Engineering²
Sewagram, Wardha, Maharashtra, INDIA
[email protected], [email protected]

Abstract—Load forecasting has become in recent years one of the major areas of research in electrical engineering, and most traditional forecasting models and artificial intelligence techniques have been tried out in this task. Adaptive neural networks (NNs) have lately received much attention, and a great number of papers have reported successful experiments and practical tests with them. Nevertheless, some authors remain skeptical, and believe that the advantages of using NNs in forecasting have not been systematically proved yet. In order to investigate the reasons for such skepticism, this review examines a collection of papers (published between 1991 and 1999) that report the application of NNs to short-term load forecasting. Our aim is to help clarify the issue, by critically evaluating the ways in which the NNs proposed in these papers were designed and tested.

Index Terms—Load forecasting, multilayer perceptrons, neural network applications, neural networks, overfitting.

[Footnote: Manuscript received August 24, 1999. H. S. Hippert was supported by a Ph.D. scholarship granted by the Brazilian Foundation for the Co-ordination of Higher Education and Graduate Training (PICDT-CAPES). H. S. Hippert is with the Department of Statistics, Universidade Federal de Juiz de Fora, Brazil. C. E. Pedreira and R. C. Souza are with the Department of Electrical Engineering, Pontificia Universidade Catolica do Rio de Janeiro, Brazil. Publisher Item Identifier S 0885-8950(01)02306-9.]

I. INTRODUCTION

THE FORECASTING of electricity demand has become one of the major research fields in electrical engineering. The supply industry requires forecasts with lead times that range from the short term (a few minutes, hours, or days ahead) to the long term (up to 20 years ahead). Short-term forecasts, in particular, have become increasingly important since the rise of the competitive energy markets. Many countries have recently privatized and deregulated their power systems, and electricity has been turned into a commodity to be sold and bought at market prices. Since the load forecasts play a crucial role in the composition of these prices, they have become vital for the supply industry.

Load forecasting is, however, a difficult task. First, because the load series is complex and exhibits several levels of seasonality: the load at a given hour depends not only on the load at the previous hour, but also on the load at the same hour on the previous day, and on the load at the same hour on the day with the same denomination in the previous week. Secondly, because there are many important exogenous variables that must be considered, especially weather-related variables. It is relatively easy to get forecasts with about 10% mean absolute percent error (MAPE); however, the costs of the error are so high that research that could help reduce it by a few percent points would be amply justified. An often quoted estimate in [10] suggests that an increase of 1% in the forecasting error would imply (in 1984) a £10 million increase in operating costs per year (recent studies on the economic aspects of load forecasting are [9], [26], [41], [76]).
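The MAPE figure quoted above can be made concrete. A minimal sketch in Python, with made-up load values (nothing here is taken from the reviewed papers):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percent error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Toy example: four hourly loads (MW) and their forecasts.
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [90.0, 121.0, 120.0, 117.0]
print(round(mape(actual, forecast), 2))  # 7.5
```

Note that MAPE weights errors relative to the actual load, which is why it is the usual yardstick in this literature.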
Most forecasting models and methods have already been tried out on load forecasting, with varying degrees of success. They may be classified as time series (univariate) models, in which the load is modeled as a function of its past observed values, and causal models, in which the load is modeled as a function of some exogenous factors, especially weather and social variables. Some models of the first class suggested in recent papers are multiplicative autoregressive models [60], dynamic linear [27] or nonlinear [80] models, threshold autoregressive models [43], and methods based on Kalman filtering [46], [69], [81]. Some of the second class are Box and Jenkins transfer functions [34], [47], ARMAX models [91], [92], optimization techniques [94], nonparametric regression [11], structural models [36], and curve-fitting procedures [85]. Despite this large number of alternatives, however, the most popular causal models are still the linear regression ones [30], [35], [67], [74], [82], and the models that decompose the load, usually into basic and weather-dependent components [10], [31], [45], [69]. These models are attractive because some physical interpretation may be attached to their components, allowing engineers and system operators to understand their behavior. However, they are basically linear devices, and the load series they try to explain are known to be distinctly nonlinear functions of the exogenous variables.

In recent times, much research has been carried out on the application of artificial intelligence techniques to the load forecasting problem. Expert systems have been tried out [39], [73], and compared to traditional methods [62]. Fuzzy inference [64] and fuzzy-neural models [7], [65] have also been tried out. However, the models that have received the largest share of attention are undoubtedly the artificial neural networks (NNs). The first reports on their application to the load forecasting problem were published in the late 1980's and early 1990's [21]. Since then, the number of publications has been growing steadily.

Judging from the number of papers, NN-based forecasting systems have not turned into a "passing fad," as it was feared they might [12]. It seems that they have been well accepted in practice, and that they are used by many utilities [50]. Nevertheless, the reports on the performance of NNs in forecasting have not entirely convinced the researchers in this area, and the skepticism may be partly justified. Recent reviews and textbooks on forecasting argue that there is little systematic evidence as yet that NNs might outperform standard forecasting methods [13], [58]. Reviews of NN-based forecasting systems have concluded that much work still needs to be done before they are accepted as established forecasting techniques [33], [38], [96], and that they are promising, but that "a significant portion of the NN research in forecasting and prediction lacks validity" [1]. How could this skeptical attitude adopted by some experts be reconciled with the apparent success enjoyed by the NNs in load forecasting? In order to investigate this matter we reviewed 40 papers that reported the application of NNs to short-term load forecasting. These papers were selected from those published in the leading journals in electrical engineering between 1991 and 1999 (conference proceedings were not considered).

We found that, on the whole, two major shortcomings detract from the credibility of the results. First, most of the papers proposed NN architectures that seemed to be too large for the data samples they intended to model, i.e., there seemed to be too many parameters to be estimated from comparatively too few data points. These NNs apparently overfitted their data, and one should, in principle, expect them to yield poor out-of-sample forecasts. Secondly, in most papers the models were not systematically tested, and the results of the tests were not always presented in an entirely satisfactory manner.
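The first shortcoming can be checked by simple arithmetic: a fully-connected feed-forward MLP with n_in inputs, hidden layers, and n_out outputs has (n_in + 1)·n_hid parameters in its first layer, and so on. A small sketch (the layer sizes below are illustrative, not taken from any particular paper):

```python
def mlp_param_count(n_inputs, hidden, n_outputs):
    """Number of weights and biases in a fully-connected feed-forward MLP.

    `hidden` is a list of hidden-layer sizes; each neuron has one bias,
    so a layer with n_in inputs and n_out neurons contributes
    (n_in + 1) * n_out parameters.
    """
    sizes = [n_inputs] + list(hidden) + [n_outputs]
    return sum((sizes[i] + 1) * sizes[i + 1] for i in range(len(sizes) - 1))

# A hypothetical profile forecaster: 48 load inputs (two past days),
# 30 hidden neurons, 24 outputs -- over two thousand parameters,
# against roughly 365 daily profiles per year of data.
print(mlp_param_count(48, [30], 24))  # 2214
```

The point of the sketch is the ratio: thousands of parameters estimated from a few hundred profiles is the overfitting risk the review highlights.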
This paper is organized as follows. In Section II we give a short introduction to NN modeling. In Section III we briefly compare the approaches taken by each paper to the load forecasting problem, and we outline the main features of the multilayer perceptrons they proposed. In Section IV we summarize the choices and procedures reported in each paper for data pre-processing, NN design, implementation and validation. In Section V we focus on the problems of overfitting and model validation of the proposed models. In Section VI we briefly review papers that suggest NN architectures other than the multilayer perceptron, or some ways to combine them with linear methods. Section VII is the conclusion.

II. A SHORT INTRODUCTION TO NEURAL NETWORKS

In this section we provide a short introduction to neural networks (a complete treatment of the subject may be found in [8], [37]). Artificial neural networks are mathematical tools originally inspired by the way the human brain processes information. Their basic unit is the artificial neuron, schematically represented in Fig. 1. The neuron receives (numerical) information through a number of input nodes (four, in this example), processes it internally, and puts out a response. The processing is usually done in two stages: first, the input values are linearly combined, then the result is used as the argument of a nonlinear activation function. The combination uses the weights attributed to each connection, and a constant bias term, represented in the figure by the weight of a connection with a fixed input equal to 1. The activation function must be a nondecreasing and differentiable function; the most common choices are either the identity function or bounded sigmoid (s-shaped) functions, such as the logistic one.

[Fig. 1. An artificial neuron. Fig. 2. A two-layer feed-forward neural network.]

The neurons are organized in a way that defines the network architecture. The one we shall be most concerned with in this paper is the multilayer perceptron (MLP) type, in which the neurons are organized in layers. The neurons in each layer may share the same inputs, but are not connected to each other. If the architecture is feed-forward, the outputs of one layer are used as the inputs to the following layer. The layers between the input nodes and the output layer are called the hidden layers. Fig. 2 shows an example of a network with four input nodes, two layers (one of which is hidden), and two output neurons. The parameters of this network are the weight matrix connecting the inputs to the hidden neurons, the weight matrix connecting the hidden neurons to the outputs, and the bias vector (the bias connections have not been represented in the figure). If logistic functions are used for the activation of the hidden layer, and linear functions for the output layer, this network is equivalent to a nonlinear regression model, which shows how complex and flexible even a small network can be.

The estimation of the parameters is called the "training" of the network, and is done by the minimization of a loss function (usually a quadratic function of the output error). Many optimization methods have been adapted for this task. The first training algorithm to be devised was the back-propagation one, which uses a steepest-descent technique based on the computation of the gradient of the loss function with respect to the network parameters (that is the reason why the activation functions must be differentiable). Many other training algorithms, though, are now available.

[Running header: 46 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 1, FEBRUARY 2001]

In load forecasting applications, the basic form of multilayer feed-forward architecture shown above is still the most popular. Nevertheless, there is a large number of other designs, which might be suitable for other applications.
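The two-stage processing described above (linear combination, then activation) can be sketched as a forward pass through a small MLP like the one in Fig. 2: logistic hidden neurons, linear output neurons. The weights here are random placeholders, not a trained network, and the hidden-layer size is an arbitrary choice:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """One forward pass: logistic hidden layer, linear output layer."""
    h = logistic(W1 @ x + b1)   # stage 1+2: linear combination, activation
    return W2 @ h + b2          # linear output neurons

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -0.5, 2.0])   # four input nodes, as in Fig. 2
W1 = rng.normal(size=(3, 4))          # 3 hidden neurons (illustrative size)
b1 = rng.normal(size=3)
W2 = rng.normal(size=(2, 3))          # two output neurons, as in Fig. 2
b2 = rng.normal(size=2)
y = mlp_forward(x, W1, b1, W2, b2)
print(y.shape)  # (2,)
```

Training would adjust W1, b1, W2, b2 to minimize a quadratic loss over a data sample, e.g. by back-propagation as described above.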
Artificial NNs have been developed and extensively applied since the mid-1980's. There are many reports of successful applications [25], particularly in pattern recognition and classification [42], and in nonlinear control problems, where they may have some advantages over the traditional techniques. Since quantitative forecasting is based on extracting patterns from observed past events and extrapolating them into the future, one should expect NNs to be good candidates for this task. In fact, NNs are very well suited for it, for at least two reasons. First, it has been formally demonstrated that NNs are able to approximate numerically any continuous function to the desired accuracy (see [37], [96] for references). In this sense, NNs may be seen as multivariate, nonlinear and nonparametric methods, and they should be expected to model complex nonlinear relationships much better than the traditional linear models that still form the core of the forecaster's methodology. Secondly, NNs are data-driven methods, in the sense that it is not necessary for the researcher to postulate tentative models and then estimate their parameters. Given a sample of input and output vectors, the NNs are able to automatically map the relationship between them; they "learn" this relationship, and store this learning in their parameters. As these two characteristics suggest, NNs should prove to be particularly useful when one has a large amount of data, but little a priori knowledge about the laws that govern the system that generated the data.

In terms of theoretical research in forecasting, NNs have progressed from computing point estimates to computing both confidence intervals [89] and conditional probability densities [44], [90]. In terms of practical applications in forecasting, the success of NNs seems to depend on the kind of problem under consideration. An overview of the application of NNs to forecasting may be found in [88].

III. AN OVERVIEW OF THE PROPOSED NN-BASED FORECASTING SYSTEMS

Most of the papers under review proposed multilayer perceptrons that might be classified into two groups, according to the number of output nodes. In the first group are the ones that have only one output node, used to forecast next hour's load, next day's peak load or next day's total load. In the second group are the ones that have several output nodes, used to forecast a sequence of hourly loads. Typically, they have 24 nodes, to forecast next day's 24 hourly loads (this series of hourly loads is called the "load profile").

We start with the first group. Reference [68] used three small-sized NNs to forecast hourly loads, total loads and peak loads (only one of these NNs is included in Tables I–III). Reference [40] used a NN to forecast next day's peak load, which is needed as an input to the expert system in [39] that forecasts next day's profile. Reference [14] suggested a non-fully connected network, in order to reduce the number of weights. Reference [70] proposed two NNs, one of which included a linear neuron among the sigmoidal ones in the hidden layer. Reference [23] experimented with feed-forward and recurrent NNs to forecast hourly loads, and was the only paper to report that linear models actually performed better than those NNs.

[TABLE I — INPUT CLASSIFICATION. Notes: C: the day's position in the calendar (weekday/weekend/holiday, month, or season); L: load; T: temperature; H: humidity; W: weather variables (other than T and H); f(T): nonlinear functions of T; LP: load parameters. There are as many sets of classes as classification criteria. Cells marked with "...": values were not reported in the paper.]
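The two groups of models differ mainly in how the training samples are shaped. A sketch of the two arrangements, with fabricated data (the 24-lag input window for the hourly model is an illustrative choice, not a prescription from the papers):

```python
import numpy as np

def hourly_samples(load, n_lags=24):
    """First group: inputs are the previous n_lags loads, target is one load."""
    X = np.array([load[t - n_lags:t] for t in range(n_lags, len(load))])
    y = np.array([load[t] for t in range(n_lags, len(load))])
    return X, y

def profile_samples(load_by_day):
    """Second group: input is yesterday's 24-hour profile, target is today's."""
    days = np.asarray(load_by_day)        # shape (n_days, 24)
    return days[:-1], days[1:]

load = np.arange(72, dtype=float)         # three days of fake hourly loads
X1, y1 = hourly_samples(load)
X2, y2 = profile_samples(load.reshape(3, 24))
print(X1.shape, y1.shape)   # (48, 24) (48,)
print(X2.shape, y2.shape)   # (2, 24) (2, 24)
```

The contrast in sample counts is the crux of the review's later argument: the profile arrangement consumes a whole day per training example.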
These NNs with only one output neuron were also used to forecast profiles, in either of two ways. The first way was by repeatedly forecasting one hourly load at a time, as in [28], [29]. The second way was by using a system with 24 NNs in parallel, one for each hour of the day: [61] compared the results of such a procedure to those of a set of 24 regression equations; [18] considered the load as a stationary process with level changes and outliers, filtered these out by a Kalman filter, and modeled the remaining load by a NN system; [86] considered the load as the output of a dynamic system, and modeled it by a set of 24 recurrent NNs.

Most of the profile forecasting, however, was done with the NNs of the second group (the ones with several output nodes). Reference [55] divided a day into 3 periods, and had a large NN forecast for each period. Reference [57] experimented with three NNs to model data from two utilities, and concluded that NNs are "system dependent," i.e., must be tailored for each specific utility. Reference [66] added nonlinear functions of temperature to the inputs, and also suggested a procedure to improve the forecasting on holidays. Reference [6] improved on it, especially for forecasting sequences of holidays. Reference [52] trained a NN with data from a small utility, and found it necessary to smooth their sample data by a manual pre-processing procedure. Reference [15] used a fractional factorial experiment to find the "quasioptimal" network design parameters (number of neurons and layers, activation functions, stopping criteria, etc.) and came up with a rather unusual architecture: a recurrent NN with sinusoidal activation functions in the hidden layers. In [16], the same NN was trained by a weighted least squares procedure in which the weights were the marginal energy costs.

[Running header: HIPPERT et al.: NEURAL NETWORKS FOR SHORT-TERM LOAD FORECASTING: A REVIEW AND EVALUATION, 47]

[TABLE II — NN ARCHITECTURES AND IMPLEMENTATION. Notes: the architecture column lists input/hidden/output layers; some papers reported ranges for the number of neurons, indicated by colons. L: linear, S: sigmoidal, Sin: sinusoidal. cv: cross-validation; tol: training was carried on until a specified tolerance (in-sample error) was reached; # iter: training was carried on for a fixed number of iterations. Cells marked with "...": values were not reported in the paper.]

Since the load series are often nonstationary, [17] suggested that NNs could be used to model the first differences of the series, as nonlinear extensions to the ARIMA models. Other authors dealt with the problem of nonstationarity by detrending the series [63], [83], [84] or by filtering it with a Kalman filter [18].

Some papers suggested systems in which a number of NNs worked together to compute the forecasts. Reference [2] used a small NN that pre-processed some of the data and produced estimates of peak load, valley load, and total load, which were fed, together with some other data, into a very large NN that computed next day's profile. Reference [54] suggested a system of 12 NNs, one for each month of the year. In order to improve the forecast for "anomalous" days (holidays), the daily load profiles were classified by a Kohonen self-organizing map. Reference [51] proposed a system in which the results of hourly, daily and weekly modules (38 NNs in total) were linearly combined. This system was replaced in [49] by a smaller one, composed of 24 NNs, one for each hour of the day. Later, some of these authors proposed a system with only two NNs [50]. One of them was trained to produce a first forecast of tomorrow's profile. The other one was trained to estimate tomorrow's load changes with respect to today's loads; these changes, added to today's loads, made up a second forecast of tomorrow's profile. The forecasts produced by both methods were linearly combined. It is argued that the second NN allowed the system to adapt more quickly to abrupt changes in temperature. In [63] the hourly loads were classified according to the season (seven classes), to the day of the week (three classes) and to the period of the day (five classes), and each of these classes was modeled by one of the independent NNs that made up a very large system. Reference [72] used a neural-gas network (a kind of NN used for vector quantization [59]) to classify the data into two groups, summer and winter, which were modeled by separate feed-forward NNs. The forecasts were combined by a fuzzy module.

[TABLE III — NUMBER OF PARAMETERS AND SAMPLE SIZES. Notes: some papers reported ranges for the number of neurons in their MLPs; these ranges are indicated by colons. Numbers marked with (?) represent our best guesses, since the actual numbers were not clearly reported in the papers. The ranges for the number of weights correspond to the ranges for the number of neurons. Cells marked with "...": values were not reported in the paper.]

Fuzzy logic, another artificial intelligence technique, has also been tried in combination with NNs in some of the most recent papers. Reference [83] included a "front-end fuzzy processor" that received quantitative and qualitative data, and put out four fuzzy numbers that measured the expected load change in each of the four periods into which the target day had been divided; these numbers, together with some temperature data, were fed to the NN that computed the forecasted profile. The fuzzy pre-processing reduced the number of NN inputs and allowed the system to work on qualitative data. Reference [53] placed the fuzzy engine after the NN. The NN provided a "provisional forecast" based on past loads, which was afterwards modified by the fuzzy engine on the basis of the temperature and type of day (regular or holiday). Particular attention was given to the modeling of holidays. Reference [22] classified the data into 48 fuzzy subsets according to temperature and humidity; each subset was modeled by a separate NN. Reference [72], already mentioned above, used a fuzzy module to combine results from two separate NNs. Reference [48] forecasted the demand of a residential area by decomposing it into a normal load and a weather-sensitive load. The normal load was modeled by three NNs (for week days, Saturdays and Sundays); the weather-sensitive load was modeled by a fuzzy engine, on the basis of weather data.

The only paper that dealt with very short-term forecasts was [56]. The authors compared auto-regressive, fuzzy and NN models in the minute-by-minute forecasting of the load in the next half-hour.

We discuss these papers more fully in Sections IV and V, focusing on the way the MLP models were designed and tested. We are not concerned with the fuzzy modules; fuzzy neural networks [7], [65] are also outside the scope of this paper. A few papers suggested NN architectures other than the MLP, sometimes combined with traditional linear methods; we review these briefly in Section VI.
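The two-NN combination scheme of [50] reduces to simple arithmetic on the two candidate profiles. A sketch with fabricated numbers; the combination weight w is hypothetical, since the text above does not state how the linear combination was weighted:

```python
import numpy as np

def combined_profile(direct_forecast, predicted_change, todays_load, w=0.5):
    """Linear combination of two profile forecasts, in the spirit of [50]:
    one NN forecasts tomorrow's profile directly, the other forecasts the
    change with respect to today's loads.  w = 0.5 is illustrative only."""
    change_based = np.asarray(todays_load) + np.asarray(predicted_change)
    return w * np.asarray(direct_forecast) + (1.0 - w) * change_based

today = np.full(24, 100.0)     # today's observed 24-hour profile (MW)
direct = np.full(24, 110.0)    # first NN: direct forecast of tomorrow
delta = np.full(24, 6.0)       # second NN: forecast change vs. today
print(float(combined_profile(direct, delta, today)[0]))  # 108.0
```

The change-based branch is what lets the system react quickly when the input conditions (e.g. temperature) shift abruptly between today and tomorrow.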
IV. ISSUES IN DESIGNING A NN-BASED FORECASTING SYSTEM

Neural networks are such flexible models that the task of designing a NN-based forecasting system for a particular application is far from easy. There is a large number of choices that have to be made, but very few guidelines to help the designer through them. Some recent theoretical contributions can be found in [3], [78], [79], [87], but they have not been much tested in practice as yet. The design tasks can be roughly divided under four headings:

A. Data pre-processing;
B. NN design;
C. NN implementation;
D. Validation.

This subdivision is somewhat artificial, as these stages in practice tend to overlap, but it is useful in the organization of what is to follow. In most of the papers we reviewed, the authors made their choices guided by empirical tests and simulations. In the next four sub-sections we summarize these choices; in Section V, we discuss their consequences and implications.

A. Data Pre-Processing

Before data are ready to be used as input to a NN, they may be subjected to some form of pre-processing, which usually intends to make the forecasting problem more manageable. Pre-processing may be needed to reduce the dimension of the input vector, so as to avoid the "curse of dimensionality" (the exponential growth in the complexity of the problem that results from an increase in the number of dimensions). Pre-processing may also be needed to "clean" the data, by removing outliers, missing values or any irregularities, since NNs are sensitive to such defective data. References [52], [72] devised heuristics to regularize their data; [18] filtered out the irregularities with a Kalman filter. Pre-processing also frequently means partitioning the input space so that a "local" model may be designed for each subspace. These models will be simpler and will require less data than a "global" model. In load forecasting, this is usually done by classifying the input data (past load profiles or weather data), and then using separate NNs to model the data from each class.

The most important factor to determine the shape of the load profile is the calendar date (see Table I); the week day profiles are typically very different from the weekend profiles. Thus, the basic classification is into two groups: week days and weekend days. References [17], [68] dealt only with the week day profiles, discarding the weekends and holidays. Reference [57] ignored such a distinction, but as a consequence got poor results on weekends and holidays. Sometimes the profiles of the days just before Saturdays or after Sundays may be disturbed by the weekend, so that special classes may be needed for Mondays, Fridays, or even Thursdays [2], [63], [70]. Weekend days may be further classified according to the social customs and working patterns in the country. If this process is continued, the number of classes may rise to eleven [40], though the usual number is seven (one class for each day of the week). The typical within-week profiles may change from season to season, and they should then be further classified according to month or season [63], [72]. Holidays pose a special problem. Some authors group them with the weekend days; others reserve them special classes, or devise heuristics to deal with them [6], [50], [51], [66].

The second most important factor to affect the load profile is the weather. Because of this, the days may be classified according to the weather conditions: by statistical measures of similarity [28], [29], [70], by fuzzy engines [22], and by neural-gas networks [72]. Extreme instances of this subdivision are found in [22], [63]. In [54] the profiles themselves (not the temperatures or the calendar dates) were classified by a Kohonen self-organizing map. The classes were then interpreted by the system operator, so that the class to which the target day would belong could be predicted.

These classification procedures give a class label to each of the profiles in the training sample. These labels make up a new variable that must be included in the model. This may be done in either of two ways. The first one is by coding these class labels and using the codes as input variables. That may mean adding to the NN as many input nodes as the number of classes, and having each class represented by a dummy variable [6], [14], [15], [49], [50], [52], [57], [66]; or, alternatively, numbering the classes (normally on a binary basis) and feeding these numbers to the NN through a few input nodes, so that a large number of classes may be represented by a comparatively small number of input nodes [2], [22], [54]. The second way to use this information is by building separate NN models for each class (or a common NN, with a separate set of weights for each class) [22], [28], [51], [54], [55], [57], [63], [72]. However, if the data are subdivided into too many classes, there will not be enough profiles left in each class to permit network training.

B. Designing the Neural Network

Selecting an appropriate architecture is in general the first step to take when designing a NN-based forecasting system. Many types of architecture have already been used in forecasting applications. In all the papers discussed here, however, the authors used the NN work-horse, the multilayer perceptron, usually a feed-forward one (the exceptions were the recurrent networks in [15], [16], [86]). Most of them were fully connected (the exception was [14]). Having chosen this type of architecture, one must then decide on the number of hidden layers, of input nodes, of neurons per layer, and on the type of activation functions (see Table II). The number of hidden layers is not difficult to determine. It has been shown that one hidden layer is enough to approximate any continuous function, although two layers may be useful in some circumstances (see [37] for references). The papers we reviewed used either one or two hidden layers. Defining the activation function for the hidden neurons is also not difficult. This function must be differentiable and nondecreasing; most papers used either the logistic or the hyperbolic tangent functions, and it is not clear whether the choice has any effect on the forecasting accuracy [96]. In the following sub-sections we report how the number of output neurons, input nodes, and hidden neurons were defined in the papers under review.
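The two coding schemes for class labels described above trade input nodes for compactness. A sketch with a hypothetical set of day classes (the class list and bit width are illustrative, not taken from any reviewed paper):

```python
import numpy as np

DAY_CLASSES = ["Mon", "Tue-Thu", "Fri", "Sat", "Sun", "Holiday"]  # hypothetical

def one_hot(label):
    """Dummy-variable coding: one input node per class."""
    v = np.zeros(len(DAY_CLASSES))
    v[DAY_CLASSES.index(label)] = 1.0
    return v

def binary_code(label, n_bits=3):
    """Binary coding: a few shared input nodes encode the class number."""
    i = DAY_CLASSES.index(label)
    return np.array([(i >> b) & 1 for b in reversed(range(n_bits))], float)

print(one_hot("Fri"))       # 6 nodes, exactly one active
print(binary_code("Fri"))   # 3 nodes suffice for up to 8 classes
```

With eleven classes, as in [40], dummy coding would cost eleven input nodes while four binary nodes would suffice; the price of the compact coding is that the network must learn to decode it.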
1) Selecting the Number of Output Neurons: b) Multi-Model Forecasting: This is a Six of the papers common method listed in Table II used one-output NNs to for load forecasting with regression models: produce one-stepahead using 24 different forecasts: forecasts for next day’s peak load models, one for each hour of the day. or total load, Among the papers we or forecasts for hourly loads (that is, given reviewed, [18], [61], [86] used systems of 24 the load series up to NNs in parallel. hour , forecasts for the load at hour ). Most The advantage of this method is that the of the reviewed individual networks 50 IEEE TRANSACTIONS ON POWER are relatively small, and so they are not SYSTEMS, VOL. 16, NO. 1, FEBRUARY likely to be overfitted. 2001 c) Single-Model Multivariate Forecasting: papers however were concerned with This is done forecasting profiles. They by using a multivariate method to forecast did this in one of three ways: a) iterative all the loads at once, forecasting; b) multimodel so that each profile is represented by a 24- forecasting; c) single-model multivariate dimensional vector. forecasting. This method was used by most of the a) Iterative Forecasting: This is done by researchers, who designed forecasting one MLPs with 24 neurons in the output layer. hourly load at a time and then aggregating (The exceptions were this load to the series, [2], that forecast loads at each half-hour and so that the forecasts for the later hours will so needed 48 output be based on the neurons; and [55], that divided the profile forecasts for the earlier ones. If the model is into three parts, forecasted an ARIMA, it may by separated MLPs.) This method, however, be shown that the forecasts will eventually has two serious converge to the series drawbacks. The first one is that the MLPs average. However, it is not clear what must be very happens if the model large in order to accommodate 24 output is a MLP. 
Reference [38] studied the neurons ([56], doing accuracy of multi-steps minute-by-minute forecasting for the next forecasts obtained iteratively by a MLP on half-hour using this the -competition method, tried to reduce the number of output time series, and found that this MLP nodes by compressing outperformed the statistical the 30-dimension output vector into 16 models; however, [19], [20] reported that the nodes, using a MLP outputs transformation technique). If the loads of may behave chaotically. Among the papers one or two previous we reviewed, [28], days are used as inputs, 24 or 48 input nodes [29] used only this iterative method, will be required, whereas [55], [57] experimented and the number of MLP parameters will needs to select what lagged load values very likely run into the should be used as inputs. thousands. Some authors tried to adapt the Box and The second drawback is that treating each Jenkins methodology day as a vector for fitting ARIMA models, and selected the means that one year of data will yield only lags by the analysis 365 data points, of the autocorrelation functions (ACF) and which seems to be too few for the large the partial autocorrelation MLPs required. Trying functions (PACF) [14], [57]. In doing so, to increase the sample size by aggregating however, they data from years way run the risk of discarding lagged variables back in the past may not be feasible, because that showed no significant in most places the linear correlation to the load, but which were load series show a very clear upward trend. strongly 2) Selecting the Number of Input Nodes: nonlinearly correlated to it. Reference [28] After selecting the used phase-space number of layers and the number of output embedding, a technique that represents a neurons required, system by one variable one must choose the inputs. 
There are very few theoretical considerations to help in this decision; usually, one must have some a priori knowledge about the behavior of the system under study, and of the factors that condition the output of that system. The first variable to be used is almost certainly the load itself, as the load series is strongly autocorrelated. If the MLP forecasts the profiles, treating them as 24-dimension vectors, the researcher does not have much choice: either he uses data from one past day [2], [17], [51] or from two [6], [15], [16]. No experiments with three or more past days were reported, because they would imply having 72 or more input nodes for the loads only. If however the MLP is forecasting hourly loads, the problem becomes more complex, since the researcher then needs to select what lagged load values should be used as inputs. Some authors tried to adapt the Box and Jenkins methodology for fitting ARIMA models, and selected the lags by the analysis of the autocorrelation functions (ACF) and the partial autocorrelation functions (PACF) [14], [57]. In doing so, however, they run the risk of discarding lagged variables that showed no significant linear correlation to the load, but which were strongly nonlinearly correlated to it. Reference [28] used phase-space embedding, a technique that represents a system by one variable and its lagged versions, to help determining which lagged values of the load series should be used as inputs.

As the short-term load forecasting problem has been intensively studied for decades, there are some empirical guidelines that may help in selecting among the candidate exogenous variables. The main variable to be included is the air temperature, since it has been known since the 1930's that the demand rises on cold days because of the use of electric space- and water-heating devices, and on hot days, because of air conditioning. The function that relates the temperature to the load is clearly nonlinear; that is, of course, one of the main motivations to use NNs in this context, since NNs can easily deal with nonlinear relationships. However, since this function seems to be U-shaped in many countries (see for example the graphs relating peak load to maximum temperature in [34], [35], [95]), some authors used piecewise linear-quadratic functions of the temperature as input variables [52], [66].
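The U-shaped temperature-load relation is what motivates piecewise linear-quadratic temperature inputs. A hedged sketch of such features; the comfort-band breakpoints below are invented for illustration and are not the values used in [52] or [66]:

```python
def temperature_features(temp_c, comfort_low=15.0, comfort_high=22.0):
    """Piecewise temperature inputs: heating degrees below the comfort band,
    cooling degrees above it, plus their squares (the linear-quadratic pieces).
    Breakpoints are illustrative, not taken from the reviewed papers."""
    heating = max(0.0, comfort_low - temp_c)   # cold side of the U-shape
    cooling = max(0.0, temp_c - comfort_high)  # hot side of the U-shape
    return [heating, heating ** 2, cooling, cooling ** 2]

print(temperature_features(5.0))   # cold day: heating terms active
print(temperature_features(30.0))  # hot day: cooling terms active
```

Feeding these four numbers to a linear model already captures a U-shaped load response; the reviewed NN papers used similar transformed inputs rather than the raw temperature alone.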
HIPPERT et al.: NEURAL NETWORKS FOR SHORT-TERM LOAD FORECASTING: A REVIEW AND EVALUATION 51

The idea was borrowed from earlier papers that modeled the components of linear regression models by such piecewise functions [67] or by polynomial functions [34]. Some authors experimented with other variables, such as relative humidity or wind speed, since they have a strong effect on the human sensation of thermal discomfort and may help explaining the use of heating and cooling devices [2], [17], [22], [50], [51]. Others concluded that the only significant weather variable was the temperature [6], [53], [66]. In most cases, however, the authors did not have much choice, as data on weather variables other than temperature were simply unavailable.

As the MLPs were used in these papers as nonlinear regression models, load forecasts required weather forecasts. Most authors ran their simulations using observed weather values instead of forecasted ones, which is standard practice in load forecasting; however, one should keep in mind that the forecasting errors in practice will be larger than those obtained in simulations, because of the added weather forecast uncertainty (for some studies on the effect of this uncertainty, see [27], [75]).
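One rough way to probe the weather-uncertainty effect just described is to perturb the observed temperatures with forecast-like noise and re-score the model. Everything in this sketch (the stand-in linear load model, the noise level) is hypothetical; it only illustrates why simulation errors with observed weather understate operational errors:

```python
import random

def evaluate_with_weather_noise(model, temps, loads, noise_std, n_trials=100, seed=0):
    """Re-run a load model on temperatures perturbed by zero-mean Gaussian noise
    (a crude stand-in for weather-forecast error) and return the mean MAPE."""
    rng = random.Random(seed)
    mapes = []
    for _ in range(n_trials):
        noisy = [t + rng.gauss(0.0, noise_std) for t in temps]
        preds = [model(t) for t in noisy]
        mape = 100.0 * sum(abs(p - y) / y for p, y in zip(preds, loads)) / len(loads)
        mapes.append(mape)
    return sum(mapes) / len(mapes)

# Stand-in "model": load rises linearly with temperature (illustration only).
model = lambda t: 1000.0 + 20.0 * t
temps = [10.0, 15.0, 20.0, 25.0]
loads = [model(t) for t in temps]

print(evaluate_with_weather_noise(model, temps, loads, noise_std=0.0))
print(evaluate_with_weather_noise(model, temps, loads, noise_std=2.0))
```

With perfect weather inputs the error is zero by construction; any realistic noise level produces a strictly positive MAPE, which is the gap between simulation and practice that the review warns about.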
3) Selecting the Number of Hidden Neurons: Determining the number of neurons in the hidden layer may be more difficult than determining the size of the input or the output layers. There is again little theoretical basis for the decision, and very few successful heuristics have been reported [96]. The issue may be roughly compared to that of choosing the number of harmonics to be included in a Fourier model to approximate a function; if they are too few, the model will not be flexible enough to model the data well; if they are too many, the model will overfit the data. In most papers, authors chose this number by trial and error, selecting a few alternative numbers and then running simulations to find out the one that gave the best fitting (or predictive) performance. Some of the papers reported that variations in the number of hidden neurons did not significantly affect forecasting accuracy [6], [49].

C. Neural Network Implementation

After an MLP has been designed, it must be trained (that is, its parameters must be estimated). One must select a "training algorithm" for this task. The most common in use is the backpropagation algorithm, based on a steepest-descent method that performs stochastic gradient descent on the error surface, though many alternatives to it have been proposed in recent years. Since these algorithms are iterative, some criteria must be defined to stop the iterations. In most of the papers reviewed, training was stopped after a fixed number of iterations, or after the error decreased below some specified tolerance (see Table II). These criteria are not adequate, as they ensure that the model fits closely to the training data, but do not guarantee good out-of-sample performance; they may lead to overfitting of the model (this point is further discussed in Section V). Lastly, the training samples must be appropriately selected. Since NNs are "data-driven" methods, they typically require large samples in training. References [28], [29], [70] trained their MLPs on small subsets that included data from only a few past days selected through statistical measures of similarity. That resulted in samples that were very homogeneous, but also very small.

D. Neural Network Validation

The final stage is the validation of the proposed forecasting system. It is well known that goodness-of-fit statistics are not enough to predict the actual performance of a method, so most of the researchers test their models by examining their errors in samples other than the one used for parameter estimation (out-of-sample errors, as opposed to in-sample errors). Some of the papers reviewed did not clearly specify whether the results they reported had been obtained in-sample or out-of-sample.

V. DISCUSSION

In this section we discuss the implications and consequences of the choices made in the papers under review on the issues of design, implementation and validation of NN models. We shall use some of the guidelines proposed in [1] for the evaluation of these choices.

A. Evaluating the Neural Network Design—The Problems of Overfitting and Overparameterization

The problem of overfitting is frequently mentioned in the NN literature. It seems, however, that different authors give this word different meanings. For instance, [1] remark that the usual MLPs trained by backpropagation "are known to be seriously prone to overfitting" and that this could be prevented by avoiding excessive training. On the other hand, [33], [96] remark that MLPs are prone to overfit sample data because of the large number of parameters that must be estimated. These authors mean two different things, and we should better start by defining our terms. "Overfitting" usually means estimating a model that fits the data so well that it ends by including some of the error randomness in its structure, and then produces poor forecasts. In MLPs, as the remarks above imply, this may come about for two reasons: because the model was overtrained, or because it was too complex.

One way to avoid overtraining is by using cross-validation. The sample set is split into a training set and a validation set. The NN parameters are estimated on the training set, and the performance of the model is tested, every few iterations, on the validation set. When this performance starts to deteriorate (which means the NN is overfitting the training data), the iterations are stopped, and the last set of parameters to be computed is used to produce the forecasts.
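The stop-when-validation-error-deteriorates procedure just described can be sketched as follows; the one-dimensional "parameter", its training update and the U-shaped validation error are only toy stand-ins for a real MLP and a real data split:

```python
def train_with_early_stopping(init_params, update, val_error, max_iters=1000, check_every=5):
    """Iterate training, check validation error every few iterations, and stop
    when it starts to deteriorate, returning the last good parameter set."""
    params = init_params
    best_params, best_err = params, val_error(params)
    for i in range(1, max_iters + 1):
        params = update(params)  # one training iteration on the training set
        if i % check_every == 0:
            err = val_error(params)
            if err > best_err:   # validation performance starts to deteriorate
                return best_params
            best_params, best_err = params, err
    return best_params

# Toy illustration: continued "training" keeps moving the parameter, but the
# validation error is U-shaped with its minimum at 1.0, so training past that
# point would overfit; early stopping returns a value near the minimum.
final = train_with_early_stopping(
    init_params=0.0,
    update=lambda w: w + 0.1,
    val_error=lambda w: (w - 1.0) ** 2,
)
print(round(final, 1))
```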
Another way is by using regularization techniques. This involves modifying the cost function to be minimized, by adding to it a term that penalizes for the complexity of the model. This term might, for example, penalize for the excessive curvature in the model by considering the second derivatives of the output with respect to the inputs. Relatively simple and smooth models usually forecast better than complex ones. Overfitted NNs may assume very complex forms, with pronounced curvature, since they attempt to "track down" every single data point in the training sets; their second derivatives are therefore very large and the regularization term grows with respect to the error term. Keeping the total error low, therefore, means keeping the model simple. None of the papers we reviewed, however, used regularization techniques.

Overfitting, however, may also be a consequence of overparameterization, that is, of the excessive complexity of the model. The problem is very common in MLP-based models; since they are often (and improperly) used as "black-box" devices, the users are sometimes tempted to add to them a large number of variables and neurons, without taking into account the number of parameters to be estimated. Many methods have been suggested to "prune" the NN, i.e., to reduce the number of its weights, either by shedding some of the hidden neurons, or by eliminating some of the connections [77]. However, the adequate ratio between the number of sample points required for training and the number of weights in the network has not yet been clearly defined; it is difficult to establish, theoretically, how many parameters are too many for a given sample size.

Returning now to the load forecasting problem, Table III compares the sizes of the parameter sets to the sizes of the training sets in the papers under review. (The number of parameters was not explicitly reported in any of the papers. We computed it by piecing together the information about the number of MLPs and the number of neurons per MLP in the forecasting systems.) It may be seen, by comparing columns (4) and (5), that most of the proposed MLPs, especially the ones that forecasted profiles, had more parameters than training points. This is a consequence of the way they were designed; using 24 output neurons implies that the MLPs will be large, and the samples, small. One should, in principle, expect these MLPs to fit their training data very well (in fact, to overfit them), but one should not expect them to produce good forecasts.
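The parameter-versus-sample comparison behind Table III is simple arithmetic. A sketch with illustrative layer sizes (not the exact architectures of the reviewed papers):

```python
def mlp_weight_count(layer_sizes):
    """Weights plus biases in a fully connected MLP, layer by layer."""
    return sum((n_in + 1) * n_out  # +1 for the bias of each output unit
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def overparameterized(layer_sizes, n_training_points):
    """True when the design has more parameters than training points,
    the condition the review flags for most profile forecasters."""
    return mlp_weight_count(layer_sizes) > n_training_points

# A hypothetical profile forecaster: 48 lagged-load inputs (two past daily
# profiles), 32 hidden neurons, 24 outputs, trained on one year of
# daily profiles (365 vectors).
profile_net = [48, 32, 24]
print(mlp_weight_count(profile_net), overparameterized(profile_net, 365))
```

Even these modest sizes give 2360 parameters against 365 training vectors, reproducing in miniature the imbalance the review reports in Table III.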
B. Evaluating the Neural Network Implementation

The next stage in modeling is the implementation of the NN, that is, the estimation of its parameters. The guidelines proposed in [1] for evaluating "Effectiveness of implementation" were based on the question: was the NN properly trained and tested, so that its performance was the best it could possibly achieve? According to these authors, the NN can be said to have been properly implemented if: i) it was well fitted to the data (the errors in the training sample must be reported); ii) its performances in the training sample and in the test samples were comparable; iii) its performances across different test samples were coherent. Few of the papers under review reported any in-sample results [52], [61], so effectiveness of implementation may not be examined much further.

C. Evaluating the Neural Network Validation

Guidelines for evaluating the "Effectiveness of validation" were also proposed in [1]. The evaluation is based on the question: was the performance of the proposed method fairly compared to that of some well-accepted method? A method is considered to have been properly validated if: i) its performance was compared to that of well-accepted methods; ii) the comparison was based on the performance on test samples; iii) the size of the test samples was adequate, so that some inference might be drawn.

Item i) may be interpreted in two ways. First, the proposed method may be compared to some "naïve" method, which provides a (admittedly low) benchmark. The proposed method must be noticeably better than the naïve method, otherwise there would be no point in adopting it. The naïve forecast is also useful, besides, to show the reader how difficult a forecasting problem is; as the size and load profiles of the utilities concerned vary greatly across the reviewed papers, it would have been interesting to see how difficult the problems were in each case (no paper reported this, however). Second, the performance of the proposed method may be compared to that of a good standard method. The proposed method may be not much more accurate than the standard one, but it must have some other advantage (it may be easier to use, for instance).

It is difficult to find a good standard for comparison in a problem like load forecasting. ARMAX or regression models would probably be a good choice, but it must be admitted that fitting them would require as much hard work as fitting the MLPs they must test. That is perhaps the reason why most papers did not make any kind of comparison (only [14], [15], [53], [55], [56], [61], [63], [66], [68], [83] reported comparisons to standard linear models). However, if there are no comparisons, the reports on the performance of a proposed method are difficult to interpret. We do not believe that comparisons to other NNs or to fuzzy engines are valid, as these models are not yet considered "standard" or "well accepted" methods.

We should like to add yet another item to the guidelines suggested above: that iv) the results of these comparisons should be thoroughly examined by means of the standard techniques used in forecasting, and reported as fully as possible. In most papers, the forecasting errors were not examined in detail. Most reported only the Mean Absolute Percent Errors (MAPE); a few also reported the standard deviation of the errors [2], [17], [55], [70], [86]. Although MAPE has become somewhat of a standard in the electricity supply industry, it is clearly not enough in this context. The choice of error measures to help comparing forecasting methods has been much discussed, as a consequence of the many competitions that were started in the 1980's [4], [5], [32]. Most authors agree that the loss function associated with the forecasting errors, if known, should be used in the evaluation of a method. MAPE would be an adequate error measure if the loss function were linear (and linear in percentage, not in absolute error); however, some recent studies [41], [76] and the experience of system operators indicate that the loss function in the load forecasting problem is clearly nonlinear, and that large errors may have disastrous consequences for a utility. Because of this, measures based on squared error are sometimes suggested, as they penalize large errors (Root Mean Square Error was suggested in [4], Mean Square Percentage Error in [5]).

Also, it is generally recognized that error measures should be easy to understand and closely related to the needs of the decision-makers. Some papers reported that the utilities would rather evaluate forecasting systems by the absolute errors produced [63], [66], and this suggests that Mean Absolute Errors could be useful (they were reported in [6], [63], [66]). In any case, error measures are only intended as summaries for the error distribution. This distribution is usually expected to be normal white noise in a forecasting problem, but it will probably not be so in a complex problem like load forecasting (especially if this distribution is seen as multivariate, conditioned on lead time). No single error measure could possibly be enough to summarize it. The shape of the distribution should be suggested.
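The error measures discussed above are easy to state side by side; a small sketch computing MAPE, RMSE and MAE on hypothetical actual/forecast pairs:

```python
def error_measures(actual, forecast):
    """MAPE, RMSE and MAE over paired actual/forecast values; as the review
    argues, no single one of these summaries is enough on its own."""
    errs = [f - a for a, f in zip(actual, forecast)]
    n = len(errs)
    mape = 100.0 * sum(abs(e) / a for e, a in zip(errs, actual)) / n
    rmse = (sum(e * e for e in errs) / n) ** 0.5  # penalizes large errors more
    mae = sum(abs(e) for e in errs) / n           # in load units, easy to read
    return {"MAPE": mape, "RMSE": rmse, "MAE": mae}

actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(error_measures(actual, forecast))
```

Note how the same 10-unit error weighs twice as much in MAPE at a load of 100 as at 200, while MAE treats them identically; this is exactly why the choice of measure should follow the utility's loss function.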
A few papers included graphs of the cumulative distribution of the errors [22], [63], [70], [86]; others suggested this distribution by reporting the percentage of errors above some critical values [52], [63], [66], percentiles [70], [86], or the maximum errors [2], [28], [52]. A histogram of the forecast errors was included in [63]. The possibility of serial correlation should be investigated by graphical means (scatterplots and correlograms) and by portmanteau tests [58], [61]. Most of the papers reviewed, however, largely bypassed these standard forecasting practices.

VI. SOME PROPOSED ALTERNATIVES

A few papers have proposed architectures other than the MLP, or new ways to combine MLPs with standard statistical methods. In [95], the variables were grouped into a few more or less homogeneous groups, and sorted according to an index that measured how much each variable was (nonlinearly) correlated to the load. The ones which were most correlated were fed to a network that implemented a projection pursuit regression model. Twenty-four such NNs were used to forecast a profile; each one of them had a single hidden layer, with 5 neurons that were grouped into "sub-nets."

Reference [71] dealt with the load series in the frequency domain, using signal-processing techniques. The series was decomposed into three components of different frequencies, which were forecasted by separate Adaline neurons (a kind of linear neuron, see [37]). The series of forecasting errors was also decomposed into three components: an autoregressive and a weather-dependent one, both forecast by Adalines, and random noise. Reference [24] used a functional-link network that had only one neuron. The inputs were a set of sinusoids, past forecasting errors, and temperatures. The neuron had a linear activation function, and so this network may be interpreted as a linear model that decomposed the load into a weather-independent component (modeled by a Fourier series), a weather-dependent component (modeled by polynomial functions and by "functional links"), and random noise.

Self-organizing NNs were also used. Reference [84] trained Kohonen networks to find "typical" profiles for each day of the week, and then used a fuzzy engine to compute corrections to those profiles, based on the weather variables and on the day types. Reference [93] proposed an unusual self-organizing NN model, in which the neurons were split into two clusters; one of them received past load data, the other received temperature data. Reference [20] used a Kohonen NN to classify the normalized sample profiles and to identify the "typical" profiles of each class. When forecasting for a Tuesday in October (for example), one checked in which classes Tuesdays fell in the past Octobers, and averaged the typical profiles of those classes. This average profile was then "de-normalized" in order to produce the final forecast (i.e., it was multiplied by the standard deviation and added to the mean; both parameters must have been previously forecasted by some linear method).

VII. CONCLUSION

The application of artificial NNs to forecasting problems has been much studied in recent times, and a great number of papers that report successful experiments and practical tests have been published.
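The final "de-normalization" step of [20]'s Kohonen approach, averaging the matching classes' normalized typical profiles and then scaling back by a separately forecasted mean and standard deviation, can be sketched as follows (the profiles and the forecasted moments are invented for illustration):

```python
def denormalized_forecast(typical_profiles, mean_forecast, std_forecast):
    """Average the normalized typical profiles of the matching classes, then
    multiply by the forecasted std and add the forecasted mean (as in [20])."""
    n = len(typical_profiles)
    avg = [sum(p[h] for p in typical_profiles) / n
           for h in range(len(typical_profiles[0]))]
    return [mean_forecast + std_forecast * z for z in avg]

# Two hypothetical normalized 4-point typical profiles for past Tuesdays,
# plus a mean and standard deviation assumed forecast by a linear model.
profiles = [[-1.0, 0.0, 1.0, 0.0], [-0.8, 0.2, 0.8, -0.2]]
print(denormalized_forecast(profiles, mean_forecast=500.0, std_forecast=50.0))
```

The attraction of this design is the division of labor: the Kohonen classifier only has to learn profile shapes, while the level and spread of the load are handled by a simple linear forecaster.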
However, not all the authors and researchers in forecasting have been convinced by those reports, and some believe that the advantages of using NNs have not been systematically proved yet. The aim of this review is to contribute to clarifying the reasons for this skepticism, by examining a collection of recent papers on NN-based load forecasting, and by critically evaluating the ways the systems they proposed had been designed and tested. This examination led us to highlight two facts that may be among the reasons why some authors are still skeptical:

a) Most of the proposed models, especially the ones designed to forecast profiles, seemed to have been overparameterized. Many were based on single-model multivariate forecasting, i.e., they regarded the profiles as vectors with 24 components that should be forecasted simultaneously by a single NN with 24 output neurons. This approach led to the use of very large NNs, which might have hundreds of parameters to be estimated from very small data sets. One would expect these NNs to have overfitted their training data, and one would not, in principle, expect them to produce good out-of-sample forecasts.

b) The results of the tests performed on these NNs, though apparently good, were not always very convincing. All those systems were tested on real data; nevertheless, in most cases the tests were not systematically carried out: the systems were not properly compared to standard benchmarks, and the analysis of the errors did not make use of the available graphical and statistical tools (e.g., scatterplots and correlograms).

In short, most of those papers presented seemingly misspecified models that had been incompletely tested. Taken by themselves, they are not very convincing; however, the sheer number of similar papers published in reputable journals, and the fact that some of the models they propose have been reportedly very successful in everyday use, seem to suggest that those large NN-based forecasters might work, after all, and that we still do not properly understand how overparameterization and overfitting affect them.
We believe, in conclusion, that more research on the behavior of these large neural networks is needed before definite conclusions are drawn; also, that more rigorous standards should be adopted in the reporting of the experiments and in the analysis of the results, so that the scientific community could have more solid results on which to base the discussion about the role played by NNs in load forecasting.

ACKNOWLEDGMENT

The authors would like to thank Prof. R. R. Bastos (Univ. Federal de Juiz de Fora, Brazil) for his careful reading of the first version of the manuscript, and also Prof. D. Bunn (London Business School, UK) for many useful suggestions on the second version.

REFERENCES

[1] M. Adya and F. Collopy, "How effective are neural networks at forecasting and prediction? A review and evaluation," J. Forecast., vol. 17, pp. 481–495, 1998.
[2] A. S. AlFuhaid, M. A. El-Sayed, and M. S. Mahmoud, "Cascaded artificial neural networks for short-term load forecasting," IEEE Trans. Power Systems, vol. 12, no. 4, pp. 1524–1529, 1997.
[3] U. Anders and O. Korn, "Model selection in neural networks," Neural Networks, vol. 12, pp. 309–323, 2000.
[4] J. S. Armstrong and F. Collopy, "Error measures for generalizing about forecasting methods: Empirical comparisons," Int. J. Forecast., vol. 8, pp. 69–80, 1992.
[5] J. S. Armstrong and R. Fildes, "Correspondence on the selection of error measures for comparisons among forecasting methods," J. Forecast., vol. 14, pp. 67–71, 1995.
[6] A. G. Bakirtzis, V. Petridis, S. J. Kiartzis, M. C. Alexiadis, and A. H. Maissis, "A neural network short term load forecasting model for the Greek power system," IEEE Trans. Power Systems, vol. 11, no. 2, pp. 858–863, 1996.
[7] A. G. Bakirtzis, J. B. Theocharis, S. J. Kiartzis, and K. J. Satsios, "Short-term load forecasting using fuzzy neural networks," IEEE Trans. Power Systems, vol. 10, no. 3, pp. 1518–1524, 1995.
[8] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford: Clarendon Press, 1997.
[9] D. W. Bunn, "Forecasting loads and prices in competitive power markets," Proc. IEEE, vol. 88, no. 2, pp. 163–169, 2000.
[10] D. W. Bunn and E. D. Farmer, Eds., Comparative Models for Electrical Load Forecasting. John Wiley & Sons, 1985.
[11] W. Charytoniuk, M. S. Chen, and P. Van Olinda, "Nonparametric regression based short-term load forecasting," IEEE Trans. Power Systems, vol. 13, no. 3, pp. 725–730, 1998.
[12] C. Chatfield, "Neural networks: Forecasting breakthrough or passing fad?," Int. J. Forecast., vol. 9, pp. 1–3, 1993.
[13] ——, "Forecasting in the 1990's," in Proc. V Annual Conf. of Portuguese Soc. of Statistics, Curia, 1997, pp. 57–63.
[14] S. T. Chen, D. C. Yu, and A. R. Moghaddamjo, "Weather sensitive short-term load forecasting using nonfully connected artificial neural network," IEEE Trans. Power Systems, vol. 7, no. 3, pp. 1098–1105, 1992.
[15] M. H. Choueiki, C. A. Mount-Campbell, and S. C. Ahalt, "Building a 'Quasi Optimal' neural network to solve the short-term load forecasting problem," IEEE Trans. Power Systems, vol. 12, no. 4, pp. 1432–1439, 1997.
[16] ——, "Implementing a weighted least squares procedure in training a neural network to solve the short-term load forecasting problem," IEEE Trans. Power Systems, vol. 12, no. 4, pp. 1689–1694, 1997.
[17] T. W. S. Chow and C. T. Leung, "Neural network based short-term load forecasting using weather compensation," IEEE Trans. Power Systems, vol. 11, no. 4, pp. 1736–1742, 1996.
[18] J. T. Connor, "A robust neural network filter for electricity demand prediction," J. Forecast., vol. 15, no. 6, pp. 437–458, 1996.
[19] M. Cottrell, B. Girard, Y. Girard, M. Mangeas, and C. Muller, "Neural modeling for time series: A statistical stepwise method for weight elimination," IEEE Trans. Neural Networks, vol. 6, no. 6, pp. 1355–1364, 1995.
[20] M. Cottrell, B. Girard, and P. Rousset, "Forecasting of curves using a Kohonen classification," J. Forecast., vol. 17, pp. 429–439, 1998.
[21] T. Czernichow, A. Piras, K. Imhof, P. Caire, Y. Jaccard, B. Dorizzi, and A. Germond, "Short term electrical load forecasting with artificial neural networks," Engineering Intelligent Syst., vol. 2, pp. 85–99, 1996.
[22] M. Daneshdoost, M. Lotfalian, G. Bumroonggit, and J. P. Ngoy, "Neural network with fuzzy set-based classification for short-term load forecasting," IEEE Trans. Power Systems, vol. 13, no. 4, pp. 1386–1391, 1998.
[23] G. A. Darbellay and M. Slama, "Forecasting the short-term demand for electricity—Do neural networks stand a better chance?," Int. J. Forecast., vol. 16, pp. 71–83, 2000.
[24] P. K. Dash, H. P. Satpathy, A. C. Liew, and S. Rahman, "A real-time short-term load forecasting system using functional link network," IEEE Trans. Power Systems, vol. 12, no. 2, pp. 675–680, 1997.
[25] T. Dillon, P. Arabshahi, and R. J. Marks II, "Everyday applications of neural networks," IEEE Trans. Neural Networks, vol. 8, no. 4, pp. 825–826, 1997.
[26] A. P. Douglas, A. M. Breipohl, F. N. Lee, and R. Adapa, "Risk due to load forecast uncertainty in short term power system planning," IEEE Trans. Power Systems, vol. 13, no. 4, pp. 1493–1499, 1998.
[27] ——, "The impact of temperature forecast uncertainty on Bayesian load forecasting," IEEE Trans. Power Systems, vol. 13, no. 4, pp. 1507–1513, 1998.
[28] I. Drezga and S. Rahman, "Input variable selection for ANN-based short-term load forecasting," IEEE Trans. Power Systems, vol. 13, no. 4, pp. 1238–1244, 1998.
[29] ——, "Short-term load forecasting with local ANN predictors," IEEE Trans. Power Systems, vol. 14, no. 3, pp. 844–850, 1999.
[30] R. F. Engle, C. Mustafa, and J. Rice, "Modeling peak electricity demand," J. Forecast., vol. 11, pp. 241–251, 1992.
[31] J. Y. Fan and J. D. McDonald, "A real-time implementation of short-term load forecasting for distribution power systems," IEEE Trans. Power Systems, vol. 9, no. 2, pp. 988–994, 1994.
[32] R. Fildes, "The evaluation of extrapolative forecasting methods," Int. J. Forecast., vol. 8, pp. 81–98, 1992.
[33] W. L. Gorr, "Research prospective on neural network forecasting," Int. J. Forecast., vol. 10, pp. 1–4, 1994.
[34] M. T. Hagan and S. M. Behr, "The time series approach to short term load forecasting," IEEE Trans. Power Systems, vol. PWRS-2, no. 3, pp. 785–791, 1987.
[35] T. Haida and S. Muto, "Regression based peak load forecasting using a transformation technique," IEEE Trans. Power Systems, vol. 9, no. 4, pp. 1788–1794, 1994.
[36] A. Harvey and S. J. Koopman, "Forecasting hourly electricity demand using time-varying splines," J. American Stat. Assoc., vol. 88, no. 424, pp. 1228–1236, 1993.
[37] S. Haykin, Neural Networks—A Comprehensive Foundation, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 1999.
[38] T. Hill, L. Marquez, M. O'Connor, and W. Remus, "Artificial neural network models for forecasting and decision making," Int. J. Forecast., vol. 10, pp. 5–15, 1994.
[39] K. L. Ho, Y. Y. Hsu, C. F. Chen, T. E. Lee, C. C. Liang, T. S. Lai, and K. K. Chen, "Short term load forecasting of Taiwan power system using a knowledge-based expert system," IEEE Trans. Power Systems, vol. 5, no. 4, pp. 1214–1221, 1990.
[40] K. L. Ho, Y. Y. Hsu, and C. C. Yang, "Short term load forecasting using a multilayer neural network with an adaptive learning algorithm," IEEE Trans. Power Systems, vol. 7, no. 1, pp. 141–149, 1992.
[41] B. F. Hobbs, S. Jitprapaikulsarn, S. Konda, V. Chankong, K. A. Loparo, and D. J. Maratukulam, "Analysis of the value for unit commitment of improved load forecasting," IEEE Trans. Power Systems, vol. 14, no. 4, pp. 1342–1348, 1999.
[42] L. Holmstrom, P. Koistinen, J. Laaksonen, and E. Oja, "Neural and statistical classifiers—Taxonomy and two case studies," IEEE Trans. Neural Networks, vol. 8, no. 1, pp. 5–17, 1997.
[43] S. R. Huang, "Short-term load forecasting using threshold autoregressive models," IEE Proc.—Gener. Transm. Distrib., vol. 144, no. 5, pp. 477–481, 1997.
[44] D. Husmeier and J. G. Taylor, Eds., Neural Networks for Conditional Probability Estimation: Forecasting Beyond Point Predictions (Perspectives in Neural Computing). Springer-Verlag, 1999.
[45] O. Hyde and P. F. Hodnett, "An adaptable automated procedure for short-term electricity load forecasting," IEEE Trans. Power Systems, vol. 12, no. 1, pp. 84–93, 1997.
[46] D. G. Infield and D. C. Hill, "Optimal smoothing for trend removal in short term electricity demand forecasting," IEEE Trans. Power Systems, vol. 13, no. 3, pp. 1115–1120, 1998.
[47] G. M. Jenkins, "Practical experiences with modeling and forecasting," Time Series, 1979.
[48] H. R. Kassaei, A. Keyhani, T. Woung, and M. Rahman, "A hybrid fuzzy, neural network bus load modeling and predication," IEEE Trans. Power Systems, vol. 14, no. 2, pp. 718–724, 1999.
[49] A. Khotanzad, R. Afkhami-Rohani, T. L. Lu, A. Abaye, M. Davis, and D. J. Maratukulam, "ANNSTLF—A neural-network-based electric load forecasting system," IEEE Trans. Neural Networks, vol. 8, no. 4, pp. 835–846, 1997.
[50] A. Khotanzad, R. Afkhami-Rohani, and D. Maratukulam, "ANNSTLF—Artificial neural network short-term load forecaster—Generation three," IEEE Trans. Power Systems, vol. 13, no. 4, pp. 1413–1422, 1998.
[51] A. Khotanzad, R. C. Hwang, A. Abaye, and D. Maratukulam, "An adaptive modular artificial neural network hourly load forecaster and its implementation at electric utilities," IEEE Trans. Power Systems, vol. 10, no. 3, pp. 1716–1722, 1995.
[52] S. J. Kiartzis, C. E. Zoumas, J. B. Theocharis, A. G. Bakirtzis, and V. Petridis, "Short-term load forecasting in an autonomous power system using artificial neural networks," IEEE Trans. Power Systems, vol. 12, no. 4, pp. 1591–1596, 1997.
[53] K. H. Kim, J. K. Park, K. J. Hwang, and S. H. Kim, "Implementation of hybrid short-term load forecasting system using artificial neural networks and fuzzy expert systems," IEEE Trans. Power Systems, vol. 10, no. 3, pp. 1534–1539, 1995.
[54] R. Lamedica, A. Prudenzi, M. Sforna, M. Caciotta, and V. O. Cencelli, "A neural network based technique for short-term forecasting of anomalous load periods," IEEE Trans. Power Systems, vol. 11, no. 4, pp. 1749–1756, 1996.
[55] K. Y. Lee, Y. T. Cha, and J. H. Park, "Short-term load forecasting using an artificial neural network," IEEE Trans. Power Systems, vol. 7, no. 1, pp. 124–132, 1992.
[56] K. Liu, S. Subbarayan, R. R. Shoults, M. T. Manry, C. Kwan, F. L. Lewis, and J. Naccari, "Comparison of very short-term load forecasting techniques," IEEE Trans. Power Systems, vol. 11, no. 2, pp. 877–882, 1996.
[57] C. N. Lu, H. T. Wu, and S. Vemuri, "Neural network based short term load forecasting," IEEE Trans. Power Systems, vol. 8, no. 1, pp. 336–342, 1993.
[58] S. Makridakis, S. C. Wheelwright, and R. J. Hyndman, Forecasting—Methods and Applications, 3rd ed. NY: John Wiley & Sons, 1998.
[59] T. M. Martinetz, S. G. Berkovich, and K. L. Schulten, ""Neural-gas" network for vector quantization and its application to time-series prediction," IEEE Trans. Neural Networks, vol. 4, no. 4, pp. 558–568, 1993.
[60] G. A. N. Mbamalu and M. E. El-Hawary, "Load forecasting via suboptimal seasonal autoregressive models and iteratively reweighted least squares estimation," IEEE Trans. Power Systems, vol. 8, no. 1, pp. 343–348, 1993.
[61] J. S. McMenamin and F. A. Monforte, "Short-term energy forecasting with neural networks," Energy J., vol. 19, no. 4, pp. 43–61, 1998.
[62] I. Moghram and S. Rahman, "Analysis and evaluation of five short-term load forecasting techniques," IEEE Trans. Power Systems, vol. 4, no. 4, pp. 1484–1491, 1989.
[63] O. Mohammed, D. Park, R. Merchant, T. Dinh, C. Tong, A. Azeem, J. Farah, and C. Drake, "Practical experiences with an adaptive neural network short-term load forecasting system," IEEE Trans. Power Systems, vol. 10, no. 1, pp. 254–265, 1995.
[64] H. Mori and H. Kobayashi, "Optimal fuzzy inference for short-term load forecasting," IEEE Trans. Power Systems, vol. 11, no. 1, pp. 390–396, 1996.
[65] S. E. Papadakis, J. B. Theocharis, S. J. Kiartzis, and A. G. Bakirtzis, "A novel approach to short-term load forecasting using fuzzy neural networks," IEEE Trans. Power Systems, vol. 13, no. 2, pp. 480–492, 1998.
[66] A. D. Papalexopoulos, S. Hao, and T. M. Peng, "An implementation of a neural network based load forecasting model for the EMS," IEEE Trans. Power Systems, vol. 9, no. 4, pp. 1956–1962, 1994.
[67] A. D. Papalexopoulos and T. C. Hesterberg, "A regression-based approach to short-term system load forecasting," IEEE Trans. Power Systems, vol. 5, no. 4, pp. 1535–1547, 1990.
[68] D. C. Park, M. A. El-Sharkawi, R. J. Marks II, L. E. Atlas, and M. J. Damborg, "Electric load forecasting using an artificial neural network," IEEE Trans. Power Systems, vol. 6, no. 2, pp. 442–449, 1991.
[69] J. H. Park, Y. M. Park, and K. Y. Lee, "Composite modeling for adaptive short-term load forecasting," IEEE Trans. Power Systems, vol. 6, no. 2, pp. 450–457, 1991.
[70] T. M. Peng, N. F. Hubele, and G. G. Karady, "Advancement in the application of neural networks for short-term load forecasting," IEEE Trans. Power Systems, vol. 7, no. 1, pp. 250–257, 1992.
[71] ——, "An adaptive neural network approach to one-week ahead load forecasting," IEEE Trans. Power Systems, vol. 8, no. 3, pp. 1195–1203, 1993.
[72] A. Piras, A. Germond, B. Buchenel, K. Imhof, and Y. Jaccard, "Heterogeneous artificial neural network for short term electrical load forecasting," IEEE Trans. Power Systems, vol. 11, no. 1, pp. 397–402, 1996.
[73] S. Rahman and O. Hazim, "A generalized knowledge-based short-term load-forecasting technique," IEEE Trans. Power Systems, vol. 8, no. 2, pp. 508–514, 1993.
[74] R. Ramanathan, R. Engle, C. W. J. Granger, F. Vahid-Araghi, and C. Brace, "Short-run forecasts of electricity loads and peaks," Int. J. Forecast., vol. 13, pp. 161–174, 1997.
[75] D. K. Ranaweera, G. G. Karady, and R. G. Farmer, "Effect of probabilistic inputs in neural network-based electric load forecasting," IEEE Trans. Neural Networks, vol. 7, no. 6, pp. 1528–1532, 1996.
[76] ——, "Economic impact analysis of load forecasting," IEEE Trans. Power Systems, vol. 12, no. 3, pp. 1388–1392, 1997.
[77] R. Reed, "Pruning algorithms—A survey," IEEE Trans. Neural Networks, vol. 4, no. 5, pp. 740–747, 1993.
[78] A. P. N. Refenes and A. D. Zapranis, "Neural model identification, variable selection and model adequacy," J. Forecast., vol. 18, pp. 299–332, 1999.
[79] ——, Principles of Neural Model Identification, Selection and Adequacy—With Applications to Financial Econometrics. Springer-Verlag, 1999.
[80] R. Sadownik and E. P. Barbosa, "Short-term forecasting of industrial electricity consumption in Brazil," J. Forecast., vol. 18, pp. 215–224, 1999.
[81] S. Sargunaraj, D. P. Sen Gupta, and S. Devi, "Short-term load forecasting for demand side management," IEE Proc.—Gener. Transm. Distrib., vol. 144, no. 1, pp. 68–74, 1997.
[82] S. A. Soliman, S. Persaud, K. El-Nagar, and M. E. El-Hawary, "Application of least absolute value parameter estimation based on linear programming to short-term load forecasting," Elect. Power & Energy Syst., vol. 19, no. 3, pp. 209–216, 1997.
[83] D. Srinivasan, A. C. Liew, and C. S. Chang, "Forecasting daily load curves using a hybrid fuzzy-neural approach," IEE Proc.—Gener. Transm. Distrib., vol. 141, no. 6, pp. 561–567, 1994.
[84] D. Srinivasan, S. S. Tan, C. S. Chang, and E. K.
[92] H. T. Yang, C. M. Huang, and C. L. ..., "... using self-organizing fuzzy ARMAX models," IEEE Trans. Power Systems, vol. 13, no. 1, pp. 217–225, 1998.
Chan, “Parallel neural Huang, “Identification of ARMAX network-fuzzy expert system strategy for model for short term load forecasting: An short-term load forecasting: evolutionary programming System implementation and performance approach,” IEEE Trans. Power Systems, vol. evaluation,” IEEE Trans. 11, no. 1, pp. 403–408, Power Systems, vol. 14, no. 3, pp. 1100– 1996. 1006, 1999. [93] H. Yoo and R. L. Pimmel, “Short term [85] J. W. Taylor and S. Majithia, “Using load forecasting using a selfsupervised combined forecasts with changing adaptive neural network,” IEEE Trans. weights for electricity demand profiling,” J. Power Systems, vol. Oper. Res. Soc., vol. 51, no. 14, no. 2, pp. 779–784, 1999. 1, pp. 72–82, 2000. [94] Z. Yu, “A temperature match based [86] J. Vermaak and E. C. Botha, “Recurrent optimization method for daily load neural networks for short-term prediction considering DLC effect,” IEEE load forecasting,” IEEE Trans. Power Trans. Power Systems, vol. Systems, vol. 13, no. 1, pp. 11, no. 2, pp. 728–733, 1996. 126–132, 1998. [95] J. L. Yuan and T. L. Fine, “Neural- [87] J. P. Vila, V.Wagner, and P. Neveu, network design for small training sets “Bayesian nonlinear model selection of high dimension,” IEEE T. Neural Nets., and neural networks:A conjugate prior vol. 9, no. 2, pp. 266–280, approach,” IEEE T. Neural Nets., 1998. vol. 11, no. 2, pp. 265–278, 2000. [96] G. Zhang, B. E. Patuwo, and M.Y. Hu, [88] A. S. Weigend and N. A. Gershenfeld, “Forecasting with artificial neural Eds., Time Series Prediction: networks: The state of the art,” Int. J. Forecasting the Future and Understanding Forecast., vol. 14, pp. 35–62, 1998. the Past. Reading, MA: Henrique S. Hippert is an Assistant Addison-Wesley, 1994. Professor at the Department of Statistics [89] A. S. Weigend and D. A. Nix, (Univ. Federal de Juiz de Fora, Brazil). “Prediction with confidence intervals Received the D.Sc. degree from Pontificia (local error bars),” in Proc. Int. Conf. 
Neural Universidade Catolica do Rio de Janeiro Info. Processing (Brazil). Main research interests: (ICONIP’94), Seoul, Korea, 1994, pp. 847– forecasting and neural networks. 852. Carlos E. Pedreira received the Ph.D. [90] A. S.Weigend and A. N. Srivastava, degree in 1987 from the Imperial “Predicting conditional probability College, University of London (EE distributions: A connectionist approach,” Department). He has been an Associate Int. J. Neural Syst., vol. 6, pp. Professor at the Pontificia Universidade 109–118, 1995. Catolica do Rio de Janeiro since 1993 [91] H. T. Yang and C. M. Huang, “A new (Assistant Prof. 1987–1993). Dr. Pedreira is short-term load forecasting the Founding President of the Brazilian Neural Networks Council (1992– 1994). Papers published in IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, Intern. J. Neural Systems, Mathematical Programming, J. Computation Intelligence in Finance. Patents for medical devices registered in Brazil. Member of the editorial board of the J. Comp. Intell. in Finance since 1997, and of the Computational Finance Prog. Committee since 1994. Hobbies: art photography, gourmet cooking and wine tasting. Reinaldo Castro Souza received the Ph.D. degree in 1979 from Warwick University, Coventry, UK (Statistics Department) and afterwards spent a period (1986/1987) as a Visiting Fellow at the Statistics Department of the London School of Economics. His major research interests are in the field of Time Series Analysis and Forecasting. He has been an Associate Professor at Pontificia Universidade Catolica do Rio de Janeiro since 1990. Member of the IIF (International Institute of Forecasters) and President of the Brazilian Operations Research Society since 1994. He has published papers in international journals such as: J. Forecast., Intern. J. Forecast., J. Applied Meteorology, Latin-American O. R. J., Stadistica, among others. Hobbies: sports (tennis, volley-ball, soccer), arts and French literature.