One-hour-Ahead Wind Speed Prediction Using A Bayesian Methodology
time scale of interest for the predictions, NWP-based models usually being the best option for longer-term predictions (beyond around 6 hours ahead).

A. Time-series models
The time-series approaches can be particularly useful when shorter prediction horizons are required. Such methods are usually less complex and less computationally intensive than NWP-based methods, making them a useful tool for generation dispatch applications. In particular, autoregressive processes are a class of standard statistical models which are well known and easy to implement.
A traditional way of testing the performance of a short-term prediction model is to compare its output with that of the persistence method [7]. The persistence method consists of using the wind speed over the past hour as the prediction for the next one. As simplistic as it may sound, the performance of the persistence method is remarkably good over short prediction horizons, owing to the typical time constants associated with weather systems.

III. THE BAYESIAN METHODOLOGY
The Bayesian approach to statistical analysis is not new, its roots dating back to the mid 18th century, but it has gained considerable attention over the last two decades, particularly due to the increase in computer processing power. During this time it has found many areas of application in the modelling and forecasting of time series data, such as disease mapping and biological processes, as well as in different modelling frameworks, such as autoregressive moving average [8] and state space techniques [9].
One of the main characteristics of Bayesian statistics lies in the fact that, contrary to the frequentist approach, the probabilities associated with a variable in a given process are not taken as how many times, or how often (their frequency), they are observed. Instead, the probabilities (whether extracted from actual observations, from the expected range of possible values based on previous knowledge, or simply estimated) are seen as a degree of belief attached to the variable, which in turn can take the form of a probability distribution.
Bayes' theorem, shown in (1), states that the probability of a variable A given the occurrence of another variable B is equal to the normalised likelihood of B given A, P(B|A)/P(B), times the probability of A. In other words, the conditional probability of A given B can be calculated using the conditional probability of B given A and the degree of belief in A, the prior P(A).

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (1)$$

The concepts of prior and posterior distributions are inherent to Bayesian statistics and are further developed below.
A prior probability is a marginal probability, i.e. a description of a variable based on some knowledge of what value it may assume, but not necessarily based on observations of the variable, which are not always possible. Priors can be either informative or non-informative. One could define that the average daily temperature in a given location on the Equator line during the summer belonged to a uniform distribution varying between [-100°C, 100°C], which is not particularly informative. A more reasonable estimate could be made by looking at weather information taken at similar locations and stating that it could vary uniformly between [20°C, 50°C], or even that it was normally distributed, with a mean of 35°C and a 99% confidence interval of ±15°C about the mean.
The posterior probability, on the other hand, is a conditional probability, which takes into account the information contained in the prior and the normalised likelihood, as shown above.
In many cases it is not possible to derive the posterior distributions analytically, but samples may be generated using Markov Chain Monte Carlo (MCMC) methods. The resulting posterior is then obtained not as a single number, but as a probability distribution. Empirical summary statistics can be calculated from the samples in this distribution and used to draw inferences about the true parameter values, such as the expected (mean) value and confidence intervals of interest.
When using MCMC, a 'burn-in' stage must be taken into account, during which the Markov chains for the different parameters converge towards their stationary (posterior) distribution. This is followed by a stage during which the samples from the posterior distribution are kept for analysis purposes. With more complicated models, the convergence stage may take a long time.

A. Hierarchical Models and Expert Knowledge
One important feature of the Bayesian approach is the ability to implement nested structures, also referred to as hierarchical models.
This enables further characterisation of the variables in the model and the introduction of expert knowledge at different levels. This can be exemplified by considering a fictitious model of hourly electricity demand over the winter. The overall demand may be modelled as normally distributed. Further, the time-series data available may be modelled by two additive functions, the domestic and the industrial consumption. The individual time-series models themselves may have any desired structure, e.g. an autoregressive arrangement. The domestic consumption can further be made dependent on the average daily temperature, which again can be characterised using any desired formulation and made dependent on any other variable.
In the specific case of wind speed prediction, this facility allows the inclusion of physical phenomena into the statistical model (as opposed to the statistical analysis stage used in some of the NWP-based methods), adding flexibility and strengthening the resulting model.

IV. MODEL DEVELOPMENT
A Bayesian hierarchical model was developed to model the autocorrelation structure between consecutive hourly mean
wind speed samples, thus allowing the prediction of wind speeds at future time-steps.
The input data for the model were two years of hourly mean wind speeds from a weather station at the extreme north of Great Britain (Lerwick, Shetland Islands), taken for the years 1998 and 1999 (Fig. 1). The average wind speed for the site during this period was 7.9 m/s.

[Fig. 1 plot: vertical axis Wind Speed (m/s), ticks 10 to 30.]
Fig. 1. Input wind speed data (two years of hourly averages).

• Level 1: The wind speed data at the site are initially defined as coming from a Normal distribution with an underlying mean and variance, as in (2), where $U_w$ is the wind speed data, $\mu_w$ is the underlying mean at the site and $\sigma_w^2$ is the associated variance.

$$U_w \sim N(\mu_w, \sigma_w^2) \qquad (2)$$

where $U_{w,t}$ is the value of the series at time $t$, $U_{w,t-1}$ to $U_{w,t-n}$ are the previous values of the series up to $n$ previous time steps, $n$ is the order of the model, $\beta_0$ is the overall level of the time series, $\beta_1$ to $\beta_n$ are the autoregressive coefficients of the series, and $u_t$ is a normally distributed random term with zero mean and variance $\sigma_w^2$.

A. Non-normality correction
It is a well-known characteristic of wind speed series that their variation at a given site can be modelled using the Weibull distribution [10]. As described above, this is a potential problem for the hierarchical model adopted, since it assumes at its first level that the data are normally distributed. Therefore a transformation of the original wind speed data was required, and the Box-Cox transformation was used as it presents a simple and straightforward procedure for non-normality correction [11], as detailed below.
The Box-Cox transformation $y^{(\lambda)}$ of a dataset $y$ is defined as:

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt] \ln(y), & \lambda = 0 \end{cases} \qquad (4)$$

The choice of the value for the parameter λ can be made through an analysis of the log-likelihood function:

$$f(y, \lambda) = -\frac{n}{2} \ln\left[ \sum_{i=1}^{n} \frac{\left( y_i^{(\lambda)} - \bar{y}^{(\lambda)} \right)^2}{n} \right] + (\lambda - 1) \sum_{i=1}^{n} \ln(y_i) \qquad (5)$$

where $\bar{y}^{(\lambda)}$ is the mean of the transformed data.

[Fig. 2 plot: vertical-axis ticks from -5.5 to -2.5.]

From Fig. 2, it can be seen that the value of λ which maximises the function is λ = 0.5. In order to validate the procedure, Matlab's boxcox function was also used. This function treats λ as a continuous variable and searches for the maximum of the log-likelihood through an optimisation procedure, which yielded a value of $\lambda_{MAT} = 0.4913$. The difference between the two results was not significant enough to justify the use of the more complex calculation procedure, which may not be readily available to many; therefore λ = 0.5 was used in the data conversion.

V. SIMULATION RESULTS
The model was built using wind speeds defined within a window that moved through the wind dataset, in order to generate sequential predictions. Each window contained two years' worth of data (17520 points), and for each of these snapshot windows one additional hour was predicted. This process was carried out for 48 data windows. A schematic representation of the procedure is shown in Fig. 3.
For purposes of illustration, the shapes of the probability distributions obtained for the six AR model coefficients (9,000 samples) are also shown in Fig. 4.
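The moving-window procedure just described (Box-Cox conversion with λ = 0.5, an AR(6) fit on each snapshot window, one forecast per window, and the persistence benchmark) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the AR coefficients are fitted here by ordinary least squares instead of the MCMC-based Bayesian fit, equation (3) is assumed to take the additive form $U_{w,t} = \beta_0 + \beta_1 U_{w,t-1} + \dots + \beta_n U_{w,t-n} + u_t$ on the transformed data, forecasts are mapped back to m/s by inverting (4), and all function names (`fit_ar_ls`, `moving_window_forecasts`, etc.) are hypothetical. The λ grid search evaluates the log-likelihood (5) point by point, in the spirit of reading the maximum off Fig. 2.

```python
import numpy as np


def boxcox(y, lam):
    """Box-Cox transformation of equation (4); y must be strictly positive."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam


def inv_boxcox(z, lam):
    """Inverse of (4); for lam > 0 the base is clipped at zero so a very low forecast maps to 0 m/s."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return np.maximum(lam * z + 1.0, 0.0) ** (1.0 / lam)


def boxcox_loglik(y, lam):
    """Log-likelihood f(y, lambda) of equation (5)."""
    y = np.asarray(y, dtype=float)
    z = boxcox(y, lam)
    n = y.size
    return -0.5 * n * np.log(np.mean((z - z.mean()) ** 2)) + (lam - 1.0) * np.log(y).sum()


def fit_ar_ls(z, order):
    """Ordinary least-squares AR(order) fit (assumed form of (3)); returns [beta0, ..., beta_n]."""
    # Row for time t holds [1, z[t-1], ..., z[t-order]], for t = order .. len(z)-1
    X = np.column_stack([np.ones(len(z) - order)] +
                        [z[order - k:len(z) - k] for k in range(1, order + 1)])
    beta, *_ = np.linalg.lstsq(X, z[order:], rcond=None)
    return beta


def predict_next(z, beta):
    """One-hour-ahead forecast of the next value from the last n values of z."""
    order = len(beta) - 1
    return beta[0] + beta[1:] @ z[-1:-order - 1:-1]  # lags ordered t-1, ..., t-n


def moving_window_forecasts(u, window, n_windows, order=6, lam=0.5):
    """Refit on each snapshot window u[k:k+window] and forecast hour k+window."""
    u = np.asarray(u, dtype=float)
    preds = []
    for k in range(n_windows):
        z = boxcox(u[k:k + window], lam)
        preds.append(float(inv_boxcox(predict_next(z, fit_ar_ls(z, order)), lam)))
    actual = u[window:window + n_windows]
    persistence = u[window - 1:window - 1 + n_windows]  # last observed hour of each window
    return np.array(preds), actual, persistence


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic positive, autocorrelated series standing in for hourly wind speeds (m/s)
    s = np.full(800, 2.6)
    for t in range(1, s.size):
        s[t] = 2.6 + 0.9 * (s[t - 1] - 2.6) + rng.normal(scale=0.2)
    u = s**2
    lams = np.round(np.arange(-1.0, 2.01, 0.1), 1)
    lam_hat = lams[np.argmax([boxcox_loglik(u, lam) for lam in lams])]
    preds, actual, pers = moving_window_forecasts(u, window=700, n_windows=48)
    print("lambda maximising (5):", lam_hat)
    print("MAE AR(6):", np.mean(np.abs(actual - preds)))
    print("MAE persistence:", np.mean(np.abs(actual - pers)))
```

The demo uses a synthetic series and a 700-hour window only to keep the run short; the windows described above held 17520 points each.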
[Fig. 4 panels: beta[1,1] sample: 9000; beta[1,2] sample: 9000]
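The posterior densities of Fig. 4 and the Mean, 2.5% and 97.5% columns of Table I are empirical summaries of MCMC samples, obtained after a burn-in stage as described in Section III. The sketch below illustrates that workflow with a plain Gibbs sampler for a Bayesian AR(n) model. It is a swapped-in, non-hierarchical stand-in, not the authors' OpenBUGS model: the vague priors $\beta \sim N(0, \tau^2 I)$ and $\sigma^2 \sim \text{Inverse-Gamma}(a, b)$, their default values, the iteration counts and the function names are all assumptions made for illustration.

```python
import numpy as np


def lagged_design(z, order):
    """Regressors [1, z[t-1], ..., z[t-order]] and targets z[t], for t = order .. T-1."""
    X = np.column_stack([np.ones(len(z) - order)] +
                        [z[order - k:len(z) - k] for k in range(1, order + 1)])
    return X, z[order:]


def gibbs_ar(z, order, n_iter=3000, burn_in=1000, tau2=100.0, a=0.01, b=0.01, seed=0):
    """Gibbs sampler for a Bayesian AR(order) model with conjugate vague priors.

    Returns an array of shape (n_iter - burn_in, order + 2) whose columns are
    beta0, beta1 .. beta_order and sigma^2; the burn-in draws are discarded.
    """
    rng = np.random.default_rng(seed)
    X, y = lagged_design(np.asarray(z, dtype=float), order)
    N, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    sigma2 = float(np.var(y))  # starting value for the chain
    kept = []
    for it in range(n_iter):
        # beta | sigma2, data: Gaussian, combining the N(0, tau2 I) prior with the likelihood
        V = np.linalg.inv(XtX / sigma2 + np.eye(p) / tau2)
        V = 0.5 * (V + V.T)  # enforce exact symmetry before sampling
        beta = rng.multivariate_normal(V @ Xty / sigma2, V)
        # sigma2 | beta, data: Inverse-Gamma(a + N/2, b + SSR/2), drawn as 1 / Gamma
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a + 0.5 * N, 1.0 / (b + 0.5 * resid @ resid))
        if it >= burn_in:  # keep only the post-burn-in samples
            kept.append(np.append(beta, sigma2))
    return np.array(kept)


def summarise(samples):
    """Empirical posterior mean and 95% interval (2.5% and 97.5% percentiles) per column."""
    return (samples.mean(axis=0),
            np.percentile(samples, 2.5, axis=0),
            np.percentile(samples, 97.5, axis=0))


if __name__ == "__main__":
    # Synthetic AR(2) series with hypothetical coefficients (not the Lerwick data)
    rng = np.random.default_rng(7)
    z = np.full(3000, 2.5)
    for t in range(2, z.size):
        z[t] = 0.5 + 0.6 * z[t - 1] + 0.2 * z[t - 2] + rng.normal(scale=0.5)
    mean, lo, hi = summarise(gibbs_ar(z, order=2))
    for name, m, l, h in zip(["beta0", "beta1", "beta2", "sigma2"], mean, lo, hi):
        print(f"{name}: mean {m:.4f}, 95% interval [{l:.4f}, {h:.4f}]")
```

Because both full conditionals are standard distributions here, each sweep is cheap; in a hierarchical model with extra levels, a general-purpose sampler such as the one in OpenBUGS avoids deriving these conditionals by hand.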
TABLE I
COMPARISON BETWEEN COEFFICIENTS OF AR(6) MODEL, AS CALCULATED BY MATLAB AND BAYESIAN INFERENCING

| AR model coefficients | Matlab | Bayesian inferencing: Mean | 2.5% | 97.5% |
|---|---|---|---|---|
| β1 (t-1) | 1.02 | 1.0545 | 1.0401 | 1.0694 |
| β2 (t-2) | -0.04111 | -0.0824 | -0.1038 | -0.0610 |
| β3 (t-3) | -0.0104 | 0.0125 | -0.0092 | 0.0333 |
| β4 (t-4) | -0.003534 | -0.0272 | -0.0485 | -0.0058 |
| β5 (t-5) | -0.02822 | -0.0142 | -0.0358 | 0.0071 |
| β6 (t-6) | 0.0077 | 0.0035 | -0.0113 | 0.0184 |

[Fig. 5 plot: series Time-series, Predictions and Conf. Interval; horizontal axis Time (h), 0 to 50; vertical-axis ticks 0, 5, 10.]
Fig. 5. Wind speed predictions (with 95% confidence interval) and wind speed data (1-hour-ahead predictions).
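Fig. 6 and the discussion that follows compare the AR predictions with the persistence method [7]. A minimal sketch of such a comparison is given below; the mean absolute error and the relative improvement $1 - \mathrm{MAE}_{model}/\mathrm{MAE}_{persistence}$ are assumed summary measures chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def persistence_forecast(u):
    """Persistence method [7]: the forecast for hour t is the observation at hour t-1."""
    return np.asarray(u, dtype=float)[:-1]  # forecasts for hours 1 .. T-1


def prediction_errors(actual, forecast):
    """Signed errors in m/s (actual minus forecast) and their mean absolute value."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return err, float(np.mean(np.abs(err)))


def improvement_over_persistence(actual, model_forecast, persistence):
    """Relative MAE reduction, 1 - MAE_model / MAE_persistence (positive means better)."""
    _, mae_model = prediction_errors(actual, model_forecast)
    _, mae_pers = prediction_errors(actual, persistence)
    return 1.0 - mae_model / mae_pers


if __name__ == "__main__":
    u = np.array([5.0, 6.0, 7.0, 6.0])   # hypothetical hourly speeds (m/s)
    model = np.array([5.5, 6.5, 6.5])    # hypothetical model forecasts for hours 1..3
    pers = persistence_forecast(u)
    print(prediction_errors(u[1:], pers))                      # errors [1, 1, -1], MAE 1.0
    print(improvement_over_persistence(u[1:], model, pers))    # 0.5
```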
[Fig. 6 plot: horizontal axis Time (h), 0 to 50; vertical-axis ticks from -4 to 2.]
Fig. 6. Absolute prediction errors obtained for the developed AR model and the persistence method.

C. Discussion
Despite the marginal improvement in performance over the persistence method, the results obtained are encouraging with respect to the application of Bayesian inferencing to wind speed prediction, particularly considering the simplicity of the statistical model (AR) employed and the fact that only wind speed data were used in the modelling.
The use of more complex models, such as autoregressive moving average (ARMA) models, should improve the predictions considerably, especially considering the ability of such models to characterise seasonality and other level variation effects in the data.
Further exploitation of the hierarchical structure capability of the Bayesian methodology should also lead to improved model performance, especially if other physical variables known to influence the wind speed, such as atmospheric pressure, are included.
Finally, the use of simultaneous data from multiple sites (described in the model as a multivariate normal distribution), together with the inclusion of other physical variables, should yield a more robust model, despite its higher complexity. In such an approach, the gradients between the physical variables could also be used, in addition to single-point readings, in an attempt to improve the performance of the predictions.

VI. CONCLUSIONS
The impact of wind power on power systems has received great attention over recent years, and reliable prediction models have become essential tools to assist in the system […] interest.
Lower-order autoregressive (AR) models can be rather limited for this particular application, as they may fail to capture the periodic variations of the wind speed. As further work, the authors are now pursuing the use of a more complex structure, such as an autoregressive moving average (ARMA) model, in conjunction with the incorporation of additional variables, such as the atmospheric pressure, which could greatly improve the performance of the model.

VII. ACKNOWLEDGMENT
The authors gratefully acknowledge the contribution of Gavin Shaddick for his initial help with the OpenBUGS software and useful discussions on some of the model implementation aspects.
The authors also thank the UK Met Office and the British Atmospheric Data Centre for supplying the wind speed data used in this research.

VIII. REFERENCES
[1] OpenBUGS project website, http://mathstat.helsinki.fi/openbugs, November, 2005.
[2] G. Kariniotakis, P. Pinson, N. Siebert, G. Giebel, and R. Barthelmie, "The State of the Art in Short-term Prediction of Wind Power - From an Offshore Perspective," in Proc. 2004 Symposium ADEME – IFREMER (Renewable energies at sea). [Online]. Available: http://anemos.cma.fr/download/publications/pub_2004_paper_SeaTechWeek04_SOTA.pdf, November, 2005.
[3] L. Landberg, "Short-term prediction of the power production from wind farms," Journal of Wind Engineering and Industrial Aerodynamics, vol. 80, pp. 207-220, 1999.
[4] U. Focken, M. Lange, D. Heinemann, and H. P. Waldl, "Previento - Regional Wind Power Prediction with Risk Control," in Proc. 2002 Global Windpower Conference.
[5] I. G. Damousis, M. C. Alexiadis, J. B. Theocharis, and P. S. Dokopoulos, "A Fuzzy Model for Wind Speed Prediction and Power Generation in Wind Parks Using Spatial Correlation," IEEE Trans. Energy Conversion, vol. 19, pp. 352-361, June 2004.
[6] T. G. Barbounis, J. B. Theocharis, M. C. Alexiadis, and P. S. Dokopoulos, "Long-Term Wind Speed and Power Forecasting Using Local Recurrent Neural Network Models," IEEE Trans. Energy Conversion, to be published.
[7] M. Milligan, M. Schwartz, and Y.-H. Wan, "Statistical Wind Power Forecasting Models: Results for U.S. Wind Farms," in Proc. 2003 Windpower Conference.
[8] P. Congdon, Bayesian Statistical Modelling. Chichester: Wiley, 2001, p. 556.
[9] M. West and P. J. Harrison, Bayesian Forecasting and Dynamic Models, 2nd ed. New York: Springer-Verlag, 1997, p. 680.
IX. BIOGRAPHIES