Applying Bayesian Inference in A Hybrid CNN-LSTM Model For Time Series Prediction.
Abstract—Convolutional neural networks (CNN) and Long short-term memory (LSTM) provide state-of-the-art performance in various tasks. However, these models tend to overfit on small datasets and cannot measure uncertainty, both of which harm their generalization ability. In addition, the prediction task can face many challenges because of complex long-term fluctuations, especially in time series datasets. Recently, Bayesian inference has been applied in deep learning to estimate the uncertainty of model predictions. This approach can be highly robust to overfitting and allows uncertainty to be estimated. In this paper, we propose a novel approach using Bayesian inference in a hybrid CNN-LSTM model, called CNN-Bayes LSTM, for time series prediction. The experiments have been conducted on two real time series datasets, namely the sunspot and weather datasets. The experimental results show that the proposed CNN-Bayes LSTM model is more effective than other forecasting models in terms of Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) as well as for uncertainty quantification.
Index Terms—Bayesian inference; time series dataset; uncertainty quantification

I. INTRODUCTION
Time series prediction is a field of research with increasing interest that is broadly used in various applications such as economics, bio-medicine, engineering, astronomy, weather forecasting, and air traffic management. The purpose of time series prediction is to predict the future state of a dynamic system from the observation of previous states [1]. However, in a significant number of prediction problems, we have to face uncertainty, non-linearity, chaotic behavior, and non-stationarity, which deteriorate the prediction accuracy of the model.
In order to deal with these issues, many approaches have been proposed. They can generally be categorized into two types: the statistical approach and the deep learning approach. Statistical approaches such as SARIMA [2] and Prophet [3] can predict time series precisely by exploiting the relationship between the original data and the predicted states, while deep learning approaches such as LSTM and Transformer can model data with rich temporal patterns and learn high-level representations of features and the associated nonlinear functions without relying on experts to select which manually crafted features to employ [1], [4].
Besides evaluating the prediction performance, the quantification of uncertainty is considered one of the most important aspects of the decision-making process [5]. In order to quantify the model's uncertainty, many researchers use Bayesian inference to estimate the uncertainty of the prediction model from probability distributions. As a result, the model can be highly robust to overfitting and can learn easily from small datasets. In the Bayesian framework, the posterior distribution provides all information about the unknown parameters. Bayesian inference with different techniques, such as Markov Chain Monte Carlo, Laplace approximation, expectation propagation, and variational inference, has been used to quantify the uncertainty in time series prediction, for example on sunspot datasets [6], [7] and weather datasets [8], [9].
In this study, we propose to use Bayesian inference in a hybrid model that combines CNN and LSTM. We test on two real datasets, namely the sunspot and weather datasets. In addition, we compare the proposed model with statistical models and deep learning models, and we also quantify its uncertainty. The main contributions of this paper are summarised as follows:
• We apply Bayesian inference to update the weights of the hyper-parameters in a hybrid prediction method that combines CNN and LSTM. We use a 1D convolutional layer of the CNN to extract the spatial features and the LSTM to extract the temporal features of the sunspot and weather datasets.
• We compare the prediction performance of the proposed model with statistical models (SARIMA and Prophet) and deep learning models (LSTM, GRU, Transformer, and Informer).
• Finally, we illustrate how to calculate the model's uncertainty on time series datasets.
The rest of our paper is structured as follows: Section II provides a brief review of relevant works on time series prediction. Section III introduces our proposed model, while Section IV describes the experimental results of two studies on sunspot and weather prediction. The conclusions and future works are summarized in Section V.

II. RELATED WORKS
To improve prediction performance on time series datasets, many researchers have introduced statistical and deep learning approaches that address uncertain, complex time series.
Statistical approaches can predict time series precisely by mapping the relationship between the original data and the predicted data. These models include the ARIMA family of methods such as AR, ARMA, ARIMA, Random Walk, and SARIMA [2], as well as Prophet [3]. SARIMA describes the current value in a time series based on previously observed data by adding three new hyper-parameters that determine the autoregressive, moving average, and differencing terms, as well as an additional parameter for the seasonal interval. Prophet is a more recent time series prediction method. Although this approach has some similarities to SARIMA, it models the trend and seasonality of the time series with more configurable flexibility. In the Prophet approach, the trend, seasonality, and holidays are the three main features, and the holiday component is used to adjust the predictions.
Deep learning has proven to be extremely effective in computer vision, computer gaming, multimedia, and big data-related challenges. Deep learning approaches are also widely used to model time series data. Because of their capacity to capture temporal information, RNNs have proven useful in forecasting time series [10]. Many researchers have used deep learning approaches such as RNN, LSTM, GRU [11], [12], Transformer [13], or CNN models to forecast temporal information in time series datasets. Mirikitani and Nikolaev [14] proposed a recursive Levenberg-Marquardt Bayesian approach in RNNs to forecast electricity spot prices as well as to compute the uncertainty of the model. Other researchers used CNN to predict wind power [15], LSTM to predict wind speed [16], weather [8], [9], and sunspots [10], [17], [18], or combined CNN and LSTM [19], [20] or RNN and LSTM [21] to forecast the output in time series datasets. In 2021, Zhou et al. proposed a novel approach called Informer to deal with the heavy memory cost of long input sequences [22]. This approach is an improvement of the Transformer approach [13]. The main idea of Informer is to use a ProbSparse technique that selects only the most crucial queries by using the Kullback-Leibler divergence, which decreases the time complexity and memory usage.

III. PROPOSED METHOD
A. Long Short-Term Memory
The LSTM network is an advanced version of the RNN proposed by Hochreiter and Schmidhuber in 1997 [23]. It is used very effectively because of its capability to learn both short-term and long-term dependencies. The default behavior of an LSTM is to remember information for a long time. RNNs take the form of a repeating sequence of NN modules. In a plain RNN, these modules have a very simple structure, just a tanh layer. However, a plain RNN cannot handle long-term dependencies, and the LSTM is designed to avoid this problem. LSTMs also have a chain structure. Instead of a single NN layer, the LSTM module has four layers which interact with each other (see Figure 1). The main idea of the LSTM is the cell state, depicted by the horizontal line (red line) at the top, from C_{t-1} to C_t. The cell state is like a conveyor belt running straight through the whole chain with only a few small linear interactions, so it is relatively easy for information to remain unaltered.
LSTMs have the ability to remove or add information to the cell state, which is carefully regulated by structures called gates. A gate is a way to optionally let information through. Gates are composed of a sigmoid NN layer and a point-wise multiplication operator. The outputs of the sigmoid layer are values in [0, 1], which describe how much of each component is let through. The values 0 and 1 mean "let nothing through" and "let everything through", respectively. An LSTM has three sigmoid gates to protect and control the cell state: the forget, the input, and the output gates.
This gating allows the long-term memory to be reset and helps to overcome the vanishing and exploding gradient problems.

B. Bayesian inference in a CNN-LSTM model
The proposed model, named CNN-Bayes LSTM and illustrated in Fig. 2, has two main parts: a CNN (to extract the spatial features) and a Bayes LSTM (to extract the long-term temporal features). After the data preparation, high-level spatial features can be extracted by using a CNN layer.
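As a rough illustration of the CNN part described above, the following minimal sketch applies a bank of 1D convolution kernels with a ReLU activation to a univariate series, so that each window position yields one feature vector that a recurrent layer could then consume. The function name `conv1d_relu`, the kernel values, and the choice of ReLU are illustrative assumptions, not the configuration used in the paper.

```python
def conv1d_relu(series, kernels, biases):
    """Valid 1D convolution (cross-correlation) followed by ReLU.

    series  : list of floats, the input time series
    kernels : list of equal-length lists, one per output channel
    biases  : list of floats, one per output channel
    Returns one feature vector (one value per kernel) per window position.
    """
    width = len(kernels[0])
    features = []
    for start in range(len(series) - width + 1):
        window = series[start:start + width]
        features.append([
            max(0.0, sum(w * x for w, x in zip(kernel, window)) + b)
            for kernel, b in zip(kernels, biases)
        ])
    return features


# Hypothetical kernels: a first difference and a two-point moving average.
toy_series = [1.0, 2.0, 4.0, 7.0, 11.0]
toy_kernels = [[-1.0, 1.0], [0.5, 0.5]]
toy_features = conv1d_relu(toy_series, toy_kernels, [0.0, 0.0])
print(toy_features)
```

In a trained model the kernel weights are learned rather than fixed by hand; the hand-picked kernels here only make the extracted local patterns easy to read.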
Fig. 1: The LSTM architecture [23].
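The gate mechanism of Section III-A can be made concrete with a minimal sketch of a single LSTM time step in the commonly used formulation with forget, input, and output gates. The helper names (`lstm_step`, `init_params`), the weight layout, and the random initialization are illustrative assumptions; this is not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Return W @ v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; x_t, h_prev and c_prev are lists of floats."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["Wf"], z, params["bf"])]  # forget gate
    i = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]  # input gate
    o = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]  # output gate
    g = [math.tanh(a) for a in affine(params["Wc"], z, params["bc"])]  # candidate
    # Cell state: keep part of the old state, add part of the candidate.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Hidden state: gated, squashed view of the cell state.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases (an arbitrary choice)."""
    rng = random.Random(seed)
    cols = hidden_size + input_size
    params = {}
    for name in ("f", "i", "o", "c"):
        params["W" + name] = [
            [rng.uniform(-0.5, 0.5) for _ in range(cols)]
            for _ in range(hidden_size)
        ]
        params["b" + name] = [0.0] * hidden_size
    return params


# Run a short toy sequence through the cell.
params = init_params(input_size=1, hidden_size=3)
h, c = [0.0] * 3, [0.0] * 3
for value in [0.1, 0.5, -0.2]:
    h, c = lstm_step([value], h, c, params)
print(h)
```

When the forget gate saturates near 1 and the input gate near 0, the cell state passes through the step almost unchanged, which is the "let everything through" behavior described in the text.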
to maximize the target output. If the result is not good, we go back and reconsider the hyper-parameters as well as the architecture of the LSTM model. Otherwise, we use these weights to predict the future and evaluate the model. In our proposed model, Bayesian optimization is used to identify the vital LSTM hyper-parameter values.

C. Uncertainty quantification
Before evaluating forecasting uncertainty, it is necessary to identify the two types of uncertainty (aleatoric and epistemic) and the appropriate way to decrease them. The first type of uncertainty is epistemic. It refers to the model's uncertainty caused by a lack of knowledge about regions of the input space where data are scarce (data sparsity, bias, etc.) [24]. It can be reduced by gathering enough data. A confidence interval for the model can be obtained by estimating its epistemic uncertainty. The second type of uncertainty is aleatoric. It is essentially the noise inherent in the observations, for example sensor noise or motion noise, which may be input-dependent or uniform across the dataset. It cannot be decreased even when more data is collected. The prediction interval can be calculated by estimating both the aleatoric and the epistemic uncertainty [25], [26]. Because the prediction interval accounts for both sources, the confidence interval may be narrower than the prediction interval.

IV. EXPERIMENTAL RESULTS
A. Dataset
To evaluate the performance of our proposed model, we test on two real time series datasets, namely the sunspot and weather datasets.
1) Sunspot dataset: The sunspot dataset was collected from January 1749 to February 2022 by researchers at the Royal Observatory of Belgium. This data is available at the World Data Center SILSO website [27]. The dataset used in our research includes 3278 samples of the averaged total sunspot number per month, with the dates and the monthly mean number of sunspots. It is divided into two sets of 2294 and 984 samples for training and testing, respectively. For forecasting, the data can be organized by either a fixed time period or a solar cycle. Solar cycles and ordinary years are not distinguished in the dataset. As a result, the dataset only uses the averaged number of sunspots seen in each month.
2) Weather dataset: The weather dataset used in our research includes 1380 samples of the monthly mean temperature in Bangladesh from January 1901 to December 2015. This data is available at the Kaggle website [28]. It is divided into two sets of 965 and 415 samples for training and testing, respectively.
Figure 4 illustrates the trend of the monthly mean sunspot number and the mean temperature from 1901 to 1905. Figure 5 illustrates the monthly mean total sunspot number and the temperature over the two entire datasets.

B. Evaluation
To evaluate the prediction performance of the model, we use two evaluation metrics for the forecasting task, namely RMSE and MAE. RMSE measures the magnitude of the errors in the prediction and is calculated as the quadratic mean of the difference between the predicted value and the observed value, called the prediction error. MAE measures a model's performance on a test set as the average of the absolute values of the individual prediction errors across all instants in the test set. RMSE and MAE are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}; \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i| \qquad (3)$$
where \hat{y}_i and y_i are the predicted and observed values at time step i, and n is the length of the sample data.

C. Empirical Results
Table I shows the results obtained by the proposed method. Furthermore, to show the robustness of the proposed model, we compare it with other models, namely SARIMA [2], Prophet [3], Transformer [13], Informer [22], LSTM [23], and GRU [11]. The results show that our proposed model outperforms the others on the sunspot dataset, with an RMSE of 26.10 and an MAE of 18.74. On the weather dataset, the proposed method obtains 2.23 for RMSE and 1.64 for MAE. There is a clear gap in RMSE between the statistical models and the deep learning models. On the sunspot dataset, all baseline deep learning models have RMSE values under 50 and MAE values between 22 and 40 (Informer, in particular, obtained 29.90 in RMSE and 22.35 in MAE), whereas the statistical models SARIMA and Prophet have RMSE values above 50 and MAE values above 45.
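For reference, the metrics of Eq. (3) and a simple empirical interval of the kind discussed in Section III-C can be sketched in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, the interval assumes that several stochastic predictions of the same series are available, and it assumes their spread is roughly Gaussian so that the mean plus or minus 1.96 standard deviations approximates a 95% interval. The paper does not state how its intervals are computed.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(
        sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def empirical_interval(sampled_predictions, z=1.96):
    """Per-step mean and interval from repeated stochastic predictions.

    sampled_predictions[s][i] is the prediction for time step i in run s.
    Assumption (not from the paper): a Gaussian spread, so that
    mean +/- z * std approximates a 95% interval when z = 1.96.
    """
    means, lowers, uppers = [], [], []
    for step_values in zip(*sampled_predictions):
        mean = statistics.fmean(step_values)
        std = statistics.stdev(step_values)  # sample std, needs >= 2 runs
        means.append(mean)
        lowers.append(mean - z * std)
        uppers.append(mean + z * std)
    return means, lowers, uppers


# Toy numbers, purely illustrative (not data from the paper).
observed = [3.0, 5.0, 2.0, 7.0]
runs = [
    [2.0, 5.0, 4.0, 7.0],
    [2.4, 4.8, 3.6, 7.2],
    [1.6, 5.2, 4.4, 6.8],
]
mean_pred, lower, upper = empirical_interval(runs)
print("RMSE:", rmse(observed, mean_pred))
print("MAE:", mae(observed, mean_pred))
```

Scoring the mean of the runs keeps the point metrics and the interval tied to the same set of predictions; whether the spread reflects epistemic uncertainty depends on how the runs are generated, for example by sampling the weights.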
Fig. 4: The trend of the monthly mean sunspot number (a) and the mean temperature (b) from 1901 to 1905 in the two datasets.
Fig. 5: The trend of the monthly mean sunspot number (a) and the mean temperature (b) over the two entire datasets. (a) The monthly sunspot number from January 1749 to February 2022. (b) The monthly mean temperature from January 1901 to December 2015.
TABLE I: Comparison of the proposed method and the state-of-the-art methods on two datasets. The two best results in each column are marked with *.

Forecasting models    Sunspots dataset      Weather dataset
                      RMSE      MAE         RMSE     MAE
SARIMA [2]            54.11     45.51       -        -
Prophet [3]           60.15     56.09       -        -
Transformer [13]      33.99     25.26       2.10*    1.43*
Informer [22]         29.90*    22.35*      2.32     1.82
LSTM [23]             46.14     39.44       2.32     1.75
GRU [11]              37.14     26.77       4.44     3.43
Proposed model        26.10*    18.74*      2.23*    1.64*

SARIMA and Prophet obtained 54.11 and 60.15 in RMSE and 45.51 and 56.09 in MAE, respectively. Interestingly, on the sunspot dataset, the proposed method performs markedly better than the well-known Informer model (26.10 versus 29.90 in RMSE and 18.74 versus 22.35 in MAE). On the weather dataset, the RMSE and MAE values of the proposed model are slightly higher than those of Transformer (2.23 versus 2.10 in RMSE and 1.64 versus 1.43 in MAE), but our model still outperforms the other models, namely Informer, LSTM, and GRU. In addition, the proposed model can calculate the epistemic uncertainty. There are some differences between GRU and the proposed model when using Bayesian inference. Figure 6 presents the models' epistemic variance estimation on the two datasets, where the differences between the models are more visible. We compare the GRU and the proposed model in three aspects: the real data, the predicted data, and the epistemic uncertainty, corresponding to the red line, the green line, and the light blue line, respectively. The 95% confidence intervals for the sunspot number and the temperature of the two models, obtained from numerous predictions, are illustrated in this figure. Figure 6 shows that our proposed model captures the variation of the predicted normalized value over the two entire datasets, whereas GRU sometimes fails to capture it in both datasets (the red circles).

V. CONCLUSIONS AND FUTURE WORKS
In this paper, we have proposed a novel approach using Bayesian inference in a hybrid CNN-LSTM model, called CNN-Bayes LSTM, for time series prediction. We evaluated the prediction performance and uncertainty quantification of our proposed model and compared it with six models from the literature, namely SARIMA, Prophet, Transformer, Informer, LSTM, and GRU, on time series forecasting. Experimental results have shown that the proposed CNN-Bayes LSTM achieves better performance than existing methods in terms of RMSE and MAE values as well as the uncertainty quantification of the model. However, we only used 1D CNN
[Figure panel (a): sunspot data]
and one factor, such as the sunspot number or the temperature. It would be interesting to test the model on many factors in high-dimensional datasets (3D or 4D).

REFERENCES
[1] Li, F. Zhang, L. Gao, Y. Liu, and X. Ren, "A novel model for chaotic complex time series with large of data forecasting," Knowledge-Based Systems, vol. 222, 2021.
[2] Box and Jenkins, "Time series analysis: Forecasting and control," Holden-Day Series in Time Series Analysis, pp. 161–215, 1976.
[3] S. Taylor and B. Letham, "Forecasting at scale," The American Statistician, vol. 72, no. 1, pp. 37–45, 2018.
[4] Y. Dang, Z. Chen, H. Li, and H. Shu, "A comparative study of non-deep learning, deep learning, and ensemble learning methods for sunspot number prediction," arXiv, 2022.
[5] M. Abdar, F. Pourpanah, S. Hussain, R. D., et al., "A review of uncertainty quantification in deep learning: Techniques, applications and challenges," CoRR, 2020.
[6] M. Sophie, R. Sachs, C. Ritter, V. Delouille, and L. Laure, "Uncertainty quantification in sunspot counts," The Astrophysical Journal, vol. 886, no. 1, pp. 1–14, 2019.
[7] M. Atencia, R. Stoean, and G. Joya, "Uncertainty quantification through dropout in time series prediction by echo state networks," Mathematics, vol. 8, no. 8, 2020.
[8] A. Shafin, "Machine learning approach to forecast average weather temperature of Bangladesh," Global Journal of Computer Science and Technology: Neural & Artificial Intelligence, vol. 19, pp. 39–48, 2019.
[9] T. Siddique, S. Mahmud, A. Keesee, C. Ngwira, and H. Connor, "A survey of uncertainty quantification in machine learning for space weather prediction," Geosciences, vol. 12, no. 1, 2022.
[10] R. Chandra, S. Goyal, and R. Gupta, "Evaluation of deep learning models for multi-step ahead time series prediction," arXiv:2103.14250, 2021.
[11] K. Cho, B. Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: Encoder-decoder approaches," CoRR, 2014.
[12] J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," CoRR, 2014.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," CoRR, 2017.
[14] D. Mirikitani and N. Nikolaev, "Recursive Bayesian recurrent neural networks for time-series modeling," Transactions on Neural Networks, vol. 21, no. 2, pp. 262–274, 2010.
[15] K. Amarasinghe, D. L. Marino, and M. Manic, "Deep neural networks for energy load forecasting," in International Symposium on Industrial Electronics, 2017, pp. 1483–1488.
[16] J. Wang and Y. Li, "Multi-step ahead wind speed prediction based on optimal feature extraction, long short term memory neural network and error correction strategy," Applied Energy, vol. 230, pp. 429–443, 2018.
[17] Z. Pala and R. Atici, "Forecasting sunspot time series using deep learning methods," Solar Physics, pp. 1–14, 2019.
[18] T. Khan, F. Arafat, U. Mojumdar, A. Rajbongshi, T. Siddiquee, and R. Chakraborty, "A machine learning approach for predicting the sunspot of solar cycle," in International Conference on Computing, Communication and Networking Technologies, 2020, pp. 1–4.
[19] X. Shi, Z. Chen, H. Wang, D. Yeung, W. Wong, and W. Woo, "Convolutional LSTM network: A machine learning approach for precipitation nowcasting," CoRR, 2015.
[20] C.-J. Huang and P.-H. Kuo, "A deep CNN-LSTM model for particulate matter (PM2.5) forecasting in smart cities," Sensors, vol. 18, no. 7, 2018.
[21] Y. Sudriani, I. Ridwansyah, and H. A. Rustini, "Long short term memory (LSTM) recurrent neural network (RNN) for discharge level prediction and forecast in Cimandiri River, Indonesia," IOP Conference Series: Earth and Environmental Science, vol. 299, 2019.
[22] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, "Informer: Beyond efficient transformer for long sequence time-series forecasting," CoRR, 2021.
[23] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[24] Y. Dar, V. Muthukumar, and R. G. Baraniuk, "A farewell to the bias-variance tradeoff? An overview of the theory of over-parameterized machine learning," arXiv, 2021.
[25] B. Kappen and S. Gielen, "Practical confidence and prediction intervals for prediction tasks," Prog. Neural Process, vol. 8, pp. 128–135, 1997.
[26] A. Kendall and Y. Gal, "What uncertainties do we need in Bayesian deep learning for computer vision?" CoRR, 2017.
[27] SILSO World Data Center, "The international sunspot number," International Sunspot Number Monthly Bulletin and online catalogue, 1749–2022.
[28] www.kaggle.com/yakinrubaiat/bangladeshweather-dataset.