Time Series Forecasting of Petroleum

JID: NEUCOM
ARTICLE IN PRESS [m5G;October 15, 2018;15:11]

Neurocomputing xxx (xxxx) xxx
Contents lists available at ScienceDirect
Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
Time series forecasting of petroleum production using deep LSTM

recurrent networks
Alaa Sagheer a,b,∗, Mostafa Kotb b
a
College of Computer Science and Information Technology, King Faisal University, Saudi Arabia
b
Center for Artificial Intelligence and Robotics (CAIRO), Faculty of Science, Aswan University, Egypt
a r t i c l e i n f o a b s t r a c t
Article history: Time series forecasting (TSF) is the task of predicting future values of a given sequence using historical
Received 11 November 2017 data. Recently, this task has attracted the attention of researchers in the area of machine learning to ad-
Revised 4 July 2018
dress the limitations of traditional forecasting methods, which are time-consuming and full of complexity.
Accepted 30 September 2018
With the increasing availability of extensive amounts of historical data along with the need of perform-
Available online xxx
ing accurate production forecasting, particularly a powerful forecasting technique infers the stochastic
Communicated by Zidong Wang dependency between past and future values is highly needed. In this paper, we propose a deep learn-
ing approach capable to address the limitations of traditional forecasting approaches and show accurate
Keywords:
Time series forecasting predictions. The proposed approach is a deep long-short term memory (DLSTM) architecture, as an ex-
Deep neural networks tension of the traditional recurrent neural network. Genetic algorithm is applied in order to optimally
Recurrent neural networks configure DLSTM’s optimum architecture. For evaluation purpose, two case studies from the petroleum
Long-short term memory industry domain are carried out using the production data of two actual oilfields. Toward a fair eval-
Petroleum production forecasting uation, the performance of the proposed approach is compared with several standard methods, either
statistical or soft computing. Using different measurement criteria, the empirical results show that the
proposed DLSTM model outperforms other standard approaches.
© 2018 Elsevier B.V. All rights reserved.
1. Introduction munity after they showed better prediction accuracies [5]. Given
the several ANN algorithms, identifying a specific ANN algorithm
Time Series Forecasting (TSF) system involves predicting the for a forecasting task should be based on a compromise among
system behavior in future, which is based on information of the three aspects; namely, the complexity of the solution, the desired
current and past status of the system. Presently, TSF plays an im- prediction accuracy, and data characteristics [5]. Considering the
perative role in several real world problems, such as the financial first two aspects, i.e. precision and complexity, the best results are
markets, network traffic, weather forecasting, and petroleum (or obtained by the Feed Forward NN predictor, in which the informa-
oil) industry, among others [1]. In the past, the TSF problem has tion goes through the network in the forward direction only. How-
been influenced by linear statistical methods in order to achieve ever, on the addition of the third aspect, i.e. the data characteris-
the forecasting activities. Recently, several useful nonlinear time tics, Recurrent Neural Network (RNN) is found to be more suitable
series models were proposed such as the bilinear model [2], the than FFNN [6].
threshold autoregressive model [3] and the Autoregressive Condi- In RNN, the activations from each time step are stored in the in-
tional Heteroscedastic (ARCH) model [4], among others. However, ternal state of the network in order to provide a temporal memory
the analytical study of non-linear time series analysis is still in its property [7]. However, the most major weakness of RNN is carried
infancy compared to linear time series [1]. out during the requirement of learning long-range time dependen-
In the last two decades, several Artificial Neural Network (ANN) cies [7,8]. To overcome this drawback, Hochreiter and Schmidhuber
algorithms have drawn attention and have established themselves [9] developed the Long Short-Term Memory (LSTM) algorithm as
as serious contenders to statistical methods in the forecasting com- an extension to RNN [8,10]. Despite the advantages cited for LSTM
and its predecessor RNN, their performances for TSF problem are
∗
not satisfactory. Such shallow architectures, can not represent effi-
Corresponding author at: College of Computer Science and Information Technol-
ogy, King Faisal University, Saudi Arabia And Faculty of Science, Aswan University,
ciently the complex features of time series data, particularly, when
Egypt attempting to process highly nonlinear and long interval time se-
E-mail addresses: [email protected], [email protected] (A. Sagheer). ries datasets [8,11].
https://doi.org/10.1016/j.neucom.2018.09.082
0925-2312/© 2018 Elsevier B.V. All rights reserved.
Please cite this article as: A. Sagheer, M. Kotb, Time series forecasting of petroleum production using deep LSTM recurrent networks,
Neurocomputing (2018), https://doi.org/10.1016/j.neucom.2018.09.082
JID: NEUCOM
2 A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11
In this paper, we propose that a Deep LSTM (DLSTM) architec- data, such as frequency, variance, and mean, undergo alternation
ture can adapt with learning the nonlinearity and complexity of over time [11]. Third, the rock and fluid properties of the reservoirs
time-series data. The proposed deep model is correspondingly an are highly nonlinear and heterogeneous in nature [19].
extension of the original LSTM model, where it includes multiple It is known that, the petroleum production from a reservoir is
LSTM layers such that each layer contains multiple cells. The pro- dependent on several dynamic parameters, such as fluid satura-
posed model demonstrates a more effective use of the parameters tion and pressure in the reservoir, and static parameters such as
of each LSTM’s layer in order to train the forecasting model effi- porosity and permeability [18]. The majority of these parameters
ciently. It works as follows: each LSTM layer operates at different are not always available. Certainly, this limited data access from
time scale and, thereby, processes a certain part of the desired task the petroleum reservoirs lessens the overall accuracy of forecast-
and, subsequently, passes it on to the next layer until finally the ing [11].
last layer generates the output [12,13].
Thus, we can attribute the benefit of stacking more than LSTM 2.2. Related works
layer to the recurrent connections between the units in the same
layer, and the feed-forward connections between units in an LSTM Several approaches have been developed to overcome the afore-
layer and the LSTM layer above it [13,14]. This ensures an improved mentioned petroleum TSF challenges, however, yet the key for
learning with more sophisticated conditional distributions of any a successful forecasting lies in choosing the right representation
time series data. Also, it can perform hierarchical processing on among these approaches [11]. These approaches can be classified
difficult temporal tasks, and more naturally, capture the structure into two broad categories; namely, statistical approaches, and soft
of data sequences [11]. computing approaches. One of the most common traditional sta-
Towards fair evaluation, here we in this study train and validate tistical methods is the Autoregressive Integrated Moving Average
the DLSTM model through more than a scenario, where we have (ARIMA) [22].
used the genetic algorithm in order to optimally design and config- ARIMA and its variants can be used to achieve diverse forecast-
ure the best DLSTM architecture and parameters. Concurrently, we ing activities in the petroleum industry such as, prices, consump-
compare the DLSTM’s performance with the performance of other tion levels, and reservoir production [23]. Another known mathe-
reference models using the same datasets, and same experimen- matical method is the Decline Curve Analysis (DCA) method, which
tal conditions via different error measures. The reference models is based on the conventional ARPs equation. Historically, DCA has
vary from statistical methods, neural networks (shallow and deep) been widely used in petroleum industry, particularly in the sce-
methods, and hybrid (statistical and neural networks) methods. narios depicting the decline of petroleum production with the in-
The remainder of the paper is organized as follows: crease in production time [24].
Section 2 describes the TSF problem and associated works in Nevertheless, the performance of traditional mathematical
the oil and petroleum industry. The proposed DLSTM model is methods is still questionable. Indeed, more complex, high-
presented in Section 3. Section 4 shows the experiment settings of dimensional, and noisy real-world time-series data cannot be de-
this paper. The experimental results of two case studies are shown scribed with analytical equations based on parameters, in order to
in Section 5. Discussion and analysis of the results are provided in solve since the dynamics that are either too complex or unknown
Section 6 and, finally, the paper is concluded in Section 7. [11], as the case of DCA. Moreover, the main drawback of tradi-
tional methods is that these methods are based mainly on the
2. TSF problem statement analysis of subjective data types. In other words, they pick the
proper slope, and subsequently tune in the parameters of the nu-
The majority of real-world time series data sets have a tem- merical simulation model, in such a way that the reasonable values
poral or time sequence property, particularly, in forecasting activ- are retained, and finally, they are able to provide interpretations of
ities for weather, stock markets, robotics, and oilfields production, the oilfield’s geology [25]. But the oilfield’s geology and fluid prop-
among others. Correspondingly, it has been observed that finding erties of the oilfields are highly nonlinear and heterogeneous in
an effective method for forecasting trends in time-series datasets nature, thus yielding time series data that represent a long mem-
continues to be a long-standing unsolved problem with numerous ory process. Certainly these properties represent big challenges for
potential applications [1]. For this reason, time series forecasting traditional approaches, which still are far from estimating the ac-
is considered as one of the top ten challenging problems in data curate future production of petroleum [11,17].
mining due to its unique properties [15]. In this paper, we focus Since the past decade, sincere efforts have been evidently pub-
on the TSF problem of petroleum fields production. lished in the literature presenting the use of soft computing meth-
ods to achieve different forecasting activities in a number of
2.1. Overview of petroleum TSF
petroleum engineering applications. In 2011, Berneti and Shah-
bazian presented an imperialist competitive algorithm using ANN
Forecasting of the petroleum production is a very pertinent
to predict oil flow rate of the oil wells [26]. In 2012, Liu et al.
task in the petroleum industry, where the accurate estimation of
combined wavelets transformation with ANN in order to establish
petroleum reserves involves massive investment of money, time
a production-predicting model that used drill stem to test produc-
and technology in the context of a wide range of operating and
tion and wavelet coefficients [27]. In 2013, Chakra et al. presented
maintenance scenarios [16,17]. As such, a fairly precise estima-
an innovative higher-order NN model to focused on forecasting cu-
tion of petroleum quantity in the reservoir is in high demand
mulative oil production from a petroleum reservoir located in Gu-
[17,18]. However, several characteristics of petroleum time series
jarat, India [18].
data make such estimations more challenging.
More recently in 2016, Aizenberg et al. presented a multi-
First of all, the samples of petroleum time series data often
layer NN with multi-valued neurons capable of performing a
contain excessive noise, defects and anomalies, and, also some-
time series forecasting of oil production [25]. Aizenberg model is
times, high dimensionality [19,20]. Second, the petroleum time se-
based on a complex-valued neural network with a derivative-free
ries datasets are non-stationary and may exhibit variable trends by
backpropagation-learning algorithm. Eventually, Ma presented an
nature [21].1 This implies that the statistical characteristics of the
extension of the Arps decline model, which was constructed within
a nonlinear multivariate prediction approach [28]. The approach is
1
see chapter(3–5) in [21]. considered as a hybrid approach that combines the kernel trick
JID: NEUCOM
A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11 3
regulate such constant error flow [10]. Therefore, LSTM approxi-

mates the long-term information with significant delays by expe-
diting the conventional RNN algorithm [31].
The key in LSTM structure is the cell state (memory cell), which
looks like a conveyor belt. It runs straight down the entire chain
with the ability to add or remove information to the cell state,
carefully regulated by structures called gates. The gates are ways
for optional inlet of information. They are composed of a sigmoid
neural net layer and a pointwise multiplication operation as de-
picted in Fig. 2. An input at time step t is (Xt ), and the hidden
state from the previous time step (St−1 ) that is introduced to LSTM
Fig. 1. Processing of time sequence in RNN. block, and then the hidden state (St ) is computed as follows:
• The first step in LSTM is to decide what information is going

with the Arps exponential decline equation. It is worthy to men- to be thrown away from the cell state. This decision is made
tion that, in this paper we conduct a comparison with both Chekra by the following forget gate (ft ):
et al. [18] and Ma [28] approaches since both of these contribu-
ft = σ (Xt U f + St−1W f + b f ) (1)
tions present application of the same case studies as described in
this paper. • The following step is to decide which new information is go-
ing to be stored in the cell state. This step has two folds:
2.3. Motivation First, the input gate (it ) layer decides which values to be
updated. Second, a tanh layer that creates a vector of new
Although the soft computing methods, that employ different candidate values C˜t . These two folds can be described as fol-
ANN algorithms, are used to recover the aforementioned limita- lows:
tions of the statistical methods and yield more accurate forecasting it = σ (Xt U i + St−1W i + bi ) (2)
rates, they are observed to still face some challenges. It is demon-
strated that the traditional ANNs with shallow architectures are C˜t = tanh(Xt U c + St−1W c + bc ) (3)
devoid of sufficient capacity to accurately model the aforemen-
tioned complexity aspects of time series data, such as, high nonlin- • Then, update the old cell state, Ct−1 into the new cell state
earity, longer intervals, and big heterogeneous properties [28,29]. Ct , which can be given as:
This reason, and more, motivated us to solve the TSF problem Ct = Ct−1 ft it C˜t (4)
using a Deep Neural Network (DNN) architecture instead of shal-
low NN architecture models. DNNs models are termed deep be- • Finally, decide what is going to be produced as output. This
cause they are constructed by stacking multiple layers of nonlinear output will be based on the cell state, but will be a filtered
operations one top of one another with several hidden layers [30]. version. In this step, the output gate (ot ) decides what parts
of the cell state are going to be produced as output. Then,
the cell state goes through tanh layer (to push the values to
3. The proposed model
be between -1 and 1) and multiply it by the output gate as
follows:
Prior to introducing the proposed model, it is essential to de-
scribe briefly the original LSTM as in the current study it is the
ot = σ (Xt U o + St−1W o + bo ) (5)
precedent of the proposed model.
St = ot tanh(Ct ) (6)
3.1. The original LSTM model
From the previous six equations, the LSTM presents the follow-
ing three groups of parameters:
The traditional Recurrent Neural Network (RNN), aka vanilla
RNN, is one of the recursive neural network approaches that can 1. Input weights: Uf , Ui , Uo , Uc .
be applied for modeling of sequential data. The key feature of RNN 2. Recurrent weights: Wf , Wi , Wo , Wc .
is the network delay recursion, which enables it to describe the 3. Bias: bf , bi , bo , bc .
dynamic performance of systems [6]. The signal delay recursion
makes the output of the network at time t associate not only with 3.2. The deep LSTM recurrent network
the input at time t but also with recursive signals before time t,
as shown in Fig. 1. However its capability to process short-term It is widely demonstrated that increasing the depth of a neu-
sequential data, the weakness of RNN is carried out when learn- ral network is an effective way to improve the overall performance
ing long-range dependencies, or long-term context memorization, [30]. Encouraged by the impressive learning abilities of deep recur-
is demanded in time series forecasting applications [8,10]. rent network architectures [32], we have developed a deep LSTM
However, despite the introduction of several RNN variants, the recurrent network to be used in time series forecasting applica-
Long Short-Term Memory (LSTM) model is the elegant RNN’s vari- tions. In the proposed DLSTM, we are able to stack several LSTM
ant, which uses the purpose-built LSTM’s memory cell in order blocks, as shown in Fig. 3, one after another connected in a deep
to represent the long-term dependencies in time series data [31]. recurrent network fashion to combine the advantages of a single
In addition, LSTM is introduced to solve the vanishing gradient LSTM layer. The goal of stacking multiple LSTM in such a hierar-
problem of RNN in case of a long-term context memorization is chical architecture is to build the features at the lower layers that
required [8,9]. The LSTM model, developed by Horchreiter and will disentangle the factors of variations in the input data and then
Schmidhuber [9], truncates the gradients in the network where it combine these representations at the higher layers. In case of large
is innocuous by enforcing constant error flows through constant or complex data, it is demonstrated that such deep architecture
error carousels within special multiplicative units. These nonlin- will generalize better due to a more compact representation than
ear units learn to open or close gates in the network in order to shallow architecture [11,14,32].
JID: NEUCOM
Fig. 2. LSTM block, where ft , it , ot are forget, input, and output gates respectively.
vides the output. Another benefit, such architecture allows the hid-
den state at each level to operate at a different timescale. The last
two benefits have great impact in scenarios showing the use of
data with long-term dependency or in case of handling multivari-
ate time series datasets [33].
4. Experiments
In this section, we will show in detail all the experimental set-

tings that were adopted in order to implement both the proposed
model and the reference models using real datasets. In addition,
this section provides a brief overview of the reference models and
the optimality criteria, we will rely on for a comparative analysis
of their performance with the proposed model’s performance. The
codes that have been used in the experiments of this paper are
shared in github.2
4.1. Experimental setting
The following experimental settings have been adopted for all

experiments conducted in this paper.
4.1.1. Data preprocessing

The data used in this paper are the raw production data of two
actual oilfields, so it is highly possible to include noise as in influ-
encing factor. As such, it is not appropriate to use the raw produc-
tion data in the learning of NN because NN requires extremely low
Fig. 3. The architecture of DLSTM recurrent network.
learning rates. Thus, a preprocessing scenario consists of four steps
has been incorporated before the use of the raw production data
in the experiments of this paper.
In the DLSTM architecture as shown in Fig. 3, the input at
time t, Xt is introduced to the first LSTM block along with the Step 1: Reduce noise from raw data
(1 )
previous hidden state St−1 , the superscript (1) refers to the first To smoothen the raw data and remove any possible noise we
will use the moving average filter as a type of low pass filter
LSTM. The hidden state at time t, St(1 ) is computed as shown in
in the analogous way as described in [18]. Specifically, this
Section 3.1 and goes forward to the next time step and also goes
filter provides a weighted average of past data points in the
upward to the second LSTM block. The second LSTM uses the hid-
time series production data within a time span of five-points
den state St(1 ) along with the previous hidden state St−1
(2 )
to com-
to generate smoothed estimation of a time series. This step
pute St(2 ) , that goes forward to the next time step and upward to is imperatively incorporated to reduce the random noise in
the third LSTM block and so on, until the last LSTM block is com- data by retaining the sharpest step response associated with
piled in the stack. raw data [18].
The benefit of such stacked architecture is that each layer can
process some part of the desired task and subsequently pass it
on to the next layer until finally the last accumulated layer pro- 2
https://github.com/DeepWolf90/DLSTM2.
JID: NEUCOM
Step 2: Transform raw data to stationary data 1. The ARIMA model

As explained in Section 2.1, time series data are non- The comparison with the Auto-Regression Integrated Moving
stationary data, which may, in fact, exhibit a specific trend Average (ARIMA) algorithm represents a statistical-based compar-
[21]. Of course, stationary data is easier to model and will ison. For comparison purpose, we will implement the ARIMA pro-
very likely result in more skillful forecasts. In the current gram using Statsmodels [35] library where we use the grid search
preprocessing step, we removed the trend property in the to iteratively explore different combinations of the known ARIMA
data; whether it is increasing or decreasing trend. Later on, parameters (p, d, q), see chapter 5 [21]. For each combination of
we added the trend back to forecasts in order to return the these parameters, we fit a new ARIMA model and, subsequently,
forecasting problem into the original scale and calculate a we use the Akaike Information Criterion (AIC) value to choose the
comparable error score. A standard way to remove the trend best combination of these parameters [36]. The AIC measures how
is by the differencing of the data. That is the observation well a model fits with the data in consideration of the overall
from the previous time step (t-1) is subtracted from the cur- model complexity.
rent observation (t) [21].3 . 2. The Vanilla RNN model
Step 3: Transform data into supervised learning The comparison with the vanilla RNN model represents a ma-
We use one-step ahead forecast, where the next time step chine learning-based comparison. The original RNN is already cov-
(t+1) is predicted. We divide the time series into input (x) ered briefly in Section 3.1. For comparison purpose, we will imple-
and output (y) using lag time method, and specifically, in ment two RNN reference models, one with single hidden layer and
the study we have used different sizes of lag from lag1 to the other one with multiple hidden layers [13,14].
lag6. 3. The DGRU model
Step 4: Transform data into the problem scale The comparison with the Deep Gated Recurrent Unit (DGRU)
Like other neural networks, DLSTM expects data to be within model represents a deep learning-based comparison, where DGRU
the scale of the activation function used by the network. The is a counterpart of DLSTM. It is demonstrated that, the GRU model
default activation function for LSTM is the hyperbolic tan- is similar to the original LSTM model with the exception that GRU
gent (tanh), wherein its output values lie between −1 and 1. includes only two gates rather than three [37]. The experiments of
This is the preferred range for the time series data. Later on, DGRU are typically similar to that of DLSTM.
we transformed the scaled data back in order to return the 4. The NEA model
forecasting problem into the original scale. The comparison with the Nonlinear Extension for linear Arps
decline (NEA) model represents a hybrid-based comparison. NEA
is a hybrid method that combines Decline Curve Analysis (DCA),
4.1.2. Implementation scenario
which is a traditional statistical method, with the kernel machine,
The implementation experiments of the proposed DLSTM model
which is a machine learning method [28]. For comparison purpose,
include two different scenarios, namely, (i) static scenario and (ii)
we will rely on the results provided in [28] using same datasets of
dynamic scenario. In the static scenario, we fit the forecasting
both case studies described in this paper.
model with all of the training data and then forecast each new
5. The HONN model
time step once at a time with the testing data. In the dynamic
The comparison with the Higher-Order Neural Network (HONN)
scenario, we update the forecasting model each time step with
model is another machine learning-based comparison, where
the insertion of new observations from the testing data. In other
HONN is a feedforward multilayer neural network model that
words, the dynamic scenario uses the value of the previous fore-
employs what is called higher-order synaptic operations (HOSO).
casted value of the dependent variable to compute the next one,
HOSO of HONN embraces the linear correlation (conventional
whereas the static forecast uses the actual value for each subse-
synaptic operation) as well as the higher-order correlation of neu-
quent forecast.
ral inputs with synaptic weights [18]. For the comparison purpose,
we will rely on the results introduced by the authors of [18] by
4.1.3. Training of DLSTM exclusively using the second case study, since they did not apply
In the training phase of DLSTM experiments, we use the Genetic their method on the first case study described in this paper.
Algorithm (GA) to infer optimal selection for the proposed model
hyper-parameters. We implemented the GA using Distributed Evo-
4.3. Forecasting accuracy measures
lutionary Algorithms in Python (DEAP) library [34]. The number of
hyper-parameters is based on the implementation scenario. For the
In the literature, two kinds of errors are usually measured in
static scenario, there are three hyper-parameters, namely,number
order to estimate the forecasting precision and performance evalu-
of epochs, number of hidden neurons, and the lag size. For the dy-
ation of the forecasts, namely, scale-dependent errors and percent-
namic scenario, there are four hyper-parameters, the same three
age errors.
hyper-parameters of the static scenario, plus the number of updates,
(1) Scale-dependent errors These errors are on the same scale
which is the number of times we update our forecasting model
as the data itself. Therefore, as a limitation, the accuracy measures
each time step when new observations from the testing data are
that are based directly on this error cannot be used to make com-
inserted. This methodology is typically adopted in the experiments
parisons between series that are on different scales. The known
of other neural networks in the reference models.
scale-dependent measure is therefore based on the squared error,
namely, root mean square error (RMSE) [38], which can be given
4.2. Reference models as follows:
n
1 obs
Toward a fair evaluation, we will compare the proposed DL- RMSE = (yi − yipred )2 , (7)
n
STM model with different reference models that vary from statisti- i=1
cal methods, machine learning methods, and hybrid (statistical and
pred
machine learning) methods. The reference models are: Where, yobs
i
is the current observation and yi is its predicted
value.
(2) Percentage errors Percentage errors have the advantage of
3
chapter 3 in [21]. being scale-independent, and therefore are frequently used to
JID: NEUCOM
Table 1 5.2. Case study 2: Using production data of Cambay Basin oil field in
Best results of DLSTM with static scenario.
India
No. of layer No. of hidden units No.of Epochs lag RMSE RMSPE
1 [4] 953 5 0.234 3.337 As the previous case study, we examined the proposed model
2 [4,2] 787 5 0.227 3.253 and the reference models using real production data collected
3 [5,4,2] 800 5 0.209 2.995 through six years from 2004 to 2009, i.e. about 63 months. This
oilfield is located in the southwestern part of Tarapur Block of
Cambay Basin to the west of Cambay Gas Field in India [18].7 This
compare forecast performance between different scaled datasets. oilfield consists of total eight oil producing wells that present con-
The most commonly used measure is the root mean square per- tinuous production history. The authors in [18,28] considered only
centage error (RMSPE) [38], which can be given as follows: the cumulative oil production data from five wells; out of these
eight wells. Thus implying the availability of five input series cor-
2 responding to the monthly production of the five oil wells, plus

n pred
1 yi − yobs
i
RMSP E = × 100 (8) an output series as corresponding to the cumulative production of
n
i=1
yobs
i this oilfield. The relationship between the five input series and the
output series has been reported to be highly nonlinear [18].
It is clear that, both measures are calculated by comparing the Accordingly, and toward fair evaluation, in the experiments of
target values for the time series and its corresponding time series this case study we will consider also the same cumulative data of
predictions. The results obtained using both metrics are different the same five wells. We will follow the same experimental sce-
in their calculated values, but the significance of each metric is nario described in [18,28] by dividing the production dataset into
similar in performance measurement of the prediction models. No- two sets, i.e. first set (70% of data set) to be used to build the fore-
tably, since the production data presents different scales in the ma- casting models, and second set (30% of the data set) to be used for
jority of cases, it is preferable to rely on RMSPE, or any other per- testing the performance of the forecasting models. The results of
centage error measures, for estimating the relative error between each model shown in this section are based on the testing data.
different models [38]. The best performance results of the proposed DLSTM static sce-
nario, DLSTM dynamic scenario, single-RNN, Multi-RNN, and DGRU
5. Experimental results are shown separately inTables 7–11, respectively. Each table of
these five tables shows the values of each hyper-parameter, which
We proceed now to show the quantitative and visual results of optimally selected using the GA as described in Section 4.1.3. The
the proposed DLSTM model along with the reference models for relation between the original production data and their prediction
each case study. Notably, the results shown in all tables of this for the DLSTM model is illustrated in Figs. 6 and 7. Table 12 shows
section indicate the performance of the corresponding model in an overall comparison among these five models along with the
the testing data rather than training data. This has been done in best parameter combinations of ARIMA method and the best per-
concurrence with the widely demonstrated fact, which states, the formance results of NEA reported in [28] using the same data set.
genuine evaluation for forecasting performance should be based on The NEA results shown in Table 12 are imparted as they are given
unseen data not the historical (training) data, which already seen by the authors of [28] where they did not consider the RMSE mea-
by the model [39].4 sure.
This case study provides an extra comparison where we com-
5.1. Case study 1: Using production data of Block-1 of Huabei oil pare the proposed DLSTM model with the HONN model [18], de-
field in China scribed in Section 4.2. In their paper, the authors used three mea-
sures to evaluate their model and these include MSE, RMSE, and
This case study includes raw data collected from the Block-1 in MAPE. In the current paper, we have used the RMSE (RMSE is the
Huabei oilfield, which is located in north China [28].5 The dataset root of MSE) as described in Section 4.3. Subsequently, in this com-
of this oilfield contains 227 observations of the oil production data, parison we calculate the MAPE measure within our model to com-
in which the first 182 observations (80% of dataset) have been used pare with the MAPE results of HONN shown in [18]. The MAPE, as
to build, or train, the forecasting models, and the remaining 45 ob- a percentage error measure, can be computed as follows:
servations (20% of the dataset) have been used for testing the per- n pred
formance of the forecasting models. 1 |yi − yobs
i
|
MAPE = × 100 (9)
The best performance results of the proposed DLSTM static sce- n
i=1
yobs
i
nario, DLSTM dynamic scenario, single-RNN, Multi-RNN, and DGRU
Table 13 shows the comparison between the HONN model and the
are shown separately in Tables 1–5, respectively. Each of these five
proposed DLSTM model based on the three measures. For the pro-
tables show the values of each hyper-parameter, which has been
posed DLSTM model, the best results of both scenarios (static and
optimally selected using the GA as described in Section 4.1.3. The
dynamic) are shown in Table 13. The authors of [18] used three
relation between the original production data and their predic-
different lags in their experiments, and the best result as high-
tion for the DLSTM model is illustrated in Figs. 4 and 5. Table 6
lighted by them was inferred using lag 1 [18]8 which is included
shows an overall comparison among these five models along with
in Table 13.
the best parameter combinations of ARIMA method and the best
performance results of NEA model reported in [28]6 using same
data set. The NEA results shown in Table 6 are imparted as they 6. Results analysis and discussion
are given by the authors of [28] where they did not consider the
RMSE measure. In this paper, we tried to ensure a genuine evaluation for the
proposed model against five different types of comparison with
state-of-the-art techniques using two real world datasets. More
4
See section 3.4 , pages 177:184.
5
The raw data are listed in the Table 1 in [28].
6 7
Relation between the original production and the prediction results of NEA is The raw data are listed in the Table 1b in [18].
8
plotted in Fig. 1 [28]. see Table 3 in [18].
JID: NEUCOM
Table 2
Best results of DLSTM with dynamic scenario.
No. of layer No. of hidden units No.of Epochs lag update RMSE RMSPE
1 [3] 1352 3 1 0.267 3.783

2 [4,5] 1187 5 1 0.219 3.124
3 [4,3,3] 403 5 2 0.257 3.637
Fig. 4. Production data v.s. prediction using DLSTM-static.
Fig. 5. Production data v.s. prediction using DLSTM-dynamic.
than one standard optimality criteria are used to assess the per- tion we will discuss and analyze the results shown in the previous
formance of each model. It is widely demonstrated in literature section where we will focus on these results based on the percent-
that the percentage error measures are the most appropriate tool age error measure of each model.
to assess the performance of different forecasting models. It also
6.1. Case 1 versus Case 2
presents the percentage error capable to estimate the relative er-
ror between different models particularly when the samples of the
However it is not a real comparison since each case study has
time series data have different scales [39]. Accordingly, in this sec-
its own samples and source, but we can notice few observations on
JID: NEUCOM
Fig. 6. Production data v.s. prediction using DLSTM-static.
Table 3 Table 6
Best results of Single-RNN. Overall comparison among ARIMA, NEA
[28], RNN, DGRU, and DLSTM using data
No. of units No.of Epochs lag RMSE RMSPE set of case study 1.
[4] 1890 5 0.233 3.290
Forecasting model RMSE RMSPE
[5] 653 4 0.238 3.366
[3] 431 4 0.263 3.740 ARIMA 0.310 4.705
NEA [28] — 4.221
DLSTM(static) 0.209 2.995
Table 4 DLSTM(dynamic) 0.219 3.124
Best results of Multi-RNN. Single-RNN 0.233 3.290
Multi-RNN 0.219 3.129
No. of layer No. of hidden units No.of Epochs lag RMSE RMSPE DGRU 0.222 3.175
2 [2,4] 1551 5 0.219 3.129
2 [3,4] 1913 5 0.239 3.387
2 [2,2] 787 5 0.247 3.530
3 [5,5,4] 457 3 0.258 3.701 Table 7
3 [4,3,4] 1611 5 0.237 3.374 Best results of DLSTM with static scenario.
Table 5 1 [3] 1700 3 0.025 3.496
Best results of DGRU. 2 [1,1] 20 0 0 1 0.030 4.135
3 [2,2,1] 20 0 0 2 0.028 3.926
1 [4] 1870 2 0.256 3.610

2 [4,3] 1011 6 0.237 3.391
2 [5,3] 1514 6 0.222 3.175
3 [4,3,1] 354 6 0.263 3.734
values of the proposed DLSTM model are the global minimum val-
ues amongst the other reference models. In Tables 7, 8 and 10–
13 of the other case study, we will notice the same pattern for
both case studies. In case study 1, we can notice from Tables 1 and all models, again, with a superiority for the DLSTM over the other
2 that the best results of DLSTM are achieved using three LSTM models.
layers and two LSTM layers in static and dynamic scenario, respec- However, the DLSTM is the optimum among the other counter-
tively. Also, in Tables 4 and 5 we can notice that the best results of parts, though it illustrates a light variation in the hyper-parameters
multi-RNN and DGRU are achieved using two layers in both cases. values, particularly the parameter “number of layers”. In our opin-
From the accumulated comparison in Table 6, it is clear that the ion, this variation in the best hyper-parameter values between the
Table 8
Best results of DLSTM with dynamic scenario.
No. of layer No. of hidden units No.of Epochs Lag Update RMSE RMSPE
1 [5] 1259 6 4 0.029 4.219

2 [2,5] 1500 6 3 0.028 4.060
3 [4,4,5] 1400 6 4 0.032 4.482
JID: NEUCOM
Fig. 7. Production data v.s. prediction using DLSTM-dynamic.
Table 9 Table 12
Best results of Single-RNN. Overall comparison among ARIMA, NEA
[28], RNN, DGRU, and DLSTM using data
No. of units No.of Epochs lag RMSE RMSPE set of case study 2.
[1] 1551 4 0.029 4.095
Forecasting Model RMSE RMSPE
[2] 1115 1 0.029 4.133
[1] 953 2 0.030 4.174 ARIMA 0.027 3.773
NEA [28] — 4.221
DLSTM(static) 0.025 3.496
Table 10 DLSTM(dynamic) 0.028 4.060
Best results of Multi-RNN. Single-RNN 0.029 4.095
No. of layer No. of hidden units No.of Epochs lag RMSE RMSPE Multi-RNN 0.027 3.731
DGRU 0.028 3.991
2 [5,1] 1514 5 0.027 3.731
2 [2,4] 1551 5 0.028 4.125
2 [2,2] 787 3 0.030 4.196 Table 13
3 [1,1,3] 953 4 0.029 4.112 Comparison between HONN[18] and DLSTM.
3 [1,3,3] 953 2 0.031 4.353
Forecasting model MSE RMSE MAPE
HONN[18] 0.001 0.035 3.459

DLSTM(static) 0.0 0 0 0.025 2.851
two case studies may be attributed to the higher data samples in
DLSTM(dynamic) 0.0 0 0 0.028 2.976
case study 1 than case study 2. In other words, DLSTM does not
require large number of layers in case the dataset size is not large.
Of course, as the number of data samples is going to be bigger,
essentially the performance of DLSTM going to be better [32]. whereas DLSTM achieved 2.9. The same pattern for case study 2 is
presented in Table 12, where ARIMA achieved 3.8 minimum error
6.2. DLSTM versus ARIMA whereas DLSTM achieved 3.5. In other words, the DLSTM model
shows more efficiency than ARIMA model in predicting the future
It is easy to notice in Table 6 for case study 1, where the dataset oil productions and in describing the typical tendency of the oil
is large, all the errors of DLSTM are smaller than those of the production as shown in Fig. 4–7. In contrast, the predicted values
ARIMA algorithm. Specifically, ARIMA achieved 4.7 minimum error by ARIMA are quite far away from the oil production points where
Table 11
Best results of DGRU.
No. of layer No. of hidden units in each layer No.of Epochs lag RMSE RMSPE
1 [2] 787 2 0.029 4.125

2 [3,1] 431 4 0.030 4.207
3 [1,3,5] 354 6 0.029 4.035
3 [4,3,5] 354 6 0.028 3.991
JID: NEUCOM
the difference between both contenders approaches 2 points in synaptic operations are called Linear Synaptic Operation (LSO),
first case. We can estimate why the performance of ARIMA is not Quadratic Synaptic Operation (QSO) and Cubic Synaptic Operation
well due to its linearity nature whereas the relationship between (CSO), respectively [18]. The authors stated that the best HOSO op-
inputs and outputs is not linear in such a production data. As a eration is the third one (CSO).
nonlinear model, DLSTM could to describe smoothly the nonlinear It seems that the computation of HONN is complex where cal-
relationship between inputs and outputs. culation of the activation function of the model is a combination
of the conventional linear synaptic function plus the cubic synap-
6.3. DLSTM versus other recurrent NNs tic operation. In addition, most of parameters, such as time lag and
number of neurons in the hidden layer, are adjusted manually or
In this comparison, DLSTM is compared with its forefather, based on trial and error. This means that the parameters selection
RNN, and its counterpart, DGRU, where the three contenders have should be adjusted carefully to ensure accurate oil production fore-
the same origin and classified as recurrent neural networks. It is casting.
easy to notice in Table 6 for case study 1 that DLSTM achieved Nevertheless, in Table 13 DLSTM continues to show better per-
3.4 against 3.7 for Multi-RNN and 4.0 for DGRU. The same rates formance than HONN via the three error measures, particularly for
are approximately achieved in case study 2 and Table 12. However, percentage error measure. Namely, through the MAPE measure the
the error differences are not so big among the three contenders, DLSTM achieved 2.8 against 3.4 for HONN. In our perspective, the
since all of them have typical deep architecture, but still the pro- optimality of DLSTM’s performance can attribute to the recursive
posed DLSTM model shows better performance than the others. Of nature of DLSTM, against the feedforward nature of HONN. Indeed,
course, as the size of data is going to be large, expressively the the recursive property ensures more accurate prediction particu-
performance of DLSTM will be much better than RNN but may be larly when the dataset size going to be large.
similar to DGRU.
7. Conclusion
6.4. DLSTM versus reported approaches
In this paper, we developed a promising prediction model can
This is the most important comparison between the proposed be used in the majority of time series forecasting problems. How-
DLSTM model and other reported approaches, the NEA model ever, in this paper, it is tested specifically in case of petroleum
[28] and the HONN model [18] since these three models are non- time series applications. The proposed model is a deep architec-
linear and present different origins. For the NEA model, it is clear ture of the Long-Short Term Memory (LSTM) recurrent network,
in Tables 6 12 that the DLSTM model outperforms the NEA model where we denoted it as DLSTM. The paper empirically evidences
with a difference approaches to one point in case study 1. Namely, that, stacking of more LSTM layers ensures to recover the limita-
DLSTM achieved 2.9 against 4.2 achieved by NEA, whereas in case tions of shallow neural network architectures, particularly, when
study 2, the DLSTM achieved 3.4 against 4.2 achieved by NEA. This long interval time series datasets are used. In addition, the pro-
indicates that the DLSTM model is more accurate than the NEA posed deep model can describe the nonlinear relationship between
model in predicting the future oil production. the system inputs and outputs, particularly, if we knew that the
Superiority in performance is not the only advantage of DLSTM petroleum time series data are heterogeneous and full of complex-
over NEA but also NEA performance is evidenced to be highly de- ity and missing parts.
pendent on the selection of several parameters, as explained by Notably, in the two case studies described in this paper the pro-
the authors of [28]. Among these parameters, the most important posed model outperformed its counterparts deep RNN and deep
parameters, which may affect the NEA performance includes:(i) GRU. In addition, the performance of the proposed DLSTM is ob-
the regularized parameter (γ ), which controls the smoothness of served to be much better than the statistical ARIMA model. The
the model, and (ii) the kernel parameter (σ ) of the Gaussian ker- most important comparisons that conducted with two recent re-
nel used in the NEA model. It is demonstrated by the authors ported machine learning approaches, denoted as NEA and HONN,
in [28] that the NEA’s performance is sensitive to the values of where DLSTM outperformed both of them with a noticeable differ-
these two parameters. Accordingly, to investigate the performance ence on the scale of two different percentage error measures.
of NEA model in the prediction of oil production, several experi- The accurate prediction and learning performance shown in the
ments should be conducted in order to find improved and suitable paper indicate that the proposed deep LSTM model, and other
combinations of these parameters. deep neural network models, are eligible to be applied in the non-
Furthermore, the performance of these parameters in training linear forecasting problems in the petroleum industry. In our fu-
phase is totally reversed in testing phase. For example, the train- ture research plans, we will investigate the performance of DL-
ing errors are growing with larger (σ ) , whereas it is decreased STM in other forecasting problems especially when the problem
for testing errors. The converse will be in the case of (γ ) parame- includes multi-variables (multivariate) time series data.
ter, where training errors decrease with larger (γ ) but the testing
errors remain monotonic. If the designer is not aware of this rela-
Acknowledgments
tionship, the larger values of (σ ) will convert the model from non-
linear behavior to linear behavior [28]. In other words, the over-
The authors of this paper would like to express about their
all performance of NEA in training phase and testing phase is not
thank and gratitude to “Deanship of Scientific Research” at King
sufficiently harmonious and require careful deliberation with the
Faisal University, Saudi Arabia for their moral and financial support
parameters selections.
to this work under the research grant number (170069).
For the HONN model [18], we should highlight that this model
is similar to traditional multilayer feed forward neural network.
References
The difference here is that HONN employs what is called Higher-
Order Synaptic Operations (HOSO). HOSO of HONN embraces the [1] J.G. De Gooijer, R.J. Hyndman, 25 years of time series forecasting, Int. J. Fore-
linear correlation (conventional synaptic operation) as well as the cast. 22 (3) (2006) 443–473.
higher-order correlation of neural inputs with synaptic weights. In [2] D.S. Poskitt, A.R. Tremayne, The selection and use of linear and bilinear time
series models, Int. J. Forecast. 2 (1) (1986) 101–114.
the paper of [18], different HOSO have been applied up to third- [3] H. Tong, Non-Linear Time Series: A Dynamical System Approach, Oxford Uni-
order, where the first-order, the second-order, and the third-order versity Press, 1990.
JID: NEUCOM
[4] R.F. Engle, Autoregressive conditional heteroscedasticity with estimates of the [32] M. Hermans, B. Schrauwen, Training and analysing deep recurrent neural net-
variance of united kingdom inflation, Econometrica 50 (4) (1982) 987–1007. works, in: Proceedings of the 26th International Conference on Neural Infor-
[5] G. Zhang, B.E. Patuwo, M.Y. Hu, Forecasting with artificial neural networks: the mation Processing Systems NIPS 1, 2013, pp. 190–198.
state of the art, Int. J. Forecast. 14 (1998) 35–62. [33] S. Spiegel, J. Gaebler, A.L.E. De Luca, S. Albayrak, Pattern recognition and
[6] M.H. sken, P. Stagge, Recurrent neural networks for time series classification, classification for multivariate time series, in: Proceeding of the Fifth Inter-
Neurocomputing 50 (2003) 223–235. national Workshop on Knowledge Discovery from Sensor Data, 2011, pp.
[7] Bayer, J. Simon, Learning Sequence Representations, Technische Universitt 34–42.
Mnchen, 2015. [34] F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, C. Gagné, DEAP:
[8] Pascanu, Razvan, T. Mikolov, Y. Bengio, On the difficulty of training recurrent evolutionary algorithms made easy, J. Mach. Learn. Res. 13 (2012) 2171–2175.
neural networks, in: Proceedings of the 30th International Conference on Ma- [35] Seabold, Skipper, Perktold, Josef, Statsmodels: Econometric and statistical mod-
chine Learning (3), volume 28, 2013, pp. 1310–1318. eling with python, in: Proceeding of the 9th Python in Science Conference,
[9] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) 2010.
(1997) 1735–1780. [36] E.J. Bedrick, C.L. Tsai, Model selection for multivariate regression in small sam-
[10] I. Sutskever, Training recurrent neural networks, University of Toronto, 2012 ples, Biometrics 50 (1994) 226–231.
Ph.d. thesis. [37] C. Junyoung, C. Gulcehre, C. KyungHyun, Y. Bengio, Empirical evaluation of
[11] M. Längkvist, L. Karlsson, A. Loutfi, A review of unsupervised feature learning gated recurrent neural networks on sequence modeling, in: Proceeding of the
and deep learning for time-series modeling, Pattern Recognit. Lett. 42 (2014) Deep Learning workshop at NIPS, 2014.
11–24. [38] R.J. Hyndman, A.B. Koehler, Another look at measures of forecast accuracy, Int.
[12] M. Hermans, B. Schrauwen, Training and analyzing deep recurrent neural net- J. Forecast. 22 (4) (2006) 679–688.
works, in: Proceedings of the 26th International Conference on Neural Infor- [39] R.J. Hyndman, Measuring forecast accuracy, in: M. Gilliland, L. Tashman,
mation Processing Systems NIPS 1, 2013, pp. 190–198. U. Sglavo (Eds.), Business Forecasting: Practical Problems and Solutions, John
[13] R. Pascanu, C. Gulcehre, K. Cho, Y. Bengio, How to construct deep recurrent Wiley & Sons, 2016, pp. 177–183.
neural networks, in: Proceedings of the Second International Conference on
Learning Representations ICLR, 2014.
[14] P.E. Utgoff, D.J. Stracuzzi, Many-layered learning, Neural Comput. 14 (10) Dr. Alaa Sagheer received his B.Sc. and M.Sc. in Math-
(2002) 2497–2529. ematics from Aswan University, EGYPT. He got Ph.D. in
[15] Q. Yang, X. Wu, 10 challenging problems in data mining research, Int. J. Inf. Computer Engineering in the area of Intelligent Systems
Technol. Decis. Making 5 (2006) 597–604. from the Graduate School of Information Science and
[16] R. Mehrotra, R. Gopalan, Factors infleuencing strategic decision-making process Electrical Engineering, Kyushu University, Japan in 2007.
for the oil/gas industriesof UAE-a study, Int. J. Mark. Financ. Manag. 5 (2017) After receiving PhD, he served as an Assistant Professor
62–69. at Aswan University. In 2010, Dr. Sagheer established and
[17] R.B.C. Gharbi, G.A. Mansoori, An introduction to artificial intelligence appli- directed the Center for Artificial Intelligence and Robotics
cations in petroleum exploration and production, J. Pet. Sci. Eng. 49 (2005) (CAIRO) at Aswan University. He served also as the Prin-
93–96. cipal Investigator in CAIRO in several research and aca-
[18] N.C. Chakra, K.-Y. Song, M.M. Gupta, D.N. Saraf, An innovative neural forecast of demic projects funded by different Egyptian governmen-
cumulative oil production from a petroleum reservoir employing higher-order tal organizations. In 2013, Dr. Sagheer and his team won
neural networks (HONNs), J. Pet. Sci. Eng. 106 (2013) 18–33. the first prize, in a programming competition organized
[19] R. Nyboe, Fault detection and other time series opportunities in the petroleum by the Ministry of Communication and Information Technology (MCIT) Egypt, for
industry, Neurocomputing 73 (10–12) (2010) 1987–1992. their system entitled Mute and Hearing Impaired Education via an Intelligent Lip
[20] L. Martí, N. Sanchez-Pi, J. Molina, A. Garcia, Anomaly detection based on sensor Reading System. In 2014, he appointed as an Associate Professor at Aswan Univer-
data in petroleum industry applications, Sensors 15 (2015) 2774–2797. sity. In the same year, Dr. Sagheer joined the Department of Computer Science, Col-
[21] J.D. Cryer, K.-S. Chan, Time Series Analysis, 2nd ed., Springer Texts in Statistics, lege of Computer Sciences and Information Technology, King Faisal University, Saudi
Springer, New York, 2008. Arabia. Dr. Sagheers research interests include artificial intelligence, machine learn-
[22] S.L. Ho, M. Xie, The use of ARIMA models for reliability forecasting and analy- ing, pattern recognition, computer vision, and optimization theory. Recently, Dr.
sis, Comput. Ind. Eng. 35 (1998) 213–216. Sagheer extended his research interest to include quantum computing and quantum
[23] J. Choi, D.C. Roberts, E. Lee, Forecasting oil production in north dakota using communication. He authored and co-authored more than 40 research articles in his
the seasonal autoregressive integrated moving average (s-ARIMA), Nat. Resour. research interest fields. Dr. Sagheer is a member of IEEE and IEEE Computational
6 (2015) 16–26. Intelligence society. He is a reviewer for some journals and conferences related to
[24] A. Kamari, A.H. Mohammadi, M. Lee, A. Bahadori, Decline curve based models his research interests.
for predicting natural gas well performance, Petroleum 3 (2017) 242–248.
[25] I. Aizenberg, L. Sheremetov, L. Villa-Vargas, J. Martinez-Muñoz, Multilayer neu-
ral network with multi-valued neurons in time series forecasting of oil produc- Mostafa Kotb received his B.S. degree in computer sci-
tion, Neurocomputing 175 (2016) 980–989. ence from Aswan University, Aswan, Egypt at 2012. He
[26] S. Berneti, M. Shahbazian, An imperialist competitive algorithm artificial neural is now a master student and a research assistant in Cen-
network method to predict oil flow rate of the wells, Int. J. Comput. Appl. 26 ter for Artificial Intelligence and Robotics (CAIRO), Aswan
(2011) 47–50. University. His research interests include artificial intelli-
[27] Z. Liu, Z. Wang, C. Wang, Predicting reservoir production based on wavelet gence, machine learning, and deep learning.
analysis-neural network, advances in computer science and information en-
gineering, Adv. Intell. Soft Comput. 168 (2012).
[28] X. Ma, Predicting the oil production using the novel multivariate nonlinear
model based on Arps decline model and kernel method, Neural Comput. Appl.
29 (2016) 1–13.
[29] S.B. Taieb, G. Bontempi, A.F. Atiya, A. Sorjamaa, A review and comparison of
strategies for multi-step ahead time series forecasting based on the NN5 fore-
casting competition, Expert Syst. Appl. 39 (2012) 7067–7083.
[30] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
[31] K. Greff, R.K. Srivastava, J. Koutnk, B.R. Steunebrink, J. Schmidhuber, LSTM:
A search space odyssey, IEEE Trans. Neural Netw. Learn. Syst. 28 (2017)
2222–2232.

Time Series Forecasting of Petroleum

Uploaded by

Copyright:

Available Formats

Time Series Forecasting of Petroleum

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Time Series Forecasting of Petroleum

Uploaded by

Copyright:

Available Formats

JID: NEUCOM

ARTICLE IN PRESS [m5G;October 15, 2018;15:11]

Contents lists available at ScienceDirect

Time series forecasting of petroleum production using deep LSTM

2 A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11

A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11 3

regulate such constant error ﬂow [10]. Therefore, LSTM approxi-

• The ﬁrst step in LSTM is to decide what information is going

4 A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11

In this section, we will show in detail all the experimental set-

4.1. Experimental setting

The following experimental settings have been adopted for all

4.1.1. Data preprocessing

A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11 5

Step 2: Transform raw data to stationary data 1. The ARIMA model

6 A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11

A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11 7

1 [3] 1352 3 1 0.267 3.783

Fig. 4. Production data v.s. prediction using DLSTM-static.

Fig. 5. Production data v.s. prediction using DLSTM-dynamic.

8 A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11

Fig. 6. Production data v.s. prediction using DLSTM-static.

1 [4] 1870 2 0.256 3.610

1 [5] 1259 6 4 0.029 4.219

A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11 9

Fig. 7. Production data v.s. prediction using DLSTM-dynamic.

HONN[18] 0.001 0.035 3.459

1 [2] 787 2 0.029 4.125

10 A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11

A. Sagheer, M. Kotb / Neurocomputing 000 (2018) 1–11 11

You might also like