
Communicated by Prof. Zidong Wang

Accepted Manuscript

Time Series Forecasting of Petroleum Production using Deep LSTM Recurrent Networks

Alaa Sagheer, Mostafa Kotb

PII: S0925-2312(18)31163-9
DOI: https://doi.org/10.1016/j.neucom.2018.09.082
Reference: NEUCOM 20015

To appear in: Neurocomputing

Received date: 11 November 2017


Revised date: 4 July 2018
Accepted date: 30 September 2018

Please cite this article as: Alaa Sagheer, Mostafa Kotb, Time Series Forecasting of
Petroleum Production using Deep LSTM Recurrent Networks, Neurocomputing (2018), doi:
https://doi.org/10.1016/j.neucom.2018.09.082

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.

Time Series Forecasting of Petroleum Production using Deep LSTM Recurrent Networks

Alaa Sagheer a,b,*, Mostafa Kotb b

a College of Computer Science and Information Technology, King Faisal University, Kingdom of Saudi Arabia
b Center for Artificial Intelligence and RObotics (CAIRO), Faculty of Science, Aswan University, Egypt

Abstract

Time series forecasting (TSF) is the task of predicting future values of a given sequence using historical data. Recently, this task has attracted the attention of researchers in the area of machine learning to address the limitations of traditional forecasting methods, which are time-consuming and full of complexity. With the increasing availability of extensive amounts of historical data, along with the need to perform accurate production forecasting, a powerful forecasting technique that infers the stochastic dependency between past and future values is highly needed. In this paper, we propose a deep learning approach capable of addressing the limitations of traditional forecasting approaches and showing accurate predictions. The proposed approach is a deep long-short term memory (DLSTM) architecture, an extension of the traditional recurrent neural network. A genetic algorithm is applied in order to optimally configure the DLSTM architecture. For evaluation purposes, two case studies from the petroleum industry domain are carried out using the production data of two actual oilfields. Toward a fair evaluation, the performance of the proposed approach is compared with several standard methods, either statistical or soft computing. Using different measurement criteria, the empirical results show that the proposed DLSTM model outperforms the other standard approaches.

Keywords: Time Series Forecasting, Deep Neural Networks, Recurrent Neural Networks, Long-Short Term Memory, Petroleum Production Forecasting

1. Introduction

A Time Series Forecasting (TSF) system involves predicting the future behavior of a system based on information about its current and past status. Presently, TSF plays an imperative role in several real-world problems, such as financial markets, network traffic, weather forecasting, and the petroleum (or oil) industry, among others [1]. In the past, the TSF problem was addressed mainly by linear statistical methods. Recently, several useful nonlinear time series models were proposed, such as the bilinear model [2], the threshold autoregressive model [3] and the Autoregressive Conditional Heteroscedastic (ARCH) model [4], among others. However, the analytical study of non-linear time series analysis is still in its infancy compared to linear time series [1].

* Corresponding author. Tel.: +966547184611. Email address: [email protected] (Alaa Sagheer)

Preprint submitted to Neurocomputing, October 5, 2018

In the last two decades, several Artificial Neural Network (ANN) algorithms have drawn attention and established themselves as serious contenders to statistical methods in the forecasting community after they showed better prediction accuracies [5]. Given the several ANN algorithms, identifying a specific ANN algorithm for a forecasting task should be based on a compromise among three aspects, namely, the complexity of the solution, the desired prediction accuracy, and the data characteristics [5]. Considering the first two aspects, i.e. precision and complexity, the best results are obtained by the feed-forward NN predictor, in which the information goes through the network in the forward direction only. However, once the third aspect, i.e. the data characteristics, is added, the Recurrent Neural Network (RNN) is found to be more suitable than the FFNN [6].

In an RNN, the activations from each time step are stored in the internal state of the network in order to provide a temporal memory property [7]. However, the major weakness of RNN appears when learning long-range time dependencies is required [7, 8]. To overcome this drawback, Hochreiter and Schmidhuber [9] developed the Long Short-Term Memory (LSTM) algorithm as an extension to RNN [8, 10]. Despite the advantages cited for LSTM and its predecessor RNN, their performances on the TSF problem are not satisfactory. Such shallow architectures cannot efficiently represent the complex features of time series data, particularly when attempting to process highly nonlinear and long-interval time series datasets [8, 11].

In this paper, we propose that a Deep LSTM (DLSTM) architecture can adapt to the nonlinearity and complexity of time series data. The proposed deep model is an extension of the original LSTM model, where it includes multiple LSTM layers such that each layer contains multiple cells. The proposed model makes more effective use of the parameters of each LSTM layer in order to train the forecasting model efficiently. It works as follows: each LSTM layer operates at a different time scale and, thereby, processes a certain part of the desired task and, subsequently, passes it on to the next layer, until finally the last layer generates the output [12, 13].

Thus, we can attribute the benefit of stacking more than one LSTM layer to the recurrent connections between the units in the same layer, and the feed-forward connections between units in an LSTM layer and the LSTM layer above it [13, 14]. This ensures improved learning with more sophisticated conditional distributions of time series data. The stacked model can also perform hierarchical processing of difficult temporal tasks and, more naturally, capture the structure of data sequences [11].

Toward a fair evaluation, in this study we train and validate the DLSTM model through more than one scenario, where we use a genetic algorithm in order to optimally design and configure the best DLSTM architecture and parameters. Concurrently, we compare the DLSTM's performance with the performance of other reference models using the same datasets and the same experimental conditions via different error measures. The reference models vary from statistical methods, to neural network (shallow and deep) methods, to hybrid (statistical and neural network) methods.

The remainder of the paper is organized as follows: Section 2 describes the TSF problem and associated works in the oil and petroleum industry. The proposed DLSTM model is presented in Section 3. Section 4 shows the experimental settings of this paper. The experimental results of two case studies are shown in Section 5. Discussion and analysis of the results are provided in Section 6 and, finally, the paper is concluded in Section 7.

2. TSF Problem Statement

The majority of real-world time series data sets have a temporal or time-sequence property, particularly in forecasting activities for weather, stock markets, robotics, and oilfield production, among others. Correspondingly, it has been observed that finding an effective method for forecasting trends in time series datasets continues to be a long-standing unsolved problem with numerous potential applications [1]. For this reason, time series forecasting is considered one of the top ten challenging problems in data mining due to its unique properties [15]. In this paper, we focus on the TSF problem of petroleum field production.

2.1. Overview of Petroleum TSF

Forecasting of petroleum production is a very pertinent task in the petroleum industry, where the accurate estimation of petroleum reserves involves massive investment of money, time and technology in the context of a wide range of operating and maintenance scenarios [16, 17]. As such, a fairly precise estimation of the petroleum quantity in the reservoir is in high demand [17, 18]. However, several characteristics of petroleum time series data make such estimations challenging.

First of all, the samples of petroleum time series data often contain excessive noise, defects and anomalies, and sometimes also high dimensionality [19, 20]. Second, petroleum time series datasets are non-stationary and may exhibit variable trends by nature [21]¹. This implies that the statistical characteristics of the data, such as frequency, variance, and mean, undergo alteration over time [11]. Third, the rock and fluid properties of the reservoirs are highly non-linear and heterogeneous in nature [19].

¹ See chapters 3-5 in [21].

It is known that the petroleum production from a reservoir depends on several dynamic parameters, such as fluid saturation and pressure in the reservoir, and static parameters, such as porosity and permeability [18]. The majority of these parameters are not always available. Certainly, this limited data access from the petroleum reservoirs lessens the overall accuracy of forecasting [11].

2.2. Related Works

Several approaches have been developed to overcome the aforementioned petroleum TSF challenges; however, the key to successful forecasting lies in choosing the right representation among these approaches [11]. These approaches can be classified into two broad categories, namely, statistical approaches and soft computing approaches. One of the most common traditional statistical methods is the Autoregressive Integrated Moving Average (ARIMA) [22].

ARIMA and its variants can be used to achieve diverse forecasting activities in the petroleum industry, such as prices, consumption levels, and reservoir production [23]. Another known mathematical method is the Decline Curve Analysis (DCA) method, which is based on the conventional Arps equation. Historically, DCA has been widely used in the petroleum industry, particularly in scenarios depicting the decline of petroleum production with the increase in production time [24].

Nevertheless, the performance of traditional mathematical methods is still questionable. Indeed, complex, high-dimensional, and noisy real-world time series data cannot be described with analytical parametric equations, since the dynamics are either too complex or unknown [11], as in the case of DCA.

Moreover, the main drawback of traditional methods is that they are based mainly on the analysis of subjective data types. In other words, they pick the proper slope, and subsequently tune the parameters of the numerical simulation model in such a way that reasonable values are retained, and finally they are able to provide interpretations of the oilfield's geology [25]. But the geology and fluid properties of the oilfields are highly nonlinear and heterogeneous in nature, thus yielding time series data that represent a long-memory process. Certainly, these properties represent big challenges for traditional approaches, which are still far from estimating the accurate future production of petroleum [17, 11].

Over the past decade, sincere efforts have been published in the literature presenting the use of soft computing methods to achieve different forecasting activities in a number of petroleum engineering applications. In 2011, Berneti et al. presented an imperialist competitive algorithm using ANN to predict the oil flow rate of oil wells [26]. In 2012, Zhidi et al. combined wavelet transformation with ANN in order to establish a production-predicting model that used drill stem test production and wavelet coefficients [27]. In 2013, Chakra et al. presented an innovative higher-order NN model focused on forecasting cumulative oil production from a petroleum reservoir located in Gujarat, India [18].

More recently, in 2016, Aizenberg et al. presented a multilayer NN with multi-valued neurons capable of performing time series forecasting of oil production [25]. The Aizenberg model is based on a complex-valued neural network with a derivative-free backpropagation learning algorithm. Eventually, Ma et al. presented an extension of the Arps decline model, constructed within a nonlinear multivariate prediction approach [28]. The approach is considered a hybrid approach that combines the kernel trick with the Arps exponential decline equation. It is worth mentioning that, in this paper, we conduct a comparison with both the Chakra [18] and Ma [28] approaches, since both of these contributions present applications to the same case studies as described in this paper.

2.3. Motivation

Although the soft computing methods that employ different ANN algorithms are used to recover the aforementioned limitations of the statistical methods and yield more accurate forecasting rates, they are observed to still face some challenges. It is demonstrated that traditional ANNs with shallow architectures are devoid of sufficient capacity to accurately model the aforementioned complexity aspects of time series data, such as high nonlinearity, longer intervals, and big heterogeneous properties [28, 29]. This reason, and more, motivated us to solve the TSF problem using a Deep Neural Network (DNN) architecture instead of shallow NN architecture models. DNN models are termed deep because they are constructed by stacking multiple layers of nonlinear operations on top of one another, with several hidden layers [30].

3. The Proposed Model

Prior to introducing the proposed model, it is essential to describe briefly the original LSTM, as in the current study it is the precedent of the proposed model.

3.1. The Original LSTM Model

The traditional Recurrent Neural Network (RNN), aka vanilla RNN, is one of the recursive neural network approaches that can be applied for modeling sequential data.

The key feature of RNN is the network delay recursion, which enables it to describe the dynamic performance of systems [6]. The signal delay recursion makes the output of the network at time t associate not only with the input at time t but also with the recursive signals before time t, as shown in Fig. 1. Despite its capability to process short-term sequential data, the weakness of RNN appears when learning long-range dependencies, or long-term context memorization, is demanded, as in time series forecasting applications [8, 10].

Fig. 1. Processing of a time sequence in RNN

However, despite the introduction of several RNN variants, the Long Short-Term Memory (LSTM) model is the elegant RNN variant, which uses the purpose-built LSTM memory cell in order to represent long-term dependencies in time series data [31]. In addition, LSTM was introduced to solve the vanishing gradient problem of RNN in case long-term context memorization is required [8, 9]. The LSTM model, developed by Hochreiter and Schmidhuber [9], truncates the gradients in the network where doing so is innocuous, by enforcing constant error flows through constant error carousels within special multiplicative units. These nonlinear units learn to open or close gates in the network in order to regulate this constant error flow [10]. Therefore, LSTM approximates the long-term information with significant delays, extending the conventional RNN algorithm [31].

The key to the LSTM structure is the cell state (memory cell), which looks like a conveyor belt. It runs straight down the entire chain with the ability to add or remove information to the cell state, carefully regulated by structures called gates. The gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation, as depicted in Fig. 2. An input at time step t, (X_t), and the hidden state from the previous time step, (S_{t-1}), are introduced to the LSTM block, and then the hidden state (S_t) is computed as follows:

• The first step in LSTM is to decide what information is going to be thrown away from the cell state. This decision is made by the following forget gate (f_t):

    f_t = σ(X_t U^f + S_{t-1} W^f + b_f)    (1)

• The following step is to decide which new information is going to be stored in the cell state. This step has two folds: first, the input gate (i_t) layer decides which values are to be updated; second, a tanh layer creates a vector of new candidate values C̃_t. These two folds can be described as follows:

    i_t = σ(X_t U^i + S_{t-1} W^i + b_i)    (2)

    C̃_t = tanh(X_t U^c + S_{t-1} W^c + b_c)    (3)

• Then, update the old cell state C_{t-1} into the new cell state C_t, which can be given as:

    C_t = C_{t-1} ⊗ f_t ⊕ i_t ⊗ C̃_t    (4)

Fig. 2. LSTM block, where f_t, i_t, o_t are the forget, input, and output gates, respectively

• Finally, decide what is going to be produced as output. This output will be based on the cell state, but will be a filtered version. In this step, the output gate (o_t) decides what parts of the cell state are going to be produced as output. Then, the cell state goes through a tanh layer (to push the values to be between -1 and 1), and we multiply it by the output gate as follows:

    o_t = σ(X_t U^o + S_{t-1} W^o + b_o)    (5)

    S_t = o_t ⊗ tanh(C_t)    (6)

From the previous six equations, the LSTM presents the following three groups of parameters:

1. Input weights: U^f, U^i, U^o, U^c.
2. Recurrent weights: W^f, W^i, W^o, W^c.
3. Bias: b_f, b_i, b_o, b_c.
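For illustration only, the six equations above can be written directly in a few lines of NumPy. This is a sketch, not the implementation used in the experiments (which relies on a deep learning library); the parameter dictionary layout and the variable names are assumptions chosen to mirror the notation of this section, and the weights are taken to be already trained.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, s_prev, c_prev, p):
        """One LSTM step following Eqs. (1)-(6); p holds the U, W, b parameters of the four gates."""
        f_t = sigmoid(x_t @ p["Uf"] + s_prev @ p["Wf"] + p["bf"])       # Eq. (1): forget gate
        i_t = sigmoid(x_t @ p["Ui"] + s_prev @ p["Wi"] + p["bi"])       # Eq. (2): input gate
        c_tilde = np.tanh(x_t @ p["Uc"] + s_prev @ p["Wc"] + p["bc"])   # Eq. (3): candidate values
        c_t = c_prev * f_t + i_t * c_tilde                              # Eq. (4): new cell state
        o_t = sigmoid(x_t @ p["Uo"] + s_prev @ p["Wo"] + p["bo"])       # Eq. (5): output gate
        s_t = o_t * np.tanh(c_t)                                        # Eq. (6): new hidden state
        return s_t, c_t

In Eqs. (4) and (6), ⊗ and ⊕ denote element-wise multiplication and addition, which is why the sketch uses plain * and + on vectors.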

3.2. The Deep LSTM Recurrent Network

It is widely demonstrated that increasing the depth of a neural network is an effective way to improve the overall performance [30]. Encouraged by the impressive learning abilities of deep recurrent network architectures [32], we have developed a deep LSTM recurrent network to be used in time series forecasting applications. In the proposed DLSTM, we stack several LSTM blocks, as shown in Fig. 3, one after another, connected in a deep recurrent network fashion to combine the advantages of a single LSTM layer. The goal of stacking multiple LSTMs in such a hierarchical architecture is to build features at the lower layers that disentangle the factors of variation in the input data, and then combine these representations at the higher layers. In the case of large or complex data, it is demonstrated that such a deep architecture will generalize better, due to a more compact representation, than a shallow architecture [11, 14, 32].

Fig. 3. The architecture of the DLSTM recurrent network

In the DLSTM architecture shown in Fig. 3, the input at time t, X_t, is introduced to the first LSTM block along with the previous hidden state S^(1)_{t-1}, where the superscript (1) refers to the first LSTM. The hidden state at time t, S^(1)_t, is computed as shown in Section 3.1 and goes forward to the next time step and also upward to the second LSTM block. The second LSTM uses the hidden state S^(1)_t along with its previous hidden state S^(2)_{t-1} to compute S^(2)_t, which goes forward to the next time step and upward to the third LSTM block, and so on, until the last LSTM block in the stack.

The benefit of such a stacked architecture is that each layer can process some part of the desired task and subsequently pass it on to the next layer, until finally the last accumulated layer provides the output. Another benefit is that such an architecture allows the hidden state at each level to operate at a different timescale. These two benefits have great impact in scenarios involving data with long-term dependency, or when handling multivariate time series datasets [33].
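As a concrete illustration of such a stack, a DLSTM could be assembled with a Keras-style Sequential API roughly as follows. This is only a sketch under assumed settings: the standalone Keras API is assumed, the lag (input window) and layer widths are examples rather than the tuned values (the [5, 4, 2] widths simply echo the best static configuration later reported in Table 1), and the actual configurations in the experiments are selected by the genetic algorithm described in Section 4.1.3.

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    lag = 5          # assumed input window length; tuned by the GA in practice
    n_features = 1   # univariate production series

    model = Sequential()
    # lower LSTM layers return the full sequence so that the layer above sees every time step
    model.add(LSTM(5, return_sequences=True, input_shape=(lag, n_features)))
    model.add(LSTM(4, return_sequences=True))
    # the top layer returns only its last hidden state
    model.add(LSTM(2))
    model.add(Dense(1))   # one-step-ahead forecast
    model.compile(loss="mean_squared_error", optimizer="adam")

The essential design point is the return_sequences flag: every layer except the top one must emit its hidden state at every time step, otherwise the stack degenerates into a single-layer model fed by a summary vector.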


4. Experiments

In this section, we show in detail all the experimental settings that were adopted in order to implement both the proposed model and the reference models using real datasets. In addition, this section provides a brief overview of the reference models and of the optimality criteria we rely on for a comparative analysis of their performance against the proposed model's performance. The code used in the experiments of this paper is shared on GitHub².

² https://github.com/DeepWolf90/DLSTM2

4.1. Experimental Setting

The following experimental settings have been adopted for all experiments conducted in this paper.

4.1.1. Data preprocessing

The data used in this paper are the raw production data of two actual oilfields, so they very likely include noise as an influencing factor. As such, it is not appropriate to use the raw production data directly in the learning of the NN, because the NN would require extremely low learning rates. Thus, a preprocessing scenario consisting of four steps has been incorporated before the raw production data are used in the experiments of this paper.

Step 1: Reduce noise from raw data

To smoothen the raw data and remove any possible noise, we use a moving average filter as a type of low-pass filter, in a way analogous to that described in [18]. Specifically, this filter provides a weighted average of past data points in the time series production data within a time span of five points to generate a smoothed estimate of the time series. This step is imperatively incorporated to reduce the random noise in the data while retaining the sharpest step response associated with the raw data [18].

Step 2: Transform raw data to stationary data

As explained in Section 2.1, the time series data are non-stationary and may, in fact, exhibit a specific trend [21]. Of course, stationary data are easier to model and will very likely result in more skillful forecasts. In the current preprocessing step, we removed the trend property in the data, whether it is an increasing or a decreasing trend. Later on, we added the trend back to the forecasts in order to return the forecasting problem to the original scale and calculate a comparable error score. A standard way to remove the trend is by differencing the data.

That is, the observation from the previous time step (t-1) is subtracted from the current observation (t) [21]³.

³ See chapter 3 in [21].

Step 3: Transform data into supervised learning

We use a one-step-ahead forecast, where the next time step (t+1) is predicted. We divide the time series into input (x) and output (y) using the lag time method; specifically, in this study we have used different sizes of lag, from lag 1 to lag 6.

Step 4: Transform data into the problem scale

Like other neural networks, DLSTM expects data to be within the scale of the activation function used by the network. The default activation function for LSTM is the hyperbolic tangent (tanh), whose output values lie between -1 and 1. This is the preferred range for the time series data. Later on, we transformed the scaled data back in order to return the forecasting problem to the original scale.
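A minimal sketch of the four preprocessing steps, written with pandas and scikit-learn, is given below. It is an illustration under stated assumptions, not an extract of the released code: a simple (unweighted) five-point moving average stands in for the weighted filter described in Step 1, and the scaling (Step 4) is applied before the lag framing (Step 3), which gives the same values for a single series scaled with one scaler.

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    def preprocess(raw, lag=3):
        """raw: pandas Series of monthly production values; returns (X, y, scaler)."""
        # Step 1: five-point moving-average filter to suppress random noise
        smooth = raw.rolling(window=5, min_periods=1).mean()
        # Step 2: first-order differencing removes the trend; the scaler/diff are kept to invert forecasts later
        diff = smooth.diff().dropna()
        # Step 4: rescale to [-1, 1], the output range of tanh
        scaler = MinMaxScaler(feature_range=(-1, 1))
        scaled = scaler.fit_transform(diff.values.reshape(-1, 1)).ravel()
        # Step 3: frame as supervised learning, `lag` past values as input and the next value as target
        X = np.array([scaled[i:i + lag] for i in range(len(scaled) - lag)])
        y = scaled[lag:]
        return X, y, scaler

The inverse operations (inverse scaling, adding the differences back, and restoring the trend) are applied to the model outputs so that the error measures of Section 4.3 are computed on the original production scale.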


4.1.2. Implementation scenario

The implementation experiments of the proposed DLSTM model include two different scenarios, namely, (i) a static scenario and (ii) a dynamic scenario. In the static scenario, we fit the forecasting model with all of the training data and then forecast each new time step one at a time with the testing data. In the dynamic scenario, we update the forecasting model at each time step with the insertion of new observations from the testing data. In other words, the dynamic scenario uses the value of the previous forecasted value of the dependent variable to compute the next one, whereas the static forecast uses the actual value for each subsequent forecast.
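The control flow of the two scenarios can be summarised in a short walk-forward loop. The model object and its fit/predict_next/update interface below are hypothetical placeholders (they are not an API of any particular library); only the difference in control flow between the scenarios is the point of the sketch.

    def static_forecast(model, train, test):
        """Static scenario: the model is fit once on all training data,
        then each test step is forecast one at a time."""
        model.fit(train)
        history, preds = list(train), []
        for obs in test:
            preds.append(model.predict_next(history))
            history.append(obs)              # the observation becomes part of the history
        return preds

    def dynamic_forecast(model, train, test, n_updates=1):
        """Dynamic scenario: the model itself is additionally updated every time
        a new observation from the testing data is inserted."""
        model.fit(train)
        history, preds = list(train), []
        for obs in test:
            preds.append(model.predict_next(history))
            history.append(obs)
            model.update(history, n_updates)  # `number of updates` is the extra GA-tuned hyper-parameter
        return preds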
4.1.3. Training of DLSTM

In the training phase of the DLSTM experiments, we use a Genetic Algorithm (GA) to infer an optimal selection of the proposed model's hyper-parameters. We implemented the GA using the Distributed Evolutionary Algorithms in Python (DEAP) library [34]. The number of hyper-parameters depends on the implementation scenario. For the static scenario, there are three hyper-parameters, namely, the number of epochs, the number of hidden neurons, and the lag size. For the dynamic scenario, there are four hyper-parameters: the same three hyper-parameters of the static scenario, plus the number of updates, which is the number of times we update the forecasting model at each time step when new observations from the testing data are inserted. This methodology is typically adopted in the experiments of the other neural networks among the reference models.
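A minimal sketch of how such a search can be set up with DEAP is given below, for the static scenario. The chromosome layout (lag, hidden units, epochs), the value ranges, the GA settings, and the train_and_score_dlstm helper are illustrative assumptions; they are not the exact encoding or settings used in the released code.

    import random
    from deap import base, creator, tools, algorithms

    # minimise the validation RMSE of a DLSTM trained with the encoded hyper-parameters
    creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
    creator.create("Individual", list, fitness=creator.FitnessMin)

    LOW, UP = [1, 1, 100], [6, 6, 2000]   # assumed bounds for (lag, hidden units, epochs)

    toolbox = base.Toolbox()
    toolbox.register("gene", lambda lo, up: [random.randint(l, u) for l, u in zip(lo, up)], LOW, UP)
    toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.gene)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)

    def evaluate(ind):
        lag, units, epochs = ind
        rmse = train_and_score_dlstm(lag, units, epochs)   # hypothetical helper: build, train, score a DLSTM
        return (rmse,)

    toolbox.register("evaluate", evaluate)
    toolbox.register("mate", tools.cxTwoPoint)
    toolbox.register("mutate", tools.mutUniformInt, low=LOW, up=UP, indpb=0.2)
    toolbox.register("select", tools.selTournament, tournsize=3)

    pop = toolbox.population(n=20)
    algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=10, verbose=False)
    best = tools.selBest(pop, k=1)[0]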
4.2. Reference Models

Toward a fair evaluation, we compare the proposed DLSTM model with different reference models that vary from statistical methods, to machine learning methods, to hybrid (statistical and machine learning) methods. The reference models are:

1. The ARIMA model

The comparison with the Auto-Regressive Integrated Moving Average (ARIMA) algorithm represents a statistics-based comparison. For comparison purposes, we implement the ARIMA program using the Statsmodels library [35], where we use a grid search to iteratively explore different combinations of the known ARIMA parameters (p, d, q); see chapter 5 in [21]. For each combination of these parameters, we fit a new ARIMA model and, subsequently, use the Akaike Information Criterion (AIC) value to choose the best combination of these parameters [36].

The AIC measures how well a model fits the data while accounting for the overall model complexity.
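A compact version of this grid search might look as follows. The sketch assumes the older statsmodels ARIMA interface (statsmodels.tsa.arima_model), which was current when the paper was written, and a small (p, d, q) grid; it only illustrates the AIC-based selection and does not reproduce the reference implementation.

    import itertools
    import warnings
    from statsmodels.tsa.arima_model import ARIMA

    def select_arima(series, max_p=5, max_d=2, max_q=5):
        """Grid-search (p, d, q) and keep the fit with the smallest AIC."""
        best_aic, best_order, best_fit = float("inf"), None, None
        for order in itertools.product(range(max_p + 1), range(max_d + 1), range(max_q + 1)):
            try:
                with warnings.catch_warnings():
                    warnings.simplefilter("ignore")
                    fit = ARIMA(series, order=order).fit(disp=0)
            except Exception:
                continue                      # many combinations fail to converge; skip them
            if fit.aic < best_aic:
                best_aic, best_order, best_fit = fit.aic, order, fit
        return best_order, best_fit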
2. The Vanilla RNN model

The comparison with the vanilla RNN model represents a machine learning-based comparison. The original RNN is covered briefly in Section 3.1. For comparison purposes, we implement two RNN reference models, one with a single hidden layer and the other with multiple hidden layers [13, 14].

3. The DGRU model

The comparison with the Deep Gated Recurrent Unit (DGRU) model represents a deep learning-based comparison, where DGRU is a counterpart of DLSTM. It is demonstrated that the GRU model is similar to the original LSTM model, with the exception that GRU includes only two gates rather than three [37]. The experiments of DGRU are typically similar to those of DLSTM.

4. The NEA model

The comparison with the Nonlinear Extension for linear Arps decline (NEA) model represents a hybrid-based comparison. NEA is a hybrid method that combines Decline Curve Analysis (DCA), which is a traditional statistical method, with the kernel machine, which is a machine learning method [28]. For comparison purposes, we rely on the results provided in [28] using the same datasets of both case studies described in this paper.

5. The HONN model

The comparison with the Higher-Order Neural Network (HONN) model is another machine learning-based comparison, where HONN is a feed-forward multilayer neural network model that employs what are called higher-order synaptic operations (HOSO). The HOSO of HONN embraces the linear correlation (conventional synaptic operation) as well as the higher-order correlation of neural inputs with synaptic weights [18]. For comparison purposes, we rely on the results introduced by the authors of [18], exclusively in the second case study, since they did not apply their method to the first case study described in this paper.

4.3. Forecasting accuracy measures

In the literature, two kinds of errors are usually measured in order to estimate the forecasting precision and evaluate the performance of the forecasts, namely, scale-dependent errors and percentage errors.

(1) Scale-dependent errors

These errors are on the same scale as the data itself. Therefore, as a limitation, the accuracy measures that are based directly on this error cannot be used to make comparisons between series that are on different scales. The best-known scale-dependent measure is based on the squared error, namely, the root mean square error (RMSE) [38], which can be given as follows:

    RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i^obs − y_i^pred)² )    (7)

where y_i^obs is the current observation and y_i^pred is its predicted value.

(2) Percentage errors

Percentage errors have the advantage of being scale-independent, and therefore are frequently used to compare forecast performance between datasets with different scales.
The most commonly used measure is the root mean square percentage error (RMSPE) [38], which can be given as follows:

    RMSPE = sqrt( (1/n) Σ_{i=1}^{n} ( (y_i^pred − y_i^obs) / y_i^obs )² ) × 100    (8)

It is clear that both measures are calculated by comparing the target values of the time series with the corresponding time series predictions. The results obtained using the two metrics differ in their calculated values, but the significance of each metric is similar in measuring the performance of the prediction models. Notably, since the production data present different scales in the majority of cases, it is preferable to rely on RMSPE, or any other percentage error measure, for estimating the relative error between different models [38].

5. Experimental Results

We proceed now to show the quantitative and visual results of the proposed DLSTM model along with the reference models for each case study. Notably, the results shown in all tables of this section indicate the performance of the corresponding model on the testing data rather than the training data. This has been done in concurrence with the widely demonstrated fact that the genuine evaluation of forecasting performance should be based on unseen data, not on the historical (training) data already seen by the model [39]⁴.

⁴ See section 3.4, pages 177-184.

5.1. Case study 1: Using production data of Block-1 of Huabei oil field in China

This case study includes raw data collected from Block-1 in the Huabei oilfield, which is located in north China [28]⁵. The dataset of this oilfield contains 227 observations of the oil production data, in which the first 182 observations (80% of the dataset) have been used to build, or train, the forecasting models, and the remaining 45 observations (20% of the dataset) have been used for testing the performance of the forecasting models.

⁵ The raw data are listed in Table 1 in [28].

The best performance results of the proposed DLSTM static scenario, DLSTM dynamic scenario, Single-RNN, Multi-RNN, and DGRU are shown separately in Tables 1, 2, 3, 4, and 5, respectively. Each of these five tables shows the values of each hyper-parameter, which has been optimally selected using the GA as described in Section 4.1.3. The relation between the original production data and their prediction by the DLSTM model is illustrated in Figures 4 and 5. Table 6 shows an overall comparison among these five models, along with the best parameter combination of the ARIMA method and the best performance results of the NEA model reported in [28]⁶ using the same dataset. The NEA results shown in Table 6 are imparted as they are given by the authors of [28], who did not consider the RMSE measure.

⁶ The relation between the original production and the prediction results of NEA is plotted in Fig. 1 of [28].

5.2. Case study 2: Using production data of Cambay Basin oil field in India

As in the previous case study, we examined the proposed model and the reference models using real production data collected over six years, from 2004 to 2009, i.e. about 63 months. This oilfield is located in the southwestern part of the Tarapur Block of the Cambay Basin, to the west of the Cambay Gas Field in India [18]⁷. This oilfield consists of a total of eight oil-producing wells that present a continuous production history.

⁷ The raw data are listed in Table 1b in [18].
Table 1: Best results of DLSTM with the static scenario

    No. of layers   No. of hidden units   No. of epochs   Lag   RMSE    RMSPE
    1               [4]                   953             5     0.234   3.337
    2               [4,2]                 787             5     0.227   3.253
    3               [5,4,2]               800             5     0.209   2.995

Table 2: Best results of DLSTM with the dynamic scenario

    No. of layers   No. of hidden units   No. of epochs   Lag   Update   RMSE    RMSPE
    1               [3]                   1352            3     1        0.267   3.783
    2               [4,5]                 1187            5     1        0.219   3.124
    3               [4,3,3]               403             5     2        0.257   3.637

Table 3: Best results of Single-RNN

    No. of units   No. of epochs   Lag   RMSE    RMSPE
    [4]            1890            5     0.233   3.290
    [5]            653             4     0.238   3.366
    [3]            431             4     0.263   3.740

Table 4: Best results of Multi-RNN

    No. of layers   No. of hidden units   No. of epochs   Lag   RMSE    RMSPE
    2               [2,4]                 1551            5     0.219   3.129
    2               [3,4]                 1913            5     0.239   3.387
    2               [2,2]                 787             5     0.247   3.530
    3               [5,5,4]               457             3     0.258   3.701
    3               [4,3,4]               1611            5     0.237   3.374

Table 5: Best results of DGRU

    No. of layers   No. of hidden units   No. of epochs   Lag   RMSE    RMSPE
    1               [4]                   1870            2     0.256   3.610
    2               [4,3]                 1011            6     0.237   3.391
    2               [5,3]                 1514            6     0.222   3.175
    3               [4,3,1]               354             6     0.263   3.734

Table 6: Overall comparison among ARIMA, NEA [28], RNN, DGRU, and DLSTM using the dataset of case study 1

    Forecasting Model   RMSE    RMSPE
    ARIMA               0.310   4.705
    NEA [28]            —       4.221
    DLSTM (static)      0.209   2.995
    DLSTM (dynamic)     0.219   3.124
    Single-RNN          0.233   3.290
    Multi-RNN           0.219   3.129
    DGRU                0.222   3.175

The authors in [28] and [18] considered only the cumulative oil production data from five wells out of these eight wells, implying the availability of five input series corresponding to the monthly production of the five oil wells, plus an output series corresponding to the cumulative production of this oilfield. The relationship between the five input series and the output series has been reported to be highly nonlinear [18].

Accordingly, and toward a fair evaluation, in the experiments of this case study we also consider the same cumulative data of the same five wells.
Fig. 4. Production data vs. prediction using DLSTM-static (case study 1)
Fig. 5. Production data vs. prediction using DLSTM-dynamic (case study 1)

We follow the same experimental scenario described in [28] and [18] by dividing the production dataset into two sets: the first set (70% of the dataset) is used to build the forecasting models, and the second set (30% of the dataset) is used for testing the performance of the forecasting models. The results of each model shown in this section are based on the testing data.

The best performance results of the proposed DLSTM static scenario, DLSTM dynamic scenario, Single-RNN, Multi-RNN, and DGRU are shown separately in Tables 7, 8, 9, 10, and 11, respectively. Each of these five tables shows the values of each hyper-parameter, which has been optimally selected using the GA as described in Section 4.1.3. The relation between the original production data and their prediction by the DLSTM model is illustrated in Figures 6 and 7. Table 12 shows an overall comparison among these five models, along with the best parameter combination of the ARIMA method and the best performance results of NEA reported in [28] using the same dataset. The NEA results shown in Table 12 are imparted as they are given by the authors of [28], who did not consider the RMSE measure.

This case study provides an extra comparison, where we compare the proposed DLSTM model with the HONN model [18], described in Section 4.2. In their paper, the authors used three measures to evaluate their model: MSE, RMSE, and MAPE. In the current paper, we have used the RMSE (the root of the MSE), as described in Section 4.3. Subsequently, in this comparison we calculate the MAPE measure for our model in order to compare with the MAPE results of HONN shown in [18]. The MAPE, as a percentage error measure, can be computed as follows:

    MAPE = (1/n) Σ_{i=1}^{n} ( |y_i^pred − y_i^obs| / y_i^obs ) × 100    (9)

Table 13 shows the comparison between the HONN model and the proposed DLSTM model based on the three measures. For the proposed DLSTM model, the best results of both scenarios (static and dynamic) are shown in Table 13. The authors of [18] used three different lags in their experiments, and the best result, as highlighted by them, was inferred using lag 1 [18]⁸; it is included in Table 13.

⁸ See Table 3 in [18].
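For reference, the three accuracy measures used in the comparisons, Eqs. (7)-(9), can be computed in a few lines of NumPy. The sketch below is an illustration of the formulas only; the function and argument names are ours.

    import numpy as np

    def rmse(y_obs, y_pred):
        """Root mean square error, Eq. (7)."""
        y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
        return np.sqrt(np.mean((y_obs - y_pred) ** 2))

    def rmspe(y_obs, y_pred):
        """Root mean square percentage error, Eq. (8)."""
        y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
        return np.sqrt(np.mean(((y_pred - y_obs) / y_obs) ** 2)) * 100

    def mape(y_obs, y_pred):
        """Mean absolute percentage error, Eq. (9)."""
        y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
        return np.mean(np.abs(y_pred - y_obs) / y_obs) * 100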

Table 7: Best results of DLSTM with the static scenario

    No. of layers   No. of hidden units   No. of epochs   Lag   RMSE    RMSPE
    1               [3]                   1700            3     0.025   3.496
    2               [1,1]                 2000            1     0.030   4.135
    3               [2,2,1]               2000            2     0.028   3.926

Table 8: Best results of DLSTM with the dynamic scenario

    No. of layers   No. of hidden units   No. of epochs   Lag   Update   RMSE    RMSPE
    1               [5]                   1259            6     4        0.029   4.219
    2               [2,5]                 1500            6     3        0.028   4.060
    3               [4,4,5]               1400            6     4        0.032   4.482

Table 9: Best results of Single-RNN

    No. of units   No. of epochs   Lag   RMSE    RMSPE
    [1]            1551            4     0.029   4.095
    [2]            1115            1     0.029   4.133
    [1]            953             2     0.030   4.174

Table 10: Best results of Multi-RNN

    No. of layers   No. of hidden units   No. of epochs   Lag   RMSE    RMSPE
    2               [5,1]                 1514            5     0.027   3.731
    2               [2,4]                 1551            5     0.028   4.125
    2               [2,2]                 787             3     0.030   4.196
    3               [1,1,3]               953             4     0.029   4.112
    3               [1,3,3]               953             2     0.031   4.353

Table 11: Best results of DGRU

    No. of layers   No. of hidden units in each layer   No. of epochs   Lag   RMSE    RMSPE
    1               [2]                                  787             2     0.029   4.125
    2               [3,1]                                431             4     0.030   4.207
    3               [1,3,5]                              354             6     0.029   4.035
    3               [4,3,5]                              354             6     0.028   3.991

Table 12: Overall comparison among ARIMA, NEA [28], RNN, DGRU, and DLSTM using the dataset of case study 2

    Forecasting Model   RMSE    RMSPE
    ARIMA               0.027   3.773
    NEA [28]            —       4.221
    DLSTM (static)      0.025   3.496
    DLSTM (dynamic)     0.028   4.060
    Single-RNN          0.029   4.095
    Multi-RNN           0.027   3.731
    DGRU                0.028   3.991

Table 13: Comparison between HONN [18] and DLSTM

    Forecasting Model   MSE     RMSE    MAPE
    HONN [18]           0.001   0.035   3.459
    DLSTM (static)      0.000   0.025   2.851
    DLSTM (dynamic)     0.000   0.028   2.976
Fig. 6. Production data vs. prediction using DLSTM-static (case study 2)
Fig. 7. Production data vs. prediction using DLSTM-dynamic (case study 2)

6. Results Analysis and Discussion

In this paper, we tried to ensure a genuine evaluation of the proposed model through five different types of comparison with state-of-the-art techniques using two real-world datasets. More than one standard optimality criterion is used to assess the performance of each model. It is widely demonstrated in the literature that percentage error measures are the most appropriate tool for assessing the performance of different forecasting models, since the percentage error can estimate the relative error between different models, particularly when the samples of the time series data have different scales [39]. Accordingly, in this section we discuss and analyze the results shown in the previous section, focusing on the percentage error measure of each model.

6.1. Case 1 versus Case 2

Although this is not a real comparison, since each case study has its own samples and source, we can note a few observations on both case studies. In case study 1, we can notice from Tables 1 and 2 that the best results of DLSTM are achieved using three LSTM layers and two LSTM layers in the static and dynamic scenarios, respectively. Also, in Tables 4 and 5 we can notice that the best results of Multi-RNN and DGRU are achieved using two layers in both cases. From the accumulated comparison in Table 6, it is clear that the values of the proposed DLSTM model are the global minimum values amongst the other reference models. In Tables 7, 8, 10, 11, 12, and 13 of the other case study, we notice the same pattern for all models, again with a superiority of DLSTM over the other models.

Although the DLSTM is the optimum among the other counterparts, it shows a slight variation in the hyper-parameter values between the two case studies, particularly in the parameter "number of layers". In our opinion, this variation in the best hyper-parameter values between the two case studies may be attributed to the larger number of data samples in case study 1 than in case study 2. In other words, DLSTM does not require a large number of layers when the dataset size is not large. Of course, as the number of data samples becomes bigger, the performance of DLSTM essentially becomes better [32].

6.2. DLSTM versus ARIMA

It is easy to notice in Table 6 for case study 1, where the dataset is large, that all the errors of DLSTM are smaller than those of the ARIMA algorithm. Specifically, ARIMA achieved a 4.7 minimum error, whereas DLSTM achieved 2.9. The same pattern for case study 2 is presented in Table 12, where ARIMA achieved a 3.8 minimum error, whereas DLSTM achieved 3.5.
In other words, the DLSTM model shows more efficiency than the ARIMA model in predicting the future oil production and in describing the typical tendency of the oil production, as shown in Figs. 4, 5, 6, and 7. In contrast, the values predicted by ARIMA are quite far away from the oil production points, where the difference between the two contenders approaches 2 points in the first case. We can attribute the weak performance of ARIMA to its linear nature, whereas the relationship between inputs and outputs in such production data is not linear. As a nonlinear model, DLSTM could smoothly describe the nonlinear relationship between inputs and outputs.

6.3. DLSTM versus Other Recurrent NNs

In this comparison, DLSTM is compared with its forefather, RNN, and its counterpart, DGRU, where the three contenders have the same origin and are classified as recurrent neural networks. It is easy to notice in Table 12 for case study 2 that DLSTM achieved 3.4 against 3.7 for Multi-RNN and 4.0 for DGRU; a similar pattern is observed for case study 1 in Table 6. However, the error differences are not so big among the three contenders, since all of them have a typical deep architecture, but still the proposed DLSTM model shows better performance than the others. Of course, as the size of the data becomes large, the performance of DLSTM is expected to be much better than RNN, but may be similar to DGRU.

6.4. DLSTM versus Reported Approaches

This is the most important comparison, between the proposed DLSTM model and other reported approaches, the NEA model [28] and the HONN model [18], since these three models are nonlinear and have different origins. For the NEA model, it is clear in Tables 6 and 12 that the DLSTM model outperforms the NEA model, with a difference approaching one point in case study 1. Namely, DLSTM achieved 2.9 against 4.2 achieved by NEA, whereas in case study 2 the DLSTM achieved 3.4 against 4.2 achieved by NEA. This indicates that the DLSTM model is more accurate than the NEA model in predicting the future oil production.

Superiority in performance is not the only advantage of DLSTM over NEA; the NEA performance is also evidenced to be highly dependent on the selection of several parameters, as explained by the authors of [28]. Among these parameters, the most important ones, which may affect the NEA performance, include: (i) the regularization parameter (γ), which controls the smoothness of the model, and (ii) the kernel parameter (σ) of the Gaussian kernel used in the NEA model. It is demonstrated by the authors in [28] that the NEA's performance is sensitive to the values of these two parameters. Accordingly, to investigate the performance of the NEA model in the prediction of oil production, several experiments should be conducted in order to find improved and suitable combinations of these parameters.

Furthermore, the behavior of these parameters in the training phase is totally reversed in the testing phase. For example, the training errors grow with larger (σ), whereas the testing errors decrease. The converse holds for the (γ) parameter, where training errors decrease with larger (γ) but the testing errors remain monotonic. If the designer is not aware of this relationship, larger values of (σ) will convert the model from nonlinear behavior to linear behavior [28]. In other words, the overall performance of NEA in the training phase and testing phase is not sufficiently harmonious and requires careful deliberation in the parameter selection.
For the HONN model [18], we should highlight that this model is similar to a traditional multilayer feed-forward neural network. The difference is that HONN employs what are called Higher-Order Synaptic Operations (HOSO). The HOSO of HONN embraces the linear correlation (conventional synaptic operation) as well as the higher-order correlation of neural inputs with synaptic weights. In [18], different HOSO have been applied up to the third order, where the first-order, the second-order, and the third-order synaptic operations are called the Linear Synaptic Operation (LSO), the Quadratic Synaptic Operation (QSO) and the Cubic Synaptic Operation (CSO), respectively [18]. The authors stated that the best HOSO operation is the third one (CSO).

It seems that the computation of HONN is complex, since the calculation of the activation function of the model is a combination of the conventional linear synaptic function plus the cubic synaptic operation. In addition, most of the parameters, such as the time lag and the number of neurons in the hidden layer, are adjusted manually or based on trial and error. This means that the parameter selection should be adjusted carefully to ensure accurate oil production forecasting.

Nevertheless, in Table 13 DLSTM continues to show better performance than HONN on the three error measures, particularly the percentage error measure. Namely, through the MAPE measure the DLSTM achieved 2.8 against 3.4 for HONN. In our perspective, the optimality of DLSTM's performance can be attributed to the recursive nature of DLSTM, against the feed-forward nature of HONN. Indeed, the recursive property ensures more accurate prediction, particularly when the dataset size becomes large.

7. Conclusion

In this paper, we developed a promising prediction model that can be used for the majority of time series forecasting problems. However, in this paper it is tested specifically in the case of petroleum time series applications. The proposed model is a deep architecture of the Long-Short Term Memory (LSTM) recurrent network, which we denote DLSTM. The paper empirically evidences that stacking more LSTM layers recovers the limitations of shallow neural network architectures, particularly when long-interval time series datasets are used. In addition, the proposed deep model can describe the nonlinear relationship between the system inputs and outputs, particularly given that the petroleum time series data are heterogeneous and full of complexity and missing parts.

Notably, in the two case studies described in this paper, the proposed model outperformed its counterparts, deep RNN and deep GRU. In addition, the performance of the proposed DLSTM is observed to be much better than that of the statistical ARIMA model. The most important comparisons are those conducted with two recently reported machine learning approaches, denoted NEA and HONN, where DLSTM outperformed both of them with a noticeable difference on the scale of two different percentage error measures.

The accurate prediction and learning performance shown in the paper indicate that the proposed deep LSTM model, and other deep neural network models, are eligible to be applied to nonlinear forecasting problems in the petroleum industry. In our future research plans, we will investigate the performance of DLSTM in other forecasting problems, especially when the problem includes multi-variable (multivariate) time series data.

Acknowledgements

The authors of this paper would like to express their thanks and gratitude to the Deanship of Scientific Research at King Faisal University, Saudi Arabia, for its moral and financial support of this work under research grant number 170069.
References

[1] J.G. De Gooijer, R.J. Hyndman, 25 years of time series forecasting, Int. J. Forecast. 22(3) (2006) 443-473.
[2] D.S. Poskitt, A.R. Tremayne, The selection and use of linear and bilinear time series models, Int. J. Forecast. 2(1) (1986) 101-114.
[3] H. Tong, Non-linear Time Series: A Dynamical System Approach, Oxford University Press, 1990.
[4] R.F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50(4) (1982) 987-1007.
[5] G. Zhang, B.E. Patuwo, M.Y. Hu, Forecasting with artificial neural networks: The state of the art, Int. J. Forecast. 14 (1998) 35-62.
[6] M. Hüsken, P. Stagge, Recurrent neural networks for time series classification, Neurocomputing 50 (2003) 223-235.
[7] J.S. Bayer, Learning Sequence Representations, Dissertation, Technische Universität München, 2015.
[8] R. Pascanu, T. Mikolov, Y. Bengio, On the difficulty of training recurrent neural networks, in: Proceedings of the 30th International Conference on Machine Learning (3) 28, 2013, pp. 1310-1318.
[9] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9(8) (1997) 1735-1780.
[10] I. Sutskever, Training Recurrent Neural Networks (Ph.D. thesis), University of Toronto, 2012.
[11] M. Längkvist, L. Karlsson, A. Loutfi, A review of unsupervised feature learning and deep learning for time-series modeling, Pattern Recognit. Lett. 42 (2014) 11-24.
[12] M. Hermans, B. Schrauwen, Training and analyzing deep recurrent neural networks, in: Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS) 1, 2013, pp. 190-198.
[13] R. Pascanu, C. Gulcehre, K. Cho, Y. Bengio, How to construct deep recurrent neural networks, in: Proceedings of the Second International Conference on Learning Representations (ICLR), 2014.
[14] P.E. Utgoff, D.J. Stracuzzi, Many-layered learning, Neural Comput. 14(10) (2002) 2497-2529.
[15] Q. Yang, X. Wu, 10 challenging problems in data mining research, Int. J. Inf. Technol. Decis. Making 5 (2006) 597-604.
[16] R. Mehrotra, R. Gopalan, Factors influencing strategic decision-making process for the oil/gas industries of UAE - A study, Int. J. Mark. Financial Management 5 (2017) 62-69.
[17] R.B.C. Gharbi, G.A. Mansoori, An introduction to artificial intelligence applications in petroleum exploration and production, J. Pet. Sci. Eng. 49 (2005) 93-96.
[18] N. Chithra Chakra, K.-Y. Song, M.M. Gupta, D.N. Saraf, An innovative neural forecast of cumulative oil production from a petroleum reservoir employing higher-order neural networks (HONNs), J. Pet. Sci. Eng. 106 (2013) 18-33.
[19] R. Nyboe, Fault detection and other time series opportunities in the petroleum industry, Neurocomputing 73(10-12) (2010) 1987-1992.
[20] L. Martí, N. Sanchez-Pi, J. Molina, A. Garcia, Anomaly detection based on sensor data in petroleum industry applications, Sensors 15 (2015) 2774-2797.
[21] J.D. Cryer, K.-S. Chan, Time Series Analysis, 2nd edition, Springer Texts in Statistics, Springer, New York, 2008.
[22] S.L. Ho, M. Xie, The use of ARIMA models for reliability forecasting and analysis, Comput. Ind. Eng. 35 (1998) 213-216.
[23] J. Choi, D.C. Roberts, E. Lee, Forecasting oil production in North Dakota using the seasonal autoregressive integrated moving average (S-ARIMA), Nat. Resour. 6 (2015) 16-26.
[24] A. Kamari, A.H. Mohammadi, M. Lee, A. Bahadori, Decline curve based models for predicting natural gas well performance, Petroleum 3 (2017) 242-248.
[25] I. Aizenberg, L. Sheremetov, L. Villa-Vargas, J. Martinez-Muñoz, Multilayer neural network with multi-valued neurons in time series forecasting of oil production, Neurocomputing 175 (2016) 980-989.
[26] S. Berneti, M. Shahbazian, An imperialist competitive algorithm artificial neural network method to predict oil flow rate of the wells, Int. J. Comput. Applic. 26 (2011) 47-50.
[27] Z. Liu, Z. Wang, C. Wang, Predicting reservoir production based on wavelet analysis-neural network, in: Advances in Computer Science and Information Engineering, Advances in Intelligent and Soft Computing 168, 2012.
[28] X. Ma, Predicting the oil production using the novel multivariate nonlinear model based on Arps decline model and kernel method, Neural Comput. Applic. 29 (2016) 1-13.
[29] S. Ben Taieb, G. Bontempi, A.F. Atiya, A. Sorjamaa, A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition, Expert Syst. Applic. 39 (2012) 7067-7083.
[30] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436-444.
[31] K. Greff, R.K. Srivastava, J. Koutník, B.R. Steunebrink, J. Schmidhuber, LSTM: A search space odyssey, IEEE Trans. Neural Netw. Learn. Syst. 28 (2017) 2222-2232.
[32] M. Hermans, B. Schrauwen, Training and analysing deep recurrent neural networks, in: Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS) 1, Dec 2013, pp. 190-198.
[33] S. Spiegel, J. Gaebler, A. Lommatzsch, E. De Luca, S. Albayrak, Pattern recognition and classification for multivariate time series, in: Proceedings of the Fifth International Workshop on Knowledge Discovery from Sensor Data, 2011, pp. 34-42.
[34] F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, C. Gagné, DEAP: Evolutionary algorithms made easy, J. Mach. Learn. Res. 13 (2012) 2171-2175.
[35] S. Seabold, J. Perktold, Statsmodels: Econometric and statistical modeling with Python, in: Proceedings of the 9th Python in Science Conference, 2010.
[36] E.J. Bedrick, C.L. Tsai, Model selection for multivariate regression in small samples, Biometrics 50 (1994) 226-231.
[37] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, in: Proceedings of the Deep Learning Workshop at NIPS, 2014.
[38] R.J. Hyndman, A.B. Koehler, Another look at measures of forecast accuracy, Int. J. Forecast. 22(4) (2006) 679-688.
[39] R.J. Hyndman, Measuring forecast accuracy, in: M. Gilliland, L. Tashman, U. Sglavo (Eds.), Business Forecasting: Practical Problems and Solutions, John Wiley & Sons, 2016, pp. 177-183.

Dr. Alaa Sagheer received his B.Sc. and M.Sc. in Mathematics from Aswan University, Egypt. He received his Ph.D. in Computer Engineering in the area of Intelligent Systems from the Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan, in 2007. After receiving the Ph.D., he served as an Assistant Professor at Aswan University. In 2010, Dr. Sagheer established and directed the Center for Artificial Intelligence and Robotics (CAIRO) at Aswan University. He also served as the Principal Investigator in CAIRO for several research and academic projects funded by different Egyptian governmental organizations. In 2013, Dr. Sagheer and his team won the first prize in a programming competition organized by the Ministry of Communication and Information Technology (MCIT), Egypt, for their system entitled "Mute and Hearing Impaired Education via an Intelligent Lip Reading System". In 2014, he was appointed as an Associate Professor at Aswan University. In the same year, Dr. Sagheer joined the Department of Computer Science, College of Computer Sciences and Information Technology, King Faisal University, Saudi Arabia. Dr. Sagheer's research interests include artificial intelligence, machine learning, pattern recognition, computer vision, and optimization theory. Recently, Dr. Sagheer extended his research interests to include quantum computing and quantum communication. He has authored and co-authored more than 40 research articles in his fields of interest. Dr. Sagheer is a member of the IEEE and the IEEE Computational Intelligence Society. He is a reviewer for several journals and conferences related to his research interests.

Mostafa Kotb received his B.S. degree in computer science from Aswan University, Aswan, Egypt, in 2012. He is now a master's student and a research assistant at the Center for Artificial Intelligence and Robotics (CAIRO), Aswan University. His research interests include artificial intelligence, machine learning, and deep learning.
