

Journal of Hydrology 558 (2018) 647–658

Contents lists available at ScienceDirect

Journal of Hydrology
journal homepage: www.elsevier.com/locate/jhydrol

Research papers

Prediction of GWL with the help of GRACE TWS for unevenly spaced time series data in India: Analysis of comparative performances of SVR, ANN and LRM
Amritendu Mukherjee ⇑, Parthasarathy Ramachandran
Indian Institute of Science, Bangalore 560012, India

Article info

Article history:
Received 5 October 2017
Received in revised form 29 January 2018
Accepted 5 February 2018
Available online 9 February 2018
This manuscript was handled by Emmanouil Anagnostou, Editor-in-Chief, with the assistance of Viviana Maggioni, Associate Editor

Keywords:
Groundwater level prediction
GRACE gravitational anomalies
Linear regression model
Support vector regression
Artificial neural network

Abstract

Prediction of Ground Water Level (GWL) is extremely important for the sustainable use and management of groundwater resources. The motivation for this work is to understand the relationship between Gravity Recovery and Climate Experiment (GRACE) derived Terrestrial Water Storage change (DTWS) data and GWL, so that DTWS could be used as a proxy measurement for GWL. In our study, we have selected five observation wells from different geographic regions in India. The datasets are unevenly spaced time series, which restricts us from applying standard time series methodologies. Therefore, in order to model and predict GWL with the help of DTWS, we have built a Linear Regression Model (LRM), Support Vector Regression (SVR) and an Artificial Neural Network (ANN). Comparative performances of LRM, SVR and ANN have been evaluated with the help of the correlation coefficient (ρ) and Root Mean Square Error (RMSE) between the actual and fitted (for the training dataset) or predicted (for the test dataset) values of GWL. It has been observed in our study that DTWS is a highly significant variable for modelling GWL, and the share of total variation in GWL that could be explained with the help of DTWS varies from 36.48% to 74.28% (0.3648 ≤ R² ≤ 0.7428). We have found that for the model GWL ~ DTWS, for both training and test datasets, SVR and ANN perform better than LRM in terms of ρ and RMSE. It has also been found in our study that with the inclusion of meteorological variables along with DTWS as input parameters to model GWL, the performance of SVR improves and it performs better than ANN. These results imply that DTWS could be very useful for modelling irregular time series GWL data.
© 2018 Elsevier B.V. All rights reserved.

1. Introduction & motivation

The growth and sustainability of human civilization has depended greatly on the availability of water. Groundwater is the source of one third of all freshwater withdrawals and supplies around 36%, 42% and 27% of water for domestic, agricultural and industrial purposes respectively (Taylor et al., 2013). Groundwater is a renewable resource and requires proper management. Demand for groundwater has been increasing due to various factors, and efficient management is required to ensure long-term supply of groundwater (Mays, 2013). According to the Alicante Declaration (resulting from the International Symposium on Groundwater Sustainability (ISGWAS), held in Alicante, Spain from 23rd to 27th January, 2006), availability of groundwater depends on responsible use and governance. The current situation of Ground Water Level (GWL) is extremely alarming around the globe. Reduction of groundwater storage, i.e., groundwater depletion, has been considered a global problem that threatens the sustainability of water supply (Mays, 2013). For the time period of 1900 to 2008, the estimated depletion of global groundwater is 4500 km³ (equivalent to 12.6 mm sea-level rise), and the maximum rate of depletion occurred during the period 2000 to 2008, with an average rate of 145 km³/year (equivalent to 0.4 mm/year sea-level rise) (Konikow, 1900). India is the largest groundwater user in the world. Its estimated usage of groundwater is around 230 km³ per year, which is more than 25% of the global total¹. Groundwater has been very important for India to maintain its economy, environment and standard of living, as more than 60% of irrigated agriculture and 85% of drinking water supply depend on groundwater (Garduño et al., 2011). Rodell et al. (2009) have shown in their study that in India, during the period from August 2002 to October 2008, the mean rate of groundwater depletion was 4.0 ± 1.0 cm/year equiva-

⇑ Corresponding author.
E-mail address: [email protected] (A. Mukherjee).
1 World Bank, 2010, Deep wells and prudence: towards pragmatic action for addressing groundwater overexploitation in India. Washington, DC: World Bank.

https://doi.org/10.1016/j.jhydrol.2018.02.005
0022-1694/© 2018 Elsevier B.V. All rights reserved.

lent height of water (17.7 ± 4.5 km³/year) for Rajasthan, Punjab, Haryana and Delhi, and the total groundwater depletion during this study period (August 2002 to October 2008) for these regions was equivalent to 109 km³ of water. During the 30 years between 1980 and 2010, major areas of India experienced a substantial decline of groundwater level (depth to water, measured in meters below ground level), varying from 4 meters to 16 meters (Sekhri et al., 2013). Apart from irrigation, urbanisation and climate change are major factors that affect groundwater storage (Mays, 2013; Garduño et al., 2011; Schewe et al., 2014; Islam et al., 2012; Slavkov et al., 2013; Teutschbein et al., 2015). In the last decade, India has observed a considerable decline in groundwater level. About 65% of wells in India showed a decline in groundwater level in January 2016 compared to the decadal mean of groundwater level for January (from 2006 to 2015)².

GWL is measured in terms of depth to groundwater from the land surface and is a measurement from observation wells situated at spatially discrete points. Also, GWL gives an idea of the water level but not of its volume. Variations of Terrestrial Water Storage (DTWS), derived from Gravity Recovery and Climate Experiment (GRACE) satellite data, have been used extensively by researchers to understand groundwater storage conditions and trends (Sun, 2013; Feng et al., 2013; Shamsudduha et al., 2012; Panda and Wahr, 2015) (all reported DTWS data are anomalies relative to the 2004–2009 time-mean baseline³).

Rodell and Famiglietti (2002) discussed the potential usage of GRACE data for monitoring variations in groundwater storage. For the time period January 2002 to July 2005, groundwater storage changes in the Mississippi River basin, USA, have been estimated using GRACE data by Rodell et al. (2007). In their study they have shown the importance of groundwater storage assessment from GRACE data. For the California Central Valley, USA, Scanlon et al. (2012) have estimated groundwater storage changes from GRACE data from April 2006 to September 2009. Dall et al. (2014) have modelled and analysed the trends of Groundwater Depletion (GWD) and Terrestrial Water Storage (TWS), derived from GRACE satellite data, on a global scale. They have shown in their study that the highest GWD rates in the first decade of the 21st century occurred in India, the United States, Iran, Saudi Arabia and China. Also, they have found that the rate of global GWD has likely more than doubled since the period 1960–2000. Chinnasamy and Agoramoorthy (2015) have studied the impact of irrigation on groundwater by analysing groundwater storage and depletion trends with the help of Gravity Recovery and Climate Experiment (GRACE) and Global Land Data Assimilation System (GLDAS) data in Tamil Nadu State, India, for the time period 2002 to 2012. They have used descriptive statistical analysis in their study. Panda and Wahr (2015) have studied variations of Terrestrial Water Storage (DTWS) and Ground Water Storage (DGWS) data, derived from Gravity Recovery and Climate Experiment (GRACE) satellite data, for the period of January 2003 to May 2014 in India. They have found that substantial GWS depletion has taken place in the northern part of the country, particularly in the Ganges Basin and Punjab state, with depletion rates of 1.25 cm year⁻¹ and 2.1 cm year⁻¹ respectively.

As GRACE derived DTWS requires adjustments for other components and involves errors due to the statistical downscaling methodology (Rodell and Famiglietti, 2002), it is not an exact measurement of Ground Water Storage. Also, GWL is measured in meters below ground level (mbgl) from observation wells that are spatially distributed and situated at discrete geographic locations, whereas DTWS is a continuous measurement in space. Therefore, it is extremely important to study and understand the degree of correlation between DTWS and GWL if we want to use DTWS as a proxy measurement for GWL.

Feng et al. (2013) have used GRACE data to estimate changes in groundwater levels in the North China region for the period 2003 to 2010. In the Bengal Basin of Bangladesh, Shamsudduha et al. (2012) have shown in their study that GRACE datasets of groundwater storage changes (DGWS) have a strong correlation (0.77 ≤ r ≤ 0.93) with in situ borehole records and account for 44% of the total variation in Terrestrial Water Storage (DTWS). Sun (2013) has predicted groundwater level changes from Terrestrial Water Storage change (DTWS), provided by GRACE satellite data, with the help of an Artificial Neural Network (ANN) for different regions in the United States of America. In the study conducted by Panda and Wahr (2015), it has been observed that there exists a high degree of correlation between GRACE derived Ground Water Storage and in situ groundwater levels from observation wells.

Different statistical and machine learning methodologies have been applied to predict groundwater storage and to understand the impact of different variables on groundwater storage. Among the statistical learning methodologies, Linear & Non Linear Regression and Correlation Analysis (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Mirzavand and Ghazavi, 2015; Dall et al., 2014; Azadeh et al., 2011; Shamsudduha et al., 2012; Chinnasamy and Agoramoorthy, 2015; Panda and Wahr, 2015) and Time Series Models (ARMA⁴, ARIMA⁵, SARIMA⁶ etc.) (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Adamowski and Chan, 2011; Dall et al., 2014; Al-Zahrani and Abo-Monasar, 2015; Arandia et al., 0401; Shirmohammadi et al., 2013) have been widely used. Among other machine learning methodologies, Artificial Neural Networks (ANN) (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Adamowski and Chan, 2011; Al-Zahrani and Abo-Monasar, 2015; Dos Santos and Pereira, 2014; Moosavi et al., 2013; Emamgholizadeh et al., 2014; Azadeh et al., 2011; Sun, 2013; He et al., 2014; Mohanty et al., 2015; Karthikeyan et al., 2013; Daliakopoulos et al., 2005; Yoon et al., 2011), Wavelet ANN (WA-ANN) (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Adamowski and Chan, 2011; Moosavi et al., 2013; He et al., 2014; Tiwari and Adamowski, 2014), Adaptive Neuro-Fuzzy Inference System (ANFIS) (Moosavi et al., 2013; Emamgholizadeh et al., 2014; Shirmohammadi et al., 2013), Wavelet-ANFIS (Moosavi et al., 2013) and Support Vector Regression (SVR) (Yoon et al., 2011) have been used extensively by different research groups. Many of these studies have also tried to evaluate the relative performances (mainly in terms of prediction) of different methodologies (Moosavi et al., 2013; Daliakopoulos et al., 2005; He et al., 2014; Karthikeyan et al., 2013; Awchi, 2014; Yoon et al., 2011; Adamowski et al., 2012; Adamowski and Chan, 2011; Shirmohammadi et al., 2013).

From the literature on trend analysis and forecasting methodologies in the water resource research area, we could observe that temperature and precipitation (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Azadeh et al., 2011; Panda and Wahr, 2015; Al-Zahrani and Abo-Monasar, 2015; Shirmohammadi et al., 2013; Haque et al., 2014; Dos Santos and Pereira, 2014; Moosavi et al., 2013; Sun, 2013; Karthikeyan et al., 2013; Yoon et al., 2011; Tiwari and Adamowski, 2014) have been consistently used as explanatory meteorological variables for both statistical and probabilistic models and also for other machine learning techniques. In some studies, other meteorological variables like humidity, wind speed, tide level etc. (Yoon et al., 2011; Al-Zahrani and Abo-Monasar, 2015; Dos Santos and Pereira, 2014)

2 Ground Water Scenario In India, January 2016, Central Ground Water Board, Ministry Of Water Resources, Government Of India.
3 Source: https://grace.jpl.nasa.gov/data/get-data/monthly-mass-grids-land/.
4 ARMA: Auto Regressive Moving Average.
5 ARIMA: Auto Regressive Integrated Moving Average.
6 SARIMA: Seasonal Auto Regressive Integrated Moving Average.

along with temperature and precipitation have also been used as explanatory variables.

In the literature that we have covered in the groundwater research area, we have not been able to find any study that focuses on the accuracy of DTWS for measuring GWL in India. This serves as the primary motivation for this paper, which is to establish the relationship between GRACE derived DTWS and GWL measurements of observation wells for different geographic regions in India. Also, we would like to compare the performances of different machine learning methodologies (Linear Regression Model-LRM, Support Vector Regression-SVR and Artificial Neural Network-ANN), both in terms of modelling and prediction of GWL, with the help of DTWS and other meteorological variables like Temperature, Precipitation, Wind, Humidity etc.

2. Data & study area

India is the seventh largest country of the world, covering an area of 32,87,263 sq km. The latitudes of India extend from 8°4′ North to 37°6′ North and the longitudes from 68°7′ East to 97°25′ East. Geologically, India can broadly be classified into three major regions, namely the Himalayas & associated group of mountains, the Indo-Gangetic Plain and the Peninsular Shield. India consists of 29 states and 7 union territories. The climate of India can be described as tropical monsoon type. Average maximum temperature across India varies from 24.5 °C (in January-February) to 31.5 °C (in March-May), whereas average minimum temperature varies from 13.85 °C (in January-February) to 23.27 °C (in June-September)⁷. Average rainfall in India ranges from 41.87 mm (in January-February) to 887.48 mm (in June-September)⁸.

We have selected five different sites (Table 1) from different geographic regions of India to study the relationship between GWL and DTWS. These sites have been selected widely apart in order to avoid any interrelation between the sites, so that observations for each site would be independent of each other. Coastal areas have been avoided, as for coastal areas other meteorological factors like tide level could affect GWL (Yoon et al., 2011).

From the central part of India, we have selected Mhow, which is near Indore, the most populous and largest city of the state of Madhya Pradesh, and is situated on the southern edge of the Malwa plateau. From the Jashpur district of Chhattisgarh, we have selected Kotba. This region is hilly and contains forest area.

Panitola from the Tinsukia district of Assam has been selected as another site. This region is located in the north east part of the country and includes several rivers and reserve forests. This area is situated on the Brahmaputra River basin (Fig. 1).

We have selected Sathamba as another site from the Sabarkantha district of Gujarat state; this area is located in the western part of India.

From the southern part of the country, we have selected Sirigeri from the Bellary district of Karnataka. This area is situated in the Deccan Plateau of southern India and is endowed with rich mineral resources. Tungabhadra is the main river in this region.

The Central Groundwater Board⁹ maintains a database of well observation GWL data, measured in meters below ground level, from a network of over 22000 observation wells¹⁰ across the country. GWL data for the selected observation sites in this study have been downloaded from the Water Resources Information System of India database¹¹ for the study period from January 2005 to December 2013. This well observation GWL data is available for four seasons across the year. These seasons are Post-monsoon Rabi (January to March), Pre-monsoon (April to June), Monsoon (July to September) and Post-monsoon Kharif (October to December). For the observation sites, this GWL data has not been available for all months of the mentioned seasons of the year. Some of the sites have multiple observations in a season, whereas some sites have only one observation in a season. Also, this data is unevenly spaced in time, i.e., the time gaps between two consecutive observations are not equal. The total number of observations also varies across sites and ranges from 35 (for Sathamba) to 67 (for Mhow). The observed GWL data for all selected sites are shown in Fig. 2.

Gravity Recovery and Climate Experiment (GRACE) is a joint mission, launched in March 2002 by NASA¹² and DLR¹³. The main objective of the mission has been to accurately measure Earth's gravitational field at monthly intervals. The GRACE mission consists of twin satellites (220 km apart from each other at 500 km altitude). The distance between these twin satellites is affected by spatio-temporal variation of Earth's gravitational field. The on-board K-Band microwave ranging system measures this inter-satellite distance. This measurement, together with other ancillary data, provides a measurement of Earth's gravity field. The variations of this gravity field are mainly caused by changes in Terrestrial Water Storage (TWS) (Syed et al., 2008; Scanlon et al., 2012; Rodell et al., 2007). TWS is a measurement that integrates Ground Water Storage (GWS), Soil Moisture (SM), Canopy Water Storage (CWS), Snow, Ice and Water in biomass (Panda and Wahr, 2015; Sun, 2013). GRACE derived monthly DTWS estimates (anomalies relative to the 2004–2009 time-mean baseline) have been available on the ftp site of the NASA Jet Propulsion Laboratory (JPL)¹⁴ since April 2002.

We have also collected monthly water content data from GLDAS¹⁵, which includes snow content, total soil moisture at 4 layers and canopy water storage. This data does not include groundwater and surface water content. Like the DTWS data, the GLDAS water content data are anomalies relative to the January 2003 to December 2007 time-averaged baseline. We have downloaded this data from the ftp site of the NASA Jet Propulsion Laboratory¹⁶.

Both GRACE DTWS and GLDAS water content data are expressed in terms of equivalent liquid water thickness (in cm). As both GRACE DTWS and GLDAS water content data are available on a 1° resolution grid, we have collected DTWS and GLDAS water content data for the latitude and longitude grid cell that encompasses each observation site. For example, for the site Panitola (Latitude: 27.49° North and Longitude: 95.26° East), DTWS and GLDAS water content data have been collected for the 1° resolution grid cell whose latitude covers 26.5° North to 27.5° North and whose longitude covers 94.5° East to 95.5° East. Also, DTWS and GLDAS water content data have been collected only for those months for which well observation data exist.

We have considered Temperature (both Maximum and Minimum), Precipitation, Wind and Humidity as other meteorological covariates along with DTWS to model GWL. Inclusion of Wind and Humidity as covariates to model and predict GWL is based on the assumption that these variables could impact groundwater demand, which in turn may influence groundwater level. For this purpose, we have collected meteorological data from Global
7 Ministry of Earth Sciences, India Meteorological Department (IMD); Time Period: 1901–2015.
8 Ministry of Earth Sciences, India Meteorological Department (IMD); Time Period: 1901–2013.
9 http://www.cgwb.gov.in.
10 Ground Water Scenario In India, January 2016, Central Ground Water Board, Ministry Of Water Resources, Government Of India.
11 http://www.india-wris.nrsc.gov.in; accessed on May 27, 2015.
12 National Aeronautics and Space Administration.
13 German Aerospace Centre: Deutsches Zentrum für Luft- und Raumfahrt.
14 ftp://podaac-ftp.jpl.nasa.gov/allData/tellus/L3/land_mass/RL05.
15 GLDAS: Global Land Data Assimilation System - https://grace.jpl.nasa.gov/data/get-data/land-water-content/.
16 ftp://podaac-ftp.jpl.nasa.gov/allData/tellus/L3/gldas_monthly/netcdf/.

Table 1
Details of Selected Data Points for Study.

| Name of the Site | State | District | Latitude | Longitude | Principal Aquifer Systems¹ | Average GWL² (in mbgl) |
|---|---|---|---|---|---|---|
| Mhow | Madhya Pradesh | Indore | 22.55° North | 75.76° East | Basalt | 5.25 |
| Kotba | Chhattisgarh | Jashpur | 22.42° North | 83.75° East | Banded Gneissic Complex | 3.73 |
| Panitola | Assam | Tinsukia | 27.49° North | 95.26° East | Alluvium | 2.79 |
| Sathamba | Gujarat | Sabar Kantha | 23.18° North | 73.33° East | Alluvium | 9.54 |
| Sirigeri | Karnataka | Bellary | 15.44° North | 76.84° East | Banded Gneissic Complex | 3.01 |

¹ Source: Aquifer Systems of India – http://cgwb.gov.in/AQM/.
² Average GWL is the average value of GWL, measured in meters below ground level (mbgl), for the time period from 1996 to 2016. Data Source: Central Ground Water Board – http://www.cgwb.gov.in/GW-data-access.html.
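As an illustration of how a site in Table 1 maps to its 1° grid cell, the following minimal Python sketch reproduces the Panitola example from Section 2. It assumes that grid cells are centred on whole degrees, which is inferred only from that single example; the function names `grid_cell_bounds` and `cell_contains` are hypothetical and are not part of the authors' workflow.

```python
import math


def grid_cell_bounds(lat, lon, resolution=1.0):
    """Return (lat_min, lat_max, lon_min, lon_max) of the grid cell holding a site.

    Assumption: cells are centred on multiples of `resolution`
    (whole degrees for a 1 degree grid), as the Panitola example suggests.
    """
    half = resolution / 2.0
    # Nearest cell centre; floor(x + 0.5) avoids Python's round-half-to-even.
    lat_centre = math.floor(lat / resolution + 0.5) * resolution
    lon_centre = math.floor(lon / resolution + 0.5) * resolution
    return (lat_centre - half, lat_centre + half,
            lon_centre - half, lon_centre + half)


def cell_contains(bounds, lat, lon):
    """Check whether a point (e.g. a weather station) lies inside the cell."""
    lat_min, lat_max, lon_min, lon_max = bounds
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


# Panitola coordinates as listed in Table 1.
panitola_cell = grid_cell_bounds(27.49, 95.26)
print(panitola_cell)
```

The same bounds can then be used to select which gridded values or station records belong to a site.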

Weather Data site¹⁷. These meteorological data have also been collected for the same 1° resolution grid used for GRACE DTWS and GLDAS water content data. To continue with the earlier example, for Panitola (Latitude: 27.49° North and Longitude: 95.26° East), meteorological data have been collected for the 1° resolution grid cell whose latitude covers 26.5° North to 27.5° North and whose longitude covers 94.5° East to 95.5° East. Once collected, the data for each meteorological covariate are averaged over all weather stations that lie within this 1° resolution grid cell. For Panitola, for example, there exist 9 weather stations within the grid cell, and data for all variables (Maximum Temperature, Minimum Temperature, Precipitation, Wind and Humidity) have been averaged over all 9 weather stations. Maximum and Minimum Temperature are measured in °Celsius. Precipitation, Humidity and Wind are measured in millimetres (mm), as a fraction and in meters/s respectively.

We have considered the time period for the study from January 2005 to December 2013.

For the purpose of processing GRACE DTWS data and GLDAS water content data, we have used the Ncview¹⁸ software and the R package ncdf4¹⁹, as both data files are in netCDF format.

Fig. 1. Study Area: Observation Sites in India.

3. Methodology

We have collected GWL data for the selected sites, and the data has not been available for all months in a year. Also, the dataset is an unevenly spaced time series, as the time gap between two consecutive data points is not the same across the time period. This restricts us from applying standard statistical methodologies like Time Series Analysis (ARMA, ARIMA, SARIMA etc.), which require equal time gaps between adjacent data points across the time period. Therefore, we have used the Linear Regression Model (LRM), Support Vector Regression (SVR) and Artificial Neural Network (ANN) in our study, as all these methodologies can be applied to the available data without such restrictions. Also, these methods have been widely employed in the previous literature (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Mirzavand and Ghazavi, 2015; Dall et al., 2014; Shamsudduha et al., 2012; Chinnasamy and Agoramoorthy, 2015; Panda and Wahr, 2015; Adamowski and Chan, 2011; Al-Zahrani and Abo-Monasar, 2015; Dos Santos and Pereira, 2014; Moosavi et al., 2013; Emamgholizadeh et al., 2014; Azadeh et al., 2011; Sun, 2013; He et al., 2014; Mohanty et al., 2015; Karthikeyan et al., 2013; Daliakopoulos et al., 2005; Yoon et al., 2011).

3.1. Linear Regression Model (LRM)

Let $Y$ be the dependent variable and $X$ be the independent variables; then in the Linear Regression Model (LRM)²⁰, the conditional distribution of $Y$ given $X$ is $Y \mid X \sim N(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k, \sigma^2)$. Also, the expectation ($E[Y \mid X]$) and variance ($V[Y \mid X]$) of $Y$ given $X$ are $E[Y \mid X] = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$ and $V[Y \mid X] = \sigma^2$ respectively, where $\beta_j$ is the expected change in the dependent variable $Y$ for every unit increase in the independent variable $X_j$, given all other information fixed, and $\sigma^2$ is the inherent variability in the process. The parameters $\beta$ and $\sigma^2$ are estimated through Maximum Likelihood Estimation (MLE). The likelihood function of $\beta$ and $\sigma^2$ for given $Y$ and $X$ is

$$L(\beta, \sigma^2 \mid X, Y) = \frac{1}{(2\pi)^{n/2} (\sigma^2)^{n/2}} \exp\left(-\frac{\epsilon^T \epsilon}{2\sigma^2}\right)$$

where $\epsilon$ is the error value (the difference between observed and predicted values of $Y$), having a Gaussian distribution with mean 0 and variance $\sigma^2$ ($\epsilon \sim N(0, \sigma^2)$), and $n$ is the number of training examples. From the likelihood function, the log-likelihood function can be obtained as

$$l(\beta, \sigma^2 \mid X, Y) = -\frac{n}{2} \log 2\pi - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta).$$

Differentiating this log-likelihood function with respect to $\beta$ and $\sigma^2$ and equating the derivatives to 0 gives the Maximum Likelihood Estimates of $\beta$ and $\sigma^2$ respectively:

$$\nabla_{\beta}\, l(\beta, \sigma^2 \mid X, Y) = 0 \Rightarrow \hat{\beta}_{MLE} = (X^T X)^{-1} X^T Y$$

and

$$\nabla_{\sigma^2}\, l(\beta, \sigma^2 \mid X, Y) = 0 \Rightarrow \hat{\sigma}^2_{MLE} = \frac{1}{n} (Y - X\hat{\beta}_{MLE})^T (Y - X\hat{\beta}_{MLE}).$$
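To make the closed-form estimates of Section 3.1 concrete, the sketch below works through the single-predictor special case (an intercept plus one covariate, as in a model of the form GWL ~ DTWS). It is a minimal illustration in plain Python, not the authors' R implementation; the function name `fit_simple_lrm` and the sample values are hypothetical. For one predictor, $(X^T X)^{-1} X^T Y$ reduces to the familiar slope and intercept formulas used here, and the variance is reported both as SSE/n (the MLE) and as SSE/(n − p) with p = 2.

```python
def fit_simple_lrm(x, y):
    """Closed-form fit of y = b0 + b1 * x + eps, with eps ~ N(0, sigma2).

    Returns the coefficient estimates and two variance estimates:
    SSE / n (maximum likelihood) and SSE / (n - p) with p = 2 parameters.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is not identifiable")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_mean - b1 * x_mean      # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    p = 2                          # unknown coefficients: b0 and b1
    return {
        "b0": b0,
        "b1": b1,
        "sigma2_mle": sse / n,
        "sigma2_unbiased": sse / (n - p),
    }


# Hypothetical changes in DTWS (cm) and GWL (m); for illustration only.
dtws_change = [-3.0, -1.0, 0.0, 2.0, 4.0]
gwl_change = [1.6, 0.4, 0.1, -0.9, -2.2]
print(fit_simple_lrm(dtws_change, gwl_change))
```

With more than one covariate, the same estimates would normally be obtained from a linear algebra routine rather than from these scalar formulas.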

17 http://globalweather.tamu.edu.
18 http://meteora.ucsd.edu/pierce/ncview_home_page.html.
19 https://cran.r-project.org/web/packages/ncdf4/ncdf4.pdf.
20 Neter, John, et al. Applied Linear Statistical Models. Fifth Edition.

Fig. 2. GWL and TWS for all observation sites.



Also the Uniformly Minimum Variance Unbiased Estimator tests of standardized residuals in Linear Regression Model, package
b UMVUE ¼ b
(UMVUE) for b and r2 are b c2 UMVUE ¼ SSE respec-
b MLE and r e107129 for modelling Support Vector Regression and package neu-
ðnpÞ
T
ralnet30 for Artificial Neural Network modelling). Python package
b MLE Þ ðY  X b
tively; where SSE ¼ ðY  X b b MLE Þ and p is the number of scikit-learn31 has also been used to build SVR and ANN models.
unknown parameters b. For all three methodologies used (LRM, SVR and ANN), we have
taken differences of GWL between two consecutive observations as
3.2. Support Vector Regression (SVR) output variable. As primary input parameter we have considered
differences of GRACE derived DTWS between two consecutive
For Support Vector Machine (SVM) Classification problem, the observations. Other associated input parameters considered are
goal is to find a hyperplane that separates different example Maximum Temperature, Minimum Temperature, Precipitation,
classes with maximum margin and for Support Vector Machine Wind and Humidity. Input variables and model structures have
Regression (SVR) problem, the goal is to construct a hyperplane been discussed in details in the following section.
that lies close to as many training data points as possible21.
For a set of N examples of fxk ; yk gNk¼1 ; x 2 Rm ; y 2 R,where x is
an input vector with m components and y is the corresponding 4. Model development
output value, the SVM estimator (f) on Regression can be expressed
as f ðxÞ ¼ w  /ðxÞ þ b, where w is weight vector and b is the bias. 4.1. Input variables & model structure
/ðÞ is the transfer function that maps input vectors to a high
dimensional feature space where simple linear regression method For all selected sites, we have taken differences of GWL, DTWS
can be applied. Optimization problem to solve this equation22 and GLDAS water content between two consecutive observation.
becomes This difference of GWL, DTWS and GLDAS water content between
XN two consecutive observation represents the changes in GWL and
1
minimize kwk2 þ C ðf þ f Þ corresponding changes in DTWS and GLDAS water content. We

w;b;f;f 2 k¼1 have considered these differences of GWL, DTWS and GLDAS water
subject to yk  wT /ðxk Þ  b 6  þ fk ; wT /ðxk Þ þ b  yk 6  þ fk content to be included in the model as the changes in DTWS and
ð1Þ GLDAS water content should relate to the corresponding changes
in GWL.
fk ; fk P 0; k ¼ 1; 2; . . . :N
As shown in the table (Table 2), the model input for GWL, DTWS
where f and f are slack variables that penalizes training errors over and GLDAS Water Content for an observation time t n , are
error tolerance . C determines the trade off between model com- ðGWLtn  GWLtn1 Þ; ðTWStn  TWStn1 Þ & ðGLDAStn  GLDAStn1 Þ
plexity and degree to which deviations larger than  are tolerated respectively. Where, GWLtn and GWLtn1 are observed GWL for asso-
in the optimization problem. ciated observation time tn and t n1 , similarly TWStn & TWStn1 are
Support Vectors are the input vectors (having non-zero Lagran- observed DTWS for the observation time tn and t n1 and GLDAStn
gian multiplier and satisfies KKT23 condition) that support the & GLDAStn1 are observed GLDAS Water Content data for observa-
structure of the estimator22 (Yoon et al., 2011). Kernel functions (K tion time t n and t n1 .
(xi ; xj Þ ¼ /ðxi Þ  /ðxj Þ; where /ðÞ is the transfer function) that are From this point onwards, GWL, DTWS and GLDAS water content
used in SVR are in general inner product kernel functions such as Polynomial ($K(x, x_i) = (x^T x_i + 1)^p$), RBF[24] ($K(x, x_i) = \exp\left(-\frac{1}{2\sigma^2}\lVert x - x_i\rVert^2\right)$), Sigmoid ($K(x, x_i) = \tanh(\beta_0 x^T x_i + \beta_1)$) etc.

3.3. Artificial Neural Network – ANN

As its name suggests, the Artificial Neural Network is modelled on the biological nervous system. Input, hidden and output layers with their nodes and activation functions are the basic elements of a generalized ANN structure[25]. In general, $a^{m+1} = f^{m+1}(W^{m+1} a^{m} + b^{m+1})$ for $m = 0, 1, \ldots, (M-1)$; $a^{0} = p$; $a = a^{M}$; where $p$ is the vector of input variables and $a$ is the vector of network outputs. The number of layers in the network structure is $M$; $f$, $b$ and $W$ are the activation function, bias and weight respectively. Back-Propagation[25] and Resilient Back-Propagation (Rprop)[26] are efficient and widely used algorithms to train an ANN model.

For the purpose of building and verifying the LRM, SVR and ANN models as described in the above sections, we have used R software[27] and different R packages (Package nortest[28] for normality

would refer to the differences of GWL, DTWS and GLDAS water content between two consecutive observations as explained above. It has been found that there exists a very high level of correlation between DTWS and GLDAS water content for all selected sites (Table 3).

As DTWS and GLDAS Water Content are highly correlated for all selected sites, like previous studies (Rodell et al., 2009; Sun, 2013), we have also not included GLDAS Water Content in our models. GWL, DTWS, $(GWL_{t_n} - GWL_{t_{n-1}})$ and $(TWS_{t_n} - TWS_{t_{n-1}})$ data for all observation sites are shown in Fig. 2.

In order to study the relationship between GWL and DTWS, we have built two sets of models. In the first set of models, we have used only DTWS as the explanatory variable and GWL as the dependent variable (GWL ~ DTWS) to build Linear Regression Model (LRM), Support Vector Regression (SVR) and Artificial Neural Network (ANN) models. For the second set of models, we have used meteorological variables (Maximum Temperature, Minimum Temperature, Precipitation, Wind and Humidity) along with DTWS as explanatory variables and GWL as the dependent variable to build Support Vector Regression (SVR) and Artificial Neural Network (ANN) models. The Maximum Temperature (MaxTemp), Minimum Temperature (MinTemp), Wind (Wind) and Humidity (Humid) values used for an observation time $t_n$ in the SVR and ANN models are the average values of the respective variables between the observation times $t_n$ and $t_{n-1}$, whereas the value of Precipitation (Prcpt) used in the models for observation time $t_n$ is

[21] From Regression to Classification in Support Vector Machines by Massimiliano Pontil, Ryan Rifkin and Theodoros Evgeniou.
[22] Smola, Alex J., and Bernhard Schölkopf. "A tutorial on support vector regression." Statistics and computing 14.3 (2004): 199–222.
[23] KKT Condition: Karush–Kuhn–Tucker Optimality Condition.
[24] RBF: Radial Basis Function.
[25] Demuth, Howard B. et al., Neural network design, 2nd Edition.
[26] Rprop-Description and Implementation Details, Martin Riedmiller, Technical Report, January 1994.
[27] https://www.r-project.org/.
[28] https://cran.r-project.org/web/packages/nortest/index.html.
[29] https://cran.r-project.org/web/packages/e1071/index.html.
[30] https://cran.r-project.org/web/packages/neuralnet/neuralnet.pdf.
[31] http://scikit-learn.org/stable/index.html.
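The kernel functions of the SVR section and the layer recursion $a^{m+1} = f^{m+1}(W^{m+1} a^{m} + b^{m+1})$ of Section 3.3 can be illustrated with a minimal sketch in plain Python. This is not the code used in the study (which relied on R packages); the function names, the default parameter values and the tiny two-layer network at the end are assumptions made only for illustration.

```python
import math


def dot(u, v):
    """Inner product x^T x_i of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(x, xi, p=2):
    """K(x, x_i) = (x^T x_i + 1)^p."""
    return (dot(x, xi) + 1.0) ** p


def rbf_kernel(x, xi, sigma=1.0):
    """K(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(x, xi, beta0=1.0, beta1=0.0):
    """K(x, x_i) = tanh(beta0 * x^T x_i + beta1)."""
    return math.tanh(beta0 * dot(x, xi) + beta1)


def forward(p, layers):
    """Apply a^{m+1} = f(W a^m + b) layer by layer, starting from a^0 = p.

    `layers` is a list of (W, b, f) tuples: W is a list of weight rows,
    b a list of biases and f a scalar activation function.
    """
    a = list(p)
    for weights, biases, activation in layers:
        a = [activation(dot(row, a) + bias) for row, bias in zip(weights, biases)]
    return a


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


# Hypothetical network: 2 inputs -> 2 logistic hidden nodes -> 1 linear output.
example_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], logistic),
    ([[1.0, 1.0]], [0.5], identity),
]
example_output = forward([0.0, 0.0], example_layers)
```

With zero inputs both hidden nodes output 0.5, so the linear output node returns 0.5 + 0.5 + 0.5 = 1.5. Training the weights (for example with Back-Propagation or Rprop) is outside the scope of this sketch.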

Table 2
Variables (GWL, DTWS and GLDAS water content) in Model Structure.

Observation Time | Observed GWL | GWL used in Model Structure | Observed DTWS | DTWS used in Model Structure | Observed GLDAS Water Content | GLDAS Water Content used in Model Structure
$t_1$ | $GWL_{t_1}$ | NA | $TWS_{t_1}$ | NA | $GLDAS_{t_1}$ | NA
$t_2$ | $GWL_{t_2}$ | $(GWL_{t_2} - GWL_{t_1})$ | $TWS_{t_2}$ | $(TWS_{t_2} - TWS_{t_1})$ | $GLDAS_{t_2}$ | $(GLDAS_{t_2} - GLDAS_{t_1})$
$t_3$ | $GWL_{t_3}$ | $(GWL_{t_3} - GWL_{t_2})$ | $TWS_{t_3}$ | $(TWS_{t_3} - TWS_{t_2})$ | $GLDAS_{t_3}$ | $(GLDAS_{t_3} - GLDAS_{t_2})$
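The model structure of Table 2, together with the aggregation of meteorological variables between consecutive observation times, can be sketched as follows. This is an illustrative sketch only: the helper names, the half-open window $(t_{n-1}, t_n]$ and all numbers are assumptions and are not taken from the study.

```python
from datetime import date


def consecutive_differences(values):
    """Return values[n] - values[n-1] for n >= 1.

    The first observation has no predecessor (the NA row for t_1 in Table 2),
    so the result is one element shorter than the input.
    """
    return [curr - prev for prev, curr in zip(values, values[1:])]


def aggregate_between(obs_dates, daily_records, how):
    """Aggregate (day, value) records over each window (t_{n-1}, t_n].

    Use how="mean" for temperature, wind and humidity and how="sum" for
    precipitation. The half-open window is an assumption of this sketch.
    """
    aggregated = []
    for start, end in zip(obs_dates, obs_dates[1:]):
        window = [value for day, value in daily_records if start < day <= end]
        if not window:
            aggregated.append(None)  # no records fall inside this window
        elif how == "mean":
            aggregated.append(sum(window) / len(window))
        elif how == "sum":
            aggregated.append(sum(window))
        else:
            raise ValueError("how must be 'mean' or 'sum'")
    return aggregated


def lag_one(series):
    """Shift a series by one observation (Prcpt_LAG); the first value is unknown."""
    return [None] + list(series[:-1])


# Hypothetical, unevenly spaced observations (not data from the study).
obs_dates = [date(2010, 1, 10), date(2010, 1, 13), date(2010, 1, 18)]
gwl = [5, 8, 4]     # metres below ground level
tws = [10, 7, 12]   # GRACE TWS anomaly, arbitrary units
daily_rain = [(date(2010, 1, d), mm)
              for d, mm in [(11, 2), (12, 0), (13, 4), (15, 1), (18, 3)]]

d_gwl = consecutive_differences(gwl)                     # [3, -4]
d_tws = consecutive_differences(tws)                     # [-3, 5]
prcpt = aggregate_between(obs_dates, daily_rain, "sum")  # [6, 4]
prcpt_lag = lag_one(prcpt)                               # [None, 6]
```

Because every input is expressed per observation interval rather than per calendar step, the same construction works whether two observations are days or months apart.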

the total value of Precipitation between the two observation times $t_n$ and $t_{n-1}$. In order to account for the lag time between precipitation and its influence on groundwater level, we have included the precipitation value of the previous time period (precipitation with time lag 1, denoted by Prcpt_LAG) in the model, i.e., while constructing the model for observation time $t_n$, we have included the precipitation value for observation time $t_{n-1}$ along with the precipitation value for time $t_n$. Due to limitation of the data, precipitation with a lag time of more than 1 could not be added to the models. Thus, with the inclusion of meteorological variables as explanatory covariates along with DTWS, the SVR and ANN models become GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG. We have not built an LRM for GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG, as there exists a high level of correlation among the explanatory variables, and this correlation is not consistent across the selected sites.

Table 3
Correlation between DTWS and GLDAS Water Content.

Name of the Site | Correlation Value | p-Value (Pearson Correlation Test)
Kotba | 0.9315 | 4.44e-16
Mhow | 0.8858 | < 2.20e-16
Panitola | 0.8940 | < 2.22e-16
Sathamba | 0.8386 | 5.95e-10
Sirigeri | 0.7269 | 7.62e-7

All the data has been normalized before building the models in order to avoid saturation. The normalized value for a data point $X$ of a variable is calculated as $X_{Normalized} = (X - X_{Min})/(X_{Max} - X_{Min})$, where $X_{Min}$ and $X_{Max}$ are the minimum and maximum values of the variable for the entire dataset. Normalization scales the values of the variables to the range 0 to 1; therefore, after the development of the model, all the fitted and predicted values for the dependent variable have been transformed back to the original scale by inverting this equation.

While developing the Linear Regression Models, we have performed residual analysis that includes normality tests for standardized residuals to ensure robustness of the models. SVR models have been tuned for cost ($C$), error tolerance ($\epsilon$) and kernel functions in order to improve the performance of the model. Similarly, ANN models have been tuned for hidden layer structure and activation functions.

4.2. Training & test data set creation

For each observation site, we have created training and test data sets in 80:20 ratio. All models (LRM, SVR and ANN) for the two sets of model structures (GWL ~ DTWS and GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG) have been trained and tested with the same training and testing datasets. 5-Fold cross validation has been performed in order to ensure the robustness and reliability of the models.

4.3. Performance criteria

We have noted the R² value of the Linear Regression Model GWL ~ DTWS for all observation sites. The R² value is calculated as SSR/SSTO[20], where SSTO is the total sum of squares, i.e., the measurement of total variation (SSTO $= \sum (y_i - \bar{y})^2$) and SSR is the regression sum of squares (SSR $= \sum (\hat{y}_i - \bar{y})^2$) [$y_i$, $\hat{y}_i$ and $\bar{y}$ are the observed values, the predicted or fitted values from regression and the mean of the observed values respectively]. The R² value varies from 0 to 1 ($0 \le R^2 \le 1$) and it indicates the amount of variability in the dependent variable (GWL) that can be explained with the help of the model (GWL ~ DTWS); in other words, it indicates the importance of DTWS in explaining the variability in GWL. A high value of R² signifies that the fraction of variability of the dependent variable explained by the regression model is high.

The correlation coefficient (ρ) has also been computed to study the degree of linear relationship between GWL and DTWS.

To compare performances among different models (LRM, SVR and ANN), we have considered the correlation coefficient (ρ) and Root Mean Square Error (RMSE) as performance indicators.

The correlation coefficient (ρ) between fitted and actual values (for training dataset) indicates fitness of the model, and ρ between predicted and actual values (for test dataset) indicates prediction performance of the model. The correlation coefficient between two variables $x$ and $y$ is expressed as $\rho_{x,y} = \frac{COV(x,y)}{\sigma_x \sigma_y}$, where $COV(x, y)$ is the covariance between $x$ and $y$ and is expressed as $E[(x - E(x))(y - E(y))]$[32], and $\sigma_x$ and $\sigma_y$ are the standard deviations of $x$ and $y$ respectively. The estimated statistic for ρ is calculated from the data of $n$ observations as

$\hat{\rho} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)\left(\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2\right)}}$

where $x_i$ and $y_i$ are the observation pairs of $x$ and $y$ respectively for the $i$th observation. A correlation test also needs to be performed (we have conducted the Pearson correlation test at the 95% confidence level) to ensure existence of correlation between two variables. The value of $|\rho|$ varies from 0 to 1, where value 1 signifies perfect correlation (+1 for perfect positive correlation and -1 for perfect negative correlation) and value 0 indicates no correlation. Therefore a value of ρ close to +1 implies a high degree of correlation between fitted and actual values (for training set) or between predicted and actual values (for testing set). In other words, for the training set, if fitted and actual values (predicted and actual values for the testing set) are very close, the value of the correlation coefficient would be high (close to +1), indicating good fitness of the model (for training set) or good prediction performance (for testing set).

Root Mean Square Error (RMSE) is the measurement of deviations of fitted and predicted values (for training and testing set respectively) from actual values. The mathematical expression of RMSE is $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - z_i)^2}$, where $n$ is the number of elements in the training or testing set, and $y_i$ and $z_i$ are the actual and the fitted or predicted (for training or testing set respectively) values of the dependent variable. For a perfectly fitted or predicted model (all values of $y_i$ and $z_i$ are the same) RMSE is 0, and it increases as the deviation between actual and fitted or predicted values of the dependent variable increases. Therefore, a low value of RMSE indicates a high level of fitness and prediction (for training and testing set respectively) and a high value of RMSE indicates otherwise.

[32] E(.) is the expectation function.
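The min-max normalization and the three performance measures described above (R² as SSR/SSTO, the estimated correlation coefficient and RMSE) can be sketched in plain Python as follows. The function names and the five data points are assumptions chosen for illustration; the study itself used R.

```python
import math


def min_max_normalize(values):
    """Scale values to [0, 1]; also return (min, max) so the scaling can be inverted.

    Assumes the values are not all identical (otherwise max - min is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def invert_min_max(normalized, lo, hi):
    """Map normalized values back to the original scale."""
    return [v * (hi - lo) + lo for v in normalized]


def mean(values):
    return sum(values) / len(values)


def pearson_rho(x, y):
    """Estimated correlation coefficient; the 1/n factors cancel out."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(actual, fitted):
    """Root mean square error between actual and fitted (or predicted) values."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))


def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def r_squared(actual, fitted):
    """R^2 = SSR / SSTO, as defined in Section 4.3."""
    my = mean(actual)
    ssto = sum((a - my) ** 2 for a in actual)
    ssr = sum((f - my) ** 2 for f in fitted)
    return ssr / ssto


# Hypothetical data (not from the study).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
intercept, slope = fit_simple_linear(x, y)
fitted = [intercept + slope * v for v in x]
r2 = r_squared(y, fitted)
rho = pearson_rho(x, y)
```

For a least-squares line with a single explanatory variable, R² equals ρ² between that variable and the response, which is consistent with the values reported in Tables 4 and 5 (for Kotba, 0.8619² ≈ 0.7428).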

5. Results and discussions

5.1. R² and ρ values: GWL and DTWS

For all our observation sites, we have found that DTWS is a highly significant variable for GWL, and the R² value of the Linear Regression Model GWL ~ DTWS varies from 0.3648 (for Mhow) to 0.7428 (for Kotba).

As the R² value indicates the amount of variability in the dependent variable (GWL) that can be explained by the independent variable (DTWS), the variability of GWL explained by DTWS for the selected observation sites varies from 36.48% (for Mhow) to 74.28% (for Kotba) (Table 4). It can be observed from Table 5 that GWL and DTWS are highly correlated, and the ρ values between these two variables range from -0.6040 (for Mhow) to -0.8619 (for Kotba). Negative values of ρ occur because an increment in DTWS indicates a rise in groundwater storage, causing a decrement in the measurement of GWL, as it is measured in meters below ground level (mbgl). This negative correlation between GWL and DTWS measurements can also be observed in Fig. 2. The existence of a strong correlation between GWL and DTWS is consistent with previous studies (Shamsudduha et al., 2012; Panda and Wahr, 2015).

Table 4
Relationship between GWL and DTWS: R² Value for GWL ~ DTWS.

Name of the Site | R² Value | Adjusted R² Value
Kotba | 0.7428 | 0.735
Mhow | 0.3648 | 0.3549
Panitola | 0.6725 | 0.6649
Sathamba | 0.6411 | 0.6299
Sirigeri | 0.4905 | 0.4751

Table 5
Relationship between GWL and DTWS: ρ Values between GWL and DTWS.

Name of the Site | ρ Value | p-Value (Pearson Correlation Test)
Kotba | -0.8619 | 2.94e-11
Mhow | -0.6040 | 7.92e-08
Panitola | -0.8201 | 5.52e-12
Sathamba | -0.8007 | 1.31e-08
Sirigeri | -0.7004 | 2.81e-06

One of the possible reasons for the low values of R² and ρ between GWL and DTWS for Mhow and Sirigeri could be the presence of two large reservoirs, namely Indira Sagar (Latitude: 22.24°North and Longitude: 76.52°East) and Tungabhadra reservoir (Latitude: 15.24°North and Longitude: 76.31°East) respectively. Indira Sagar reservoir is extremely close to the GRACE 1 degree resolution grid for Mhow (Latitude: 22.5°North to 23.5°North and Longitude: 75.5°East to 76.5°East) and similarly Tungabhadra reservoir is very close to the GRACE 1 degree resolution grid for Sirigeri (Latitude: 14.5°North to 15.5°North and Longitude: 76.5°East to 77.5°East). The Water Resource Information System of India (India-WRIS)[33] maintains and publishes reservoir level data for India, and it has been observed from this data that during the time period of the study (2005 to 2013), Indira Sagar and Tungabhadra reservoirs have experienced an average yearly fluctuation (defined in terms of range, the difference between maximum and minimum values) of 7.143 BCM[34] (equivalent to $7.143 \times 10^{12}$ kg of water) and 3.102 BCM (equivalent to $3.102 \times 10^{12}$ kg of water) respectively (refer to Figs. 3 and 4). These large variations of water mass in Indira Sagar and Tungabhadra reservoirs have not been accounted for in the GRACE DTWS data for the neighbouring GRACE 1 degree resolution grids for Mhow and Sirigeri respectively, and thus could affect the relationship between GWL and DTWS for these two sites.

Fig. 3. Reservoir Storage Data (2005 To 2013): Indira Sagar Reservoir.

Fig. 4. Reservoir Storage Data (2005 To 2013): Tungabhadra Reservoir.

5.2. Comparison of performance: GWL ~ DTWS

As discussed in the earlier section, for the first set of models (GWL ~ DTWS), we have developed LRM, SVR and ANN models and have performed comparative performance analysis with respect to RMSE and ρ for both training and test data sets.

In all the tables below, we have tabulated the values of the performance indicators (RMSE and ρ) for all observation sites and methodologies (LRM, SVR and ANN). Their comparative rankings are given in parentheses, and the calculated average ranks of the methodologies are also included.

For graphical representation of modelling (Actual and Fitted Values for training set) and prediction (Actual and Predicted Values for test set) performances among the 3 methodologies (LRM, SVR and ANN), we have shown the data points of the Panitola site as an example (see Figs. 5 and 6).

For training data (Tables 6 and 7), in terms of both RMSE and ρ, though both SVR and ANN perform better than LRM (for RMSE, average ranks of ANN, SVR and LRM are 1.2, 2.0 and 2.8 respectively and for ρ, average ranks of ANN, SVR and LRM are 1.4, 1.8 and 2.8), there is no methodology that outperforms the other two uniformly for all 5 observation sites. In terms of both ρ and RMSE for the training set, ANN performs slightly better than SVR, with the best ρ values for 3 observation sites and the best RMSE values for 4 observation sites.

[33] http://www.india-wris.nrsc.gov.in/ReservoirApp.html.
[34] BCM: Billion Cubic Meters.
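The ranking scheme used in the performance tables (rank the methodologies within each site, then average the ranks per methodology) can be reproduced with a short sketch. The RMSE values below are the training values reported in Table 6 for the model GWL ~ DTWS; the function names are assumptions, and ties are not handled because none occur in these data.

```python
def rank_row(values, higher_is_better=False):
    """Rank the entries of one table row; rank 1 is the best entry.

    For RMSE lower is better; for the correlation coefficient pass
    higher_is_better=True.
    """
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def average_ranks(table, higher_is_better=False):
    """Average the per-site ranks column by column (one column per methodology)."""
    rows = [rank_row(values, higher_is_better) for values in table.values()]
    return [sum(column) / len(rows) for column in zip(*rows)]


# Training RMSE from Table 6; columns are LRM, SVR, ANN.
TRAIN_RMSE = {
    "Kotba":    [0.1637, 0.1739, 0.1627],
    "Mhow":     [0.1497, 0.1423, 0.1430],
    "Panitola": [0.1434, 0.1433, 0.1425],
    "Sathamba": [0.1394, 0.1331, 0.1289],
    "Sirigeri": [0.1510, 0.1487, 0.1432],
}

print(average_ranks(TRAIN_RMSE))  # [2.8, 2.0, 1.2]
```

The printed averages match the AVERAGE RANK row of Table 6 (2.8 for LRM, 2.0 for SVR and 1.2 for ANN).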

Fig. 5. Actual and Fitted Values for Training Set - Panitola; Model: GWL ~ DTWS.

Fig. 6. Actual and Predicted Values for Test Set - Panitola; Model: GWL ~ DTWS.

Table 6
Train Data RMSE Model: GWL ~ DTWS.

Name of the Site | LRM | SVR | ANN
Kotba | 0.1637(2) | 0.1739(3) | 0.1627(1)
Mhow | 0.1497(3) | 0.1423(1) | 0.1430(2)
Panitola | 0.1434(3) | 0.1433(2) | 0.1425(1)
Sathamba | 0.1394(3) | 0.1331(2) | 0.1289(1)
Sirigeri | 0.1510(3) | 0.1487(2) | 0.1432(1)
AVERAGE RANK | 2.8 | 2.0 | 1.2

Table 7
Train Data ρ Model: GWL ~ DTWS.

Name of the Site | LRM | SVR | ANN
Kotba | 0.8622(2) | 0.8505(3) | 0.8639(1)
Mhow | 0.6007(3) | 0.6519(1) | 0.6457(2)
Panitola | 0.8219(3) | 0.8257(1) | 0.8243(2)
Sathamba | 0.8085(3) | 0.8256(2) | 0.8346(1)
Sirigeri | 0.7044(3) | 0.7258(2) | 0.7396(1)
AVERAGE RANK | 2.8 | 1.8 | 1.4

Table 8
Test Data RMSE Model: GWL ~ DTWS.

Name of the Site | LRM | SVR | ANN
Kotba | 0.1679(3) | 0.1611(1) | 0.1675(2)
Mhow | 0.1538(3) | 0.1445(2) | 0.1414(1)
Panitola | 0.1480(1) | 0.1517(3) | 0.1496(2)
Sathamba | 0.1392(3) | 0.1379(2) | 0.1309(1)
Sirigeri | 0.1452(2) | 0.1404(1) | 0.1482(3)
AVERAGE RANK | 2.4 | 1.8 | 1.8

In case of RMSE for test data (Table 8), our observation is more or less similar to our findings for the training data set. For the test set also, SVR and ANN (average rank for both is 1.8) perform better than LRM (average rank is 2.4). While comparing the values of ρ for the test dataset (Table 9), we have observed that SVR outperforms ANN and LRM with an average rank of 1.4, where the average ranks of ANN and LRM are 1.8 and 2.8 respectively.

Though for both datasets we can observe that SVR and ANN perform better than LRM in terms of RMSE and ρ, there is not much difference in terms of absolute values of RMSE and ρ among these three methodologies. In other words, the performance of LRM is comparable with SVR and ANN, as for a small dataset (the size of the dataset varies from 35 to 67 across observation sites), the generalization performance of simple linear models is better than that of complex nonlinear models like SVR or ANN (Mørch et al., 1997).

5.3. Comparison of performance: GWL ~ DTWS + meteorological variables

For the second set of models we have included meteorological variables along with DTWS to model and predict GWL. As explained in the previous section, the meteorological variables that we have included are Maximum Temperature (MaxTemp), Minimum Temperature (MinTemp), Precipitation (Prcpt), Wind, Humidity (Humid) and Precipitation with 1 time lag (Prcpt_LAG). We have developed ANN and SVR models with these input variables. As in the previous section, all tables include values of performance indicators (RMSE and ρ) for all observation sites and methodologies (SVR and ANN) and comparative rankings have been mentioned within

parentheses. Calculated average ranks of the methodologies have also been included. As in the previous section, Panitola has been taken as a sample observation site for graphical representation of modelling (Actual and Fitted Values for training set) and prediction (Actual and Predicted Values for test set) performances of SVR and ANN (see Figs. 7 and 8).

In case of the training data set (Tables 10 and 11), the performances of SVR and ANN are comparable, as the average ranks of SVR and ANN are 1.6 and 1.4 respectively for both RMSE and ρ. For the test data set (Tables 12 and 13), SVR clearly outperforms ANN in terms of both RMSE and ρ for all observation sites.

In most cases, both the modelling and the prediction performance of SVR have improved significantly with the inclusion of meteorological variables, although the same cannot be concluded for ANN, because the network structures differ between the two sets of models.

Table 9
Test Data ρ Model: GWL ~ DTWS.

Name of the Site | LRM | SVR | ANN
Kotba | 0.8727(3) | 0.9017(1) | 0.8756(2)
Mhow | 0.6277(3) | 0.6416(2) | 0.6577(1)
Panitola | 0.8173(3) | 0.8192(1) | 0.8176(2)
Sathamba | 0.8424(3) | 0.8441(2) | 0.8494(1)
Sirigeri | 0.7386(2) | 0.7531(1) | 0.7366(3)
AVERAGE RANK | 2.8 | 1.4 | 1.8

Table 10
Train Data RMSE Model: GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG.

Name of the Site | SVR | ANN
Kotba | 0.0962(2) | 0.0565(1)
Mhow | 0.1162(2) | 0.1103(1)
Panitola | 0.1236(2) | 0.1178(1)
Sathamba | 0.0772(1) | 0.0859(2)
Sirigeri | 0.1183(1) | 0.1194(2)
AVERAGE RANK | 1.6 | 1.4

Table 11
Train Data ρ Model: GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG.

Name of the Site | SVR | ANN
Kotba | 0.9571(2) | 0.9837(1)
Mhow | 0.7201(2) | 0.8018(1)
Panitola | 0.8803(2) | 0.8845(1)
Sathamba | 0.9466(1) | 0.9299(2)
Sirigeri | 0.8345(1) | 0.8230(2)
AVERAGE RANK | 1.6 | 1.4

Table 12
Test Data RMSE Model: GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG.

Name of the Site | SVR | ANN
Kotba | 0.1437(1) | 0.1878(2)
Mhow | 0.1394(1) | 0.1481(2)
Panitola | 0.1306(1) | 0.1455(2)
Sathamba | 0.1206(1) | 0.1424(2)
Sirigeri | 0.1475(1) | 0.1500(2)
AVERAGE RANK | 1.0 | 2.0

Fig. 7. Actual & Fitted Values for Training Set - Panitola; Model: GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG.

Fig. 8. Actual & Predicted Values for Test Set - Panitola; Model: GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG.

5.4. Summary & conclusions

Finally, to conclude, we have observed in our study that DTWS is a significant variable to model and predict GWL for the selected observation sites in India. Particularly for a small and irregular

time series data like the data of our study, DTWS could be very useful for modelling GWL.

Table 13
Test Data ρ Model: GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG.

Name of the Site | SVR | ANN
Kotba | 0.9130(1) | 0.8654(2)
Mhow | 0.7876(1) | 0.7151(2)
Panitola | 0.9182(1) | 0.8930(2)
Sathamba | 0.8958(1) | 0.8431(2)
Sirigeri | 0.7771(1) | 0.7358(2)
AVERAGE RANK | 1.0 | 2.0

As in the previous study by Rodell et al. (2009), we have also found that GLDAS water content is not significant for ground water storage variations and have observed that it is highly correlated with DTWS. For this reason, similar to the attempt made by Sun (2013), we have tried to model and predict GWL with the help of only DTWS and other meteorological variables (and have not included GLDAS water content as an input variable).

In our study, for the selected observation sites, it has been found that there exists a strong correlation between DTWS and the corresponding observation well measurement data for GWL (Table 5: $-0.8618768 \le \rho \le -0.6039772$), and the amount of total variation in GWL that can be explained with the help of DTWS varies from 36.48% to 74.28% (Table 4: $0.3648 \le R^2 \le 0.7428$).

We have found that in case of the training data set, for the model GWL ~ DTWS, the calculated ρ values vary from 0.6007 to 0.8622 (for LRM), from 0.6519 to 0.8505 (for SVR) and from 0.6457 to 0.8639 (for ANN). For the same training dataset, the observed ρ values vary from 0.7201 to 0.9571 (for SVR) and from 0.8018 to 0.9837 (for ANN) for the model GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG. In case of the test dataset, the calculated ρ values vary from 0.6277 to 0.8727 (for LRM), from 0.6416 to 0.9017 (for SVR) and from 0.6577 to 0.8756 (for ANN) for the model GWL ~ DTWS.

For the model GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG, the calculated values of ρ vary from 0.7771 to 0.9182 (for SVR) and from 0.7151 to 0.8930 (for ANN), for the same test dataset. The range of ρ values found in our study is comparable with the ρ values reported in the earlier study conducted by Sun (2013).

Also, we have found in our study that for the purpose of modelling and predicting GWL, with the inclusion of other meteorological variables along with DTWS as explanatory variables, SVR performs more efficiently than ANN, and this finding is consistent with the previous study by Yoon et al. (2011).

References

Adamowski, J., Chan, H.F., 2011. A wavelet neural network conjunction model for groundwater level forecasting. J. Hydrol. 407 (1), 28–40.
Adamowski, J., Fung Chan, H., Prasher, S.O., Ozga-Zielinski, B., Sliusarieva, A., 2012. Comparison of multiple linear and nonlinear regression, autoregressive integrated moving average, artificial neural network, and wavelet artificial neural network methods for urban water demand forecasting in Montreal, Canada. Water Resour. Res. 48 (1).
Al-Zahrani, M.A., Abo-Monasar, A., 2015. Urban residential water demand prediction based on artificial neural networks and time series models. Water Resour. Manage. 29 (10), 3651–3662.
Arandia, E., Ba, A., Eck, B., McKenna, S., 0401. Tailoring seasonal time series models to forecast short-term water demand. J. Water Resour. Plann. Manage., 04015067.
Awchi, T.A., 2014. River discharges forecasting in northern Iraq using different ANN techniques. Water Resour. Manage. 28 (3), 801–814.
Azadeh, A., Neshat, N., Hamidipour, H., 2011. Hybrid fuzzy regression artificial neural network for improvement of short-term water consumption estimation and forecasting in uncertain and complex environments: case of a large metropolitan city. J. Water Resour. Plann. Manage. 138 (1), 71–75.
Chinnasamy, P., Agoramoorthy, G., 2015. Groundwater storage and depletion trends in Tamil Nadu State, India. Water Resour. Manage. 29 (7), 2139–2152.
Daliakopoulos, I.N., Coulibaly, P., Tsanis, I.K., 2005. Groundwater level forecasting using artificial neural networks. J. Hydrol. 309 (1), 229–240.
Döll, P., Mueller Schmied, H., Schuh, C., Portmann, F.T., Eicker, A., 2014. Global-scale assessment of groundwater depletion and related groundwater abstractions: Combining hydrological modeling with information from well observations and GRACE satellites. Water Resour. Res. 50 (7), 5698–5720.
Dos Santos, C.C., Pereira Filho, A.J., 2014. Water demand forecasting model for the metropolitan area of São Paulo, Brazil. Water Resour. Manage. 28 (13), 4401–4414.
Emamgholizadeh, S., Moslemi, K., Karami, G., 2014. Prediction the groundwater level of Bastam plain (Iran) by artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS). Water Resour. Manage. 28 (15), 5433–5446.
Feng, W., Zhong, M., Lemoine, J.-M., Biancale, R., Hsu, H.-T., Xia, J., 2013. Evaluation of groundwater depletion in North China using the Gravity Recovery and Climate Experiment (GRACE) data and ground-based measurements. Water Resour. Res. 49 (4), 2110–2118.
Garduño, H., Romani, S., Sengupta, B., Tuinhof, A., Davis, R., 2011. India groundwater governance case study.
Haque, M.M., Rahman, A., Hagare, D., Kibria, G., 2014. Probabilistic water demand forecasting using projected climatic data for Blue Mountains water supply system in Australia. Water Resour. Manage. 28 (7), 1959–1971.
He, Z., Zhang, Y., Guo, Q., Zhao, X., 2014. Comparative study of artificial neural networks and wavelet artificial neural networks for groundwater depth data forecasting with various curve fractal dimensions. Water Resour. Manage. 28 (15), 5297–5317.
Islam, A., Sikka, A.K., Saha, B., Singh, A., 2012. Streamflow response to climate change in the Brahmani River Basin, India. Water Resour. Manage. 26 (6), 1409–1424.
Karthikeyan, L., Kumar, D.N., Graillot, D., Gaur, S., 2013. Prediction of ground water levels in the uplands of a tropical coastal riparian wetland using artificial neural networks. Water Resour. Manage. 27 (3), 871–883.
Konikow, L.F., 2011. Contribution of global groundwater depletion since 1900 to sea-level rise. Geophys. Res. Lett. 38 (17).
Mays, L.W., 2013. Groundwater resources sustainability: past, present, and future. Water Resour. Manage. 27 (13), 4409–4424.
Mirzavand, M., Ghazavi, R., 2015. A stochastic modelling technique for groundwater level forecasting in an arid environment using time series methods. Water Resour. Manage. 29 (4), 1315–1328.
Mohanty, S., Jha, M.K., Raul, S.K., Panda, R.K., Sudheer, K.P., 2015. Using artificial neural network approach for simultaneous forecasting of weekly groundwater levels at multiple sites. Water Resour. Manage. 29 (15), 5521–5532.
Moosavi, V., Vafakhah, M., Shirmohammadi, B., Behnia, N., 2013. A wavelet-ANFIS hybrid model for groundwater level forecasting for different prediction periods. Water Resour. Manage. 27 (5), 1301–1321.
Mørch, N., Hansen, L.K., Strother, S.C., Svarer, C., Rottenberg, D.A., Lautrup, B., Savoy, R., Paulson, O.B., 1997. Nonlinear versus linear models in functional neuroimaging: learning curves and generalization crossover, Biennial International Conference on Information Processing in Medical Imaging. Springer, pp. 259–270.
Panda, D.K., Wahr, J., 2015. Spatiotemporal evolution of water storage changes in India from the updated GRACE-derived gravity records. Water Resour. Res.
Rodell, M., Famiglietti, J., 2002. The potential for satellite-based monitoring of groundwater storage changes using GRACE: the High Plains aquifer, Central US. J. Hydrol. 263 (1), 245–256.
Rodell, M., Chen, J., Kato, H., Famiglietti, J.S., Nigro, J., Wilson, C.R., 2007. Estimating groundwater storage changes in the Mississippi River basin (USA) using GRACE. Hydrogeol. J. 15 (1), 159–166.
Rodell, M., Velicogna, I., Famiglietti, J.S., 2009. Satellite-based estimates of groundwater depletion in India. Nature 460 (7258), 999–1002.
Scanlon, B.R., Longuevergne, L., Long, D., 2012. Ground referencing GRACE satellite estimates of groundwater storage changes in the California Central Valley, USA. Water Resour. Res. 48 (4).
Schewe, J., Heinke, J., Gerten, D., Haddeland, I., Arnell, N.W., Clark, D.B., Dankers, R., Eisner, S., Fekete, B.M., Colón-González, F.J., et al., 2014. Multimodel assessment of water scarcity under climate change. Proc. Nat. Acad. Sci. 111 (9), 3245–3250.
Sekhri, S. et al., 2013. Sustaining groundwater: role of policy reforms in promoting conservation in India. India Policy Forum 9, 149.
Shamsudduha, M., Taylor, R.G., Longuevergne, L., 2012. Monitoring groundwater storage changes in the highly seasonal humid tropics: validation of GRACE measurements in the Bengal Basin. Water Resour. Res. 48 (2).
Shirmohammadi, B., Vafakhah, M., Moosavi, V., Moghaddamnia, A., 2013. Application of several data-driven techniques for predicting groundwater level. Water Resour. Manage. 27 (2), 419–432.
Slavíková, L., Malý, V., Rost, M., Petružela, L., Vojáček, O., 2013. Impacts of climate variables on residential water consumption in the Czech Republic. Water Resour. Manage. 27 (2), 365–379.
Sun, A.Y., 2013. Predicting groundwater level changes using GRACE data. Water Resour. Res. 49 (9), 5900–5912.
Syed, T.H., Famiglietti, J.S., Rodell, M., Chen, J., Wilson, C.R., 2008. Analysis of terrestrial water storage changes from GRACE and GLDAS. Water Resour. Res. 44 (2).

Taylor, R.G., Scanlon, B., Döll, P., Rodell, M., Van Beek, R., Wada, Y., Longuevergne, L., Leblanc, M., Famiglietti, J.S., Edmunds, M., et al., 2013. Ground water and climate change. Nature Climate Change 3 (4), 322–329.
Teutschbein, C., Grabs, T., Karlsen, R.H., Laudon, H., Bishop, K., 2015. Hydrological response to changing climate conditions: spatial streamflow variability in the boreal region. Water Resour. Res. 51 (12), 9425–9446.
Tiwari, M.K., Adamowski, J., 2013. Urban water demand forecasting and uncertainty assessment using ensemble wavelet-bootstrap-neural network models. Water Resour. Res. 49 (10), 6486–6507.
Tiwari, M.K., Adamowski, J.F., 2014. Medium-term urban water demand forecasting with limited data using an ensemble wavelet bootstrap machine-learning approach. J. Water Resour. Plann. Manage. 141 (2), 04014053.
Yoon, H., Jun, S.-C., Hyun, Y., Bae, G.-O., Lee, K.-K., 2011. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J. Hydrol. 396 (1), 128–138.
