Journal of Hydrology
journal homepage: www.elsevier.com/locate/jhydrol
Research papers
Prediction of GWL with the help of GRACE TWS for unevenly spaced time series data in India: Analysis of comparative performances of SVR, ANN and LRM
Amritendu Mukherjee ⇑, Parthasarathy Ramachandran
Indian Institute of Science, Bangalore 560012, India

Article info
Article history:
Received 5 October 2017
Received in revised form 29 January 2018
Accepted 5 February 2018
Available online 9 February 2018
This manuscript was handled by Emmanouil Anagnostou, Editor-in-Chief, with the assistance of Viviana Maggioni, Associate Editor
Keywords:
Groundwater level prediction
GRACE gravitational anomalies
Linear regression model
Support vector regression
Artificial neural network

Abstract
Prediction of Ground Water Level (GWL) is extremely important for the sustainable use and management of groundwater resources. The motivation for this work is to understand the relationship between Gravity Recovery and Climate Experiment (GRACE) derived terrestrial water storage change (DTWS) data and GWL, so that DTWS could be used as a proxy measurement for GWL. In our study, we have selected five observation wells from different geographic regions in India. The datasets are unevenly spaced time series, which prevents us from applying standard time series methodologies. Therefore, to model and predict GWL with the help of DTWS, we have built a Linear Regression Model (LRM), Support Vector Regression (SVR) and an Artificial Neural Network (ANN). The comparative performances of LRM, SVR and ANN have been evaluated with the correlation coefficient (ρ) and the Root Mean Square Error (RMSE) between the actual and fitted (for the training dataset) or predicted (for the test dataset) values of GWL. We observed that DTWS is a highly significant variable for modelling GWL, and that the proportion of the total variation in GWL explained by DTWS varies from 36.48% to 74.28% (0.3648 ≤ R² ≤ 0.7428). For the model GWL ~ DTWS, on both the training and test datasets, SVR and ANN perform better than LRM in terms of ρ and RMSE. We also found that when meteorological variables are included along with DTWS as input parameters, the performance of SVR improves and it performs better than ANN. These results imply that DTWS could be very useful for modelling irregular GWL time series.
© 2018 Elsevier B.V. All rights reserved.
⇑ Corresponding author.
E-mail address: [email protected] (A. Mukherjee).
https://doi.org/10.1016/j.jhydrol.2018.02.005
0022-1694/© 2018 Elsevier B.V. All rights reserved.
A. Mukherjee, P. Ramachandran / Journal of Hydrology 558 (2018) 647–658

1. Introduction & motivation

The growth and sustainability of human civilization have depended greatly on the availability of water. Groundwater is the source of one third of all freshwater withdrawals and supplies around 36%, 42% and 27% of the water used for domestic, agricultural and industrial purposes respectively (Taylor et al., 2013). Groundwater is a renewable resource and requires proper management. Demand for groundwater has been increasing due to various factors, and efficient management is required to ensure its long-term supply (Mays, 2013). According to the Alicante Declaration (which resulted from the International Symposium on Groundwater Sustainability, ISGWAS, held in Alicante, Spain, from 23rd to 27th January 2006), the availability of groundwater depends on responsible use and governance. The current state of Ground Water Level (GWL) is extremely alarming around the globe.

Reduction of groundwater storage, i.e., groundwater depletion, has been considered a global problem that threatens the sustainability of water supply (Mays, 2013). For the period 1900 to 2008, the estimated depletion of global groundwater is 4,500 km³ (equivalent to 12.6 mm of sea-level rise). The maximum rate of depletion occurred from 2000 to 2008, with an average rate of 145 km³/year (equivalent to 0.4 mm/year of sea-level rise) (Konikow, 1900). India is the largest groundwater user in the world. Its estimated groundwater usage is around 230 km³ per year, which is more than 25% of the global total¹. Groundwater has been very important for India in maintaining its economy, environment and standard of living, as more than 60% of irrigated agriculture and 85% of the drinking water supply depend on groundwater (Garduño et al., 2011). Rodell et al. (2009) have shown that in India, during the period from August 2002 to October 2008, the mean rate of groundwater depletion for Rajasthan, Punjab, Haryana and Delhi was 4.0 ± 1.0 cm/year of equivalent height of water (17.7 ± 4.5 km³/year). The total groundwater depletion for these regions over this period was equivalent to 109 km³ of water. During the 30 years between 1980 and 2010, major areas of India experienced a substantial decline of groundwater level (depth to water, measured in meters below ground level) ranging from 4 m to 16 m (Sekhri et al., 2013). Apart from irrigation, urbanisation and climate change are major factors that affect groundwater storage (Mays, 2013; Garduño et al., 2011; Schewe et al., 2014; Islam et al., 2012; Slavkov et al., 2013; Teutschbein et al., 2015). In the last decade, India has seen a considerable decline in groundwater level. About 65% of wells in India showed a decline in groundwater level in January 2016 compared to the decadal mean groundwater level for January (2006 to 2015)².

GWL is measured as the depth to groundwater from the land surface and is obtained from observation wells situated at spatially discrete points. Also, GWL indicates the water level but not the volume of water. Variations of Terrestrial Water Storage (DTWS), derived from Gravity Recovery and Climate Experiment (GRACE) satellite data, have been used extensively by researchers to understand groundwater storage conditions and trends (Sun, 2013; Feng et al., 2013; Shamsudduha et al., 2012; Panda and Wahr, 2015). All reported DTWS data are anomalies relative to the 2004–2009 time-mean baseline³.

Rodell and Famiglietti (2002) discussed the potential use of GRACE data for monitoring variations in groundwater storage. Rodell et al. (2007) estimated groundwater storage changes in the Mississippi River basin, USA, from GRACE data for the period January 2002 to July 2005, and showed the importance of assessing groundwater storage from GRACE data. For the California Central Valley, USA, Scanlon et al. (2012) estimated groundwater storage changes from GRACE data from April 2006 to September 2009. Dall et al. (2014) modelled and analysed the trends of Groundwater Depletion (GWD) and GRACE-derived Terrestrial Water Storage (TWS) on a global scale. They showed that the highest GWD rates in the first decade of the 21st century occurred in India, the United States, Iran, Saudi Arabia and China. They also found that the rate of global GWD has likely more than doubled since the period 1960–2000. Chinnasamy and Agoramoorthy (2015) used descriptive statistical analysis to study the impact of irrigation on groundwater in Tamil Nadu State, India, for the period 2002 to 2012. They analysed groundwater storage and depletion trends with the help of GRACE and Global Land Data Assimilation System (GLDAS) data. Panda and Wahr (2015) studied variations of GRACE-derived Terrestrial Water Storage (DTWS) and Ground Water Storage (DGWS) in India for the period January 2003 to May 2014. They found that substantial GWS depletion has taken place in the northern part of the country, particularly in the Ganges Basin and Punjab state, with depletion rates of 1.25 cm year⁻¹ and 2.1 cm year⁻¹ respectively.

GRACE-derived DTWS requires adjustments for other components and involves errors due to the statistical downscaling methodology (Rodell and Famiglietti, 2002), so it is not an exact measurement of Ground Water Storage. Also, GWL is measured in meters below ground level (mbgl) at observation wells situated at discrete geographic locations, whereas DTWS is a continuous measurement in space. Therefore, if DTWS is to be used as a proxy measurement for GWL, it is essential to study and understand the degree of correlation between DTWS and GWL.

Feng et al. (2013) used GRACE data to estimate changes in groundwater levels in the North China region for the period 2003 to 2010. In the Bengal Basin of Bangladesh, Shamsudduha et al. (2012) showed that GRACE-derived groundwater storage changes (DGWS) correlate strongly (0.77 ≤ r ≤ 0.93) with in situ borehole records and account for 44% of the total variation in Terrestrial Water Storage (DTWS). Sun (2013) predicted groundwater level changes from GRACE-derived Terrestrial Water Storage change (DTWS) with an Artificial Neural Network (ANN) for different regions of the United States of America. Panda and Wahr (2015) observed a high degree of correlation between GRACE-derived Ground Water Storage and in situ groundwater levels from observation wells.

Different statistical and machine learning methodologies have been applied to predict groundwater storage and to understand the impact of different variables on it. Among statistical learning methodologies, two families have been widely used. The first is Linear & Non-Linear Regression and Correlation Analysis (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Mirzavand and Ghazavi, 2015; Dall et al., 2014; Azadeh et al., 2011; Shamsudduha et al., 2012; Chinnasamy and Agoramoorthy, 2015; Panda and Wahr, 2015). The second is Time Series Models (ARMA⁴, ARIMA⁵, SARIMA⁶ etc.) (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Adamowski and Chan, 2011; Dall et al., 2014; Al-Zahrani and Abo-Monasar, 2015; Arandia et al., 0401; Shirmohammadi et al., 2013). Other machine learning methodologies have also been used extensively by different research groups:
- Artificial Neural Networks (ANN) (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Adamowski and Chan, 2011; Al-Zahrani and Abo-Monasar, 2015; Dos Santos and Pereira, 2014; Moosavi et al., 2013; Emamgholizadeh et al., 2014; Azadeh et al., 2011; Sun, 2013; He et al., 2014; Mohanty et al., 2015; Karthikeyan et al., 2013; Daliakopoulos et al., 2005; Yoon et al., 2011);
- Wavelet ANN (WA-ANN) (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Adamowski and Chan, 2011; Moosavi et al., 2013; He et al., 2014; Tiwari and Adamowski, 2014);
- Adaptive Neuro-Fuzzy Inference System (ANFIS) (Moosavi et al., 2013; Emamgholizadeh et al., 2014; Shirmohammadi et al., 2013);
- Wavelet-ANFIS (Moosavi et al., 2013);
- Support Vector Regression (SVR) (Yoon et al., 2011).
Many of these studies have also evaluated the relative performances of different methodologies, mainly in terms of prediction (Moosavi et al., 2013; Daliakopoulos et al., 2005; He et al., 2014; Karthikeyan et al., 2013; Awchi, 2014; Yoon et al., 2011; Adamowski et al., 2012; Adamowski and Chan, 2011; Shirmohammadi et al., 2013).

¹ World Bank, 2010, Deep wells and prudence: towards pragmatic action for addressing groundwater overexploitation in India. Washington, DC: World Bank.
² Ground Water Scenario In India, January 2016, Central Ground Water Board, Ministry Of Water Resources, Government Of India.
³ Source: https://grace.jpl.nasa.gov/data/get-data/monthly-mass-grids-land/.
⁴ ARMA: Auto Regressive Moving Average.
⁵ ARIMA: Auto Regressive Integrated Moving Average.
⁶ SARIMA: Seasonal Auto Regressive Integrated Moving Average.

From the literature on trend analysis and forecasting methodologies in water resources research, we observe that temperature and precipitation (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Azadeh et al., 2011; Panda and Wahr, 2015; Al-Zahrani and Abo-Monasar, 2015; Shirmohammadi et al., 2013; Haque et al., 2014; Dos Santos and Pereira, 2014; Moosavi et al., 2013; Sun, 2013; Karthikeyan et al., 2013; Yoon et al., 2011; Tiwari and Adamowski, 2014) have consistently been used as explanatory meteorological variables. They appear in both statistical and probabilistic models and in other machine learning techniques. In some studies, other meteorological variables like humidity, wind speed, tide level etc. (Yoon et al., 2011; Al-Zahrani and Abo-Monasar, 2015; Dos Santos and Pereira, 2014)
along with temperature and precipitation have also been used as explanatory variables.

In the literature that we have covered on groundwater research, we have not been able to find any study that focuses on the accuracy of DTWS for measuring GWL in India. This gap is the primary motivation for this paper, which aims to establish the relationship between GRACE-derived DTWS and GWL measurements of observation wells for different geographic regions in India. We also compare the performances of different machine learning methodologies (Linear Regression Model, LRM; Support Vector Regression, SVR; and Artificial Neural Network, ANN). The comparison covers both modelling and prediction of GWL with the help of DTWS and other meteorological variables such as temperature, precipitation, wind and humidity.

2. Data & study area

India is the seventh largest country in the world, covering an area of 3,287,263 km². It extends from latitude 8°4′ N to 37°6′ N and from longitude 68°7′ E to 97°25′ E. Geologically, India can broadly be classified into three major regions: the Himalayas & associated group of mountains, the Indo-Gangetic Plain and the Peninsular Shield. India consists of 29 states and 7 union territories. The climate of India can be described as tropical monsoon type. Average maximum temperature across India varies from 24.5 °C (in January–February) to 31.5 °C (in March–May), whereas average minimum temperature varies from 13.85 °C (in January–February) to 23.27 °C (in June–September)⁷. Average rainfall in India ranges from 41.87 mm (in January–February) to 887.48 mm (in June–September)⁸.

We have selected five sites (Table 1) from different geographic regions of India to study the relationship between GWL and DTWS. The sites are widely separated to avoid any interrelation between them, so that the observations for each site are independent of the others. Coastal areas have been avoided because other factors, such as tide level, could affect GWL there (Yoon et al., 2011).

From the central part of India, we have selected Mhow. It is near Indore, the most populous and largest city of the state of Madhya Pradesh, and lies on the southern edge of the Malwa plateau. From the Jashpur district of Chhattisgarh, we have selected Kotba. This region is hilly and contains forest area.

Panitola, from the Tinsukia district of Assam, has been selected as another site. This region is located in the north-eastern part of the country and includes several rivers and reserve forests. The area lies in the Brahmaputra River basin.
Fig. 1.

We have selected Sathamba as another site, from the Sabarkantha district of Gujarat state in the western part of India.

From the southern part of the country, we have selected Sirigeri, in the Bellary district of Karnataka. This area lies on the Deccan Plateau of southern India and is endowed with rich mineral resources. The Tungabhadra is the main river in this region.

The Central Groundwater Board⁹ maintains a database of well-observation GWL data, measured in meters below ground level, from a network of over 22,000 observation wells¹⁰ across the country. GWL data for the selected observation sites have been downloaded from the Water Resources Information System of India database¹¹ for the study period from January 2005 to December 2013. These well-observation GWL data are available for four seasons across the year:
- Post-monsoon Rabi (January to March);
- Pre-monsoon (April to June);
- Monsoon (July to September);
- Post-monsoon Kharif (October to December).
For the observation sites, GWL data are not available for all months of these seasons. Some sites have multiple observations in a season, whereas others have only one. The data are also unevenly spaced in time, i.e., the time gaps between consecutive observations are not equal. The total number of observations also varies across sites, ranging from 35 (for Sathamba) to 67 (for Mhow). The observed GWL data for all selected sites are shown in Fig. 2.

Gravity Recovery and Climate Experiment (GRACE) is a joint mission launched in March 2002 by NASA¹² and DLR¹³. Its main objective has been to accurately measure Earth's gravitational field at monthly intervals. The GRACE mission consists of two twin satellites, 220 km apart from each other at an altitude of 500 km. The distance between the twin satellites is affected by the spatio-temporal variation of Earth's gravitational field. The on-board K-Band microwave ranging system measures this inter-satellite distance. Combined with other ancillary data, this measurement yields Earth's gravity field. The variations of this gravity field are mainly caused by changes in Terrestrial Water Storage (TWS) (Syed et al., 2008; Scanlon et al., 2012; Rodell et al., 2007). TWS integrates Ground Water Storage (GWS), Soil Moisture (SM), Canopy Water Storage (CWS), snow, ice and water in biomass (Panda and Wahr, 2015; Sun, 2013). GRACE-derived monthly DTWS estimates (anomalies relative to the 2004–2009 time-mean baseline) have been available on the ftp site of the NASA Jet Propulsion Laboratory (JPL)¹⁴ since April 2002.

We have also collected monthly water content data from GLDAS¹⁵, which include snow content, total soil moisture in 4 layers and canopy water storage. These data do not include groundwater and surface water content. Like the DTWS data, the GLDAS water content data are anomalies, here relative to the January 2003 to December 2007 time-averaged baseline. We have downloaded these data from the ftp site of the NASA Jet Propulsion Laboratory¹⁶.

Both GRACE DTWS and GLDAS water content data are expressed as equivalent liquid water thickness (in cm). Both datasets are available on a 1° resolution grid, so we have collected DTWS and GLDAS water content data for the latitude–longitude grid cell that encompasses each observation site. For example, for the site Panitola (Latitude: 27.49° N, Longitude: 95.26° E), the data have been collected for the 1° grid cell spanning 26.5° N to 27.5° N in latitude and 94.5° E to 95.5° E in longitude. DTWS and GLDAS water content data have been collected only for those months for which well observation data exist.

We have considered Temperature (both Maximum and Minimum), Precipitation, Wind and Humidity as additional meteorological covariates along with DTWS to model GWL. Wind and Humidity are included as covariates on the assumption that these variables could affect groundwater demand, which in turn may influence groundwater level. For this purpose, we have collected meteorological data from Global …

⁷ Ministry of Earth Sciences, India Meteorological Department (IMD); Time Period: 1901–2015.
⁸ Ministry of Earth Sciences, India Meteorological Department (IMD); Time Period: 1901–2013.
⁹ http://www.cgwb.gov.in.
¹⁰ Ground Water Scenario In India, January 2016, Central Ground Water Board, Ministry Of Water Resources, Government Of India.
¹¹ http://www.india-wris.nrsc.gov.in; accessed on May 27, 2015.
¹² National Aeronautics and Space Administration.
¹³ German Aerospace Centre: Deutsches Zentrum für Luft- und Raumfahrt.
¹⁴ ftp://podaac-ftp.jpl.nasa.gov/allData/tellus/L3/land_mass/RL05.
¹⁵ GLDAS: Global Land Data Assimilation System, https://grace.jpl.nasa.gov/data/get-data/land-water-content/.
¹⁶ ftp://podaac-ftp.jpl.nasa.gov/allData/tellus/L3/gldas_monthly/netcdf/.
Table 1
Details of selected data points for the study.

Name of the Site | State | District | Latitude | Longitude | Principal Aquifer System¹ | Average GWL² (mbgl)
Mhow | Madhya Pradesh | Indore | 22.55° N | 75.76° E | Basalt | 5.25
Kotba | Chhattisgarh | Jashpur | 22.42° N | 83.75° E | Banded Gneissic Complex | 3.73
Panitola | Assam | Tinsukia | 27.49° N | 95.26° E | Alluvium | 2.79
Sathamba | Gujarat | Sabar Kantha | 23.18° N | 73.33° E | Alluvium | 9.54
Sirigeri | Karnataka | Bellary | 15.44° N | 76.84° E | Banded Gneissic Complex | 3.01

¹ Source: Aquifer Systems of India, http://cgwb.gov.in/AQM/.
² Average GWL is the average value of GWL, measured in meters below ground level (mbgl), for the period from 1996 to 2016. Data Source: Central Ground Water Board, http://www.cgwb.gov.in/GW-data-access.html.
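The grid-cell selection described in Section 2 can be sketched for all five sites in Table 1. The rule assumed here is that 1° cells are centred on whole degrees, so a site at latitude φ maps to [round(φ) − 0.5, round(φ) + 0.5]. This rule is inferred only from the Panitola example in the text (26.5–27.5° N, 94.5–95.5° E). It should be checked against the coordinate variables of the actual GRACE/GLDAS files. The function name `grid_cell_bounds` is hypothetical.

```python
import math

def grid_cell_bounds(lat, lon, res=1.0):
    """Bounds ((lat_min, lat_max), (lon_min, lon_max)) of the res-degree cell
    containing a site, ASSUMING cell centres sit on whole multiples of res."""
    def bounds(coord):
        centre = math.floor(coord / res + 0.5) * res  # nearest centre; ties round up
        return (centre - res / 2.0, centre + res / 2.0)
    return bounds(lat), bounds(lon)

# Decimal-degree coordinates from Table 1 (north and east positive).
sites = {
    "Mhow": (22.55, 75.76),
    "Kotba": (22.42, 83.75),
    "Panitola": (27.49, 95.26),
    "Sathamba": (23.18, 73.33),
    "Sirigeri": (15.44, 76.84),
}

for name, (lat, lon) in sites.items():
    (lat0, lat1), (lon0, lon1) = grid_cell_bounds(lat, lon)
    print(f"{name:<9} {lat0:.1f}-{lat1:.1f} N, {lon0:.1f}-{lon1:.1f} E")
```

For Panitola this reproduces the cell quoted in the text. If the product instead uses cells bounded by whole degrees (centres at x.5), the `+ 0.5` offset in `bounds` would change.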
3. Methodology

We have collected GWL data for the selected sites, and these data are not available for all months of the year. The dataset is also an unevenly spaced time series, as the time gap between two consecutive data points is not constant over the study period. This prevents us from applying standard methodologies such as Time Series Analysis (ARMA, ARIMA, SARIMA etc.), which require equal time gaps between adjacent data points. Therefore, we have used a Linear Regression Model (LRM), Support Vector Regression (SVR) and an Artificial Neural Network (ANN), as all of these methodologies can be applied to the available data without such restrictions. These methods have also been widely employed in the previous literature (Adamowski et al., 2012; Tiwari and Adamowski, 2013; Mirzavand and Ghazavi, 2015; Dall et al., 2014; Shamsudduha et al., 2012; Chinnasamy and Agoramoorthy, 2015; Panda and Wahr, 2015; Adamowski and Chan, 2011; Al-Zahrani and Abo-Monasar, 2015; Dos Santos and Pereira, 2014; Moosavi et al., 2013; Emamgholizadeh et al., 2014; Azadeh et al., 2011; Sun, 2013; He et al., 2014; Mohanty et al., 2015; Karthikeyan et al., 2013; Daliakopoulos et al., 2005; Yoon et al., 2011).
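The spacing restriction above can be made concrete. A series is evenly spaced only if every gap between consecutive observation dates is the same. Below is a minimal sketch using Python's standard library. The dates are hypothetical (not taken from the CGWB records), and treating "equal" as exact equality in days (tolerance 0) is an assumption.

```python
from datetime import date

def time_gaps_days(dates):
    """Gaps, in days, between consecutive observation dates (sorted first)."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

def is_evenly_spaced(dates, tolerance_days=0):
    """True if every consecutive gap matches the first gap within the tolerance."""
    gaps = time_gaps_days(dates)
    return all(abs(g - gaps[0]) <= tolerance_days for g in gaps)

# Hypothetical well-observation dates (not the study's records).
obs = [date(2005, 1, 12), date(2005, 5, 20), date(2005, 8, 25),
       date(2005, 11, 3), date(2006, 1, 17), date(2006, 11, 8)]

print(time_gaps_days(obs))     # [128, 97, 70, 75, 295]
print(is_evenly_spaced(obs))   # False
```

With gaps this unequal, a fixed lag in an ARMA/ARIMA model would correspond to different physical time spans from one step to the next. That is why regression on consecutive differences is used instead.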
¹⁷ http://globalweather.tamu.edu.
¹⁸ http://meteora.ucsd.edu/pierce/ncview_home_page.html.
¹⁹ https://cran.r-project.org/web/packages/ncdf4/ncdf4.pdf.
²⁰ Neter, John, et al. Applied Linear Statistical Models. Fifth Edition.
… Also, the Uniformly Minimum Variance Unbiased Estimators (UMVUE) for β and σ² are $\hat{\beta}_{UMVUE} = \hat{\beta}_{MLE}$ and $\hat{\sigma}^2_{UMVUE} = \frac{SSE}{n-p}$ respectively, where $SSE = (Y - X\hat{\beta}_{MLE})^T(Y - X\hat{\beta}_{MLE})$ and p is the number of unknown parameters β.

3.2. Support Vector Regression (SVR)

For the Support Vector Machine (SVM) classification problem, the goal is to find a hyperplane that separates the example classes with maximum margin. For the Support Vector Machine Regression (SVR) problem, the goal is to construct a hyperplane that lies close to as many training data points as possible²¹.

Consider a set of N examples $\{x_k, y_k\}_{k=1}^{N}$, $x \in \mathbb{R}^m$, $y \in \mathbb{R}$, where x is an input vector with m components and y is the corresponding output value. The SVM estimator f for regression can be expressed as $f(x) = w \cdot \phi(x) + b$, where w is the weight vector and b is the bias. $\phi(\cdot)$ is the transfer function that maps input vectors to a high-dimensional feature space, where simple linear regression can be applied. The optimization problem for estimating this function²² becomes

$$\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \frac{1}{2}\lVert w\rVert^2 + C\sum_{k=1}^{N}\left(\xi_k + \xi_k^*\right)\\
\text{subject to}\quad & y_k - w^T\phi(x_k) - b \le \varepsilon + \xi_k,\\
& w^T\phi(x_k) + b - y_k \le \varepsilon + \xi_k^*,\\
& \xi_k,\ \xi_k^* \ge 0, \quad k = 1, 2, \ldots, N
\end{aligned} \tag{1}$$

where ξ and ξ* are slack variables that penalize training errors beyond the error tolerance ε. C determines the trade-off between model complexity and the degree to which deviations larger than ε are tolerated in the optimization problem.

Support vectors are the input vectors (having non-zero Lagrange multipliers and satisfying the KKT²³ conditions) that support the structure of the estimator²² (Yoon et al., 2011). The kernel functions used in SVR, $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ where φ(·) is the transfer function, are in general inner-product kernels. Examples include the Polynomial kernel $K(x, x_i) = (x^T x_i + 1)^p$, the RBF²⁴ kernel $K(x, x_i) = \exp\left(-\frac{1}{2\sigma^2}\lVert x - x_i\rVert^2\right)$ and the Sigmoid kernel $K(x, x_i) = \tanh(\beta_0 x^T x_i + \beta_1)$.

3.3. Artificial Neural Network – ANN

As its name suggests, the Artificial Neural Network is inspired by the biological nervous system. Input, hidden and output layers, with their nodes and activation functions, are the basic elements of a generalized ANN structure²⁵. In general,

$$a^{m+1} = f^{m+1}\left(W^{m+1}a^{m} + b^{m+1}\right), \quad m = 0, 1, \ldots, M-1; \qquad a^{0} = p, \quad a = a^{M},$$

where p is the vector of input variables and a is the vector of network outputs. M is the number of layers in the network structure, and f, b and W are the activation function, bias and weight respectively. Back-Propagation²⁵ and Resilient Back-Propagation (Rprop)²⁶ are efficient and widely used algorithms for training an ANN model.

To build and verify the LRM, SVR and ANN models described in the above sections, we have used the R software²⁷ and several R packages:
- nortest²⁸ for normality tests of standardized residuals in the Linear Regression Model;
- e1071²⁹ for Support Vector Regression modelling;
- neuralnet³⁰ for Artificial Neural Network modelling.
The Python package scikit-learn³¹ has also been used to build SVR and ANN models.

For all three methodologies (LRM, SVR and ANN), we have taken the difference of GWL between two consecutive observations as the output variable. As the primary input parameter, we have considered the difference of GRACE-derived DTWS between two consecutive observations. The other input parameters considered are Maximum Temperature, Minimum Temperature, Precipitation, Wind and Humidity. Input variables and model structures are discussed in detail in the following section.

4. Model development

4.1. Input variables & model structure

For all selected sites, we have taken the differences of GWL, DTWS and GLDAS water content between two consecutive observations. These differences represent the changes in GWL and the corresponding changes in DTWS and GLDAS water content. We include these differences in the models because the changes in DTWS and GLDAS water content should relate to the corresponding changes in GWL.

As shown in Table 2, the model inputs at an observation time $t_n$ are $(GWL_{t_n} - GWL_{t_{n-1}})$ for GWL, $(TWS_{t_n} - TWS_{t_{n-1}})$ for DTWS and $(GLDAS_{t_n} - GLDAS_{t_{n-1}})$ for GLDAS water content. Here $GWL_{t_n}$ and $GWL_{t_{n-1}}$ are the observed GWL at observation times $t_n$ and $t_{n-1}$. Similarly, $TWS_{t_n}$ and $TWS_{t_{n-1}}$ are the observed DTWS, and $GLDAS_{t_n}$ and $GLDAS_{t_{n-1}}$ the observed GLDAS water content, at those same times.

From this point onwards, GWL, DTWS and GLDAS water content refer to these differences between two consecutive observations.

We found a very high correlation between DTWS and GLDAS water content for all selected sites (Table 3). Because of this, and as in previous studies (Rodell et al., 2009; Sun, 2013), we have not included GLDAS water content in our models. GWL, DTWS, $(GWL_{t_n} - GWL_{t_{n-1}})$ and $(TWS_{t_n} - TWS_{t_{n-1}})$ data for all observation sites are shown in Fig. 2.

To study the relationship between GWL and DTWS, we have built two sets of models. In the first set, we have used only DTWS as the explanatory variable and GWL as the dependent variable (GWL ~ DTWS) to build Linear Regression Model (LRM), Support Vector Regression (SVR) and Artificial Neural Network (ANN) models. In the second set, we have built SVR and ANN models using meteorological variables (Maximum Temperature, Minimum Temperature, Precipitation, Wind and Humidity) along with DTWS as explanatory variables, and GWL as the dependent variable. In the SVR and ANN models, the Maximum Temperature (MaxTemp), Minimum Temperature (MinTemp), Wind (Wind) and Humidity (Humid) values for an observation time $t_n$ are the averages of these variables between the observation times $t_{n-1}$ and $t_n$. By contrast, the value of Precipitation (Prcpt) used in the models for observation time $t_n$ is …

²¹ From Regression to Classification in Support Vector Machines by Massimiliano Pontil, Ryan Rifkin and Theodoros Evgeniou.
²² Smola, Alex J., and Bernhard Schölkopf. "A tutorial on support vector regression." Statistics and Computing 14.3 (2004): 199–222.
²³ KKT Condition: Karush–Kuhn–Tucker Optimality Condition.
²⁴ RBF: Radial Basis Function.
²⁵ Demuth, Howard B. et al., Neural Network Design, 2nd Edition.
²⁶ Rprop: Description and Implementation Details, Martin Riedmiller, Technical Report, January 1994.
²⁷ https://www.r-project.org/.
²⁸ https://cran.r-project.org/web/packages/nortest/index.html.
²⁹ https://cran.r-project.org/web/packages/e1071/index.html.
³⁰ https://cran.r-project.org/web/packages/neuralnet/neuralnet.pdf.
³¹ http://scikit-learn.org/stable/index.html.
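The ε-SVR of Eq. (1) and the kernels listed in Section 3.2 can be sketched with scikit-learn, which the study reports using alongside R's e1071. scikit-learn parameterises its kernels through `gamma`, `coef0` and `degree`. The paper's kernels therefore correspond to these settings:
- Polynomial: `gamma=1`, `coef0=1`, `degree=p`.
- RBF: `gamma = 1/(2σ²)`.
- Sigmoid: `gamma=β0`, `coef0=β1`.

The data below are synthetic stand-ins, and the C, ε and σ values are arbitrary, not the tuned values from the study.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel
from sklearn.svm import SVR

# Kernels as written in Section 3.2 (x and xi are 1-D feature vectors).
def k_poly(x, xi, p=2):
    return (x @ xi + 1.0) ** p

def k_rbf(x, xi, sigma=1.0):
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma ** 2))

def k_sigmoid(x, xi, beta0=0.5, beta1=-1.0):
    return np.tanh(beta0 * (x @ xi) + beta1)

x, xi, sigma = np.array([0.3, -1.2]), np.array([0.8, 0.5]), 0.7
X, Xi = x.reshape(1, -1), xi.reshape(1, -1)

# The same kernels through scikit-learn's gamma / coef0 / degree parameterisation.
assert np.isclose(k_poly(x, xi, p=3), polynomial_kernel(X, Xi, degree=3, gamma=1.0, coef0=1.0)[0, 0])
assert np.isclose(k_rbf(x, xi, sigma), rbf_kernel(X, Xi, gamma=1.0 / (2.0 * sigma ** 2))[0, 0])
assert np.isclose(k_sigmoid(x, xi), sigmoid_kernel(X, Xi, gamma=0.5, coef0=-1.0)[0, 0])

# Synthetic stand-in for (change in DTWS, cm) -> (change in GWL, m); NOT the study's data.
# Assumed negative slope: GWL is a depth below ground, so more storage means shallower water.
rng = np.random.default_rng(0)
d_tws = rng.uniform(-10.0, 10.0, size=(60, 1))
d_gwl = -0.3 * d_tws[:, 0] + rng.normal(0.0, 0.5, size=60)

# epsilon-SVR of Eq. (1): C trades flatness against deviations larger than epsilon.
model = SVR(kernel="rbf", C=10.0, epsilon=0.2, gamma=1.0 / (2.0 * 5.0 ** 2))
model.fit(d_tws, d_gwl)

print("support vectors:", len(model.support_), "of", len(d_gwl))
print("predicted change in GWL for +5 cm DTWS:", model.predict([[5.0]])[0])
```

Only points lying outside the ε-tube (or on its boundary) become support vectors. Widening `epsilon` therefore typically reduces their number and smooths the fit.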
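The layer recursion in Section 3.3 can be sketched in NumPy as a plain forward pass. The 1-3-1 architecture, logistic hidden activation, linear output and weight values below are hypothetical. Training by back-propagation or Rprop, as the section notes, would adjust W and b and is not shown.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def identity(z):
    return z

def forward(p, weights, biases, activations):
    """a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}) for m = 0..M-1, with a^0 = p.
    Returns a = a^M, the network output."""
    a = np.asarray(p, dtype=float)
    for W, b, f in zip(weights, biases, activations):
        a = f(W @ a + b)
    return a

# Hypothetical 1-3-1 network: one input (e.g. normalized change in DTWS),
# three logistic hidden nodes, one linear output node. Weights are arbitrary.
W1 = np.array([[1.0], [-2.0], [0.5]])
b1 = np.array([0.0, 0.5, -0.5])
W2 = np.array([[0.7, -0.3, 1.2]])
b2 = np.array([0.1])

out = forward([0.4], [W1, W2], [b1, b2], [logistic, identity])
print("network output (M = 2 layers):", out)
```

Here M = 2 (one hidden layer plus the output layer), so the loop applies the recursion twice. Adding hidden layers only means appending more (W, b, f) triples.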
Table 2
Variables (GWL, DTWS and GLDAS water content) in model structure.

Observation time | Observed GWL | GWL used in model structure | Observed DTWS | DTWS used in model structure | Observed GLDAS water content | GLDAS water content used in model structure
$t_1$ | $GWL_{t_1}$ | NA | $TWS_{t_1}$ | NA | $GLDAS_{t_1}$ | NA
$t_2$ | $GWL_{t_2}$ | $(GWL_{t_2} - GWL_{t_1})$ | $TWS_{t_2}$ | $(TWS_{t_2} - TWS_{t_1})$ | $GLDAS_{t_2}$ | $(GLDAS_{t_2} - GLDAS_{t_1})$
$t_3$ | $GWL_{t_3}$ | $(GWL_{t_3} - GWL_{t_2})$ | $TWS_{t_3}$ | $(TWS_{t_3} - TWS_{t_2})$ | $GLDAS_{t_3}$ | $(GLDAS_{t_3} - GLDAS_{t_2})$
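Table 2's differencing and the interval averaging of covariates described in Section 4.1 can be sketched as below. Several details are assumptions: the helper names, the data values, and averaging daily covariate values over the half-open interval (t_{n-1}, t_n]. The text does not state how interval end points were handled, and its description of the precipitation covariate is cut off, so precipitation is not modelled here.

```python
from datetime import date, timedelta

def consecutive_differences(values):
    """Table 2 model input: x(t_n) - x(t_{n-1}); the first observation has none (NA)."""
    return [None] + [curr - prev for prev, curr in zip(values, values[1:])]

def interval_means(obs_dates, daily):
    """Mean of a daily covariate over (t_{n-1}, t_n] for each n >= 2.

    `daily` maps datetime.date -> value. The half-open interval is an assumption.
    """
    means = [None]
    for start, end in zip(obs_dates, obs_dates[1:]):
        vals = [v for d, v in daily.items() if start < d <= end]
        means.append(sum(vals) / len(vals) if vals else None)
    return means

# Hypothetical readings for one well at unevenly spaced times.
t = [date(2005, 1, 10), date(2005, 5, 15), date(2005, 8, 20)]
gwl = [6.25, 8.5, 4.25]      # m below ground level
tws = [-2.0, -9.5, 12.5]     # cm equivalent water thickness (anomaly)

# Hypothetical daily maximum temperature (deg C) covering both intervals.
daily_tmax = {}
day = t[0] + timedelta(days=1)
while day <= t[-1]:
    daily_tmax[day] = 31.0 if day <= t[1] else 34.0
    day += timedelta(days=1)

print("dGWL   :", consecutive_differences(gwl))     # [None, 2.25, -4.25]
print("dTWS   :", consecutive_differences(tws))     # [None, -7.5, 22.0]
print("MaxTemp:", interval_means(t, daily_tmax))    # [None, 31.0, 34.0]
```

The leading `None` mirrors the NA row for $t_1$ in Table 2. In practice that row is dropped before fitting, so a site with N observations contributes N − 1 training or test samples.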
order to avoid saturation. The normalized value for a data point X of a variable is calculated as $X_{\mathrm{Normalized}} = (X - X_{\mathrm{Min}})/(X_{\mathrm{Max}} - X_{\mathrm{Min}})$, where $X_{\mathrm{Min}}$ and $X_{\mathrm{Max}}$ are the minimum and maximum values of the variable over the entire dataset. Normalization scales the values of each variable to the range 0 to 1; therefore, after the models were developed, all fitted and predicted values of the dependent variable were transformed back to the original scale by inverting this equation.

While developing the Linear Regression Models, we performed residual analysis, including normality tests on the standardized residuals, to ensure the robustness of the models. SVR models have been tuned for cost (C), error tolerance (ε) and kernel function in order to improve model performance. Similarly, ANN models have been tuned for hidden layer structure and activation functions.

4.2. Training & test data set creation

For each observation site, we have created training and test data sets in an 80:20 ratio. All models (LRM, SVR and ANN) for the two sets of model structures (GWL ~ DTWS and GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG) have been trained and tested with the same training and testing datasets. 5-fold cross validation has been performed to ensure the robustness and reliability of the models.

4.3. Performance criteria

$y_i$ are observation pairs of x and y, respectively, for the ith observation. A correlation test also needs to be performed (we have conducted the Pearson correlation test at the 95% confidence level) to ensure that a correlation exists between the two variables. The value of |ρ| varies from 0 to 1, where 1 signifies perfect correlation (ρ = +1 for perfect positive correlation and ρ = −1 for perfect negative correlation) and 0 indicates no correlation. Therefore, a value of ρ close to +1 implies a high degree of correlation between fitted and actual values (for the training set) or between predicted and actual values (for the testing set). In other words, if the fitted and actual values for the training set (or the predicted and actual values for the testing set) are very close, the correlation coefficient will be high (close to +1), indicating a good fit of the model (training set) or good prediction performance (testing set).

Root Mean Square Error (RMSE) measures the deviations of the fitted and predicted values (for the training and testing set, respectively) from the actual values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - z_i\right)^2},$$

where $n$ is the number of elements in the training or testing set, and $y_i$ and $z_i$ are the actual and the fitted or predicted (for the training or testing set, respectively) values of the dependent variable. For a perfectly fitted or predicted model ($y_i = z_i$ for all $i$), RMSE is 0, and it increases as the deviation between the actual and the fitted or predicted values of the dependent variable increases. Therefore, a lower RMSE indicates a better fit and better prediction (for the training and testing set, respectively), and a higher RMSE indicates the opposite.
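The normalization and RMSE definitions above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the minimum and maximum are taken over the whole series as the text describes, and the sample values are made up.

```python
import math
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale values to [0, 1] with (X - X_min) / (X_max - X_min).

    Returns the scaled values plus X_min and X_max so they can be inverted later.
    """
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("min-max scaling is undefined for a constant variable")
    return [(x - x_min) / span for x in values], x_min, x_max


def invert_min_max(scaled: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Map normalized fitted or predicted values back to the original units."""
    return [s * (x_max - x_min) + x_min for s in scaled]


def rmse(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """sqrt((1/n) * sum((y_i - z_i)^2)) over a training or testing set."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    return math.sqrt(sum((y - z) ** 2 for y, z in zip(actual, estimated)) / n)


if __name__ == "__main__":
    gwl = [3.0, 5.0, 9.0]  # hypothetical GWL values (mbgl), not study data
    scaled, lo, hi = min_max_normalize(gwl)
    print(scaled)
    print(invert_min_max(scaled, lo, hi))
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Returning `x_min` and `x_max` alongside the scaled values keeps the inversion step explicit; whether RMSE is then reported on the normalized or the original scale is a separate choice that this sketch does not settle.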
5.1. R² and ρ values: GWL and DTWS

For all our observation sites, we have found that DTWS is a highly significant variable for GWL, and the R² value of the Linear Regression Model GWL ~ DTWS varies from 0.3648 (for Mhow) to 0.7428 (for Kotba).

Name of the Site | R² Value | Adjusted R² Value
Kotba | 0.7428 | 0.735
Mhow | 0.3648 | 0.3549
Panitola | 0.6725 | 0.6649
Sathamba | 0.6411 | 0.6299
Sirigeri | 0.4905 | 0.4751
Since R² measures the proportion of the variability in the dependent variable (GWL) that is explained by the independent variable (DTWS), the variability of GWL explained by DTWS for the selected observation sites ranges from 36.48% (for Mhow) to 74.28% (for Kotba). It can be observed from Table 5 that GWL and DTWS are highly correlated, and |ρ| between these two variables ranges from 0.6040 (for Mhow) to 0.8619 (for Kotba). The values of ρ are negative because an increase in DTWS indicates a rise in groundwater storage, which causes a decrease in the measured GWL, since GWL is measured in meters below ground level (mbgl). This negative correlation between GWL and DTWS measurements can also be observed in Fig. 2. The strong correlation between GWL and DTWS observed here is consistent with previous studies (Shamsudduha et al., 2012; Panda and Wahr, 2015).

Table 5
Relationship between GWL and DTWS: ρ Values between GWL and DTWS.
Name of the Site | ρ Value | p-Value (Pearson Correlation Test)
Kotba | -0.8619 | 2.94e-11
Mhow | -0.6040 | 7.92e-08
Panitola | -0.8201 | 5.52e-12
Sathamba | -0.8007 | 1.31e-08
Sirigeri | -0.7004 | 2.81e-06
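Because GWL ~ DTWS is a linear regression with a single predictor and an intercept, its R² should equal the square of Pearson's ρ. The pure-Python sketch below (function names hypothetical) computes both quantities and checks the reported R² and ρ values against each other; since ρ is reported to four decimals, agreement can only be expected to within about 10⁻⁴.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired observations x_i, y_i."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ols_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of the least-squares fit y = b0 + b1 * x (one predictor, with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# (R^2, rho) per site, copied from the R^2 table and Table 5 above.
REPORTED: Dict[str, Tuple[float, float]] = {
    "Kotba": (0.7428, -0.8619),
    "Mhow": (0.3648, -0.6040),
    "Panitola": (0.6725, -0.8201),
    "Sathamba": (0.6411, -0.8007),
    "Sirigeri": (0.4905, -0.7004),
}

if __name__ == "__main__":
    for site, (r2, rho) in REPORTED.items():
        # rho is rounded to 4 decimals, so rho**2 matches R^2 only to about 1e-4.
        print(f"{site}: R2={r2:.4f}  rho^2={rho ** 2:.4f}")
```

For example, Kotba's ρ = -0.8619 gives ρ² ≈ 0.7429, against a reported R² of 0.7428; the sign of ρ is lost on squaring, which is why R² alone cannot show the negative relationship discussed above.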
One possible reason for the low values of R² and ρ between GWL and DTWS for Mhow and Sirigeri is the presence of two large reservoirs, Indira Sagar (Latitude: 22.24°North and Longitude: 76.52°East) and the Tungabhadra reservoir (Latitude: 15.24°North and Longitude: 76.31°East), respectively. The Indira Sagar reservoir is extremely close to the GRACE 1 degree resolution grid for Mhow (Latitude: 22.5°North to 23.5°North and Longitude: 75.5°East to 76.5°East), and similarly the Tungabhadra reservoir is very close to the GRACE 1 degree resolution grid for Sirigeri (Latitude: 14.5°North to 15.5°North and Longitude: 76.5°East to 77.5°East). The Water Resource Information System of India (India-WRIS)³³ maintains and publishes reservoir level data for India, and these data show that during the study period (2005 to 2013), the Indira Sagar and Tungabhadra reservoirs experienced average yearly fluctuations (defined in terms of range, the difference between maximum and minimum values) of 7.143 BCM³⁴ (equivalent to 7.143 × 10¹² kg of water) and 3.102 BCM (equivalent to 3.102 × 10¹² kg of water), respectively (refer to Figs. 3 and 4). These large variations of water mass in the Indira Sagar and Tungabhadra reservoirs have not been accounted for in the GRACE DTWS data for the neighbouring GRACE 1 degree resolution grids for Mhow and Sirigeri, respectively, and could thus affect the relationship between GWL and DTWS at these two sites.

Fig. 3. Reservoir Storage Data (2005 To 2013): Indira Sagar Reservoir.
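The mass equivalence quoted above follows from 1 BCM = 10⁹ m³ and an assumed water density of 1000 kg/m³. The sketch below also estimates how far each reservoir lies from the corresponding GRACE 1 degree cell, using a flat-Earth (equirectangular) approximation that is an assumption of this example rather than a method from the study; the helper names are hypothetical, and the coordinates are those given in the text.

```python
import math
from typing import Tuple

M3_PER_BCM = 1.0e9                # one billion cubic metres
WATER_DENSITY_KG_PER_M3 = 1000.0  # assumed nominal density of fresh water
KM_PER_DEG_LAT = 111.32           # approximate length of one degree of latitude


def bcm_to_kg(volume_bcm: float) -> float:
    """Convert a reservoir storage change in BCM to a water mass in kg."""
    return volume_bcm * M3_PER_BCM * WATER_DENSITY_KG_PER_M3


def km_outside_cell(lat: float, lon: float,
                    lat_range: Tuple[float, float],
                    lon_range: Tuple[float, float]) -> float:
    """Approximate distance in km from a point to a lat/lon grid cell.

    Returns 0.0 when the point lies inside the cell. Uses an equirectangular
    (flat-Earth) approximation, adequate for a few tens of kilometres.
    """
    dlat = max(lat_range[0] - lat, 0.0, lat - lat_range[1])
    dlon = max(lon_range[0] - lon, 0.0, lon - lon_range[1])
    dy = dlat * KM_PER_DEG_LAT
    dx = dlon * KM_PER_DEG_LAT * math.cos(math.radians(lat))
    return math.hypot(dx, dy)


if __name__ == "__main__":
    print(bcm_to_kg(7.143), bcm_to_kg(3.102))
    # Indira Sagar vs. the Mhow cell, then Tungabhadra vs. the Sirigeri cell.
    print(km_outside_cell(22.24, 76.52, (22.5, 23.5), (75.5, 76.5)))
    print(km_outside_cell(15.24, 76.31, (14.5, 15.5), (76.5, 77.5)))
```

Under this approximation, Indira Sagar lies roughly 29 km from the south-east corner of the Mhow cell, and the Tungabhadra reservoir roughly 20 km west of the Sirigeri cell's western edge.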
Fig. 5. Actual and Fitted Values for Training Set - Panitola; Model: GWL ~ DTWS.
Fig. 6. Actual and Predicted Values for Test Set - Panitola; Model: GWL ~ DTWS.
Table 6
Train Data RMSE Model: GWL ~ DTWS.
Name of the Site | LRM | SVR | ANN
Kotba | 0.1637(2) | 0.1739(3) | 0.1627(1)
Mhow | 0.1497(3) | 0.1423(1) | 0.1430(2)
Panitola | 0.1434(3) | 0.1433(2) | 0.1425(1)
Sathamba | 0.1394(3) | 0.1331(2) | 0.1289(1)
Sirigeri | 0.1510(3) | 0.1487(2) | 0.1432(1)
AVERAGE RANK | 2.8 | 2.0 | 1.2

Table 8
Test Data RMSE Model: GWL ~ DTWS.
Name of the Site | LRM | SVR | ANN
Kotba | 0.1679(3) | 0.1611(1) | 0.1675(2)
Mhow | 0.1538(3) | 0.1445(2) | 0.1414(1)
Panitola | 0.1480(1) | 0.1517(3) | 0.1496(2)
Sathamba | 0.1392(3) | 0.1379(2) | 0.1309(1)
Sirigeri | 0.1452(2) | 0.1404(1) | 0.1482(3)
AVERAGE RANK | 2.4 | 1.8 | 1.8
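The AVERAGE RANK rows can be reproduced by ranking the models at each site (1 = best) and averaging the ranks over the five sites. A minimal sketch, assuming lower is better for RMSE and higher is better for ρ; the function name is hypothetical, and ties, which do not occur in these tables, are simply broken by sort order.

```python
from typing import Dict, List

MODELS = ["LRM", "SVR", "ANN"]


def average_ranks(table: Dict[str, List[float]], models: List[str],
                  higher_is_better: bool = False) -> Dict[str, float]:
    """Rank models per site (1 = best) and average the ranks over all sites."""
    totals = {m: 0 for m in models}
    for values in table.values():
        # Sort model indices by score; reverse when larger scores are better.
        order = sorted(range(len(models)), key=lambda i: values[i],
                       reverse=higher_is_better)
        for rank, idx in enumerate(order, start=1):
            totals[models[idx]] += rank
    return {m: totals[m] / len(table) for m in models}


TABLE_6_TRAIN_RMSE = {  # columns: LRM, SVR, ANN
    "Kotba": [0.1637, 0.1739, 0.1627],
    "Mhow": [0.1497, 0.1423, 0.1430],
    "Panitola": [0.1434, 0.1433, 0.1425],
    "Sathamba": [0.1394, 0.1331, 0.1289],
    "Sirigeri": [0.1510, 0.1487, 0.1432],
}
TABLE_8_TEST_RMSE = {
    "Kotba": [0.1679, 0.1611, 0.1675],
    "Mhow": [0.1538, 0.1445, 0.1414],
    "Panitola": [0.1480, 0.1517, 0.1496],
    "Sathamba": [0.1392, 0.1379, 0.1309],
    "Sirigeri": [0.1452, 0.1404, 0.1482],
}
TABLE_9_TEST_RHO = {
    "Kotba": [0.8727, 0.9017, 0.8756],
    "Mhow": [0.6277, 0.6416, 0.6577],
    "Panitola": [0.8173, 0.8192, 0.8176],
    "Sathamba": [0.8424, 0.8441, 0.8494],
    "Sirigeri": [0.7386, 0.7531, 0.7366],
}

if __name__ == "__main__":
    print(average_ranks(TABLE_6_TRAIN_RMSE, MODELS))
    print(average_ranks(TABLE_8_TEST_RMSE, MODELS))
    print(average_ranks(TABLE_9_TEST_RHO, MODELS, higher_is_better=True))
```

The same function serves both metrics; only the sort direction changes, which is why `higher_is_better=True` is passed for the ρ table.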
Table 9
Test Data ρ Model: GWL ~ DTWS.
Name of the Site | LRM | SVR | ANN
Kotba | 0.8727(3) | 0.9017(1) | 0.8756(2)
Mhow | 0.6277(3) | 0.6416(2) | 0.6577(1)
Panitola | 0.8173(3) | 0.8192(1) | 0.8176(2)
Sathamba | 0.8424(3) | 0.8441(2) | 0.8494(1)
Sirigeri | 0.7386(2) | 0.7531(1) | 0.7366(3)
AVERAGE RANK | 2.8 | 1.4 | 1.8

Table 10
Train Data RMSE Model: GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG.
Name of the Site | SVR | ANN
Kotba | 0.0962(2) | 0.0565(1)
Mhow | 0.1162(2) | 0.1103(1)
Panitola | 0.1236(2) | 0.1178(1)
Sathamba | 0.0772(1) | 0.0859(2)
Sirigeri | 0.1183(1) | 0.1194(2)
AVERAGE RANK | 1.6 | 1.4
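Whether SVR benefits from the meteorological variables can be checked site by site by comparing SVR RMSE across the two model structures (Tables 6 and 10 for training, Tables 8 and 12 for testing). The sketch below computes the relative change; it considers RMSE only (not ρ), and the variable and function names are hypothetical.

```python
from typing import Dict, Tuple

# SVR RMSE per site: (GWL ~ DTWS, GWL ~ DTWS + meteorological variables).
SVR_TRAIN: Dict[str, Tuple[float, float]] = {  # Table 6 -> Table 10
    "Kotba": (0.1739, 0.0962),
    "Mhow": (0.1423, 0.1162),
    "Panitola": (0.1433, 0.1236),
    "Sathamba": (0.1331, 0.0772),
    "Sirigeri": (0.1487, 0.1183),
}
SVR_TEST: Dict[str, Tuple[float, float]] = {  # Table 8 -> Table 12
    "Kotba": (0.1611, 0.1437),
    "Mhow": (0.1445, 0.1394),
    "Panitola": (0.1517, 0.1306),
    "Sathamba": (0.1379, 0.1206),
    "Sirigeri": (0.1404, 0.1475),
}


def percent_change(pairs: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Relative RMSE change in percent; negative means the extended model is better."""
    return {site: 100.0 * (new - old) / old for site, (old, new) in pairs.items()}


if __name__ == "__main__":
    for label, pairs in (("train", SVR_TRAIN), ("test", SVR_TEST)):
        changes = percent_change(pairs)
        improved = sum(1 for c in changes.values() if c < 0)
        print(label, {s: round(c, 1) for s, c in changes.items()},
              f"improved at {improved}/{len(changes)} sites")
```

With these values, SVR's training RMSE decreases at all five sites, while its test RMSE decreases at four of them and rises at Sirigeri (0.1404 to 0.1475).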
data set (Tables 12 and 13), it can easily be observed that SVR clearly outperforms ANN in terms of both RMSE and ρ for all observation sites.

In most cases, both the modelling and the prediction performance of SVR improved considerably with the inclusion of meteorological variables, although the same conclusion cannot be drawn for ANN, because its network structures differ between the two sets of models.

Table 12
Test Data RMSE Model: GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG.
Name of the Site | SVR | ANN
Kotba | 0.1437(1) | 0.1878(2)
Mhow | 0.1394(1) | 0.1481(2)
Panitola | 0.1306(1) | 0.1455(2)
Sathamba | 0.1206(1) | 0.1424(2)
Sirigeri | 0.1475(1) | 0.1500(2)
AVERAGE RANK | 1.0 | 2.0

5.4. Summary & conclusions

Finally, to conclude, we have observed in our study that DTWS is a significant variable to model and predict GWL for the selected observation sites in India. Particularly for a small and irregular
Fig. 7. Actual & Fitted Values for Training Set - Panitola; Model: GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG.
Fig. 8. Actual & Predicted Values for Test Set - Panitola; Model: GWL ~ DTWS + MaxTemp + MinTemp + Prcpt + Wind + Humid + Prcpt_LAG.
Taylor, R.G., Scanlon, B., Döll, P., Rodell, M., Van Beek, R., Wada, Y., Longuevergne, L., Leblanc, M., Famiglietti, J.S., Edmunds, M., et al., 2013. Ground water and climate change. Nature Climate Change 3 (4), 322–329.
Teutschbein, C., Grabs, T., Karlsen, R.H., Laudon, H., Bishop, K., 2015. Hydrological response to changing climate conditions: spatial streamflow variability in the boreal region. Water Resour. Res. 51 (12), 9425–9446.
Tiwari, M.K., Adamowski, J., 2013. Urban water demand forecasting and uncertainty assessment using ensemble wavelet-bootstrap-neural network models. Water Resour. Res. 49 (10), 6486–6507.
Tiwari, M.K., Adamowski, J.F., 2014. Medium-term urban water demand forecasting with limited data using an ensemble wavelet bootstrap machine-learning approach. J. Water Resour. Plann. Manage. 141 (2), 04014053.
Yoon, H., Jun, S.-C., Hyun, Y., Bae, G.-O., Lee, K.-K., 2011. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J. Hydrol. 396 (1), 128–138.