Birkel 2014
of water and solute transport [Mroczkowski et al., 1997; Page et al., 2007]. Kirchner [2006] argued that simple, low-
parameterized models based on dominant process conceptualization can provide a preferable, more realistic rep-
resentation of catchment functioning with identifiable parameters and reduced predictive uncertainty. Such an
approach can also use additional data such as tracers [Seibert and McDonnell, 2002], though there is a trade-off in
terms of additional parameters that may need to be added [Soulsby et al., 2008].
Integration of tracer-based information into conceptual modeling has many potential advantages. For exam-
ple, Vache and McDonnell [2006] used tracers to evaluate the effects of model structure and parameterizations
on flow simulations. More recently, Dunn et al. [2010] and McMillan et al. [2012] examined the effect of model
structures on transit time distributions (TTDs) calculated from coupled conceptual flow and solute transport
models. A central challenge to such initiatives is how to parameterize different mixing assumptions that
impact the transport of solutes and associated travel times [Fenicia et al., 2010]. Further, work by Hrachowitz
et al. [2010a] and Heidbüchel et al. [2012] demonstrated the effect of different wetness regimes and the importance
of evapotranspiration fluxes on the time variance of mixing and transport processes. Most of these studies
have been restricted to the use of tracer input-output relationships for model assessment. Therefore, the
potential of tracer measurements of internal states (e.g., from different soils and groundwater), and how
these can be incorporated into runoff models, is relatively unexplored. Such models may be used to investigate
how landscape heterogeneity mediates the effects of spatial and temporal variability of mixing and transport
processes at the catchment scale [Rinaldo et al., 2011]. Exceptions include Katsuyama et al. [2009] who used
spatially distributed tracer data from a small Japanese catchment to inform a catchment model. Similarly,
Sayama and McDonnell [2009] used a tracer-based model framework to account for the sources of water con-
tributing to the hydrograph. Additionally, Birkel et al. [2011a] used tracers measured in groundwater to test
internal model state variables and constrain predictions. Incorporation of such spatially distributed, high-
frequency tracer sampling over prolonged periods can be a ‘‘more powerful’’ test of models to reproduce
internal states rather than relying on discharge alone [Mroczkowski et al., 1997].
As tracer data become more readily available, opportunities increase for calibrating specific model parameters
against observed tracer data (e.g., for soil water or groundwater). While this has been done for hydrometric
data (e.g., water table levels, soil moisture, etc.) [Freer et al., 2004], tracer data have greater
unrealized potential for calibration based on internal model states and fluxes that contribute to streamflow,
rather than just using tracers measured in streams [Beven, 2012a]. While this may not necessarily result in
improved model performance in terms of flow predictions, it can result in improved parameter identifiabil-
ity and help to reveal model inadequacies [Gupta et al., 1998], for example in terms of representation of
internal storage and fluxes [Price et al., 2012]. The latter is an essential part of conceptual modeling applied
in a learning framework, where identification of errors and rejection of hypotheses can provide improved
understanding of catchment function and subsequent models [Box, 1976; Gupta et al., 2008; Beven, 2012b].
In this paper, we use newly available water isotope data from different soil types and depths, and ground-
water level data collected over a full year to develop a parsimonious (low parameter), conceptual rainfall-
runoff model with coupled tracer transport modules for an upland catchment in Scotland (see Tetzlaff et al.
[2014] for details on the study site and measurements). The model evolved from previous investigations
which focused on simulating the runoff response and tracer composition of stream waters [Birkel et al.,
2010, 2011a]. The new data were used with the overall aim of testing the effect of different types of data on
the calibration of the coupled flow-tracer model in order to attain a conceptualization of internal model
state variables and fluxes that was broadly consistent with empirical data. The specific objectives were to:
(a) test alternative targets in addition to discharge for model calibration and evaluate internal model state
variables and fluxes based on the incorporation of key hydrometric and isotopic data; (b) examine how
parameter identifiability, sensitivity, and the predictive power of flow and tracer simulations are related to
the amount and type of data used for calibration; and (c) investigate the effects of different calibration data
on using the model to estimate stream water age using flux tracking.
Figure 1. Air photo of the Bruntland Burn experimental catchment showing distinct vegetation patterns related to wetness (green areas
can be associated with greater wetness) in the valley bottom saturation areas and the location of instrumentation at the catchment outlet
and the hillslope transect (1—deep histosols (peat soil) in the saturation area, 2—Shallow histosols (peaty gley soil) in the transition zone,
and 3—the freely draining spodosols on the steeper hillslope).
of northern upland areas with igneous (46% granite) and metamorphic (47% pelite) geology and a glacial
legacy (Figure 1). The resulting landscape has steep hillslopes and wide, flat valley bottoms (at around 200
m). Geophysical surveys show that the latter are filled with thick layers (>20 m) of glacial drift (mainly
coarse-textured till), where poorly drained histosols (peats 2 m deep and peaty gley soils 70 cm deep)
have developed. These soils fringe the channel network, maintaining saturated conditions all year and facili-
tating rapid near-surface runoff generation processes, especially saturation overland flow [Birkel et al., 2010;
Tetzlaff et al., 2008]. The steeper slopes have freely draining spodosols (<1 m deep) grading at altitudes
>350 m to thin (<40 cm) inceptisols which facilitate groundwater recharge [Soulsby et al., 1998].
Runoff dynamics are controlled by the expansion and contraction of the saturated riparian areas as a result
of direct precipitation and groundwater seepage from upslope [Birkel et al., 2011a]. The hydropedology is
apparent from the spatial distribution of vegetation communities representing distinct ecohydrological
units (Figure 1). The spodosols are dominated by heather (Calluna vulgaris) moorland; grasses (Molinia caerulea)
indicate the wet soils in the valley bottom, while deeper histosols are dominated by mosses (Sphagnum
spp.). Mean annual precipitation is around 1000 mm with 363 mm actual evapotranspiration. Snow generally
accounts for <5% of precipitation and major snowfall events are unusual [Capell et al., 2013].
Table 1. Summary Statistics of Rainfall-Runoff Characteristics, Groundwater Levels, Additional Hydrometeorological Variables, and Unweighted Isotope Time Series (With n Number of
Samples and Coefficient of Variation CV) Incorporated Into the Modeling Procedure for the Calibration Period From 1 June 2011 to 31 May 2012^a
Data                                           | Unit             | n   | Mean  | Range [min, max] | CV
Hydrology and Hydrometeorology
Discharge (Q)                                  | mm d^-1          | 366 | 1.63  | [0.31, 11.3]     | 1.34
Precipitation (P)                              | mm d^-1          | 366 | 2.41  | [0, 32.5]        | 1.9
Evapotranspiration (ET)                        | mm d^-1          | 366 | 1.26  | [0, 6.2]         | 0.92
Air temperature (AT)                           | °C               | 366 | 7.2   | [-7.1, 17.9]     | 0.64
Relative humidity (h)                          | %                | 366 | 82.8  | [55, 98]         | 0.09
Saturation area extent                         | % catchment area | 366 | 11    | [2, 40]          | 0.81
Groundwater level 1                            | cm               | 300 | -3.7  | [-5.9, 0.2]      | 0.27
Groundwater level 2                            | cm               | 345 | -18.6 | [-42.6, -11.7]   | 0.3
Isotopic (δ2H)
Stream                                         | ‰                | 317 | -58.1 | [-65.8, -53.6]   | 0.03
Rain                                           | ‰                | 192 | -56.3 | [-143, -12.6]    | 0.41
Soil water 1 (in saturation area; 10 cm depth) | ‰                | 45  | -55.9 | [-61.6, -50.7]   | 0.04
Soil water 1 (in saturation area; 30 cm depth) | ‰                | 44  | -59.2 | [-61.8, -56.5]   | 0.02
Soil water 3 (0–20 cm)                         | ‰                | 47  | -58.8 | [-82.9, -44.4]   | 0.16
Soil water 3 (40–60 cm)                        | ‰                | 43  | -59.3 | [-66.7, -53.9]   | 0.05
Groundwater wells                              | ‰                | 42  | -61.2 | [-63.2, -58.3]   | 0.02
^a Numbers 1 and 3 refer to sampling stations: 1 = Deep Histosol (peat soil) in saturation area; 2 = Shallow Histosol (peaty gley) in transition zone; and 3 = freely draining spodosol
on steeper hillslope.
weather station 1 km from the BB was used for daily potential evapotranspiration estimates (Penman-Mon-
teith) and to derive mean catchment precipitation.
Three monitoring stations were established to characterize the soil moisture and groundwater dynamics
and associated isotope characteristics along a transect from the valley bottom to the upper slope (Figure 1).
The location was based on mapping the main hydropedological units [Boorman et al., 1995] and is represen-
tative of many catchments in the Scottish Highlands [Tetzlaff et al., 2014]. The transect forms a catena of
deep histosols (peats) in the valley bottom (site 1), shallow histosols (peaty gleys) in the lower slope (site 2),
and spodosols (site 3) on the upper slope.
Stations were equipped with two capacitance loggers for comparative recording of groundwater levels (15
min) in a screened well. Pairs of suction lysimeters (Rhizosphere MacroRhizons) were installed at 10, 30, and
50 cm soil depths (where volumetric moisture content was also measured using Campbell TDR probes) and
sampled over an hour under low suction once per week for isotope analysis. Site 1 was only sampled at 10
and 30 cm depth. We also sampled two groundwater wells for isotopes; one close to the BB outlet and one
west of site 3 at the same elevation [Birkel et al., 2011a]. Samples were analyzed for deuterium (δ2H) and
oxygen-18 (δ18O) using a Los Gatos DLT-100 laser spectrometer (precision ±0.4‰ for δ2H and ±0.1‰ for
δ18O). Results are reported in delta notation calibrated to Vienna Standard Mean Ocean Water (VSMOW)
standards. Replicates of soil and ground water isotopes were averaged and summaries of all measurements
are in Table 1. Daily precipitation and streamflow isotopes from 1 October 2008 to 30 September 2009 were
used as an independent model test period.
Figure 2. Time series of (a) rainfall and (b) runoff and evapotranspiration. Isotopic composition of (c) precipitation (weighted, shown as
proportional circles of deuterium flux) against stream water and (d) soil waters in the freely draining podzol on the steeper hillslope (S3)
and the riparian peat soil (S1) at 10 cm (upper horizon) and 30 cm (deeper horizon) depths relative to stream water.
horizons at 10 cm depth showed more marked variability tracking isotope inputs in precipitation most nota-
bly at site 3 (Figure 2d). This variability was damped in deeper horizons with more negative signatures,
reflecting recharge in larger winter events with low evaporation; features that were very similar to sampled
groundwater (Table 1).
Precipitation isotopic composition plots close to the Global Meteoric Water Line as indicated by the Local
Meteoric Water Line (LMWL), whereas soil waters indicate a slight deviation in slope (Figure 3). Soil water
isotope signatures become heavier during warmer, dry periods consistent with evaporative fractionation.
Surprisingly, deeper soil horizons (S3_50cm and S1_30cm) indicate a greater change in slope bracketing
the stream isotope local evaporation line (LEL) compared to the superficial measurements at 10 cm depths.
This probably explains the deviation of the stream LEL from meteoric waters; however, this was not as
extreme as that reported by Birkel et al. [2011a] for the warmer year of 2008/2009. The spread of soil water
isotope samples at sites 1 and 3 captures the range of variability observed, which is greatest in the 10 cm
deep samples and markedly damped even at 30 cm (for S1) and 50 cm (for S3). Stream water samples plot
entirely within the range of the sampled soil waters, and are almost entirely covered by samples collected
at site 1 (10 and 30 cm). In contrast, the soil samples at site 3 10 and 30 cm (not shown) were much more
variable than the stream. This is consistent with the saturation area in the valley bottom area providing a
large storage that mixes different source waters before contributing to streamflow [Tetzlaff et al., 2014]. Pre-
vious modeling studies here supported this hypothesis and provided the basis for further testing in the
model development described below [Birkel et al., 2011a].
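As an illustration of how a local meteoric water line or a local evaporation line can be constructed from paired isotope samples, the sketch below fits δ2H = slope × δ18O + intercept by ordinary least squares and compares the slope with that of the commonly used global meteoric water line (δ2H = 8 δ18O + 10). The function name and the sample values are hypothetical and are not data from this study; the authors' own fitting procedure is not described in the text.

```python
def fit_water_line(d18o, d2h):
    """Ordinary least-squares fit of d2H = slope * d18O + intercept."""
    n = len(d18o)
    if n != len(d2h) or n < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    mean_x = sum(d18o) / n
    mean_y = sum(d2h) / n
    sxx = sum((x - mean_x) ** 2 for x in d18o)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(d18o, d2h))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Global meteoric water line used as the reference (d2H = 8 * d18O + 10).
GMWL_SLOPE, GMWL_INTERCEPT = 8.0, 10.0

# Synthetic samples (assumption, for demonstration only) lying on a line
# with a slope of 5, i.e., flatter than the GMWL as expected for evaporated water.
synthetic_d18o = [-9.0, -8.5, -8.0, -7.5]
synthetic_d2h = [5.0 * x - 15.0 for x in synthetic_d18o]

slope, intercept = fit_water_line(synthetic_d18o, synthetic_d2h)
flatter_than_gmwl = slope < GMWL_SLOPE
```

A fitted slope clearly below 8 would be consistent with the evaporative fractionation discussed above, whereas precipitation samples would be expected to give a slope close to the GMWL.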
Figure 3. Isotope (δ2H and δ18O) signatures of rainfall, stream water, soil water in the riparian peat (S1; 10 and 30 cm depth), and the freely
draining podzol on the steeper hillslope (S3; 10 and 50 cm depth). The dashed lines represent the global meteoric water line (GMWL) and
the local meteoric line (LMWL). Stream and soil isotopes were used for construction of Local Evaporation Lines (LEL). Please note that col-
ors match with Figure 1.
data—we test the effect of different data and objective functions (OFs) used in calibration on simulations of
model states, fluxes, and water age (section 4.3.). This was evaluated using local sensitivity analysis for visu-
alization and global sensitivity methods to estimate identifiability and the relevance of parameters for cali-
bration (section 4.4.). We also test the model calibrated to the 2011/2012 study period for the rainfall-runoff
isotope data from 2008/2009 and vice versa. Analysis used the R language [R Core Team, 2013].
Subsequently, catchment precipitation P and evapotranspiration ET are partitioned into the hillslope (Pup,
ETup) and saturation area (Psat, ETsat) according to the extent of fSAT (equations (2–5)).
Previous geochemical tracers identified distinct groundwater and soil water sources contributing to runoff
[see Birkel et al., 2010]. This was conceptualized in the original model as an additional linear groundwater
Figure 4. Conceptual diagram of the simple model structure developed indicating internal state variables (active storage components Sup,
Slow, and Ssat with their respective calibrated passive storage parameters upSp, lowSp, and satSp resulting in the storage concentrations
cSup, cSlow, and cSsat) and fluxes (Q1, Re, Q2, and stream Q with c indicative of concentrations) with model parameters in red. Note that c
refers to the coupled isotope transport and that observations representative of model states and fluxes used for calibration are shown in
the lower right corners of each of the storage boxes.
reservoir contributing 30–40% of annual streamflow. The model operated on daily time steps with a
total of five parameters.
Table 2. Algorithms Involving Hydrometric and Isotope Data in Parsimonious Model Structure of the Evolving Bruntland Burn Model

Model component: Hillslope (Sup, Slow) and saturation area (Ssat) reservoirs conform the active storage (Sn)
Perceptual model: Distinct topography, geology, and soils. Hillslopes drain into saturation area and recharge (Re) groundwater (Slow)
Conceptual model equations:
  (6) $\frac{dS_{up}}{dt} = P_{up} - ET_{up} - Re - Q_1$
  (7) $\frac{dS_{low}}{dt} = Re - Q_2$
  (8) $\frac{dS_{sat}}{dt} = P_{sat} - ET_{sat} + Q_1 + Q_2 - Q$
Representative observations: Soil isotope and groundwater level observations
References: Tetzlaff et al. [2008]; modified according to Birkel et al. [2010]

Model component: Evapotranspiration (ET)
Perceptual model: Vegetation adjusted Penman-Monteith equation
Conceptual model equations:
  (9) $ET = \frac{\Delta R_n + \rho_a c_p (\delta e)\, r_a^{-1}}{\left[\Delta + \gamma \left(1 + \frac{r_c}{r_a}\right)\right] L_v}$
Representative observations: Meteorological station
References: Dunn and McKay [1995]

Model component: Interception (I)
Perceptual model: Intercepted precipitation subject to evaporation. Soil moisture losses due to transpiration and evaporation
Conceptual model equations:
  (10) if $(P > ET)$: $I = P - ET$
  (11) if $(P < ET)$: $I = P - ET$; $P = 0$; $S_n - ET$
Representative observations: Empirical observations

Model component: Runoff generation
Perceptual model: Nonlinear streamflow generation (Q), linear contributing fluxes (Recharge Re, Q1, Q2)
Conceptual model equations:
  (12) $Q_1 = a\, S_{up}$
  (13) $Re = r\, S_{up}$
  (14) $Q_2 = b\, S_{low}$
  (15) $Q = k\, (S_{sat})^{1+\alpha}$
Representative observations: Groundwater level observations and measured discharge at outlet
References: Modified according to Birkel et al. [2011a]

Model component: Mixing model
Perceptual model: Predominantly complete mixing due to high wetness with some degree of preferential recharge (cPRe). Dynamic mixing volumes (MV) according to wetness state
Conceptual model equations:
  (16) $MV_{up} = \mathit{upSp}\, (1 - f_{SAT})$
  (17) $MV_{rip} = \mathit{satSp}\, f_{SAT}$
  (18) $S = \sum S_n + S_p$
  (19) $\frac{d(cS)}{dt} = \sum_j c_{I,j} I_j - \sum_k c_n O_k$
  (20) $\frac{d(cS_{up})}{dt} = c_P P_{up} - c_E ET_{up} - c\, Q_1 - c_P Re$
Representative observations: Soil isotope observations and isotopes in stream
References: Modified according to Birkel et al. [2011a]

Model component: Isotopic fractionation
Perceptual model: Potential for fractionation in saturation areas and upper soil horizons
Conceptual model equations:
  (21) $c_{E,up},\, c_{E,sat} = \frac{\alpha c - h c_A - \varepsilon}{1 - h + 10^{-3} \varepsilon_K}$
Representative observations: Stream and surface saturation area Local Evaporation Lines
References: Gibson and Edwards [2002]; modified according to Birkel et al. [2011a]

Model component: Age dating
Perceptual model: Distribution of water age (pF,Q) of all contributing fluxes N to total discharge Q
Conceptual model equations:
  (22) $p_{F,Q} = \sum_{n=1}^{N} p_{F,Q_n}(t_j - t_i, t_j)\, \frac{Q_n(t_j)}{Q(t_j)}$
References: Hrachowitz et al. [2013]
from the saturation areas connected to the stream. This also allows mixing with resident waters in a way
conceptually similar to Seibert et al. [2009].
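The storage and flux equations (6)–(8) and (12)–(15) of Table 2 can be sketched as a single daily update. This is a minimal illustration only: the explicit Euler step, the class and function names, and the absence of any guard against negative storage are assumptions made here, and the partitioned inputs (Pup, ETup, Psat, ETsat) are taken as given rather than derived from fSAT. The exponent parameter of equation (15) is written `alpha` in the code.

```python
from dataclasses import dataclass


@dataclass
class Stores:
    s_up: float   # upper hillslope storage Sup (mm)
    s_low: float  # hillslope groundwater storage Slow (mm)
    s_sat: float  # saturation area storage Ssat (mm)


def step(st, p_up, et_up, p_sat, et_sat, a, b, r, k, alpha, dt=1.0):
    """Advance the three stores by one time step (explicit Euler, assumed)."""
    q1 = a * st.s_up                      # eq. (12): hillslope runoff to saturation area
    re = r * st.s_up                      # eq. (13): recharge to groundwater
    q2 = b * st.s_low                     # eq. (14): groundwater flow to saturation area
    q = k * st.s_sat ** (1.0 + alpha)     # eq. (15): nonlinear streamflow generation
    new = Stores(
        s_up=st.s_up + dt * (p_up - et_up - re - q1),            # eq. (6)
        s_low=st.s_low + dt * (re - q2),                         # eq. (7)
        s_sat=st.s_sat + dt * (p_sat - et_sat + q1 + q2 - q),    # eq. (8)
    )
    return new, {"Q1": q1, "Re": re, "Q2": q2, "Q": q}
```

Because Q1, Re, and Q2 only move water between stores, the change in total storage over a step equals the net atmospheric input minus the discharge Q, which is a convenient check on any implementation.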
For isotope transport, each model reservoir allows complete mixing of tracers with an additional storage
volume (Sp) that does not contribute hydraulically to streamflow generation (equation (18)). Complete mix-
ing can be considered a reasonable first approximation here: generally wet conditions on the hillslopes and
the large storage volumes in organic soils and drift deposits mean that catchment-scale water storage in
the upper 60 cm of the soils (300 mm) is large relative to daily precipitation [Tetzlaff et al., 2014]. However,
we relaxed the assumption of complete mixing [Hrachowitz et al., 2013] using available dynamic mixing vol-
umes (MV) dependent on the catchment wetness state and saturation area extent (equations (16) and (17)).
This assumes that the expansion of the saturation areas results in greater available mixing volumes. Further,
the saturated hillslope reservoir (Slow) was recharged with the isotopic content of rain rather than soil water
(equation (20)). This was based on observations of depleted deep soil and groundwater isotope values
(Table 1) resulting from preferential recharge during intense, isotopically depleted events. This is supported
by deeper soil horizon isotope signatures being more depleted than stream water (see mean and range in
Table 1 and Figures 2d and 3).
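The role of the passive storage volume in damping tracer signals can be illustrated with a small mass-balance sketch of equation (19), assuming complete mixing within the combined active and passive volume and an explicit daily step. The function name and argument layout are hypothetical. Each flux is passed with its own concentration, so that recharge leaving with the rain signature (equation (20)) or a fractionated vapor flux can be represented by supplying a concentration other than that of the store.

```python
def mix_store(c, s_active, s_passive, inflows, outflows, dt=1.0):
    """Update the tracer concentration of one fully mixed store.

    c          current concentration of the store (e.g., d2H in per mil)
    s_active   active storage before the step (mm)
    s_passive  passive storage Sp that adds mixing volume only (mm)
    inflows    list of (flux, concentration) pairs entering the store
    outflows   list of (flux, concentration) pairs leaving the store
    """
    mass = c * (s_active + s_passive)
    mass += dt * sum(flux * conc for flux, conc in inflows)
    mass -= dt * sum(flux * conc for flux, conc in outflows)
    s_new = s_active + dt * (sum(f for f, _ in inflows) - sum(f for f, _ in outflows))
    c_new = mass / (s_new + s_passive)
    return c_new, s_new


# The same depleted input shifts the store less when the passive volume is large.
c_small_sp, _ = mix_store(-60.0, 100.0, 0.0, [(10.0, -100.0)], [])
c_large_sp, _ = mix_store(-60.0, 100.0, 1000.0, [(10.0, -100.0)], [])
```

In this example the store without passive storage moves to about −63.6‰, while the store with 1000 mm of passive storage stays near −60.4‰, which is the damping behavior the passive storage parameters are intended to control.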
Given potential isotopic fractionation of the LEL resulting from evaporation in the upper and lower soil hori-
zons in both the hillslope and saturation area (Figure 3), we accounted for this in both reservoirs. This
explicitly calculates the isotopic content of vapor losses (cE, equation (21)), a process neglected in most
models. Finally, we tracked the age of water fluxes contributing to discharge (equation (22)) [Hrachowitz
et al., 2013]. This tests the impact of different data used for model calibration on stream water age estimates
[Dunn et al., 2010].
The final daily flow-tracer model uses eight calibrated parameters; five are used to simulate discharge (a, b,
r, k, and α) and three additional mixing volumes (upSp, lowSp, and satSp) simulate tracer transport in each
reservoir (Figure 4). The model was applied to the period from 1 October 2008 to 31 May 2012. However,
Table 3. Model Performances in Terms of the Euclidean Distance (ED, n Number of Combined Objective Functions Evaluating s States and f Fluxes) and Initial, Mean, and Posterior
5th/95th Parameter Percentiles are Shown According to Hydrometric, Isotopic, and Combined Data Sets Used for Calibration
Calibration data        | Value    | a (day^-1)   | b (day^-1)    | r (day^-1)  | k (day^-1)   | α (–)       | upSp (mm)   | satSp (mm) | lowSp (mm)  | Best ED
Hydrometric
1(a) (ns = 2)           | initial  | [0,1]        | [0,1]         | [0,1]       | [0,1]        | [0,2]       |             |            |             | 0.55
                        | mean     | 0.51         | 0.14          | 0.18        | 0.72         | 0.08        |             |            |             |
                        | 5th/95th | [0.47,0.64]  | [0.13,0.15]   | [0.14,0.40] | [0.7,0.8]    | [0.05,0.12] |             |            |             |
1(b) (nf = 3)           | initial  | [0,1]        | [0,1]         | [0,1]       | [0,1]        | [0,2]       |             |            |             | 0.77
                        | mean     | 0.23         | 0.04          | 0.93        | 0.06         | 0.63        |             |            |             |
                        | 5th/95th | [0.14,0.22]  | [0.009,0.027] | [0.78,0.99] | [0.02,0.11]  | [0.15,0.92] |             |            |             |
1(c) (ns = 2, nf = 3)   | initial  | [0,1]        | [0,1]         | [0,1]       | [0,1]        | [0,2]       |             |            |             | 1.12
                        | mean     | 0.28         | 0.05          | 0.93        | 0.04         | 0.68        |             |            |             |
                        | 5th/95th | [0.17,0.54]  | [0.009,0.04]  | [0.7,0.99]  | [0.03,0.08]  | [0.11,0.93] |             |            |             |
Isotopic
2(a) (nf = 6)           | initial  | [0,1]        | [0,1]         | [0,1]       | [0,1]        | [0,2]       | [1,5000]    | [1,5000]   | [1,5000]    | 0.65
                        | mean     | 0.05         | 0.09          | 0.07        | 0.29         | 0.84        | 240         | 346        | 3116        |
                        | 5th/95th | [0.02,0.07]  | [0.006,0.78]  | [0.06,0.1]  | [0.02,0.43]  | [0.59,1.37] | [170,201]   | [141,1192] | [2728,4748] |
2(b) (nf = 6, ns = 2)   | initial  | [0,1]        | [0,1]         | [0,1]       | [0,1]        | [0,2]       | [1,5000]    | [1,5000]   | [1,5000]    | 0.73
                        | mean     | 0.02         | 0.14          | 0.07        | 0.09         | 0.6         | 195         | 163        | 3998        |
                        | 5th/95th | [0.009,0.09] | [0.1,0.2]     | [0.05,0.09] | [0.01,0.81]  | [0.51,0.74] | [170,223]   | [67,274]   | [2820,4625] |
Combined
3(a) (ns = 5)           | initial  | [0,1]        | [0,1]         | [0,1]       | [0,1]        | [0,2]       | [1,5000]    | [1,5000]   | [1,5000]    | 1.03
                        | mean     | 0.21         | 0.014         | 0.92        | 0.03         | 0.73        | 4013        | 160        | 3580        |
                        | 5th/95th | [0.1,0.51]   | [0.007,0.016] | [0.77,0.99] | [0.02,0.1]   | [0.13,0.94] | [3125,4858] | [131,278]  | [1974,4200] |
Calibration 2008/2009   | mean     | 0.12         | 0.0076        | 0.74        | 0.1          | 0.7         | 1618        | 610        | 2633        | 0.99
                        | 5th/95th | [0.11,0.15]  | [0.007,0.008] | [0.67,0.8]  | [0.015,0.24] | [0.3,1.26]  | [517,3365]  | [262,1507] | [1343,4068] |
3(b) (nf = 8, ns = 5)   | initial  | [0,1]        | [0,1]         | [0,1]       | [0,1]        | [0,2]       | [1,5000]    | [1,5000]   | [1,5000]    | 1.62
                        | mean     | 0.04         | 0.27          | 0.14        | 0.06         | 0.42        | 280         | 305        | 3765        |
                        | 5th/95th | [0.01,0.05]  | [0.07,0.37]   | [0.07,0.27] | [0.03,0.1]   | [0.23,0.77] | [162,292]   | [10,705]   | [1301,4714] |
calibration only used the study year 1 June 2011 to 31 May 2012 where the groundwater and soil isotope
data were available. Incorporation of the tracer data for evaluating internal model state variables and fluxes
is shown in Figure 4. Initial tracer composition in the different stores was set to the observed means in Table 1.
The preceding 2.5 years were used as a warm-up period minimizing the impact of initial values on the final
year used for analysis.
We used a range of OFs to evaluate different aspects of the simulated hydrograph, storage dynamics, and
isotope dynamics. For discharge, the Nash-Sutcliffe Efficiency criterion (NSE) was used along with the NSE
(lnNSE) applied to log transformed discharge and the volumetric error (VE) in order to match high and low
flows as well as hydrograph dynamics [Criss and Winston, 2008]. Groundwater levels from the saturation
area (GWL 1 from S1) and hillslope (GWL 2 from S2) are used to fit the observed storage dynamics using the
coefficient of determination (R2—defined as the square of the Pearson correlation coefficient). Isotope sim-
ulations were evaluated using the R2 and VE criteria. Calibration was achieved using a Differential Evolution
(DE) genetic algorithm for parameter sampling [Price et al., 2005]. Evolutionary algorithms based on natural
selection are used where parameter populations are transformed over successive generations using arith-
metic operations to minimize an OF. However, recognizing that a global minimum for a model with large
numbers of parameters will be difficult to achieve [Beven, 2006], we use the DE algorithm as an advanced
parameter sampling technique. Therefore, initial parameter ranges were intentionally wide (Table 3) allow-
ing the search algorithm to initially explore the parameter space. The best parameter population after every
50 evaluations (parameter generation) was retained for further assessment of model parameter variability,
parameter sensitivity, and identifiability. The 5th and 95th percentiles of randomly chosen model state and
flux simulations generated from a subset of retained parameter sets were used to show the posterior vari-
ability of model parameters due to different calibration targets.
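For reference, the objective functions named above can be written compactly as follows. This is a sketch under stated assumptions: NSE uses its standard definition, lnNSE applies it to log-transformed (strictly positive) values, VE follows the volumetric efficiency form 1 − Σ|sim − obs| / Σobs associated with Criss and Winston [2008], and R2 is the squared Pearson correlation coefficient as stated in the text. Function names are hypothetical and the code is not the authors' R implementation.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def ln_nse(obs, sim):
    """NSE on log-transformed values, giving more weight to low flows."""
    return nse([math.log(o) for o in obs], [math.log(s) for s in sim])


def volumetric_efficiency(obs, sim):
    """VE = 1 - sum(|sim - obs|) / sum(obs)."""
    return 1.0 - sum(abs(s - o) for o, s in zip(obs, sim)) / sum(obs)


def r_squared(obs, sim):
    """Square of the Pearson correlation coefficient."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)
```

Note that R2 rewards matching dynamics only: a simulation that is a linear rescaling of the observations still scores 1, which is why it suits the comparison of storage in mm with groundwater levels in cm.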
$$ED = \sqrt{\sum_{n} \left(1 - OF_n\right)^2} \qquad (23)$$
ED combines the n OFs mentioned above (e.g., NSE, VE, etc.), depending on the data available for representing
simulated model states s and fluxes f. The subscripts s and f indicate whether the n objective functions
were used to evaluate model states and/or fluxes, respectively. Thus, calibration
using the groundwater level data, with R2 evaluating the fit to observed data from the two wells, results in
ns = 2. If this were combined with the discharge NSE, the result would be n = 3 (ns = 2; nf = 1).
The ED is subsequently minimized by the evolutionary algorithm, with ED = 0 indicating a perfect fit.
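As a worked example of equation (23), the sketch below combines objective function values into a single Euclidean distance. The function name is hypothetical. The input values are the two groundwater level R2 scores reported for calibration 1(a) in Table 4 (0.53 and 0.70).

```python
import math


def euclidean_distance(objective_values):
    """Equation (23): distance of the objective functions from the ideal value of 1."""
    return math.sqrt(sum((1.0 - of) ** 2 for of in objective_values))


# Calibration 1(a): R2 for GWL 1 and GWL 2 as listed in Table 4.
ed_1a = euclidean_distance([0.53, 0.70])
```

The result is approximately 0.56, close to the best ED of 0.55 listed for calibration 1(a); the small difference presumably reflects rounding of the published R2 values. Because every OF contributes its own squared deviation, ED tends to grow as more calibration targets are combined, which should be kept in mind when comparing ED values between calibrations with different n.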
To assess the effect of combining different types of data and objective functions on model calibration, we
used global sensitivity analysis to rank the parameters of each tested combination of ED and explore
whether the most balanced model can be found in terms of sensitivity. We used the Latin Hypercube (LH)
one-factor-at-a-time (LH-OAT) global sensitivity analysis method of van Griensven et al. [2006] which com-
pares well with other methods [e.g., Sobol, 1993; Shin et al., 2013]. The LH random sampling is a stratified
Monte Carlo approach to reduce the need for large numbers of simulations. This was coupled with the random
OAT design of Morris [1991], in which the effects of individual model parameters are assessed at randomly
sampled points across the total input range. The final effects can be ranked from the most important (rank 1)
to the least important (rank equal to the total number of parameters used).
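The combination of stratified sampling and one-at-a-time perturbation can be sketched as follows. This is a simplified illustration in the spirit of LH-OAT rather than the exact formulation of van Griensven et al. [2006]: it uses absolute elementary effects normalized by the perturbation fraction instead of relative percentage changes, and all names, the perturbation fraction, and the toy model are assumptions made for demonstration.

```python
import random


def latin_hypercube(n_points, bounds, rng):
    """One random sample per stratum and parameter, randomly paired across parameters."""
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / n_points
        column = [lo + (i + rng.random()) * width for i in range(n_points)]
        rng.shuffle(column)
        columns.append(column)
    return [list(point) for point in zip(*columns)]


def lh_oat(model, bounds, n_points=20, fraction=0.05, seed=1):
    """Return mean absolute effects per parameter and the parameter ranking."""
    rng = random.Random(seed)
    n_par = len(bounds)
    totals = [0.0] * n_par
    for start in latin_hypercube(n_points, bounds, rng):
        x, y = list(start), model(start)
        order = list(range(n_par))
        rng.shuffle(order)                  # random order of the OAT changes
        for i in order:
            lo, hi = bounds[i]
            delta = fraction * (hi - lo) * rng.choice((-1.0, 1.0))
            if not lo <= x[i] + delta <= hi:
                delta = -delta              # reflect to stay inside the range
            x_new = list(x)
            x_new[i] = x[i] + delta
            y_new = model(x_new)
            totals[i] += abs(y_new - y) / fraction
            x, y = x_new, y_new             # next change starts from the new point
    effects = [t / n_points for t in totals]
    ranking = sorted(range(n_par), key=lambda i: effects[i], reverse=True)
    return effects, ranking


def toy_model(x):
    # Hypothetical response: strongly sensitive to x[0], weakly to x[1], not to x[2].
    return 10.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


effects, ranking = lh_oat(toy_model, [(0.0, 1.0)] * 3)
```

In an application like the one described here, `model` would return the ED for a given parameter set, so that the ranking reflects the influence of each parameter on calibration performance. Each Latin Hypercube point costs one run plus one run per parameter, which is where the saving over unstratified Monte Carlo sampling comes from.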
Temporal sensitivity was calculated from the local effect of parameters on simulated output variables in the
form of a sensitivity matrix S [Brun et al., 2001; Soetaert and Petzold, 2010]:
$$S_{i,j} = \frac{\partial y_i}{\partial \Theta_j} \, \frac{\Delta \Theta_j}{\Delta y_i} \qquad (24)$$
where the i,jth element of the matrix of normalized sensitivities relates an output variable yi at a certain
time to a model parameter Θj. The delta Δ indicates scaling of both components. The higher
the sensitivity value, the greater the parameter's influence on model output. For example, if two parameters
show similar sensitivity dynamics, their effects are indistinguishable, and if the sensitivity is close to zero,
the parameter has no effect on model outcome. Identification of such periods may be related to model structural deficits. This
Table 4. Optimal Calibrated Model Parameter Sets From Minimal Euclidean Distance (ED, n Number of Combined Objective Functions Evaluating s States and f Fluxes) of Selected
Evaluation Criteria (R2, NSE, lnNSE, and VE) for Hydrometric, Isotopic, and Combined Data Sets Used in Calibrationa
Calibration Data; Optimal Parameter Set [a, b, r, k, α, upSp, satSp, lowSp]; ED; Hydrometric Simulations: R2 (GWL 1, 2), NSE (Q), lnNSE (Q), VE (Q); Isotopic Simulations: R2 (Soil_δ2H: 1_10cm, 3_10cm, 3_50cm, Q_δ2H), VE (Soil_δ2H: 1_10cm, 3_10cm, 3_50cm, Q_δ2H)
Hydrometric
1(a) (ns = 2) [0.5, 0.14, 0.14, 0.7, 0.1] 0.55 (0.53, 0.70) 0.04 0.44
1(b) (nf = 3) [0.21, 0.009, 0.99, 0.01, 0.92] 0.77 (0.13, 0.97) 0.54 0.51 0.63
1(c) (ns = 2, nf = 3) [0.23, 0.009, 0.92, 0.03, 0.76] 1.12 (0.15, 0.69) 0.52 0.56 0.64
Isotopic
2(a) (nf = 6) [0.02, 0.06, 0.06, 0.02, 0.87, 179, 141, 2991] 0.65 (0.36, 0.93) 0.33 0.41 (0.81, 0.83, 0.48, 0.98) (0.96, 0.91, 0.93, 1.01)
2(b) (nf = 6, ns = 2) [0.009, 0.18, 0.07, 0.02, 0.59, 179, 80, 4619] 0.73 (0.25, 0.91) 0.61 0.86 (0.71, 0.83, 0.47, 0.63) (0.96, 0.91, 0.94, 0.98)
Combined
3(a) (ns = 5) [0.21, 0.009, 0.99, 0.03, 0.8, 4857, 127, 3870] 1.03 (0.13, 0.97) 0.53 0.56 0.63 (0.84, 0.99, 0.79, 0.63) (0.98, 0.95, 1.03, 0.97)
3(b) (nf = 8, ns = 5) [0.04, 0.009, 0.11, 0.03, 0.89, 184, 106, 4670] 1.62 (0.3, 0.7) 0.52 0.53 0.63 (0.73, 0.83, 0.47, 0.64) (0.95, 0.91, 0.94, 0.97)
a
Model performance is shown for all evaluated model flux and state variables obtained by calibration (in bold). The consistency criteria C in underlined Italics font represent simu-
lated model fluxes and internal states which were not explicitly calibrated using state representative data.
estimation was implemented using the R package ‘‘Calibration, Sensitivity and Monte Carlo Analysis (FME)’’
by Soetaert and Petzold [2010].
This can also help assess the identifiability of parameters; local sensitivity (the effect of one parameter on
an output variable at a time) analysis is extended to a multivariate context by assessing the colinearity (the
linear dependence) of all parameter sets:
$$\gamma = \frac{1}{\sqrt{\min\left(EV\left[\hat{S}^{T}\hat{S}\right]\right)}} \qquad (25)$$
where the colinearity index γ is based on the eigenvalues EV of Ŝ^T Ŝ. Ŝ contains the sensitivity matrix Si,j (equation
(24)) that corresponds to the parameters included in the model. The higher the value of γ, the more
dependent the parameters are, in which case dependent parameters could potentially generate similar output
variables. Parameter sets are considered independent, and thus identifiable, for a colinearity index of
one (orthogonal); the index approaches infinity for the highest dependency. Brun et al. [2001] suggest that a
colinearity index below the range of 10 to 15 is indicative of identifiable model parameters, whereas Soetaert and Petzold
[2010] refer to a threshold value below γ = 20 to detect identifiability.
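The colinearity index of equation (25) can be illustrated for a pair of parameters, where the eigenvalues are available in closed form. With both sensitivity columns normalized to unit length, the 2 × 2 matrix Ŝ^T Ŝ has ones on the diagonal and the cosine ρ between the columns off the diagonal, so its eigenvalues are 1 ± ρ and the index becomes 1/√(1 − |ρ|). The sketch below uses only this pairwise case; the function name is hypothetical, and the analysis in the paper was carried out with the R package FME rather than with this code.

```python
import math


def pair_collinearity(col_a, col_b):
    """Colinearity index for two sensitivity columns (one value per output time)."""
    norm_a = math.sqrt(sum(v * v for v in col_a))
    norm_b = math.sqrt(sum(v * v for v in col_b))
    # Cosine of the angle between the two normalized sensitivity columns.
    rho = sum(x * y for x, y in zip(col_a, col_b)) / (norm_a * norm_b)
    smallest_eigenvalue = 1.0 - abs(rho)
    if smallest_eigenvalue <= 0.0:
        return math.inf          # perfectly dependent parameters
    return 1.0 / math.sqrt(smallest_eigenvalue)


gamma_orthogonal = pair_collinearity([1.0, 0.0], [0.0, 1.0])
gamma_45_degrees = pair_collinearity([1.0, 0.0], [1.0, 1.0])
gamma_parallel = pair_collinearity([1.0, 2.0], [2.0, 4.0])
```

Orthogonal sensitivities give an index of one, columns at 45° give about 1.85, and proportional columns give a very large or infinite value, far above the identifiability thresholds quoted above. For more than two parameters the smallest eigenvalue has to be obtained numerically.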
5. Model Results
5.1. Simulations of Water and Isotope Fluxes
Table 4 summarizes the OFs resulting from the different calibration targets. Figure 5 shows the resulting
simulated discharge, storage dynamics, soil isotopes, and stream isotopes expressed as the 5th/95th per-
centiles of simulations obtained from parameter sets during the calibration (shown in Table 3). Generally,
the dynamics of discharge (high and low flows) are captured by the simulations resulting in best fit values
of 0.52 for the NSE and lnNSE and VE = 0.63 for the calibrated model. However, smaller discharge peaks
occurring specifically during and after drier periods are not well reproduced. Further, larger peak flows in
July and December 2011 are not simulated, most likely due to measurement errors (i.e., recorded precipitation
is insufficient to reflect discharge magnitudes), which probably reflect snowmelt influence in December
and under-catch of convective rain in July. The model was tested quite successfully for flows against the earlier
data from 2008/2009 with an improved NSE of 0.62 (Table 5), though similar limitations in simulating smaller
events were evident (Figure 6a). Model 3(a), calibrated against this earlier year, could also predict
the 2011/2012 runoff quite well, with only slightly lower efficiencies for most OFs.
Model internal states and fluxes were also simulated in both the calibration and independent test periods.
Temporal dynamics of the simulated unsaturated hillslope storage Sup show reasonable similarity to the
observed groundwater level in the calibrated model (GWL2, Figure 5b; R2 = 0.7, Table 4). In contrast, the
Figure 5. Simulated 95th and 5th percentile bounds of fluxes (a) discharge, (e) soil (S3_10cm δ2H), and (f) stream δ2H and storage components
(b) Sup, (c) Slow, and (d) Ssat of model 3(b) using all available data for calibration. The observed variables are given as black lines for
comparison. Note that the measured groundwater levels (GWL 1, 2) are given in cm below surface; whereas the model simulates storage
content in mm. The model 3(a) from calibration with 2008/2009 data was tested and applied to the study period 2011/2012 for simulation
(mean of accepted parameter sets in blue).
Table 5. Model Tests From 2011/2012 Calibration to 2008/2009 Using Model 3(b) and From 2008/2009 to 2011/2012 Using Model 3(a)a
Calibration Data; Optimal Parameter Set [a, b, r, k, α, upSp, satSp, lowSp]; ED; Hydrometric Simulations: R2 (GWL 1, 2), NSE (Q), lnNSE (Q), VE (Q); Isotopic Simulations: R2 (Soil_δ2H: 1_10cm, 3_10cm, 3_50cm, Q_δ2H), VE (Soil_δ2H: 1_10cm, 3_10cm, 3_50cm, Q_δ2H)
Model 3(b)
Calibrated model 2011/2012 (nf = 8, ns = 5) [0.04, 0.009, 0.11, 0.03, 0.89, 184, 106, 4670] 1.62 (0.3, 0.7) 0.52 0.53 0.63 (0.73, 0.83, 0.47, 0.64) (0.95, 0.91, 0.94, 0.97)
Test 2008/2009 0.62 0.33 0.66 0.59 0.97
Model 3(a)
Calibrated model 2008/2009 [0.13, 0.0076, 0.75, 0.016, 1.18, 2910, 332, 3628] 0.99 0.64 0.54 0.69 0.6 0.98
Test 2011/2012 (0.17, 0.99) 0.48 0.48 0.61 (0.63, 0.87, 0.67, 0.54) (0.97, 0.92, 1.03, 0.96)
a
Model performance is shown for all evaluated model flux and state variables obtained by calibration (in bold). The consistency criteria C in normal underlined Italics font represent
simulated model fluxes and internal states which were not explicitly calibrated using state representative data.
5.3. Effect of Additional Calibration Data on Parameter Identifiability and Model Predictions
In addition to the most balanced model, use of data matching internal model state variables and fluxes
proved insightful when combined data sets and OFs were used for calibration. Table 3 shows the prior,
Table 6. Relative Importance of Parameter Sensitivity Determined for Each Calibration Data Set Against Model Performance (ED, n Number of Combined Objective Functions
Evaluating s States and f Fluxes) Using Latin Hypercube Morris One-at-a-Time Global Sensitivity Analysis [van Griensven et al., 2006]
Ranked Relative Parameter Importance (1 = Most Sensitive, 8 = Least Sensitive)

| Calibration Data | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Hydrometric 1(a) (ns = 2) | a (0.34) | k (0.25) | a (0.18) | b (0.13) | r (0.1) | satSp (0) | upSp (0) | lowSp (0) |
| Hydrometric 1(b) (nf = 3) | a (0.49) | k (0.2) | a (0.19) | b (0.07) | r (0.05) | satSp (0) | upSp (0) | lowSp (0) |
| Hydrometric 1(c) (ns = 2, nf = 3) | a (0.51) | k (0.46) | a (0.019) | b (0.018) | r (0.01) | satSp (0) | upSp (0) | lowSp (0) |
| Isotopic 2(a) (nf = 6) | a (0.52) | r (0.18) | upSp (0.17) | lowSp (0.07) | b (0.06) | k (0) | ripSd (0) | a (0) |
| Isotopic 2(b) (nf = 6, ns = 2) | a (0.28) | a (0.17) | k (0.14) | satSp (0.13) | b (0.13) | r (0.11) | lowSp (0.02) | upSp (0.009) |
| Combined 3(a) (ns = 5) | a (0.5) | k (0.43) | satSp (0.02) | b (0.02) | a (0.015) | r (0.012) | upSp (0.002) | lowSp (0.0016) |
| Combined 3(b) (nf = 8, ns = 5) | a (0.36) | k (0.35) | a (0.14) | r (0.06) | lowSp (0.04) | b (0.03) | upSp (0.02) | satSp (0.009) |
Figure 8. Predictive power of the model applied to simulate discharge Q and stream δ2H exclusively based on calibration against data
assumed to match internal model state variables and fluxes: (b) 1(a), groundwater levels; (c) 2(b), soil and stream water isotopes; and (d)
2(a), soil water isotopes only.
mean, and posterior parameter distributions for the model for all combinations, all of which can have a major
influence on the mean and 5th/95th posterior percentiles. Generally, a clear difference in parameter values
exists for the calibration using groundwater levels and soil isotope data as internal states compared to the
more common discharge and stream isotopes. Calibration based solely on discharge and/or stream isotopes
favors an unsaturated reservoir that drains faster in terms of recharge (r > 0.9) and outflow (a > 0.2) into
the saturation area. In contrast, using soil water isotopes and groundwater levels resulted in a slower draining
upper hillslope reservoir but a faster draining lower hillslope reservoir (b > 0.1, Table 3), despite similar
nonlinear runoff generation parameters (k and α) for all data used in calibration. A similarly clear differentiation in
parameter values between traditional calibration targets using discharge and the use of internal model state
variables, particularly soil isotope data, is also evident for the isotope mixing volumes. The latter resulted in
mixing volumes for the unsaturated hillslope reservoir and saturation area store of between 100 and 350 mm. In
contrast, the model 3(a) (discharge and stream δ2H) calibration for 2008/2009 and 2011/2012 results in a
much larger unsaturated mixing volume (upSp > 1600 mm and > 4000 mm). The saturated mixing volume
(lowSp > 2500 mm) was equally determined by all data sets for calibration and test period.
The predictive power of models with different calibration objectives unsurprisingly varied (Table 4).
Discharge predictions from model 1(a) using groundwater for calibration match the overall hydrograph
fluctuations, though high flows are overpredicted and low flows underpredicted (Figure 8b). This mostly reflects
the choice of OF (R2) and also that the model simulates storage rather than groundwater level (NSE = 0.02
compared to NSE = 0.54 of model 1(b)). Calibrating with soil and stream isotope data (model 2(b)) improves
simulations (see performances and consistencies in Table 4) of overall hydrograph dynamics and recessions,
though many smaller peaks are missed (Figure 8c). This can be explained by model structure, parameter
values, and the conceptualization of mixing required to produce the damped stream water outputs. If only soil
isotope data are used (2(a)), predictions of flows are poorer (as expected, NSE = 0.18 compared to
NSE = 0.52 of model 3(b)), though the stream isotopes match the observations equally well (Figure 8d,
R2 = 0.62 compared to R2 = 0.64 of model 3(b)).
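The contrast between a good R2 and a poor NSE for the same simulation can be made concrete with a minimal Python sketch of the four efficiency measures named in this section. The function names, the small `eps` offset in the log transform, and the toy data are assumptions made for illustration; the volumetric efficiency follows the definition of Criss and Winston [2008], and none of this is the authors' code.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def ln_nse(obs, sim, eps=1e-6):
    """NSE of log-transformed values, which puts more weight on low flows.
    eps is an assumed small offset that guards against log(0)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return nse(np.log(obs + eps), np.log(sim + eps))


def ve(obs, sim):
    """Volumetric efficiency: 1 - sum(|sim - obs|) / sum(obs)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum(np.abs(sim - obs)) / np.sum(obs)


def r2(obs, sim):
    """Squared Pearson correlation: rewards matching dynamics, ignores bias and scale."""
    return np.corrcoef(obs, sim)[0, 1] ** 2


# Toy series: the simulation has the right dynamics but the wrong units and offset,
# loosely analogous to comparing simulated storage (mm) with a groundwater level (cm).
obs = np.array([1.0, 2.0, 4.0, 3.0, 1.5])
sim_biased = 2.0 * obs + 5.0
print(r2(obs, sim_biased), nse(obs, sim_biased))
```

In this toy case R2 is 1 although NSE is strongly negative, which is why a model calibrated on R2 against groundwater levels can reproduce the timing of the hydrograph and still score poorly on NSE for discharge.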
Multivariate colinearity indices (equation (25)) were used to assess overall model identifiability and quantify
the interaction of the parameter sets retained by the DE algorithm for selected models (Figure 9). When
different study years, though this was a year with more evenly distributed precipitation and the proportion
of younger water is similar to model 2(a). Incorporation of soil isotope data for model calibrations indicated
much older water contributing to streamflow, especially during drier periods for calibrations 2(a) and 2(b)
(see inset), when the more balanced model (3(b)) was simulating cessation of hillslope outflows.
6. Discussion
6.1. How Well Does a Simple, Parsimonious Model Characterize Catchment Function?
The eight parameter model—with five runoff parameters and three mixing volumes—represents a parsimo-
nious approach for the BB that can simulate catchment storage dynamics and fluxes of water and tracers.
The calibration to all available hydrometric and isotopic data produced the most balanced model in terms
of multicriteria evaluation, and this model performed reasonably well when tested against independent flow and tracer time
series. The model evolved from previous work by Birkel et al. [2011a], but we modified it to utilize
additional soil and groundwater data [Tetzlaff et al., 2014] while maintaining a parsimonious model. The
resulting model comprised an unsaturated hillslope reservoir filled after a simple interception and transpira-
tion loss routine overlying a hillslope groundwater reservoir. All hillslope fluxes drain into the store of the
saturation area from which water and solutes are mixed and nonlinearly transferred to the stream. We
explicitly accounted for evaporative fractionation, addressing the potential nonconservative behavior of
isotope tracers, although the effect of snow fractionation was not accommodated, to keep parameter numbers low
[Birkel et al., 2011a]. In contrast to Fenicia et al. [2010] and Hrachowitz et al. [2013], the basic mixing
assumption used is that of complete mixing. However, we implemented preferential recharge and time-varying
mixing volume parameters (equations (16), (17), and (20)), relaxing the assumption of complete mixing
consistent with empirical data while restricting additional parameters to the necessary mixing volumes.
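The role of an additional mixing volume under complete mixing can be illustrated with a minimal Python sketch. It assumes a single store whose tracer mass is held in the active storage Sn plus a constant mixing volume Sp (see Notation), an explicit daily time step, and no fractionation. The function `mix_step` and all numbers are hypothetical and do not reproduce equations (16), (17), and (20), which also include preferential recharge and time-varying mixing volumes.

```python
def mix_step(c_store, s_active, s_mix, inflows, outflow):
    """Advance a completely mixed store by one daily time step.

    c_store  : tracer signature of the store (per mil)
    s_active : active storage Sn (mm)
    s_mix    : additional mixing volume Sp (mm), held constant here (assumption)
    inflows  : list of (flux in mm/d, signature in per mil) pairs
    outflow  : total outflow (mm/d), leaving with the signature of the store
    """
    tracer_mass = c_store * (s_active + s_mix)
    tracer_mass += sum(flux * sig for flux, sig in inflows)
    tracer_mass -= outflow * c_store
    s_active_new = s_active + sum(flux for flux, _ in inflows) - outflow
    return tracer_mass / (s_active_new + s_mix), s_active_new


# Same event (5 mm/d of rain at -40 per mil, 5 mm/d outflow) applied to two stores
# that differ only in their mixing volume.
c_small, _ = mix_step(-60.0, 50.0, 100.0, [(5.0, -40.0)], 5.0)
c_large, _ = mix_step(-60.0, 50.0, 1000.0, [(5.0, -40.0)], 5.0)
print(round(c_small, 2), round(c_large, 2))
```

The store with the larger mixing volume shifts less toward the rainfall signature, which is the damping effect that the mixing volume parameters control without affecting the simulated water fluxes.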
The model uses a power-law runoff generation mechanism (equation (15)) similar to Kirchner [2009] but
with dynamically connected separate landscape units. While capturing the general hydrological dynamics,
this appears too crude a conceptualization of the spatially complex surface flow patterns that will be
affected by microtopography [see, e.g., Frei et al., 2010]. Consequently, many of the small runoff events are
not captured well in the model simulations, resulting in relatively low NSE values (0.52) compared to
simulations (NSE 0.6) for other study periods [Birkel et al., 2010, 2011a]. This becomes apparent for the
calibration and the 2008/2009 test period, resulting in best fit NSEs = 0.64 and 0.62, respectively (Table 5).
Although these values are low compared to many modeling studies, they are not uncommon in montane
catchments with input uncertainties [e.g., Capell et al., 2013]. However, if calibration was simply based on
discharge performance using a single criterion in absence of additional calibration targets, this could be a
basis for model rejection [Beven, 2012b]. Therefore, accounting for the spatial heterogeneity of the satura-
tion area extent in future modeling will be an important research priority. Nevertheless, the current model
represents progress as a trade-off in terms of gaining additional information from different calibration data
and OFs. Thus, the runoff performance is compensated by quite good simulations of isotope dynamics in
streamflow (best fit R2 = 0.64), soil water and storage dynamics in the main model storage units [e.g., Seibert
and McDonnell, 2002]. The damping of stream isotopes reflects the mixing of new water with previously
stored water during storm event conditions as overland flow generated from the saturation areas largely
displaces preevent water into the stream (Figure 2c, see range and CV in Table 1).
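A power-law storage-discharge relation of the kind referred to above can be sketched in a few lines of Python. The sketch assumes the common form Q = k S^α for a single store with an explicit daily step; equation (15) is not reproduced here, and the function name, parameter values, and rainfall series are illustrative assumptions rather than calibrated values from this study.

```python
import numpy as np


def simulate_power_law_store(inflow, k, alpha, s0):
    """Route daily inflow (mm/d) through one store with Q = k * S**alpha.

    Returns the simulated outflow series (mm/d) and the final storage (mm).
    """
    storage = s0
    q_sim = np.empty(len(inflow))
    for t, p in enumerate(inflow):
        storage += p
        # Outflow can never exceed the water held in the store.
        q = min(k * storage ** alpha, storage)
        storage -= q
        q_sim[t] = q
    return q_sim, storage


rain = np.array([0, 12, 30, 5, 0, 0, 0, 0, 0, 0], dtype=float)
q_linear, s_linear = simulate_power_law_store(rain, k=0.05, alpha=1.0, s0=20.0)
q_nonlin, s_nonlin = simulate_power_law_store(rain, k=0.05, alpha=1.5, s0=20.0)
print(q_linear.max(), q_nonlin.max())
```

With α > 1 the same rainfall produces a sharper peak and a faster recession than the linear case, so k and α jointly control how flashy the simulated hydrograph is. A lumped relation of this kind has no representation of the spatial pattern of surface flow, which is consistent with the missed small events discussed above.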
Albeit specific to the model applied, sensitivity analysis revealed significant parameter sensitivity, particularly
associated with mixing (satSp) and runoff generation (α, k) in the saturation area (Table 6). While, obviously,
the mixing volume parameters are not used in discharge simulations, runoff generation parameters have a
marked effect on stream isotope simulations. Therefore, separate calibration of runoff generation and isotope
mixing parameters may ignore important integrated effects on tracer transport [Fenicia et al., 2008]. If the
time variance of sensitivity is assessed, as expected, highest sensitivity mostly coincides with periods of higher
runoff and increased displacement of water. Due to the limitations of local sensitivity analysis [e.g., Shin et al.,
2013], we estimated global sensitivity with the LH-OAT method of van Griensven et al. [2006]. This showed the
impact of different data sets and OF combinations, but demonstrated that when all available data were used
for calibration the most balanced model had parameters that all exhibited sensitivity. Despite the limitations
of simple and parsimonious models in terms of maximum performance, this study has shown that they have
significant potential for developing an internally consistent conceptualization of catchment functioning.
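The idea behind a Latin hypercube one-at-a-time (LH-OAT) analysis can be sketched as follows. This is a simplified Python illustration, not the implementation of van Griensven et al. [2006]: it perturbs each parameter by a fixed fraction of its prior range instead of a fraction of its value, uses the absolute change in a scalar performance measure as the partial effect, and normalizes the mean effects so that they sum to 1. The function `lh_oat` and the toy performance measure are hypothetical.

```python
import numpy as np


def lh_oat(model, lower, upper, n_points=20, frac=0.05, seed=0):
    """Return the relative importance of each parameter (values sum to 1)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_par = lower.size
    span = upper - lower

    # Latin hypercube: each parameter range is cut into n_points strata and
    # every stratum is sampled exactly once, in an independent random order.
    strata = np.column_stack([rng.permutation(n_points) for _ in range(n_par)])
    points = lower + (strata + rng.random((n_points, n_par))) / n_points * span

    effects = np.zeros((n_points, n_par))
    for i, base in enumerate(points):
        m_base = model(base)
        for j in range(n_par):
            step = frac * span[j]
            if base[j] + step > upper[j]:  # stay inside the prior range
                step = -step
            perturbed = base.copy()
            perturbed[j] += step  # change one parameter at a time
            effects[i, j] = abs(model(perturbed) - m_base) / frac

    mean_effect = effects.mean(axis=0)
    total = mean_effect.sum()
    return mean_effect / total if total > 0 else mean_effect


def toy_model(x):
    # Hypothetical performance measure: strongly driven by x[0], not at all by x[2].
    return 10.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


importance = lh_oat(toy_model, lower=[0, 0, 0], upper=[1, 1, 1])
ranking = np.argsort(-importance)
print(importance, ranking)
```

A parameter that never enters the performance measure receives an importance of 0, in the same way that the mixing volumes rank last with zero sensitivity when only hydrometric data are used for calibration (Table 6).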
6.2. How Much Data are Needed to Calibrate a Conceptual Water and Tracer Transport Model?
It has been argued that maximum experimental data are always advantageous to characterize catchment
systems, though there is a pragmatic need to direct experimental efforts toward collection of data that is
most informative for model development and evaluation [McGuire et al., 2007; Soulsby et al., 2008]. We
incorporated soil isotope and groundwater level data for calibration assuming these allow different aspects
of model simulations to be assessed [Seibert and McDonnell, 2002]. Given that almost inevitable data and
structural errors dictate that no optimal solution can be found for higher-parameter hydrological models,
the most balanced model representation is a more realistic goal [Gupta et al., 2008; Hrachowitz et al., 2013].
We assumed that using all combined data and OFs for calibration would result in the most balanced model
able to represent all incorporated processes adequately; this was corroborated by the independent test
period and supported by the LH-OAT sensitivity analysis.
Nevertheless, the use of different types of data provided different insights for model calibration and
evaluation. For simulating groundwater levels, calibration was restricted to evaluating the dynamics using the
coefficient of determination (R2) [Fenicia et al., 2008], as the model simulates storage content and the
groundwater levels represent highly variable point measurements (see section 4.2). Nevertheless,
calibration against groundwater levels revealed important insights into the internal dynamics of model states in
terms of simulating the unsaturated storage connectivity to the saturation area. This resulted in a more
dynamic representation of hillslope connectivity during wetter phases, as illustrated by Tetzlaff et al. [2014],
and increased groundwater contributions to streamflow and runoff generation (larger values of parameters a and b) compared
to calibration against flow only. Incorporation of the soil isotope data for the unsaturated hillslopes
(spodosols at S3) and saturation area (histosols at S1) resulted in significantly increased parameter identifiability,
indicated by low colinearity indices [Soetaert and Petzoldt, 2010]. This also points toward each parameter
serving a unique and identifiable purpose in simulating discharge and stream isotopes [Brun et al., 2001]. This
can be corroborated by assessing the predictive power in simulating discharge and stream isotopes based
only on additional data assumed to match internal model state variables and fluxes. For example, general
flow dynamics were captured using soil isotope data for calibration alone and stream isotope simulations
were almost as good as models with all data used for calibration. Soil isotopes have potential—at least in
similar catchments—to be routinely used for model assessment as advocated by Clark et al. [2011].
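For readers unfamiliar with the colinearity index of Brun et al. [2001], a minimal Python sketch is given here. It assumes that a sensitivity matrix is already available, with one row per simulated output value and one column per parameter. Each column is scaled to unit length, and the index is the inverse square root of the smallest eigenvalue of the resulting cross-product matrix. The function name and the two small example matrices are hypothetical; equation (25) is not reproduced here.

```python
import numpy as np


def collinearity_index(sens):
    """Colinearity index of a parameter set.

    sens : array of shape (n_outputs, n_parameters) holding the sensitivity
           of each model output to each parameter.
    A value of 1 means the columns are orthogonal (parameters fully
    distinguishable); large values mean near-linear dependence.
    """
    sens = np.asarray(sens, dtype=float)
    normed = sens / np.linalg.norm(sens, axis=0)     # unit-length columns
    eigvals = np.linalg.eigvalsh(normed.T @ normed)  # returned in ascending order
    return 1.0 / np.sqrt(eigvals[0])


# Two parameters acting on different outputs: fully identifiable.
gamma_orthogonal = collinearity_index([[1.0, 0.0],
                                       [0.0, 1.0]])
# Two parameters with almost the same effect: changes in one can be
# compensated by the other, so they are poorly identifiable.
gamma_collinear = collinearity_index([[1.0, 1.0],
                                      [0.0, 0.01]])
print(gamma_orthogonal, gamma_collinear)
```

Adding data that respond differently to each parameter, such as soil isotopes alongside discharge, adds rows that make the columns less parallel and so lowers the index.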
antecedent conditions. Nevertheless, the estimates lie within the 90% uncertainty bounds (0.9–2.9 years)
of these lumped models and have a similar CDF [Hrachowitz et al., 2010b]. Incorporating the calibrated
model for 2008/2009 into the analysis as a test resulted in a similar CDF compared to the 2011/2012
calibrations. This indicated only modest time variability of flow pathways and mixing between the two study
periods, causing broadly similar simulated transit times in contrast to, e.g., Heidbüchel et al. [2012]. The flux
tracking shows how high-resolution tracer data from different water sources, if sampled for long enough,
provide a useful tool in model diagnostics through assessment of stream water age.
7. Conclusions
We presented a simple, lumped conceptual rainfall-runoff model developed from a data-rich empirical
understanding of an upland catchment. The model explicitly calculates water and tracer storage and trans-
port for the major landscape units (hillslope—riparian saturation area). Detailed experimental data (ground-
water level and soil isotopes for the most important soils) were incorporated into the modeling and
assumed to reflect internal state variables and fluxes. These were then used to calibrate and evaluate the
model in terms of performance, predictive power, parameter sensitivity, and identifiability. We show that
incorporation of such data improves identifiability. Consequently, the coupled flow-tracer model with eight
parameters exhibited better identifiability than the five parameter flow model. In this context, soil isotope
data resulted in the highest predictive power if the model is used for flow and, particularly, stream isotope
simulations. However, the spatial heterogeneity of the saturation areas in the model is currently inad-
equately represented and requires better conceptualization for improved runoff simulations of small events.
Furthermore, if the model calibrated against different data sets and OFs is used to derive stream water age
distributions, these show comparable mean values, but vary for the extremes. This suggests that care has to
be taken if such information is used to inform water quality assessments with emphasis on short and
longer-term pollutant transport. Our results imply that stream water age distributions depend not only on
model performance, but also on model consistency as indicated by the model’s ability to represent the
information content of data used in calibration. Thus, tracking water ages in conceptual models can be a
strong test for internal consistency of a model structure. We conclude that incorporating tracer data into
rainfall-runoff models offers guidance to fieldwork efforts and further insight into using simple models as a
learning tool about catchment behavior.
Notation
P catchment precipitation (mm d⁻¹).
ET catchment evapotranspiration (mm d⁻¹).
ETact actual evapotranspiration (mm d⁻¹).
I interception (mm d⁻¹).
S total storage (active + mixing volume) (mm).
Sn internal model storage content (active storage) (mm).
Sp additional mixing volume parameters (mm).
Q simulated discharge at catchment outlet (mm d⁻¹).
fSAT dynamic saturation area extent (% catchment area).
fHill dynamic hillslope storage extent (% catchment area).
AW antecedent wetness (mm).
Smax maximum soil moisture storage (mm).
cn isotope signature or tracer concentration of n storage components (‰).
Ij j storage inflows (e.g., Pup, Psat, Qup, Re, Qlow) (mm d⁻¹).
Ok k outflow or loss components (e.g., ETup, ETsat, Qup, Re, Qlow, Qs) (mm d⁻¹).
Rn net radiation (W m⁻²).
rc, ra canopy and aerodynamic roughness coefficients (m s⁻¹).
Lv latent heat loss (2453 MJ m⁻³).
cp specific heat capacity of air (J kg⁻¹ K⁻¹).
ρa dry air density (kg m⁻³).
Acknowledgments
The authors greatly appreciate help in the field by Konrad Piegat, Rene Capell, Jonathan Dick, and Josie Geris.
Part of the hydrometeorological data was provided by Marine Scotland, Freshwater Laboratory, and in
particular, Iain Malcolm and the Scottish Environmental Protection Agency. We are grateful to the careful
and constructive comments of Hoshin Gupta and other anonymous reviewers that helped to improve an earlier
version of this manuscript. The data used in this study are available from the authors upon request.
References
Barnes, C. J., and M. Bonell (1996), Application of unit hydrograph techniques to solute transport in catchments, Hydrol. Processes, 10,
793–802.
Berman, E. S. F., M. Gupta, C. Gabrielli, T. Garland, and J. J. McDonnell (2009), High-frequency field-deployable isotope analyzer for
hydrological applications, Water Resour. Res., 45, W10201, doi:10.1029/2009WR008265.
Beven, K. (2006), A manifesto for the equifinality thesis, J. Hydrol., 320, 18–36.
Beven, K. (2012a), Rainfall-Runoff Modelling: The Primer, 2nd ed., Wiley-Blackwell, Chichester, U. K.
Beven, K. (2012b), Causal models as multiple working hypotheses about environmental processes, C. R. Geosci., 344, 77–88,
doi:10.1016/j.crte.2012.01.005.
Birkel, C., D. Tetzlaff, S. M. Dunn, and C. Soulsby (2010), Towards simple dynamic process conceptualization in rainfall-runoff models using
multi-criteria calibration and tracers in temperate, upland catchments, Hydrol. Processes, 24, 260–275.
Birkel, C., D. Tetzlaff, S. M. Dunn, and C. Soulsby (2011a), Using time domain and geographic source tracers to conceptualize streamflow
generation processes in lumped rainfall-runoff models, Water Resour. Res., 47, W02515, doi:10.1029/2010WR009547.
Birkel, C., C. Soulsby, and D. Tetzlaff (2011b), Modelling catchment-scale water storage dynamics: Reconciling dynamic storage with
tracer-inferred passive storage, Hydrol. Processes, 25, 3924–3936, doi:10.1002/hyp.8201.
Boorman, D. B., J. M. Hollis, and A. Lilly (1995), Hydrology of soil types: A hydrological classification of the soils of the United Kingdom, Inst.
Hydrol. Rep. 126, Inst. of Hydrol., Wallingford, U. K.
Box, G. E. P. (1976), Science and statistics, J. Am. Stat. Assoc., 71(356), 791–799.
Brun, R., P. Reichert, and H. Kunsch (2001), Practical identifiability analysis of large environmental simulation models, Water Resour. Res.,
37(4), 1015–1030.
Capell, R., D. Tetzlaff, and C. Soulsby (2013), Will catchment characteristics moderate the projected effects of climate change on flow
regimes in the Scottish Highlands?, Hydrol. Processes, 27, 687–699, doi:10.1002/hyp.9626.
Clark, M. P., H. K. McMillan, D. B. G. Collins, D. Kavetski, and R. A. Woods (2011), Hydrological field data from a modeller’s perspective. Part
2: Process-based evaluation of model hypotheses, Hydrol. Processes, 25, 523–543.
Criss, R. E., and W. E. Winston (2008), Do Nash values have value? Discussion and alternate proposals, Hydrol. Processes, 22, 2723–2725, doi:
10.1002/hyp.7072.
Davies, J., and K. Beven (2012), Comparison of a multiple interacting pathways model with a classical kinematic wave subsurface flow solu-
tion, Hydrol. Sci. J., 57(2), 203–216.
Dunn, S. M., and R. MacKay (1995), Spatial variation in evapotranspiration and the influence of land use on catchment hydrology, J. Hydrol.,
171, 49–73.
Dunn, S. M., C. Birkel, D. Tetzlaff, and C. Soulsby (2010), Transit time distributions of a conceptual model: Their characteristics and sensitiv-
ities, Hydrol. Processes, 24, 1719–1729, doi:10.1002/hyp.7560.
Dunn, S. M., W. G. Darling, C. Birkel, and J. R. Bacon (2012), The role of groundwater characteristics in catchment recovery from nitrate pol-
lution, Hydrol. Res., 43(5), 560–575.
Fenicia, F., J. J. McDonnell, and H. Savenije (2008), Learning from model improvement: On the contribution of complementary data to pro-
cess understanding, Water Resour. Res., 44, W06419, doi:10.1029/2007WR006386.
Fenicia, F., S. Wrede, D. Kavetski, L. Pfister, L. Hoffmann, H. H. G. Savenije, and J. J. McDonnell (2010), Assessing the impact of mixing
assumptions on the estimation of streamwater mean residence time, Hydrol. Processes, 24(12), 1730–1741, doi:10.1002/hyp.7595.
Freer, J., H. McMillan, J. J. McDonnell, and K. J. Beven (2004), Constraining dynamic TOPMODEL responses for imprecise water table infor-
mation using fuzzy rule based performance measures, J. Hydrol., 291(3–4), 254–277.
Frei, S., G. Lischeid, and J. H. Fleckenstein (2010), Effects of micro-topography on surface–subsurface exchange and runoff generation in a
virtual riparian wetland—A modeling study, Adv. Water Resour., 336, 1388–1401, doi:10.1016/j.advwatres.2010.07.006.
Gibson, J. J., and T. W. D. Edwards (2002), Regional water balance trends and evaporation transpiration partitioning from stable isotope sur-
vey of lakes in northern Canada, Global Biogeochem. Cycles, 16(2), 1026, doi:10.1029/2001GB001839.
Gupta, H. V., S. Sorooshian, and P. O. Yapo (1998), Towards improved calibration of hydrologic models: Multiple and non-commensurable
measures of information, Water Resour. Res., 34(4), 751–763.
Gupta, H. V., T. Wagener, and Y. Liu (2008), Reconciling theory with observations: Elements of a diagnostic approach to model evaluation,
Hydrol. Processes, 22, 3802–3813, doi:10.1002/hyp.6989.
Gupta, H. V., M. P. Clark, J. A. Vrugt, G. Abramowitz, and M. Ye (2012), Towards a comprehensive assessment of model structural adequacy,
Water Resour. Res., 48, W08301, doi:10.1029/2011WR011044.
Heidbüchel, I., P. A. Troch, S. W. Lyon, and M. Weiler (2012), The master transit time distribution of variable flow systems, Water Resour. Res.,
48, W06520, doi:10.1029/2011WR011293.
Hrachowitz, M., C. Soulsby, D. Tetzlaff, I. A. Malcolm, and G. Schoups (2010a), Gamma distribution models for transit time estimation in
catchments: Physical interpretation of parameters and implications for time-variant transit time assessment, Water Resour. Res., 46,
W10536, doi:10.1029/2010WR009148.
Hrachowitz, M., C. Soulsby, D. Tetzlaff, and M. Speed (2010b), Catchment transit times and landscape controls: Does scale matter?, Hydrol.
Processes, 24, 117–125, doi:10.1002/hyp.7510.
Hrachowitz, M., H. Savenije, T. A. Bogaard, D. Tetzlaff, and C. Soulsby (2013), What can flux tracking teach us about water age distribution
patterns and their temporal dynamics?, Hydrol. Earth Syst. Sci., 17, 533–564, doi:10.5194/hess-17-533-2013.
Katsuyama, M., N. Kabeya, and N. Ohte (2009), Elucidation of the relationship between geographic and time sources of stream water using
a tracer approach in a headwater catchment, Water Resour. Res., 45, W06414, doi:10.1029/2008WR007458.
Kirchner, J. W. (2003), A double paradox in catchment hydrology and geochemistry, Hydrol. Processes, 17, 871–874.
Kirchner, J. W. (2006), Getting the right answers for the right reasons: Linking measurements, analyses, and models to advance the science
of hydrology, Water Resour. Res., 42, W03S04, doi:10.1029/2005WR004362.
Kirchner, J. W. (2009), Catchments as simple dynamical systems: Catchment characterization, rainfall-runoff modeling, and doing hydrology
backward, Water Resour. Res., 45, W02429, doi:10.1029/2008WR006912.
Lamb, R., K. J. Beven, and S. Myrabo (1998), Use of spatially distributed water table observations to constrain uncertainty in a rainfall-runoff
model, Adv. Water Resour., 22, 305–317.
McDonnell, J. J., et al. (2010), How old is the water? Open questions in catchment transit time conceptualization, modelling and analysis,
Hydrol. Processes, 24, 1745–1754, doi:10.1002/hyp.7796.
McGuire, K. J., M. Weiler, and J. J. McDonnell (2007), Integrating tracer experiments with modelling to assess runoff processes and water
transit times, Adv. Water Resour., 30(4), 824–837.
McMillan, H., D. Tetzlaff, M. Clark, and C. Soulsby (2012), Do time-variable tracers aid the evaluation of hydrological model structure? A mul-
timodel approach, Water Resour. Res., 48, W05501, doi:10.1029/2011WR011688.
Morris, M. D. (1991), Factorial sampling plans for preliminary computational experiments, Technometrics, 33(2), 161–174, doi:10.1080/
00401706.1991.10484804.
Mroczkowski, M., G. P. Raper, and G. Kuczera (1997), The quest for more powerful validation of conceptual catchment models, Water
Resour. Res., 33(10), 2325–2335.
Page, T., K. J. Beven, J. Freer, and C. Neal (2007), Modelling the chloride signal at Plynlimon, Wales, using a modified dynamic TOPMODEL
incorporating conservative chemical mixing (with uncertainty), Hydrol. Processes, 21(3), 292–307, doi:10.1002/hyp.6186.
Price, K., S. T. Purucker, S. R. Kraemer, and J. E. Babendreier (2012), Tradeoffs among watershed model calibration targets for parameter
estimation, Water Resour. Res., 48, W10542, doi:10.1029/2012WR012005.
Price, K. V., R. M. Storn, and J. A. Lampinen (2005), Differential Evolution—A Practical Approach to Global Optimization, Springer, Berlin.
R Core Team (2013), R: A Language and Environment for Statistical Computing, R Found. for Stat. Comput., Vienna. [Available at
http://www.R-project.org/.].
Rinaldo, A., K. J. Beven, E. Bertuzzo, L. Nicotina, J. Davies, A. Fiori, D. Russo, and G. Botter (2011), Catchment travel time distributions and
water flow in soils, Water Resour. Res., 47, W07537, doi:10.1029/2011WR010478.
Sayama, T., and J. J. McDonnell (2009), A new time-space accounting scheme to predict stream water residence time and hydrograph
source components at the watershed scale, Water Resour. Res., 45, W07401, doi:10.1029/2008WR007549.
Seibert, J., and J. J. McDonnell (2002), On the dialog between experimentalist and modeler in catchment hydrology: Use of soft data for
multicriteria model calibration, Water Resour. Res., 38(11), 1241, doi:10.1029/2001WR000978.
Seibert, J., T. Grabs, S. Koehler, H. Laudon, M. Winterdahl, and K. Bishop (2009), Linking soil- and stream-water chemistry based on riparian
flow-concentration integration model, Hydrol. Earth Syst. Sci., 13, 2287–2297.
Shin, M.-J., J. H. A. Guillaume, B. F. W. Croke, and A. J. Jakeman (2013), Addressing ten questions about conceptual rainfall-runoff models
with global sensitivity analyses in R, J. Hydrol., 503, 135–152, doi:10.1016/j.jhydrol.2013.08.047.
Sobol, I. M. (1993), Sensitivity analysis for nonlinear mathematical models, Math. Model. Comput. Exp., 1(4), 407–414.
Soetaert, K., and T. Petzoldt (2010), Inverse modelling, sensitivity and Monte Carlo analysis in R using package FME, J. Stat. Software, 33(3),
1–28. [Available at http://www.jstatsoft.org/v33/i03/.].
Soulsby, C., M. Chen, R. C. Ferrier, A. Jenkins, and R. Harriman (1998), Hydrogeochemistry of shallow groundwater in a Scottish catchment,
Hydrol. Processes, 12, 1111–1127.
Soulsby, C., D. Tetzlaff, N. van den Bedem, I. A. Malcolm, P. J. Bacon, and A. F. Youngson (2007), Inferring groundwater influences on surface
water in montane catchments from hydrochemical surveys of springs and streamwaters, J. Hydrol., 333, 199–213.
Soulsby, C., C. Neal, H. Laudon, D. A. Burns, P. Merot, M. Bonell, S. M. Dunn, and D. Tetzlaff (2008), Catchment data for process conceptuali-
zation: Simply not enough?, Hydrol. Processes, 22, 2057–2061.
Tetzlaff, D., S. Uhlenbrook, S. Eppert, and C. Soulsby (2008), Does the incorporation of process conceptualization and tracer data improve
the structure and performance of a simple rainfall-runoff model in a Scottish mesoscale catchment?, Hydrol. Processes, 22(14), 2461–
2474, doi:10.1002/hyp.6841.
Tetzlaff, D., C. Birkel, J. Dick, and C. Soulsby (2014), Storage dynamics in hydropedological units control hillslope connectivity, runoff gener-
ation and the evolution of catchment transit time distributions, Water Resour. Res., 50, 969–985, doi:10.1002/2013WR014147.
Uhlenbrook, S., and C. Leibundgut (2002), Process-oriented catchment modelling and multiple-response validation, Hydrol. Processes, 16,
423–440.
Vache, K. B., and J. J. McDonnell (2006), A process-based rejectionist framework for evaluating catchment runoff model structure, Water
Resour. Res., 42, W02409, doi:10.1029/2005WR004247.
van Griensven, A., T. Meixner, S. Grunwald, T. Bishop, M. Diluzio, and R. Srinivasan (2006), A global sensitivity analysis tool for the parame-
ters of multi-variable catchment models, J. Hydrol., 324, 10–23.