Birkel 2014

Water Resources Research


RESEARCH ARTICLE
10.1002/2013WR014925

Developing a consistent process-based conceptualization of catchment functioning using measurements of internal state variables

Christian Birkel1, Chris Soulsby1, and Dörthe Tetzlaff1

1 Northern Rivers Institute, School of Geosciences, University of Aberdeen, Aberdeen, UK

Key Points:
• Internal state and flux data improve consistency in calibrated models
• Soil isotope data have the highest information content for model calibration
• Water age tracking is a useful model diagnostic of system function

Correspondence to:
C. Birkel,
[email protected]

Citation:
Birkel, C., C. Soulsby, and D. Tetzlaff (2014), Developing a consistent process-based conceptualization of catchment functioning using measurements of internal state variables, Water Resour. Res., 50, 3481–3501, doi:10.1002/2013WR014925.

Received 18 OCT 2013
Accepted 10 APR 2014
Accepted article online 15 APR 2014
Published online 28 APR 2014

Abstract We use isotope data in addition to discharge and groundwater level data to conceptualize the internal processes of runoff generation and tracer transport in a low parameter coupled flow-tracer model that could predict the runoff response and isotopic composition of an upland stream. We used sensitivity analysis to assess the effect of these data on model calibration in terms of parameter identifiability and the model’s ability to predict the stream’s runoff response, isotopic composition and water age. The results showed that the incorporation of tracer data in particular, clearly increased parameter identifiability and improved the predictive power of models for simulating both streamflow and isotopes. This also resulted in a more consistent process-based conceptualization of catchment functioning. We could also show that using models as learning tools can guide sampling campaigns toward measurements with increased information content for further modeling. We conclude that this is a promising approach for assessing dominant processes in coupled flow-tracer models. This is of value when such models are being used to test hypotheses about the hydrological functioning of catchments, particularly in relation to pollutant transfers.

1. Introduction
Relatively few studies use spatially distributed hydrometric and tracer observations in addition to discharge
to aid the calibration and evaluation of catchment runoff models [Beven, 2012a]. While empirical observations
representing internal model state variables (e.g., groundwater level and soil moisture data) and their
conceptualization in models have proved useful, local heterogeneities can pose serious problems for
catchment-scale runoff modeling [e.g., Lamb et al., 1998; Freer et al., 2004]. Conservative tracers offer alter-
native integrated sources of information that can characterize the internal states and fluxes in terms of
water molecules stored in a catchment and aid modeling [Uhlenbrook and Leibundgut, 2002]. Combining
multiproxy hydrometric and tracer data can help reconcile the ‘‘old water paradox’’ [Kirchner, 2003] enabling
models to simulate both the rapid rainfall-runoff response (over time scales of minutes-hours-days-weeks)
and the transit time of water molecules (over months to decades) which are subsequently transported
[Davies and Beven, 2012]. Such integration of tracers with streamflow data has been traditionally achieved
by calibrating lumped models to measured streamflow and tracer concentrations on the basis of input-
output relationships [Barnes and Bonell, 1996; Seibert and McDonnell, 2002; Hrachowitz et al., 2013]. Resource
limitations previously constrained analysis of longer-term, high-frequency (daily and subdaily), and spatially
distributed tracer data. Now, the development of inexpensive laser spectroscopy has increased the feasibil-
ity of obtaining large numbers of isotope analyses, facilitating improved characterization of the temporal
(minutes to days) and spatial dynamics of the internal (e.g., soil and groundwater) catchment isotope
responses [Berman et al., 2009; Birkel et al., 2011a]. Such data have potential for use in models to learn how
water is transported and stored in catchments and the associated time scales [Soulsby et al., 2008; McDon-
nell et al., 2010]. This understanding is of particular practical value if models are being used to assess flow
paths of water and solutes in the context of diffuse pollutant transport [Dunn et al., 2012].
More generally, how to approach hydrological modeling in order to provide the most appropriate framework
for learning about catchment functioning and associated context-specific hypothesis testing remains a matter
of debate [see, e.g., Gupta et al., 2012]. Some studies have stressed the critical importance of model structure in
the context of coupled flow-tracer models, advocating multimodel approaches to identify the most suitable model
based on performance statistics [Fenicia et al., 2008; McMillan et al., 2012; Hrachowitz et al., 2013]. Despite the merits
of such methods, highly parameterized models can suffer from identifiability issues, particularly for coupled models

BIRKEL ET AL. © 2014. American Geophysical Union. All Rights Reserved.

of water and solute transport [Mroczkowski et al., 1997; Page et al., 2007]. Kirchner [2006] argued that simple, low-
parameterized models based on dominant process conceptualization can provide a preferable, more realistic rep-
resentation of catchment functioning with identifiable parameters and reduced predictive uncertainty. Such an
approach can also use additional data such as tracers [Seibert and McDonnell, 2002], though there is a trade-off in
terms of additional parameters that may need to be added [Soulsby et al., 2008].
Integration of tracer-based information into conceptual modeling has many potential advantages. For exam-
ple, Vache and McDonnell [2006] used tracers to evaluate the effects of model structure and parameterizations
on flow simulations. More recently, Dunn et al. [2010] and McMillan et al. [2012] examined the effect of model
structures on transit time distributions (TTDs) calculated from coupled conceptual flow and solute transport
models. A central challenge to such initiatives is how to parameterize different mixing assumptions that
impact the transport of solutes and associated travel times [Fenicia et al., 2010]. Further, work by Hrachowitz
et al. [2010a] and Heidbüchel et al. [2012] demonstrated the effect of different wetness regimes and the importance
of evapotranspiration fluxes on the time variance of mixing and transport processes. Most of these studies
have been restricted to the use of tracer input-output relationships for model assessment. Therefore, the
potential of tracer measurements and how this can be implemented into runoff models as internal states
(e.g., from different soils and groundwater) is relatively unexplored. Such models may be used to investigate
how landscape heterogeneity mediates the effects of spatial and temporal variability of mixing and transport
processes at the catchment scale [Rinaldo et al., 2011]. Exceptions include Katsuyama et al. [2009] who used
spatially distributed tracer data from a small Japanese catchment to inform a catchment model. Similarly,
Sayama and McDonnell [2009] used a tracer-based model framework to account for the sources of water con-
tributing to the hydrograph. Additionally, Birkel et al. [2011a] used tracers measured in groundwater to test
internal model state variables and constrain predictions. Incorporation of such spatially distributed, high-
frequency tracer sampling over prolonged periods can be a ‘‘more powerful’’ test of models to reproduce
internal states rather than relying on discharge alone [Mroczkowski et al., 1997].
As tracer data become more readily available, opportunities increase for calibrating specific model parameters
against observed tracer data (e.g., for soil water or groundwater). While this has been done for hydrometric
data (e.g., water table levels, soil moisture, etc.) [Freer et al., 2004], tracer data have greater
unrealized potential for calibration based on internal model states and fluxes that contribute to streamflow,
rather than just using tracers measured in streams [Beven, 2012a]. While this may not necessarily result in
improved model performance in terms of flow predictions, it can result in improved parameter identifiabil-
ity and help to reveal model inadequacies [Gupta et al., 1998], for example in terms of representation of
internal storage and fluxes [Price et al., 2012]. The latter is an essential part of conceptual modeling applied
in a learning framework, where identification of errors and rejection of hypotheses can provide improved
understanding of catchment function and subsequent models [Box, 1976; Gupta et al., 2008; Beven, 2012b].
In this paper, we use newly available water isotope data from different soil types and depths, and ground-
water level data collected over a full year to develop a parsimonious (low parameter), conceptual rainfall-
runoff model with coupled tracer transport modules for an upland catchment in Scotland (see Tetzlaff et al.
[2014] for details on the study site and measurements). The model evolved from previous investigations
which focused on simulating the runoff response and tracer composition of stream waters [Birkel et al.,
2010, 2011a]. The new data were used with the overall aim of testing the effect of different types of data on
the calibration of the coupled flow-tracer model in order to attain a conceptualization of internal model
state variables and fluxes that was broadly consistent with empirical data. The specific objectives were to:
(a) test alternative targets in addition to discharge for model calibration and evaluate internal model state
variables and fluxes based on the incorporation of key hydrometric and isotopic data; (b) examine how
parameter identifiability, sensitivity, and the predictive power of flow and tracer simulations are related to
the amount and type of data used for calibration; and (c) investigate the effects of different calibration data
on using the model to estimate stream water age using flux tracking.

2. Study Site Monitoring and Data


2.1. Catchment Characteristics
The Bruntland Burn (BB) is in the Cairngorms National Park, Scotland; detailed descriptions can be found in
Soulsby et al. [2007] and Birkel et al. [2011a, 2011b]. The 3.2 km² catchment has many characteristics typical


Figure 1. Air photo of the Bruntland Burn experimental catchment showing distinct vegetation patterns related to wetness (green areas
can be associated with greater wetness) in the valley bottom saturation areas and the location of instrumentation at the catchment outlet
and the hillslope transect (1—deep histosols (peat soil) in the saturation area, 2—Shallow histosols (peaty gley soil) in the transition zone,
and 3—the freely draining spodosols on the steeper hillslope).

of northern upland areas with igneous (46% granite) and metamorphic (47% pelite) geology and a glacial
legacy (Figure 1). The resulting landscape has steep hillslopes and wide, flat valley bottoms (at around 200
m). Geophysical surveys show that the latter are filled with thick layers (>20 m) of glacial drift (mainly
coarse-textured till), where poorly drained histosols (peats  2 m deep and peaty gley soils  70 cm deep)
have developed. These soils fringe the channel network, maintaining saturated conditions all year and facili-
tating rapid near-surface runoff generation processes, especially saturation overland flow [Birkel et al., 2010;
Tetzlaff et al., 2008]. The steeper slopes have freely draining spodosols (<1 m deep) grading at altitudes
>350 m to thin (<40 cm) inceptisols which facilitate groundwater recharge [Soulsby et al., 1998].
Runoff dynamics are controlled by the expansion and contraction of the saturated riparian areas as a result
of direct precipitation and groundwater seepage from upslope [Birkel et al., 2011a]. The hydropedology is
apparent from the spatial distribution of vegetation communities representing distinct ecohydrological
units (Figure 1). The spodosols are dominated by heather (Calluna vulgaris) moorland; grasses (Molinia caerulea)
indicate the wet soils in the valley bottom, while deeper histosols are dominated by mosses (Sphagnum
spp.). Mean annual precipitation is around 1000 mm with 363 mm actual evapotranspiration. Snow generally
accounts for <5% of precipitation and major snowfall events are unusual [Capell et al., 2013].

2.2. Hydrometric Measurements and Isotope Tracer Data


Precipitation and streamflow have been measured since 2008, but most data presented were collected
between June 2011 and May 2012. Stream level (capacitance probe, precision ±0.2 cm) was measured at
the outlet and converted into discharge using a manual rating curve. A tipping bucket rain gauge (precision
±0.2 mm) is located nearby, where ISCO 3700 autosamplers collected daily stream (composite sample
at 4 × 6 h) and rain (accumulated over 24 h) samples for isotope analysis (Figure 1). Paraffin prevented
sample evaporation and freezing. These were collected for analysis at weekly intervals. An automatic


Table 1. Summary Statistics of Rainfall-Runoff Characteristics, Groundwater Levels, Additional Hydrometeorological Variables, and Unweighted Isotope Time Series (With n Number of Samples and Coefficient of Variation CV) Incorporated Into the Modeling Procedure for the Calibration Period From 1 June 2011 to 31 May 2012 (a)

| Data | Unit | n | Mean | Range [min, max] | CV |
|---|---|---|---|---|---|
| Hydrology and Hydrometeorology | | | | | |
| Discharge (Q) | mm d⁻¹ | 366 | 1.63 | [0.31, 11.3] | 1.34 |
| Precipitation (P) | mm d⁻¹ | 366 | 2.41 | [0, 32.5] | 1.9 |
| Evapotranspiration (ET) | mm d⁻¹ | 366 | 1.26 | [0, 6.2] | 0.92 |
| Air temperature (AT) | °C | 366 | 7.2 | [−7.1, 17.9] | 0.64 |
| Relative humidity (h) | % | 366 | 82.8 | [55, 98] | 0.09 |
| Saturation area extent | % catchment area | 366 | 11 | [2, 40] | 0.81 |
| Groundwater level 1 | cm | 300 | −3.7 | [−5.9, 0.2] | 0.27 |
| Groundwater level 2 | cm | 345 | −18.6 | [−42.6, −11.7] | 0.3 |
| Isotopic (δ2H) | | | | | |
| Stream | ‰ | 317 | −58.1 | [−65.8, −53.6] | 0.03 |
| Rain | ‰ | 192 | −56.3 | [−143, −12.6] | 0.41 |
| Soil water 1 (in saturation area; 10 cm depth) | ‰ | 45 | −55.9 | [−61.6, −50.7] | 0.04 |
| Soil water 1 (in saturation area; 30 cm depth) | ‰ | 44 | −59.2 | [−61.8, −56.5] | 0.02 |
| Soil water 3 (0–20 cm) | ‰ | 47 | −58.8 | [−82.9, −44.4] | 0.16 |
| Soil water 3 (40–60 cm) | ‰ | 43 | −59.3 | [−66.7, −53.9] | 0.05 |
| Groundwater wells | ‰ | 42 | −61.2 | [−63.2, −58.3] | 0.02 |

(a) Numbers 1 and 3 refer to sampling stations: 1 = Deep Histosol (peat soil) in saturation area; 2 = Shallow Histosol (peaty gley) in transition zone; and 3 = freely draining spodosol on steeper hillslope.

weather station 1 km from the BB was used for daily potential evapotranspiration estimates (Penman-Monteith)
and to derive mean catchment precipitation.
Three monitoring stations were established to characterize the soil moisture and groundwater dynamics
and associated isotope characteristics along a transect from the valley bottom to the upper slope (Figure 1).
The location was based on mapping the main hydropedological units [Boorman et al., 1995] and is represen-
tative of many catchments in the Scottish Highlands [Tetzlaff et al., 2014]. The transect forms a catena of
deep histosols (peats) in the valley bottom (site 1), shallow histosols (peaty gleys) in the lower slope (site 2),
and spodosols (site 3) on the upper slope.
Stations were equipped with two capacitance loggers for comparative recording of groundwater levels (15
min) in a screened well. Pairs of suction lysimeters (Rhizosphere MacroRhizons) were installed at 10, 30, and
50 cm soil depths (where volumetric moisture content was also measured using Campbell TDR probes) and
sampled over an hour under low suction once per week for isotope analysis. Site 1 was only sampled at 10
and 30 cm depth. We also sampled two groundwater wells for isotopes; one close to the BB outlet and one
west of site 3 at the same elevation [Birkel et al., 2011a]. Samples were analyzed for deuterium (δ2H) and
oxygen-18 (δ18O) using a Los Gatos DLT-100 laser spectrometer (precision ±0.4‰ for δ2H and ±0.1‰ for
δ18O). Results are reported in delta notation calibrated to Vienna Standard Mean Ocean Water (VSMOW)
standards. Replicates of soil and groundwater isotopes were averaged and summaries of all measurements
are in Table 1. Daily precipitation and streamflow isotopes from 1 October 2008 to 30 September 2009 were
used as an independent model test period.
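
As a brief illustration of the delta notation in which the isotope results are reported, the following sketch converts between an isotope ratio and a delta value in per mil relative to a standard. The function names and the worked value are hypothetical, and the VSMOW 2H/1H ratio of 155.76 × 10⁻⁶ is the commonly quoted reference value, an assumption that is not stated in this paper.

```python
# Commonly quoted 2H/1H ratio of VSMOW (assumption, not given in the paper).
R_VSMOW_2H = 155.76e-6


def delta_permil(r_sample, r_standard=R_VSMOW_2H):
    """Return the delta value (per mil) of a sample ratio relative to a standard."""
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta, r_standard=R_VSMOW_2H):
    """Invert the delta notation to recover the isotope ratio."""
    return r_standard * (1.0 + delta / 1000.0)


if __name__ == "__main__":
    # The mean stream delta-2H in Table 1 is about -58.1 per mil.
    r = ratio_from_delta(-58.1)
    print(f"ratio = {r:.4e}, delta = {delta_permil(r):.1f} per mil")
```

A negative delta value therefore indicates a sample that is depleted in the heavy isotope relative to the standard, which is the sense in which "depleted" is used throughout.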

3. Hydrometric and Tracer Dynamics


The water balance of the study year was close to average (Table 1). However, precipitation was less evenly
distributed than usual (Figure 2a); summer 2011 was relatively cold and wet followed by a drier than aver-
age winter and early spring in 2012 then wetter conditions in late spring. This resulted in marked runoff
events for both summers and low-magnitude events in winter. An 8 mm d⁻¹ runoff peak in December had
a strong snowmelt influence. March 2012 showed unusually low runoff over a warm and dry weather period
accompanied by increased evapotranspiration.
Isotope dynamics reflect hydrodynamics with highly variable precipitation signatures being strongly
damped in stream water (Figure 2c). Nevertheless, stream isotope signals do exhibit variability with more
negative or positive excursions following large events and their associated isotope flux (Figure 2d). Those
larger events occurred in late summer 2011, winter and spring 2012 with more depleted precipitation
resulting in more negative stream isotope responses. The weekly soil isotope time series at the upper


Figure 2. Time series of (a) rainfall and (b) runoff and evapotranspiration. Isotopic composition of (c) precipitation (weighted, shown as
proportional circles of deuterium flux) against stream water and (d) soil waters in the freely draining podzol on the steeper hillslope (S3)
and the riparian peat soil (S1) at 10 cm (upper horizon) and 30 cm (deeper horizon) depths relative to stream water.

horizons at 10 cm depth showed more marked variability tracking isotope inputs in precipitation most nota-
bly at site 3 (Figure 2d). This variability was damped in deeper horizons with more negative signatures,
reflecting recharge in larger winter events with low evaporation; features that were very similar to sampled
groundwater (Table 1).
Precipitation isotopic composition plots close to the Global Meteoric Water Line as indicated by the Local
Meteoric Water Line (LMWL), whereas soil waters indicate a slight deviation in slope (Figure 3). Soil water
isotope signatures become heavier during warmer, dry periods consistent with evaporative fractionation.
Surprisingly, deeper soil horizons (S3_50cm and S1_30cm) indicate a greater change in slope bracketing
the stream isotope local evaporation line (LEL) compared to the superficial measurements at 10 cm depths.
This probably explains the deviation of the stream LEL from meteoric waters; however, this was not as
extreme as that reported by Birkel et al. [2011a] for the warmer year of 2008/2009. The spread of soil water
isotope samples at sites 1 and 3 captures the range of variability observed, which is greatest in the 10 cm
deep samples and markedly damped even at 30 cm (for S1) and 50 cm (for S3). Stream water samples plot
entirely within the range of the sampled soil waters, and are almost entirely covered by samples collected
at site 1 (10 and 30 cm). In contrast, the soil samples at site 3 10 and 30 cm (not shown) were much more
variable than the stream. This is consistent with the saturation area in the valley bottom area providing a
large storage that mixes different source waters before contributing to streamflow [Tetzlaff et al., 2014]. Pre-
vious modeling studies here supported this hypothesis and provided the basis for further testing in the
model development described below [Birkel et al., 2011a].
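
The comparison of meteoric water lines and evaporation lines described above amounts to fitting straight lines in dual-isotope space. The sketch below shows one way this could be done with an ordinary least-squares fit; the sample values are hypothetical and are not the Bruntland Burn data, and the paper does not state which regression method was used. The Global Meteoric Water Line is taken in its usual form, δ2H = 8 δ18O + 10.

```python
import numpy as np

GMWL_SLOPE, GMWL_INTERCEPT = 8.0, 10.0  # standard Global Meteoric Water Line


def fit_water_line(d18o, d2h):
    """Least-squares fit of d2H = slope * d18O + intercept."""
    slope, intercept = np.polyfit(np.asarray(d18o, float),
                                  np.asarray(d2h, float), 1)
    return float(slope), float(intercept)


if __name__ == "__main__":
    # Hypothetical rainfall samples lying on the GMWL.
    rain_d18o = np.array([-12.0, -9.5, -8.0, -6.0, -3.5])
    rain_d2h = GMWL_SLOPE * rain_d18o + GMWL_INTERCEPT
    # Hypothetical soil water samples with a lower, evaporation-like slope.
    soil_d18o = np.array([-9.0, -8.5, -8.0, -7.5, -7.0])
    soil_d2h = 5.5 * soil_d18o - 12.0
    print("LMWL:", fit_water_line(rain_d18o, rain_d2h))
    print("LEL: ", fit_water_line(soil_d18o, soil_d2h))
```

A fitted slope well below 8 for soil or stream samples is the usual indication of evaporative fractionation.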

4. Tracer-Based Modeling Approach


To outline the modeling approach used, first we describe the low parameter but process-based, coupled
flow-tracer model philosophy which we build on (section 4.1.). The subsequent evolution is based on new
hydrometric and isotope data used for calibration (section 4.2.). Next—and crucial to using additional


Figure 3. Isotope (δ2H and δ18O) signatures of rainfall, stream water, soil water in the riparian peat (S1; 10 and 30 cm depth), and the freely
draining podzol on the steeper hillslope (S3; 10 and 50 cm depth). The dashed lines represent the global meteoric water line (GMWL) and
the local meteoric line (LMWL). Stream and soil isotopes were used for construction of Local Evaporation Lines (LEL). Please note that colors
match those in Figure 1.

data—we test the effect of different data and objective functions (OFs) used in calibration on simulations of
model states, fluxes, and water age (section 4.3.). This was evaluated using local sensitivity analysis for visu-
alization and global sensitivity methods to estimate identifiability and the relevance of parameters for cali-
bration (section 4.4.). We also test the model calibrated to the 2011/2012 study period for the rainfall-runoff
isotope data from 2008/2009 and vice versa. Analysis used the R language [R Core Team, 2013].

4.1. Basic Model


Alternative models were previously evaluated using a multimodel approach, which explored the influence
of structure on performance in terms of simulation of runoff and conservative tracers [see Birkel et al., 2010].
The model structure that was most successful—and broadly consistent with empirically based understand-
ing of the catchment—involved conceptualization of the dynamic, nonlinear expansion and contraction of
the two major landscape units (unsaturated hillslopes and saturated riparian area) and a deeper ground-
water reservoir (see Figure 4). This was based on field mapping of the riparian saturation area fSAT con-
nected to the stream under different wetness conditions and associated rapid surface runoff mechanisms.
The percentage catchment area occupied by the saturation area is a function of antecedent wetness AW
and soil moisture capacity Smax. A calibrated 7 day antecedent wetness algorithm best represented this:

$$f_{SAT} = f(AW, S_{max}) \qquad (1)$$

Subsequently, catchment precipitation P and evapotranspiration ET are partitioned into the hillslope (Pup,
ETup) and saturation area (Psat, ETsat) according to the extent of fSAT (equations (2–5)).

$$P_{up} = P \, f_{SAT} \qquad (2)$$

$$P_{sat} = 1 - P_{up} \qquad (3)$$

$$ET_{up} = ET \, f_{SAT} \qquad (4)$$

$$ET_{sat} = 1 - ET_{up} \qquad (5)$$

Previous geochemical tracers identified distinct groundwater and soil water sources contributing to runoff
[see Birkel et al., 2010]. This was conceptualized in the original model as an additional linear groundwater


Figure 4. Conceptual diagram of the simple model structure developed indicating internal state variables (active storage components Sup,
Slow, and Ssat with their respective calibrated passive storage parameters upSp, lowSp, and satSp resulting in the storage concentrations
cSup, cSlow, and cSsat) and fluxes (Q1, Re, Q2, and stream Q with c indicative of concentrations) with model parameters in red. Note that c
refers to the coupled isotope transport and that observations representative of model states and fluxes used for calibration are shown in
the lower right corners of each of the storage boxes.

reservoir contributing 30–40% of annual streamflow. The model operated on daily time steps with a
total of five parameters.

4.2. Model Evolution Through Incorporation of Additional Data


To conceptualize internal processes of runoff generation in a manner consistent with empirical data, we
extended the previous model using hydrometric (groundwater level and soil moisture) and isotopic (soil
water isotopes from different soils and depths) data from the hillslope transect. We assumed that the previ-
ous model structure is an empirically based hypothesis of catchment functioning, rather than repeating our
multimodel evaluation [Birkel et al., 2010]. This also maintains a simple, parsimonious model based on domi-
nant processes [Kirchner, 2006]. Data for use in modeling were selected based on visual screening of time
series (to detect outliers), as well as correlation and cross correlation of streamflow to other variables to
maximize the information content for the parsimonious addition of new data. Soil isotopes and ground-
water level data proved the most informative compared to soil moisture and groundwater isotopes in terms
of their greater dynamics as reflected in increased coefficients of variation (CV; Table 1). Soil moisture varied
little at S1 and S2 and damped groundwater isotopes had equivalent variability (CV) to deeper soil waters
[Tetzlaff et al., 2014].
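
The screening step described above, which compares the variability of candidate series and their correlation with streamflow, could be sketched as follows. This is an illustration only: the function names are hypothetical, the CV is computed here with the sample standard deviation (the paper does not specify), and the study itself was carried out in R rather than Python.

```python
import numpy as np


def coefficient_of_variation(x):
    """Sample standard deviation divided by the absolute mean."""
    x = np.asarray(x, float)
    return float(np.std(x, ddof=1) / abs(np.mean(x)))


def lagged_correlation(q, x, max_lag):
    """Pearson correlation between q(t) and x(t - lag) for lag = 0..max_lag."""
    q = np.asarray(q, float)
    x = np.asarray(x, float)
    result = {}
    for lag in range(max_lag + 1):
        a = q[lag:]            # streamflow from day `lag` onward
        b = x[:len(x) - lag]   # candidate series shifted back by `lag` days
        result[lag] = float(np.corrcoef(a, b)[0, 1])
    return result


if __name__ == "__main__":
    t = np.arange(60)
    level = np.sin(t / 3.0)      # hypothetical groundwater level signal
    flow = np.roll(level, 2)     # hypothetical flow responding 2 days later
    corr = lagged_correlation(flow, level, max_lag=5)
    print("best lag:", max(corr, key=corr.get))
    print("CV:", round(coefficient_of_variation([1.0, 2.0, 3.0]), 2))
```

Series with a higher CV and a clear correlation peak carry more dynamics that a calibration can exploit, which is the rationale given above for preferring soil isotopes and groundwater levels.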
The new groundwater and soil isotope data were largely consistent with previous conceptualizations of the
dynamic saturation area as a large storage volume. This storage also modulates the hydraulic and tracer
response of the upper hillslope contributions before release to the stream. Thus, we maintained a simple
lumped model structure (Figure 4), spatially distinguishing the basic landscape features via the nonlinear
dynamics of the saturation area (equations (1–5)). However, we modified the mixing of these outflows in
the riparian area. Full equations for the new model are presented in Table 2. These were based on simulat-
ing storage changes from the water balance in the upper hillslope (Sup; equation (6)) and groundwater (Slow;
equation (7)) reservoirs both draining into the saturation area (Ssat; equation (8)).
Given damped isotopic signatures in the deeper soil horizons and the distinct dynamics in the upper horizons
at 10 cm (Table 1 and Figure 2), we allowed for evapotranspiration (equations (9)–(11)) from the soil
stores after a simple interception routine (equation (10)). The saturated hillslope reservoir (Slow) is linearly
recharged (Re) from the upper store (equations (7) and (13)). Both reservoirs are forced to completely drain
into the saturation area storage Ssat (equations (12)–(14) and (8)). The latter generates streamflow with a
nonlinear algorithm using a power function (equation (15)). This maintains relatively few parameters and
allows us to test the hypothesis that all runoff and tracers are stored (at least temporarily) and released


Table 2. Algorithms Involving Hydrometric and Isotope Data in Parsimonious Model Structure of the Evolving Bruntland Burn Model

| Model components | Perceptual model | Conceptual model equation | Eq. | Representative observations | References |
|---|---|---|---|---|---|
| Hillslope ($S_{up}$, $S_{low}$) and saturation area ($S_{sat}$) reservoirs form the active storage ($S_n$) | Distinct topography, geology and soils. Hillslopes drain into saturation area and recharge (Re) groundwater ($S_{low}$) | $\frac{dS_{up}}{dt} = P_{up} - ET_{up} - Re - Q_1$ | 6 | Soil isotope and groundwater level observations | Tetzlaff et al. [2008]; modified according to Birkel et al. [2010] |
| | | $\frac{dS_{low}}{dt} = Re - Q_2$ | 7 | | |
| | | $\frac{dS_{sat}}{dt} = P_{sat} - ET_{sat} + Q_1 + Q_2 - Q$ | 8 | | |
| Evapotranspiration (ET) | Vegetation adjusted Penman-Monteith equation | $ET = \frac{\Delta R_n + \rho_a c_p (\delta e) r_a^{-1}}{\left[\Delta + \gamma \left(1 + \frac{r_c}{r_a}\right)\right] L_v}$ | 9 | Meteorological station | Dunn and McKay [1995] |
| Interception (I) | Intercepted precipitation subject to evaporation. Soil moisture losses due to transpiration and evaporation | if $P > ET$: $I = P - ET$ | 10 | Empirical observations | |
| | | if $P < ET$: $I = P - ET$; $P = 0$; $S_n - ET$ | 11 | | |
| Runoff generation | Nonlinear streamflow generation (Q), linear contributing fluxes (recharge Re, $Q_1$, $Q_2$) | $Q_1 = S_{up} \, a$ | 12 | Groundwater level observations and measured discharge at outlet | Modified according to Birkel et al. [2011a] |
| | | $Re = S_{up} \, r$ | 13 | | |
| | | $Q_2 = S_{low} \, b$ | 14 | | |
| | | $Q = k \, (S_{sat})^{1+\alpha}$ | 15 | | |
| Mixing model | Predominantly complete mixing due to high wetness with some degree of preferential recharge ($c_P Re$). Dynamic mixing volumes (MV) according to wetness state | $MV_{up} = upS_p \, (1 - f_{SAT})$ | 16 | Soil isotope observations and isotopes in stream | Modified according to Birkel et al. [2011a] |
| | | $MV_{rip} = satS_p \, f_{SAT}$ | 17 | | |
| | | $S = \sum S_n + S_p$ | 18 | | |
| | | $\frac{d(cS)}{dt} = \sum_j c_{I,j} I_j - \sum_k c_n O_k$ | 19 | | |
| | | $\frac{d(cS_{up})}{dt} = c_P P_{up} - c_E ET_{up} - cQ_1 - c_P Re$ | 20 | | |
| Isotopic fractionation | Potential for fractionation in saturation areas and upper soil horizons | $c_{E,up}, c_{E,sat} = \frac{\alpha c - h c_A - \varepsilon}{1 - h + 10^{-3} \varepsilon_K}$ | 21 | Stream and surface saturation area Local Evaporation Lines | Gibson and Edwards [2002]; modified according to Birkel et al. [2011a] |
| Age dating | Distribution of water age ($p_{F,Q}$) of all contributing fluxes N to total discharge Q | $p_{F,Q} = \sum_{n=1}^{N} p_{F,Q_n}(t_j - t_i, t_j) \, \frac{Q_n(t_j)}{Q(t_j)}$ | 22 | | Hrachowitz et al. [2013] |

from the saturation areas connected to the stream. This also allows mixing with resident waters in a way
conceptually similar to Seibert et al. [2009].
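
To make the storage and flux structure of equations (6)–(8) and (12)–(15) concrete, a minimal sketch of one daily time step is given below. It assumes a simple explicit update, which is not necessarily the numerical scheme used by the authors; the function name, the dictionary layout, and the parameter values in the example are hypothetical. The sketch also does not protect against draining more than the available storage, so it is only meaningful when a + r and b are well below 1.

```python
def step_storages(state, forcing, par):
    """Advance the three storages by one day with an explicit water balance.

    state   : (s_up, s_low, s_sat) in mm
    forcing : dict with p_up, p_sat, et_up, et_sat in mm per day
    par     : dict with a, r, b, k and the exponent alpha
    """
    s_up, s_low, s_sat = state
    q1 = par["a"] * s_up                                     # eq. (12)
    re = par["r"] * s_up                                     # eq. (13)
    q2 = par["b"] * s_low                                    # eq. (14)
    # eq. (15); max() avoids a negative base under a fractional exponent
    q = par["k"] * max(s_sat, 0.0) ** (1.0 + par["alpha"])

    s_up_new = s_up + forcing["p_up"] - forcing["et_up"] - re - q1      # eq. (6)
    s_low_new = s_low + re - q2                                         # eq. (7)
    s_sat_new = (s_sat + forcing["p_sat"] - forcing["et_sat"]
                 + q1 + q2 - q)                                         # eq. (8)
    fluxes = {"q1": q1, "re": re, "q2": q2, "q": q}
    return (s_up_new, s_low_new, s_sat_new), fluxes


if __name__ == "__main__":
    par = {"a": 0.2, "r": 0.3, "b": 0.05, "k": 0.01, "alpha": 0.5}  # hypothetical
    forcing = {"p_up": 4.0, "p_sat": 1.0, "et_up": 1.0, "et_sat": 0.5}
    state, fluxes = step_storages((50.0, 200.0, 20.0), forcing, par)
    print(state, fluxes["q"])
```

Because both hillslope fluxes enter the saturation store before reaching the stream, the only water leaving the system in this sketch is Q and evapotranspiration, which mirrors the hypothesis stated above.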
For isotope transport, each model reservoir allows complete mixing of tracers with an additional storage
volume (Sp) that does not contribute hydraulically to streamflow generation (equation (18)). Complete mix-
ing can be considered a reasonable first approximation here: generally wet conditions on the hillslopes and
the large storage volumes in organic soils and drift deposits mean that catchment-scale water storage in
the upper 60 cm of the soils (300 mm) is large relative to daily precipitation [Tetzlaff et al., 2014]. However,
we relaxed the assumption of complete mixing [Hrachowitz et al., 2013] using available dynamic mixing vol-
umes (MV) dependent on the catchment wetness state and saturation area extent (equations (16) and (17)).
This assumes that the expansion of the saturation areas results in greater available mixing volumes. Further,
the saturated hillslope reservoir (Slow) was recharged with the isotopic content of rain rather than soil water
(equation (20)). This was based on observations of depleted deep soil and groundwater isotope values
(Table 1) resulting from preferential recharge during intense, isotopically depleted events. This is supported
by deeper soil horizon isotope signatures being more depleted than stream water (see mean and range in
Table 1 and Figures 2d and 3).
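
The role of the passive storage and of the dynamic mixing volumes can be illustrated with a small sketch. It assumes an explicit daily update of a completely mixed store in which outflow leaves at the store concentration from the start of the step, and it treats delta values as if they mixed linearly like concentrations; both are simplifying assumptions for illustration, and the function names and numbers are hypothetical.

```python
def mixing_volumes(up_sp, sat_sp, f_sat):
    """Dynamic mixing volumes following equations (16) and (17)."""
    mv_up = up_sp * (1.0 - f_sat)
    mv_rip = sat_sp * f_sat
    return mv_up, mv_rip


def mix_step(c_store, s_active, s_passive, inflows, outflow_total):
    """One explicit step of a completely mixed store (cf. equations (18)-(19)).

    inflows       : list of (volume_mm, concentration) pairs
    outflow_total : total volume leaving, at the start-of-step concentration
    Returns the store concentration at the end of the step.
    """
    volume_in = sum(v for v, _ in inflows)
    mass = c_store * (s_active + s_passive)      # tracer held in active + passive
    mass += sum(v * c for v, c in inflows)       # tracer added by inflows
    mass -= c_store * outflow_total              # tracer removed by outflows
    s_new = s_active + volume_in - outflow_total
    return mass / (s_new + s_passive)


if __name__ == "__main__":
    inflow = [(10.0, -50.0)]  # 10 mm of water at -50 per mil
    print(mix_step(-60.0, 100.0, 0.0, inflow, 10.0))    # no passive storage
    print(mix_step(-60.0, 100.0, 900.0, inflow, 10.0))  # large passive storage
    print(mixing_volumes(200.0, 300.0, 0.25))
```

The same input shifts the store composition far less when the passive volume is large, which is how the additional storage damps the tracer signal without affecting the hydraulic response.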
Given potential isotopic fractionation of the LEL resulting from evaporation in the upper and lower soil hori-
zons in both the hillslope and saturation area (Figure 3), we accounted for this in both reservoirs. This
explicitly calculates the isotopic content of vapor losses (cE, equation (21)), a process neglected in most
models. Finally, we tracked the age of water fluxes contributing to discharge (equation (22)) [Hrachowitz
et al., 2013]. This tests the impact of different data used for model calibration on stream water age estimates
[Dunn et al., 2010].
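
The final weighting step of equation (22) can be sketched as follows: at a given time, the age distribution of stream water is the flux-weighted sum of the age distributions of the contributing fluxes. The sketch covers only this combination step at a single time and not the tracking of ages through the stores; the function names and the example distributions are hypothetical.

```python
import numpy as np


def combine_age_distributions(component_pdfs, component_fluxes):
    """Flux-weighted sum of component age distributions at one time step.

    component_pdfs   : array of shape (N, n_ages), each row summing to 1
    component_fluxes : array of shape (N,), the contributing fluxes Q_n
    """
    pdfs = np.asarray(component_pdfs, float)
    fluxes = np.asarray(component_fluxes, float)
    weights = fluxes / fluxes.sum()   # Q_n / Q with Q the total discharge
    return weights @ pdfs


def mean_age(pdf, ages):
    """Mean age of a discrete age distribution."""
    return float(np.dot(pdf, ages))


if __name__ == "__main__":
    pdfs = [[1.0, 0.0, 0.0],   # hypothetical young component
            [0.0, 0.5, 0.5]]   # hypothetical older component
    stream_pdf = combine_age_distributions(pdfs, [3.0, 1.0])
    print(stream_pdf, mean_age(stream_pdf, np.array([0.0, 1.0, 2.0])))
```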
The final daily flow-tracer model uses eight calibrated parameters; five are used to simulate discharge (a, b,
r, k, and α) and three additional mixing volumes (upSp, lowSp, and satSp) simulate tracer transport in each
reservoir (Figure 4). The model was applied to the period from 1 October 2008 to 31 May 2012. However,


Table 3. Model Performances in Terms of the Euclidean Distance (ED, n Number of Combined Objective Functions Evaluating s States and f Fluxes) and Initial, Mean, and Posterior 5th/95th Parameter Percentiles are Shown According to Hydrometric, Isotopic, and Combined Data Sets Used for Calibration. Each cell lists the initial range; the mean; and the [5th, 95th] posterior percentiles.

| Calibration data | a (day⁻¹) | b (day⁻¹) | r (day⁻¹) | k (day⁻¹) | α (–) | upSp (mm) | satSp (mm) | lowSp (mm) | Best ED |
|---|---|---|---|---|---|---|---|---|---|
| Hydrometric | | | | | | | | | |
| 1(a) (ns = 2) | [0,1]; 0.51; [0.47,0.64] | [0,1]; 0.14; [0.13,0.15] | [0,1]; 0.18; [0.14,0.40] | [0,1]; 0.72; [0.7,0.8] | [0,2]; 0.08; [0.05,0.12] | | | | 0.55 |
| 1(b) (nf = 3) | [0,1]; 0.23; [0.14,0.22] | [0,1]; 0.04; [0.009,0.027] | [0,1]; 0.93; [0.78,0.99] | [0,1]; 0.06; [0.02,0.11] | [0,2]; 0.63; [0.15,0.92] | | | | 0.77 |
| 1(c) (ns = 2, nf = 3) | [0,1]; 0.28; [0.17,0.54] | [0,1]; 0.05; [0.009,0.04] | [0,1]; 0.93; [0.7,0.99] | [0,1]; 0.04; [0.03,0.08] | [0,2]; 0.68; [0.11,0.93] | | | | 1.12 |
| Isotopic | | | | | | | | | |
| 2(a) (nf = 6) | [0,1]; 0.05; [0.02,0.07] | [0,1]; 0.09; [0.006,0.78] | [0,1]; 0.07; [0.06,0.1] | [0,1]; 0.29; [0.02,0.43] | [0,2]; 0.84; [0.59,1.37] | [1,5000]; 240; [170,201] | [1,5000]; 346; [141,1192] | [1,5000]; 3116; [2728,4748] | 0.65 |
| 2(b) (nf = 6, ns = 2) | [0,1]; 0.02; [0.009,0.09] | [0,1]; 0.14; [0.1,0.2] | [0,1]; 0.07; [0.05,0.09] | [0,1]; 0.09; [0.01,0.81] | [0,2]; 0.6; [0.51,0.74] | [1,5000]; 195; [170,223] | [1,5000]; 163; [67,274] | [1,5000]; 3998; [2820,4625] | 0.73 |
| Combined | | | | | | | | | |
| 3(a) (ns = 5) | [0,1]; 0.21; [0.1,0.51] | [0,1]; 0.014; [0.007,0.016] | [0,1]; 0.92; [0.77,0.99] | [0,1]; 0.03; [0.02,0.1] | [0,2]; 0.73; [0.13,0.94] | [1,5000]; 4013; [3125,4858] | [1,5000]; 160; [131,278] | [1,5000]; 3580; [1974,4200] | 1.03 |
| Calibration 2008/2009 | 0.12; [0.11,0.15] | 0.0076; [0.007,0.008] | 0.74; [0.67,0.8] | 0.1; [0.015,0.24] | 0.7; [0.3,1.26] | 1618; [517,3365] | 610; [262,1507] | 2633; [1343,4068] | 0.99 |
| 3(b) (nf = 8, ns = 5) | [0,1]; 0.04; [0.01,0.05] | [0,1]; 0.27; [0.07,0.37] | [0,1]; 0.14; [0.07,0.27] | [0,1]; 0.06; [0.03,0.1] | [0,2]; 0.42; [0.23,0.77] | [1,5000]; 280; [162,292] | [1,5000]; 305; [10,705] | [1,5000]; 3765; [1301,4714] | 1.62 |

calibration only used the study year 1 June 2011 to 31 May 2012 where the groundwater and soil isotope
data were available. Incorporation of the tracer data for evaluating internal model state variables and fluxes
is shown in Figure 4. Initial tracer composition in the different stores was set to the observed means in Table 1.
The preceding 2.5 years were used as a warm-up period minimizing the impact of initial values on the final
year used for analysis.

4.3. Data-Based Model Calibration


We incorporated groundwater level and soil isotope data as additional targets for model calibration as well
as usual evaluation of rainfall-runoff dynamics. This was to assess whether model consistency increases due
to calibration of internal model state variables against observed data. Consistency is used here in terms of a
balanced model able to represent equally well different types of data representative of internal model
states and fluxes. Consistency C was estimated as the ratio of the best fit objective function (OF) value
obtained in the absence of representative state data for calibration to the value obtained from a model
calibration using observed state and/or flux data. A consistency C = 1 means that the model reproduces the
internal model state and/or flux as well as if it were calibrated against observed data representative of the
state. A consistency C < 1 indicates lower consistency compared to the model using internal state data for
calibration. Conversely, a consistency C > 1 indicates better representation of internal states compared to the
model calibrated against this state. We therefore accept trade-off effects in terms of a balanced model
representation at the expense of single model performance measures by using multiple data sets and OFs for
calibration [Gupta et al., 1998; Price et al., 2012]. We defined the following models based on the data sets as
targets for calibration (Table 3):
1. Hydrometric Models with the flow model (with five calibrated parameters): (a) evaluation of groundwater
levels from the saturation area (GWL 1) and hillslope (GWL 2); (b) evaluation of discharge (Q) alone; and (c)
combination of both 1(a) and 1(b) (Q + GWL 1, 2).
2. Isotopic Models using the coupled flow-tracer model (with eight calibrated parameters): (a) evaluation of
isotope data from different soils and soil depths (Soil 1_10cm, Soil 3_10cm, Soil 3_50cm) and (b) combination
of soil isotope data 2(a) with stream isotope data (Q_δ²H).
3. Combined Models using the coupled flow-tracer model (with eight calibrated parameters): (a) evaluation of
discharge 1(b) and stream isotope data (Q + Q_δ²H); and (b) combination of all available data (1(c) and 2(b)).


We used a range of OFs to evaluate different aspects of the simulated hydrograph, storage dynamics, and
isotope dynamics. For discharge, the Nash-Sutcliffe Efficiency criterion (NSE) was used along with the NSE
(lnNSE) applied to log transformed discharge and the volumetric error (VE) in order to match high and low
flows as well as hydrograph dynamics [Criss and Winston, 2008]. Groundwater levels from the saturation
area (GWL 1 from S1) and hillslope (GWL 2 from S2) are used to fit the observed storage dynamics using the
coefficient of determination (R², defined as the square of the Pearson correlation coefficient). Isotope
simulations were evaluated using the R² and VE criteria. Calibration was achieved using a Differential Evolution
(DE) genetic algorithm for parameter sampling [Price et al., 2005]. Evolutionary algorithms based on natural
selection are used where parameter populations are transformed over successive generations using arith-
metic operations to minimize an OF. However, recognizing that a global minimum for a model with large
numbers of parameters will be difficult to achieve [Beven, 2006], we use the DE algorithm as an advanced
parameter sampling technique. Therefore, initial parameter ranges were intentionally wide (Table 3) allow-
ing the search algorithm to initially explore the parameter space. The best parameter population after every
50 evaluations (parameter generation) was retained for further assessment of model parameter variability,
parameter sensitivity, and identifiability. The 5th and 95th percentiles of randomly chosen model state and
flux simulations generated from a subset of retained parameter sets were used to show the posterior vari-
ability of model parameters due to different calibration targets.
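
The objective functions named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and VE is written in the volumetric efficiency form of Criss and Winston [2008], which the text cites but does not spell out. That form assumes positive observations such as discharge.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def ln_nse(obs, sim):
    """NSE applied to log-transformed values (emphasizes low flows)."""
    return nse([math.log(o) for o in obs], [math.log(s) for s in sim])

def volumetric_efficiency(obs, sim):
    """Assumed form: 1 - sum|sim - obs| / sum(obs), for positive obs."""
    return 1.0 - sum(abs(s - o) for o, s in zip(obs, sim)) / sum(obs)

def r_squared(obs, sim):
    """Square of the Pearson correlation coefficient."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)

# Hypothetical discharge values with a constant offset of 0.5
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.5, 3.5, 4.5]
```

With a constant offset the dynamics are matched perfectly (R² of 1) while NSE and VE are penalized, which illustrates why several criteria are combined.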

4.4. Parameter Sensitivity and Identifiability


The most balanced model was sought in terms of combining different OFs from calibration data, which
simultaneously represents different aspects of hydrometric and isotopic simulations. For combination of dif-
ferent OFs, we estimated the Euclidean Distance (ED):

\[ ED = \sqrt{\sum_{n} \left(1 - OF_n\right)^2}. \tag{23} \]

ED combines the n previously mentioned OFs (e.g., NSE, VE, etc.) based on data available for representing
simulated model states s and fluxes f. The subscripts s and f indicate whether the n objective functions were
used to evaluate model states and/or fluxes, respectively. Thus, calibration using the groundwater level data
with R² evaluating the fit to observed data from the two wells results in ns = 2. If the former were combined
with the discharge NSE, this would result in n = 3 (ns = 2; nf = 1).
The ED is subsequently minimized by the evolutionary algorithm with ED = 0 indicating a perfect fit.
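
As a worked illustration of equation (23), the short Python sketch below combines a list of objective function values into ED. The function name is hypothetical. The example uses the two groundwater R² values reported for model 1(a) in Table 4 (0.53 and 0.70), which give an ED of about 0.56, close to the tabulated 0.55; the small difference presumably reflects rounding of the reported values.

```python
import math

def euclidean_distance(objective_values):
    """Distance of n objective functions from their ideal value of 1."""
    return math.sqrt(sum((1.0 - of) ** 2 for of in objective_values))

# ns = 2: R2 for GWL 1 and GWL 2 of model 1(a), taken from Table 4
ed_groundwater = euclidean_distance([0.53, 0.70])

# A perfect fit of every objective function gives ED = 0
ed_perfect = euclidean_distance([1.0, 1.0, 1.0])
```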
To assess the effect of combining different types of data and objective functions on model calibration, we
used global sensitivity analysis to rank the parameters of each tested combination of ED and explore
whether the most balanced model can be found in terms of sensitivity. We used the Latin Hypercube (LH)
one-factor-at-a-time (LH-OAT) global sensitivity analysis method of van Griensven et al. [2006] which com-
pares well with other methods [e.g., Sobol, 1993; Shin et al., 2013]. The LH random sampling is a stratified
Monte Carlo approach to reduce the need for large numbers of simulations. This was coupled with the ran-
dom OAT design from Morris [1991] where effects on model parameters are randomly assessed over the
total input range. The final effects are ranked from the most important (rank 1) to the least important, with
the lowest rank equal to the total number of parameters used.
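
The idea behind LH-OAT can be sketched as follows. This is a simplified illustration, not the implementation of van Griensven et al. [2006]: each parameter is perturbed separately from every Latin Hypercube point, the relative change in model output is averaged, and the effects are normalized so that they sum to one. The function names, the perturbation fraction, and the toy model are all assumptions made for the example.

```python
import random

def latin_hypercube(bounds, n_points, rng):
    """One random value per stratum for each parameter, shuffled per column."""
    columns = []
    for low, high in bounds:
        width = (high - low) / n_points
        column = [low + (k + rng.random()) * width for k in range(n_points)]
        rng.shuffle(column)
        columns.append(column)
    return [list(point) for point in zip(*columns)]

def lh_oat(model, bounds, n_points=20, fraction=0.05, seed=1):
    """Relative importance of each parameter (values sum to 1)."""
    rng = random.Random(seed)
    totals = [0.0] * len(bounds)
    for base in latin_hypercube(bounds, n_points, rng):
        y0 = model(base)
        for j, (low, high) in enumerate(bounds):
            step = fraction * (high - low)
            moved = list(base)
            # Step inward if the perturbation would leave the range
            moved[j] = base[j] + step if base[j] + step <= high else base[j] - step
            y1 = model(moved)
            mean_y = 0.5 * (abs(y0) + abs(y1))
            if mean_y > 0:
                totals[j] += abs(y1 - y0) / mean_y / fraction
    total = sum(totals)
    return [t / total if total > 0 else 0.0 for t in totals]

def toy_model(p):
    # Hypothetical scalar output; the third parameter has no effect
    return 10.0 + 5.0 * p[0] + 0.5 * p[1]

importance = lh_oat(toy_model, [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)])
```

Sorting `importance` in descending order gives a ranking of the kind reported in Table 6, where insensitive parameters receive a relative importance of 0.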
Temporal sensitivity was calculated from the local effect of parameters on simulated output variables in the
form of a sensitivity matrix S [Brun et al., 2001; Soetaert and Petzold, 2010]:

\[ S_{i,j} = \frac{\partial y_i}{\partial \Theta_j} \, \frac{\Delta \Theta_j}{\Delta y_i}, \tag{24} \]

where the i,jth element of the normalized sensitivity matrix relates an output variable yi at a certain time
to a model parameter Θj. The delta Δ indicates scaling of both components. The higher the absolute sensitivity
value the greater the parameter's influence on model output. For example, if two parameters show similar
sensitivity dynamics, their effects are indistinguishable, and if sensitivity is close to zero the parameter
has no effect on model outcome. Identification of such periods may be related to model structural deficits. This


Table 4. Optimal Calibrated Model Parameter Sets From Minimal Euclidean Distance (ED, n Number of Combined Objective Functions Evaluating s States and f Fluxes) of Selected Evaluation Criteria (R², NSE, lnNSE, and VE) for Hydrometric, Isotopic, and Combined Data Sets Used in Calibration^a
Columns: Calibration Data; Optimal Parameter Set [a, b, r, k, α, upSp, satSp, lowSp]; ED; Hydrometric Simulations: R² (GWL 1, 2), NSE (Q), lnNSE (Q), VE (Q); Isotopic Simulations: R² (Soil_δ²H: 1_10cm, 3_10cm, 3_50cm, Q_δ²H), VE (Soil_δ²H: 1_10cm, 3_10cm, 3_50cm, Q_δ²H)
Hydrometric
1(a) (ns = 2) [0.5, 0.14, 0.14, 0.7, 0.1] 0.55 (0.53, 0.70) 0.04 0.44
1(b) (nf = 3) [0.21, 0.009, 0.99, 0.01, 0.92] 0.77 (0.13, 0.97) 0.54 0.51 0.63
1(c) (ns = 2, nf = 3) [0.23, 0.009, 0.92, 0.03, 0.76] 1.12 (0.15, 0.69) 0.52 0.56 0.64
Isotopic
2(a) (nf = 6) [0.02, 0.06, 0.06, 0.02, 0.87, 179, 141, 2991] 0.65 (0.36, 0.93) 0.33 0.41 (0.81, 0.83, 0.48, 0.98) (0.96, 0.91, 0.93, 1.01)
2(b) (nf = 6, ns = 2) [0.009, 0.18, 0.07, 0.02, 0.59, 179, 80, 4619] 0.73 (0.25, 0.91) 0.61 0.86 (0.71, 0.83, 0.47, 0.63) (0.96, 0.91, 0.94, 0.98)
Combined
3(a) (ns = 5) [0.21, 0.009, 0.99, 0.03, 0.8, 4857, 127, 3870] 1.03 (0.13, 0.97) 0.53 0.56 0.63 (0.84, 0.99, 0.79, 0.63) (0.98, 0.95, 1.03, 0.97)
3(b) (nf = 8, ns = 5) [0.04, 0.009, 0.11, 0.03, 0.89, 184, 106, 4670] 1.62 (0.3, 0.7) 0.52 0.53 0.63 (0.73, 0.83, 0.47, 0.64) (0.95, 0.91, 0.94, 0.97)
^a Model performance is shown for all evaluated model flux and state variables obtained by calibration (in bold). The consistency criteria C in underlined italics font represent simulated model fluxes and internal states which were not explicitly calibrated using state representative data.

estimation was implemented using the R package "Calibration, Sensitivity and Monte Carlo Analysis (FME)"
by Soetaert and Petzold [2010].
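
To make equation (24) concrete, the following Python sketch approximates the normalized sensitivity matrix with forward finite differences. It is not the FME implementation. The scaling of each derivative by the parameter value and by the simulated output value is an assumption (one common choice), and the toy exponential model, step size, and function names are hypothetical.

```python
import math

def sensitivity_matrix(model, params, rel_step=1e-6):
    """Rows are output times i, columns are parameters j.

    Each entry is (dy_i / dTheta_j) * Theta_j / y_i, assuming that all
    outputs y_i and all parameters Theta_j are nonzero.
    """
    base = model(params)
    matrix = [[0.0] * len(params) for _ in base]
    for j, theta in enumerate(params):
        step = rel_step * abs(theta)
        moved = list(params)
        moved[j] = theta + step
        perturbed = model(moved)
        for i, (y0, y1) in enumerate(zip(base, perturbed)):
            matrix[i][j] = (y1 - y0) / step * theta / y0
    return matrix

def toy_model(params):
    # Hypothetical recession curve evaluated at times t = 0..4
    amplitude, rate = params
    return [amplitude * math.exp(-rate * t) for t in range(5)]

S = sensitivity_matrix(toy_model, [2.0, 0.3])
```

For this toy model the first column is 1 at every time step, while the second column equals minus the rate times t, so its influence grows during the recession. Plotting each column against time gives the kind of temporal sensitivity series described above.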
This can also help assess the identifiability of parameters; local sensitivity (the effect of one parameter on
an output variable at a time) analysis is extended to a multivariate context by assessing the colinearity (the
linear dependence) of all parameter sets:

\[ \gamma = \frac{1}{\sqrt{\min\left(EV\left[\hat{S}^{T} \hat{S}\right]\right)}}, \tag{25} \]

where the colinearity index γ is based on the eigenvalues EV of ŜᵀŜ. Ŝ contains the sensitivity matrix Si,j
(equation (24)) that corresponds to the parameters included in the model. The higher the value of γ the more
dependent are the parameters, in which case dependent parameters could potentially generate similar output
variables. Parameter sets are considered independent, and thus identifiable, for a colinearity index of
one (orthogonal); the index approaches infinity for complete dependency. Brun et al. [2001] suggest that
colinearity indices below a threshold of 10 to 15 are indicative of identifiable model parameters, whereas
Soetaert and Petzold [2010] refer to a threshold of γ = 20, below which parameters are considered identifiable.
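
For a pair of parameters, equation (25) has a closed form that is easy to check by hand. The sketch below assumes that each sensitivity column is first normalized to unit length, which is not stated explicitly in the text. Under that assumption ŜᵀŜ is a 2 x 2 matrix with ones on the diagonal and the cosine c between the columns off the diagonal, so its smallest eigenvalue is 1 - |c|. The function name is hypothetical.

```python
import math

def colinearity_pair(column_a, column_b):
    """Colinearity index of two sensitivity columns (unit-normalized)."""
    norm_a = math.sqrt(sum(v * v for v in column_a))
    norm_b = math.sqrt(sum(v * v for v in column_b))
    cosine = sum(a * b for a, b in zip(column_a, column_b)) / (norm_a * norm_b)
    smallest_eigenvalue = 1.0 - abs(cosine)
    if smallest_eigenvalue <= 0.0:
        return math.inf  # perfectly dependent parameters
    return 1.0 / math.sqrt(smallest_eigenvalue)

# Orthogonal sensitivities: fully identifiable pair
gamma_orthogonal = colinearity_pair([1.0, 0.0], [0.0, 1.0])

# Columns with a cosine of 0.99: index at the lower threshold of 10
gamma_dependent = colinearity_pair([1.0, 0.0], [0.99, math.sqrt(1.0 - 0.99 ** 2)])
```

Larger parameter sets need a general eigenvalue routine, but the pairwise case already shows how quickly the index grows as two sensitivity series become similar.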

5. Model Results
5.1. Simulations of Water and Isotope Fluxes
Table 4 summarizes the OFs resulting from the different calibration targets. Figure 5 shows the resulting
simulated discharge, storage dynamics, soil isotopes, and stream isotopes expressed as the 5th/95th per-
centiles of simulations obtained from parameter sets during the calibration (shown in Table 3). Generally,
the dynamics of discharge (high and low flows) are captured by the simulations resulting in best fit values
of 0.52 for the NSE and lnNSE and VE = 0.63 for the calibrated model. However, smaller discharge peaks
occurring specifically during and after drier periods are not well reproduced. Further, larger peak flows in
July and December 2011 are not simulated, most likely due to measurement errors (i.e., recorded precipitation
is insufficient to reflect discharge magnitudes), which probably reflect snowmelt influence in December
and under-catch in July convective rain. The model was tested quite successfully for flows against the earlier
data from 2008/2009 with improved NSE of 0.62 (Table 5), though similar limitations in simulating smaller
events were evident (Figure 6a). The model 3(a), calibrated against this earlier year, could also predict
the 2011/2012 runoff quite well with only slightly lower efficiencies for most OFs.
Model internal states and fluxes were also simulated in both the calibration and independent test periods.
Temporal dynamics of the simulated unsaturated hillslope storage Sup show reasonable similarity to the
observed groundwater level in the calibrated model (GWL 2, Figure 5b; R² = 0.7, Table 4). In contrast, the


Figure 5. Simulated 95th and 5th percentile bounds of fluxes (a) discharge, (e) soil (S3_10cm δ²H), and (f) stream δ²H and storage
components (b) Sup, (c) Slow, and (d) Ssat of model 3(b) using all available data for calibration. The observed variables are given as black lines for
comparison. Note that the measured groundwater levels (GWL 1, 2) are given in cm below surface, whereas the model simulates storage
content in mm. The model 3(a) from calibration with 2008/2009 data was tested and applied to the study period 2011/2012 for simulation
(mean of accepted parameter sets in blue).

Table 5. Model Tests From 2011/2012 Calibration to 2008/2009 Using Model 3(b) and From 2008/2009 to 2011/2012 Using Model 3(a)^a
Columns: Calibration Data; Optimal Parameter Set [a, b, r, k, α, upSp, satSp, lowSp]; ED; Hydrometric Simulations: R² (GWL 1, 2), NSE (Q), lnNSE (Q), VE (Q); Isotopic Simulations: R² (Soil_δ²H: 1_10cm, 3_10cm, 3_50cm, Q_δ²H), VE (Soil_δ²H: 1_10cm, 3_10cm, 3_50cm, Q_δ²H)
Model 3(b)
Calibrated model 2011/2012 (nf = 8, ns = 5) [0.04, 0.009, 0.11, 0.03, 0.89, 184, 106, 4670] 1.62 (0.3, 0.7) 0.52 0.53 0.63 (0.73, 0.83, 0.47, 0.64) (0.95, 0.91, 0.94, 0.97)
Test 2008/2009 0.62 0.33 0.66 0.59 0.97
Model 3(a)
Calibrated model 2008/2009 [0.13, 0.0076, 0.75, 0.016, 1.18, 2910, 332, 3628] 0.99 0.64 0.54 0.69 0.6 0.98
Test 2011/2012 (0.17, 0.99) 0.48 0.48 0.61 (0.63, 0.87, 0.67, 0.54) (0.97, 0.92, 1.03, 0.96)
^a Model performance is shown for all evaluated model flux and state variables obtained by calibration (in bold). The consistency criteria C in normal underlined italics font represent simulated model fluxes and internal states which were not explicitly calibrated using state representative data.


simulated riparian storage Ssat has less overlap with the dynamics of GWL 1 (R² = 0.3, Table 4) which probably
reflects the lack of variability of GWL 1 (Figure 5d). However, the dynamic storage changes needed to reproduce
the hydrograph are relatively small compared to the much larger (i.e., order of magnitude) mixing volumes
indicated by the calibrated parameters upSp, lowSp, and satSp. Simulations for the upper hillslope, lower
hillslope and saturation area mean storages vary over a range of around 60, 30, and 20 mm, respectively
(Figure 5). Surprisingly the tests using the 2008/2009 calibration which did not include groundwater levels as
targets show similar consistency (0.17 in Table 5 compared to 0.13 in Table 4 for GWL 1 and 0.99 in Table 5
compared to 0.97 in Table 4 for GWL 2) comparable to the original calibration.

Figure 6. Simulated 95th and 5th bounds of (a) discharge and (b) stream δ²H for the test period 2008/2009. The model 3(b) calibrated
from the study period 2011/2012 was used to simulate 2008/2009 (mean of accepted parameter sets in blue).

The stream isotope dynamics are also simulated quite well by both the calibrated and test models (Figures
5f and 6b). For the calibration, the simulation bands bracket the observed dynamics with a best fit
R² = 0.64. The R² for the test period was similar (0.59), though a period of snowmelt following rain in
February 2009, when the isotopically depleted melt exhibited marked effects of fractionation, was missed despite
the runoff response being captured [see Birkel et al., 2011a]. Also the predicted stream isotopes for
2011/2012 only crudely capture much of the small event isotope dynamics compared to the 2008/2009 test
period (Figures 5f and 6b).
Soil isotope dynamics in the upper soil horizon (S3_10cm) of the spodosol are generally reproduced in the
unsaturated hillslope reservoir with the additional mixing volume resulting in a best fit R² = 0.83 (Figure
5e). However, enrichment during summer 2011 is not captured, despite accounting for fractionation in the
model. Nevertheless, if compared to the simulations that did not include soil isotope data for calibration
(2008/2009) the more significant fractionation effect for the 2011/2012 calibration is evident. The isotope
time series of the upper riparian peat soil (S1_10cm) is also well simulated (R² = 0.73, Table 4). Less well
reproduced are the dynamics of the deeper horizon (50 cm) of S3 (R² = 0.47), probably due to the lack of
variability in the damped signal. Calibration using soil isotopes (model 2(a)) results in C = 1.01, comparable
to the best fit VE for stream isotope simulations. Similarly, calibrated and tested soil isotope OFs and
consistency statistics are comparable, supporting the calibration value of data on internal model states (Table 5).

5.2. Parameter Sensitivity


To visualize sensitivity of the most balanced model (3(b)), we evaluated local sensitivity of parameters on
simulated output (equation (24)). The temporal variation of parameter influence on simulated stream isotopes
is separated into the isotope (mixing volumes: upSp, lowSp, and satSp) and runoff generation (a, b, r, k,
and α) groups (Figure 7). Outputs exhibit sensitivity to all parameters, though effects are, as expected, time
variant and generally lower for the isotope group than the runoff parameters. The parameter satSp has the
greatest influence (highest sensitivity values) of the mixing volumes on stream isotope simulations with the
nonlinear runoff parameter α being similarly influential though its dynamics are inverted. This is
consistent with the highest sensitivity occurring during events when rapid displacement of water takes
place from the saturated zone as it expands and its storage changes. Here, direction of the sensitivity time
series follows the event characteristics in terms of more isotopically depleted (− sensitivity) or enriched (+
sensitivity) displaced water. The hillslope storage parameter upSp shows the second most pronounced
effect on simulations during recession periods as water drains into the saturation areas. Parameter a mirrors
this but is positive and the recharge parameter r follows the dynamics of upSp, but with a lag. The lowSp
parameter has lowest sensitivity, but a steady positive impact during low flow periods due to groundwater
drainage and mixing in the saturation area. The b parameter is least sensitive and the slightly more
sensitive k is the only parameter that exhibits a significant correlation (r = 0.57) to parameter a.
Global sensitivity analysis using LH-OAT estimated the overall and ranked relative importance of the model
parameters on the combined OFs (ED) using different data sets for model calibration (Table 6). The hydrometric
(1(a), 1(b), and 1(c)) and combined (3(a) and 3(b)) calibration data sets showed α and k as the most important
parameters (ranked 1 and 2). For hydrometric-based models, ranked parameters are identical. Including isotopic
data resulted in the unsaturated storage outflow coefficient a being the most sensitive parameter (rank 1).
Calibrating against soil isotope data (2(a)) resulted in the upper passive storage parameter upSp being ranked
third with the saturation passive storage parameter satSp as insensitive (relative importance of 0). Including
stream isotope data (2(b)) for calibration ranks satSp 4th and the most sensitive passive storage parameter.
Overall 2(b) exhibits more sensitive parameters. The satSp parameter was ranked third after α and k using 3(a).
However, calibration with all available data (3(b)) possibly resulted in more sensitive parameters throughout
the ranks indicating a more balanced model.

Figure 7. Visualization of time-variant (b) tracer and (c) rainfall-runoff parameter sensitivity as a function of impact on stream δ²H
output (shown as mean stream water isotope simulation against observations) for the model calibrated using all available data (3(b)).

5.3. Effect of Additional Calibration Data on Parameter Identifiability and Model Predictions
In addition to the most balanced model, use of data matching internal model state variables and fluxes
proved insightful when combined data sets and OFs were used for calibration. Table 3 shows the prior,

Table 6. Relative Importance of Parameter Sensitivity Determined for Each Calibration Data Set Against Model Performance (ED, n Number of Combined Objective Functions Evaluating s States and f Fluxes) Using Latin Hypercube Morris One-at-a-Time Global Sensitivity Analysis [van Griensven et al., 2006]
Ranked Relative Parameter Importance (1 = Most Sensitive, 8 = Least Sensitive)

Calibration Data | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Hydrometric
1(a) (ns = 2) | α (0.34) | k (0.25) | a (0.18) | b (0.13) | r (0.1) | satSp (0) | upSp (0) | lowSp (0)
1(b) (nf = 3) | α (0.49) | k (0.2) | a (0.19) | b (0.07) | r (0.05) | satSp (0) | upSp (0) | lowSp (0)
1(c) (ns = 2, nf = 3) | α (0.51) | k (0.46) | a (0.019) | b (0.018) | r (0.01) | satSp (0) | upSp (0) | lowSp (0)
Isotopic
2(a) (nf = 6) | a (0.52) | r (0.18) | upSp (0.17) | lowSp (0.07) | b (0.06) | k (0) | ripSd (0) | α (0)
2(b) (nf = 6, ns = 2) | a (0.28) | α (0.17) | k (0.14) | satSp (0.13) | b (0.13) | r (0.11) | lowSp (0.02) | upSp (0.009)
Combined
3(a) (ns = 5) | α (0.5) | k (0.43) | satSp (0.02) | b (0.02) | a (0.015) | r (0.012) | upSp (0.002) | lowSp (0.0016)
3(b) (nf = 8, ns = 5) | α (0.36) | k (0.35) | a (0.14) | r (0.06) | lowSp (0.04) | b (0.03) | upSp (0.02) | satSp (0.009)


Figure 8. Predictive power of the model applied to simulate discharge Q and stream δ²H exclusively based on calibration against data
assumed to match internal model state variables and fluxes (b) 1(a): groundwater levels, (c) 2(b): soil and stream water isotopes, and (d)
2(a): soil water isotopes only.

mean, and posterior parameter distributions for the model for all combinations, all of which can have a major
influence on the mean and 5th/95th posterior percentiles. Generally, a clear difference in parameter values
exists for the calibration using groundwater levels and soil isotope data as internal states compared to the
more common discharge and stream isotopes. Calibration based solely on discharge and/or stream isotopes
favors an unsaturated reservoir which is faster draining in terms of recharge (r > 0.9) and outflow (a > 0.2) into
the saturation area. In contrast, using soil water isotopes and groundwater levels resulted in a slower draining
upper hillslope reservoir, but a faster draining lower hillslope reservoir (b > 0.1, Table 3) despite similar
nonlinear runoff generation parameters (k and α) for all data used in calibration. Similar clear differentiation in
parameter values for traditional calibration targets using discharge and the use of internal model state
variables, particularly soil isotope data, is also evident for the isotope mixing volumes. The latter resulted in
mixing volumes for the unsaturated hillslope reservoir and saturation area store of between 100 and 350 mm. In
contrast, the model 3(a) (discharge and stream δ²H) calibration for 2008/2009 and 2011/2012 results in a
much larger unsaturated mixing volume (upSp > 1600 mm and > 4000 mm). The saturated mixing volume
(lowSp > 2500 mm) was equally determined by all data sets for calibration and test period.
The predictive power of models with different calibration objectives unsurprisingly varied (Table 4).
Discharge predictions from model 1(a) using groundwater for calibration match the overall hydrograph
fluctuations, though high flows are overpredicted and low flows underpredicted (Figure 8b). This mostly reflects
the choice of OF (R²) and also that the model simulates storage rather than groundwater level (NSE = 0.02
compared to NSE = 0.54 of model 1(b)). Calibrating with soil and stream isotope data (model 2(b)) improves
simulations (see performances and consistencies in Table 4) of overall hydrograph dynamics and recessions,
though many smaller peaks are missed (Figure 8c). This can be explained by model structure, parameter
values and the conceptualization of mixing required to produce the damped stream water outputs. If only soil
isotope data are used (2(a)), predictions of flows are poorer (as expected, NSE = 0.18 compared to
NSE = 0.52 of model 3(b)), though the stream isotopes match the observations equally well (Figure 8d,
R² = 0.62 compared to R² = 0.64 of model 3(b)).
Multivariate colinearity indices (equation (25)) were used to assess overall model identifiability and quantify
the interaction of the parameter sets retained by the DE algorithm for selected models (Figure 9). When
isotopic data are included (models 2(b), 3(a), and 3(b)), the sensitivity matrices for stream isotopes (eight
parameters) can be compared to the evaluation of the five parameter runoff model 1(b). The incorporation of
groundwater and soil water isotope data clearly increases parameter identifiability with only models 2(b) and
3(b) having a sufficiently low colinearity index (γ < 10) to claim identifiability. Even the model calibrated
using discharge and stream isotope time series (3(a)) has indices that exceed the limits of identifiability
suggested by Brun et al. [2001] and Soetaert and Petzold [2010] as does the five parameter model calibrated
against discharge alone (1(b)).

Figure 9. Multivariate parameter identifiability expressed as colinearity shown for model calibrations using different data sets
(1(a): groundwater levels, 3(a): discharge and stream isotopes, 1(c): discharge and groundwater levels, 2(b): soil and stream isotopes,
and 3(b) using all available objectives). Note that the multiple realizations per parameter group are the result of correlating each
possible parameter combination. Also, models 1(b) and 2(a) are not visualized.

5.4. Stream Water Age Derivates


Models with different tracer data-aided calibration were used to track the age of water fluxes to estimate
the distribution of stream water ages (equation (22)). Figure 10 shows the cumulative distribution function
(CDF) of stream water ages calculated from the simulated time series (inset box) from the tracer-aided cali-
brated models and the model 3(a) for 2008/2009. The models produced similar mean stream water ages of
about 1 year; however, the CDFs vary for different calibrations in terms of their extremes, consistent with
the distinction of quick and much slower flow pathways. Calibration with 2(a) shows the greatest proportion
of young water reaching the stream via quick near-surface runoff and solute transport mechanisms, but
also a longer tailing >80% of the tracer recovery (slower runoff generation and solute delivery to the
stream). In contrast, calibration with 2(b) shows a smaller proportion of young water reaching the stream and
the greatest proportion of old water with only ca. 90% of tracer recovery after ca. 800 days. Calibrations using
3(a) and 3(b) give similar CDFs to model 2(b) though with reduced tailing of older water. Even the youngest
stream water ages calculated by the models are several weeks old (>25 days) indicating marked mixing with
stored preevent waters. The 2008/2009 year CDF produced by model 3(a) is broadly similar indicating limited
variability of mixing and flow pathways between the two different study years, though this was a year with
more evenly distributed precipitation and the proportion of younger water is similar to model 2(a).
Incorporation of soil isotope data for model calibrations indicated much older water contributing to
streamflow, especially during drier periods for calibrations 2(a) and 2(b) (see inset), periods when the more
balanced model (3(b)) was simulating cessation of hillslope outflows.

Figure 10. The impact of incorporating additional data for model calibration (3(a): discharge and stream isotopes, 2(a): soil water
isotopes, 2(b): soil and stream isotopes, and 3(b) using all available data) on mean stream water age distributions (in days on log
scale) of water fluxes calculated by the model. Note that the inset box shows the mean stream water age time series from the
retained parameter sets used to derive the empirical cumulative distribution functions (eCDF). The model calibrated using data set
3(a) for the test period 2008/2009 is also shown as an eCDF for comparison.
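
An empirical cumulative distribution function of the kind shown in Figure 10 can be derived from a simulated age series as sketched below. This is an unweighted illustration: the age tracking of equation (22) is not reproduced here, the daily ages are made-up values, and the function names are hypothetical.

```python
def ecdf(ages):
    """Sorted ages paired with their cumulative fraction."""
    ordered = sorted(ages)
    n = len(ordered)
    return [(age, (k + 1) / n) for k, age in enumerate(ordered)]

def fraction_younger_than(ages, threshold_days):
    """Share of time steps with stream water age at or below a threshold."""
    return sum(1 for age in ages if age <= threshold_days) / len(ages)

# Hypothetical daily mean stream water ages (days)
simulated_ages = [30.0, 400.0, 100.0, 800.0]
curve = ecdf(simulated_ages)
young_share = fraction_younger_than(simulated_ages, 100.0)
```

Comparing such curves for different calibration data sets highlights differences in the young and old tails even when the mean ages are similar.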

6. Discussion
6.1. How Well Does a Simple, Parsimonious Model Characterize Catchment Function?
The eight parameter model—with five runoff parameters and three mixing volumes—represents a parsimo-
nious approach for the BB that can simulate catchment storage dynamics and fluxes of water and tracers.
The calibration to all available hydrometric and isotopic data produced the most balanced model in terms
of multicriteria evaluation which was tested reasonably well against independent flow and tracer time
series. The model evolved from previous work by Birkel et al. [2011a], but we modified this to utilize new
additional soil and groundwater data [Tetzlaff et al., 2014] while maintaining a parsimonious model. The
resulting model comprised an unsaturated hillslope reservoir filled after a simple interception and transpira-
tion loss routine overlying a hillslope groundwater reservoir. All hillslope fluxes drain into the store of the
saturation area from which water and solutes are mixed and nonlinearly transferred to the stream. We
explicitly accounted for evaporative fractionation, addressing the potential nonconservative behavior of isotope
tracers, though the effect of snow fractionation was not accommodated to keep parameter numbers low
[Birkel et al., 2011a]. In contrast to Fenicia et al. [2010] and Hrachowitz et al. [2013], the basic mixing assump-
tion used is that of complete mixing. However, we implemented preferential recharge and time-varying
mixing volume parameters (equations (16), (17), and (20)) relaxing the assumption of complete mixing con-
sistent with empirical data while restricting additional parameters to the necessary mixing volumes.
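The role of an additional mixing volume under complete mixing can be illustrated with a minimal sketch for a single store. It assumes an explicit daily step, a constant mixing volume (Sp in the Notation), and outflow leaving at the mixed signature of the store; evaporative fractionation and preferential recharge are ignored. The function name, argument names, and numbers are hypothetical and do not reproduce the model code used in this study.

```python
def mix_step(c_store, s_active, s_passive, inflows, outflow):
    """Advance the isotope signature of one completely mixed store by one step.

    c_store   : current signature of the store (per mil)
    s_active  : active (dynamic) storage before the step (mm)
    s_passive : additional mixing volume, assumed constant here (mm)
    inflows   : list of (flux_mm, signature_per_mil) pairs entering the store
    outflow   : total flux leaving the store during the step (mm)
    Returns (new_signature, new_active_storage).
    """
    # Tracer mass is carried by the active storage plus the mixing volume.
    volume = s_active + s_passive
    mass = c_store * volume
    for flux, signature in inflows:
        mass += flux * signature
        volume += flux
    # Complete mixing: outflow leaves at the mixed signature, so removing it
    # changes the active storage but not the signature of the store.
    c_mixed = mass / volume
    new_active = s_active + sum(flux for flux, _ in inflows) - outflow
    return c_mixed, new_active


# Same event with and without a 450 mm mixing volume (illustrative numbers).
damped, _ = mix_step(-8.0, 50.0, 450.0, [(10.0, -12.0)], 5.0)
undamped, _ = mix_step(-8.0, 50.0, 0.0, [(10.0, -12.0)], 5.0)
print(round(damped, 3), round(undamped, 3))
```

In this toy case the same 10 mm input shifts the store signature far less when the mixing volume is present, which is the damping effect that the mixing volume parameters are meant to capture.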
The model uses a power-law runoff generation mechanism (equation (15)) similar to Kirchner [2009] but
with dynamically connected separate landscape units. While capturing the general hydrological dynamics,
this appears too crude a conceptualization of the spatially complex surface flow patterns that will be
affected by microtopography [see, e.g., Frei et al., 2010]. Consequently, many of the small runoff events are
not captured well in the model simulations resulting in relatively low NSE values (0.52) compared to simulations
(NSE  0.6) for other study periods [Birkel et al., 2010, 2011a]. This becomes apparent for the calibration
and the 2008/2009 test period resulting in best fit NSEs = 0.64 and 0.62, respectively (Table 5).
Although these values are low compared to many modeling studies, they are not uncommon in montane
catchments with input uncertainties [e.g., Capell et al., 2013]. However, if calibration was simply based on
discharge performance using a single criterion in the absence of additional calibration targets, this could be a
basis for model rejection [Beven, 2012b]. Therefore, accounting for the spatial heterogeneity of the satura-
tion area extent in future modeling will be an important research priority. Nevertheless, the current model
represents progress as a trade-off in terms of gaining additional information from different calibration data
and OFs. Thus, the runoff performance is compensated by quite good simulations of isotope dynamics in
streamflow (best fit R² = 0.64), soil water and storage dynamics in the main model storage units [e.g., Seibert
and McDonnell, 2002]. The damping of stream isotopes reflects the mixing of new water with previously
stored water during storm event conditions as overland flow generated from the saturation areas largely
displaces preevent water into the stream (Figure 2c, see range and CV in Table 1).
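For readers less familiar with the NSE values quoted above, the Nash-Sutcliffe efficiency can be computed as in the following minimal sketch. The function name and the example series are illustrative only and are not data from this study.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the residual sum of
    squares to the sum of squares of observations about their mean."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observations have zero variance")
    return 1.0 - ss_res / ss_tot


obs = [1.0, 2.0, 3.0, 4.0]
print(nash_sutcliffe(obs, obs))          # perfect simulation
print(nash_sutcliffe(obs, [2.5] * 4))    # simulating the observed mean
```

A value of 1 indicates a perfect fit and a value of 0 indicates a simulation no better than the mean of the observations, so missed small events lower the score without the general dynamics being wrong.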
Albeit specific to the model applied, sensitivity analysis revealed significant parameter sensitivity, particularly
associated with mixing (satSp) and runoff generation (a, k) in the saturation area (Table 6). While, obviously,
the mixing volume parameters are not used in discharge simulations, runoff generation parameters have a
marked effect on stream isotope simulations. Therefore, separate calibration of runoff generation and isotope
mixing parameters may ignore important integrated effects on tracer transport [Fenicia et al., 2008]. If the
time variance of sensitivity is assessed, as expected, highest sensitivity mostly coincides with periods of higher
runoff and increased displacement of water. Due to the limitations of local sensitivity analysis [e.g., Shin et al.,
2013], we estimated global sensitivity with the LH-OAT method of van Griensven et al. [2006]. This showed the
impact of different data sets and OF combinations, but demonstrated that when all available data were used
for calibration the most balanced model had parameters that all exhibited sensitivity. Despite the limitations
of simple and parsimonious models in terms of maximum performance, this study has shown that they have
significant potential for developing an internally consistent conceptualization of catchment functioning.
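The general idea behind a Latin hypercube one-factor-at-a-time (LH-OAT) analysis can be sketched as follows. This is a simplified illustration assuming a scalar model output (for example an objective function value), strictly positive parameter bounds, and a relative partial effect averaged over all sampled points. The function name, default settings, and toy model are hypothetical and this is not the implementation used for the results reported here.

```python
import numpy as np


def lh_oat(model, lower, upper, n_points=20, fraction=0.05, seed=0):
    """Mean relative partial effect (%) of each parameter on a scalar output."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_par = lower.size

    # Latin hypercube: each parameter range is cut into n_points strata and
    # every stratum is sampled exactly once, in shuffled order.
    strata = np.array([rng.permutation(n_points) for _ in range(n_par)]).T
    unit = (strata + rng.random((n_points, n_par))) / n_points
    points = lower + unit * (upper - lower)

    effects = np.zeros((n_points, n_par))
    for j, base in enumerate(points):
        m_base = model(base)
        for i in range(n_par):
            # One-factor-at-a-time: perturb only parameter i by a small fraction.
            perturbed = base.copy()
            perturbed[i] = base[i] * (1.0 + fraction)
            m_pert = model(perturbed)
            mean_out = 0.5 * (m_pert + m_base)
            if mean_out != 0.0:
                effects[j, i] = abs(100.0 * (m_pert - m_base) / mean_out / fraction)
    return effects.mean(axis=0)


# Toy model that depends on the first parameter only.
sens = lh_oat(lambda p: p[0] ** 2, lower=[1.0, 1.0], upper=[10.0, 10.0])
print(sens)
```

In the toy case the second parameter has no effect on the output and receives a sensitivity of zero, which is the kind of insensitive parameter that a balanced calibration should avoid.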

6.2. How Much Data are Needed to Calibrate a Conceptual Water and Tracer Transport Model?
It has been argued that maximum experimental data are always advantageous to characterize catchment
systems, though there is a pragmatic need to direct experimental efforts toward collection of data that are
most informative for model development and evaluation [McGuire et al., 2007; Soulsby et al., 2008]. We
incorporated soil isotope and groundwater level data for calibration assuming these allow different aspects
of model simulations to be assessed [Seibert and McDonnell, 2002]. Given that almost inevitable data and
structural errors dictate that no optimal solution can be found for higher-parameter hydrological models,
the most balanced model representation is a more realistic goal [Gupta et al., 2008; Hrachowitz et al., 2013].
We assumed that using all combined data and OFs for calibration would result in the most balanced model
able to represent all incorporated processes adequately; this was corroborated by the independent test
period and supported by the LH-OAT sensitivity analysis.
Nevertheless, the use of different types of data provided different insights for model calibration and evalua-
tion. For simulating groundwater levels, calibration was restricted to evaluating the dynamics using the
coefficient of determination (R²) [Fenicia et al., 2008] as the model simulates storage content and the
groundwater levels represent highly variable point measurements (see section 4.2). Even so, calibration
against groundwater levels revealed important insights into the internal dynamics of model states in
terms of simulating the unsaturated storage connectivity to the saturation area. This resulted in more
dynamic representation of hillslope connectivity during wetter phases as illustrated by Tetzlaff et al. [2014],
and increased groundwater contributions to streamflow and runoff generation (higher values of parameters a and b) compared
to calibration against flow only. Incorporation of the soil isotope data for the unsaturated hillslopes (spodosols
at S3) and saturation area (histosols at S1) resulted in significantly increased parameter identifiability,
indicated by low collinearity indices [Soetaert and Petzoldt, 2010]. This also points toward each parameter serving
a unique and identifiable purpose in simulating discharge and stream isotopes [Brun et al., 2001]. This
can be corroborated by assessing the predictive power in simulating discharge and stream isotopes based
only on additional data assumed to match internal model state variables and fluxes. For example, general
flow dynamics were captured using soil isotope data for calibration alone and stream isotope simulations
were almost as good as models with all data used for calibration. Soil isotopes have potential—at least in
similar catchments—to be routinely used for model assessment as advocated by Clark et al. [2011].
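The collinearity index referred to above can be illustrated with a short sketch following the definition of Brun et al. [2001]: the columns of a sensitivity matrix are scaled to unit length and the index is the inverse square root of the smallest eigenvalue of the resulting cross-product matrix. The function name and the example matrices are hypothetical, and the sensitivity matrix itself (rows for observations, columns for parameters) is assumed to be available from a prior sensitivity analysis.

```python
import numpy as np


def collinearity_index(sensitivity):
    """Collinearity index of a parameter subset.

    sensitivity : 2-D array, one row per observation and one column per
                  parameter, holding the sensitivity of the model output
                  to each parameter.
    """
    s = np.asarray(sensitivity, dtype=float)
    norms = np.linalg.norm(s, axis=0)
    if np.any(norms == 0.0):
        raise ValueError("a parameter with zero sensitivity cannot be assessed")
    s_tilde = s / norms  # scale every column to unit length
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(s_tilde.T @ s_tilde)
    smallest = max(eigenvalues[0], 0.0)  # guard against tiny negative round-off
    return float("inf") if smallest == 0.0 else 1.0 / np.sqrt(smallest)


print(collinearity_index([[1, 0], [0, 1], [0, 0]]))  # independent effects
print(collinearity_index([[1, 2], [2, 4], [3, 6]]))  # linearly dependent effects
```

Parameters with independent effects on the output give an index of 1, whereas parameters whose effects can compensate for each other give a very large index, which is why low indices are read as a sign of identifiability.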

6.3. Is Stream Water Age a Meaningful Model Diagnostic?


Tracking water ages and tracer transit times has proved a useful surrogate for catchment model evaluation
[see, e.g., Vache and McDonnell, 2006; McMillan et al., 2012; Hrachowitz et al., 2013]. Dunn et al. [2010]
showed structural impacts on age tracking caused by different process conceptualizations. In contrast, the
tracking of water flux ages through our models with identical parameterization resulted in differences in
stream water age distributions in terms of the short and long-term extremes, despite similar mean values
(Figure 10). The differences for the extremes purely reflect the effect of calibration using different data sets
and OFs. The main difference is that using only streamflow and stream isotope data (3(a)) compromises
long tailing caused by slower flow pathways and larger mixing volumes compared to calibration including
soil isotope data. This results from higher new water contributions to the stream from the saturation area
despite a much higher mixing volume parameter (upSp > 4000 mm) in the unsaturated hillslope storage.
This relatively high mixing volume contributing old water to the saturation area is needed to successfully
simulate the observed tracer damping in the stream. However, the hillslope storage remains inactive for
longer periods of time (the storage is negative and does not generate fluxes), which precludes contributions
of older water to the saturation area during drier periods. In contrast, if calibration is based on soil isotope
data, the mixing parameters are smaller due to larger fluctuation of the upper horizon isotope signatures
compared to the damped stream response. This results in younger waters (a few days old) quickly contribut-
ing to streamflow during events, but also older waters during drier periods. The most balanced model inte-
grates these extreme effects of different calibration targets in the CDF produced. The age estimates are
affected by initializing the storage concentrations and antecedent wetness in the catchment [Hrachowitz
et al., 2013], but we found the 2.5 year warm-up period was sufficient to minimize such effects. The forms
of the CDFs of the TTDs were broadly similar to the findings of McMillan et al. [2012]. As with their conclu-
sions, we find here that the average stream water ages produced by the daily modeling (1 year) are less
than those estimated from weekly isotope samples (1.9 years) using stationary lumped parameter TTDs
[Hrachowitz et al., 2010b], presumably due to better representation of storm runoff periods and time-variant
antecedent conditions. Nevertheless, the estimates lie within the 90% uncertainty bounds (0.9–2.9 years)
of these lumped models and have a similar CDF [Hrachowitz et al., 2010b]. Incorporating the calibrated
model for 2008/2009 into the analysis as a test resulted in a similar CDF compared to the 2011/2012 calibration.
This indicated only modest time variability of flow pathways and mixing between the two study periods,
causing broadly similar simulated transit times in contrast to, e.g., Heidbüchel et al. [2012]. The flux
tracking shows how high-resolution tracer data from different water sources, if sampled for long enough,
provide a useful tool in model diagnostics through assessment of stream water age.
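The principle of tracking water ages through a store can be sketched for the simplest possible case: a single completely mixed store with daily time steps, in which each day's inflow is labeled with its entry time and outflow samples all ages in proportion to their stored volume. The function name, the treatment of the initial storage as water of age zero, and the constant fluxes in the example are assumptions for illustration only, not the multistore flux tracking used in this study.

```python
import numpy as np


def track_outflow_ages(inflow, outflow, initial_storage):
    """Daily mean age of outflow and the age CDF of the final outflow."""
    n_steps = len(inflow)
    ages = np.arange(n_steps + 1)            # age classes in days
    volume_by_age = np.zeros(n_steps + 1)    # stored volume per age class (mm)
    volume_by_age[0] = initial_storage       # assumption: initial water has age 0
    mean_age = np.zeros(n_steps)
    age_fractions = np.zeros(n_steps + 1)
    for t in range(n_steps):
        # All stored water ages by one day; today's inflow enters with age 0.
        volume_by_age = np.roll(volume_by_age, 1)
        volume_by_age[0] = inflow[t]
        total = volume_by_age.sum()
        removed = min(outflow[t], total)
        if total > 0.0 and removed > 0.0:
            # Complete mixing: every age class loses the same fraction.
            leaving = volume_by_age * (removed / total)
            volume_by_age = volume_by_age - leaving
            age_fractions = leaving / removed
            mean_age[t] = np.sum(ages * age_fractions)
    return mean_age, np.cumsum(age_fractions)


mean_age, cdf = track_outflow_ages([10.0] * 400, [10.0] * 400, 90.0)
print(round(mean_age[0], 3), round(mean_age[-1], 3))
```

With these illustrative fluxes the mean outflow age starts at 0.9 days and converges to 9 days once the arbitrarily initialized water has been flushed, a simple demonstration of why age estimates depend on initialization and require a warm-up period.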

7. Conclusions
We presented a simple, lumped conceptual rainfall-runoff model developed from a data-rich empirical
understanding of an upland catchment. The model explicitly calculates water and tracer storage and trans-
port for the major landscape units (hillslope—riparian saturation area). Detailed experimental data (ground-
water level and soil isotopes for the most important soils) were incorporated into the modeling and
assumed to reflect internal state variables and fluxes. These were then used to calibrate and evaluate the
model in terms of performance, predictive power, parameter sensitivity, and identifiability. We show that
incorporation of such data improves identifiability. Consequently, the coupled flow-tracer model with eight
parameters exhibited better identifiability than the five parameter flow model. In this context, soil isotope
data resulted in the highest predictive power if the model is used for flow and, particularly, stream isotope
simulations. However, the spatial heterogeneity of the saturation areas in the model is currently inad-
equately represented and requires better conceptualization for improved runoff simulations of small events.
Furthermore, if the model calibrated against different data sets and OFs is used to derive stream water age
distributions, these show comparable mean values, but vary for the extremes. This suggests that care has to
be taken if such information is used to inform water quality assessments with emphasis on short and
longer-term pollutant transport. Our results imply that stream water age distributions depend not only on
model performance, but also on model consistency as indicated by the model’s ability to represent the
information content of data used in calibration. Thus, tracking water ages in conceptual models can be a
strong test for internal consistency of a model structure. We conclude that incorporating tracer data into
rainfall-runoff models offers guidance to fieldwork efforts and further insight into using simple models as a
learning tool about catchment behavior.

Notation
P         catchment precipitation (mm d⁻¹).
ET        catchment evapotranspiration (mm d⁻¹).
ETact     actual evapotranspiration (mm d⁻¹).
I         interception (mm d⁻¹).
S         total storage (active + mixing volume) (mm).
Sn        internal model storage content (active storage) (mm).
Sp        additional mixing volume parameters (mm).
Q         simulated discharge at catchment outlet (mm d⁻¹).
fSAT      dynamic saturation area extent (% catchment area).
fHill     dynamic hillslope storage extent (% catchment area).
AW        antecedent wetness (mm).
Smax      maximum soil moisture storage (mm).
cn        isotope signature or tracer concentration of n storage components (‰).
Ij        j storage inflows (e.g., Pup, Psat, Qup, Re, Qlow) (mm d⁻¹).
Ok        k outflow or loss components (e.g., ETup, ETsat, Qup, Re, Qlow, Qs) (mm d⁻¹).
Rn        net radiation (W m⁻²).
rc, ra    canopy and aerodynamic roughness coefficients (m s⁻¹).
Lv        latent heat loss (2453 MJ m⁻³).
cp        specific heat capacity of air (J kg⁻¹ K⁻¹).
ρa        dry air density (kg m⁻³).
γ         psychrometric constant (γ ≈ 66 Pa K⁻¹).
δe        vapor pressure deficit (Pa).
Δ         rate of change of saturation specific humidity with air temperature (Pa K⁻¹).
h         relative humidity (0–1).
T         air temperature (°C).
εE, εK    equilibrium and kinetic fractionation coefficients (-).
α         equilibrium water vapor isotope fractionation (-).
cA        isotopic composition of ambient moisture (‰).
MV        available mixing volume according to saturation area extent (mm).
pF        probability density function of water ages (-).
t         time (day).
ti        time of exit from the system (day).
tj        time of entry to the system (day).

Acknowledgments
The authors greatly appreciate help in the field by Konrad Piegat, Rene Capell, Jonathan Dick, and Josie Geris.
Part of the hydrometeorological data was provided by Marine Scotland, Freshwater Laboratory, and in
particular, Iain Malcolm and the Scottish Environmental Protection Agency. We are grateful to the careful
and constructive comments of Hoshin Gupta and other anonymous reviewers that helped to improve an earlier
version of this manuscript. The data used in this study are available from the authors upon request.

References
Barnes, C. J., and M. Bonell (1996), Application of unit hydrograph techniques to solute transport in catchments, Hydrol. Processes, 10,
793–802.
Berman, E. S. F., M. Gupta, C. Gabrielli, T. Garland, and J. J. McDonnell (2009), High-frequency field-deployable isotope analyzer for hydrological
applications, Water Resour. Res., 45, W10201, doi:10.1029/2009WR008265.
Beven, K. (2006), A manifesto for the equifinality thesis, J. Hydrol., 320, 18–36.
Beven, K. (2012a), Rainfall-Runoff Modelling: The Primer, 2nd ed., Wiley-Blackwell, Chichester, U. K.
Beven, K. (2012b), Causal models as multiple working hypotheses about environmental processes, C. R. Geosci., 344, 77–88,
doi:10.1016/j.crte.2012.01.005.
Birkel, C., D. Tetzlaff, S. M. Dunn, and C. Soulsby (2010), Towards simple dynamic process conceptualization in rainfall-runoff models using
multi-criteria calibration and tracers in temperate, upland catchments, Hydrol. Processes, 24, 260–275.
Birkel, C., D. Tetzlaff, S. M. Dunn, and C. Soulsby (2011a), Using time domain and geographic source tracers to conceptualize streamflow
generation processes in lumped rainfall-runoff models, Water Resour. Res., 47, W02515, doi:10.1029/2010WR009547.
Birkel, C., C. Soulsby, and D. Tetzlaff (2011b), Modelling catchment-scale water storage dynamics: Reconciling dynamic storage with tracer-
inferred passive storage, Hydrol. Processes, 25, 3924–3936, doi:10.1002/hyp.8201.
Boorman, D. B., J. M. Hollis, and A. Lilly (1995), Hydrology of soil types: A hydrological classification of the soils of the United Kingdom, Inst.
Hydrol. Rep. 126, Inst. of Hydrol., Wallingford, U. K.
Box, G. E. P. (1976), Science and statistics, J. Am. Stat. Assoc., 71(356), 791–799.
Brun, R., P. Reichert, and H. Kunsch (2001), Practical identifiability analysis of large environmental simulation models, Water Resour. Res.,
37(4), 1015–1030.
Capell, R., D. Tetzlaff, and C. Soulsby (2013), Will catchment characteristics moderate the projected effects of climate change on flow
regimes in the Scottish Highlands?, Hydrol. Processes, 27, 687–699, doi:10.1002/hyp.9626.
Clark, M. P., H. K. McMillan, D. B. G. Collins, D. Kavetski, and R. A. Woods (2011), Hydrological field data from a modeller’s perspective. Part
2: Process-based evaluation of model hypotheses, Hydrol. Processes, 25, 523–543.
Criss, R. E., and W. E. Winston (2008), Do Nash values have value? Discussion and alternate proposals, Hydrol. Processes, 22, 2723–2725, doi:
10.1002/hyp.7072.
Davies, J., and K. Beven (2012), Comparison of a multiple interacting pathways model with a classical kinematic wave subsurface flow solu-
tion, Hydrol. Sci. J., 57(2), 203–216.
Dunn, S. M., and R. MacKay (1995), Spatial variation in evapotranspiration and the influence of land use on catchment hydrology, J. Hydrol.,
171, 49–73.
Dunn, S. M., C. Birkel, D. Tetzlaff, and C. Soulsby (2010), Transit time distributions of a conceptual model: Their characteristics and sensitiv-
ities, Hydrol. Processes, 24, 1719–1729, doi:10.1002/hyp.7560.
Dunn, S. M., W. G. Darling, C. Birkel, and J. R. Bacon (2012), The role of groundwater characteristics in catchment recovery from nitrate pol-
lution, Hydrol. Res., 43(5), 560–575.
Fenicia, F., J. J. McDonnell, and H. Savenije (2008), Learning from model improvement: On the contribution of complementary data to pro-
cess understanding, Water Resour. Res., 44, W06419, doi:10.1029/2007WR006386.
Fenicia, F., S. Wrede, D. Kavetski, L. Pfister, L. Hoffmann, H. H. G. Savenije, and J. J. McDonnell (2010), Assessing the impact of mixing
assumptions on the estimation of streamwater mean residence time, Hydrol. Processes, 24(12), 1730–1741, doi:10.1002/hyp.7595.
Freer, J., H. McMillan, J. J. McDonnell, and K. J. Beven (2004), Constraining dynamic TOPMODEL responses for imprecise water table infor-
mation using fuzzy rule based performance measures, J. Hydrol., 291(3–4), 254–277.
Frei, S., G. Lischeid, and J. H. Fleckenstein (2010), Effects of micro-topography on surface–subsurface exchange and runoff generation in a
virtual riparian wetland—A modeling study, Adv. Water Resour., 336, 1388–1401, doi:10.1016/j.advwatres.2010.07.006.
Gibson, J. J., and T. W. D. Edwards (2002), Regional water balance trends and evaporation transpiration partitioning from stable isotope sur-
vey of lakes in northern Canada, Global Biogeochem. Cycles, 16(2), 1026, doi:10.1029/2001GB001839.
Gupta, H. V., S. Sorooshian, and P. O. Yapo (1998), Towards improved calibration of hydrologic models: Multiple and non-commensurable
measures of information, Water Resour. Res., 34(4), 751–763.
Gupta, H. V., T. Wagener, and Y. Liu (2008), Reconciling theory with observations: Elements of a diagnostic approach to model evaluation,
Hydrol. Processes, 22, 3802–3813, doi:10.1002/hyp.6989.
Gupta, H. V., M. P. Clark, J. A. Vrugt, G. Abramowitz, and M. Ye (2012), Towards a comprehensive assessment of model structural adequacy,
Water Resour. Res., 48, W08301, doi:10.1029/2011WR011044.
Heidbüchel, I., P. A. Troch, S. W. Lyon, and M. Weiler (2012), The master transit time distribution of variable flow systems, Water Resour. Res.,
48, W06520, doi:10.1029/2011WR011293.
Hrachowitz, M., C. Soulsby, D. Tetzlaff, I. A. Malcolm, and G. Schoups (2010a), Gamma distribution models for transit time estimation in
catchments: Physical interpretation of parameters and implications for time-variant transit time assessment, Water Resour. Res., 46,
W10536, doi:10.1029/2010WR009148.
Hrachowitz, M., C. Soulsby, D. Tetzlaff, and M. Speed (2010b), Catchment transit times and landscape controls: Does scale matter?, Hydrol.
Processes, 24, 117–125, doi:10.1002/hyp.7510.
Hrachowitz, M., H. Savenije, T. A. Bogaard, D. Tetzlaff, and C. Soulsby (2013), What can flux tracking teach us about water age distribution
patterns and their temporal dynamics?, Hydrol. Earth Syst. Sci., 17, 533–564, doi:10.5194/hess-17-533-2013.
Katsuyama, M., N. Kabeya, and N. Ohte (2009), Elucidation of the relationship between geographic and time sources of stream water using
a tracer approach in a headwater catchment, Water Resour. Res., 45, W06414, doi:10.1029/2008WR007458.
Kirchner, J. W. (2003), A double paradox in catchment hydrology and geochemistry, Hydrol. Processes, 17, 871–874.
Kirchner, J. W. (2006), Getting the right answers for the right reasons: Linking measurements, analyses, and models to advance the science
of hydrology, Water Resour. Res., 42, W03S04, doi:10.1029/2005WR004362.
Kirchner, J. W. (2009), Catchments as simple dynamical systems: Catchment characterization, rainfall-runoff modeling, and doing hydrology
backward, Water Resour. Res., 45, W02429, doi:10.1029/2008WR006912.
Lamb, R., K. J. Beven, and S. Myrabo (1998), Use of spatially distributed water table observations to constrain uncertainty in a rainfall-runoff
model, Adv. Water Resour., 22, 305–317.
McDonnell, J. J., et al. (2010), How old is the water? Open questions in catchment transit time conceptualization, modelling and analysis,
Hydrol. Processes, 24, 1745–1754, doi:10.1002/hyp.7796.
McGuire, K. J., M. Weiler, and J. J. McDonnell (2007), Integrating tracer experiments with modelling to assess runoff processes and water
transit times, Adv. Water Resour., 30(4), 824–837.
McMillan, H., D. Tetzlaff, M. Clark, and C. Soulsby (2012), Do time-variable tracers aid the evaluation of hydrological model structure? A mul-
timodel approach, Water Resour. Res., 48, W05501, doi:10.1029/2011WR011688.
Morris, M. D. (1991), Factorial sampling plans for preliminary computational experiments, Technometrics, 33(2), 161–174, doi:10.1080/
00401706.1991.10484804.
Mroczkowski, M., G. P. Raper, and G. Kuczera (1997), The quest for more powerful validation of conceptual catchment models, Water
Resour. Res., 33(10), 2325–2335.
Page, T., K. J. Beven, J. Freer, and C. Neal (2007), Modelling the chloride signal at Plynlimon, Wales, using a modified dynamic TOPMODEL
incorporating conservative chemical mixing (with uncertainty), Hydrol. Processes, 21(3), 292–307, doi:10.1002/hyp.6186.
Price, K., S. T. Purucker, S. R. Kraemer, and J. E. Babendreier (2012), Tradeoffs among watershed model calibration targets for parameter
estimation, Water Resour. Res., 48, W10542, doi:10.1029/2012WR012005.
Price, K. V., R. M. Storn, and J. A. Lampinen (2005), Differential Evolution—A Practical Approach to Global Optimization, Springer, Berlin.
R Core Team (2013), R: A Language and Environment for Statistical Computing, R Found. for Stat. Comput., Vienna. [Available at
http://www.R-project.org/.]
Rinaldo, A., K. J. Beven, E. Bertuzzo, L. Nicotina, J. Davies, A. Fiori, D. Russo, and G. Botter (2011), Catchment travel time distributions and
water flow in soils, Water Resour. Res., 47, W07537, doi:10.1029/2011WR010478.
Sayama, T., and J. J. McDonnell (2009), A new time-space accounting scheme to predict stream water residence time and hydrograph
source components at the watershed scale, Water Resour. Res., 45, W07401, doi:10.1029/2008WR007549.
Seibert, J., and J. J. McDonnell (2002), On the dialog between experimentalist and modeler in catchment hydrology: Use of soft data for
multicriteria model calibration, Water Resour. Res., 38(11), 1241, doi:10.1029/2001WR000978.
Seibert, J., T. Grabs, S. Koehler, H. Laudon, M. Winterdahl, and K. Bishop (2009), Linking soil- and stream-water chemistry based on riparian
flow-concentration integration model, Hydrol. Earth Syst. Sci., 13, 2287–2297.
Shin, M.-J., J. H. A. Guillaume, B. F. W. Croke, and A. J. Jakeman (2013), Addressing ten questions about conceptual rainfall-runoff models
with global sensitivity analyses in R, J. Hydrol., 503, 135–152, doi:10.1016/j.jhydrol.2013.08.047.
Sobol, I. M. (1993), Sensitivity analysis for nonlinear mathematical models, Math. Model. Comput. Exp., 1(4), 407–414.
Soetaert, K., and T. Petzoldt (2010), Inverse modelling, sensitivity and Monte Carlo analysis in R using package FME, J. Stat. Software, 33(3),
1–28. [Available at http://www.jstatsoft.org/v33/i03/.]
Soulsby, C., M. Chen, R. C. Ferrier, A. Jenkins, and R. Harriman (1998), Hydrogeochemistry of shallow groundwater in a Scottish catchment,
Hydrol. Processes, 12, 1111–1127.
Soulsby, C., D. Tetzlaff, N. van den Bedem, I. A. Malcolm, P. J. Bacon, and A. F. Youngson (2007), Inferring groundwater influences on surface
water in montane catchments from hydrochemical surveys of springs and streamwaters, J. Hydrol., 333, 199–213.
Soulsby, C., C. Neal, H. Laudon, D. A. Burns, P. Merot, M. Bonell, S. M. Dunn, and D. Tetzlaff (2008), Catchment data for process conceptuali-
zation: Simply not enough?, Hydrol. Processes, 22, 2057–2061.
Tetzlaff, D., S. Uhlenbrook, S. Eppert, and C. Soulsby (2008), Does the incorporation of process conceptualization and tracer data improve
the structure and performance of a simple rainfall-runoff model in a Scottish mesoscale catchment?, Hydrol. Processes, 22(14), 2461–
2474, doi:10.1002/hyp.6841.
Tetzlaff, D., C. Birkel, J. Dick, and C. Soulsby (2014), Storage dynamics in hydropedological units control hillslope connectivity, runoff gener-
ation and the evolution of catchment transit time distributions, Water Resour. Res., 50, 969–985, doi:10.1002/2013WR014147.
Uhlenbrook, S., and C. Leibundgut (2002), Process-oriented catchment modelling and multiple-response validation, Hydrol. Processes, 16,
423–440.
Vache, K. B., and J. J. McDonnell (2006), A process-based rejectionist framework for evaluating catchment runoff model structure, Water
Resour. Res., 42, W02409, doi:10.1029/2005WR004247.
van Griensven, A., T. Meixner, S. Grunwald, T. Bishop, M. Diluzio, and R. Srinivasan (2006), A global sensitivity analysis tool for the parame-
ters of multi-variable catchment models, J. Hydrol., 324, 10–23.
