Habitat Suitability and Distribution Models - With Applications in R
This book introduces the key stages of niche-based habitat suitability model
building, evaluation and prediction required for understanding and predicting
future patterns of species and biodiversity. Beginning with the main theory
behind ecological niches and species distributions, the book proceeds through
all major steps of model building, from conceptualization and model training to
model evaluation and spatio-temporal predictions. Extensive examples using R
support graduate students and researchers in quantifying ecological niches and
predicting species distributions with their own data, and help to address key
environmental and conservation problems. Reflecting this highly active field of
research, the book incorporates the latest developments from informatics and
statistics, as well as using data from remote sources such as satellite imagery.
A website at www.unil.ch/hsdm contains the codes and supporting material
required to run the examples and teach courses.
All three authors are recognized specialists in, and have contributed
substantially to, the development of spatial prediction methods for species’ habitat
suitability and distribution modeling. They have published a large number of papers,
together accumulating tens of thousands of citations, and are ISI Highly Cited
Researchers.
Series Editors
Michael Usher, University of Stirling, and formerly Scottish Natural Heritage
Denis Saunders, formerly CSIRO Division of Sustainable Ecosystems, Canberra
Robert Peet, University of North Carolina, Chapel Hill
Andrew Dobson, Princeton University
Editorial Board
Paul Adam, University of New South Wales, Australia
H. J. B. Birks, University of Bergen, Norway
Lena Gustafsson, Swedish University of Agricultural Science
Jeff McNeely, International Union for the Conservation of Nature
R. T. Paine, University of Washington
David Richardson, University of Stellenbosch
Jeremy Wilson, Royal Society for the Protection of Birds
The world’s biological diversity faces unprecedented threats. The urgent challenge facing
the concerned biologist is to understand ecological processes well enough to maintain
their functioning in the face of the pressures resulting from human population growth.
Those concerned with the conservation of biodiversity and with restoration also need
to be acquainted with the political, social, historical, economic and legal frameworks
within which ecological and conservation practice must be developed. The new Ecology,
Biodiversity, and Conservation series will present balanced, comprehensive, up-to-date,
and critical reviews of selected topics within the sciences of ecology and conservation
biology, both botanical and zoological, and both “pure” and “applied”. It is aimed at
advanced final-year undergraduates, graduate students, researchers, and university teachers,
as well as ecologists and conservationists in industry, government and the voluntary sectors.
The series encompasses a wide range of approaches and scales (spatial, temporal, and
taxonomic), including quantitative, theoretical, population, community, ecosystem, landscape,
historical, experimental, behavioral, and evolutionary studies. The emphasis is on science
related to the real world of plants and animals rather than on purely theoretical
abstractions and mathematical models. Books in this series will, wherever possible, consider issues
from a broad perspective. Some books will challenge existing paradigms and present new
ecological concepts, empirical or theoretical models, and testable hypotheses. Other books
will explore new approaches and present syntheses on topics of ecological importance.
Nonequilibrium Ecology
Klaus Rohde
The Ecology of Phytoplankton
C. S. Reynolds
Systematic Conservation Planning
Chris Margules and Sahotra Sarkar
Large-Scale Landscape Experiments: Lessons from Tumut
David B. Lindenmayer
Assessing the Conservation Value of Freshwaters: An International Perspective
Philip J. Boon and Catherine M. Pringle
Insect Species Conservation
T. R. New
Bird Conservation and Agriculture
Jeremy D. Wilson, Andrew D. Evans, and Philip V. Grice
Cave Biology: Life in Darkness
Aldemaro Romero
Biodiversity in Environmental Assessment: Enhancing Ecosystem Services for Human Well-being
Roel Slootweg, Asha Rajvanshi, Vinod B. Mathur, and Arend Kolhoff
Mapping Species Distributions: Spatial Inference and Prediction
Janet Franklin
Decline and Recovery of the Island Fox: A Case Study for Population Recovery
Timothy J. Coonan, Catherin A. Schwemm, and David K. Garcelon
Ecosystem Functioning
Kurt Jax
Spatio-Temporal Heterogeneity: Concepts and Analyses
Pierre R. L. Dutilleul
Parasites in Ecological Communities: From Interactions to Ecosystems
Melanie J. Hatcher and Alison M. Dunn
Zoo Conservation Biology
John E. Fa, Stephan M. Funk, and Donnamarie O’Connell
Marine Protected Areas: A Multidisciplinary Approach
Joachim Claudet
Biodiversity in Dead Wood
Jogeir N. Stokland, Juha Siitonen, and Bengt Gunnar Jonsson
Landslide Ecology
Lawrence R. Walker and Aaron B. Shiels
Nature’s Wealth: The Economics of Ecosystem Services and Poverty
Pieter J.H. van Beukering, Elissaios Papyrakis, Jetske Bouma, and Roy Brouwer
Birds and Climate Change: Impacts and Conservation Responses
James W. Pearce-Higgins and Rhys E. Green
ANTOINE GUISAN
University of Lausanne
WILFRIED THUILLER
CNRS, Université Grenoble Alpes
NIKLAUS E. ZIMMERMANN
Swiss Federal Research Institute WSL
DAMIEN GEORGES
CNRS, Université Grenoble Alpes
ACHILLEAS PSOMAS
Swiss Federal Research Institute WSL
www.cambridge.org
Information on this title: www.cambridge.org/9780521765138
DOI: 10.1017/9781139028271
© Antoine Guisan, Wilfried Thuiller, and Niklaus E. Zimmermann 2017
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2017
Printed in the United Kingdom by TJ International Ltd. Padstow Cornwall
A catalog record for this publication is available from the British Library.
ISBN 978-0-521-76513-8 Hardback
ISBN 978-0-521-75836-9 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents
Introduction
1 General Content of the Book
1.1 What Is This Book About?
1.2 How Is the Book Structured?
1.3 Why Write a Textbook with R Examples?
1.4 What Is This Book Not About?
1.5 Why Was This Book Needed?
1.6 Who Is This Book For?
1.7 Where Can I Find Supporting Material?
1.8 What Are Readers Assumed to Know Already?
1.9 How Does This Book Differ From Previous Ones?
1.10 What Terminology Is Used in This Book?
8 Ecological Scales: Issues of Resolution and Extent
8.1 Issues of Resolution
8.2 Issues of Extent
20.4 Ensemble of Small Models for Rarer Species
20.5 Improving the Modeling Techniques to Fit Simple and Ensemble HSMs
20.6 Multi-Species Modeling and Joint-Species Distribution Modeling
20.7 Use of Artificial Data
Foreword
modeling in terms of the technical options available and the ecological
assumptions each implies. Of equal importance is the provision of R
code procedures making available to many the details of how to implement
different methods and compare their performance. This book provides
an invaluable learning experience for all ecologists interested in
habitat suitability modeling, based as it is on the combined experience of
three of the pioneers of many of the ideas presented.
Mike Austin
CSIRO Land & Water Flagship, Canberra, Australia
Preface
Authors’ Contributions
AG, WT, and NEZ conceived and designed the book and wrote all the
main texts. AG led the book development with substantial support from VD
in the last four years. VD acted as the book manager to coordinate all
contributions and scripts. AG led the writing of Parts I and IV, NEZ led
Part II, WT led Parts III and VI, and all three led Part V. AP and NEZ
developed all script examples in Part II, DG and WT in Parts III and VI,
VD and AG in Part IV, and DG, VD, AP, NEZ, WT, and AG in Part V.
Introduction
Here, we present the main features of the book: its aims, structure, content,
terminology, readership, supporting material, expected prerequisites,
and how it differs from other books already available. The structure of
the book follows the main modeling steps, so we recommend reading the
sections about the book’s structure and content before reading the other
parts.
PART I • Overview, Principles, Theory,
and Assumptions Behind
Habitat Suitability Modeling
In this first part of the book, we begin by briefly presenting the general
procedure (i.e. the series of methodological steps) used to build and apply
HSMs (Chapter 2). We next summarize our ecological and evolutionary
understanding of the factors driving species distributions and related
biogeographical theory (Chapter 3). It is by no means our intention to
present an exhaustive review of all existing theories, which can best be
found in textbooks (Lomolino et al., 2010; Smith and Smith, 2015), but
rather to focus on the most useful concepts for HSMs. Readers familiar
with the theory behind species’ niches and geographic distributions
may prefer to start directly with Chapter 4, where we explain the main
principles of habitat suitability modeling, how predictions for individual
species can be assembled to predict communities, and what the main
applications of these models are. We finally present the main working
assumptions that are made when fitting such models (Chapter 5; see
also Part V for assumptions specifically related to projections in time and
space).
Figure 2.1 The five main modeling steps to be followed when building HSMs.
Simplified from Guisan and Zimmermann (2000), with permission. See also Table 2.1.
The most obvious questions in biogeography, and the starting points for
this book, are: what are we observing (i.e. which species, communities, or
ecosystems)? Where in space and time? Why are organisms distributed
where they are? The quest to answer these questions is an age-old one
(Guisan and Thuiller, 2005; Franklin, 2010a), but it really took off
scientifically with the first biogeographers of the eighteenth and nineteenth
centuries (i.e. Von Humboldt, De Candolle, Darwin, and Wallace). Since the
steady rise of ecological research during the twentieth century, explaining
and understanding the distribution of biodiversity at various spatial
and temporal scales has continued to be an important field of macroecological
and biodiversity research (Lomolino et al., 2010; McGill, 2010).
Numerous new or refined theories were proposed (metapopulation
dynamics, e.g. Hanski and Gilpin, 1997; neutral theory, Hubbell, 2001;
metabolic theory, Brown et al., 2004) in which geographic space was
explicitly considered (Moloney and Jeltsch, 2008). This ultimately fostered
the development of predictive models of species and biodiversity
(Côté and Reynolds, 2002; Guisan et al., 2013).
This chapter does not attempt to review every single step in this long
history, nor does it aim to provide an exhaustive review of all the
theoretical aspects of species’ niches and distributions. Instead, we aim to
present the theories and findings most relevant to habitat suitability
modeling. Although habitat suitability remains the major
principle behind this type of model, we will begin with a more general
biogeographical perspective. First, we will present the three key
drivers of species distributions (3.1), then introduce each of them in more
detail in the following sections: speciation, dispersal, species pools and
neutral theory (3.2), the abiotic environment: habitat and fundamental
niche (3.3), and the biotic environment: species interactions, community
assembly and the realized niche (3.4). Habitat and niche issues will then
be discussed further in Chapters 4 and 5, from the angle
of habitat suitability modeling. The aim here is to introduce the basic
Figure 3.2 The three main factors that drive species ranges. G: studied geographic area;
A: suitable abiotic environment (niche); B: suitable biotic environment; C: colonizable
range. Observations in the field could result from four situations, from most to least
likely: 1. Realized niche (suitable with regard to all three aspects). 2. Suitable abiotic
environment with unsuitable biotic conditions, for instance due to strong competi-
tion. 3. Colonization outside the suitable environment, maybe due to facilitation (sink).
4. Sink in unsuitable biotic and abiotic conditions, maybe due to historical effect (e.g. trees
persisting in unsuitable conditions). Modified from Soberón (2007), with permission.
Austin, 1985; Austin and Gaywood, 1994). In most cases this is associated
with a position along the gradient where the species performs best (the
physiological optimum) and a gradual decrease in performance the further
one moves away from this optimum, in either direction (Ellenberg,
1953, 1954; Hector et al., 2012). Such physiological response curves can
therefore be represented as sigmoidal or unimodal shapes (Figure 3.3),
with unimodal responses appearing to be dominant in nature (e.g. linear
responses can be observed for species at the end of gradients or along
very stressful gradients). The width of the curve also documents the
physiological tolerance of species along the gradient. Different species
have different optima and tolerances along the same gradient (Figure 3.3c),
from narrow to wide. A species can have a wide tolerance along one
variable but a narrow tolerance along another. A species that has a very
broad tolerance along a specific gradient may be considered indifferent
to variations in this variable, and thus be considered a generalist species
(for that gradient).
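As a minimal sketch of such a unimodal response, we can assume a Gaussian shape in which the optimum is the position of the peak and the tolerance is the width of the curve. The function name gaussian_response and all numeric values below are hypothetical and serve only as an illustration.

```r
# Hypothetical Gaussian response curve: performance peaks at the optimum 'opt'
# and declines symmetrically; 'tol' (a standard deviation) sets the tolerance.
gaussian_response <- function(x, opt, tol, max_perf = 1) {
  max_perf * exp(-(x - opt)^2 / (2 * tol^2))
}

temp <- seq(0, 30, by = 0.5)                              # illustrative gradient (degrees C)
specialist <- gaussian_response(temp, opt = 12, tol = 2)  # narrow tolerance
generalist <- gaussian_response(temp, opt = 15, tol = 8)  # wide tolerance

# Position along the gradient where each species performs best
temp[which.max(specialist)]
temp[which.max(generalist)]

# Number of gradient positions with at least half of the maximum performance
sum(specialist > 0.5)
sum(generalist > 0.5)
```

The generalist exceeds half of its maximum performance over a much larger part of the gradient than the specialist, which is what a wide tolerance means here.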
The transition from optimal to poor performance can be smooth or
abrupt, depending on the types of physiological mechanisms involved. An
The distinction between the two types of niche responses along single
gradients leads to the same distinction at the level of the whole envir-
onmental niche, which was termed the “realized niche” by Hutchinson
(1957) or the “ecological potential” by Ellenberg (1953), describing a
subset of the fundamental niche (or physiological potential) constrained
by competition with one or several other species (Figure 3.5).
We have now seen that biotic interactions constrain the fundamental
niche, thus forming the realized niche. Numerous different biotic interactions
can affect the predictability of a species at a site from environmental
predictors only (Araújo and Guisan, 2006; Sutherst et al., 2007; Elith and
Leathwick, 2009; Kissling et al., 2012; Wisz et al., 2013). These biotic
interactions may either be negative, by excluding a species from sites that
are a priori environmentally suitable (i.e. within its fundamental niche;
see above; e.g. competition) or facilitate a species at sites that appear envi-
ronmentally unsuitable based on measured average site conditions (i.e.
depending on the scale of measurement; Pellissier et al., 2010).
Examples of positive interactions (i.e. facilitation; Boucher et al., 1982;
Callaway, 1995; Stachowicz, 2001; Bruno et al., 2003) include commens-
alism, mutualism (e.g. non-symbiotic, Pellissier et al., 2012a; or symbiotic,
Pellissier et al., 2013c), biotic engineering (i.e. a species improving the
micro-habitat conditions for another), for instance forest understory species
that cannot grow in full light benefiting from the shade from the
In this chapter, we describe and illustrate the principles for modeling the
distribution of suitable habitats for a given species. However, as discussed
in Chapter 1, habitats for other biological entities can also be modeled
using the same approach; these may include intraspecific levels (e.g. subspecies,
haplotypes), supra-specific levels (e.g. functional groups, communities,
ecosystem types), or features that are transversal across species (e.g.
species traits, genes or alleles).
Here, we address how to fit the niche of the modeled entity from field
observations (4.1), how the fitted species’ niche can then be projected into
geographical space to predict species distributions (4.2), how individual
single species predictions can be assembled into community-level predic-
tions (e.g. species richness; 4.3), and finally the possible applications of these
models and their predictions in ecology, biogeography and evolution (4.4).
in these models are called “species response curves” (Austin et al.,
1994). These response curves therefore either represent simple rectilin-
ear “box-like” envelopes resulting in simple binary (inside/outside the
niche) predictions (Figure 4.2a), or more realistic representations of the
niche (e.g. unimodal) based on more gradual (and often more complex)
responses in environmental space (Figure 4.2b), resulting in a continuous
index of habitat suitability or probability of species occurrence (usually
from low to high, e.g. [0–1] or [0–100]; see Part III and Merow
et al. (2014) for discussion of simple versus complex response curves and
associated models).
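The simplest case, a rectilinear envelope with a binary prediction, can be sketched in a few lines of base R. The presence records, the predictor names (temp, prec) and the helper in_envelope() are made up for illustration and are not taken from the book's datasets.

```r
# Hypothetical presence records with two environmental predictors
set.seed(1)
pres <- data.frame(temp = rnorm(50, mean = 10, sd = 2),
                   prec = rnorm(50, mean = 1200, sd = 150))

# Rectilinear envelope: observed minimum and maximum of each predictor
env_min <- sapply(pres, min)
env_max <- sapply(pres, max)

# Binary prediction: 1 if a site lies inside the envelope on all predictors.
# The columns of 'sites' must be in the same order as those of 'pres'.
in_envelope <- function(sites) {
  inside <- mapply(function(v, lo, hi) v >= lo & v <= hi,
                   sites, env_min, env_max)
  as.integer(apply(matrix(inside, nrow = nrow(sites)), 1, all))
}

new_sites <- data.frame(temp = c(10, 25), prec = c(1200, 1200))
in_envelope(new_sites)
```

The first site falls inside the envelope on both predictors, whereas the second is far too warm and is predicted as unsuitable. A unimodal response would instead return a graded suitability value for both sites.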
A crucial step in habitat suitability modeling is the acquisition of
spatially explicit environmental variables (i.e. maps) at the right resolution,
which are sufficiently accurate to be used to determine a species’
niche as close to its ecophysiological needs as possible. As previously
seen (3.3), environmental variables (or predictors; see Part II) can exert
direct or indirect effects on species, which can be expressed as a gradi-
ent ranging from proximal to distal predictors (Austin, 2002; Huston,
2002; Austin, 2007; Mod et al., 2016). These are ideally chosen to
reflect the three main types of influences on the species (Guisan and
Zimmermann, 2000; Guisan and Thuiller, 2005): (i) regulators (or lim-
iting factors), defined as factors controlling a species’ metabolism (e.g.
low temperatures); (ii) disturbances, defined as all types of perturbations
As the number of papers on, and topics addressed using, HSMs has
increased exponentially in recent decades (Guisan et al., 2013), it would
be impossible to list them all. Instead, we have provided some examples
in Table 4.1 and refer readers to review papers such as (Guisan and
Zimmermann, 2000; Guisan and Thuiller, 2005; Thuiller et al., 2008;
Elith and Leathwick, 2009; Zimmermann et al., 2010; Guisan et al.,
2013; Thuiller et al., 2013) or complementary books (Scott et al., 2002;
Franklin, 2010a; Peterson et al., 2011).
This part gives a broad overview of general data preparation and pre-
liminary analysis steps. Chapter 6 covers data acquisition from existing
sources for species and environmental data; it introduces spatial analyses
in R that usually are carried out in GIS or RS environments. It discusses
issues of pre-selecting variables for model building, explains how to analyse
and avoid correlation structures among variables, and discusses statistical
accuracy vs. ecological explanation in predictor variables. While all
this traditionally has been managed within a GIS, this part demonstrates
that the required steps can be done in R by providing a number of examples.
This part introduces the main databases of environmental predictors and
explains how digital elevation models (DEM) and RS can be used to
derive ecologically more meaningful predictors. Issues related to species
data are dealt with in Chapter 7. It explains how to prepare one’s own
sampling, and discusses issues of sample size, prevalence, and spatial autocorrelation.
It specifically introduces algorithms to generate designed sampling
using regular or random design elements, and discusses and compares
presence–absence vs. presence-only sampling. Chapter 8 addresses issues
of ecological scale, namely resolution and extent aspects in the spatial,
temporal, and thematic realm. This chapter is partly theoretical, high-
lighting the effects of scale and extent on habitat suitability modeling, but
it also includes practical solutions for scaling data to appropriate com-
mon resolution and extent.
6 • Environmental Predictors: Issues
of Processing and Selection
1 http://worldgrids.org/doku.php?id=source_data and http://freegisdata.rtwilson.com/index.html
2 www.ngdc.noaa.gov/mgg/topo/gltiles.html
3 www.ngdc.noaa.gov/mgg/global/global.html
4 http://srtm.csi.cgiar.org/
5 http://asterweb.jpl.nasa.gov/gdem.asp
6.1.2 Climate Data
There are several large-scale datasets available, and depending on the
nature of the analysis, researchers can choose the dataset that is most appropriate.
Worldclim⁶ is probably the most widely used
global climate dataset for ecological analyses (Hijmans et al., 2005). It is
available in geographic projection at a spatial resolution of 30 arc seconds
(~1 km), but also at coarser resolutions (2.5, 5, and 10 arc minutes). To
interpolate the climate data spatially, the SRTM dataset (see above) was
resampled to the resolution and spatial registration of the GTOPO30 DEM
and used where available, while GTOPO30 itself was used elsewhere.
Worldclim maps are based on a large number of climate stations
using long-term (1950–2000 in general, but 1961–1990 for most stations)
monthly mean climate information for precipitation (47,554 stations),
maximum (24,542 stations), and minimum (14,930 stations) temperature.
It is generally available as version 1.4. Just recently, version 2.0 was released
for beta testing (beta release 1, June 2016). Due to the globally uneven
distribution of climate stations, the mapping uncertainty varies substan-
tially in space. The climate mapping method is based on the ANUSPLIN
package, which uses thin-plate smoothing splines (Hutchinson, 1995). In
addition to basic climate parameters such as monthly mean, minimum,
and maximum temperature and precipitation, this dataset provides a set
of 19 so-called bioclimatic variables (bioclim), which are supposed to be
more biologically relevant than the original monthly climate layers from
which they are derived. The datasets are primarily made
available for the current climate. However, the website also offers datasets
for historical data (three time slices for the last interglacial, last glacial
maximum, and mid-Holocene), as well as for projected future climates
for large numbers of global circulation models (GCM) and scenarios
6 http://worldclim.org/
7 www.ccafs-climate.org/
8 www.prism.oregonstate.edu/
9 https://daymet.ornl.gov/
10 http://forest.moscowfsl.wsu.edu/climate/
11 www.cru.uea.ac.uk/cru/data/hrg/
12 http://cmip-pcmdi.llnl.gov/index.html
13 www.cordex.org/
14 http://landcover.usgs.gov/globallandcover.php
15 https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table
16 www.landcover.org/data/landcover/
17 https://lpdaac.usgs.gov/data_access
18 http://due.esrin.esa.int/page_globcover.php
19 www.mrlc.gov/
20 www.eea.europa.eu/data-and-maps/
21 www.landcover.org/data/
hood, and historical data, but also to indicators and tools that can be used
for European territorial development and policy formulation at different
geographical levels. The data included in the ESPON database is mainly
from European institutions such as EUROSTAT and EEA and is aimed
at a wide range of users (researchers, policy makers, stakeholders).
22 http://gadm.org/
23 www.naturalearthdata.com/
24 http://database.espon.eu/db2/
6.2.1 Introduction
We will make use of the GIS layers available in ESRI’s grid and vector
formats as well as GeoTIFF, which are the most widely distributed
formats for combining statistics with GIS. In order to prepare for further
statistical analyses, simple spatial operations and analyses are introduced,
such as building spatial datasets, importing grids, vectors and
points, developing new and partly DEM-derived raster layers, inter-
secting points with grids, or building simple SDMs and predicting the
25 http://nils.weidmann.ws/projects/cshapes
26 www.marineregions.org/about.php
Furthermore, and for all future analyses, it is important to make sure that
R is directed to the correct working directory:
> setwd("PATH/data/")
27 www.unil.ch/hsdm
These grids are now loaded into R and we can access them directly.
In order to evaluate the imported datasets, we can use GIS-type com-
mands that give us this information, similar to the “describe” command
in ArcInfo:
> bbox(bio7); ncol(bio7); nrow(bio7); res(bio7)
          min       max
s1 -180.00000 180.00000
s2  -57.49999  83.50001
[1] 240
[1] 94
[1] 1.5 1.5
Worldclim data have been loaded at 1.5° lat/lon resolution, which roughly
corresponds to a 167 km spatial resolution at the equator.
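The conversion from angular resolution to approximate ground distance can be checked with simple arithmetic. The following sketch assumes a spherical Earth with a mean radius of 6371 km, and cell_size_km() is a hypothetical helper written for this illustration.

```r
# Approximate ground size of a lon/lat cell, assuming a spherical Earth
# with mean radius 6371 km (a simplifying assumption)
earth_radius_km <- 6371
km_per_degree <- 2 * pi * earth_radius_km / 360   # about 111.2 km at the equator

cell_size_km <- function(res_deg, lat = 0) {
  # the east-west width of a cell shrinks with the cosine of latitude
  res_deg * km_per_degree * cos(lat * pi / 180)
}

cell_size_km(1.5)            # 1.5 degree cells at the equator, about 167 km
cell_size_km(30 / 3600)      # 30 arc seconds at the equator, about 0.93 km
cell_size_km(1.5, lat = 60)  # east-west width of the same cell at 60 degrees latitude
```

The north-south extent of a cell stays roughly constant, but its east-west width is halved at 60° latitude, so cells in geographic coordinates do not represent equal areas on the ground.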
We then load the GTOPO30 global DEM. As seen in Section 6.1.1,
this dataset is available at 30 arc seconds, translating into roughly 0.93 km
at the equator. We also assign a projection to this raster, without much
explanation here as this topic is treated later (Section 6.2.8):
> elev <- raster("raster/topo/GTOPO30.tif")
> projection(elev) <- "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
If we now compare the elev and the bioclim grids (e.g. bio7), it becomes
obvious that they do not have the same spatial extent, pixel resolution or
number of rows and columns:
> bbox(elev); ncol(elev); nrow(elev); res(elev)
    min       max
s1 -180 180.00002
s2  -60  90.00001
[1] 43200
[1] 18000
[1] 0.008333334 0.008333334
Specifically, we see that elev has many more rows and columns than
bio7, and also the lower-left coordinate, the extent, and the pixel size dif-
fer. They have neither the same resolution nor extent.
This creates three maps with conserved x- and y-axis ratios. In a regular
plot, such axis ratio conservation can be achieved using the option
(asp=1), but in raster this is not necessary. In the last sequence of commands
we have used three different color schemes in one plot window
by using the par() command (Figure 6.1), which is a very flexible
instrument to define plot parameters.
Figure 6.2 Illustration of the pairwise correlation structure of three bioclim and one
elevation grid.
The raster package provides an easy way of observing the correlation
between rasters. The same plot() command in raster can be used to
visualize the shape of correlations among raster layers (Figure 6.2):
> par(mfrow=c(2,2))
> plot(bio3, bio12, xlab="bio3", ylab="bio12", col="gray55")
> plot(bio3, bio7, xlab="bio3", ylab="bio7", col="gray55")
> plot(bio3, bio11, xlab="bio3", ylab="bio11", col="gray55")
> plot(bio3, elev1, xlab="bio3", ylab="elevation", col="gray55")
> par(mfrow=c(1,1))
Next, we need to store the global contour line as a spatial object, and
finally write it into a line shapefile. To do this, we use the rasterToContour()
command in the raster package.
> iso <- rasterToContour(elev, nlevels=7, levels=c(0, 500, 1000,
    1500, 2000, 3000, 4000, 5000))
> writeOGR(iso, dsn="vector/globe", layer="isolines",
    "ESRI Shapefile", check_exists=T, overwrite_layer=T)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.138e+02  2.146e+00  192.82   <2e-16 ***
alt         -3.624e-02  8.603e-04  -42.13   <2e-16 ***
abs(lat)    -8.216e+00  2.885e-02 -284.82   <2e-16 ***
abs(lon)    -1.794e+00  5.389e-02  -33.29   <2e-16 ***
I(lon^2)     7.122e-03  3.206e-04   22.22   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We see that we can express the mean temperature of the coldest quar-
ter (bio11) from a sample of pixels in the climate map as a function of
elevation, latitude, and longitude. We now apply this function to each
cell of the elevation grid and then plot the resulting grid (Figure 6.5). To
implement this calculation, we first need to load the two missing raster
files (lat, lon) from TIFF files and then calculate the difference between
the observed and the modeled mean temperature of the coldest quarter:
> lat <- raster("raster/other/latitude.tif")
> lon <- raster("raster/other/longitude.tif")
> tcold <- 4.138e+02 + (-3.624e-02 * elev1) + (-8.216e+00 * abs(lat)) +
    (-1.794e+00 * abs(lon)) + (7.122e-03 * lon^2)
> diff_obs_model_temp <- bio11 - tcold
> par(mfrow=c(2,1))
> plot(tcold, col=rev(rainbow(100))[20:100],
    main="Modelled mean temperature of the coldest quarter")
> contour(elev, nlevels=7,
    levels=c(0, 500, 1000, 1500, 2000, 3000, 4000, 5000), add=T,
    labels="", lwd=.3)
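Typing the coefficients by hand, as done above, is equivalent to applying predict() to the fitted model. The sketch below demonstrates this on synthetic data, where the data frame smp and all its values are invented and are not the Worldclim sample.

```r
# Synthetic sample of pixels (hypothetical values, not the Worldclim data)
set.seed(42)
n <- 200
smp <- data.frame(alt = runif(n, 0, 4000),
                  lat = runif(n, -60, 80),
                  lon = runif(n, -180, 180))
smp$bio11 <- 400 - 0.04 * smp$alt - 8 * abs(smp$lat) + rnorm(n, sd = 5)

# Fit a linear model, then apply its coefficients by hand
fit <- lm(bio11 ~ alt + abs(lat), data = smp)
b <- coef(fit)
by_hand <- b[1] + b[2] * smp$alt + b[3] * abs(smp$lat)

# predict() returns the same values as the manual calculation
all.equal(unname(by_hand), unname(predict(fit, newdata = smp)))
```

Working from the stored coefficients avoids copying rounded numbers from the printed summary. For raster data, the raster package offers a predict() method that applies a fitted model to a stack whose layer names match the model's predictors.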
Next, we need to define the focal window including the weights. In this
analysis, we give equal weight to all cells in a square 15 × 15 cell window.
The window weights matrix is created as follows so that all weights
have a value of 1:
> w <- matrix(1, nrow=15, ncol=15)
Finally, we calculate the focal operation over the 10 km DEM using the
w window weights and we subtract this focal analysis result from the
original elevation grid. We call the resulting raster TopEx, representing
topographic exposure with negative values representing sinks and valleys
and positive values representing ridges and peaks.
Figure 6.6 Topographic exposure over South America calculated from a focal analysis.
To plot this raster, we first define a color scale that is optimal for terrain
data, be it solar radiation or topographic exposure data. Then we apply
this color scheme to the plot so that positive values are shown in light
shades and negative values in dark shades (Figure 6.6).
> topography.c <- colorRampPalette(c("dodgerblue4",
    "lemonchiffon", "firebrick3"))
> plot(TopEx, col=topography.c(100),
    main="Topographic Exposure over South America")
> contour(elev_sa10, nlevels=7,
    levels=c(0, 500, 1000, 1500, 2000, 3000, 4000, 5000),
    add=T, labels="", lwd=.1)
The default option in the focal analysis requires that values be only
calculated for those windows that have all 15 × 15 (= 225) cells avail-
able with numerical information. Since that would remove some of the
marginal area along the coast, we set na.rm=T. This setting removes
unavailable or missing values (NAs) from the focal computations and
allows calculating an output for windows with fewer than 225 cells with
numerical values in our case.
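To make the logic of the focal operation and of the na.rm setting explicit, the following base-R sketch applies a moving-window mean to a small matrix. The helper focal_mean() and the toy elevation values are hypothetical; the sketch does not use the raster package and uses a 3 × 3 window instead of 15 × 15 for readability.

```r
# Base-R sketch of a square moving-window (focal) mean on a matrix.
# 'focal_mean' is a hypothetical helper, not a function of the raster package.
focal_mean <- function(m, size = 3, na.rm = FALSE) {
  half <- (size - 1) / 2
  # pad the matrix with NA so that windows at the margin are incomplete
  padded <- matrix(NA_real_, nrow(m) + 2 * half, ncol(m) + 2 * half)
  padded[half + seq_len(nrow(m)), half + seq_len(ncol(m))] <- m
  out <- m
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      win <- padded[i:(i + size - 1), j:(j + size - 1)]
      out[i, j] <- mean(win, na.rm = na.rm)
    }
  }
  out
}

elev_toy <- matrix(c(10,  10, 10,
                     10, 100, 10,
                     10,  10, 10), nrow = 3, byrow = TRUE)  # a single peak

strict  <- focal_mean(elev_toy, size = 3)                # NA wherever the window is incomplete
relaxed <- focal_mean(elev_toy, size = 3, na.rm = TRUE)  # margins are kept
topex   <- elev_toy - relaxed                            # positive on the peak, negative around it
```

With na.rm = FALSE only the central cell has a complete window and receives a value. With na.rm = TRUE the marginal cells are averaged over the cells that are available, at the cost of using fewer values there.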
Terrain analyses constitute a special kind of focal analysis. This type of
analysis can be used to calculate slope, aspect, or to shade a DEM. Here,
These two maps are now used to shade the terrain model elev, from
which slope and aspect are derived. Here, we set the altitude angle to 30°
and the azimuth angle to 315°, in order to illuminate the terrain model.
> hillshade <-hillShade(slope, aspect, 30, 315)
The hill-shaded raster is stored here as a TIFF file for later use; it can then
be read again from this file in a new R session.
> writeRaster(hillshade, "raster/topo/hillshade.tif", overwrite=T)
> hillsh <- raster("raster/topo/hillshade.tif")
Such maps can now be used as background images for illustrating the-
matic layers. We illustrate this with the example of North America, first
by cropping the global hillshade image to the North America extent, and
then by overlaying the elevation map (elev) in a semi-transparent man-
ner. In order to do this, we redefine the plot extent as above for North
America. The argument alpha=0.5 adds semi-transparency, making
the superimposed layer 50% transparent. In this way, the underlying hill-
shade is partly visible.
> plot_extent<-extent(-124,-66,24,50)
> hillsh_na<-crop(hillshade, extent(-188,-50,15,90))
> plot(hillsh_na, col=grey(0:100/100), legend=FALSE, axes=F,
ext=plot_extent)
> plot(elev_na,col=terrain.colors(100),alpha=.5,add=T,
ext=plot_extent)
In the next plotting example, we create our own color palette, mimick-
ing the elevation color ramp of ArcGIS. We first generate a function for
this color palette, and then we generate 100 color values.
> dem.c <- colorRampPalette(c("aquamarine", "lightgoldenrodyellow",
    "lightgoldenrod", "yellow", "burlywood", "burlywood4", "plum",
    "seashell"))
> cols<-dem.c(100)
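The two-step logic (first a palette function, then the color values) is easier to see with a very small palette. In this sketch the palette name gray.c and the alpha suffix "80" are arbitrary choices made for illustration.

```r
# colorRampPalette() returns a function; calling it with n gives n interpolated colors
gray.c <- colorRampPalette(c("black", "white"))   # hypothetical two-color ramp
cols3 <- gray.c(3)
cols3   # the endpoints are the core colors, the middle value is interpolated

# Appending a two-digit hexadecimal alpha value makes the colors semi-transparent
cols3_alpha <- paste(cols3, "80", sep = "")   # "80" corresponds to roughly 50% opacity
cols3_alpha
```

The resulting character vectors can be passed directly to the col argument of plot().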
There are numerous other possible analyses. Only a few examples have
been provided here. Read for example the manuals for the raster,
maptools, rgdal, and sp packages in order to find out more. Again,
for large datasets, the GIS functionality may be somewhat slow. However,
with medium to small datasets, R offers an extremely flexible and power-
ful way of combining statistical modeling with GIS functions.
Single grids from a grid stack can be selected using the $ selection (e.g.
world.stk$bio7). In addition, we can obtain a very rapid overview of
a stack by mapping its content (Figure 6.8). This is only recommended
if the stack contains a small number of grids. Here, we plot the stack for
layers 2 to 5, thus not plotting the first layer, i.e. elevation:
> plot(world.stk[[2:5]], col=rainbow(100,start=.0,end=.8))
Working with grid stacks has several advantages. It enables sampling all
elements of a stack from a point file using a single command (see below),
or one can process all stacked grids equally with one single command.
One example of this latter option is changing the cell size (spatial reso-
lution) of all grids in our stack. Assuming we want to convert our 1.5°
stack of rasters (our world.stk) to a lower resolution (3 × 3 degree),
When plotting this stack, we can immediately see that the cells are larger
than in the original stack. This represents an upscaling procedure (with
respect to grain size).
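What such an upscaling does to the cell values can be sketched in base R with block means on a matrix. The helper aggregate_matrix() and the toy grid are hypothetical and are used here instead of raster objects so that the example runs without additional packages.

```r
# Base-R sketch of upscaling a grid by an integer factor using block means.
# 'aggregate_matrix' is a hypothetical helper written for illustration.
aggregate_matrix <- function(m, fact = 2) {
  stopifnot(nrow(m) %% fact == 0, ncol(m) %% fact == 0)
  row_block <- rep(seq_len(nrow(m) / fact), each = fact)
  col_block <- rep(seq_len(ncol(m) / fact), each = fact)
  # mean of all cells that share the same (row block, column block)
  tapply(m, list(row_block[row(m)], col_block[col(m)]), mean)
}

fine <- matrix(1:16, nrow = 4)        # toy 4 x 4 grid, e.g. 1.5 degree cells
coarse <- aggregate_matrix(fine, 2)   # 2 x 2 grid, e.g. 3 degree cells
dim(coarse)
coarse
```

Each coarse cell holds the mean of the four fine cells it covers, so the overall mean of the grid is preserved while local detail is lost. For raster objects, the aggregate() function of the raster package performs this kind of operation with its fact and fun arguments.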
6.2.8 Re-Projecting Grids
Rasters (or vector type files) assembled from various sources might not
always be found in the same projection. Overlaying these environmental
files with a set of sample points for modeling purposes or creating a stack
of raster files requires all the files to be in the same projection. We
have seen previously how to assign a projection. Here we work through
an example of the steps that need to be taken when reading a raster from
a different projection.
Here, we read long-term (30-year normal) annual climate data for pre-
cipitation and temperature from the PRISM project (see Section 6.2.1).
> prec_yearly_usa <- raster("raster/prism/prec_30yr_normal_annual.asc")
> tave_yearly_usa <- raster("raster/prism/tave_30yr_normal_annual.asc")
The datum and the ellipsoid of the projection differ from the ones used
when reading the Worldclim data (bio3, etc.). In order to superimpose these
maps correctly with the points, vectors and rasters that use the WGS84
datum and ellipsoid, we need to re-project the PRISM rasters to the same
WGS84 datum and ellipsoid. It is also interesting to see that the raster
already has the projection information, despite being read from an ascii file.
This information is attached, because in the same folder of PRISM climate
files, there are also associated *.prj files available with the same name as
the *.asc ascii files. The raster environment recognizes these files, and
reads the projection information from them. Removing these *.prj files
from the directory results in reading un-projected raster files.
The new output raster now has a 0.025° spatial raster resolution. This
translates to c. 2.5 km in a metric projection. The original 0.00833333°
resolution would project to c. 800m cell size if projected to a metric
projection such as Albers equal area (aea) or Lambert azimuthal equal
area (lazea). It is important to note that the lon/lat geographic coordinate
system does not conserve area or angles, and therefore cannot be used
to perform any area- or distance-based calculations. For such analyses, all
layers should be projected to a suitable metric projection, for example an
equal-area projection. An overview of the PROJ4 projections, which form the
basis for the projection definitions in R, is available online.²⁸
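The reason why areas cannot be computed directly in lon/lat coordinates can be illustrated with a short calculation of the cell area at different latitudes. The sketch assumes a spherical Earth with a radius of 6371 km, and cell_area_km2() is a hypothetical helper.

```r
# Approximate area (km^2) of a lon/lat cell on a sphere of radius 6371 km
# (the spherical Earth is a simplifying assumption)
cell_area_km2 <- function(lat_center, res_deg, radius_km = 6371) {
  half <- res_deg / 2 * pi / 180    # half of the cell height in radians
  lat  <- lat_center * pi / 180
  dlon <- res_deg * pi / 180
  radius_km^2 * dlon * (sin(lat + half) - sin(lat - half))
}

a_equator <- cell_area_km2(0, 0.025)    # 0.025 degree cell at the equator
a_40      <- cell_area_km2(40, 0.025)   # same angular size at 40 degrees N
a_60      <- cell_area_km2(60, 0.025)   # and at 60 degrees N
c(a_equator, a_40, a_60)
a_60 / a_equator                        # about one half
```

Cells of identical angular size therefore cover very different areas on the ground, which is why area-based summaries require an equal-area projection or an explicit area weighting.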
Such a file is read as a data frame object, which includes two columns
that represent coordinates (lat and lon). However, the file is not of class
spatial. We can check this with the class() command.
> class(pinus_edulis)
28 http://geotiff.maptools.org/proj_list/
When checking again using the class() command, the object is now
of class spatial. However, we also see that no coordinate system has yet
been defined for this spatial object. So we then need to define it:
> projection(pinus_edulis) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
Using the option (sp=T), we ensure that the generated file is converted
directly into an object of class spatial, and not only into a tabular data
frame object. The new spatial object has just one column (next to the
invisible coordinates), namely the download date. We just give it this name:
> names(pinus_edulis)[1] <- "dwnld.date"
Next, we want to extract climate data from the world.stk object used
above (Section 6.2.7) so that we overlay several stacked, bioclimatic raster
objects with one single command using the bilinear interpolation
method for a set of points.
> pts.clim <- extract(world.stk, pinus_edulis, method="bilinear")
> pin_edu.clim<-data.frame(cbind(coordinates(pinus_edulis),
pts.clim, pinus_edulis@data))
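The bilinear method used above weights the four cell centres surrounding a point by their proximity to it. A minimal base-R sketch of this principle follows; the helper bilinear() and the toy grid are hypothetical and do not reproduce the internals of extract().

```r
# Base-R sketch of bilinear interpolation on a regular grid.
# 'bilinear' is a hypothetical helper; xs and ys are the cell-centre coordinates
# (increasing), and z[i, j] is the value at (xs[i], ys[j]).
bilinear <- function(x, y, xs, ys, z) {
  i <- findInterval(x, xs, rightmost.closed = TRUE)
  j <- findInterval(y, ys, rightmost.closed = TRUE)
  tx <- (x - xs[i]) / (xs[i + 1] - xs[i])   # relative position between centres
  ty <- (y - ys[j]) / (ys[j + 1] - ys[j])
  (1 - tx) * (1 - ty) * z[i, j]     + tx * (1 - ty) * z[i + 1, j] +
  (1 - tx) * ty       * z[i, j + 1] + tx * ty       * z[i + 1, j + 1]
}

xs <- c(0, 1); ys <- c(0, 1)
z  <- matrix(c(10, 20,     # z[1, 1], z[2, 1]
               30, 40),    # z[1, 2], z[2, 2]
             nrow = 2)
bilinear(0.5, 0.5, xs, ys, z)   # point in the middle: mean of the four centres
bilinear(0, 0, xs, ys, z)       # point exactly on a cell centre: that cell's value
```

Compared with simply taking the value of the cell in which a point falls, bilinear interpolation yields smoother values but cannot be used for categorical layers such as land cover.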
Here, we now plot this file over North America in order to visualize the
presence of Pinus edulis as downloaded from GBIF using a predefined
extent that is smaller than the hillshade created above (Figure 6.9, note
that this figure is printed in gray).
In this way, the calibration and the evaluation points are plotted in maroon
and sea green, respectively, for presence points (Figure 6.10). The absence
points for the two datasets are presented in darker and lighter shades of gray.
Now, we first need to remove the NA values using the na.omit() com-
mand and at the same time convert the object to a data frame. Second,
We have now completed our data preparation and can fit a simple GLM
object.
A simple GLM model is fitted to illustrate GIS capability in R. Here,
we fit a four-parameter model (bio3, bio7, bio11, and bio12) with both
linear and quadratic terms, and we perform simple stepwise bi-directional
parameter selection in order to optimize this model.
> vulpes.full <- glm(Vulpes.vulpes~bio3+I(bio3^2)+bio7+I(bio7^2)+
    bio11+I(bio11^2)+bio12+I(bio12^2), family="binomial",
    data=pts.cal.ovl)
> vulpes.step <-step(vulpes.full, direction=“both”, trace=F)
We find that both models give roughly the same adjusted D2 value, with
a slightly higher value for the stepwise-optimized model. The unadjusted
D2 would be higher for the full model, and slightly lower for
the stepwise-optimized model. However, the adjusted D2 accounts for the
number of parameters and the number of observations used, and thus
penalizes the stepwise-optimized model, from which the linear term of the
bio7 variable has been removed, less:
> summary(vulpes.step)
Call:
glm(formula = Vulpes.vulpes ~ bio3 + I(bio3^2) + I(bio7^2) +
    bio11 + I(bio11^2) + bio12 + I(bio12^2), family = "binomial",
    data = pts.cal.ovl)
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.596e+00  7.330e-01 -10.362  < 2e-16 ***
bio3         3.711e-01  4.003e-02   9.269  < 2e-16 ***
I(bio3^2)   -6.517e-03  5.446e-04 -11.967  < 2e-16 ***
I(bio7^2)    3.084e-05  1.233e-06  25.009  < 2e-16 ***
bio11        5.277e-03  7.390e-04   7.141 9.25e-13 ***
I(bio11^2)  -2.346e-05  2.582e-06  -9.084  < 2e-16 ***
bio12        2.694e-03  2.776e-04   9.702  < 2e-16 ***
I(bio12^2)  -8.662e-07  1.228e-07  -7.052 1.76e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
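The D2 and adjusted D2 values discussed above can be obtained from any fitted GLM object. The sketch below uses one common formulation of the adjustment (following Guisan and Zimmermann, 2000) on synthetic data. The helper adj_d2(), the data frame dat and the explicitly specified reduced model, which stands in for a stepwise result, are all assumptions made for illustration.

```r
# D2 and adjusted D2 for a GLM; 'adj_d2' is a hypothetical helper.
# p counts all estimated coefficients, n the number of observations.
adj_d2 <- function(model) {
  d2 <- 1 - model$deviance / model$null.deviance
  n  <- nobs(model)
  p  <- length(coef(model))
  c(D2 = d2, adjD2 = 1 - ((n - 1) / (n - p)) * (1 - d2))
}

# Synthetic presence-absence data (made up for illustration)
set.seed(7)
dat <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
dat$y <- rbinom(300, 1, plogis(1.5 * dat$x1 - dat$x1^2))

full    <- glm(y ~ x1 + I(x1^2) + x2 + I(x2^2), family = "binomial", data = dat)
reduced <- glm(y ~ x1 + I(x1^2), family = "binomial", data = dat)

d2_table <- rbind(full = adj_d2(full), reduced = adj_d2(reduced))
d2_table
```

The unadjusted D2 of a nested, reduced model can never exceed that of the full model, whereas the adjustment subtracts a penalty that grows with the number of coefficients.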
We will not further test this model here, as the whole of Part IV is
devoted to model evaluation. Many more GIS functions are available in
R, which can be used for habitat distribution modeling of species. The
basic steps introduced so far should now offer a good basis from which
readers can further explore this functionality.
the available packages in R. Each reader may have his/her own preferred
RS datasets, which they can combine with other geospatial data. We will
only superficially address the issue of RS data here. Most importantly,
this section does not aim to introduce the basics of RS data process-
ing, such as georegistration, relative or absolute atmospheric correction,
or cloud masking, for which we refer readers to specialized books (e.g.
Normand et al., 2013). Instead, it constitutes a simple introduction to
loading and re-calculating remotely sensed data in order to combine
these with other GIS layers for powerful statistical analyses (Carlson et al.,
2014; Pottier et al., 2014). For this purpose, we use a smaller-scale dataset,
which improves the handling and visualization of RS products.
6.3.1 Introduction
RS has a lot to offer biogeographers and macroecologists (Kerr and
Ostrovsky, 2003). The careful processing of RS data requires specialist
knowledge, and this is not the subject here. However, RS data is increas-
ingly available in pre-processed formats, and can be used like any other
form of GIS data, if prepared carefully. One of the biggest advantages of
RS data is that it informs us objectively, usually with full coverage of a
larger study area, about the state of the Earth’s surface at a specific point
in time. There are many different systems such as passive optical or active
LandSat7 ETM+ bands:
29 http://landsat.usgs.gov/
30 http://landsat.gsfc.nasa.gov/
31 http://landsat.usgs.gov/landsat8.php
32 http://landsat.usgs.gov/band_designations_landsat_satellites.php
33 http://reverb.echo.nasa.gov/reverb/
34 http://landsat.usgs.gov/Landsat_Search_and_Download.php
The next step is to crop the six Landsat bands to the canton of Zurich.
> band1_blue_crop<-crop(band1_blue,extent(Zurich))
> band2_green_crop<-crop(band2_green,extent(Zurich))
> band3_red_crop<-crop(band3_red,extent(Zurich))
> band4_nir_crop<-crop(band4_nir,extent(Zurich))
> band5_swir1_crop<-crop(band5_swir1,extent(Zurich))
> band7_swir2_crop<-crop(band7_swir2,extent(Zurich))
> tmp<-stack(band1_blue,band2_green,
band3_red,band4_nir,band5_swir1, band7_swir2)
> L7_010824<-crop(tmp, extent(Zurich))
and process MODIS data directly within R. Finally, other packages like
hyperSpec or hsdar, currently in the early stages of development, can
be used for processing hyperspectral RS data.
Finally, we plot the results of our simple analyses using the 250 m
resolution hillshade available for the study area (Figures 6.13 and 6.14).
For the second set of graphs, we specifically design a new color palette
ygb.c, using the colorRampPalette() command, which allows us
to assign colors to a palette. We then assign the number of color shades
to be generated and the command interpolates between the assigned
core colors.
> hill_250m_utm <- raster(“raster/topo/hill_250m_utm.tif”)
> par(mfcol=c(2,2))
> plot(hill_250m_utm,col=grey(0:100/100), legend=FALSE, axes=F,
    ext=extent(SAVI), main="NDVI")
> plot(NDVI,col=rev(terrain.colors(20,alpha=0.6)),add=T)
> par(mfcol=c(1,3))
> plot(hill_250m_utm,col=grey(0:100/100), legend=FALSE, axes=F,
    ext=extent(SAVI), main="Brightness")
> plot(L7_010824_Brightness, col=rev(paste(ygb.c(20), "B3", sep="")),
    add=T)
36 www.worldclim.org
We can see that almost all the variables are above a value of 10.0. Usually
values from 5 to 10 are considered as critical for multi-variable correla-
tion. Some authors suggest that VIF values of up to 20 can be accepted,
but we do not recommend going above 10. Specifically, we see that bio4
has a very high VIF value. We next test what happens if we remove bio4
from our analyses, the same variable that showed extremely high correla-
tion with several other variables in Figure 6.15.
> vif(data[,c(4,6:8)])
Variables VIF
1 bio3 5.866959
2 bio7 4.678556
3 bio11 6.881024
4 bio12 1.933955
We can now check which variables remain when only a certain level of
correlation is accepted, say r = 0.7.
> vifcor(data[,4:8], th=.7)
3 variables from the 5 input variables have collinearity
problem:
bio4 bio11 bio7
In our case, we can see that only bio3 and bio12 remain at a correlation
threshold of 0.7 (which we can also see from Figure 6.15). However,
the VIF is not calculated in the same way as bivariate correlations. It
is based on the squared multiple correlation coefficient that results
from regressing a predictor variable against all other predictor variables. It
therefore detects multicollinearities that cannot always easily be detected
with a simple pairs() scatterplot correlation.
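The definition of the VIF can be made explicit with a few lines of base R. The sketch below uses simulated predictors (x1, x2, and x3 are hypothetical and unrelated to the bioclim data) and computes VIF_j = 1 / (1 - R2_j), where R2_j is the squared multiple correlation obtained by regressing predictor j on all other predictors. It is meant as an illustration of the principle, not as a replacement for vif().

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # strongly related to x1
x3 <- rnorm(n)                        # independent of the others
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R2_j): regress each predictor on all remaining predictors
vif_manual <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

round(vif_manual, 2)
```

In this simulated case x1 and x2 receive high VIF values, whereas x3 stays close to 1, the value expected for a predictor that is unrelated to the others.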
More generally speaking, we might ask why we should be concerned
about correlations. On the one hand, some statistical methods will fail to
> cor(data$bio12,data$bio7)
[1] -0.6439925
> cor(log(data$bio12+.001),data$bio7)
[1] -0.4410228
1
www.gbif.org
gaps in the collections and these gaps might cause difficulties later on
when analysing the data.
The rapid growth of web databases such as GBIF has made access to
data much simpler (e.g. Table 7.1). Yet this does not necessarily mean
that the data can be used without restriction. Extractions from large,
community databases such as GBIF need to be treated with caution for
the following reasons: (i) uncertainty in species identification, (ii) low or
unknown accuracy of sample location, (iii) lack of design, (iv) incomplete
or uneven spatial coverage of the true distribution of a species, or (v) spa-
tial autocorrelation in sample locations.
The issues related to species identification cannot be easily resolved,
and are not addressed in this book. The second issue related to uncer-
tainty in sampling location is covered in Chapter 8. The lack of design is
a third, serious issue for all analyses that attempt to derive a probabilistic
habitat suitability estimate from large datasets. If lack of design is an issue,
one might consider resampling existing databases in order to increase the
level of design (Broennimann and Guisan, 2008; Veloz, 2009; Anderson
and Raza, 2010; Hijmans, 2012; Syfert et al., 2013; Mateo et al., 2015).
Such resampling can only improve, but not fully remove the design bias
inherent to such large datasets (see Section 7.4 for some suggestions). The
fourth issue relating to the lack of coverage cannot be easily overcome,
and its effects are treated in Section 8.2. The fifth issue is spatial autocor-
relation, which is an inherent property of spatially structured, ecological
> library(ape) # provides Moran.I()
> xy <- pts.cal[,1:2]
> dists <- as.matrix(dist(xy))
> dists.inv <- 1/dists
> diag(dists.inv) <- 0
> Moran.I(vulpes.step$residuals, dists.inv)
$observed
[1] 0.01506913
$expected
[1] -0.0001695203
$sd
[1] 0.0003468404
$p.value
[1] 0
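To see what the observed and expected values represent, Moran's I can be computed by hand. The following sketch uses simulated coordinates and values (stand-ins for the residuals, not the Vulpes vulpes data) and assumes row-standardized inverse-distance weights. Under the null hypothesis of no spatial autocorrelation the expected value is -1/(n - 1), which is consistent with the $expected value in the output above (-1/5899).

```r
set.seed(2)
n  <- 100
xy <- cbind(runif(n), runif(n))   # hypothetical coordinates
z  <- rnorm(n)                    # stand-in for model residuals

# Inverse-distance weights with a zero diagonal
w <- 1 / as.matrix(dist(xy))
diag(w) <- 0
w <- w / rowSums(w)               # row-standardization (an assumption of this sketch)

# Moran's I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), with centered z
zc <- z - mean(z)
moran_obs <- (n / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
moran_exp <- -1 / (n - 1)         # expectation without spatial autocorrelation

c(observed = moran_obs, expected = moran_exp)
```

Because the simulated values carry no spatial structure, the observed statistic should lie close to its expectation here, in contrast to the residuals analysed above.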
We learn from this example that the p-value for testing for spatial auto-
correlation is highly significant (p < 0.05). We therefore find spatial
autocorrelation in the residuals. Next, we will want to plot the spatial
correlation structure against distances between our observations. Samples
separated by a short distance should have greater similarity (and thus
correlation) than samples separated by a larger distance. We evaluate this
distance dependence using a Mantel correlogram in the ncf package.
This package makes it easy to plot a spatial (Mantel) correlogram. This
is done by first extracting the residuals from the GLM object, and then
randomly selecting 500 points from the residuals and from the x- and
y-coordinates (xy object from the previous example). This information
is needed in the correlog() command in the ncf package. We store
the result of this command in the spat.cor object. We can either plot
this object directly by typing “plot(spat.cor)”, or we can extract the
necessary information, and make a neater plot (see Figure 7.1).
Figure 7.1 Spatial correlation (a) and spatial patterns (b) of model residuals for the
Vulpes vulpes stepwise-optimized GLM model. The correlogram reveals a low cor-
relation at a short distance. (A black and white version of this figure will appear in some
formats. For the color version, please refer to the plate section.)
> library(ncf)
> rsd<-vulpes.step$residuals
> rnd<-sample(1:length(rsd),500,replace=T)
> spat.cor<-correlog(xy[rnd,1],xy[rnd,2],rsd[rnd],increment=2,
resamp=10)
# Stepwise optimized model
paok1s<-step(paok1f,direction="both",trace=F)
paok0<-predict(paok1s,paok,type="response")
pr.qual[i,2]<-ecospat.adj.D2.glm(paok1s)
tmp1 <-data.frame(1:length(paok0),paok[,1],paok0)
names(tmp1) <-c("ID","Observed","Predicted")
pr.qual[i,4]<-auc(tmp1)$AUC
pr.qual[i,7]<-ecospat.max.kappa(paok0,paok[,1])[[2]][1,2]
pr.qual[i,10]<-ecospat.max.tss(paok0,paok[,1])[[2]][1,2]
# Xval procedure
paok1x<-ecospat.cv.glm(paok1s)
tmp1 <-data.frame(1:length(paok0),paok[,1],paok1x$predictions)
names(tmp1) <-c("ID","Observed","Predicted")
pr.qual[i,5]<-auc(tmp1)$AUC
pr.qual[i,8]<-ecospat.max.kappa(paok1x$predictions,paok[,1])[[2]][1,2]
pr.qual[i,11]<-ecospat.max.tss(paok1x$predictions,paok[,1])[[2]][1,2]
}
Finally, we plot all model quality results from the prevalence test
(Figure 7.2). Note that the sample size plot is also presented in the same
figure, but not given as code example here.
> plot(pr.qual$Prev,pr.qual$Kappa.f,ty="l",lwd=5,col="#00FF00B4",
ylim=c(0,1.0),xlim=c(0,.5),xlab="Prevalence",
ylab="Model Quality",main="Prevalence Effects")
> points(pr.qual$Prev,pr.qual$Kappa.s,ty="l",lwd=5,col="#00CD00B4",
lty=3)
> points(pr.qual$Prev,pr.qual$Kappa.x,ty="l",lwd=5,col="#008B00B4",
lty=2)
> points(pr.qual$Prev,pr.qual$TSS.f,ty="l",lwd=5,col="#ADD8E6B4")
> points(pr.qual$Prev,pr.qual$TSS.s,ty="l",lwd=5,col="#9FB6CDB4",
lty=3)
Figure 7.2 (a) Sample size and (b) prevalence effects on model accuracy in the global
Vulpes vulpes dataset. Low prevalence has a strong effect on cross-validated Kappa and
TSS accuracies. (A black and white version of this figure will appear in some formats. For the
color version, please refer to the plate section.)
Before classifying the bio3 and bio12 rasters, we crop the global bioclim
rasters from Section 6.1.3 to the extent of the conterminous lower 48
states of the United States. For this we load a shapefile of the states of the
United States and extract the lower 48 states:
> usa <- shapefile("vector/usa/USA_states.shp")
> usa_contin <- usa[usa$STATE_NAME != "Alaska" & usa$STATE_NAME
!= "Hawaii", ]
With the second command, we select all states that are neither Alaska
nor Hawaii, thus representing the conterminous United States. Next we
need to generate an empty raster, to which we then rasterize the state
polygons using the DRAWSEQ field:
> empty_raster <- raster(bio3)
> usa_raster <- rasterize(usa_contin,empty_raster,field="DRAWSEQ")
We now have generated two reclassified raster layers, one with class
numbers running from 1 to 9 and the other with numbers of 10, 20,
… 90. Then we summed the two layers so the new values are now
unique, meaning that we can trace back their origin. For example, a
number of 56 means that it originates from class 5 (code 50) of bio12
and class 6 of bio3. No other combination of classes would generate
this code.
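The coding logic can be checked with a small stand-alone example. The vectors below are hypothetical class values for four pixels, not the actual bio3 and bio12 rasters: summing the "units" layer and the "tens" layer gives a unique code, and integer division and the modulo operator recover the original classes.

```r
# Hypothetical class values for four pixels
b3_class  <- c(6, 1, 9, 3)        # bio3 classes coded 1 to 9
b12_class <- c(50, 20, 90, 10)    # bio12 classes coded 10, 20, ..., 90

# Summing both layers yields a unique two-digit code per combination
comb <- b3_class + b12_class
comb

# Trace each code back to its origin
b12_from_comb <- comb %/% 10      # tens digit: bio12 class
b3_from_comb  <- comb %% 10       # units digit: bio3 class
data.frame(comb, b12_from_comb, b3_from_comb)
```

The first pixel receives the code 56, which decodes to class 5 of bio12 and class 6 of bio3, as in the example given in the text.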
Next we plot the histogram of class frequencies, and we plot the
stratification map with the rainbow color scheme (Figure 7.3). This
includes checking the range of possible values for plotting the histo-
gram (cspan). In order to better visualize the classes, we also allocate
random colors to the map, and we click on five locations with the
mouse to read values from the map (shown after you click five times
at any location on the map). The randomly allocated colors are stored
in a variable called yb, a name that has no ecological or R-specific
meaning:
> cspan<-maxValue(B3B12.comb)-minValue(B3B12.comb)
> yb<-rainbow(100)[round(runif(cspan,.5,100.5))]
> par(mfcol=c(1,2))
> hist(B3B12.comb,breaks=100,col=heat.colors(cspan),
main="Histogram values")
> plot(B3B12.comb,col=yb,main="Stratified map", asp=1)
> click(B3B12.comb,n=5,type=“p”,xy=T)
> par(mfcol=c(1,1))
Figure 7.3 (a) Histogram of the frequencies of the classes, and (b) stratification map
with the rainbow color scheme used to identify the strata from the environmental
stratification of the study region. (A black and white version of this figure will appear in
some formats. For the color version, please refer to the plate section.)
Clearly, these designs are arranged in a sequence from fully random to
fully regular. The four graphs are plotted as follows (Figure 7.4):
> par(mfcol=c(2,2))
> plot(B3B12.comb,main="random",col=rev(terrain.colors(25)))
> points(s.rand, pch=3, cex=.5)
> plot(B3B12.comb,main="stratified",col=rev(terrain.colors(25)))
> points(s.strt, pch=3, cex=.5)
> plot(B3B12.comb,main="nonaligned",col=rev(terrain.colors(25)))
> points(s.nona, pch=3, cex=.5)
> plot(B3B12.comb,main="regular",col=rev(terrain.colors(25)))
> points(s.regl, pch=3, cex=.5)
> par(mfcol=c(1,1))
We have now developed four different designs, and each of these data-
sets can be used to sample the environmental layer stacks as done previ-
ously. Once this is done, we can check how the different designs affect
the retrieved distribution layers sampled with the respective design. We
can compare this distribution with the known truth (when plotting the
distribution of the raster layers).
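To make the difference between the four designs concrete, the following base R sketch generates them on a unit square divided into k x k cells. It is an illustration under our own assumptions (object names, the unit square, and the way the nonaligned offsets are drawn are hypothetical) and does not reproduce how s.rand, s.strt, s.nona, and s.regl were created.

```r
set.seed(1)
k <- 5                                   # k x k grid cells, one point per cell
n <- k^2
cells <- expand.grid(col = 0:(k - 1), row = 0:(k - 1))

# (1) Random: n points placed anywhere in the unit square
s_rand <- data.frame(x = runif(n), y = runif(n))

# (2) Stratified: one random point inside each grid cell
s_strt <- data.frame(x = (cells$col + runif(n)) / k,
                     y = (cells$row + runif(n)) / k)

# (3) Nonaligned: x offset shared within a row, y offset shared within a column
x_off <- runif(k)
y_off <- runif(k)
s_nona <- data.frame(x = (cells$col + x_off[cells$row + 1]) / k,
                     y = (cells$row + y_off[cells$col + 1]) / k)

# (4) Regular: one point at the center of each grid cell
s_regl <- data.frame(x = (cells$col + 0.5) / k,
                     y = (cells$row + 0.5) / k)
```

Plotting the four data frames side by side, for example with plot(s_rand, pch = 3), shows the progression from fully random to fully regular point patterns.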
approach still has a lower risk of failing to detect rare strata. With both
strategies, even the rarest strata will still be chosen, either with a few
points (proportional) or with the same number of points as other strata
(equal), unless they are too small (see Figure 7.6). The equal number variant
assigns the same number of points to each stratum, which, depending on the
number of available classes, can result in slightly different numbers than those
originally given. The proportional variant assigns the numbers according to
the number of pixels available in the stratification grid. If the proportion
of a stratum results in less than one sample being allocated to this class,
then this class is not sampled.
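The arithmetic behind the two variants can be sketched as follows. The pixel counts and stratum names are hypothetical, and the rounding rules are our assumption; the ecospat functions may handle rounding differently.

```r
# Hypothetical number of pixels per stratum in a stratification grid
pixels  <- c(s11 = 5000, s12 = 3000, s21 = 1400, s22 = 580, s31 = 20)
n_total <- 150

# Equal allocation: the same number of points for every stratum
# (the resulting total may differ slightly from n_total after rounding)
n_equal <- rep(round(n_total / length(pixels)), length(pixels))
names(n_equal) <- names(pixels)

# Proportional allocation: points follow the share of pixels per stratum;
# strata that would receive less than one point are not sampled
n_prop <- n_total * pixels / sum(pixels)
n_prop <- round(n_prop[n_prop >= 1])

n_equal
n_prop
```

In this example the rare stratum s31 receives 30 points under equal allocation but none under proportional allocation, because its share corresponds to only 0.3 points.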
We apply the equal, and then the proportional, allocation of sample
points according to our random (environmentally) stratified design using
the ecospat.recstrat_regl() and the ecospat.recstrat_prop()
functions. These functions take the stratification grid and the
total number of points to allocate across the strata as arguments.
Finally, we plot 150 points from the two designs over the whole study
area (Figure 7.5). We can see that the proportional design yields a
distribution of strata very similar to that available in the study area,
while the equal design (even numbers per stratum) generates a more
uniform distribution of strata.
> envstrat_equ <- ecospat.recstrat_regl(B3B12.comb, 150)
> envstrat_prp <- ecospat.recstrat_prop(B3B12.comb, 150)
> par(mfcol=c(1,2))
> plot(B3B12.comb,main="Proportional Sampling",
col=rev(terrain.colors(25)))
Figure 7.6 Illustration of the class distribution between the two sampling designs.
The x-axis labels indicate the identifier of the strata from the environmental strati-
fication of the study region.
We then mask the DEM and generate a contour of the 1000 m altitude
band throughout the United States, as illustrated in Figure 7.7. This con-
tour line is now ready to use to design different line sampling strategies
by allocating sample points along this elevation contour.
> dem_usa_masked <- mask(dem_usa_10km, usa_raster)
> iso_1000m <- rasterToContour(dem_usa_masked, nlevels=1,
levels=c(1000))
> plot(dem_usa_masked,col=dem.c(100),
main="Elevation Contours at 1000m")
> lines(iso_1000m, lwd=1.3, col=2)
Figure 7.7 Elevation contour lines (1000 m) plotted in the study area. (A black and
white version of this figure will appear in some formats. For the color version, please refer to
the plate section.)
Figure 7.8 Three linear sampling designs applied along elevation contours in the
study region: (a) random, (b) stratified, and (c) regular.
8 • Ecological Scales: Issues of
Resolution and Extent
Figure 8.1 (a) Species occurrence data with positional uncertainty illustrated by cir-
cles around the noted location (+) mapped along artificial “latitude” and “longi-
tude” coordinates. (b) Remaining points with accepted positional uncertainty and
the associated spatial resolution of the raster (gray lines) used for calibration and
prediction. The excluded species occurrences are shown with a red cross.
8.1.2 Temporal Resolution
Temporal resolution is rarely properly considered in the context of mod-
eling habitat suitability and species distributions. We usually collect a set
of distribution data, and take “current” climate and environmental pre-
dictor variables to relate the distribution to these predictors. However,
there might be a strong mismatch between the observed distribution
and the drivers (predictors) responsible for shaping this distribution. We
distinguish between three different aspects, which we discuss here briefly.
The three aspects all relate to the problem of capturing the niche of
a species when overlaying observed presence and absence points with
climate and other environmental predictors and then fitting statistical
models: (i) Does “current” climate refer to the period responsible for
the observed presence of a species? (ii) Is there a time lag in a spe-
cies’ response to changes in environmental conditions or does the spe-
cies respond instantaneously to these changes? And (iii) is it possible to
8.1.3 Thematic Resolution
Thematic resolution is another scale issue that is rarely considered or
discussed in habitat suitability and species distribution modeling stud-
ies. Thematic resolution concerns the extent to which a predictor
variable resolves the thematic content it intends to represent. We can
represent precipitation in units of millimeters, centimeters, decimeters,
or meters, for example. While we may not be able to recognize the dif-
ference between millimeters and centimeters, this is clear for decimeters
or meters when representing maps in these classified units. Representing
precipitation as a classified integer unit in meters would be considered
a very coarse thematic resolution. Of course, we can always represent
any climate variable as real rather than as integer numbers. This cannot,
however, be done easily with all predictors, notably categorical variables.
So we need to distinguish two issues that we will discuss in more detail
below: (1) which thematic resolution is meaningful from an ecological
viewpoint; and (2) which thematic resolution is meaningful from a sta-
tistical viewpoint?
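The effect of the chosen unit on thematic resolution can be illustrated with simulated values. The precipitation vector below is hypothetical; the sketch simply counts how many distinct classes remain when the same values are stored as integers in millimeters, centimeters, decimeters, or meters.

```r
set.seed(3)
# Hypothetical annual precipitation values in millimeters
precip_mm <- round(runif(1000, min = 300, max = 2500))

# Number of distinct classes at each (integer) thematic resolution
n_classes <- c(
  mm = length(unique(precip_mm)),
  cm = length(unique(round(precip_mm / 10))),
  dm = length(unique(round(precip_mm / 100))),
  m  = length(unique(round(precip_mm / 1000)))
)
n_classes
```

At the resolution of meters, only a handful of classes remain, which illustrates why such a representation would be considered very coarse.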
The first question is not always easy to answer. Of course, we want
to have high thematic resolution for all the predictors used. Regarding
habitat units, geology, or soil information, we are often left with compar-
ably coarse or ecologically uninformative classifications. A time-classified
geologic stratification does not translate directly into ecologically mean-
ingful classes. We therefore often have numerous classes that have more
disadvantages than advantages for habitat suitability modeling, because
we may not have sufficient observations of presence and absence for each
of these strata. Here, we suggest reclassifying all strata that exhibit similar
ecological properties, irrespective of the age of the tectonic stratum. In this
way, we generate a temporally coarser but more meaningful thematic
stratification. We can, for example, collapse sedimentary rocks based on
their clay and calcium content, which have a direct impact on soil
development, soil pH, and nutrient availability. In the end, we have a much
smaller number of classes (thus lower thematic resolution), but these are
more useful for modeling habitat suitability, and this also directly translates
Figure 8.3 Illustration of a random stratified design along two variables, one being
a nominal class (here geology) and one being a numerical environmental variable
(here classified into classes of 100 units). The nominal variable has six classes, and the
numerical gradient has also been classified into six classes. Within each class combi-
nation, a randomly allocated set of five observations has been chosen.
This part covers the different statistical modeling approaches that can be
used to predict habitat suitability for species or other biological entities.
It does not aim to be exhaustive as this would require a book in itself.
Rather, it aims to present the modeling techniques that (i) are most com-
monly used and (ii) are implemented in R or can be easily called from
R. Numerous alternative or complementary approaches can be found in
Guisan and Zimmermann (2000), Elith et al. (2006), Franklin (2010a),
and Maher et al. (2014), for instance. As we have already seen in Part I,
selecting the appropriate modeling approaches is ultimately based upon
the ecological questions the researcher would like to address, and the
availability and accuracy of data to fit the models.
With the development of new powerful statistical techniques, the use
of HSMs in ecology has increased rapidly (see Part I, Box 2.1). These
models are static and probabilistic in nature, since they statistically relate
the distributions of populations, species, communities or biodiversity
to their contemporary environment. A wide array of models has been
developed to cover research areas as diverse as evolutionary biology,
macroecology, biogeography, functional ecology, conservation biology,
global change biology, and habitat or species management (see Guisan
and Zimmermann, 2000; Guisan and Thuiller, 2005; Thuiller et al., 2008;
Elith and Leathwick, 2009; Franklin, 2010b; Peterson et al., 2011; Guisan
et al., 2013; see Section 4.4 and Table 4.1).
In practice, ecological models can be separated into three main
types: descriptive, explanatory and predictive. This terminology is also
sometimes used to distinguish between different biogeographic approaches
(e.g. Blondel and Aronson, 1995). In the modeling literature, most discus-
sions compare the respective strengths and drawbacks of predictive versus
explanatory models (e.g. Mac Nally, 2000; Austin, 2002; Guisan et al.,
2002), with little attention paid to descriptive models.
9.1 Concepts
Presence-only approaches are the simplest and oldest methods available,
usually based on very simple rules and assumptions (see Box 2.1). They
are particular in that they deal with presence-only data with no need
to create any background or pseudo-absence data. They can roughly be
separated into two categories, envelopes (e.g. BIOCLIM, HABITAT)
and distance-based approaches (e.g. ENFA, DOMAIN, Mahalanobis
distance), which will be developed in the next two sections. In Chapter 20,
we will briefly present point-process models, which have recently
been introduced into the field of species distribution modeling and
address most of the criticisms of traditional presence-only approaches
(see below). Since point-process models are still quite new in the field
and do not specifically model species presence, but rather species density
(i.e. species records per area), we decided to not detail them in this edi-
tion. However, Maxent is a specific case of point-process models which
is fully developed in Chapter 13.
Figure 9.1 Observed and potential distribution of the red fox using a rectilinear
envelope model (sre function in the biomod2 package). The potential distribu-
tions differ by the use of different percentiles to delineate the envelope. In all maps,
black = presence, light gray = absence.
We note that predictions from SRE using 100 percent of the data
erroneously predict the southern hemisphere as being suitable for the red
fox. Using the core 95 percent quantile allows for more accurate predic-
tion of the southern hemisphere, but at the cost of underestimating the
distribution in Russia. Generally speaking, such over- and under-predictions
highlight the relatively low predictive accuracy of SRE (Elith et
al., 2006). Indeed, it assumes independent rectilinear bounds and that all
variables are known, and it will cause over-prediction when not enough
variables are included and under-prediction with too many (or even
spurious) variables (Barry and Elith, 2006). This approach, although quite
simple, should thus be used with parsimony and care. However, it does
give a quick rough estimate of the habitat suitability of a given species
without much effort. It does not expect the predictor variables to be
uncorrelated, and it can map the distribution using many different vari-
ables at the same time.
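The principle of a rectilinear envelope can be sketched in a few lines of base R. The function sre_sketch(), its quant argument, and the simulated data are hypothetical and serve only to illustrate the idea; this is not the implementation used by the sre function in biomod2. For each variable, the envelope is bounded by two percentiles of the values observed at presence sites, and a site is predicted as suitable only if it falls inside the bounds of all variables.

```r
set.seed(4)
# Hypothetical environmental data and presences
env  <- data.frame(temp = runif(500, 0, 30), prec = runif(500, 200, 2000))
pres <- env$temp > 10 & env$temp < 25 & env$prec > 500 & env$prec < 1500

# Rectilinear envelope: a site is suitable if every variable lies within
# the [quant, 1 - quant] percentiles of the values at presence sites
sre_sketch <- function(env_pres, env_new, quant = 0.025) {
  inside <- rep(TRUE, nrow(env_new))
  for (v in names(env_pres)) {
    bounds <- quantile(env_pres[[v]], probs = c(quant, 1 - quant))
    inside <- inside & env_new[[v]] >= bounds[1] & env_new[[v]] <= bounds[2]
  }
  as.integer(inside)
}

pred_100 <- sre_sketch(env[pres, ], env, quant = 0)      # full range of presences
pred_95  <- sre_sketch(env[pres, ], env, quant = 0.025)  # core 95 percent

c(full = sum(pred_100), core95 = sum(pred_95))
```

Using the core 95 percent shrinks the envelope along every variable, so the number of sites predicted as suitable can only decrease relative to the full envelope.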
Several refinements of environmental envelopes were later developed,
including DOMAIN (Carpenter et al., 1993), but we will not discuss
them here (and note that some of them are partly distance approaches;
see Section 9.3). These approaches are no longer routinely applied and
the few comparative analyses that have tested their predictive accuracy
have revealed only moderate to weak performance (Elith et al., 2006;
1
www.unil.ch/biomapper/
Figure 9.2 Ecological niche description of the red fox (function enfa() in the pack-
age adehabitatHS)
> par(mfrow = c(2, 2))
> level.plot(mammals_data$VulpesVulpes, XY = mammals_data[,
c("X_WGS84", "Y_WGS84")], color.gradient = "grey",
cex = 0.3, show.scale = F, title = "Original data")
> level.plot(en$li[, 1], XY = mammals_data[, c("X_WGS84",
"Y_WGS84")], color.gradient = "grey", cex = 0.3,
show.scale = F, title = "ENFA")
> roc_enfa <- roc(mammals_data$VulpesVulpes, en$li[, 1])
> threshold_enfa <- coords(roc_enfa, "best",
ret = c("threshold"))
> Pred01 <- as.numeric(en$li[, 1] > threshold_enfa)
> level.plot(Pred01, XY = mammals_data[, c("X_WGS84",
"Y_WGS84")], color.gradient = "grey", cex = 0.3,
show.scale = F, title = "ENFA binary")
Figure 9.3 Observed and potential distribution of the red fox modeled using ENFA.
The potential distribution is either expressed along a scale of habitat suitability val-
ues (light= low suitability to dark = high suitability), or in a binary form picturing
presence–absence (black = presence, light gray = absence).
10 • Regression-Based Approaches
10.1 Concepts
Regression-based approaches are by far the most commonly used in
ecology and other disciplines, and particularly in habitat suitability mod-
eling (Guisan et al., 2002). They usually rely on robust statistical theories
(e.g. sum of squares, maximum likelihood) and are treated in detail in
textbooks.
Regression relates a response variable (e.g. presence–absence, abun-
dance, biomass) to a set of pre-selected environmental predictors (e.g.
climate, land use, resource). The predictors can be used as untransformed
environmental variables or, in order to prevent multicollinearity in the
data, as orthogonal components derived from the environmental vari-
ables through multivariate analyses. As seen in section 6.4.2, one diag-
nostic to test for multicollinearity is the VIF (Montgomery and Peck,
1982; see Part II) and its derivation to test for various combinations
of variables. The classical ordinary least-square (OLS) linear regression
approach (often simply called linear model, LM) is theoretically valid
only when the response variable is normally distributed (i.e. Gaussian)
and the variance does not change as a function of the mean
(homoscedasticity). In other words, homoscedasticity relates to the specific
case in which the variance of the error term (i.e. the random component in
the relationship between the predictors and the response variable) is constant
across all values of the predictor variables. GLMs constitute a more flexible
family of regression models, which allow the response variable to follow
other distributions and non-constant variance functions to be mod-
eled. In GLMs, the combination of predictors (the linear predictors)
is related to the mean of the response variable through a link func-
tion. Using such link functions makes it possible to both transform the
response to linearity and maintain the predicted values within the orig-
inal range of values allowed for the response variable. By doing so, the
GLMs can handle Gaussian (e.g. biomass), Poisson (species abundance,
Table 10.1 Examples of commonly used distributions, associated families and
links for GLM. A classical ecological example is also given.
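The role of the link function can be illustrated with a small simulated example. The variable names and coefficients below are hypothetical; the sketch fits a binomial GLM with a logit link and shows that the linear predictor is unbounded, whereas the back-transformed predictions remain within the range allowed for a probability.

```r
set.seed(5)
n    <- 1000
temp <- runif(n, -5, 25)                 # hypothetical predictor
eta_true <- -3 + 0.25 * temp             # linear predictor on the logit scale
pa   <- rbinom(n, 1, plogis(eta_true))   # simulated presence-absence

# Binomial GLM with a logit link
glm_demo <- glm(pa ~ temp, family = binomial(link = "logit"))

eta_hat <- predict(glm_demo, type = "link")      # linear predictor (unbounded)
p_hat   <- predict(glm_demo, type = "response")  # inverse link, within [0, 1]

range(eta_hat)
range(p_hat)
```

The same structure applies to other families, for example family = poisson with a log link for count data, where the inverse link keeps predictions non-negative.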
> library(biomod2)
> par(mfrow = c(2, 2))
The two models glm1 and glm2 mostly differ in terms of the hypotheses
used regarding the shape of the relationship between all variables
and the presence of the species. In glm1, one assumes that linear
predictors are sufficient; in glm2, one expects quadratic relationships
(i.e. non-symmetric, unimodal or sigmoidal relationships). The poly()
function in glm2 is an effective way of dealing with the correlation between
x and x^2 and provides a more flexible response (i.e. non-symmetric
unimodal) than simply using x + I(x^2) in the formula.
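The correlation issue mentioned above is easy to verify. In the sketch below, x is a hypothetical predictor; the raw terms x and x^2 are almost perfectly correlated, whereas the two columns returned by poly(x, 2) are orthogonal.

```r
# Hypothetical predictor values
x <- seq(1, 30, length.out = 200)

# Raw linear and quadratic terms are strongly correlated
cor(x, x^2)

# poly() returns orthogonal polynomial terms of degree 1 and 2
P <- poly(x, 2)
cor(P[, 1], P[, 2])
```

The second correlation is zero up to numerical precision, which is why the terms created by poly() do not suffer from the collinearity between x and x^2.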
In this particular example, the spatial distributions of the probabil-
ity of occurrence from the two different models appear rather similar
at first glance (Figure 10.1). However, let’s examine how the modeled
responses differ in environmental space by analysing the response curves
of the species along the environmental gradients fitted in the models
(Figure 10.2).
There are several ways of visualizing the response curves of a species
for the different models. One possibility is to use a function in the bio-
mod2 package, which implements the evaluation strip method proposed
by Elith et al. (2005). This method has the advantage of being independ-
ent of the algorithm used. For building the predicted response curves,
n-1 variables are set as constants to a fixed value (mean, median, min or
max, i.e. fixed.var.metric argument) and only the remaining one
(remaining two for three-dimensional response plots) varies across its
whole range (given by Data). The variations observed and the curve
thus obtained show the sensitivity of the model to that specific variable
(Figure 10.2).
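The logic of such a response curve can be reproduced by hand for a single variable. The sketch below uses simulated data and a hypothetical model (bio_a, bio_b, and all coefficients are assumptions); it is a simplified illustration of the principle, not the biomod2 implementation. One variable is varied across its whole range while the other is fixed at its mean, and the model is then used to predict along this strip.

```r
set.seed(6)
n <- 800
d <- data.frame(bio_a = runif(n, 0, 30), bio_b = runif(n, 100, 2000))
d$pa <- rbinom(n, 1, plogis(-4 + 0.5 * d$bio_a - 0.015 * d$bio_a^2 +
                              0.001 * d$bio_b))

# Hypothetical model with a quadratic term for bio_a
m <- glm(pa ~ poly(bio_a, 2) + bio_b, data = d, family = binomial)

# Evaluation strip: bio_a varies over its range, bio_b is fixed at its mean
strip <- data.frame(
  bio_a = seq(min(d$bio_a), max(d$bio_a), length.out = 100),
  bio_b = mean(d$bio_b)
)
strip$pred <- predict(m, newdata = strip, type = "response")

head(strip)
```

The curve can then be drawn with plot(strip$bio_a, strip$pred, type = "l"). Repeating the procedure for each predictor in turn gives one response curve per variable.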
> library(ggplot2)
> ## create the response plot
> rp <- response.plot2(models = c("glm1", "glm2"),
Data = mammals_data[, c("bio3", "bio7", "bio11", "bio12")],
show.variables = c("bio3", "bio7", "bio11", "bio12"),
fixed.var.metric = "mean", plot = FALSE, use.formal.names = TRUE)
> ## define a custom ggplot2 theme
Figure 10.1 Observed (black = presence, light gray = absence) and potential distri-
bution of species Sp290 modeled by different GLM differing by the complexity of
the parameters (linear, quadratic, and second-order polynomials). The gray scale of
predictions (b, c) shows habitat suitability values between 0 (light, unsuitable) and 1
(dark, highly suitable).
Figure 10.2 Response curves of model glm1 (linear terms) and glm2 (quadratic
terms). Plotted are the probabilities of occurrence as a function of the bioclimatic
variables.
Although stepwise regression is certainly appealing and used to be one
of the most commonly used means of reducing complexity in regres-
sion-like methods, it is often deemed to be a high-variance exercise
since the slightest disturbance in the response data can sometimes lead
to vastly different subsets of the variables (Johnson and Omland, 2004;
Whittingham et al., 2006). This is especially the case when the number
of predictor variables is large (over 10) and the variables are correlated with
each other. We highly recommend, at least, reducing the number of
variables first with PCA, VIF analyses, or simple pairwise correlation tests, to
ultimately select a series of non-correlated, ecologically relevant variables
(see Part II and Dormann et al., 2013).
The last few years have also seen the development of penalized regres-
sion and shrinkage rules as alternatives to stepwise regression. Penalizing
algorithms such as “lasso” or “ridge” have gained momentum in the
statistical literature, but also in the habitat suitability modeling literature
(Hastie et al., 2009; Renner and Warton, 2013; and see Chapter 11).
Lasso (Tibshirani, 1996, 1997) and ridge (Hoerl and Kennard, 1970; Le
Cessie and van Houwelingen, 1992) provide alternative algorithms that
shrink the estimates of the regression coefficients toward zero relative to
the maximum likelihood estimates. The overarching goal of the penalty
(or shrinkage) is to accurately estimate the parameters while avoiding
overfitting either due to multicollinearity of the predictors or overly
high dimensionality (i.e. too many predictors). The ridge penalty gener-
ally leads to many small but non-zero regression coefficients, while the
lasso penalty results in few regression coefficients with little shrinkage
and the remaining ones shrunk to zero. However, as in any optimiza-
tion process, one has to decide a priori what criteria should be used to
optimally shrink the parameters. This is determined by tuning a shrink-
age parameter (usually called λ) that takes values between zero (i.e. no
shrinkage, maximum likelihood estimation) and infinity (i.e. infinite
shrinkage, all regression coefficients set to zero). The penalized package
offers interesting tools to perform lasso and ridge regressions and select
the optimal λ by means of cross-validation.
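The effect of the shrinkage parameter can be illustrated for the simplest case, a ridge penalty in a Gaussian linear model, for which a closed-form solution exists. The sketch below uses simulated, standardized predictors and a hypothetical helper ridge_coef(); it does not use the penalized package, and the lasso is not covered because it has no closed-form solution.

```r
set.seed(7)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)    # almost collinear with x1
x3 <- rnorm(n)
X  <- scale(cbind(x1, x2, x3))   # standardized predictors
y  <- as.numeric(scale(1.5 * x1 - 1 * x3 + rnorm(n), scale = FALSE))  # centered

# Ridge estimate: beta = (X'X + lambda * I)^(-1) X'y
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

lambdas <- c(0, 1, 10, 100, 1000)
coefs <- sapply(lambdas, function(l) ridge_coef(X, y, l))
colnames(coefs) <- paste0("lambda=", lambdas)

round(coefs, 3)
```

With lambda = 0 the estimates equal the ordinary least-squares solution; as lambda increases, all coefficients shrink toward zero without becoming exactly zero, which is the behavior described for the ridge penalty.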
Here, we provide an example of stepwise selection using the
stepAIC() function (in the MASS package). Let’s start by running an
intercept model that will serve as the starting model. Then, the stepAIC()
function will sequentially add and remove the different variables. There
are three important parameters in that function: scope, direction and
k. Scope can be used to specify the form of the different variables to be
We can now see the effects of the variable selection on the retained
best model (Figure 10.3).
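The general mechanics of such a stepwise search can be sketched with simulated data. For the sake of a self-contained example we use step() from the stats package, which accepts the same scope, direction, and k arguments; the data, variable names, and coefficients are hypothetical.

```r
set.seed(42)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# x3 has no effect on the simulated response
d$y <- rbinom(n, 1, plogis(-0.5 + 2 * d$x1 - 1.5 * d$x2))

# Intercept-only model used as the starting point
start <- glm(y ~ 1, data = d, family = binomial)

# scope: candidate terms; direction: add and remove terms; k = 2: AIC penalty
sel <- step(start,
            scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
            direction = "both", k = 2, trace = 0)

formula(sel)
```

Setting k = log(n) instead of k = 2 would apply the BIC penalty, which favors smaller models.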
These bivariate plots make it possible to analyse the joint effects of two
variables on the modeled probability of presence (Figure 10.4). For instance,
the probability of occurrence is high for high values of bio3 and low values
of bio7. When bio3 and bio7 are both low, the probability of occurrence
of the red fox is also low.
The variable rankings can be easily extracted using the anova()
function.
> anova(glmModAIC)
Analysis of Deviance Table
Model: binomial, link: logit
Response: VulpesVulpes
Terms added sequentially (first to last)
                Df Deviance Resid. Df Resid. Dev
NULL                             8541    11839.6
poly(bio3, 2)    2   5581.8      8539     6257.8
poly(bio7, 2)    2   1247.0      8537     5010.8
poly(bio11, 2)   2    299.1      8535     4711.7
poly(bio12, 2)   2    136.1      8533     4575.7
bio3:bio7        1    338.3      8532     4237.4
bio7:bio11       1    129.4      8531     4107.9
bio11:bio12      1     53.9      8530     4054.0
bio7:bio12       1     53.0      8529     4001.1
bio3:bio12       1      3.6      8528     3997.5
Figure 10.4 Bivariate response curves from the model glmModAIC for four predictor
variables.
The potential distribution of the red fox does not differ significantly
between the two stepwise procedures (Figure 10.5). We would have
expected larger differences primarily when using small sample sizes, but
not when using big datasets as is the case here.
> library(gam)
> gam1 <- gam(VulpesVulpes ~ s(bio3, 2) + s(bio7, 2) + s(bio11,
2) + s(bio12, 2), data = mammals_data, family = "binomial")
> gam2 <- gam(VulpesVulpes ~ s(bio3, 4) + s(bio7, 4) + s(bio11,
4) + s(bio12, 4), data = mammals_data, family = "binomial")
Note that the response curves are quite similar to those obtained from
the GLMs (Figure 10.7). Therefore, it is clear that the degree of smooth-
ing has a relatively small effect in this example. However, it is important to
carefully check the complexity of models. GAMs are data-driven and thus
prone to overfitting the data when highly complex smoothers are used.
When modeling species distributions for predictive purposes, we do not
recommend using a degree of smoothing higher than 4 or 5. Users who
want to model more complex relationships, e.g. in order to very closely
fit and predict the calibration data, may use a higher degree of smoothing,
but at the cost of reduced generalization (Merow et al., 2014).
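The trade-off between smoothing and fit can be illustrated outside the GAM framework. As a stand-in, the sketch below fits Gaussian smoothing splines with smooth.spline() from the stats package to simulated data, using a low and a very high number of degrees of freedom; the data and the chosen values are hypothetical, and the gam package is not used.

```r
set.seed(8)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.4)   # smooth signal plus noise

# Smoothing splines with a low and a very high degree of smoothing
fit_low  <- smooth.spline(x, y, df = 4)
fit_high <- smooth.spline(x, y, df = 40)

# Residual sum of squares on the calibration data
rss <- c(df4  = sum((y - predict(fit_low,  x)$y)^2),
         df40 = sum((y - predict(fit_high, x)$y)^2))
rss
```

The more flexible spline follows the calibration data more closely, but part of this gain comes from fitting noise, which is the overfitting risk described above.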
Similarly to a GLM, the gam() function supports various options for
variable selection using stepwise procedures or shrinkage rules. These
are implemented in the same way as in a GLM. It is also possible to use
a custom function for the scope argument from the biomod2 package
(function.scope()). Here we will illustrate the use of the stepwise pro-
cedure with another function called step.gam() (note however that
the stepAIC() function also works for gam() and can be implemented
in the same way as previously shown for GLM).
> gamStart <- gam(VulpesVulpes ~ 1, data = mammals_data,
family = binomial)
> gamModAIC <- step.gam(gamStart, biomod2:::.scope(mammals_data[1:3,
c("bio3", "bio7", "bio11", "bio12")], "s", 4), trace = F,
direction = "both")
In practice, when the observed relationship is linear, the GAM will also
fit a linear relationship even if the degree has been pre-set to 4. An
alternative would be to test different degrees of smoothing using
c(2,3,4) instead of 4.
The spatial prediction can easily be displayed and compared with the
observed distribution (Figure 10.8).
> par(mfrow = c(1, 2))
The mgcv package provides a lot of summary statistics that can be very
useful when carefully examined (see gam.check()). Additionally, response
curves can also be plotted using the internal functions of mgcv (Figure 10.9).
> plot(gam_mgcv, pages = 1, seWithMean = TRUE)
This makes it possible to compare the response curves from the mgcv
implementation of GAM to those from the gam package (Figure 10.10).
> rp <- response.plot2(models = c("gam1", "gam2"),
    Data = mammals_data[, c("bio3", "bio7", "bio11", "bio12")],
    show.variables = c("bio3", "bio7", "bio11", "bio12"),
    fixed.var.metric = "mean", plot = FALSE, use.formal.names = TRUE)
> gg.rp <- ggplot(rp, aes(x = expl.val, y = pred.val,
    lty = pred.name)) + geom_line() + ylab("prob of occ") + xlab("") +
    rp.gg.theme + facet_grid(~expl.name, scales = "free_x")
> print(gg.rp)
Figure 10.9 Response curves of model gam_mgcv plotted using the internal
function of mgcv.
Figure 10.10 The response curves from the model calibrated with the mgcv package
(gam_mgcv).
MARS is implemented in R in both the mda and earth packages.
Here, we use the earth package, which provides additional functions
that are not available in mda.
Very few parameters are required to fit a MARS model. One
important parameter is the maximum interaction degree,
which determines whether interactions between variables are fitted.
It is set to one by default, but more complicated response
curves are likely to be required in certain instances. In the following
examples, we thus use both a degree of 1 (no interactions) and 2
(pairwise interactions).
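To make the role of the interaction degree concrete, the following sketch builds MARS-style hinge terms by hand in base R. It is only an illustration under stated assumptions: the data are simulated, the knots (4 and 6) are fixed arbitrarily, and the hinge() helper is hypothetical, whereas earth() searches for knots and prunes terms automatically.

```r
set.seed(42)
n <- 500
x1 <- runif(n, 0, 10)                          # hypothetical predictors
x2 <- runif(n, 0, 10)

# a hinge term is zero below the knot and linear above it
hinge <- function(x, knot) pmax(0, x - knot)

eta <- -2 + 0.8 * hinge(x1, 4) - 0.3 * hinge(x2, 6)   # assumed additive truth
y <- rbinom(n, 1, plogis(eta))

# degree 1: additive combination of hinge terms (no interactions)
fit_add <- glm(y ~ hinge(x1, 4) + hinge(x2, 6), family = binomial)

# degree 2: additionally allows the product of two hinge terms
fit_int <- glm(y ~ hinge(x1, 4) * hinge(x2, 6), family = binomial)

c(additive = deviance(fit_add), pairwise = deviance(fit_int))
```

Because the degree-2 model nests the additive one, its deviance on the calibration data cannot be higher; whether the extra term is worth keeping is a question for model evaluation on left-out data.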
> library(earth)
> Mars_int1 <- earth(VulpesVulpes ~ 1 + bio3 + bio7 + bio11 + bio12,
    data = mammals_data, degree = 1,
    glm = list(family = binomial))
> Mars_int2 <- earth(VulpesVulpes ~ 1 + bio3 + bio7 + bio11 + bio12,
    data = mammals_data, degree = 2,
    glm = list(family = binomial))
> ## print the summary of objects
> Mars_int1
Earth selected 14 of 15 terms, and 4 of 4 predictors
Termination condition: Reached nk 21
Importance: bio7, bio11, bio3, bio12
Number of terms at each degree of interaction: 1 13
(additive model)
Earth GCV 0.08460021 RSS 718.0938 GRSq 0.6615926
RSq 0.6636498
GLM null.deviance 11839.56 (8541 dof) deviance 4267.856 (8528
dof) iters 11
> Mars_int2
Earth selected 18 of 21 terms, and 4 of 4 predictors
Termination condition: Reached nk 21
Importance: bio7, bio3, bio11, bio12
Number of terms at each degree of interaction: 1 4 13
Earth GCV 0.07349056 RSS 621.379 GRSq 0.7060321 RSq
0.7089503
GLM null.deviance 11839.56 (8541 dof) deviance 3625.926 (8524
dof) iters 25 did not converge
Figure 10.12 The distribution of the predicted values from MARS for both the
presence and absence of Vulpes vulpes.
> level.plot(pred_Mars_int1,
    XY = mammals_data[, c("X_WGS84", "Y_WGS84")],
    color.gradient = "grey", cex = 0.3, level.range = c(0, 1),
    show.scale = F, title = "MARS with interaction degree 1")
> level.plot(pred_Mars_int2,
    XY = mammals_data[, c("X_WGS84", "Y_WGS84")],
    color.gradient = "grey", cex = 0.3, level.range = c(0, 1),
    show.scale = F, title = "MARS with interaction degree 2")
The response curves for MARS are not shown here, as they can be
extracted using the same function as in GLM or GAM, as shown above.
Figure 10.14 (a) Observed (black = presence, light gray = absence) and potential
distribution of Vulpes vulpes extracted from the (b) MARS 1 and (c) MARS 2
objects. The gray scale of predictions (upper-right and lower-left panels) illustrates
habitat suitability values between 0 (light, unsuitable) and 1 (dark, highly suitable).
11.1 Concepts
Classification approaches, recursive partitioning, and even some of the
machine-learning approaches rely on the concept of classifying observations
into homogeneous groups (two or more). It is difficult to trace
back to the first application of classification approaches in ecology,
as many different implementations were developed to answer different
scientific questions. Cluster analysis is the approach most widely
used to group observations, based on one or several predictor variables.
Clustering is a method of unsupervised learning, and a common technique
for statistical data analysis used in many fields, including machine
learning, data mining, pattern recognition, image analysis, and
bioinformatics. Other examples of methods include supervised approaches,
such as discriminant analyses (Hastie et al., 1994), recursive partitioning
(Breiman et al., 1984; Quinlan, 1986), neural networks (Ripley, 1996;
Franklin, 2010a) or support vector machines (Drake et al., 2006).
These methods have been compared or tested in a number of studies
(e.g. Manel et al., 1999a; Loiselle et al., 2003; Thuiller et al., 2003a,
2003b; Lawler et al., 2006; Maher et al., 2014). The main finding is that,
generally speaking, classification or machine-learning approaches do
not provide better results than regression-based approaches, but some of
them are easy to understand and allow the models to be represented in a
very informative or complementary format (e.g. recursive partitioning),
or reveal properties not automatically available from other approaches
(e.g. interactions between predictors). We will detail here three different
approaches: recursive partitioning, discriminant analysis, and artificial
neural networks.
Figure 11.1 Classification tree for Vulpes vulpes using the rpart() function.
> fda_mod$confusion
         true
predicted    0    1
        0 3866  449
        1  473 3754
attr(,"error")
[1] 0.1079373
Figure 11.5 Response curve of Vulpes vulpes modeled using flexible discriminant
analysis.
As with the other algorithms, the evaluation strip method (Elith et al.,
2005) makes it possible to extract the response curves from the FDA and
visualize the shape of the modeled relationships between the species and
its environment (Figure 11.5).
> rp <- response.plot2(models = c("fda_mod"),
    Data = mammals_data[, c("bio3", "bio7", "bio11", "bio12")],
    show.variables = c("bio3", "bio7", "bio11", "bio12"),
    fixed.var.metric = "mean",
    plot = FALSE, use.formal.names = TRUE)
> gg.rp <- ggplot(rp, aes(x = expl.val, y = pred.val,
    lty = pred.name)) + geom_line() + ylab("prob of occ") + xlab("") +
    rp.gg.theme + facet_grid(~expl.name, scales = "free_x")
> print(gg.rp)
Figure 11.6 Bivariate response curve of Vulpes vulpes modeled using flexible discri-
minant analysis along four predictor variables.
Interested readers are advised to take a look at the source code for this
function by typing:
biomod2:::.CV.nnet in R
Note that due to the inherent stochasticity of NNs, the results may differ
slightly each time.
Figure 11.7 (a) Observed (black = presence, light gray = absence) and potential
distribution of Vulpes vulpes modeled using a neural network algorithm with two
different sets of (b) SIZE and (c) DECAY. The gray scale of predictions (upper-right
and lower-left panels) shows habitat suitability values between 0 (light, unsuitable)
and 1 (dark, highly suitable).
Figure 11.8 Response curve of Vulpes vulpes modeled by neural networks. The red
lines represent a first model with reasonable but not optimized parameters set for
SIZE and DECAY, while the blue line represents the final model with optimized
parameters.
12.1 Concepts
We have seen that RP methods can be used as alternative approaches to
classification (e.g. FDA) and regression techniques (e.g. GLM, GAM) for
predicting species distributions. They are not based on assumptions of
normality and user-specified model statements, as are discriminant analysis
(e.g. FDA) and OLS regression. However, as for stepwise regression, the
classification into groups can be influenced by local optima or noise in
the data. Therefore, there is not one single decision tree that best explains
the habitat suitability of a given species, but rather several trees which
perform just as accurately when predicting a response. Here we present
two different types of technique that have emerged over the last few years
and that have been mostly applied to RP, although, in theory, they can
be applied to any method. Bagging and boosting are ensemble modeling
techniques, for which a classification or regression method is applied
to various resamplings of the original data set or through a stage-based
framework, respectively. The results from each model are then combined
(ensembled) using different weighting schemes.
Bagging, short for bootstrap aggregation, was proposed by Breiman
(1996), based on the principle of bootstrapping. In this approach, a large
number of bootstrap samples are drawn from the available data (random
subsampling with replacement of rows of data), a model (e.g. RP) is
applied to each bootstrap sample, and then the results are combined into
an ensemble. The final prediction is made either by averaging the outputs
of regression tree approaches or by simple voting in the case of
classification tree approaches (committee averaging; see Section 17.3.2). This type
of procedure has been shown to drastically reduce the associated variance
of the prediction (Breiman 2001). This bagging procedure applied to RP,
together with certain other refinements (see below), has given rise to the
well-known random forests algorithm (Breiman 2001). Note that other
types of bagged trees methods exist in the machine-learning literature.
For this species, we can see that the fitted tree is slightly more compli-
cated than for V. vulpes (Figure 12.1).
> plot(RP.PantheraOnca, uniform = F, margin = 0.1, branch = 0.5,
compress = T)
> text(RP.PantheraOnca, cex = 0.8)
Figure 12.1 Classification tree for Panthera onca using the rpart() function.
Ten splits have been selected in the optimized model. How does
this value change with different cross-validation runs, for instance?
How robust is it to noisy data or small perturbations in the input
data? These are fundamental questions one should ask
when applying RP approaches, instead of taking the first decision
tree as given.
The idea of bagging is to fit several trees to different resamplings of the
original dataset and then to average the trees from the different subsamples.
A naïve bagging approach is relatively easy to generate
using a bootstrap procedure. First, the bootstrap samples can be drawn
from a multinomial distribution of parameter n (the number of sites or
plots), with the probability of drawing each plot being equal to 1/n.
> trees <- vector(mode = "list", length = 50)
> n <- nrow(mammals_data)
> boot <- rmultinom(length(trees), n, rep(1, n)/n)
We first create a complete tree with no pruning (xval=0) and then use
the update function to re-evaluate the initial tree (Full_tree) without
altering the weights (i.e. fitting a tree to a bootstrap sample specified by
the weights) and store the trees in the list called “trees.”
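A minimal sketch of this weighting step is given below, assuming simulated data and hypothetical object names (dat, fit_weighted_tree, first_split). For robustness it refits each tree with a direct rpart() call that takes the bootstrap counts as case weights, rather than calling update() on an initial tree; the principle is the same.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(123)
n <- 300
dat <- data.frame(bio3 = runif(n, 10, 90), bio7 = runif(n, 5, 60))
dat$occ <- factor(rbinom(n, 1,
  plogis(-3 + 0.08 * dat$bio3 - 0.03 * dat$bio7)))   # assumed true response

# one column of multinomial counts per bootstrap sample
n_boot <- 25
boot <- rmultinom(n_boot, n, rep(1, n) / n)

# fit an unpruned classification tree with the counts as case weights
fit_weighted_tree <- function(w) {
  d <- dat
  d$w <- w
  rpart(occ ~ bio3 + bio7, data = d, weights = w, method = "class",
        control = rpart.control(xval = 0))
}
trees <- lapply(seq_len(n_boot), function(i) fit_weighted_tree(boot[, i]))

# which variable is used for the first split in each bootstrap tree?
first_split <- sapply(trees, function(tr) as.character(tr$frame$var[1]))
table(first_split)

# averaged probability of occurrence and its variance across trees
Pred <- sapply(trees, function(tr) predict(tr, newdata = dat, type = "prob")[, 2])
Pred.AVG <- rowMeans(Pred)
Pred.VAR <- apply(Pred, 1, var)
```

A row of boot with a count of zero corresponds to a site left out of that bootstrap sample; counts above one correspond to sites drawn several times.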
We can see that across the 50 bootstraps, bio3 is always selected for the
first split. When going down the trees, it becomes clear that all the variables
could have been selected for a given split. The further we go down the
tree, the higher the variability of the selected variables.
The advantage of the bootstrap approach is that one can extract the
averaged probability (and the variance) of occurrences across all boot-
strap samples.
> Pred <- matrix(0, nrow = n, ncol = length(trees))
> for (i in 1:length(trees)) {
    # extract the prediction for each of the trees
    Pred[, i] <- predict(trees[[i]],
      newdata = mammals_data[, c("bio3", "bio7", "bio11", "bio12")],
      type = "prob")[, 2]
    # remove potential predictions with a negative weight in the
    # bootstrap procedure
    Pred[boot[, i] < 0, i] <- NA
  }
> ## calculate the average probability of occurrence (e.g.
> ## habitat suitability)
> Pred.AVG <- rowMeans(Pred, na.rm = TRUE)
> importance(RF)
0 1 MeanDecreaseAccuracy MeanDecreaseGini
bio3 91.00154 78.45245 137.51393 1823.810
bio7 36.97712 410.76112 164.58416 947.334
bio11 52.24918 61.35427 85.40074 1026.730
bio12 111.97419 90.89983 153.76295 471.509
From the curve we can see that 1000 trees are not enough to obtain a
reliable and stable model, whereas more than 5000 trees are enough.
The user can either manually select 6000 trees for making predictions
or plotting the response curves, or select the optimal number of trees
using the gbm.perf() function, which here returns 10 000. For the sake
of simplicity, we will use the output from the gbm.perf() function.
Figure 12.4 Optimal number of iterations (trees) for the GBM object. The y-axis
represents the error of the model as a function of the total number of trees (x-axis).
The black line represents the error of the calibrated model with all data, while the
grey line represents the error from the cross-validation runs.
We can see for the red fox (V. vulpes) example above that the same
ranking is obtained using either method, with the species distribution
strongly influenced by bio3 and then bio7 and bio11. All the models we
have seen so far in Part II provide the same ranking of variable import-
ance for this species.
An additional feature of gbm is the “inner” argument (i.var) used
to plot the response curve of species as a function of the environmental
variables (Figure 12.5).
> par(mfrow = c(2, 2))
> for (i in 1:ncol(mammals_data[, c("bio3", "bio7", "bio11", "bio12")]))
    plot(GBM.mod, n.trees = gbm.mod.perf, i.var = i)
Figure 12.6 Response curves of red fox as a function of one (a) or two (b) explana-
tory variables at a time.
Figure 12.7 (a) Observed (black=presence, light gray= absence) and (b) potential
distributions of the red fox modeled using a boosted regression tree approach. The
gray scale of predictions shows habitat suitability values between 0 (light, unsuitable)
and 1 (dark, highly suitable).
13 • Maximum Entropy
13.1 Concepts
In recent years, we have seen a rise in applications using the maximum
entropy principle in ecology; for instance, to predict species abundances
from functional traits (Shipley et al., 2006, 2011), to predict macroecolog-
ical patterns (Harte, 2011), or to model species distributions (Phillips et
al., 2004, 2006). From a Bayesian perspective, the principle of maximum
entropy states that, subject to known constraints, the probability distribu-
tion that best represents the data is the one with the greatest entropy, i.e.
the one which best reproduces the data. When applying Maxent to pres-
ence-only species distribution data, the space within which the Maxent
probability distribution is defined encompasses all pixels in the study
area (i.e. background information, or quadrature points, see Renner et
al. 2015), the pixels representing the distribution of species occurrences
constitute the sample points, and their environmental features are the
explanatory variables.
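The principle can be illustrated on a toy landscape of five cells. The sketch below is not the Maxent software: it assumes a single hypothetical environmental feature f, an assumed target mean for that feature at presence points, and uses the fact that the entropy-maximizing distribution under a mean constraint has the exponential (Gibbs) form p proportional to exp(lambda * f).

```r
# Shannon entropy of a discrete probability distribution
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

f <- c(1, 2, 3, 4, 5)          # hypothetical feature value in each of five cells
target <- 3.8                  # assumed mean of the feature at presence points

# Gibbs distribution over cells for a given multiplier lambda
gibbs <- function(lambda) {
  w <- exp(lambda * f)
  w / sum(w)
}

# find the lambda whose expected feature value matches the constraint
lambda_hat <- uniroot(function(l) sum(gibbs(l) * f) - target,
                      interval = c(-10, 10), tol = 1e-9)$root
p_maxent <- gibbs(lambda_hat)

# another distribution that satisfies the same constraint
p_other <- c(0, 0, 0.2, 0.8, 0)

round(p_maxent, 3)
c(unconstrained = entropy(rep(1 / 5, 5)),
  maxent = entropy(p_maxent),
  other = entropy(p_other))
```

Without any constraint the uniform distribution has the greatest entropy; once the constraint is imposed, the Gibbs solution has greater entropy than any other distribution that meets it, such as p_other here.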
The application of the maximum entropy formalism to species dis-
tribution modeling was first introduced by Phillips et al. (2004) and is
now well-developed in the standalone package Maxent.1 Although it is
not formally implemented in R, we decided to add a short introduction
here and present a way of running Maxent from R, so that the Maxent
results can be compared with those from other modeling techniques and
approaches (see Part IV and Part V). Both dismo and biomod2 can be
used to run Maxent in a batch mode. In addition, a maximum entropy
R package is currently in development (see Halvorsen et al., 2015). For
more information about Maxent, we refer interested readers to Elith
et al. (2011). For its equivalence to GLM and a more general discussion
of point-pattern process models, see Renner et al. (2015) and Chapter 20
of this book.
1 www.cs.princeton.edu/~schapire/maxent/
Maximum Entropy · 219
In light of these features, Maxent tends to overfit the data if no pen-
alty or regularization is used to down-weight unimportant variables (as
in boosted regression trees). Thus, in a similar vein to other penalties
for complexity such as Akaike’s information criterion (Akaike, 1974),
Maxent fits a penalized maximum likelihood model that aims to trade-
off model fit and model complexity (Phillips and Dudik, 2008).
13.2 Maxent in R
Maxent takes the sample points or coordinates of observed presences of
the species of interest in a comma-separated text file and the environ-
mental variables in grid formats. Here, in our implementation, Maxent
directly uses ascii grids to sample the environmental variables for the
presence locations of the species, and to define the available space for
the Maxent probability distribution. Maxent then creates the background
data (also called quadrature points, see Renner et al. 2015) with a
default number of 10 000 randomly selected points across the ascii grids.
If both presence and absence data are available and reliable, it
is generally advisable to use a presence–absence modeling method (as
seen in the previous parts of this book), as this makes the models less
susceptible to sample selection bias and means they take advantage of
all information in the data. In other words, using Maxent with true
presence and absence data is not recommended (e.g. Elith et al., 2011;
Guillera-Arroita et al., 2014).
First of all, we need to tell Maxent where the species and grid
files are located.
The path to maxent.jar should also be specified.
> ## The folder 'book.data' should be in the directory just above
> ## your working directory; test whether the data directory is
> ## well located (i.e. in dirname(getwd()))
> parent.dir <- dirname(getwd()) ## get the name of the
> ## directory where the data directory should be
> file.exists(file.path(parent.dir, "data")) ## ok if it returns TRUE
[1] TRUE
> dir.create("MaxEnt.res")
> MaxEnt.layers.dir <- "../data/bioclim"
> MaxEnt.samples.dir <- "../data/species"
> MaxEnt.out.dir <- "MaxEnt.res"
> MaxEnt.soft.path <- "../data/maxent.jar" ## the path to the
> ## maxent.jar file
> Java.soft.path <- "C:/Program Files (x86)/Java/jre7/bin/java.exe"
> ## the path to the java software binaries => to be adapted
> ## according to your computer settings
Then, we call Maxent directly from R in batch mode (see the Maxent
manual for further explanations, Elith et al., 2011, and Renner et al.,
2015, for additional code):
> ## define the shell command we want to execute
> maxent.cmd <- paste0("\"", Java.soft.path, "\" -mx512m -jar \"",
    MaxEnt.soft.path, "\" environmentallayers=\"",
    file.path(MaxEnt.layers.dir, "current", "ascii"),
    "\" samplesfile=\"",
    file.path(MaxEnt.samples.dir, "VulpesVulpes.csv"),
    "\" projectionlayers=\"",
    file.path(MaxEnt.layers.dir, "current", "bioclim_table.csv"),
    "\" outputdirectory=\"", MaxEnt.out.dir,
    "\" outputformat=logistic maximumiterations=500 jackknife ",
    "visible=FALSE redoifexists autorun nowarnings notooltips")
> ## run Maxent
> system(command = maxent.cmd)
This should normally load Maxent and run it for the species V. vulpes.
The option -mx512m gives Maxent 512 MB of RAM. One then
has to provide Maxent with the location of the environmental layers, the
sample file, the output directory, and a few more options. For instance,
we ask for the probability of occurrence (transformed from the
raw output) instead of the raw output, using outputformat=logistic.
To be able to compare Maxent predictions with those from other models
in R, we need to provide Maxent with a projection file in .csv
format (bioclim_table.csv). This file contains the coordinates and values
of explanatory variables for all grid cells (same data as in the
mammals_data table).
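Since the same command is assembled again later with different options, it can be convenient to wrap its construction in a small function. The helper below (build_maxent_cmd, a hypothetical name) is only a sketch: it reuses the paths and the Maxent options that appear in this section, only builds the character string, and does not run Maxent.

```r
# assemble a Maxent batch command; every path is wrapped in double quotes
build_maxent_cmd <- function(java, jar, layers, samples, projection, outdir,
                             memory = "512m",
                             flags = c("outputformat=logistic",
                                       "maximumiterations=500", "jackknife",
                                       "visible=FALSE", "redoifexists",
                                       "autorun", "nowarnings", "notooltips")) {
  q <- function(path) paste0("\"", path, "\"")
  paste(q(java), paste0("-mx", memory), "-jar", q(jar),
        paste0("environmentallayers=", q(layers)),
        paste0("samplesfile=", q(samples)),
        paste0("projectionlayers=", q(projection)),
        paste0("outputdirectory=", q(outdir)),
        paste(flags, collapse = " "))
}

cmd <- build_maxent_cmd(
  java = "C:/Program Files (x86)/Java/jre7/bin/java.exe",
  jar = "../data/maxent.jar",
  layers = file.path("../data/bioclim", "current", "ascii"),
  samples = file.path("../data/species", "VulpesVulpes.csv"),
  projection = file.path("../data/bioclim", "current", "bioclim_table.csv"),
  outdir = "MaxEnt.res")
cat(cmd, "\n")
```

The resulting string can be passed to system(command = cmd) exactly as in the listing above; changing the flags argument is then enough to rerun Maxent with a different set of options.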
Most of the outputs from Maxent are finally stored in the MaxEnt.res
folder (MaxEnt.out.dir):
> list.files(MaxEnt.out.dir)
 [1] "maxent.log"
 [2] "maxentResults.csv"
 [3] "plots"
 [4] "VulpesVulpes.asc"
 [5] "VulpesVulpes.html"
 [6] "VulpesVulpes.lambdas"
 [7] "VulpesVulpes_bioclim_table.csv"
 [8] "VulpesVulpes_bioclim_table_clamping.csv"
 [9] "VulpesVulpes_omission.csv"
[10] "VulpesVulpes_sampleAverages.csv"
[11] "VulpesVulpes_samplePredictions.csv"
One can then plot the predictions and compare them to the observed
data as for the other models (Figure 13.1).
> par(mfrow = c(1, 2))
> level.plot(mammals_data$VulpesVulpes,
    XY = mammals_data[, c("X_WGS84", "Y_WGS84")],
    color.gradient = "grey", cex = 0.3,
    level.range = c(0, 1), show.scale = F, title = "Original data")
> level.plot(Maxent.pred_AllFeatures[, 3],
    XY = Maxent.pred_AllFeatures[, c("X_WGS84", "Y_WGS84")],
    color.gradient = "grey", cex = 0.3, show.scale = F,
    title = "MAXENT", level.range = c(0, 1))
All results regarding the predictive accuracy and other results are stored
in the maxentResults.csv file.
> Maxent.results <- read.csv("MaxEnt.res/maxentResults.csv")
> names(Maxent.results)
The results obtained from Maxent are relatively similar to those from
the other techniques we have looked at so far.
The example run here used the default option, which allows all types
of feature. It is, however, worth considering simpler models, as we have
seen throughout the book (see also Merow et al., 2014).
Here we use only hinge features by turning off the other feature types
(linear, quadratic, threshold, and product).
> ## define the shell command we want to execute
> maxent.cmd <- paste0("\"", Java.soft.path, "\" -mx512m -jar \"",
    MaxEnt.soft.path, "\" environmentallayers=\"",
    file.path(MaxEnt.layers.dir, "current", "ascii"),
    "\" samplesfile=\"",
    file.path(MaxEnt.samples.dir, "VulpesVulpes.csv"),
    "\" projectionlayers=\"",
    file.path(MaxEnt.layers.dir, "current", "bioclim_table.csv"),
    "\" outputdirectory=\"", MaxEnt.out.dir,
    "\" outputformat=logistic nowarnings ",
    "nolinear noquadratic nothreshold noproduct maximumiterations=500 ",
    "jackknife visible=FALSE redoifexists autorun nowarnings notooltips")
> ## run Maxent
> system(command = maxent.cmd)
> Maxent.pred_Hinge <- read.csv("MaxEnt.res/VulpesVulpes_bioclim_table.csv")
Figure 13.3 Comparison between the potential distribution of the red fox mod-
eled using (a) Maxent with all features (by default) and (b) Maxent with only the
hinge feature selected. The gray scale of predictions shows habitat suitability values
between 0 (unsuitable) and 1 (highly suitable).
The predictions can then be plotted to compare the initial model
with all features and the one with only the hinge feature (Figure 13.3).
> par(mfrow = c(1, 2))
> level.plot(Maxent.pred_AllFeatures[, 3],
    XY = Maxent.pred_AllFeatures[, c("X_WGS84", "Y_WGS84")],
    color.gradient = "grey",
    cex = 0.3, level.range = c(0, 1),
    show.scale = F,
    title = "MAXENT - all features")
> level.plot(Maxent.pred_Hinge[, 3],
    XY = Maxent.pred_Hinge[, c("X_WGS84", "Y_WGS84")],
    color.gradient = "grey", cex = 0.3,
    show.scale = F, title = "MAXENT - hinge feature",
    level.range = c(0, 1))
As we can see in Figure 13.3, the maps are almost exactly the same. The
hinge feature, which behaves similarly to a GAM, is enough to predict the
distribution of the red fox. Interested readers could also take a look at the
jackknifing results, which are also the same.
This result underlines that Maxent, like any other modeling technique,
needs tuning to ensure it is correctly parameterized. This could be done
in a semi-automatic fashion once the criteria are clear (e.g. the best fit
with the simplest model).
So far, we have seen that HSMs can be implemented with a large range of
statistical tools. This raises the question of which one(s) to use, and how.
There is no simple answer to this question, but it has fueled more than ten
years of comparative analyses, for example of regression-based
versus tree-based algorithms (Thuiller et al., 2003a; Meynard and Quinn,
2007), model-based versus machine-learning-based algorithms (Manel et al.,
1999a; Segurado and Araújo, 2004), parametric versus non-parametric
algorithms (Thuiller et al., 2003a; Segurado and Araújo, 2004), and all
the other types of model contests. With just a few exceptions (e.g. Maher
et al., 2014), the main conclusions have been that presence–absence models
usually work better (Brotons et al., 2004); that the most recently
proposed approaches to HSM, such as boosting or bagging, tend to offer
higher predictive performance (Elith et al., 2006), although this usually
depends on the context, data bias, and resolution (Elith and Leathwick,
2009); and that better predictive performance at model calibration
usually comes at the expense of model transferability to new regions or to
new conditions (Randin et al., 2006).
One way of selecting a model from the plethora of existing algorithms
is to simply select the best one for the data, based on one or a set of pre-
dictive performance metrics (Thuiller, 2003, see Part IV). When mod-
eling a large number of species, one model can be selected per species,
resulting in different models selected for different species. The advantage
of this solution is that the predictive performance metric selects the best
model for the user, but it does make it more difficult to compare models
across species. An alternative to the strict selection of one single model
is to use an ensemble of models (e.g. fitted with different techniques, or
with different sets of predictors) and to derive a general prediction from
all (or a part) of them. The rationale behind using and ensembling sev-
eral models is that two or more models may have very similar predictive
where
$\Delta_i = \mathrm{AIC}_i - \min(\mathrm{AIC})$
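The weighting formula itself is not reproduced in this passage. As an illustration only, the sketch below assumes the usual Akaike-weight form, in which each model receives a weight proportional to exp(-Delta_i / 2) and the weights are rescaled to sum to one. The AIC values and predictions are hypothetical.

```r
# hypothetical AIC values for three candidate models
aic <- c(GLM = 4012.3, GAM = 4001.1, MARS = 4005.8)

# difference to the best (lowest-AIC) model
delta <- aic - min(aic)

# assumed Akaike-weight form: relative likelihoods rescaled to sum to one
w <- exp(-delta / 2) / sum(exp(-delta / 2))
round(w, 3)

# hypothetical predictions of the three models at two sites
pred <- cbind(GLM = c(0.20, 0.70), GAM = c(0.30, 0.80), MARS = c(0.25, 0.60))

# weighted-average prediction per site
ens <- as.vector(pred %*% w)
ens
```

The model with the lowest AIC receives the largest weight, and models whose AIC is much higher than the minimum contribute almost nothing to the averaged prediction.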
where $w_i P_i$ is the prediction from model $i$ (fitted using any modeling
technique), weighted by a weight of evidence in favor of this
model, but this time based on a chosen predictive accuracy metric
(e.g. AUC or TSS), ideally (but not necessarily) calculated on a left-out
partition of the data, obtained for instance through cross-validation.
As we will see in Part IV, cross-validation is one of the most widely
accepted approaches for testing the predictive accuracy of habitat suit-
ability modeling. A random part of the data is kept for calibration
(i.e. training data) while the remainder is used to test the prediction
of the model, and the whole approach is then repeated several times
Extract the environmental layers for the presence and absence points:
> Env <- extract(myExpl, DataSpecies[, c(2, 3)])
Create a dataframe to store the evaluation results for each model for each
cross-validation:
> Test_results <- as.data.frame(matrix(0, ncol = nCV, nrow = 5,
    dimnames = list(c("GLM", "GAM", "MARS", "FDA", "RF"), NULL)))
Create an array to store the predicted habitat suitability for each single
model × cross-validation combination:
> Pred_results <- array(0, c(nRow, 5, nCV),
    dimnames = list(seq(1:nRow),
      c("GLM", "GAM", "MARS", "FDA", "RF"), seq(1:nCV)))
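The cross-validation loop that fills these objects is not reproduced in this passage. The sketch below shows the general pattern of a repeated split-sample procedure for a single technique (a GLM), under stated assumptions: the data are simulated, 70% of the rows are used for calibration in each run, and the auc() helper (a rank-based estimate of the area under the ROC curve) and all object names are hypothetical.

```r
set.seed(2024)
n <- 600
dat <- data.frame(bio3 = rnorm(n), bio7 = rnorm(n))   # hypothetical predictors
dat$occ <- rbinom(n, 1, plogis(0.5 + 1.2 * dat$bio3 - 0.8 * dat$bio7))

# rank-based AUC (Mann-Whitney statistic scaled to [0, 1])
auc <- function(obs, pred) {
  r <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

nCV <- 5
test_auc <- numeric(nCV)
for (i in seq_len(nCV)) {
  # random 70% of the rows for calibration, the remainder for evaluation
  calib <- sample(n, size = round(0.7 * n))
  fit <- glm(occ ~ bio3 + bio7, family = binomial, data = dat[calib, ])
  pred <- predict(fit, newdata = dat[-calib, ], type = "response")
  test_auc[i] <- auc(dat$occ[-calib], pred)
}
round(test_auc, 3)
```

With several techniques, the same loop would store one AUC per model and run (as in Test_results) and the corresponding predictions (as in Pred_results).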
Once the cross-validation runs are computed, we can analyse the varia-
tion between the different runs and across the models. Here we will use
the ggplot2 package as an example.
> library(ggplot2)
> AUC <- unlist(Test_results)
> AUC <- as.data.frame(AUC)
> Test_results_ggplot <- cbind(AUC,
    model = rep(rownames(Test_results), times = 20))
> p <- ggplot(Test_results_ggplot, aes(model, AUC))
> p + geom_boxplot()
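The ensemble summaries used in the next step (Pred_total_mean, Pred_total_median, and Pred_total_sd) are not computed in the listings shown here. One plausible way of deriving them, sketched below with a small array of random numbers standing in for the real predictions, is to summarize Pred_results per site across all models and cross-validation runs. How the original objects were actually computed is an assumption.

```r
set.seed(7)
nRow <- 4                                   # hypothetical number of sites
nCV <- 3                                    # hypothetical number of runs
models <- c("GLM", "GAM", "MARS", "FDA", "RF")

# stand-in for the array of predicted habitat suitability values
Pred_results <- array(runif(nRow * length(models) * nCV),
                      c(nRow, length(models), nCV),
                      dimnames = list(seq_len(nRow), models, seq_len(nCV)))

# summarize each site (first dimension) over all models and runs
Pred_total_mean <- apply(Pred_results, 1, mean)
Pred_total_median <- apply(Pred_results, 1, median)
Pred_total_sd <- apply(Pred_results, 1, sd)

cbind(mean = Pred_total_mean, median = Pred_total_median, sd = Pred_total_sd)
```

The mean and median give two alternative consensus predictions, while the standard deviation indicates where the individual models and runs disagree most.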
Since the data have been modeled at a coarse (100 km) resolution,
we will transform them back to a raster stack object to facilitate the
representation.
> Obs <- rasterFromXYZ(DataSpecies[, c("X_WGS84", "Y_WGS84",
    "VulpesVulpes")])
> Pred_total_mean_r <- rasterFromXYZ(cbind(DataSpecies[,
    c("X_WGS84", "Y_WGS84")], Pred_total_mean))
> Pred_total_median_r <- rasterFromXYZ(cbind(DataSpecies[,
    c("X_WGS84", "Y_WGS84")], Pred_total_median))
> Pred_total_sd_r <- rasterFromXYZ(cbind(DataSpecies[,
    c("X_WGS84", "Y_WGS84")], Pred_total_sd))
> Out <- stack(Obs, Pred_total_mean_r, Pred_total_median_r,
    Pred_total_sd_r)
> names(Out) <- c("Observed_Vulpes_vulpes",
    "Ensemble_modeling_mean", "Ensemble_modeling_median",
    "Ensemble_modeling_sd")
> plot(Out)
Figure 14.3 Observed presence and absence of Vulpes vulpes at the global scale (a),
together with the two model averaging predictions (mean and median; b and c) and
the ensemble modeling uncertainty (sd; d).
In Part IV, we review and detail aspects of evaluating HSMs after their
calibration, including the definition of the different types of errors
and the types of metrics used to compare predictions with observations
(Chapter 15), and the type of data needed to assess, as independently
as possible, the predictive power of a model and the associated
uncertainty estimates around the final predictions (Chapter 16). The
data resampling approaches described in Chapter 16 can also be used
to run sensitivity analyses and deliver uncertainty estimates, under
the present or future conditions to which the model is applied (e.g.
Buisson et al., 2010; Carvalho et al., 2011; Thibaud et al., 2014). These
assessments are the ones commonly applied in the literature. A third
type of assessment, less commonly used, is to assess the ecological
realism of the models (e.g. shape of response curves, Elith et al., 2005;
Merow et al., 2014) and associated predictions (Guisan et al., 2006a;
Mateo et al., 2012; Thuiller et al., 2014b).
Model evaluation is a crucial step in any modeling exercise (Hastie et
al., 2009), as it evaluates the capacity of a given model to reflect “truth,”
its inherent uncertainty in the parameter estimations, and whether it
can be applied under other conditions. Evaluating HSMs is crucial if
those models are to be used for conservation planning and biodiver-
sity management (Vaughan and Ormerod, 2005; Guisan et al., 2013).
Consequently, a sound evaluation primarily depends on the intended use
of a model, and therefore on the aims of the underlying modeling study.
For instance, estimating parameter uncertainty might be more relevant
for making inferences about a given predictor variable, while prediction
uncertainty might be more closely scrutinized when the model is used
for purely predictive purposes (Hastie et al., 2009).
Elith, 2010). For instance, the commonly used area under the ROC
curve (AUC) metric only measures discrimination, while the point-
biserial correlation (see Elith et al., 2006; rpb, Linacre, 2008) meas-
ures both calibration and discrimination (Phillips and Elith, 2010).
As the AUC is the predominant choice in published HSM studies,
most models have only been evaluated from this perspective. These two
perspectives on the evaluation of presence–absence models should be
used jointly (as e.g. in Elith et al., 2006) when reporting on model
evaluation, and the procedures for their use are developed in the next
two sections.
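The difference between the two perspectives can be shown with a small simulation. In the sketch below, which assumes simulated observations and predictions and a hypothetical rank-based auc() helper, a monotone distortion of the predictions leaves their ranking, and hence the AUC, unchanged, whereas the point-biserial correlation (the Pearson correlation between the 0/1 observations and the predictions) changes.

```r
# rank-based AUC (Mann-Whitney statistic scaled to [0, 1])
auc <- function(obs, pred) {
  r <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(11)
obs <- rbinom(500, 1, 0.4)                                    # simulated observations
pred <- plogis(rnorm(500, mean = ifelse(obs == 1, 1, -1)))    # simulated predictions

# monotone distortion: same ranking of sites, different predicted values
pred_distorted <- pred^4

c(auc_original = auc(obs, pred),
  auc_distorted = auc(obs, pred_distorted))
c(cor_original = cor(obs, pred),
  cor_distorted = cor(obs, pred_distorted))
```

A model can therefore discriminate presences from absences well while its predicted values are poorly calibrated, which is why reporting both kinds of metric is informative.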
         π1    …    πk
x = 0    n01   …    n0k    n0.
x = 1    n11   …    n1k    n1.
         n.1   …    n.k    n..
As we can see in Figure 15.4, the calibration plots are relatively similar
for the three modeling techniques (each technique tending to slightly
over- or under-predict at different places along the probability of
occurrence gradient). Interestingly, predictions from the averaged model show
a better calibration plot (close to the 1:1 line) compared to single
algorithms, which further supports Laplace’s idea that the average of multiple
models predicts better than individual models (Araújo and New, 2007;
see Part III and Part V).
In a next step, trend lines (i.e. models) can be added through the points
with confidence intervals (CIs). Prediction bins whose CI contains the
diagonal represent the bins where the predictions and observations can
be considered statistically identical. Such a graph can be drawn using the
scripts in Phillips and Elith (2010). The resulting plot is displayed in
Figure 15.5.
> calibplot <- function(pred, negrug, posrug, ideal, ylim=c(0,1),
xlim=c(0,1), capuci=TRUE,
xlabel="Predicted probability of presence",
filename=NULL, title="Calibration plot", ...) {
if (!is.null(filename)) png(filename)
# lower and upper bounds: fit -/+ 2 standard errors, clipped to [0, 1]
ylow <- pred$y - 2 * pred$se
ylow[ylow<0] <- 0
yhigh <- pred$y + 2 * pred$se
if (capuci) yhigh[yhigh>1] <- 1
plot(pred$x, ylow, type="l", col="orange", ylim=ylim,
xlim=xlim, xlab=xlabel, lwd=2, ...)
> smoothingdf <- 6
> smoothdist <- function(pred, res) {
require(splines)
# logistic regression of the observations on a natural spline of the predictions
gam1 <- glm(res ~ ns(pred, df=smoothingdf), weights=rep(1,
length(pred)), family=binomial)
x <- seq(min(pred), max(pred), length = 512)
y <- predict(gam1, newdata = data.frame(pred = x),
se.fit = TRUE, type = "response")
data.frame(x=x, y=y$fit, se=y$se.fit)
}
Figure 15.4 Example of calibration plots for species Vulpes vulpes modeled and pre-
dicted using three modeling techniques and their averaged ensemble: (a) random
forest (RF), (b) flexible discriminant analysis (FDA), and (c) boosting regression trees
(BRT) and (d) average model (AVER). Different models will yield calibration curves
with different spreads.
> Data<-EvalData[1:2000,]
#true probability of presence
> RF<-Data$RF
> FDA<-Data$FDA
> BRT<-Data$BRT
> AVER<-Data$AVER
0 1
FALSE 4020 321
TRUE 319 3882
                                      Observed
                        present (1)          absent (0)           sum
predicted  present (1)  TP                   FP                   TP + FP
                        true presence        false presence       total predicted
                                             (commission error)   presences
           absent (0)   FA                   TA                   FA + TA
                        false absence        true absence         total predicted
                        (omission error)                          absences
sum                     TP + FA              FP + TA              N = TP + FP + FA + TA
                        total presences      total absences       total number of
                                                                  observations
View                        Metric                       Abbr.  Definition                       Range   Formula
Optimist’s view (no         Correct classification rate  CCR    Percentage of correct            [0, 1]  (TP + TA)/N
difference between types                                        predictions (presences and
of errors)                                                      absences)
                            Misclassification rate       MR     Percentage of false predictions  [0, 1]  (FP + FA)/N
                                                                (presences and absences)
Observer’s view (by         Sensitivity                  SE     Percentage of presences          [0, 1]  TP/(TP + FA)
column in Table 15.2)       (= true positive rate)              correctly predicted
                            False absence rate           FAR    Percentage of presences falsely  [0, 1]  FA/(TP + FA) = 1 – SE
                            (= false negative rate)             predicted
                            Specificity                  SP     Percentage of absences           [0, 1]  TA/(TA + FP)
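As a worked illustration of these formulas, the counts from the cross-tabulation printed above can be plugged in directly. This sketch assumes that the rows of that table are the thresholded predictions (FALSE/TRUE) and the columns the observations (0/1); the true skill statistic is computed as sensitivity + specificity - 1.

```r
# Counts read from the cross-tabulation shown earlier (assumed layout:
# rows = predicted FALSE/TRUE, columns = observed 0/1).
TA <- 4020; FA <- 321
FP <- 319;  TP <- 3882
N  <- TP + FP + FA + TA

CCR <- (TP + TA) / N      # correct classification rate
MR  <- (FP + FA) / N      # misclassification rate
SE  <- TP / (TP + FA)     # sensitivity
FAR <- FA / (TP + FA)     # false absence rate = 1 - SE
SP  <- TA / (TA + FP)     # specificity
TSS <- SE + SP - 1        # true skill statistic

round(c(CCR = CCR, MR = MR, SE = SE, FAR = FAR, SP = SP, TSS = TSS), 4)
```

Note that CCR and MR sum to one, as do SE and FAR, so only one member of each pair needs to be reported.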
One can also examine how the choice of threshold can change the pre-
dicted prevalence, i.e. the proportion of presences and absences across
the prediction map, across the different models (RF=random forest,
FDA=flexible discriminant analysis, BRT=boosted regression trees,
AVER=average model of the three techniques; see Part III).
# Effect of threshold choice (11 thresholds) on predicted prevalence
> pred.prev <- predicted.prevalence(EvalData, threshold = 11)
> pred.prev[, 2:6] <- round(pred.prev[, 2:6], digits = 2)
> pred.prev
$EVALUATION_METRICS
   Metric                          Value
1  "Prevalence"                    "0.492"
2  "Correct classification rate"   "0.9207"
3  "Misclassification rate"        "0.0793"
4  "Sensitivity"                   "0.8977"
5  "Specificity"                   "0.9431"
6  "Positive predictive power"     "0.0569"
7  "Negative predictive power"     "0.1023"
8  "False positive rate"           "0.9386"
9  "False negative rate"           "0.9049"
10 "Odds Ratio"                    "145.3641"
11 "Kappa"                         "0.8413"
12 "Normalized mutual information" "0.3968"
13 "True skill statistic"          "0.8408"
The previous step provided values for different metrics for a given thresh-
old. Let’s now see in detail how to obtain values for one metric across
different thresholds, taking Cohen’s Kappa as the evaluation metric, and
considering this time 0.01 increments (i.e. 99 thresholds).
> kappa100 <- ecospat.max.kappa(EvalData$AVER, EvalData$ObsNum)
> kappa100[[2]]
     [,1]                      [,2]
[1,] "Maximum K"               "0.8507"
[2,] "Correspondent threshold" "0.44"
As we will see later, the same type of analysis can be run in biomod2 and
in other R packages (e.g. PresenceAbsence).
From these types of “across threshold” analyses, a first type of
threshold-independent evaluation measure of discrimination can be derived.
Here, the “optimized” threshold (on any dataset) is simply found by cal-
culating the chosen evaluation metric for a range of possible thresholds
(e.g. from 0 to 1, with an increment of 0.01), and by then selecting the
one that maximizes the metric (assuming the response of the evaluation
metric to the threshold is unimodal). This results in a “max” value for
the chosen metric (e.g. max-Kappa or max-TSS; Table 15.3; see Liu et
al., 2005). The underlying hypothesis is that the best possible value for
the evaluation metric will reveal the predictive potential of the related
model. Indeed, a model with poor predictive capacity will obtain a low
Figure 15.7 (a) Kappa and (b) TSS plots for species Vulpes vulpes modeled and pre-
dicted using three modeling techniques and their averaged ensemble: random forest
(RF), flexible discriminant analysis (FDA) and boosting regression trees (BRT) and
average model (AVER). (A black and white version of this figure will appear in some for-
mats. For the color version, please refer to the plate section.)
score for the maximized evaluation metric, supporting the use of this
approach. For instance, Landis and Koch (1977) suggested the following
scale of judgment for Kappa: excellent K > 0.75; good 0.40 < K < 0.75;
and poor K < 0.40. The advantage of this strategy is that it applies to any
discriminant evaluation metric that can be calculated between binary
observations and binarized predictions (see Table 15.3). In this regard,
Liu et al. (2013) showed that the thresholding approach maximizing the
true skill statistic (max-TSS), which is equivalent to maximizing the
sum of sensitivity and specificity (max SSS), is particularly well suited
as it produces the same threshold using either presence–absence data or
presence-only data.
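The threshold search described above can be sketched in base R as follows. The data are simulated and the helper tss_at() is a hypothetical name introduced for this illustration; packages such as ecospat, biomod2 or PresenceAbsence provide their own implementations.

```r
set.seed(3)
obs  <- rbinom(500, 1, 0.5)                      # simulated observations
pred <- plogis(3 * (obs - 0.5) + rnorm(500))     # simulated predictions

thresholds <- seq(0.01, 0.99, by = 0.01)         # 99 candidate thresholds

# TSS = sensitivity + specificity - 1 for a given threshold
tss_at <- function(th, pred, obs) {
  bin <- as.integer(pred >= th)                  # binarized predictions
  se  <- sum(bin == 1 & obs == 1) / sum(obs == 1)
  sp  <- sum(bin == 0 & obs == 0) / sum(obs == 0)
  se + sp - 1
}

tss  <- sapply(thresholds, tss_at, pred = pred, obs = obs)
best <- which.max(tss)                           # first maximum if tied
c(threshold = thresholds[best], max_TSS = tss[best])
```

The same loop applies to any other threshold-dependent metric (e.g. Kappa) by swapping the body of the helper function.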
In biomod2, the following steps allow us to obtain all the threshold-
dependent metrics simultaneously, along with graphs showing the vari-
ation in the values along the thresholds and the maximized statistics
(Figure 15.7):
# Plotting the Kappa and TSS for each model using the function
# Find.Optim.Stat() from the package biomod2
> library(biomod2)
> library(ggplot2)
> n <- 100
Both Kappa and TSS combine information on the omission and com-
mission error rates (see Tables 15.2 and 15.3) with the correctly predicted
presences and absences. It can thus be informative to plot the variation
of these metrics together with the variation in sensitivity and specificity
across all thresholds, to see how variations in these overall metrics relate
to the variation in the rate of correctly predicted presences and absences.
This can be done using the error.threshold.plot() function in
the PresenceAbsence package (Figure 15.8).
# Plotting the error statistics as a function of threshold in
# four models
> library(PresenceAbsence)
> data <- EvalData[1:6]
> N.models <- ncol(data) - 2
> par(oma=c(0,5,0,0), mar=c(4,4,4,1), mfrow=c(2,2), cex=0.7,
cex.lab=1.4, mgp=c(2, 0.5,0))
> for (mod in 1:N.models){
error.threshold.plot(data, which.model = mod, color = TRUE,
add.legend = TRUE, legend.cex = 0.7)
}
Figure 15.8 Error threshold plots for species Vulpes vulpes modeled and predicted
using three modeling techniques and their averaged ensemble: (a) random forest
(RF), (b) flexible discriminant analysis (FDA), and (c) boosting regression trees
(BRT), and (d) average model (AVER).
curves along an axis of threshold values), the latter being often attributed
to the threshold from the curve of a ROC plot (see below) since it is also
equal to the threshold defining the inflection point of the curve.
A second type of threshold-independent discrimination metric, and
an alternative to the previous maximization metrics, uses an integrative
approach that does not require the association of a metric with a
given subjective or optimized threshold, but rather calculates it by
integrating evaluation values across the whole range of possible thresholds.
The AUC of a ROC (receiver operating characteristic; see Swets, 1988;
Fielding and Bell, 1997), originally developed during World War II for
signal detection and later used in medicine, is currently the most commonly
used integrated discrimination metric in habitat distribution modeling.
Instead of looking for the
Figure 15.9 AUC ROC plots for the species Vulpes vulpes modeled and predicted
using three modeling techniques and their averaged ensemble: random forest (RF),
flexible discriminant analysis (FDA) and boosting regression trees (BRT) and average
model (AVER).
The four models for V. vulpes deliver here rather high AUC values, all
at >0.95 (see the interpretation scale in the text above).
In this case, the threshold was set to 0.5. In the same way as for presence–
absence, a cross-validation or split-sampling procedure can be used to
find the threshold that optimizes the associated metrics.
Several other approaches have been proposed for evaluating presence-
only predictions that do not require the selection of a single threshold.
The first is the Boyce index (Boyce et al., 2002; Hirzel et al., 2006). As
initially defined, this method splits the model predictions into b regular
bins (or classes, typically 10) and then assesses the proportion of presences
actually found within each bin i compared to the proportion of modeling
cells (i.e. pixels) in the same bin, i.e. the expected proportion if the
presences were distributed randomly (called the predicted-to-expected (P/E)
ratio Fi in Hirzel et al., 2006). A model that adequately predicts the
distribution of a given species should predict large numbers of presences in the
high prediction bins (i.e. high proportion of presences with high values of
habitat suitability) and fewer and fewer presences as one moves toward the
lower prediction bins (i.e. toward low habitat suitability for the species).
In this, it is similar to drawing a calibration plot with presence–absence
data (see Section 15.1.1), but with background data instead of absences
(Phillips and Elith, 2010). Accordingly, one would expect a monotonic rela-
tionship between the mean (or median) bin value and the predicted-to-
expected (P/E) ratio Fi. The Boyce index can therefore be calculated as the
Spearman correlation between the mean/median bin value and Fi (Boyce
et al., 2002; Hirzel et al., 2006). It takes a value between -1 and +1, with
a value tending toward +1 indicating good to perfect predictions, values
around 0 indicating predictions no different from those obtained by chance,
and values toward -1 indicating counter-predictions, i.e. observing presences
in low suitability classes and observing absences in high suitability classes
(Hirzel et al., 2006). This approach has been used, for example, to compare
Figure 15.10 Boyce index plot of the Vulpes vulpes model fitted using an average of
three modeling techniques (AVER) and predicted worldwide.
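The fixed-bin version of the Boyce index described above can be sketched as follows. The suitability values and presences are simulated, and all object names are hypothetical; this is not the continuous (moving-window) variant of Hirzel et al. (2006), nor the implementation used to draw Figure 15.10.

```r
set.seed(4)
# Hypothetical suitability values for all cells, and presence cells sampled
# preferentially where suitability is high
suit_all  <- runif(5000)
pres_idx  <- sample(5000, 300, prob = suit_all^2)
suit_pres <- suit_all[pres_idx]

b      <- 10                                     # number of regular bins
breaks <- seq(0, 1, length.out = b + 1)
bin_all  <- cut(suit_all,  breaks, include.lowest = TRUE)
bin_pres <- cut(suit_pres, breaks, include.lowest = TRUE)

P  <- as.vector(table(bin_pres)) / length(suit_pres)  # proportion of presences per bin
E  <- as.vector(table(bin_all))  / length(suit_all)   # proportion of cells per bin
Fi <- P / E                                            # predicted-to-expected ratio

mean_bin <- as.vector(tapply(suit_all, bin_all, mean)) # mean suitability per bin

# Boyce index: Spearman correlation between bin value and P/E ratio
boyce <- cor(mean_bin, Fi, method = "spearman")
boyce
```

A value of Fi above 1 means that a bin holds more presences than expected from its share of the study area; a good model shows Fi rising steadily with suitability.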
The same type of plot, but with a curve that is also fitted to the points,
can be obtained, using the pocplot() function provided in Phillips and
Elith (2010), see Figure 15.11.
## POC function
# presence-only smoothed calibration plot
Figure 15.11 POC-plot of the Vulpes vulpes model fitted using an average of three
modeling techniques (AVER) and predicted worldwide.
Once one or several evaluation metrics have been chosen, the next step
is to determine which data to use for model evaluation. Using exactly
the same data used to fit the model to calculate an agreement metric
(a process often called resubstitution) is not considered a proper
evaluation because the model is not tested on independent data (Section
16.1). Resubstitution procedures do, however, provide a baseline for
comparing the same metrics measured on model predictions obtained
on independent data (Sections 16.2 and 16.3). Randomization procedures
can also be used here to complement the resubstitution procedure.
These approaches additionally assess the robustness of a
model and its goodness-of-fit measures by randomizing the data (typically
by permutation), and then testing in which proportion (across
all models fitted with the randomized data) a similar model (e.g. with
the same coefficients and similar fit) can be obtained by chance (Section
16.1). Taking an honest evaluation perspective, involving some level of
independent data, there are two basic strategies that can be followed
depending on the degree of independence of the evaluation data (or
test set) compared to the calibration data (or training set) (Guisan and
Zimmermann, 2000; Araújo et al., 2005a):
(i) Using resampling procedures (e.g. jackknife, cross-validation, boot-
strap) within the training set to assess the model’s predictive power
on partially independent data, known as “internal validation” by resa-
mpling (Section 16.2); and
(ii) Testing the model on fully independent data, kept separate from the
beginning or ideally sampled a posteriori to test the model, known
as “external evaluation” (Section 16.3).
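The contrast between resubstitution and internal validation by resampling can be sketched with a hand-written k-fold cross-validation. The data, the single predictor x and the choice of k = 5 are assumptions made for this illustration only.

```r
set.seed(5)
n   <- 300
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.5 * dat$x))   # simulated presence/absence

k       <- 5
folds   <- sample(rep(1:k, length.out = n))   # random assignment to k folds
cv_pred <- numeric(n)

for (i in 1:k) {
  test <- folds == i
  # fit on the k - 1 other folds, predict to the held-out fold
  fit  <- glm(y ~ x, family = binomial, data = dat[!test, ])
  cv_pred[test] <- predict(fit, newdata = dat[test, ], type = "response")
}

# Resubstitution: the model is evaluated on the data used to fit it
fit_all    <- glm(y ~ x, family = binomial, data = dat)
resub_pred <- fitted(fit_all)

c(resubstitution_MSE = mean((dat$y - resub_pred)^2),
  cv_MSE             = mean((dat$y - cv_pred)^2))
```

Every observation receives exactly one prediction from a model that did not see it, so cv_pred can be passed to any of the evaluation metrics in place of the resubstitution predictions.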
With this in mind, “internal evaluation” can be considered as any assess-
ment of a model within the dataset or region used to calibrate it (as
Figure 16.1 The different data partitioning strategies that can be used to evaluate a
model. k = number of partitions. Upper arrows indicate model training; lower arrows
indicate model evaluation. (Figure drawn with contributions by L. Maiorano.)
Table 16.1 The four main resampling approaches and their characteristics.
cross-validation random
Bootstrap / Bootstrap .632+   2 (*)   Random with replacement   R   R   R = 50–200   Leathwick et al. (2006); Moretti et al. (2006)
(*) Except in the very exceptional (highly improbable) cases where all observations are resampled.
[Figure 16.4 plot: AUC (y-axis, 0.4 to 1) against type of evaluation (x-axis: Resubstitution, 10-fold CV Test, Independent Test) for eight models, i.e. four species (L. oregana, L. pulmonaria, P. anomala, P. anthraspis) each modeled with the PURPOSIVE (~300 plots) and DESIGN (~800 plots) samples. Three designs are shown: DESIGN, PURPOSIVE and EVAL.]
Figure 16.4 The use of three different types of samples to illustrate the importance
of using partially independent (cross-validation, Section 16.2) and independent data
(Section 16.3) in addition to internal resubstitution (Section 16.1) to evaluate model
predictive power. The example is a modified version of that used by Edwards et al.
(2006), with permission.
Note that the computation time is much shorter than for a leave-one-out
cross-validation (LOO-CV, see Section 16.2.2), because cv.glm() here fits
only k models rather than one model per observation. In this example, we
see little evidence that using cubic or higher-order polynomial terms
leads to lower test error than simply using a quadratic fit.
The cv.glm() function produces a list with several components. The
two numbers in the delta vector contain the cross-validation results: the
raw estimate of prediction error and a bias-adjusted version. On
this dataset, the estimates are very similar to each other.
One other option is to perform an estimation of misclassification rate,
sensitivity, specificity and AUC based on cross-validation (CV) using the
Daim package.
> library(Daim)
> vulpes_data <- s_mammals_data[c(9:13,8)]
> vulpes_data$VulpesVulpes <- as.factor(vulpes_data$VulpesVulpes)
> set.seed(555)
> vulpes_RF_cv <- Daim(formula=VulpesVulpes~., model=myRF,
data=vulpes_data, labpos="1", control=Daim.control(method="cv",
k=10, k.runs=10), cutoff="cv")
> vulpes_RF_cv
Performance of the classification obtained by:
Call:
VulpesVulpes ~ bio3 + bio4 + bio7 + bio11 + bio12
Daim parameters:
method = cv, k = 10, k.runs = 10, cutoff = cv, est.method =
obs, best.cutoff = 0.4.
Result:
-----------------------------------
Error: | | cv | | apparent |
------------------------
------------------------
| 0.0646 | | 0.0000 |
-----------------------------------
> summary(vulpes_RF_cv)
Performance of the classification obtained by:
Call:
VulpesVulpes ~ bio3 + bio4 + bio7 + bio11 + bio12
Daim parameters:
method = cv, k = 10, k.runs = 10, cutoff = cv, est.method =
obs, best.cutoff = 0.4.
Figure 16.5 Plot of the Daim object generated using the Daim() function. (a) Cross-
validation mean estimate of sensitivity and specificity. (b) All CV sample estimates of
sensitivity and specificity.
Result:
-----------------------------------------
| Method: | | cv | | apparent |
=========================================
| Error: | 0.0646 | | 0.0000 |
-----------------------------------------
| Sensitivity: | 0.9507 | | 1.0000 |
-----------------------------------------
| Specificity: | 0.9215 | | 1.0000 |
-----------------------------------------
| AUC | 0.9830 | | 1.0000 |
-----------------------------------------
one observation and using the model to predict to this single left-
out observation. Then, the same operation is repeated n times, each
time leaving out a different observation until all n observations have
been left out once (Figure 16.6). This means that as many models are
fitted as there are observations (n; Table 16.1). Each observation can
therefore be associated with a prediction (i.e. from a model that was
fitted without it), so that in the end, a vector of n predictions can be
constructed. As for k-fold cross-validation, predicted values can then
be compared to real observations using any of the evaluation metrics
presented in Chapter 15.
If used to generate independent predictions, this approach is more
appropriate with very small sample sizes, when too few species
observations are available to conduct a k-fold cross-validation (Guisan and
Zimmermann, 2000). Examples of uses of jackknife include HSMs for
bats at a coarse resolution (and therefore small sample size) in Switzerland
(Jaberg and Guisan, 2001), and habitat models of geckos in Madagascar
(Pearson et al., 2007). However, unless the number of observations is very
low, in most cases a repeated split sample cross-validation approach is a
better option (Section 16.2.3). In addition, jackknife does not thoroughly assess
the stability of a model, because it only removes one observation at a
time between models, and therefore the models and associated
parameters do not differ drastically.
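The leave-one-out procedure can be written out explicitly to show both points made above: each observation gets a prediction from a model fitted without it, and the n fitted models barely differ from one another. The data are simulated and deliberately small (n = 40); all object names are hypothetical.

```r
set.seed(7)
n   <- 40                                   # deliberately small sample
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(dat$x))

loo_pred <- numeric(n)
slopes   <- numeric(n)
for (i in seq_len(n)) {
  # fit the model without observation i, then predict to that observation
  fit_i       <- glm(y ~ x, family = binomial, data = dat[-i, ])
  loo_pred[i] <- predict(fit_i, newdata = dat[i, , drop = FALSE],
                         type = "response")
  slopes[i]   <- coef(fit_i)["x"]
}

mean((dat$y - loo_pred)^2)   # leave-one-out estimate of the squared error
range(slopes)                # spread of the slope across the n models
```

The spread of slopes across the n fits is also the raw material for influence measures: an observation whose removal shifts the coefficient markedly is an influential one.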
So, is this approach at all useful? There is indeed another, more
important role for leave-one-out cross-validation, associated with the
initial aim of the Jackknife: to calculate a measure of the influence of
each single observation on the overall model or statistics (influence meas-
ure; Efron and Tibshirani, 1993). Because the models are fitted and each
As for the k-fold CV example, we call the boot library where the
cv.glm() function is located.
> library(boot)
> cv.err <- cv.glm(s_mammals_data, glm.fit)
> cv.err$delta
[1] 0.1092130 0.1092129
Note that the computation time is much longer than for k-fold
cross-validation.
Figure 16.7 Procedure for the repeated split sample (i.e. repeated twofold) cross-
validation for evaluating predictive models.
Use the subset option in glm() to fit a glm using only observations in
the training set.
Use the predict() function to estimate the response for all 2,488
observations, and the mean() function to calculate the MSE of the
1,244 observations in the validation set. Note that the -train argument
below selects only the observations that are not in the training set.
> mean((VulpesVulpes-predict(glm.fit,s_mammals_data))[-train]^2)
[1] 7.349963
Therefore, the estimated test MSE for the GLM fit is 7.349963. We can
use the poly() function to estimate the error for the second- and
third-order (cubic) polynomial regressions.
> glm.fit2=glm(VulpesVulpes~poly(bio3+bio7+bio11+bio12,2),
family="binomial", data=s_mammals_data, subset=train)
> mean((VulpesVulpes-predict(glm.fit2,s_mammals_data))[-train]^2)
[1] 8.360697
> glm.fit3=glm(VulpesVulpes~poly(bio3+bio7+bio11+bio12,3),
family="binomial", data=s_mammals_data, subset=train)
> mean((VulpesVulpes-predict(glm.fit3,s_mammals_data))[-train]^2)
[1] 3.575069
[1] 10.1622
> glm.fit2=glm(VulpesVulpes~poly(bio3+bio7+bio11+bio12,2),
family="binomial", data=s_mammals_data, subset=train)
> mean((VulpesVulpes-predict(glm.fit2,s_mammals_data))[-train]^2)
[1] 9.018095
> glm.fit3=glm(VulpesVulpes~poly(bio3+bio7+bio11+bio12,3),
family="binomial", data=s_mammals_data, subset=train)
> mean((VulpesVulpes-predict(glm.fit3,s_mammals_data))[-train]^2)
[1] 3.920435
Using these observations split into a training set and a validation set, we
find that the repeated split sample error rates for the models with linear,
quadratic, and cubic terms are 10.16, 9.02, and 3.92, respectively.
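Because a single split depends on which observations happen to fall in the training half, the split is usually repeated many times and the errors averaged. The sketch below does this on simulated data with one hypothetical predictor x; unlike the listing above, it calls predict() with type = "response" so that the squared errors are on the probability scale, which is a choice made for this illustration.

```r
set.seed(8)
n   <- 400
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- rbinom(n, 1, plogis(1 - dat$x^2))   # simulated unimodal response

# One random 50/50 split: fit on the training half, evaluate on the other half
split_mse <- function(degree) {
  train <- sample(n, n / 2)
  fit   <- glm(y ~ poly(x, degree), family = binomial,
               data = dat, subset = train)
  p     <- predict(fit, newdata = dat[-train, ], type = "response")
  mean((dat$y[-train] - p)^2)
}

# Repeat the split 50 times for linear, quadratic and cubic models
reps <- 50
res  <- sapply(1:3, function(d) mean(replicate(reps, split_mse(d))))
names(res) <- c("linear", "quadratic", "cubic")
res
```

Averaging over repetitions removes most of the split-to-split variation that makes the two runs shown above give different error rates for the same model.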
Figure 16.8 Procedure for the bootstrap approach exemplified on a small sample
containing n = 3 observations. Each bootstrap dataset contains n observations, sam-
pled with replacement from the original dataset. Each bootstrap dataset is used to
obtain an estimate of α for evaluating predictive models. Adapted from Hastie et al.
(2009) and James et al. (2013), with permission.
where R is the number of bootstrap samples, $t_r^{*}$ is the value of the metric
or model parameter of interest estimated in sample r, and $\bar{t}^{*}$ is the mean
of the empirical bootstrap values.
The bias B is estimated as the difference between the model parameter
calculated on all data (full model, $t_0$) and the mean of the estimates
calculated on the empirical bootstrap values, as follows:

$$B = \frac{1}{R}\sum_{r=1}^{R}\left(t_r^{*} - t_0\right) = \bar{t}^{*} - t_0$$
With GLMs for instance, bias-corrected values of the explained deviance,
of the predictor coefficients and of other model parameters can
be obtained in this way (Harrell et al., 1996). When the difference
between the parameter of the full model and the mean of the empirical
bootstrapped values is too high (what is called “optimism from
overfitting”; Harrell et al., 1996), the predictive ability of the
model can be questioned (see SDM examples in Moretti et al., 2006;
Marcelli et al., 2012).
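The bias and standard error estimates can be computed by hand in a few lines, which makes the roles of the full-model value and the bootstrap values explicit. The data are simulated, the statistic of interest is assumed to be the slope of a single predictor, and R = 200 is an arbitrary choice; the boot package offers a more complete implementation.

```r
set.seed(9)
n   <- 300
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + dat$x))

# Statistic of interest: slope of x in a binomial GLM
slope <- function(d) coef(glm(y ~ x, family = binomial, data = d))["x"]

t0 <- slope(dat)                      # estimate on all data (full model)
R  <- 200                             # number of bootstrap samples
t_star <- replicate(R, slope(dat[sample(n, n, replace = TRUE), ]))

B  <- mean(t_star) - t0               # bias: mean bootstrap value minus t0
SE <- sd(t_star)                      # bootstrap standard error
c(t0 = unname(t0), bias = unname(B), se = SE)
```

A bias that is large relative to the standard error would be a warning sign of the optimism from overfitting mentioned above.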
Measuring bias and variance was the initial aim of bootstrap (Efron
and Tibshirani, 1993), as measuring influence values was the initial aim
of jackknife (see Section 16.2.2). However, bootstrap can also be used
as an alternative to cross-validation to obtain data for evaluation (Efron
and Tibshirani, 1997). We previously saw that on average in each subset,
bootstrap selects 63.2% of all observations in the full dataset (Efron and
Tibshirani, 1993). This means conversely that, again on average, at each
iteration 36.8% of the observations will not be resampled and therefore
can be used as data for evaluation. Although this was not the original
intent of bootstrap, it can therefore be used as a powerful alternative to
k-fold cross-validation. Referred to as the .632+ bootstrap, it provides
more robust estimates than k-fold CV because it is repeated a much
larger number of times (Efron and Tibshirani, 1997; Hastie et al., 2009).
However, one drawback is that, unlike the repeated split sample, a
different number of observations is left out at each bootstrap iteration. This
means the proportion of independent data is uneven between iterations,
and therefore the programming of the procedure and the statistical
calculations are more complex. Although the potential exists for using the
.632+ bootstrap in the context of habitat suitability modeling (as presented in
Efron and Tibshirani, 1997; see Robinson et al., 2011 for an example
with GAMs used in forestry), we are only aware of two examples (Wintle
et al., 2005; Leathwick et al., 2006).
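The 63.2% figure, and the fact that the left-out fraction changes from one iteration to the next, can be checked with a short simulation. The sample size of 2,488 matches the dataset used in the examples; the number of repetitions (500) is an arbitrary choice.

```r
set.seed(10)
n <- 2488                                   # sample size used in the examples

# Fraction of distinct observations drawn in each of 500 bootstrap samples
frac_in <- replicate(500, length(unique(sample(n, n, replace = TRUE))) / n)

mean(frac_in)               # average fraction of observations resampled
1 - (1 - 1/n)^n             # exact expectation, close to 1 - exp(-1)
range(n - frac_in * n)      # number of left-out observations per iteration
```

The probability that a given observation is never drawn in n draws with replacement is (1 - 1/n)^n, which tends to exp(-1), about 0.368, as n grows; this is where the 63.2% and 36.8% figures come from.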
The bootstrap approach can thus be used to assess both the variabil-
ity of the coefficient estimates and predictions from a statistical predict-
ive model (normal bootstrap) and the predictive power when used as a
cross-validation method (.632+ bootstrap). This section shows examples
of applying bootstrap for both uses. We first illustrate the use to assess the
variability of the estimates for b0 and b1, the intercept and slope terms
> boot.fn(s_mammals_data,sample(2488,2488,replace=TRUE))
Call:
boot(data = s_mammals_data, statistic = boot.fn, R = 1000)
Bootstrap Statistics :
        original          bias    std. error
t1* -1.0382810741  8.206176e-03 0.4820269002
t2* -0.1787174774 -1.144172e-03 0.0131623288
t3*  0.0216654829  9.378300e-05 0.0010757700
This indicates that the bootstrap estimate for SE(b0) is 0.4820, and
that the bootstrap estimate is 0.0132 for SE(b1), 0.0011 for SE(b2),
etc. These can be compared to the analytical standard errors for the
regression coefficients obtained by the summary() function applied
to the GLM:
> summary(glm(VulpesVulpes~bio3+bio7+bio11+bio12,
family="binomial", data=s_mammals_data))$coef
The standard error estimates obtained using the formulae
are 0.4958 for the intercept, 0.0113 for the slope of bio3, 0.0011 for
the slope of bio7, 0.0010 for the slope of bio11, and 0.0001 for the slope
of bio12. Interestingly, these are somewhat different from the bootstrap
estimates. This indicates a potential problem with the analytical
estimates. Below, we compute the bootstrap standard error estimates and the
standard glm estimates that result from fitting the quadratic model to the
data (bio3). Since this model provides a good fit to the data, there is now
a better correspondence between the bootstrap estimates and the
standard estimates of SE(b0), SE(b1), and SE(b2).
> boot.fn=function(data,index)
coefficients(glm(VulpesVulpes~bio3+I(bio3^2),
family="binomial",data=data,subset=index))
> set.seed(555)
> boot(s_mammals_data,boot.fn,1000)
Call:
boot(data = s_mammals_data, statistic = boot.fn, R = 1000)
Bootstrap Statistics :
        original         bias    std. error
t1* -5.826502239 -4.134456e-02 0.4758100058
t2*  0.535307238  3.321060e-03 0.0365987705
t3* -0.009090535 -5.514673e-05 0.0005977474
> summary(glm(VulpesVulpes~bio3+I(bio3^2),family="binomial",
data=s_mammals_data))$coef
> summary(vulpes_RF)
Call:
VulpesVulpes ~ bio3 + bio4 + bio7 + bio11 + bio12
Daim parameters:
method = boot, nboot = 50, replace = TRUE, boot.size = 1,
cutoff = 0.5,
est.method = obs.
Result:
------------------------------------------------------------------
| Method: | .632+ | | .632 | | loob | | apparent |
==================================================================
| Error: | 0.0472 | | 0.0448 | | 0.0710 | | 0.0000 |
------------------------------------------------------------------
| Sensitivity: | 0.9491 | | 0.9518 | | 0.9237 | | 1.0000 |
------------------------------------------------------------------
| Specificity: | 0.9559 | | 0.9582 | | 0.9339 | | 1.0000 |
------------------------------------------------------------------
| AUC | 0.9889 | | 0.9897 | | 0.9800 | | 1.0000 |
------------------------------------------------------------------
> par(mfrow=c(2,2))
> plot(vulpes_RF, method="0.632+", legend=TRUE)
> plot(vulpes_RF, method="sample")
> plot(vulpes_RF, method="0.632+",
main="Comparison between methods")
> plot(vulpes_RF, method="0.632", col="blue", add=TRUE)
Figure 16.9 Plot of the Daim object generated by the Daim() function correspond-
ing to ROC curves for various bootstrap evaluation methods. (a) The method
“0.632+” discussed in the main text, (b) all the bootstrap samples, (c) a comparison
of methods bottom left, and (d) all the bootstrap samples plotted together with the
one of the “0.632+” method bottom right. FPR: false positive (presence) rate (1
–specificity), TPR: true positive (presence) rate (sensitivity), loob: leave-one-out
bootstrap. See Efron and Tibshirani (1993), Efron and Gong (1983), and Efron and
Tibshirani (1997).
> plot(vulpes_RF, method="loob", col="green", add=TRUE)
> legend("bottomright", c("0.632+","0.632","loob"),
col=c("red","blue","green"), lty=1, inset=0.01)
> plot(vulpes_RF, all.roc=TRUE)
This function can also be used to obtain the optimal cut-point cor-
responding to the “0.632+ bootstrap” estimation of the sensitivity and
the specificity. In the following example, the best cut-point corresponds
to 0.46.
> vulpes_RF2 <- Daim(formula=VulpesVulpes~., model=myRF,
data=vulpes_data, labpos="1", control=Daim.control(method="boot",
number=100), cutoff="0.632+")
> summary(vulpes_RF2)
Call:
VulpesVulpes ~ bio3 + bio4 + bio7 + bio11 + bio12
Result:
------------------------------------------------------------------
| Method: | .632+ | | .632 | | loob | | apparent |
==================================================================
| Error: | 0.0462 | | 0.0438 | | 0.0694 | | 0.0000 |
------------------------------------------------------------------
| Sensitivity: | 0.9570 | | 0.9590 | | 0.9351 | | 1.0000 |
------------------------------------------------------------------
| Specificity: | 0.9508 | | 0.9536 | | 0.9266 | | 1.0000 |
------------------------------------------------------------------
| AUC | 0.9891 | | 0.9899 | | 0.9803 | | 1.0000 |
------------------------------------------------------------------
time period, (ii) in a different geographic area but same time period, (iii)
in a different time period but same geographic area, (iv) in a different
area and time period (some cases of biological invasions under climate
change typically fall into this category). However, since the
considerations developed for points (ii) and (iii) should apply conjointly for point
(iv), we will not discuss the latter any further herein.
When running an independent evaluation in the same area and time
period, one of the potential problems is that the training and test data-
sets are not spatially independent, i.e. that their observations are spatially
autocorrelated, therefore reducing the “independence” of the test obser-
vations, as regards the training observations. In order to assess this, the
spatial independence between the test and training sets can be tested
with spatial autocorrelation methods, as done for instance in Pottier et
al. (2013) (see also: Bahn et al., 2006; Bahn and McGill, 2007; Beale et
al., 2013; Fithian et al., 2015). In turn, when the evaluation is run in the same area
and time period, the spatial structure of the environmental predictors
remains the same between the training and test sets, so that the model
can be confidently transferred from one situation to the other. However,
it will not guarantee that the model can be applied to another area (e.g.
Randin et al., 2006) or time period (e.g. Araújo et al., 2005a) where the
spatial co-variation between predictors is different or has changed (e.g.
Wenger and Olden, 2012), as when attempting to anticipate biological
invasions (e.g. Thuiller et al., 2005b) or the impact of climate change on
We have already seen in Parts I and IV that HSMs can be used to make
predictions in time and/or space. For the purposes of convenience, in this
section “projection” will be used to refer to any prediction made outside
of the study area or time period used to train the model. We will also
at times refer to this procedure as transferability in space and time. One
can, for instance, project: (i) to a different area to anticipate biological
invasions (e.g. Thuiller et al., 2005b; Gallien et al., 2010; Petitpierre et al.,
2012); (ii) to future time periods to assess the possible impact of climate
change on species ranges or diversity (e.g. Engler et al., 2011a; Thuiller et
al., 2011); (iii) to both other areas and time periods, to assess the future
state of invasions in a changed climate (e.g. Roura-Pascual et al., 2004;
Broennimann and Guisan, 2008; Peterson et al., 2008b); (iv) to past
periods (hindcasting; e.g. Espíndola et al., 2012; Maiorano et al., 2013); or
(v) to present distribution from past records (forecasting; Pearman et al.,
2008b). However, additional assumptions have to be made to make these
transfers.
This part is composed of a single chapter (Chapter 17), divided into
four sections. The first section introduces the additional assumptions
made when projecting models in space and time. The second and third
sections then present approaches and examples of projections in space and
time respectively. Finally, the fourth section presents the use of ensemble
modeling for projections. This part is therefore based on, and comple-
ments, Parts III and IV, by showing how previously fitted and discussed
models can be used to generalize projections in space and time. When
predicting to different study areas or time periods (i.e. projecting), we
will see that new issues arise, such as niche completeness, niche stability,
and environmental analogy, and that these require careful consideration
before making or interpreting any projections.
Figure 17.2 below can be generated with the following R code, using
the niche quantification and comparison functions from the ecospat
R package.
# load climate variable for all sites of the North American study
# area (column names should be x,y,X1,X2,...,Xn)
> clim2<-read.table("tabular/bioclim/current/clim.vulpesEU_100.txt",
h=TRUE)
> occ.sp1 <-na.exclude(ecospat.sample.envar(dfsp=occ.sp1,
colspxy=1:2,colspkept=NULL,dfvar=clim1,colvarxy=1:2,colvar="all",
resolution=1))
> occ.sp2 <-na.exclude(ecospat.sample.envar(dfsp=occ.sp2,
colspxy=1:2,colspkept=NULL,dfvar=clim2,colvarxy=1:2,colvar="all",
resolution=1))
> row.w.1.env<-1-(nrow(clim1)/nrow(clim12))
# prevalence of clim1
> row.w.2.env<-1-(nrow(clim2)/nrow(clim12))
# prevalence of clim2
> row.w.env<-c(rep(row.w.1.env, nrow(clim1)),rep(row.w.2.env,
nrow(clim2)), rep(0, nrow(occ.sp1)), rep(0, nrow(occ.sp2)))
> fac<-as.factor(c(rep(1, nrow(clim1)),rep(2, nrow(clim2)),
rep(1, nrow(occ.sp1)),rep(2, nrow(occ.sp2))))
# global dataset for the analysis and rows for each sub dataset
> data.env.occ<-rbind(clim1,clim2,occ.sp1,occ.sp2)[Xvar]
> row.clim1<-1:nrow(clim1)
> row.clim2<-(nrow(clim1)+1):(nrow(clim1)+nrow(clim2))
> row.clim12<-1:(nrow(clim1)+nrow(clim2))
> row.sp1 <-(nrow(clim1)+nrow(clim2)+1):(nrow(clim1)+nrow(clim2)+
nrow(occ.sp1))
> row.sp2 <-(nrow(clim1)+nrow(clim2)+nrow(occ.sp1)+1):(nrow(clim1)+
nrow(clim2)+nrow(occ.sp1)+nrow(occ.sp2))
## PCA-ENV
> z1 <-ecospat.grid.clim.dyn(scores.clim12,scores.clim1,
scores.sp1,R,th.sp=0)
> z1$z.uncor <- z1$Z
> z2 <-ecospat.grid.clim.dyn(scores.clim12,scores.clim2,
scores.sp2,R,th.sp=0)
> z2$z.uncor <- z2$Z
Figure 17.3 Illustration of the niche truncation problem for an oak species with
restricted range in Europe, Quercus crenata. Figure based on Thuiller et al. (2004b),
with permission. (a) Response curves along the mean temperature of the coldest
month from GAM (generalized additive models) fitted with restricted (two differ-
ent levels of truncation) and full ranges (see the two thick lines above and below
the plot), and (c) how GAM handles the extrapolation along the whole temperature
gradient. Note that the curve is forced to zero in the second truncation case, whereas
in the more severe truncation, the GAM forces the curve to increase again below
temperatures of zero. (b) Spatial prediction based on the truncated model, showing
incorrect predictions of the species in Scandinavia. (d) Spatial prediction with the
full-range model, showing the correct prediction to the observed distribution range,
in the South of France and Italy.
Figure 17.5 Comparison of the realized niches of the red fox (Vulpes vulpes) between
its native distribution in Eurasia and both native and invaded distribution in North
America. (a) Realized climatic niche in Eurasia; (b) realized climatic niche in North
America; (c) overlap of the climatic niches between the two ranges, showing the
stable (shared, overlapping) portion of the two niches in dark, and the differing niche
conditions between the two ranges in black and grey, black showing conditions
found only in North America and grey showing conditions found only in Eurasia.
be observed (as in Figure 17.5). The question thus becomes how much
of its fundamental niche a species occupies in the field at a given time
(Maiorano et al., 2013), i.e. is it reduced to a smaller realized niche, and
if so by how much? For instance, in the case of the red fox example
used throughout this book, quantifying the niche in Eurasia and North
America separately reveals slight niche differences, although the species
is native in both ranges (Figure 17.5). This shows that each range only
captures part of the full realized niche.
Figure 17.5 can be generated with the following R code. Following
the previous calculation of environmental density, we can calculate the
occurrence density for the species in each range.
# Calculation of occurrence density using the
# ecospat.grid.clim.dyn() function from the ecospat package
> z1 <-ecospat.grid.clim.dyn(scores.clim12,scores.clim1,
scores.sp1,R,th.sp=0)
> z2 <-ecospat.grid.clim.dyn(scores.clim12,scores.clim2,
scores.sp2,R,th.sp=0)
When detecting changes in the realized niche, the next question is: are
there any species properties that allow us to predict how much of its fun-
damental niche a species occupies? For instance, if a species is dominant
across its full range of tolerances and has good dispersal ability, it is likely
to occupy a larger part of its fundamental niche than subordinate species
or species with limited dispersal abilities. This may influence whether the
niche can be safely projected in space and time (Pearman et al., 2008b).
However, very few studies have so far attempted to quantify the dif-
ference between the fundamental and realized niche (e.g. Malanson et
al., 1992; Vetaas, 2002; Kearney and Porter, 2004; Wharton and Kriticos,
2004; Araújo et al., 2013). This is because this question is extremely dif-
ficult to assess from empirical data on species distributions, and experi-
mental in situ and ex situ studies are also needed to explore this issue (but
see Araújo et al., 2013).
The evolutionary explanation of niche change relates to a change of
the fundamental niche of species, e.g. through evolution in the new range
or in the new period (Dietz and Edwards, 2006). This could theoretically
be caused by founder effects followed by genetic drift or natural selection
in the case of biological invasions (Pearman et al., 2008b), as discussed by
Lavergne and Molofsky (2007) for an invasive grass species.
Another crucial question is how to measure such changes in the
realized niche (Guisan et al., 2014). Depending on the
statistical approach and test used, there may be different answers to the
same question (Pearman et al., 2008a; Warren et al., 2008; Guisan et al.,
2014). For instance, Warren and colleagues (2008) reviewed two dis-
tinct tests of niche differences in geographical space, later generalized
in environmental space by Broennimann et al. (2012). This highlights
a first important dichotomy between existing tests in two approaches
(Broennimann et al., 2012; Guisan et al., 2014; Figure 17.6): (i) tests in
environmental space (i.e. ordination), using multivariate ordinations; (ii)
tests in geographic space, using predictions of ecological niche models
(Figure 17.6).
Figure 17.6 The two approaches commonly used to quantify niche changes between
ranges. Ordination is based only on the observations, whereas HSM is based only
on the predictions (see reference 22 and Box 1 in Guisan et al. 2014). The steps
for ordination are (square numbers): 1. Definition of the reduced multidimensional
environmental space; 2. Plotting the observations from each range in this space; 3.
Comparing the niche defined from observations in each range; 4. Calculating the
niche change metrics (see Box 3 in Guisan et al. 2014). The steps for HSMs are: 1.
Fitting HSMs by relating field observations to environmental variables; 2. Projecting
the HSMs in geographic space; 3. Computing differences in the projections; 4.
Calculating the niche change metrics. See Guisan et al. (2014) for discussion of
the respective strengths and weaknesses of the two approaches. Figure from Guisan
et al. (2014), with permission. (A black and white version of this figure will appear in some
formats. For the color version, please refer to the plate section.)
Using both approaches, niches can be further tested for being strictly
equivalent (test of niche equivalency) or for being more similar to one
another than to any random niche fitted in the same realized environ-
ment (test of niche similarity; Warren et al., 2008; Broennimann et al.,
2012). For instance, in the case of biological invasions, the test of niche
equivalency is usually so strict that it is rejected (often slightly, in both
spaces) for most species between their native and invaded geographic
ranges (Petitpierre et al., 2012), and so it would prevent projecting pre-
dictions to other areas for most species. On the other hand, niche simi-
larity only tests if the two niches (in different time periods or areas)
This produces the red fox model using a point dataset. In this case, the
points originate from a range map that has been sampled at regular spa-
tial intervals, so the file does not differ much from a spatial raster file.
In many other cases, however, individual point locations are available
(as downloaded e.g. for Pinus edulis Engelm. in Section 6.2.9) as is usu-
ally the case when using or downloading museum-type data e.g. from
Figure 17.7 Simulated global habitat suitability of Vulpes vulpes using a simple GAM
model and five bioclim variables as predictors.
Figure 17.8 Spatial map of standard errors around the observation points for the
GAM model of Vulpes vulpes.
Figure 17.9 Spatial distribution of errors around predictions for the Vulpes vulpes
GAM model.
Usually, one finds an obvious pattern to such errors: a few pixels
have very high errors, while most pixels have comparatively low
standard errors. In our example, such “error pixels” are mostly found
along coasts and at the edges of the distribution, in areas that are
rather marginal with regard to the species’ distribution range.
From Figure 17.10, we can see that the differences between the two
regions are rather minor, except in the upper left corner. This means that,
in general, climates are similar in the two regions.
Another complementary approach is to apply the MESS (multivariate
environmental similarity surface) method (Elith et al., 2010) as
implemented in the dismo package in R. This approach measures the
environmental similarity of a point (e.g. presence data) to the reference
environment. In other words, it quantifies how far or close the projected
area is to the training points. Negative values indicate dissimilar
points: the more negative the value, the more dissimilar the point.
Figure 17.10 Differences in the ecological space of Vulpes vulpes between Old (red)
and New (blue) World climates as mapped in (a) the geographic space and (b) the
PCA space based on four bioclim variables.
For instance, if we want to estimate the MESS between the training
points of V. vulpes in Europe compared to North America:
> library(dismo)
> vulpes_east<-mammals_data[mammals_data$X_WGS84>-13.0,
c(1:2,8:13)]
> vulpes_ne<-vulpes_east[vulpes_east$Y_WGS84>30,]
> vulpes_europe<-vulpes_ne[vulpes_ne$X_WGS84<60,]
> Mess.Vulpes <-mess(biostack.curr, vulpes_europe[,c(4, 6:8)])
> plot(Mess.Vulpes)
> points(vulpes_oldnew[,1:2], col=cols, pch=16, cex=0.3)
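To make the meaning of the MESS values more concrete, the calculation can be sketched in base R. This is a minimal illustration of the definition given by Elith et al. (2010), not the dismo implementation; the function name `mess_sketch` and the toy data frames `ref` and `new` are hypothetical.

```r
# Minimal base-R sketch of the MESS idea (not the dismo code).
# 'ref' holds predictor values at the training points, 'new' holds the
# same predictors (same column order) at the points to be assessed.
mess_sketch <- function(ref, new) {
  stopifnot(ncol(ref) == ncol(new))
  sim <- sapply(seq_len(ncol(ref)), function(j) {
    r <- ref[[j]]
    p <- new[[j]]
    mn <- min(r)
    mx <- max(r)
    # percentage of reference values strictly below each new value
    f <- 100 * sapply(p, function(v) mean(r < v))
    ifelse(f == 0, 100 * (p - mn) / (mx - mn),
      ifelse(f <= 50, 2 * f,
        ifelse(f < 100, 2 * (100 - f),
          100 * (mx - p) / (mx - mn))))
  })
  sim <- matrix(sim, nrow = nrow(new))
  # MESS is the similarity of the most dissimilar variable
  apply(sim, 1, min)
}

# Toy example: two predictors observed between 0 and 10
ref <- data.frame(temp = 0:10, prec = 0:10)
new <- data.frame(temp = c(5, 20, -5), prec = c(5, 5, 5))
mess_values <- mess_sketch(ref, new)
mess_values
```

The first point lies inside the reference range and receives a positive value, whereas the second and third points lie outside the range of `temp` and receive negative values that grow with the distance from that range.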
Figure 17.13 Projected habitat suitability of Vulpes vulpes under (a) current and
(b) projected future climate, mapped over the extent of North America from a glob-
ally fitted GAM model.
> bio3r.fu<-raster("raster/bioclim/future/grd/bio3.grd")
> bio7r.fu<-raster("raster/bioclim/future/grd/bio7.grd")
> bio11r.fu<-raster("raster/bioclim/future/grd/bio11.grd")
> bio12r.fu<-raster("raster/bioclim/future/grd/bio12.grd")
> biostack.fut<-stack(bio3r.fu,bio7r.fu,bio11r.fu,bio12r.fu)
> names(biostack.fut)
[1] "bio3" "bio7" "bio11" "bio12"
It appears that due to the naming of the raster files in the “grd” folder on
the hard drive, the names in the raster stack are exactly the same as those
in the current climate stack (biostack.curr). This is important for the
next step. Let’s now project the V. vulpes GAM model to future climates
and map the resulting predictions across North America in order to assess
range changes. For this step, the names of the predictor variables have
to precisely match those used to fit the model. The map shows that the
predicted habitat suitability of V. vulpes is likely to expand toward more
northern latitudes (Figure 17.13).
> vulpes.fut <-predict(biostack.fut, gam1, type="response")
> vulpes.na.cur<-crop(vulpes.curr, extent(-170,-50,10,90))
> vulpes.na.fut<-crop(vulpes.fut, extent(-170,-50,10,90))
> par(mfrow=c(1,2))
> plot(vulpes.na.cur, col=two.colors(start="grey90",
end="firebrick4", middle="orange2"),main="Current climate")
> plot(vulpes.na.fut, col=two.colors(start="grey90",
end="firebrick4", middle="orange2"),main="Future climate")
> par(mfrow=c(1,1))
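Because the projection only works when predictor names match those used for fitting, it can be useful to check the names explicitly before predicting. The helper below is a hypothetical base-R sketch (`check_predictor_names` is not part of raster or mgcv); the name vectors are typed by hand here, whereas in practice they would come from, for example, `names(biostack.fut)`.

```r
# Hypothetical helper: stop with an informative message if any predictor
# used to fit the model is missing from the new data.
check_predictor_names <- function(model_vars, new_names) {
  missing_vars <- setdiff(model_vars, new_names)
  if (length(missing_vars) > 0) {
    stop("Predictors missing from the new data: ",
         paste(missing_vars, collapse = ", "))
  }
  invisible(TRUE)
}

# Names assumed to have been used when fitting the model
fit_vars <- c("bio3", "bio7", "bio11", "bio12")

# Matching names: the check passes
ok <- check_predictor_names(fit_vars, c("bio3", "bio7", "bio11", "bio12"))

# A renamed layer: the check fails and reports the missing predictor
bad <- tryCatch(
  check_predictor_names(fit_vars, c("bio3.fu", "bio7", "bio11", "bio12")),
  error = function(e) conditionMessage(e)
)
bad
```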
Figure 17.14 Future habitat suitability for Vulpes vulpes predicted by (a) a GAM
model, and (b) its associated uncertainty.
areas also correspond to the most northern latitude the habitat suitability
is projected to expand to (Figure 17.14).
> biostack.fut_df <-as.data.frame(rasterToPoints(biostack.fut))
> vulpes.fut_se <-predict(gam1, biostack.fut_df,
type="response", se.fit=T)
> vulpes.fut_se <- rasterFromXYZ(cbind(biostack.fut_df[,1:2],
vulpes.fut_se), biostack.fut)
> vulpes.fut_se<-crop(vulpes.fut_se, extent(-170,-50,10,90))
> names(vulpes.fut_se) <-c("Habitat suitability future climate",
"Habitat suitability -Uncertainty")
> plot(vulpes.fut_se, col=two.colors(start="grey90",
end="firebrick4", middle="orange2"))
Figure 17.15 Variation in the true skill statistics in the 20-fold repeated split sampling
procedure.
We have now run 20-fold repeated split sampling for five different tech-
niques and used all the calibrated models to project the potential future
climatic suitability for the species.
Let’s first look at the quality of the models to judge whether some
of the techniques or runs need to be discarded due to poor quality
(Figure 17.15). Here we will use the ggplot2 package.
# Variation in TSS between models and cross-validation runs
> library(ggplot2)
> TSS <-unlist(Test_results)
> TSS <-as.data.frame(TSS)
> Test_results_ggplot <- cbind(TSS,
model=rep(rownames(Test_results), times=20))
# Variability in predictive accuracy between cross-validation
# runs and models.
> p <-ggplot(Test_results_ggplot, aes(model, TSS))
> p + geom_boxplot()
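As a reminder of what each value in this boxplot represents, the true skill statistic is sensitivity plus specificity minus one. The base-R sketch below computes it from observed and binarized predicted values; the function name `tss_sketch` and the toy vectors are hypothetical and are not taken from the `Test_results` object used above.

```r
# TSS = sensitivity + specificity - 1, computed from binary vectors
tss_sketch <- function(obs, pred_bin) {
  tp <- sum(obs == 1 & pred_bin == 1)  # true positives
  fn <- sum(obs == 1 & pred_bin == 0)  # false negatives
  tn <- sum(obs == 0 & pred_bin == 0)  # true negatives
  fp <- sum(obs == 0 & pred_bin == 1)  # false positives
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  sensitivity + specificity - 1
}

# Toy test set: 4 presences and 6 absences
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)
tss_value <- tss_sketch(obs, pred)
tss_value
```

A model that predicts every test point correctly reaches a TSS of 1, while a TSS close to 0 indicates performance no better than random.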
As already seen in Part III, the observed presences and absences of the
red fox are modeled relatively well under current climatic conditions
(Figure 17.16). Both ensemble forecasts (mean and median) gave similar
predictions. The uncertainty maps show areas where the models tend
to differ across the different runs of repeated split sampling. These are
mostly concentrated in North Africa where the models not only tend to
over-predict southward, but also disagree with each other.
What would happen under future conditions? Here, we have used
projections of future climate by 2080 under the A1FI scenario down-
loaded from the Worldclim dataset. Using the same strategy as for the
current conditions, we first transform the point data into raster stacks:
Figure 17.16 Observed presence and absence of Vulpes vulpes at (a) the global scale,
together with (b and c) the two model averaging predictions (mean and median);
and (d) the ensemble modeling uncertainty (sd).
Figure 17.17 Observed presence and absence of Vulpes vulpes at (a) global scale,
together with (b, c) the two model averaging projections for 2080 (mean and
median) and (d) the ensemble modeling uncertainty (sd).
We can first contrast the mean probability of the ensemble forecast with
the committee averaging; the two should be correlated to a certain extent.
# Link between committee averaging and mean probabilities across
# the models and repetitions.
> plot(ProjFuture_CA$CA,ProjFuture_total_mean,
xlab="Committee averaging", ylab="Mean probability")
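The difference between the two quantities plotted here can be illustrated with a small base-R sketch. The matrix `probs` and the vector `thresholds` are hypothetical toy values (in practice each model would have its own threshold, for instance the one maximizing TSS); they are not the objects used in the example above.

```r
# Toy predicted probabilities: 3 sites (rows) x 4 models (columns)
probs <- matrix(
  c(0.9, 0.8, 0.7, 0.6,
    0.6, 0.4, 0.7, 0.3,
    0.1, 0.2, 0.3, 0.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("site", 1:3), paste0("model", 1:4))
)

# One binarization threshold per model (assumed values)
thresholds <- c(0.5, 0.5, 0.5, 0.5)

# Mean probability: average of the continuous predictions
mean_prob <- rowMeans(probs)

# Committee averaging: binarize each model, then average the votes
votes <- sweep(probs, 2, thresholds, FUN = ">=")
committee_avg <- rowMeans(votes)

cbind(mean_prob, committee_avg)
```

A committee averaging value of 0.5 means that half of the models predict a presence, which is the situation of greatest disagreement among models.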
Figure 17.18 Relationship between Vulpes vulpes projections into the future from
either committee averaging (x-axis) or mean probability across all techniques and
repeated split sampling (y-axis).
Figure 17.19 Future climatically suitable sites for Vulpes vulpes according to the
different committee averaging procedures. (a) CA_GLM, (b) CA_GAM, (c) CA_
MARS, (d) CA_FDA, (e) CA_RF represent the committee averaging for GLM,
GAM, MARS, FDA and random forest across the 20 different repeated split sam-
pling, while (f) CA_ALL represents the committee averaging across all techniques
and repeated split sampling runs.
# Species range change
> SRG <- 100*(colSums(ProjFuture_results_bin)-
sum(FutureEnv$VulpesVulpes))/sum(FutureEnv$VulpesVulpes)
> SRG_ToPlot <-as.data.frame(as.numeric(SRG))
> SRG_ToPlot$Model <-rep(c("GLM","GAM","MARS","FDA","RF"), 20)
> colnames(SRG_ToPlot)[1] <- "SRG"
> library(ggplot2)
> ggplot(SRG_ToPlot, aes(SRG)) +
geom_histogram(aes(y = ..density.., fill = ..count..),
binwidth=1) +
geom_density() + # Density
scale_fill_gradient("Count", low = "lightgrey", high = "black") +
xlab("Species Range Change (%)")
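The species range change computed above is simply the relative difference between the number of sites predicted as suitable in the future and the number currently occupied. A minimal base-R sketch with hypothetical toy data (the vector `current` and the matrix `future_bin` are invented for illustration) shows the arithmetic:

```r
# Toy data: 10 sites, 4 of which are currently occupied
current <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)

# Binary future projections from three hypothetical models (columns)
future_bin <- cbind(
  modelA = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),  # 5 suitable sites
  modelB = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0),  # 4 suitable sites
  modelC = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)   # 3 suitable sites
)

# Species range change in percent, one value per model
srg <- 100 * (colSums(future_bin) - sum(current)) / sum(current)
srg
```

Note that this metric compares totals only: modelB gains one site and loses another, yet its range change is zero.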
On average, we can see that the red fox is predicted to increase its total
range by about 9–10% (Figure 17.20). However, we can also see that,
depending on the technique and the split-sampling run, the expected
species range could vary from a small reduction (-1%) to an almost 25%
increase. Employing several techniques and several split-sampling runs
means we can have more confidence in the projections, which in this
case give an increase of around 10%.
Indeed, we can see that even for a given modeling technique, high
levels of variation can be found (Figure 17.21).
> p <-ggplot(SRG_ToPlot, aes(SRG, colour=Model))
> p + geom_density()+ xlab("Species Range Change (%)")
Figure 17.21 Density plot representing the variation in modeled species range for
Vulpes vulpes for each technique due to the different repeated split sampling runs.
1
http://r-forge.r-project.org/projects/biomod/
2
www.worldclim.org/download
3
www.unil.ch/hsdm
4
www.worldclim.org
| Part | Chapter | Section | File name | Path: > setwd("PATH/data/") | Type | Source (Book website or code) |
|---|---|---|---|---|---|---|
| 2 | 6 | 6.2.2 | bio3.grd | ~/raster/bioclim/current/grd/ | grid | biomod2 package |
| 2 | 6 | 6.2.2 | bio7.grd | ~/raster/bioclim/current/grd/ | grid | biomod2 package |
| 2 | 6 | 6.2.2 | bio11.grd | ~/raster/bioclim/current/grd/ | grid | biomod2 package |
| 2 | 6 | 6.2.2 | bio12.grd | ~/raster/bioclim/current/grd/ | grid | biomod2 package |
| 2 | 6 | 6.2.2 | GTOPO30.tif | ~/raster/topo/ | Tiff | biomod2 package |
| 2 | 6 | 6.2.4 | isolines.shp | ~/vector/globe/ | Shape file | |
| 2 | 6 | 6.2.5 | latitude.tif | ~/raster/other/ | Tiff | |
| 2 | 6 | 6.2.5 | longitude.tif | ~/raster/other/ | Tiff | |
| 2 | 6 | 6.2.6 | hillshade.tif | ~/raster/topo/ | Tiff | |
| 2 | 6 | 6.2.8 | prec_30yr_normal_annual.asc | ~/raster/prism/ | ascii | PRISM project |
| 2 | 6 | 6.2.8 | tave_30yr_normal_annual.asc | ~/raster/prism/ | ascii | PRISM project |
| 2 | 6 | 6.2.9 | pinus_edulis_occ.csv.txt | ~/tabular/species/ | csv | GBIF July 2014 |
| 2 | 6 | 6.2.10 | cal.txt | ~/tabular/species/ | txt | Calculated from the mammals_and_bioclim_table.csv dataset using the ecospat.caleval() function from ecospat package |
| 2 | 6 | 6.2.10 | eva.txt | ~/tabular/species/ | txt | Calculated from the mammals_and_bioclim_table.csv dataset using the ecospat.caleval() function from ecospat package |
| Part | Chapter | Section | File name | Path: > setwd("PATH/data/") | Type | Source (Book website or code) |
|---|---|---|---|---|---|---|
| 3 | 14 | 14 | bio7.grd | system.file("external/bioclim/current/bio7.grd",package="biomod2") | grd | biomod2 package |
| 3 | 14 | 14 | bio11.grd | system.file("external/bioclim/current/bio11.grd",package="biomod2") | grd | biomod2 package |
| 3 | 14 | 14 | bio12.grd | system.file("external/bioclim/current/bio12.grd",package="biomod2") | grd | biomod2 package |
| 4 | 15 | 15.1.1 | EvalData.txt | ~/tabular/ | txt file | models calculated from the mammals_and_bioclim_table.csv dataset |
| 4 | 16 | 16.2.1 | summary_mammals_and_bioclim.csv | ~/tabular/species/ | csv | |
| 5 | 17 | 17.1 | clim.vulpesEU_100.txt | ~/tabular/bioclim/current | txt file | obtained from biomod2 package dataset |
| 5 | 17 | 17.1 | clim.vulpesNA_100.txt | ~/tabular/bioclim/current | txt file | obtained from biomod2 package dataset |
| 5 | 17 | 17.1 | vulpes_eu.txt | ~/tabular/species/ | txt file | obtained from biomod2 package dataset |
| 5 | 17 | 17.1 | vulpes_na.txt | ~/tabular/species/ | txt file | obtained from biomod2 package dataset |
| 5 | 17 | 17.3 | bio3.grd | ~/raster/bioclim/future/grd/ | grd | |
| 5 | 17 | 17.3 | bio7.grd | ~/raster/bioclim/future/grd/ | grd | |
| 5 | 17 | 17.3 | bio11.grd | ~/raster/bioclim/future/grd/ | grd | |
| 5 | 17 | 17.3 | bio12.grd | ~/raster/bioclim/future/grd/ | grd | |
| 6 | 19 | 19.1 | protea laurifolia presence records | | data frame | downloaded from GBIF with R code |
| 6 | 19 | 19.1 | worldclim data current | ~/WorldClim_data | Tiff | downloaded from worldclim with R code |
| 6 | 19 | 19.1 | worldclim data future 50 | ~/WorldClim_data | Tiff | downloaded from worldclim with R code |
| 6 | 19 | 19.1 | worldclim data future 70 | ~/WorldClim_data | Tiff | downloaded from worldclim with R code |
| 6 | 19 | 19.1 | south africa shape file | download.file(url = "https://sourceforge.net/projects/biomod2/files/data_for_example/south_of_africa.zip", destfile = "south_of_africa.zip") | Shape file | biomod2 package |
| 6 | 19 | 19.2 | larus presence records | | data frame | downloaded from GBIF with R code |
| 6 | 19 | 19.1 | worldclim data current | ~/WorldClim_data | Tiff | downloaded from worldclim with R code |
| 6 | 19 | 19.1 | worldclim data future 50 | ~/WorldClim_data | Tiff | downloaded from worldclim with R code |
| 6 | 19 | 19.1 | worldclim data future 70 | ~/WorldClim_data | Tiff | downloaded from worldclim with R code |
Table 18.2 A list of the R packages used to perform the examples illustrated in
this book.
variables. It should be noted, however, that the presences of the red fox
in some other invaded areas, mainly Australia and New Zealand, are not
included and therefore worldwide projections should be interpreted
accordingly. This red fox dataset is well suited to the different examples
because it has a global extent (world) which means everyone can easily
understand the illustrations. A coarser resolution version of the dataset is
also directly available in the biomod2 package, and the dataset used in
this book, which has a higher resolution (166 km) is available through
the book website.5 In this type of dataset, the standard functions in R and
several external libraries have been used to prepare the data and perform
the modeling analyses. A list of the R packages required for the examples
and analysis is also provided in Table 18.2.
Many of the resources used in this book are available on the book
website at www.unil.ch/hsdm.
5
www.unil.ch/hsdm
Data
This example relies on presence-only data downloaded from GBIF,
which will then require the creation of a set, or several sets, of pseudo-
absence data. The explanatory variables are raster grid data downloaded
from the WorldClim datacenter.
Modeling Steps
• Loading and formatting the presence-only data
• Loading and formatting the raster data
• Building a range of models and ensemble models using biomod2
• Decomposing the models’ variability (predictive ability /predictions)
• Projections under current and future conditions
• Species’ range change estimates.
1
www.gbif.org/
Figure 19.1 Protea laurifolia flower and leaves. (Photo from
www.flickr.com/photos/flowcomm/.)
2
www.plantzafrica.com/plantnop/protealauri.htm
3
www.gbif.org/species/5637308
$`7468831`
[1] "no data found, try a different search"
$`8498444`
[1] "no data found, try a different search"
$`5637308`
# A table: 290 × 5
attr(,"args")
attr(,"args")$taxonKey
attr(,"args")$fields
[1] "name" "key" "country"
[4] "decimalLatitude" "decimalLongitude"
It appears that only the item “5637308” is data so we can remove the
other ones.
> data <-data[['5637308']]
# unzip climatic files
> unzip(zipfile = "WorldClim_data/current_bioclim_10min.zip",
exdir = "WorldClim_data/current",
overwrite = T)
> list.files("WorldClim_data/current/bio/")
[1] "bio_1" "bio_10" "bio_11" "bio_12" "bio_13" "bio_14"
[7] "bio_15" "bio_16" "bio_17" "bio_18" "bio_19" "bio_2"
[13] "bio_3" "bio_4" "bio_5" "bio_6" "bio_7" "bio_8"
[19] "bio_9" "info"
> unzip(zipfile = "WorldClim_data/2050_BC_45_bioclim_10min.zip",
exdir = "WorldClim_data/2050/BC_45",
overwrite = T)
At this point, we have all the species and climate data we need for this
example.
The bioclimatic variables are stored in grid format. We will first stack
them, so they can all be found in one file.
> library(raster)
> bioclim_world <-stack(list.files("WorldClim_data/current/bio",
pattern = "bio_", full.names = T), RAT = FALSE)
We will start with a shape file for the whole of Southern Africa from the
biomod2 package, stored in the biomod2 repository. Then, since we are
First, we convert the raster object into a data frame to run the PCA (but
note that PCA can now deal with raster objects). We also need to remove
the non-defined area from this dataset.
> bioclim_ZA_df <- na.omit(as.data.frame(bioclim_ZA))
> head(bioclim_ZA_df)
Two points, located in the top-left corner of the graph (Figure 19.2),
are far from all other points and can thus be considered as outliers.
Because these outliers can seriously distort the analyses, and considering
that in practice there are a number of good reasons for excluding them
(e.g. if these are obviously numerical errors), they can easily be removed
and the PCA performed again.
# tail of distributions
> sort(pca_ZA$li[, 1])[1:10]
[1] -24.646 -24.463 -8.758 -8.462 -8.292 -8.290 -8.138 -8.136
[9] -8.131 -7.979
# IDs of points to remove
> (to_remove <-which(pca_ZA$li[, 1] < -10))
[1] 4069 4070
# remove points and re-compute PCA
> if(length(to_remove)){ ## remove outliers
bioclim_ZA_df <- bioclim_ZA_df[-to_remove,]
pca_ZA <-dudi.pca(bioclim_ZA_df, scannf = F, nf = 2)
}
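The same screen-and-refit logic can be tried on simulated data without the South African rasters. The sketch below is an illustration only: it swaps ade4's `dudi.pca()` for base R's `prcomp()`, uses an invented data frame `env` with two artificial outliers, and applies an arbitrary threshold of 10 on the absolute first-axis score (the sign of a PCA axis is arbitrary, hence the absolute value).

```r
set.seed(1)

# Simulated environmental table: 200 ordinary sites, 3 variables
env <- as.data.frame(matrix(rnorm(200 * 3), ncol = 3))
names(env) <- c("v1", "v2", "v3")

# Append two artificial outliers (rows 201 and 202)
env <- rbind(env, data.frame(v1 = c(30, 31), v2 = c(30, 29), v3 = c(30, 30)))

# PCA on centered and scaled variables
pca <- prcomp(env, center = TRUE, scale. = TRUE)

# Rows with an extreme score on the first axis
to_remove <- unname(which(abs(pca$x[, 1]) > 10))
to_remove

# Remove the outliers and recompute the PCA
if (length(to_remove)) {
  env_clean <- env[-to_remove, ]
  pca_clean <- prcomp(env_clean, center = TRUE, scale. = TRUE)
}
nrow(env_clean)
```

In a real analysis the threshold should be chosen after inspecting the tail of the score distribution, as done above with `sort()`.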
Figure 19.2 Plot of the principal component analysis scores of the first two axes of
the South African environmental space.
The next step could be to investigate the distribution of our target spe-
cies, Protea laurifolia, in the environmental space defined by the PCA.
First, we investigate how Protea laurifolia is distributed along the first two
PCA axes. Secondly, we illustrate the projection of the selected biocli-
matic variables over the same two PCA axes.
> par(mfrow=c(1, 2))
# Discriminate Protea laurifolia presences from the entire
# South African environmental space.
> s.class(pca_ZA$li[, 1:2], fac= factor(rownames(bioclim_ZA_df)
%in% ProLau_cell_id, levels = c("FALSE", "TRUE"),
labels = c("background", "ProLau")), col=c("red", "blue"),
csta = 0, cellipse = 2, cpoint = .3, pch = 16)
> mtext("(a)", side = 3, line = 3, adj = 0)
> s.corcircle(pca_ZA$co, clabel = .5)
> mtext("(b)", side = 3, line = 3, adj = 0)
Figure 19.3 Distribution of the points of Protea laurifolia (ProLau) in the environ-
mental space defined by the first two PCA axes (a) and correlation circle of the
selected bioclimatic variables (see full names at worldclim website) as a function of
the same first two PCA axes (b).
5
www.worldclim.org/bioclim
Biomod2 Formatting
The first step is to put the data into the right format. This is done using
the BIOMOD_FormatingData() function where we have to provide
the species occurrences and associated coordinates, the environmental
conditions, and the name of the species of interest. In this example we
will start with presence-only data, but because most niche models need
both presences and absences, we need to sample a set of pseudo-absences/
background data from the South African landscape (see Part II). Since
this process implies a stochastic procedure caused by the random selec-
tion (potentially stratified) of the pseudo-absences, it is recommended
that several sets of pseudo-absences data are built to prevent sampling
bias, especially for moderate or low numbers of pseudo-absences (say
<1000). A suite of tests will be carried out to investigate the effect of
each pseudo-absence selection on the predictive ability of the models. In
biomod2, routines are built in to carry out the pseudo-absence selection
under different procedures. Here we will use the simplest one, a ran-
dom sampling, and repeat it three times with a selection of 500 pseudo-
absences/background data.
> library(biomod2)
> ProLau_data <-BIOMOD_FormatingData(
resp.var = rep(1, nrow(ProLau_occ)), expl.var = bioclim_ZA_sub,
resp.xy = ProLau_occ[, c('decimalLongitude', 'decimalLatitude')],
resp.name = "Protea.laurifolia", PA.nb.rep = 3,
PA.nb.absences = 500, PA.strategy = 'random')
sp.name = Protea.laurifolia
4 explanatory variables
bio_5 bio_7 bio_11 bio_19
Min. :188 Min. :142 Min. : 35 Min. : 3.0
1st Qu.:276 1st Qu.:241 1st Qu.: 96 1st Qu.: 21.0
Median :300 Median :275 Median :111 Median : 37.0
Mean :299 Mean :272 Mean :112 Mean : 72.2
3rd Qu.:321 3rd Qu.:313 3rd Qu.:124 3rd Qu.: 80.0
Max. :383 Max. :352 Max. :189 Max. :429.0
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# plot of selected pseudo-absences
> plot(ProLau_data)
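The idea behind repeated random pseudo-absence selection can be sketched in base R. This is not a description of biomod2's internal code: the cell counts, the object names and the choice to exclude presence cells from the candidate pool are assumptions made for the illustration.

```r
set.seed(123)

# Hypothetical study area of 5000 grid cells, 290 of them with presences
n_cells <- 5000
presence_cells <- sample(n_cells, 290)

# Candidate cells for pseudo-absences: every cell without a presence
candidates <- setdiff(seq_len(n_cells), presence_cells)

# Three independent random draws of 500 pseudo-absences each
pa_sets <- replicate(3, sample(candidates, 500), simplify = FALSE)
names(pa_sets) <- paste0("PA", 1:3)

# Overlap between the first two draws: the sets differ, which is why
# their effect on the models needs to be evaluated
length(intersect(pa_sets$PA1, pa_sets$PA2))
```

Fitting the models once per set and comparing the evaluation scores then shows how much the results depend on the particular draw.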
Biomod2 Modeling
We now come to the main step where SDMs are parameterized and
fitted.
Although the default parameters should reflect the settings most com-
monly used in published SDM studies, users still have the option to
fine-tune the parameters for each algorithm separately. In the example
below, we specify the use of quadratic terms and first-order interactions
in GLMs, to limit the number of trees to 1000 in GBMs, and to use the
“mgcv” package to fit the GAMs. As we decided to run the RFs with
Figure 19.4 Plot of the species distribution (occurrences) and three selected sets of
pseudo-absences.
# ProLau_models_scores is a 5 dimension array containing the
# scores for the models
> dim(ProLau_models_scores)
[1] 3 4 4 4 3
> dimnames(ProLau_models_scores)
[[1]]
[1] "KAPPA" "TSS" "ROC"
[[2]]
[1] "Testing.data" "Cutoff" "Sensitivity" "Specificity"
[[3]]
[1] "GLM" "GBM" "RF" "GAM"
[[4]]
[1] "RUN1" "RUN2" "RUN3" "RUN4"
[[5]]
Protea.laurifolia_PA1 Protea.laurifolia_PA2 Protea.laurifolia_PA3
"PA1" "PA2" "PA3"
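An array with this layout can be indexed by name to summarize scores along any dimension. The sketch below builds a hypothetical array `scores` with the same dimension names but random values (it does not use the real biomod2 output) and computes the mean and standard deviation of the test TSS for each algorithm.

```r
set.seed(2)

# Hypothetical 5-dimensional score array filled with random values:
# metric x statistic x algorithm x cross-validation run x PA dataset
scores <- array(
  runif(3 * 4 * 4 * 4 * 3),
  dim = c(3, 4, 4, 4, 3),
  dimnames = list(
    c("KAPPA", "TSS", "ROC"),
    c("Testing.data", "Cutoff", "Sensitivity", "Specificity"),
    c("GLM", "GBM", "RF", "GAM"),
    paste0("RUN", 1:4),
    paste0("PA", 1:3)
  )
)

# TSS on the testing data: algorithm x run x pseudo-absence dataset
tss_test <- scores["TSS", "Testing.data", , , ]
dim(tss_test)

# Mean and standard deviation of TSS per algorithm
mean_by_algo <- apply(tss_test, 1, mean)
sd_by_algo <- apply(tss_test, 1, sd)
round(cbind(mean = mean_by_algo, sd = sd_by_algo), 3)
```

Replacing the margin `1` by `2` or `3` in `apply()` gives the same summary by cross-validation run or by pseudo-absence dataset.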
Graphical tools can also be used to assess the influence of the differ-
ent choices made when parameterizing the models (e.g. the choice of
algorithm (Figure 19.5), cross-validation run (Figure 19.6), pseudo-
absences sampling (Figure 19.7)) according to the selected evalu-
ation metrics. Here we focus on TSS and AUC (ROC scores) only.
On these graphs, the points represent the mean of evaluation score
for a given condition and the lines represent the associated standard
deviations.
> models_scores_graph(ProLau_models, by = "models",
metrics = c("ROC","TSS"), xlim = c(0.5,1), ylim = c(0.5,1))
Figure 19.5 Plot of the mean of the model evaluation scores (by algorithms)
according to two different evaluation metrics, ROC (AUC) and TSS.
Figure 19.6 Plot of the mean of the model evaluation scores (by cross-validation)
according to two different evaluation metrics, ROC (AUC) and TSS.
Figure 19.7 Plot of the mean of the model evaluation scores (by dataset)
according to two different evaluation metrics, ROC (AUC) and TSS.
> models_scores_graph(ProLau_models, by = "cv_run",
    metrics = c("ROC", "TSS"), xlim = c(0.5, 1), ylim = c(0.5, 1))
> models_scores_graph(ProLau_models, by = "data_set",
    metrics = c("ROC", "TSS"), xlim = c(0.5, 1), ylim = c(0.5, 1))
Figure 19.8 Plot of the response curves of a model (glm) to each variable.
Figure 19.9 Plot of the response curves of four variables in a GBM (generalized
boosting model) for Protea laurifolia.
Figure 19.10 Plot of the response curves of four variables in a RF (random forest)
model for Protea laurifolia.
Figure 19.11 Plot of the response curves of four variables in a GAM (generalized
additive model) for Protea laurifolia.
As for single algorithm models, we can check the scores for the ensem-
bles of models.
> (ProLau_ensemble_models_scores <- get_evaluations(ProLau_ensemble_models))
$Protea.laurifolia_EMcvByTSS_mergedAlgo_mergedRun_mergedData
      Testing.data Cutoff Sensitivity Specificity
KAPPA           NA     NA          NA          NA
TSS             NA     NA          NA          NA
ROC             NA     NA          NA          NA
$Protea.laurifolia_EMcaByTSS_mergedAlgo_mergedRun_mergedData
      Testing.data Cutoff Sensitivity Specificity
KAPPA        0.906  884.0       93.62       98.02
TSS          0.939  237.0      100.00       93.92
ROC          0.995  239.5      100.00       93.92
$Protea.laurifolia_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData
      Testing.data Cutoff Sensitivity Specificity
KAPPA        0.908  752.0       93.62       98.10
TSS          0.938  642.0       96.81       96.96
ROC          0.995  642.5       96.81       96.96
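The TSS values in this output can be cross-checked against the reported sensitivity and specificity, since TSS is defined as sensitivity + specificity - 1 (with both expressed as proportions). A minimal base R sketch, using only the numbers printed in the TSS rows; the helper name `tss_from_rates` is ours and is not a biomod2 function:

```r
# TSS from sensitivity and specificity given in percent
tss_from_rates <- function(sensitivity_pct, specificity_pct) {
  sensitivity_pct / 100 + specificity_pct / 100 - 1
}

# Values copied from the TSS rows of the ensemble evaluation output
tss_ca <- tss_from_rates(100.00, 93.92)  # committee averaging
tss_wm <- tss_from_rates(96.81, 96.96)   # weighted mean

print(round(c(ca = tss_ca, wm = tss_wm), 3))
```

Both values agree with the Testing.data column up to rounding, which confirms that sensitivity and specificity are reported at the cutoff that maximizes each metric.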
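We can see that both evaluated ensemble models perform well, with evaluation
scores higher than 0.9 for all three evaluation metrics (no scores are
reported for the coefficient of variation ensemble, EMcv, hence the NA values).
Committee averaging and the weighted mean give almost identical scores;
committee averaging has a marginally higher TSS (0.939 versus 0.938), so we
will keep the former to present the results hereafter.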
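To make the difference between the two ensemble rules concrete, here is a minimal base R sketch of committee averaging and a score-weighted mean. It is an illustration under stated assumptions, not biomod2's implementation: the predictions, TSS scores and cutoffs are invented, models with a TSS below 0.7 are discarded, and weights are taken as proportional to TSS.

```r
# Hypothetical predictions (0-1000 scale) from four models at three sites
preds <- matrix(c(800, 600, 900, 300,
                  200, 400, 100, 700,
                  450, 700, 550, 450),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("site", 1:3),
                                c("GLM", "GBM", "RF", "GAM")))
tss    <- c(GLM = 0.80, GBM = 0.90, RF = 0.95, GAM = 0.60)  # hypothetical scores
cutoff <- c(GLM = 500,  GBM = 550,  RF = 600,  GAM = 400)   # hypothetical cutoffs

# Keep only models reaching the (assumed) quality threshold
keep <- tss >= 0.7
kept <- preds[, keep, drop = FALSE]

# Committee averaging: each model votes presence/absence with its own cutoff,
# and the ensemble is the share of presence votes (rescaled to 0-1000)
votes <- sweep(kept, 2, cutoff[keep], FUN = ">=")
committee <- rowMeans(votes) * 1000

# Weighted mean: continuous predictions averaged with weights
# proportional to each model's TSS
w <- tss[keep] / sum(tss[keep])
wmean <- drop(kept %*% w)

print(round(cbind(committee, wmean), 1))
```

Committee averaging only sees binary votes, so it reports agreement among models, whereas the weighted mean keeps the continuous suitability values; this is one reason the two ensembles can have different optimal cutoffs.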
Biomod2 Projections
Having built a range of SDMs and two ensemble models for P. laurifo-
lia and shown how accurate these models were, we will now turn our
attention to current and future spatial distributions of our focal spe-
cies, using the ensemble of models built under the committee averaging
The spatial projections for current conditions are stored in the
“proj_current” directory.
The Worldclim bioclimatic scenarios downloaded from the Worldclim
website were used to project future distributions. Although a wide range
of scenarios is available, we will only focus here on the GCM
BCC-CSM1-1 coupled with the RCP 4.5 scenario for the years 2050 and
2070. The first step is to load these data and extract the areas we
are interested in. Then, we simply apply the same functions with the same
parameters as used for current conditions.
> ### Future projections ###
> ## load 2050 bioclim variables
> bioclim_world_2050_BC45 <-
    stack(c(bio_5 = "WorldClim_data/2050/BC_45/bc45bi505.tif",
      bio_7 = "WorldClim_data/2050/BC_45/bc45bi507.tif",
      bio_11 = "WorldClim_data/2050/BC_45/bc45bi5011.tif",
      bio_19 = "WorldClim_data/2050/BC_45/bc45bi5019.tif"))
> ProLau_ensemble_models_proj_2070_BC45 <-
    BIOMOD_EnsembleForecasting(
      EM.output = ProLau_ensemble_models,
      projection.output = ProLau_models_proj_2070_BC45,
      binary.meth = "TSS", output.format = ".img", do.stack = FALSE)
At this stage, we have built all our model predictions for current and
future conditions. Although it is possible to graph maps for all differ-
ent ensemble forecasting approaches, here we have only mapped the
weighted mean ensemble model, for present and future conditions. Note
that the units of projections are predicted habitat suitability multiplied by
1000 (thus on a 0-1000 scale).
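Because projections are stored on a 0-1000 scale, they need to be divided by 1000 to be read as suitability values between 0 and 1, and a cutoff on the same 0-1000 scale turns them into presence/absence. A minimal sketch in base R; the suitability values are made up for illustration, and the cutoff of 642 is the TSS cutoff reported earlier for the weighted mean ensemble.

```r
# Hypothetical habitat suitability values on the 0-1000 scale
hs_1000 <- c(12, 250, 641, 642, 980)

# Rescale to the 0-1 range
hs_01 <- hs_1000 / 1000

# Binary prediction: 1 where suitability reaches the cutoff, 0 elsewhere.
# Treating values equal to the cutoff as presences is an assumption here.
tss_cutoff <- 642
hs_bin <- as.integer(hs_1000 >= tss_cutoff)

print(data.frame(hs_1000, hs_01, hs_bin))
```

Keeping the cutoff on the same scale as the stored projections avoids a common mistake, namely comparing 0-1000 values with a 0-1 threshold.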
These maps in Figure 19.12 suggest that in 2070 the only remaining
areas suitable for our species will be found in the south-west corner of
South Africa.
Figure 19.12 Plot showing the geographic projections using the weighted average
ensemble model for Protea laurifolia under (a) current and (b) future conditions.
wm = "Protea.laurifolia/proj_2070_BC45/individual_projections/Protea.laurifolia_EMwmeanByTSS_mergedAlgo_mergedRun_mergedData_TSSbin.img"))
> SRC_current_2050_BC45$Compt.By.Models
   Loss Stable0 Stable1 Gain PercLoss PercGain SpeciesRangeChange
ca   97   37930     268   30    26.57    8.219             -18.36
wm   78   38107     140    0    35.78    0.000             -35.78
   CurrentRangeSize FutureRangeSize.NoDisp FutureRangeSize.FullDisp
> SRC_current_2070_BC45 <- BIOMOD_RangeSize(
    ProLau_bin_proj_current,
    ProLau_bin_proj_2070_BC45)
> SRC_current_2070_BC45$Compt.By.Models
   Loss Stable0 Stable1 Gain PercLoss PercGain SpeciesRangeChange
ca  151   37950     214   10    41.37     2.74             -38.63
wm  103   38107     115    0    47.25     0.00             -47.25
   CurrentRangeSize FutureRangeSize.NoDisp FutureRangeSize.FullDisp
ca              365                    214                      224
wm              218                    115                      115
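From the SRC output tables, we can see that our species will lose suitable
habitat in the future. According to the committee averaging ensemble model,
P. laurifolia could lose about 27% of its currently suitable habitat by 2050
and about 41% by 2070 (net range changes of about -18% and -39% once gains
are included). The weighted mean ensemble model predicts larger losses, of
about 36% and 47%.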
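The columns of the SRC tables are linked by simple arithmetic, which can be checked in base R. The relationships below are inferred from the printed numbers (they reproduce the 2070 range sizes exactly) rather than quoted from the biomod2 source code, and the row labels `ca_2050` and `ca_2070` are ours.

```r
# Pixel counts copied from the committee averaging (ca) rows of the
# SRC tables for 2050 and 2070
src <- data.frame(
  row.names = c("ca_2050", "ca_2070"),
  Loss    = c(97, 151),
  Stable1 = c(268, 214),
  Gain    = c(30, 10))

# Current range = pixels lost + pixels that stay suitable
src$CurrentRangeSize <- src$Loss + src$Stable1
# Future range without dispersal (no gains) and with full dispersal
src$FutureRangeSize.NoDisp   <- src$Stable1
src$FutureRangeSize.FullDisp <- src$Stable1 + src$Gain

# Percentages are expressed relative to the current range size
src$PercLoss <- 100 * src$Loss / src$CurrentRangeSize
src$PercGain <- 100 * src$Gain / src$CurrentRangeSize
src$SpeciesRangeChange <- src$PercGain - src$PercLoss

print(round(src, 2))
```

The recomputed percentages match the printed tables up to display rounding. Note that SpeciesRangeChange assumes full dispersal (gains are counted); under a no-dispersal assumption the relevant figure is PercLoss alone.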
These predicted changes in distributions can be plotted as follows (see
Figure 19.13):
> ProLau_src_map <- stack(SRC_current_2050_BC45$Diff.By.Pixel,
    SRC_current_2070_BC45$Diff.By.Pixel)
> names(ProLau_src_map) <- c("ca cur-2050", "wm cur-2050",
    "ca cur-2070", "wm cur-2070")
> library(rasterVis)
> my.at <- seq(-2.5, 1.5, 1)
> myColorkey <- list(at = my.at, ## where the colors change
    labels = list(labels = c("lost", "pres", "abs", "gain"), ## labels
      at = my.at[-1] - 0.5 ## where to print labels
    ))
> rasterVis::levelplot(ProLau_src_map,
    main = "Protea laurifolia range change",
    colorkey = myColorkey, layout = c(2, 2))
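The color key in this call relies on four integer classes in the range-change rasters. Judging from the labels and break points in the code, these are -2 (lost), -1 (stable presence), 0 (stable absence) and 1 (gain); this coding is read off the example rather than quoted from the biomod2 documentation. A small base R sketch of how the breaks and label positions line up, using a made-up vector of pixel classes:

```r
# Class breaks: one interval of width 1 centered on each class value
my.at <- seq(-2.5, 1.5, 1)
# Label positions: drop the first break and step back to the interval centers
label_at <- my.at[-1] - 0.5
class_labels <- c("lost", "pres", "abs", "gain")

# Hypothetical pixel classes
pixels <- c(-2, -1, -1, 0, 0, 0, 1)

# cut() assigns each pixel to the interval between consecutive breaks
pixel_class <- cut(pixels, breaks = my.at, labels = class_labels)
print(table(pixel_class))
```

Placing the breaks halfway between the integer codes means that each class gets exactly one color band, with its label printed in the middle of the band.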
As expected, the areas that are likely to become unsuitable in the future
are mostly located at the borders of the species range. The final analytical
step might be to try to understand how the different modeling techniques,
pseudo-absence samples, and cross-validation runs influence the predicted
species range changes. We will use the ProbDensFunc() function to try to
disentangle the effects of these distinct modeling facets (techniques,
pseudo-absences, cross-validations) on the species range changes. In the
previous example, we only used one climate change scenario, but in an
extended exercise the same function could also be used to compare the
importance of different climate change scenarios.
Figure 19.13 Plot of the predicted range changes for Protea laurifolia between present
and future conditions.
to get a density plot that shows the predicted species range change
according to the selected facets in the ensemble model (Figure 19.14).
Figure 19.14 Density plot of the predicted species range changes according to the
facets selected in the ensemble model.
19.1.10 Conclusion
Many model parameterization options can be considered when building
projections of species distributions, and these influence the resulting
predictions. Results and related conclusions thus depend strongly on the
initial modeling choices. In this regard, a model ensembling approach
provides more information on the variation and uncertainty in the
predictions relating to these choices, and produces an ensemble of
predictions that is based on the consensus across all choices.
Objectives
The objectives are to apply SDMs to a set of species (herein the
Larus genus) and stack them to produce a resulting prediction of
Data
All data for this example come from online data centers (IUCN,
Worldclim). Most data are downloaded as raster grids. We will transform
the initial species occurrence data to fit an XY + “species_name”
format.
Methodological Steps
• Loading data from the web
• Formatting the data
• Building a range of models and ensemble models
• Building diversity indices and diversity maps.
Having extracted the species IDs, we can then query the associated spe-
cies’ occurrences. We will then reformat the data.
> ## get species occurrences
> occ_larus <- occ_search(taxonKey = spp_larus$key,
    continent = 'europe',
    fields = c('name', 'key', 'country', 'decimalLatitude',
      'decimalLongitude'), hasCoordinate = TRUE, limit = 500,
    return = 'data')
> ## remove null items
> occ_larus <- occ_larus[sapply(occ_larus,
    function(x){!is.null(dim(x))})]
> ## combine all data in a single data.frame
> data <- do.call(rbind, occ_larus)
In order to avoid any problems with file paths on the local computer, it is
good practice to remove spaces within species names.
> ## replace " " by "." in species names
> data$name <- sub(" ", ".", data$name)
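One detail worth keeping in mind is that `sub()` only replaces the first space in each string. That is enough for binomial names, but not if a name has three parts. A short base R sketch of the difference; the two names below are just illustrative inputs, not records from the GBIF query:

```r
# Illustrative names: one binomial, one trinomial (subspecies)
sp_names <- c("Larus argentatus", "Larus fuscus graellsii")

# sub() replaces only the first space in each name
first_only <- sub(" ", ".", sp_names, fixed = TRUE)

# gsub() replaces every space, which is safer if subspecies are present
all_spaces <- gsub(" ", ".", sp_names, fixed = TRUE)

print(first_only)
print(all_spaces)
```

Using `fixed = TRUE` treats the pattern as a literal string, which avoids any surprise from regular-expression metacharacters.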
6
www.worldclim.org/
At this point, we have all the species (from GBIF) and climatic (from
Worldclim) data we need to develop this biomod2 example.
From this, we can plot the environmental rasters to ensure that they are
correctly handled.
> # plot(stk_current)
Figure 19.16 Maps of the three selected variables in Europe.
We have seen that the maximum correlation between our three variables
is 0.41 (between bio_8 and bio_12), a value well below the 0.7 threshold
usually considered acceptable (see Dormann et al., 2013). Let’s extract
these variables from the pool of bioclimatic variables available for current
and future conditions (Figure 19.16).
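A collinearity check of this kind can be sketched in base R as follows. The data are synthetic and only serve to show the mechanics: `bio_8` and `bio_12` are simulated stand-ins for the real layers, and `other_var` is a hypothetical placeholder for the third variable.

```r
# Synthetic predictor values at 200 hypothetical sites
set.seed(42)
n <- 200
bio_8     <- rnorm(n)
bio_12    <- 0.4 * bio_8 + rnorm(n)  # moderately related to bio_8
other_var <- rnorm(n)                # hypothetical third predictor
env <- data.frame(bio_8, bio_12, other_var)

# Absolute pairwise Pearson correlations, ignoring the diagonal
cor_mat <- abs(cor(env, method = "pearson"))
diag(cor_mat) <- NA
max_cor <- max(cor_mat, na.rm = TRUE)

# Pairs exceeding the 0.7 rule of thumb (upper triangle only)
too_high <- which(cor_mat > 0.7 & upper.tri(cor_mat), arr.ind = TRUE)

print(round(max_cor, 2))
print(nrow(too_high))
```

With real data, `env` would hold the predictor values extracted at the sampling locations; any pair listed in `too_high` would be a candidate for dropping one of its two variables.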
The variable selection, data collection, and data preparation steps are now
completed, which means we can fit the models.
## build ensemble models
sp_ens_model <- BIOMOD_EnsembleModeling(
  modeling.output = sp_model, chosen.models = 'all',
  em.by = 'all', eval.metric = c('TSS'),
  eval.metric.quality.threshold = c(0.7),
  models.eval.meth = c('TSS', 'ROC'), prob.mean = TRUE,
  prob.cv = TRUE, prob.ci = FALSE, prob.ci.alpha = 0.05,
  prob.median = FALSE, committee.averaging = TRUE,
  prob.mean.weight = TRUE,
  prob.mean.weight.decay = 'proportional')
## make the projections
proj_scen <- c("current", "2050_BC_45", "2070_BC_45")
for(scen in proj_scen){
  cat("\n> projections of ", scen)
  ## Single model projections
  sp_proj <- BIOMOD_Projection(
    modeling.output = sp_model,
    new.env = get(paste("stk_", scen, sep = "")),
    proj.name = scen, selected.models = 'all', binary.meth = "TSS",
    filtered.meth = NULL, compress = TRUE,
    build.clamping.mask = TRUE,
    do.stack = FALSE, output.format = ".img")
  ## Ensemble model projections
  sp_ens_proj <- BIOMOD_EnsembleForecasting(
    EM.output = sp_ens_model,
    projection.output = sp_proj, binary.meth = "TSS",
For each species, a directory is created on the hard drive. This directory
contains all biomod2 modeling and projection outputs for that species
(see the other example in Part III, or the biomod2 examples and
vignettes, which explain the specific functionalities of the package
through examples).
Figure 19.17 Species richness (alpha diversity) maps for the three time steps.
19.2.7 Conclusion
In this example, we have seen how to model a list of species in parallel
mode using online databases, and how to stack them to produce some
simple species richness (alpha diversity) maps. More advanced tuning of
models, uncertainty analyses, and subsequent analysis may be included
to address more complex questions. We refer interested users to the
Biomod2 documentation for further details.
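The stacking step summarized in this conclusion can be sketched in base R as a cell-by-cell sum of binary (presence/absence) predictions. The matrices below are tiny made-up grids standing in for the binary projection rasters, and the species labels are placeholders.

```r
# Hypothetical 2 x 3 binary prediction grids for three species
sp_a <- matrix(c(1, 0, 1,
                 0, 0, 1), nrow = 2, byrow = TRUE)
sp_b <- matrix(c(1, 1, 0,
                 0, 0, 1), nrow = 2, byrow = TRUE)
sp_c <- matrix(c(0, 1, 0,
                 0, 0, 1), nrow = 2, byrow = TRUE)

# Stack the grids and sum them cell by cell to get species richness
binary_stack <- list(sp_a = sp_a, sp_b = sp_b, sp_c = sp_c)
richness <- Reduce(`+`, binary_stack)

print(richness)
```

Each cell of `richness` counts the species predicted present there, so its values range from 0 to the number of stacked species. The same sum can be applied to raster layers once the binary projections of all species are loaded.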
In this last part, we briefly discuss the advances already made in HSMs
and present the issues currently in development or under debate, which
therefore constitute valuable topics for future HSM research.
The aim of this book was to present HSMs, and the associated theory
and methods. As we have seen, this field has developed tremendously, but
much still remains to be done to better formalize existing approaches
in solid mathematical frameworks. Several aspects of the field are still
under development or were making significant progress at the time of
publication of this book. Here, we have identified some important top-
ics which are currently developing rapidly and could not therefore be
fully discussed in this book. We have mainly identified topics relating
to: (i) further progress in HSMs through metagenomics and remote sens-
ing; (ii) point-process models for presence-only HSM; (iii) hierarchical
Bayesian approaches to integrate models at different scales; (iv) ensem-
bles of small models for rarer species; (v) improving methods to build
ensembles of models, e.g. using Bayesian approaches; (vi) modeling com-
munities through multi-species modeling and joint-species distribution
modeling; and (vii) use of artificial data to assess various methodological
aspects of HSMs, such as which factors affect model building or model
performance.
1
https://methodsblog.wordpress.com/2016/05/24/esms-for-rare-species
References
Austin, M. and Gaywood, M. 1994. Current problems of environmental gradients
and species response curves in relation to continuum theory. Journal of Vegetation
Science, 5, 473–482.
Austin, M., Nicholls, A., Doherty, M. and Meyers, J. 1994. Determining species
response functions to an environmental gradient by means of a β-function.
Journal of Vegetation Science, 5, 215–228.
Austin, M. and Smith, T. 1989. A new model for the continuum concept. Plant
Ecology, 83, 35–47.
Austin, M., Belbin, L., Meyers, J., Doherty, M. and Luoto, M. 2006. Evaluation of
statistical models used for predicting plant species distributions: role of artificial
data and theory. Ecological Modelling, 199, 197–216.
Austin, M. P. 1971. Role of regression analysis in plant ecology. Proceedings of the
Ecological Society of Australia, 6, 63–75.
Austin, M. P. 1985. Continuum concept, ordination methods, and niche theory.
Annual Review of Ecology and Systematics, 16, 39–61.
Austin, M. P. 1992. Modeling the environmental niche of plants – implications for
plant community response to elevated CO2 levels. Australian Journal of Botany,
40, 615–630.
Austin, M. P. 2002. Spatial prediction of species distribution: an interface between
ecological theory and statistical modelling. Ecological Modelling, 157, 101–118.
Austin, M. P. 2007. Species distribution models and ecological theory: a criti-
cal assessment and some possible new approaches. Ecological Modelling,
200, 1–19.
Austin, M. P. and Van Niel, K. P. 2011. Improving species distribution models for cli-
mate change studies: variable selection and scale. Journal of Biogeography, 38, 1–8.
Austin, M. P., Nicholls, A. O. and Margules, C. R. 1990. Measurement of the real-
ized qualitative niche: environmental niches of 5 Eucalyptus species. Ecological
Monographs, 60, 161–177.
Ba, J., Hou, Z., Platvoet, D., Zhu, L. and Li, S. 2010. Is Gammarus tigrinus (Crustacea,
Amphipoda) becoming cosmopolitan through shipping? Predicting its poten-
tial invasive range using ecological niche modeling. Hydrobiologia, 649, 183–194.
Bahn, V. and McGill, B. J. 2007. Can niche-based distribution models outperform
spatial interpolation? Global Ecology and Biogeography, 16, 733–742.
Bahn, V., O’Connor, R. J. and Krohn, W. B. 2006. Importance of spatial
autocorrelation in modeling bird distributions at a continental scale.
Ecography, 29, 835–844.
Barbet-Massin, M., Thuiller, W. and Jiguet, F. 2010. How much do we overestimate
future local extinction rates when restricting the range of occurrence data in
climate suitability models? Ecography, 33, 878–886.
Barbet-Massin, M., Jiguet, F., Albert, C. H. and Thuiller, W. 2012. Selecting pseudo-
absences for species distribution models: how, where and how many? Methods
in Ecology and Evolution, 3, 327–338.
Barry, S. and Elith, J. 2006. Error and uncertainty in habitat models. Journal of Applied
Ecology, 43, 413–423.
Bartholome, E. and Belward, A. S. 2005. GLC2000: a new approach to global land
cover mapping from Earth observation data. International Journal of Remote
Sensing, 26, 1959–1977.
example of Zygaena carniolica and Coenonympha arcania. Biological Conservation,
126, 247–259.
Blondel, J. and Aronson, J. 1995. Biodiversity and ecosystem function in the
Mediterranean basin: human and non-human determinants. In Davis, G. W. and
Richardson, D. M. (eds.), Ecological Studies. Berlin: Springer-Verlag, pp. 43–119.
Bocedi, G., Zurell, D., Reineking, B. and Travis, J. M. J. 2014. Mechanistic modelling
of animal dispersal offers new insights into range expansion dynamics across
fragmented landscapes. Ecography, 37, 1240–1253.
Bombi, P. and D’Amen, M. 2012. Scaling down distribution maps from atlas data: a
test of different approaches with virtual species. Journal of Biogeography, 39,
640–651.
Bombi, P., Salvi, D., Vignoli, L. and Bologna, M. A. 2009. Modelling Bedriaga’s rock
lizard distribution in Sardinia: an ensemble approach. Amphibia–Reptilia, 30,
413–424.
Booth, T. H., Nix, H. A., Busby, J. R. and Hutchinson, M. F. 2014. BIOCLIM: the
first species distribution modelling package, its early applications and relevance
to most current MAXENT studies. Diversity and Distributions, 20, 1–9.
Botkin, D. B., Saxe, H., Araujo, M. B., et al. 2007. Forecasting the effects of global
warming on biodiversity. Bioscience, 57, 227–236.
Boucher, D. H., James, S. and Keeler, K. H. 1982. The ecology of mutualism. Annual
Review of Ecology and Systematics, 13, 315–347.
Boucher, F. C., Thuiller, W., Roquet, C., et al. 2012. Reconstructing the origins of
high-alpine niches and cushion life form in the genus Androsace sl (Primulaceae).
Evolution, 66, 1255–1268.
Boulangeat, I., Gravel, D. and Thuiller, W. 2012a. Accounting for dispersal and biotic
interactions to disentangle the drivers of species distributions and their abun-
dances. Ecology Letters, 15, 584–593.
Boulangeat, I., Philippe, P., Abdulhak, S., et al. 2012b. Improving plant functional
groups for dynamic models of biodiversity: at the crossroads between func-
tional and community ecology. Global Change Biology, 18, 3464–3475.
Boulangeat, I., Georges, D. and Thuiller, W. 2014. FATE-HD: a spatially and tempo-
rally explicit integrated model for predicting vegetation structure and diversity
at regional scale. Global Change Biology, 20, 2368–2378.
Box, E. O. 1981. Macroclimate and Plant Forms: An Introduction to Predictive Modeling in
Phytogeography, The Hague: Junk.
Boyce, M. S., Vernier, P. R., Nielsen, S. E. and Schmiegelow, F. K. A. 2002. Evaluating
resource selection functions. Ecological Modelling, 157, 281–300.
Braconnot, P., Otto-Bliesner, B., Harrison, S., et al. 2007. Results of PMIP2 coupled
simulations of the mid-holocene and last glacial maximum. Part I: experiments
and large-scale features. Climate of the Past, 3, 261–277.
Breiman, L. 1996. Bagging predictors. Machine Learning, 24, 123–140.
Breiman, L. 2001. Random forests. Machine Learning, 45, 5–32.
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. 1984. Classification and
Regression Trees, New York: Chapman and Hall.
Breiner, F. T., Guisan, A., Bergamini, A. and Nobis, M. P. 2015. Overcoming limita-
tions of modelling rare species by using ensembles of small models. Methods in
Ecology and Evolution, 6(10), 1210–1218.
Buckley, L. B., Davies, T. J., Ackerly, D. D., et al. 2010. Phylogeny, niche conservatism
and the latitudinal diversity gradient in mammals. Proceedings of the Royal Society
of London B: Biological Sciences, 277, 2131–2138.
Buisson, L., Thuiller, W., Lek, S., Lim, P. and Grenouillet, G. 2008. Climate change
hastens the turnover of stream fish assemblages. Global Change Biology, 14,
2232–2248.
Buisson, L., Thuiller, W., Casajus, N., Lek, S. and Grenouillet, G. 2010. Uncertainty
in ensemble forecasting of species distribution. Global Change Biology, 16,
1145–1157.
Buisson, L., Grenouillet, G., Villéger, S., Canal, J. and Laffaille, P. 2013. Toward a loss
of functional diversity in stream fish assemblages under climate change. Global
Change Biology, 19, 387–400.
Burgman, M. A. and Fox, J. C. 2003. Bias in species range estimates from minimum
convex polygons: implications for conservation and options for improved plan-
ning. Animal Conservation, 6, 19–28.
Burnham, K. P. and Anderson, D. R. 2002. Model Selection and Multimodel
Inference: A Practical Information-Theoretic Approach. Berlin: Springer.
Busby, J. R. 1991. BIOCLIM: a bioclimate analysis and prediction system. In
Margules, C. R. and Austin, M. P. (eds.), Nature Conservation: Cost Effective
Biological Surveys and Data Analysis. Canberra, Australia: CSIRO, pp. 64–68.
Calenge, C. and Basille, M. 2008. A general framework for the statistical exploration
of the ecological niche. Journal of Theoretical Biology, 252, 674–685.
Caley, M. J. and Schluter, D. 1997. The relationship between local and regional
diversity. Ecology, 78, 70–80.
Callaway, R. M. 1995. Positive interactions among plants (interpreting botanical pro-
gress). The Botanical Review, 61, 306–349.
Calvete, C., Estrada, R., Miranda, M. A., et al. 2008. Modelling the distributions and
spatial coincidence of bluetongue vectors Culicoides imicola and the Culicoides
obsoletus group throughout the Iberian peninsula. Medical and Veterinary
Entomology, 22, 124–134.
Carl, G. and Kühn, I. 2007. Analyzing spatial autocorrelation in species distributions
using Gaussian and logit models. Ecological Modelling, 207, 159–170.
Carlson, B. Z., Georges, D., Rabatel, A., et al. 2014. Accounting for tree line shift,
glacier retreat and primary succession in mountain plant distribution models.
Diversity and Distributions, 20, 1379–1391.
Carnaval, A. C., Hickerson, M. J., Haddad, C. F., Rodrigues, M. T. and Moritz, C.
2009. Stability predicts genetic diversity in the Brazilian Atlantic forest hotspot.
Science, 323, 785–789.
Carpenter, G., Gillison, A. N. and Winter, J. 1993. DOMAIN: a flexible modelling
procedure for mapping potential distributions of plants and animals. Biodiversity
and Conservation, 2, 667–680.
Carroll, C., Johnson, D. S., Dunk, J. R. and Zielinski, W. J. 2010. Hierarchical
Bayesian spatial models for multispecies conservation planning and monitor-
ing. Conservation Biology, 24, 1538–1548.
Carvalho, S. B., Brito, J. C., Crespo, E. G., Watts, M. E. and Possingham, H. P.
2011. Conservation planning under climate change: toward accounting for
Côté, I. M. and Reynolds, J. D. 2002. Predictive ecology to the rescue? Science, 298,
1181–1182.
Cox, B. 2001. The biogeographic regions reconsidered. Journal of Biogeography, 28,
511–523.
Crase, B., Liedloff, A. C. and Wintle, B. A. 2012. A new method for dealing with
residual spatial autocorrelation in species distribution models. Ecography, 35,
879–888.
Crase, B., Liedloff, A., Vesk, P. A., Fukuda, Y. and Wintle, B. A. 2014. Incorporating
spatial autocorrelation into species distribution models alters forecasts of
climate-mediated range shifts. Global Change Biology, 20, 2566–2579.
Cressie, N. 1993. Geostatistics: a tool for environmental models. In Goodchild, M. F.,
Parks, B. O. and Steyaert, L. T. (eds.), Environmental Modeling with GIS. Oxford,
UK: Oxford University Press, pp. 414–421.
Crimmins, S. M., Dobrowski, S. Z., Greenberg, J. A., Abatzoglou, J. T. and Mynsberge,
A. R. 2011. Changes in climatic water balance drive downhill shifts in plant
species’ optimum elevations. Science, 331, 324–327.
Cutler, D. R., Edwards, T. C., Beard, K. H., Cutler, A. and Hess, K. T. 2007. Random
forests for classification in ecology. Ecology, 88, 2783–2792.
D’Amen, M., Zimmermann, N. E. and Pearman, P. B. 2013. Conservation of phylo-
geographic lineages under climate change. Global Ecology and Biogeography, 22,
93–104.
D’Amen, M., Dubuis, A., Fernandes, R. F., et al. 2015a. Using species richness and
functional traits predictions to constrain assemblage predictions from stacked
species distribution models. Journal of Biogeography, 42, 1255–1266.
D’Amen, M., Zimmermann, N. E., Rahbek, C. and Guisan, A. 2015b. Spatial predic-
tion at the community level: state of the art and future perspectives. Biological
Reviews, 92, 169–187.
Daly, C., Neilson, R. P. and Phillips, D. L. 1994. A statistical topographic model
for mapping climatological precipitation over mountainous terrain. Journal of
Applied Meteorology, 33, 140–158.
Davis, E. B., McGuire, J. L. and Orcutt, J. D. 2014. Ecological niche models of mam-
malian glacial refugia show consistent bias. Ecography, 37, 1133–1138.
Davis, M. B. 1989. Lags in vegetation response to greenhouse warming. Climatic
Change, 15, 75–82.
Dawson, T. P., Curran, P. J. and Plummer, S. E. 1998. The biochemical decompo-
sition of slash pine needles from reflectance spectra using neural networks.
International Journal of Remote Sensing, 19, 1433–1438.
De’Ath, G. 2007. Boosted trees for ecological modeling and prediction. Ecology, 88,
243–251.
De’Ath, G. and Fabricius, K. E. 2000. Classification and regression trees: a powerful
yet simple technique for ecological data analysis. Ecology, 81, 3178–3192.
de Oliveira, S. V., Escobar, L. E., Peterson, A. T. and Gurgel-Gonçalves, R. 2013.
Potential geographic distribution of hantavirus reservoirs in Brazil. Plos One,
8, e85137.
de Witte, L. C. and Stöcklin, J. 2010. Longevity of clonal plants: why it matters and
how to measure it. Annals of Botany, 1–12.
Dorazio, R. M. 2014. Accounting for imperfect detection and survey bias in statistical
analysis of presence-only data. Global Ecology and Biogeography, 23, 1472–1484.
Dormann, C. F. 2007a. Assessing the validity of autologistic regression. Ecological
Modelling, 207, 234–242.
Dormann, C. F. 2007b. Effects of incorporating spatial autocorrelation into the
analysis of species distribution data. Global Ecology and Biogeography, 16, 129–138.
Dormann, C. F., McPherson, J. M., Araujo, M. B., et al. 2007. Methods to account
for spatial autocorrelation in the analysis of species distributional data: a review.
Ecography, 30, 609–628.
Dormann, C. F., Purschke, O., Marquez, J. R. G., Lautenbach, S. and Schroder, B.
2008. Components of uncertainty in species distribution analysis: a case study
of the great grey shrike. Ecology, 89, 3371–3386.
Dormann, C. F., Schymanski, S. J., Cabral, J., et al. 2012. Correlation and process in
species distribution models: bridging a dichotomy. Journal of Biogeography, 39,
2119–2131.
Dormann, C. F., Elith, J., Bacher, S., et al. 2013. Collinearity: a review of methods
to deal with it and a simulation study evaluating their performance. Ecography,
36, 27–46.
Drake, J. M., Randin, C. and Guisan, A. 2006. Modelling ecological niches with sup-
port vector machines. Journal of Applied Ecology, 43, 424–432.
Dray, S., Chessel, D. and Thioulouse, J. 2003. Co-inertia analysis and the linking of
ecological data tables. Ecology, 84, 3078–3089.
Dubuis, A., Pottier, J., Rion, V., et al. 2011. Predicting spatial patterns of plant species
richness: a comparison of direct macroecological and species stacking model-
ling approaches. Diversity and Distributions, 17, 1122–1131.
Dubuis, A., Giovanettina, S., Pellissier, L., et al. 2013. Improving the prediction of
plant species distribution and community composition by adding edaphic to
topo-climatic variables. Journal of Vegetation Science, 24, 593–606.
Duckworth, J., Bunce, R. and Malloch, A. 2000. Vegetation–environment relation-
ships in Atlantic European calcareous grasslands. Journal of Vegetation Science,
11, 15–22.
Dullinger, S., Gattringer, A., Thuiller, W., et al. 2012. Extinction debt of high-
mountain plants under twenty-first-century climate change. Nature Climate
Change, 2, 619–622.
Edwards, T. C., Cutler, D. R., Zimmermann, N. E., Geiser, L. and Alegria, J. 2005.
Model-based stratifications for enhancing the detection of rare ecological
events. Ecology, 86, 1081–1090.
Edwards, T. C., Cutler, D. R., Zimmermann, N. E., Geiser, L. and Moisen, G. G. 2006.
Effects of sample survey design on the accuracy of classification tree models in
species distribution models. Ecological Modelling, 199, 132–141.
Efron, B. and Gong, G. 1983. A leisurely look at the bootstrap, the jackknife, and
cross-validation. American Statistician, 37, 36–48.
Efron, B. and Tibshirani, R. 1993. An Introduction to the Bootstrap, New York: Chapman
and Hall.
Efron, B. and Tibshirani, R. 1997. Improvements on cross-validation: The .632+
bootstrap method. Journal of the American Statistical Association, 92, 548–560.
Estrada-Peña, A. and Thuiller, W. 2008. An assessment of the effect of data partition-
ing on the performance of modelling algorithms for habitat suitability for ticks.
Medical and Veterinary Entomology, 22, 248–257.
Estrada-Peña, A. and Venzal, J. M. 2007. Climate niches of tick species in the
Mediterranean region: modeling of occurrence data, distributional constraints,
and impact of climate change. Journal of Medical Entomology, 44, 1130–1138.
Evans, M. E., Smith, S. A., Flynn, R. S. and Donoghue, M. J. 2009. Climate, niche
evolution, and diversification of the “bird-cage” evening primroses (Oenothera,
Sections Anogra and Kleinia). The American Naturalist, 173, 225–240.
Farber, O. and Kadmon, R. 2003. Assessment of alternative approaches for biocli-
matic modeling with special emphasis on the Mahalanobis distance. Ecological
Modelling, 160, 115–130.
Fernandes, R. F., Vicente, J. R., Georges, D., et al. 2014. A novel downscaling approach
to predict plant invasions and improve local conservation actions. Biological
Invasions, 16, 2577–2590.
Ferreira, M. P., Zortea, M., Zanotta, D. C., Shimabukuro, Y. E. and de Souza Filho, C.
R. 2016. Mapping tree species in tropical seasonal semi-deciduous forests with
hyperspectral and multispectral data. Remote Sensing of Environment, 179, 66–78.
Ferrier, S. 1984. The status of the Rufous Scrub-Bird Atrichornis rufescens: habitat, geo-
graphical variation and abundance. PhD Thesis. Armidale, Australia: University of
New England.
Ferrier, S. and Guisan, A. 2006. Spatial modelling of biodiversity at the community
level. Journal of Applied Ecology, 43, 393–404.
Ferrier, S. and Watson, G. 1997. An Evaluation of the Effectiveness of Environmental
Surrogates and Modelling Techniques in Predicting the Distribution of Biological
Diversity. Canberra, Australia: NSW National Parks and Wildlife Service.
Ferrier, S., Drielsma, M., Manion, G. and Watson, G. 2002. Extended statistical
approaches to modelling spatial pattern in biodiversity in north-east New
South Wales. II. Community-level modelling. Biodiversity and Conservation, 11,
2309–2338.
Ferrier, S., Manion, G., Elith, J. and Richardson, K. 2007. Using generalized dissimi-
larity modelling to analyse and predict patterns of beta diversity in regional
biodiversity assessment. Diversity and Distributions, 13, 252–264.
Fielding, A. H. 2002. What are the appropriate characteristics of an accuracy
measure? In Scott, J. M., Heglund, P. J., Morrison, M. L., et al. (eds.), Predicting Species
Occurrences: Issues of Accuracy and Scale. Covelo, California: Island Press.
Fielding, A. H. and Bell, J. F. 1997. A review of methods for the assessment of
prediction errors in conservation presence–absence models. Environmental
Conservation, 24, 38–49.
Filchak, K. E., Roethele, J. B. and Feder, J. L. 2000. Natural selection and sympatric
divergence in the apple maggot Rhagoletis pomonella. Nature, 407, 739–742.
Fithian, W., Elith, J., Hastie, T. and Keith, D. A. 2015. Bias correction in species distri-
bution models: pooling survey and collection data for multiple species. Methods
in Ecology and Evolution, 6, 424–438.
Fitzpatrick, M. C. and Hargrove, W. W. 2009. The projection of species distribution
models and the problem of non-analog climate. Biodiversity and Conservation,
18, 2255–2261.
Gallant, D., Slough, B. G., Reid, D. G. and Berteaux, D. 2012. Arctic fox versus red
fox in the warming Arctic: four decades of den surveys in north Yukon. Polar
Biology, 35, 1421–1431.
Gallien, L., Münkemüller, T., Albert, C. H., Boulangeat, I. and Thuiller, W. 2010.
Predicting potential distributions of invasive species: where to go from here?
Diversity and Distributions, 16, 331–342.
Gallien, L., Douzet, R., Pratte, S., Zimmermann, N. E. and Thuiller, W. 2012. Invasive
species distribution models: how violating the equilibrium assumption can
create new insights? Global Ecology and Biogeography, 21, 1126–1136.
Gallien, L., Mazel, F., Lavergne, S., et al. 2015. Contrasting the effects of environment,
dispersal and biotic interactions to explain the distribution of invasive plants in
alpine communities. Biological Invasions, 17, 1407–1423.
Gao, B. C. 1996. NDWI: A normalized difference water index for remote sensing of
vegetation liquid water from space. Remote Sensing of Environment, 58, 257–266.
Gause, G. F. 1936. The Struggle for Existence. Baltimore, MD: Williams and Wilkins.
Gehrig-Fasel, J., Guisan, A. and Zimmermann, N. E. 2007. Tree line shifts in the
Swiss Alps: climate change or land abandonment? Journal of Vegetation Science,
18, 571–582.
Gelfand, A. E., Silander, J. A., Wu, S., et al. 2006. Explaining species distribution pat-
terns through hierarchical modeling. Bayesian Analysis, 1, 41–92.
Gellrich, M. and Zimmermann, N. E. 2007. Investigating the regional-scale pattern
of agricultural land abandonment in the Swiss mountains: a spatial statistical
modelling approach. Landscape and Urban Planning, 79, 65–76.
Goldewijk, K. K. 2001. Estimating global land use change over the past 300 years: the
HYDE Database. Global Biogeochemical Cycles, 15, 417–433.
Golding, N. and Purse, B. V. 2016. Fast and flexible Bayesian species distribution
modelling using Gaussian processes. Methods in Ecology and Evolution.
Gotelli, N. J., Graves, G. R. and Rahbek, C. 2010. Macroecological signals of spe-
cies interactions in the Danish avifauna. Proceedings of the National Academy of
Sciences, 107, 5030–5035.
Grace, J. B. and Wetzel, R. G. 1981. Habitat partitioning and competitive displace-
ment in Cattails (Typha): experimental field studies. American Naturalist, 118,
463–474.
Graham, C. H., Ferrier, S., Huettman, F., Moritz, C. and Peterson, A. T. 2004a. New
developments in museum-based informatics and applications in biodiversity
analysis. Trends in Ecology and Evolution, 19, 497–503.
Graham, C. H., Ron, S. R., Santos, J. C., Schneider, C. J. and Moritz, C. 2004b.
Integrating phylogenetics and environmental niche models to explore specia-
tion mechanisms in dendrobatid frogs. Evolution, 58, 1781–1793.
Grant, P. R. and Grant, B. R. 2009. The secondary contact phase of allopatric spe-
ciation in Darwin’s finches. Proceedings of the National Academy of Sciences of the
United States of America, 106, 20141–20148.
Gravel, D., Massol, F., Canard, E., Mouillot, D. and Mouquet, N. 2011. Trophic theory
of island biogeography. Ecology Letters, 14, 1010–1016.
Graves, G. R. and Rahbek, C. 2005. Source pool geometry and the assembly of
continental avifaunas. Proceedings of the National Academy of Sciences of the United
States of America, 102, 7871–7876.
Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E. and Tatham, R. L. 2006. Multivariate
Data Analysis. Upper Saddle River, NJ: Pearson Prentice Hall.
Hakkarainen, H., Mykra, S., Kurki, S., Tornberg, R. and Jungell, S. 2004. Competitive
interactions among raptors in boreal forests. Oecologia, 141, 420–424.
Halvorsen, R., Mazzoni, S., Bryn, A. and Bakkestuen, V. 2015. Opportunities for
improved distribution modelling practice via a strict maximum likelihood
interpretation of MaxEnt. Ecography, 38, 172–183.
Hanberry, B. B., He, H. S. and Palik, B. J. 2012. Pseudoabsence generation strategies
for species distribution models. PLoS One, 7.
Hansen, M. C., Defries, R. S., Townshend, J. R. G. and Sohlberg, R. 2000. Global
land cover classification at 1 km spatial resolution using a classification tree
approach. International Journal of Remote Sensing, 21, 1331–1364.
Hansen, M. C., DeFries, R. S., Townshend, J. R. G., et al. 2002. Towards an oper-
ational MODIS continuous field of percent tree cover algorithm: examples
using AVHRR and MODIS data. Remote Sensing of Environment, 83, 303–319.
Hanski, I. and Gilpin, M. E. 1997. Metapopulation Biology. San Diego, CA:
Academic Press.
Hanspach, J., Kühn, I., Pompe, S. and Klotz, S. 2010. Predictive performance of plant
species distribution models depends on species traits. Perspectives in Plant Ecology,
Evolution and Systematics, 12, 219–225.
Hanssen, A. J. and Kuipers, W. J. 1965. On the relationship between the frequency of
rain and various meteorological parameters. Meded Verhand, 81, 2–15.
Harrell, F. E. 2001. Regression Modeling Strategies: With Applications to Linear Models,
Logistic Regression, and Survival Analysis. Berlin: Springer.
Harrell, F. E., Lee, K. L. and Mark, D. B. 1996. Multivariable prognostic models: Issues
in developing models, evaluating assumptions and adequacy, and measuring and
reducing errors. Statistics in Medicine, 15, 361–387.
Harris, D. J. 2015. Generating realistic assemblages with a joint species distribution
model. Methods in Ecology and Evolution, 6, 465–473.
Harte, J. 2011. Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and
Energetics. London: Oxford University Press.
Hastie, T. and Tibshirani, R. 1986. Generalized additive models. Statistical Science, 1,
297–318.
Hastie, T. J. and Tibshirani, R. 1990. Generalized Additive Models. London: Chapman
and Hall.
Hastie, T., Tibshirani, R. and Buja, A. 1994. Flexible discriminant analysis by optimal
scoring. Journal of the American Statistical Association, 89(428), 1255–1270.
Hastie, T., Tibshirani, R. and Friedman, J. 2009. The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. Berlin: Springer.
Hastings, D. A., Dunbar, P. K., Elphingstone, G. M., et al. 1999. The Global Land
One-kilometer Base Elevation (GLOBE) Digital Elevation Model, Version 1.0.
Boulder, CO: National Oceanic and Atmospheric Administration, National
Geophysical Data Center.
Hausser, J. (ed.) 1995. Säugetiere der Schweiz. Atlas des Mammifères de Suisse. Mammiferi
della Svizzera. Basel: Birkhäuser Verlag.
Hautier, Y., Randin, C. F., Stocklin, J. and Guisan, A. 2009. Changes in reproductive
investment with altitude in an alpine plant. Journal of Plant Ecology, 2, 125–134.
Hirzel, A. H., Posse, B., Oggier, P. A., et al. 2004. Ecological requirements of reintro-
duced species and the implications for release policy: the case of the bearded
vulture. Journal of Applied Ecology, 41, 1103–1116.
Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C. and Guisan, A. 2006. Evaluating
the ability of habitat suitability models to predict species presences. Ecological
Modelling, 199, 142–152.
Hoerl, A. E. and Kennard, R. W. 1970. Ridge regression: biased estimation for non-
orthogonal problems. Technometrics, 12, 55–67.
Holdridge, L. R. 1967. Life Zone Ecology. San Jose, Costa Rica: Tropical Science
Center.
Homer, C. H., Fry, J. A. and Barnes, C. A. 2012. The national land cover database. US
Geological Survey Fact Sheet 2012–3020. Reston, VA: USGS. Available at:
http://pubs.usgs.gov/fs/2012/3020/fs2012-3020.pdf.
Hooten, M. B., Larsen, D. R. and Wikle, C. K. 2003. Predicting the spatial distribu-
tion of ground flora on large domains using a hierarchical Bayesian model.
Landscape Ecology, 18, 487–502.
Hortal, J., De Marco Jr, P., Santos, A. and Diniz-Filho, J. A. F. 2012. Integrating bio-
geographical processes and local community assembly. Journal of Biogeography,
39, 627–628.
Hoskin, C. J., Higgie, M., McDonald, K. R. and Moritz, C. 2005. Reinforcement
drives rapid allopatric speciation. Nature, 437, 1353–1356.
Howard, C., Stephens, P. A., Pearce-Higgins, J. W., Gregory, R. D. and Willis, S. G.
2014. Improving species distribution models: the value of data on abundance.
Methods in Ecology and Evolution, 5, 506–513.
Howe, A. and Chain, P. S. 2015. Challenges and opportunities in understanding
microbial communities with metagenome assembly (accompanied by IPython
Notebook tutorial). Frontiers in Microbiology, 6, 678.
Hubbell, S. P. 2001. The Unified Neutral Theory of Biodiversity and Biogeography.
Princeton, NJ: Princeton University Press.
Huete, A. R. 1988. A soil-adjusted vegetation index (SAVI). Remote Sensing of
Environment, 25, 295–309.
Hugall, A., Moritz, C., Moussalli, A. and Stanisic, J. 2002. Reconciling paleodistri-
bution models and comparative phylogeography in the Wet Tropics rainforest
land snail Gnarosophia bellendenkerensis (Brazier 1875). Proceedings of the National
Academy of Sciences, 99, 6112–6117.
Hui, F. K., Warton, D. I. and Foster, S. D. 2015. Multi-species distribution modeling
using penalized mixture of regressions. The Annals of Applied Statistics, 9, 866–882.
Huntley, B., Green, R. E., Collingham, Y. C., et al. 2004. The performance of models
relating species geographical distributions to climate is independent of trophic
level. Ecology Letters, 7, 417–426.
Hurtt, G. C., Chini, L. P., Frolking, S., et al. 2011. Harmonization of land-use sce-
narios for the period 1500–2100: 600 years of global gridded annual land-use
transitions, wood harvest, and resulting secondary lands. Climatic Change, 109,
117–161.
Huston, M. A. 2002. Introductory essay: critical issues for improving predictions.
In Scott, J. M., Heglund, P. J., Morrison, M. L., et al. (eds), Predicting Species
Occurrences: Issues of Accuracy and Scale. Covelo, California: Island Press, pp. 7–21.
Jensen, O. P., Seppelt, R., Miller, T. J. and Bauer, L. J. 2005. Winter distribution of blue
crab Callinectes sapidus in Chesapeake Bay: application and cross-validation
of a two-stage generalized additive model. Marine Ecology-Progress Series, 299,
239–255.
Jimenez-Valverde, A. and Lobo, J. M. 2007. Threshold criteria for conversion of
probability of species presence to either-or presence–absence. Acta Oecologica –
International Journal of Ecology, 31, 361–369.
Jimenez-Valverde, A., Lobo, J. M. and Hortal, J. 2009. The effect of prevalence and
its interaction with sample size on the reliability of species distribution models.
Community Ecology, 10, 196–205.
Jimenez-Alfaro, B., Draper, D. and Nogues-Bravo, D. 2012. Modeling the potential
area of occupancy at fine resolution may reduce uncertainty in species range
estimates. Biological Conservation, 147, 190–196.
Johnson, C. J. and Gillingham, M. P. 2005. An evaluation of mapped species distri-
bution models used for conservation planning. Environmental Conservation, 32,
117–128.
Johnson, C. J. and Gillingham, M. P. 2008. Sensitivity of species-distribution models
to error, bias, and model design: an application to resource selection functions
for woodland caribou. Ecological Modelling, 213, 143–155.
Johnson, J. B. and Omland, K. S. 2004. Model selection in ecology and evolution.
Trends in Ecology and Evolution, 19, 101–108.
Johnston, T. H. 1924. The relation of climate to the spread of prickly pear. Transactions
of the Royal Society of South Australia, 48, 269–295.
Jones, H. G. 1992. Plants and Microclimate. A Quantitative Approach to Environmental
Plant Physiology. Cambridge, UK: Cambridge University Press.
Ju, J. C., Kolaczyk, E. D. and Gopal, S. 2003. Gaussian mixture discriminant analysis
and sub-pixel land cover characterization in remote sensing. Remote Sensing of
Environment, 84, 550–560.
Kadmon, R., Farber, O. and Danin, A. 2004. Effect of roadside bias on the accuracy
of predictive maps produced by bioclimatic models. Ecological Applications, 14,
401–413.
Kaplan, J. O., Krumhardt, K. M. and Zimmermann, N. E. 2009. The prehistoric
and preindustrial deforestation of Europe. Quaternary Science Reviews, 28,
3016–3034.
Kauth, R. J. and Thomas, G. S. 1976. The tasselled cap: a graphic description of the
spectral-temporal development of agricultural crops as seen by Landsat. LARS
Symposia, 159.
Kearney, M. 2006. Habitat, environment and niche: what are we modelling? Oikos,
115, 186–191.
Kearney, M. and Porter, W. P. 2004. Mapping the fundamental niche: physiology, cli-
mate, and the distribution of a nocturnal lizard. Ecology, 85, 3119–3131.
Kearney, M. and Porter, W. 2009. Mechanistic niche modelling: combining physi-
ological and spatial data to predict species’ ranges. Ecology Letters, 12, 334–350.
Keil, P., Belmaker, J., Wilson, A. M., Unitt, P. and Jetz, W. 2013. Downscaling of
species distribution models: a hierarchical approach. Methods in Ecology and
Evolution, 4, 82–94.
Lassueur, T., Joost, S. and Randin, C. F. 2006. Very high resolution digital eleva-
tion models: Do they improve models of plant species distribution? Ecological
Modelling, 198, 139–153.
Lavergne, S. and Molofsky, J. 2007. Increased genetic variation and evolutionary
potential drive the success of an invasive grass. Proceedings of the National Academy
of Sciences of the United States of America, 104, 3883–3888.
Lawler, J. J., White, D., Neilson, R. P. and Blaustein, A. R. 2006. Predicting climate-
induced range shifts: model differences and model reliability. Global Change
Biology, 12, 1568–1584.
Lawler, J. J., Shafer, S. L., White, D., et al. 2009. Projected climate-induced faunal
change in the Western Hemisphere. Ecology, 90, 588–597.
Le Cessie, S. and van Houwelingen, J. C. 1992. Ridge estimators in logistic regres-
sion. Applied Statistics, 41, 191–201.
Leathwick, J., Elith, J. and Hastie, T. 2006. Comparative performance of generalized
additive models and multivariate adaptive regression splines for statistical mod-
elling of species distributions. Ecological Modelling, 199, 188–196.
Leathwick, J., Moilanen, A., Francis, M., et al. 2008. Novel methods for the design
and evaluation of marine protected areas in offshore waters. Conservation Letters,
1, 91–102.
Leathwick, J. R. 1998. Are New Zealand’s Nothofagus species in equilibrium with
their environment? Journal of Vegetation Science, 9, 719–732.
Leathwick, J. R. 2002. Intra-generic competition among Nothofagus in New Zealand’s
primary indigenous forests. Biodiversity and Conservation, 11, 2177–2187.
Leathwick, J. R. and Austin, M. P. 2001. Competitive interactions between
tree species in New Zealand’s old-growth indigenous forests. Ecology, 82,
2560–2573.
Leathwick, J. R., Rowe, D., Richardson, J., Elith, J. and Hastie, T. 2005. Using multi-
variate adaptive regression splines to predict the distributions of New Zealand’s
freshwater diadromous fish. Freshwater Biology, 50, 2034–2052.
Legendre, P. 1993. Spatial autocorrelation: trouble or new paradigm? Ecology, 74,
1659–1673.
Legendre, P., Dale, M. R., Fortin, M. J., et al. 2002. The consequences of spatial
structure for the design and analysis of ecological field surveys. Ecography, 25,
601–615.
Legendre, P., Borcard, D. and Peres-Neto, P. R. 2005. Analyzing beta diversity: par-
titioning the spatial variation of community composition data. Ecological
Monographs, 75, 435–450.
Lehmann, A., Overton, J. M. and Leathwick, J. R. 2002. GRASP: generalized regres-
sion analysis and spatial prediction. Ecological Modelling, 157, 189–207.
Leibold, M. A. 1995. The niche concept revisited: mechanistic models and commu-
nity context. Ecology, 76, 1371–1382.
Leibold, M. A. and McPeek, M. A. 2006. Coexistence of the niche and neutral per-
spectives in community ecology. Ecology, 87, 1399–1410.
Lek, S., Delacoste, M., Baran, P., et al. 1996. Application of neural networks to mod-
elling nonlinear relationships in ecology. Ecological Modelling, 90, 39–52.
Lenoir, J. and Svenning, J. C. 2015. Climate-related range shifts – a global
multidimensional synthesis and new research directions. Ecography, 38, 15–28.
Lomba, A., Pellissier, L., Randin, C., et al. 2010. Overcoming the rare species model-
ling paradox: a novel hierarchical framework applied to an Iberian endemic
plant. Biological Conservation, 143, 2647–2657.
Lomolino, M. V., Riddle, B. R., Whittaker, R. J. and Brown, J. H. 2010. Biogeography.
Sunderland, MA: Sinauer Associates.
Lorenzen, E. D., Nogues-Bravo, D., Orlando, L., et al. 2011. Species-specific responses
of Late Quaternary megafauna to climate and humans. Nature, 479, 359–364.
Lortie, C. J., Brooker, R. W., Choler, P., et al. 2004. Rethinking plant community
theory. Oikos, 107, 433–438.
Lowry, J., Ramsey, R. D., Thomas, K., et al. 2007. Mapping moderate-scale land-
cover over very large geographic areas within a collaborative framework: a case
study of the Southwest Regional Gap Analysis Project (SWReGAP). Remote
Sensing of Environment, 108, 59–73.
Luoto, M., Heikkinen, R. K., Poyry, J. and Saarinen, K. 2006. Determinants of
the biogeographical distribution of butterflies in boreal regions. Journal of
Biogeography, 33, 1764–1778.
Luoto, M., Virkkala, R. and Heikkinen, R. K. 2007. The role of land cover in bio-
climatic models depends on spatial resolution. Global Ecology and Biogeography,
16, 34–42.
Lyet, A., Thuiller, W., Cheylan, M. and Besnard, A. 2013. Fine-scale regional
distribution modelling of rare and threatened species: bridging GIS tools and
conservation in practice. Diversity and Distributions, 19, 651–663.
Lyons, K. G. and Schwartz, M. W. 2001. Rare species loss alters ecosystem function –
invasion resistance. Ecology Letters, 4, 358–365.
Lyons, K. G., Brigham, C., Traut, B. and Schwartz, M. W. 2005. Rare species and
ecosystem functioning. Conservation Biology, 19, 1019–1024.
Mac Nally, R. 2000. Regression and model-building in conservation biology,
biogeography and ecology: the distinction between – and reconciliation of –
“predictive” and “explanatory” models. Biodiversity and Conservation, 9, 655–671.
MacArthur, R. H. 1968. The theory of the niche. In Lewontin, R. C. (ed.), Population
Biology and Evolution. Syracuse, NY: Syracuse University Press.
MacArthur, R. H. 1972. Geographical Ecology. New York: Harper & Row.
Maggini, R., Lehmann, A., Zimmermann, N. E. and Guisan, A. 2006. Improving
generalized regression analysis for the spatial prediction of forest communities.
Journal of Biogeography, 33, 1729–1749.
Maher, S. P., Randin, C. F., Guisan, A. and Drake, J. M. 2014. Pattern-recognition eco-
logical niche models fit to presence-only and presence–absence data. Methods
in Ecology and Evolution, 5, 761–770.
Maiorano, L., Cheddadi, R., Zimmermann, N. E., et al. 2013. Building the niche
through time: using 13,000 years of data to predict the effects of climate
change on tree species in Europe. Global Ecology and Biogeography, 22, 302–317.
Malanson, G. P., Westman, W. E. and Yan, Y.-L. 1992. Realized versus fundamental
niche functions in a model of chaparral response to climatic change. Ecological
Modelling, 64, 261–277.
Manel, S., Dias, J. M., Buckton, S. T. and Ormerod, S. J. 1999a. Alternative methods
for predicting species distribution: an illustration with Himalayan river birds.
Journal of Applied Ecology, 36, 734–747.
Meehl, G. A., Goddard, L., Murphy, J., et al. 2009. Decadal prediction: can it be skil-
ful? Bulletin of the American Meteorological Society, 90, 1467–1485.
Meentemeyer, R. K., Anacker, B. L., Mark, W. and Rizzo, D. M. 2008. Early detec-
tion of emerging forest disease using dispersal estimation and ecological niche
modeling. Ecological Applications, 18, 377–390.
Meier, E. S., Kienast, F., Pearman, P. B., et al. 2010. Biotic and abiotic variables
show little redundancy in explaining tree species distributions. Ecography, 33,
1038–1048.
Meier, E. S., Edwards, T. C., Kienast, F., Dobbertin, M. and Zimmermann, N. E. 2011.
Co-occurrence patterns of trees along macro-climatic gradients and their
potential influence on the present and future distribution of Fagus sylvatica L.
Journal of Biogeography, 38, 371–382.
Meier, E. S., Lischke, H., Schmatz, D. R. and Zimmermann, N. E. 2012. Climate,
competition and connectivity affect future migration and ranges of European
trees. Global Ecology and Biogeography, 21, 164–178.
Meller, L., Cabeza, M., Pironon, S., et al. 2014. Ensemble distribution models in
conservation prioritization: from consensus predictions to consensus reserve
networks. Diversity and Distributions, 20, 309–321.
Menard, S. W. 2002. Applied Logistic Regression Analysis. Thousand Oaks, CA: Sage.
Merow, C., Smith, M. J., Edwards, T. C., et al. 2014. What do we gain from simplicity
versus complexity in species distribution models? Ecography, 37, 1267–1281.
Mesgaran, M. B., Cousens, R. D. and Webber, B. L. 2014. Here be dragons: a
tool for quantifying novelty due to covariate range and correlation change
when projecting species distribution models. Diversity and Distributions, 20,
1147–1159.
Meyer, C., Kreft, H., Guralnick, R. and Jetz, W. 2015. Global priorities for an effec-
tive information basis of biodiversity distributions. Nature Communications, 6.
Meynard, C. N. and Kaplan, D. M. 2013. Using virtual species to study species distri-
butions and model performance. Journal of Biogeography, 40, 1–8.
Meynard, C. N. and Quinn, J. F. 2007. Predicting species distributions: a critical com-
parison of the most common statistical models using artificial species. Journal of
Biogeography, 34, 1455–1469.
Meynecke, J. O. 2004. Effects of global climate change on geographic distributions of
vertebrates in North Queensland. Ecological Modelling, 174, 347–357.
Mitchell, M. S., Lancia, R. A. and Gerwin, J. A. 2001. Using landscape-level data to
predict the distribution of birds on a managed forest: effects of scale. Ecological
Applications, 11, 1692–1708.
Mod, H. K., Scherrer, D., Luoto, M. and Guisan, A. 2016. What we use is not what
we know: environmental predictors in plant distribution models. Journal of
Vegetation Science, 27(6), 1308–1322.
Moisen, G. G. and Frescino, T. S. 2002. Comparing five modelling techniques for
predicting forest characteristics. Ecological Modelling, 157, 209–225.
Mokany, K., Harwood, T. D., Williams, K. J. and Ferrier, S. 2012. Dynamic macro-
ecology and the future for biodiversity. Global Change Biology, 18, 3149–3159.
Moloney, K. A. and Jeltsch, F. 2008. Space matters: novel developments in plant
ecology through spatial modelling. Perspectives in Plant Ecology Evolution and
Systematics, 9, 119–120.
Nogues-Bravo, D. 2009. Predicting the past distribution of species climatic niches.
Global Ecology and Biogeography, 18, 521–531.
Nogues-Bravo, D., Rodriguez, J., Hortal, J., Batra, P. and Araujo, M. B. 2008. Climate
change, humans, and the extinction of the woolly mammoth. PLoS Biology, 6,
685–692.
Normand, S., Treier, U. A., Randin, C., Vittoz, P., Guisan, A. and Svenning, J. C.
2009. Importance of abiotic stress as a range-limit determinant for European
plants: insights from species responses to climatic gradients. Global Ecology and
Biogeography, 18, 437–449.
Normand, S., Ricklefs, R. E., Skov, F., Bladt, J., Tackenberg, O. and Svenning, J.-C.
2011. Postglacial migration supplements climate in determining plant species
ranges in Europe. Proceedings of the Royal Society of London B: Biological Sciences,
278, 3644–3653.
Normand, S., Randin, C., Ohlemueller, R., et al. 2013. A greener Greenland?
Climatic potential and long-term constraints on future expansions of trees and
shrubs. Philosophical Transactions of the Royal Society B-Biological Sciences, 368.
O’Neill, R. V., DeAngelis, D. L., Waide, J. B. and Allen, T. F. H. 1986. A hierarchical
concept of ecosystems. Princeton, NJ: Princeton University Press.
Olden, J. D. 2003. A species-specific approach to modeling biological communities
and its potential for conservation. Conservation Biology, 17, 854–863.
Olwoch, J. M., Rautenbach, C. J. D., Erasmus, B. F. N., Engelbrecht, F. A. and van
Jaarsveld, A. S. 2003. Simulating tick distributions over sub-Saharan Africa: the
use of observed and simulated climate surfaces. Journal of Biogeography, 30,
1221–1232.
Osborne, P. E. and Suarez-Seoane, S. 2002. Should data be partitioned spatially
before building large-scale distribution models? Ecological Modelling, 157,
249–259.
Osborne, P. E., Foody, G. M. and Suárez-Seoane, S. 2007. Non-stationarity and local
approaches to modelling the distributions of wildlife. Diversity and Distributions,
13, 313–323.
Ottaviani, D., Lasinio, G. J. and Boitani, L. 2004. Two statistical methods to validate
habitat suitability models using presence-only data. Ecological Modelling, 179,
417–443.
Oulas, A., Pavloudi, C., Polymenakou, P., et al. 2015. Metagenomics: tools and
insights for analyzing next-generation sequencing data derived from biodiver-
sity studies. Bioinformatics and Biology Insights, 9, 75.
Ovaskainen, O. and Meerson, B. 2010. Stochastic models of population extinction.
Trends in Ecology and Evolution, 25, 643–652.
Ovaskainen, O. and Soininen, J. 2011. Making more out of sparse data: hierarchical
modeling of species communities. Ecology, 92, 289–295.
Pagel, J. and Schurr, F. M. 2012. Forecasting species ranges by statistical estima-
tion of ecological niches and spatial population dynamics. Global Ecology and
Biogeography, 21, 293–304.
Parviainen, M., Marmion, M., Luoto, M., Thuiller, W. and Heikkinen, R. K. 2009.
Using summed individual species models and state-of-the-art modelling tech-
niques to identify threatened plant species hotspots. Biological Conservation, 142,
2501–2509.
Pellissier, L., Fiedler, K., Ndiribe, C., et al. 2012b. Shifts in species richness,
herbivore specialization, and plant resistance along elevation gradients. Ecology and
Evolution, 2, 1818–1825.
Pellissier, L., Bråthen, K. A., Vittoz, P. A., et al. 2013a. Thermal niches are more
conserved at cold than warm limits in arctic-alpine plant species. Global Ecology and
Biogeography, 22, 933–941.
Pellissier, L., Meltofte, H., Hansen, J., et al. 2013b. Suitability, success and sinks: how
do predictions of nesting distributions relate to fitness parameters in high arctic
waders? Diversity and Distributions, 19, 1496–1505.
Pellissier, L., Pinto-Figueroa, E., Niculita-Hirzel, H., et al. 2013c. Plant species dis-
tributions along environmental gradients: do belowground interactions with
fungi matter? Frontiers in Plant Science, 4, 1–9.
Pellissier, L., Rohr, R. P., Ndiribe, C., et al. 2013d. Combining food web and species
distribution models for improved community projections. Ecology and Evolution,
3, 4572–4583.
Peltier, W. R. 2004. Global glacial isostasy and the surface of the ice-age Earth: the
ICE-5G (VM2) model and GRACE. Annual Review of Earth and Planetary Sciences,
32, 111–149.
Peppler-Lisbach, C. and Schroder, B. 2004. Predicting the species composition of
Nardus stricta communities by logistic regression modelling. Journal of Vegetation
Science, 15, 623–634.
Peters, J., De Baets, B., Verhoest, N. E. C., et al. 2007. Random forests as a tool for
ecohydrological distribution modelling. Ecological Modelling, 207, 304–318.
Peters, R. H. 1991. A Critique for Ecology. Cambridge, UK: Cambridge University Press.
Peterson, A. T. 2003. Predicting the geography of species’ invasions via ecological
niche modeling. Quarterly Review of Biology, 78, 419–433.
Peterson, A. T. 2006. Ecologic niche modeling and spatial patterns of disease trans-
mission. Emerging Infectious Diseases, 12, 1822–1826.
Peterson, A. T. 2011. Ecological niche conservatism: a time-structured review of
evidence. Journal of Biogeography, 38, 817–827.
Peterson, A. T., Ortega-Huerta, M. A., Bartley, J., et al. 2002a. Future projections for
Mexican faunas under global climatic change scenarios. Nature, 416, 626–629.
Peterson, A. T., Sanchez-Cordero, V., Ben Beard, C. and Ramsey, J. M. 2002b. Ecologic
niche modeling and potential reservoirs for Chagas disease, Mexico. Emerging
Infectious Diseases, 8, 662–667.
Peterson, A. T., Papes, M. and Soberon, J. 2008a. Rethinking receiver operating
characteristic analysis applications in ecological niche modeling. Ecological Modelling,
213, 63–72.
Peterson, A. T., Stewart, A., Mohamed, K. I. and Araújo, M. B. 2008b. Shifting global
invasive potential of European plants with climate change. PLoS One, 3, e2441.
Peterson, A. T., Soberon, J., Pearson, R. G., et al. 2011. Ecological Niches and Geographic
Distributions. Princeton, NJ: Princeton University Press.
Petitpierre, B., Kueffer, C., Broennimann, O., et al. 2012. Climatic niche shifts are
rare among terrestrial plant invaders. Science, 335, 1344–1348.
Petitpierre, B., McDougall, K., Seipel, T., et al. 2016. Will climate change increase
the risk of plant invasions into mountains? Ecological Applications, 26, 530–544.
Quetier, F., Rivoal, F., Marty, P., et al. 2010. Social representations of an alpine grass-
land landscape and socio-political discourses on rural development. Regional
Environmental Change, 10, 119–130.
Quinlan, J. R. 1986. Induction of decision trees. Machine Learning, 1, 81–106.
Ramankutty, N. and Foley, J. A. 1999. Estimating historical changes in global land
cover: croplands from 1700 to 1992. Global Biogeochemical Cycles, 13, 997–1027.
Randin, C. F., Dirnbock, T., Dullinger, S., et al. 2006. Are niche-based species dis-
tribution models transferable in space? Journal of Biogeography, 33, 1689–1703.
Randin, C. F., Engler, R., Normand, S., et al. 2009. Climate change and plant distri-
bution: local models predict high-elevation persistence. Global Change Biology,
15, 1557–1569.
Randin, C. F., Paulsen, J., Vitasse, Y., et al. 2013. Do the elevational limits of
deciduous tree species match their thermal latitudinal limits? Global Ecology and
Biogeography, 22, 913–923.
Raxworthy, C. J., Martinez-Meyer, E., Horning, N., et al. 2003. Predicting distri-
butions of known and unknown reptile species in Madagascar. Nature, 426,
837–841.
Regan, H. M., Hierl, L. A., Franklin, J., et al. 2008. Species prioritization for monitor-
ing and management in regional multiple species conservation plans. Diversity
and Distributions, 14, 462–471.
Reid, P. C., Lancelot, C., Gieskes, W. W. C., Hagmeier, E. and Weichart, G. 1990.
Phytoplankton of the North-Sea and its dynamics: a review. Netherlands Journal
of Sea Research, 26, 295–331.
Reineking, B. and Schroder, B. 2006. Constrain to perform: regularization of habitat
models. Ecological Modelling, 193, 675–690.
Renner, I. W. and Warton, D. I. 2013. Equivalence of MAXENT and Poisson point
process models for species distribution modeling in ecology. Biometrics, 69, 274–281.
Renner, I. W., Elith, J., Baddeley, A., et al. 2015. Point process models for presence-
only analysis. Methods in Ecology and Evolution, 6, 366–379.
Richardson, D. M., Pysek, P., Rejmanek, M., et al. 2000. Naturalization and invasion
of alien plants: concepts and definitions. Diversity and Distributions, 6, 93–107.
Ricklefs, R. E. 1987. Community diversity: relative roles of local and regional pro-
cesses. Science, 235, 167–171.
Ricklefs, R. E. 2008. Disintegration of the ecological community. The American
Naturalist, 172, 741–750.
Ridgeway, G. 1999. The state of boosting. Computing Science and Statistics, 31, 172–181.
Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge: Cambridge
University Press.
Robertson, M. P., Caithness, N. and Villet, M. H. 2001. A PCA-based modelling
technique for predicting environmental suitability for organisms from presence
records. Diversity and Distributions, 7, 15–27.
Robertson, M. P., Villet, M. H. and Palmer, A. R. 2004. A fuzzy classification tech-
nique for predicting species’ distributions: applications using invasive alien
plants and indigenous insects. Diversity and Distributions, 10.
Robinson, A. P., Lane, S. E. and Thérien, G. 2011. Fitting forestry models using
generalized additive models: a taper model example. Canadian Journal of Forest
Research, 41, 1909–1916.
Schwartz, M. W. 2012. Using niche models with climate projections to inform con-
servation management decisions. Biological Conservation, 155, 149–156.
Schwarz, M. and Zimmermann, N. E. 2005. A new GLM-based method for mapping
tree cover continuous fields using regional MODIS reflectance data. Remote
Sensing of Environment, 95, 428–443.
Scott, J. M., Davis, F., Csuti, B., et al. 1993. Gap analysis: a geographic approach to
protection of biological diversity. Wildlife Monographs, 123, 1–41.
Scott, J. M., Davis, F. W., McGhie, R. G., et al. 2001. Nature reserves: do they cap-
ture the full range of America’s biological diversity? Ecological Applications, 11,
999–1007.
Scott, J. M., Heglund, P. J., Haufler, J. B., et al. (eds) 2002. Predicting Species
Occurrences: Issues of Accuracy and Scale, Covelo, CA: Island Press.
Segurado, P. and Araujo, M. B. 2004. An evaluation of methods for modelling species
distributions. Journal of Biogeography, 31, 1555–1568.
Segurado, P., Araujo, M. B. and Kunin, W. E. 2006. Consequences of spatial autocor-
relation for niche-based models. Journal of Applied Ecology, 43, 433–444.
Seo, C., Thorne, J. H., Hannah, L. and Thuiller, W. 2009. Scale effects in species
distribution models: implications for conservation planning under climate change.
Biology Letters, 5, 39–43.
Serra-Diaz, J. M., Franklin, J., Ninyerola, M., et al. 2014. Bioclimatic velocity: the pace
of species exposure to climate change. Diversity and Distributions, 20, 169–180.
Serra-Varela, M., Grivet, D., Vincenot, L., Broennimann, O., et al. 2015. Does
phylogeographical structure relate to climatic niche divergence? A test using
maritime pine (Pinus pinaster Ait.). Global Ecology and Biogeography, 24, 1302–1313.
Sexton, J. O., Song, X.-P., Feng, M., et al. 2013. Global, 30-m resolution continuous
fields of tree cover: Landsat-based rescaling of MODIS vegetation continuous
fields with lidar-based estimates of error. International Journal of Digital Earth, 6,
427–448.
Shipley, B., Vile, D. and Garnier, E. 2006. From plant traits to plant communities: a
statistical mechanistic approach to biodiversity. Science, 314, 812–814.
Shipley, B., Laughlin, D. C., Sonnier, G. and Otfinowski, R. 2011. A strong test of
a maximum entropy model of trait-based community assembly. Ecology, 92,
507–517.
Silvertown, J. 2004. Plant coexistence and the niche. Trends in Ecology and Evolution,
19, 605–611.
Silvertown, J., Dodd, M., Gowing, D., Lawson, C. and McConway, K. 2006. Phylogeny
and the hierarchical organization of plant diversity. Ecology, 87, S39–S49.
Simmons, R. E., Barnard, P., Dean, W. R. J., et al. 2004. Climate change and birds:
perspectives and prospects from southern Africa. Ostrich, 75, 295–308.
Smith, T. M. and Smith, R. L. 2015. Elements of Ecology, 9th edn. San Francisco,
CA: Pearson Education Ltd.
Snell, R. S., Huth, A., Nabel, J. E. M. S., et al. 2014. Using dynamic vegetation models
to simulate plant range shifts. Ecography, 37, 1184–1197.
Soberón, J. 2007. Grinnellian and Eltonian niches and geographic distributions of
species. Ecology Letters, 10, 1115–1123.
Soberón, J. and Nakamura, M. 2009. Niches and distributional areas: concepts,
methods, and assumptions. Proceedings of the National Academy of Sciences, 106,
19644–19650.
Tautz, D. 2003. Evolutionary biology: splitting in space. Nature, 421, 225–226.
Team, R. C. 2014. R: A Language and Environment for Statistical Computing. Vienna,
Austria: R Foundation for Statistical Computing. www.R-project.org/
Tebaldi, C. and Knutti, R. 2007. The use of the multi-model ensemble in probabilistic
climate projections. Philosophical Transactions of the Royal Society A: Mathematical
Physical and Engineering Sciences, 365, 2053–2075.
ter Braak, C. J. F. 1986. Canonical correspondence analysis: a new eigenvector tech-
nique for multivariate direct gradient analysis. Ecology, 67, 1167–1179.
Thibaud, E., Petitpierre, B., Broennimann, O., Davison, A. C. and Guisan, A. 2014.
Measuring the relative effect of factors affecting species distribution model
predictions. Methods in Ecology and Evolution, 5(9), 947–955.
Thomas, C. D., Cameron, A., Green, R. E., et al. 2004. Extinction risk from climate
change. Nature, 427, 145–148.
Thornton, P. E. and Running, S. W. 1999. An improved algorithm for estimating
incident daily solar radiation from measurements of temperature, humidity, and
precipitation. Agricultural and Forest Meteorology, 93, 211–228.
Thornton, P. E., Running, S. W. and White, M. A. 1997. Generating surfaces of
daily meteorological variables over large regions of complex terrain. Journal of
Hydrology, 190, 214–251.
Thornton, P. E., Thornton, M. M., Mayer, B. W., et al. 2017. Daymet: daily sur-
face weather data on a 1-km grid for North America. Version 3. Oak Ridge,
TN: ORNL DAAC. Available at: https://doi.org/10.3334/ORNLDAAC/
1328.
Thuiller, W. 2003. BIOMOD: optimizing predictions of species distributions and
projecting potential future shifts under global change. Global Change Biology, 9, 1353–1362.
Thuiller, W. 2004. Patterns and uncertainties of species’ range shifts under climate
change. Global Change Biology, 10, 2020–2027.
Thuiller, W. 2007. Biodiversity: climate change and the ecologist. Nature, 448,
550–552.
Thuiller, W., Araujo, M. B. and Lavorel, S. 2003a. Generalized models vs. classification
tree analysis: predicting spatial distributions of plant species at different scales.
Journal of Vegetation Science, 14, 669–680.
Thuiller, W., Vayreda, J., Pino, J., et al. 2003b. Large-scale environmental corre-
lates of forest tree distributions in Catalonia (NE Spain). Global Ecology and
Biogeography, 12, 313–325.
Thuiller, W., Araujo, M. B., Pearson, R. G., et al. 2004a. Uncertainty in predictions of
extinction risk. Nature, 430, 34.
Thuiller, W., Brotons, L., Araújo, M. B. and Lavorel, S. 2004b. Effects of restricting
environmental range of data to project current and future species distributions.
Ecography, 27, 165–172.
Thuiller, W., Lavorel, S., Midgley, G., Lavergne, S. and Rebelo, T. 2004c. Relating plant
traits and species distributions along bioclimatic gradients for 88 Leucadendron
taxa. Ecology, 85, 1688–1699.
Thuiller, W., Lavorel, S., Araujo, M. B., Sykes, M. T. and Prentice, I. C. 2005a. Climate
change threats to plant diversity in Europe. Proceedings of the National Academy of
Sciences of the United States of America, 102, 8245–8250.
Van Horne, B. 2002. Approaches to Habitat Modeling: The Tensions between
Pattern and Process and between Specificity and Generality. In Scott, J. M.,
Heglund, P. J., Haufler, J. B., et al. (eds), Predicting Species Occurrences: Issues of
Accuracy and Scale. Covelo, CA: Island Press.
Van Niel, K. P. and Austin, M. P. 2007. Predictive vegetation modeling for con-
servation: Impact of error propagation from digital elevation data. Ecological
Applications, 17, 266–280.
Vandermeer, J. H. 1972. Niche theory. Annual Review of Ecology and Systematics, 3,
107–132.
VanDerWal, J., Shoo, L. P., Graham, C. and Williams, S. E. 2009. Selecting
pseudo-absence data for presence-only distribution modeling: how far should you stray
from what you know? Ecological Modelling, 220, 589–594.
Vaughan, I. P. and Ormerod, S. J. 2005. The continuing challenges of testing species
distribution models. Journal of Applied Ecology, 42, 720–730.
Veloz, S. D. 2009. Spatially autocorrelated sampling falsely inflates measures of accu-
racy for presence-only niche models. Journal of Biogeography, 36, 2290–2299.
Venables, W. N. and Ripley, B. D. 2002. Modern Applied Statistics with S. Dordrecht,
The Netherlands: Springer.
Verbyla, D. L. and Litvaitis, J. A. 1989. Resampling methods for evaluating classification
accuracy of wildlife habitat models. Environmental Management, 13, 783–787.
Verner, J., Morrison, M. L. and Ralph, C. J. 1986. Wildlife 2000: Modelling Habitat
Relationships of Terrestrial Vertebrates, Madison, WI: University of Wisconsin Press.
Vesk, P. A. 2013. How traits determine species responses to environmental gradients.
Journal of Vegetation Science, 24, 977–978.
Vetaas, O. R. 2002. Realized and potential climate niches: a comparison of four
Rhododendron tree species. Journal of Biogeography, 29, 545–554.
Vicente, J., Fernandes, R., Randin, C., et al. 2013. Will climate change drive alien
invasive plants into areas of high protection value? An improved model-based
regional assessment to prioritise the management of invasions. Journal of
Environmental Management, 131, 185–195.
Vincent, P. J. and Haworth, J. M. 1983. Poisson regression models of species abun-
dance. Journal of Biogeography, 10, 153–160.
Vittoz, P. and Engler, R. 2007. Seed dispersal distances: a typology based on dispersal
modes and plant traits. Botanica Helvetica, 117, 109–124.
Walther, G. R., Berger, S. and Sykes, M. T. 2005. An ecological ‘footprint’ of climate
change. Proceedings of the Royal Society B: Biological Sciences, 272, 1427–1432.
Ward, G., Hastie, T., Barry, S., Elith, J. and Leathwick, J. R. 2009. Presence-only data
and the EM algorithm. Biometrics, 65, 554–563.
Warren, D. L., Glor, R. E. and Turelli, M. 2008. Environmental niche equivalency
versus conservatism: quantitative approaches to niche evolution. Evolution, 62,
2868–2883.
Warton, D. I., Blanchet, F. G., O’Hara, R. B., et al. 2015. So many variables: joint modeling
in community ecology. Trends in Ecology and Evolution, 30, 766–779.
Weiher, E. and Keddy, P. 2001. Ecological Assembly Rules: Perspectives, Advances, Retreats,
Cambridge, UK: Cambridge University Press.
Weisberg, S. 1980. Applied Linear Regression, New York, NY: Wiley.
Wisz, M. S., Pottier, J., Kissling, W. D., et al. 2013. The role of biotic interactions in
shaping distributions and realised assemblages of species: implications for spe-
cies distribution modelling. Biological Reviews, 88, 15–30.
Wisz, M. S., Broennimann, O., Grønkjær, P., et al. 2015. Arctic warming will promote
Atlantic–Pacific fish interchange. Nature Climate Change, 5, 261–265.
Wolmarans, R., Robertson, M. P. and van Rensburg, B. J. 2010. Predicting invasive
alien plant distributions: how geographical bias in occurrence records influ-
ences model performance. Journal of Biogeography, 37, 1797–1810.
Wood, S. 2006. Generalized Additive Models: An Introduction with R. London: CRC Press.
Wood, S. N., Goude, Y. and Shaw, S. 2015. Generalized additive models for large data
sets. Journal of the Royal Statistical Society: Series C (Applied Statistics), 64, 139–155.
Woodward, F. I. 1987. Climate and Plant Distribution, Cambridge, UK: Cambridge
University Press.
Woodward, F. I. and Kelly, C. K. 2003. Why are species not more widely distributed?
Physiological and environmental limits. In Blackburn, T. M. and Gaston, K. J.
(eds), Macroecology. Oxford, UK: Blackwell.
Wu, J. 1999. Hierarchy and scaling: extrapolating information along a scaling ladder.
Canadian Journal of Remote Sensing, 25, 367–380.
Wu, J. G. 2004. Effects of changing scale on landscape pattern analysis: scaling rela-
tions. Landscape Ecology, 19, 125–138.
Yañez-Arenas, C., Peterson, A. T., Mokondoko, P., Rojas-Soto, O. and Martínez-
Meyer, E. 2014. The use of ecological niche modeling to infer potential risk
areas of snakebite in the Mexican state of Veracruz. PLoS One, 9, e100957.
Yates, C. J., McNeill, A., Elith, J. and Midgley, G. F. 2010. Assessing the impacts of cli-
mate change and land transformation on Banksia in the South West Australian
Floristic Region. Diversity and Distributions, 16, 187–201.
Zanini, F., Pellet, J. and Schmidt, B. R. 2009. The transferability of distribution models
across regions: an amphibian case study. Diversity and Distributions, 15, 469–480.
Zhang, J., Hu, J., Lian, J., et al. 2016. Seeing the forest from drones: testing the poten-
tial of lightweight drones as a tool for long-term forest monitoring. Biological
Conservation, 198, 60–69.
Zhu, G. and Peterson, A. T. 2014. Potential geographic distribution of the novel
avian-origin influenza A (H7N9) Virus. PLoS One, 9, e93390.
Zimmermann, N. E., Yoccoz, N. G., Edwards, T. C., et al. 2009. Climatic extremes
improve predictions of spatial patterns of tree species. Proceedings of the National
Academy of Sciences of the United States of America, 106, 19723–19728.
Zimmermann, N. E., Edwards, T. C., Graham, C. H., Pearman, P. B. and Svenning,
J.-C. 2010. New trends in species distribution modelling. Ecography, 33, 985–989.
Zobel, M. 1997. The relative role of species pools in determining plant species rich-
ness: an alternative explanation of species coexistence. Trends in Ecology and
Evolution, 12, 266–269.
Zurell, D., Berger, U., Cabral, J. S., et al. 2010. The virtual ecologist approach: simu-
lating data and observers. Oikos, 119, 622–635.
Zurell, D., Elith, J. and Schröder, B. 2012. Predicting to new environments: tools for
visualizing model behaviour and impacts on mapped distributions. Diversity and
Distributions, 18, 628–634.
Index
conservation planning, 67, 147, 336
contrast validation index, 265
cor (R function in package stats), 264
correlation coefficient
  Pearson correlation coefficient, 243, 264
  point-biserial correlation coefficient, 264
cowplot (R package), 214
cross-validation, 18, 117, 171, 190, 211, 228, 266, 271, 275
cv.glm (R function in package boot), 281, 285

Daim (R function in package Daim), 282, 294
Daim (R package), 281, 294
Daim.control (R function in package Daim), 282, 294
degrees of freedom, 112, 113, 406
DEM. See digital elevation models
digital elevation models, 59, 62, 80, 104, 128
discrete variable, 194
dismo (R package), 71, 87, 153, 217, 320
dudi.pca (R function in package ade4), 161, 307, 320, 364

environment, 21, 41, 53, 129, 303
environmental data, 11, 22, 31, 42, 43, 44, 83, 86, 113, 138, 323, 324, 332, 351
environmental map. See environmental data
environmental niche, 11, 22, 36, 37, 46, 53, 311
environmental predictor. See environmental data
environmental space, 44, 46, 121, 313, 316
environmental variable. See environmental data
error, 56, 102, 137, 198, 211, 242, 252
error.threshold.plot (R function in package PresenceAbsence), 260
evaluation, 17, 89, 117, 237, 299
explanatory variables, 152
extent, 19, 59, 71, 80, 132, 149, 308

fields (R package), 317
Find.Optim.Stat (R function in package biomod2), 259, 338
functional traits, 49, 409
fundamental niche, 31, 312, 408
prevalence, 117, 245, 253
principal component analysis, 362
ProbDensFunc (R function in package biomod2), 382, 384
pROC (R package), 161
projection, 62, 301
pseudo-absence, 112, 131, 265, 335
pseudo-replication, 112, 272

random forests, 202
random sampling, 125, 146, 367
randomForest (R function in package randomForest), 207, 232, 247, 282, 339
randomForest (R package), 207, 212, 247, 282, 336
randomization, 272
range filling, 29, 53, 141, 330
rare species, 210, 406
raster, 70, 102, 121, 136, 316
raster (R package), 71, 317, 362
rasterVis (R package), 382, 391
realized environment, 39, 41, 54, 303
realized niche, 32, 36, 39, 42, 54, 308
receiver-operating characteristic curve, 230, 262, 263
regression coefficient, 108, 113, 171, 293
remote sensing, 92, 404
resample (R function in package raster), 73
resampling, 18, 73, 111, 237, 239, 270
resolution, 52, 59, 306
resource selection functions, 8
response curve, 11, 30, 44, 166, 168, 175, 186, 193, 308
response.plot2 (R function in package biomod2), 168, 176, 193, 201, 373
resubstitution, 239, 270
RF. See random forests
rgbif (R package), 358, 386
rgdal (R package), 71, 76, 123, 128
RMSE. See root mean square sum of errors
roc (R function in package pROC), 161, 206
ROC plot. See receiver-operating characteristic curve
root mean square sum of errors, 243, 272
rpart (R package), 190, 191, 231
rpart (R function in package rpart), 191, 195, 203
rpart.control (R function in package rpart), 191, 203
RSF. See resource selection functions

SAC. See spatial autocorrelation
sample (R function), 287
sample size, 112, 116
sampling bias, 131, 238
sampling design, 110, 120
SAVI. See soil adjusted vegetation index
scale, 135
SDM. See species distribution models
sensitivity, 116, 156, 239, 256, 282, 294
set.seed (R function), 198, 203, 280
similarity, 314
smoothing, 176, 210, 267
snowfall (R package), 397
soil adjusted vegetation index, 98
sp (R package), 71
spatial autocorrelation, 54, 59, 111, 112, 238
spatial prediction, 46, 152, 301
spatial scale, 139
species data, 19, 53, 133
species distribution models, 8
species range envelope, 156
species turnover, 48
species–environment, 20, 26, 42, 52, 138, 325
specificity, 239, 256, 282, 294
SRE. See species range envelope
S-SDMs. See stacked species distribution models
stack, 46, 70
standard deviation, 227
standard error, 293, 318, 327
step.gam (R function in package gam), 178
stepAIC (R function in package MASS), 171, 178
summary (R function), 78, 83, 91, 180, 183, 212, 293
summary.gbm (R function in package gbm), 212
support vector machine, 188