Applied Geospatial Data Science with Python: Leverage geospatial data analysis and modeling to find unique solutions to environmental problems
Ebook, 590 pages, 3 hours

About this ebook

Data scientists, when presented with a myriad of data, can often lose sight of how to present geospatial analyses in a meaningful way so that they make sense to everyone. Using Python to visualize data helps stakeholders in less technical roles to understand the problem and seek solutions. The goal of this book is to help data scientists and GIS professionals learn and implement geospatial data science workflows using Python.
Throughout this book, you’ll uncover numerous geospatial Python libraries with which you can develop end-to-end spatial data science workflows. You’ll learn how to read, process, and manipulate spatial data effectively. With data in hand, you’ll move on to crafting spatial data visualizations to better understand and tell the story of your data through static and dynamic mapping applications. As you progress through the book, you’ll find yourself developing geospatial AI and ML models focused on clustering, regression, and optimization. The use cases can be leveraged as building blocks for more advanced work in a variety of industries.
By the end of the book, you’ll be able to tackle random data, find meaningful correlations, and make geospatial data models.

Language: English
Release date: Feb 28, 2023
ISBN: 9781803240343

    Book preview

    Applied Geospatial Data Science with Python - David S. Jordan

    BIRMINGHAM—MUMBAI

    Applied Geospatial Data Science with Python

    Copyright © 2023 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Publishing Product Manager: Dinesh Chaudhary

    Content Development Editor: Shreya Moharir

    Technical Editor: Devanshi Ayare

    Copy Editor: Safis Editing

    Project Coordinator: Farheen Fathima

    Proofreader: Safis Editing

    Indexer: Pratik Shirodkar

    Production Designer: Ponraj Dhandapani

    Marketing Coordinators: Shifa Ansari, Vinishka Kalra

    First published: February 2023

    Production reference: 1270123

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-80323-812-8

    www.packtpub.com

    To those who came before me and those who come after, leaving an indelible mark on the world and making it a better place.

    Acknowledgments

    I’d like to acknowledge the many teachers, professors, and mentors who have shared their infinite knowledge and wisdom with me throughout the years. Without them, my path through life would likely be completely different. I’d like to start with Dr. Jennifer Stuart, whose marketing analytics course introduced me to the power of data science for the first time. I’d also like to thank Dr. Connie Rothwell and Delbridge Narron for supporting me in my honors thesis and encouraging me to question and challenge the world around me. I’d also like to thank the staff and professors at the Institute for Advanced Analytics at North Carolina State University for providing me with a first-class applied data science education.

    I’d be remiss if I didn’t thank my friends and family for supporting me in this endeavor, as well as all of the other crazy leaps I’ve taken throughout my personal and professional life – most importantly, my older brother, Jeff Jordan, who has encouraged me in my successes and failures and listened to me vent in my times of need. Thank you for also continuing to be the intellectual springboard that has led to some of my best thinking.

    Lastly, I’d like to thank the many developers who are actively and often thanklessly building out the open source spatial data science ecosystem. Without these developers, this book would not have been possible, as we wouldn’t have had the tools or data at our disposal. I encourage you, the reader of this book and burgeoning spatial data scientist, to give back to the community in any way possible in the future.

    Contributors

    About the author

    David S. Jordan has made a career out of applying spatial thinking to tough problem spaces in the domains of real estate planning, disaster response, social equity, and climate change. He currently leads distribution and geospatial data science at JPMorgan Chase & Co. In addition to leading and building out geospatial data science teams, David is a patented inventor of new geospatial analytics processes, a winner of a Special Achievement in GIS (SAG) Award from Esri, and a conference speaker on topics including banking deserts and how great businesses leverage GIS.

    About the reviewer

    Rohit Singh has been working in the field of geospatial analysis and modeling for the past 5 years, and he is currently working as a geospatial data scientist at Near Intelligence Pvt. Ltd. In his current role, he is in charge of all spatial data features and engineering pipelines, as well as curating spatial data from different sources and developing optimized methods for spatial data processing. He has developed methods in Apache Spark to handle the processing and modeling of spatial big data, and he regularly contributes to the GIS and data science communities. He developed a Python module to optimize the geohash generation process for polygons. In the future, he wants to contribute more to spatial data science by developing more spatial data methods and models.

    Table of Contents

    Preface

    Part 1: The Essentials of Geospatial Data Science

    1

    Introducing Geographic Information Systems and Geospatial Data Science

    What is GIS?

    What is data science?

    Mathematics

    Computer science

    Industry and domain knowledge

    Soft skills

    What is geospatial data science?

    Summary

    2

    What Is Geospatial Data and Where Can I Find It?

    Static and dynamic geospatial data

    Geospatial file formats

    Vector data

    Raster data

    Introducing geospatial databases and storage

    PostgreSQL and PostGIS

    ArcGIS geodatabase

    Exploring open geospatial data assets

    Human geography

    Physical geography

    Country- and area-specific data

    Summary

    3

    Working with Geographic and Projected Coordinate Systems

    Technical requirements

    Exploring geographic coordinate systems

    Understanding GCS versions

    Understanding projected coordinate systems

    Common types of projected coordinate systems

    Working with GCS and PCS in Python

    PyProj

    GeoPandas

    Summary

    4

    Exploring Geospatial Data Science Packages

    Technical requirements

    Packages for working with geospatial data

    GeoPandas

    GDAL

    Shapely

    Fiona

    Rasterio

    Packages enabling spatial analysis and modeling

    PySAL

    Packages for producing production-quality spatial visualizations

    ipyLeaflet

    Folium

    geoplot

    GeoViews

    Datashader

    Reviewing foundational data science packages

    pandas

    scikit-learn

    Summary

    Part 2: Exploratory Spatial Data Analysis

    5

    Exploratory Data Visualization

    Technical requirements

    The fundamentals of ESDA

    Example – New York City Airbnb listings

    Conducting EDA

    ESDA

    Summary

    6

    Hypothesis Testing and Spatial Randomness

    Technical requirements

    Constructing a spatial hypothesis test

    Understanding spatial weights and spatial lags

    Global spatial autocorrelation

    Local spatial autocorrelation

    Point pattern analysis

    Ripley’s alphabet functions

    Summary

    7

    Spatial Feature Engineering

    Technical requirements

    Defining spatial feature engineering

    Performing a bit of geospatial magic

    Engineering summary spatial features

    Summary spatial features using one dataset

    Summary spatial features using two datasets

    Engineering proximity spatial features

    Proximity spatial features – NYC attractions

    Summary

    Part 3: Geospatial Modeling Case Studies

    8

    Spatial Clustering and Regionalization

    Technical requirements

    Collecting geodemographic data for modeling

    Extracting data using the Census API

    Cleaning the extracted data

    Conducting EDA and ESDA

    Developing geodemographic clusters

    K-means geodemographic clustering

    Agglomerative hierarchical geodemographic clustering

    Spatially constrained agglomerative hierarchical geodemographic clustering

    Measuring model performance

    Summary

    9

    Developing Spatial Regression Models

    Technical requirements

    A refresher on regression models

    Constructing an initial regression model

    Exploring unmodeled spatial relationships

    Teaching the model to think spatially

    Incorporating spatial fixed effects within the model

    Introduction to GWR models

    Fitting a GWR model to predict nightly Airbnb prices

    Introduction to Multiscale Geographically Weighted Regression

    Fitting an MGWR model to predict nightly Airbnb prices

    How do I choose between these models?

    Summary

    10

    Developing Solutions for Spatial Optimization Problems

    Technical requirements

    Exploring the Location Set Covering Problem (LSCP)

    Understanding the math behind the LSCP

    Solving LSCPs

    Exploring route-based combinatorial optimization problems

    Understanding the math behind the TSP

    Setting up the Google Maps API

    Solving the TSP

    Exploring a single-vehicle Vehicle Routing Problem (VRP)

    Exploring a Capacitated Vehicle Routing Problem (CVRP)

    Summary

    11

    Advanced Topics in Spatial Data Science

    Technical requirements

    Efficient operations with spatial indexing

    Implementing R-tree indexing in GeoPandas

    Introducing the H3 spatial index

    Estimating unknowns with spatial interpolation

    Applying Inverse Distance Weighted (IDW) interpolation

    Introduction to Kriging-based interpolation

    Ethical spatial data science

    Example 1 – Sharpiegate

    Example 2 – Human mobility: The New York Times investigative report

    Example 3 – COVID-19 contact tracing

    Example 4 – United States Census Bureau disclosure avoidance system

    Summary

    Index

    Other Books You May Enjoy

    Preface

    By the time this book has been published, the world will have just formally exited a global pandemic, and society as a whole will be trying to grapple with the new normal in the post-COVID era. During the depths of the pandemic, spatial analysis was featured in prime time through the great work of Johns Hopkins University (JHU)’s COVID-19 dashboard, which can be found at https://coronavirus.jhu.edu/map.html. The JHU dashboard monitored the spread of the virus across the globe in near real time, and this map was likely the first time that the masses were exposed to the power of spatial analysis, spatial data visualization, and spatial data science. However, spatial analysis has long been used to analyze the spread of diseases. In fact, way back in 1854, John Snow produced a map of cholera deaths in London, which allowed him to show that cholera was spread through germs in water wells and not through miasma in the air, as many thought during that time.

    Reeling from this global pandemic is not the only problem that our modern society faces. Today, supply chain issues that face economies across the globe are driving inflation to heights not seen in several decades. In addition to this, climate change is causing major rivers across the globe to dry up, including the Colorado and Mississippi rivers in the United States, the Yangtze in China, the Rhine in Germany, and the Danube in Romania. Climate change is also leading to more extreme weather events, yielding devastating flooding in areas such as Florida in the United States and Pakistan in South Asia.

    We are also living through a time in which more and more people are willing to stand up for equity and call out inequities when they see them. In the United States and across the world, teams of people are researching high-profile inequities in terms of the global food supply, healthcare access, and financial services. Others are looking into lesser-known inequities, such as urban heat islands and lack of shade. Collectively, teams of this kind are working hard to ensure that future generations won’t face the inequities of their forefathers.

    We now have the data, tools, and technology to begin to do something about each of these problems. Spatial analysis and data science have the potential to provide enormous value in helping us find solutions, perform resiliency planning, and better educate ourselves and those around us. However, while performing spatial analysis and producing compelling visualizations is now easier than ever, it is not without risks. By nature, maps and spatial data are representations of real-world processes; they are often incomplete and can easily be manipulated, which distorts the truth. One recent example of map manipulation has since been dubbed Sharpiegate, in which then-President Donald Trump altered a NOAA hurricane path map with a Sharpie in defiance of the scientific community. While this example may seem comical, there are many nuances to spatial analysis, data science, and cartography that you’ll need to be aware of as a burgeoning spatial data scientist.

    This book is written for data scientists seeking to incorporate geospatial analysis into their work and for geographic information system (GIS) professionals seeking to incorporate data science methods into their work. Our goal is that this text will help these communities to develop a common understanding and shared vernacular, enabling them to properly incorporate geographic context into modeling, analysis, and visualization.

    This book will begin with the fundamentals of GIS and data science before moving into detailed examples of spatial data science workflows built upon practical applications of geospatial data science that are industry agnostic. We will begin by teaching you the fundamentals of sourcing and working with geospatial data. Building upon this, we will teach you how to integrate spatial data and spatial thinking into your data science processes to hopefully improve model performance and develop a more accurate representation of the world around us.

    We hope that you, as a member of the next generation of spatial data scientists, are empowered to leverage spatial thinking and analysis, which may help us find solutions to the problems currently facing our society and better prepare for the future ahead.

    Who this book is for

    This book is for you if you are a data scientist seeking to incorporate geospatial thinking into your workflows or a GIS professional seeking to incorporate data science methods into yours. You’ll need to have a foundational knowledge of Python for data analysis and/or data science.

    What this book covers

    Chapter 1, Introducing Geographic Information Systems and Geospatial Data Science, lays the foundations for the book by introducing you to GIS and its commonalities with and differences from geospatial data science. In this chapter, we also walk through the data science pipeline that you’ll follow throughout the book.

    Chapter 2, What Is Geospatial Data and Where Can I Find It?, introduces you to common geospatial data types and formats that you’ll work with throughout your geospatial data science workflows. In this chapter, we’ll also introduce various categories of geospatial data, ranging from human geography to country- and area-specific data.

    Chapter 3, Working with Geographic and Projected Coordinate Systems, will introduce you to geographic and projected coordinate systems and help you avoid some of the most common pitfalls of working with geospatial data.

    Chapter 4, Exploring Geospatial Data Science Packages, covers a wide variety of Python geospatial data science packages that allow you to perform spatial data processing, analysis, visualization, and modeling.

    Chapter 5, Exploratory Data Visualization, shows you how to harness the power of spatial data to create compelling static and dynamic mapping applications.

    Chapter 6, Hypothesis Testing and Spatial Randomness, introduces you to the topic of complete spatial randomness and a variety of statistical tests to better understand whether your data reflects patterns across space.

    Chapter 7, Spatial Feature Engineering, will walk you through how to derive new spatial-based features known as summary spatial features and proximity spatial features from both tabular and geo-enabled data assets.

    Chapter 8, Spatial Clustering and Regionalization, introduces you to a class of unsupervised machine learning models known as clustering models, through which you’ll create spatial clusters and regions from your data.

    Chapter 9, Developing Spatial Regression Models, will open your eyes to the power that spatial data can bring to regression models through the incorporation of spatial effects.

    Chapter 10, Developing Solutions for Spatial Optimization Problems, will show you how to use linear programming in combination with spatial data to solve problems such as the Vehicle Routing Problem and the Location Set Covering Problem.

    Chapter 11, Advanced Topics in Spatial Data Science, covers more advanced topics in spatial feature engineering, spatial modeling, and spatial ethics.

    To get the most out of this book

    We assume that you, as a reader of this book, come from a background in either data science or GIS. We also expect that you have some foundational knowledge of working with Python.

    Additionally, you will need to set up keys to several APIs, from which you will access data throughout the book.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    The quality of the hardware can impact the runtime for some analyses, as is the case for most data science activities. As such, we recommend hardware similar to or better than the following to prevent any potential issues:

    NVIDIA GeForce GTX 1050

    16 GB RAM

    We recommend that you use Anaconda as your Python environment and package manager. To begin installing the Anaconda Distribution, you’ll want to visit the Anaconda Distribution installation website at https://docs.anaconda.com/anaconda/install/. The Python version we are using throughout this book is 3.10.6, as this is one of the latest versions of Python available at the time of publication. Leveraging this version will ensure that all packages are compatible. To make the setup of your virtual environment as streamlined as possible, we’ve exported our environment.yml file and uploaded it to the GitHub repository at https://github.com/PacktPublishing/Applied-Geospatial-Data-Science-with-Python.

    To set up the virtual environment called GeospatialPython, launch Anaconda Prompt and execute the following command:

    conda env create --file environment.yml

    You’ll need to replace environment.yml with the full path of the downloaded file.

    After the environment is installed, you can activate it by executing the following command:

    conda activate GeospatialPython
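Once the environment is active, it can be worth confirming that the interpreter matches the Python version the book was written against. The following is a minimal sketch, not part of the book's code: the function name `matches_book_python` is hypothetical, and comparing only the major and minor version (3.10) rather than the exact 3.10.6 patch release is an assumption made here.

```python
import sys

# The book states it uses Python 3.10.6. Comparing only major.minor is an
# assumption made in this sketch so that other 3.10.x patch releases pass too.
EXPECTED_MAJOR_MINOR = (3, 10)


def matches_book_python(version_info=sys.version_info):
    """Return True when the interpreter shares the book's major.minor version."""
    return tuple(version_info[:2]) == EXPECTED_MAJOR_MINOR


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if matches_book_python():
        print(f"Python {current}: same minor version as the book")
    else:
        print(f"Python {current}: differs from the book's 3.10.6")
```

Running this inside the activated environment gives a quick sanity check before opening the notebooks.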

    Throughout the book, you’ll see the following code:

    data_path = r'YOUR FILE PATH'

    Anytime you see this, you’ll need to replace 'YOUR FILE PATH' with the path to the data folder, which can be downloaded from the GitHub repo. The data stored in the GitHub repo can be found in the Releases section or by visiting https://github.com/PacktPublishing/Applied-Geospatial-Data-Science-with-Python/releases. There are three parts to the data:

    Data.pt1.zip

    LCMS_CONUS_v2021-7_Land_Cover_Annual_2021.zip

    S2B_MSIL2A_20220504T161829_N0400_R040_T17TNF_20220504T210702.SAFE.zip

    You’ll need to extract the contents of these zip folders and store the contents in a single folder. You’ll then point to this folder any time you see ‘YOUR FILE PATH’ referenced in the Jupyter notebooks.
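The extraction step can also be scripted rather than done by hand. The sketch below is one possible approach using only the standard library. The function name `extract_archives` and the folder names in the usage comment are hypothetical, and it assumes the downloaded zip files all sit in one download folder.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir, data_dir):
    """Extract every .zip file found in download_dir into the single folder data_dir."""
    download_dir = Path(download_dir)
    data_dir = Path(data_dir)
    # Create the shared data folder if it does not exist yet.
    data_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(download_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
        extracted.append(archive.name)
    return extracted


# Example usage (hypothetical folders; replace with your own):
# extract_archives(r'YOUR DOWNLOAD FOLDER', r'YOUR FILE PATH')
```

The function returns the names of the archives it processed, which makes it easy to confirm that all three parts of the data were found.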

    Similarly, you will also see the following code from time to time:

    out_path = r'YOUR FILE PATH'

    You’ll need to substitute YOUR FILE PATH in this code reference with the directory to which you’d like the output to be saved.
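The `r` prefix on these strings marks them as raw strings, so backslashes in Windows-style paths are kept literally rather than read as escape sequences. As a rough illustration of how the two placeholders might then be used, the sketch below builds file paths with `pathlib`. The helper names `input_file` and `output_file` are hypothetical and do not come from the book's notebooks.

```python
from pathlib import Path

# Placeholders as they appear in the book; replace them with real folders.
data_path = r'YOUR FILE PATH'
out_path = r'YOUR FILE PATH'


def input_file(folder, filename):
    """Build the full path to a file inside the data folder."""
    return Path(folder) / filename


def output_file(folder, filename):
    """Build a path inside the output folder, creating the folder if needed."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename


# Example usage (hypothetical file names):
# source = input_file(data_path, "listings.csv")
# target = output_file(out_path, "results.csv")
```

Creating the output folder on demand avoids a common failure when a notebook tries to save results to a directory that does not exist yet.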

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Applied-Geospatial-Data-Science-with-Python. If there’s an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/AN9bG.

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: The PyProj package is useful when working with cartographic projections and geodetic transformations.

    A block of code is set as follows:

    world_ae = world.to_crs("ESRI:54032")

    graticules_ae = grat.to_crs("ESRI:54032")

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    patients = 150

    # number of demand points represented as patients

     

    medical_centers = 4

    # number of service points represented as medical centers

    Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: We've called ours VRP Project. Select this project and scroll down to Enabled APIs. Then, select Directions API and click Enable.

    Tips or important notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would
