Applied Geospatial Data Science with Python: Leverage geospatial data analysis and modeling to find unique solutions to environmental problems
About this ebook
Data scientists, when presented with a myriad of data, can often lose sight of how to present geospatial analyses in a meaningful way so that they make sense to everyone. Using Python to visualize data helps stakeholders in less technical roles to understand the problem and seek solutions. The goal of this book is to help data scientists and GIS professionals learn and implement geospatial data science workflows using Python.
Throughout this book, you’ll uncover numerous geospatial Python libraries with which you can develop end-to-end spatial data science workflows. You’ll learn how to read, process, and manipulate spatial data effectively. With data in hand, you’ll move on to crafting spatial data visualizations to better understand and tell the story of your data through static and dynamic mapping applications. As you progress through the book, you’ll find yourself developing geospatial AI and ML models focused on clustering, regression, and optimization. The use cases can be leveraged as building blocks for more advanced work in a variety of industries.
By the end of the book, you’ll be able to tackle random data, find meaningful correlations, and make geospatial data models.
Applied Geospatial Data Science with Python - David S. Jordan
BIRMINGHAM—MUMBAI
Applied Geospatial Data Science with Python
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Dinesh Chaudhary
Content Development Editor: Shreya Moharir
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Ponraj Dhandapani
Marketing Coordinators: Shifa Ansari, Vinishka Kalra
First published: February 2023
Production reference: 1270123
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80323-812-8
www.packtpub.com
To those who came before me and those who come after, leaving an indelible mark on the world and making it a better place.
Acknowledgments
I’d like to acknowledge the many teachers, professors, and mentors who have shared their infinite knowledge and wisdom with me throughout the years. Without them, my path through life would likely be completely different. I’d like to start with Dr. Jennifer Stuart, whose marketing analytics course introduced me to the power of data science for the first time. I’d also like to thank Dr. Connie Rothwell and Delbridge Narron for supporting me in my honors thesis and encouraging me to question and challenge the world around me. I’d also like to thank the staff and professors at the Institute for Advanced Analytics at North Carolina State University for providing me with a first-class applied data science education.
I’d be remiss if I didn’t thank my friends and family for supporting me in this endeavor, as well as all of the other crazy leaps I’ve taken throughout my personal and professional life – most importantly, my older brother, Jeff Jordan, who has encouraged me in my successes and failures and listened to me vent in my times of need. Thank you for also continuing to be the intellectual springboard that has led to some of my best thinking.
Lastly, I’d like to thank the many developers who are actively and often thanklessly building out the open source spatial data science ecosystem. Without these developers, this book would not have been possible, as we wouldn’t have had the tools or data at our disposal. I encourage you, the reader of this book and burgeoning spatial data scientist, to give back to the community in any way possible in the future.
Contributors
About the author
David S. Jordan has made a career out of applying spatial thinking to tough problem spaces in the domains of real estate planning, disaster response, social equity, and climate change. He currently leads distribution and geospatial data science at JPMorgan Chase & Co. In addition to leading and building out geospatial data science teams, David is a patented inventor of new geospatial analytics processes, a winner of a Special Achievement in GIS (SAG) Award from Esri, and a conference speaker on topics including banking deserts and how great businesses leverage GIS.
About the reviewer
Rohit Singh has been working in the field of geospatial analysis and modeling for the past 5 years, and he is currently working as a geospatial data scientist at Near Intelligence Pvt. Ltd. In his current role, he is in charge of all spatial data features and engineering pipelines, as well as curating spatial data from different sources and developing optimized methods for spatial data processing. He has developed methods in Apache Spark to handle the processing and modeling of spatial big data, and he regularly contributes to the GIS and data science communities. He developed a Python module to optimize the geohash generation process for polygons. In the future, he wants to contribute more to spatial data science by developing more spatial data methods and models.
Table of Contents
Preface
Part 1: The Essentials of Geospatial Data Science
1
Introducing Geographic Information Systems and Geospatial Data Science
What is GIS?
What is data science?
Mathematics
Computer science
Industry and domain knowledge
Soft skills
What is geospatial data science?
Summary
2
What Is Geospatial Data and Where Can I Find It?
Static and dynamic geospatial data
Geospatial file formats
Vector data
Raster data
Introducing geospatial databases and storage
PostgreSQL and PostGIS
ArcGIS geodatabase
Exploring open geospatial data assets
Human geography
Physical geography
Country- and area-specific data
Summary
3
Working with Geographic and Projected Coordinate Systems
Technical requirements
Exploring geographic coordinate systems
Understanding GCS versions
Understanding projected coordinate systems
Common types of projected coordinate systems
Working with GCS and PCS in Python
PyProj
GeoPandas
Summary
4
Exploring Geospatial Data Science Packages
Technical requirements
Packages for working with geospatial data
GeoPandas
GDAL
Shapely
Fiona
Rasterio
Packages enabling spatial analysis and modeling
PySAL
Packages for producing production-quality spatial visualizations
ipyLeaflet
Folium
geoplot
GeoViews
Datashader
Reviewing foundational data science packages
pandas
scikit-learn
Summary
Part 2: Exploratory Spatial Data Analysis
5
Exploratory Data Visualization
Technical requirements
The fundamentals of ESDA
Example – New York City Airbnb listings
Conducting EDA
ESDA
Summary
6
Hypothesis Testing and Spatial Randomness
Technical requirements
Constructing a spatial hypothesis test
Understanding spatial weights and spatial lags
Global spatial autocorrelation
Local spatial autocorrelation
Point pattern analysis
Ripley’s alphabet functions
Summary
7
Spatial Feature Engineering
Technical requirements
Defining spatial feature engineering
Performing a bit of geospatial magic
Engineering summary spatial features
Summary spatial features using one dataset
Summary spatial features using two datasets
Engineering proximity spatial features
Proximity spatial features – NYC attractions
Summary
Part 3: Geospatial Modeling Case Studies
8
Spatial Clustering and Regionalization
Technical requirements
Collecting geodemographic data for modeling
Extracting data using the Census API
Cleaning the extracted data
Conducting EDA and ESDA
Developing geodemographic clusters
K-means geodemographic clustering
Agglomerative hierarchical geodemographic clustering
Spatially constrained agglomerative hierarchical geodemographic clustering
Measuring model performance
Summary
9
Developing Spatial Regression Models
Technical requirements
A refresher on regression models
Constructing an initial regression model
Exploring unmodeled spatial relationships
Teaching the model to think spatially
Incorporating spatial fixed effects within the model
Introduction to GWR models
Fitting a GWR model to predict nightly Airbnb prices
Introduction to Multiscale Geographically Weighted Regression
Fitting an MGWR model to predict nightly Airbnb prices
How do I choose between these models?
Summary
10
Developing Solutions for Spatial Optimization Problems
Technical requirements
Exploring the Location Set Covering Problem (LSCP)
Understanding the math behind the LSCP
Solving LSCPs
Exploring route-based combinatorial optimization problems
Understanding the math behind the TSP
Setting up the Google Maps API
Solving the TSP
Exploring a single-vehicle Vehicle Routing Problem (VRP)
Exploring a Capacitated Vehicle Routing Problem (CVRP)
Summary
11
Advanced Topics in Spatial Data Science
Technical requirements
Efficient operations with spatial indexing
Implementing R-tree indexing in GeoPandas
Introducing the H3 spatial index
Estimating unknowns with spatial interpolation
Applying Inverse Distance Weighted (IDW) interpolation
Introduction to Kriging-based interpolation
Ethical spatial data science
Example 1 – Sharpiegate
Example 2 – Human mobility: The New York Times investigative report
Example 3 – COVID-19 contact tracing
Example 4 – United States Census Bureau disclosure avoidance system
Summary
Index
Other Books You May Enjoy
Preface
By the time this book has been published, the world will have just formally exited a global pandemic, and society as a whole will be trying to grapple with the new normal in the post-COVID era. During the depths of the pandemic, spatial analysis was featured in prime time through the great work of Johns Hopkins University (JHU)’s COVID-19 dashboard, which can be found at https://coronavirus.jhu.edu/map.html. The JHU dashboard monitored the spread of the virus across the globe in near real time, and this map was likely the first time that the masses were exposed to the power of spatial analysis, spatial data visualization, and spatial data science. However, spatial analysis has long been used to analyze the spread of diseases. In fact, way back in 1854, John Snow produced a map of cholera deaths in London, which allowed him to show that cholera was spread through germs in water wells and not through miasma in the air, as many thought during that time.
Reeling from this global pandemic is not the only problem that our modern society faces. Today, supply chain issues that face economies across the globe are driving inflation to heights not seen in several decades. In addition to this, climate change is causing major rivers across the globe to dry up, including the Colorado and Mississippi rivers in the United States, the Yangtze in China, the Rhine in Germany, and the Danube in Romania. Climate change is also leading to more extreme weather events, yielding devastating flooding in areas such as Florida in the United States and Pakistan in South Asia.
We are also living through a time in which more and more people are willing to stand up for equity and call out inequities when they see them. In the United States and across the world, teams of people are researching high-profile inequities in terms of the global food supply, healthcare access, and financial services. Others are looking into lesser-known inequities, such as urban heat islands and lack of shade. Collectively, teams of this kind are working hard to ensure that future generations won’t face the inequities of their forefathers.
We now have the data, tools, and technology to begin to do something about each of these problems. Spatial analysis and data science have the potential to provide enormous value in helping us find solutions, perform resiliency planning, and better educate ourselves and those around us. However, while performing spatial analysis and producing compelling visualizations is now easier than ever, it is not without risks. By nature, maps and spatial data are representations of real-world processes; they are often incomplete and can easily be manipulated, which distorts the truth. One recent example of map manipulation happened in an event that has since been dubbed Sharpie-gate, in which then-President Donald Trump altered an NOAA hurricane path map with a Sharpie in defiance of the scientific community. While this example may seem comical, there are many nuances to spatial analysis, data science, and cartography that you’ll need to be aware of as a burgeoning spatial data scientist.
This book is written for data scientists seeking to incorporate geospatial analysis into their work and for geographic information system (GIS) professionals seeking to incorporate data science methods into their work. Our goal is that this text will help these communities to develop a common understanding and shared vernacular, enabling them to properly incorporate geographic context into modeling, analysis, and visualization.
This book will begin with the fundamentals of GIS and data science before moving into detailed examples of spatial data science workflows built upon practical applications of geospatial data science that are industry agnostic. We will begin by teaching you the fundamentals of sourcing and working with geospatial data. Building upon this, we will teach you how to integrate spatial data and spatial thinking into your data science processes to hopefully improve model performance and develop a more accurate representation of the world around us.
We hope that you, as a member of the next generation of spatial data scientists, are empowered to leverage spatial thinking and analysis, which may help us find solutions to the problems currently facing our society and better prepare for the future ahead.
Who this book is for
This book is for you if you are a data scientist seeking to incorporate geospatial thinking into your workflows or a GIS professional seeking to incorporate data science methods into yours. You’ll need to have a foundational knowledge of Python for data analysis and/or data science.
What this book covers
Chapter 1, Introducing Geographic Information Systems and Geospatial Data Science, lays the foundations for the book by introducing you to GIS and its commonalities with and differences from geospatial data science. In this chapter, we also walk through the data science pipeline that you’ll follow throughout the book.
Chapter 2, What Is Geospatial Data and Where Can I Find It?, introduces you to common geospatial data types and formats that you’ll work with throughout your geospatial data science workflows. In this chapter, we’ll also introduce various categories of geospatial data, ranging from human geography to country- and area-specific data.
Chapter 3, Working with Geographic and Projected Coordinate Systems, will introduce you to geographic and projected coordinate systems and help you avoid some of the most common pitfalls of working with geospatial data.
Chapter 4, Exploring Geospatial Data Science Packages, covers a wide variety of Python geospatial data science packages that allow you to perform spatial data processing, analysis, visualization, and modeling.
Chapter 5, Exploratory Data Visualization, shows you how to harness the power of spatial data to create compelling static and dynamic mapping applications.
Chapter 6, Hypothesis Testing and Spatial Randomness, introduces you to the topic of complete spatial randomness and a variety of statistical tests to better understand whether your data reflects patterns across space.
Chapter 7, Spatial Feature Engineering, will walk you through how to derive new spatial-based features known as summary spatial features and proximity spatial features from both tabular and geo-enabled data assets.
Chapter 8, Spatial Clustering and Regionalization, introduces you to a class of unsupervised machine learning models known as clustering models, through which you’ll create spatial clusters and regions from your data.
Chapter 9, Developing Spatial Regression Models, will open your eyes to the power that spatial data can bring to regression models through the incorporation of spatial effects.
Chapter 10, Developing Solutions for Spatial Optimization Problems, will show you how to use linear programming in combination with spatial data to solve problems such as the Vehicle Routing Problem and the Location Set Covering Problem.
Chapter 11, Advanced Topics in Spatial Data Science, covers more advanced topics in spatial feature engineering, spatial modeling, and spatial ethics.
To get the most out of this book
We assume that you, as a reader of this book, come from a background in either data science or GIS. We also expect that you have some foundational knowledge of working with Python.
Additionally, you will need to set up keys to several APIs, from which you will access data throughout the book.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
The quality of the hardware can impact the runtime for some analyses, as is the case for most data science activities. As such, we recommend hardware similar to or better than the following to prevent any potential issues:
NVIDIA GeForce GTX 1050
16 GB RAM
We recommend that you use Anaconda as your Python environment and package manager. To begin installing the Anaconda Distribution, you’ll want to visit the Anaconda Distribution installation website at https://docs.anaconda.com/anaconda/install/. The Python version we are using throughout this book is 3.10.6, as this is one of the latest versions of Python available at the time of publication. Leveraging this version will ensure that all packages are compatible. To make the setup of your virtual environment as streamlined as possible, we’ve exported our environment.yml file and uploaded it to the GitHub repository at https://github.com/PacktPublishing/Applied-Geospatial-Data-Science-with-Python.
To set up the virtual environment called GeospatialPython, launch Anaconda Prompt and execute the following command:
conda env create --file environment.yml
You’ll need to replace environment.yml with the full path of the downloaded file.
After the environment is installed, you can activate it by executing the following command:
conda activate GeospatialPython
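Once the environment is active, a quick check can confirm that the interpreter is the one you expect. The following is a minimal sketch rather than part of the book's code: the helper names version_matches and active_conda_env are hypothetical, and the check compares only the major and minor version against the 3.10 release mentioned above. It relies on the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated.

```python
import os
import sys

# Major.minor of the Python version the book states it uses (3.10.6).
EXPECTED = (3, 10)


def version_matches(version_info=sys.version_info, expected=EXPECTED):
    """Return True when the major.minor of `version_info` equals `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


def active_conda_env():
    """Return the name of the active conda environment, or None if unset."""
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Matches 3.10.x:", version_matches())
    print("Active conda environment:", active_conda_env())
```

If the reported environment is not GeospatialPython, the notebook or terminal is probably running from a different environment.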
Throughout the book, you’ll see the following code:
data_path = r'YOUR FILE PATH'
Anytime you see this, you’ll need to replace ‘YOUR FILE PATH’ with the path to the data folder, which can be downloaded from the GitHub repo. The data stored in the GitHub repo can be found in the Releases section or by visiting https://github.com/PacktPublishing/Applied-Geospatial-Data-Science-with-Python/releases. There are three parts to the data:
Data.pt1.zip
LCMS_CONUS_v2021-7_Land_Cover_Annual_2021.zip
S2B_MSIL2A_20220504T161829_N0400_R040_T17TNF_20220504T210702.SAFE.zip
You’ll need to extract the contents of these zip folders and store the contents in a single folder. You’ll then point to this folder any time you see ‘YOUR FILE PATH’ referenced in the Jupyter notebooks.
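If you prefer to script the extraction step, it can be sketched with Python's standard library as shown below. This is an illustrative helper, not code from the book: the function name extract_all and the placeholder folder names in the commented usage are assumptions, and the sketch simply unpacks every .zip file it finds in one folder into a single data folder.

```python
from pathlib import Path
import zipfile


def extract_all(zip_dir, data_dir):
    """Extract every .zip archive in `zip_dir` into the single folder `data_dir`.

    Returns the names of the archives that were extracted, in sorted order.
    """
    zip_dir, data_dir = Path(zip_dir), Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    extracted = []
    for archive in sorted(zip_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
        extracted.append(archive.name)
    return extracted


# Example usage (placeholder paths, replace with your own):
# extract_all(r'YOUR DOWNLOAD FOLDER', r'YOUR FILE PATH')
```

The folder passed as data_dir is then the one to assign to data_path in the notebooks.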
Similarly, you will also see the following code from time to time:
out_path = r'YOUR FILE PATH'
You’ll need to substitute YOUR FILE PATH in this code reference with the directory to which you’d like the output to be saved.
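The r prefix in these path assignments creates a raw string, which matters mostly for Windows paths, where backslashes would otherwise be read as escape sequences. The short sketch below illustrates the difference. The example path and the output_file helper are hypothetical and are not taken from the book's notebooks.

```python
from pathlib import Path

raw = r'C:\temp\new'    # raw string: both backslashes are kept literally
plain = 'C:\temp\new'   # normal string: \t becomes a tab and \n a newline

print(len(raw), len(plain))  # prints: 11 9


def output_file(out_path, filename):
    """Join an output directory and a file name into a single path."""
    return Path(out_path) / filename
```

Building file names with pathlib in this way avoids concatenating path separators by hand.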
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Applied-Geospatial-Data-Science-with-Python. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/AN9bG.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: The PyProj package is useful when working with cartographic projections and geodetic transformations.
A block of code is set as follows:
world_ae = world.to_crs("ESRI:54032")
graticules_ae = grat.to_crs("ESRI:54032")
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
patients = 150
# number of demand points represented as patients
medical_centers = 4
# number of service points represented as medical centers
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: We've called ours VRP Project. Select this project and scroll down to Enabled APIs. Then, select Directions API and click Enable.
Tips or important notes
Appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would