Introduction to GIS Programming
A Practical Python Guide to Open Source Geospatial Tools
Qiusheng Wu
2025
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Who This Book Is For . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
What This Book Covers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Getting the Most Out of This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Conventions Used in This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Downloading the Code Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Video Tutorials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Get in Touch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
About the Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Licensing and Copyright . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
I: Software Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1. Overview of Software Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3. Essential Software Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4. Tool Integration and Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5. Running Code Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2. Introduction to Python Package Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3. Installing Conda (Miniconda) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4. Understanding Conda Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5. Creating Your First Geospatial Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6. Troubleshooting Conda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7. Essential Conda Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.8. Introducing uv: The Fast Alternative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.9. Best Practices for Package Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.10. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.11. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3. Setting Up Visual Studio Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3. Installing Visual Studio Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4. Essential Extensions for Python Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5. Configure VS Code for Python Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.6. Essential Keyboard Shortcuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.7. References and Further Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.8. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.9. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4. Version Control with Git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3. Setting Up GitHub Account . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4. Installing Git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5. Configuring Git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.6. Understanding Git Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.7. Essential Git Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.8. Using GitHub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.9. Integration with VS Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.10. Best Practices for Geospatial Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.11. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.12. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5. Using Google Colab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3. Getting Started with Google Colab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.4. Setting Up Your Geospatial Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.5. Essential Colab Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.6. Run Code Examples in Colab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.7. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.8. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6. Working with JupyterLab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.3. Installing and Setting Up JupyterLab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.4. Getting Started with JupyterLab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.5. Essential Keyboard Shortcuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.6. Running Code Examples on MyBinder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.7. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.8. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7. Using Docker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.3. Installing Docker Desktop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.4. Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.5. Running Code Examples in Docker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.6. Common Docker Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.7. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.8. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
II: Python Programming Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
8. Variables and Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.3. Variables in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.4. Naming Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.5. Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.6. Escape Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.7. Comments in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.8. Working with Variables and Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.9. Basic String Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
8.10. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
8.11. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
9. Python Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
9.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
9.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
9.3. Tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
9.4. Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
9.5. Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
9.6. Dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
9.7. Data Structure Selection Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
9.8. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
9.9. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
10. String Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.3. Creating and Manipulating Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.4. String Methods for Geospatial Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
10.5. String Formatting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
10.6. String Operation Decision Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
10.7. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
10.8. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
11. Loops and Conditional Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
11.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
11.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
11.3. For Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
11.4. While Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
11.5. Control Statements: Making Decisions in Your Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
11.6. Combining Loops and Control Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
11.7. Loop and Control Statement Decision Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
11.8. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
11.9. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
12. Functions and Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.3. Functions: Building Reusable Code Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.4. Classes: Organizing Data and Behavior Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
12.5. Combining Functions and Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
12.6. Function and Class Design Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
12.7. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
12.8. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
13. Working with Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
13.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
13.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
13.3. Creating a Sample File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
13.4. Reading and Writing Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
13.5. Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
13.6. Combining File Handling and Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
13.7. Working with Different File Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
13.8. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
13.9. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
14. Data Analysis with NumPy and Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
14.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
14.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
14.3. Introduction to NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
14.4. Introduction to Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
14.5. Combining NumPy and Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
14.6. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
14.7. Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
14.8. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
III: Geospatial Programming with Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
15. Introduction to Geospatial Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
15.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
15.2. The Geospatial Python Ecosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
15.3. Understanding Library Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
15.4. Setting Up Your Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
15.5. Verification and First Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
15.6. Learning Path and Chapter Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
15.7. Key Concepts to Remember . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
15.8. Getting Help and Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
15.9. Next Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
15.10. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
16. Vector Data Analysis with GeoPandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
16.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
16.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
16.3. Core Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
16.4. Installing GeoPandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
16.5. Creating GeoDataFrames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
16.6. Reading and Writing Geospatial Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
16.7. Projections and Coordinate Reference Systems (CRS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
16.8. Spatial Measurements and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
16.9. Visualizing Geospatial Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
16.10. Advanced Geometric Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
16.11. Spatial Relationships and Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
16.12. Best Practices and Performance Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
16.13. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
16.14. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
17. Working with Raster Data Using Rasterio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
17.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
17.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
17.3. Installing Rasterio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
17.4. Reading Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
17.5. Visualizing Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
17.6. Accessing and Manipulating Raster Bands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
17.7. Writing Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
17.8. Clipping Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
17.9. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
17.10. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
18. Multi-dimensional Data Analysis with Xarray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
18.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
18.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
18.3. Understanding Xarray’s Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
18.4. Setting Up Your Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
18.5. Loading and Exploring Real Climate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
18.6. Working with DataArrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
18.7. Intuitive Data Selection and Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
18.8. Performing Operations on Multi-Dimensional Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
18.9. Data Visualization with Xarray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
18.10. Working with Datasets: Multiple Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
18.11. The Power of Label-Based Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
18.12. Advanced Indexing Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
18.13. High-Level Computational Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
18.14. Data Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
18.15. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
18.16. Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
18.17. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
19. Raster Analysis with Rioxarray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
19.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
19.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
19.3. Setting Up Your Rioxarray Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
19.4. Loading and Exploring Georeferenced Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
19.5. Fundamental Geospatial Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
19.6. Working with Spatial Dimensions and Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
19.7. Visualizing Geospatial Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
19.8. Data Storage and File Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
19.9. Coordinate System Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
19.10. Introduction to Band Math . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
19.11. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
19.12. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
20. Interactive Visualization with Leafmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
20.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
20.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
20.3. Installing and Setting Up Leafmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
20.4. Creating Interactive Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
20.5. Changing Basemaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
20.6. Visualizing Vector Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
20.7. Creating Choropleth Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
20.8. Visualizing GeoParquet Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
20.9. Visualizing PMTiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
20.10. Visualizing Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
20.11. Accessing and Visualizing Maxar Open Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
20.12. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
20.13. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
21. Geoprocessing with WhiteboxTools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
21.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
21.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
21.3. Why Whitebox? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
21.4. Useful Resources for Whitebox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
21.5. Installing Whitebox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
21.6. Watershed Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
21.7. LiDAR Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
21.8. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
21.9. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
22. 3D Mapping with MapLibre . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
22.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
22.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
22.3. Useful Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
22.4. Installation and Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
22.5. Creating Interactive Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
22.6. Adding Map Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
22.7. Adding Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
22.8. Using MapTiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
22.9. 3D Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
22.10. Visualizing Vector Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
22.11. Visualizing Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
22.12. Adding Custom Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
22.13. Visualizing PMTiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
22.14. Adding DeckGL Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
22.15. Exporting to HTML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
22.16. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
22.17. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
23. Cloud Computing with Earth Engine and Geemap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
23.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
23.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
23.3. Introduction to Google Earth Engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
23.4. Introduction to Interactive Maps and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
23.5. The Earth Engine Data Catalog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
23.6. Earth Engine Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
23.7. Earth Engine Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
23.8. Earth Engine Vector Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
23.9. More Tools for Visualizing Earth Engine Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
23.10. Vector Data Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
23.11. Raster Data Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
23.12. Exporting Earth Engine Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
23.13. Creating Timelapse Animations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
23.14. Charting Earth Engine Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
23.15. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
23.16. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
24. Hyperspectral Data Visualization with HyperCoast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
24.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
24.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
24.3. Environment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
24.4. Finding Hyperspectral Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
24.5. Downloading Hyperspectral Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
24.6. Reading Hyperspectral Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
24.7. Visualizing Hyperspectral Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
24.8. Creating Image Cubes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
24.9. Interactive Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
24.10. Interactive Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
24.11. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
24.12. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
25. High-Performance Geospatial Analytics with DuckDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
25.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
25.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
25.3. Installation and Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
25.4. SQL Basics for Spatial Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
25.5. Python API Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
25.6. Data Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
25.7. Data Export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
25.8. Working with Geometries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
25.9. Spatial Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
25.10. Spatial Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456
25.11. Large-Scale Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
25.12. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
25.13. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
26. Geospatial Data Processing with GDAL and OGR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
26.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
26.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
26.3. Installation and Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
26.4. Sample Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
26.5. Understanding Your Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
26.6. Coordinate Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
26.7. Format Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
26.8. Clipping and Masking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
26.9. Raster Analysis and Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
26.10. Converting Between Raster and Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
26.11. Geometry Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
26.12. Managing Fields and Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
26.13. Tiling and Data Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
26.14. Advanced Raster Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
26.15. Terrain Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
26.16. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
26.17. References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
26.18. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
27. Building Interactive Dashboards with Voilà and Solara . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
27.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
27.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
27.3. Installing Voilà and Solara . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
27.4. Introduction to Hugging Face Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
27.5. Creating a Basic Voilà Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
27.6. Creating an Advanced Web Application with Solara . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
27.7. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
vii
27.8. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
28. Distributed Computing with Apache Sedona . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
28.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
28.2. Learning Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
28.3. Installing and Setting Up Apache Sedona . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
28.4. Downloading Sample Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
28.5. Core Concepts and Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
28.6. Spatial Operations and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
28.7. Spatial Joins and Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
28.8. Advanced Spatial Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
28.9. Reading Vector Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
28.10. Visualizing Vector Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
28.11. Writing Vector Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
28.12. Reading Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
28.13. Visualizing Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
28.14. Raster Map Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
28.15. Raster Zonal Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
28.16. Writing Raster Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
28.17. Integration with GeoPandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
28.18. Real-World Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
28.19. Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
28.20. References and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
28.21. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
viii
Preface
1
2
Introduction
Geographic Information Systems (GIS) and geospatial analysis have become fundamental tools across
numerous disciplines, from environmental science and urban planning to business analytics and public
health. As the volume and complexity of geospatial data continue to grow exponentially, the ability to
programmatically process, analyze, and visualize this data has become an essential skill for researchers,
analysts, and professionals working with spatial information.
Python has emerged as the leading programming language for geospatial analysis, offering a rich
ecosystem of libraries and tools that make complex spatial operations accessible to both beginners and
experts. However, the path from Python novice to confident geospatial programmer can seem daunting,
with numerous libraries to learn and concepts to master.
This book bridges that gap by providing a structured, practical approach to learning geospatial program-
ming with Python. Rather than overwhelming you with advanced techniques from the start, we focus on
building a solid foundation of essential skills that will serve you throughout your geospatial programming
journey. Each chapter builds upon the previous ones, ensuring you develop both theoretical understanding
and practical expertise.
The approach taken in this book is hands-on and example-driven. You’ll work with real geospatial
datasets, solve practical problems, and build projects that demonstrate the power of Python for geospatial
analysis and visualization. By the end of this book, you’ll have the confidence and skills to tackle your
own geospatial programming challenges.
3
The key requirement is curiosity and willingness to learn. While programming experience is helpful, it’s
not necessary. We start with the fundamentals and build up systematically.
4
Getting the Most Out of This Book
To maximize your learning experience with this book, consider the following recommendations:
Set Up a Proper Development Environment: Install Python and the required libraries as described in
the first section of the book. A well-configured environment will save you time and frustration throughout
your learning journey. Consider using conda or uv to manage your Python packages, as this simplifies
the installation of geospatial libraries.
Follow Along with Code Examples: This book is designed to be interactive. Don’t just read the code—
type it out, run it, and experiment with modifications. Understanding comes through practice, and each
example builds skills you’ll need later.
Work Through the Exercises: Each chapter includes exercises designed to reinforce the concepts you’ve
learned. These are not optional extras—they are an integral part of the learning process. Start with the
guided exercises, then challenge yourself with your own projects.
Use Real Data: While the book provides datasets for examples and exercises, try applying the techniques
to data from your own field or interests. This will help you understand how the concepts apply to real-
world scenarios and build confidence in your abilities.
Build Projects: As you progress through the book, consider working on a personal project that interests
you. This could be analyzing data from your research, creating maps for your community, or solving a
problem you’ve encountered in your work.
Be Patient with Yourself: Programming can be frustrating, especially when you’re learning. Expect
to encounter errors, spend time debugging, and occasionally feel stuck. This is normal and part of the
learning process. Take breaks when needed, and remember that expertise develops gradually through
consistent practice. If you get stuck, don’t hesitate to ask for help on the book’s GitHub repository.
Keep Practicing: The skills in this book require regular practice to maintain and develop. Set aside time
regularly to work on geospatial programming projects, even if they’re small ones.
Command Line Instructions: Commands to be entered at the command line or terminal are shown
with a $ prompt:
$ pip install leafmap
$ python script.py
Video Tutorials
Complementing the written content, this book is supported by a comprehensive series of video tutorials
that walk through key concepts and provide additional examples:
https://tinyurl.com/intro-gispro-videos
The videos are designed to complement, not replace, the written material. They’re particularly helpful for:
• Visual learners who benefit from seeing code being written and executed
• Understanding complex concepts through multiple explanations
• Learning about the development workflow and best practices
• Seeing how to approach problems and debug issues
The playlist is organized to follow the book’s structure. You can watch them in order as you progress
through the book, or jump to specific topics as needed.
The videos were created in Fall 2024 when I was teaching the Introduction to GIS Programming1
course at the University of Tennessee. Although the course has concluded, the videos remain relevant and
can be used as a reference for the book. Additional videos will be added in the future.
1 https://geog-312.gishub.org
6
Get in Touch
I welcome feedback, questions, and suggestions from readers. Your input helps improve the book and
makes it more useful for the geospatial programming community.
For book-related questions and discussions:
• GitHub Issues: https://github.com/giswqs/intro-gispro/issues
• GitHub Discussions: https://github.com/giswqs/intro-gispro/discussions
Types of feedback that are particularly helpful:
• Errors or unclear explanations in the text or code
• Suggestions for additional examples or use cases
• Ideas for new topics or chapters
• Reports of compatibility issues with different operating systems or library versions
• Success stories of how you’ve applied the techniques from the book
Acknowledgments
This book would not have been possible without the contributions and support of many individuals and
the broader open-source geospatial community.
The Open-Source Community: This book builds upon the incredible work of countless open-source
developers who have created and maintained the Python geospatial ecosystem. Special thanks to the
developers and maintainers of NumPy, Pandas, GeoPandas, Rasterio, Xarray, Rioxarray, Folium, ipyleaflet,
MapLibre, GDAL, and the many other libraries that make geospatial programming accessible.
Students and Colleagues: The questions, challenges, and insights from students in my geospatial
programming courses at the University of Tennessee have shaped the approach and content of this book.
Their feedback on what works and what doesn’t has been invaluable in creating materials that truly serve
learners.
Research Collaborators: Colleagues and collaborators in the geospatial research community have pro-
vided real-world use cases, datasets, and problem scenarios that inform the practical examples throughout
the book.
Family and Friends: Writing a technical book requires significant time and focus. I’m grateful for the
patience and support of family and friends who understood the many evenings and weekends dedicated
to this project.
The Broader GIS Community: The geospatial field is built on a foundation of sharing knowledge and
tools. This book is part of that tradition, and I’m honored to contribute to the resources available for
learning geospatial programming.
This book was written using MyST Markdown2 and compiled using Typst3 with the min-book4 template.
Credits to developers and maintainers of the Typst and MyST Markdown projects. Special thanks to
Maycon F. Melo5 for the min-book template and their help with customizing the template for this book.
2 https://mystmd.org
3 https://github.com/typst/typst
4 https://github.com/mayconfmelo/min-book
5 https://github.com/mayconfmelo
7
Any errors or omissions in this book remain my responsibility. I’m committed to addressing issues and
improving the content based on reader feedback.
6 https://geemap.org
7 https://leafmap.org
8 https://samgeo.gishub.org
9 https://opengeoai.org
10 https://github.com/opengeos
8
Section I: Software Setup
9
10
Chapter 1. Overview of Software Tools
1.1. Introduction
Successful geospatial programming requires more than just knowing Python—it demands a carefully
selected toolkit of software that works together seamlessly. Just as a carpenter needs the right combination
of tools to build a house, geospatial programmers need an integrated set of applications that handle
everything from managing Python packages to creating interactive maps.
This chapter introduces six essential tools that form the foundation of modern geospatial programming
workflows. These tools address the core challenges you’ll face: managing complex software dependencies,
writing and debugging code efficiently, tracking changes and collaborating with others, and running
analyses in both local and cloud environments.
11 https://docs.conda.io
12 https://code.visualstudio.com
11
Why it’s essential: VS Code combines the simplicity of a text editor with the power of an IDE, offering
intelligent code completion, debugging, Git integration, and native Jupyter notebook support—all essential
for efficient geospatial programming.
Key benefits:
• Intelligent Python code completion and debugging
• Native Jupyter notebook support for interactive analysis
• Built-in Git integration for version control
• Extensive extension ecosystem for geospatial tools
• Remote development capabilities for cloud computing
12
Why it’s essential: Geospatial analysis is inherently exploratory and iterative. JupyterLab provides the
perfect environment for combining code, visualizations, and documentation in an interactive workflow
that supports the experimental nature of spatial data analysis.
Key benefits:
• Combines code, visualizations, and documentation seamlessly
• Flexible workspace for managing multiple files and notebooks
• Excellent support for interactive data exploration
• Integration with various file formats and data sources
• Extensible architecture for specialized geospatial tools
16 https://www.docker.com
13
1.5. Running Code Examples
You can run the code examples from this book using several approaches:
Cloud Options (No Installation Required):
• Binder: https://mybinder.org/v2/gh/giswqs/intro-gispro/HEAD
• Google Colab: https://colab.research.google.com/github/giswqs/intro-gispro/blob/main
Local Development:
• Docker: docker run -it -p 8888:8888 -v $(pwd):/app/workspace giswqs/pygis:book
• Conda Environment: Follow the installation instructions in subsequent chapters
You will learn how to use these tools in the following chapters.
14
Chapter 2. Introduction to Python Package Management
2.1. Introduction
One of the biggest challenges in Python programming—especially in data science and geospatial analysis
—is managing the complex web of software dependencies. Different projects require different versions
of Python packages, and sometimes these requirements conflict with each other. Installing geospatial
libraries (e.g., GDAL, GEOS, PROJ) can be particularly challenging because they often depend on complex
underlying C/C++ libraries for tasks like coordinate transformations and data format support.
This is where package managers like Conda and uv become essential tools. Think of them as smart
assistants that handle the tedious work of finding, downloading, and installing the right versions of
software packages while ensuring everything works together harmoniously.
Why package management matters for geospatial programming:
• Complex dependencies: Geospatial libraries like GDAL and Rasterio have intricate dependencies on
system libraries
• Version compatibility: Different projects may require different versions of the same library
• Environment isolation: Keep project dependencies separate to avoid conflicts
• Reproducibility: Ensure your code works the same way on different computers
• Easy collaboration: Share exact environment specifications with colleagues
Conda is a cross-platform package manager that excels at handling complex dependencies, especially
for scientific computing. It can install not just Python packages, but also system libraries, compilers, and
other tools that geospatial packages often require.
uv is a newer, extremely fast package manager written in Rust that focuses on Python packages. While
it primarily works with PyPI17 (the Python Package Index), it offers significant speed improvements over
traditional tools like pip.
This chapter will teach you to use both tools effectively, helping you create reliable, reproducible devel-
opment environments for your geospatial programming projects.
17 https://pypi.org
15
2.3. Installing Conda (Miniconda)
2.3.1. Why Miniconda?
Miniconda18 is the recommended way to install Conda. Unlike the full Anaconda19 distribution, which
comes with hundreds of pre-installed packages, Miniconda includes only the essential components: conda,
Python, and a few basic packages. This minimal approach gives you more control over what gets installed
in your environment.
2.3.2. Installation
To install Miniconda, visit the Miniconda installation page: https://tinyurl.com/miniconda-install. De-
pending on your operating system, you can choose the appropriate graphical installer or the command
line installer. The instructions below are for the command line installer.
2.3.2.1. Windows
Open Windows PowerShell as administrator and run the following command:
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe -o .\miniconda.exe
start /wait "" .\miniconda.exe /S
del .\miniconda.exe
After the installation is complete, you can close the PowerShell window. Go to the Start menu and search
for “Anaconda Prompt” to open a new terminal.
2.3.2.2. macOS
If you are using macOS with Apple Silicon, run the following command:
mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
If you are using macOS with Intel, run the following command:
mkdir -p ~/miniconda3
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
18 https://www.anaconda.com/docs/getting-started/miniconda/main
19 https://www.anaconda.com
16
After the installation is complete, close the terminal and open a new terminal. Run the following
commands to initialize conda:
source ~/miniconda3/bin/activate
conda init --all
2.3.2.3. Linux
If you are using Linux, run the following command:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
After the installation is complete, close the terminal and open a new terminal. Run the following
commands to initialize conda:
source ~/miniconda3/bin/activate
conda init --all
conda --version
conda info
You should see version information and system details. If you get a “command not found” error, conda
may not be in your PATH (see troubleshooting section below).
By default, conda will automatically activate the base environment when you open a new terminal. It
is recommended to disable automatic activation of the base environment to avoid accidentally installing
packages into the base environment. You can do this by running the following command:
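conda config --set auto_activate_base false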
17
2.4.1. Environments
Think of a conda environment as an isolated workspace for a specific project. Each environment has its
own Python installation and packages, preventing conflicts between different projects.
Why use environments:
• Isolation: Keep project dependencies separate
• Reproducibility: Create identical setups on different machines
• Experimentation: Test new packages without affecting other projects
• Collaboration: Share exact environment specifications with teammates
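For example, you can create a dedicated environment for a project with a single command (the environment name is an example):
conda create -n myenv python=3.12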
2.4.2. Channels
Channels are repositories where conda packages are stored. Different channels may have different
versions or specialized packages.
There are two main channels for conda packages:
• defaults: Anaconda’s main channel. It is the default channel for conda. However, the packages in this
channel are often outdated.
• conda-forge: A community-driven channel with many geospatial packages. The packages in this
channel are often up-to-date. It is the recommended channel for installing geospatial packages.
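For example, the following command installs two widely used geospatial packages from conda-forge (the package selection is illustrative):
conda install -c conda-forge geopandas leafmap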
18
2.6. Troubleshooting Conda
If you’re on Windows and conda commands aren’t recognized in your terminal, you may need to manually
add conda to your system PATH. This is a common issue when the installer doesn’t automatically
configure the PATH variable.
Method 1: Using Environment Variables GUI
1. Open the Start Menu and search for “Environment Variables”
2. Click on “Edit the system environment variables” (see Figure 1)
3. In the System Properties window, click “Environment Variables”
4. Under “System Variables,” find the Path variable and select it
5. Click “Edit” and then “New”
6. Add these paths (replace <YourUsername> with your actual username):
• C:\Users\<YourUsername>\miniconda3
• C:\Users\<YourUsername>\miniconda3\Scripts
• C:\Users\<YourUsername>\miniconda3\Library\bin
7. Click “OK” to close all windows
8. Restart your terminal or command prompt
19
conda init cmd.exe
Activate an environment:
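conda activate myenv  # replace myenv with the name of your environment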
After activation, your terminal prompt will show the environment name, indicating you’re working within
that environment.
Deactivate the current environment:
conda deactivate
20
# Remove entire environment and all its packages
conda remove -n myenv --all
Update packages:
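# Update all packages in the current environment
conda update --all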
# Update specific packages
conda update numpy pandas
Remove packages:
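# Remove specific packages (the package names are examples)
conda remove geopandas rasterio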
20 https://mamba.readthedocs.io
22
# Install mamba in the base environment (do this once)
conda install -n base mamba -c conda-forge
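Once installed, mamba works as a drop-in replacement for conda when installing packages, for example:
# Install geospatial packages with mamba (faster dependency resolution)
mamba install -c conda-forge geopandas rasterio leafmap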
name: geo
channels:
- conda-forge
- defaults
dependencies:
- python=3.12
- geopandas
- rasterio
- geemap
- leafmap
- jupyterlab
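After saving this file as environment.yml , you can create the environment from it with the following command:
conda env create -f environment.yml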
2.8.1. Installing uv
On macOS and Linux:
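# Standalone installer script documented on the uv website
curl -LsSf https://astral.sh/uv/install.sh | sh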
On Windows:
pip install uv
21 https://docs.astral.sh/uv
24
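# Create a virtual environment (stored in .venv by default)
uv venv
# Activate the environment on macOS/Linux: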
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
# Install packages
uv pip install jupyterlab leafmap
25
2.9.3. Collaboration and Reproducibility
• Share environment files: Include environment.yml in your project repositories
• Document installation steps: Include clear setup instructions in README files
• Test on fresh environments: Verify your code works in clean environments
• Use version pinning carefully: Balance stability with updates
2.11. Exercises
These hands-on exercises will help you practice essential package management skills for geospatial
programming.
26
Tasks:
1. Create a new conda environment called gispro with Python 3.12
2. Activate the environment and install mamba
3. Use mamba to install these essential packages from conda-forge:
• geopandas
• rasterio
• geemap
• leafmap
• jupyterlab
4. Verify the installation by listing all installed packages
5. Test the environment by running Python and importing each package
27
Chapter 3. Setting Up Visual Studio Code
3.1. Introduction
In the world of geospatial programming, having the right development environment can make the
difference between frustration and productivity. Visual Studio Code22 (VS Code) is the go-to code editor
for Python developers, data scientists, and geospatial programmers worldwide—and for good reasons.
Think of VS Code as your digital workshop: a place where you’ll write code, debug problems, manage
files, explore data, create visualizations, and collaborate with others. Unlike simple text editors, VS Code
understands your code, helps you write it more efficiently, and integrates seamlessly with the tools you’ll
use for geospatial analysis.
Why VS Code excels for geospatial programming:
• Python mastery: Exceptional support for Python with intelligent code completion, debugging, and
testing
• Jupyter integration: Run Jupyter notebooks directly in the editor—perfect for geospatial data explo-
ration
• Git version control: Built-in Git support for tracking changes and collaborating on projects
• Extension ecosystem: Thousands of extensions, including specialized tools for Python and Jupyter
• Remote development: Work on cloud servers or remote machines as if they were local
• Multi-language support: Handle Python, JavaScript, SQL, and markup languages in one place
• Free and open-source: Professional-grade tools available to everyone
What makes VS Code special for our work: Unlike heavyweight IDEs that can be overwhelming, or
basic editors that lack functionality, VS Code strikes the perfect balance. It’s lightweight enough to start
quickly, yet powerful enough to handle complex geospatial projects involving multiple data formats, large
datasets, and sophisticated analyses.
Whether you’re processing satellite imagery, analyzing GPS tracks, building web maps, or developing
geospatial web applications, VS Code provides the tools and flexibility you need to work efficiently and
effectively. I have been using VS Code for geospatial programming for several years, and it has become
an indispensable tool for my daily work.
22 https://code.visualstudio.com
28
3.3. Installing Visual Studio Code
VS Code is available for Windows, macOS, and Linux, and the installation process is straightforward on
all platforms.
29
• IntelliCode: AI-assisted code completion and suggestions
• CodeSnap: Generate beautiful code screenshots for documentation
30
Method 2: Using the Command Palette
1. Press Ctrl+Shift+P to open Command Palette
2. Type “Extensions: Install Extensions”
3. Search and install extensions
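To test your setup, create a new Python file (e.g., hello.py ) in your workspace and add the following line:
print("Geospatial Programming with Python")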
To run the code, go to Run > Run Without Debugging (or press Ctrl+F5 ). You should see the message
“Geospatial Programming with Python” printed in the terminal. If the terminal is not visible, you can
open it by going to View > Terminal.
As mentioned above, VS Code has native support for Jupyter notebooks. You can create a new Jupyter
notebook by going to File > New File. Enter a file name (e.g., hello-world.ipynb ) and select Jupyter
Notebook. This will create a new Jupyter notebook in the current folder. Follow the steps below to run
the notebook (see Figure 4):
1. Double-click the notebook file under the Explorer panel to open it.
2. Click on the Select Kernel button in the top right corner of the notebook and select the geo
environment you created in the previous chapter.
31
3. Click on the + Cell button to add a new cell.
4. Set the cell type to code by selecting Python in the lower right corner of the cell.
5. Write the following code in the cell and click on the Run button to the left of the cell.
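print("Geospatial Programming with Python")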
You should see the message “Geospatial Programming with Python” printed in the notebook. Press
Ctrl+S to save the notebook.
32
• New File: Ctrl+N
• Save: Ctrl+S
• Save All: Ctrl+K S
• Close Tab: Ctrl+W
• Reopen Closed Tab: Ctrl+Shift+T
33
3.7. References and Further Learning
Official Documentation:
• VS Code Python Tutorial23
• Jupyter Notebooks in VS Code24
Keyboard Shortcuts Reference:
• Windows Shortcuts25
• macOS Shortcuts26
• Linux Shortcuts27
3.9. Exercises
These hands-on exercises will help you become proficient with VS Code for geospatial programming.
23 https://code.visualstudio.com/docs/python/python-tutorial
24 https://code.visualstudio.com/docs/datascience/jupyter-notebooks
25 https://code.visualstudio.com/shortcuts/keyboard-shortcuts-windows.pdf
26 https://code.visualstudio.com/shortcuts/keyboard-shortcuts-macos.pdf
27 https://code.visualstudio.com/shortcuts/keyboard-shortcuts-linux.pdf
34
Tasks:
1. Install VS Code and the essential extensions (Python, Jupyter, Remote Development)
2. Create a new folder called gis-workspace
3. Open the folder in VS Code ( File > Open Folder )
4. Connect VS Code to your geo conda environment using “Python: Select Interpreter”
5. Create the recommended folder structure (data, notebooks, src, etc.)
35
Chapter 4. Version Control with Git
4.1. Introduction
4.1.1. What is Git?
Imagine you’re working on a complex geospatial analysis project. You’ve spent weeks writing code to
process satellite imagery, analyze land use patterns, and create beautiful visualizations. Then disaster
strikes: you accidentally delete a crucial function, or an experiment goes wrong and corrupts your data
processing pipeline. Without proper version control, you might lose days or weeks of work.
This is where Git28 becomes your safety net and productivity superpower. Git is a distributed version
control system that tracks every change to your files, creating a complete history of your project’s evolu-
tion. Think of it as a sophisticated “undo” system that never forgets, combined with powerful collaboration
tools that enable seamless teamwork.
Why Git is essential for geospatial programming:
• Safety net: Never lose work again—every version of your code is safely stored
• Experimentation: Try new approaches fearlessly, knowing you can always revert changes
• Collaboration: Work seamlessly with team members on complex geospatial projects
• Documentation: Track what changed, when, and why with detailed commit messages
• Reproducibility: Ensure others can recreate your exact analysis environment
• Professional workflow: Industry-standard tool used by developers worldwide
Git integrates seamlessly with the tools you’ve already learned—VS Code has excellent Git support,
and Jupyter notebooks work beautifully with Git workflows. Combined with platforms like GitHub, Git
provides the foundation for modern, collaborative geospatial development.
28 https://git-scm.com
29 https://github.com
36
4.2. Learning Objectives
By the end of this chapter, you should be able to:
• Create a GitHub account and understand GitHub fundamentals
• Install and configure Git for geospatial development workflows
• Create and manage local Git repositories for your geospatial projects
• Connect local repositories to GitHub and manage remote repositories
• Track changes, commit code, and write meaningful commit messages
• Use basic branching strategies for organizing your work
• Collaborate effectively using GitHub’s pull request workflow
• Apply Git best practices specific to geospatial data and code
• Integrate Git workflows with VS Code and other development tools
37
# Check Git version
git --version
You should see output showing the Git version number (e.g., git version 2.49.0 ).
4.5.2. Verification
Check your configuration:
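# Display the configured user name and email
git config --global user.name
git config --global user.email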
Please make sure you use the same email address for Git and GitHub. Otherwise, your commits will not be associated with your GitHub account.
38
Branch: A parallel version of your repository
• main branch typically contains stable, working code
• Feature branches contain experimental or in-development work
Remote: A version of your repository stored on a server (like GitHub)
• Enables collaboration and serves as a backup
39
# Clone a repository from GitHub
git clone https://github.com/<your-username>/intro-gispro.git
You can also add all files in the current directory to the staging area using the following command:
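git add .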
Commit changes:
To commit changes, you can use the following command:
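# The commit message below is an example; describe what actually changed
git commit -m "Add satellite imagery processing script"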
40
Format: Start with a verb, keep under 50 characters, be specific about what changed.
Once you have connected your local repository to the remote repository on GitHub, you can push your
changes to GitHub using the following command:
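git push origin main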
To pull changes from GitHub, you can use the following command:
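git pull origin main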
41
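Branches let you try out new ideas without touching the stable code on main . A typical feature-branch workflow looks like this (the branch name matches the example below):
# Create and switch to a new feature branch
git checkout -b satellite-analysis
# ... make changes and commit them ...
# Switch back to main and merge the finished work
git checkout main
git merge satellite-analysis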
# Delete merged branch
git branch -d satellite-analysis
42
4.8.2. Basic Collaboration
Working with others:
1. Fork someone else’s repository to your GitHub account
2. Clone your fork to your local machine
3. Make changes and commit them locally
4. Push changes to your fork on GitHub
5. Create a Pull Request to suggest changes to the original repository
my-geo-project/
├── README.md
├── .gitignore
├── requirements.txt
├── data/
│ ├── raw/
│ └── processed/
├── src/
│ ├── data_processing.py
│ └── analysis.py
├── notebooks/
│ └── exploratory_analysis.ipynb
└── outputs/
├── figures/
└── results/
43
4.10.2. What to Track vs. Ignore
Track these files:
• Source code ( .py , .r , .sql )
• Jupyter notebooks ( .ipynb )
• Configuration files ( requirements.txt , environment.yml )
• Documentation ( .md , .rst )
• Small reference datasets
Don’t track these files (add to .gitignore ):
• Large data files (use Git LFS or external storage)
• Temporary files ( .tmp , .cache )
• Environment files ( .env with secrets)
• Output files that can be regenerated
• IDE-specific files ( .vscode/settings.json )
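A minimal .gitignore for a geospatial project might look like the following (adjust the patterns to your own project):
# Python cache and temporary files
__pycache__/
*.tmp
.cache/
# Local settings and secrets
.env
.vscode/settings.json
# Large or regenerable data and outputs
data/raw/
outputs/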
Commit message types: feat (new feature), fix (bug fix), docs (documentation), refactor (code cleanup)
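For example, a commit message following this convention might look like this (the message text is illustrative):
git commit -m "feat: add zonal statistics workflow"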
44
Collaboration Benefits:
• Share geospatial analysis methods with the research community
• Contribute to open-source geospatial libraries
• Build a portfolio showcasing your programming skills
• Work effectively with distributed teams
Mastering Git and GitHub will make you a more confident, collaborative, and professional geospatial
programmer. You’ll never lose work, can experiment fearlessly, and can contribute to the broader geospa-
tial community.
4.12. Exercises
These exercises will help you master Git and GitHub for geospatial programming workflows.
45
6. Create a pull request back to the original repository
7. Write a clear description of your changes
Expected outcome: Understanding of the fork-clone-branch-PR workflow for collaboration.
46
Chapter 5. Using Google Colab
5.1. Introduction
Google Colab30 is a cloud-based Jupyter notebook environment that has revolutionized how we approach
geospatial programming and data science education. Think of it as having a powerful computer in the
cloud that you can access from anywhere, with the ability to share your work instantly with colleagues
around the world. It is free to use and requires no installation on your local machine. It even provides free
GPU and TPU access for machine learning and deep learning tasks. You will be able to run this book's code examples in Colab with no additional setup.
Why Google Colab is transformative for geospatial programming:
• Zero setup: No installation, configuration, or environment management—just open a browser and start
coding
• Free GPU/TPU access: Process satellite imagery and run machine learning models on powerful
hardware
• Pre-installed packages: Popular geospatial libraries like GDAL, geemap, GeoPandas, and ipyleaflet
are readily available
• Seamless sharing: Collaborate on geospatial analyses as easily as sharing a Google Doc
• Google Drive integration: Store and access your datasets and notebooks in the cloud
• No local storage limits: Work with large geospatial datasets without filling up your hard drive
This chapter will teach you to leverage Colab’s full potential for geospatial programming. You’ll learn
to work efficiently in this cloud environment and integrate it seamlessly with your broader geospatial
programming toolkit.
30 https://colab.research.google.com
47
2. Sign in with your Google account
3. Create a new notebook: Click “New notebook” or use the scratchpad
4. Choose your runtime: By default, Colab provides a CPU runtime. You can change it to a GPU or TPU
runtime if needed for your geospatial computations. Go to Runtime -> Change runtime type ->
select GPU or TPU .
%pip list
48
You should see a list of packages that are pre-installed in Colab. Some commonly used packages for
geospatial programming that are pre-installed in Colab include: gdal, geemap, geopandas, and xarray. You
can also install additional packages by running the following code:
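%pip install <package_name>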
To install multiple packages at once, you can list them in a single command. For example, to install the leafmap and pygis packages, you can run the following code:
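%pip install leafmap pygis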
Note that the pip command is prefixed with % in Colab. This is because Colab uses Jupyter notebooks, where the % prefix marks a magic command, a special notebook instruction that lets you run package-management and other system-level commands from a code cell. You can also use the ! prefix to run shell commands. For example, to install the pygis package, you can run the following code:
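!pip install pygis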
Sometimes, the installation log output might be overwhelming. You can suppress the output by adding
-q to the end of the command. For example, to install the pygis package quietly, you can run the
following code:
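%pip install pygis -q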
To check if the package is installed, you can run the following code:
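%pip show geemap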
You should see the information about the geemap package in the output, including the version number,
package location, and other information. This is helpful if you want to know where the package is
installed. At the time of writing, the Python version pre-installed in Colab is 3.11. All Python packages
are installed in the /usr/local/lib/python3.11/dist-packages directory.
To uninstall a package, you can run the following code:
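%pip uninstall geemap -y  # -y skips the confirmation prompt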
49
5.5. Essential Colab Features
5.5.1. Code Execution and Navigation
To become proficient in using Colab, you need to know how to navigate the interface and execute code
efficiently. The most commonly used keyboard shortcuts are shown below.
Keyboard shortcuts for efficient coding:
• Run current cell: Ctrl + Enter
• Run cell and move to next: Shift + Enter
• Run cell and insert new cell below: Alt + Enter
• Run selected text: Ctrl + Shift + Enter
• Comment/uncomment: Ctrl + /
Cell management shortcuts:
• Insert cell above: Ctrl + M + A
• Insert cell below: Ctrl + M + B
• Delete cell: Ctrl + M + D
• Change to code cell: Ctrl + M + Y
• Change to markdown cell: Ctrl + M + M
Navigation helpers:
• Jump to class definition: Ctrl + Click on class names
• View function documentation: Hover over functions
• Code completion: Tab key while typing
50
from google.colab import files
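With the files module imported, you can move data in and out of the Colab session. For example (the file name below is a placeholder):
# Upload files from your computer to the Colab session
uploaded = files.upload()
# Download a file from the Colab session to your computer
files.download("results.csv")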
51
5.8. Exercises
These hands-on exercises will help you master Google Colab for geospatial programming workflows.
52
Chapter 6. Working with JupyterLab
6.1. Introduction
JupyterLab31 is a powerful, web-based interactive development environment that serves as the next-
generation interface for Jupyter notebooks. Think of it as an integrated workspace where you can write
code, visualize data, manage files, and document your analysis—all within a single browser window.
Unlike traditional code editors that separate different tools, JupyterLab brings everything together in one
cohesive environment.
At its core, JupyterLab builds upon the familiar Jupyter notebook concept, where you can combine
executable code, rich text, equations, and visualizations in a single document. However, JupyterLab
extends this concept by providing a more flexible and feature-rich interface that accommodates complex
workflows and project management needs.
JupyterLab is particularly well-suited for geospatial programming because spatial data analysis is inher-
ently exploratory and iterative. When working with geographic data, you often need to:
• Load and inspect datasets from various sources
• Experiment with different processing methods
• Create and refine visualizations
• Document your methodology for reproducibility
• Share results with collaborators
In this chapter, we will learn how to install and configure JupyterLab for geospatial programming
workflows. We will also learn how to run the code examples in this book in JupyterLab.
31 https://jupyter.org
53
conda create -n geo python=3.12
conda activate geo
conda install -c conda-forge jupyterlab leafmap
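# Check the installed version of JupyterLab
jupyter lab --version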
You should see the version number of JupyterLab. At the time of writing, the latest version of JupyterLab
is 4.4.1. If you see an error message, check that your environment is activated and packages installed
correctly.
jupyter lab
You should see the JupyterLab interface in your browser (see Figure 6). By default, JupyterLab will launch
on port 8888.
You can also launch JupyterLab on a specific port by adding the --port option. For example, to launch
JupyterLab on port 8889, you can run the following command:
# Launch on a specific port (useful if the default port is busy)
jupyter lab --port=8889
55
• File Browser (folder icon): Navigate your project files, create new folders, upload data files, and manage
your project structure
• Running Kernels (circle icon): Monitor active notebooks and terminals, useful for managing multiple
analyses
• Table of Contents (list icon): Navigate long notebooks by their headings
• Extension Manager: Add new functionality to JupyterLab
2. Main Work Area (Your Canvas)
This is where your actual work happens:
• Tabbed interface: Each notebook, text file, terminal, or data viewer opens in its own tab
• Split views: Drag tabs to create side-by-side or top-bottom layouts
• Flexible arrangement: Organize your workspace to match your workflow
3. Top Menu Bar (Your Toolbox)
Contains traditional menu items organized by function:
• File: Create, open, save, and export files
• Edit: Copy, paste, find, and replace operations
• View: Control interface layout and appearance
• Run: Execute notebook cells and manage kernels
• Kernel: Restart, interrupt, or change computational kernels
• Tabs: Switch between different notebooks, text files, terminals, and data viewers
• Settings: Change the appearance of the interface
• Help: Access documentation and keyboard shortcuts
4. Status Bar (Your Information Panel)
Shows important real-time information:
• Kernel status: Whether your Python environment is busy or idle
• Line/column position: Your current location in code
• File information: Details about the current file
• System status: Memory usage and other system information
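To put this interface to work, create a new notebook (e.g., via File > New > Notebook) and type the following code in the first cell. This is a small sketch that assumes the MapLibre backend of leafmap, which provides the style and projection options used in the next cell:
import leafmap.maplibregl as leafmap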
Click the “Run” button to run the code. Another code cell will be created automatically. Type the following
code in the second cell:
m = leafmap.Map(style="liberty", projection="globe")
m
Click the “Run” button to run the code. A map will be created in the third cell (see Figure 6). You can zoom
in and out the map by scrolling the mouse wheel. We will learn more about leafmap in Section III of
this book about geospatial programming with Python.
56
Once you are done with the notebook, you can save it by clicking the File menu and selecting Save
Notebook. Alternatively, you can press Ctrl+S to save the notebook.
57
6.5.3. Command Mode Shortcuts (Press Esc first)
Once you’re comfortable with the universal shortcuts, these command mode shortcuts will significantly
speed up your workflow:
Creating and Managing Cells:
• A : Insert cell above current cell
‣ Useful when you need to add documentation or setup code before existing analysis
• B : Insert cell below current cell
‣ Most common way to add new cells as you build your analysis
• DD : Delete selected cell (press D twice)
‣ The double-press prevents accidental deletions
‣ Don’t worry—you can undo with Z , not Ctrl+Z
• Z : Undo cell deletion
‣ Recovers accidentally deleted cells
Copying and Moving Cells:
• C : Copy selected cell(s)
• X : Cut selected cell(s)
• V : Paste cell(s) below
‣ Useful for duplicating analysis steps or reorganizing your notebook
Changing Cell Types:
• Y : Change cell to Code
‣ Convert markdown cells back to code cells
• M : Change cell to Markdown
‣ Convert code cells to documentation cells
‣ Essential for creating well-documented geospatial analyses
Navigation and Selection:
• ↑/↓ : Navigate between cells
‣ Much faster than clicking, especially in long notebooks
• Shift + ↑/↓ : Extend cell selection
‣ Select multiple cells for batch operations
• Shift + M : Merge selected cells
‣ Combine multiple cells into one (useful for consolidating code)
58
‣ Quickly comment/uncomment lines of code for testing
• Ctrl + A : Select all text in cell
• Ctrl + Z : Undo text edit
‣ Different from command mode undo—this undoes changes within a cell
6.8. Exercises
These exercises are designed to help you build practical skills with JupyterLab in a geospatial context.
Work through them at your own pace, and don’t hesitate to experiment beyond the basic requirements.
32 https://mybinder.org
59
conda create -n geolab python=3.12
conda activate geolab
60
• Convert all cells to markdown, then back to code
• Delete and recover cells using DD and Z
5. Time yourself:
• See how quickly you can create a 10-cell notebook with alternating code/markdown cells
• Practice until you can do this in under 2 minutes
61
Chapter 7. Using Docker
7.1. Introduction
Have you ever tried to install geospatial Python packages on a new computer, only to run into compatibility issues, missing dependencies, or version conflicts? Or maybe you've had code that works perfectly
on your machine but fails when a colleague tries to run it on theirs? These are common problems in
geospatial programming, where we often work with complex libraries that have many dependencies.
Docker33 solves these problems by providing a way to package your entire development environment –
including the operating system, Python, libraries, and your code – into a container. Think of a container
like a lightweight virtual computer that runs inside your actual computer. This container includes every-
thing needed to run your geospatial programs, and it works the same way on any computer that has
Docker installed.
33 https://www.docker.com
62
7.3.1. Windows and macOS
1. Go to https://www.docker.com/products/docker-desktop
2. Download Docker Desktop for your operating system
3. Run the installer and follow the setup instructions
4. After installation, Docker Desktop will start automatically
5. You’ll see the Docker icon in your system tray (Windows) or menu bar (macOS)
7.3.2. Linux
There are multiple ways to install Docker on Linux, depending on the distribution. Please refer to the
Docker installation guide34 for the latest instructions.
docker --version
You should see output showing the Docker version. At the time of writing, the latest version of Docker
Desktop is 4.42.0.
To start Docker Desktop (Figure 7), click the Docker icon in the system tray (Windows) or menu bar
(macOS). You can manage Docker containers and images from Docker Desktop. Alternatively, you can
manage Docker containers and images from the terminal, which will be covered in the next section.
34 https://docs.docker.com/desktop/setup/install/linux/
63
Figure 7: The Docker Desktop application.
64
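To start the container and launch JupyterLab, you can use the same command introduced in Chapter 1:
docker run -it -p 8888:8888 -v $(pwd):/app/workspace giswqs/pygis:book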
After running this command, you should see output indicating that Jupyter is starting. Look for a line
that shows:
http://127.0.0.1:8888/lab?token=your_token_here
Press Ctrl and click the link to open JupyterLab in your web browser. Alternatively, copy this URL into
your web browser to access JupyterLab. Once you are in JupyterLab, you can navigate to the book folder
and open any notebook to run the code examples.
Note that the pygis docker image has multiple tags, including book and latest . You can check the
available tags by visiting the Docker Hub page35.
You can use the book tag to run the code examples in this book. You can also use the latest tag to run
the latest version of the pygis docker image. All the Python packages needed to run the code examples
in this book are installed in the pygis docker image. You can install additional packages by running
pip install <package_name> in the container.
docker ps
35 https://hub.docker.com/r/giswqs/pygis/tags
65
docker images
# Stop a container
docker stop container_name
# Remove a container
docker rm container_name
# Remove an image
docker rmi image_name
66
1. Choose an appropriate Docker image (like giswqs/pygis:book )
2. Run the container with volume mounting to share files
3. Work in the containerized environment (often through Jupyter)
4. Your files are automatically saved to your computer through the volume mount
5. Stop the container when done
Docker bridges the gap between the complexity of geospatial software and the need for accessible,
reproducible research environments.
7.8. Exercises
7.8.1. Exercise 1: First Docker Run
1. Install Docker on your computer following the instructions in this chapter
2. Create a new folder called docker_test
3. Navigate to this folder in your terminal
4. Run the leafmap Docker container:
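# A possible invocation; the image name and tag are assumptions, so check Docker Hub for the current ones
docker run -it -p 8888:8888 -v $(pwd):/app/workspace giswqs/leafmap:latest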
67
7.8.4. Exercise 4: Docker Commands Practice
1. Use docker ps to see running containers
2. Use docker images to see downloaded images
3. Stop your container using docker stop (you’ll need to find the container name with docker ps
first)
4. Use docker ps -a to see your stopped container
5. Start a new container and note how it gets a different container ID
These exercises will give you hands-on experience with the fundamental Docker operations you’ll use in
geospatial programming projects.
68
Section II: Python Programming Fundamentals
69
70
Chapter 8. Variables and Data Types
8.1. Introduction
In Python programming, variables and data types are the fundamental building blocks that allow us to
store and work with information. Just as a geographer needs to understand basic map elements before
creating complex spatial analyses, a programmer must master variables and data types before working
with geospatial data.
Variables act as containers that hold information. They allow us to store, access, and modify data
throughout our programs. In geospatial work, variables might store coordinates for a specific location,
elevation measurements, or place names. Data types define what kind of information we’re working with
and what operations we can perform on that data.
Understanding variables and data types is especially important in geospatial programming because spatial
data comes in many different forms: coordinates are typically decimal numbers, place names are text, and
true/false flags might indicate data quality. Learning to work with these different types of data effectively
is essential for any geospatial analysis.
num_points = 120
This variable num_points now holds the integer value 120, which we can use in our calculations or logic.
To view the value of the variable, we can use the print() function:
print(num_points)
Alternatively, we can simply type the variable name in a code cell and run the cell to display the value of
the variable:
num_points
# Change to text
location_data = "Boston"
print(location_data)
72
# Poor variable names - avoid these
x = 42.3601 # Too generic
data = "Boston" # Too vague
temp = 25.6 # Ambiguous - temperature or temporary?
l = [42.36, -71.06] # Single letter variables are hard to understand
b) Floating-point numbers (float): These are numbers with a decimal point, e.g., 3.14, −0.001, 100.0.
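latitude = 40.7128  # a float representing latitude in decimal degrees
elevation = 86.5  # a float representing elevation in meters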
c) Strings (str): Strings are sequences of characters, e.g., “Hello”, “Geospatial Data”, “Lat/Long”
Strings can be enclosed in single quotes ( ' ) or double quotes ( " ). You can also use triple quotes ( '''
or """ ) for multiline strings.
d) Booleans (bool): Booleans represent one of two values: True or False
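is_capital = True  # a boolean flag, e.g., whether a city is a national capital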
e) Lists: Lists are ordered collections of items, which can be of any data type.
coordinates = [
35.6895,
139.6917,
] # A list representing latitude and longitude of a point
feature_attributes = {
"name": "Mount Fuji",
"height_meters": 3776,
"type": "Stratovolcano",
"location": [35.3606, 138.7274],
}
If you want to include a single quote in a string, you can wrap the string in double quotes.
Alternatively, you can use the escape character \' to include a single quote in a string.
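quote1 = "It's a city in Japan"  # double quotes allow the apostrophe
quote2 = 'It\'s a city in Japan'  # or escape the single quote with a backslash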
# This is a comment
num_points = 120 # This is an inline comment
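num_features = 100  # starting value (illustrative) so the increment below has something to update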
num_features += 20
print("Updated number of features:", num_features)
74
Converting latitude from degrees to radians (required for some geospatial calculations):
import math
latitude = 35.6895
latitude_radians = math.radians(latitude)
print("Latitude in radians:", latitude_radians)
We first import the math module, which contains the radians() function. Then we convert the latitude
from degrees to radians.
Adding new coordinates to the list of coordinates using the append() method:
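coordinates.append(34.0522)  # append a new latitude value (Los Angeles, used here as an example)
coordinates.append(-118.2437)  # append the corresponding longitude
print("Updated coordinates:", coordinates)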
mount_fuji_name = feature_attributes["name"]
mount_fuji_height = feature_attributes["height_meters"]
print(f"{mount_fuji_name} is {mount_fuji_height} meters high.")
# Convert to lowercase
city_lowercase = city_name.lower()
print("Lowercase:", city_lowercase)
# Convert to uppercase
city_uppercase = city_name.upper()
print("Uppercase:", city_uppercase)
75
8.9.2. Replacing Text
The replace() method allows you to substitute parts of a string with new text:
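# Assuming city_name holds "New York City", as in the earlier example
new_name = city_name.replace("New York", "Los Angeles")
print("Updated name:", new_name)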
8.11. Exercises
8.11.1. Exercise 1: Variable Assignment and Basic Operations
Create variables to store the following geospatial data:
• The latitude and longitude of New York City: 40.7128, −74.0060.
• The population of New York City: 8,336,817.
• The area of New York City in square kilometers: 783.8.
Perform the following tasks:
1. Calculate and print the population density of New York City (population per square kilometer).
2. Print the coordinates in the format “Latitude: [latitude], Longitude: [longitude]”.
76
8.11.2. Exercise 2: Working with Strings
Create a string variable to store the name of a city, such as “New York City”. Perform the following
operations:
1. Convert the string to lowercase and print the result.
2. Convert the string to uppercase and print the result.
3. Replace “New York” with “Los Angeles” in the city name and print the new string.
77
Chapter 9. Python Data Structures
9.1. Introduction
Data structures are the building blocks that allow us to organize and store multiple pieces of related
information together. While individual variables hold single values like a coordinate or place name,
data structures let us group multiple values in meaningful ways. This becomes essential in geospatial
programming where we often work with collections of coordinates, lists of place names, sets of unique
identifiers, and structured attribute information.
Python provides several built-in data structures that are particularly useful for geospatial work: tuples for
storing fixed coordinate pairs, lists for sequences of locations along a path, sets for collections of unique
identifiers, and dictionaries for organizing feature attributes. Understanding when and how to use each
of these structures is fundamental to effective geospatial programming.
These data structures serve as the foundation for more complex geospatial operations. Whether you’re
tracking a GPS route with a list of coordinates, storing unique country codes in a set, or organizing
feature attributes in a dictionary, mastering these basic structures will enable you to work efficiently with
spatial data.
9.3. Tuples
Tuples are immutable sequences, which means that once a tuple is created, you cannot change, add, or
remove its elements. This immutability makes tuples perfect for storing data that should remain constant
throughout your program’s execution. In geospatial programming, tuples are commonly used to represent
coordinate pairs, since a point’s latitude and longitude typically shouldn’t change once defined.
The immutable nature of tuples provides several advantages: they’re memory-efficient, can be used as
dictionary keys (which we’ll see later), and help prevent accidental modification of important spatial
reference data. When you want to represent a fixed geographic location, a tuple is often the best choice.
tokyo_point = (
35.6895,
139.6917,
) # Tuple representing Tokyo's coordinates (latitude, longitude)
print(f"Tokyo coordinates: {tokyo_point}")
latitude = tokyo_point[0]
longitude = tokyo_point[1]
print(f"Latitude: {latitude}")
print(f"Longitude: {longitude}")
9.4. Lists
Lists are ordered, mutable sequences that can store multiple items in a single container. Unlike tuples, lists
are mutable, which means you can change, add, or remove elements after the list has been created. This
flexibility makes lists incredibly useful for geospatial applications where you need to build collections
of data dynamically, such as tracking a GPS route, storing elevation measurements along a transect, or
maintaining a collection of waypoints.
Lists maintain the order of elements, which is crucial for geospatial applications where sequence matters.
For example, when representing a path or route, the order of waypoints determines the direction of travel.
Lists can store different types of data (numbers, strings, tuples, or even other lists), making them versatile
containers for complex geospatial information.
Note that the append() method modifies the list in place, meaning it directly changes the original list
rather than returning a new list.
# A simple route stored as an ordered list of city names (example values assumed)
route = ["Tokyo", "Los Angeles", "London"]

# Access the first city in our route
first_stop = route[0]
print(f"First stop: {first_stop}")
9.5. Sets
Sets are unordered collections of unique elements, meaning they automatically eliminate duplicates and
don’t maintain any particular order. This makes sets incredibly useful in geospatial programming when
you need to work with unique identifiers, remove duplicate entries from datasets, or perform operations
like finding common elements between different spatial datasets.
Sets are particularly valuable when working with categorical spatial data. For example, you might want
to track unique country codes in a global dataset, identify distinct land cover types in a study area, or
maintain a collection of unique coordinate system identifiers. The automatic duplicate removal feature of
sets saves you from having to manually check for and remove repeated values.
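The sketch below produces output like that shown next; note that the order of elements in a set is not guaranteed, so your output may differ:
regions_visited = {"Asia", "Europe", "North America"}
print("Original set:", regions_visited)
regions_visited.add("Africa")
print("After adding Africa:", regions_visited)
regions_visited.add("Europe")  # Adding a duplicate value has no effect
print("After trying to add Europe again:", regions_visited)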
Original set: {'Asia', 'Europe', 'North America'}
After adding Africa: {'Asia', 'Europe', 'North America', 'Africa'}
After trying to add Europe again: {'Asia', 'Europe', 'North America', 'Africa'}
if "Antarctica" in regions_visited:
print("We have visited Antarctica")
else:
print("Antarctica not yet visited")
9.6. Dictionaries
Dictionaries are collections of key-value pairs where each key is unique and maps to a specific value. This
structure is perfect for storing related information about geographic features, where you need to associate
descriptive attributes with specific identifiers. In geospatial programming, dictionaries are extensively
used to store feature attributes, metadata about datasets, configuration settings, and any situation where
you need to organize information by meaningful names rather than numeric positions.
Think of dictionaries as lookup tables or filing systems where you can quickly find information using a
descriptive key. For example, instead of remembering that population data is stored in position 2 of a list,
you can simply use the key “population” to access that information directly. This makes your code more
readable and less prone to errors.
print(f"City: {city_name}")
print(f"Population: {city_population:,}")
print(f"Coordinates: {city_coords}")
Keep in mind that if you try to access a key that doesn't exist in a dictionary, you'll get a KeyError. To avoid this, you can use the get() method, as introduced below, to safely access a key and provide a default value if the key doesn't exist.
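A minimal sketch of get() with default values (the dictionary contents are assumed):
city_attributes = {"name": "Tokyo", "population": 37400000}
area = city_attributes.get("area_km2", "Unknown")  # Key is missing, so the default is returned
timezone = city_attributes.get("timezone", "Unknown")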
print(f"Area: {area}")
print(f"Timezone: {timezone}")
# Nested dictionary of country information (variable name and opening lines reconstructed)
country_info = {
    "France": {
        "capital": "Paris",
        "coordinates": (48.8566, 2.3522),
        "population": 2161000,
    },
    "UK": {
        "capital": "London",
        "coordinates": (51.5074, -0.1278),
        "population": 8982000,
    },
}
As you can see from the example above, a dictionary can be nested within another dictionary. This is useful when you need to organize hierarchical information, for example storing each country's capital, coordinates, and population under a single country key.
waypoint = {
    "name": "Trail Start",
    "latitude": 45.3311,
    "longitude": -121.7113,
    "elevation": 1200,
    "description": "Beginning of Pacific Crest Trail section",
    "waypoint_type": "trailhead",
    "facilities": ["parking", "restrooms", "water"],
    "difficulty": "easy",
}
print(f"Waypoint: {waypoint['name']}")
print(f"Location: {waypoint['latitude']}, {waypoint['longitude']}")
print(f"Elevation: {waypoint['elevation']} meters")
print(f"Available facilities: {', '.join(waypoint['facilities'])}")
• Tuples provide immutable containers perfect for coordinate pairs and fixed reference data
• Lists offer flexible, ordered collections ideal for sequences like routes and measurement series
• Sets automatically handle uniqueness and provide powerful operations for comparing datasets
• Dictionaries organize complex attribute information with meaningful, descriptive keys
As you work with real geospatial data, you'll often combine these structures. For example, you might use a list of dictionaries to represent multiple geographic features, or store coordinate tuples within a dictionary's values. Practice with these fundamental structures will prepare you for more advanced geospatial programming concepts and libraries.
The exercises below will help reinforce these concepts and give you hands-on experience applying these
data structures to common geospatial scenarios.
9.9. Exercises
9.9.1. Exercise 1: Using Lists
Create a list of tuples, where each tuple contains the name of a city and its corresponding latitude and
longitude:
• New York City: (40.7128, −74.0060)
• Los Angeles: (34.0522, −118.2437)
• Chicago: (41.8781, −87.6298)
Perform the following tasks:
1. Add a new city (e.g., Miami: (25.7617, −80.1918)) to the list.
2. Print the entire list of cities.
3. Slice the list to print only the first two cities.
9.9.4. Exercise 4: Working with Dictionaries
Create a dictionary to store information about a specific geospatial feature, such as a river:
• Name: “Amazon River”
• Length: 6400 km
• Countries: [“Brazil”, “Peru”, “Colombia”]
Perform the following tasks:
1. Add a new key-value pair to the dictionary to store the river’s average discharge (e.g., 209,000 m³/s).
2. Update the length of the river to 6992 km.
3. Print the dictionary.
Chapter 10. String Operations
10.1. Introduction
Strings are sequences of characters that represent textual information in Python programs. In geospatial
programming, strings play a crucial role in handling various types of text-based data: place names,
coordinate system definitions, file paths, data extracted from text files, and even coordinate values that
come in text format from GPS devices or web services.
Working effectively with strings is essential for many common geospatial tasks. You might need to clean
messy place names from a dataset, parse coordinates from a GPS log file, format location information for
display on a map, or build file paths to access geographic data files. Understanding how to manipulate
strings gives you the tools to handle these text-processing challenges efficiently.
Python provides a rich set of built-in string operations and methods that make working with textual data
straightforward and powerful. Whether you’re concatenating location names, extracting numeric values
from coordinate strings, or formatting output for reports, mastering these fundamental string operations
will significantly enhance your ability to work with geospatial data.
location_name = "Mount Everest"  # Example values (assumed)
country = "Nepal"
description = "Highest peak in the world"
print(f"Location: {location_name}")
print(f"Country: {country}")
print(f"Description: {description}")
# Check if a string contains only letters
city_name = "SanFrancisco"
print(f"Is '{city_name}' alphabetic? {city_name.isalpha()}")
location_info = (
location_name
+ " is located at coordinates "
+ str(latitude)
+ ", "
+ str(longitude)
)
print(location_info)
# Case conversion examples
mountain_name = "Mount Everest"
# Whitespace removal examples (example strings assumed)
messy_location = "   Mount Everest   "
messy_left = "   Mount Everest"
messy_right = "Mount Everest   "
print(f"Original: '{messy_location}'")
print(f"strip(): '{messy_location.strip()}'")  # Remove whitespace from both sides
print(f"lstrip(): '{messy_left.lstrip()}'")  # Remove whitespace from the left side
print(f"rstrip(): '{messy_right.rstrip()}'")  # Remove whitespace from the right side
# Basic replacement
location = "Mount Everest, Nepal"
updated_location = location.replace("Everest", "Kilimanjaro")
print(f"Original: {location}")
print(f"Updated: {updated_location}")
# Replace multiple occurrences
path_string = "data/raw_data/geographic_data/raw_data/points.csv"
clean_path = path_string.replace("raw_data/", "")
print(f"Original path: {path_string}")
print(f"Clean path: {clean_path}")
# Basic splitting
location_full = "Mount Everest, Nepal, Asia"
location_parts = location_full.split(", ")
print(f"Original: {location_full}")
print(f"Split into parts: {location_parts}")
The example below shows how to split a string into a list of coordinates:
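For example (coordinate string assumed):
coordinate_string = "35.6895,139.6917"
lat_str, lon_str = coordinate_string.split(",")
print(f"Latitude: {float(lat_str)}, Longitude: {float(lon_str)}")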
The example below shows how to split a file path into a list of path components using the split()
method with a forward slash as the separator:
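For example (file path assumed):
file_path = "data/geospatial/boundaries/countries.geojson"
path_components = file_path.split("/")
print(f"Path components: {path_components}")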
10.4.5. String Joining
The join() method combines lists of strings into single strings, which is the reverse of splitting. For
example, you might have a list of city names, and you want to join them into a single string with a comma
and a space between each city name. To use the join() method, you need to provide a string as the
separator. The example below shows how to join a list of city names into a single string with a comma
and a space between each city name:
# Basic joining
city_names = ["San Francisco", "New York", "Tokyo"]
city_name = ", ".join(city_names)
print(f"Joined city name: {city_name}")
The example below shows how to join a list of file paths into a single file path using the join() method
with a forward slash as the separator:
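For example (path components assumed):
path_parts = ["data", "geospatial", "boundaries", "countries.geojson"]
full_path = "/".join(path_parts)
print(f"Joined file path: {full_path}")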
The example below shows how to join a list of coordinates into a string:
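For example, converting numeric values to strings before joining (values assumed):
coords = [35.6895, 139.6917]
coord_string = ", ".join(str(value) for value in coords)
print(f"Coordinate string: {coord_string}")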
# Simple variable insertion (example values assumed)
location = "Mount Everest"
latitude = 27.9881
longitude = 86.9250
location_info = f"Location: {location}"
print(location_info)
# Multiple variables
coordinates = f"Coordinates: ({latitude}, {longitude})"
print(coordinates)
# Controlling the number of decimal places
coords_2_places = f"({latitude:.2f}, {longitude:.2f})"
coords_4_places = f"({latitude:.4f}, {longitude:.4f})"
print(coords_2_places)
print(coords_4_places)
The str.format() method uses curly braces as placeholders, optionally with a number inside to indicate the position of the variable to be formatted. The example below shows how to format a number with a certain number of decimal places:
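A minimal sketch (elevation value assumed):
elevation = 8848.86
print("Elevation: {0:.1f} meters".format(elevation))
print("Elevation: {0:.2f} meters".format(elevation))
F-strings can also be combined with other modules, such as datetime, to build descriptive file names, as the next example shows: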
import datetime

current_time = datetime.datetime.now()
survey_lat = 45.3311
survey_lon = -121.7113
filename = (
    f"survey_{current_time.strftime('%Y%m%d')}_"
    f"{survey_lat:.4f}N_{abs(survey_lon):.4f}W.csv"
)
print(f"Generated filename: {filename}")
The example below shows how to build a SQL query with formatting:
# Building SQL queries with formatting (query string assumed for illustration)
table_name = "cities"
min_population = 1000000
region = "North America"
query = f"SELECT * FROM {table_name} WHERE population > {min_population} AND region = '{region}'"
print(query)
The combination of basic string operations, formatting techniques, and parsing strategies provides a powerful toolkit for handling the textual aspects of geospatial programming.
As you advance in geospatial programming, you’ll find these string manipulation skills valuable when
working with coordinate reference systems, parsing spatial data formats, and integrating with web
services that return geographic information in text format.
The exercises below will help you practice these concepts with realistic geospatial scenarios.
10.8. Exercises
10.8.1. Exercise 1: Manipulating Geographic Location Strings
• Create a string that represents the name of a geographic feature (e.g., "Amazon River" ).
• Convert the string to lowercase and then to uppercase.
• Concatenate the string with the name of the country (e.g., "Brazil" ) to create a full location name.
• Repeat the string three times, separating each repetition with a dash ( - ).
10.8.5. Exercise 5: Parsing and Extracting Address Information
• Given a string in the format "Street, City, Country" (e.g.,
"123 Main St, Springfield, USA" ), write a function that parses the string into a dictionary with
keys street , city , and country .
• The function should return a dictionary like
{"street": "123 Main St", "city": "Springfield", "country": "USA"} .
Chapter 11. Loops and Conditional Statements
11.1. Introduction
Looping and control statements are fundamental programming concepts that allow you to automate
repetitive tasks and make decisions based on data conditions. In geospatial programming, these concepts
become particularly powerful when working with collections of spatial data, such as processing multiple
coordinate points, analyzing sets of geographic features, or performing calculations across entire datasets.
Think of loops as a way to avoid writing the same code multiple times. Instead of manually processing
each coordinate in a list of 100 GPS points, you can write a loop that automatically handles all of them
with just a few lines of code. Control statements, on the other hand, allow your programs to make intelligent decisions, such as categorizing coordinates by hemisphere, filtering data based on elevation thresholds, or handling different types of geographic features appropriately.
These concepts are essential building blocks that enable you to process real-world geospatial datasets
efficiently. Whether you’re analyzing GPS tracks, processing survey data, or working with geographic
features from a database, loops and control statements will help you automate tedious tasks and build
more intelligent, responsive programs.
coordinates = [
(35.6895, 139.6917),
(34.0522, -118.2437),
(51.5074, -0.1278),
] # List of tuples representing coordinates
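A minimal sketch of iterating over this list with tuple unpacking:
for lat, lon in coordinates:
    print(f"Latitude: {lat}, Longitude: {lon}")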
Notice how we use tuple unpacking here—each coordinate tuple (lat, lon) is automatically
unpacked into separate lat and lon variables. This is a very common and useful pattern when working
with coordinate data.
This example demonstrates how loops can help you perform the same calculation (computing distance)
for multiple coordinate pairs without writing repetitive code. The :.2f formatting ensures the distance
is displayed with two decimal places. Note that we created a function named calculate_distance to
encapsulate the distance calculation logic. We will learn more about functions in the Functions and
Classes chapter.
print("Major US Cities:")
for i, city in enumerate(cities):
print(f"{i+1}. {city}")
In the above example, we used a for loop to iterate through a list of city names. The enumerate function
is used to get the index of each item in the list. This allows us to print the index and the city name in a
single line. The index starts at 0 and increments by 1 for each item in the list. This can be useful when
you need to keep track of the position of each item in the list.
counter = 0
while counter < len(coordinates):
lat, lon = coordinates[counter]
print(f"Processing coordinate: ({lat}, {lon})")
counter += 1
Important Note: When using while loops, always make sure your condition will eventually become
False , or you’ll create an infinite loop! In the example above, we increment the counter variable in
each iteration to ensure the loop will eventually end.
target_latitude = 40.0
tolerance = 0.1
attempts = 0
max_attempts = 10
attempts += 1
The above example shows how to generate random coordinates until you find one within a certain
tolerance of a target latitude. This is a common pattern in geospatial programming when you need to find
a specific location in a dataset. Try running the code block multiple times to see how the results vary.
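A minimal sketch of this pattern, assuming random.uniform() is used to generate candidate latitudes:
import random

target_latitude = 40.0
tolerance = 0.1
attempts = 0
max_attempts = 10

while attempts < max_attempts:
    candidate_lat = random.uniform(-90, 90)
    attempts += 1
    if abs(candidate_lat - target_latitude) <= tolerance:
        print(f"Found latitude {candidate_lat:.4f} after {attempts} attempts")
        break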
for lat, lon in coordinates:
if lat > 0:
hemisphere = "Northern"
else:
hemisphere = "Southern"
if lon > 0:
direction = "Eastern"
else:
direction = "Western"
print(f"The coordinate ({lat}, {lon}) is in the {hemisphere} Hemisphere and {direction} Hemisphere.")
The above example combines a for loop with an if statement to categorize coordinates by hemisphere. This
demonstrates how loops and control statements can be used together to process and analyze geospatial
data.
The and and or operators are used to combine multiple conditions in a single expression. The and
operator returns True if both conditions are True , otherwise it returns False . The or operator
returns True if at least one of the conditions is True , otherwise it returns False . The not operator
returns the opposite of the condition.
Common patterns include filtering data based on conditions, counting items that meet specific criteria,
transforming data selectively, and building new datasets from existing ones.
filtered_coordinates = []
for lat, lon in coordinates:
if lon > 0:
filtered_coordinates.append((lat, lon))
print(f"Filtered coordinates (only with positive longitude): {filtered_coordinates}")
southern_count = 0
for lat, lon in coordinates:
if lat < 0:
southern_count += 1
print(f"Number of coordinates in the Southern Hemisphere: {southern_count}")
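The analysis below assumes a list named analysis_coordinates; one possible definition (values assumed):
analysis_coordinates = [
    (35.6895, 139.6917),  # Tokyo
    (-33.8688, 151.2093),  # Sydney
    (51.5074, -0.1278),  # London
    (200.0, 500.0),  # Deliberately invalid, to exercise the validity check
]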
# Initialize counters
northern_count = 0
southern_count = 0
eastern_count = 0
western_count = 0
valid_coordinates = []
print("Coordinate Analysis:")
print("-" * 40)
for lat, lon in analysis_coordinates:
# Validate coordinates (basic check)
if -90 <= lat <= 90 and -180 <= lon <= 180:
valid_coordinates.append((lat, lon))
# Count by hemisphere
if lat >= 0:
northern_count += 1
else:
southern_count += 1
# Count by longitude
if lon >= 0:
eastern_count += 1
else:
western_count += 1
print(f"\nSummary:")
print(f"Valid coordinates: {len(valid_coordinates)}")
print(f"Northern Hemisphere: {northern_count}")
print(f"Southern Hemisphere: {southern_count}")
print(f"Eastern Longitude: {eastern_count}")
print(f"Western Longitude: {western_count}")
Use conditional statements when:
• You need to filter data based on specific criteria
• You want to handle different types of input data appropriately
11.9. Exercises
11.9.1. Exercise 1: Using For Loops to Process Coordinate Lists
• Create a list of tuples representing coordinates (latitude, longitude).
• Write a for loop that prints each coordinate and indicates whether it is in the Northern or Southern
Hemisphere based on the latitude.
11.9.4. Exercise 4: Filtering Data with Combined Loops and Conditionals
• Given a list of coordinates, filter out and store only those located in the Southern Hemisphere (latitude
< 0).
• Count the number of coordinates that meet this condition and print the result.
Chapter 12. Functions and Classes
12.1. Introduction
Functions and classes are two of the most powerful organizational tools in Python programming, and they
become especially valuable when working with geospatial data. These concepts allow you to structure
your code in ways that make it more readable, reusable, and maintainable—essential qualities when
dealing with complex spatial analysis tasks.
Functions serve as the building blocks of organized code. Instead of writing the same distance calculation
or coordinate transformation code multiple times throughout your program, you can encapsulate that
logic into a function and call it whenever needed. This approach, known as “Don’t Repeat Yourself” (DRY),
makes your code more efficient and reduces the chance of errors.
Classes take organization a step further by allowing you to bundle related data and functions together
into logical units. For example, instead of managing separate variables for latitude, longitude, elevation,
and place name, you can create a Location class that keeps all this information together along with
methods to manipulate it.
In geospatial programming, these concepts become particularly powerful. You might create functions to
calculate distances, convert between coordinate systems, or validate GPS data. You could design classes
to represent geographic features like points, lines, or polygons, each with their own properties and
behaviors. As your spatial analysis projects grow more complex, these organizational tools will help you
build sophisticated yet manageable geospatial applications.
12.3.1. Defining a Simple Function
The basic structure of a function includes the def keyword, a function name, parameters in parentheses,
and a colon followed by an indented code block. Here’s a simple function that adds two numbers:
def add(a, b):
    return a + b

# Example usage
result = add(5, 3)
print(f"Result: {result}")
def greet(name, greeting="Hello"):  # Function body assumed for illustration
    return f"{greeting}, {name}!"

# Example usage
print(greet("Alice"))  # Uses the default greeting
print(greet("Bob", "Hi"))  # Overrides the default greeting
# Function to multiply two numbers
def multiply(a, b):
return a * b
Figure 8: A diagram illustrating great-circle distance (drawn in red) between two points on a sphere.
First, let’s import the necessary functions from the math module:
36 https://shorturl.at/rO0Yy
Then, we define the function haversine that takes four parameters: lat1 , lon1 , lat2 , and lon2 .
The function returns the distance between the two points in kilometers.
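A minimal sketch of such a function:
from math import radians, sin, cos, sqrt, asin

def haversine(lat1, lon1, lat2, lon2):
    """Return the great-circle distance between two points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km is the mean radius of the Earth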
# Example usage
distance = haversine(35.6895, 139.6917, 34.0522, -118.2437)
print(f"Distance: {distance:.2f} km")
print(f"Distance in kilometers: {distance_km:.2f} km")
This enhanced version shows the power of default parameters in geospatial programming:
• Convenience: Most of the time, you want distance in kilometers, so the default works perfectly
• Flexibility: When you need different units, you can easily override the default
• Professional practice: Real geospatial applications often need to handle multiple unit systems
def batch_haversine(coord_list):
distances = []
for i in range(len(coord_list) - 1):
lat1, lon1 = coord_list[i]
lat2, lon2 = coord_list[i + 1]
distance = haversine(lat1, lon1, lat2, lon2)
distances.append(distance)
return distances
# Example usage
coordinates = [(35.6895, 139.6917), (34.0522, -118.2437), (40.7128, -74.0060)]
distances = batch_haversine(coordinates)
print(f"Distances: {distances}")
def average(*numbers):
return sum(numbers) / len(numbers)
# Example usage
print(average(10, 20, 30)) # 20.0
print(average(5, 15, 25, 35)) # 20.0
The *args parameter allows the function to accept any number of arguments, which are then treated
as a tuple inside the function. This is useful for calculations where the number of inputs might vary.
def describe_point(latitude, longitude, **kwargs):  # Body reconstructed for illustration
    description = f"Point at ({latitude}, {longitude})"
    for key, value in kwargs.items():
        description += f", {key}: {value}"
    return description
# Example usage
print(describe_point(35.6895, 139.6917, name="Tokyo", population=37400000))
print(describe_point(34.0522, -118.2437, name="Los Angeles", state="California"))
This pattern allows functions to handle a flexible set of optional parameters, making them more adaptable
to different use cases while maintaining a clean, consistent interface.
12.4.1. Defining a Simple Class
Let’s start with a simple Point class that demonstrates the basic structure of class definition. This class
will represent geographic points with coordinates and optional names:
class Point:
def __init__(self, latitude, longitude, name=None):
self.latitude = latitude
self.longitude = longitude
self.name = name
def __str__(self):
return f"{self.name or 'Point'} ({self.latitude}, {self.longitude})"
# Example usage
point1 = Point(35.6895, 139.6917, "Tokyo")
print(point1)
class Point:
    def __init__(self, latitude, longitude, name=None):
        self.latitude = latitude
        self.longitude = longitude
        self.name = name

    def distance_to(self, other):
        # Reconstructed for illustration: great-circle distance to another Point
        return haversine(self.latitude, self.longitude, other.latitude, other.longitude)

# Example usage
point1 = Point(35.6895, 139.6917, "Tokyo")
point2 = Point(34.0522, -118.2437, "Los Angeles")
print(
    f"Distance from {point1.name} to {point2.name}: "
    f"{point1.distance_to(point2):.2f} km"
)
class Point:
def __init__(self, latitude, longitude, name="Unnamed"):
self.latitude = latitude
self.longitude = longitude
self.name = name
class Route:
def __init__(self, points):
self.points = points
def total_distance(self):
total_dist = 0
for i in range(len(self.points) - 1):
total_dist += self.points[i].distance_to(self.points[i + 1])
return total_dist
# Example usage
route = Route([point1, point2])
print(f"Total distance: {route.total_distance():.2f} km")
• You want to break down complex calculations into manageable pieces
• You need to validate, transform, or process data in a consistent way
• You want to create reusable utilities for common geospatial tasks
Use Classes When:
• You need to represent complex entities with multiple related attributes
• You want to bundle data and the operations that work on that data together
• You’re modeling real-world geographic objects (points, routes, regions)
• You need to maintain state and behavior together in a logical unit
Best Practices:
• Give functions and classes descriptive names that clearly indicate their purpose
• Keep functions focused on a single task (single responsibility principle)
• Use default parameters to make functions more flexible and user-friendly
• Test your functions with various inputs to ensure they work correctly
12.8. Exercises
12.8.1. Exercise 1: Calculating Distances with Functions
• Define a function calculate_distance that takes two geographic coordinates (latitude and longitude)
and returns the distance between them using the Haversine formula37.
• Use this function to calculate the distance between multiple pairs of coordinates.
• Test your function with coordinates for cities you know and verify the results make sense.
37 https://shorturl.at/rO0Yy
12.8.2. Exercise 2: Batch Distance Calculation
• Create a function batch_distance_calculation that accepts a list of coordinate pairs and returns a
list of distances between consecutive pairs.
• Test the function with a list of coordinates representing several cities.
• Add an optional parameter to specify the units (kilometers or miles) with a default value.
Chapter 13. Working with Files
13.1. Introduction
In the world of geospatial programming, data rarely exists in isolation within your program. Most often, it
comes from external sources: text files containing GPS coordinates, CSV files with spatial measurements,
or configuration files with map settings. Learning to work with files is therefore fundamental to any
geospatial application.
This chapter introduces the essential techniques for reading from and writing to files in Python, with a
special focus on handling geospatial data. You’ll also learn about exception handling—a critical skill that
ensures your programs can gracefully handle unexpected situations like missing files, corrupted data, or
network interruptions.
Think of file handling as the bridge between your Python program and the external world of data. Whether
you’re processing GPS tracks from a hiking expedition, analyzing weather station data, or working with
coordinates from a mapping survey, you’ll need these fundamental skills to get data into your program
and save results back out.
Exception handling, meanwhile, is your safety net. In real-world scenarios, files might be missing, data
might be corrupted, or network connections might fail. Rather than letting these issues crash your
program, exception handling allows you to anticipate problems and respond appropriately—perhaps by
displaying a helpful error message, trying an alternative approach, or gracefully skipping problematic
data.
sample_data = """35.6895,139.6917
34.0522,-118.2437
51.5074,-0.1278
-33.8688,151.2093
48.8566,2.3522"""
output_file = "coordinates.txt"
try:
with open(output_file, "w") as file:
file.write(sample_data)
print(f"Sample file '{output_file}' has been created successfully.")
except Exception as e:
print(f"An error occurred while creating the file: {e}")
# Example of reading coordinates from a file and writing to another file
input_file = "coordinates.txt"
output_file = "output_coordinates.txt"

try:
    # Step 1: Read the coordinate data from our input file
    with open(input_file, "r") as infile:
        # readlines() returns a list where each element is a line from the file
        # Each line still includes the newline character (\n) at the end
        coordinates = infile.readlines()

    # Step 2: Process the data and write it to a new file with better formatting
    with open(output_file, "w") as outfile:
        for line in coordinates:
            # strip() removes whitespace (including \n) from both ends of the string
            # split(",") breaks the line into parts wherever it finds a comma
            lat, lon = line.strip().split(",")
            # Write the reformatted coordinate to the new file (output format assumed)
            outfile.write(f"Latitude: {lat}, Longitude: {lon}\n")
except FileNotFoundError:
    print(f"Error: The file {input_file} was not found.")
    print("Make sure you ran the previous code cell to create the coordinates.txt file.")
This code example demonstrates how to read data from a file and write to another file. The readlines
method is used to read the lines from the file. The write method is used to write the lines to the new
file. The strip method is used to remove the newline character from the end of the line. The split
method is used to split the line into a list of strings. You can also use the writelines method to write a
list of strings to a file. By combining these methods, we can read the data from the file and write it to the
new file with the desired format. This is a common pattern in geospatial programming when you need to
process data from a file and write the results to another file.
Think of exceptions as Python’s way of saying “Something went wrong, but I’ll let you decide how to
handle it.” Without exception handling, a single missing file could stop your entire geospatial analysis.
With proper exception handling, your program can continue processing other files, log the error, or try
alternative approaches.
The basic structure uses three keywords:
• try : Contains the code that might cause an error
• except : Contains the code that runs if an error occurs
• finally : Contains code that runs regardless of whether an error occurred (optional)
Let’s see how this works with a practical example of parsing coordinate data that might be malformed.
def parse_coordinates(line):
    """
    Parse a single line of text into a coordinate pair.

    Args:
        line (str): A string containing coordinates in the format "lat,lon"

    Returns:
        tuple: (latitude, longitude) as floats, or None if parsing fails
    """
    try:
        # Attempt to split the line and convert to numbers
        lat, lon = line.strip().split(",")
        lat = float(lat)  # This might raise ValueError if lat isn't a valid number
        lon = float(lon)  # This might raise ValueError if lon isn't a valid number
        return lat, lon
    except ValueError as e:
        # This happens when we can't convert to float or split doesn't work as expected
        print(f"Data format error: {e}. Could not parse line: '{line.strip()}'")
        return None
    except Exception as e:
        # This catches any other unexpected errors
        print(f"An unexpected error occurred: {e}")
        return None
# Test data with a mix of valid and malformed lines (list opening reconstructed; name assumed)
test_lines = [
    "35.6895,139.6917",  # Valid coordinates
    "45.0,not_a_number",  # Invalid longitude
    "only_one_value",  # Missing comma
]
This code example demonstrates how to process geographic coordinates and identify invalid data. This is
a very common task in geospatial programming when you need to process field survey data or GPS data,
which might contain invalid entries.
try:
print(f"Starting to process file: {input_file}")
if not line.strip():
continue
coordinates = parse_coordinates(line)
if coordinates:
lat, lon = coordinates
print(f"Line {line_number}: Processed coordinates ({lat:.4f}, {lon:.4f})")
processed_count += 1
else:
print(f"Line {line_number}: Skipped due to parsing error")
error_count += 1
except FileNotFoundError:
print(f"Error: The file '{input_file}' was not found.")
print("Please check the file path and make sure the file exists.")
return
except PermissionError:
print(f"Error: Permission denied when trying to read '{input_file}'.")
print("Please check if you have read permissions for this file.")
return
except Exception as e:
print(f"An unexpected error occurred while processing the file: {e}")
return
finally:
# This block always runs, whether there was an error or not
print(f"\n--- Processing Summary ---")
print(f"Successfully processed: {processed_count} coordinates")
print(f"Errors encountered: {error_count} lines")
print(f"Finished processing {input_file}")
CSV files are particularly common in geospatial work because they can easily store tabular data like
coordinate lists, attribute tables, or measurement records. Let’s create and work with a more structured
coordinate file that includes city names.
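For the examples below, assume a small CSV string such as the following (contents assumed):
csv_data = """city,latitude,longitude
Tokyo,35.6895,139.6917
Los Angeles,34.0522,-118.2437
London,51.5074,-0.1278"""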
csv_file = "cities.csv"
try:
with open(csv_file, "w") as file:
file.write(csv_data)
print(f"CSV file '{csv_file}' created successfully.")
except Exception as e:
print(f"Error creating CSV file: {e}")
def read_city_coordinates(filename):
    """
    Read city coordinate data from a CSV-style file.
    """
    cities = []
    try:
        with open(filename, "r") as file:
            lines = file.readlines()
cities.append(
{
"name": city_name,
"latitude": latitude,
"longitude": longitude,
}
)
except ValueError as e:
print(f"Warning: Could not parse line {line_num}: {line.strip()}")
continue
except FileNotFoundError:
print(f"Error: File '{filename}' not found.")
return []
except Exception as e:
print(f"Error reading file: {e}")
return []
return cities
• Keep track of processing statistics to help users understand the results
These skills will serve you well as we move into more advanced geospatial topics, where data quality and
error handling become even more critical.
13.9. Exercises
These exercises will help you practice the fundamental file handling and exception handling concepts
covered in this chapter. Each exercise builds on the previous ones, so it’s recommended to complete them
in order.
sample_data = """35.6895,139.6917
34.0522,-118.2437
51.5074,-0.1278
-33.8688,151.2093
48.8566,2.3522"""
output_file = "coordinates.txt"
try:
with open(output_file, "w") as file:
file.write(sample_data)
print(f"Sample file '{output_file}' has been created successfully.")
except Exception as e:
print(f"An error occurred while creating the file: {e}")
Chapter 14. Data Analysis with NumPy and Pandas
14.1. Introduction
In geospatial programming, you’ll frequently work with large amounts of numerical data: coordinates
from GPS devices, elevation measurements, temperature readings, population statistics, and much more.
While Python’s built-in data types (lists, dictionaries) can handle small datasets, they become inefficient
and cumbersome when dealing with the volume and complexity of real-world geospatial data.
This is where NumPy38 and Pandas39 become essential tools in your geospatial programming toolkit. Think
of NumPy as your mathematical powerhouse. It provides efficient storage and operations for numerical
data in arrays, making calculations on thousands or millions of data points fast and straightforward.
Pandas, on the other hand, is your data organization expert. It excels at handling structured, tabular data
(like spreadsheets) and provides intuitive tools for cleaning, analyzing, and manipulating datasets.
Why NumPy matters for geospatial work:
• Efficiently stores and processes large arrays of coordinates, measurements, or sensor data
• Provides fast mathematical operations needed for distance calculations, coordinate transformations,
and statistical analysis
• Forms the foundation for most other geospatial Python libraries
Why Pandas matters for geospatial work:
• Handles datasets with mixed data types (city names, coordinates, population numbers, dates)
• Makes it easy to filter, group, and summarize geospatial datasets
• Seamlessly reads data from common formats like CSV files, databases, and web APIs
• Provides tools for handling missing or inconsistent data—common issues in real-world datasets
Together, these libraries will transform how you work with geospatial data, making complex analyses
both possible and efficient.
38 https://numpy.org
39 https://pandas.pydata.org
What makes NumPy arrays special?
• Speed: Operations on NumPy arrays are much faster than equivalent operations on Python lists
• Memory efficiency: Arrays store data more compactly, important when working with large datasets
• Vectorization: You can perform mathematical operations on entire arrays at once, rather than looping
through individual elements
• Multidimensional: Arrays can have multiple dimensions, perfect for storing coordinate grids, images,
or sensor data
In geospatial contexts, you’ll use NumPy arrays to store:
• Lists of coordinates (latitude/longitude pairs)
• Elevation data from digital elevation models
• Time series of measurements from weather stations
• Pixel values from satellite imagery
• Results of spatial calculations
Let’s start with the basics of creating and working with NumPy arrays.
import numpy as np
Let’s create a NumPy array from a list of numbers, which may represent elevation measurements at
different points:
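For example (elevation values assumed):
elevations = np.array([1250, 1432, 1678, 1543, 1289])
print(f"Elevation array: {elevations}")
print(f"Data type: {elevations.dtype}")
print(f"Array shape: {elevations.shape}")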
Each NumPy array is an object that contains a sequence of numbers. It is similar to a list in Python, but
it is more efficient for numerical operations. The dtype attribute of an array tells you the data type of
the elements in the array. The shape attribute of an array tells you the dimensions of the array.
Let’s create a 2D array of coordinates:
print(f"2D Array (city coordinates):\n{coordinates}")
print(f"Array shape: {coordinates.shape}") # (3 rows, 2 columns)
print(f"Number of cities: {coordinates.shape[0]}")
Sometimes you need to create a NumPy array with a specific pattern. For example, you might want to
create an array of zeros or ones. You can do this with the zeros and ones functions.
Let’s create an array of zeros, which may be useful when you need to initialize an array before filling it
with data:
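For example:
initial_grid = np.zeros((3, 4))  # 3 rows and 4 columns, all set to 0.0
print(f"Array of zeros:\n{initial_grid}")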
# You can multiply by any number to get arrays filled with that value
weights = np.ones((2, 4)) * 0.5 # Array filled with 0.5
print(f"Array of weights (0.5):\n{weights}")
The arange function is a built-in function that creates a NumPy array with a sequence of numbers. It is
similar to the range function in Python, but it returns a NumPy array instead of a list.
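For example:
distance_markers = np.arange(0, 100, 10)  # 0 up to (but not including) 100, in steps of 10
print(f"Distance markers: {distance_markers}")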
In geospatial programming, you’ll frequently need to transform data: convert units, apply scaling factors,
normalize coordinates, or perform calculations across entire datasets. NumPy makes these operations
simple and efficient.
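For example, converting an array of elevations from feet to meters in a single expression (values assumed):
elevations_feet = np.array([4101, 4698, 5505, 5062, 4229])
elevations_meters = elevations_feet * 0.3048  # One multiplication applies to every element
print(f"Elevations in meters: {elevations_meters.round(1)}")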
The above example shows how to perform vectorized operations on NumPy arrays. Vectorized operations
are operations that are performed on entire arrays at once, rather than looping through individual
elements. This is much faster than looping through individual elements.
In addition to addition and subtraction, you can perform other element-wise operations on NumPy arrays, such as multiplication, division, and exponentiation. Vectorized operations can also be performed between two arrays. For example, you can multiply each coordinate pair by a pair of weights:
# Apply different weights to latitude and longitude (useful for some projections)
weights = np.array([1.0, 0.8])  # Weight latitude normally, reduce longitude weight
weighted_coords = coordinates * weights
print(f"Original coordinates:\n{coordinates}")
print(f"Weighted coordinates:\n{weighted_coords}")
# You can also perform operations between arrays of the same shape
coord_differences = coordinates - coordinates[0]  # Differences from the first city (Tokyo)
print(f"Coordinate differences from Tokyo:\n{coord_differences}")
Conversely, you can flatten a multidimensional array back to 1D. This is useful when you need to perform
operations on the entire array at once. For example, you can use the flatten method to flatten a 2D
array into a 1D array:
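The flatten() call below assumes a 2D coordinate_pairs array; a sketch of creating one with reshape() (values assumed):
flat_values = np.array([35.6895, 139.6917, 34.0522, -118.2437, 51.5074, -0.1278])
coordinate_pairs = flat_values.reshape(3, 2)  # 3 rows of (latitude, longitude)
print(f"Reshaped into pairs:\n{coordinate_pairs}")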
flattened_again = coordinate_pairs.flatten()
print(f"Flattened back to 1D: {flattened_again}")
# Exponential functions
growth_rates = np.array([0.01, 0.02, 0.03, 0.05]) # 1%, 2%, 3%, 5% growth rates
exponential_growth = np.exp(growth_rates) # e^x
print(f"Growth rates: {growth_rates}")
print(f"Exponential factors: {exponential_growth.round(3)}")
14.3.5. Statistical Operations
Statistical analysis is fundamental to understanding geospatial data. NumPy provides efficient functions
to calculate descriptive statistics that help you understand your data’s characteristics: central tendencies
(mean, median), variability (standard deviation, variance), and distributions (min, max, percentiles).
These statistics are particularly important in geospatial work for data quality assessment, outlier detection, and understanding spatial patterns.
Let’s create a NumPy array of elevation readings from a hiking trail and calculate some basic statistics:
elevation_profile = np.array(
[1245, 1367, 1423, 1389, 1456, 1502, 1478, 1398, 1334, 1278]
)
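A few representative statistics:
print(f"Mean elevation: {elevation_profile.mean():.1f} m")
print(f"Median elevation: {np.median(elevation_profile):.1f} m")
print(f"Minimum / maximum: {elevation_profile.min()} m / {elevation_profile.max()} m")
print(f"Standard deviation: {elevation_profile.std():.1f} m")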
n_points = 5  # Number of random points to generate (value assumed)
random_latitudes = np.random.uniform(-90, 90, n_points)
random_longitudes = np.random.uniform(-180, 180, n_points)
print("Random coordinates:")
for i in range(n_points):
    print(f"Point {i+1}: ({random_latitudes[i]:.4f}, {random_longitudes[i]:.4f})")
Similarly, you can generate random temperature readings with a normal (Gaussian) distribution, which
is useful for simulating measurement errors or natural variations.
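For example (mean and spread assumed):
mean_temperature = 15.0  # Assumed mean in degrees Celsius
temperature_std = 2.5  # Assumed standard deviation
temperatures = np.random.normal(mean_temperature, temperature_std, 7)
print(f"Simulated temperatures: {temperatures.round(2)}")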
first_temp = temperatures[0]  # Index 0 returns the first element (uses the temperatures array above)
print(f"First temperature reading: {first_temp}°C")
In 2D arrays, you can specify both row and column indices to access a particular element.
# Create a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
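For example:
element = arr_2d[1, 2]  # Row index 1, column index 2
print(f"Element at row 1, column 2: {element}")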
# Create a 1D array
arr = np.array([10, 20, 30, 40, 50])
# Create a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d
# Slice the last two rows and the first two columns
slice_2d_partial = arr_2d[1:, :2]
print(f"Sliced 2D array (last two rows, first two columns):\n{slice_2d_partial}")
# Create a 1D array
arr = np.array([10, 20, 30, 40, 50])
# Create a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Iterating through rows of the 2D array
print("Iterating over rows:")
for row in arr_2d:
print(row)
# Create a 1D array
arr = np.array([10, 20, 30, 40, 50])
14.4. Introduction to Pandas
While NumPy excels at numerical computations on arrays, real-world geospatial data often comes in more
complex formats. You might have a dataset with city names, coordinates, population numbers, and dates
all mixed together. This is where Pandas shines.
Pandas is built on top of NumPy but adds powerful tools for working with structured data—data that has
labels, different data types, and relationships between columns. Think of Pandas as providing you with a
powerful, programmable spreadsheet that can handle much larger datasets and more complex operations
than traditional spreadsheet software.
Key Pandas concepts:
• Series: A one-dimensional labeled array (like a single column in a spreadsheet)
• DataFrame: A two-dimensional labeled data structure (like a complete spreadsheet with rows and
columns)
• Index: Labels for rows and columns that make data selection intuitive
• Data types: Automatic handling of numbers, text, dates, and other data types
In geospatial programming, you’ll use Pandas to:
• Load datasets from CSV files, databases, or web APIs
• Clean and prepare messy real-world data
• Filter and select subsets of your data (e.g., all cities in a specific country)
• Group and summarize data (e.g., average population by continent)
• Merge different datasets together (e.g., combining city coordinates with demographic data)
import pandas as pd
Then, you can create a Pandas Series from a list of city names:
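For example:
city_series = pd.Series(["Tokyo", "Los Angeles", "London"], name="City")
print(city_series)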
Note that we assigned a name to the Series object. This is useful when you want to refer to the Series
object by name later in the code.
In addition to the Series object, Pandas also provides a DataFrame object, which is a two-dimensional
labeled data structure. A DataFrame is like a complete table with multiple columns (where each column
is a Series ). Let’s create a DataFrame from a dictionary of data representing city information:
data = {
"City": ["Tokyo", "Los Angeles", "London"],
"Latitude": [35.6895, 34.0522, 51.5074],
"Longitude": [139.6917, -118.2437, -0.1278],
"Country": ["Japan", "USA", "UK"],
}
df = pd.DataFrame(data)
print(f"Pandas DataFrame (city information):\n{df}")
print(f"\nDataFrame shape: {df.shape}") # (rows, columns)
print(f"Column names: {df.columns.tolist()}")
print(f"Data types:\n{df.dtypes}")
Although we can use the print function to print the contents of a Pandas DataFrame, it is not formatted
well. The recommended way to print a DataFrame is to use the display function or simply type the
DataFrame name in a code cell and run it:
display(df)
The output format of the DataFrame object is a table with rows and columns. It is similar to a spreadsheet
and looks better than the output from a print statement.
You can also select multiple columns at once. This returns a new DataFrame with the selected columns:
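For example:
subset = df[["City", "Country"]]
print(subset)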
You can also filter data based on conditions. For example, you can select all cities in the Western
Hemisphere (negative longitude):
western_cities = df[df["Longitude"] < 0]
print("Cities in Western Hemisphere:")
print(western_cities)
You can combine multiple conditions to filter data. For example, you can select all cities with latitude
greater than 40 and in the Western Hemisphere:
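For example:
northern_western = df[(df["Latitude"] > 40) & (df["Longitude"] < 0)]
print(northern_western)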
Sometimes, you may want to create a new column in a DataFrame based on the values of other columns.
For example, you want to convert coordinates from degrees to radians. You can do this by using the
np.radians function applied to the columns of the DataFrame:
df["Lat_Radians"] = np.radians(df["Latitude"])
df["Lon_Radians"] = np.radians(df["Longitude"])
df
To concatenate two columns in a DataFrame, you can use the + operator. This is a convenient way to
create a new column that combines the values of two existing columns. For example, you can create a
new column that combines the city and country names:
data = {
"City": ["Tokyo", "Los Angeles", "London", "Paris", "Chicago"],
"Country": ["Japan", "USA", "UK", "France", "USA"],
"Population": [37400068, 3970000, 9126366, 2140526, 2665000],
}
df = pd.DataFrame(data)
df["City_Country"] = df["City"] + ", " + df["Country"]  # Combined column (name assumed)
df
To group data by a column, you can use the groupby method. For example, you can group the data by
country and calculate the total population for each country:
df_grouped = df.groupby("Country")["Population"].sum()
print(f"Total Population by Country:\n{df_grouped}")
df1 = pd.DataFrame(
    {"City": ["Tokyo", "Los Angeles", "London"], "Country": ["Japan", "USA", "UK"]}
)
df2 = pd.DataFrame(
{
"City": ["Tokyo", "Los Angeles", "London"],
"Population": [37400068, 3970000, 9126366],
}
)
df1
df2
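The two DataFrames above share a City column, so they can be combined with pd.merge() (a sketch):
df_merged = pd.merge(df1, df2, on="City")
df_merged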
data_with_nan = {
"City": ["Tokyo", "Los Angeles", "London", "Paris"],
"Population": [37400068, 3970000, None, 2140526],
}
df_nan = pd.DataFrame(data_with_nan)
df_nan
You can fill missing values with the mean of the column. For example, you can fill the missing population
with the mean population of the cities:
df_filled = df_nan.fillna(df_nan["Population"].mean())
df_filled
Pandas can also read data directly from a file or URL. For example, we can load a CSV file of world cities:
url = "https://github.com/opengeos/datasets/releases/download/world/world_cities.csv"
df = pd.read_csv(url)
df.head()
The DataFrame contains information about world cities, including their names, countries, populations,
and geographical coordinates. We can calculate the total population of all cities in the dataset using
NumPy and Pandas as follows.
np.sum(df["population"])
To do a quick visual check of the data, you can use the plot method of the DataFrame object. The example below assumes an air_quality DataFrame with one column of measurements per city; calling plot() creates a line plot of the data (Figure 9).
air_quality.plot(figsize=(10, 5))
Figure 9: A line plot of air quality data in three cities.
To plot only the columns of the data table with the data from Paris (Figure 10):
air_quality["station_paris"].plot(figsize=(10, 5))
air_quality.plot.scatter(x="station_london", y="station_paris", alpha=0.5)
Figure 11: A scatter plot of air quality data in Paris and London.
To visualize each of the columns in a separate subplot, you can use the plot.area method. This will
create an area plot of the data (Figure 12):
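A sketch (assuming the air_quality DataFrame from the earlier plots):
air_quality.plot.area(figsize=(10, 5), subplots=True)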
Let’s say you have a dataset of cities, and you want to calculate the average distance from each city to all
other cities.
# Creating a DataFrame
data = {
"City": ["Tokyo", "Los Angeles", "London"],
"Latitude": [35.6895, 34.0522, 51.5074],
"Longitude": [139.6917, -118.2437, -0.1278],
}
df = pd.DataFrame(data)
NumPy Fundamentals:
• Vectorization allows you to perform operations on entire arrays without writing loops
• Broadcasting enables operations between arrays of different shapes
• Mathematical functions work element-wise on arrays, making complex calculations simple
• Array reshaping helps you reorganize data without copying it
Pandas Fundamentals:
• DataFrames provide a powerful structure for mixed-type tabular data
• Indexing and selection make it easy to extract subsets of your data
• Filtering allows you to select data based on conditions (geographic bounds, attribute values)
• Column operations enable you to create new calculated fields
• Data integration tools help you combine datasets from different sources
Why This Matters for Geospatial Work:
• Most geospatial Python libraries (GeoPandas, Rasterio, etc.) are built on NumPy and Pandas
• Real-world geospatial data is often messy and requires cleaning and transformation
• Efficient data processing becomes critical when working with large datasets
• These skills transfer directly to more advanced geospatial analysis techniques
14.8. Exercises
14.8.1. Exercise 1: NumPy Array Operations and Geospatial Coordinates
Practice fundamental NumPy operations with geospatial coordinate data.
Tasks:
1. Create coordinate arrays: Build a 2D NumPy array containing the latitude and longitude of these
cities:
• Tokyo: (35.6895, 139.6917)
• New York: (40.7128, −74.0060)
• London: (51.5074, −0.1278)
• Paris: (48.8566, 2.3522)
2. Unit conversion: Convert all coordinates from degrees to radians using np.radians()
3. Distance calculations: Calculate the coordinate differences between Tokyo and each other city (in
radians)
40 https://numpy.org/doc/stable
41 https://pandas.pydata.org/docs
4. Statistical analysis: Find the mean latitude and longitude, and identify which city is furthest from
the mean coordinates
Section III: Geospatial Programming with Python
Chapter 15. Introduction to Geospatial Python
15.1. Introduction
Python has emerged as the dominant language for geospatial analysis and visualization, offering a rich
ecosystem of libraries that handle everything from basic data manipulation to advanced cloud computing
and 3D mapping. This chapter introduces you to the comprehensive suite of geospatial tools covered in
this book, helping you understand how different libraries work together to create powerful analytical
workflows.
The geospatial Python ecosystem is built on a foundation of interoperable libraries, each serving specific
purposes while working seamlessly together. Understanding the relationships between these tools—and
knowing when to use each one—is crucial for building effective geospatial applications.
15.2.4. Specialized Analysis
WhiteboxTools provides over 550 geoprocessing tools with a focus on hydrological analysis, terrain
processing, and LiDAR data analysis, all optimized for performance.
Geemap connects Python to Google Earth Engine's planetary-scale computing platform, enabling analysis of petabytes of satellite imagery and geospatial datasets in the cloud.
DuckDB offers fast analytical database capabilities with spatial extensions, providing SQL-based analysis
of geospatial data with excellent performance characteristics.
Apache Sedona provides distributed computing capabilities for geospatial data processing, enabling
processing of large datasets across multiple nodes.
# Install uv
# macOS and Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
42 https://github.com/astral-sh/uv
# Create virtual environment
uv venv
# Activate environment
# macOS and Linux:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate
# Install pixi
# macOS and Linux:
curl -fsSL https://pixi.sh/install.sh | bash

# Windows:
iwr -useb https://pixi.sh/install.ps1 | iex
43 https://pixi.sh
15.5. Verification and First Steps
Once your environment is set up, verify that key libraries are working correctly:
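A minimal check might look like this (assuming leafmap is installed; the exact snippet may differ):
import geopandas as gpd
import leafmap

print(f"GeoPandas version: {gpd.__version__}")
m = leafmap.Map(center=(40, -100), zoom=4)
m  # Displays an interactive map in a Jupyter notebook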
If you see an interactive map with multiple basemap options, your geospatial Python environment is
ready for the more advanced topics covered in subsequent chapters.
7. 3D Visualization of Hyperspectral Data with HyperCoast - Visualize hyperspectral data in 3D
• Conferences: Attend SciPy, FOSS4G, or CNG conferences for networking and learning
15.10. Exercises
1. Environment Setup: Complete the installation process for your preferred method (uv, pixi, or conda)
and verify that all core libraries import correctly.
2. First Map: Create an interactive map centered on your location or area of interest. Experiment with
different basemaps and zoom levels.
3. Library Exploration: Choose one library from each category (data structures, visualization, specialized analysis) and read through their documentation to understand their primary use cases.
4. Integration Planning: Think about a geospatial project you’d like to work on. Based on the library
descriptions, identify which tools you would need and how they might work together in your workflow.
These exercises will prepare you for the more detailed exploration of each library in the following
chapters.
Chapter 16. Vector Data Analysis with GeoPandas
16.1. Introduction
Working with geospatial data in Python becomes remarkably intuitive with GeoPandas44, a powerful
library that brings the familiar pandas experience to geographic data analysis. Just as pandas revolutionized data science by making tabular data manipulation simple and intuitive, GeoPandas does the same for
spatial vector data—points, lines, and polygons that represent real-world geographic features.
Imagine you’re analyzing urban development patterns, studying wildlife migration routes, or mapping
disease outbreaks. These scenarios involve spatial relationships that traditional data analysis tools struggle
to handle effectively. GeoPandas bridges this gap by seamlessly integrating geospatial operations with
the pandas interface you already know, while leveraging the geometric capabilities of Shapely45 under
the hood.
What makes GeoPandas particularly powerful is its ability to handle both the geometric aspects of spatial
data (where features are located, their shapes and sizes) and their attribute data (what those features
represent—population, elevation, land use type, etc.). This dual nature allows you to perform complex
spatial analyses while maintaining the familiar DataFrame operations you use in pandas.
Key capabilities of GeoPandas:
• Unified data model: Store geometric shapes alongside their attributes in a single data structure
• Spatial operations: Calculate distances, intersections, buffers, and other geometric relationships
• Coordinate system management: Handle different map projections and coordinate reference
systems
• Multiple format support: Read and write industry-standard formats like Shapefile, GeoJSON,
GeoPackage, and more
• Visualization integration: Create maps using Matplotlib with simple, pandas-style plotting methods
44 https://geopandas.org
45 https://github.com/shapely/shapely
46 https://matplotlib.org
16.3.1. GeoDataFrame and GeoSeries
The GeoDataFrame is the central data structure in GeoPandas. Think of it as a pandas DataFrame with
superpowers—it contains all the familiar tabular data capabilities you expect, plus a special geometry
column that stores spatial shapes. This geometry column can contain points (representing locations like
cities or GPS coordinates), lines (representing features like rivers or roads), or polygons (representing
areas like countries or land parcels).
The GeoSeries handles the geometric data within a GeoDataFrame. While you’ll primarily work with
GeoDataFrames, understanding that GeoSeries manages the actual geometric operations helps clarify how
spatial methods work.
Import the necessary libraries. Notice how we import both pandas and GeoPandas—they work together
seamlessly:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
data = {
    "City": ["Tokyo", "New York City", "London", "Paris"],  # Names reconstructed from the coordinates
    "Latitude": [35.6895, 40.7128, 51.5074, 48.8566],
    "Longitude": [139.6917, -74.0060, -0.1278, 2.3522],
}
The gpd.points_from_xy() function is a convenience method that creates Point geometries from sep-
arate longitude (x) and latitude (y) columns. Notice that longitude comes first (x-coordinate), then latitude
(y-coordinate)—this follows the standard mathematical convention, though it might feel backwards since
we often think “latitude, longitude” when speaking.
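A minimal sketch of building the GeoDataFrame from the dictionary above (assigning EPSG:4326 is an assumption appropriate for plain latitude/longitude values):
gdf = gpd.GeoDataFrame(
    data,
    geometry=gpd.points_from_xy(data["Longitude"], data["Latitude"]),
    crs="EPSG:4326",
)
gdf.head()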
url = "https://github.com/opengeos/datasets/releases/download/vector/nybb.
geojson"
161
gdf = gpd.read_file(url)
gdf.head()
The read_file() function automatically detects the file format and coordinate system, loading both
the geometric shapes and their associated attributes. This GeoDataFrame contains several informative
columns:
• BoroCode : Numeric codes for each borough
• BoroName : Human-readable borough names
• Shape_Leng and Shape_Area : Geometric measurements
• geometry : The actual polygon shapes defining each borough boundary
output_file = "nyc_boroughs.geojson"
gdf.to_file(output_file, driver="GeoJSON")
print(f"GeoDataFrame has been written to {output_file}")
The driver parameter explicitly specifies the output format, though GeoPandas can often infer the
format from the file extension. This approach ensures your data is saved exactly as intended.
You can also export to other formats commonly used in GIS workflows:
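For example, the following sketch writes the same boroughs to a GeoPackage and an ESRI Shapefile (the file names are illustrative):
gdf.to_file("nyc_boroughs.gpkg", driver="GPKG")
gdf.to_file("nyc_boroughs.shp")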
162
16.7.1. Understanding Coordinate Systems
Think of coordinate reference systems as different ways to represent the curved Earth on flat maps or
in computer systems. Geographic coordinate systems use latitude and longitude (like GPS coordinates),
while projected coordinate systems convert these to flat, Cartesian coordinates measured in linear units
like meters or feet.
This dataset uses EPSG:2263 47 (NAD83 / New York Long Island State Plane), a projected coordinate
system designed specifically for the New York area. The measurements are in feet, which is typical for US
state plane coordinate systems.
EPSG48 codes are standardized identifiers for coordinate reference systems, maintained by the Interna-
tional Association of Oil & Gas Producers (formerly the European Petroleum Survey Group). These codes
provide a universal way to specify exactly which coordinate system is being used.
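The reprojection itself is a single to_crs() call; a minimal sketch converting the boroughs to WGS84 (EPSG:4326) looks like this:
gdf_wgs84 = gdf.to_crs(epsg=4326)
gdf_wgs84.head()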
Notice how the coordinate values in the geometry column have changed from large numbers (representing feet in the state plane system) to decimal degrees (representing latitude and longitude). This transformation preserves the shapes and spatial relationships while changing how coordinates are expressed.
47 https://epsg.io/2263
48 https://epsg.io
49 https://epsg.io/3857
164
These calculations reveal that Queens is the largest NYC borough by area, followed by Brooklyn and Staten
Island. This type of analysis is fundamental for urban planning, resource allocation, and demographic
studies.
Boundaries convert polygon areas to their outline as LineString geometries, useful for analyzing borders
or creating simplified representations. Centroids provide single-point representations of complex shapes,
valuable for distance calculations and spatial indexing.
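One way these columns can be derived, shown here as a sketch that assumes reprojecting to UTM zone 18N (EPSG:32618) so that areas come out in square meters and distances in meters; the column names match the code that follows:
gdf = gdf.to_crs(epsg=32618)
gdf["area_km2"] = gdf.geometry.area / 1e6
manhattan_centroid = gdf.loc[gdf["BoroName"] == "Manhattan"].centroid.iloc[0]
gdf["distance_to_manhattan_km"] = gdf.centroid.distance(manhattan_centroid) / 1000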
gdf[["distance_to_manhattan_km"]].sort_values("distance_to_manhattan_km")
165
16.8.5. Statistical Analysis of Spatial Data
Combining spatial measurements with statistical analysis provides deeper insights:
166
16.9.2. Thematic Mapping
Thematic maps use visual variables like color to represent data values across geographic areas. This is
one of the most effective ways to reveal spatial patterns:
fig, ax = plt.subplots(figsize=(10, 8))
gdf.plot(
    column="area_km2",
    ax=ax,
    legend=True,
    cmap="YlOrRd",  # Yellow-Orange-Red colormap
    edgecolor="black",
    linewidth=0.5,
)
plt.show()
Figure 13: A choropleth map showing New York boroughs colored by their area in square kilometers.
167
The resulting map (Figure 13) immediately reveals spatial patterns—Queens and Brooklyn dominate in
terms of area, while Manhattan appears surprisingly small despite its density and economic importance.
168
Figure 14: A multi-layer map showing NYC borough boundaries, centroids, and labels.
169
Figure 15: An interactive map of NYC borough boundaries with hover tooltips showing area and distance
information.
The explore() method creates interactive maps with tooltips, zoom capabilities, and base map options,
making it ideal for data exploration and presentation. Zooming in and out of the map (Figure 15) reveals
the borough boundaries, and hovering over a borough shows its area and distance to Manhattan. Clicking
on a borough opens a popup with detailed information.
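A sketch of such a call, using the columns computed earlier for the tooltip:
gdf.explore(column="area_km2", tooltip=["BoroName", "area_km2", "distance_to_manhattan_km"])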
Visualizing original shapes alongside their buffers reveals the expanded area of influence (Figure 16):
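A sketch of creating the 3-kilometer buffers (assuming the geometries are in a metric projected CRS, so the buffer distance is given in meters; the buffered column is the one used in the spatial-relationship examples later in the chapter):
gdf["buffered"] = gdf.geometry.buffer(3000)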
170
# Visualize original vs buffered geometries
fig, ax = plt.subplots(figsize=(10, 6))
171
print("Convex Hull Analysis:")
print(gdf[["area_km2", "convex_hull_area", "area_ratio"]].round(2))
Figure 16: A comparison map showing original NYC borough boundaries and their 3-kilometer buffer
zones.
Visualize the relationship between original shapes and their convex hulls (Figure 17):
172
facecolor="none",
edgecolor="red",
linewidth=2,
linestyle="--",
label="Convex Hull",
)
plt.title(
"NYC Boroughs: Original Shapes vs Convex Hulls", fontsize=16,
fontweight="bold"
)
plt.legend(loc="upper right")
plt.axis("off")
plt.tight_layout()
plt.show()
Figure 17: A comparison showing original NYC borough boundaries overlaid with their convex hulls
(dashed red lines).
The area ratio reveals shape complexity—values close to 1.0 indicate relatively simple, convex shapes,
while smaller ratios suggest more complex, irregular boundaries.
173
16.11. Spatial Relationships and Queries
Understanding spatial relationships between different geographic features is fundamental to geospatial
analysis. GeoPandas provides intuitive methods for examining how geometries relate to each other.
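The Manhattan geometry used below can be extracted first (a sketch; the variable name matches the code that follows):
manhattan_geom = gdf.loc[gdf["BoroName"] == "Manhattan", "geometry"].values[0]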
gdf["intersects_manhattan"] = gdf["buffered"].intersects(manhattan_geom)
gdf["touches_manhattan"] = gdf["geometry"].touches(manhattan_geom)
# Display results
intersection_results = gdf[["intersects_manhattan", "touches_manhattan"]]
intersection_results
This analysis reveals both direct adjacency (touches) and proximity relationships (buffer intersections)
between boroughs.
This type of validation is crucial for data quality assurance in geospatial analysis workflows.
174
16.12.1. Coordinate System Management
• Always check CRS when loading new datasets
• Use appropriate projections for your analysis type (geographic for global visualization, projected for
measurements)
• Document coordinate systems in your analysis workflow
16.14. Exercises
16.14.1. Exercise 1: Creating and Manipulating GeoDataFrames with
GeoPandas
This exercise focuses on creating and manipulating GeoDataFrames, performing spatial operations, and
visualizing the data.
175
1. Load the New York City building dataset from the GeoJSON file using GeoPandas: https://github.com/
opengeos/datasets/releases/download/places/nyc_buildings.geojson
2. Create a plot of the building footprints and color them based on the building height (use the
height_MS column).
3. Create an interactive map of the building footprints and color them based on the building height (use
the height_MS column).
4. Calculate the average building height (use the height_MS column).
5. Select buildings with a height greater than the average height.
6. Save the GeoDataFrame to a new GeoJSON file.
176
Chapter 17. Working with Raster Data Using Rasterio
17.1. Introduction
Rasterio50 is a Python library that allows you to read, write, and analyze geospatial raster data. Built
on top of GDAL51 (Geospatial Data Abstraction Library), it provides an efficient interface to work with
raster datasets, such as satellite images, digital elevation models (DEMs), and other gridded data. Rasterio
simplifies common geospatial tasks and helps to bridge the gap between raw geospatial data and analysis,
especially when combined with other Python libraries like numpy , pandas , and matplotlib .
50 https://rasterio.readthedocs.io
51 https://gdal.org
177
17.3. Installing Rasterio
Before working with Rasterio, you need to install the library. You can do this by running the following
command in your Python environment. Uncomment the line below if you’re working in a Jupyter
notebook or other interactive Python environment.
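A typical form of that command, with pygis as an assumed extra, is:
# %pip install rasterio pygis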
To get started, you’ll need to import rasterio along with a few other useful Python libraries. These libraries
will allow us to perform different types of geospatial operations, manipulate arrays, and visualize raster
data.
import rasterio
import rasterio.plot
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
• rasterio : The main library for reading and writing raster data.
• rasterio.plot : A submodule of Rasterio for plotting raster data.
• geopandas : A popular library for handling vector geospatial data.
• numpy : A powerful library for array manipulations, which is very useful for raster data.
• matplotlib : A standard plotting library in Python for creating visualizations.
raster_path = (
"https://github.com/opengeos/datasets/releases/download/raster/dem_90m.tif"
)
src = rasterio.open(raster_path)
print(src)
178
17.4.2. Understanding Raster Metadata
When working with raster data, it’s crucial to understand its metadata - the information that describes
the raster’s properties. Rasterio provides several ways to access this information:
2. Complete Metadata: The meta attribute provides a dictionary containing all the essential raster
properties:
print("Raster metadata:")
for key, value in src.meta.items():
print(f"{key}: {value}")
Raster metadata:
driver: GTiff
dtype: int16
nodata: None
width: 4269
height: 3113
count: 1
crs: EPSG:3857
transform: | 90.00, 0.00,-13442488.34|
| 0.00,-90.00, 4668371.58|
| 0.00, 0.00, 1.00|
2. Spatial Resolution: Resolution tells you the size of each pixel in real-world units (e.g., meters). It’s
crucial for:
• Understanding the level of detail in your data
• Determining the appropriate scale for analysis
• Comparing different raster datasets
3. Raster Dimensions: The width and height tell you the number of pixels in each dimension:
4. Geographic Extent: The bounds attribute provides the coordinates of the raster’s edges:
5. Number of Bands: Some rasters have multiple bands (like RGB images), while others have a single
band (like elevation data):
print("Affine transform:")
print(src.transform)
Affine transform:
| 90.00, 0.00,-13442488.34|
| 0.00,-90.00, 4668371.58|
| 0.00, 0.00, 1.00|
180
• a : width of a pixel in the x-direction
• b : row rotation (typically zero)
• c : x-coordinate of the upper-left corner
• d : column rotation (typically zero)
• e : height of a pixel in the y-direction (typically negative)
• f : y-coordinate of the upper-left corner
Understanding these parameters is essential for:
• Converting between pixel and geographic coordinates
• Performing accurate spatial operations
• Maintaining the spatial integrity of your data
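As a quick illustration of the first point, Rasterio can convert between pixel indices and map coordinates directly (a minimal sketch using the open DEM):
# Map coordinates of the center of the upper-left pixel
x, y = src.xy(0, 0)
print(f"Upper-left pixel center: ({x:.2f}, {y:.2f})")
# And back from map coordinates to row/column indices
row, col = src.index(x, y)
print(f"Row/column indices: ({row}, {col})")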
rasterio.plot.show(src)
181
17.5.2. Understanding Color Maps
Color maps (colormaps) are essential for effectively visualizing raster data. They map the pixel values to
colors, making it easier to interpret the data. Common colormaps include:
• 'terrain' : Good for elevation data, showing low to high values with intuitive colors
• 'viridis' : A perceptually uniform colormap, good for general use
• 'gray' or 'Greys' : Useful for visualizing single-band data
• 'RdYlBu' : Red-Yellow-Blue, good for showing positive and negative values
Here’s an example using the ‘terrain’ colormap for our DEM (see Figure 19):
Figure 19: Visualization of a Digital Elevation Model (DEM) using Rasterio with a colormap.
elev_band = src.read(1)
plt.figure(figsize=(8, 8))
plt.imshow(elev_band, cmap="terrain")
plt.colorbar(label="Elevation (meters)", shrink=0.5)
plt.title("DEM with Terrain Colormap")
plt.show()
raster_path = "https://github.com/opengeos/datasets/releases/download/raster/LC
09_039035_20240708_90m.tif"
src = rasterio.open(raster_path)
183
Figure 21: Visualization of a single band of a raster dataset using Rasterio.
nir_band = src.read(5)
red_band = src.read(4)
green_band = src.read(3)
184
Figure 22: Visualization of RGB bands of a raster dataset using Rasterio.
plt.tight_layout()
plt.show()
185
Figure 23: Visualization of a raster dataset using Rasterio with multiple panels.
186
17.5.5. Overlaying Vector Data
Often, you’ll want to overlay vector data (like boundaries or points) on top of your raster visualization.
This helps provide context and additional information. Below is an example of overlaying a vector dataset
on top of a raster dataset (see Figure 24):
187
17.6. Accessing and Manipulating Raster Bands
Raster datasets often consist of multiple bands, each capturing a different part of the electromagnetic
spectrum. For instance, satellite images may include separate bands for red, green, blue, and near-infrared
(NIR) wavelengths.
raster_path = "https://github.com/opengeos/datasets/releases/download/raster/LC
09_039035_20240708_90m.tif"
src = rasterio.open(raster_path)
nir_band = src.read(5)
red_band = src.read(4)
green_band = src.read(3)
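# A minimal sketch completing this snippet: stack the bands for a false-color composite and
# compute NDVI = (NIR - Red) / (NIR + Red), guarding against division by zero.
rgb = np.stack([nir_band, red_band, green_band])
nir = nir_band.astype(np.float32)
red = red_band.astype(np.float32)
ndvi = np.divide(nir - red, nir + red, out=np.zeros_like(nir), where=(nir + red) != 0)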
print(rgb.shape)
plt.figure(figsize=(8, 8))
plt.imshow(ndvi, cmap="RdYlGn", vmin=-1, vmax=1)
plt.colorbar(label="NDVI", shrink=0.75)
plt.title("NDVI")
plt.xlabel("Column #")
plt.ylabel("Row #")
plt.show()
188
Figure 25: Visualization of NDVI (Normalized Difference Vegetation Index) using Rasterio.
Then, we adjust the profile to fit the modified dataset (e.g., NDVI) since the NDVI dataset only has one
band:
189
profile.update(dtype=rasterio.float32, count=1, compress="lzw")
print(profile)
output_raster_path = "ndvi.tif"
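Writing the single-band NDVI array with the updated profile is then a short with-block (a sketch; it assumes profile was copied from the source raster and ndvi is the array computed earlier):
with rasterio.open(output_raster_path, "w", **profile) as dst:
    dst.write(ndvi.astype(rasterio.float32), 1)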
raster_path = "https://github.com/opengeos/datasets/releases/download/raster/LC
09_039035_20240708_90m.tif"
src = rasterio.open(raster_path)
data = src.read()
data.shape
Then, let’s clip a portion of the raster data using array indices (see Figure 26):
190
Figure 26: Clipping raster data using Rasterio.
Alternatively, we can use a specific geographic window to clip the data:
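A sketch of defining such a window from map coordinates; the bounds are illustrative values (in the raster's UTM coordinate system) around Las Vegas, and subset and new_transform are the names used in the write step below:
from rasterio.windows import from_bounds

window = from_bounds(640000, 3980000, 700000, 4040000, transform=src.transform)
subset = src.read(window=window)
new_transform = src.window_transform(window)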
After defining the window, we write the clipped data to a new file:
with rasterio.open(
"las_vegas.tif",
"w",
driver="GTiff",
height=subset.shape[1],
width=subset.shape[2],
count=subset.shape[0],
dtype=subset.dtype,
crs=src.crs,
transform=new_transform,
compress="lzw",
) as dst:
dst.write(subset)
import fiona
import rasterio.mask
geojson_path = "https://github.com/opengeos/datasets/releases/download/places/
las_vegas_bounds_utm.geojson"
bounds = gpd.read_file(geojson_path)
fig, ax = plt.subplots()
rasterio.plot.show(src, ax=ax)
bounds.plot(ax=ax, edgecolor="red", facecolor="none")
Next, apply the mask to extract only the area within the vector bounds:
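The masking call itself is a single line (a sketch; out_image and out_transform are the names used in the metadata update below):
out_image, out_transform = rasterio.mask.mask(src, bounds.geometry, crop=True)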
out_meta = src.meta
out_meta.update(
{
"driver": "GTiff",
"height": out_image.shape[1],
"width": out_image.shape[2],
"transform": out_transform,
}
)
17.10. Exercises
The following exercises will help you practice the fundamental concepts covered in this chapter. Each
exercise builds on the previous ones and focuses on essential raster data operations.
193
• Bands: 1
2. Multispectral Satellite Image
• URL: https://github.com/opengeos/datasets/releases/download/raster/cog.tif
• Type: Landsat imagery
• Resolution: 30 meters
• Bands: Multiple (including RGB and infrared)
194
17.10.5. Exercise 4: Writing and Saving Raster Data
This exercise focuses on saving processed raster data.
1. Using the DEM:
• Calculate the slope (you can use a simple gradient)
• Save the slope as a new GeoTIFF file
• Include appropriate metadata
2. Using the multispectral image:
• Create a simple vegetation index (e.g., (band 4 - band 3) / (band 4 + band 3))
• Save the result as a new GeoTIFF file
• Include appropriate metadata
195
Chapter 18. Multi-dimensional Data Analysis with Xarray
18.1. Introduction
Imagine you’re analyzing climate data that varies across three dimensions: latitude, longitude, and time.
With traditional tools, you might find yourself constantly keeping track of which array index corresponds
to which dimension, carefully managing metadata, and writing complex loops to perform seemingly
simple operations. This is where Xarray52 transforms your workflow from tedious to intuitive.
Xarray is a powerful Python library designed specifically for working with multi-dimensional labeled
datasets—the kind of data that’s ubiquitous in geospatial analysis, climate science, oceanography, and
remote sensing. Rather than working with anonymous arrays of numbers, Xarray lets you work with data
that knows about itself: its dimensions have meaningful names (like “time”, “latitude”, “longitude”), its
coordinates have real values (like dates and geographic locations), and it carries metadata that describes
what the data represents.
Why Xarray revolutionizes multi-dimensional data analysis:
Consider a typical climate dataset containing temperature measurements. With raw NumPy arrays,
selecting data for a specific date might require remembering that time is the first dimension and finding
the right index. With Xarray, you simply write temperature.sel(time='2023-06-15') and the library
handles the rest. This intuitive approach extends to all operations, making complex analyses readable and
maintainable.
Key advantages of Xarray:
• Self-describing data: Dimensions, coordinates, and metadata travel with your data
• Intuitive indexing: Select data using meaningful labels rather than numeric indices
• Broadcasting and alignment: Automatic handling of operations between arrays with different
dimensions
• Integration ecosystem: Seamless interoperability with pandas, NumPy, Matplotlib, and other scien-
tific Python tools
• Performance optimization: Built-in support for lazy evaluation and parallel computing through
Dask53
• Format flexibility: Native support for scientific data formats like NetCDF54 and Zarr55
55 https://zarr.dev
196
18.3. Understanding Xarray’s Data Model
Before diving into practical examples, it’s essential to understand Xarray’s data model (Figure 27), which
consists of two primary structures that work together to represent complex, multi-dimensional datasets.
Figure 27: The structure of an Xarray Dataset, with data variables (temperature, precipitation), coordinate variables (latitude, longitude), and their indexes.
197
%pip install xarray pooch pygis
Configuration explanations:
• keep_attrs=True : Preserves metadata attributes through operations
• display_expand_data=False : Shows a compact representation of large datasets
• NumPy settings: Prevents overwhelming output when displaying large arrays
This dataset represents a common structure in climate science: temperature measurements (in Kelvin)
recorded at weather stations across North America over time. The data is stored in NetCDF format, a self-
describing scientific data format that preserves all the metadata and coordinate information.
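Loading it typically looks like the following minimal sketch, using Xarray's bundled tutorial data:
import xarray as xr

ds = xr.tutorial.load_dataset("air_temperature")
ds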
Understanding the output: When you display a Dataset, Xarray shows you a wealth of information:
• Dimensions: The named axes and their sizes
• Coordinates: The actual coordinate values along each dimension
• Data variables: The measured quantities (in this case, air temperature)
56 https://tinyurl.com/xarray-tutorial-dataset
198
• Attributes: Global metadata describing the entire dataset
Data source and caching: The dataset is automatically downloaded from the internet and cached locally.
You can find the cache directory at:
• Linux: ~/.cache/xarray_tutorial_data
• macOS: ~/Library/Caches/xarray_tutorial_data
• Windows: ~/AppData/Local/xarray_tutorial_data
This caching ensures you don’t re-download data unnecessarily and enables offline work after the initial
download.
199
Figure 29: The data structure of Xarray’s DataArray.
Alternatively, you can use dot notation for cleaner syntax:
Both approaches yield the same result, but dot notation is often more readable in complex analyses.
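As a sketch, the two equivalent ways of pulling out the air temperature DataArray are:
temperature = ds["air"]  # bracket indexing
temperature = ds.air  # dot notation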
print("Dimensions:", temperature.dims)
print("Dimension sizes:", temperature.sizes)
200
# Explore the coordinate information
print("Coordinates:")
for name, coord in temperature.coords.items():
print(f" {name}: {coord.values[:3]}... (showing first 3 values)")
Coordinates:
lat: [75. 72.5 70. ]... (showing first 3 values)
lon: [200. 202.5 205. ]... (showing first 3 values)
time: ['2013-01-01T00:00:00.000000000' '2013-01-01T06:00:00.000000000'
'2013-01-01T12:00:00.000000000']... (showing first 3 values)
Attributes:
long_name: 4xDaily Air temperature at sigma level 995
units: degK
precision: 2
GRIB_id: 11
GRIB_name: TMP
var_desc: Air temperature
dataset: NMC Reanalysis
level_desc: Surface
statistic: Individual Obs
parent_stat: Other
actual_range: [185.16 322.1 ]
201
# Select data for a specific date and location
point_data = temperature.sel(time="2013-01-01", lat=40.0, lon=260.0)
point_data
This approach is intuitive and self-documenting. Anyone reading your code immediately understands
what data you’re selecting.
Slice notation: The slice() function creates a range selector, similar to Python’s standard slicing but
using coordinate values instead of indices.
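For example (a sketch; latitude is stored in descending order in this dataset, so the slice runs from the larger value to the smaller one):
january_subset = temperature.sel(time=slice("2013-01-01", "2013-01-31"), lat=slice(60, 30))
print(january_subset.sizes)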
# Select data nearest to a location that might not be exactly on the grid
nearest_data = temperature.sel(lat=40.5, lon=255.7, method="nearest")
actual_coords = nearest_data.sel(time="2013-01-01")
print(f"Requested: lat=40.5, lon=255.7")
print(f"Actual: lat={actual_coords.lat.values}, lon={actual_coords.lon.values}")
This feature is invaluable when working with irregularly spaced data or when your analysis points don’t
exactly match measurement locations.
202
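The time-averaged field used in the next few cells can be computed with a single reduction (a minimal sketch):
mean_temperature = temperature.mean(dim="time")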
print(f"Time-averaged data shape: {mean_temperature.shape}")
print(
f"Temperature range: {mean_temperature.min().values:.1f} to
{mean_temperature.max().values:.1f} K"
)
What happened: By specifying dim="time" , we computed the average across all time steps, leaving us
with a 2D array representing the long-term average temperature at each geographic location.
# Calculate temperature anomalies by subtracting the time mean from each time step
anomalies = temperature - mean_temperature
print(f"Anomaly range: {anomalies.min().values:.1f} to
{anomalies.max().values:.1f} K")
Broadcasting magic: Xarray automatically handles the broadcasting between the 3D temperature array
and the 2D mean temperature array, aligning coordinates correctly without any manual intervention.
203
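A sketch of how these dates can be found, reducing over the spatial dimensions and then locating the extremes along time (variable names match the print statements below):
spatial_mean = temperature.mean(dim=["lat", "lon"])
warmest_date = spatial_mean.idxmax()
coldest_date = spatial_mean.idxmin()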
print(f"Warmest period: {warmest_date.values}")
print(f"Coldest period: {coldest_date.values}")
204
Automatic features: Notice how Xarray automatically:
• Uses coordinate values for axis labels
• Adds appropriate axis titles
• Creates a colorbar with proper units
• Handles map projection basics
205
Figure 31: A custom plot of the mean temperature of the dataset.
206
18.10.1. Exploring Dataset Structure
Understanding your dataset’s structure is the first step in any analysis:
This approach ensures consistent processing across all variables while maintaining their coordinate
relationships.
207
print("- Latitude is dimension 1")
print("- Longitude is dimension 2")
# Same result with Xarray - much more readable and less error-prone
ds.air.isel(time=0).plot(figsize=(12, 6), cmap="RdYlBu_r")
plt.title("First Time Step (Xarray approach)")
plt.show()
Or even better, using meaningful date selection, such as “2013-01-01T00:00:00”. The result is a DataArray
with a single time step (Figure 33).
208
Figure 33: The temperature on January 1, 2013.
209
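A sketch of the selection described below, using where() on the time-averaged field:
warm_locations = mean_temperature.where(mean_temperature > 280)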
warm_count = warm_locations.count()
print(f"Number of grid points with mean temperature > 280 K:
{warm_count.values}")
# Find time periods when spatial average temperature was unusually high
temp_threshold = spatial_mean.quantile(0.9) # 90th percentile
warm_periods = spatial_mean.where(spatial_mean > temp_threshold, drop=True)
print(f"Number of exceptionally warm time periods: {len(warm_periods)}")
In the example above, we selected locations where the mean temperature was above 280 K and counted
the number of grid points that met this criterion. We also identified locations where the mean temperature
was unusually high. These kinds of operations are common in geospatial analysis, and Xarray makes them
straightforward.
The resulting seasonal_means is a DataArray with the seasonal mean temperature for each season (DJF,
MAM, JJA, SON). We can plot it using the plot method (see Figure 34):
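A minimal sketch of the computation and the faceted plot (the colormap is an arbitrary choice):
seasonal_means = ds.air.groupby("time.season").mean(dim="time")
seasonal_means.plot(col="season", col_wrap=2, cmap="RdYlBu_r")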
210
plt.tight_layout()
plt.show()
211
plt.tight_layout()
plt.show()
212
Figure 36: The weighted mean temperature of the dataset.
213
output_ds.to_netcdf("processed_air_temperature.nc")
print("Dataset saved to processed_air_temperature.nc")
214
18.16. Further Reading
Xarray provides a wealth of resources for learning more about the library and its capabilities. Here are
some of the most useful resources:
• Xarray documentation: https://docs.xarray.dev
• Xarray tutorial: https://tutorial.xarray.dev
• Xarray gallery: https://docs.xarray.dev/en/stable/gallery.html
18.17. Exercises
The following exercises will help you practice essential Xarray concepts using different datasets and
analysis scenarios.
215
1. Use groupby to calculate the seasonal mean temperature ( Tair ) across all years.
2. Use resample to calculate the monthly mean temperature for 1980.
3. Create visualizations showing both the seasonal patterns and the monthly progression for 1980.
216
Chapter 19. Raster Analysis with Rioxarray
19.1. Introduction
Imagine you’re working with satellite imagery that covers vast geographic areas, or analyzing elevation
data that represents terrain across an entire country. These datasets are fundamentally different from
the vector data we explored in previous chapters—instead of discrete points, lines, and polygons, we’re
working with continuous grids of values that cover space like a digital blanket. This is where rioxarray
57 becomes an invaluable tool for geospatial analysis.
Rioxarray bridges two powerful Python ecosystems: the multi-dimensional array capabilities of Xarray
and the geospatial data handling prowess of Rasterio58. Think of it as Xarray that has learned to speak the
language of geospatial data—it understands coordinate reference systems, spatial transformations, and all
the complexities that come with georeferenced raster data.
Key advantages of rioxarray:
• Geospatial awareness: Full integration of coordinate reference systems, spatial transformations, and
georeferencing information
• Xarray integration: All the power of labeled, multi-dimensional arrays with automatic alignment and
broadcasting
• Format flexibility: Native support for common geospatial formats like GeoTIFF, NetCDF, and more
• Efficient operations: Optimized reading and writing of large raster datasets
• Spatial operations: Built-in methods for reprojection, clipping, resampling, and masking
• Metadata preservation: Automatic handling of spatial metadata through all operations
57 https://corteva.github.io/rioxarray
58 https://rasterio.readthedocs.io
217
# %pip install rioxarray pygis
import rioxarray
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
218
# Load a Landsat satellite image covering Las Vegas area
url = "https://github.com/opengeos/datasets/releases/download/raster/LC09_039035_
20240708_90m.tif"
data = rioxarray.open_rasterio(url)
print(f"Successfully loaded raster with shape: {data.shape}")
data
Coordinates:
band: 1 to 7
x: 582435.0 to 805995.0
y: 3874995.0 to 4105575.0
spatial_ref: 0 to 0
219
Key attributes:
AREA_OR_POINT: Area
long_name: ('SR_B1', 'SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7')
Affine transformation:
| 90.00, 0.00, 582390.00|
| 0.00,-90.00, 4105620.00|
| 0.00, 0.00, 1.00|
Pixel size: 90.0 x 90.0 meters
220
# Example: Set CRS if missing or incorrect (only run if needed)
# data = data.rio.write_crs("EPSG:32611", inplace=True)
print(f"Current CRS is properly set: {data.rio.crs}")
This step is crucial because many spatial operations require known coordinate systems.
221
19.5.2. Spatial Subsetting with Bounding Boxes
Often you need to focus on a specific geographic area. Bounding box clipping is the most straightforward
approach:
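A minimal sketch using rio.clip_box(); the coordinates here are illustrative UTM values around central Las Vegas, and clipped_data is the name used later in the chapter:
clipped_data = data.rio.clip_box(minx=640000, miny=3980000, maxx=700000, maxy=4040000)
print(f"Clipped shape: {clipped_data.shape}")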
222
19.6.1. Understanding Spatial Resolution
Spatial resolution determines the level of detail in your raster data:
Spatial extent:
Width: 223.6 km
Height: 230.6 km
223
resampled_pixels = np.prod(resampled_data.shape)
reduction_factor = original_pixels / resampled_pixels
print(f"Data reduction factor: {reduction_factor:.1f}x")
224
# Create a true-color (RGB) composite using bands 4, 3, 2 (Red, Green, Blue)
fig, ax = plt.subplots(figsize=(6, 6))
Figure 37: A true-color composite of the Landsat 9 image of the Las Vegas area.
225
19.7.2. Visualizing Individual Bands
Sometimes you need to examine individual spectral bands. For example, the near-infrared band (band 5)
is useful for vegetation analysis (Figure 38).
226
Figure 38: A near-infrared band of the Landsat 9 image of the Las Vegas area.
227
Figure 39: A Landsat 9 image of the Las Vegas area with a vector boundary overlay.
228
saved_data = rioxarray.open_rasterio(output_filename)
print(f"Verified: Saved file has shape {saved_data.shape} and CRS
{saved_data.rio.crs}")
If you want to save the data as a Cloud Optimized GeoTIFF (COG), you can use the rio.to_raster()
method with the driver="COG" option. This will create a COG file that is optimized for cloud storage
and can be accessed efficiently.
output_filename = "las_vegas_landsat_processed_cog.tif"
clipped_data.rio.to_raster(output_filename, driver="COG")
Sometimes you need to mask out certain values in your raster data. For example, you might want to mask out extreme pixel values that are likely to be invalid measurements. First, use the where() method to replace the values you don't want, then register the NoData value with the rio.set_nodata() method.
229
# Example: Mask out extreme values as NoData
# This is useful for removing outliers or invalid measurements
masked_data = data.where(data < 1.0, -9999) # Mask very high reflectance values
masked_data = masked_data.rio.set_nodata(-9999)
print(f"Applied masking: {np.sum(masked_data == -9999).values} pixels set to
NoData")
230
f" Y: {mercator_data.y.min().values:.0f} to
{mercator_data.y.max().values:.0f}"
)
231
plt.tight_layout()
plt.show()
232
The normalization (dividing by the sum) ensures that NDVI values always fall between −1 and +1, making
it easy to compare across different images and time periods. This normalization also helps reduce the
effects of atmospheric conditions and sun angle variations.
Why NDVI is so useful:
• Simple and robust: Easy to calculate and interpret across different sensors and time periods
• Standardized scale: Values between −1 and +1 make comparison straightforward
• Physically meaningful: Directly related to photosynthetic activity and biomass
• Widely validated: Decades of research have established its reliability for vegetation monitoring
The example below shows how to compute NDVI from Landsat 9 data using rioxarray:
# Calculate NDVI using the standard formula: (NIR - Red) / (NIR + Red)
print("Calculating NDVI...")
# Clip values to valid NDVI range and handle any invalid calculations
ndvi = ndvi.clip(min=-1, max=1)
ndvi = ndvi.where(np.isfinite(ndvi)) # Remove infinite values
print(f"NDVI statistics:")
print(f" Range: {ndvi.min().values:.3f} to {ndvi.max().values:.3f}")
print(f" Mean: {ndvi.mean().values:.3f}")
233
vmin=-0.5,
vmax=0.8,
add_colorbar=True,
cbar_kwargs={"label": "NDVI Value"},
)
ax1.set_title("Raw NDVI Values", fontsize=14, fontweight="bold")
ax1.set_xlabel("Longitude (°)")
ax1.set_ylabel("Latitude (°)")
plt.tight_layout()
plt.show()
Figure 41: A visualization of the NDVI of the Landsat 9 image of the Las Vegas area.
234
The code counts the number of pixels falling into each NDVI category, then prints the results along with their percentage of the total. This provides a quick overview of vegetation presence and other surface types in the analyzed area.
NDVI interpretation:
• NDVI > 0.4: Dense vegetation (forests, healthy crops)
• 0.2 < NDVI < 0.4: Moderate vegetation (grasslands, sparse vegetation)
• 0 < NDVI < 0.2: Bare soil, urban areas, sparse vegetation
• NDVI < 0: Water, snow, clouds
235
19.12. Exercises
The following exercises will help you practice essential rioxarray concepts using a different dataset and
various analysis scenarios.
236
2. Examine the vector data and ensure it has a proper coordinate reference system.
3. Use the GeoJSON polygon to mask the Libya raster dataset, keeping only the data within the polygon
boundaries.
4. Create a visualization showing the masked raster data with the vector boundary overlay.
237
Chapter 20. Interactive Visualization with Leafmap
20.1. Introduction
In the modern era of geospatial analysis, interactive web-based maps have become essential tools for
exploring, analyzing, and communicating spatial information. Traditional static maps, while useful for
certain purposes, lack the dynamic capabilities needed for interactive data exploration, real-time analysis,
and engaging presentations. This is where Leafmap59 comes in—a powerful, open-source Python package
that bridges the gap between complex geospatial data and intuitive, interactive visualization.
What is Leafmap?
Leafmap is fundamentally designed to make geospatial visualization accessible to everyone, from begin-
ners learning GIS programming to experienced geospatial professionals working with complex datasets.
Think of it as a Swiss Army knife for interactive mapping in Python—it provides a unified, simplified
interface that works with multiple mapping libraries behind the scenes, allowing you to focus on your
data and analysis rather than wrestling with complex API syntax.
The Challenge Leafmap Solves
Before Leafmap, creating interactive maps in Python often required deep knowledge of various mapping
libraries, each with their own syntax, capabilities, and limitations. You might need to use folium60 for web-
based maps, ipyleaflet61 for Jupyter integration, bokeh62 for dashboard applications, plotly63 for statistical
visualizations, and maplibre64 for 3D visualization. Each library serves its purpose, but learning and
switching between them creates friction in your workflow.
Leafmap elegantly solves this problem by building on top of these established libraries—including
maplibre, folium, ipyleaflet, bokeh, plotly, and pydeck—and extending their functionalities with a unified,
intuitive API. This means you can leverage the strengths of multiple mapping ecosystems through a single,
consistent interface.
Why Interactive Maps Matter in Geospatial Analysis
Interactive maps are powerful analytical tools that enable exploratory data analysis, multi-scale visual-
ization, and real-time feedback. Unlike static maps, they allow you to zoom, pan, and interact with your
data to discover patterns that might otherwise remain hidden. You can seamlessly navigate from global
overviews to detailed local insights, turn different data layers on and off to understand relationships, and
immediately see the results of data processing operations. This interactivity also enhances communication
by allowing stakeholders to explore the data themselves through engaging presentations.
This chapter will guide you through Leafmap’s essential features, focusing on the fundamental concepts
and techniques you need to create effective interactive geospatial visualizations. We’ll start with the basics
of map creation and customization, then progress through data visualization techniques for both vector
and raster data. Each section builds upon previous concepts while providing practical examples that you
can immediately apply to your own geospatial projects.
59 https://leafmap.org
60 https://python-visualization.github.io/folium
61 https://ipyleaflet.readthedocs.io
62 https://docs.bokeh.org/en/latest/docs/user_guide/topics/geo.html
63 https://plotly.com/python/maps
64 https://eoda-dev.github.io/py-maplibregl
238
20.2. Learning Objectives
By the end of this chapter, you will be able to:
• Create and customize interactive maps using Leafmap’s intuitive interface, including setting map
properties like center location, zoom level, and map controls
• Add and configure different basemap layers to provide appropriate context for your geospatial data,
including standard web map services and custom tile layers
• Visualize and style vector geospatial data effectively, including points, lines, and polygons from
various formats like GeoJSON, Shapefile, and GeoParquet
• Display and analyze raster data including satellite imagery, digital elevation models, and other
gridded datasets using modern cloud-optimized formats
• Apply fundamental interactive mapping techniques such as layer management, legends, and basic
map controls that enhance user experience and data interpretation
Once installed, you can import Leafmap into your Python environment:
import leafmap
239
To switch backends when needed, you can import the specific backend module:
m = leafmap.Map()
m
When you run this code, you’ll see an interactive map centered on a default location with OpenStreetMap
as the base layer. You can immediately pan by clicking and dragging, zoom using the mouse wheel or
zoom controls, switch to fullscreen using the fullscreen button, and access the toolbar for additional
functionality. Note that the fullscreen button will not work in VS Code. Try JupyterLab instead.
Understanding the Map Object
The variable m now contains a map object that represents your interactive map. This object is your
interface for adding data, changing settings, and customizing the map’s appearance and behavior. Think
of it as a canvas that you’ll progressively build upon throughout your analysis.
240
20.4.3. Managing Map Controls
Interactive maps include various controls that help users navigate and interact with the data. Understand-
ing these controls and when to show or hide them is essential for creating effective user experiences.
241
Figure 42: The leafmap search control for searching places on the map.
242
20.4.4.3. Clearing Map Content
Sometimes you need to start fresh or create minimal maps for specific purposes:
m = leafmap.Map()
m.add_basemap("OpenTopoMap")
m
243
Figure 43: The OpenTopoMap basemap.
Popular Basemap Options
Common basemaps include "OpenStreetMap" for standard street maps with good detail,
"OpenTopoMap" for topographic features and elevation contours, "Esri.WorldImagery" for high-
resolution satellite imagery, and "CartoDB.Positron" for clean minimalist design.
m = leafmap.Map()
m.add_basemap_gui()
m
244
m = leafmap.Map()
m.add_tile_layer(
url="https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}",
name="Google Satellite",
attribution="Google",
)
m
65 https://apps.nationalmap.gov/services
245
20.5.6. Adding Legends for Data Context
Legends are essential components that help users understand what your map symbols and colors repre-
sent. Without proper legends, even the most sophisticated geospatial visualization can be confusing and
ineffective. Leafmap provides both built-in legends for common datasets and the ability to create custom
legends.
Here’s an example of adding a legend for the National Land Cover Database (NLCD), which classifies land
use across the United States (see Figure 44):
246
20.5.7. Adding Colorbars for Continuous Data
While legends work well for categorical data (like land cover types), continuous data such as elevation,
temperature, or precipitation requires different visualization approaches. Colorbars provide a continuous
scale that shows how colors map to numerical values.
Here’s how to add a colorbar for topographic data (see Figure 45):
m = leafmap.Map()
m.add_basemap("OpenTopoMap")
m.add_colormap(
cmap="terrain",
label="Elevation (meters)",
orientation="horizontal",
vmin=0,
vmax=4000,
)
m
66 https://matplotlib.org/stable/users/explain/colors/colormaps.html
247
• Population/Density: "YlOrRd" , "Reds"
To show the list of all colormaps, you can use the following code:
import leafmap.colormaps as cm
cm.plot_colormaps(width=8, height=0.3)
m = leafmap.Map()
location = [40, -100] # [latitude, longitude]
m.add_marker(location, draggable=True)
m
m = leafmap.Map()
# Coordinates for three different cities
locations = [
[40, -100], # Central US
[45, -110], # Northwestern US
[50, -120], # Southwestern Canada
]
m.add_markers(markers=locations)
m
m = leafmap.Map()
url = "https://github.com/opengeos/datasets/releases/download/world/world_cities.
csv"
m.add_marker_cluster(url, x="longitude", y="latitude", layer_name="World cities")
m
249
m = leafmap.Map(center=[40, -100], zoom=4)
cities = "https://github.com/opengeos/datasets/releases/download/us/cities.csv"
regions = "https://github.com/opengeos/datasets/releases/download/us/us_regions.
geojson"
m.add_geojson(regions, layer_name="US Regions")
m.add_points_from_xy(
cities,
x="longitude",
y="latitude",
color_column="region",
icon_names=["gear", "map", "leaf", "globe"],
spin=True,
add_legend=True,
)
m
250
m.add_vector(data, layer_name="Cable lines", info_mode="on_hover")
m
m = leafmap.Map()
url = "https://github.com/opengeos/datasets/releases/download/places/nyc_
251
buildings.geojson"
m.add_vector(url, layer_name="NYC Buildings", zoom_to_layer=True)
m
url = "https://github.com/opengeos/datasets/releases/download/places/las_vegas_
buildings.geojson"
gdf = leafmap.read_vector(url)
gdf.head()
You can use the GeoDataFrame.explore() method to visualize the data interactively, which utilizes the
folium library. The example below displays the building footprints interactively:
gdf.explore()
To display the GeoDataFrame using Leaflet, use the add_gdf() method. The following code adds the
building footprints to the map (see Figure 49):
m = leafmap.Map()
m.add_basemap("Esri.WorldImagery")
style = {"color": "red", "fillColor": "red", "fillOpacity": 0.1, "weight": 2}
m.add_gdf(gdf, style=style, layer_name="Las Vegas Buildings", zoom_to_layer=True)
m
252
Figure 49: Visualizing buildings with Leafmap.
m = leafmap.Map()
data = "https://raw.githubusercontent.com/opengeos/leafmap/master/docs/data/
countries.geojson"
m.add_data(
data, column="POP_EST", scheme="Quantiles", cmap="Blues",
legend_title="Population"
)
m
253
Figure 50: A choropleth map of population data using the “Quantiles” classification scheme.
You can also use different classification schemes like “EqualInterval” for different data visualizations:
m = leafmap.Map()
m.add_data(
data,
column="POP_EST",
scheme="EqualInterval",
cmap="Blues",
legend_title="Population",
)
m
67 https://geoparquet.org
254
url = "https://opengeos.org/data/duckdb/cities.parquet"
gdf = leafmap.read_vector(url)
gdf.head()
You can use GeoDataFrame.explore() to visualize the data interactively (see Figure 51).
gdf.explore()
m = leafmap.Map()
m.add_points_from_xy(gdf, x="longitude", y="latitude")
m
url = "https://data.source.coop/giswqs/nwi/wetlands/DC_Wetlands.parquet"
gdf = leafmap.read_vector(url)
gdf.head()
You can visualize the polygon data directly using explore() with folium:
255
gdf.explore()
For visualization with Leafmap, use the following code to add the polygons to the map with a satellite
basemap (see Figure 52).
m = leafmap.Map()
m.add_basemap("Esri.WorldImagery", show=False)
m.add_nwi(gdf, col_name="WETLAND_TYPE", zoom_to_layer=True)
m
Figure 52: Visualizing the National Wetlands Inventory (NWI) with leafmap.
url = "https://opengeos.org/data/pmtiles/protomaps_firenze.pmtiles"
metadata = leafmap.pmtiles_metadata(url)
68 https://github.com/protomaps/PMTiles
256
print(f"layer names: {metadata['layer_names']}")
print(f"bounds: {metadata['bounds']}")
m = leafmap.Map()
style = {
"version": 8,
"sources": {
"example_source": {
"type": "vector",
"url": "pmtiles://" + url,
"attribution": "PMTiles",
}
},
"layers": [
{
"id": "buildings",
"source": "example_source",
"source-layer": "landuse",
"type": "fill",
"paint": {"fill-color": "steelblue"},
},
{
"id": "roads",
"source": "example_source",
"source-layer": "roads",
"type": "line",
"paint": {"line-color": "black"},
},
],
}
257
You can also visualize large datasets like the Google-Microsoft Open Buildings data69 hosted on Source
Cooperative. First, check the PMTiles metadata for the building footprints layer:
url = "https://data.source.coop/vida/google-microsoft-open-buildings/pmtiles/go_
ms_building_footprints.pmtiles"
metadata = leafmap.pmtiles_metadata(url)
print(f"layer names: {metadata['layer_names']}")
print(f"bounds: {metadata['bounds']}")
Now, visualize the building footprints using a custom style for the fill color and opacity:
style = {
"version": 8,
"sources": {
"example_source": {
"type": "vector",
"url": "pmtiles://" + url,
"attribution": "PMTiles",
}
},
"layers": [
{
"id": "buildings",
"source": "example_source",
"source-layer": "building_footprints",
"type": "fill",
"paint": {"fill-color": "#3388ff", "fill-opacity": 0.5},
},
],
}
m.add_pmtiles(
url, name="Buildings", style=style, overlay=True, show=True,
zoom_to_layer=False
)
m
69 https://source.coop/repositories/vida/google-microsoft-open-buildings/description
258
20.9.4. Visualizing Overture Maps Data
Overture Maps Foundation70 provides open-source, high-quality basemaps for web mapping applications.
You can visualize Overture Maps data using PMTiles archives. The following example demonstrates how
to visualize the building footprints layer from the Overture Maps. For more information, visit the Overture
Maps documentation71.
The Overture Maps data are being updated regularly. You can get the latest release of the Overture Maps
data using the following code:
release = leafmap.get_overture_latest_release()
release
At the time of writing, the latest release of the Overture Maps data is 2025-05-21. You can check the
historical releases of the Overture Maps data by visiting https://labs.overturemaps.org/data/releases.json.
Next, we can visualize the building footprints layer from the Overture Maps (see Figure 53).
theme = "buildings"
url = f"https://overturemaps-tiles-us-west-2-beta.s3.amazonaws.com/{release}/
{theme}.pmtiles"
style = {
"version": 8,
"sources": {
"example_source": {
"type": "vector",
"url": "pmtiles://" + url,
"attribution": "PMTiles",
}
},
"layers": [
{
"id": "Building",
"source": "example_source",
"source-layer": "building",
"type": "fill",
"paint": {
"fill-color": "#ffff00",
"fill-opacity": 0.7,
"fill-outline-color": "#ff0000",
},
},
],
}
70 https://overturemaps.org
71 https://docs.overturemaps.org
259
m = leafmap.Map(center=[47.65350739, -117.59664999], zoom=16)
m.add_basemap("Satellite")
m.add_pmtiles(url, style=style, layer_name="Buildings", zoom_to_layer=False)
m
Figure 53: Visualizing the building footprints layer from the Overture Maps.
72 https://cogeo.org
260
m.add_cog_layer(url, name="Fire (pre-event)")
m
Under the hood, the add_cog_layer() method uses the TiTiler73 endpoint to serve COGs as map
tiles. The method also supports custom styling and visualization parameters. Please refer to the TiTiler
documentation74 for more information.
Please note that the add_cog_layer() method requires an internet connection to fetch the COG tiles
from the TiTiler endpoint. If you need to work offline, you can download the COG and load it as a local
GeoTIFF file using the Map.add_raster() method covered in the next section.
To show the image metadata, use the cog_info() function:
leafmap.cog_info(url)
leafmap.cog_bands(url)
stats = leafmap.cog_stats(url)
# stats
url2 = "https://opendata.digitalglobe.com/events/california-fire-2020/post-event/
2020-08-14/pine-gulch-fire20/10300100AAC8DD00.tif"
m.add_cog_layer(url2, name="Fire (post-event)")
m
73 https://developmentseed.org/titiler
74 https://developmentseed.org/titiler/endpoints/cog/#api
261
)
m
Figure 54: Visualizing the pre-event and post-event imagery of the 2020 California fire with leafmap.
Here is another example comparing two COGs using a split map (Figure 54).
75 https://www.mrlc.gov/data/nlcd-2021-land-cover-conus
262
url = "https://github.com/opengeos/datasets/releases/download/raster/nlcd_2021_
land_cover_30m.tif"
colormap = {
"11": "#466b9f",
"12": "#d1def8",
"21": "#dec5c5",
"22": "#d99282",
"23": "#eb0000",
"24": "#ab0000",
"31": "#b3ac9f",
"41": "#68ab5f",
"42": "#1c5f2c",
"43": "#b5c58f",
"51": "#af963c",
"52": "#ccb879",
"71": "#dfdfc2",
"72": "#d1d182",
"73": "#a3cc51",
"74": "#82ba9e",
"81": "#dcd939",
"82": "#ab6c28",
"90": "#b8d9eb",
"95": "#6c9fb8",
}
m = leafmap.Map(center=[40, -100], zoom=4, height="650px")
m.add_basemap("Esri.WorldImagery")
m.add_cog_layer(url, colormap=colormap, name="NLCD Land Cover", nodata=0)
m.add_legend(title="NLCD Land Cover Type", builtin_legend="NLCD")
m.add_layer_manager()
m
263
Figure 55: Visualizing the US National Land Cover Database (NLCD) data with a custom colormap.
dem_url = "https://github.com/opengeos/datasets/releases/download/raster/dem_90m.
tif"
filename = "dem_90m.tif"
leafmap.download_file(dem_url, filename, quiet=True)
Next, visualize the raster using the Map.add_raster() method with a “terrain” colormap to highlight
elevation differences (Figure 56):
m = leafmap.Map()
m.add_raster(filename, colormap="terrain", layer_name="DEM")
m
264
Figure 56: Visualizing the DEM data with a terrain colormap.
You can also check the minimum and maximum values of the raster:
leafmap.image_min_max(filename)
Keep in mind that the add_raster() method works for both local and remote GeoTIFF files. You can use
it to visualize COGs available online or local GeoTIFF files stored on your computer. The example below
demonstrates how to visualize the same COG from the previous section as a remote file:
import leafmap
landsat_url = "https://github.com/opengeos/datasets/releases/download/raster/cog.
tif"
filename = "cog.tif"
leafmap.download_file(landsat_url, filename, quiet=True)
265
m = leafmap.Map()
m.add_raster(filename, indexes=[4, 3, 2], layer_name="RGB")
m
76 https://stacspec.org
77 https://stacindex.org/catalogs/spot-orthoimages-canada-2005
266
url = "https://canada-spot-ortho.s3.amazonaws.com/canada_spot_orthoimages/canada_
spot5_orthoimages/S5_2007/S5_11055_6057_20070622/S5_11055_6057_20070622.json"
leafmap.stac_bands(url)
267
More information about Maxar Open Data is available at https://registry.opendata.aws/maxar-open-data78.
leafmap.maxar_collections()
This function retrieves all available collections from the Maxar Open Data STAC (SpatioTemporal Asset
Catalog) catalog. The output will show you a list of disaster events, each with a unique collection ID that
we can use to access the specific imagery data.
collection = "Kahramanmaras-turkey-earthquake-23"
url = leafmap.maxar_collection_url(collection, dtype="geojson")
url
This function generates a URL pointing to the GeoJSON file containing the spatial footprints and metadata
for all satellite images related to the Turkey earthquake event. The GeoJSON format is particularly useful
because it includes both the geographic boundaries of each image and associated metadata.
Now let’s load this data and explore what’s available:
gdf = leafmap.read_vector(url)
print(f"Total number of images: {len(gdf)}")
gdf.head()
Here we’re using GeoPandas to read the remote GeoJSON file directly from the URL. The gdf.head()
command shows us the first few rows of the dataset, revealing important metadata columns such as:
• geometry: The spatial footprint of each satellite image
• catalog_id: A unique identifier for grouping related images
• quadkey: Microsoft’s quadkey system for tile identification
• timestamp: When the image was captured
• visual: Direct download link to the satellite image
78 https://registry.opendata.aws/maxar-open-data
268
m = leafmap.Map()
m.add_gdf(gdf, layer_name="Footprints", zoom_to_layer=True)
m
This creates an interactive map centered on the earthquake region, with blue polygons showing the spatial
coverage of each satellite image. You can zoom in, pan around, and click on individual footprints to see
their metadata. This visualization helps us understand the geographic extent of the available imagery and
identify areas with dense coverage.
The leafmap.maxar_search() function allows us to filter the imagery collection by date. By setting
end_date="2023-02-06" , we retrieve only images captured before the earthquake occurred. These pre-
event images serve as our baseline for comparison.
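A sketch of that query, assuming leafmap's maxar_search() helper with a date filter:
pre_gdf = leafmap.maxar_search(collection, end_date="2023-02-06")
print(f"Number of pre-event images: {len(pre_gdf)}")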
Now let’s get all images captured after the earthquake:
By setting start_date="2023-02-06" , we retrieve all images captured from the earthquake date on-
wards. These post-event images will show the immediate aftermath and damage caused by the earthquake.
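And the corresponding post-event query (same assumption about the helper):
post_gdf = leafmap.maxar_search(collection, start_date="2023-02-06")
print(f"Number of post-event images: {len(post_gdf)}")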
m = leafmap.Map()
pre_style = {"color": "red", "fillColor": "red", "opacity": 1, "fillOpacity":
0.5}
m.add_gdf(
pre_gdf,
layer_name="Pre-event",
style=pre_style,
info_mode="on_click",
zoom_to_layer=True,
)
m.add_gdf(post_gdf, layer_name="Post-event", info_mode="on_click",
zoom_to_layer=True)
m
Figure 59: The interactive map shows the spatial coverage of pre-event (red) and post-event (blue) satellite
images for the Turkey earthquake.
In this visualization, red polygons represent pre-earthquake imagery while blue polygons show post-
earthquake coverage. The info_mode="on_click" parameter enables interactive information popups
when you click on any footprint. You can toggle layers on/off using the layer control panel, and the
different colors help distinguish the temporal coverage patterns.
bbox = m.user_roi_bounds()
if bbox is None:
bbox = [36.8715, 37.5497, 36.9814, 37.6019]
The m.user_roi_bounds() function attempts to get the bounding box coordinates from any region
you’ve drawn on the map. If no region is drawn, we fall back to predefined coordinates that cover a
significantly impacted area near Kahramanmaraş. The bbox format follows [min_longitude, min_latitude,
max_longitude, max_latitude] convention.
This search combines spatial filtering (using the bounding box) with temporal filtering (images before
February 6, 2023). The result shows us only images that both intersect our geographic area of interest and
were captured before the earthquake.
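A sketch of the combined query, assuming maxar_search() also accepts a bbox argument:
pre_event = leafmap.maxar_search(collection, bbox=bbox, end_date="2023-02-06")
pre_event.head()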
Similarly, let’s find post-earthquake images within the same area:
This gives us the corresponding post-earthquake imagery for the same geographic region, enabling direct
before-and-after comparison.
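The matching post-event query (same assumptions):
post_event = leafmap.maxar_search(collection, bbox=bbox, start_date="2023-02-06")
post_event.head()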
pre_tile = pre_event["catalog_id"].values[0]
pre_tile
This gets the catalog ID for the first pre-event tile in our search results. The catalog ID serves as a unique
identifier for a collection of related satellite images covering the same geographic area.
post_tile = post_event["catalog_id"].values[0]
post_tile
Similarly, this extracts the catalog ID for the post-event tile. Having both pre- and post-event tile IDs
allows us to create comparative visualizations.
271
pre_stac = leafmap.maxar_tile_url(collection, pre_tile, dtype="json")
pre_stac
This generates a MosaicJSON URL for the pre-event tile. MosaicJSON is a specification that allows multiple
raster files to be presented as a single web-optimized layer, enabling efficient visualization of large satellite
imagery datasets.
Similarly, this creates the MosaicJSON URL for the post-event imagery. These URLs point to tile services
that can stream the imagery directly to our interactive map.
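Mirroring the pre-event call shown above:
post_stac = leafmap.maxar_tile_url(collection, post_tile, dtype="json")
post_stac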
m = leafmap.Map()
m.split_map(
left_layer=pre_stac,
right_layer=post_stac,
left_label="Pre-event",
right_label="Post-event",
)
m.set_center(36.9265, 37.5762, 16)
m
272
Figure 60: The split map shows the pre-event (left) and post-event (right) satellite images for the Turkey
earthquake.
This creates an interactive split-screen map where you can:
• Left side: View the area before the earthquake
• Right side: View the same area after the earthquake
• Divider: Drag the vertical divider left or right to compare different parts of the scene
• Synchronization: Both sides pan and zoom together, maintaining perfect geographic alignment
pre_images = pre_event["visual"].tolist()
post_images = post_event["visual"].tolist()
This extracts the direct download URLs from our filtered datasets. The “visual” column contains links to
the processed, analysis-ready satellite images in RGB format that are optimized for visual interpretation.
Now let’s download the pre-event images:
leafmap.maxar_download(pre_images)
The leafmap.maxar_download() function handles the download process, saving images to your local
directory with organized naming conventions. These high-resolution images can then be used for:
• Detailed damage assessment
• Machine learning training datasets
• Offline mapping applications
• Scientific research and analysis
# leafmap.maxar_download(post_images)
There are a lot more post-event images available for the Turkey earthquake. It may take a while to
download all the images. Uncomment the above cell to download the post-event images if needed.
20.13. Exercises
20.13.1. Exercise 1: Creating an Interactive Map
1. Create an interactive map with search functionality that allows users to search for places and zoom to
them. Disable the draw control on the map.
79 https://apps.nationalmap.gov/services
• URL: https://basemap.nationalmap.gov/arcgis/rest/services/USGSTopo/MapServer/tile/{z}/{y}/{x}
2. Add two WMS layers to visualize NAIP imagery and NDVI using a USGS WMS service.
• URL: https://imagery.nationalmap.gov/arcgis/services/USGSNAIPImagery/ImageServer/WMSServer?
• Layer names: USGSNAIPImagery:FalseColorComposite, USGSNAIPImagery:NDVI_Color
80 https://esa-worldcover.org/en
• Line width: 2
• URL: https://github.com/opengeos/datasets/releases/download/places/las_vegas_roads.geojson
3. Create a choropleth map of county areas in the US:
• URL: https://github.com/opengeos/datasets/releases/download/us/us_counties.geojson
• Column: CENSUSAREA
• Pre-event imagery: https://github.com/opengeos/datasets/releases/download/raster/Libya-2023-07-01.tif
• Post-event imagery: https://github.com/opengeos/datasets/releases/download/raster/Libya-2023-09-13.tif
Chapter 21. Geoprocessing with WhiteboxTools
21.1. Introduction
Geospatial analysis forms the backbone of modern environmental science, urban planning, and natural
resource management. Among the many tools available for such analysis, WhiteboxTools81 stands out
as a particularly powerful and accessible open-source library. This chapter will guide you through the
fundamental concepts and practical applications of WhiteboxTools, focusing on two essential areas of
geospatial analysis: watershed analysis and LiDAR data processing.
WhiteboxTools excels in hydrological modeling and terrain analysis, making it an ideal choice for
understanding how water moves across landscapes and how we can extract meaningful information from
three-dimensional point cloud data. Throughout this chapter, we will work with real-world datasets to
demonstrate these concepts, providing you with hands-on experience that you can apply to your own
research and projects.
The chapter is organized into two complementary sections. First, we will explore watershed analysis,
where you will learn to work with Digital Elevation Models (DEMs) to understand water flow patterns,
delineate watershed boundaries, and extract stream networks. This foundation in hydrological analysis is
crucial for environmental modeling and water resource management. Second, we will delve into LiDAR
data analysis, where you will discover how to process three-dimensional point cloud data to create various
elevation models and extract vegetation information.
By combining WhiteboxTools with leafmap, you will gain the ability to not only perform sophisticated
analyses but also visualize and interpret your results effectively. This integration of analysis and visual-
ization is essential for communicating geospatial findings to both technical and non-technical audiences.
81 https://github.com/jblindsay/whitebox-tools
21.3.1. What is Whitebox?
Whitebox is a comprehensive geospatial data analysis platform that originated from the research efforts
at the University of Guelph’s Geomorphometry and Hydrogeomatics Research Group (GHRG82), under
the direction of Dr. John Lindsay. The platform has evolved into a robust toolkit containing over 550
specialized tools for processing diverse types of geospatial data.
What makes Whitebox particularly appealing is its design philosophy. The developers prioritized performance by implementing the tools in Rust83 with extensive use of parallel computing, which means that many operations can take advantage of multiple processor cores to complete tasks faster. Additionally, Whitebox is self-contained, meaning it doesn't require you to install and manage complex dependencies like GDAL, which can sometimes be challenging to configure properly across different operating systems.
The platform’s architecture also makes it highly scriptable, allowing you to integrate Whitebox functions
seamlessly into Python, R, or other programming environments. This flexibility means you can incorpo-
rate Whitebox capabilities into larger analytical workflows without having to switch between different
software packages.
82 https://jblindsay.github.io/ghrg/index.html
83 https://www.rust-lang.org
84 https://www.whiteboxgeo.com/whitebox-workflows-for-python
The performance characteristics of Whitebox also set it apart from many alternatives. The platform is
optimized for speed and can handle large datasets efficiently, making it suitable for regional or even
continental-scale analyses. This performance advantage becomes particularly apparent when working
with high-resolution elevation data or large LiDAR point clouds.
Once the packages are installed, we can import the necessary libraries and configure our environment for
the analyses that follow.
85 https://github.com/jblindsay/whitebox-tools
86 https://www.whiteboxgeo.com
87 https://whiteboxgeo.com/manual/wbt_book/preface.html
88 https://www.whiteboxgeo.com/whitebox-workflows-for-python
89 https://www.whiteboxgeo.com/whitebox-workflows-professional
90 https://github.com/opengeos/whitebox-python
91 https://github.com/opengeos/whiteboxgui
92 https://github.com/opengeos/whiteboxR
93 https://github.com/opengeos/WhiteboxTools-ArcGIS
94 https://plugins.qgis.org/plugins/whitebox_workflows_for_qgis
import os
import leafmap
import numpy as np
WhiteboxTools generates raster datasets with specific data types and nodata values that need to be
handled properly for visualization. Many of the raster outputs use 32-bit integer format with a nodata
value of −32768. By setting this as an environment variable, we ensure that leafmap will correctly interpret
and display these datasets.
os.environ["NODATA"] = "-32768"
m = leafmap.Map()
m.add_basemap("OpenTopoMap")
m.add_basemap("USGS 3DEP Elevation")
m.add_basemap("USGS Hydrography")
m
The basemaps we’ve added serve different purposes in our analysis. OpenTopoMap provides a general
topographic context, showing terrain features and elevation patterns. The USGS 3DEP Elevation basemap
offers a detailed view of elevation data, which will help us understand the terrain we’re analyzing. The
USGS Hydrography basemap shows existing stream networks and water bodies, which we can later
compare with our analytical results to validate our watershed delineation.
The Calapooia River is a tributary of the Willamette River and drains a portion of the Oregon Cascade Range, creating distinct elevation gradients that make watershed boundaries relatively easy to identify.
We begin by specifying a point within our area of interest. This point will serve as a reference for
downloading watershed boundary data from existing databases.
lat = 44.361169
lon = -122.821802
The coordinates we’ve chosen represent a location within the Calapooia River basin. By centering our
map on this point and adding a marker, we can visualize exactly where our analysis will focus. The zoom
level of 10 provides a good balance between showing the local area detail and maintaining context of the
broader region.
Now we’ll use this point to download existing watershed boundary data. The leafmap.get_wbd()
function accesses the USGS Watershed Boundary Dataset, which provides standardized watershed delin-
eations at various scales. The digit=10 parameter specifies that we want HUC-10 level watersheds,
which typically represent watersheds of 40,000 to 250,000 acres.
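A minimal sketch of this step, assuming get_wbd accepts a point geometry expressed as an x/y dictionary and a return_geometry flag (the exact call signature may differ):
# Assumed signature; the original call may use different argument names
geometry = {"x": lon, "y": lat}
gdf = leafmap.get_wbd(geometry, digit=10, return_geometry=True)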
The watershed boundary data we’ve downloaded provides us with an official delineation that we can
use for comparison with our own analysis. This comparison is important for validating our methods and
understanding any differences that might arise from using different elevation data sources or analytical
approaches.
gdf.to_file("basin.geojson")
Saving the watershed boundary as a GeoJSON file ensures that we can easily reload and reuse this
data throughout our analysis. GeoJSON is a widely supported format that maintains spatial reference
information and can be easily shared or used in other applications.
array = leafmap.get_3dep_dem(
gdf,
resolution=30,
output="dem.tif",
dst_crs="EPSG:3857",
to_cog=True,
overwrite=True,
)
array
The parameters we’ve specified for downloading the DEM are important for our analysis. The 30-meter
resolution provides a good balance between detail and computational efficiency for watershed-scale
analysis. Higher resolutions would provide more detail but would also require significantly more process-
ing time and storage space. The coordinate reference system EPSG:3857 (Web Mercator) is commonly
used for web mapping applications and ensures compatibility with our interactive maps.
The to_cog=True parameter creates a Cloud Optimized GeoTIFF, which is a format optimized for
efficient access and visualization. This format includes internal tiling and overviews that make it faster to
display at different zoom levels.
Visualizing the DEM on our interactive map allows us to immediately see the topographic characteristics
of our study area. The terrain color palette provides an intuitive representation where lower elevations
appear in greens and blues, while higher elevations are shown in browns and whites. This visualization
helps us understand the overall topographic setting and identify major ridgelines that likely form water-
shed boundaries.
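The visualization step can be sketched as follows, mirroring the add_raster call used later in the chapter:
m.add_raster("dem.tif", colormap="terrain", layer_name="DEM")
m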
metadata = leafmap.image_metadata("dem.tif")
metadata
The metadata reveals important details such as the exact spatial extent of our data, the coordinate
reference system, the data type and bit depth, and any compression settings used. This information helps
us understand the precision and limitations of our elevation data, which in turn affects the accuracy of
our watershed analysis.
Understanding the elevation range in our study area is also important for setting appropriate parameters
in our analysis tools and for interpreting our results in the context of the local topography.
21.6.5. Add colorbar
A colorbar provides essential context for interpreting elevation values in our visualization. Without a
colorbar, users can see relative elevation differences but cannot determine actual elevation values, which
are important for understanding the scale and magnitude of topographic features.
leafmap.image_min_max("dem.tif")
The elevation range from 60 to 1500 meters represents the topographic diversity within our watershed,
from the valley floor where the river exits the basin to the ridge tops that form the watershed boundaries
(Figure 61). This significant elevation range creates the hydraulic gradients that drive water flow and
make watershed delineation possible.
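A colorbar for this range could be added with something like the following sketch, assuming leafmap's add_colormap helper (the argument names may differ):
# Assumed helper; label the 60-1500 m elevation range reported above
m.add_colormap(cmap="terrain", vmin=60, vmax=1500, label="Elevation (m)")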
wbt = leafmap.WhiteboxTools()
Checking the version ensures that we’re using a current version of the software and helps with repro-
ducibility of our analysis. Different versions may have different capabilities or may produce slightly
different results due to algorithm improvements.
wbt.version()
The WhiteboxTools interface (Figure 62) provides a comprehensive view of all available tools, organized
by category. While we’ll use the Python API for our analysis, this interface is valuable for exploring
capabilities and understanding tool parameters.
leafmap.whiteboxgui()
wbt.set_working_dir(os.getcwd())
wbt.verbose = False
Setting verbose=False reduces the amount of processing information displayed, which keeps our
notebook output clean and focused on the essential results. However, if you encounter problems with any
analysis step, you can set verbose=True to get detailed information about what the tools are doing.
The feature-preserving smoothing algorithm is specifically designed for elevation data. Unlike simple
averaging filters, it attempts to smooth noise while maintaining important topographic features like
ridgelines and valley bottoms. The filter size of 9 represents a 9x9 pixel window, which provides moderate
smoothing appropriate for 30-meter resolution data.
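The smoothing call itself can be sketched as follows, assuming WhiteboxTools' feature_preserving_smoothing tool with the 9x9 filter discussed above:
# A sketch; additional parameters (norm_diff, num_iter) are left at their defaults
wbt.feature_preserving_smoothing("dem.tif", "smoothed.tif", filter=9)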
The return value of 0 indicates that the operation completed successfully. WhiteboxTools functions return
0 for success and 1 for failure, which allows you to check whether operations completed properly in
automated workflows.
Let’s visualize the smoothed DEM alongside our watershed boundary to see how the smoothing has
affected the elevation data.
m = leafmap.Map()
m.add_basemap("Satellite")
m.add_raster("smoothed.tif", colormap="terrain", layer_name="Smoothed DEM")
m.add_geojson("basin.geojson", layer_name="Watershed", info_mode=None)
m.add_basemap("USGS Hydrography", show=False)
m
The satellite basemap provides a real-world context for our elevation data, allowing us to see how
topographic features correspond to visible landscape features like forests, agricultural areas, and urban
development. The USGS Hydrography basemap is available but initially hidden, so we can toggle it on
later to compare our analytical results with mapped stream networks.
The azimuth parameter of 315 degrees represents lighting from the northwest, which is a common
convention in cartography because it creates shadows that most people find intuitive to interpret. The
altitude of 35 degrees represents the angle of the light source above the horizon, creating moderate
shadows that provide good contrast without being too dramatic.
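The hillshade itself can be generated with a call along these lines, using WhiteboxTools' hillshade tool with the azimuth and altitude described above:
wbt.hillshade("smoothed.tif", "hillshade.tif", azimuth=315, altitude=35)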
m.add_raster("hillshade.tif", layer_name="Hillshade")
m.layers[-1].opacity = 0.6
By overlaying the hillshade with partial transparency (opacity of 0.6), we create a visualization that
combines the elevation information from our DEM with the three-dimensional appearance provided by
the hillshade. This combination is particularly effective for understanding the topographic context of our
watershed analysis.
wbt.find_no_flow_cells("smoothed.tif", "noflow.tif")
The D8 flow direction algorithm, which we’ll use later, assigns flow direction based on the steepest
downhill path to one of eight neighboring cells. In areas where all neighboring cells have higher or equal
elevation, the algorithm cannot determine a flow direction, resulting in no-flow cells.
Visualizing the no-flow cells helps us understand where our elevation data might have problems that
could affect our watershed analysis. Large areas of no-flow cells might indicate flat areas like lakes or
wetlands, or they might indicate data quality issues that need to be addressed.
wbt.fill_depressions("smoothed.tif", "filled.tif")
Depression filling works by raising the elevation of cells within depressions to the level of the lowest
point on the depression’s rim. This creates a continuous flow path while making minimal changes to the
original elevation data.
An alternative approach is depression breaching, which creates channels through the barriers that create
depressions rather than filling the depressions themselves. This approach can be more appropriate in
some situations because it preserves the original elevation values in most areas.
wbt.breach_depressions("smoothed.tif", "breached.tif")
Let’s check how depression breaching affects the no-flow cells in our dataset.
wbt.find_no_flow_cells("breached.tif", "noflow2.tif")
m.layers[-1].visible = False
m.add_raster("noflow2.tif", layer_name="No Flow Cells after Breaching")
m
Comparing the no-flow cells before and after depression breaching shows us how effective the prepro-
cessing has been. Ideally, we should see a significant reduction in no-flow cells, indicating that our
elevation data is now suitable for flow modeling.
wbt.d8_pointer("breached.tif", "flow_direction.tif")
The D8 algorithm is called “D8” because it considers eight possible flow directions (the four cardinal
directions plus the four diagonal directions). While more sophisticated algorithms exist that can split flow
among multiple directions, D8 remains popular because it’s computationally efficient and produces results
that are easy to interpret and use in subsequent analyses.
The flow direction grid uses a coding system where each direction is represented by a specific value.
Understanding this coding system isn’t essential for basic watershed analysis, but it becomes important
if you need to interpret the flow direction values directly or use them in custom analyses.
wbt.d8_flow_accumulation("breached.tif", "flow_accum.tif")
The flow accumulation calculation follows the flow directions we calculated in the previous step, counting
how many cells contribute flow to each location. Cells at ridge tops have low flow accumulation (only
their own area), while cells in major valleys have high flow accumulation (representing the drainage from
large upslope areas).
Visualizing flow accumulation reveals the drainage network structure within our watershed. The brightest
areas represent locations where the most water would accumulate, typically corresponding to the main
river channels and major tributaries. This pattern should generally match the actual stream network
visible in satellite imagery or hydrography datasets.
The threshold value of 5000 means that a cell must have at least 5000 upslope cells draining through it to
be classified as a stream. This threshold determines the density of the extracted stream network. Lower
thresholds produce more detailed networks with smaller tributaries, while higher thresholds produce
simpler networks with only the major channels.
Selecting an appropriate threshold requires balancing detail with accuracy. Too low a threshold will
identify many small channels that may not actually carry water year-round, while too high a threshold
may miss important tributaries. The optimal threshold often depends on the resolution of your elevation
data, the climate of your study area, and the specific requirements of your analysis.
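The stream extraction step can be sketched as follows, assuming WhiteboxTools' extract_streams tool with the threshold discussed above:
wbt.extract_streams("flow_accum.tif", "streams.tif", threshold=5000)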
m.layers[-1].visible = False
m.add_raster("streams.tif", layer_name="Streams")
m
The extracted stream network should generally match the pattern visible in the flow accumulation data,
but with a clear binary classification between stream and non-stream cells. This binary classification is
useful for many types of analysis, such as calculating distances to streams or determining stream density
within different areas.
wbt.distance_to_outlet(
"flow_direction.tif", streams="streams.tif", output="distance_to_outlet.tif"
)
This analysis uses both the flow direction grid and the stream network to calculate distances along the
actual flow paths rather than straight-line distances. This is important because water and sediment must
follow the topographically determined flow paths, which are often much longer than direct distances.
The distance to outlet visualization typically shows a pattern where areas near the watershed outlet have
low values (short distances) and areas near the watershed boundaries have high values (long distances).
This pattern reflects the dendritic (tree-like) structure of drainage networks, where tributaries converge
toward the main channel.
wbt.raster_streams_to_vector(
"streams.tif", d8_pntr="flow_direction.tif", output="streams.shp"
)
The vectorization process uses both the stream raster and the flow direction grid to create connected line
segments that represent the stream network. The flow direction information is essential for determining
how individual stream segments connect to form a coherent network.
There’s a known issue with the raster_streams_to_vector tool where the output vector file doesn’t include
coordinate system information. We can fix this by explicitly setting the coordinate reference system.
leafmap.vector_set_crs(source="streams.shp", output="streams.shp", crs="EPSG:3857")
m.add_shp(
"streams.shp",
layer_name="Streams Vector",
style={"color": "#ff0000", "weight": 3},
info_mode=None,
)
m
Figure 63: The stream network of the Calapooia River basin.
The vector stream network (Figure 63) provides a clean, cartographic representation that’s easy to
interpret and compare with other datasets. You can toggle on the USGS Hydrography basemap to compare
your extracted stream network with the officially mapped streams, which provides a good validation of
your analysis methods and parameters.
wbt.basins("flow_direction.tif", "basins.tif")
First, we need to identify all the individual drainage basins within our study area. This step creates a raster
where each connected drainage area is assigned a unique identifier.
wbt.longest_flowpath(
dem="breached.tif", basins="basins.tif", output="longest_flowpath.shp"
)
The longest flow path calculation uses the elevation data and basin boundaries to trace the path from the
highest point in each basin to its outlet, following the steepest descent route.
Since we’re interested in the single longest flow path for our main watershed, we need to select only the
longest path from all the paths that were calculated.
leafmap.select_largest(
"longest_flowpath.shp", column="LENGTH", output="longest_flowpath.shp"
)
m.add_shp(
"longest_flowpath.shp",
layer_name="Longest Flowpath",
style={"color": "#ff0000", "weight": 3},
)
m
The longest flow path provides important information about the watershed’s hydrological characteristics.
Longer flow paths generally indicate longer travel times for water and sediment, which affects flood timing
and water quality processes within the watershed.
You can use the drawing tools in the interactive map to specify your own pour point, or use the default
location provided. The choice of pour point significantly affects the resulting watershed boundary, so it’s
important to select a location that represents a meaningful hydrological boundary, such as a stream gauge
location or the confluence with a larger river.
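If you do not draw your own pour point, a default one can be created along the following lines (a sketch using GeoPandas; the coordinates simply reuse the reference point defined earlier, whereas the original default location may differ):
import geopandas as gpd
from shapely.geometry import Point

# Stand-in pour point; reproject to EPSG:3857 to match the DEM before saving
point = gpd.GeoDataFrame(geometry=[Point(lon, lat)], crs="EPSG:4326").to_crs("EPSG:3857")
point.to_file("pour_point.shp")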
wbt.snap_pour_points(
"pour_point.shp", "flow_accum.tif", "pour_point_snapped.shp", snap_dist=300
)
The snap distance of 300 meters allows the algorithm to search within a reasonable area around the
original pour point location. This distance should be large enough to account for small inaccuracies in
point placement but not so large that the point gets snapped to an entirely different stream.
Visualizing both the original and snapped pour points helps you verify that the snapping process worked
correctly and that the final pour point is located where you intended.
The watershed delineation algorithm works by tracing flow paths backward from the pour point, follow-
ing the flow direction grid to identify all cells that contribute flow to the outlet. This creates a binary raster
where cells within the watershed are assigned one value and cells outside the watershed are assigned
another value.
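The delineation call can be sketched as follows, assuming WhiteboxTools' watershed tool, which takes the D8 pointer, the snapped pour points, and an output raster:
wbt.watershed("flow_direction.tif", "pour_point_snapped.shp", "watershed.tif")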
m.add_raster("watershed.tif", layer_name="Watershed")
m
The delineated watershed boundary (Figure 64) should encompass all the topographic area that drains to
your pour point. You can compare this boundary with the original watershed boundary we downloaded
earlier to see how well our analysis matches the official delineation.
wbt.raster_to_vector_polygons("watershed.tif", "watershed.shp")
m.layers[-1].visible = False
m.add_shp(
"watershed.shp",
layer_name="Watershed Vector",
style={"color": "#ffff00", "weight": 3},
info_mode=False,
)
The vector watershed boundary provides a clean, professional-looking result that clearly delineates
the drainage area. This boundary can be used for further analysis, such as calculating watershed area,
extracting climate data for the watershed, or analyzing land use patterns within the drainage basin.
wbt = leafmap.WhiteboxTools()
wbt.set_working_dir(os.getcwd())
wbt.verbose = False
The working directory and verbose settings ensure that our LiDAR processing files are organized in a
predictable location and that the output remains clean and focused on essential information.
url = "https://github.com/opengeos/datasets/releases/download/lidar/madison.zip"
filename = "madison.las"
leafmap.download_file(url, "madison.zip", quiet=True)
LiDAR data is typically distributed in LAS or LAZ format, which are standardized formats specifically
designed for storing three-dimensional point cloud data. These formats include not only the x, y, and z
coordinates of each point but also additional information such as the intensity of the return signal, the
time of measurement, and classification codes that indicate what type of object each point represents.
laz = leafmap.read_lidar(filename)
laz
The LiDAR data object contains both the point data and metadata about the data collection. This metadata
includes information about the coordinate system, the data collection date, the sensor specifications, and
the data processing history.
str(laz.header.version)
The LAS format has evolved over time, with different versions supporting different features and capabil-
ities. Understanding the version of your data helps you know what information is available and what
processing options are appropriate.
21.7.4. Upgrade file version
Some LiDAR processing tools work better with newer versions of the LAS format. Upgrading to version
1.4 ensures compatibility with the latest processing algorithms and provides access to enhanced features
for point classification and analysis.
The version upgrade process maintains all the original point data while updating the file structure to
support newer features. This is particularly important for advanced classification and filtering operations.
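The upgrade step can be sketched as follows, assuming leafmap's convert_lidar helper wraps laspy's format conversion (the exact argument names may differ):
# Assumed helper; produces a LAS 1.4 object that is written back out below
las = leafmap.convert_lidar(laz, file_version="1.4")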
leafmap.write_lidar(las, "madison.las")
wbt.lidar_histogram("madison.las", "histogram.html")
The histogram analysis creates an interactive HTML file that you can open in a web browser to explore the
elevation distribution in detail. This visualization (Figure 65) helps identify potential data quality issues,
such as points with unrealistic elevation values that might represent noise or processing errors.
leafmap.view_lidar("madison.las")
The elevation slice operation removes points outside a specified elevation range. The minimum elevation
of 0 meters eliminates points below ground level that likely represent noise, while the maximum elevation
of 450 meters removes unrealistically high points that might represent aircraft or atmospheric returns.
Setting appropriate elevation limits requires understanding the topography of your study area. The limits
should be wide enough to include all legitimate ground and vegetation points but narrow enough to
exclude obvious outliers.
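The slicing step can be sketched as follows, assuming WhiteboxTools' lidar_elevation_slice tool with the limits discussed above:
wbt.lidar_elevation_slice("madison.las", "madison_rm.las", minz=0, maxz=450)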
leafmap.view_lidar("madison_rm.las", cmap="terrain")
The terrain colormap provides an intuitive visualization where elevation is represented by color, making
it easy to see the topographic structure of the landscape and verify that the outlier removal process
preserved the important features while eliminating problematic points.
wbt.lidar_digital_surface_model(
"madison_rm.las", "dsm.tif", resolution=1.0, minz=0, maxz=450
)
The 1-meter resolution provides detailed information while maintaining reasonable file sizes and pro-
cessing times. Higher resolutions would provide more detail but would also create much larger files and
require more processing time. The elevation limits match those used in the outlier removal step, ensuring
consistency in our processing workflow.
DSMs are particularly useful for applications that need to consider above-ground features, such as flood
modeling in urban areas, viewshed analysis, or solar radiation calculations that must account for buildings
and vegetation.
leafmap.add_crs("dsm.tif", epsg=2255)
WhiteboxTools sometimes creates raster files without coordinate system information, so we need to
explicitly set the coordinate reference system to ensure proper spatial referencing for visualization and
analysis.
m = leafmap.Map()
m.add_basemap("Satellite")
m.add_raster("dsm.tif", colormap="terrain", layer_name="DSM")
m
The remove_off_terrain_objects algorithm identifies and removes features that are likely to be above-
ground objects rather than ground surface. The filter parameter of 25 specifies the size of the analysis
window, while the slope parameter of 15.0 degrees helps distinguish between natural terrain variations
and artificial features.
This algorithm works by analyzing the local topographic context around each cell. Areas with steep slopes
or abrupt elevation changes are more likely to represent above-ground objects and are candidates for
removal. The algorithm attempts to interpolate the ground surface beneath these features.
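The call itself can be sketched as follows, assuming WhiteboxTools' remove_off_terrain_objects tool with the parameters described above:
wbt.remove_off_terrain_objects("dsm.tif", "dem.tif", filter=25, slope=15.0)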
m.add_raster("dem.tif", palette="terrain", layer_name="DEM")
m
The subtraction operation creates a new raster where each cell value represents the height of features
above the ground surface. In forested areas, this typically represents tree height, while in urban areas, it
might represent building height.
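The subtraction can be sketched as follows, assuming WhiteboxTools' subtract tool (the output filename chm.tif is illustrative):
wbt.subtract("dsm.tif", "dem.tif", "chm.tif")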
Figure 69: Canopy height model (CHM) of the LiDAR dataset.
The CHM visualization (Figure 69) reveals the three-dimensional structure of vegetation and buildings
in your study area. Tall trees and buildings appear as high values, while open areas and low vegetation
appear as low values. The layer manager allows you to toggle between different layers to compare the
DSM, DEM, and CHM and understand how they relate to each other.
21.9. Exercises
21.9.1. Exercise 1: Watershed Analysis for a Different Location
In this exercise, you will apply the watershed analysis techniques learned in this chapter to analyze a
different watershed. This will help you understand how the methods work across different geographic
settings and topographic conditions.
Objective: Perform a complete watershed analysis for the Tualatin River basin in Oregon.
Instructions:
1. Set up your analysis environment by importing the necessary libraries and initializing Whitebox-
Tools.
2. Define your study area using the following coordinates for a point within the Tualatin River basin:
• Latitude: 45.3311
• Longitude: −122.7645
3. Download and prepare elevation data:
• Use leafmap.get_wbd() to download the watershed boundary (HUC-10 level)
• Download a 30-meter resolution DEM using leafmap.get_3dep_dem()
• Apply feature-preserving smoothing to reduce noise
4. Perform hydrological preprocessing:
• Create a hillshade for visualization
• Identify and address no-flow cells using depression breaching
• Calculate flow direction using the D8 algorithm
5. Extract hydrological features:
• Calculate flow accumulation
• Extract the stream network using a threshold of 3000 cells
• Convert the stream network to vector format
• Calculate the longest flow path
6. Delineate the watershed:
• Create a pour point at coordinates (45.2891, −122.7234)
• Snap the pour point to the nearest stream
• Delineate the watershed boundary
• Convert the watershed to vector format
7. Create visualizations:
• Create an interactive map showing the DEM, hillshade, stream network, and watershed boundary
• Add appropriate legends and colorbars
• Compare your results with the official watershed boundary
Questions to consider:
• How does the topography of the Tualatin River basin differ from the Calapooia River basin?
• What threshold value produces a stream network that best matches visible drainage patterns?
• How well does your delineated watershed match the official boundary, and what might cause any
differences?
21.9.2. Exercise 2: LiDAR Analysis for Forest Structure Assessment
This exercise focuses on using LiDAR data to assess forest structure and canopy characteristics, which
are important for ecological studies and forest management applications.
Objective: Process LiDAR data to create elevation models and analyze vegetation structure in a forested
area.
Instructions:
1. Download and inspect LiDAR data:
• Download a different LiDAR dataset from the USGS 3DEP Program95
• Extract and load the LAS file
• Examine the data characteristics including point density, elevation range, and data version
2. Data quality assessment:
• Create a histogram of elevation values to identify the data distribution
• Visualize the raw point cloud in 3D to understand the landscape structure
• Identify and remove outlier points (use appropriate elevation limits based on your data inspection)
3. Generate elevation models:
• Create a Digital Surface Model (DSM) at 2-meter resolution
• Generate a bare-earth Digital Elevation Model (DEM) using the remove_off_terrain_objects
tool
• Calculate a Canopy Height Model (CHM) by subtracting the DEM from the DSM
4. Analyze forest structure:
• Calculate summary statistics for the CHM (minimum, maximum, mean, and standard deviation of
canopy heights)
• Create a histogram of canopy heights to understand the forest structure
• Identify areas with different canopy height classes:
‣ Low vegetation (0-5 meters)
‣ Medium vegetation (5-15 meters)
‣ Tall vegetation (>15 meters)
5. Create visualizations:
• Display all three elevation models (DSM, DEM, CHM) on an interactive map
• Use appropriate color schemes for each model type
• Add a satellite basemap for context
• Create a layer manager to toggle between different models
6. Advanced analysis (optional):
• Calculate the percentage of area covered by each vegetation height class
• Identify the tallest trees in the dataset and their locations
• Create a slope map from the DEM to understand terrain characteristics
Questions to consider:
• What does the CHM reveal about the forest structure and species composition?
• How do the elevation models differ in areas with dense vegetation versus open areas?
95 https://www.usgs.gov/3d-elevation-program/about-3dep-products-services
• What are the advantages and limitations of using LiDAR-derived DEMs compared to photogrammetric
DEMs?
• How might the resolution of your elevation models affect the accuracy of forest structure analysis?
Chapter 22. 3D Mapping with MapLibre
22.1. Introduction
This chapter introduces the MapLibre96 mapping backend in the Leafmap library—a powerful open-source
Python tool for creating customizable 2D and 3D interactive maps. With MapLibre, GIS professionals
can develop tailored visualizations that enhance data presentation and geospatial analysis. The provided
Jupyter notebook demonstrates foundational skills, including interactive map creation, basemap cus-
tomization, and data layer integration. Additionally, advanced features such as 3D building visualizations
and dynamic map controls equip students with practical skills for effective geospatial data visualization
and analysis.
Once installed, import the maplibregl backend from the leafmap package:
96 https://github.com/eodaGmbH/py-maplibregl
97 https://maplibre.org/maplibre-gl-js/docs
98 https://github.com/eoda-dev/py-maplibregl
99 https://leafmap.org/maplibre/overview
100 https://bit.ly/maplibre
101 https://maps.gishub.org
22.5. Creating Interactive Maps
22.5.1. Basic Map Setup
Let's start by creating a simple interactive map with default settings. This basic setup provides a simple map with the dark-matter style, on which you can add data layers, controls, and other customizations.
m = leafmap.Map()
m
Hold down the Ctrl key, then click and drag to rotate and tilt the map.
m = leafmap.Map(style="positron")
m
OpenFreeMap102 provides a variety of basemap styles that you can use in your interactive maps. These
styles include liberty , bright , and positron .
m = leafmap.Map(style="liberty", projection="globe")
m
102 https://openfreemap.org
Figure 70: MapLibre with Liberty basemap.
Specifying projection="globe" in the map setup creates a 3D globe map (Figure 70). Once the map is
displayed, you can click on the globe icon in the upper right corner of the map to toggle the globe view.
Click on the left arrow button in the upper right corner of the map to open the layer manager, which
allows you to toggle layer visibility and change the opacity of the layers.
m = leafmap.Map(style="background-lightgray")
m
style = "https://demotiles.maplibre.org/style.json"
m = leafmap.Map(style=style)
m
22.6.1. Available Controls
• Geolocate: Centers the map based on the user’s current location, if available.
• Fullscreen: Expands the map to a full-screen view for better focus.
• Navigation: Provides zoom controls and a compass for reorientation.
• Draw: Allows users to draw and edit shapes on the map.
m = leafmap.Map()
m.add_control("geolocate", position="top-left")
m
Additionally, you can load a GeoJSON FeatureCollection into the Draw control (Figure 71), which will
allow users to view, edit, or interact with pre-defined geographical features, such as boundaries or points
of interest.
m = leafmap.Map(center=[-100, 40], zoom=3, style="positron")
geojson = {
"type": "FeatureCollection",
"features": [
{
"id": "abc",
"type": "Feature",
"properties": {},
"geometry": {
"coordinates": [
[
[-119.08, 45.95],
[-119.79, 42.08],
[-107.28, 41.43],
[-108.15, 46.44],
[-119.08, 45.95],
]
],
"type": "Polygon",
},
},
{
"id": "xyz",
"type": "Feature",
"properties": {},
"geometry": {
"coordinates": [
[
[-103.87, 38.08],
[-108.54, 36.44],
[-106.25, 33.00],
[-99.91, 31.79],
[-96.82, 35.48],
[-98.80, 37.77],
[-103.87, 38.08],
]
],
"type": "Polygon",
},
},
],
}
m.add_draw_control(position="top-left", geojson=geojson)
m
Figure 71: Loading existing GeoJSON data into the map with the Draw control.
Two key methods for accessing drawn features:
• Selected Features: Accesses only the currently selected features.
• All Features: Accesses all features added, regardless of selection, giving you full control over the spatial
data on the map.
m.draw_features_selected
m.draw_feature_collection_all
m = leafmap.Map()
m.add_basemap("OpenTopoMap")
m.add_layer_control()
m
m.add_basemap("Esri.WorldImagery")
You can also add basemaps interactively, which provides flexibility for selecting the best background for
your map content.
m = leafmap.Map()
m
m.add_basemap()
m = leafmap.Map()
url = "https://basemap.nationalmap.gov/arcgis/rest/services/USGSTopo/MapServer/
tile/{z}/{y}/{x}"
m.add_tile_layer(url, name="USGS TOpo", attribution="USGS", opacity=1.0,
visible=True)
m
22.8. Using MapTiler
To use MapTiler with this notebook, you need to set up a MapTiler API key. You can obtain a free API
key by signing up at https://cloud.maptiler.com/103.
# import os
# os.environ["MAPTILER_KEY"] = "YOUR_API_KEY"
Set the API key as an environment variable to access MapTiler’s resources. Once set up, you can specify
any named style from MapTiler by using the style parameter in the map setup (see Figure 72).
m = leafmap.Map(style="streets")
m
103 https://cloud.maptiler.com/
m = leafmap.Map(style="satellite")
m
m = leafmap.Map(style="hybrid")
m
m = leafmap.Map(style="topo")
m
22.9. 3D Mapping
MapTiler provides a variety of 3D styles that enhance geographic data visualization. Styles can be prefixed
with 3d- to utilize 3D effects such as hybrid, satellite, and topo. The 3d-hillshade style is used for
visualizing hillshade effects alone.
22.9.1. 3D Terrain
These examples demonstrate different ways to implement 3D terrain visualization:
• 3D Hybrid style: Adds terrain relief to urban data with hybrid visualization.
• 3D Satellite style: Combines satellite imagery with 3D elevation, enhancing visual context for topog-
raphy.
• 3D Topo style: Provides a topographic view with elevation exaggeration and optional hillshade.
• 3D Terrain style: Displays terrain with default settings for natural geographic areas.
• 3D Ocean style: A specialized terrain style with oceanic details, using exaggeration and hillshade to
emphasize depth.
Each terrain map setup includes a pitch and bearing to adjust the map’s angle and orientation, giving a
better perspective of 3D features (see Figure 73).
m = leafmap.Map(
    center=[-122.1874314, 46.2022386], zoom=13, pitch=60, bearing=220, style="3d-hybrid"
)
m.add_layer_control(bg_layers=True)
m
Figure 73: MapTiler 3D hybrid basemap.
m = leafmap.Map(
center=[-122.1874314, 46.2022386],
zoom=13,
pitch=60,
bearing=220,
style="3d-satellite",
)
m.add_layer_control(bg_layers=True)
m
m = leafmap.Map(
center=[-122.1874314, 46.2022386],
zoom=13,
pitch=60,
bearing=220,
style="3d-topo",
exaggeration=1.5,
hillshade=False,
)
m.add_layer_control(bg_layers=True)
m
m = leafmap.Map(
center=[-122.1874314, 46.2022386],
zoom=13,
pitch=60,
bearing=220,
style="3d-terrain",
)
m.add_layer_control(bg_layers=True)
m
22.9.2. 3D Buildings
Adding 3D buildings enhances urban visualizations, showing buildings with height variations. The setup
involves specifying the MapTiler API key for vector tiles and adding building data as a 3D extrusion layer.
The extrusion height and color can be set based on data attributes to visualize structures with varying
heights, which can be useful in city planning and urban analysis (see Figure 74).
m = leafmap.Map(
center=[-74.01201, 40.70473], zoom=16, pitch=60, bearing=35, style="basic-v2"
)
MAPTILER_KEY = leafmap.get_api_key("MAPTILER_KEY")
m.add_basemap("Esri.WorldImagery", visible=False)
source = {
"url": f"https://api.maptiler.com/tiles/v3/tiles.json?key={MAPTILER_KEY}",
"type": "vector",
}
layer = {
"id": "3d-buildings",
"source": "openmaptiles",
"source-layer": "building",
"type": "fill-extrusion",
"min-zoom": 15,
"paint": {
"fill-extrusion-color": [
"interpolate",
["linear"],
["get", "render_height"],
0,
"lightgray",
200,
"royalblue",
400,
"lightblue",
],
"fill-extrusion-height": [
"interpolate",
["linear"],
["zoom"],
15,
0,
16,
["get", "render_height"],
],
"fill-extrusion-base": [
"case",
[">=", ["get", "zoom"], 16],
["get", "render_min_height"],
0,
],
},
}
m.add_source("openmaptiles", source)
m.add_layer(layer)
m.add_layer_control()
m
m = leafmap.Map(
center=[-74.01201, 40.70473], zoom=16, pitch=60, bearing=35, style="basic-v2"
)
m.add_basemap("Esri.WorldImagery", visible=False)
m.add_3d_buildings(min_zoom=15)
m.add_layer_control()
m
data = "https://maplibre.org/maplibre-gl-js/docs/assets/indoor-3d-map.geojson"
gdf = leafmap.geojson_to_gdf(data)
gdf.explore()
gdf.head()
To visualize indoor spaces in 3D, you can use the add_geojson method with the fill-extrusion layer
type. The following example shows how to visualize a 3D model of a building (Figure 75).
m = leafmap.Map(
center=(-87.61694, 41.86625), zoom=17, pitch=40, bearing=20, style="positron"
)
m.add_basemap("OpenStreetMap.Mapnik")
m.add_geojson(
data,
layer_type="fill-extrusion",
name="floorplan",
paint={
"fill-extrusion-color": ["get", "color"],
"fill-extrusion-height": ["get", "height"],
"fill-extrusion-base": ["get", "base_height"],
"fill-extrusion-opacity": 0.5,
},
)
m.add_layer_control()
m
Figure 75: Indoor mapping with MapLibre.
"source": "countries",
"type": "fill-extrusion",
"paint": {
"fill-extrusion-color": [
"interpolate",
["linear"],
["get", "age"],
23.0,
"#fff5eb",
24.0,
"#fee6ce",
25.0,
"#fdd0a2",
26.0,
"#fdae6b",
27.0,
"#fd8d3c",
28.0,
"#f16913",
29.0,
"#d94801",
30.0,
"#8c2d04",
],
"fill-extrusion-opacity": 1,
"fill-extrusion-height": ["*", ["get", "age"], 5000],
},
}
first_symbol_layer_id = m.find_first_symbol_layer()["id"]
m.add_layer(layer, first_symbol_layer_id)
m.add_layer_control()
m
["linear"],
["get", "CENSUSAREA"],
400,
"#fff5eb",
600,
"#fee6ce",
800,
"#fdd0a2",
1000,
"#fdae6b",
],
"fill-extrusion-opacity": 1,
"fill-extrusion-height": ["*", ["get", "CENSUSAREA"], 50],
},
}
first_symbol_layer_id = m.find_first_symbol_layer()["id"]
m.add_layer(layer, first_symbol_layer_id)
m.add_layer_control()
m
Simple Marker: Initializes the map centered on a specific latitude and longitude, then adds a static marker
to the location.
Draggable Marker: Similar to the simple marker, but with an additional option to make the marker
draggable. This allows users to reposition it directly on the map.
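These two cases can be sketched as follows, assuming the MapLibre backend's add_marker method accepts a lng_lat coordinate pair and an options dictionary (the exact API may differ):
m = leafmap.Map(center=[-100, 40], zoom=3, style="streets")
m.add_marker(lng_lat=[-100, 40])  # static marker
m.add_marker(lng_lat=[-95, 40], options={"draggable": True})  # draggable marker
m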
Multiple Points with GeoJSON: Loads point data from a GeoJSON file containing world city locations.
After reading the data, it’s added to the map as a layer named “cities.” Popups can be enabled to display
additional information when a user clicks on a city.
url = "https://github.com/opengeos/datasets/releases/download/world/world_cities.geojson"
geojson = leafmap.read_geojson(url)
m = leafmap.Map(style="streets")
m.add_geojson(geojson, name="cities")
m.add_popup("cities")
m
Custom Symbol Layer: Loads point data as symbols instead of markers and customizes the symbol
layout using icons, which can be scaled to any size.
m = leafmap.Map(style="streets")
source = {"type": "geojson", "data": geojson}
layer = {
"id": "cities",
"type": "symbol",
"source": "point",
"layout": {
"icon-image": "marker_15",
"icon-size": 1,
},
}
m.add_source("point", source)
m.add_layer(layer)
m.add_popup("cities")
m
url = "https://github.com/opengeos/datasets/releases/download/places/osgeo_
conferences.geojson"
geojson = leafmap.read_geojson(url)
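# Assumed setup for this example (a sketch): create the map, register the icon
# referenced below as "custom-marker" (the icon URL here is illustrative), and
# wrap the GeoJSON in a source dictionary.
m = leafmap.Map(style="streets")
image = "https://maplibre.org/maplibre-gl-js/docs/assets/custom_marker.png"
m.add_image("custom-marker", image)
source = {"type": "geojson", "data": geojson}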
m.add_source("conferences", source)
layer = {
"id": "conferences",
"type": "symbol",
"source": "conferences",
"layout": {
"icon-image": "custom-marker",
"text-field": ["get", "year"],
"text-font": ["Open Sans Semibold", "Arial Unicode MS Bold"],
"text-offset": [0, 1.25],
"text-anchor": "top",
},
}
m.add_layer(layer)
m
source = {
"type": "geojson",
"data": {
"type": "Feature",
"properties": {},
"geometry": {
"type": "LineString",
"coordinates": [
[-122.48369693756104, 37.83381888486939],
[-122.48348236083984, 37.83317489144141],
[-122.48339653015138, 37.83270036637107],
[-122.48356819152832, 37.832056363179625],
[-122.48404026031496, 37.83114119107971],
[-122.48404026031496, 37.83049717427869],
[-122.48348236083984, 37.829920943955045],
[-122.48356819152832, 37.82954808664175],
[-122.48507022857666, 37.82944639795659],
],
},
},
}
layer = {
"id": "route",
"type": "line",
"source": "route",
"layout": {"line-join": "round", "line-cap": "round"},
"paint": {"line-color": "#888", "line-width": 8},
}
m.add_source("route", source)
m.add_layer(layer)
m
m = leafmap.Map(style="hybrid")
geojson = "https://github.com/opengeos/datasets/releases/download/places/wa_
overture_buildings.geojson"
323
paint = {"fill-color": "#ffff00", "fill-opacity": 0.5, "fill-outline-color":
"#ff0000"}
m.add_geojson(geojson, layer_type="fill", paint=paint, name="Fill")
m
m = leafmap.Map(style="hybrid")
geojson = "https://github.com/opengeos/datasets/releases/download/places/wa_
overture_buildings.geojson"
paint_line = {"line-color": "#ff0000", "line-width": 3}
m.add_geojson(geojson, layer_type="line", paint=paint_line, name="Outline")
paint_fill = {"fill-color": "#ffff00", "fill-opacity": 0.5}
m.add_geojson(geojson, layer_type="fill", paint=paint_fill, name="Fill")
m.add_layer_control()
m
m = leafmap.Map(
center=[-123.13, 49.254], zoom=11, style="dark-matter", pitch=45, bearing=0
)
url = "https://raw.githubusercontent.com/visgl/deck.gl-data/master/examples/
geojson/vancouver-blocks.json"
paint_line = {
"line-color": "white",
"line-width": 2,
}
paint_fill = {
"fill-extrusion-color": {
"property": "valuePerSqm",
"stops": [
[0, "grey"],
[1000, "yellow"],
[5000, "orange"],
[10000, "darkred"],
[50000, "lightblue"],
],
},
"fill-extrusion-height": ["*", 10, ["sqrt", ["get", "valuePerSqm"]]],
"fill-extrusion-opacity": 0.9,
}
m.add_geojson(url, layer_type="line", paint=paint_line, name="blocks-line")
m.add_geojson(url, layer_type="fill-extrusion", paint=paint_fill, name="blocks-
fill")
m
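# Assumed setup for the clustering example (a sketch): the earthquake GeoJSON is
# the MapLibre documentation sample, and source_args enables clustering on the
# GeoJSON source (argument names may differ).
m = leafmap.Map(style="streets")
data = "https://maplibre.org/maplibre-gl-js/docs/assets/earthquakes.geojson"
source_args = {"cluster": True, "cluster_radius": 50, "cluster_max_zoom": 14}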
m.add_geojson(
data,
layer_type="circle",
name="earthquake-circles",
filter=["!", ["has", "point_count"]],
paint={"circle-color": "darkblue"},
source_args=source_args,
)
m.add_geojson(
data,
layer_type="circle",
name="earthquake-clusters",
filter=["has", "point_count"],
paint={
"circle-color": [
"step",
["get", "point_count"],
"#51bbd6",
100,
"#f1f075",
750,
"#f28cb1",
],
"circle-radius": ["step", ["get", "point_count"], 20, 100, 30, 750, 40],
},
source_args=source_args,
)
m.add_geojson(
data,
layer_type="symbol",
name="earthquake-labels",
filter=["has", "point_count"],
layout={
"text-field": ["get", "point_count_abbreviated"],
"text-size": 12,
},
source_args=source_args,
)
m
Figure 78: A marker cluster of earthquakes in the world.
url = "https://github.com/opengeos/datasets/releases/download/us/us_states.
geojson"
filepath = "data/us_states.geojson"
leafmap.download_file(url, filepath, quiet=True)
m.open_geojson()
"building",
"fill-color",
["interpolate", ["exponential", 0.5], ["zoom"], 15, "#e2714b", 22,
"#eee695"],
)
m.set_paint_property(
"building",
"fill-opacity",
["interpolate", ["exponential", 0.5], ["zoom"], 15, 0, 22, 1],
)
m.add_layer_control(bg_layers=True)
m
m = leafmap.Map(center=[30.0222, -1.9596], zoom=7, style="streets")
source = {
"type": "geojson",
"data": "https://maplibre.org/maplibre-gl-js/docs/assets/rwanda-provinces.
geojson",
}
m.add_source("rwanda-provinces", source)
layer = {
"id": "rwanda-provinces",
"type": "fill",
"source": "rwanda-provinces",
"layout": {},
"paint": {
"fill-color": [
"let",
"density",
["/", ["get", "population"], ["get", "sq-km"]],
[
"interpolate",
["linear"],
["zoom"],
8,
[
"interpolate",
["linear"],
["var", "density"],
274,
["to-color", "#edf8e9"],
1551,
["to-color", "#006d2c"],
],
10,
[
"interpolate",
["linear"],
["var", "density"],
274,
["to-color", "#eff3ff"],
1551,
["to-color", "#08519c"],
],
],
],
"fill-opacity": 0.7,
},
}
m.add_layer(layer)
m
22.11. Visualizing Raster Data
22.11.1. Local Raster Data
To visualize local raster files, use the add_raster method. In the example, a Landsat image is downloaded
and displayed using two different band combinations:
• Band Combination 3-2-1 (True Color): Simulates natural colors in the RGB channels.
• Band Combination 4-3-2: Enhances vegetation, displaying it in red for better visual contrast.
These layers are added to the map along with controls to toggle them. You can adjust brightness and contrast with the vmin and vmax arguments to improve clarity.
url = "https://github.com/opengeos/datasets/releases/download/raster/landsat.tif"
filepath = "landsat.tif"
leafmap.download_file(url, filepath, quiet=True)
m = leafmap.Map(style="streets")
m.add_raster(filepath, indexes=[3, 2, 1], vmin=0, vmax=100, name="Landsat-321")
m.add_raster(filepath, indexes=[4, 3, 2], vmin=0, vmax=100, name="Landsat-432")
m.add_layer_control()
m
A Digital Elevation Model (DEM) is also downloaded and visualized with a terrain color scheme. Leafmap’s
layer_interact method allows interactive adjustments.
url = "https://github.com/opengeos/datasets/releases/download/raster/srtm90.tif"
filepath = "srtm90.tif"
leafmap.download_file(url, filepath, quiet=True)
m = leafmap.Map(style="satellite")
m.add_raster(filepath, colormap="terrain", name="DEM")
m
m = leafmap.Map(style="satellite")
before = "https://github.com/opengeos/datasets/releases/download/raster/Libya-2023-07-01.tif"
after = "https://github.com/opengeos/datasets/releases/download/raster/Libya-2023-09-13.tif"
m.add_cog_layer(before, name="Before", attribution="Maxar")
m.add_cog_layer(after, name="After", attribution="Maxar", fit_bounds=True)
m.add_layer_control()
m
m = leafmap.Map(style="streets")
url = "https://canada-spot-ortho.s3.amazonaws.com/canada_spot_orthoimages/canada_
spot5_orthoimages/S5_2007/S5_11055_6057_20070622/S5_11055_6057_20070622.json"
m.add_stac_layer(url, bands=["pan"], name="Panchromatic", vmin=0, vmax=150)
m.add_stac_layer(url, bands=["B4", "B3", "B2"], name="RGB", vmin=0, vmax=150)
m.add_layer_control()
m
Leafmap also supports loading STAC items from the Microsoft Planetary Computer104. The example
demonstrates how to load a STAC item from the Planetary Computer and display it on the map.
collection = "landsat-8-c2-l2"
item = "LC08_L2SP_047027_20201204_02_T1"
m = leafmap.Map(style="satellite")
m.add_stac_layer(
collection=collection,
item=item,
assets=["SR_B5", "SR_B4", "SR_B3"],
name="Color infrared",
)
m
104 https://planetarycomputer.microsoft.com
22.12. Adding Custom Components
Enhance your maps by adding custom components such as images, videos, text, color bars, and legends.
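The symbol layer below references a map, a registered icon image, and a point source that need to be set up first; a minimal sketch of that setup (the icon URL and coordinates are illustrative) might look like this:
m = leafmap.Map(center=[0, 0], zoom=2, style="streets")
image = "https://upload.wikimedia.org/wikipedia/commons/7/7c/201408_cat.png"
source = {
    "type": "geojson",
    "data": {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature", "geometry": {"type": "Point", "coordinates": [0, 0]}}
        ],
    },
}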
layer = {
"id": "points",
"type": "symbol",
"source": "point",
"layout": {
"icon-image": "cat",
"icon-size": 0.25,
"text-field": "I love kitty!",
"text-font": ["Open Sans Regular"],
"text-offset": [0, 3],
"text-anchor": "top",
},
}
m.add_image("cat", image)
m.add_source("point", source)
m.add_layer(layer)
m
m = leafmap.Map(center=[-100, 40], zoom=3, style="positron")
content = '<img src="https://i.imgur.com/LmTETPX.png">'
m.add_html(content, bg_color="transparent", position="bottom-right")
m
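# Assumed setup for the next example (a sketch): a fresh map, with the sloth
# image URL shown above assigned to a variable used below.
m = leafmap.Map(center=[-100, 40], zoom=3, style="positron")
image = "https://i.imgur.com/LmTETPX.png"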
m.add_image(image=image, width=250, height=250, position="bottom-right")
text = "I love sloth! 🦥"
m.add_text(text, fontsize=35, padding="20px")
image2 = "https://i.imgur.com/kZC2tpr.gif"
m.add_image(image=image2, bg_color="transparent", position="bottom-left")
m
font-size: 20px;
}
</style>
<body>
<span style='font-size:100px;'>🚀</span>
<p>I will display 🚁</p>
<p>I will display 🚂</p>
</body>
</html>
"""
m.add_html(html, bg_color="transparent")
m
import numpy as np
m = leafmap.Map(style="topo", height="500px")
dem = "https://github.com/opengeos/datasets/releases/download/raster/dem.tif"
m.add_cog_layer(
dem,
name="DEM",
colormap_name="terrain",
rescale="0, 1500",
fit_bounds=True,
nodata=np.nan,
)
m.add_colorbar(
    cmap="terrain", vmin=0, vmax=1500, label="Elevation (m)", position="bottom-right"
)
m.add_layer_control()
m
Figure 79: Adding a colorbar to a MapLibre map.
Make the color bar background transparent to blend seamlessly with the map.
m = leafmap.Map(style="topo")
m.add_cog_layer(
dem,
name="DEM",
colormap_name="terrain",
rescale="0, 1500",
nodata=np.nan,
fit_bounds=True,
)
m.add_colorbar(
cmap="terrain",
vmin=0,
vmax=1500,
label="Elevation (m)",
position="bottom-right",
transparent=True,
)
m
m = leafmap.Map(style="topo")
m.add_cog_layer(
dem,
name="DEM",
colormap_name="terrain",
rescale="0, 1500",
nodata=np.nan,
fit_bounds=True,
)
m.add_colorbar(
cmap="terrain",
vmin=0,
vmax=1500,
label="Elevation (m)",
position="bottom-right",
width=0.2,
height=3,
orientation="vertical",
)
m
legend_dict = {
"11 Open Water": "466b9f",
"12 Perennial Ice/Snow": "d1def8",
"21 Developed, Open Space": "dec5c5",
"22 Developed, Low Intensity": "d99282",
"23 Developed, Medium Intensity": "eb0000",
"24 Developed High Intensity": "ab0000",
"31 Barren Land (Rock/Sand/Clay)": "b3ac9f",
"41 Deciduous Forest": "68ab5f",
"42 Evergreen Forest": "1c5f2c",
"43 Mixed Forest": "b5c58f",
"51 Dwarf Scrub": "af963c",
"52 Shrub/Scrub": "ccb879",
"71 Grassland/Herbaceous": "dfdfc2",
"72 Sedge/Herbaceous": "d1d182",
"73 Lichens": "a3cc51",
"74 Moss": "82ba9e",
"81 Pasture/Hay": "dcd939",
"82 Cultivated Crops": "ab6c28",
"90 Woody Wetlands": "b8d9eb",
"95 Emergent Herbaceous Wetlands": "6c9fb8",
}
m.add_legend(
title="NLCD Land Cover Type",
legend_dict=legend_dict,
bg_color="rgba(255, 255, 255, 0.5)",
position="bottom-left",
)
m
Figure 80: Adding a legend to a MapLibre map.
m = leafmap.Map(
center=[-122.514426, 37.562984], zoom=17, bearing=-96, style="satellite"
)
urls = [
"https://static-assets.mapbox.com/mapbox-gl-js/drone.mp4",
"https://static-assets.mapbox.com/mapbox-gl-js/drone.webm",
]
coordinates = [
[-122.51596391201019, 37.56238816766053],
[-122.51467645168304, 37.56410183312965],
[-122.51309394836426, 37.563391708549425],
[-122.51423120498657, 37.56161849366671],
]
m.add_video(urls, coordinates)
m.add_layer_control()
m
m = leafmap.Map(center=[-115, 25], zoom=4, style="satellite")
urls = [
"https://data.opengeos.org/patricia_nasa.mp4",
"https://data.opengeos.org/patricia_nasa.webm",
]
coordinates = [
[-130, 32],
[-100, 32],
[-100, 13],
[-130, 13],
]
m.add_video(urls, coordinates)
m.add_layer_control()
m
url = "https://data.source.coop/vida/google-microsoft-open-buildings/pmtiles/go_
ms_building_footprints.pmtiles"
metadata = leafmap.pmtiles_metadata(url)
print(f"layer names: {metadata['layer_names']}")
print(f"bounds: {metadata['bounds']}")
style = {
"version": 8,
"sources": {
"example_source": {
"type": "vector",
"url": "pmtiles://" + url,
"attribution": "PMTiles",
}
},
"layers": [
{
"id": "buildings",
"source": "example_source",
"source-layer": "building_footprints",
"type": "fill",
"paint": {"fill-color": "#3388ff", "fill-opacity": 0.5},
},
],
}
m.add_pmtiles(
url,
style=style,
visible=True,
opacity=1.0,
tooltip=True,
)
m
url = "https://data.source.coop/kerner-lab/fields-of-the-world/ftw-sources.
pmtiles"
metadata = leafmap.pmtiles_metadata(url)
print(f"layer names: {metadata['layer_names']}")
print(f"bounds: {metadata['bounds']}")
m = leafmap.Map()
# Define colors for each last digit (0-9)
style = {
"layers": [
{
"id": "Field Polygon",
"source": "example_source",
"source-layer": "ftw-sources",
"type": "fill",
"paint": {
"fill-color": [
"case",
["==", ["%", ["to-number", ["get", "id"]], 10], 0],
"#FF5733", # Color for last digit 0
["==", ["%", ["to-number", ["get", "id"]], 10], 1],
"#33FF57", # Color for last digit 1
["==", ["%", ["to-number", ["get", "id"]], 10], 2],
"#3357FF", # Color for last digit 2
["==", ["%", ["to-number", ["get", "id"]], 10], 3],
"#FF33A1", # Color for last digit 3
["==", ["%", ["to-number", ["get", "id"]], 10], 4],
"#FF8C33", # Color for last digit 4
["==", ["%", ["to-number", ["get", "id"]], 10], 5],
"#33FFF6", # Color for last digit 5
["==", ["%", ["to-number", ["get", "id"]], 10], 6],
"#A833FF", # Color for last digit 6
["==", ["%", ["to-number", ["get", "id"]], 10], 7],
"#FF333D", # Color for last digit 7
["==", ["%", ["to-number", ["get", "id"]], 10], 8],
"#33FFBD", # Color for last digit 8
["==", ["%", ["to-number", ["get", "id"]], 10], 9],
"#FF9933", # Color for last digit 9
"#FF0000", # Fallback color if no match
],
"fill-opacity": 0.5,
},
},
{
"id": "Field Outline",
"source": "example_source",
"source-layer": "ftw-sources",
"type": "line",
"paint": {"line-color": "#ffffff", "line-width": 1, "line-opacity":
1},
},
],
}
m.add_basemap("Esri.WorldImagery")
m.add_pmtiles(url, style=style, name="FTW", zoom_to_layer=False)
m.add_layer_control()
m
Figure 81: A map of agricultural fields.
22.13.3. 3D PMTiles
Render global building data in 3D for a realistic, textured experience. Set building colors and extrusion
heights to create visually compelling cityscapes. For example, apply color gradients and height scaling
based on building attributes to differentiate buildings by their heights.
url = "https://data.source.coop/cholmes/overture/overture-buildings.pmtiles"
metadata = leafmap.pmtiles_metadata(url)
print(f"layer names: {metadata['layer_names']}")
print(f"bounds: {metadata['bounds']}")
22.13.4. 3D Buildings
For a simplified setup, the add_overture_3d_buildings function quickly adds 3D building data from
Overture’s latest release to a basemap, creating depth and spatial realism on the map (see Figure 82).
m = leafmap.Map(
center=[-74.0095, 40.7046], zoom=16, pitch=60, bearing=-17, style="positron",
height="500px"
)
m.add_basemap("Esri.WorldImagery", visible=False)
m.add_overture_3d_buildings(template="simple")
m.add_layer_control()
m
Figure 82: A map of New York City with Overture 3D buildings.
Transportation theme:
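A minimal sketch for the transportation theme, mirroring the places example below:
m = leafmap.Map(center=[-74.0095, 40.7046], zoom=16)
m.add_basemap("Esri.WorldImagery", visible=False)
m.add_overture_data(theme="transportation", opacity=0.8)
m.add_layer_control()
m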
Places theme:
m = leafmap.Map(center=[-74.0095, 40.7046], zoom=16)
m.add_basemap("Esri.WorldImagery", visible=False)
m.add_overture_data(theme="places", opacity=0.8)
m.add_layer_control()
m
Addresses theme:
Base theme:
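The addresses and base themes listed above follow the same pattern; a minimal sketch for both:
m = leafmap.Map(center=[-74.0095, 40.7046], zoom=16)
m.add_basemap("Esri.WorldImagery", visible=False)
m.add_overture_data(theme="addresses", opacity=0.8)
m.add_layer_control()
m

m = leafmap.Map(center=[-74.0095, 40.7046], zoom=16)
m.add_basemap("Esri.WorldImagery", visible=False)
m.add_overture_data(theme="base", opacity=0.8)
m.add_layer_control()
m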
Divisions theme:
m = leafmap.Map()
m.add_basemap("Esri.WorldImagery", visible=False)
m.add_overture_data(theme="divisions", opacity=0.8)
m.add_layer_control()
m
m = leafmap.Map(
style="positron",
center=(-122.4, 37.74),
zoom=12,
pitch=40,
)
deck_grid_layer = {
"@@type": "GridLayer",
"id": "GridLayer",
"data": "https://raw.githubusercontent.com/visgl/deck.gl/master/examples/
layer-browser/data/sf.bike.parking.json",
"extruded": True,
"getPosition": "@@=COORDINATES",
"getColorWeight": "@@=SPACES",
"getElevationWeight": "@@=SPACES",
"elevationScale": 4,
"cellSize": 200,
"pickable": True,
}
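The grid layer can then be added to the map with add_deck_layers (a minimal sketch):
m.add_deck_layers([deck_grid_layer])
m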
url = "https://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_10m_airports.
geojson"
data = leafmap.read_geojson(url)
m = leafmap.Map(
style="positron",
center=(0.45, 51.47),
zoom=4,
pitch=30,
)
deck_geojson_layer = {
"@@type": "GeoJsonLayer",
"id": "airports",
"data": data,
"filled": True,
"pointRadiusMinPixels": 2,
"pointRadiusScale": 2000,
"getPointRadius": "@@=11 - properties.scalerank",
"getFillColor": [200, 0, 80, 180],
"autoHighlight": True,
"pickable": True,
}
deck_arc_layer = {
"@@type": "ArcLayer",
"id": "arcs",
"data": [
feature
for feature in data["features"]
if feature["properties"]["scalerank"] < 4
],
"getSourcePosition": [-0.4531566, 51.4709959], # London
"getTargetPosition": "@@=geometry.coordinates",
"getSourceColor": [0, 128, 200],
"getTargetColor": [200, 0, 80],
"getWidth": 2,
"pickable": True,
}
m.add_deck_layers(
[deck_geojson_layer, deck_arc_layer],
tooltip={
"airports": "{{ &properties.name }}",
"arcs": "gps_code: {{ properties.gps_code }}",
},
)
m
The result is a rich, interactive visualization that highlights both point data and relational data connections, useful for airport connectivity or hub-and-spoke modeling.
# import os
# os.environ["MAPTILER_KEY"] = "YOUR_PRIVATE_API_KEY"
# os.environ["MAPTILER_KEY_PUBLIC"] = "YOUR_PUBLIC_API_KEY"
m = leafmap.Map(
    center=[-122.19861, 46.21168], zoom=13, pitch=60, bearing=150, style="3d-terrain"
)
m.add_layer_control(bg_layers=True)
m.to_html(
"terrain.html",
title="Awesome 3D Map",
width="100%",
height="100%",
replace_key=True,
)
m
22.17. Exercises
22.17.1. Exercise 1: Setting up MapLibre and Basic Map Creation
• Initialize a map centered on a country of your choice with an appropriate zoom level and display it with
the dark-matter basemap.
• Change the basemap style to liberty and display it again.
22.17.2. Exercise 2: Customizing the Map View
• Create a 3D map of a city of your choice with an appropriate zoom level, pitch and bearing using the
liberty basemap.
• Experiment with MapTiler 3D basemap styles, such as 3d-satellite, 3d-hybrid, and 3d-topo, to
visualize a location of your choice in different ways. Please set your MapTiler API key as a Colab secret
and do NOT expose the API key in the notebook.
22.17.6. Exercise 6: Adding Map Elements
• Image and Text: Add a logo image of your choice with appropriate text to the map.
• GIF: Add an animated GIF of your choice to the map.
Chapter 23. Cloud Computing with Earth Engine and Geemap
23.1. Introduction
This chapter introduces cloud-based geospatial analysis using Google Earth Engine108 (GEE) and the
geemap109 Python package. Google Earth Engine represents a paradigm shift from traditional desktop GIS
to cloud computing, enabling efficient processing of massive datasets without local data storage or high-
end hardware requirements.
The geemap package bridges Google Earth Engine’s power with Python’s familiar ecosystem. While
Earth Engine traditionally used JavaScript, geemap enables direct access to Earth Engine capabilities from
Jupyter notebooks, bringing planetary-scale geospatial analysis to Python users.
This chapter covers fundamental concepts and practical applications of geemap for geospatial analysis.
You’ll learn to set up the library, work with raster and vector data, create interactive visualizations, and
perform common geospatial operations—building a solid foundation for cloud-based geospatial analysis.
108 https://earthengine.google.com
109 https://geemap.org
110 https://earthengine.google.com/noncommercial
111 https://earthengine.google.com/commercial
23.3.2. Account Registration
Before using Google Earth Engine, you need a GEE account. Register at the GEE Registration Page112.
During registration, you’ll set up a Google Cloud Project and enable the Earth Engine API. Follow the
instructions and note your Google Cloud Project ID for the next section.
import ee
import geemap
ee.Authenticate()
This opens a browser window for authentication. Follow the instructions and copy the authorization code
if needed. Once authenticated, initialize a session with your Google Cloud Project ID:
ee.Initialize(project="your-ee-project")
m = geemap.Map()
To display the map in a Jupyter notebook, simply enter the map object:
112 https://code.earthengine.google.com/register
m
To hide specific map controls, set the corresponding control argument to False, such as
draw_ctrl=False to hide the drawing control.
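For example, to hide the drawing control:
m = geemap.Map(draw_ctrl=False)
m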
m = geemap.Map(basemap="Esri.WorldImagery")
m
m.add_basemap("Esri.WorldTopoMap")
m.add_basemap("OpenTopoMap")
ROADMAP: https://mt1.google.com/vt/lyrs=m&x={x}&y={y}&z={z}
SATELLITE: https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}
TERRAIN: https://mt1.google.com/vt/lyrs=p&x={x}&y={y}&z={z}
HYBRID: https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}
m = geemap.Map()
url = "https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}"
m.add_tile_layer(url, name="Google Satellite", attribution="Google")
m
You can also add text to the map using the add_text method. The text will be displayed at the specified
location on the map.
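A minimal sketch (the text and position arguments shown are assumptions):
m = geemap.Map()
m.add_text("Made with geemap", position="bottomright")
m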
m = geemap.Map()
m.add("basemap_selector")
m
23.4.3. Inspector Tool
The inspector tool allows you to click on the map to query Earth Engine data at specific locations. This
is helpful for examining the properties of datasets directly on the map (see Figure 84).
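A minimal sketch that adds a layer and then activates the inspector widget (the visualization parameters mirror the DEM examples later in this chapter):
m = geemap.Map(center=[40, -100], zoom=4)
dem = ee.Image("USGS/SRTMGL1_003")
m.add_layer(dem, {"min": 0, "max": 4000, "palette": "terrain"}, "SRTM DEM")
m.add("inspector")
m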
Figure 84: The geemap inspector for querying Earth Engine data.
23.4.4. Layer Editor
With the layer editor, you can adjust visualization parameters of Earth Engine data for better clarity and
focus. It supports single-band images, multi-band images, and feature collections.
23.4.4.2. Multi-Band image
The example below shows how to use the layer editor to adjust the visualization parameters of a multi-
band image (Figure 86). Select an RGB band combination from the dropdown menu, and then adjust the
min and max values. Click the “Apply” button to update the visualization.
Figure 87: Layer Editor for a feature collection.
Use the drawing tools to draw shapes on the map. Then run the code below to clip the DEM to the
drawn shape.
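A minimal sketch, assuming a map m with the drawing tools enabled (the palette is the one used for the DEM throughout this chapter):
dem = ee.Image("USGS/SRTMGL1_003")
vis_params = {
    "min": 0,
    "max": 4000,
    "palette": ["006633", "E5FFCC", "662A00", "D8D8D8", "F5F5F5"],
}
roi = m.user_roi  # the last shape drawn on the map
if roi is not None:
    m.add_layer(dem.clip(roi), vis_params, "Clipped DEM")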
23.5. The Earth Engine Data Catalog
The Earth Engine Data Catalog113 hosts an extensive collection of geospatial datasets. Currently, the
catalog includes over 1,000 datasets with a combined size exceeding 100 petabytes. Notable datasets
include Landsat, Sentinel, MODIS, and NAIP. For a comprehensive list of datasets in CSV or JSON format,
refer to the Earth Engine Datasets List114.
m = geemap.Map()
m
113 https://developers.google.com/earth-engine/datasets
114 https://tinyurl.com/gee-catalog
115 https://developers.google.com/earth-engine/datasets/catalog/USGS_SRTMGL1_003#description
Figure 88: Search for datasets in the data search panel.
Each dataset page contains detailed information such as availability, provider, Earth Engine snippet,
tags, description, and example code. The Image, ImageCollection, or FeatureCollection ID is essential for
accessing the dataset in Earth Engine’s JavaScript or Python APIs.
Click the blue import button to copy the code snippet to your clipboard. Then paste the code snippet
into your Jupyter notebook and modify it as needed.
from geemap.datasets import DATA, get_metadata

m = geemap.Map()
dataset = ee.Image(DATA.USGS_GAP_AK_2001)
m.add_layer(dataset, {}, "GAP Alaska")
m.centerObject(dataset, zoom=4)
m
get_metadata(DATA.USGS_GAP_AK_2001)
image = ee.Image("USGS/SRTMGL1_003")
image
23.7.1.2. Visualizing Earth Engine Images
To visualize an image, you can specify visualization parameters such as minimum and maximum values
and color palettes.
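A minimal sketch using the SRTM image loaded above (the palette is the one used throughout this chapter):
m = geemap.Map()
vis_params = {
    "min": 0,
    "max": 4000,
    "palette": ["006633", "E5FFCC", "662A00", "D8D8D8", "F5F5F5"],
}
m.add_layer(image, vis_params, "SRTM DEM")
m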
23.7.2. ImageCollection
An ImageCollection represents a sequence of images, often used for temporal data like satellite image
time series. ImageCollections are created by passing an Earth Engine asset ID into the ImageCollection
constructor. Asset IDs are available in the Earth Engine Data Catalog.
collection = ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
collection.first()
We can also filter the collection by location, date, and cloud cover:
geometry = ee.Geometry.Point([-83.909463, 35.959111])  # an example point of interest (assumed)
images = (
collection.filterBounds(geometry)
.filterDate("2024-07-01", "2024-10-01")
.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 5))
)
images.size()
m = geemap.Map()
image = images.first()
vis = {
"min": 0.0,
"max": 3000,
"bands": ["B4", "B3", "B2"],
}
m.add_layer(image, vis, "Sentinel-2")  # display the first image (reconstructed line)
m
m = geemap.Map()
image = images.median()
vis = {
"min": 0.0,
"max": 3000,
"bands": ["B8", "B4", "B3"],
}
m.add_layer(image, vis, "Sentinel-2")  # display the median composite (reconstructed line)
m
23.8.1. Loading Feature Collections
The Earth Engine Data Catalog includes various vector datasets, such as U.S. Census data and country
boundaries, as feature collections. Feature Collection IDs are accessible by searching the data catalog. For
example, to load the TIGER roads data116 by the U.S. Census Bureau:
m = geemap.Map()
fc = ee.FeatureCollection("TIGER/2016/Roads")
m.set_center(-83.909463, 35.959111, 12)
m.add_layer(fc, {}, "Census roads")
m
m = geemap.Map()
states = ee.FeatureCollection("TIGER/2018/States")
fc = states.filter(ee.Filter.eq("NAME", "Tennessee"))
m.add_layer(fc, {}, "Tennessee")
m.center_object(fc, 7)
m
feat = fc.first()
feat.toDictionary()
You can also convert a FeatureCollection to a Pandas DataFrame for easier analysis with:
geemap.ee_to_df(fc)
The ee.Filter.inList() method is used to filter features based on a list of values. For example, to filter
the US States dataset to only include California, Oregon, and Washington, we can use the following code:
m = geemap.Map()
states = ee.FeatureCollection("TIGER/2018/States")
fc = states.filter(ee.Filter.inList("NAME", ["California", "Oregon",
"Washington"]))
m.add_layer(fc, {}, "West Coast")
m.center_object(fc, 5)
m
116 https://tinyurl.com/gee-tiger-roads
The filterBounds() method is used to filter a feature collection by a geometry. For example, to filter
the US states that intersect with a region of interest:
region = m.user_roi
if region is None:
region = ee.Geometry.BBox(-88.40, 29.88, -77.90, 35.39)
fc = ee.FeatureCollection("TIGER/2018/States").filterBounds(region)
m.add_layer(fc, {}, "Southeastern U.S.")
m.center_object(fc, 6)
A FeatureCollection can be used to clip an image. The clipToCollection() method clips an image to
the geometry of a feature collection. For example, to clip a Landsat image to the boundary of Germany:
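A minimal sketch, assuming the LSIB country boundaries dataset is used to obtain the Germany outline:
landsat = ee.Image("LANDSAT/LE7_TOA_5YEAR/1999_2003")
germany = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017").filter(
    ee.Filter.eq("country_na", "Germany")
)
clipped = landsat.clipToCollection(germany)
m = geemap.Map()
m.add_layer(
    clipped,
    {"bands": ["B4", "B3", "B2"], "min": 20, "max": 200, "gamma": 2},
    "Landsat clipped to Germany",
)
m.center_object(germany, 6)
m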
Feature collections can also be styled with additional parameters. To apply a custom style, specify options
like color and line width:
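A minimal sketch using the style() method on a feature collection (the color values are illustrative):
states = ee.FeatureCollection("TIGER/2018/States")
style = {"color": "0000ffff", "width": 2, "lineType": "solid", "fillColor": "00000080"}
m = geemap.Map(center=[40, -100], zoom=4)
m.add_layer(states.style(**style), {}, "Styled US States")
m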
Using Map.add_styled_vector() method, you can apply a color palette to style different features by
attribute:
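A minimal sketch of add_styled_vector (the column, palette, and style options are assumptions):
states = ee.FeatureCollection("TIGER/2018/States")
vis_params = {
    "color": "000000",
    "colorOpacity": 1,
    "width": 1,
    "lineType": "solid",
    "fillColorOpacity": 0.66,
}
palette = ["006633", "E5FFCC", "662A00", "D8D8D8", "F5F5F5"]
m = geemap.Map(center=[40, -100], zoom=4)
m.add_styled_vector(states, column="NAME", palette=palette, layer_name="Styled vector", **vis_params)
m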
landsat7 = ee.Image("LANDSAT/LE7_TOA_5YEAR/1999_2003").select(
["B1", "B2", "B3", "B4", "B5", "B7"]
)
hyperion = ee.ImageCollection("EO1/HYPERION").filter(
ee.Filter.date("2016-01-01", "2017-03-01")
)
hyperion_vis = {
"min": 1000.0,
"max": 14000.0,
"gamma": 2.5,
}
m.add_layer(hyperion, hyperion_vis, "Hyperion")
m.add_plot_gui()
m
m.set_plot_options(add_marker_cluster=True, overlay=True)
Adjust the plotting options for Hyperion data by setting a bar plot type with marker clusters, suitable for
visualizing Hyperion’s data values in bar format.
m.set_plot_options(add_marker_cluster=True, plot_type="bar")
23.9.2. Legends
23.9.2.1. Built-in Legends
Geemap provides built-in legends that can be easily added to maps for better interpretability of data
classes. To check all available built-in legends, run the code below:
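A minimal sketch, assuming the built-in legends are exposed by the geemap.legends module:
from geemap.legends import builtin_legends

for legend in builtin_legends:
    print(legend)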
Add an NLCD WMS layer along with its corresponding legend, which appears as an overlay, providing
users with an informative legend display.
Add an Earth Engine layer for NLCD land cover and display its legend, specifying title, legend type, and
dimensions for user convenience (Figure 90).
nlcd = ee.Image("USGS/NLCD_RELEASES/2021_REL/NLCD/2021")
landcover = nlcd.select("landcover")
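A minimal sketch that displays the NLCD layer with the built-in NLCD legend:
m = geemap.Map(center=[40, -100], zoom=4)
m.add_layer(landcover, {}, "NLCD Land Cover 2021")
m.add_legend(title="NLCD Land Cover Classification", builtin_legend="NLCD")
m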
23.9.2.2. Custom Legends
Create a custom legend by defining unique labels and colors for each class, allowing for flexible map
customization. Define a custom legend using a dictionary that links specific colors to labels. This approach
provides flexibility to represent specific categories in the data, such as various land cover types.
legend_dict = {
"11 Open Water": "466b9f",
"12 Perennial Ice/Snow": "d1def8",
"21 Developed, Open Space": "dec5c5",
"22 Developed, Low Intensity": "d99282",
"23 Developed, Medium Intensity": "eb0000",
"24 Developed High Intensity": "ab0000",
"31 Barren Land (Rock/Sand/Clay)": "b3ac9f",
"41 Deciduous Forest": "68ab5f",
"42 Evergreen Forest": "1c5f2c",
"43 Mixed Forest": "b5c58f",
"51 Dwarf Scrub": "af963c",
"52 Shrub/Scrub": "ccb879",
"71 Grassland/Herbaceous": "dfdfc2",
"72 Sedge/Herbaceous": "d1d182",
"73 Lichens": "a3cc51",
"74 Moss": "82ba9e",
"81 Pasture/Hay": "dcd939",
"82 Cultivated Crops": "ab6c28",
"90 Woody Wetlands": "b8d9eb",
"95 Emergent Herbaceous Wetlands": "6c9fb8",
}
nlcd = ee.Image("USGS/NLCD_RELEASES/2021_REL/NLCD/2021")
landcover = nlcd.select("landcover")
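A minimal sketch that displays the NLCD layer with the custom legend defined above:
m = geemap.Map(center=[40, -100], zoom=4)
m.add_layer(landcover, {}, "NLCD Land Cover 2021")
m.add_legend(title="NLCD Land Cover Type", legend_dict=legend_dict)
m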
m = geemap.Map()
dem = ee.Image("USGS/SRTMGL1_003")
vis_params = {
"min": 0,
"max": 4000,
"palette": ["006633", "E5FFCC", "662A00", "D8D8D8", "F5F5F5"],
}
m.add_layer(dem, vis_params, "SRTM DEM")  # the colorbar below references this layer (reconstructed line)
m.add_colorbar(
vis_params,
label="Elevation (m)",
layer_name="SRTM DEM",
orientation="vertical",
max_width="100px",
)
m = geemap.Map()
m.split_map(left_layer="Esri.WorldTopoMap", right_layer="OpenTopoMap")
m
Use Earth Engine layers in a split-panel map to compare NLCD data from 2001 and 2021. This layout is
effective for examining changes between datasets across time.
m = geemap.Map(center=(40, -100), zoom=4, height="600px")
nlcd_2001 = ee.Image("USGS/NLCD_RELEASES/2019_REL/NLCD/2001").select("landcover")
nlcd_2021 = ee.Image("USGS/NLCD_RELEASES/2021_REL/NLCD/2021").select("landcover")
left_layer = geemap.ee_tile_layer(nlcd_2001, {}, "NLCD 2001")
right_layer = geemap.ee_tile_layer(nlcd_2021, {}, "NLCD 2021")
m.split_map(left_layer, right_layer)
m
image = (
ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
.filterDate("2024-07-01", "2024-10-01")
.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 5))
.map(lambda img: img.divide(10000))
.median()
)
vis_params = [
{"bands": ["B4", "B3", "B2"], "min": 0, "max": 0.3, "gamma": 1.3},
{"bands": ["B8", "B11", "B4"], "min": 0, "max": 0.3, "gamma": 1.3},
{"bands": ["B8", "B4", "B3"], "min": 0, "max": 0.3, "gamma": 1.3},
{"bands": ["B12", "B12", "B4"], "min": 0, "max": 0.3, "gamma": 1.3},
]
labels = [
"Natural Color (B4/B3/B2)",
"Land/Water (B8/B11/B4)",
"Color Infrared (B8/B4/B3)",
"Vegetation (B12/B11/B4)",
]
geemap.linked_maps(
rows=2,
cols=2,
height="300px",
center=[35.959111, -83.909463],
zoom=12,
ee_objects=[image],
vis_params=vis_params,
labels=labels,
label_position="topright",
)
Figure 92: Linked maps showing Sentinel-2 imagery in different band combinations.
Use a timeseries inspector to compare changes in NLCD data across different years. This tool is ideal for
temporal data visualization, showing how land cover changes in an interactive format (see Figure 93).
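The NLCD collection, year labels, and visualization parameters need to be defined first; a minimal sketch:
m = geemap.Map(center=[40, -100], zoom=4)
collection = ee.ImageCollection("USGS/NLCD_RELEASES/2019_REL/NLCD").select("landcover")
vis_params = {"bands": ["landcover"]}
years = collection.aggregate_array("system:index").getInfo()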
m.ts_inspector(
left_ts=collection,
right_ts=collection,
left_names=years,
right_names=years,
left_vis=vis_params,
right_vis=vis_params,
width="80px",
)
m
m = geemap.Map()
collection = (
ee.ImageCollection("MODIS/MCD43A4_006_NDVI")
.filter(ee.Filter.date("2018-06-01", "2018-07-01"))
.select("NDVI")
)
vis_params = {
"min": 0.0,
"max": 1.0,
"palette": "ndvi",
}
m.add_time_slider(collection, vis_params)  # reconstructed line
m
Add a time slider for visualizing NOAA weather data over a 24-hour period, using color-coding to show
temperature variations. The slider enables temporal exploration of hourly data.
m = geemap.Map()
collection = (
ee.ImageCollection("NOAA/GFS0P25")
.filterDate("2018-12-22", "2018-12-23")
.limit(24)
.select("temperature_2m_above_ground")
)
vis_params = {
"min": -40.0,
"max": 35.0,
"palette": ["blue", "purple", "cyan", "green", "yellow", "red"],
}
m.add_time_slider(collection, vis_params)  # reconstructed line
m
Add a time slider to visualize Sentinel-2 imagery with specific bands and cloud cover filtering. This feature
enables temporal analysis of imagery data, allowing users to explore seasonal and other changes over time.
m = geemap.Map()
collection = (
ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
.filterBounds(ee.Geometry.Point([-83.909463, 35.959111]))
.filterMetadata("CLOUDY_PIXEL_PERCENTAGE", "less_than", 10)
.filter(ee.Filter.calendarRange(6, 8, "month"))
)
vis_params = {"min": 0, "max": 4000, "bands": ["B8", "B4", "B3"]}  # reconstructed visualization parameters
m.add_time_slider(collection, vis_params)
m.set_center(-83.909463, 35.959111, 12)
m
in_geojson = "https://github.com/gee-community/geemap/blob/master/examples/data/
countries.geojson"
m = geemap.Map()
fc = geemap.geojson_to_ee(in_geojson)
m.add_layer(fc.style(**{"color": "ff0000", "fillColor": "00000000"}), {},
"Countries")
m
url = "https://github.com/gee-community/geemap/blob/master/examples/data/
countries.zip"
geemap.download_file(url, overwrite=True)
in_shp = "countries.shp"
fc = geemap.shp_to_ee(in_shp)
m = geemap.Map()
m.add_layer(fc, {}, "Countries")
m
import geopandas as gpd

gdf = gpd.read_file(in_shp)
fc = geemap.gdf_to_ee(gdf)
m = geemap.Map()
m.add_layer(fc, {}, "Countries")
m
23.10.4. To GeoJSON
Filter U.S. state data to select Tennessee and save it as a GeoJSON file, which can be shared or used in
other GIS tools.
m = geemap.Map()
states = ee.FeatureCollection("TIGER/2018/States")
fc = states.filter(ee.Filter.eq("NAME", "Tennessee"))
m.add_layer(fc, {}, "Tennessee")
m.center_object(fc, 7)
m
geemap.ee_to_geojson(fc, filename="Tennessee.geojson")
23.10.5. To Shapefile
Export the filtered Tennessee FeatureCollection to a shapefile format for offline use or compatibility with
desktop GIS software.
geemap.ee_to_shp(fc, filename="Tennessee.shp")
23.10.6. To GeoDataFrame
Convert an Earth Engine FeatureCollection to a GeoDataFrame for further analysis in Python or use in
interactive maps.
gdf = geemap.ee_to_gdf(fc)
gdf
gdf.explore()
23.10.7. To DataFrame
Transform an Earth Engine FeatureCollection into a pandas DataFrame, which can then be used for data
analysis in Python.
df = geemap.ee_to_df(fc)
df
23.10.8. To CSV
Export the FeatureCollection data to a CSV file, useful for spreadsheet applications and data reporting.
geemap.ee_to_csv(fc, filename="Tennessee.csv")
23.11. Raster Data Processing
23.11.1. Extract Pixel Values
23.11.1.1. Extracting Values to Points
Load and visualize SRTM DEM and Landsat 7 imagery. Points from a shapefile of U.S. cities are added,
and the tool extracts DEM values to these points, saving the results in a shapefile.
dem = ee.Image("USGS/SRTMGL1_003")
landsat7 = ee.Image("LANDSAT/LE7_TOA_5YEAR/1999_2003")
vis_params = {
"min": 0,
"max": 4000,
"palette": ["006633", "E5FFCC", "662A00", "D8D8D8", "F5F5F5"],
}
m.add_layer(
landsat7,
{"bands": ["B4", "B3", "B2"], "min": 20, "max": 200, "gamma": 2},
"Landsat 7",
)
m.add_layer(dem, vis_params, "SRTM DEM", True, 1)
m
Let's download a shapefile of U.S. cities from GitHub and convert it to a feature collection.
in_shp = "us_cities.shp"
url = "https://github.com/opengeos/data/raw/main/us/us_cities.zip"
geemap.download_file(url)
in_fc = geemap.shp_to_ee(in_shp)
m.add_layer(in_fc, {}, "Cities")
With the DEM and cities, we can now extract the elevation values for each city:
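A minimal sketch using geemap's extract_values_to_points helper (the output filename matches the shapefile read below):
out_shp = "dem.shp"
geemap.extract_values_to_points(in_fc, dem, out_shp)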
Convert the shapefile to a GeoDataFrame and display the first few rows:
gdf = geemap.shp_to_gdf("dem.shp")
gdf.head()
In addition to extracting pixel values from a single-band image, we can also extract pixel values from a
multi-band image. The example below shows how to extract pixel values from a Landsat image using US
cities as points.
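A minimal sketch that writes the extracted Landsat band values to the CSV file read below:
geemap.extract_values_to_points(in_fc, landsat7, "landsat.csv")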
Convert the CSV file to a pandas DataFrame and display the first few rows.
df = geemap.csv_to_df("landsat.csv")
df.head()
image = ee.Image("USGS/SRTMGL1_003")
vis_params = {
"min": 0,
"max": 4000,
"palette": ["006633", "E5FFCC", "662A00", "D8D8D8", "F5F5F5"],
}
m.add_layer(image, vis_params, "SRTM DEM", True, 0.5)
m
Use the drawing tool to draw a line on the map. If no line is drawn, the code below will draw a line from
(-120.2232, 36.3148) to (-118.9269, 36.7121) to (-117.2022, 36.7562). Then, run the code below to extract
pixel values along the line.
line = m.user_roi
if line is None:
line = ee.Geometry.LineString(
[[-120.2232, 36.3148], [-118.9269, 36.7121], [-117.2022, 36.7562]]
)
m.add_layer(line, {}, "ROI")
m.centerObject(line)
The extract_transect function can be used to extract pixel values along a line. The function takes an
image, a line, and a reducer function. The reducer function is used to reduce the pixel values along the
line. The function returns a pandas dataframe with the pixel values and the coordinates of the line.
reducer = "mean"
transect = geemap.extract_transect(
image, line, n_segments=100, reducer=reducer, to_pandas=True
)
transect
With the resulting data frame from above, we can plot the elevation profile (Figure 94).
geemap.line_chart(
data=transect,
x="distance",
y="mean",
markers=True,
x_label="Distance (m)",
y_label="Elevation (m)",
height=400,
)
transect.to_csv("transect.csv")
# Add NASA SRTM
dem = ee.Image("USGS/SRTMGL1_003")
dem_vis = {
"min": 0,
"max": 4000,
"palette": ["006633", "E5FFCC", "662A00", "D8D8D8", "F5F5F5"],
}
m.add_layer(dem, dem_vis, "SRTM DEM")
Run the code below to calculate the mean elevation values within U.S. state boundaries using NASA’s
SRTM DEM data:
out_dem_stats = "dem_stats.csv"
geemap.zonal_stats(
dem, states, out_dem_stats, statistics_type="MEAN", scale=1000,
return_fc=False
)
Run the code below to calculate the mean spectral values within U.S. state boundaries using the Landsat
composite and the U.S. state boundaries:
out_landsat_stats = "landsat_stats.csv"
geemap.zonal_stats(
landsat,
states,
out_landsat_stats,
statistics_type="MEAN",
scale=1000,
return_fc=False,
)
m = geemap.Map(center=[40, -100], zoom=4)
nlcd_stats = "nlcd_stats.csv"
geemap.zonal_stats_by_group(
landcover,
states,
nlcd_stats,
stat_type="SUM",
denominator=1e6,
decimal_places=2,
scale=300
)
nlcd_stats = "nlcd_stats_pct.csv"
geemap.zonal_stats_by_group(
landcover,
states,
nlcd_stats,
stat_type="PERCENTAGE",
denominator=1e6,
decimal_places=2,
scale=300
)
m = geemap.Map(center=[40, -100], zoom=4)
dem = ee.Image("USGS/3DEP/10m")
vis = {"min": 0, "max": 4000, "palette": "terrain"}
m.add_layer(dem, vis, "DEM")
landcover = ee.Image("USGS/NLCD_RELEASES/2019_REL/NLCD/2019").select("landcover")
m.add_layer(landcover, {}, "NLCD 2019")
m.add_legend(title="NLCD Land Cover Classification", builtin_legend="NLCD")
m
stats.to_csv("mean.csv", index=False)
m = geemap.Map()
# Compute NDVI.
ndvi_1999 = (
landsat_1999.select("B4")
.subtract(landsat_1999.select("B3"))
.divide(landsat_1999.select("B4").add(landsat_1999.select("B3")))
)
Figure 95: NDVI visualization of Landsat 7 data.
Geemap provides functions for exporting Earth Engine images, either saving an image locally with a specified scale and region or exporting it to Google Drive. It's possible to define custom CRS and transformation settings when saving the image.
Add a Landsat image to the map.
m = geemap.Map()
image = ee.Image("LANDSAT/LC08/C02/T1_TOA/LC08_044034_20140318").select(
["B5", "B4", "B3"]
)
m.center_object(image)
m.add_layer(image, vis_params, "Landsat")
m
projection = image.select(0).projection().getInfo()
projection
crs = projection["crs"]
crs_transform = projection["transform"]
If you need to export multiple images covering the same region, you can specify the region, crs, and
crs_transform so that the exported images can align with each other.
geemap.ee_export_image(
image,
filename="landsat_crs.tif",
crs=crs,
crs_transform=crs_transform,
region=region,
)
The ee_export_image function introduced above can only export a small image (< 32 MB). If you need
to export a larger image, you can use the ee_export_image_to_drive function to export the image to
Google Drive, which can handle much larger images.
geemap.ee_export_image_to_drive(
image, description="landsat", folder="export", region=region, scale=30
)
Alternatively, you can use the download_ee_image() function to download the image to your local
machine.
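A minimal sketch (the filename and scale are assumptions):
geemap.download_ee_image(image, filename="landsat.tif", scale=90)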
Alternatively, you can export all images in the collection to Google Drive.
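A minimal sketch, assuming collection is a small, filtered ee.ImageCollection (verify the function name and arguments against the geemap documentation):
geemap.ee_export_image_collection_to_drive(collection, folder="export", scale=30)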
m = geemap.Map()
states = ee.FeatureCollection("TIGER/2018/States")
fc = states.filter(ee.Filter.eq("NAME", "Alaska"))
m.add_layer(fc, {}, "Alaska")
m.center_object(fc, 4)
m
Let’s export the Alaska state boundary to various formats, including Shapefile, GeoJSON, CSV, GeoPandas
GeoDataFrame, and Pandas Dataframe.
geemap.ee_to_shp(fc, filename="Alaska.shp")
geemap.ee_export_vector(fc, filename="Alaska.shp")
geemap.ee_to_geojson(fc, filename="Alaska.geojson")
geemap.ee_to_csv(fc, filename="Alaska.csv")
gdf = geemap.ee_to_gdf(fc)
df = geemap.ee_to_df(fc)
geemap.ee_export_vector_to_drive(
fc, description="Alaska", fileFormat="SHP", folder="export"
)
Timelapse animations can use different band combinations to highlight different features: natural color (Red, Green, Blue) for intuitive visualization, false color (NIR, Red, Green) for vegetation analysis, or other combinations for specific applications.
Here, we create a timelapse animation showing the dramatic urban expansion of Las Vegas from 1984
to 2025:
m = geemap.Map()
roi = ee.Geometry.BBox(-115.5541, 35.8044, -113.9035, 36.5581)
m.add_layer(roi)
m.center_object(roi)
m
timelapse = geemap.landsat_timelapse(
roi,
out_gif="las_vegas.gif",
start_year=1984,
end_year=2025,
bands=["NIR", "Red", "Green"],
frames_per_second=5,
title="Las Vegas, NV",
font_color="blue",
)
geemap.show_image(timelapse)
The Las Vegas example uses a false-color band combination (NIR, Red, Green) that makes urban areas
appear blue-gray while vegetation appears red. This combination is particularly effective for visualizing
urban development and land use changes.
Let’s create another example showing rapid development in Hong Kong:
m = geemap.Map()
roi = ee.Geometry.BBox(113.8252, 22.1988, 114.0851, 22.3497)
m.add_layer(roi)
m.center_object(roi)
m
timelapse = geemap.landsat_timelapse(
roi,
out_gif="hong_kong.gif",
start_year=1990,
end_year=2025,
start_date="01-01",
end_date="12-31",
bands=["SWIR1", "NIR", "Red"],
frames_per_second=3,
title="Hong Kong",
)
geemap.show_image(timelapse)
This Hong Kong example demonstrates land reclamation and urban development over three decades. The
SWIR1, NIR, Red band combination effectively highlights the contrast between water, vegetation, and
built environments.
Pan and zoom the map to an area of interest. Use the drawing tools to draw a rectangle on the map. If no
rectangle is drawn, the default rectangle shown below will be used.
roi = m.user_roi
if roi is None:
roi = ee.Geometry.BBox(-74.7222, -8.5867, -74.1596, -8.2824)
m.add_layer(roi)
m.center_object(roi)
timelapse = geemap.sentinel2_timelapse(
roi,
out_gif="sentinel2.gif",
start_year=2017,
end_year=2025,
start_date="05-01",
end_date="09-31",
frequency="year",
bands=["SWIR1", "NIR", "Red"],
frames_per_second=3,
title="Sentinel-2 Timelapse",
)
geemap.show_image(timelapse)
The Sentinel-2 animation uses the SWIR1, NIR, Red band combination, which is excellent for distinguish-
ing between water, vegetation, and built environments.
m = geemap.Map()
m
roi = m.user_roi
if roi is None:
roi = ee.Geometry.BBox(-18.6983, -36.1630, 52.2293, 38.1446)
m.add_layer(roi)
m.center_object(roi)
timelapse = geemap.modis_ndvi_timelapse(
roi,
out_gif="ndvi.gif",
data="Terra",
band="NDVI",
start_date="2000-01-01",
end_date="2022-12-31",
frames_per_second=3,
title="MODIS NDVI Timelapse",
overlay_data="countries",
)
geemap.show_image(timelapse)
This NDVI timelapse reveals the “green wave” phenomenon as vegetation responds to seasonal temperature and precipitation patterns. The country overlays provide geographic context, helping viewers understand how vegetation cycles vary across different climate zones and latitudes.
Now let’s create a global ocean temperature timelapse to demonstrate marine monitoring capabilities:
m = geemap.Map()
m
roi = m.user_roi
if roi is None:
roi = ee.Geometry.BBox(-171.21, -57.13, 177.53, 79.99)
m.add_layer(roi)
m.center_object(roi)
timelapse = geemap.modis_ocean_color_timelapse(
satellite="Aqua",
start_date="2018-01-01",
end_date="2020-12-31",
roi=roi,
frequency="month",
out_gif="temperature.gif",
overlay_data="continents",
overlay_color="yellow",
overlay_opacity=0.5,
)
geemap.show_image(timelapse)
This ocean temperature animation demonstrates the seasonal warming and cooling of Earth’s oceans, revealing patterns like El Niño/La Niña cycles and the formation of seasonal temperature gradients. The continental overlays help viewers understand the relationship between land masses and ocean temperature patterns.
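The region, time window, satellite, and scan mode for the GOES timelapse are defined beforehand; the values below are illustrative assumptions for a GOES-17 full-disk scene and should be adjusted to your event of interest:
roi = ee.Geometry.BBox(167.1898, -28.5757, 202.6258, -12.4411)  # illustrative region
start_date = "2022-01-15T03:00:00"
end_date = "2022-01-15T07:00:00"
data = "GOES-17"
scan = "full_disk"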
timelapse = geemap.goes_timelapse(
roi, "goes.gif", start_date, end_date, data, scan, framesPerSecond=5
)
geemap.show_image(timelapse)
timelapse = geemap.goes_timelapse(
roi, "hurricane.gif", start_date, end_date, data, scan, framesPerSecond=5
)
geemap.show_image(timelapse)
This hurricane animation demonstrates the dramatic cloud formations and eye wall structure of a major
storm system, showing how GOES data can be used for weather forecasting and disaster monitoring.
Finally, let’s create a wildfire monitoring animation using the specialized fire detection algorithm:
timelapse = geemap.goes_fire_timelapse(
roi, "fire.gif", start_date, end_date, data, scan, framesPerSecond=5
)
geemap.show_image(timelapse)
The fire timelapse uses specialized algorithms to highlight thermal anomalies associated with wildfires,
providing critical information for fire management and emergency response efforts.
The geemap charting system provides a comprehensive toolkit for creating various types of charts from
Earth Engine data, including time series plots, histograms, scatter plots, and statistical summaries. These
charts can be generated from features, images, image collections, and computed statistics, making it
possible to visualize everything from individual pixel values to continental-scale phenomena.
Understanding when and how to use different chart types is crucial for effective geospatial data analysis.
Time series charts reveal temporal patterns and trends, histograms show data distributions, scatter plots
reveal relationships between variables, and bar charts enable comparisons between categories or regions.
The following sections demonstrate these fundamental charting techniques using real Earth Engine
datasets.
import calendar
from geemap import chart
23.14.1.2. feature_by_feature
The feature_by_feature function creates charts where individual features (such as geographic
regions) are plotted along the x-axis, with multiple properties displayed as different data series. This
chart type is ideal for comparing multiple attributes across different geographic areas, such as comparing
monthly temperature or precipitation values across different ecoregions.
In this example, we visualize average monthly temperature data across different ecoregions, with each
month represented as a separate data series (Figure 96).
ecoregions = ee.FeatureCollection("projects/google/charts_feature_example")
features = ecoregions.select("[0-9][0-9]_tmean|label")
geemap.ee_to_df(features)
x_property = "label"
y_properties = [str(x).zfill(2) + "_tmean" for x in range(1, 13)]
labels = calendar.month_abbr[1:]  # series labels for the 12 monthly properties (reconstructed line)
colors = [
"#604791",
"#1d6b99",
"#39a8a7",
"#0f8755",
"#76b349",
"#f0af07",
"#e37d05",
"#cf513e",
"#96356f",
"#724173",
"#9c4f97",
"#696969",
]
title = "Average Monthly Temperature by Ecoregion"
x_label = "Ecoregion"
y_label = "Temperature"
fig = chart.feature_by_feature(
features,
x_property,
y_properties,
colors=colors,
labels=labels,
title=title,
x_label=x_label,
y_label=y_label,
)
fig
23.14.1.3. feature_by_property
The feature_by_property function creates charts where properties (such as months or categories)
are plotted along the x-axis, with different features displayed as separate data series. This chart type is
particularly useful for examining temporal patterns or categorical comparisons, showing how different
regions respond to seasonal or categorical variations.
This example displays average precipitation by month for different ecoregions, allowing us to compare
seasonal precipitation patterns across different ecological zones (Figure 97):
ecoregions = ee.FeatureCollection("projects/google/charts_feature_example")
features = ecoregions.select("[0-9][0-9]_ppt|label")
geemap.ee_to_df(features)
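A minimal sketch of the feature_by_property call (the month labels mirror the image_regions example later in this chapter; the chart options are assumptions):
keys = [str(x).zfill(2) + "_ppt" for x in range(1, 13)]
values = calendar.month_abbr[1:]
x_properties = dict(zip(keys, values))
fig = chart.feature_by_property(
    features,
    x_properties,
    series_property="label",
    title="Average Ecoregion Precipitation by Month",
    colors=["#f0af07", "#0f8755", "#76b349"],
    x_label="Month",
    y_label="Precipitation (mm)",
)
fig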
Figure 97: A chart showing the average monthly precipitation by ecoregion.
23.14.1.4. feature_groups
The feature_groups function creates charts that group features based on categorical properties,
allowing for comparison between different categories or classes. This is particularly useful for analyzing
how different groups of features differ in their measured properties, such as comparing environmental
conditions between different land cover types or climate zones.
In this example, we compare average January temperatures between ecoregions classified as “Warm” and
“Cold”, demonstrating how categorical grouping can reveal climate-based patterns (Figure 98):
ecoregions = ee.FeatureCollection("projects/google/charts_feature_example")
features = ecoregions.select("[0-9][0-9]_ppt|label")
features = ee.FeatureCollection("projects/google/charts_feature_example")
x_property = "label"
y_property = "01_tmean"
series_property = "warm"
title = "Average January Temperature by Ecoregion"
colors = ["#cf513e", "#1d6b99"]
labels = ["Warm", "Cold"]
fig = chart.feature_groups(
features,
x_property,
y_property,
series_property,
title=title,
colors=colors,
x_label="Ecoregion",
y_label="January Temperature (°C)",
legend_location="top-right",
labels=labels,
)
fig
23.14.1.5. feature_histogram
The feature_histogram function creates frequency distribution charts that show how values are
distributed across a dataset. Histograms are essential for understanding data characteristics such as central
tendency, spread, and the presence of outliers or multiple modes. This chart type is particularly valuable
for quality assessment and statistical analysis of geospatial data.
This example generates a histogram showing the distribution of July precipitation values across the
northwestern United States, revealing the range and frequency of precipitation levels in this region
(Figure 99):
source = ee.ImageCollection("OREGONSTATE/PRISM/Norm91m").toBands()
region = ee.Geometry.Rectangle(-123.41, 40.43, -116.38, 45.14)
features = source.sample(region, 5000)
geemap.ee_to_df(features.limit(5).select(["07_ppt"]))
property = "07_ppt"
title = "July Precipitation Distribution for NW USA"
fig = chart.feature_histogram(
features,
property,
max_buckets=None,
title=title,
x_label="Precipitation (mm)",
y_label="Pixel Count",
colors=["#1d6b99"],
)
fig
Figure 99: A histogram showing the distribution of July precipitation in the northwest United States.
23.14.2. Charting Images
Image-based charts extract statistical information from Earth Engine Images and present it in graphical
form. These charts are particularly useful for analyzing how image values vary across different geographic
regions, spectral bands, or statistical measures. Unlike feature-based charts that work with pre-computed
attributes, image charts perform spatial aggregation operations to derive statistics from pixel values.
23.14.2.1. image_by_region
The image_by_region function computes statistics from an image across different spatial regions, then
visualizes these statistics as a chart. This approach is particularly useful for comparing environmental
conditions across different administrative boundaries, ecoregions, or other spatial units.
This example extracts monthly temperature data from PRISM climate images and displays average
temperature values for different ecoregions (Figure 100):
ecoregions = ee.FeatureCollection("projects/google/charts_feature_example")
image = (
    ee.ImageCollection("OREGONSTATE/PRISM/Norm91m").toBands().select("[0-9][0-9]_tmean")
)
labels = calendar.month_abbr[1:]  # a list of month labels, e.g. ['Jan', 'Feb', ...]
title = "Average Monthly Temperature by Ecoregion"
fig = chart.image_by_region(
image,
ecoregions,
reducer="mean",
scale=500,
x_property="label",
title=title,
x_label="Ecoregion",
y_label="Temperature",
labels=labels,
)
fig
Figure 100: A chart showing the average monthly temperature by ecoregion.
23.14.2.2. image_regions
Generates a chart showing average monthly precipitation across regions, using properties to represent
months and comparing precipitation levels (Figure 101):
ecoregions = ee.FeatureCollection("projects/google/charts_feature_example")
image = (
    ee.ImageCollection("OREGONSTATE/PRISM/Norm91m").toBands().select("[0-9][0-9]_ppt")
)
keys = [str(x).zfill(2) + "_ppt" for x in range(1, 13)]
values = calendar.month_abbr[1:]  # a list of month labels, e.g. ['Jan', 'Feb', ...]
x_properties = dict(zip(keys, values))
title = "Average Ecoregion Precipitation by Month"
colors = ["#f0af07", "#0f8755", "#76b349"]
fig = chart.image_regions(
image,
ecoregions,
reducer="mean",
scale=500,
series_property="label",
x_labels=x_properties,
title=title,
colors=colors,
x_label="Month",
y_label="Precipitation (mm)",
legend_location="top-left",
)
fig
23.14.2.3. image_by_class
Displays spectral signatures of ecoregions by wavelength, showing reflectance values across spectral
bands for comparison (Figure 102):
ecoregions = ee.FeatureCollection("projects/google/charts_feature_example")
image = (
ee.ImageCollection("MODIS/061/MOD09A1")
.filter(ee.Filter.date("2018-06-01", "2018-09-01"))
.select("sur_refl_b0[0-7]")
.mean()
.select([2, 3, 0, 1, 4, 5, 6])
)
wavelengths = [469, 555, 655, 858, 1240, 1640, 2130]
fig = chart.image_by_class(
image,
class_band="label",
region=ecoregions,
reducer="MEAN",
scale=500,
x_labels=wavelengths,
title="Ecoregion Spectral Signatures",
x_label="Wavelength (nm)",
y_label="Reflectance (x1e4)",
colors=["#f0af07", "#0f8755", "#76b349"],
legend_location="top-left",
interpolation="basis",
)
fig
23.14.2.4. image_histogram
Generates histograms to visualize MODIS surface reflectance distribution across red, NIR, and SWIR
bands, providing insights into data distribution by band (Figure 103):
image = (
ee.ImageCollection("MODIS/061/MOD09A1")
.filter(ee.Filter.date("2018-06-01", "2018-09-01"))
.select(["sur_refl_b01", "sur_refl_b02", "sur_refl_b06"])
.mean()
)
region = ee.Geometry.Rectangle([-112.60, 40.60, -111.18, 41.22])
fig = chart.image_histogram(
image,
region,
scale=500,
max_buckets=200,
min_bucket_width=1.0,
max_raw=1000,
max_pixels=int(1e6),
title="MODIS SR Reflectance Histogram",
labels=["Red", "NIR", "SWIR"],
colors=["#cf513e", "#1d6b99", "#f0af07"],
)
fig
Figure 103: A histogram showing the distribution of MODIS surface reflectance values.
23.14.3.1. image_series
The image_series function creates time series charts from Earth Engine ImageCollections, showing
how values change over time for a specific location or region. This chart type is fundamental for moni-
toring environmental change, tracking seasonal cycles, and detecting trends or anomalies in geospatial
data.
This example creates a time series showing vegetation health indicators (NDVI and EVI) for a forest region
over a decade, revealing seasonal patterns and inter-annual variability (Figure 104):
forest = ee.FeatureCollection("projects/google/charts_feature_example").filter(
    ee.Filter.eq("label", "Forest")
)
colors = ["#e37d05", "#1d6b99"]
title = "Average Vegetation Index Value by Date for Forest"
x_label = "Date"
y_label = "Vegetation index (x1e4)"
veg_indices = (
ee.ImageCollection("MODIS/061/MOD13A1")
.filter(ee.Filter.date("2010-01-01", "2020-01-01"))
.select(["NDVI", "EVI"])
)
fig = chart.image_series(
veg_indices,
region=forest,
reducer=ee.Reducer.mean(),
scale=500,
x_property="system:time_start",
chart_type="LineChart",
x_cols="date",
y_cols=["NDVI", "EVI"],
colors=colors,
title=title,
x_label=x_label,
y_label=y_label,
legend_location="right",
)
fig
Figure 104: A time series chart showing the average vegetation index value by date for a forest region.
23.14.3.2. image_series_by_region
Shows NDVI time series by region, comparing desert, forest, and grassland areas. Each region has a unique
series for easy comparison (Figure 105):
ecoregions = ee.FeatureCollection("projects/google/charts_feature_example")
veg_indices = (
ee.ImageCollection("MODIS/061/MOD13A1")
.filter(ee.Filter.date("2010-01-01", "2020-01-01"))
.select(["NDVI"])
)
fig = chart.image_series_by_region(
veg_indices,
regions=ecoregions,
reducer=ee.Reducer.mean(),
band="NDVI",
scale=500,
x_property="system:time_start",
series_property="label",
chart_type="LineChart",
x_cols=x_cols,
y_cols=y_cols,
title=title,
x_label=x_label,
y_label=y_label,
colors=colors,
stroke_width=3,
legend_location="bottom-left",
)
fig
Figure 105: A time series chart showing the average NDVI value by date for different ecoregions.
23.14.3.3. image_doy_series
Plots average vegetation index by day of year for a specific region, showing seasonal vegetation patterns
over a decade (Figure 106):
# Import the example feature collection and subset the grassland feature.
grassland = ee.FeatureCollection("projects/google/charts_feature_example").filter(
ee.Filter.eq("label", "Grassland")
)
fig = chart.image_doy_series(
image_collection=veg_indices,
region=grassland,
scale=500,
chart_type="LineChart",
title=title,
x_label=x_label,
y_label=y_label,
colors=colors,
stroke_width=5,
)
fig
Figure 106: A time series chart showing the average NDVI value by day of year for a grassland region.
23.14.3.4. image_doy_series_by_year
Compares NDVI by day of year across two different years, showing how vegetation patterns vary between
2012 and 2019 (Figure 107):
# Import the example feature collection and subset the grassland feature.
grassland = ee.FeatureCollection("projects/google/charts_feature_example").filter(
ee.Filter.eq("label", "Grassland")
)
# Load MODIS vegetation indices data and subset years 2012 and 2019.
veg_indices = (
ee.ImageCollection("MODIS/061/MOD13A1")
.filter(
ee.Filter.Or(
ee.Filter.date("2012-01-01", "2013-01-01"),
ee.Filter.date("2019-01-01", "2020-01-01"),
)
)
.select(["NDVI", "EVI"])
)
fig = chart.doy_series_by_year(
veg_indices,
band_name="NDVI",
region=grassland,
scale=500,
chart_type="LineChart",
colors=colors,
title=title,
x_label=x_label,
y_label=y_label,
stroke_width=5,
)
fig
Figure 107: A time series chart showing the average NDVI value by day of year for a grassland region.
23.14.3.5. image_doy_series_by_region
Visualizes average NDVI by day of year across regions, showcasing seasonal changes for desert, forest,
and grassland ecoregions (Figure 108):
# Import the example feature collection and subset the grassland feature.
ecoregions = ee.FeatureCollection("projects/google/charts_feature_example")
fig = chart.image_doy_series_by_region(
veg_indices,
"NDVI",
ecoregions,
region_reducer="mean",
scale=500,
year_reducer=ee.Reducer.mean(),
start_day=1,
end_day=366,
series_property="label",
stroke_width=5,
chart_type="LineChart",
title=title,
x_label=x_label,
y_label=y_label,
colors=colors,
legend_location="right",
)
fig
Figure 108: A time series chart showing the average NDVI value by day of year by ecoregion.
# Import the example feature collection and subset the forest feature.
forest = ee.FeatureCollection("projects/google/charts_feature_example").filter(
ee.Filter.eq("label", "Forest")
)
# Load a MODIS surface reflectance composite (as above) and reduce it over the
# forest region to per-band pixel value lists. The opening lines were missing
# here; this reconstruction assumes a 2000 m reduction scale.
pixel_vals = (
    ee.ImageCollection("MODIS/061/MOD09A1")
    .filter(ee.Filter.date("2018-06-01", "2018-09-01"))
    .select("sur_refl_b0[0-7]")
    .mean()
    .reduceRegion(**{"reducer": ee.Reducer.toList(), "geometry": forest, "scale": 2000})
)
# Convert NIR and SWIR value lists to an array to be plotted along the y-axis.
y_values = pixel_vals.toArray(["sur_refl_b02", "sur_refl_b06"])
# Get the red band value list; to be plotted along the x-axis.
x_values = ee.List(pixel_vals.get("sur_refl_b01"))
fig = chart.array_values(
y_values,
axis=1,
x_labels=x_values,
series_names=["NIR", "SWIR"],
chart_type="ScatterChart",
colors=colors,
title=title,
x_label="Red reflectance (x1e4)",
y_label="NIR & SWIR reflectance (x1e4)",
default_size=15,
xlim=(0, 800),
)
fig
Figure 109: A scatter plot showing the relationship between different spectral bands in a forest area.
# Define a transect line and a longitude/latitude image (reconstructed lines;
# the coordinates reuse the Sierra Nevada transect from the raster chapter section above).
transect = ee.Geometry.LineString(
    [[-120.2232, 36.3148], [-118.9269, 36.7121], [-117.2022, 36.7562]]
)
lat_lon_img = ee.Image.pixelLonLat()
# Import a digital surface model and add latitude and longitude bands.
elev_img = ee.Image("USGS/SRTMGL1_003").select("elevation").addBands(lat_lon_img)
# Reduce elevation and coordinate bands by transect line; get a dictionary with
# band names as keys, pixel values as lists.
elev_transect = elev_img.reduceRegion(
reducer=ee.Reducer.toList(),
geometry=transect,
scale=1000,
)
# Get longitude and elevation value lists from the reduction dictionary.
lon = ee.List(elev_transect.get("longitude"))
elev = ee.List(elev_transect.get("elevation"))
# Sort the longitude and elevation values by ascending longitude.
lon_sort = lon.sort(lon)
elev_sort = elev.sort(lon)
fig = chart.array_values(
elev_sort,
x_labels=lon_sort,
series_names=["Elevation"],
chart_type="AreaChart",
colors=["#1d6b99"],
title="Elevation Profile Across Longitude",
x_label="Longitude",
y_label="Elevation (m)",
stroke_width=5,
fill="bottom",
fill_opacities=[0.4],
ylim=(0, 2500),
)
fig
# Import a Landsat 8 collection and filter to a single path/row.
col = ee.ImageCollection("LANDSAT/LC08/C02/T1_L2").filter(
ee.Filter.expression("WRS_PATH == 45 && WRS_ROW == 30")
)
# Reduce image properties to a series of lists; one for each selected property.
propVals = col.reduceColumns(
reducer=ee.Reducer.toList().repeat(2),
selectors=["CLOUD_COVER", "GEOMETRIC_RMSE_MODEL"],
).get("list")
# Get selected image property value lists; to be plotted along x and y axes.
x = ee.List(ee.List(propVals).get(0))
y = ee.List(ee.List(propVals).get(1))
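A minimal sketch of the scatter chart for these two property lists (the title, labels, and styling are assumptions):
fig = chart.array_values(
    y,
    x_labels=x,
    series_names=["RMSE"],
    chart_type="ScatterChart",
    colors=["#96356f"],
    title="Landsat 8 Cloud Cover vs. Geometric RMSE",
    x_label="Cloud cover (%)",
    y_label="Geometric RMSE (m)",
    default_size=15,
)
fig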
Figure 111: A scatter plot showing the relationship between cloud cover and geometric RMSE for Landsat
8 images.
import math
start = -2 * math.pi
end = 2 * math.pi
points = ee.List.sequence(start, end, None, 50)
def sin_func(val):
return ee.Number(val).sin()
values = points.map(sin_func)
fig = chart.array_values(
values,
points,
chart_type="LineChart",
colors=["#39a8a7"],
title="Sine Function",
x_label="radians",
y_label="sin(x)",
marker="circle",
)
fig
23.14.5. Charting Data Table
DataTable charts provide the most flexible approach to visualization by working with structured tabular
data. This method is particularly useful when you need to combine Earth Engine data with external
datasets, perform complex calculations, or create custom visualizations that don’t fit standard templates.
DataTables can be created manually or computed from Earth Engine operations.
import pandas as pd
data = {
"State": ["CA", "NY", "IL", "MI", "OR"],
"Population": [37253956, 19378102, 12830632, 9883640, 3831074],
}
df = pd.DataFrame(data)
df
fig = chart.Chart(
df,
x_cols=["State"],
y_cols=["Population"],
chart_type="ColumnChart",
colors=["#1d6b99"],
title="State Population (US census, 2010)",
x_label="State",
y_label="Population",
)
fig
Figure 113: A chart showing the population of each state in the US.
# Import the example feature collection and subset the forest feature.
forest = ee.FeatureCollection("projects/google/charts_feature_example").filter(
ee.Filter.eq("label", "Forest")
)
# Build a feature collection where each feature has a property that represents
# a DataFrame row.
def aggregate(img):
# Reduce the image to the mean of pixels intersecting the forest ecoregion.
stat = img.reduceRegion(
**{"reducer": ee.Reducer.mean(), "geometry": forest, "scale": 500}
    )
    # Build a list representing one DataTable row: [date, EVI, NDVI] (reconstructed lines).
    date = ee.Date(img.get("system:time_start")).format("YYYY-MM-dd")
    return img.set("row", ee.List([date, stat.get("EVI"), stat.get("NDVI")]))
reduction_table = veg_indices.map(aggregate)
# Aggregate the 'row' property from all features in the new feature collection
# to make a server-side 2-D list (DataTable).
data_table_server = reduction_table.aggregate_array("row")
# Define column names and properties for the DataTable. The order should
# correspond to the order in the construction of the 'row' property above.
column_header = ee.List([["Date", "EVI", "NDVI"]])
# Prepend the header, fetch the table client-side, and build a DataFrame
# (reconstruction of the missing conversion step).
data_table = column_header.cat(data_table_server).getInfo()
data_table = pd.DataFrame(data_table[1:], columns=data_table[0])
fig = chart.Chart(
data_table,
chart_type="LineChart",
x_cols="Date",
y_cols=["EVI", "NDVI"],
colors=["#e37d05", "#1d6b99"],
title="Average Vegetation Index Value by Date for Forest",
x_label="Date",
y_label="Vegetation index (x1e4)",
stroke_width=3,
legend_location="right",
)
fig
Figure 114: A chart showing the average NDVI value by date for a forest region.
# Define a band of interest (NDVI), import the MODIS vegetation index dataset,
# and select the band.
band = "NDVI"
ndvi_col = ee.ImageCollection("MODIS/061/MOD13Q1").select(band)
# Map over the collection to add a day of year (doy) property to each image.
def set_doy(img):
doy = ee.Date(img.get("system:time_start")).getRelative("day", "year")
# Add 8 to day of year number so that the doy label represents the middle of
# the 16-day MODIS NDVI composite.
return img.set("doy", ee.Number(doy).add(8))
ndvi_col = ndvi_col.map(set_doy)
# Join all coincident day of year observations into a set of image collections.
distinct_doy = ndvi_col.filterDate("2013-01-01", "2014-01-01")
filter = ee.Filter.equals(**{"leftField": "doy", "rightField": "doy"})
join = ee.Join.saveAll("doy_matches")
join_col = ee.ImageCollection(join.apply(distinct_doy, ndvi_col, filter))
# Calculate the absolute range, interquartile range, and median for the set
# of images composing each coincident doy observation group. The result is
# an image collection with an image representative per unique doy observation
# with bands that describe the 0, 25, 50, 75, 100 percentiles for the set of
# coincident doy images.
def cal_percentiles(img):
doyCol = ee.ImageCollection.fromImages(img.get("doy_matches"))
return doyCol.reduce(
ee.Reducer.percentile([0, 25, 50, 75, 100], ["p0", "p25", "p50", "p75",
"p100"])
).set({"doy": img.get("doy")})
comp = ee.ImageCollection(join_col.map(cal_percentiles))
def order_percentiles(img):
stats = ee.Dictionary(
img.reduceRegion(
**{"reducer": ee.Reducer.first(), "geometry": geometry, "scale": 250}
)
)
# Order the percentile reduction elements according to how you want columns
# in the DataTable arranged (x-axis values need to be first).
row = ee.List(
[
img.get("doy"),
stats.get(band + "_p50"),
stats.get(band + "_p0"),
stats.get(band + "_p25"),
stats.get(band + "_p75"),
stats.get(band + "_p100"),
]
    )
    return img.set("row", row)
reduction_table = comp.map(order_percentiles)
# Define column names and properties for the DataTable. The order should
# correspond to the order in the construction of the 'row' property above.
column_header = ee.List([["DOY", "median", "p0", "p25", "p75", "p100"]])
# Aggregate the rows, prepend the header, and build a DataFrame
# (reconstruction of the missing conversion step).
data_table = column_header.cat(reduction_table.aggregate_array("row")).getInfo()
df = pd.DataFrame(data_table[1:], columns=data_table[0])
fig = chart.Chart(
df,
chart_type="IntervalChart",
x_cols="DOY",
y_cols=["p0", "p25", "median", "p75", "p100"],
title="Annual NDVI Time Series with Inter-Annual Variance",
x_label="Day of Year",
y_label="Vegetation index (x1e4)",
stroke_width=1,
fill="between",
fill_colors=["#b6d1c6", "#83b191", "#83b191", "#b6d1c6"],
fill_opacities=[0.6] * 4,
labels=["p0", "p25", "median", "p75", "p100"],
display_legend=True,
legend_location="top-right",
ylim=(0, 10000),
)
fig
Figure 115: A chart showing the annual NDVI time series with inter-annual variance.
Analytical Workflows: The chapter covered essential operations including data filtering, zonal statistics,
map algebra, and data export. These foundational skills enable complex analytical workflows that leverage
Earth Engine’s computational power for scientific research and operational applications.
The paradigm shift to cloud-based geospatial analysis opens new possibilities for research and analysis at
scales previously accessible only to major institutions, democratizing access to planetary-scale environ-
mental monitoring and analysis capabilities.
23.16. Exercises
23.16.1. Exercise 1: Visualizing DEM Data
Find a DEM dataset in the Earth Engine Data Catalog and clip it to a specific area (e.g., your country, state,
or city). Display it with an appropriate color palette. For example, the sample map below shows the DEM
of the state of Colorado.
23.16.5. Exercise 5: Visualizing Land Cover Change
Use the USGS National Land Cover Database and US Census States to create a split-panel map for
visualizing land cover change (2001-2019) for a US state of your choice. Make sure you add the NLCD
legend to the map.
Chapter 24. Hyperspectral Data Visualization with HyperCoast
24.1. Introduction
Hyperspectral remote sensing captures detailed spectral information across hundreds of narrow wave-
length bands, representing one of the most advanced forms of Earth observation technology. Unlike
traditional multispectral imagery that typically uses 3-10 broad bands, hyperspectral sensors record
reflectance values across 200-300 or more contiguous spectral bands, creating a continuous spectrum for
each pixel.
This rich spectral information enables scientists to identify and analyze materials, vegetation types, water
quality parameters, and geological features with unprecedented precision. Each pixel contains a complete
spectral signature that can distinguish between materials that appear identical in traditional RGB or
multispectral imagery.
NASA’s Earth Surface Mineral Dust Source Investigation (EMIT)117 instrument on the International Space
Station provides leading hyperspectral data. While designed to map mineral composition of dust source
regions, EMIT’s applications extend to vegetation monitoring, water quality assessment, and environ-
mental change detection.
The HyperCoast118 Python package supports reading and visualizing hyperspectral data from various mis-
sions, including AVIRIS119, NEON120, PACE121, EMIT, and DESIS122, along with datasets like ECOSTRESS123.
Users can interactively explore hyperspectral data, extract spectral signatures, modify band combinations
and colormaps, visualize data in 3D, and perform interactive slicing and thresholding operations. By
leveraging the earthaccess124 package, HyperCoast provides tools for searching NASA’s hyperspectral
data archives. This makes HyperCoast a versatile tool for working with hyperspectral data globally, with
particular strength in coastal regions.
117 https://earth.jpl.nasa.gov/emit
118 https://hypercoast.org
119 https://aviris.jpl.nasa.gov
120 https://data.neonscience.org/data-products/DP3.30006.001
121 https://pace.gsfc.nasa.gov
122 https://tinyurl.com/nasa-desis
123 https://ecostress.jpl.nasa.gov
124 https://github.com/nsidc/earthaccess
24.3. Environment Setup
Before working with hyperspectral data, we need to install the HyperCoast package along with its
dependencies. Uncomment and run the following cell to install the required packages.
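A typical install cell looks like the following (the optional extra dependencies group is an assumption; install only what you need):
# %pip install "hypercoast[extra]"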
Once installed, we can import the HyperCoast library, which provides all the functions we’ll need for this
chapter.
import hypercoast
Accessing NASA’s hyperspectral data requires authentication through the Earthdata Login system. This
system ensures proper access credentials and helps NASA track data usage for scientific purposes. Regis-
tration is free and provides access to vast archives of Earth observation data.
To download and access data, create an Earthdata login at urs.earthdata.nasa.gov125. Once registered, run
the following cell and enter your NASA Earthdata credentials.
hypercoast.nasa_earth_login()
The search function returns two objects: a list of dataset results and a GeoDataFrame containing spatial
footprints. Bounding box coordinates use longitude-latitude format (west, south, east, north), while
temporal ranges use ISO date format strings.
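A sketch of such a search call, assuming HyperCoast's search_emit function; the bounding box and dates below are placeholders chosen to roughly match the Florida coast example that follows:
results, gdf = hypercoast.search_emit(
    bbox=(-83.0, 25.0, -81.0, 29.0),  # placeholder (west, south, east, north)
    temporal=("2024-04-01", "2024-05-16"),  # placeholder ISO date strings
)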
Let’s plot the footprints of the returned datasets on a map (see Figure 116):
125 https://urs.earthdata.nasa.gov
gdf.explore()
Figure 116: The EMIT image footprints covering the Florida coast.
This interactive map shows the spatial coverage of each EMIT dataset matching your search criteria. The
footprints help you understand data availability and identify datasets that best cover your area of interest.
Run the following cell to download the first dataset. Note that downloads may take time as hyperspectral
data files are large (typically 1-2 GB). Data is stored in the data directory.
hypercoast.download_emit(results[:1], out_dir="data")
Figure 117: The interactive search GUI for the EMIT dataset.
Search results are automatically stored in the map object as a GeoDataFrame, allowing you to examine
metadata and attributes of each dataset programmatically. Run the following cell to see the first few rows.
if hasattr(m, "_NASA_DATA_GDF"):
    display(m._NASA_DATA_GDF.head())
You can download the first dataset from search results by running the following cell. Note that downloads
may take time as the data files are large and stored in the data directory.
hypercoast.download_emit(results[:1], out_dir="data")
url = "https://github.com/opengeos/datasets/releases/download/hypercoast/EMIT_L2A_RFL_001_20240404T161230_2409511_009.nc"
filepath = "data/EMIT_L2A_RFL_001_20240404T161230_2409511_009.nc"
hypercoast.download_file(url, filepath, quiet=True)
The dataset filename follows NASA’s naming convention, encoding important metadata including
processing level (L2A), product type (RFL for reflectance), acquisition date, and orbit information. Under-
standing these conventions helps organize and manage hyperspectral data collections.
24.6. Reading Hyperspectral Data
Once downloaded, hyperspectral data must be read and processed into a format suitable for analysis.
HyperCoast handles the complexities of EMIT data format and structure, providing a clean interface to
the underlying spectral information.
Read the downloaded EMIT data and process it as an xarray.Dataset with 285 bands:
dataset = hypercoast.read_emit(filepath)
dataset
The resulting xarray Dataset provides structured representation of hyperspectral data, including coordi-
nates, data variables, and attributes. The 285 spectral bands span visible through shortwave infrared
portions of the electromagnetic spectrum, typically covering wavelengths from approximately 380 to 2500
nanometers. Each band represents a narrow spectral slice, enabling detailed analysis of surface materials.
m = hypercoast.Map()
m.add_basemap("SATELLITE")
m.add_emit(
    dataset, wavelengths=[1000, 600, 500], vmin=0, vmax=0.3, layer_name="EMIT"
)
m.add("spectral")
m
Figure 118: The interactive GUI for changing the band combination for the EMIT dataset.
This interactive visualization demonstrates several key capabilities. The wavelength selection (1000, 600,
500 nm) creates a false-color composite enhancing vegetation and water features. The spectral profiling
tool allows you to click any pixel to extract its complete spectral signature, revealing detailed reflectance
characteristics across all wavelengths (see Figure 119). This capability is fundamental to hyperspectral
analysis, enabling material identification and quantitative analysis based on spectral properties.
Figure 119: The interactive GUI for extracting spectral signatures from the EMIT dataset.
The extracted spectral signature can be saved to a CSV file for further analysis. Click the “Save” button
on the toolbar to save the signature.
Subsetting the data serves multiple purposes: reducing computational requirements, focusing analysis
on areas with valid data, and eliminating edge effects that can occur in satellite imagery. The selected
area should contain diverse land cover types to demonstrate hyperspectral data’s spectral discrimination
capabilities.
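A minimal sketch of such a spatial subset; the coordinate bounds are placeholders, and the slice order depends on how the latitude coordinate is stored in the dataset:
ds = dataset.sel(longitude=slice(-90.1, -89.6), latitude=slice(30.1, 29.7))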
Let’s visualize the EMIT data in 3D with an RGB image overlaid on top of the 3D plot (see Figure 120).
p = hypercoast.image_cube(
    ds,
    variable="reflectance",
    cmap="jet",
    clim=(0, 0.4),
    rgb_wavelengths=[1000, 700, 500],
    rgb_gamma=2,
    title="EMIT Reflectance",
)
p.show()
A smaller spatial subset ensures responsive interaction while maintaining sufficient detail for meaningful
analysis. The selected area should ideally contain contrasting materials or land cover types to demonstrate
spectral discrimination. Run the following cell to create a smaller subset and visualize it (Figure 121).
p = hypercoast.image_cube(
    ds,
    variable="reflectance",
    cmap="jet",
    clim=(0, 0.5),
    rgb_wavelengths=[1000, 700, 500],
    rgb_gamma=2,
    title="EMIT Reflectance",
    widget="plane",
)
p.add_text("Band slicing", position="upper_right", font_size=14)
p.show()
Figure 121: The interactive slicing widget for the EMIT dataset.
The slicing widget provides an intuitive way to navigate through the spectral dimension. As you move
the slicing plane up and down, you’re viewing different wavelength “slices” of the hyperspectral cube.
This interaction reveals how different materials respond to various wavelengths, helping identify optimal
bands for specific applications. Drag the plane up and down to slice the data in 3D.
The interactive slicing capability demonstrates how spectral information varies across wavelengths. Some
features may be prominent in certain spectral ranges while invisible in others, highlighting the importance
of multi-band analysis in hyperspectral remote sensing.
24.10. Interactive Thresholding
Thresholding is a fundamental technique for isolating pixels with specific reflectance characteristics. In
hyperspectral data, thresholding can help identify materials with known spectral properties or highlight
areas with unusual spectral signatures warranting further investigation.
Run the following cell to display the interactive thresholding widget (see Figure 122). Note that this
function does not work in the Google Colab environment.
p = hypercoast.image_cube(
    ds,
    variable="reflectance",
    cmap="jet",
    clim=(0, 0.5),
    rgb_wavelengths=[1000, 700, 500],
    rgb_gamma=2,
    title="EMIT Reflectance",
    widget="threshold",
)
p.add_text("Thresholding", position="upper_right", font_size=14)
p.show()
Figure 122: The interactive thresholding widget for the EMIT dataset.
The thresholding widget allows you to interactively set reflectance cutoff values, revealing only pixels
meeting your criteria. This technique is particularly useful for preliminary material classification or
identifying areas of interest for detailed analysis. Drag the slider to adjust the threshold value and see
corresponding pixels with reflectance values above the threshold.
24.11. Key Takeaways
This chapter introduced fundamental concepts and techniques for working with hyperspectral data using
the HyperCoast package:
Hyperspectral Data Structure: Hyperspectral data is three-dimensional with two spatial dimensions
and one spectral dimension, enabling more sophisticated analysis than traditional multispectral imagery.
The continuous spectral coverage from instruments like EMIT allows detailed material discrimination
and quantitative surface property analysis.
Data Discovery and Access: The workflow begins with data discovery and acquisition through both
programmatic and interactive search methods. Authentication through NASA Earthdata services provides
access to extensive archives of high-quality hyperspectral datasets.
Visualization Techniques: False-color composites reveal features invisible to the human eye, while
spectral profiling enables quantitative material property analysis. Three-dimensional visualization
through image cubes provides intuitive understanding of spatial-spectral relationships.
Interactive Exploration: Slicing and thresholding widgets facilitate data exploration and analysis,
helping identify optimal wavelength ranges for specific applications and enabling preliminary material
classification based on spectral characteristics.
These fundamental techniques provide the foundation for advanced hyperspectral analysis methods,
including automated classification, spectral unmixing, and quantitative parameter retrieval.
24.12. Exercises
Practice your understanding of hyperspectral data visualization with these hands-on exercises:
24.12.4. Exercise 4: Interactive Visualization
Create an interactive map visualization of your hyperspectral data with multiple band combinations.
Add spectral profiling capabilities and extract profiles from at least five different locations. Export the
spectral profiles and create comparative plots showing how different materials exhibit distinct spectral
characteristics.
Chapter 25. High-Performance Geospatial Analytics with
DuckDB
25.1. Introduction
DuckDB126 is a revolutionary analytical database engine that transforms how we work with geospatial
data. In today’s data-driven world, geospatial datasets are growing exponentially in both size and
complexity—from GPS tracking data and satellite imagery to urban planning datasets and environmental
monitoring systems. Traditional data processing tools often struggle with these demands, leading to slow
analysis, complex setup requirements, and limitations in handling diverse data formats.
DuckDB addresses these challenges by bringing high-performance, SQL-based analysis capabilities
directly to your geospatial workflows. Think of it as having the power of a sophisticated database engine
embedded right in your Python environment, without the overhead of managing database servers or
complex infrastructure.
126 https://duckdb.org
• Work with geometry data types and spatial functions
• Perform spatial relationships analysis
• Execute high-performance spatial joins and operations
• Visualize geospatial data with Python libraries
• Conduct large-scale geospatial data analysis
The duckdb package provides the core DuckDB database engine with Python bindings, while leafmap
and pandas offer additional utilities for working with geospatial data and data analysis in Python.
These configuration settings optimize our Jupyter environment for spatial analysis. The
autopandas = True setting automatically converts SQL query results to pandas DataFrames for easy
manipulation. Setting feedback = False reduces verbose output to keep notebooks clean, while
displaycon = False hides connection details from the output.
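A possible configuration cell, assuming the jupysql (or ipython-sql) SQL magic provides these settings:
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False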
import duckdb
import leafmap
import pandas as pd
# Connect to persistent file-based database
con = duckdb.connect('filename.db')
con.sql("FROM duckdb_extensions();")
DuckDB only shows the first 10 rows and the last 10 rows of a table. To show all rows, you can convert
the table to a pandas DataFrame using the .df() method:
con.sql("FROM duckdb_extensions();").df()
127 https://opengeos.org/data/duckdb/cities.csv
128 https://opengeos.org/data/duckdb/countries.csv
%%sql
SELECT * FROM 'https://opengeos.org/data/duckdb/cities.csv';
In this query, SELECT * means “show me all columns,” while FROM 'https://...' tells DuckDB to
read data directly from a web URL. The %%sql magic command allows us to write SQL in a Jupyter
notebook cell.
%%sql
SELECT * FROM 'https://opengeos.org/data/duckdb/countries.csv';
25.4.4. Creating Persistent Tables
While reading data directly from URLs is convenient for exploration, creating tables gives us several
advantages including faster subsequent queries (data is cached locally), the ability to index the data for
better performance, and more consistent column names and data types.
%%sql
CREATE OR REPLACE TABLE cities AS SELECT * FROM 'https://opengeos.org/data/duckdb/cities.csv';
In this command, CREATE OR REPLACE TABLE creates a new table (or replaces if it exists), cities is
our chosen table name, and AS SELECT * means “populate this table with all data from the source.”
Let’s also create a table for the countries dataset:
%%sql
CREATE OR REPLACE TABLE countries AS SELECT * FROM 'https://opengeos.org/data/duckdb/countries.csv';
%%sql
FROM cities;
Note: FROM cities is shorthand for SELECT * FROM cities . This simplified syntax makes data
exploration faster.
%%sql
FROM countries;
%%sql
SELECT * FROM cities LIMIT 10;
Breaking this down: SELECT * means “show all columns,” FROM cities specifies which table to query,
and LIMIT 10 restricts output to the first 10 rows (useful for large datasets).
%%sql
SELECT name, country FROM cities LIMIT 10;
This approach is important for geospatial work because it reduces data transfer and memory usage,
focuses your analysis on relevant attributes, and makes results easier to read and understand.
%%sql
SELECT DISTINCT country FROM cities LIMIT 10;
Common use cases include determining how many different countries are represented, what are the
unique administrative regions, and which categories exist in a land-use classification.
%%sql
SELECT COUNT(*) FROM cities;
What this tells us: The total number of cities in our dataset.
%%sql
SELECT COUNT(DISTINCT country) FROM cities;
What this tells us: How many different countries are represented.
%%sql
SELECT MAX(population) FROM cities;
%%sql
SELECT MIN(population) FROM cities;
%%sql
SELECT AVG(population) FROM cities;
What this tells us: The average city population across all cities.
Aggregations are important in geospatial analysis because they help you understand the scale and distri-
bution of your data, identify outliers that might need special attention, calculate summary statistics for
regions or categories, and validate data quality before spatial operations.
%%sql
SELECT * FROM cities WHERE population > 1000000 LIMIT 10;
Understanding this query: WHERE population > 1000000 filters for cities with more than 1 million
people, which is equivalent to finding “major cities” or “megacities.” Common spatial filters might include
elevation ranges, area thresholds, or specific regions.
%%sql
SELECT * FROM cities ORDER BY population DESC LIMIT 10;
This query shows us several important concepts: ORDER BY population DESC sorts by population in
descending order (largest first), LIMIT 10 gives us the top 10 most populous cities, and this represents
a common pattern in geospatial analysis for finding the largest, smallest, northernmost, etc.
%%sql
SELECT country, COUNT(*) as city_count, AVG(population) as avg_population
FROM cities
GROUP BY country
ORDER BY city_count DESC
LIMIT 10;
Breaking down this complex query: GROUP BY country creates groups for each unique
country, COUNT(*) as city_count counts how many cities are in each country,
AVG(population) as avg_population calculates the average city population per country, and
ORDER BY city_count DESC shows countries with the most cities first.
Grouping is important in geospatial analysis for understanding regional patterns (population distribution,
economic activity), calculating statistics by administrative boundaries, comparing characteristics across
different areas, and preparing data for choropleth maps and regional visualizations.
The fetchall() method returns results as Python lists and tuples, making it best for small result sets,
when you need basic Python data structures, and for simple data processing without pandas overhead.
The df() method returns results as pandas DataFrames, making it best for data analysis and manipu-
lation, integration with other pandas-based workflows, and when you need pandas’ rich functionality
(groupby, pivot, etc.).
The fetchnumpy() method returns results as NumPy arrays, making it best for high-performance
numerical computations, machine learning workflows, and when working with large numerical datasets.
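For example, the same query result can be retrieved in all three ways:
query = "SELECT country, COUNT(*) AS city_count FROM cities GROUP BY country"
rows = con.sql(query).fetchall()      # list of tuples
result_df = con.sql(query).df()       # pandas DataFrame
arrays = con.sql(query).fetchnumpy()  # dictionary of NumPy arrays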
This capability is revolutionary because there’s no need to import data into a database first, you can
combine SQL’s power with Python’s data processing, and you can mix DataFrame operations with SQL
operations seamlessly.
Practical applications for geospatial work include loading spatial data from various sources into
DataFrames, using SQL for complex spatial queries, using pandas for data cleaning and preprocessing,
and combining both approaches in a single workflow.
Once you are done with the analysis, you can close the connection to the database by calling the
close() method on the connection object.
You can also use a context manager to create a connection to a DuckDB database. This is useful for creating
a connection to a database and then closing it automatically when you’re done.
with duckdb.connect('file.db') as con:
    con.sql(
        'CREATE TABLE IF NOT EXISTS cities AS FROM read_csv_auto("https://opengeos.org/data/duckdb/cities.csv")'
    )
    con.table('cities').show(max_rows=5)
    # the context manager closes the connection automatically
url = "https://opengeos.org/data/duckdb/cities.zip"
leafmap.download_file(url, unzip=True)
Auto-detection provides several benefits: it automatically determines column types (integer, float, text),
detects delimiters and quote characters, and handles common CSV variations without manual configu-
ration.
You should specify options when auto-detection fails, for better performance with very large files, or
when you need specific data types.
Parallel processing offers significant advantages by being dramatically faster for large files and utilizing
multiple CPU cores, which is especially beneficial for files with millions of rows.
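For example, you can pass options to read_csv explicitly; here the header flag, delimiter, and full-file type sampling are shown (the option values are illustrative):
con.sql("""
SELECT *
FROM read_csv('cities.csv', header=true, delim=',', sample_size=-1)
LIMIT 5;
""")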
25.6.5. DataFrames
Pandas DataFrames stored in local variables can be queried as if they are regular tables within DuckDB.
df = pd.read_csv('cities.csv')
df
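For example, DuckDB can scan the DataFrame stored in the local variable df directly:
con.sql("SELECT * FROM df LIMIT 5")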
con.load_extension('spatial')
Once the spatial extension is loaded, we can check the list of supported spatial file formats with the
following command:
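A likely form of that command uses DuckDB's ST_Drivers table function:
con.sql("SELECT * FROM ST_Drivers()")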
129 https://duckdb.org/docs/stable/core_extensions/spatial/overview.html
With the spatial extension loaded, we can read spatial data from various formats into DuckDB using the
ST_Read function.
First, let’s read a GeoJSON file:
# Read GeoJSON
con.sql("SELECT * FROM ST_Read('cities.geojson')")
Note that the geom column is a WKB (Well-Known Binary) column, which is a binary representation of
the geometry. DuckDB can read this column as a geometry object.
To create a table from a GeoJSON file, use the CREATE TABLE statement with the ST_Read function.
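For example, mirroring the Shapefile example below (the table name reuses cities, as displayed next):
con.sql(
    """
    CREATE TABLE IF NOT EXISTS cities AS
    SELECT * FROM ST_Read('cities.geojson')
    """
)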
con.table('cities')
The ST_Read function can read data in any vector format supported by GDAL. For example, we can read
an ESRI Shapefile into DuckDB:
con.sql(
"""
CREATE TABLE IF NOT EXISTS cities2 AS
SELECT * FROM ST_Read('cities.shp')
"""
)
con.table('cities2')
con = duckdb.connect()
con.load_extension('spatial')
con.sql("""
CREATE TABLE IF NOT EXISTS cities AS
FROM 'https://opengeos.org/data/duckdb/cities.parquet'
""")
con.table("cities").show()
# Export to CSV
con.sql("COPY cities TO 'cities.csv' (HEADER, DELIMITER ',')")
con.sql(
    "COPY (SELECT * FROM cities WHERE country='USA') TO 'cities_us.csv' (HEADER, DELIMITER ',')"
)
con.sql(
    "COPY (SELECT * EXCLUDE geometry FROM cities) TO 'cities.xlsx' WITH (FORMAT GDAL, DRIVER 'XLSX')"
)
con.sql(
    "COPY (SELECT * FROM cities WHERE country='USA') TO 'cities_us.parquet' (FORMAT PARQUET)"
)
# Export to GeoJSON
con.sql("COPY cities TO 'cities.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON')")
con.sql(
    "COPY (SELECT * FROM cities WHERE country='USA') TO 'cities_us.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON')"
)
# Export to Shapefile
con.sql("COPY cities TO 'cities.shp' WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile')")
# Export to GeoPackage
con.sql("COPY cities TO 'cities.gpkg' WITH (FORMAT GDAL, DRIVER 'GPKG')")
con.sql("SHOW TABLES;")
25.8.3. Creating and Understanding Geometries
Let’s create examples of each geometry type to understand how they work:
con.sql("""
-- NOTE: the original table definition is not shown in this excerpt; the
-- example geometries and their coordinates below are illustrative only.
CREATE OR REPLACE TABLE samples (name VARCHAR, geom GEOMETRY);
INSERT INTO samples VALUES
    ('Point', ST_GeomFromText('POINT(-100 40)')),
    ('Linestring', ST_GeomFromText('LINESTRING(0 0, 1 1, 2 1, 2 2)')),
    ('Polygon', ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'));
""")
Understanding Well-Known Text (WKT)130 is essential for working with geometries. The
ST_GeomFromText() function creates geometries from WKT format, which is a standard text represen-
tation for geometries. Coordinates are typically in (X, Y) or (longitude, latitude) order, and the spatial
reference system affects how these coordinates are interpreted.
When printing a geometry, we can use the ST_AsText() function to get the WKT representation of the
geometry.
To save the geometry to a file, we can use the COPY statement with the FORMAT GDAL and
DRIVER 'GeoJSON' options.
con.sql("""
COPY samples TO 'samples.geojson' (FORMAT GDAL, DRIVER GeoJSON);
""")
con.sql("""
SELECT ST_AsText(geom)
FROM samples
WHERE name = 'Point';
""")
130 https://tinyurl.com/y3m3aeod
con.sql("""
SELECT ST_X(geom), ST_Y(geom)
FROM samples
WHERE name = 'Point';
""")
con.sql("""
SELECT * FROM nyc_subway_stations
""")
Select the name and geom columns from the nyc_subway_stations table:
con.sql("""
SELECT name, ST_AsText(geom)
FROM nyc_subway_stations
LIMIT 10;
""")
con.sql("""
SELECT ST_AsText(geom)
FROM samples
WHERE name = 'Linestring';
""")
con.sql("""
SELECT ST_Length(geom)
FROM samples
WHERE name = 'Linestring';
""")
25.8.6. Working with Polygons
Polygons represent areas with boundaries:
con.sql("""
SELECT ST_AsText(geom)
FROM samples
WHERE name = 'Polygon';
""")
Calculate the area and perimeter of the polygon using the ST_Area() and ST_Perimeter() functions:
con.sql("""
SELECT
ST_Area(geom) as area,
ST_Perimeter(geom) as perimeter
FROM samples
WHERE name = 'Polygon';
""")
25.9.2. Working with Real Spatial Data
Let’s bring these concepts to life using real data from New York City, specifically subway stations and
neighborhood boundaries.
First, let’s find a subway station by name:
This query finds stations whose geometry precisely matches the given point—useful for quality control
or deduplication.
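A hypothetical sketch of such an exact-match query, reusing the point from the distance example below (ST_Equals and the coordinates are assumptions):
con.sql("""
SELECT name
FROM nyc_subway_stations
WHERE ST_Equals(geom, ST_GeomFromText('POINT(583571 4506714)'));
""")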
# Find features within distance
con.sql("""
SELECT COUNT(*) as nearby_stations
FROM nyc_subway_stations
WHERE ST_DWithin(geom, ST_GeomFromText('POINT(583571 4506714)'), 500);
""")
What we’re seeing in these tables: each table has a geometry column containing spatial data, additional
attribute columns provide descriptive information, and the spatial join will combine these based on where
features intersect geographically.
con.sql("""
SELECT
    subways.name AS subway_name,
    neighborhoods.name AS neighborhood_name,
    neighborhoods.boroname AS borough
FROM nyc_neighborhoods AS neighborhoods
JOIN nyc_subway_stations AS subways
ON ST_Intersects(neighborhoods.geom, subways.geom)
WHERE subways.NAME = 'Broad St';
""")
con.sql("""
SELECT DISTINCT COLOR FROM nyc_subway_stations;
""")
This helps identify available filters for downstream analysis (e.g., focusing on stations served by the RED
line).
Let’s identify which RED line subway stations intersect with NYC neighborhoods:
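A sketch of such a query; the table and geometry columns follow the examples above, while the color filter is an assumption:
con.sql("""
SELECT subways.name AS subway_name, neighborhoods.name AS neighborhood_name
FROM nyc_neighborhoods AS neighborhoods
JOIN nyc_subway_stations AS subways
ON ST_Intersects(neighborhoods.geom, subways.geom)
WHERE strpos(subways.color, 'RED') > 0;
""")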
This query tells us where red-line stations are located and which neighborhoods they serve—valuable for
understanding transit coverage or planning service changes.
25.10.5. Distance-Based Analysis
Before zooming into specific locations, it helps to get a citywide demographic baseline. Here we compute
the overall racial composition of NYC using census block data:
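A hypothetical sketch, assuming a nyc_census_blocks table with the population columns used in the classic PostGIS NYC workshop:
con.sql("""
SELECT
    100.0 * SUM(popn_white) / SUM(popn_total) AS white_pct,
    100.0 * SUM(popn_black) / SUM(popn_total) AS black_pct
FROM nyc_census_blocks;
""")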
This acts as a benchmark for comparison with smaller areas of interest, such as those near subway stations.
Suppose we want to evaluate racial composition within walking distance of subway stations that serve
the A train. First, we identify which routes include ‘A’:
con.sql("""
SELECT DISTINCT routes
FROM nyc_subway_stations AS subways
WHERE strpos(subways.routes,'A') > 0;
""")
Now we can calculate demographics for census blocks located within 200 meters of these A-train stations:
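A hypothetical sketch of that proximity query, under the same schema assumptions as above:
con.sql("""
SELECT
    100.0 * SUM(popn_white) / SUM(popn_total) AS white_pct,
    100.0 * SUM(popn_black) / SUM(popn_total) AS black_pct
FROM nyc_census_blocks AS census
JOIN nyc_subway_stations AS subways
ON ST_DWithin(census.geom, subways.geom, 200)  -- blocks within 200 m of a station
WHERE strpos(subways.routes, 'A') > 0;
""")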
This type of proximity analysis can be used to study equitable access to public transit or to explore
neighborhood-level disparities in infrastructure benefits.
# Create subway lines reference table
con.sql("""
CREATE OR REPLACE TABLE subway_lines ( route char(1) );
INSERT INTO subway_lines (route) VALUES
('A'),('B'),('C'),('D'),('E'),('F'),('G'),
('J'),('L'),('M'),('N'),('Q'),('R'),('S'),
('Z'),('1'),('2'),('3'),('4'),('5'),('6'),
('7');
""")
Next, we join this table with subway stations and census blocks to calculate demographics near each route:
This enables comparative analysis across transit corridors—essential for transit equity assessments,
neighborhood investment planning, and targeting of public services.
# Connect and prepare for large-scale analysis
con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")
Let’s begin by querying wetlands data for a single state (e.g., Washington, D.C.).
You can inspect the schema of the dataset to understand its structure:
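For example, DESCRIBE reports the column names and types by reading only the Parquet metadata:
con.sql("""
DESCRIBE SELECT *
FROM 's3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet'
""")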
con.sql(f"""
SELECT COUNT(*) AS Count
FROM 's3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet'
""")
Result: There are over 38 million wetland features in the United States, and the query completes in just
a few seconds.
# The beginning of this query is not shown in the excerpt; the SELECT clause
# below is an assumed reconstruction that counts wetland features per source file.
df = con.sql(f"""
SELECT filename, COUNT(*) AS Count
FROM read_parquet('s3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet', filename=true)
GROUP BY filename
ORDER BY COUNT(*) DESC;
""").df()
df.head()
con.sql(
"""
SELECT * FROM states INNER JOIN wetlands ON states.id = wetlands.State
"""
)
# Join wetlands count with state geometries for visualization
file_path = "states_with_wetlands.geojson"
con.sql(f"""
COPY (
SELECT s.name, s.id, w.Count, s.geometry
FROM states s
JOIN wetlands w ON s.id = w.State
ORDER BY w.Count DESC
) TO '{file_path}' WITH (FORMAT GDAL, DRIVER 'GeoJSON')
""")
m = leafmap.Map()
m.add_data(
    file_path,
    column="Count",
    scheme="Quantiles",
    cmap="Greens",
    legend_title="Wetland Count",
)
m
Figure 123: Choropleth map showing the number of wetlands per state.
25.11.4. Wetland Distribution Charts
25.11.4.1. Pie Chart of Wetlands by State
The pie chart below shows the relative proportion of wetlands in each U.S. state. This visualization helps
identify which states contribute the most to the national wetland count (Figure 124).
leafmap.pie_chart(
count_df, "State", "Count", height=800, title="Number of Wetlands by State"
)
Figure 124: The pie chart shows the number of wetlands in each state.
Figure 125: The bar chart shows the number of wetlands in each state.
con.sql(
f"""
SELECT SUM(Shape_Area) / 1000000 AS Area_SqKm
FROM 's3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet'
"""
)
area_df = con.sql(
    f"""
SELECT SUBSTRING(filename, LENGTH(filename) - 18, 2) AS State, SUM(Shape_Area) / 1000000 AS Area_SqKm
FROM read_parquet('s3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet', filename=true)
GROUP BY State
ORDER BY COUNT(*) DESC;
"""
).df()
area_df.head(10)
area_df.head(10)
25.11.5.3. Pie Chart of Wetland Area by State
This chart represents the proportion of total U.S. wetland area located in each state. It emphasizes states
that contribute most to the country’s total wetland coverage (Figure 126).
leafmap.pie_chart(
area_df, "State", "Area_SqKm", height=850, title="Wetland Area by State"
)
Figure 126: The pie chart shows the area of wetlands in each state.
Figure 127: The bar chart shows the area of wetlands in each state.
Simplicity: The embedded architecture means you can start analyzing spatial data immediately without
complex database setup or administration.
Integration: Seamless integration with Python’s ecosystem (pandas, GeoPandas, visualization libraries)
allows you to combine SQL’s power with Python’s flexibility.
Accessibility: By using familiar SQL syntax with spatial extensions, DuckDB makes advanced spatial
analysis techniques accessible to anyone with basic database knowledge.
25.13. Exercises
Sample Datasets:
The following datasets are used in the exercises. You don’t need to download them manually; they can be
accessed directly from the notebook.
• nyc_subway_stations.tsv131
• nyc_neighborhoods.tsv132
131 https://opengeos.org/data/duckdb/nyc_subway_stations.tsv
132 https://opengeos.org/data/duckdb/nyc_neighborhoods.tsv
25.13.4. Exercise 4: Sorting Results
Write a SQL query to list the subway stations in the nyc_subway_stations dataset in alphabetical order
by their names.
25.13.11. Exercise 11: Basic Setup and Data Loading
Connect to a DuckDB database and install the httpfs and spatial extensions.
Calculate the total population of all countries in the database using the POP_EST column.
Select countries in Europe with a population greater than 10 million and order them by population in
descending order.
Save the results of the previous query as a new table called europe .
133 https://www.naturalearthdata.com/downloads/10m-cultural-vectors
134 https://beta.source.coop/cholmes/nyc-taxi-zones/taxi_zones.parquet
Find out the unique values in the borough column and order them alphabetically.
Find out the total area of all buildings in the selected country.
135 https://beta.source.coop/cholmes/google-open-buildings/v2/geoparquet-admin1
Chapter 26. Geospatial Data Processing with GDAL and OGR
26.1. Introduction
If you’ve ever opened a satellite image in QGIS, converted a shapefile to GeoJSON, or transformed
coordinates between different map projections, you’ve almost certainly used GDAL—even if you didn’t
realize it. GDAL136 (Geospatial Data Abstraction Library) and its vector counterpart OGR form the invisible
foundation beneath virtually every piece of geospatial software you encounter.
Think of GDAL as the universal translator for geospatial data. Just as Google Translate can convert
between dozens of human languages, GDAL can read and write over 200 different raster formats and 80+
vector formats. Whether you’re working with NASA satellite imagery stored in HDF5 format, vintage
survey data in an obscure proprietary format, or modern web-friendly GeoJSON files, GDAL provides a
consistent, unified way to access and manipulate all of this data.
This universality makes GDAL incredibly powerful, but it can also make it seem intimidating at first.
The library handles an enormous range of data types and formats, which means it has many functions
and options. However, understanding the core concepts and most common operations will give you the
foundation to handle the vast majority of geospatial data challenges you’ll encounter.
GDAL operates on a simple but powerful principle: it provides drivers for different data formats and
presents a unified interface for working with spatial data regardless of the underlying format. This means
you can use the same basic approach whether you’re reading a GeoTIFF image or a NetCDF climate
dataset, or whether you’re working with an ESRI Shapefile or a PostGIS database.
The relationship between GDAL and higher-level Python libraries is crucial to understand. Libraries like
GeoPandas and Rasterio that we’ve explored in previous chapters are actually built on top of GDAL. They
provide more Pythonic, user-friendly interfaces while GDAL handles the heavy lifting of format support
and data transformation behind the scenes. This chapter teaches you to work directly with GDAL, giving
you deeper control and understanding of what happens beneath those convenient abstractions.
136 https://gdal.org
• Use GDAL to read and inspect raster and vector datasets
• Extract metadata and spatial reference information from geospatial files
• Perform coordinate transformations and reprojections
• Convert between different geospatial data formats
• Perform raster and vector analysis
• Manage fields and layers
• Tiling and merging raster and vector data
• Perform terrain analysis
wget https://github.com/opengeos/datasets/releases/download/gdal/gdal_sample_data.zip
unzip gdal_sample_data.zip
137 https://github.com/opengeos/datasets/releases/tag/gdal
26.5.1. Examining Raster Data
The gdalinfo command is your primary tool for understanding raster datasets. It reveals everything
from basic dimensions and data types to complex metadata and coordinate reference systems. Under-
standing this information is essential before performing any transformations or analysis.
gdalinfo dem.tif
This command reveals the raster’s dimensions, coordinate reference system, geotransform parameters
(which define how pixel coordinates map to real-world coordinates), and any embedded metadata. Pay
particular attention to the coordinate system information, as this determines how the data relates to
locations on Earth.
ogrinfo buildings.geojson
The basic ogrinfo command shows you the available layers and their basic properties. However, you
often need more detailed information about the layer contents and structure.
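For example, the summary flags can be combined:
ogrinfo -al -so buildings.geojson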
The -al flag shows information about all layers, while -so provides a summary only, which is faster
for large datasets. This combination gives you the essential information about field names, data types,
coordinate systems, and spatial extents without loading all the feature data.
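For example, to reproject the elevation model to WGS84 (the output filename is illustrative):
gdalwarp -t_srs EPSG:4326 dem.tif dem_wgs84.tif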
This command reprojects the digital elevation model from whatever coordinate system it’s currently in
to WGS84 geographic coordinates (EPSG:4326). The transformation process involves several steps: GDAL
determines the source coordinate system, calculates the transformation parameters, creates a new pixel
grid in the target coordinate system, and resamples the elevation values to populate the new grid.
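The vector equivalent uses ogr2ogr; for example (output filename assumed):
ogr2ogr -t_srs EPSG:3857 buildings_3857.geojson buildings.geojson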
This command transforms building footprints from their original coordinate system to Web Mercator
(EPSG:3857), which is commonly used for web mapping applications. The transformation changes the
coordinate values but preserves the geometric relationships between features.
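For example, converting to ERDAS IMAGINE uses the HFA driver (output filename assumed):
gdal_translate -of HFA dem.tif dem.img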
The gdal_translate command converts the GeoTIFF elevation model to ERDAS IMAGINE format.
While both formats store similar information, they have different internal structures and capabilities.
GDAL handles these differences transparently while preserving as much information as possible.
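For example (output filename assumed):
ogr2ogr -f GPKG buildings.gpkg buildings.geojson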
This conversion transforms GeoJSON data to GeoPackage138 format. GeoPackage is often preferable for
analysis because it supports spatial indexing, can store multiple layers in a single file, and handles attribute
data more efficiently than GeoJSON.
138 https://www.geopackage.org
ogr2ogr las_vegas_bbox.fgb las_vegas_bbox.geojson
FlatGeobuf (FGB)139 is an increasingly popular format for large vector datasets because it provides very
fast access to spatial data through an efficient internal spatial index and a compact binary encoding.
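For example, a raster can be clipped to a vector boundary (the input raster and output filenames are assumed):
gdalwarp -cutline las_vegas_bbox.geojson -crop_to_cutline landsat.tif landsat_las_vegas.tif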
The -cutline parameter specifies the vector boundary to use for clipping, while -crop_to_cutline
ensures that the output raster is cropped to the minimum bounding rectangle of the clipping boundary.
This creates a smaller, more focused dataset that contains only the pixels within the specified boundary.
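The vector counterpart uses -clipsrc; for example (the roads dataset and output names are assumed):
ogr2ogr -clipsrc las_vegas_bbox.geojson roads_las_vegas.geojson roads.geojson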
This command clips road features to the Las Vegas boundary, retaining only those road segments that
intersect with the boundary polygon. Features that cross the boundary are split at the boundary line.
Let’s examine the bounding box coordinates so we can also demonstrate coordinate-based clipping:
Now we can perform the same clipping operation using explicit coordinate bounds:
This approach is useful when you know the exact coordinates of your study area or when working with
regular grids or tiles.
139 https://flatgeobuf.org
26.9. Raster Analysis and Calculations
GDAL provides powerful capabilities for raster analysis, including band extraction, mathematical opera-
tions, and index calculations. These operations form the foundation for more complex remote sensing and
spatial analysis workflows.
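For example, individual bands can be extracted with gdal_translate; the band numbers below assume a Landsat 8/9-style ordering and are illustrative:
gdal_translate -b 5 landsat.tif nir.tif
gdal_translate -b 4 landsat.tif red.tif
gdal_translate -b 3 landsat.tif green.tif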
These commands extract the near-infrared, red, and green bands from a Landsat image. Each band
captures different wavelengths of light, providing information about different surface properties. The
near-infrared band is particularly useful for vegetation analysis because healthy vegetation strongly
reflects near-infrared light.
gdal_calc.py \
-A nir.tif \
-B red.tif \
--outfile=ndvi.tif \
--calc="(A.astype(float) - B) / (A + B + 1e-6)" \
--type=Float32 \
--NoDataValue=-9999
This calculation computes the Normalized Difference Vegetation Index (NDVI), a widely used indicator of
vegetation health and density. The formula (NIR - Red) / (NIR + Red) produces values between −1
and 1, where higher values indicate healthier, denser vegetation. The small epsilon value (1e-6) prevents
division by zero in areas where both bands have zero values.
Sometimes calculated indices need to be constrained to their theoretical ranges:
gdal_calc.py \
-A ndvi.tif \
--outfile=ndvi_clipped.tif \
--calc="(A < -1)*-1 + (A > 1)*1 + ((A >= -1) * (A <= 1))*A" \
--type=Float32 \
--NoDataValue=-9999 \
--overwrite
This expression clamps NDVI values to the valid range of −1 to 1, which can be necessary when working
with noisy data or when calculation errors produce out-of-range values.
You can also create binary masks based on threshold values:
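For example (output filename assumed):
gdal_calc.py -A ndvi.tif --outfile=vegetation_mask.tif --calc="A>0.3" --type=Byte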
This creates a binary mask where pixels with NDVI greater than 0.3 are marked as vegetation. Such masks
are useful for area calculations and as inputs to other analysis processes.
Managing NoData values is crucial in raster analysis:
These commands set the NoData value for the binary vegetation mask, ensuring that zero values are
properly treated as “no vegetation” rather than missing data.
26.10.1. Vectorization
Vectorization converts raster data into vector polygons, typically grouping adjacent pixels with the same
value into discrete geometric features.
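For example, using gdal_polygonize.py (the input mask and output filenames are assumed):
gdal_polygonize.py -f GeoJSON building_mask.tif buildings_vectorized.geojson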
This process traces the boundaries between different pixel values in the building mask raster, creating
polygon features that represent individual buildings or groups of connected building pixels. The resulting
vector data can then be analyzed using traditional GIS operations like area calculations, spatial joins, and
topology analysis.
26.10.2. Rasterization
Rasterization converts vector data into raster format, which can be useful for spatial analysis, modeling,
or integration with other raster datasets.
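For example (the output filename is assumed):
gdal_rasterize -a uid -tr 0.6 0.6 buildings.geojson buildings_uid.tif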
This command rasterizes building polygons, using the ‘uid’ attribute to assign pixel values. The -tr
parameter sets the target resolution (0.6 x 0.6 meters), which should be chosen based on the scale of your
analysis and the precision of your source data.
Alternatively, you can create a simple presence/absence mask:
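For example (the output filename is assumed):
gdal_rasterize -burn 1 -tr 0.6 0.6 buildings.geojson buildings_mask.tif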
The -burn option assigns a fixed value (1) to all pixels that intersect building polygons, creating a binary
mask that shows where buildings are present.
This command combines reprojection with simplification, removing vertices that are less than 1 meter
apart. The simplification tolerance should be chosen based on your data accuracy requirements and
intended use. Larger tolerance values create simpler geometries but may lose important shape details.
Now we can dissolve the state boundaries by country, combining all states that belong to the same country:
This SQL query uses PostGIS-style spatial functions to union geometries that share the same country
attribute. The result is a simplified dataset with one feature per country rather than one per state.
ogr2ogr -explodecollections hawaii_parts.geojson hawaii.geojson
This operation converts each part of a multi-part geometry into a separate feature, which can be useful for
individual island analysis or when working with software that doesn’t handle multi-part geometries well.
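Attribute subsetting uses the -select flag; for example (output filename assumed):
ogr2ogr -select id,name us_states_subset.geojson us_states.geojson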
This creates a new dataset containing only the ‘id’ and ‘name’ fields from the original state boundaries.
This approach is particularly valuable when sharing data or when creating focused datasets for specific
analyses.
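For example (the input file is assumed; the output name matches the ALTER TABLE example below):
ogr2ogr -f GPKG -nln states us_states_rename.gpkg us_states.geojson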
The -nln (new layer name) parameter creates a layer called ‘states’ in the output GeoPackage, which is
more descriptive than the default layer name derived from the filename.
ogrinfo us_states_rename.gpkg -sql "ALTER TABLE states ADD COLUMN area DOUBLE"
This SQL command adds a new ‘area’ field to store calculated area values. The field can then be populated
using additional SQL commands or analysis operations.
26.13.1. Creating Raster Tiles
Tiling large rasters improves processing efficiency and enables parallel processing of different parts of
a dataset.
mkdir -p tiles
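A possible tiling command, following the description below (the input filename is assumed):
gdal_retile.py -ps 512 512 -co "TILED=YES" -targetDir tiles landsat.tif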
This command splits the Landsat image into 512x512 pixel tiles, stored in the ‘tiles’ directory. The
-co "TILED=YES" option creates internally tiled GeoTIFF files, which provide better performance for
many operations.
The gdal_merge.py script is straightforward for simple merging operations, combining all the tiles back
into a single file. This approach works well for datasets that were previously tiled.
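For example (output filename assumed):
gdal_merge.py -o landsat_merged.tif tiles/*.tif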
Using gdalwarp for merging provides more control over the process and can handle overlapping areas in
more sophisticated ways, including blending and resampling options.
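For example (output filename assumed):
gdalwarp tiles/*.tif landsat_mosaic.tif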
mkdir -p states
ogrinfo -ro -al -geom=NO us_states.geojson | grep "id (" | awk '{print $4}' | while read id; do
    ogr2ogr -f GeoJSON "states/${id}.geojson" us_states.geojson -where "id = '${id}'"
done
This script creates individual GeoJSON files for each state, naming each file with the state’s ID. This
approach is useful for distributed processing or when different users need access to specific geographic
areas.
Merging vector files back together requires careful attention to maintaining consistent schemas and
coordinate systems:
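A possible form, assuming an existing merged shapefile named merged_states.shp:
ogr2ogr -update -append merged_states.shp hawaii.geojson -nln merged_states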
The -update -append flags add features from the Hawaii dataset to the existing merged shapefile,
combining datasets from different sources into a unified product.
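For example, to resample the DEM (output filename assumed):
gdalwarp -tr 100 100 -r average dem.tif dem_100m.tif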
This command resamples the digital elevation model to 100-meter resolution using average resampling.
The choice of resampling method is important: ‘average’ is appropriate for continuous data like elevation,
while ‘nearest’ is better for categorical data like land cover classifications.
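For example, stacking the extracted bands into one composite (filenames assumed):
gdal_merge.py -separate -o composite.tif nir.tif red.tif green.tif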
The -separate flag ensures that each input file becomes a separate band in the output composite,
maintaining the spectral information from each individual band.
For large datasets, Virtual Raster (VRT) files provide an efficient alternative:
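For example (filenames assumed):
gdalbuildvrt -separate composite.vrt nir.tif red.tif green.tif
gdal_translate -co COMPRESS=DEFLATE composite.vrt composite_from_vrt.tif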
VRT files don’t store pixel data directly but instead reference the source files, making them very efficient
for creating composites from large datasets. The final translation step creates the actual composite file
with compression.
26.14.3. Handling Missing Data
Real-world raster datasets often contain gaps or missing values that need to be addressed before analysis.
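For example (output filename assumed):
gdal_fillnodata.py -md 5 dem.tif dem_filled.tif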
This interpolation algorithm fills small gaps in the elevation data using surrounding values. The -md 5
parameter limits gap-filling to areas smaller than 5 pixels, preventing the algorithm from filling large data
voids where interpolation would be unreliable.
Properly defining NoData values is crucial for accurate analysis:
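For example:
gdal_edit.py -a_nodata 0 dem.tif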
This ensures that zero values in the elevation data are properly recognized as missing data rather than
sea level elevations.
Cloud Optimized GeoTIFF (COG) format includes internal tiling and overviews that enable efficient access
to subsets of large raster datasets over network connections, making it ideal for web mapping and cloud-
based analysis.
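For example, dem.tif can be converted to a COG with DEFLATE compression (output filename assumed):
gdal_translate -of COG -co COMPRESS=DEFLATE dem.tif dem_cog.tif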
By default, GDAL computes slope in degrees, measuring the angle between the surface and horizontal. The
-compute_edges flag ensures that slope values are calculated even at the edges of the dataset, preventing
data loss around the boundaries.
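For example, slope in degrees (output filename assumed):
gdaldem slope dem.tif slope.tif -of GTiff -compute_edges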
For certain applications, slope as percent rise is more intuitive:
gdaldem slope dem.tif slope_percent.tif \
-of GTiff \
-compute_edges \
-p
Percent slope represents the vertical rise over horizontal distance, expressed as a percentage. A 45-degree
slope equals 100% slope.
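Aspect is derived in the same way; for example (output filename assumed):
gdaldem aspect dem.tif aspect.tif -compute_edges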
The output aspect is measured in degrees from 0° (North) clockwise to 360°, with flat areas assigned a
value of −1. This information is essential for solar radiation modeling and ecological analysis.
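A standard hillshade can be generated as follows (output filename assumed):
gdaldem hillshade dem.tif hillshade.tif -compute_edges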
Figure 128: The hillshade visualization of the digital elevation model.
Standard hillshading assumes a single light source, typically positioned in the northwest. This creates
dramatic relief visualization but can sometimes obscure details in areas facing away from the light source
(Figure 129).
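For example (output filename assumed):
gdaldem hillshade dem.tif hillshade_multi.tif -multidirectional -compute_edges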
Figure 129: The multidirectional hillshade visualization of the digital elevation model.
Multidirectional hillshading uses multiple light sources to create more balanced illumination, revealing
terrain details that might be lost in standard single-source hillshading.
This colormap defines color transitions from low elevations (blue) through middle elevations (green/
yellow) to high elevations (brown/white), creating an intuitive representation of terrain (Figure 130).
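A possible color ramp file and command; the elevation breakpoints and RGB values are illustrative, and the output name matches the blending step below:
cat > color_ramp.txt << 'EOF'
0     30  100 200
500   90  200 120
1500  240 230 140
2500  160 110  70
3500  255 255 255
EOF
gdaldem color-relief dem.tif color_ramp.txt color_relief.tif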
Figure 130: The color relief visualization of the digital elevation model.
The color relief process interpolates colors between the defined elevation breakpoints, creating smooth
color transitions that highlight elevation patterns.
gdal_calc.py \
-A color_relief.tif \
-B hillshade.tif \
--outfile=shaded_relief.tif \
--calc="((A.astype(float) * 0.6) + (B.astype(float) * 0.4))" \
--type=Byte
Figure 131: The shaded relief visualization of the digital elevation model.
This blending approach combines 60% color information with 40% hillshading, creating a visualization
that shows both elevation patterns and surface texture.
For more sophisticated control over the blending process:
gdal_calc.py \
-A color_relief.tif \
-B hillshade.tif \
--outfile=color_hillshade.tif \
--calc="A * (B / 255.0)" \
--type=Byte
Since color relief images typically have three bands (RGB), precise blending requires processing each color
band separately:
Then combine the processed bands into a final composite (Figure 132).
Figure 132: The color shaded relief visualization of the digital elevation model.
This approach provides the highest quality terrain visualization by preserving the full color information
while adding realistic shadowing effects.
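For example (the attribute name and output filename are assumed):
gdal_contour -a elevation -i 100 -f GeoJSON dem.tif contours.geojson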
This command generates contour lines at 100-meter intervals, creating vector features that represent lines
of equal elevation (Figure 133). The interval should be chosen based on the terrain characteristics and the
intended use of the contours.
Figure 133: The contour lines of the digital elevation model.
26.17. References and Further Reading
For an in-depth introduction to GDAL, check out the course on Mastering GDAL Tools140 by
SpatialThoughts141. This course provides a practical hands-on introduction to GDAL and OGR command-
line programs.
For more information about GDAL, please visit the following links:
• GDAL Documentation142
• GDAL Cheat Sheet143
26.18. Exercises
26.18.1. Exercise 1: Data Inspection and Understanding
Objective: Practice examining geospatial datasets to understand their structure and properties.
Tasks:
1. Use gdalinfo to examine the landsat.tif file and answer these questions:
• What is the coordinate reference system?
• What are the dimensions (width x height) of the image?
• How many bands does it contain?
• What is the pixel size/resolution?
2. Use ogrinfo to examine the us_states.geojson file and determine:
• How many features (states) are in the dataset?
• What attributes/fields are available?
• What is the coordinate reference system?
• What is the spatial extent (bounding box)?
140 https://courses.spatialthoughts.com/gdal-tools.html
141 https://spatialthoughts.com
142 https://gdal.org
143 https://github.com/dwtkns/gdal-cheat-sheet
26.18.3. Exercise 3: Format Conversion and Optimization
Objective: Practice converting between different formats and optimizing data storage.
Tasks:
1. Convert the buildings.geojson to three different formats:
• ESRI Shapefile (.shp)
• GeoPackage (.gpkg)
• FlatGeobuf (.fgb)
2. Convert the dem.tif to a Cloud Optimized GeoTIFF with DEFLATE compression
3. Compare the file sizes of all formats and discuss the advantages/disadvantages of each
4. Create a custom color relief map with at least 5 elevation classes
Chapter 27. Building Interactive Dashboards with Voilà and
Solara
27.1. Introduction
Creating interactive web applications from geospatial analyses has traditionally required expertise in web
development technologies like HTML, CSS, and JavaScript. However, modern Python frameworks have
simplified this process significantly, allowing geospatial analysts to build sophisticated web applications
using familiar Python syntax and tools.
This chapter introduces you to two powerful frameworks for creating interactive web applications from
your Python geospatial analyses: Voilà and Solara. These tools bridge the gap between data analysis
and web deployment, transforming your Jupyter notebooks and Python scripts into interactive web
applications that others can use without needing Python expertise or technical knowledge.
Voilà144 represents the simplest approach to web application development for geospatial analysts. This
framework transforms existing Jupyter notebooks into standalone web applications by removing code
cells and preserving only outputs and interactive widgets. The result is a clean, professional interface that
allows users to interact with your geospatial data and visualizations without seeing the underlying code.
This approach excels for sharing analysis results, creating dashboards, or building simple geospatial tools
where the primary goal is to present findings in an accessible format.
Solara145 takes a more sophisticated approach by providing a reactive web framework specifically
designed for data science applications. Built on modern web technologies while maintaining Python-first
development, Solara enables you to create complex interactive applications using pure Python code. The
framework excels at handling state management and real-time updates, making it particularly well-suited
for dynamic geospatial applications that require multi-page interfaces, advanced user interactions, and
responsive design elements.
Both frameworks integrate seamlessly with Leafmap’s MapLibre backend, enabling you to create stunning
3D globe visualizations, interactive maps with advanced styling, and sophisticated geospatial analysis
tools. They also support deployment on cloud platforms like Hugging Face Spaces, enabling you to share
your applications with the world at no cost.
The choice between Voilà and Solara often depends on your specific needs and technical requirements.
Voilà provides the fastest path from notebook to web application, making it ideal for quickly sharing
analysis results or creating simple dashboards. Solara offers more flexibility and control, making it
better suited for applications that require complex user interactions, multiple pages, or advanced state
management.
While other popular tools like Streamlit146 and Gradio147 exist for creating interactive web applications,
they are less suitable for geospatial applications due to limited support for ipywidgets. This limitation
prevents them from capturing user interactions with maps, such as drawing polygons or selecting features
for analysis. Voilà and Solara, with their native ipywidgets support, can capture these interactions and
enable full bidirectional communication between users and geospatial interfaces.
144 https://voila.readthedocs.io
145 https://solara.dev
146 https://streamlit.io
147 https://www.gradio.app
Throughout this chapter, we’ll explore both frameworks through practical examples, starting with simple
globe visualizations and progressing to multi-page applications with interactive sidebars and real-time
data visualization capabilities.
Let’s verify that our installations are working correctly by importing the key libraries:
import voila
import solara
148 https://huggingface.co
27.4.2. Logging in to Hugging Face
To login to Hugging Face, type the following command in your terminal:
huggingface-cli login
Figure 134: Duplicate the Hugging Face space to your own account.
On the pop-up window, you can change the name of the space if you want. Change the visibility to
“Public” (see Figure 135). Then, click on “Duplicate Space” to create a new space.
It will take a few minutes to create the space. Once the space is created, you can access it at
https://huggingface.co/spaces/YOUR-USERNAME/leafmap-voila. You may need to refresh the page to see the
new space.
Figure 137: Run the Voilà application.
You can explore the source code of the application in the leafmap.ipynb notebook under the
notebooks folder. The source code looks like this:
import leafmap.maplibregl as leafmap  # MapLibre backend (import assumed; not shown in the excerpt)

m = leafmap.Map(
    style="liberty", projection="globe", sidebar_visible=True, height="750px"
)
m.add_basemap("USGS.Imagery")
cities = "https://github.com/opengeos/datasets/releases/download/world/world_cities.geojson"
m.add_geojson(cities, name="Cities")
m
This is how you would normally use the MapLibre backend in a Jupyter environment. However, Voilà
removes the source code from the notebook and only displays the output when the notebook is run,
resulting in the clean and simple dashboard shown above.
27.5.4. Exploring the File Structure of the Space
Understanding the structure of your Hugging Face space is essential for customizing and maintaining
your Voilà application. Let’s examine the key components that make up the space (see Figure 138).
The space contains several critical files, each serving a specific purpose in the application deployment
process. The README.md file serves as both the main documentation for your space and contains
essential configuration metadata (see Figure 139). You can update the title and description to reflect
your application’s purpose, but the YAML header at the top of the file must remain intact as it contains
deployment configuration. The app_port specification is particularly important as it tells Hugging Face
which port your Voilà application will use.
The Dockerfile defines the containerized environment where your application will run. This file speci-
fies the base image, installs dependencies, and sets up the runtime environment. While you typically won’t
need to modify this file for basic applications, understanding its role helps you troubleshoot deployment
issues if they arise.
The environment.yml file manages your application’s Python dependencies using conda. This file
ensures that your application has access to all necessary libraries and specific versions required for proper
functionality. You can add new packages here as your application grows in complexity, but be mindful of
package compatibility and deployment time constraints.
The notebooks folder contains the Jupyter notebooks that form the core of your application. Each
notebook in this folder becomes accessible through the Voilà interface, allowing users to interact with
your geospatial analyses without seeing the underlying code.
497
Figure 139: The README file of the space.
498
To add new files, click on the “Contribute” button to create a new file or upload existing files (see Figure 141).
Once the files are uploaded, click on the “Commit changes” button to save the changes. The Space will be
rebuilt and the changes will be reflected in the space. Wait for a few minutes for the space to be updated.
Once the space is updated, you should see the “Running” status next to the space name.
Then, you can make changes to any file in the space using VS Code. For example, you can update the
README.md file to change the title and short description of the space. You can modify the dependencies
in the environment.yml file. You can add or remove notebooks in the notebooks folder. To test the
application locally, you can run the following command:
voila notebooks/
A new tab will be opened in your browser with the application running locally. Select a notebook and
run it. The notebook will be rendered as a web application. Markdown cells will be rendered as text, and
code cells will be rendered as interactive widgets. However, the source code of the notebook will not be
rendered. To show the source code in the rendered application, you can stop the running application by
pressing Ctrl+C in the terminal. Then, run the following command:
499
voila notebooks/ --strip_sources=False
149 https://solara.dev/documentation/components
150 https://ipywidgets.readthedocs.io
500
27.6.2. Using a Leafmap Template for Solara
To streamline the development process, we will use a Leafmap template for Solara. The template is
available at https://huggingface.co/spaces/giswqs/solara-maplibre. You can duplicate the space to your
own account by clicking on the three dots in the top right corner of the space and selecting “Duplicate
Space”. In the pop-up window, you can change the name of the space if you want. Change the visibility
to “Public”. Then, click on “Duplicate Space” to create a new space. In a few minutes, the space will be
created and you can access it at https://huggingface.co/spaces/YOUR-USERNAME/solara-maplibre.
Once the space is running, you can access it at https://huggingface.co/spaces/giswqs/solara-maplibre. It has a home page and a menu bar at the top. The menu bar contains tabs for various pages (Figure 142). Click on any tab to see the content.
501
The README.md file functions as both documentation and configuration center for your web application.
Similar to the Voilà space, this file contains essential metadata in its YAML header that controls how
Hugging Face deploys and runs your application. You can customize the title and description to reflect
your application’s purpose, but the header configuration must remain intact to ensure proper deployment.
The Dockerfile defines the containerized environment for your application, specifying the base image,
dependency installation process, and runtime configuration. While you typically won’t need to modify
this file for standard geospatial applications, understanding its structure helps with troubleshooting
deployment issues and customizing the runtime environment for specialized requirements.
The requirements.txt file manages your Python dependencies using pip, listing all the libraries your
application needs to function properly. This file is critical for ensuring reproducible deployments and
should include specific version numbers for key dependencies to prevent compatibility issues during
deployment.
The pages directory represents the heart of your Solara application, containing individual Python scripts
that define each page of your multi-page application. Each script in this directory becomes a separate
page in your web application’s navigation menu. The naming convention using numerical prefixes (like
01_ , 02_ ) determines the order in which pages appear in the navigation, while the descriptive names
become the display labels for each tab.
502
The map is returned as a Solara component with m.to_solara(). If you run this code in JupyterLab, you will see a map with a 3D globe. This is the same
code as the one used by the globe page151 in the Solara web app you just created.
import solara
import leafmap.maplibregl as leafmap
def create_map():
m = leafmap.Map(
style="liberty",
projection="globe",
height="750px",
zoom=2.5,
sidebar_visible=True,
)
return m
@solara.component
def Page():
m = create_map()
return m.to_solara()
Page()
Essentially, you can use leafmap like you normally do in Jupyter notebooks. The only difference is that
you need to return the map as a solara.Component object using m.to_solara() . Place the content
within the Page component, which is the root component of the web app. Inside JupyterLab, you will
need to add Page() to the end of the cell to display the map. In a Python script, the Page component
will be automatically called without the need to add Page() to the end of the cell.
Take your time to explore other pages in the Solara web app. You will see that the code structure of each
page is very similar to the one we used above.
151 https://huggingface.co/spaces/giswqs/solara-maplibre/blob/main/pages/01_globe.py
503
import solara
import leafmap.maplibregl as leafmap
def create_map():
m = leafmap.Map(
style="positron",
projection="globe",
height="750px",
zoom=2.5,
sidebar_visible=True,
)
geojson = "https://github.com/opengeos/datasets/releases/download/world/mgrs_
grid_zone.geojson"
m.add_geojson(geojson)
m.add_labels(geojson, "GZD", text_color="white", min_zoom=2, max_zoom=10)
return m
@solara.component
def Page():
m = create_map()
return m.to_solara()
cd solara-maplibre
solara run pages/
This will start the web app on your local machine. You can access it at http://localhost:8765 . Click
on the MGRS tab to see the MGRS map you just created (Figure 144).
504
Figure 144: Visualizing the Military Grid Reference System (MGRS) with Solara.
git add .
git commit -m "Update the web app"
git push
The Hugging Face space will be rebuilt and the changes will be deployed to the space. Wait for a few
minutes until you see the Running status in the Hugging Face space. Then, you can access the web app
at https://huggingface.co/spaces/YOUR-USERNAME/solara-maplibre.
Congratulations! You have successfully created a multi-page Solara web app and deployed it to Hugging
Face Spaces. Share your web app with your friends and colleagues.
505
Voilà offers a fast path from notebook to dashboard, creating professional-looking web interfaces. The framework excels when your primary goal is to present
findings or enable basic interactions with geospatial data.
Solara offers more sophisticated capabilities for creating complex, multi-page applications with advanced
user interactions. Its component-based architecture and reactive programming model make it well-
suited for applications requiring state management, real-time updates, and complex user workflows. The
framework’s native support for ipywidgets ensures that all Leafmap functionality translates seamlessly
to web applications.
Both frameworks integrate naturally with the geospatial Python ecosystem, particularly through
Leafmap’s MapLibre backend, enabling stunning 3D visualizations and interactive mapping capabilities.
Their support for deployment on platforms like Hugging Face Spaces makes sharing applications with
global audiences both simple and cost-effective.
The choice between frameworks depends on your specific requirements: use Voilà for quick deployment
of notebook-based analyses and simple dashboards, and choose Solara for complex applications requiring
multiple pages, advanced interactivity, or sophisticated state management. Both frameworks eliminate
the traditional barrier of needing web development expertise, allowing geospatial analysts to focus on
their domain expertise while creating professional web applications.
27.8. Exercises
27.8.1. Exercise 1: Create a Simple Voilà Dashboard
Create a basic Voilà application that displays a choropleth map of world population data. Start by creating
a new Jupyter notebook that loads world country boundaries and population data152 and creates a choropleth
visualization using Leafmap. Deploy your notebook as a Voilà application to Hugging Face Spaces and
share the public URL.
152 https://github.com/opengeos/datasets/releases/download/vector/countries.geojson
153 https://github.com/opengeos/datasets/releases/tag/vector
506
Chapter 28. Distributed Computing with Apache Sedona
28.1. Introduction
In today’s data-driven world, geospatial datasets are growing exponentially in both size and complexity.
From GPS tracking data and satellite imagery to smart city sensor networks and social media geotagged
content, we’re generating spatial data at unprecedented rates. Traditional single-machine geospatial pro-
cessing tools, while excellent for moderate-sized datasets, often struggle with the scale and computational
demands of modern big geospatial data applications.
Apache Sedona154 (formerly GeoSpark) addresses this challenge by bringing distributed computing capa-
bilities to geospatial data processing. Built on top of Apache Spark155, Sedona extends Spark’s distributed
computing framework with comprehensive spatial data types, functions, and algorithms. This combina-
tion enables you to process massive geospatial datasets across clusters of machines while maintaining the
familiar SQL and DataFrame APIs that make Spark accessible to data scientists and analysts.
Apache Sedona fills a critical gap in the geospatial big data ecosystem. While traditional GIS software ex-
cels at interactive analysis and visualization, and databases like PostGIS handle spatial queries efficiently,
neither is designed for the massive-scale, distributed processing that modern geospatial applications
require. Sedona combines the best of both worlds: the analytical power of distributed computing with the
rich spatial functionality that geospatial professionals need.
The power of Sedona lies in its comprehensive approach to distributed geospatial computing. It provides
native support for standard spatial data types including Points, LineStrings, Polygons, and complex
geometries, all distributed seamlessly across Spark clusters. The framework includes sophisticated spatial
operations such as distance calculations, intersections, unions, buffers, and other spatial relationships,
all optimized to work at massive scale. Perhaps most importantly, Sedona implements advanced spatial
indexing using R-tree and Quad-tree structures, enabling efficient spatial queries on datasets containing
billions of spatial objects.
One of Sedona’s greatest strengths is its integration with the broader data ecosystem. The platform offers
full spatial SQL support through Spark SQL, making it accessible to users comfortable with SQL syntax.
It provides APIs in multiple programming languages including Scala, Java, Python, and R, ensuring
compatibility with diverse development environments.
Understanding when to use Apache Sedona is crucial for effective implementation. The platform excels
when working with datasets that exceed the capacity of traditional GIS tools, typically those involving
millions or billions of spatial records. It’s particularly valuable for batch processing scenarios where you
need to perform complex spatial analysis on massive datasets, such as analyzing years of GPS tracking
data or processing satellite imagery archives. Sedona also shines in real-time spatial analytics applications,
where streaming spatial data needs immediate processing and analysis. For organizations already using
Spark-based data pipelines, Sedona provides a natural extension that adds spatial capabilities without
requiring a complete architectural overhaul. Finally, when your analysis requirements demand scaling
across multiple machines or cloud clusters, Sedona’s distributed architecture provides the necessary
infrastructure to handle enterprise-scale geospatial workloads.
154 https://sedona.apache.org
155 https://spark.apache.org
507
28.2. Learning Objectives
By the end of this chapter, you will be able to:
• Understand the architecture and benefits of distributed geospatial computing
• Install and configure Apache Sedona with PySpark
• Create and manipulate spatial DataFrames using Sedona’s spatial data types
• Perform fundamental spatial operations like distance calculation, spatial relationships, and geometric
transformations
• Execute efficient spatial joins on large datasets using Sedona’s optimized algorithms
• Use spatial indexing to improve query performance on big geospatial data
• Write spatial SQL queries using Sedona’s spatial functions
• Apply Sedona to real-world geospatial analysis scenarios using common datasets
For cloud-based alternatives, Wherobots156 provides a managed cloud platform specifically designed for
Apache Sedona workloads. This managed service handles the infrastructure complexity while providing
a familiar Jupyter notebook interface for development and analysis. Navigate to Wherobots and sign
up for a free account. Once you have an account, you can start a new notebook and select the “Tiny”
instance. This will give you free access to a JupyterLab instance with Apache Sedona installed. Within the
JupyterLab interface, you can upload the code examples in this chapter to your notebook and run them.
156 https://wherobots.com
508
import leafmap
import geopandas as gpd
import pandas as pd
from sedona.spark import *
from pyspark.sql.functions import col, expr
config = (
SedonaContext.builder()
.appName("SedonaApp") # Application name for tracking in Spark UI
.config(
"spark.serializer", "org.apache.spark.serializer.KryoSerializer"
) # Faster serialization
.config(
"spark.kryo.registrator",
"org.apache.sedona.core.serde.SedonaKryoRegistrator"
) # Spatial object serialization
.config(
"spark.jars.packages",
"org.apache.sedona:sedona-spark-
shaded-3.5_2.12:1.7.2,org.datasyslab:geotools-wrapper:1.7.2-28.5",
) # Core Sedona packages
.config(
"spark.jars.repositories",
"https://artifacts.unidata.ucar.edu/repository/unidata-all",
) # Additional repositories
.config(
"spark.hadoop.fs.s3a.connection.ssl.enabled", "true"
) # Enable SSL for S3 connections
.config(
"spark.hadoop.fs.s3a.path.style.access", "true"
) # Use path-style access for S3
.config(
"spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem"
) # S3 filesystem implementation
.config(
"spark.hadoop.fs.s3a.aws.credentials.provider",
"org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
) # Anonymous access to public S3 buckets
.getOrCreate()
)
# Create the Sedona context which adds spatial capabilities to the Spark session
sedona = SedonaContext.create(config)
This configuration demonstrates the essential setup requirements for a production-ready Sedona envi-
ronment. The Kryo serialization configuration significantly improves the performance of spatial object
serialization and deserialization across the distributed cluster. The spatial SQL extensions enable the use
of spatial functions directly within SQL queries, making Sedona accessible to users familiar with spatial
databases. The spark.jars.packages setting pulls in the Sedona and GeoTools artifacts that provide these
spatial capabilities. The S3 configuration enables seamless access to cloud-stored geospatial
datasets, which is increasingly important for modern geospatial applications that leverage cloud data
repositories.
url = (
"https://github.com/opengeos/datasets/releases/download/book/sedona-sample-
data.zip"
)
leafmap.download_file(url)
The downloaded zip file will be automatically unzipped and placed in the current working directory.
510
The distributed nature of these DataFrames means that operations performed on them are automatically parallelized
across the available cluster resources, providing significant performance improvements for large-scale
spatial analysis.
511
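A minimal sketch of building this cities DataFrame, using a few illustrative northeastern US cities (the actual city list and variable name used in the book may differ):

cities_data = [
    ("New York City", 40.7128, -74.0060),
    ("Boston", 42.3601, -71.0589),
    ("Philadelphia", 39.9526, -75.1652),
]
cities_df = sedona.createDataFrame(
    data=cities_data, schema=["city", "latitude", "longitude"]
).withColumn("geometry", expr("ST_Point(longitude, latitude)"))  # Build point geometries
cities_df.show()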
The output shows our DataFrame with a new geometry column containing Point objects. Each row
represents a city with both tabular attributes (name, coordinates) and spatial geometry. The geometry
column contains Well-Known Binary (WKB) representations of the point geometries, which Sedona uses
internally for efficient spatial processing.
# Reproject from New York State Plane (EPSG:2263) to Albers Equal Area (EPSG:5070)
# This ensures accurate area calculations for our later analysis
boroughs_gdf = boroughs_gdf.to_crs("EPSG:5070")
boroughs_gdf
This initial step uses GeoPandas to load and reproject the data. We’re converting from a local coordinate
system (New York State Plane) to a continental projection (Albers Equal Area) that’s suitable for accurate
area and distance calculations across the broader region.
The next step converts our GeoPandas GeoDataFrame into a Sedona spatial DataFrame. This process
involves extracting the geometry information as Well-Known Text (WKT) format, which can be easily
transferred to Spark, then reconstructing the spatial objects within the distributed computing environ-
ment.
512
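A minimal sketch of that hand-off, assuming the boroughs_gdf GeoDataFrame from above (helper names are our own):

# Serialize geometries to WKT strings so they can be transferred to Spark
boroughs_pd = pd.DataFrame(boroughs_gdf.drop(columns="geometry"))
boroughs_pd["geometry"] = boroughs_gdf.geometry.to_wkt()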
boroughs_df = sedona.createDataFrame(boroughs_pd)
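On the Spark side, the WKT strings can then be turned back into geometry objects; the resulting DataFrame is referred to as boroughs_spatial later in this chapter (a sketch):

boroughs_spatial = boroughs_df.withColumn(
    "geometry", expr("ST_GeomFromWKT(geometry)")  # Rebuild geometries from WKT
)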
This conversion process enables us to work with complex polygon geometries in Sedona’s distributed
environment. The WKT format serves as an intermediate representation that preserves the complete
geometric information while enabling transfer between different spatial computing frameworks. The
resulting spatial DataFrame contains the same polygon boundaries as our original data, but now distrib-
uted across the Spark cluster for scalable processing.
157 https://sedona.apache.org/latest/api/sql/Function
513
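A sketch of the kind of measurement query described below, using Sedona's ST_Area, ST_Length, and ST_Centroid functions (column handling assumed):

borough_metrics = boroughs_spatial.select(
    col("BoroName"),
    expr("ST_Area(geometry) / 1e6").alias("area_km2"),  # m² to km²
    expr("ST_Length(ST_Boundary(geometry)) / 1000").alias("perimeter_km"),  # m to km
    expr("ST_AsText(ST_Centroid(geometry))").alias("centroid"),
)
borough_metrics.show(truncate=False)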
The results show each borough’s area in square kilometers, perimeter in kilometers, and centroid
coordinates. These calculations are performed using the projected coordinate system (EPSG:5070), which
provides accurate measurements in meters that we then convert to more readable units. The centroid
represents the geometric center of each borough, useful for various analytical applications such as distance
calculations or label placement in maps.
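The city points need to be reprojected into the same coordinate system as the boroughs before measuring distances; a minimal sketch with ST_Transform, assuming the cities DataFrame created earlier (here cities_df) stores its points in EPSG:4326:

cities_df = cities_df.withColumn(
    "geometry", expr("ST_Transform(geometry, 'EPSG:4326', 'EPSG:5070')")  # Degrees to meters
)
cities_df.show()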
The transformed coordinates now represent positions in meters from the projection’s origin, enabling
accurate distance calculations. You’ll notice the coordinate values are much larger, as they represent
distances in meters rather than degrees.
Now let’s calculate distances from each city to Manhattan’s centroid. This demonstrates how to perform
distance calculations between different spatial datasets, a common requirement in proximity analysis and
location-based services.
514
# Calculate distances from each city to Manhattan's centroid
# First, extract Manhattan's centroid as a reference point
manhattan_centroid = (
boroughs_spatial.filter(col("BoroName") == "Manhattan")
.select(expr("ST_Centroid(geometry)").alias("manhattan_center"))
.collect()[0][0] # collect() brings the result to the driver node
)
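With the centroid on the driver, the per-city distances described below can be computed with ST_Distance; a sketch, assuming the collected value deserializes to a Shapely geometry and that cities_df is in the same projected CRS:

manhattan_wkt = manhattan_centroid.wkt  # Shapely geometry to WKT
cities_with_distance = cities_df.withColumn(
    "distance_km",
    expr(f"ST_Distance(geometry, ST_GeomFromWKT('{manhattan_wkt}')) / 1000"),  # m to km
).orderBy(col("distance_km"))
cities_with_distance.show()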
The results show the distance from each city to Manhattan’s geometric center, ordered from closest to
farthest. New York City appears first because it contains Manhattan. The distances are calculated using
the projected coordinate system, providing accurate measurements in kilometers. This type of analysis is
fundamental for proximity-based applications such as delivery routing, market analysis, or service area
planning.
158 https://sedona.apache.org/latest/api/sql/Predicate
515
Spatial predicates test relationships between geometries such as containment, intersection, and proximity. These relationship tests are the building blocks for complex spatial queries and are essential
for understanding spatial patterns and connections in your data.
The buffer operation demonstrated below is particularly useful for proximity analysis, creating zones of
influence around geographic features. This technique is commonly used in urban planning, environmental
analysis, and market research to understand the impact or reach of specific locations or features.
Let’s demonstrate buffer analysis by creating a 5-kilometer buffer around Manhattan and identifying
which cities fall within this zone. Buffer analysis is essential for proximity studies, impact assessments,
and service area analysis.
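A minimal sketch of that analysis, assuming the boroughs_spatial and cities_df DataFrames from above (both in EPSG:5070, so buffer distances are in meters):

# Build a 5 km buffer around Manhattan's boundary
manhattan_buffer = boroughs_spatial.filter(col("BoroName") == "Manhattan").select(
    expr("ST_Buffer(geometry, 5000)").alias("buffer_geom")
)
# Spatial join: keep the cities that intersect the buffer zone
cities_near_manhattan = cities_df.alias("cities").join(
    manhattan_buffer, expr("ST_Intersects(cities.geometry, buffer_geom)")
)
cities_near_manhattan.show()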
The results show which cities fall within a 5-kilometer radius of Manhattan’s boundary. This type of
analysis is commonly used in urban planning to understand the reach of services, in environmental studies
to assess impact zones, and in business analytics to define market areas. The buffer operation creates a
polygon that extends 5 kilometers outward from Manhattan’s boundary, and the spatial join efficiently
identifies all cities that intersect with this zone.
516
The efficiency of Sedona’s spatial joins stems from sophisticated spatial indexing algorithms that organize
spatial data for rapid retrieval. These indexes, including R-tree and Quad-tree structures, dramatically
reduce the computational complexity of spatial queries by eliminating the need to compare every
geometry with every other geometry. This optimization is crucial when working with large datasets, as
the cost of a naive spatial join grows quadratically with dataset size.
world_cities_pd = pd.read_csv("world_cities.csv")
world_cities_pd.head()
This dataset contains essential information for each city including name, country, coordinates, and
population. The geographic coordinates will be used to create spatial point geometries for each city.
Now we’ll convert this Pandas DataFrame to a Sedona spatial DataFrame, creating point geometries from
the coordinate columns. This transformation distributes the data across our Spark cluster and enables
spatial operations.
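A sketch of that conversion (the DataFrame name mirrors how it is referenced later in the chapter):

world_cities_spatial = sedona.createDataFrame(world_cities_pd).withColumn(
    "geometry", expr("ST_Point(longitude, latitude)")  # Point geometry from coordinate columns
)
world_cities_spatial.show(5)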
517
The resulting DataFrame shows our cities with their attributes and a new geometry column containing
point objects. This spatial DataFrame is now distributed across the cluster and ready for large-scale spatial
analysis operations.
The spatial join operation efficiently identifies which cities fall within each country’s boundaries. This
is accomplished through Sedona’s optimized spatial indexing, which avoids the need to test every city
against every country boundary, making the operation scalable even with millions of records.
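A minimal sketch of such a join, assuming the country polygons have been loaded into a Sedona DataFrame named countries_df:

cities_in_countries = world_cities_spatial.alias("cities").join(
    countries_df.alias("countries"),
    expr("ST_Contains(countries.geometry, cities.geometry)"),  # City point inside country polygon
)
cities_in_countries.show(5)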
Now let’s summarize the results by counting how many cities fall within each country. This demonstrates
how spatial joins can be combined with traditional aggregation operations for comprehensive analysis.
518
# Aggregate cities by country to get summary statistics
city_counts = cities_in_countries.groupBy("countries.country").count()
city_counts.show()
This aggregation provides a summary of how many cities from our dataset fall within each country’s
boundary. This type of analysis is commonly used in demographic studies, market research, and admin-
istrative reporting to understand the distribution of features across geographic regions.
519
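The country_unions DataFrame queried below can be produced with an aggregation like the following sketch (column names assumed):

country_unions = cities_in_countries.groupBy("countries.country").agg(
    expr("ST_Union_Aggr(cities.geometry)").alias("city_points"),  # MultiPoint per country
    expr("count(*)").alias("city_count"),
    expr("avg(cities.population)").alias("avg_population"),
)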
country_unions.show()
This aggregation creates a spatial union of all city points within each country, effectively creating a
MultiPoint geometry that represents the spatial distribution of cities. Simultaneously, it calculates statis-
tical summaries like city count and average population. The ST_Union_Aggr function is particularly
powerful for creating composite geometries from multiple spatial features, useful for visualization and
further spatial analysis.
520
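The high_density_grids DataFrame used below can be built by snapping each city to a 2-degree grid cell and counting; a sketch (grid construction assumed):

high_density_grids = (
    world_cities_spatial.filter(col("population") > 100000)  # Major cities only
    .withColumn("grid_lon", expr("floor(longitude / 2) * 2"))  # 2-degree grid cells
    .withColumn("grid_lat", expr("floor(latitude / 2) * 2"))
    .groupBy("grid_lon", "grid_lat")
    .agg(expr("count(*)").alias("city_count"))
    .filter(col("city_count") >= 5)  # Grid cells with 5+ major cities
)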
high_density_grids.orderBy(col("city_count").desc()).show()
The results reveal global urban clusters by identifying grid cells with high concentrations of major cities.
Each row represents a 2-degree grid cell containing five or more cities with populations exceeding 100,000
people. This analysis helps identify major urban corridors and megalopolitan regions worldwide, such as
the Northeast US corridor, European urban clusters, or densely populated areas in Asia.
# Read CSV file and create spatial geometries from coordinate columns
cities_df = (
sedona.read.option("header", True) # CSV has header row
.format("CSV")
.load("world_cities.csv")
.withColumn(
"geometry", expr("ST_Point(longitude, latitude)")
) # Create point geometries from coords
)
cities_df.show()
This approach transforms tabular coordinate data into a spatial DataFrame ready for distributed spatial
analysis. The ST_Point function creates proper spatial geometries from the longitude and latitude
columns, enabling all spatial operations and functions.
521
The initial read shows the nested structure typical of GeoJSON files. We need to flatten this structure to
make it suitable for DataFrame operations and spatial analysis.
This transformation creates a flat DataFrame where each row represents a cable feature with its geometry
and attributes as separate columns. The geometry column contains the spatial information while the other
columns contain the feature attributes.
Sedona automatically handles the multiple files that comprise a shapefile (.shp, .shx, .dbf, etc.) and creates
a unified spatial DataFrame with both geometry and attribute information.
522
GeoPackage files can contain multiple spatial and non-spatial tables, making them excellent for data
distribution and archival purposes.
GeoParquet files load significantly faster than traditional formats and support efficient spatial indexing,
making them ideal for big data geospatial applications.
# Read individual state wetlands data from S3 (DC wetlands - a smaller dataset for demonstration)
url = "s3a://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/DC_Wetlands.parquet"
df = sedona.read.format("parquet").load(url)
The raw data contains geometry information in Well-Known Binary (WKB) format. We need to convert
this to a format that Sedona can work with for spatial operations.
159 https://www.fws.gov/program/national-wetlands-inventory/wetlands-mapper
160 https://www.fws.gov/
161 https://source.coop/giswqs/nwi/wetlands
523
# Convert WKB geometry to readable WKT format for inspection
df_wkt = df.withColumn("geometry", expr("ST_AsText(ST_GeomFromWKB(geometry))"))
df_wkt.show()
For analysis of the complete national dataset, we can use wildcard patterns to read all state files simulta-
neously. This demonstrates Sedona’s ability to process truly massive geospatial datasets.
# Read all state wetlands data using wildcard pattern (75.8 GB total)
url = "s3a://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet"
df = sedona.read.format("parquet").load(url)
This example showcases Sedona’s scalability - processing 75.8 GB of wetlands data distributed across
multiple files and containing millions of polygon features. The distributed nature of Spark and Sedona
makes this analysis feasible on cluster resources.
162 https://docs.kepler.gl/docs/keplergl-jupyter
524
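The map object m used below is created from a spatial DataFrame with SedonaKepler; a minimal sketch (DataFrame name assumed):

m = SedonaKepler.create_map(world_cities_spatial, name="Cities")
m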
# Add additional layer to the existing map
SedonaKepler.add_df(m, countries_df, name="Countries")
Figure 145: Visualization of the world cities and countries using KeplerGL.
The resulting map provides interactive controls for styling, filtering, and exploring the spatial data,
making it easy to identify patterns and relationships in large datasets.
163 https://deckgl.readthedocs.io
525
Figure 146: Visualization of the world countries using PyDeck.
Figure 147 shows the world cities using PyDeck.
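A minimal sketch of how such maps can be produced with SedonaPyDeck (DataFrame names assumed):

SedonaPyDeck.create_geometry_map(countries_df)  # Polygon layer (Figure 146)
SedonaPyDeck.create_scatterplot_map(world_cities_spatial)  # Point layer (Figure 147)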
526
28.11. Writing Vector Data
After processing spatial data with Sedona, you often need to export the results for use in other applications
or for archival purposes. Sedona supports writing to various formats including GeoParquet and GeoJSON,
enabling integration with different downstream workflows.
Now we can write the processed data to different formats. The repartitioning operation optimizes the
output by controlling the number of output files, which is important for performance with large datasets.
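A sketch of that export step, assuming the cities_in_countries result from earlier and output paths of our choosing:

out_df = cities_in_countries.select("cities.*").repartition(1)  # Keep one geometry column, one output file
out_df.write.format("geoparquet").mode("overwrite").save("cities_in_countries_geoparquet")
out_df.write.format("geojson").mode("overwrite").save("cities_in_countries_geojson")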
The choice of output format depends on your downstream requirements. GeoParquet is optimal for
large datasets and further analysis, while GeoJSON provides maximum compatibility with web mapping
applications and other GIS tools.
527
# Extract metadata information to understand the file structure
record_info_df = df.selectExpr("RS_NetCDFInfo(content) as record_info")
The metadata output shows the structure of the NetCDF file, including available variables, dimensions,
and coordinate systems. This information is crucial for extracting the specific data layers you need.
This transformation converts the NetCDF data into Sedona’s raster format, enabling spatial analysis
operations on the wind data.
dem_df = sedona.read.format("binaryFile").load("dem_90m.tif")
dem_df = dem_df.withColumn("raster", expr("RS_FromGeoTiff(content)"))
dem_df.show()
For large raster datasets, it’s often useful to tile the data into smaller chunks for distributed processing.
This approach improves parallelization and memory management.
# Tile the raster into 256x256 pixel chunks for distributed processing
tiles_df = dem_df.selectExpr("RS_TileExplode(raster, 256, 256) as (x, y, raster)")
528
The tiling process creates multiple smaller raster tiles from the original large raster, with each tile
containing coordinates (x, y) and the raster data. This approach enables efficient distributed processing
of large raster datasets.
dem_df.printSchema()
dem_df.createOrReplaceTempView("dem_df")
529
Now we’ll create a binary mask identifying mountain areas (elevations above 2000 meters) using map
algebra. This type of operation is fundamental for habitat analysis, risk assessment, and land classification.
# Create binary mask for areas above 2000m elevation using map algebra
mountain = sedona.sql(
"""
SELECT
RS_MapAlgebra(
raster,
'D',
'out = (rast[0] > 2000) ? 1 : 0;'
) AS mountain
FROM dem_df
"""
)
530
The resulting image (Figure 149) shows a binary classification where white pixels represent areas above
2000 meters elevation (mountains) and black pixels represent lower elevations. This type of analysis is
fundamental for environmental modeling, risk assessment, and land use planning.
After creating our mountain classification raster, we can save it to disk for future use or sharing with
other applications.
This operation saves the binary mountain classification as GeoTIFF files that can be used in other GIS
applications or shared with collaborators.
Let’s define a study area polygon in California’s Sierra Nevada mountains to analyze elevation statistics
within this specific region.
Now we’ll calculate the average elevation within this polygon using Sedona’s zonal statistics function.
531
"""
)
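A sketch of such a zonal statistics query, with an illustrative Sierra Nevada polygon (the coordinates are assumed and must be in the same CRS as the DEM; the statistic name may vary by Sedona version):

avg_elevation = sedona.sql(
    """
    SELECT RS_ZonalStats(
        raster,
        ST_GeomFromWKT('POLYGON((-119.6 37.5, -118.6 37.5, -118.6 38.3, -119.6 38.3, -119.6 37.5))'),
        1,
        'mean'
    ) AS avg_elevation
    FROM dem_df
    """
)
avg_elevation.show()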
The result shows the average elevation within our study area polygon. This type of analysis is fundamental
for environmental studies, habitat assessment, and regional planning applications.
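A sketch of the GeoTIFF export described next, combining the RS_AsGeoTiff function with Sedona's raster writer (writer options and output path assumed):

dem_out = dem_df.withColumn("tiff", expr("RS_AsGeoTiff(raster, 'LZW', 75)"))  # LZW compression, 75% quality
dem_out.write.format("raster").option("rasterField", "tiff").option(
    "fileExtension", ".tif"
).mode("overwrite").save("dem_output")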
This example saves the DEM with LZW compression (75% quality) to reduce file size while maintaining
data integrity. The compression settings can be adjusted based on your storage and quality requirements.
Let’s also demonstrate writing NetCDF data after processing. First, we’ll reload and process wind data for
demonstration.
532
# Extract the u_wind component as a raster for analysis
netcdf_df = netcdf_df.withColumn(
"raster", expr("RS_FromNetCDF(content, 'u_wind', 'lon', 'lat')")
)
netcdf_df.show()
After processing the wind data, you can save it using the same writing operations as shown above,
enabling efficient storage and sharing of processed atmospheric data.
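Sedona DataFrames can also be brought back into GeoPandas for interactive exploration; a minimal sketch, assuming the cities_in_countries result from earlier (geometry columns deserialize to Shapely objects):

gdf = gpd.GeoDataFrame(
    cities_in_countries.select("cities.*").toPandas(), geometry="geometry", crs="EPSG:4326"
)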
gdf.explore()
This conversion enables you to use all of GeoPandas’ visualization and analysis capabilities on data that
was processed at scale using Sedona’s distributed computing.
533
28.17.2. From Sedona DataFrame To GeoArrow
GeoArrow164 provides an efficient interchange format between different geospatial computing libraries,
enabling high-performance data transfer with minimal overhead.
The Arrow format preserves spatial data types while providing excellent performance for data transfer
between different systems and libraries.
The direct conversion from GeoPandas to Sedona preserves both the spatial geometries and attribute data,
enabling seamless scaling of existing geospatial workflows.
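A minimal sketch of that direct conversion (DataFrame names assumed):

sdf = sedona.createDataFrame(gdf)  # GeoPandas GeoDataFrame in, Sedona spatial DataFrame out
sdf.printSchema()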
164 https://geoarrow.org
534
# Display the reconstructed GeoPandas DataFrame
gdf.head()
This round-trip conversion demonstrates the interoperability between different geospatial computing
environments, enabling flexible workflow design.
pandas_df["geometry"] = pandas_df["wkt_geom"].apply(wkt.loads)
return gdf
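A sketch of such a helper function, converting a Sedona DataFrame to GeoPandas by way of WKT strings (the function name, column names, and the population threshold used for "major cities" are assumptions):

from shapely import wkt

def sedona_to_gdf(spatial_df):
    # Serialize geometries to WKT so they survive the trip to pandas
    pandas_df = (
        spatial_df.withColumn("wkt_geom", expr("ST_AsText(geometry)"))
        .drop("geometry")
        .toPandas()
    )
    # Rebuild Shapely geometries from the WKT strings
    pandas_df["geometry"] = pandas_df["wkt_geom"].apply(wkt.loads)
    gdf = gpd.GeoDataFrame(pandas_df.drop(columns="wkt_geom"), geometry="geometry")
    return gdf

major_cities_gdf = sedona_to_gdf(world_cities_spatial.filter(col("population") > 1000000))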
535
# Display the converted major cities dataset
major_cities_gdf
This custom function provides a robust way to convert Sedona results to GeoPandas format, enabling
detailed analysis and visualization of distributed processing results.
# Analyze urban density patterns for cities with population > 500,000
urban_analysis = (
    world_cities_spatial.filter(col("population") > 500000)  # Focus on major cities
.withColumn(
"urban_density",
col("population")
/ expr("ST_Area(ST_Buffer(geometry, 10000))"), # People per m² in 10km
radius
)
.withColumn(
"city_category",
expr(
"""
CASE
WHEN population > 5000000 THEN 'Megacity' -- Cities > 5M people
WHEN population > 1000000 THEN 'Large City' -- Cities 1-5M people
            ELSE 'Medium City' -- Cities 500K-1M people
END
"""
),
)
)
536
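The per-category summary described below can be produced with an aggregation like the following sketch (aggregation details assumed):

urban_analysis.groupBy("city_category").agg(
    expr("count(*)").alias("num_cities"),
    expr("avg(urban_density)").alias("avg_density"),
).orderBy("city_category").show()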
This analysis reveals global patterns in urban density, showing how different city sizes relate to population
concentration. Such analysis is valuable for urban planning, infrastructure development, and comparative
urban studies.
# Analyze potential transit coverage for cities with population > 100,000
transit_analysis = (
world_cities_spatial.filter(col("population") > 100000)
.withColumn(
"transit_coverage",
expr("ST_Buffer(geometry, 1000)"), # 1km radius transit catchment area
)
.withColumn(
"coverage_area_km2",
expr("ST_Area(ST_Buffer(geometry, 1000)) / 1000000"), # Convert m² to
km²
)
)
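The accessibility_metrics summary shown next can be derived from transit_analysis; a sketch (metric choices assumed):

accessibility_metrics = transit_analysis.agg(
    expr("count(*)").alias("cities_analyzed"),
    expr("sum(coverage_area_km2)").alias("total_coverage_km2"),
    expr("avg(coverage_area_km2)").alias("avg_coverage_km2"),
)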
accessibility_metrics.show()
This analysis provides insights into transit accessibility patterns globally, helping transportation planners
understand coverage requirements and potential service areas. The 1-kilometer buffer represents typical
walking distance to transit stops, making this analysis directly applicable to real-world planning scenarios.
537
Sedona allows analysts to perform complex spatial operations on billions of spatial objects across multiple
machines, breaking through the computational barriers that previously limited geospatial analysis.
The performance improvements offered by Sedona are substantial and multifaceted. The combination of
sophisticated spatial indexing algorithms, optimized spatial joins, and distributed processing architecture
provides dramatic speed improvements for large-scale spatial analysis tasks. These optimizations become
increasingly important as dataset sizes grow, where traditional approaches would become prohibitively
slow or impossible to execute.
Integration capabilities make Sedona particularly valuable in modern data ecosystems. The platform
seamlessly integrates with the existing Spark ecosystem, enabling organizations to incorporate spatial
analysis into their established data pipelines and workflows without requiring complete architectural
overhauls. This integration extends beyond technical compatibility to include operational workflows,
team skills, and existing infrastructure investments.
Accessibility represents another crucial advantage of Sedona’s design. The platform’s SQL interface
makes advanced spatial analysis accessible to users familiar with database technologies, democratizing
access to sophisticated geospatial analysis capabilities. Simultaneously, the DataFrame API provides a
familiar interface for data scientists already working with Spark, reducing the learning curve for adopting
distributed spatial analysis.
The ecosystem compatibility of Sedona enables hybrid analytical workflows that leverage the strengths
of both distributed and traditional geospatial tools. Easy integration with popular Python geospatial
libraries like GeoPandas allows analysts to combine the computational power of distributed processing
with the rich visualization and analysis capabilities of traditional desktop tools, creating flexible analytical
workflows that can adapt to different scales and requirements.
For organizations and analysts working with large geospatial datasets, Apache Sedona provides the essen-
tial tools needed to perform complex spatial analysis at scale. This capability makes it an indispensable
component in the modern geospatial data scientist’s toolkit, particularly as spatial data volumes continue
to grow exponentially across industries and applications.
165 https://youtu.be/V__Lq72ge5A
166 https://sedona.apache.org
167 https://sedona.apache.org/latest/api/sql/Overview/
168 https://resources.wherobots.com/cloud-native-geospatial-analytics-with-apache-sedona
538
28.21. Exercises
28.21.1. Exercise 1: Setting Up Sedona and Basic Spatial Operations
1. Install and configure Apache Sedona with PySpark in your environment.
2. Create a Sedona-enabled Spark session with appropriate memory configuration.
3. Create a spatial DataFrame from the following city coordinates:
• San Francisco: (37.7749, −122.4194)
• Seattle: (47.6062, −122.3321)
• Portland: (45.5152, −122.6784)
• Los Angeles: (34.0522, −118.2437)
4. Calculate the area of the convex hull that encompasses all four cities.
539
28.21.5. Exercise 5: Spatial Aggregation and Clustering
1. Load the world cities dataset and filter for cities with population > 50,000.
2. Create a 5-degree grid over the world (divide longitude and latitude by 5 and round).
3. Group cities by grid cell and calculate:
• Number of cities per grid cell
• Total population per grid cell
• Average population per grid cell
4. Find the top 10 most populous grid cells and display their coordinates and statistics.
540