Machine Learning Part1

This document provides an overview of a NASA training on machine learning for earth science. The training will cover an overview of machine learning, its importance and applications for earth science analysis, and how to apply basic machine learning algorithms and techniques to remote sensing data. The training is intended to help participants recognize common machine learning methods used in earth science, describe their benefits and limitations, and complete basic machine learning procedures. The training will include hands-on exercises and assignments.
National Aeronautics and Space Administration

Fundamentals of Machine Learning for Earth Science


Part 1: Overview of Machine Learning

Trainers: Jordan A. Caraballo-Vega, Mark L. Carroll, Jules Kouatchou, Jian Li, Caleb S. Spradlin

April 20, 2023


NASA Applied Remote Sensing Training (ARSET)
Brock Blevins, Training Coordinator
NASA Applied Remote Sensing Training (ARSET) Goal

Empower the global community to incorporate Earth-observing data into environmental management and decision-making.

• Agriculture
• Climate
• Disasters
• Health & Air Quality
• Land
• Water Resources

NASA’s Applied Remote Sensing Training Program 3


NASA ARSET Training Availability

• Online webinar and self-paced
• Custom in-person
• Cost-free
• Multi-lingual options
• Range of levels to meet diverse audience needs
• Materials are free to use and adapt with credit to NASA ARSET

Visit the NASA ARSET website to view all of our options.


Training Objectives

At the end of the training, participants will be able to:


• Recognize the most common machine learning methods used for processing Earth
Science data
• Describe the benefits and limitations of machine learning for Earth Science analysis
• Explain how to apply basic machine learning algorithms and techniques in a
meaningful manner to remote sensing data
• Use an analysis-appropriate training dataset to evaluate conditions and solutions
for a given case study
• Complete basic procedures to interpret, refine and evaluate the accuracy of the
results of machine learning analysis



Reminder of Prerequisites

• Prerequisites:
• Complete Session 1 of our on-demand Fundamentals of Remote Sensing series or have
equivalent experience.
• Attendees will need access to Google Drive and Google Colab. To access
these resources, users must use an email ending in ‘gmail.com’.
• A video of the demonstration will be included in the training recording, which will be
available within 48 hours after the presentation so that you can go through it at your
own pace.


Training Schedule

• Part 1 (April 20, 2023): Overview of Machine Learning
• Part 2 (April 27, 2023): Training Data and Land Cover Classification Example
• Part 3 (May 4, 2023): Model Tuning, Parameter Optimization, and Additional Machine Learning Algorithms
• Homework (opens May 4, due May 19): Independent practice and application

Optional opportunity to earn a certificate of completion




Instructor Team

• Jules Kouatchou: Chief Programmer/Analyst
• Jordan A. Caraballo-Vega: Computer Engineer
• Caleb S. Spradlin: Software Developer
• Mark L. Carroll: Research Scientist
• Jian Li: Senior Principal Applications Engineer

Session 1 Outline

• Overview of Machine Learning


• Importance of Machine Learning targeted towards Earth Science
• Usability of Machine Learning
• Software to Support Machine Learning
• Machine Learning Applications
• Hands-on Jupyter Notebook Exercise: Load and Visualize Data
• Post-Session Assignment
• Q&A Session

Resources for this Training


https://github.com/NASAARSET/ARSET_ML_Fundamentals


Training Objectives

After participating in this training, attendees will be able to:

• Recognize the most common machine learning methods used for processing
Earth Science data
• Describe the benefits and limitations of machine learning for Earth Science
analysis
• Explain how to apply basic machine learning algorithms and techniques in a
meaningful manner to remote sensing data



Overview and Theory
Trainer: Jules Kouatchou
Overview of Machine Learning

The following quote, attributed to Arthur Samuel, describes what Machine Learning (ML) is:

“Machine learning enables a machine to automatically learn from data, improve performance from experiences, and predict things without being explicitly programmed.”

ML uses techniques from Statistics, Mathematics, and Computer Science to make computer programs learn from data to predict an output.
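To make "learning from data without being explicitly programmed" concrete, here is a minimal sketch in plain Python. The program is never told the rule that links inputs to outputs; it estimates a slope and intercept from example pairs using ordinary least squares and then predicts an unseen value. The data points and the names `fit_line` and `predict` are illustrative assumptions and are not part of the training materials.

```python
# Minimal sketch: estimate y = slope * x + intercept from examples.
# The data below are made up for illustration only.

def fit_line(xs, ys):
    """Return (slope, intercept) from ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# "Experience": observed input/output pairs.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

model = fit_line(xs, ys)      # the learning step
forecast = predict(model, 10.0)  # prediction for an unseen input
print(model, forecast)
```

The same pattern (fit on examples, then predict) is what libraries such as scikit-learn expose at a larger scale.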


How does Machine Learning Work?

Image Source: Daniel Crankshaw (in a Short History of Prediction-Serving Systems)



Machine Learning Steps

Problem Statement → Data Collection → Data Preprocessing → Feature Selection →
Choose Model → Train Model → Parameter Tuning → Prediction
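The steps on this slide can be sketched end to end in a few lines of plain Python. This is a toy walk-through under stated assumptions: the feature values are invented, the k-nearest-neighbour classifier is chosen only because it is short to write, and helper names such as `prepare` and `knn_predict` are hypothetical. A real workflow would normally rely on a library such as scikit-learn.

```python
# Toy walk-through of the eight steps using only the standard library.
import math

# 1. Problem statement: label a pixel as water (1) or not water (0).
# 2. Data collection: (feature_a, feature_b, constant_feature) -> label.
samples = [
    ((0.05, 0.02, 1.0), 1), ((0.07, 0.03, 1.0), 1),
    ((0.06, 0.01, 1.0), 1), ((0.04, 0.04, 1.0), 1),
    ((0.30, 0.45, 1.0), 0), ((0.28, 0.50, 1.0), 0),
    ((0.35, 0.40, 1.0), 0), ((0.32, 0.48, 1.0), 0),
]
validation = [((0.06, 0.03, 1.0), 1), ((0.31, 0.44, 1.0), 0)]

# 3. Data preprocessing: min-max scaling fitted on the training samples.
columns = list(zip(*[x for x, _ in samples]))
bounds = [(min(c), max(c)) for c in columns]

# 4. Feature selection: drop features that never vary.
keep = [i for i, (lo, hi) in enumerate(bounds) if hi > lo]


def prepare(row):
    """Scale a raw row and keep only the selected features."""
    return tuple((row[i] - bounds[i][0]) / (bounds[i][1] - bounds[i][0])
                 for i in keep)


# 5. Choose model: k-nearest neighbours with a majority vote.
def knn_predict(train, x, k):
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


# 6. Train model: for k-NN, training means storing the prepared samples.
train = [(prepare(x), y) for x, y in samples]


# 7. Parameter tuning: keep the k with the best validation accuracy.
def accuracy(k):
    hits = sum(knn_predict(train, prepare(x), k) == y for x, y in validation)
    return hits / len(validation)


best_k = max([1, 3, 5], key=accuracy)

# 8. Prediction on a new, unlabeled pixel.
new_pixel = (0.05, 0.03, 1.0)
prediction = knn_predict(train, prepare(new_pixel), best_k)
print(best_k, prediction)
```

Each numbered comment maps to one box in the diagram, so the sketch can be read alongside it.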


Machine Learning Algorithms

Image Source: guru99.com


Big Data in Earth Science

Reichstein et al. (2019), https://doi.org/10.1038/s41586-019-0912-1

Machine Learning in Earth Science

Leveraging advances in artificial intelligence could revolutionize the Earth and environmental sciences. We must ensure that our research funding and training choices give the next generation of geoscientists the capacity to realize this potential.

Fleming et al. (2021), https://doi.org/10.1038/s41561-021-00865-3


Machine Learning in Earth Science

• Problems in Earth science are often complex.
• It is difficult to apply well-known and described mathematical models to the
natural environment:
– ML is commonly a better alternative for such non-linear problems.
• A number of researchers found that machine learning outperforms traditional
statistical models in Earth science, such as in:
– Characterizing forest canopy structure,
– Predicting climate-induced range shifts, and
– Delineating geologic facies.


How Machine Learning is Applied in Earth Science

Siwei Yu and Jianwei Ma (2021), https://doi.org/10.1029/2021RG000742

Machine Learning Applications
Trainer: Jian Li
Benefits of Utilizing Machine Learning

There are numerous ways in which ML can accelerate scientific research, such as:
• Increased Efficiency: Machine learning can help automate the analysis of
large and complex datasets, allowing scientists to quickly process and
analyze large amounts of data.
• New Insights and Discoveries: Machine learning can help scientists identify
new patterns and relationships in complex datasets, leading to new insights
and discoveries in Earth Science research.
• Improved Predictive Modeling: Machine learning algorithms can be used
to build accurate predictive models that can help scientists better
understand complex Earth Science phenomena.


Efficiency, Accuracy, and Discovery
Identify new stars & star systems from a massive number of observations
• An all-sky survey mission, called the
Transiting Exoplanet Survey Satellite (TESS)
• Using AI, ML, and HPC tools, NASA
scientists have extracted more than 60
million light curves for further investigation.
• NASA astronomers have identified:
– > 50 planet candidates
– > 200 potential heartbeat stars
– > 10 potential triple star systems
– > 20 potential quadruple star systems
– A potential sextuple star system
• All previously undiscovered

Figure: A two-dimensional projection of the high-dimensional space of TESS light curve representations. Image Credit: Brian P. Powell, NASA Goddard.

Prša et al. (2022), https://doi.org/10.3847/1538-4365/ac324a

Efficiency, Accuracy, and Discovery
Machine Learning-Based Crop Type and Yield Estimates in Burkina Faso, West Africa

• NASA Goddard and the Millennium Challenge Corporation (MCC)
• Invest in Agricultural Development Projects to empower local farmers and combat food insecurity
• Sentinel-2 satellite and in-situ data to train and optimize five Random Forest machine learning models to estimate crop type and yield across the study region
• Model accuracy was 88% for crop type and >64% for crop yield during 2019’s rainy season and 64% for crop type and >53% for crop yield during 2019’s dry season.
• The machine learning model made interannual predictions for 2020’s dry season without training data; accuracies were up to 60% for crop type.

Figure: ML predictions of crop type across the ~2,250-hectare study region for the 2019 rainy (left) and dry (right) seasons. The rainy season has maize (yellow) and rice (maroon) predominating, while the dry season has onions (orange), tomatoes (red), and rice (maroon) predominating. Illustration by ESA.

Elders et al. (2022), https://doi.org/10.1016/j.rsase.2022.100820

Software to Support Machine Learning

• Programming Languages
• Software Packages

Image Source: https://dev.to/minchulkim87/my-data-science-tech-stack-2020-1poa


Software to Support Machine Learning, Cont.
Python: Python is the most used language for Machine Learning. One of the main reasons Python is so popular within AI development is its strength as a data analysis tool; it has long been popular within the field of big data.

Java: Java is an important language for AI. One reason for that is how prevalent the language is in mobile app development. And given how many mobile apps take advantage of AI, it’s a perfect match.

Julia: Julia is one of the newer languages on the list and was created to focus on performance computing in scientific and technical fields.

R: R might not be the perfect language for AI, but it’s fantastic at crunching very large numbers, which some consider an advantage over Python at scale. And with R’s built-in functional programming, vectorized computation, and object-oriented nature, it does make for a viable language for AI.

Image Source: Stack Overflow 2022

Machine Learning Frameworks in Python

• Python-based tools dominate the machine learning frameworks, based on Kaggle's 2021 State of Data Science and Machine Learning survey.
• Scikit-learn ranks first, with over 80% of data scientists using it.
• TensorFlow and Keras were each chosen by about half of the data scientists for deep learning.
• The gradient boosting library XGBoost is fourth.

Image Source: Kaggle's 2021 State of Data Science and Machine Learning survey


Machine Learning Frameworks in Python, Cont.

• Scikit-Learn: One of the most important libraries (a Swiss Army knife) for Machine Learning, as it
provides a number of simple and efficient tools for data analysis. It provides functionality for
classification, regression, clustering, dimensionality reduction, model selection, and
data preprocessing.
• TensorFlow: Library developed by engineers and researchers on the Google Brain
team, which conducts machine learning and neural networks research. It allows researchers to
push boundaries in discovering state-of-the-art (SOTA) results, and also allows developers to
create ML-powered applications.
• Keras: High-level neural networks API that can run on top of TensorFlow or
Theano and is used for building and training deep learning models. It allows for easy and fast
prototyping and supports both convolutional neural networks and recurrent networks.
• PyTorch: Provides functionality largely centered around building and training neural
networks, the backbone of deep learning. PyTorch offers scalable distributed training of models
across single or multiple CPUs and GPUs. The first release was in September 2016, and it has
quickly been widely adopted by industry, including companies such as Tesla and Uber.


Machine Learning Frameworks in Python, Cont.

• Jupyter Notebook: Open-source, web-based application which allows us
to create and share documents that contain code, equations,
visualizations, and text. Its uses include data cleaning and transformation,
statistical modeling, data visualization, machine learning, etc.
• Matplotlib: Data visualization library that is used to create static,
animated, and interactive visualizations. It can be used to create detailed
scatterplots, histograms, bar charts, pie charts, etc.
• Seaborn: Statistical visualization library that is based on Matplotlib and
integrated with pandas data structures. It provides a high-level interface for
informative and statistical graphics. Since it is built on top of Matplotlib, it
offers extra plots and can produce more sophisticated visualizations.


Graphics Processing Unit (GPU) Role in Machine Learning

• There are many available platforms for parallel computing and programming.
Out of them, CUDA (by NVIDIA) is the most popular platform due to the following
reasons:
– CUDA runs on both Windows and Linux.
– Almost all the GPU-supported Python libraries like CatBoost, TensorFlow,
Keras, PyTorch, OpenCV, and CuPy were designed to run on NVIDIA CUDA-enabled
graphics cards.
• Popular GPU-Supported Python Libraries:
– XGBoost
– OpenCV
– cuML (Part of RAPIDS)
– cuDF (Part of RAPIDS)
– CuPy (NumPy for GPU)


Overview of Machine Learning Algorithms
Trainer: Jordan A. Caraballo-Vega
Machine Learning Algorithms: Which algorithm to choose?

• The number of machine learning algorithms has been growing rapidly.
• We will not dive into the specifics of each algorithm, but we will give you
the tools to aid in selecting one for your own science problem(s).
• You are not bound to a single algorithm, but it always saves time to start
from a logical base.

Figure: Core Machine Learning Algorithms. Image Source: github.com

Machine Learning Algorithms: Science Problem

• Which scientific question would you like to address?
• What information is missing to answer this question?

Figure: Components to aid the selection of your ML algorithm. Components shown: Science Problem, Data, Performance, Explainability, Dimensionality, Complexity.


Machine Learning Algorithms: Science Problem

• Which scientific question would you like to address? We want to
identify the sign, magnitude, and potential drivers of change in
surface water extent in X study area.
• What information is missing to answer this question? We need
surface water extent maps to quantify and analyze these drivers.

Figure: Components to aid the selection of your ML algorithm. Components shown: Science Problem, Data, Performance, Explainability, Dimensionality, Complexity.


Machine Learning Algorithms: Data
Dataset Structure

• What data do you have available?
• Do you have training data available?
• What is the data structure of your data?
• Is your dependent variable continuous or discrete?

Figure: Algorithm decision branch based on data structure. Branch labels: Dataset Structure; Discrete, Continuous; Spatial, Tabular; Supervised, Unsupervised. Algorithms shown: CNN, SVM, K-means, NN, RF, Transformers, LSTM, XGBoost.


Machine Learning Algorithms: Data
• What data do you have available? We have global coverage with data from
the MODIS satellite.
• Do you have training data available? We have gathered a large number of
training data points.
• What is the data structure of your data? Our data is in raster format. We
can preprocess it to make it tabular.
• Is your dependent variable continuous or discrete? Our dependent variable
is water pixels, which is discrete (0 – no water, 1 – water).

Figure: Algorithm decision branch based on data structure.
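The preprocessing mentioned above, turning a raster into a table, can be sketched as follows. This is a minimal illustration with plain Python lists: the band names, pixel values, and the function `raster_to_table` are assumptions made for the example and do not come from the MODIS case study. Each pixel becomes one row holding its band values, its position, and its discrete label (0 for no water, 1 for water).

```python
# Hypothetical 2-band raster with 2 rows x 3 columns; values are made up.
bands = {
    "red": [[0.05, 0.06, 0.30],
            [0.04, 0.28, 0.31]],
    "nir": [[0.02, 0.03, 0.45],
            [0.01, 0.50, 0.47]],
}
# Label raster aligned with the bands: 1 = water, 0 = no water.
labels = [[1, 1, 0],
          [1, 0, 0]]


def raster_to_table(bands, labels):
    """Flatten aligned 2-D grids into one record per pixel."""
    names = sorted(bands)
    table = []
    for r, label_row in enumerate(labels):
        for c, label in enumerate(label_row):
            record = {name: bands[name][r][c] for name in names}
            record["row"] = r
            record["col"] = c
            record["water"] = label
            table.append(record)
    return table


table = raster_to_table(bands, labels)
for record in table:
    print(record)
```

With array libraries such as NumPy, the same flattening is usually done with a reshape rather than explicit loops; the loop form is used here only to keep the sketch dependency-free.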


Machine Learning Algorithms: Performance

• Are there any performance requirements based on your science
question (e.g., real time vs. static)?
• Is your software going to run on on-premise, cloud, or embedded
hardware?
• What is more important for your project: inference time or model
performance?

Figure: Tradeoff between speed and accuracy. Image Source: github.com


Machine Learning Algorithms: Performance

• Are there any performance requirements based on your science question (e.g.,
real time vs. static)? We do not need real-time maps (by contrast, disaster
response teams, for example, might need results quickly).
• Is your software going to run on on-premise, cloud, or embedded hardware?
We want our software to run both on-premise and in the cloud.
• What is more important for your project: inference time or model performance?
We care more about model performance than inference time.

Figure: Tradeoff between speed and accuracy. Image Source: github.com
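One way to answer the inference-time versus model-performance question is to measure both for each candidate. The sketch below is an illustration only: the two toy models, the cutoff of 0.18, and the data points are invented for the example, and the timing numbers will differ from machine to machine.

```python
# Compare two hypothetical models on accuracy and per-sample inference time.
import math
import time

train = [((0.05, 0.02), 1), ((0.07, 0.03), 1), ((0.20, 0.12), 1),
         ((0.30, 0.45), 0), ((0.28, 0.50), 0), ((0.16, 0.20), 0)]
test = [((0.06, 0.02), 1), ((0.19, 0.11), 1),
        ((0.29, 0.47), 0), ((0.17, 0.21), 0)]


def threshold_model(x):
    """Simple rule: call it water when the first feature is below a cutoff."""
    return 1 if x[0] < 0.18 else 0


def nearest_neighbour_model(x):
    """More work per prediction: scan all training points for the closest."""
    _, label = min(train, key=lambda item: math.dist(item[0], x))
    return label


def evaluate(model, repeats=1000):
    """Return (accuracy, average seconds per prediction) on the test set."""
    start = time.perf_counter()
    for _ in range(repeats):
        predictions = [model(x) for x, _ in test]
    elapsed = time.perf_counter() - start
    hits = sum(p == y for p, (_, y) in zip(predictions, test))
    return hits / len(test), elapsed / (repeats * len(test))


fast_accuracy, fast_time = evaluate(threshold_model)
slow_accuracy, slow_time = evaluate(nearest_neighbour_model)
print(f"threshold: accuracy={fast_accuracy:.2f}, time={fast_time:.2e} s")
print(f"nearest neighbour: accuracy={slow_accuracy:.2f}, time={slow_time:.2e} s")
```

For a project like the one described above, where model performance matters more than inference time, the more accurate model would be preferred even if it takes longer per prediction.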


Machine Learning Algorithms: Operations

Workflow of possible scenarios when selecting an ML algorithm.


Image Source: scikit-learn.org

Exercise: Running Introductory Notebooks in Google Colab
Trainer: Jordan A. Caraballo-Vega
Summary

• Overview of Machine Learning

• Importance of Machine Learning targeted towards Earth Science

• Usability of Machine Learning

• Software to Support Machine Learning

• Machine Learning Applications

• Hands-on Jupyter Notebook Exercise: Load and Visualize Data

• Post-Session Assignment


Looking Ahead
Part 2: Training Data and Land Cover Classification Example
• Download the training data

• Exploratory data analysis

• Extracting training data from a tabular dataset

• Extracting training data from a raster dataset

• Training and inference of a tabular and raster dataset

• Metrics and model evaluation

• Hands-on Jupyter Notebook Exercise: MODIS Water Classification Case Study


Contacts

• Trainers:
– Jordan A. Caraballo-Vega: [email protected]
– Jules Kouatchou: [email protected]
– Caleb S. Spradlin: [email protected]
– Jian Li: [email protected]
– Brock Blevins: [email protected]

• Training Webpage:
– https://appliedsciences.nasa.gov/join-mission/training/english/arset-fundamentals-machine-learning-earth-science

• ARSET Website:
– https://appliedsciences.nasa.gov/arset


Questions?

• Please enter your questions in the Q&A box. We will answer
them in the order they were received.
• We will post the Q&A to the training website following the
conclusion of the webinar.


Thank You!



References

• Crankshaw, D., & Gonzalez, J. (2018). Prediction-Serving Systems: What happens when we
wish to actually deploy a machine learning model to production? Queue, 16(1), 83-97.
• Elders, A., Carroll, M. L., Neigh, C. S., D'Agostino, A. L., Ksoll, C., Wooten, M. R., & Brown, M.
E. (2022). Estimating crop type and yield of small holder fields in Burkina Faso using multi-
day Sentinel-2. Remote Sensing Applications: Society and Environment, 27, 100820.
• Fleming, S. W., Watson, J. R., Ellenson, A., Cannon, A. J., & Vesselinov, V. C. (2021).
Machine learning in Earth and environmental science requires education and research
policy reforms. Nature Geoscience, 14(12), 878-880.
• Prša, A., Kochoska, A., Conroy, K. E., Eisner, N., Hey, D. R., IJspeert, L., ... & Winn, J. N. (2022).
TESS Eclipsing Binary Stars. I. Short-cadence Observations of 4584 Eclipsing Binaries in
Sectors 1–26. The Astrophysical Journal Supplement Series, 258(1), 16.
• Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., & Prabhat.
(2019). Deep learning and process understanding for data-driven Earth system science.
Nature, 566, 195-204.
• Yu, S., & Ma, J. (2021). Deep learning for geophysics: Current and future trends. Reviews of
Geophysics, 59(3), e2021RG000742.


Contributors

• Jordan A. Caraballo-Vega
• Mark L. Carroll
• Jules R. Kouatchou
• Jian Li
• Caleb S. Spradlin
• Brock Blevins
• Melanie Follette-Cook
• Erika Podest
• Brian Powell
• Akiko Elders
