Machine Learning Models
❑ Semi-physical models
❑ Simulation models
❑ Statistical/Empirical: linear / non-linear
Figure: uni-variate linear model vs. ground data, rabi 2017-18 (Jawasiya, Solanki)
Popular machine learning models
❑ Decision trees: Classification and Regression Tree (CART)
❑ Random forest
Random Forest (RF)
❖ Several subsets of the input data (bags) are created: Bootstrap Aggregation (Bagging)
❖ For each of the bags, a very deep decision tree is grown
❖ At every split point in each decision tree, the learning algorithm looks through only a random subsample of the feature space
❖ The split decision is made by comparing the Gini index value of each feature
❖ The final output is the ensembled output of all the decision trees
❖ Less prone to overfitting
Diagram: a 1000×5 input dataset is bootstrapped into 750×5 bags, a decision tree (DT) is grown on each, and the individual outputs (Output 1, Output 2, Output 3) are ensembled into the final output.
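As a minimal sketch (not from the slides; the feature names, data and shapes are synthetic placeholders), a random-forest yield regressor with these ingredients could look like this in scikit-learn:

```python
# Minimal sketch (synthetic data) of a random-forest yield regressor,
# mirroring the bagging and random-feature-subsampling idea described above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
# Hypothetical feature matrix (e.g. Smax_NDVI, Smax_LSWI, rainfall, VH AUC, PAWC)
X = rng.normal(size=(1000, 5))
y = 800 + 300 * X[:, 0] + 100 * X[:, 1] ** 2 + rng.normal(scale=50, size=1000)  # synthetic yield (kg/ha)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

rf = RandomForestRegressor(
    n_estimators=500,        # number of bootstrap bags / trees
    max_features="sqrt",     # random subsample of the feature space at each split
    min_samples_leaf=2,      # minimum leaf size
    oob_score=True,          # out-of-bag check against overfitting
    random_state=0,
)
rf.fit(X_train, y_train)

rmse = mean_squared_error(y_test, rf.predict(X_test)) ** 0.5
print(f"Validation RMSE: {rmse:.0f} kg/ha, OOB R^2: {rf.oob_score_:.2f}")
```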
Neural Network (NN)
❖ Utilizes the principle of universal approximation
❖ Mimics the biological neuron signal transfer (brain)
❖ Comprised of node layers: an input layer, one or more hidden layers, and an output layer
❖ Fully connected or sparsely connected
❖ Each node has an associated weight and threshold
❖ If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network
❖ Learning algorithms optimize the weights and biases of the model: gradient descent, Adam, etc.
Diagram: a 5-3-1 feed-forward network for yield estimation, with inputs x1-x5, weights W1 (5×3), hidden nodes h1-h3 (activations a1-a3), weights W2 (3×1) and a single output node (yield), trained by feed-forward and back-propagation passes.
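A minimal sketch of the 5-3-1 feed-forward pass shown in the diagram, with randomly initialised placeholder weights (a real model would learn them with gradient descent or Adam):

```python
# Minimal sketch of the 5-3-1 feed-forward pass from the diagram.
# Weights are random placeholders, not trained values.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 5))                      # one sample with 5 input features
W1, b1 = rng.normal(size=(5, 3)), np.zeros(3)    # input -> hidden
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)    # hidden -> output

def relu(z):
    # a node "activates" (passes its signal on) only above the threshold of 0
    return np.maximum(0.0, z)

h = relu(x @ W1 + b1)            # hidden-layer activations a1..a3
yield_estimate = h @ W2 + b2     # single output node (yield)
print(yield_estimate)
```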
Which ML model to select?
Feature Selection

Category | Features | Source
Satellite based | Reflectance bands | Sentinel-2, Landsat-8, MODIS
Satellite based | Vegetation indices - greenness (NDVI, EVI, red-edge index) | Sentinel-2, Landsat-8, MODIS
Satellite based | Vegetation indices - wetness (NDWI, LSWI, etc.) | Sentinel-2, Landsat-8, MODIS
Satellite based | Radar backscatter (VH, VV, RVI, etc.) | Sentinel-1, EOS-4
Meteorological | Rainfall, rainy days | IMD gridded, CHIRPS, any other gridded/reanalysis data
Meteorological | Dry-spell / wet-spell | IMD gridded, CHIRPS, any other gridded/reanalysis data

Diagram: crop canopy properties (canopy biomass, canopy wetness, chlorophyll, LAI) and the observing domains - Visible-NIR (NDVI), SWIR (NDWI, LSWI), IR-Thermal (LST, LST_EVI, GPP) and Microwave (radar backscatter).
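The greenness and wetness indices in the table are simple normalised band ratios; a small sketch, with the reflectance values and Sentinel-2 band names assumed purely for illustration:

```python
# Minimal sketch of two of the indices listed above, computed from
# surface-reflectance bands (Sentinel-2 band naming assumed for illustration).
import numpy as np

def ndvi(nir, red):
    # Normalised Difference Vegetation Index (greenness)
    return (nir - red) / (nir + red)

def ndwi(nir, swir):
    # Normalised Difference Water Index (canopy wetness); LSWI uses the same NIR-SWIR form
    return (nir - swir) / (nir + swir)

# Hypothetical reflectance values for three pixels (0-1 scale)
red  = np.array([0.08, 0.10, 0.05])   # B4
nir  = np.array([0.35, 0.30, 0.45])   # B8
swir = np.array([0.20, 0.22, 0.15])   # B11

print("NDVI:", ndvi(nir, red))
print("NDWI:", ndwi(nir, swir))
```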
Crop mapping
❖ Classification accuracies should be higher at the disaggregated level
Figure: crop classification map (K = 0.81)
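Assuming K here denotes the kappa coefficient of the classified map, it can be computed from reference versus predicted labels, for example:

```python
# Sketch: overall accuracy and kappa coefficient for a crop classification,
# using hypothetical reference and predicted labels.
from sklearn.metrics import accuracy_score, cohen_kappa_score

reference = ["wheat", "wheat", "gram", "potato", "wheat", "gram"]
predicted = ["wheat", "gram",  "gram", "potato", "wheat", "gram"]

print("Overall accuracy:", accuracy_score(reference, predicted))
print("Kappa (K):", cohen_kappa_score(reference, predicted))
```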
Crop window
❖ SOS and EOS need to be derived at the disaggregated level
❖ Features should represent the crop growing window
Chart: fortnightly crop calendar (Sep-2FN through Apr-2FN) showing the growing windows of early-wheat, mid-wheat, late-wheat, gram, garlic and potato.
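One common way (not necessarily the one used here) to derive SOS and EOS is an amplitude threshold on the smoothed NDVI profile; a rough sketch with an assumed fortnightly series:

```python
# Rough sketch (illustrative only): deriving start/end of season (SOS/EOS)
# from a fortnightly NDVI profile using a simple amplitude threshold.
import numpy as np

fortnights = ["Oct-1FN", "Oct-2FN", "Nov-1FN", "Nov-2FN", "Dec-1FN",
              "Dec-2FN", "Jan-1FN", "Jan-2FN", "Feb-1FN", "Feb-2FN", "Mar-1FN"]
ndvi = np.array([0.18, 0.20, 0.30, 0.45, 0.62, 0.70, 0.68, 0.55, 0.40, 0.28, 0.20])

threshold = ndvi.min() + 0.2 * (ndvi.max() - ndvi.min())   # 20% of seasonal amplitude
above = np.where(ndvi > threshold)[0]
sos, eos = fortnights[above[0]], fortnights[above[-1]]
print(f"SOS: {sos}, EOS: {eos}")
```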
Need for feature engineering
Figure: example (GP: Guskara, Dist: Purba Bardhaman)
Feature engineering: Example
Five features derived from the NDVI profile:
1. Maximum red: bare soil at sowing / field preparation
2. Maximum positive slope: fastest growth of vegetation
3. Maximum NDVI (average of 3 maxima): maximum biomass
4. Maximum negative slope: fastest reduction of greenness
5. Minimum NDVI (average): harvest, crop or non-green residue
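A small sketch (synthetic NDVI profile; variable names are assumed) of extracting the slope and extremum features above:

```python
# Sketch (synthetic NDVI profile) of the slope/extremum features described above.
import numpy as np

ndvi = np.array([0.15, 0.18, 0.32, 0.50, 0.66, 0.72, 0.69, 0.52, 0.35, 0.22])
slope = np.diff(ndvi)                     # change between consecutive dates

features = {
    "max_positive_slope": slope.max(),               # fastest green-up
    "max_negative_slope": slope.min(),               # fastest reduction of greenness
    "max_ndvi_avg3": np.sort(ndvi)[-3:].mean(),      # average of 3 maxima (peak biomass)
    "min_ndvi": ndvi.min(),                          # bare soil / non-green residue
}
print(features)
```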
Feature engineering: Example
Features derived from the temporal VH backscatter profile:
1. Season maximum VH (Smax_VH)
2. Dynamic range of VH (Range_VH)
3. Area under the VH curve (AUC)
Figure: temporal VH backscatter (dB, roughly -23 to -13) for fields with yields of 332, 544, 1025, 1173 and 1293 kg/ha.
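A corresponding sketch (hypothetical VH time series in dB) of the three VH features:

```python
# Sketch (hypothetical VH series in dB) of the three VH features listed above.
import numpy as np

days  = np.array([0, 12, 24, 36, 48, 60, 72, 84])            # acquisition dates (days after sowing)
vh_db = np.array([-21.5, -20.0, -18.2, -16.5, -15.1, -14.8, -16.0, -18.5])

smax_vh  = vh_db.max()                    # 1. season maximum VH (Smax_VH)
range_vh = vh_db.max() - vh_db.min()      # 2. dynamic range of VH (Range_VH)
# 3. area under the VH curve (trapezoidal rule)
auc_vh = np.sum((vh_db[1:] + vh_db[:-1]) / 2 * np.diff(days))

print(smax_vh, range_vh, auc_vh)
```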
Tuning of Hyper-parameters

ML Model | Hyper-parameters
Random Forest | Number of trees; number of features selected at each node; minimum leaf size

Model training and deployment workflow:
❖ Inputs: temporal VH, weather data (RF & RD), Smax NDVI, Smax LSWI, PAWC
❖ Data preprocessing: gap filling, outlier removal, data normalization, etc.
❖ ML model training & validation: model parameter optimization using 2017-2020 soybean yield data
❖ Model deployment: the current-year data matrix is passed to the trained ML model to generate the soybean crop yield estimate for the current year
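A hedged sketch of tuning the three Random Forest hyper-parameters above with a grid search; the grid values and synthetic data are illustrative only:

```python
# Sketch: grid search over the Random Forest hyper-parameters named above
# (number of trees, features per split, minimum leaf size). Grid values are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))                                  # hypothetical feature matrix
y = 900 + 250 * X[:, 0] + rng.normal(scale=60, size=300)       # synthetic yield (kg/ha)

param_grid = {
    "n_estimators": [100, 300, 500],        # number of trees
    "max_features": ["sqrt", 0.5, None],    # features considered at each node
    "min_samples_leaf": [1, 2, 5],          # minimum leaf size
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, "CV RMSE:", -search.best_score_)
```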
Case Study
Machine Learning based Soybean yield estimation in Maharashtra

Model | Hyper-parameters
DNN architecture | 18-36-18-9-1
Activation function | Leaky ReLU
Learning algorithm | Adam
Learning rate | 0.001
Loss function | RMSE
Regularization | Dropout of 0.2
Batch size | 8

Based on the accuracies observed during the model training process, this architecture was further modified for each cluster.
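A minimal Keras sketch of a network with these hyper-parameters; interpreting 18-36-18-9-1 as 18 input features plus 36/18/9 hidden units and one output, and the placement of the dropout layers, are assumptions on my part:

```python
# Minimal Keras sketch of the DNN hyper-parameters in the table above.
# The 18-36-18-9-1 interpretation and the dropout placement are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

def rmse(y_true, y_pred):
    # RMSE loss as listed in the table
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))

model = models.Sequential([
    layers.Input(shape=(18,)),                        # 18 input features
    layers.Dense(36), layers.LeakyReLU(), layers.Dropout(0.2),
    layers.Dense(18), layers.LeakyReLU(), layers.Dropout(0.2),
    layers.Dense(9),  layers.LeakyReLU(),
    layers.Dense(1),                                  # soybean yield estimate (kg/ha)
])
model.compile(optimizer=optimizers.Adam(learning_rate=0.001), loss=rmse)

# Training call (data pipeline not shown):
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=8, epochs=200)
```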
Case Study
Machine Learning based Soybean yield estimation in Maharashtra

Results: Model Performance

Cluster | Training RMSE (Kg/Ha) | Validation RMSE (Kg/Ha)
Akola-Washim | 228 | 283
Amravati-Yavatmal | 165 | 240
Nagpur-Wardha | 203 | 240
Jalgaon-Buldana | 214 | 287
Parbhani-Jalna-Hingoli | 228 | 259
Latur-Nanded | 187 | 280
Osmanabad-Solapur | 140 | 246
Beed-Ahmednagar-Pune | 231 | 244

Figure: cluster-level results illustrated for Nagpur and Osmanabad.
To Summarize
❖ Machine learning models can capture the non-linear relationships between yield and the features influencing crop yield