Contents

List of Acronyms
List of Figures
List of Tables
List of Algorithms
Abstract
1 Introduction
    1.1 Context and motivation
    1.2 Objectives
    1.3 Related Works
    1.4 Report Structure
2 Theoretical Background
    2.1 Hyperspectral Image
    2.2 Chlorophyll
    2.3 Nitrogen, Phosphorus, Potassium concentration
    2.4 Regression Analysis
    2.5 Principal Component Analysis
    2.6 Machine Learning
        2.6.1 Ridge
        2.6.2 Lasso
        2.6.3 Decision Tree Regression
        2.6.4 Random Forest Regression
        2.6.5 Support Vector Regression (SVR)
        2.6.6 Boosting Algorithm
            2.6.6.1 Adaptive Boosting (AdaBoost)
            2.6.6.2 Extreme Gradient Boosting (XGBoost)
            2.6.6.3 Categorical Boosting (CatBoost)
References
List of Acronyms

DL Deep Learning.
K Potassium.
N Nitrogen.
P Phosphorus.
PCA Principal Component Analysis.
List of Figures

2.1 Images record a reflectance spectrum for each pixel in the image [15]
2.2 The scatter plot shows the relationship between the dependent variable and independent variable [10]
2.3 Illustration of decision tree regression [11]
2.4 The schematic of random forest [14]
2.5 Working of Boosting Algorithms
2.6 The schematic of XGBoost [14]
2.7 Symmetric Tree Architecture of CatBoost
2.8 Leaf-Wise Tree Growth Architecture of LGBM
2.9 The typical architecture of CNN
2.10 The architecture of VGG16
2.11 Residual Learning
2.12 ResNet-50 Architecture
2.13 DenseNet121 architecture [12]
2.14 MobileNetV2 architecture
2.15 EfficientNetB0 architecture [2]
List of Tables

A.1 Table of some good chlorophyll machine learning models with their hyper-parameters
A.2 Table of some good N concentration machine learning models with their hyper-parameters
A.3 Table of some good P concentration machine learning models with their hyper-parameters
A.4 Table of some good K concentration machine learning models with their hyper-parameters
List of Algorithms

3.1 The proposed pseudocode for getting all ROI positions of an HSI
Abstract

Chlorophyll content is one of the most essential elements in the photosynthesis process, and it can be affected by sowing conditions, weather, watering, and other factors. Nitrogen (N) is essential for rice leaf growth and helps produce chlorophyll. Phosphorus (P) is important for root growth and seed development, while Potassium (K) helps the plant resist diseases and stresses. Estimating these nutrients therefore plays an important role in assessing the nutritional quality of rice leaves, which in turn helps farmers adjust how they take care of rice plants. Hyperspectral Images (HSI) captured from an Unmanned Aerial Vehicle (UAV) provide useful information on nutrient concentration to optimize agricultural practices.
This work focuses on Machine Learning (ML) and Deep Learning (DL) models to estimate chlorophyll and N, P, K concentrations using hyperspectral images captured from a UAV. Several machine learning and deep learning models were applied and compared, and the best models were selected using different evaluation metrics, including R2, Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE). These models can help farmers estimate nutrient concentration.
Chapter 1
Introduction
1.1 Context and motivation
Precision Agriculture is a scientific field that uses technologies such as drones, sensors, weather stations and satellite imagery. From the data collected by these devices, farmers receive information about issues such as the environment, weather changes and nutrient concentration in the leaves, so that they can optimize fertilization, irrigation and sowing operations to increase profitability and efficiency.
There are several ways to measure the nutrient content of rice leaves: it is commonly measured by extraction of chlorophyll in a solvent followed by in vitro measurements in a spectrophotometer, or by using non-destructive, in situ, optical techniques. These measurement procedures are often time-consuming, laborious, economically inefficient and non-scalable. In contrast, by observing the rice field with hyperspectral cameras, where each pixel has an associated continuous spectrum, we obtain information on nutrient concentration, so that we can detect things like early symptoms of disease, water and soil quality, and crop health. However, most existing studies have relied on weather data, water quality level, optical flow analysis or multispectral imaging for nutrient prediction. There is a lack of research on using hyperspectral images to predict rice leaf nutrients, especially in regions with a climate similar to Vietnam. Most previous works have focused on different types of leaves, such as oil palm, grapes, broccoli and potatoes. To address these challenges, I applied state-of-the-art supervised learning techniques to estimate the concentration of nutrients in Vietnamese rice leaves using Hyperspectral Images (HSI). Unfortunately, there are no public datasets available for this task using HSI. However, Dr. Tran Giang Son and his team built a dataset by fertilizing a rice field in Phu Tho. They divided the rice field into several replicates, each fertilized with different nutrients, and then captured hyperspectral images using an Unmanned Aerial Vehicle; these images were used as the dataset for the regression models. As a result, we can estimate the nutrient contents at each location.
1.2 Objectives
The objective of this report is to build different models to predict nutrient concentrations from Hyperspectral Images (HSI). This work studied and implemented different machine learning models (such as XGBoost, LGBM and CatBoost) and deep learning models, and then used different evaluation methods for regression problems to analyze which model performs best.
1.3 Related Works
A similar work was conducted by Songtao Ban et al.; in that study, they acquired images from two different regions using multispectral and hyperspectral cameras and analyzed which vegetation indices were highly correlated with rice leaf chlorophyll content (LCC). The study areas are in Ningxia and Shanghai. Using eight vegetation indices, they achieved an R2 score of 0.9, an RMSE of 1.63 and a MAPE of 4.13% with SVR models on the Shanghai calibration dataset [4]. In another study, Xiaokai Chen et al. analyzed canopy chlorophyll using spectral transformation and machine learning methods [5]. Bogdan Ruszczak worked on unbiasing the estimation of chlorophyll from hyperspectral images: for the SPAD parameter, the R2 score is 0.818, the MAPE is 7.2% and the MSE is 9.583; for FvFm, Ridge achieves R2 = 0.727, a MAPE of 3.6% and an MSE of 0.001, and Ridge is also the best for the PI and RWC parameters [13].
Sulaymon Eshkabilov et al. estimated nutrient concentrations in lettuce based on HSI data, achieving a mean R2 of 0.911 for hydroponics and 0.877 for NFT-grown cultivars [6]. There are not many related works on estimating phosphorus concentration, though. Megan Io Ariadne Abenina et al. predicted potassium in peach leaves using HSI; they applied pretreatment methods before PLS prediction, obtaining an R2 of 0.3479 for original-PLS, while SNV-PLS gave the highest R2 of 0.8446 with an RMSE of 0.2917 [1].
• Chapter 5: Conclusions drawn from the results obtained and proposals for future work.
Chapter 2
Theoretical Background
2.1 Hyperspectral Image
Hyperspectral remote sensing is the activity of collecting images in many narrow spectral bands. A Hyperspectral Image (HSI) contains information collected and processed across the electromagnetic spectrum to obtain a spectrum for each pixel in the image (usually capturing light in the range from 400 nm to 2500 nm, including near infrared (NIR) and short wave infrared (SWIR)).
Hyperspectral imaging involves using a device called an imaging spectrometer (hyperspectral camera, hyperspectral sensor) to collect spectral information. A hyperspectral camera captures the light of an area and divides it into individual wavelengths or spectral bands. It offers a two-dimensional representation of the area while concurrently storing the spectral data of each pixel. The result is a hyperspectral image in which each pixel represents a unique spectrum. Since the materials and compounds at a pixel interact with light in distinct ways, they have unique spectral signatures that can be used to identify them.
Hyperspectral image analysis examines the spectral response to detect and classify features or objects in images based on their unique spectra. It provides both spatial and spectral information about an object's physical and chemical properties. The spectral information allows for the identification and classification of the distribution of materials, such as P concentration, K concentration, etc., or areal separation. Hyperspectral imaging helps us answer the questions "what" (based on the spectrum), "where" (based on location), and "when".
A hyperspectral image contains multiple spectra that form a massive hyperspectral data cube comprising position, wavelength and time-related information. Compared to hyperspectral imaging, multispectral imaging can acquire only a relatively small number of bands (fewer than 10) with broad spectral bands (about 100 nm bandwidth), whereas hyperspectral imaging provides more information for more accurate analysis: it can provide hundreds of bands with narrow wavelength spacing between them [16].
Figure 2.1 – Images record a reflectance spectrum for each pixel in the image[15]
Hyperspectral imaging can be used for various applications, such as environmental monitoring (monitoring changes in land use, vegetation health and water quality) and precision agriculture (assessing crop health, monitoring soil moisture and nutrient concentration to optimize crop management and crop yields in practice) [7].
2.2 Chlorophyll
Chlorophyll plays an essential role in the photosynthetic process and is used to analyze vegetation stress, nutrient cycling, productivity, growth stages and diseases. It is a green pigment found in photosynthetic bacteria, algae, and plants. In plants, chlorophyll drives photosynthesis by absorbing light energy and converting it into chemical energy. Chlorophyll is called the central pigment of the photosynthesis reaction because it can accept the light absorbed by other pigments during photosynthesis. Chlorophyll content is one indicator of photosynthetic activity, and it also plays a role in the process of plant organogenesis [18].
2.3 Nitrogen, Phosphorus, Potassium concentration
Nitrogen (N), Phosphorus (P) and Potassium (K) are the three primary nutrients in commercial fertilizers. All three of these fundamental nutrients play an important role in plant growth and development [8].
Nitrogen is necessary for keeping plants healthy. It is essential in the constitution of proteins, and proteins appear in the tissue of most living things. Nitrogen also affects the plant's organic structure and physiology; therefore, a nitrogen deficit may affect the structure and function of photosynthesis [8].
Phosphorus is related to the plant's ability to use and store energy, including in the process of photosynthesis, and helps the plant grow normally. When P is insufficient, leaf growth decreases and photosynthesis and carbon metabolism change [8].
Potassium is the most abundant cellular cation and has a key role in cellular activities such as charge balance and membrane protein transport [8]. Therefore, it helps plants resist disease. It also increases crop yields and plant quality. K protects the plant from cold or dry weather, strengthens the roots and prevents wilting.
2.4 Regression Analysis
Regression analysis is one of the typical tasks in machine learning and deep learning. The task is to predict a numeric value (the dependent variable or outcome), such as the price of a car, given a set of features (independent variables) such as mileage, age, brand, etc. In the context of this thesis, the goal is to predict nutrient concentration (dependent variable) based on the pixel values in each band (independent variables).
Figure 2.2 – The scatter plot shows the relationship between the dependent variable and
independent variable [10]
To quantify the error a system makes when predicting in regression, some typical performance measures are the mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and R square (R2).
2.5 Principal Component Analysis
Principal Component Analysis (PCA) is a technique used to highlight the importance of each new feature. With hundreds of features, redundancy makes the representation suboptimal. With PCA, we can reshape the data into a new set of variables (principal components) which are uncorrelated and ordered so that the first few retain most of the variation present in all the original variables.
Firstly, we standardize the dataset by calculating the mean and standard deviation of each feature. Then, we apply the formula below to each value of each feature:

$$x_{new} = \frac{x - \mu}{\sigma} \quad (2.1)$$
Next, we calculate the covariance matrix. Covariance and variance measure how "spread out" a set of points is around its center of mass (mean). Covariance is measured between two dimensions to see whether there is a relationship between them:

$$Cov(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \quad (2.2)$$

where $x_i$ and $y_i$ are the individual values, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of samples.
Using the above formula, we construct the covariance matrix. For example, with three variables X, Y and Z:

$$A = \begin{pmatrix} Cov(X,X) & Cov(X,Y) & Cov(X,Z) \\ Cov(Y,X) & Cov(Y,Y) & Cov(Y,Z) \\ Cov(Z,X) & Cov(Z,Y) & Cov(Z,Z) \end{pmatrix} \quad (2.3)$$
The diagonal contains the variances of X, Y and Z, and the matrix is symmetric about the diagonal. Next, we calculate the eigenvectors and eigenvalues: eigenvectors are the principal components and eigenvalues give the percentage of information (variance) explained. The formula for finding eigenvectors and eigenvalues is:
Av = λv (2.4)
• A is the matrix
• v is Eigenvector
• λ is Eigenvalue
Av − λv = 0, i.e. (A − λI)v = 0.
Since v is a non-zero vector, this equation can hold only when det(A − λI) = 0. We solve this equation for λ, and then find the vector v corresponding to each λ.
After finding the eigenvalues and their corresponding eigenvectors, we sort them from highest to lowest eigenvalue. We then pick the top k eigenvalues and form a matrix from their eigenvectors. Finally, we use this matrix to transform the original data.
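As an illustration of the steps above, here is a minimal NumPy sketch of PCA; the array X (samples × features) and the number of retained components k are placeholders, not data from this work.

```python
import numpy as np

def pca_transform(X, k):
    """Project data X (n_samples x n_features) onto its first k principal components."""
    # 1. Standardize each feature to zero mean and unit variance
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition (the covariance matrix is symmetric, so eigh applies)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort eigenvalues (and matching eigenvectors) from highest to lowest
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 5. Keep the top-k eigenvectors and project the standardized data onto them
    explained_ratio = eigenvalues[:k].sum() / eigenvalues.sum()
    return X_std @ eigenvectors[:, :k], explained_ratio

# Example with random data standing in for the band features
X = np.random.default_rng(0).random((100, 12))
scores, ratio = pca_transform(X, k=3)
print(scores.shape, f"variance explained: {ratio:.2%}")
```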
2.6 Machine Learning
2.6.1 Ridge
Ridge regression is similar to linear regression, but it adds another term called the penalty term. This penalty term, also called the L2 penalty, provides regularization. It shrinks the weights of the model towards zero to ensure that the model does not over-fit the data. The cost function of Ridge regression can be written as:
$$\text{cost}(w) = \frac{1}{2n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda\sum_{j=1}^{D} w_j^2 \quad (2.6)$$
However, ridge regression does not reduce the number of variables because it never drives a coefficient exactly to zero; it only makes it small. Therefore, this model is not suitable for feature reduction, but it works well when all features are important.
2.6.2 Lasso
Lasso regression is similar to ridge regression. It is also known as L1 regularization. Its purpose is likewise to shrink the coefficients towards zero; however, lasso penalizes the magnitude (absolute value) of the coefficients:
$$\text{cost}(w) = \frac{1}{2n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda\sum_{j=1}^{D} |w_j| \quad (2.7)$$
If we have two or more highly collinear variables, lasso regression selects one of them at random, which makes it less suitable for data interpretation. It is, however, a good choice when some features are irrelevant or redundant.
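A minimal scikit-learn sketch of the two penalized regressions described above; the synthetic data and the regularization strengths (alpha) are illustrative assumptions, not the values used in this work.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the 122 band features
X, y = make_regression(n_samples=300, n_features=122, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

ridge = Ridge(alpha=1.0).fit(X_train, y_train)   # L2 penalty: shrinks weights towards zero
lasso = Lasso(alpha=0.1).fit(X_train, y_train)   # L1 penalty: can set weights exactly to zero

print("Ridge R2:", ridge.score(X_test, y_test))
print("Lasso R2:", lasso.score(X_test, y_test))
print("Lasso coefficients set to zero:", (lasso.coef_ == 0).sum())
```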
2.6.3 Decision Tree Regression
A regression tree is a kind of decision tree used for regression problems, i.e. for predicting continuous values. Building a regression tree involves two steps:
— Partition the predictor space (the set of possible values of the feature variables) into separate, non-overlapping regions.
— For each observation that falls into a region, predict the mean of the response values of the training observations in that region.
For a decision tree regressor, the residual sum of squares (RSS) measures how much the predictions deviate from the original target values, and the goal is to divide the space in a way that minimizes the RSS:
$$RSS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \quad (2.8)$$
— There are several ways to avoid overfitting in regression trees; the simplest is to split a node only when it contains more than some minimum number of observations.
2.6.4 Random Forest Regression
Random Forest is an ensemble technique used in both regression and classification tasks. It combines multiple decision trees with a technique called Bootstrap Aggregation (bagging).
Implementation of bagging (see the sketch at the end of this subsection):
Step 1: Multiple subsets are created from the original dataset with an equal number of tuples, selecting observations with replacement.
Step 2: A base model (weak learner) is built on each of these subsets.
Step 3: Every model is learned independently of the others, in parallel, on its own training subset.
Step 4: The final predictions are determined by combining the predictions from all the models.
Basically, Random Forest Regression uses multiple decision trees as base learning models. It randomly performs row sampling and feature sampling from the dataset to form the training data for each model, and it combines multiple decision trees to determine the final output rather than relying on individual decision trees.
Random Forest has several advantages: it is less sensitive to the training data than a single decision tree, it is more accurate, and it is effective in handling large datasets as well as missing data, outliers and noisy features.
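A short scikit-learn sketch contrasting a single regression tree with a bagged ensemble of trees (random forest); the hyperparameter values and the synthetic data are placeholders, not the tuned values reported later.

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=122, noise=10.0, random_state=0)

# A single tree: overfitting is limited only by max_depth / min_samples_split
tree = DecisionTreeRegressor(max_depth=6, min_samples_split=10, random_state=0)

# A random forest: bagging over many trees with row sampling and feature sampling
forest = RandomForestRegressor(n_estimators=200, max_features="sqrt", random_state=0)

print("Tree   CV R2:", cross_val_score(tree, X, y, cv=5, scoring="r2").mean())
print("Forest CV R2:", cross_val_score(forest, X, y, cv=5, scoring="r2").mean())
```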
2.6.5 Support Vector Regression (SVR)
The Support Vector Machine (SVM) is a method that has been used in a wide range of fields. Support Vector Regression (SVR) is the variant of the SVM used for regression. The strategy is to minimize the error while finding a hyperplane that maximizes the margin:
$$\min \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}(\xi_i + \xi_i^*) \quad (2.9)$$

such that
• $y_i - wx_i - b \leq \varepsilon + \xi_i$
• $wx_i + b - y_i \leq \varepsilon + \xi_i^*$
• $\xi_i, \xi_i^* \geq 0$
The prediction function obtained after solving the above optimization problem is:
$$y = \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)K(x_i, x) + b \quad (2.11)$$
where $K(x_i, x) \equiv \langle \varphi(x_i), \varphi(x) \rangle$. The kernel function transforms the data into a higher-dimensional feature space in which linear separation becomes possible.
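A minimal SVR sketch with scikit-learn, assuming an RBF kernel and illustrative values of C, epsilon and gamma; the inputs are standardized first because SVR is scale-sensitive.

```python
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=122, noise=10.0, random_state=0)

# epsilon defines the tube around the regression function, C trades off flatness
# against violations, and the RBF kernel performs the implicit mapping to a
# higher-dimensional feature space.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale"))
model.fit(X, y)
print("Predictions for the first three samples:", model.predict(X[:3]))
```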
2.6.6 Boosting Algorithm
During the training process, a boosting regressor assigns weights to the training examples based on their importance. Examples that were previously mis-predicted or have a higher error are assigned higher weights, so that subsequent weak learners focus more on them. This adaptive weighting scheme allows the boosting regressor to pay more attention to challenging examples and potentially improve overall performance.
Gradient descent optimization is used to minimize the loss function in boosting algorithms. The optimization process aims to find the best weights and parameters for each weak learner so as to minimize the overall loss.
2.6.6.1 Adaptive Boosting (AdaBoost)
AdaBoost (short for Adaptive Boosting) is a boosting algorithm that works by first fitting a weak regressor (typically a decision tree) on the training dataset; the weak learner is trained to minimize the weighted error (the weights reflect the importance of each training example). After training the weak learner, its performance is evaluated by calculating the weighted error, which measures how well it predicts the target values. The algorithm then "re-weights" the training samples based on the performance of the weak learner: samples that were incorrectly predicted are assigned higher weights so that the next weak learner focuses on them in the subsequent iterations, while correctly predicted examples are assigned lower weights. This process is repeated for a predefined number of iterations. In each iteration, the algorithm assigns a weight to the weak learner added to the ensemble based on its performance: learners with better performance receive a higher weight in the final prediction.
To make a prediction on new samples, the AdaBoost regressor combines the predictions of all weak learners, weighted by their individual weights; the final prediction is the weighted sum of the predictions from each weak learner.
Input: X - training features, y - target values, num_estimators - number of weak learners (iterations)
Output: Ensemble model

Initialize the sample weights w_i = 1/N, where N is the number of training examples;
for t = 1 to num_estimators do
    Train a weak learner h_t on the training data using the weights w_i;
    Compute the weighted error e_t:
        e_t = (sum_{i=1..N} w_i |y_i - h_t(X_i)|) / (sum_{i=1..N} w_i)
    Compute the weak learner weight alpha_t:
        alpha_t = (1/2) ln((1 - e_t) / e_t)
    Update the sample weights, for each training example (X_i, y_i):
        w_i = w_i * exp(-alpha_t (y_i - h_t(X_i)))
    Normalize the weights:
        w_i = w_i / sum_{i=1..N} w_i
end
return Ensemble Model = sum_{t=1..num_estimators} (alpha_t × h_t(X))
AdaBoost has many advantages: it is fast, simple and easy to use. However, it can also be vulnerable to noise and may overfit the data.
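A usage sketch of AdaBoost for regression with scikit-learn. The weak learner depth and the other hyperparameters are illustrative; depending on the scikit-learn version, the weak learner is passed as estimator or base_estimator.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=122, noise=10.0, random_state=0)

# Weak learner: a shallow decision tree; AdaBoost re-weights the samples after each round
ada = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),
    n_estimators=250,
    learning_rate=0.01,
    random_state=0,
)
ada.fit(X, y)
print("Training R2:", ada.score(X, y))
print("Weights of the first five weak learners:", ada.estimator_weights_[:5])
```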
2.6.6.2 Extreme Gradient Boosting (XGBoost)
XGBoost is a popular library implementing the gradient boosted trees algorithm, which tries to accurately predict a target by combining the estimates of a set of simpler, weaker models.
Input: X - training features, y - target values, num_estimators - number of weak learners (iterations), learning_rate - learning rate controlling the contribution of each weak learner, max_depth - maximum depth of each weak learner (decision tree)
Output: Ensemble model

Initialize F_0(x) = initial_prediction = mean(y)
for t = 1 to num_estimators do
    Compute the first derivative (gradient) and second derivative (hessian) of the loss function for each training example (x_i, y_i):
        g_i = ∂L(y_i, F(x_i)) / ∂F(x_i)
        h_i = ∂²L(y_i, F(x_i)) / ∂F(x_i)²
    where L(y_i, F(x_i)) is the MSE loss function.
    Train a weak learner m_t on the gradients g_i and hessians h_i:
        - g_i indicates the direction and magnitude of the change in the loss function needed to minimize it when the ensemble predictions change.
        - h_i measures the curvature of the loss function and provides additional information about the rate of change of the gradients; it helps adjust the contribution of the weak learners based on this curvature.
        - The parameter max_depth controls the depth of the weak learner. Fit the weak learner to the training data and obtain the prediction m_t(x).
    Update the ensemble model:
        F_t(x) = F_{t-1}(x) + learning_rate × m_t(x)
end
Output: Ensemble Model = F_t(x) with t = num_estimators.
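In practice the algorithm above is not implemented by hand; the xgboost library provides a scikit-learn-compatible regressor. A minimal usage sketch follows; the hyperparameter values and the synthetic data are illustrative only.

```python
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=122, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBRegressor(
    n_estimators=500,                 # number of boosting rounds (weak learners)
    learning_rate=0.05,               # shrinks the contribution of each tree
    max_depth=6,                      # depth of each weak learner
    objective="reg:squarederror",     # MSE loss, whose gradient/hessian drive the fit
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print("Validation R2:", model.score(X_val, y_val))
```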
2.6.6.3 Categorical Boosting (CatBoost)
CatBoost is a relatively new open-source machine learning algorithm, developed in 2017 by Yandex. One of CatBoost's main advantages is its ability to work with a wide variety of data types (such as images, audio, etc.). Its strengths include its handling of categorical data, requiring a minimum of categorical feature transformation (unlike many other machine learning algorithms, which cannot handle non-numeric values), and its gradient-based optimization and regularization techniques. Combining these techniques builds a strong regressor.
CatBoost is built on decision trees and gradient boosting. Specifically, CatBoost grows symmetric trees, which means the trees are grown by imposing the rule that all nodes at the same level test the same predictor with the same condition; hence the index of a leaf can be calculated with bit-wise operations. The symmetric tree makes for a simple fitting scheme and efficient use of CPUs, while the tree structure still finds a good solution and helps avoid overfitting.
• Data Preparation: CatBoost can handle both numerical and categorical features directly
without encoding. For categorical features, it uses "Ordered Target Encoding" technique
(utilizes the target variable’s statistical information).
• Initialize Model and Prediction: Define model hyperparameter and an empty ensemble of
decision trees, then use the mean of the target variable as the initial prediction.
• Gradient Calculation: Compute the negative gradients (residuals) between the true target
values and the current predictions. These gradients represent the direction and magnitude
of the errors.
• Decision Tree Construction: Use a symmetric tree growth strategy and build trees level by level, similar to other gradient boosting algorithms.
• Gradient Boosting: Update the predictions by adding the predictions of the newly created decision tree, multiplied by a learning rate to control the contribution of each tree.
• Repeat steps 3 to 6 for a specified number of iterations. For each iteration, a new decision
tree is built to correct the errors made by the ensemble so far.
• The final prediction is obtained as a weighted sum of the predictions from all the decision trees in the ensemble.
The main differences between LGBM and CatBoost are the following (a usage sketch of both libraries follows this list):
• LGBM requires categorical features to be preprocessed and encoded as numerical values before training, whereas CatBoost has built-in support for handling categorical features using an algorithm called Ordered Target Encoding.
• LGBM provides options to handle missing values, while CatBoost can automatically handle missing values in both categorical and numerical features.
• LGBM uses a leaf-wise tree growth strategy, where it grows the tree by expanding the leaf with the highest loss reduction, and employs techniques like histogram-based binning; CatBoost uses a symmetric tree growth strategy and builds trees level by level.
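A minimal sketch showing how both libraries are called in their default regression setting; the iteration counts, learning rates and synthetic data are illustrative assumptions.

```python
from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=122, noise=10.0, random_state=0)

# CatBoost: symmetric (oblivious) trees grown level by level
cat = CatBoostRegressor(iterations=300, learning_rate=0.05, depth=6, verbose=0)

# LightGBM: leaf-wise tree growth with histogram-based binning
lgbm = LGBMRegressor(n_estimators=300, learning_rate=0.05, num_leaves=31)

cat.fit(X, y)
lgbm.fit(X, y)
print("CatBoost R2:", cat.score(X, y), "| LGBM R2:", lgbm.score(X, y))
```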
2.7 Deep Learning
2.7.1 Convolutional Neural Network (CNN)
The Convolutional Neural Network (CNN/ConvNet) is a type of deep learning architecture. The main purpose of a ConvNet is to reduce the images into a form that is easier to process, without losing the important features required for good prediction. CNNs use fewer parameters (weights) than a fully connected network, they are designed to be invariant to object position and distortion of the scene, and they can automatically learn and generalize features from the input domain. The main building blocks are the following (a minimal Keras sketch follows this list):
• Convolutional layers: The element first involved in the convolution is called the kernel (filter). The kernel shifts to the right with a given stride length until it has parsed the complete width. At each position, it performs an element-wise multiplication between the kernel and the corresponding portion of the image. The objective of the convolution operation is to extract features: the first convolution layer is responsible for extracting low-level features, and adding more layers lets the architecture learn high-level features as well.
• Pooling layers: used for reducing the spatial size of the convolved features, i.e. reducing the dimensionality. Common types of pooling are max pooling and average pooling.
• Fully connected layer: the output is flattened and fed to a neural network.
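A minimal Keras sketch of such a ConvNet for regression on 32×32 hyperspectral patches with 122 bands; the layer sizes are illustrative and are not the architectures evaluated later.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Convolution + pooling layers extract features; a dense head outputs a single value.
model = models.Sequential([
    layers.Input(shape=(32, 32, 122)),
    layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="linear"),   # regression output
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse", metrics=["mae"])
model.summary()
```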
2.7.2 VGG16
VGGNet was developed by the Visual Geometry Group at the University of Oxford. VGG16 is very simple and classical but can be considered one of the best computer vision models. It has 2 or 3 convolution layers followed by a pooling layer, then again 2 or 3 convolution layers and a pooling layer, and so on (until reaching 16 or 19 weight layers, depending on the variant), and finally a dense network with hidden layers and the output.
The limitations of VGG16 are that it can be very slow to train and has a large number of parameters.
2.7.3 Resnet50
ResNet50, introduced by Microsoft Research in 2015, is another kind of CNN which has 50 layers; ResNet is short for "residual networks". This model showed that computer vision models can be made deeper while using fewer parameters. The key to training ResNet is the skip connection, which skips over some layers of the model. When you have a neural network whose purpose is to model a target function h(x), adding the input x to the output of the network (the skip connection) forces it to model f(x) = h(x) − x instead of h(x). This is residual learning.
The ResNet-50 architecture can be divided into six parts: input pre-processing, Cfg[0] blocks, Cfg[1] blocks, Cfg[2] blocks, Cfg[3] blocks and the fully connected layer, as illustrated in Figure 2.12.
2.7.4 DenseNet121
As the number of CNN layers gets deeper, the vanishing gradient problem appears: certain information gets lost or vanishes, which prevents the network from training effectively. DenseNet addresses this with the following ideas:
• Connectivity: the feature maps of all the previous layers are not summed but concatenated and used as inputs. Hence, the l-th layer receives the feature maps of all preceding layers x_0, ..., x_{l−1} as input: x_l = H_l([x_0, x_1, ..., x_{l−1}]), where [x_0, x_1, ..., x_{l−1}] is the concatenation of the feature maps. The multiple inputs of H_l are concatenated into a single tensor to ease implementation.
• Growth Rate: the size of the feature map grows after each dense layer, with each layer adding k feature maps on top of the existing ones. The parameter k is the growth rate of the network. If each function H_l produces k feature maps, then the l-th layer has k_l = k_0 + k(l − 1) input feature maps, where l is the layer index and k_0 is the number of channels in the input layer.
2.7.5 MobileNetV2
MobileNetV2 is a lightweight CNN architecture based on depthwise separable convolutions and inverted residual blocks with linear bottlenecks, designed for efficient inference on mobile and resource-constrained devices.
2.7.6 EfficientNetB0
EfficientNet uses the idea of compound scaling: instead of scaling only one model attribute among depth, width and resolution, the strategy is to scale all three of them together to get a better result. The scaled attributes are computed as:
• depth: d = α^φ
• width: w = β^φ
• resolution: r = γ^φ
such that α·β²·γ² ≈ 2, with α ≥ 1, β ≥ 1, γ ≥ 1.
The EfficientNetB0 network includes a stem convolutional layer, multiple MBConv blocks, global average pooling and a fully connected layer for performing regression or classification. EfficientNetB0 is the base model of the EfficientNet family.
EfficientNetB0 is designed to be both accurate and computationally efficient thanks to this scaling approach.
Chapter 3
3.1 Material
The field experiment was conducted by Dr. Tran Giang Son and his ICTlab team in Phu Tho province. The land in this area is used mainly by smallholder farmers, and the most common crop cultivated there is rice. In this thesis, the dataset was collected on 6 May 2022. There are two main rice cultivars: the left part of the field uses cultivar TBR225 and the right part uses J05, with each plot numbered from right to left and from top to bottom. In total, there are 27 plots of cultivars TBR225 and J05. In addition, cultivar bc15 was planted in 5 different locations at the edge of the rice field. Each plot is a square field with a side of 10 m. The spatial distribution of the plot design is illustrated in Figure 3.1.
The Hyperspectral Images (HSI) were collected with a Hyperspectral Camera OCI-F attached to a UAV DJI Matrice 600 Pro, which can capture images with 120 channels in the visible and near-infrared wavelengths (400 nm-1000 nm). This camera captures images using the push-broom method and its scanning width is 800 pixels. Two hyperspectral images were collected on 6 May 2022 and stored in the ENVI standard format. The details of the Hyperspectral Image files are as follows:
– description, samples, lines, bands, header offset, file type, data type, interleave, sensor
type, byte order, system type, main file format, gps file included
– map info: shows geographic information of the Hyperspectral Image
From the information provided by the metadata, we know that one HSI has a resolution of 2 cm per pixel and dimensions of 10254 × 11687 × 122, while the other has a resolution of 3 cm per pixel and dimensions of 7655 × 7347 × 122. These two images contain 122 contiguous spectral bands with wavelengths ranging from 410 nm to 958 nm, with a step of about 4 nm between bands.
In addition, a CSV file was provided containing the following information: Code, Northing, Easting, Height, Latitude, Longitude, Elevation, Date, Time, Chlorophyll, Rice height, N Concentration, P Concentration, K Concentration, etc. Within the scope of this thesis, what we need is the geographical position of each sample (code, longitude, latitude, northing, easting) and its nutrient values, namely N concentration, P concentration, K concentration and chlorophyll content.
From the information in the .csv file, there are three positions of interest in each plot. Each position received different irrigation and fertilizers, so the rice leaves have different nutrition. In total, 171 rice leaves from different regions were collected and analyzed.
3.2 Methodology
The HSI captures the entire region of the field, but the nutrient concentration was measured only in some regions. What we need to do is extract the Regions Of Interest (ROI) and merge them with the measured concentrations, used as ground truth, based on the field code. Then, we apply some preprocessing techniques to remove redundant data and make the dataset more suitable for precise prediction. The next step is to find and study appropriate learning algorithms, apply them to the dataset, and find the optimal hyperparameters with respect to the statistical metrics. All of these models are then compared and assessed, and the process is iterated until the best model according to these metrics is found. All of these steps are illustrated in Figure 3.2.
The HSI file contains metadata, so we can obtain geographic information from the attribute "map info". In this attribute, what interests us are the key-value pairs for pixel easting, pixel northing, and the x and y pixel sizes.
Easting and Northing are the terms used for the geographic Cartesian coordinates (projected coordinate system) of a point. They are used instead of latitude and longitude, which are spherical coordinates and are hard to use for determining the position of an ROI in the planar HSI. Pixel easting refers to the eastward-measured distance (the x-coordinate) of the HSI and pixel northing refers to the northward-measured distance (the y-coordinate). In short, they represent the position of the HSI's geographic coordinates on the planar earth map. This position is used as the origin when determining the position of an ROI.
The other piece of information we need is the geographic Cartesian coordinates of the ROI. Thanks to the Department of Space and Applications at USTH for converting the unprojected positions in longitude and latitude to planar coordinates, so that we can get the exact planar geographic position of each field code.
The position of an ROI in the HSI is then the distance between the position of the ROI on the planar earth map and the origin (the position of the HSI on the planar earth map). After obtaining this distance, we need to scale it so that it fits the HSI size, by dividing it by the x pixel size and y pixel size.
Based on this idea, the proposed pseudocode below is used for getting the list of all ROI positions in a hyperspectral image.
Input: csv_file_path - the .csv file containing the related information of the ROI, including their geographic positions
       header_file_path - the header file (.hdr) of the HSI containing the map info
Output: The list of all ROI positions in the HSI

field_pd = READ_CSV(csv_file_path)
header = READ_ENVI_HEADER(header_file_path)
field_east = (field_pd["East"] − header["map info"]["Pixel easting"]) / header["map info"]["x pixel size"]
field_north = −(field_pd["North"] − header["map info"]["Pixel northing"]) / header["map info"]["y pixel size"]
coordinate = {"east": field_east, "north": field_north}
coordinate_df = CONVERT_TO_DATAFRAME(coordinate)
coordinate = CONVERT_TO_NUMPY_ARRAY(coordinate_df)
return coordinate

Algorithm 3.1 – The proposed pseudocode for getting all ROI positions of an HSI
This pseudocode can easily be implemented using the spectral, pandas and numpy libraries in Python (a possible implementation is sketched below). What we need is a CSV file containing the planar geographic information of the ROI and the header file (.hdr) of the hyperspectral image. We then compute the coordinates in the HSI and convert the result into an array of (x, y) coordinates in the hyperspectral image.
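A possible Python implementation of Algorithm 3.1, given as a sketch only: it assumes the CSV columns are named "East" and "North" and that the ENVI "map info" field follows the standard order (projection, reference pixel x/y, pixel easting, pixel northing, x pixel size, y pixel size, ...); the actual column names and indices may differ.

```python
import numpy as np
import pandas as pd
import spectral.io.envi as envi

def get_roi_positions(csv_file_path, header_file_path):
    """Return an array of (x, y) pixel coordinates of all ROI in the HSI."""
    field_pd = pd.read_csv(csv_file_path)
    header = envi.read_envi_header(header_file_path)

    # Assumed standard ENVI 'map info' order:
    # [projection, ref x, ref y, pixel easting, pixel northing, x size, y size, ...]
    map_info = header["map info"]
    pixel_easting, pixel_northing = float(map_info[3]), float(map_info[4])
    x_pixel_size, y_pixel_size = float(map_info[5]), float(map_info[6])

    # Distance from the HSI origin, scaled by the pixel size; the northing axis is flipped
    field_east = (field_pd["East"] - pixel_easting) / x_pixel_size
    field_north = -(field_pd["North"] - pixel_northing) / y_pixel_size

    coordinate_df = pd.DataFrame({"east": field_east, "north": field_north})
    return coordinate_df.to_numpy()
```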
Knowing these coordinates, we can map all these positions onto the image by pinpointing them, in order to verify that the process works correctly. To draw the points, the OpenCV functions cv2.circle(), cv2.putText() and cv2.imwrite() can be used.
Figure 3.3 shows all the ROI of interest drawn on the hyperspectral image so that we can check whether they are in the correct positions.
Figure 3.3 – The ROI’s spatial distribution of 3 cm per pixel image after extracting RGB
channels
After confirming that these ROI are correctly mapped onto the Hyperspectral Image, the next step is to get all the pixel values of the ROI as input for the learning models.
The spectral library provides a convenient interface for reading pixel values from the ENVI format. We can use methods such as .read_pixel() to get the pixel values of all bands at a designated coordinate, or slice the pixel values of an ROI in the same way as a numpy array.
For the machine learning models, we extracted the exact pixel value at each coordinate to create a patch of 1×1. Therefore, we have a dataset where each band, from channel 1 to channel 122, becomes a feature; each feature holds the pixel value at the sample's position. For the deep learning models, we extracted the small region around each coordinate, creating a patch of 32×32 with 122 channels. We had 2 hyperspectral images, each with 171 points found, so in total we have 342 samples.
Before feeding this dataset to the learning models, some normalization techniques are applied to improve the performance and training stability of the models. We also need to remove samples with null nutrient values. After that, we normalized the dataset using the z-score technique for machine learning, and converted the value range of 0-255 into the range of 0-1 for deep learning.
PCA
Because of the high dimensionality of HSI, PCA is used for band reduction to decrease redundancy and increase the models' efficiency [9]. Each principal component explains part of the variance. For the machine learning models, we would like to see whether reducing the number of bands gives better performance. To choose how many principal components to keep, we need the explained variance of each component and the plot of the cumulative variance of the 1×1 dataset used for the machine learning algorithms.
From the cumulative variance, we see that to explain 90% of the variance we need 60 components. Therefore, we define 90% as the cut-off threshold, and when PCA is used to reduce the dimension of the dataset for the machine learning models, the number of principal components is set to 60.
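A minimal sketch of this component selection with scikit-learn; the random matrix stands in for the preprocessed 1×1 band dataset, and the 90% threshold is the cut-off described above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the preprocessed 1x1 pixel dataset (273 training samples x 122 bands)
X_train = np.random.default_rng(0).random((273, 122))

pca_full = PCA().fit(X_train)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components reaching the 90% cut-off threshold
n_components = int(np.argmax(cumulative >= 0.90) + 1)
print("Components needed for 90% of the variance:", n_components)

# Refit with the chosen number of components and project the data
pca = PCA(n_components=n_components).fit(X_train)
X_train_pca = pca.transform(X_train)
```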
Hyperparameter Optimization
In machine learning, hyperparameters are parameters that must be set before training in order to configure a ML model and reduce its loss function. Hyperparameter tuning is the process of finding optimal hyperparameters. Manual tuning is not a good choice because complex models have many hyperparameters and model evaluation is time-consuming. Therefore, many HPO techniques have been researched to automate hyperparameter tuning and make it effective in practical problems [17]. In this work, grid search is used for Ridge, Lasso, Decision Tree Regression and Random Forest Regression. Grid search first searches a large space with a coarse step, then narrows down around the previous result until an optimum is found. Grid search is not the right choice for high-dimensional hyperparameter spaces because of its O(n^k) complexity [17]. For XGBoost, CatBoost and LGBM, which have large hyperparameter search spaces, we use Optuna, an optimization framework based on the "define-by-run" principle [3].
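A minimal Optuna sketch for tuning a boosting regressor with 5-fold cross-validation as the objective; the search ranges, the number of trials and the synthetic data are illustrative assumptions, not the search spaces used in this work.

```python
import optuna
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=122, noise=10.0, random_state=0)

def objective(trial):
    # Define-by-run: the search space is declared while the objective executes
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.1, 1.0),
    }
    model = xgb.XGBRegressor(objective="reg:squarederror", **params)
    # 5-fold cross-validated R2 is the value Optuna maximizes
    return cross_val_score(model, X, y, cv=5, scoring="r2").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print("Best hyperparameters:", study.best_params)
```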
After the dataset is processed, it is used as the input of the machine learning models. There are 122 features corresponding to the 122 bands of the image, or 60 features after applying PCA. The dataset was divided into 80% for training (about 273 samples) and 20% for testing (about 69 samples). While training the models, 20% of the training dataset was used as the validation set (about 55 samples). 5-fold cross-validation was applied during hyperparameter tuning to find the optimal hyperparameters. The performance of each model is evaluated on the test set.
[Figure: machine learning pipeline – 1×1×122 training data → PCA → model training with hyperparameter tuning and validation; test data → PCA → trained model for evaluation]
For the deep learning models, after preprocessing the images we used the same split ratio as for the machine learning models: 80% for the training set and 20% for the test set. While training the models, 20% of the training set was used as the validation set before evaluation on the test set. We use the size 32×32×122 because 32×32 is the minimum input size required by some of the CNN architectures. We then use 5 architectures: VGG16, ResNet50, DenseNet121, MobileNetV2 and EfficientNetB0. The output of each backbone is passed through a global average pooling layer and then fully connected layers, with an output layer using a linear activation function. Each model is trained for about 200 epochs with the Adam optimizer and a learning rate of 0.001.
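A sketch of how one of the backbones can be wrapped for regression in Keras (DenseNet121 shown here); weights are initialized randomly because ImageNet weights expect 3-channel inputs, and the dense-layer size, learning rate and commented-out training call are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Backbone without its classification head, accepting 32x32 patches with 122 bands
backbone = tf.keras.applications.DenseNet121(
    include_top=False, weights=None, input_shape=(32, 32, 122)
)

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="linear"),   # single regression output
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="mse",
    metrics=[tf.keras.metrics.RootMeanSquaredError()],
)
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=200, batch_size=16)
```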
Performance metrics are important for regression models: they are used to evaluate and monitor the performance and the error of the predictions. The quality of the statistical metrics depends on many factors, such as the nature of the variables employed in the model, their units of measure, and the data transformation used. In this work, the performance of the learning algorithms is evaluated and compared using several statistical metrics.
3.2.4.1 RMSE
RMSE is usually used as a standard metric for measuring regression errors. It constitutes the standard deviation of the residuals (the differences between the model predictions and the true values), and thus indicates how widely the residuals are spread. When it comes to outliers, RMSE is more sensitive than MAE because it produces large errors in the presence of outliers. RMSE is calculated with the following equation:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2} \quad (3.2)$$
where:
• $y_i$ is the actual value
• $\hat{y}_i$ is the predicted value
• N is the number of samples
The smaller the RMSE, the smaller the spread of the errors.
3.2.4.2 MAPE
The Mean Absolute Percentage Error (MAPE) is the mean of all absolute percentage errors between the predicted and actual values. It is similar to MAE, but it expresses the error as a percentage. The formula of MAPE is:
$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\frac{|y_i - \hat{y}_i|}{y_i} \quad (3.3)$$

where:
• $y_i$ is the actual value
• $\hat{y}_i$ is the predicted value
• N is the number of samples
3.2.4.3 R2
The coefficient of determination R2 measures the proportion of the variance of the target that is explained by the model, comparing the model's errors with those of a baseline that always predicts the mean of the actual values:

$$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$

where:
• $y_i$ is the actual value and $\bar{y}$ is the mean of the actual values
• $\hat{y}_i$ is the predicted value
• N is the number of samples
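These three metrics can be computed directly with scikit-learn; a minimal sketch with illustrative values follows (mean_absolute_percentage_error is available in recent scikit-learn versions).

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error, r2_score

# Illustrative ground-truth and predicted values
y_true = np.array([30.1, 28.4, 35.2, 31.7])
y_pred = np.array([29.0, 27.9, 34.0, 33.1])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = mean_absolute_percentage_error(y_true, y_pred) * 100   # expressed as a percentage
r2 = r2_score(y_true, y_pred)
print(f"RMSE = {rmse:.3f}, MAPE = {mape:.2f}%, R2 = {r2:.3f}")
```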
This thesis was implemented on my personal computer for the image preprocessing, on Google Colab for the machine learning models and on Kaggle kernels for the deep learning models. The main libraries used for this work are:
• TensorFlow and Keras: TensorFlow is an open-source library developed by Google for deep learning applications; Keras is a higher-level library built on top of TensorFlow.
• Spectral (SPy): a Python library for processing hyperspectral image data, including reading, displaying, manipulating, and classifying hyperspectral imagery.
4.1 Chlorophyll Model Prediction and Comparison
Overall, the chlorophyll models work well on this dataset, with a very low MAPE (<5%). The R2 is also good compared with the models for N, P and K concentrations. The best model is Ridge after applying PCA, with a very small RMSE of 1.93, a MAPE of 3.96% and the highest R2 score. Overall, the machine learning algorithms seem to perform very well; SVR after PCA also gives a very good result. Among the boosting algorithms there is not much difference: CatBoost performs best with all 122 channels, with a MAPE of 4.33%, an RMSE of 2.09 and an R2 of 0.16, which is relatively good. After applying PCA, however, its scores become worse: MAPE and RMSE increase a little and R2 drops to 0.11. The hyperparameters of some good machine learning models are listed in Appendix A. In contrast, decision tree regression is the worst of the machine learning models, with a very low R2 and high RMSE and MAPE; applying PCA improves it only slightly, and it still cannot compete with the other learning models.
Among the deep learning models, DenseNet121 seems to be the best, with a low RMSE of 2.05 and a MAPE of 4.40%, and the second-highest R2 of 0.22. Next is MobileNetV2, which has the highest R2 score (0.23) but worse RMSE and MAPE than DenseNet121. The third is VGG16, which also has a relatively good R2 and even the best result in RMSE and MAPE. ResNet50 and EfficientNetB0, however, did not meet our expectations because of their poor performance on all three evaluation metrics. In summary, Ridge regression is the best machine learning model and DenseNet121 the best deep learning model for chlorophyll.
4.2 N concentration Model Prediction and Comparison
The models for N concentration do not perform well: overall they have very high MAPE and RMSE, and most of them fail to reach a meaningful R2 score, so the models seem to learn almost nothing from the dataset. For N concentration, the boosting algorithms (AdaBoost, XGBoost, CatBoost and LGBM) seem to perform better than the others, with XGBoost (68.45% MAPE and 0.024 R2) and LGBM (0.069 R2) being reasonable choices.
For the deep learning models, DenseNet121 is again the best, with the highest R2 score and the lowest MAPE (49.90%); it beats all the other models, both machine learning and deep learning. The next reasonable choice is VGG16, which is not good enough either, although it beats almost all the machine learning models.
For N concentration, the deep learning models therefore perform better than the machine learning models, with DenseNet121 being the best one, and the machine learning models used here are not an optimal choice for N prediction.
4.3 P Concentration Model Prediction and Comparison
For the prediction of P, the MAPE is generally good (<25%), although the models have a relatively high RMSE and a very low R2 score; none of the models reach a meaningful R2. A possible reason lies in our dataset, which does not have a good range of measured P concentration values. Among the models for P prediction, SVR, AdaBoost and CatBoost have similar performance, with CatBoost giving the best result. In contrast to its performance for chlorophyll and N concentration, DenseNet121 now becomes the worst model, unlike VGG16, which still has relatively good RMSE and MAPE compared to the other models; ResNet50 is another reasonable choice.
In general, the prediction of P concentration is still not as good as chlorophyll, but the results in MAPE and RMSE are overall better than for N concentration. The machine learning models, specifically CatBoost, seem to give slightly better results than the other machine learning and deep learning models, but the margin is small.
4.4 K Concentration Model Prediction and Comparison
Moving on to K prediction, the MAPE of all models is in general very good (<20%), although R2 and RMSE are not as good as expected. SVR and LGBM are the two models that achieve a higher R2 than most of the others (LGBM achieves an R2 of 0.005 and a MAPE of 19.02% with all 122 channels). However, after applying PCA, AdaBoost improves considerably, with an R2 of 0.012, better than the R2 of LGBM while keeping approximately the same RMSE and MAPE. We can therefore conclude that AdaBoost with PCA gives the best result for the prediction of K concentration among the machine learning models. For the deep learning models, EfficientNetB0 surprisingly gives the best result, with the highest R2 and good MAPE and RMSE compared to the other deep learning models.
Chapter 5
5.1 Conclusion
In this thesis, we performed regression to predict nutrient content, specifically chlorophyll content, N concentration, P concentration and K concentration. As discussed in the results, the best performance was obtained when predicting chlorophyll content. This is due to a good dataset, in which the HSI were captured at suitable wavelengths, the stability of chlorophyll content, and the accuracy of the measured chlorophyll values. However, we did not obtain good results for N, P and K concentrations. N concentration had the worst results, with poor performance on all three evaluation metrics: the models learned almost nothing from the dataset. P and K concentration predictions perform well in terms of MAPE, but have low R2 and relatively poor RMSE. A possible cause is the imbalanced distribution of the measured N, P, K concentrations, or the fact that training data from a single season is not enough for the learning models. Another possible reason is that the captured wavelength range of the hyperspectral images does not cover enough wavelengths (N, P and K may respond to wavelengths in the range of 900 nm to 2100 nm [19]), leading to a low correlation between N, P, K concentrations and the spectral bands. Many models get better results after applying PCA, although some of them do not improve; therefore, we can conclude that PCA can be a good preprocessing choice to obtain better results. The boosting algorithms and SVR may not give the best results, but they give stable performance, as there are no large differences across the evaluation metrics. For the deep learning models, VGG16 did not give any impressive results, but its overall results are better than those of the other models. DenseNet121 performs best for chlorophyll and N concentration, but becomes the worst for P and K concentration prediction.
5.2 Future Work
In future work, we would like to improve the quality of the dataset and increase the number of samples, because data from a single season is not enough to train good models. Besides, we plan to study and apply more image preprocessing techniques, such as Gaussian smoothing, median filtering and wavelet denoising, to reduce the noise of the hyperspectral images; to use the normalized difference vegetation index (NDVI) to find the wavelength features most correlated with N, P, K concentrations; to address the imbalanced regression dataset using techniques such as label distribution smoothing (LDS) and feature distribution smoothing (FDS); and to use more state-of-the-art learning models such as vision transformers, as well as introduce more metrics to have a balanced view in model evaluation.
References
[1] Megan Io Ariadne Abenina, Joe Mari Maja, Matthew Cutulle, Juan Carlos Melgar, and Haibo Liu. "Prediction of Potassium in Peach Leaves Using Hyperspectral Imaging and Multivariate Analysis." In: AgriEngineering 4.2 (2022), pp. 400–413. ISSN: 2624-7402. DOI: 10.3390/agriengineering4020027. URL: https://www.mdpi.com/2624-7402/4/2/27.
[2] Tashin Ahmed and Noor Sabab. "Classification and understanding of cloud structures via satellite images with EfficientUNet." In: (Sept. 2020). DOI: 10.1002/essoar.10507423.1.
[3] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A Next-generation Hyperparameter Optimization Framework. 2019. arXiv: 1907.10902 [cs.LG].
[4] Songtao Ban et al. "Rice Leaf Chlorophyll Content Estimation Using UAV-Based Spectral Images in Different Regions." In: Agronomy 12.11 (2022). ISSN: 2073-4395. URL: https://www.mdpi.com/2073-4395/12/11/2832.
[5] Xiaokai Chen et al. "Estimation of Winter Wheat Canopy Chlorophyll Content Based on Canopy Spectral Transformation and Machine Learning Method." In: Agronomy 13.3 (2023). ISSN: 2073-4395. DOI: 10.3390/agronomy13030783. URL: https://www.mdpi.com/2073-4395/13/3/783.
[6] Sulaymon Eshkabilov et al. "Hyperspectral Image Data and Waveband Indexing Methods to Estimate Nutrient Concentration on Lettuce (Lactuca sativa L.) Cultivars." In: Sensors 22.21 (2022). ISSN: 1424-8220. DOI: 10.3390/s22218158. URL: https://www.mdpi.com/1424-8220/22/21/8158.
[7] Dehua Gao et al. "In-field chlorophyll estimation based on hyperspectral images segmentation and pixel-wise spectra clustering of wheat canopy." In: Biosystems Engineering 217 (2022), pp. 41–55. ISSN: 1537-5110. DOI: 10.1016/j.biosystemseng.2022.03.003. URL: https://www.sciencedirect.com/science/article/pii/S1537511022000551.
[8] Ma Jiaying et al. "Functions of Nitrogen, Phosphorus and Potassium in Energy Status and Their Influences on Rice Growth and Development." In: Rice Science 29.2 (2022), pp. 166–
A Hyperparameters for ML learning models

Table A.1 – Table of some good chlorophyll machine learning models with their hyper-parameters
Models          Hyper-parameters
Ridge (PCA)     alpha: 100
SVR (PCA)       C: 10, gamma: 0.0003
AdaBoost (PCA)  learning_rate: 1, n_estimators: 1000
Table A.2 – Table of some good N concentration machine learning models with their hyper-parameters
Models          Hyper-parameters
XGBoost         max_depth: 6, learning_rate: 0.01443, n_estimators: 146, min_child_weight: 4, gamma: 0.0000232, subsample: 0.096, colsample_bytree: 0.267, reg_alpha: 0.00000057, reg_lambda: 0.00000097766
SVR (PCA)       C: 1, gamma: 1, kernel: linear
AdaBoost (PCA)  learning_rate: 0.01, n_estimators: 250
Table A.3 – Table of some good P concentration machine learning models with their hyper-parameters
Models          Hyper-parameters
AdaBoost        learning_rate: 0.1, n_estimators: 1000
CatBoost        learning_rate: 0.017, depth: 13, l2_leaf_reg: 1.5, min_child_samples: 4
SVR (PCA)       C: 1, gamma: 1, kernel: linear
Table A.4 – Table of some good K concentration machine learning models with their hyper-parameters
Models          Hyper-parameters
AdaBoost (PCA)  learning_rate: 0.01, n_estimators: 250
LGBM            reg_alpha: 6.71, reg_lambda: 0.00155, colsample_bytree: 0.7, subsample: 0.8, learning_rate: 0.006, max_depth: 20, num_leaves: 2435, min_child_samples: 104, min_data_per_groups: 8
SVR             C: 0.1, gamma: 1, kernel: linear