
Methods and models

Introduction:
In this chapter we present the approach used in our work, from the way we treated our
data to the final prediction. We also present the technologies used to carry out our
project successfully.
General overview of our pipeline:
Our NPK demand prediction pipeline consists of 3 stages:
- Data loading and processing
- Feature selection using the Random Forest algorithm
- Training and model generation

Data Preparation
This section presents the process we followed to prepare the dataset used by our machine
learning models. The process consists of:
 Data Acquisition
 Data Preprocessing
Data acquisition:
Data collection is a crucial step in setting up a machine learning model, since learning is
done with the help of the available data. Nowadays we face a great challenge regarding
data collection and reliability, especially in agriculture, where data are hardly available
on the African continent.
That is why the first step in our study was the identification of data sources. This led
us to DAPSA, an entity of ANSD, the national statistics agency of Senegal, focused on
agriculture. They conduct annual surveys to collect information from farmers, and for our
work we used their dataset from the 2017-2018 agricultural campaign, which contains
11,700 farmer records with 90 features, to build our model.
Data Preprocessing:
Data pre-processing is a key step in the process of generating a machine learning model. It
is the step that focuses on getting the data ready for the training of the models so as to
obtain the best possible result.
We explain below how we managed this process and tried to obtain a dataset as ready as
possible for treatment by the machine learning models.
Data Cleaning:
In our dataset we noticed the presence of many null values, so the first thing we did was to
keep only the rows where the value of our target variable was not null. After this step we
obtained only 2,500 records, so from the DAPSA study we can say that only 25% of the
farmers used NPK fertilizer. This is an indication of the low consumption of this
fertilizer in Senegal.
We also deleted the following attributes because we judged them inappropriate for our
study (a pandas sketch of these cleaning steps is given after the list below):
• Id_parcelle: refers to the identifier of a field of the farmer
• Id_menage: refers to the identifier of the household of the farmer
• Poids_par: refers to the weight of the field in the survey
• Id_responsable: refers to the identifier of the person responsible for the field
• Unite_production_maximale/minimale: refers to the unit of the minimum/maximum
production of the farmer. We uniformized it to the universal unit, which is the kg.
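As an illustration, a minimal pandas sketch of these cleaning steps could look as follows; the file name and the target column name are assumptions, since the report does not give them.

import pandas as pd

# Load the 2017-2018 DAPSA survey data (the file name here is an assumption).
df = pd.read_csv("dapsa_2017_2018.csv")

# Keep only the rows where the target variable (hypothetical column name
# "npk_quantity") is not null.
df = df[df["npk_quantity"].notna()]

# Drop the identifier and survey-weight attributes judged irrelevant for the study.
df = df.drop(columns=["Id_parcelle", "Id_menage", "Poids_par", "Id_responsable"])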

Features Engineering
We applied standard methods such as filling the missing values, reformatting, and removing
unnecessary features in order to build the base for the creation of the model.
We also encoded our categorical variables, because some machine learning algorithms can
only deal with numeric values, so we had to transform the categorical variables into
numeric ones.
Finally we applied a scaling of our features to bring all the values into the same range,
using the MinMaxScaler specifically.
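A sketch of this step with Scikit-learn could look as follows; the categorical column names and the target column name are placeholders, not the actual survey fields.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# One-hot encode the categorical variables (placeholder column names).
df = pd.get_dummies(df, columns=["region", "crop_type"])

# Separate the target (hypothetical name) from the features before scaling.
X = df.drop(columns=["npk_quantity"])
y = df["npk_quantity"]

# Scale every feature into the same [0, 1] range with MinMaxScaler.
scaler = MinMaxScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)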
Features Selection
Before applying a machine learning model for our prediction, we had to keep only the most
relevant features in our dataset, since we have a lot of features. To do so we applied an
algorithm that gives an indication of the importance of each feature; in our project we
chose the random forest algorithm.

We retained the features that had an importance of at least 2.5%.
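A possible Scikit-learn sketch of this selection, assuming the prepared features X and target y from the previous steps, is shown below.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Fit a random forest on all candidate features.
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X, y)

# Keep only the features whose importance is at least 2.5%.
importances = pd.Series(rf.feature_importances_, index=X.columns)
selected = importances[importances >= 0.025].index.tolist()
X_selected = X[selected]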


ML Models:
Regression :
Regression is a method that helps to better understand the relationship between independent
variables, or features, and a dependent variable, or target. Regression models are used in
predictive analytics to forecast or predict outcomes. To do so, the models have to be
trained on labelled data in order to learn the relationship between input and output data [0].

SVR:
Support Vector Regression allows us to predict continuous values based on the principles of
the SVM. To do so, the model tries to fit the best line within a threshold value, which is the
distance between the hyperplane and the boundary line [1].
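A minimal Scikit-learn sketch of an SVR model, with illustrative rather than tuned hyperparameters, could be:

from sklearn.svm import SVR

# epsilon sets the width of the tube around the regression hyperplane;
# the kernel and hyperparameter values here are illustrative only.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)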

Random Forest Regression:


Random Forest Regression is a supervised learning algorithm that uses ensemble learning,
a technique that combines many regressors to solve a complex problem.
To achieve this it goes through several steps (a sketch follows the list below):
- Randomly select k data points from the training set.
- Build a decision tree on these k data points.
- Choose the number N of trees to build and repeat steps 1 and 2.
- For each new point, the N trees each predict a value of y, and the final
prediction is obtained by taking the average of these individual
predictions [2].
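As referenced above, a minimal Scikit-learn sketch of such a regressor could be:

from sklearn.ensemble import RandomForestRegressor

# N trees are grown on random subsets of the training data; the prediction
# for a new point is the average of the individual tree predictions.
rf_reg = RandomForestRegressor(n_estimators=200, random_state=42)
rf_reg.fit(X_train, y_train)
y_pred = rf_reg.predict(X_test)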

Elastic Net Regression :


Elastic Net is a supervised machine learning algorithm that combines the Lasso and Ridge
approaches in order to find coefficients that minimize the sum of squared errors. The key
idea of the algorithm is to find a trade-off between the Lasso (L1) and Ridge (L2) penalties
in order to obtain the best possible regularization [3].
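A minimal Scikit-learn sketch, with illustrative hyperparameter values, could be:

from sklearn.linear_model import ElasticNet

# l1_ratio balances the Lasso (L1) and Ridge (L2) penalties, while alpha sets
# the overall regularisation strength; the values shown are illustrative.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X_train, y_train)
y_pred = enet.predict(X_test)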
Gradient Boosting :
Gradient Boosting is an ensemble technique whose aim is to obtain a strong model by
sequentially combining multiple weak models. The intuition behind it is to minimize the
error by having each new model fit the residual errors left by the previous models [4].
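A minimal Scikit-learn sketch of this model, with illustrative hyperparameters, could be:

from sklearn.ensemble import GradientBoostingRegressor

# Each new shallow tree is fitted to the residual errors of the current ensemble.
gbr = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
gbr.fit(X_train, y_train)
y_pred = gbr.predict(X_test)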

Stacking :
Stacking, or stacked generalisation, was introduced by Wolpert. In essence, stacking
makes predictions by using a meta-model trained on top of a pool of base models: the base
models are first trained on the training data and asked to give their predictions, then a
separate meta-model is trained to use the outputs of the base models to produce the final
prediction of the whole process [5].
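A minimal Scikit-learn sketch of such a stack, using the models presented above as base models and an Elastic Net as an illustrative (not necessarily the actual) meta-model, could be:

from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVR

# The base models give out-of-fold predictions; the meta-model then learns
# to combine these predictions into the final one.
base_models = [
    ("svr", SVR()),
    ("rf", RandomForestRegressor(random_state=42)),
    ("gbr", GradientBoostingRegressor(random_state=42)),
]
stack = StackingRegressor(estimators=base_models, final_estimator=ElasticNet())
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)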

Technologies and Tools :


In the process of setting up our model we used several technologies and tools to
make our work easier. Below we outline each of them and the role they played in our
project.
Python Libraries
Scikit-learn :
Scikit-learn is a popular Python library generally used for machine learning tasks. It
allowed us to carry out the preprocessing tasks, to use the machine learning regression
models, and to compute the metrics used to evaluate them.

Pandas :
Pandas is intended to facilitate the tasks related to data manipulation and analysis. The
library helped us in the cleaning of our dataset as well as in the exploration and
preprocessing of our data.
Numpy
NumPy is a library aimed at manipulating matrices and multidimensional arrays, and it
provides mathematical functions operating on these arrays.

Feature Engine:
Feature-engine is a library of great utility in the data preprocessing task, as it provides
several transformers engineered to make things easier for us.

GitHub :
GitHub is a hosting platform for Git, the version control software created in 2005 by Linus
Torvalds, the creator of Linux, and it has become essential in the world of software
development. In our project it helped us to manage the different versions of our code.
Azure Machine Learning :
Azure Machine Learning is a cloud service developed by Microsoft to help manage the
lifecycle of machine learning projects. It allows us to train machine learning models and
facilitates their deployment as well as the implementation of pipelines.
In our project we used it mainly to train our model in the cloud and thus facilitate
access to our code for some collaborators.
FastAPI :
FastAPI is a Python web framework that allows the generation of APIs and the creation of
web applications with the Jinja2 template engine, among other features. Unlike other Python
frameworks such as Flask, it is built on the ASGI (Asynchronous Server Gateway Interface)
standard and runs on an asynchronous server. In our project we used it to consume our
model and generate a web application with Jinja2.
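A minimal FastAPI sketch of such an endpoint, with a hypothetical model file name and route, could look like this:

import joblib
from fastapi import FastAPI

app = FastAPI()

# The model file name and input format are assumptions; the trained pipeline
# would have been serialised beforehand with joblib.
model = joblib.load("npk_model.joblib")

@app.post("/predict")
def predict(features: dict):
    # Validation and feature ordering are simplified for illustration.
    values = [list(features.values())]
    return {"npk_demand": float(model.predict(values)[0])}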
[0] https://www.seldon.io/machine-learning-regression-explained
[1] https://towardsdatascience.com/unlocking-the-true-power-of-support-vector-regression-847fd123a4a0
[2] https://levelup.gitconnected.com/random-forest-regression-209c0f354c84
[3] https://medium.com/mlearning-ai/elasticnet-regression-fundamentals-and-modeling-in-python-8668f3c2e39e
[4] https://medium.com/analytics-vidhya/introduction-to-the-gradient-boosting-algorithm-c25c653f826b
