Presentation 4
MACHINE LEARNING MODEL FOR CROP
RECOMMENDATION USING PYTHON
BY
S.AFIFA 18F71A0502
S.TAHASEEN 18F71A0530
S.ABEEDA 18F71A0501
S.UMMESALMA 18F71A0531
The idea of this project is to gather GDP data from multiple data sources and apply
various machine learning algorithms to this data to extract important information.
This model can be used to estimate the GDP of a country.
In this project, the Random Forest Regression technique is used to predict the GDP of a
country.
Random Forest Regression performed marginally better than the other models we tried. Its input
variables include literacy (%), net migration, infant mortality (per 1000 births), phones (per 1000),
arable (%), crops (%), other (%), birth rate, death rate, agriculture, industry, service, etc.
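As a minimal sketch of the approach described above, the snippet below fits scikit-learn's RandomForestRegressor to a GDP-per-capita target. The data is synthetic: the four feature columns, their ranges and the formula that generates the target are hypothetical stand-ins for the real dataset, which is not shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Hypothetical synthetic stand-ins for a few of the features listed above
literacy = rng.uniform(20, 100, n)          # Literacy (%)
phones = rng.uniform(0, 900, n)             # Phones (per 1000)
infant_mortality = rng.uniform(2, 150, n)   # per 1000 births
agriculture = rng.uniform(0.0, 0.6, n)      # share of the economy
X = np.column_stack([literacy, phones, infant_mortality, agriculture])

# Made-up target: "GDP per capita" rises with phones/literacy, falls with the others
y = (30 * phones + 50 * literacy - 40 * infant_mortality
     - 5000 * agriculture + rng.normal(0, 1000, n))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
y_predict = model.predict(X_test)

rmse = mean_squared_error(y_test, y_predict) ** 0.5
print(f"Test RMSE: {rmse:.1f}")
print("Feature importances:", model.feature_importances_)
```

The printed feature_importances_ give a rough, model-specific ranking of which inputs the forest relied on; they are not causal effects.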
INTRODUCTION
GROSS DOMESTIC PRODUCT (GDP)
GDP stands for Gross Domestic Product and represents the total monetary
value of all final goods and services produced (and sold on the market)
within a period of time (typically 1 year). GDP is the most commonly used
measure of economic activity.
DISADVANTAGES OF GDP:
•GDP does not incorporate any measures of welfare.
SOFTWARE REQUIREMENTS
• Python
• Pandas
• Seaborn
• Sklearn
• Flask
ARCHITECTURE:
NumPy and Matplotlib:
Matplotlib is a plotting library for the Python programming language and its
numerical mathematics extension, NumPy.
Sklearn
Scikit-learn (formerly scikits.learn and also known as sklearn) is a
free software machine learning library for the Python programming language.
• It solves many common machine learning problems, one of which is cross-validation.
• Flask does not provide built-in support for APIs.
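The training dataset consists of various attributes. This dataset is processed,
unnecessary attributes are removed, and the resulting dataset has attributes such as
Region, Area, Pop density, Literacy, Phones, Arable, Crops, Climate, Birthrate,
Deathrate, Agriculture, Industry and Service.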
Train/test split → machine learning algorithms → results
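The workflow above (split the data, fit several algorithms, compare the results) could look roughly like this. The data is synthetic, and the choice of these three models and of R² as the comparison score is an assumption made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(400, 3))   # synthetic stand-in features
# Target with one linear and one non-linear effect; the third column is noise
y = 10 * X[:, 0] + 5 * np.sin(6 * X[:, 1]) + rng.normal(0, 0.5, 400)

# Step 1: train/test split (hold out 20% of the rows for evaluation)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 2: fit several algorithms on the same training data
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Step 3: compare the results on the held-out test set
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {results[name]:.3f}")
```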
Linear Models
• The term linear model implies that the model is specified as a linear combination of features.
• The learning process computes one weight for each feature to form a model.
• Decision trees build regression or classification models in the form of a tree
structure.
• A decision tree breaks down a dataset into smaller and smaller subsets while, at the same
time, an associated decision tree is incrementally developed.
• In an ensemble, the base models may be of the same type or of different types, and may or may
not be trained on the same training data.
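A random forest is one such ensemble: many decision trees, each trained on a bootstrap sample of the training data. The sketch below, on hypothetical synthetic data, checks that for regression the forest's prediction is the average of its individual trees' predictions, using scikit-learn's estimators_ attribute.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
X = rng.uniform(0, 1, size=(200, 2))
y = 4 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.1, 200)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

X_new = rng.uniform(0, 1, size=(5, 2))

# Each fitted tree is a DecisionTreeRegressor trained on a bootstrap sample
per_tree = np.array([tree.predict(X_new) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

print(per_tree.shape)                                      # one row per tree
print(np.allclose(manual_average, forest.predict(X_new)))  # averaged predictions match
```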
Random Forest Regression:
• Loss functions tell you if you're making progress, and put a number on it.
• The mean squared error (MSE) is perhaps the simplest and most common loss
function.
• It is often taught in introductory machine learning courses.
• To calculate the MSE, you take the differences between your model's
predictions and the ground truth, square them, and average them over the whole dataset.
• A related metric is the mean squared logarithmic error (sklearn: mean_squared_log_error).
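The two losses mentioned above can be computed by hand and checked against scikit-learn. The arrays are made-up example values; mean_squared_log_error works on log(1 + value), so it expects non-negative targets and predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_squared_log_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE: square each difference, then average
mse_manual = np.mean((y_true - y_pred) ** 2)   # 0.875

# MSLE: the same idea applied to log(1 + value)
msle_manual = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

print(mse_manual, mean_squared_error(y_true, y_pred))
print(msle_manual, mean_squared_log_error(y_true, y_pred))
```

Because it compares logarithms, MSLE penalizes relative rather than absolute differences, which can suit targets such as GDP that span several orders of magnitude.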
Region:
• A region is an area of land that has common features .
Area:
• GDP estimates the value of the goods and services produced in an area.
• It can be used to compare the size and growth of economies across
nations.
Pop density:
• Population density is the number of people per km² of land area.
• To allow comparisons between countries and over time, GDP per capita is
adjusted for price differences between countries and for inflation; it is
measured in international-$.
• Per capita gross domestic product (GDP) is a financial metric that breaks down a
country's economic output per person and is calculated by dividing the GDP of a nation
by its population.
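Both definitions above are simple ratios. A small sketch with hypothetical round numbers for an imaginary country (the price-difference and inflation adjustments mentioned above are not modelled here):

```python
def population_density(population, land_area_km2):
    # People per km² of land area
    return population / land_area_km2


def gdp_per_capita(gdp, population):
    # GDP of the nation divided by its population
    return gdp / population


# Hypothetical values for an imaginary country
population = 10_000_000
land_area_km2 = 250_000.0
gdp_usd = 500_000_000_000.0

print(population_density(population, land_area_km2))  # 40.0
print(gdp_per_capita(gdp_usd, population))             # 50000.0
```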
Literacy:
• Countries with a high literacy rate usually have a high GDP per capita.
• Nations with low GDP frequently have lower literacy rates since the people in that country have
less access to education, and children often have to work to help support the family.
Phones (per 1000):
• A common comparison plots the number of mobile phone subscriptions (measured per 100 people)
against gross domestic product (GDP) per capita (measured in constant international-$).
Arable (%):
• Arable land (hectares per person) in India was reported at 0.11564 in 2018, according to the
World Bank collection of development indicators, compiled from officially recognized sources.
Crop(%):
• The share of agriculture in GDP increased to 19.9 per cent in 2020-21 from 17.8 per cent in 2019-20.
• The last time the contribution of the agriculture sector in GDP was at 20 per cent was in 2003-04.
Other (%):
• The services sector accounts for 53.89% of India's total GVA of 179.15 lakh crore Indian
rupees.
• With a GVA of Rs. 46.44 lakh crore, the Industry sector contributes 25.92%.
Climate:
• Over the past 10 years, storms, wildfires, and floods alone have caused losses of
around 0.3% of GDP per year globally, according to insurance firm Swiss Re.
• For most countries, exposure to, and costs from climate change are already increasing.
Birthrate:
• There is generally an inverse correlation between income and the total fertility rate within
and between nations.
• The higher the degree of education and GDP per capita of a human population, subpopulation
or social stratum, the fewer children are born in any developed country.
Deathrate:
• In 2021, the crude death rate for the world was 7.64 deaths per thousand population.
• The crude death rate and birth rate of the world have been declining at a moderating rate since
1950.
agriculture:
The share of agriculture in GDP increased to 19.9 per cent in 2020-21 from 17.8 per cent in
2019-20.
Industry:
The services sector is the largest sector of India. Gross Value Added (GVA) at current prices for the
services sector is estimated at 96.54 lakh crore INR in 2020-21.
Service:
The services sector accounts for 53.89% of India's total GVA of 179.15 lakh crore Indian rupees.
With a GVA of Rs. 46.44 lakh crore, the Industry sector contributes 25.92%, while the Agriculture and
allied sector has a share of 20.19%.
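The sector shares above can be cross-checked from the GVA figures quoted in this section. The sketch assumes that the Agriculture and allied sector accounts for the remainder of total GVA after services and industry.

```python
# GVA figures (lakh crore INR) for India, as quoted in the text
total_gva = 179.15
services_gva = 96.54
industry_gva = 46.44

# Assumption: agriculture & allied is the remainder of total GVA
agriculture_gva = total_gva - services_gva - industry_gva

shares = {
    "Services": 100 * services_gva / total_gva,
    "Industry": 100 * industry_gva / total_gva,
    "Agriculture & allied": 100 * agriculture_gva / total_gva,
}
for sector, pct in shares.items():
    print(f"{sector}: {pct:.2f}%")
```

Run as written, this reproduces the 53.89%, 25.92% and 20.19% figures, so the quoted numbers are mutually consistent.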
Data Preprocessing:
• It is a technique that is used to convert raw data into a clean dataset.
Data preparation: filling in missing data. We noticed that some data is missing
in the table. For simplicity, we fill each missing value with the median of the
region the country belongs to, as countries that are close geographically are often similar in
many ways. For example, let's check the regional medians of 'GDP ($ per capita)', 'Literacy
(%)' and 'Agriculture'. Note that for 'Climate' we use the mode instead of the median, as
'Climate' appears to be a categorical feature here.
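The region-wise imputation described above maps naturally onto pandas groupby/transform. A minimal sketch on a hypothetical six-row table (the column names follow the ones quoted in the text; the values are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values
df = pd.DataFrame({
    "Region": ["ASIA", "ASIA", "ASIA", "EUROPE", "EUROPE", "EUROPE"],
    "GDP ($ per capita)": [700.0, np.nan, 1300.0, 20000.0, 30000.0, np.nan],
    "Literacy (%)": [36.0, 90.0, np.nan, 99.0, np.nan, 97.0],
    "Agriculture": [0.38, 0.20, np.nan, 0.02, 0.03, np.nan],
    "Climate": [1.0, 1.0, np.nan, 3.0, np.nan, 3.0],
})

# Numeric features: fill with the median of the country's own region
for col in ["GDP ($ per capita)", "Literacy (%)", "Agriculture"]:
    region_median = df.groupby("Region")[col].transform("median")
    df[col] = df[col].fillna(region_median)


def first_mode(s):
    # Most frequent value in the group (NaN if the group has no values)
    m = s.mode()
    return m.iloc[0] if not m.empty else np.nan


# Climate is treated as categorical, so use the region's most frequent value
region_mode = df.groupby("Region")["Climate"].transform(first_mode)
df["Climate"] = df["Climate"].fillna(region_mode)

print(df)
```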
EXPLORATORY DATA ANALYSIS
EDA is an approach to analyzing data using visual techniques.
UNIVARIATE ANALYSIS
BI-VARIATE ANALYSIS
MULTI-VARIATE ANALYSIS
EDA
We will use the Matplotlib and Seaborn libraries for data visualization.
Some commonly used graphs are:
• Bar Plot
• Hist Plot
• Pair Plot
• Box Plot
• Scatter Plot
• Heat Map
UNIVARIATE:
Univariate analysis is the simplest approach to analyzing data.
Three univariate model classes are considered: ARIMA models (after Box and Jenkins [1976]),
ARIMAX models (a combination of ARIMA terms and additional predictors), and an
"ordinary multiple" form with lagged terms of the variables.
UNIVARIATE ANALYSIS
Uni means one, so in other words the data has only one variable.
BI-VARIATE ANALYSIS:
Bivariate analysis is a type of statistical analysis in which two variables are observed.
One variable is dependent, while the other is independent.
MULTIVARIATE:
Multivariate analysis is defined as: The statistical study of data where multiple
measurements are made on each experimental unit and where the relationships
among multivariate measurements and their structure are important.
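The three levels of analysis can also be illustrated numerically with pandas, without plotting: a summary of one variable (univariate), the correlation between two (bivariate), and a full correlation matrix (multivariate), which is what a Seaborn heat map typically visualizes. The data and column names here are synthetic and hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic, loosely related variables
literacy = rng.uniform(20, 100, n)
phones = 5 * literacy + rng.normal(0, 50, n)
gdp = 300 * literacy + 2 * phones + rng.normal(0, 2000, n)
df = pd.DataFrame({
    "Literacy (%)": literacy,
    "Phones (per 1000)": phones,
    "GDP ($ per capita)": gdp,
})

# Univariate: distribution summary of a single variable
print(df["GDP ($ per capita)"].describe())

# Bivariate: relationship between one independent and one dependent variable
r = df["Literacy (%)"].corr(df["GDP ($ per capita)"])
print(f"Pearson r(Literacy, GDP) = {r:.2f}")

# Multivariate: pairwise correlations among all variables
corr = df.corr()
print(corr.round(2))
```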
MODELING
Linear Regression:
Linear Regression is a supervised machine learning model in which the model finds the best-fit
linear line between the independent and dependent variables, i.e. it finds the linear
relationship between the dependent and independent variables.
Linear Regression is of two types: Simple and Multiple.
SIMPLE:
Simple Linear Regression is where only one independent variable is present and the model has
to find its linear relationship with the dependent variable.
MULTIPLE:
In Multiple Linear Regression, there is more than one independent variable, and the model has to
find their relationship with the dependent variable.
Equation of Simple Linear Regression: $y = b_0 + b_1 x$, where $b_0$ is the intercept, $b_1$ is the coefficient or slope, $x$ is the
independent variable and $y$ is the dependent variable.
Equation of Multiple Linear Regression: $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$, where $b_0$ is the intercept, $b_1, b_2, \dots, b_n$ are the coefficients or
slopes of the independent variables $x_1, x_2, \dots, x_n$ and $y$ is the dependent variable.
A Linear Regression model’s main aim is to find the best fit linear line and the optimal values of intercept
and coefficients such that the error is minimized.
Error is the difference between the actual value and Predicted value and the goal is to reduce this difference.
Let’s understand this with the help of a diagram.
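Mathematical Approach:
Residual/Error = Actual values - Predicted values
Sum of Residuals/Errors = Sum(Actual - Predicted values)
Because positive and negative residuals cancel out in this plain sum, the quantity that is minimized is the sum of squared residuals:
Sum of Squared Residuals/Errors = Sum((Actual - Predicted values)²)
i.e. $\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the actual value and $\hat{y}_i$ the predicted value of the ith observation.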
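To make the least-squares idea concrete, the sketch below fits a multiple linear regression two ways on synthetic data: with NumPy's least-squares solver on a design matrix that has a column of ones for the intercept b0, and with scikit-learn's LinearRegression. Both should give the same intercept and coefficients, and nudging a coefficient away from the fitted value should increase the sum of squared residuals.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 2))          # two independent variables x1, x2
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 50)

# Closed-form least squares: prepend a column of ones for the intercept b0
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef

model = LinearRegression().fit(X, y)


def sse(y_true, y_pred):
    # Sum of squared residuals: Sum((Actual - Predicted)^2)
    return float(np.sum((y_true - y_pred) ** 2))


print("lstsq:  ", b0, b1, b2)
print("sklearn:", model.intercept_, model.coef_)
print("SSE at the fitted coefficients:", sse(y, A @ coef))
```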
PERFORMANCE
TUNING
METRICS:
Evaluation metrics measure how well a model performs and how well it
approximates the relationship between the input variables and the target.
There are three error metrics that are commonly used for
evaluating and reporting the performance of a regression model.
They are :
Mean Squared Error (MSE)
Mean Absolute Error (MAE)
Root Mean Squared Error (RMSE)
Mean Squared Error:
The MSE tells you how close a regression line is to a set of points. It
does this by taking the distances from the points to the regression line (these
distances are the "errors"), squaring them, and averaging them.
The squaring is necessary to remove any negative signs.
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_predict)
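Steps to compute the MSE by hand:
1. Find the equation of the regression line.
2. Insert the X values into the equation found in step 1 to get the respective predicted values Ŷ.
3. Subtract the predicted values Ŷ from the original Y values. The resulting values are the error terms, also known as the vertical distances of the given points from the regression line.
4. Square the errors found in step 3.
5. Sum the squares and divide by the number of observations.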
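The steps above, written out with NumPy on a small made-up dataset. np.polyfit with deg=1 is used here as one way to obtain the least-squares line in step 1; RMSE is included as the square root of the MSE.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Step 1: regression line y_hat = b0 + b1 * x (polyfit returns slope first)
b1, b0 = np.polyfit(x, y, deg=1)

# Step 2: insert the x values to get the predicted values y_hat
y_hat = b0 + b1 * x

# Step 3: error terms = vertical distances from the points to the line
errors = y - y_hat

# Steps 4-5: square the errors, sum them, divide by the number of observations
mse = np.sum(errors ** 2) / len(x)
rmse = np.sqrt(mse)   # RMSE is in the same units as y

print(b0, b1, mse, rmse)
```

For a least-squares line with an intercept the residuals sum to zero, which is another reason the plain sum of errors is not a useful metric.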
Mean Absolute Error:
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_predict)
Formula:
$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i|$
where,
• Σ: Greek symbol for summation
• $y_i$: actual value for the ith observation
• $x_i$: calculated value for the ith observation
• n: total number of observations
Method 1: Using the Formula
Mean Absolute Error (MAE) is calculated by taking the summation of the absolute difference
between the actual and calculated values of each observation over the entire array and then
dividing the sum obtained by the number of observations in the array.
Method 2: Using sklearn.metrics
Syntax:
mean_absolute_error(actual, calculated)
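Both methods above, applied to a few made-up values: the manual formula and sklearn.metrics.mean_absolute_error should agree.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

actual = np.array([12.0, 15.0, 20.0, 22.0, 30.0])
calculated = np.array([11.0, 17.0, 19.0, 25.0, 28.0])

# Method 1: (1/n) * sum(|y_i - x_i|)
mae_manual = np.sum(np.abs(actual - calculated)) / len(actual)   # 9 / 5 = 1.8

# Method 2: sklearn.metrics
mae_sklearn = mean_absolute_error(actual, calculated)

print(mae_manual, mae_sklearn)
```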
KNN Regression:
During prediction, when the K-Nearest Neighbors (KNN) algorithm encounters a new instance (or test example), it finds
the K training instances nearest to this new instance.
It then predicts the target value for this instance by calculating the mean of the target values of
these nearest neighbors.
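The KNN prediction rule above, sketched on a tiny hypothetical one-feature dataset and compared with scikit-learn's KNeighborsRegressor, which with its default uniform weights also averages the neighbors' targets:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_train = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y_train = np.array([10.0, 12.0, 14.0, 30.0, 32.0, 34.0])
x_new = np.array([[2.5]])
k = 3

# Manual version: distances -> k nearest training instances -> mean of their targets
distances = np.linalg.norm(X_train - x_new, axis=1)
nearest = np.argsort(distances)[:k]
manual_pred = y_train[nearest].mean()   # neighbors at 2.0, 3.0, 1.0 -> 12.0

knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
print(manual_pred, knn.predict(x_new)[0])
```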
algorithms that analyze data used for classification and regression analysis.
Lastly, to predict results for the values obtained from the user, we used the Flask
framework to integrate the backend and the frontend.
We also generated a pickle file of our trained model, which is used to generate predictions for the input
data.
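Saving and reloading a trained model with Python's pickle module might look like the sketch below. The model, its training data and the file name model.pkl are hypothetical; the file is written to the system temp directory only to keep the example self-contained.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0])
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Hypothetical file location for the pickled model
path = os.path.join(tempfile.gettempdir(), "model.pkl")

# Save the trained model to disk
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later (e.g. when the web app starts): load it back and predict
with open(path, "rb") as f:
    loaded = pickle.load(f)

sample = X[:2]
print(loaded.predict(sample), model.predict(sample))
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.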
In the next step, we built a UI where the user can enter their data.
Once all the inputs have been entered, the model processes the data and predicts the GDP for
those conditions.
HTML:
HTML stands for Hyper Text Markup Language.
It is used to design web pages using the markup language.
Hypertext defines the links between web pages, and the markup language
defines the text document within tags that define the structure of web
pages.
Syntax:
<form> <!--form elements--> </form>
CSS:
CSS (Cascading Style Sheets) is a stylesheet language used to design a
webpage to make it attractive.
The reason for using this is to simplify the process of making web pages
presentable.
Basic Format:
It is the basic structure of an HTML web page, with CSS styles used inside the page.
In our web page, we use internal CSS (i.e. CSS code added inside the <head> tag of
the HTML code).
BOOTSTRAP:
Bootstrap is a free and open-source tool collection for creating responsive
websites and web applications. It is the most popular HTML, CSS, and JavaScript
framework for developing responsive, mobile-first websites.
FLASK:
Flask is a web application framework written in Python. Flask is based on the Werkzeug WSGI toolkit
and Jinja2 template engine. Both are Pocco projects.
To understand what Flask is, you need to understand a few general terms.
1. WSGI: Web Server Gateway Interface (WSGI) has been adopted as a standard for Python web
application development.
2. Werkzeug: It is a WSGI toolkit, which implements requests, response objects, and other utility
functions.
3. Jinja2: Jinja2 is a popular templating engine for Python. A web templating system combines a
template with a certain data source to render dynamic web pages.
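Putting these pieces together, a minimal Flask app that renders an HTML form through a Jinja2 template and returns a prediction might look like this. predict_gdp and its coefficients are hypothetical placeholders for the unpickled model's predict method, and only two of the inputs are shown.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)


def predict_gdp(literacy, phones):
    # Hypothetical stand-in for model.predict on the unpickled model
    return 50.0 * literacy + 30.0 * phones


# Jinja2 template: an HTML form plus an optional result line
PAGE = """
<form method="post">
  <input name="literacy" placeholder="Literacy (%)">
  <input name="phones" placeholder="Phones (per 1000)">
  <button type="submit">Predict</button>
</form>
{% if prediction is not none %}<p>Predicted GDP: {{ prediction }}</p>{% endif %}
"""


@app.route("/", methods=["GET", "POST"])
def index():
    prediction = None
    if request.method == "POST":
        # Form values arrive as strings; convert before predicting
        literacy = float(request.form["literacy"])
        phones = float(request.form["phones"])
        prediction = round(predict_gdp(literacy, phones), 2)
    return render_template_string(PAGE, prediction=prediction)
```

Calling app.run(debug=True) starts Flask's Werkzeug-based development server for local testing; it is not intended for production use.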