Presentation 4

Uploaded by

afifashaik169

PREDICTING GDP ACROSS THE WORLD
USING MACHINE LEARNING
MACHINE LEARNING MODEL FOR CROP
RECOMMENDATION USING PYTHON
BY

• S.AFIFA 18F71A0502
• S.TAHASEEN 18F71A0530
• S.ABEEDA 18F71A0501
• S.UMMESALMA 18F71A0531

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SRI SAI INSTITUTE OF TECHNOLOGY AND SCIENCE
INDEX
• ABSTRACT
• INTRODUCTION
• GROSS DOMESTIC PRODUCT (GDP)
• DISADVANTAGES OF GDP
• LOAD LIBRARIES
• DISADVANTAGES OF LOAD LIBRARIES
• SYSTEM REQUIREMENTS
• ARCHITECTURE
• DATA WRANGLING
• EDA
• MODELLING
• PERFORMANCE TUNING
• DEPLOYMENT
• FUTURE SCOPE
• CONCLUSION
ABSTRACT

The idea of this project is to gather GDP data from multiple data sources and apply various machine learning algorithms to this data to extract important information.
The resulting model can be used to estimate the GDP of a country.
In this project, the Random Forest Regression technique is used to predict the GDP of a country.
Random Forest Regression performed marginally better than the other models tried. Its input variables include literacy (%), net migration, infant mortality (per 1000 births), phones (per 1000), arable (%), crops (%), other (%), birth rate, death rate, agriculture, industry, service, etc.
INTRODUCTION
GROSS DOMESTIC PRODUCT (GDP)
GDP stands for Gross Domestic Product and represents the total monetary value of all final goods and services produced (and sold on the market) within a country during a period of time (typically one year). GDP is the most commonly used measure of economic activity.
DISADVANTAGES OF GDP:
• GDP does not incorporate any measures of welfare.
• GDP only includes market transactions.
• GDP does not describe income distribution.
• GDP does not describe what is being produced.
• GDP ignores externalities.
• Alternative measures such as the Social Progress Index try to address these gaps.


LOAD LIBRARIES
DISADVANTAGES OF LOAD LIBRARIES:

• Public libraries have operating hours; if an individual arrives too late, she won't be able to access the library's resources.
• People are also restricted by the amount of time they can spend with a resource, since each must be returned to the library within a set period of time.
SOFTWARE REQUIREMENTS

Python technology stack and system requirements:

• Python
• Pandas
• Seaborn
• Sklearn
• Flask
ARCHITECTURE:
NumPy:

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
Pandas:
• Pandas is an open-source library made mainly for working with relational or labeled data easily and intuitively. It provides various data structures and operations for manipulating numerical data and time series.
• The library is built on top of NumPy. Pandas is fast and offers high performance and productivity for users.
Seaborn:
• Seaborn is an open-source Python library built on top of Matplotlib.
• It is used for data visualization and exploratory data analysis.
• Seaborn works easily with data frames and the Pandas library.
• The graphs created can also be customized easily.
Matplotlib:
• Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK.
• There is also a procedural "pylab" interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged.
Sklearn:
• Scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language.
• It features various classification, regression and clustering algorithms, including support-vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
Bootstrap
• Bootstrap is a free and open-source tool collection for creating responsive websites and web applications.
• It is the most popular HTML, CSS and JavaScript framework for developing responsive, mobile-first websites.
• It solves many problems we once had, one of which is the cross-browser compatibility issue.


Flask:

• Flask is a web application framework written in Python.
• It was developed by Armin Ronacher, who led an international group of Python enthusiasts named Pocoo.
• Flask can serve both APIs and dynamic HTML pages, the latter rendered through Jinja2 templates.

MODULES
• Data preprocessing:

The training dataset consists of various attributes. It is processed, unnecessary attributes are removed, and the resulting dataset keeps only the attributes used for modelling.
Train-test split in machine learning

• The train-test split is used to estimate the performance of machine learning algorithms that are applicable to prediction-based algorithms/applications.
• The method is fast and easy to perform, and it lets us compare our own machine learning model's results with the known results on the held-out data.
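
A minimal sketch of a train-test split in plain Python follows. The function name `split_train_test`, the 80/20 ratio, the seed and the toy data are assumptions made for illustration; scikit-learn offers `train_test_split` in `sklearn.model_selection` for the same purpose.

```python
import random

def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle the rows and hold out a fraction of them for testing."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# Toy data: ten (feature, target) pairs, invented for this sketch.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = split_train_test(data)
print(len(train), len(test))
```

The model is fitted on `train` only, and its error is then measured on `test`, which it has never seen.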
Linear Models

• The term linear model implies that the model is specified as a linear combination of features.
• Based on training data, the learning process computes one weight for each feature to form a model that can predict or estimate the target value.

Decision Tree Regression

• A decision tree builds regression or classification models in the form of a tree structure.
• It breaks down a data set into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed.
• The final result is a tree with decision nodes and leaf nodes.
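
The idea can be sketched with the smallest possible regression tree: one decision node and two leaf nodes. This is an illustrative sketch, not the project's code; the names `fit_stump` and `predict_stump` and the toy data are assumptions, and the input is assumed to contain at least two distinct feature values.

```python
def fit_stump(xs, ys):
    """Find the single threshold split that minimises the squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:   # candidate split points
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict_stump(stump, x):
    """Decision node: compare x with the threshold; leaves hold the means."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean

xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_stump(xs, ys)
print(stump)
```

A full decision tree repeats the same split search inside each subset until a stopping rule is met.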
Ensemble
• An ensemble is a machine learning model that combines the predictions from two or more models.
• The models that contribute to the ensemble are referred to as ensemble members.
• The members may be of the same type or of different types, and may or may not be trained on the same training data.
Random Forest Regression:

• Random forest regression is a supervised learning algorithm that uses the ensemble learning method for regression.
• The ensemble learning method is a technique that combines predictions from multiple machine learning algorithms to make a more accurate prediction than a single model.
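
The averaging step can be sketched in plain Python. Note the simplification: this is bagging (bootstrap aggregation) of one-split trees on a single feature, so the per-split feature subsampling of a real random forest is omitted. All names, the tree count and the toy data are assumptions for illustration.

```python
import random

def fit_stump(xs, ys):
    """One-split regression tree: returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:                  # all x equal: fall back to a constant model
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """Average the predictions of all ensemble members."""
    preds = [lm if x < t else rm for t, lm, rm in forest]
    return sum(preds) / len(preds)

xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
forest = fit_forest(xs, ys)
print(predict_forest(forest, 2), predict_forest(forest, 11))
```

Each member sees a slightly different sample, so individual errors tend to cancel when the predictions are averaged.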
Metrics:

• Performance metrics are a part of every machine learning pipeline.
• They tell you if you are making progress, and put a number on it.
• All machine learning models, Random Forest included, need a metric to judge performance.

MEAN_SQUARED_ERROR:

• The mean squared error (MSE) is perhaps the simplest and most common loss function, and is often taught in introductory machine learning courses.
• To calculate the MSE, you take the differences between your model's predictions and the ground truth, square them, and average them over the whole dataset.
mean_squared_log_error

• Mean squared logarithmic error (MSLE) can be interpreted as a measure of the ratio between the true and predicted values.
• Mean squared logarithmic error is, as the name suggests, a variation of the mean squared error.
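
The "ratio" interpretation can be checked with a small sketch. The helper name `msle` and the numbers are assumptions for illustration; it uses log(1 + value), the same convention as scikit-learn's `mean_squared_log_error`.

```python
import math

def msle(y_true, y_pred):
    """Mean of (log(1 + true) - log(1 + predicted)) squared."""
    terms = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

# Both predictions are off by a factor of two, at very different scales.
small = msle([100.0], [200.0])
large = msle([10000.0], [20000.0])
print(small, large)
```

The two values come out nearly equal, because MSLE depends on the ratio between the true and predicted values rather than on their absolute difference.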
OVERVIEW
Country:

• A country is a distinct territorial body: a state, nation or other political entity.
• It may be a sovereign state or part of a larger state.

Region:
• A region is an area of land that has common features.
• A region can be defined by natural or artificial features.

Population:
• The whole number of people or inhabitants in a country or region.
• The total of individuals occupying an area or making up a whole.

Area:
• GDP estimates the value of the goods and services produced in an area.
• It can be used to compare the size and growth of country economies across the nation.

Pop density:
• Population density is the number of people per km² of land area.
• To allow comparisons between countries and over time, GDP per capita is adjusted for price differences between countries and for inflation; it is measured in international-$.
• Per capita gross domestic product (GDP) measures a country's economic output per person.


Coastline(coast/area ratio):
• It contributes to nearly 4% of the total GDP.
The people in the coastal areas rely on the coastal economy as it provides
them with their basic livelihood.
Net migration:
• Net migration is the difference between the number of people
moving into an area (a country, state, or county, for example) and
the number moving out.
• Between 2010 and 2019, more than 7.6 million more people moved into
the United States than left.
Infant mortality:
• Infant mortality is the death of an infant before his or her first
birthday.
• The infant mortality rate is the number of infant deaths for every 1,000
live births.
GDP($per capita):

• Per capita gross domestic product (GDP) is a financial metric that breaks down a
country's economic output per person and is calculated by dividing the GDP of a nation
by its population.
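
As a small worked example of this division (the function name and the figures are hypothetical, chosen only to make the arithmetic easy to follow):

```python
def gdp_per_capita(gdp, population):
    """Divide total GDP by population to get output per person."""
    if population <= 0:
        raise ValueError("population must be positive")
    return gdp / population

# Hypothetical country: 500 billion dollars of GDP and 25 million people.
value = gdp_per_capita(500e9, 25e6)
print(value)
```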
Literacy:
• Countries with a high literacy rate usually have a high GDP per capita.
• Nations with low GDP frequently have lower literacy rates since the people in that country have
less access to education, and children often have to work to help support the family.
Phones (per 1000):
• Mobile phone subscriptions are often reported per 100 people and compared against gross domestic product (GDP) per capita, measured in constant international-$; this dataset reports phones per 1000 people.
Arable (%):
• Arable land (hectares per person) in India was reported at 0.11564 in 2018, according to the World Bank collection of development indicators, compiled from officially recognized sources.
Crop(%):

• The share of agriculture in GDP increased to 19.9 per cent in 2020-21 from 17.8 per cent in 2019-20.

• The last time the contribution of the agriculture sector in GDP was at 20 per cent was in 2003-04.
other(%):
• The services sector accounts for 53.89% of total India's GVA of 179.15 lakh crore Indian
rupees.
• With GVA of Rs. 46.44 lakh crore, the Industry sector contributes 25.92%
Climate:
• Over the past 10 years, storms, wildfires, and floods alone have caused losses of around 0.3% of GDP per year globally, according to insurance firm Swiss Re.
• For most countries, exposure to, and costs from, climate change are already increasing.
Birthrate:
• There is generally an inverse correlation between income and the total fertility rate within
and between nations.
• The higher the degree of education and GDP per capita of a human population, subpopulation
or social stratum, the fewer children are born in any developed country.
Deathrate:
• In 2021, the crude death rate for the world is 7.64 deaths per thousand population.
• The crude death rate and birth rate of the world have been declining at a moderating rate since
1950.
Agriculture:

The share of agriculture in GDP increased to 19.9 per cent in 2020-21 from 17.8 per cent in 2019-20.

Industry:

The services sector is the largest sector of India. Gross Value Added (GVA) at current prices for the services sector is estimated at 96.54 lakh crore INR in 2020-21.

Service:

The services sector accounts for 53.89% of total India's GVA of 179.15 lakh crore Indian rupees. With GVA of Rs. 46.44 lakh crore, the Industry sector contributes 25.92%, while the Agriculture and allied sector shares 20.19%.
DATA PREPROCESSING
• Data preprocessing is a technique used to convert raw data into a clean dataset.

Data preparation - fill in missing data: We noticed that there are some missing data in the table. For simplicity, we fill the missing data using the median of the region that a country belongs to, as countries that are geographically close are often similar in many ways. For example, let's check the region median of 'GDP ($ per capita)', 'Literacy (%)' and 'Agriculture'. Note that for 'climate' we use the mode instead of the median, as 'climate' appears to be a categorical feature here.
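
A minimal sketch of this fill rule in plain Python, assuming the table is a list of dictionaries with a 'Region' key and that every region has at least one known value in the column. The helper name `fill_by_region` and the toy rows are invented for illustration; the project itself lists Pandas in its stack, where a grouped fill could do the same job.

```python
from collections import Counter
from statistics import median

def fill_by_region(rows, column, use_mode=False):
    """Fill None values in `column` with the median (or mode) of the same region."""
    by_region = {}
    for row in rows:
        if row[column] is not None:
            by_region.setdefault(row["Region"], []).append(row[column])
    for row in rows:
        if row[column] is None:
            known = by_region[row["Region"]]
            if use_mode:
                # Most frequent value, used for categorical columns.
                row[column] = Counter(known).most_common(1)[0][0]
            else:
                row[column] = median(known)
    return rows

# Invented toy rows; the real table has many more columns and countries.
rows = [
    {"Country": "A", "Region": "ASIA", "Literacy (%)": 80.0, "Climate": 2},
    {"Country": "B", "Region": "ASIA", "Literacy (%)": 90.0, "Climate": 2},
    {"Country": "C", "Region": "ASIA", "Literacy (%)": None, "Climate": None},
    {"Country": "D", "Region": "ASIA", "Literacy (%)": 60.0, "Climate": 3},
]
fill_by_region(rows, "Literacy (%)")
fill_by_region(rows, "Climate", use_mode=True)
print(rows[2])
```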
EXPLORATORY DATA ANALYSIS
EDA is an approach to analyzing data using visual techniques.

There are various types of EDA:

• UNIVARIATE ANALYSIS
• BI-VARIATE ANALYSIS
• MULTI-VARIATE ANALYSIS
EDA
We will use the Matplotlib and Seaborn libraries for data visualization.
Some commonly used graphs are:

Univariate: Bar Plot, Box Plot, Hist Plot
Bivariate: Bar Plot, Hist Plot, Scatter Plot, Box Plot
Multivariate: Pair Plot, Heat Map
UNIVARIATE:
Three univariate model classes are considered: ARIMA models (after Box and Jenkins [1976]), ARIMAX models (a combination of ARIMA terms and additional predictors) and an "ordinary multiple" approach with lagged terms of the variables.
• Univariate analysis is the simplest form of analyzing data.
UNIVARIATE ANALYSIS
• Uni means one, so in other words the data has only one variable.
BI-VARIATE ANALYSIS:
Bivariate analysis is one of the statistical analysis where two variables are observed.
One variable here is dependent while the other is independent.
MULTIVARIATE:
Multivariate analysis is defined as: The statistical study of data where multiple
measurements are made on each experimental unit and where the relationships
among multivariate measurements and their structure are important.
MODELING
Linear Regression:
Linear Regression is a supervised machine learning model that finds the best-fit straight line between the independent and dependent variables, i.e. it finds the linear relationship between the dependent and independent variables.

Linear Regression is of two types: Simple and Multiple.

Simple Linear Regression:
Only one independent variable is present, and the model has to find its linear relationship with the dependent variable.

Multiple Linear Regression:
There is more than one independent variable for the model to relate to the dependent variable.

Equation of Simple Linear Regression: y = b0 + b1*x, where b0 is the intercept, b1 is the coefficient or slope, x is the independent variable and y is the dependent variable.

Equation of Multiple Linear Regression: y = b0 + b1*x1 + b2*x2 + b3*x3 + ... + bn*xn, where b0 is the intercept, b1, b2, b3, b4, ..., bn are the coefficients or slopes of the independent variables x1, x2, x3, x4, ..., xn, and y is the dependent variable.
A Linear Regression model's main aim is to find the best-fit line and the optimal values of the intercept and coefficients such that the error is minimized.
Error is the difference between the actual value and the predicted value, and the goal is to reduce this difference.
Mathematical Approach:
Residual/Error = Actual value - Predicted value
Sum of Residuals/Errors = Sum(Actual - Predicted)
Sum of Squared Residuals/Errors = Sum((Actual - Predicted)^2)
The best-fit line is the one that minimizes the sum of squared residuals.
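
For the simple (one-variable) case, the intercept and slope that minimise the sum of squared residuals have a closed form, sketched below. The function name `fit_simple_linear` and the toy data are assumptions for illustration, not the project's code.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) minimising the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 1 + 2x
b0, b1 = fit_simple_linear(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sum_sq = sum(r ** 2 for r in residuals)
print(b0, b1, sum_sq)
```

Because the toy points lie exactly on a line, the fit recovers the intercept and slope and the sum of squared residuals is zero.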
PERFORMANCE TUNING
METRICS:
• Evaluation metrics measure how well a model performs and how well it approximates the relationship.
• There are three error metrics that are commonly used for evaluating and reporting the performance of a regression model:
• Mean Squared Error (MSE)
• Mean Absolute Error (MAE)
• Root Mean Squared Error (RMSE)
Mean Squared Error:
The MSE tells you how close a regression line is to a set of points. It does this by taking the distances from the points to the regression line (these distances are the "errors") and squaring them.
The squaring is necessary to remove any negative signs.

mse = mean_squared_error(y_test,y_predict)

Steps to find the MSE:

1. Find the equation for the regression line.

2. Insert the X values into the equation found in step 1 to get the respective predicted Y values.

3. Subtract the predicted Y values from the original Y values. The values found are the error terms, also known as the vertical distances of the given points from the regression line.

4. Square the errors found in step 3.

5. Sum up all the squared errors.

6. Divide the value found in step 5 by the total number of observations.
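
The procedure above can be sketched in plain Python for a regression line that is already known. The function name, the line y = 2x and the toy data are assumptions for illustration.

```python
def mse_for_line(xs, ys, b0, b1):
    """Compute the MSE of the line y = b0 + b1*x on the given points."""
    predicted = [b0 + b1 * x for x in xs]             # predicted Y values
    errors = [y - p for y, p in zip(ys, predicted)]   # vertical distances
    squared = [e ** 2 for e in errors]                # squaring removes signs
    total = sum(squared)                              # sum of squared errors
    return total / len(xs)                            # mean over observations

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]
mse = mse_for_line(xs, ys, b0=0.0, b1=2.0)
print(mse)
```

Only the third point misses the line (by 1), so the result is one squared error divided by three observations.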

Mean Absolute Error:
Mean absolute error (MAE) is a loss function used for regression. Use MAE when you are doing regression and don't want outliers to play a big role. The loss is the mean of the absolute differences between true and predicted values; deviations in either direction from the true value are treated the same way.

mae= mean_absolute_error(y_test,y_predict)
Formula:
Mean Absolute Error = (1/n) * ∑|yi – xi|
where,
•Σ: Greek symbol for summation
•yi: Actual value for the ith observation
•xi: Calculated value for the ith observation
•n: Total number of observations
Method 1: Using Actual Formulae
Mean Absolute Error (MAE) is calculated by taking the summation of the absolute difference
between the actual and calculated values of each observation over the entire array and then
dividing the sum obtained by the number of observations in the array.
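
Method 1 can be sketched directly from the formula. The helper name `mae_from_formula` and the sample arrays are assumptions for illustration.

```python
def mae_from_formula(actual, calculated):
    """(1/n) * sum of |y_i - x_i| over all observations."""
    n = len(actual)
    return sum(abs(y - x) for y, x in zip(actual, calculated)) / n

actual = [2, 3, 5, 5, 9]
calculated = [3, 3, 8, 7, 6]
result = mae_from_formula(actual, calculated)
print(result)
```

The absolute differences are 1, 0, 3, 2 and 3, which sum to 9; dividing by the 5 observations gives 1.8.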

Method 2: Using sklearn

sklearn.metrics module of python contains functions for calculating errors for different purposes. It
provides a method named mean_absolute_error() to calculate the mean absolute error of the given
arrays.

Syntax:

mean_absolute_error(actual, calculated)

Where:

• actual: array of actual values, passed as the first argument
• calculated: array of predicted/calculated values, passed as the second argument

It returns the mean absolute error of the given arrays.
R2 score:
The coefficient of determination, also called the R2 score, is used to evaluate the performance of a linear regression model. It is the proportion of the variation in the dependent output attribute that is predictable from the independent input variable(s). It is used to check how well observed results are reproduced by the model, based on the ratio of the total deviation of results described by the model.

Mathematical Formula:

R2 = 1 - SSres / SStot

Where,

SSres is the sum of squares of the residual errors (actual minus predicted values).

SStot is the total sum of squares (actual values minus their mean).
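
A short sketch of the formula, taking SStot as the sum of squared deviations of the actual values from their mean. The helper name `r_squared` and the sample values are assumptions; scikit-learn exposes the same metric as `r2_score` in `sklearn.metrics`.

```python
def r_squared(actual, predicted):
    """R2 = 1 - SSres / SStot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
score = r_squared(actual, predicted)
print(score)
```

Here SSres is 2 and SStot is 20, so the score is 1 - 2/20 = 0.9.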

Decision Tree:
Since the target is not linear in many of the features, it is worth trying some nonlinear models, for example the Decision Tree model.

• A decision tree is a flowchart-like structure in which each internal node represents a test on a feature (e.g. whether a coin flip comes up heads or tails).
• Each leaf node represents a class label (the decision taken after computing all features).
K Nearest Neighbors:
The K-Nearest Neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

• This section illustrates K-nearest neighbours on sample random data using the sklearn library.

Pre-requisites: NumPy, Pandas, Matplotlib, sklearn. We have been given a random data set with one column as the target classes. We will try to use KNN to create a model that directly predicts a class for a new data point based on the features.
Choosing a K Value:
Let's go ahead and use the elbow method to pick a good K value.
K Nearest Neighbors Regression:
• K Nearest Neighbors Regression first stores the training examples.
• During prediction, when it encounters a new instance (or test example), it finds the K training instances nearest to this new instance.
• It then predicts the target value for this instance by calculating the mean of the target values of these nearest neighbors.
Pseudocode:

1. Store all training examples.
2. Repeat steps 3 and 4 for each test example.
3. Find the K training examples nearest to the current test example.
4. y_pred for the current test example = mean of the true target values of these K neighbors.
knn.fit(X_train, y_train)
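
The pseudocode can be sketched in plain Python as follows. The function name `knn_predict`, the Euclidean distance and the toy data are assumptions for illustration; the `knn.fit(X_train, y_train)` call above suggests the project used a ready-made estimator instead.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict the mean target of the k training points nearest to `query`."""
    # Pair each training point's Euclidean distance with its target value.
    distances = [
        (sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, y)
        for x, y in zip(train_x, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# "Storing the training examples" is simply keeping these two lists.
train_x = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (10.0, 10.0)]
train_y = [10.0, 20.0, 30.0, 100.0]
prediction = knn_predict(train_x, train_y, query=(2.0, 2.0), k=3)
print(prediction)
```

The three nearest neighbours have targets 10, 20 and 30, so the prediction is their mean, 20; the distant point with target 100 is ignored.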
SUPPORT VECTOR MACHINE (SVM):
Since the target is not linear in many of the features, it is worth trying some nonlinear models. Another example is the support vector machine.
• In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis.
• A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane.
DEPLOYMENT
• To make the trained GDP model usable, we need an application with a simple user interface.
• We therefore built a simple web interface using HTML, CSS, Bootstrap and Flask.
• To predict results for the values obtained from the user, we used the Flask framework to integrate the backend and the frontend.
• We generated a pickle file of our model, which is used to generate predictions for the input data.
• In the next step, we built a UI in which the user enters the values of all the inputs.
• The model then processes the data and returns the predicted GDP for the given inputs.
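
The pickle step can be sketched with the standard library alone. The "model" below is a stand-in (a dictionary of hypothetical linear coefficients), not the project's trained regressor, and the feature names and numbers are invented; a real deployment would pickle the fitted estimator in the same way.

```python
import pickle

# Stand-in for a trained model: coefficients of a hypothetical linear fit.
model = {
    "intercept": 1000.0,
    "coefficients": {"Literacy (%)": 50.0, "Phones (per 1000)": 20.0},
}

def predict(model, features):
    """Weighted sum of the user's inputs plus the intercept."""
    return model["intercept"] + sum(
        weight * features[name] for name, weight in model["coefficients"].items()
    )

blob = pickle.dumps(model)      # bytes that would be written to a pickle file
restored = pickle.loads(blob)   # what the web backend does before predicting

# Values a user might type into the web form.
user_input = {"Literacy (%)": 90.0, "Phones (per 1000)": 300.0}
prediction = predict(restored, user_input)
print(prediction)
```

Serialising once after training means the web application never has to retrain; it only loads the stored object and calls it on the form values. Pickle files should only be loaded from trusted sources.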
HTML:
• HTML stands for HyperText Markup Language.
• It is used to design web pages using a markup language.
• Hypertext defines the links between web pages, and the markup language defines the text document within the tags that define the structure of web pages.

Syntax:
<form> </form>
CSS:
• CSS (Cascading Style Sheets) is a stylesheet language used to design a webpage and make it attractive.
• The reason for using it is to simplify the process of making web pages presentable.
Basic Format:

• This is the basic structure of an HTML webpage, with CSS styles used inside the webpage.
• In a web page, we use internal CSS (i.e. adding CSS code inside the <head> tag of the HTML code).
BOOTSTRAP:
Bootstrap is a free and open-source tool collection for creating responsive websites and web applications. It is the most popular HTML, CSS, and JavaScript framework for developing responsive, mobile-first websites.

Why do we use Bootstrap?

• It is a faster and easier way to do web development.
• It creates platform-independent web pages.
• It creates responsive web pages.
• It designs responsive web pages for mobile devices too.
• It is a free and open-source framework available on www.getbootstrap.com
Flask – (Creating first simple application):

Flask is a web application framework written in Python. Flask is based on the Werkzeug WSGI toolkit and the Jinja2 template engine. Both are Pocoo projects.
To understand what Flask is, you have to understand a few general terms.

1. WSGI: Web Server Gateway Interface (WSGI) has been adopted as a standard for Python web application development.

2. Werkzeug: It is a WSGI toolkit that implements requests, response objects, and other utility functions.

3. Jinja2: Jinja2 is a popular templating engine for Python. A web templating system combines a template with a certain data source to render dynamic web pages.
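
To make the WSGI term concrete, here is a bare WSGI callable written without Flask. It is an illustrative sketch: the route "/predict" and the response text are invented, and the app is called directly with a hand-built environ instead of being run behind a server.

```python
def application(environ, start_response):
    """A bare WSGI callable: receives the request environ, returns body chunks."""
    path = environ.get("PATH_INFO", "/")
    body = f"Requested path: {path}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Call the app the way a WSGI server would, capturing status and headers.
captured = {}

def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

response = application({"PATH_INFO": "/predict"}, start_response)
print(captured["status"], response)
```

Werkzeug wraps the raw `environ` and `start_response` in request and response objects, and Flask adds routing and Jinja2 templates on top of that.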
