
MACHINE LEARNING MODEL FOR PREDICTING GDP USING PYTHON

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
BY
 S.AFIFA (18F71A0502)
 S.UMME SALMA (18F71A0531)
 S.TAHASEEN (18F71A0530)
 S.ABEEDA (18F71A0501)
Under the esteemed guidance of
Mr. G. Mohammad Rafi, M.Tech.
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SRI SAI INSTITUTE OF TECHNOLOGY AND SCIENCE
PREDICTING GDP OF A COUNTRY

BY USING
MACHINE LEARNING
INDEX
 ABSTRACT
 INTRODUCTION
 GROSS DOMESTIC PRODUCT (GDP)
 EXISTING SYSTEM
 PROPOSED SYSTEM
 ADVANTAGES OF PROPOSED SYSTEM
 LOAD LIBRARIES
 SYSTEM REQUIREMENTS
 ARCHITECTURE
 DATA WRANGLING
 EDA
 MODELLING
 PERFORMANCE METRICS
 DEPLOYMENT
 FUTURE SCOPE
 CONCLUSION
ABSTRACT
This project gathers GDP data from multiple data sources and
applies various machine learning algorithms to this data to extract
important information. The resulting model can be used for estimating the GDP
of a country. In this project, linear regression techniques are used
to predict the GDP of a country.
INTRODUCTION
GROSS DOMESTIC PRODUCT (GDP)
GDP stands for Gross Domestic Product and represents the total monetary
value of all final goods and services produced (and sold on the market)
within a period of time (typically one year). GDP is the most commonly used
measure of economic activity.
Existing System :

 GDP impacts personal finance, investments, and job growth.
 Investors look at a nation's growth rate to decide if they should adjust their asset
allocation, as well as compare country growth rates to find their best
international opportunities.
 They purchase shares of companies that are in rapidly growing countries.
 Gross Domestic Product (GDP) is calculated using five elements: Consumption (C),
Investment (I), Government Spending (G), Exports (X) and Imports (M):
GDP = C + I + G + (X − M).
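
As a worked illustration of this formula, the sketch below computes GDP in Python from made-up figures (all numbers are hypothetical, in billions of dollars):

# Expenditure approach: GDP = C + I + G + (X - M)
consumption = 13000   # C: household spending (hypothetical)
investment = 3500     # I: business investment (hypothetical)
government = 3000     # G: government spending (hypothetical)
exports = 2500        # X (hypothetical)
imports = 3100        # M (hypothetical)
gdp = consumption + investment + government + (exports - imports)
print(f"GDP = {gdp} billion")  # GDP = 18900 billion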
Disadvantages of Existing System :

 The failure to account for or represent the degree of income inequality in society.
 The failure to indicate whether the nation's rate of growth is sustainable or not.
 The exclusion of non-market transactions.
PROPOSED SYSTEM
 In the proposed system we build a machine learning model by training it
with a training dataset; this model will help business firms
decide where to invest and buy shares in that location.

 A country's data, such as population, literacy, birth rate, etc., is given as input to
the machine learning model, and our output label is GDP.

 Most economists, politicians and businesses like to see GDP rising
steadily.
ADVANTAGES OF PROPOSED SYSTEM

 GDP enables policymakers and central banks to judge whether the
economy is contracting or expanding and promptly take necessary
action.

 GDP helps government decide how much it can spend on public
services and how much it needs to raise in taxes.
SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS:
✓ SYSTEM : INTEL I5
✓ HARD DISK : 500 GB
✓ RAM : 4 GB
✓ OPERATING SYSTEM : WINDOWS 10, 11 and above
SOFTWARE REQUIREMENTS:
✓ WEB FRAMEWORK : FLASK
✓ TECHNOLOGY : PYTHON, HTML, CSS, BOOTSTRAP
✓ LIBRARIES USED : Pandas, NumPy, Seaborn, Matplotlib, Scikit-Learn
✓ HOSTING ENVIRONMENT : HEROKU
ARCHITECTURE:
Understand the problem → Data Gathering → Data Wrangling → EDA → Modelling → Performance Metrics → Deployment
MODULES:

This project contains four modules:

 EXPLORATORY DATA ANALYSIS
 MODELLING
 PERFORMANCE METRICS
 DEPLOYMENT
DATA WRANGLING
&
EXPLORATORY DATA ANALYSIS
LOAD LIBRARIES
We install and import all of these libraries in Python.

 Pandas is an open-source library that provides high-performance data
manipulation in Python.

 It provides a huge set of important commands and features which are used to
easily analyze your data.

 It is used for working with datasets. It has functions for analyzing, cleaning, exploring, and
manipulating data.

 We get insights about a dataset using functions in pandas such as
head(), tail(), info(), describe(), sample().

 There are several useful functions for detecting, removing and replacing null values in a pandas
data frame, such as isnull(), fillna(), replace(), drop().
LOAD LIBRARIES

NumPy (Numerical Python) is an open-source core Python library for scientific
computations. It is a general-purpose array and matrix processing package.
NumPy is compatible with, and used by, many other popular Python packages,
including pandas and matplotlib. NumPy makes many mathematical
operations used widely in scientific computing fast and easy to use.
LOAD LIBRARIES
 Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
 Matplotlib is open source and we can use it freely.
 Pyplot is a Matplotlib module that provides functions that interact with the figure, decorate the
plot with labels, and create the plotting area in a figure. Different plots can be drawn using this
library:

 Bar Graph
 Pie Chart
 Box Plot
 Histogram
 Line Chart
 Scatter Plot
LOAD LIBRARIES

Seaborn is a library mostly used for statistical plotting in Python. It is built on top of
Matplotlib and provides beautiful default styles and color palettes to make statistical
plots more attractive. Seaborn plots are basically used for visualizing the relationship
between variables. Those variables can either be completely numerical or a category
like a group, class or division. The different categories of plots in Seaborn are:
 RELATIONAL PLOTS
 CATEGORICAL PLOTS
 DISTRIBUTION PLOTS
 REGRESSION PLOTS
 MATRIX PLOTS
DATA GATHERING

 This dataset was built focusing on the factors that affect a country's GDP
per capita. We try to make a model using the data of
227 countries from the database (reference: Kaggle).
The dataset used in our model includes the following attributes:

 Country name
 Region
 Population
 Area (sq. mi.)
 Population density (per sq. mi.)
 Coastline (ratio of the coast per area)
 Net migration
 Infant mortality (per 1000 births)
 Literacy (%)
 Phones (per 1000)
 Arable (%)
 Crops (%)
 Other (%)
 Climate
 Birth rate
 Death rate
 Agriculture
 Industry
 Service
 Label: GDP ($ per capita)
READING THE DATASET AND GETTING INSIGHTS ABOUT THE DATA

We get insights about the data using the following functions:

 df.shape -- for the shape of the data
 df.describe() -- for the distribution of data
 df.info() -- for the columns and their data types
 df.head() -- for the first 5 rows
 df.tail() -- for the last 5 rows
 df.isnull().sum() -- to check the null values in columns
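
A minimal sketch of this step, assuming the Kaggle file is saved as countries_of_the_world.csv (the filename is an assumption):

import pandas as pd

df = pd.read_csv("countries_of_the_world.csv")  # assumed filename
print(df.shape)           # (rows, columns) -- 227 countries expected
print(df.info())          # column names and their data types
print(df.describe())      # distribution of the numerical columns
print(df.head())          # first 5 rows
print(df.isnull().sum())  # null counts per column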
DATA WRANGLING
One of the first steps is to make sure that the dataset we are using is accurate. The dataset
should not have any missing values, and if it does have missing values, they should be
replaced by an appropriate value.

Handling Missing Values:

There are three ways to fill missing values:

 Mean: df['column_name'].fillna(df['column_name'].mean(), inplace = True)
 Median: df['column_name'].fillna(df['column_name'].median(), inplace = True)
 Mode: df['column_name'].fillna(df['column_name'].mode()[0], inplace = True)

(mean() and median() return scalars, so no indexing is needed; mode() returns a Series, so [0] selects the first mode.)

Outlier Treatment:
Outliers can be treated by capping values at chosen percentiles/quantiles, as sketched below.
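
A minimal sketch of quantile-based capping; the column choice is a hypothetical example:

col = "Phones (per 1000)"  # hypothetical column choice
# Cap outliers at the 1st and 99th percentiles (winsorizing).
lower = df[col].quantile(0.01)
upper = df[col].quantile(0.99)
df[col] = df[col].clip(lower=lower, upper=upper)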
EXPLORATORY DATA ANALYSIS
DATA VISUALIZATION
Data Visualization is the process of analyzing data in the form of graphs or maps, making it easier to
understand patterns in the data.

There are various types of visualizations:

✓ UNIVARIATE ANALYSIS
✓ BI-VARIATE ANALYSIS
✓ MULTI-VARIATE ANALYSIS
EXPLORATORY DATA ANALYSIS
We will use the Matplotlib and Seaborn libraries for the data visualization.
Some commonly used graphs are:

UNIVARIATE: Bar Plot, Hist Plot, Box Plot
BIVARIATE: Bar Plot, Scatter Plot, Hist Plot, Box Plot
MULTIVARIATE: Pair Plot, Heat Map
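
A minimal plotting sketch, assuming df is the loaded dataframe; the column names follow the attribute list above:

import matplotlib.pyplot as plt
import seaborn as sns

# Univariate: distribution of the target
sns.histplot(df["GDP ($ per capita)"])
plt.show()

# Bivariate: literacy vs. GDP per capita
sns.scatterplot(x="Literacy (%)", y="GDP ($ per capita)", data=df)
plt.show()

# Multivariate: pairwise relationships of a few numeric columns
sns.pairplot(df[["GDP ($ per capita)", "Literacy (%)", "Phones (per 1000)"]])
plt.show()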
UNIVARIATE NUMERICAL FEATURE ANALYSIS:
Three univariate model classes are considered: ARIMA models (after Box and Jenkins
[1976]), ARIMAX models (a combination of ARIMA terms and additional predictors), and an
"ordinary multiple" approach with lagged terms of the variables.

 Univariate analysis is the simplest form of analyzing data.
 "Uni" means one, so in other words the data has only one variable.

UNIVARIATE CATEGORICAL ANALYSIS
BI-VARIATE NUMERICAL FEATURE ANALYSIS:
Bivariate analysis is a statistical analysis where two variables are observed.
One variable here is dependent while the other is independent.
BI-VARIATE CATEGORICAL ANALYSIS
Multivariate Analysis :

The statistical study of data where multiple measurements are made on each
experimental unit and where the relationships among the multivariate measurements
and their structure are important.

CORRELATION BETWEEN TWO VARIABLES

The heatmap shows the correlation between all numerical columns.
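
A minimal sketch of such a heatmap, assuming df is the cleaned dataframe:

import matplotlib.pyplot as plt
import seaborn as sns

# Correlation matrix of the numeric columns, rendered as a heatmap.
corr = df.select_dtypes(include="number").corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()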


MODELING
FEATURE SELECTION

After performing the data cleaning and visualizations, we implemented our
machine learning algorithms on the features of the dataset.
X = the set of input features from the dataset.
y = the output feature from the dataset.
df = dataset
For example:
X = df.drop(columns = ['GDP ($ per capita)', 'Country', 'Region'])
y = df['GDP ($ per capita)']
SCIKIT-LEARN LIBRARY

 Scikit-learn is an open-source machine learning Python package that offers
functionality supporting supervised and unsupervised learning.
 Additionally, it provides tools for model development, selection and evaluation,
as well as many other utilities including data pre-processing functionality.

 Supervised Learning :
Supervised learning is an approach to creating Artificial Intelligence (AI) where a computer
algorithm is trained on input data that has been labelled for a particular output.
 Unsupervised Learning :
Unsupervised learning is an approach to creating Artificial Intelligence (AI) where a
computer algorithm is trained on input data that has not been labelled for a particular output.
Methods in the Scikit-Learn Package:

fit_transform()
It joins the fit() and transform() methods for the transformation of the dataset. It is used on the training
data so that we can scale the training data and also learn the scaling parameters.

fit()
This method calculates the parameters μ (mean) and σ (standard deviation) and saves them as internal
objects.

transform()
Using those same saved parameters, this method transforms a particular dataset. It is used for pre-
processing before modeling.

predict()
Uses the learned weights on the test data to make predictions.
SPLITTING

 The next step is building the machine learning model.
 While building the machine learning model, first we need to split our dataset into 2 parts,
i.e. training data and test data.

The syntax for splitting is given below (stratify is omitted because the target, GDP, is continuous):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1)
SCALING:

 Scaling is a technique to standardize the independent features present
in the data in a fixed range.
 It is performed during data pre-processing to handle highly varying
units.

Techniques to perform scaling are:

 Standard Scaler
 Min-Max Scaler

 STANDARD SCALER :
A very effective technique which rescales a feature value so that it
has a distribution with 0 mean and 1 variance.
 MIN-MAX SCALER :
This technique rescales a feature or observation value to between 0 and
1. Scikit-learn provides the implementation of scaling in the preprocessing
package. We import MinMaxScaler or StandardScaler from the preprocessing
package to perform scaling.

Syntax :
from sklearn.preprocessing import StandardScaler
my_scalar = StandardScaler()
X_train_scaled = my_scalar.fit_transform(X_train)
X_test_scaled = my_scalar.transform(X_test)
Linear Regression:
Linear Regression is a machine learning algorithm based on supervised learning. It
performs a regression task. Regression models a target prediction value based on
independent variables.
Linear Regression Formula:

y = θ1 + θ2·x

While training the model we are given:

x: input training data
y: labels to data (supervised learning)
θ1: intercept
θ2: coefficient of x
Gradient Descent
To update the θ1 and θ2 values in order to reduce the cost function (minimizing the RMSE
value) and achieve the best-fit line, the model uses gradient descent. The idea is to
start with random θ1 and θ2 values and then iteratively update them until the
minimum cost is reached.

Syntax:
from sklearn.linear_model import LinearRegression
model_linear = LinearRegression()
model_linear.fit(X_train_scaled, y_train)
linear = model_linear.score(X_test_scaled, y_test)
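
For intuition, a minimal NumPy sketch of gradient descent for simple linear regression (this is not the scikit-learn implementation above; the learning rate and epoch count are arbitrary choices):

import numpy as np

def gradient_descent(x, y, lr=0.01, epochs=1000):
    # Fit y = theta1 + theta2 * x by minimizing the mean squared error.
    theta1, theta2 = 0.0, 0.0  # start from arbitrary values
    n = len(x)
    for _ in range(epochs):
        error = (theta1 + theta2 * x) - y
        theta1 -= lr * (2 / n) * error.sum()        # d(MSE)/d(theta1)
        theta2 -= lr * (2 / n) * (error * x).sum()  # d(MSE)/d(theta2)
    return theta1, theta2

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])  # roughly y = 1 + 2x
print(gradient_descent(x, y))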
SUPPORT VECTOR MACHINE (SVM):
 In machine learning, support vector machines (SVMs, also support vector
networks) are supervised learning models with associated learning
algorithms that analyze data for classification and regression analysis.
 A Support Vector Machine (SVM) is a discriminative model formally
defined by a separating hyperplane.

There are two types:

 Linear SVM – used when the data is linearly separable.
 Non-linear SVM – used when the data is not linearly separable;
a kernel converts the lower-dimensional data into a higher-dimensional space.

Syntax:
from sklearn import svm
model_svm = svm.SVR()
model_svm.fit(X_train_scaled, y_train)
svm_score = model_svm.score(X_test_scaled, y_test)  # renamed to avoid shadowing the imported svm module
Decision Tree Regression

 A decision tree builds regression or
classification models in the form of a tree
structure.
 It breaks down a dataset into smaller and
smaller subsets while at the same time an
associated decision tree is incrementally
developed.
 The final result is a tree with decision
nodes and leaf nodes.
 In a decision tree, each split minimizes the impurity.
 To calculate the impurity of nodes (for classification trees; regression trees typically minimize variance/MSE instead):
 Gini = 1 − Σ(Pi)²
 Entropy = −Σ Pi log₂ Pi
 Information Gain = parent node entropy − Σ weighted entropy of child nodes
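
A small computational sketch of these impurity formulas, taking class probabilities as input:

import numpy as np

def gini(p):
    # Gini impurity: 1 - sum(p_i^2)
    p = np.asarray(p)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    # Entropy: -sum(p_i * log2(p_i)), ignoring zero probabilities
    p = np.asarray(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(gini([0.5, 0.5]))     # 0.5 (maximally impure two-class node)
print(entropy([0.5, 0.5]))  # 1.0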

SYNTAX:

from sklearn.tree import DecisionTreeRegressor

model_decision = DecisionTreeRegressor(random_state = 100,
                                       max_depth = 3,
                                       min_samples_leaf = 3)
model_decision.fit(X_train, y_train)
decisiontree = model_decision.score(X_test, y_test)
RANDOM FOREST ALGORITHM

 Random Forest is a supervised ensemble machine learning algorithm
used in both classification and regression problems.
 It contains various decision trees, and an average of their outputs is taken to
give the final output.
 As decision trees are prone to overfitting, random forest is useful in
reducing the effect of overfitting and hence gives a more accurate
output.
RANDOM FOREST REGRESSION FLOWCHART
SYNTAX:

from sklearn.ensemble import RandomForestRegressor

model_rfr = RandomForestRegressor(n_estimators = 50,
                                  max_depth = 6,
                                  min_weight_fraction_leaf = 0.05,
                                  max_features = 0.8,
                                  random_state = 42)
model_rfr.fit(X_train, y_train)
PERFORMANCE METRICS:
 Evaluation metrics are a measure of how good a model's
performance is and how well it approximates the relationship.

 There are three error metrics that are commonly used for
evaluating and reporting the performance of a regression model.

They are :

 MEAN SQUARED ERROR (MSE)
 MEAN ABSOLUTE ERROR (MAE)
 R2 SCORE
Metrics:

• Performance metrics are a part of every machine learning
pipeline.

• They tell you if you're making progress, and put a number on it.

• All machine learning models, whether linear regression or Random Forest, are
evaluated with such metrics.

mean_squared_log_error:

• Mean squared logarithmic error (MSLE) can be interpreted as a
measure of the ratio between the true and predicted values.

• Mean squared logarithmic error is, as the name suggests, a
variation of the mean squared error.

Mean Squared Error:
The MSE tells you how close a regression line is to a set of points. It does this by
taking the distances from the points to the regression line (these distances are the
"errors") and squaring them.

The squaring is necessary to remove any negative signs.

mse = mean_squared_error(y_test, y_predict)
Steps to find the MSE:

1. Find the equation for the regression line.

2. Insert the X values into the equation found in step 1 to get the respective predicted Y values.

3. Subtract the predicted Y values from the original Y values. The resulting values are the error terms
(each is the vertical distance of the given point from the regression line).

4. Square the errors found in step 3.

5. Sum the squared errors and divide by the total number of observations.


Mean Absolute Error:
Mean absolute error (MAE) is a loss function used for regression. Use MAE when you are doing
regression and don't want outliers to play a big role. The loss is the mean over the absolute
differences between true and predicted values; deviations in either direction from the true value
are treated the same way.

mae = mean_absolute_error(y_test, y_predict)

Formula:
Mean Absolute Error = (1/n) * Σ|yi − xi|
where,
• Σ: Greek symbol for summation
• yi: Actual value for the ith observation
• xi: Calculated value for the ith observation
• n: Total number of observations
Method 1: Using the Actual Formula
Mean Absolute Error (MAE) is calculated by taking the summation of the absolute differences
between the actual and calculated values of each observation over the entire array and then
dividing the sum obtained by the number of observations in the array.

Method 2: Using sklearn

The sklearn.metrics module of Python contains functions for calculating errors for different purposes. It
provides a method named mean_absolute_error() to calculate the mean absolute error of the given
arrays.

Syntax:
mean_absolute_error(actual, calculated)
R2 score:
The coefficient of determination, also called the R² score, is used to evaluate the performance of a linear
regression model. It is the amount of the variation in the output dependent attribute which is predictable
from the input independent variable(s). It is used to check how well observed results are reproduced by
the model, depending on the ratio of the total deviation of results explained by the model.

Mathematical Formula :
R² = 1 − SSres / SStot

Where,

SSres is the sum of squares of the residual errors.

SStot is the total sum of squares (the total variation of the data).
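
A minimal sketch computing these metrics with scikit-learn, assuming y_test and y_predict come from one of the models above:

from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             mean_squared_log_error, r2_score)

mse = mean_squared_error(y_test, y_predict)
mae = mean_absolute_error(y_test, y_predict)
r2 = r2_score(y_test, y_predict)
# MSLE requires non-negative values, so this assumes the predictions are non-negative.
msle = mean_squared_log_error(y_test, y_predict)
print(f"MSE: {mse:.2f}  MAE: {mae:.2f}  MSLE: {msle:.4f}  R2: {r2:.3f}")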


DEPLOYMENT
We developed a web application with a simple user interface that lets the end user use
our model to predict the GDP of a country.
TOOLS USED

Frontend Tools: HTML, CSS & BOOTSTRAP.

Backend Tools: Python; Web Framework: Flask.

Lastly, we wanted to predict the results for the values obtained from the
user, for which we made use of the FLASK framework to integrate the backend
and the frontend.
DEPLOYMENT
HTML:

✓ HTML stands for Hyper Text Markup Language.


✓ HTML is the standard markup language for creating Web pages.
✓ HTML describes the structure of a Web page.
✓ HTML consists of a series of elements.
✓ HTML elements tell the browser how to display the content.
✓ HTML elements label pieces of content such as "this is a
heading", "this is a paragraph", "this is a link", etc.
DEPLOYMENT
CSS:

WHAT IS CSS ?
✓ CSS stands for Cascading Style Sheets.
✓ CSS describes how HTML elements are to be displayed on screen, paper, or in
other media.
✓ CSS saves a lot of work. It can control the layout of multiple web pages all at once.
✓ External style sheets are stored in CSS files.
DEPLOYMENT
BOOTSTRAP:
Bootstrap is a free and open-source tool collection for creating responsive
websites and web applications. It is the most popular HTML, CSS, and
JavaScript framework for developing responsive, mobile-first websites.

Why do we use Bootstrap ?

• It is a faster and easier way to do web development.
• It creates platform-independent web pages.
• It creates responsive web pages.
• It designs responsive web pages for mobile devices too.
DEPLOYMENT
Flask – (Creating a first simple application):
Flask is a Python framework that allows us to build web applications. A web
application framework is a collection of modules and libraries that helps
the developer write applications without writing low-level code such
as protocols, thread management, etc. Flask is based on the WSGI (Web Server
Gateway Interface) toolkit and the Jinja2 template engine. Python 3 is
required for current versions of Flask.
DEPLOYMENT
SET UP PROJECT

Python projects live in virtual environments, so we need a virtual environment tool.

✓ To set up the project, first create the project directory, then in the terminal go to that directory and use the
following command to install virtualenv:

pip install virtualenv

✓ After installing virtualenv, create the virtual environment with the command below:

virtualenv my_venv (name of the virtual environment)

✓ Now we should activate the virtual environment; to activate, use the following command:

my_venv/scripts/activate

✓ After activating the virtual environment we should install Flask, so we can import the Flask module in Python:

pip install flask
DEPLOYMENT
 Now create an app object that hosts the application:
app = Flask(__name__)
✓ Then you need a route that calls a Python function. A route maps what you type
in the browser (the URL) to a Python function:
@app.route('/')
def index():
✓ The function should return something to the web browser, so use the return
statement.
✓ To run the application, use the code below:
app.run(debug=True)
✓ Now run the Python file in the terminal and you will get the URL.
✓ Enter the URL in your web browser to see the website. The pieces are put together in the sketch below.
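
A minimal runnable version of the application described above (the returned string is a placeholder for the real page):

from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    # Placeholder response; the real app renders an HTML template.
    return "GDP Prediction App"

if __name__ == '__main__':
    app.run(debug=True)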


DEPLOYMENT
✓ It is possible to return the output of a function bound to a certain URL in the form of HTML.

✓ Generating HTML content from Python code is cumbersome.

✓ This is where one can take advantage of the Jinja2 template engine, on which Flask is based. Instead of
returning hardcoded HTML from the function, an HTML file can be rendered by the render_template()
function.

[Flow: app.py (Python/Flask) receives the input data and passes it to the Jinja template (index.html), which renders the page.]
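
A minimal render_template() sketch, assuming a templates/index.html file exists (the template name and the title variable are illustrative assumptions):

from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def index():
    # Renders templates/index.html; 'title' is available in the template as {{ title }}.
    return render_template('index.html', title='GDP Prediction')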
DEPLOYMENT
✓ A web application often requires static files such as a JavaScript file or
a CSS file supporting the display of a web page.
✓ Usually, the web server is configured to serve them for you, but
during development these files are served from the static folder in your
package, or next to your module, and they will be available at /static on the
application.
DEPLOYMENT
 To deploy the project to the CLOUD, we are using the HEROKU platform.
 Heroku is a platform as a service (PaaS) that enables developers to build, run, and operate
applications entirely in the cloud.
 To deploy the project using Heroku we need to follow these steps:
 In the terminal, go to the project location and install gunicorn, which is a Python
WSGI HTTP server: pip install gunicorn
 Now go to your folder and create a file named Procfile without any extension (see the sketch below).
 In the terminal use this command: pip freeze > requirements.txt
 Install Git and the Heroku CLI on your system.
 Now initialize git in the terminal. To initialize, use git init
 To add files, use git add .
 Commit with git commit -m "msg"
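
A typical Procfile for a Flask app served by gunicorn, assuming the Flask object is named app inside app.py (both names are assumptions):

web: gunicorn app:app

The requirements.txt generated by pip freeze tells Heroku which packages to install.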
DEPLOYMENT
 Log in to Heroku: use heroku login

 Now create the application: heroku create AppName

 Add a remote to your local repository:

heroku git:remote -a AppName

 Push the files to Heroku:

git push heroku master

Now the application is deployed to the cloud.


WELCOME PAGE
DEPLOYMENT

Form page for giving the input values


 After the user enters the values, we get the results of the model as
shown below.
FUTURE SCOPE

This application can be implemented as a mobile app and made
available free for businesses to download, which they can use to
predict the GDP of a country where they want to buy shares.
CONCLUSION:
 We made this project to bring business improvements by using
current technology instead of relying on old methods.

 Using machine learning algorithms, we trained the model with the
following 4 algorithms: Linear Regression, Decision Tree,
SVM & Random Forest. We got the highest accuracy of 83% with
the Random Forest model, so we selected it as the best model
for future prediction.
THANKS
FROM

 S.AFIFA
 S.ABEEDA
 S.TAHASEEN
 S.UMME SALMA
