Machine Learning Project 3
BY USING
MACHINE LEARNING
INDEX
ABSTRACTION
INTRODUCTION
GROSS DOMESTIC PRODUCT (GDP)
EXISTING SYSTEM
PROPOSED SYSTEM
ADVANTAGES OF PROPOSED SYSTEM
LOAD LIBRARIES
SYSTEM REQUIREMENTS
ARCHITECTURE
DATA WRANGLING
EDA
MODELLING
PERFORMANCE METRICS
DEPLOYMENT
FUTURE SCOPE
CONCLUSION
ABSTRACTION
This project gathers GDP data from multiple data sources and
applies machine learning algorithms to this data to extract
important information. The resulting model can be used to estimate
the GDP of a country. In this project, linear regression techniques
are used to predict the GDP of a country.
INTRODUCTION
GROSS DOMESTIC PRODUCT (GDP)
GDP stands for Gross Domestic Product and represents the total monetary
value of all final goods and services produced (and sold on the market)
within a period of time (typically 1 year). GDP is the most commonly used
measure of economic activity.
EXISTING SYSTEM
As a measure of economic activity, GDP has the following limitations:
• The failure to account for or represent the degree of income inequality in society.
• The failure to indicate whether the nation's rate of growth is sustainable or not.
• The exclusion of non-market transactions.
PROPOSED SYSTEM
In the proposed system, we build a machine learning model by training
it on a training dataset. This model will help business firms
decide where to invest and buy shares.
A country's data, such as population, literacy and birth rate, is given as input to
the machine learning model, and the output label is GDP.
Data Gathering → Data Wrangling → EDA → Modelling → Performance Metrics → Deployment
MODULES:
The modules provide a large set of commands and features that are used to
easily analyze your data.
Pandas is used for working with data sets. It has functions for analyzing, cleaning, exploring, and
manipulating data.
We get insights about the dataset using pandas functions such as
head(), tail(), info(), describe() and sample().
There are several useful functions for detecting, removing and replacing null values in a pandas
DataFrame, such as isnull(), fillna(), replace() and drop().
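As a minimal sketch of the inspection and null-handling functions listed above, the following uses a small made-up table. The column names and values are illustrative assumptions, not the project's dataset, and dropna() is used here as the row-wise way to remove remaining nulls.

```python
import pandas as pd

# Hypothetical sample data; the real project dataset is not shown here.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "population": [1000.0, 2500.0, None, 4000.0],
    "literacy": [90.0, None, 75.0, 60.0],
})

# Quick look at the data
first_rows = df.head(2)    # first two rows
summary = df.describe()    # count, mean, std, min, quartiles, max

# Detect null values: number of missing entries per column
null_counts = df.isnull().sum()

# Replace nulls in one column with that column's mean
df["population"] = df["population"].fillna(df["population"].mean())

# Remove any rows that still contain nulls
clean = df.dropna()
```

Filling with the column mean is only one option; which replacement is appropriate depends on the feature.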
LOAD LIBRARIES
Seaborn is a library mostly used for statistical plotting in Python. It is built on top of
Matplotlib and provides default styles and color palettes that make statistical
plots more attractive. Seaborn plots are mainly used for
visualizing the relationship between variables. Those variables can be either
completely numerical or categorical, such as a group, class or division.
Different categories of plot in Seaborn:
RELATIONAL PLOTS
CATEGORICAL PLOTS
DISTRIBUTION PLOTS
REGRESSION PLOTS
MATRIX PLOTS
DATA GATHERING
We get insights about the data using the following types of analysis:
✓ UNIVARIATE ANALYSIS
✓ BI-VARIATE ANALYSIS
✓ MULTI-VARIATE ANALYSIS
EXPLORATORY DATA ANALYSIS
We will use Matplotlib and Seaborn library for the data visualization.
Some commonly used graphs are
• Bar Plot
• Hist Plot
• Box Plot
• Scatter Plot
• Pair Plot
• Heat Map
UNIVARIATE NUMERICAL FEATURE ANALYSIS:
Three univariate model classes are considered: ARIMA models (after Box and Jenkins
[1976]), ARIMAX models (a combination of ARIMA terms and additional predictors), and an
“ordinary multiple” approach with lagged terms of the variables.
The statistical study of data where multiple measurements are made on each
Supervised Learning:
Supervised learning is an approach to creating Artificial Intelligence (AI) where a computer
algorithm is trained on input data that has been labelled for a particular output.
Unsupervised Learning:
Unsupervised learning is an approach to creating Artificial Intelligence (AI) where a
computer algorithm is trained on input data that has not been labelled for a particular output.
Methods in Scikit-Learn Package:
fit_transform()
It combines the fit() and transform() methods. It is used on the training
data so that we can scale the training data and learn the scaling parameters in one step.
fit()
This method calculates the parameters μ (mean) and σ (standard deviation) and stores them inside
the scaler object.
transform()
This method uses those stored parameters to transform a particular dataset. It is used for
pre-processing before modelling.
predict()
This method uses the weights learned by a fitted model to make predictions on the test data.
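To make the roles of fit() and transform() concrete, here is a small pure-Python sketch of standardization for a single feature. The class name SimpleScaler and the sample numbers are hypothetical; this only mimics the idea and is not scikit-learn's implementation.

```python
from statistics import fmean, pstdev


class SimpleScaler:
    """Toy one-feature standardizer (hypothetical, for illustration only)."""

    def fit(self, values):
        # Learn the parameters from the data and store them
        self.mean_ = fmean(values)
        self.std_ = pstdev(values)  # population standard deviation
        return self

    def transform(self, values):
        # Apply the stored parameters to any dataset
        return [(v - self.mean_) / self.std_ for v in values]

    def fit_transform(self, values):
        # Convenience: fit, then transform the same data
        return self.fit(values).transform(values)


train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

scaler = SimpleScaler()
train_scaled = scaler.fit_transform(train)  # parameters come from train only
test_scaled = scaler.transform(test)        # reuse them on the test data
```

The test data is transformed with the training mean and standard deviation, so no information from the test set leaks into the scaling step.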
SPLITTING
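Syntax:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# X: feature table, y: GDP target (the 80/20 split is an assumed ratio)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
my_scalar = StandardScaler()
# Learn the mean and standard deviation from the training data only
X_train_scaled = my_scalar.fit_transform(X_train)
# Reuse the training parameters on the test data
X_test_scaled = my_scalar.transform(X_test)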
Linear Regression:
Linear Regression is a machine learning algorithm based on supervised learning. It
performs a regression task. Regression models a target prediction value based on
independent variables.
Linear Regression Formula:
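A common way to write the model, using conventional symbols that are assumed here rather than taken from the project, is:

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
```

Here \hat{y} is the predicted GDP, x_1 to x_p are the input features (for example population or literacy), \beta_0 is the intercept, and \beta_1 to \beta_p are the coefficients learned during training.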
Syntax:
from sklearn.linear_model import LinearRegression
model_linear = LinearRegression()
# Train on the scaled training data
model_linear.fit(X_train_scaled, y_train)
# score() returns the R2 score on the test data
linear = model_linear.score(X_test_scaled, y_test)
SUPPORT VECTOR MACHINE (SVM):
In machine learning, support vector machines (SVMs, also support vector
networks) are supervised learning models with associated learning
algorithms that analyze data for classification and regression analysis.
Syntax:
from sklearn import svm
model_svm = svm.SVR()
model_svm.fit(X_train_scaled, y_train)
# Use a separate name so the imported svm module is not overwritten
svm_score = model_svm.score(X_test_scaled, y_test)
Decision Tree Regression
SYNTAX:
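The syntax for this model follows the same pattern as the other regressors. The sketch below assumes scikit-learn's DecisionTreeRegressor and uses tiny made-up arrays in place of the project's scaled training and test sets; the variable names mirror the other snippets and model_tree is a hypothetical name.

```python
from sklearn.tree import DecisionTreeRegressor

# Made-up stand-ins for the project's scaled features and GDP targets
X_train_scaled = [[0.0], [1.0], [2.0], [3.0]]
y_train = [10.0, 20.0, 30.0, 40.0]
X_test_scaled = [[1.0], [2.0]]
y_test = [20.0, 30.0]

model_tree = DecisionTreeRegressor(random_state=0)
model_tree.fit(X_train_scaled, y_train)

# score() returns the R2 score on the given data
tree_score = model_tree.score(X_test_scaled, y_test)
y_predict = model_tree.predict(X_test_scaled)
```

With default settings the tree grows until every training point sits in its own leaf, so a perfect score on points it has already seen says little about how it will do on new data.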
Three metrics are used here for evaluating and reporting the performance of a
regression model: mean squared error (MSE), mean absolute error (MAE) and the R2 score.
• They tell you whether you are making progress, and put a number on it.
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_predict)
Steps to find the MSE:
1. Find the equation of the regression line.
2. Insert the X values into the equation found in step 1 to get the predicted Y values (Ŷ).
3. Subtract the predicted Y values (Ŷ) from the original Y values. The resulting values are the error terms, also known as the vertical distance of each point from the regression line.
4. Square each error term.
5. Add up the squared errors and divide by the number of observations.
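The MSE calculation can be sketched in plain Python. The data points and the regression line y = 2x + 1 are assumed values chosen only for illustration.

```python
# Assumed example data and an assumed regression line: y_hat = 2x + 1
x_values = [1.0, 2.0, 3.0, 4.0]
y_actual = [3.0, 6.0, 7.0, 8.0]


def regression_line(x):
    # The fitted line (slope and intercept are assumed here)
    return 2.0 * x + 1.0


# Predicted Y values from the line
y_hat = [regression_line(x) for x in x_values]

# Error terms: vertical distances of the points from the line
errors = [y - yh for y, yh in zip(y_actual, y_hat)]

# Square the errors and average them
mse = sum(e ** 2 for e in errors) / len(errors)
```

Because the errors are squared, a few large misses raise the MSE much more than many small ones.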
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_predict)
Formula:
Mean Absolute Error = (1/n) * ∑|yi – xi|
where,
•Σ: Greek symbol for summation
•yi: Actual value for the ith observation
•xi: Calculated value for the ith observation
•n: Total number of observations
Method 1: Using Actual Formulae
Mean Absolute Error (MAE) is calculated by taking the summation of the absolute difference
between the actual and calculated values of each observation over the entire array and then
dividing the sum obtained by the number of observations in the array.
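A minimal sketch of this calculation in plain Python, with made-up actual and calculated values:

```python
# Made-up values for illustration
actual = [2.0, 3.0, 5.0, 5.0, 9.0]
calculated = [3.0, 3.0, 8.0, 7.0, 6.0]

n = len(actual)

# Sum of absolute differences between actual and calculated values
total_abs_error = sum(abs(y - x) for y, x in zip(actual, calculated))

# Divide by the number of observations
mae = total_abs_error / n
```

Unlike the MSE, the MAE is in the same units as the target, so it can be read directly as the typical size of an error.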
Method 2: Using sklearn
The sklearn.metrics module of Python contains functions for calculating errors for different purposes. It
provides a function named mean_absolute_error() to calculate the mean absolute error of the given
arrays.
Syntax:
mean_absolute_error(actual, calculated)
R2 score:
The coefficient of determination, also called the R2 score, is used to evaluate the performance of a
regression model. It is the proportion of the variation in the dependent (output) variable that is predictable
from the independent (input) variable(s). It shows how well the observed results are reproduced by
the model, based on the share of the total variation in the results that the model explains.
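Mathematical Formula:
R2 = 1 - SSres / SStot
where,
• SSres: Sum of squared differences between the actual and predicted values (residual sum of squares)
• SStot: Sum of squared differences between the actual values and their mean (total sum of squares)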
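As a sketch of the formula, the following computes the R2 score by hand, taking SSres as the sum of squared differences between actual and predicted values and SStot as the sum of squared differences between the actual values and their mean. The numbers are made up for illustration.

```python
# Made-up actual and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

mean_true = sum(y_true) / len(y_true)

# SSres: squared differences between actual and predicted values
ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))

# SStot: squared differences between actual values and their mean
ss_tot = sum((yt - mean_true) ** 2 for yt in y_true)

r2 = 1 - ss_res / ss_tot
```

An R2 of 1 means the predictions match the actual values exactly, while 0 means the model does no better than always predicting the mean.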
WHAT IS CSS ?
✓ CSS stands for Cascading Style Sheets.
✓ CSS describes how HTML elements are to be displayed on screen, paper, or in
other media.
✓ CSS saves a lot of work. It can control the layout of multiple web pages all at once.
✓ External style sheets are stored in CSS files.
DEPLOYMENT
BOOTSTRAP:
Bootstrap is a free and open-source tool collection for creating responsive
websites and web applications. It is the most popular HTML, CSS, and
JavaScript framework for developing responsive, mobile-first websites.
Python projects live in virtual environments, so we need to set up a virtual environment.
✓ To set up the project, first create the project directory, then in the terminal go to that directory and
install a virtual environment.
✓ Now activate the virtual environment with the following command:
my_venv/scripts/activate
✓ After activating the virtual environment, install Flask so that the Flask module can be imported in Python.
✓ This is where one can take advantage of the Jinja2 template engine, on which Flask is based. Instead of
returning hard-coded HTML from the function, an HTML file can be rendered by the render_template()
function.
[Figure: data flow between App.py (Python, Flask), the Jinja template Index.html, and the rendered page]
✓ A web application often requires static files, such as a JavaScript file or
a CSS file, to support the display of a web page.
✓ Usually the web server is configured to serve them for you, but
during development these files are served from the static folder in your
package or next to your module, and they are available at /static on the
application.
To deploy the project in the cloud, we use the Heroku platform.
Heroku is a platform as a service (PaaS) that enables developers to build, run, and operate
applications entirely in the cloud.
To deploy the project using Heroku, follow these steps:
✓ In the terminal, go to the project location and install gunicorn, which is a Python WSGI HTTP server.
✓ In the project folder, create a file named Procfile with no extension.
✓ In the terminal, run pip freeze > requirements.txt
✓ Install Git and the Heroku CLI on your system.
✓ Initialize Git in the terminal: git init
✓ Add the files: git add .
✓ Commit the files: git commit -m "msg"
✓ Log in to Heroku: heroku login
S.AFIFA S.ABEEDA
S.TAHASEEN S.UMMESALMA