6 Introduction To Machine Learning With Python

Uploaded by

mahesh Kumar

How To Make The Best Use Of Live Sessions

• Please log in 10 minutes before the class starts and check your internet connection to avoid any network issues during the LIVE session

• All participants will be on mute by default to avoid background noise. However, the instructor will unmute you if required. Please use the “Questions” tab on your webinar tool to interact with the instructor at any point during the class

• Feel free to ask and answer questions to make your learning interactive. The instructor will address your queries at the end of the ongoing topic

• Raise a ticket through your LMS in case of any queries. Our dedicated support team is available 24 x 7 for your assistance

• Your feedback is very much appreciated. Please share feedback after each class, which will help us enhance your learning experience

Copyright © edureka and/or its affiliates. All rights reserved.


Course Outline

▪ Introduction to Python
▪ Sequences and File Operations
▪ Deep Dive - Functions, OOPS, Modules, Errors and Exceptions
▪ Introduction to Numpy, Pandas and Matplotlib
▪ Data Manipulation
▪ Introduction to Machine Learning with Python
▪ Supervised Learning - I
▪ Dimensionality Reduction
▪ Supervised Learning - II
▪ Unsupervised Learning
▪ Association Rules Mining and Recommendation Systems
▪ Reinforcement Learning
▪ Time Series Analysis
▪ Model Selection and Boosting


Introduction to Machine Learning with Python
Topics
The topics covered in this module:

▪ Machine Learning

▪ Applications of Machine Learning

▪ Supervised Learning

▪ Process Flow of Supervised Learning

▪ Linear Regression



Objectives
After completing this module, you should be able to:

▪ Define Machine Learning

▪ Understand the steps of Machine Learning

▪ Understand Machine Learning Use Cases

▪ List the Applications of Machine Learning

▪ Illustrate Supervised Learning Algorithms

▪ State the Scope of Machine Learning

▪ List the Categories of Machine Learning

▪ Define Linear Regression



Let’s Look at the Scenario from Edgeways: A Software Company


Scenario of Edgeways
A professional working at Edgeways, a software company, checks the inbox for new mails or updates and becomes frustrated after seeing unnecessary mails such as these:

Hey Dear,
Congratulations!!!!
I am pleased to inform you that you have won an amount of 100K in a lottery ticket. You can send your “account details” by replying to same mail.
Best Regards
Friend

Hey Dear,
Surprises are waiting at your door!!!!
Reply to the same mail and see the magic.
Best Regards
Anonymous

... and many more


Edgeways: Challenge Faced

Employee (to IT Support & Helpdesk): I have been receiving a lot of unnecessary and disturbing mails for a long time (shows the mails)

IT Support & Helpdesk: Sir, these are “Spam Mails”, which can even corrupt your system. We will try to find a solution for it at the earliest


Now, Let’s See How IT Support Will Solve the Problem of the Spam Mails


Probable Solution
Spam mails can be detected by manually setting a filter on some words

For example:
▪ If the mail contains a keyword such as ‘lottery’, then mark the mail as spam
▪ If the mail contains any virus-prone link, then mark it as spam
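
The manual filter described above can be sketched in a few lines of Python. This is only an illustration: the keyword ‘lottery’ comes from the slide, while the function name `is_spam` and the link markers are assumptions made up for this example.

```python
import re

# "lottery" is the keyword from the slide; the link markers are hypothetical.
SPAM_KEYWORDS = {"lottery"}
SUSPICIOUS_LINK_MARKERS = ("http://", ".exe")


def is_spam(mail_text: str) -> bool:
    """Return True if the mail matches any hand-written rule."""
    text = mail_text.lower()
    words = set(re.findall(r"[a-z]+", text))
    # Rule 1: the mail contains a blocked keyword.
    if words & SPAM_KEYWORDS:
        return True
    # Rule 2: the mail contains something that looks like a risky link.
    return any(marker in text for marker in SUSPICIOUS_LINK_MARKERS)


mails = [
    "You have won an amount of 100K in a lottery ticket",
    "Team meeting moved to 3 pm",
]
for mail in mails:
    print(is_spam(mail), "-", mail)
```

Every new trick used by spammers needs a new hand-written rule, which is the limitation the rest of this module addresses.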



Let’s Look at Another Scenario from E-Cart, An E-Commerce Company


Scenario of E-Cart

There is a decline in the sales of “E-Cart”, an e-commerce company

“How can I improve the sales of my company?”


E-Cart: Challenge Faced

In a leadership meeting, they noticed that they have a lot of unused data which can be used to boost company sales


E-Cart: Customer Data Sources



Now, Let’s See How Unused Data Can be Used to Market the Correct Product to the Correct Audience


Probable Solution

Customer data used:
▪ Purchase History
▪ Search History
▪ Products in Cart
▪ Address

Result: Recommended Products


Who Can Solve the Problem?

“How many times do I need to set a filter manually to get rid of these spam mails?”

“How can I do my shopping more efficiently, so that I can easily buy the products which I really want?”

“I can assist you in these tasks”


Now, Let’s Discuss What is Machine Learning?



What is Machine Learning?
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed

▪ Training Data is used to build the model
▪ Testing Data is used to measure the Accuracy of the model
▪ For a New Input, the model gives a Predicted Output


Traditional Programming Vs Machine Learning
▪ Traditional Programming: Data + Program -> Output
▪ Machine Learning: Data + Output -> Model


Features of Machine Learning
01 It uses the data to detect patterns in a dataset and adjusts program actions accordingly

02 It focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data

03 It enables computers to find hidden insights using iterative algorithms without being explicitly programmed

04 Machine learning is a method of data analysis that automates analytical model building

“I wonder, is Machine Learning useful in industry?”
“Let’s have a look at trends”

Market Trend: Machine Learning


Now, Let’s See How Edgeways Software Company Uses “Machine Learning” to Solve Their Problem


Solution Using Machine Learning
Labelled Data

The machine is programmed to learn from labelled data to create rules, based on which we are able to classify the mails
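
As a rough sketch of learning from labelled mails, the snippet below trains a Naïve Bayes text classifier with scikit-learn. The six training mails and their labels are invented for illustration, and Naïve Bayes is one possible choice of algorithm here, not necessarily the one Edgeways would use.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled data: each mail has a known class.
mails = [
    "congratulations you have won a lottery",
    "surprises are waiting reply to the same mail",
    "claim your lottery prize now",
    "meeting agenda for monday",
    "please review the project report",
    "lunch at noon tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Turn each mail into word counts, then learn from the labelled examples.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(mails)
classifier = MultinomialNB()
classifier.fit(X, labels)

# Classify mails the model has not seen before.
new_mails = ["you won the lottery", "project meeting on monday"]
predictions = classifier.predict(vectorizer.transform(new_mails))
print(list(predictions))
```

No keyword list is written by hand; the word statistics that separate the two classes are estimated from the labelled examples.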



Let’s See How “Machine Learning” Solves the E-Cart Company Problem


Solution Using Machine Learning
The machine learns from the customer data to cluster similar data together, and based on these clusters it recommends products to the customers

Customer Data -> Best Recommendation:
▪ Iphone 6
▪ Iphone 7

Other Recommendations:
▪ Mobile Cover
▪ Tempered Glass
▪ Iphone Charger


Phases of Machine Learning

Phase 1, Training: Training Data + Learning Algorithms -> Model

Phase 2, Testing: Test Data -> Model -> Accuracy

Now, Let’s Have a Look at the Steps of Machine Learning


Steps of Machine Learning
Step 1: Collecting Data
Step 2: Data Wrangling
Step 3: Analyse Data
Step 4: Train Algorithm
Step 5: Test Algorithm
Step 6: Deployment


Collecting Data
▪ This stage involves the collection of all relevant data from various sources
▪ Data collected from the various sources is stored on a server


Data Wrangling
▪ Data Wrangling is the process of cleaning and converting “Raw Data” into a format that allows convenient consumption
▪ Data acquired from sources -> Data filtering -> Clean Data


Analyze Data
▪ Data is analysed to select and filter the data required to prepare the model
▪ Feature selection -> Model


Train Algorithm
▪ The algorithm is trained on the training dataset, through which the algorithm learns the patterns and the rules which govern the data
▪ Training Dataset -> fed for Training -> Model

Test Algorithm
▪ The testing dataset determines the accuracy of our model
▪ Testing Dataset -> fed to Model for Testing -> Predicted Output -> Accuracy

Deployment
▪ If the speed and accuracy of the model are acceptable, then that model should be deployed in the real system
▪ After the model is deployed, it is updated and improved based upon its performance; if there is a dip in performance, the model is retrained
▪ The model that is used in production should be made with all the available data
▪ Models improve with the amount of available data used to create the model
▪ The results of the model need to be incorporated in the business strategy

Let’s Move Forward and Understand Different Types of Machine Learning through Various Use-Cases


Use Case – 1
Group Similar Fruits Together



Use Case - 1
▪ Suppose you have a basket filled with different kinds of fruits. Your task is to group similar types of fruits together

▪ Let’s say we have three kinds of fruits

Banana
– Big
– Green or Yellow
– Long curved
– Cylindrical shape

Apple
– Big
– Red
– Rounded shape
– Depression at the top

Grapes
– Small
– Green
– Round to oval
– Bunch shape
– Cylindrical


Use Case - 1
▪ Suppose we have taken a new fruit from the basket. To decide which fruit it is, we look at the fruit’s colour, size and shape
▪ If the fruit is yellow and curved-cylindrical in shape, you can confirm that it’s a banana; the same applies to the other fruits
▪ In the same way, if you learn from the train data and apply that learning to new data (the test data), this kind of learning is called Supervised Learning

“It’s red and round, it must be an Apple.”


Use Case - 1

Training: Raw Data -> Feature Extraction -> Train the Model (with Labels) -> Model -> Evaluate

Prediction: New Data -> Feature Extraction -> Model -> Predict -> Labels


Use Case - 1
▪ We now want to train a machine to do this task
▪ For this purpose, we have to give the machine the same experience/knowledge using the data

Machine Learning from the data


Use Case - 1 Characteristics
▪ The problem has the following characteristics:

1. Labelled learning data and output are available

2. We have historical data, using which the machine can find the relationship between the input and the output

3. Output classes are predefined, i.e. Apple, Grape or Banana


Use Case - 1
▪ Once the machine learns from the data, it can use that knowledge to predict the output for a new input

Machine Learning predicting the output based upon a particular input


Classifying Upon New Data Input

Input Data -> Model -> Output

Based upon the model created from the training data, the machine is now able to classify new inputs into predefined classes, which in this case are Apple, Banana or Grapes.

A machine that learns from labelled data to classify new inputs into predefined classes is an example of Supervised Learning.
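
The fruit example can be sketched with a small decision tree in scikit-learn. The numeric encoding of the features (`is_big`, `is_round`, `colour_code`) and the six training rows are assumptions invented for this illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical encoding: [is_big, is_round, colour_code]
# colour_code: 0 = red, 1 = yellow, 2 = green
X_train = [
    [1, 1, 0],  # big, round, red
    [1, 1, 0],
    [1, 0, 1],  # big, not round, yellow
    [1, 0, 2],  # big, not round, green
    [0, 1, 2],  # small, round, green
    [0, 1, 2],
]
y_train = ["Apple", "Apple", "Banana", "Banana", "Grapes", "Grapes"]

# Learn the mapping from features (input) to fruit name (output).
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# A new fruit that is big, round and red.
new_fruit = [[1, 1, 0]]
predicted = model.predict(new_fruit)[0]
print(predicted)
```

The labels in `y_train` play the role of the teacher: the model can only ever answer with one of the classes it was shown during training.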


Now let us understand Supervised Learning in detail.



Supervised Learning
▪ Supervised Learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output

Types of Machine Learning:
1 Supervised Learning
2 Unsupervised Learning
3 Reinforcement Learning


Use Case – 2
Group Unseen Fruits Together



Use Case - 2
“I am seeing them for the first time. So, how will I arrange them?”


Use Case – 2 Solution
What will you do first?

You will take each fruit and arrange the fruits by considering a physical characteristic of that particular fruit

Suppose you have considered colour

▪ Then you will arrange them using colour as the base condition
▪ Then the groups will be something like this
– Red colour group: apples
– Green colour group: grapes
– Yellow colour group: bananas


Use Case – 2 Solution
Now you will consider another physical characteristic along with colour, such as size
– Red colour and big size: apples
– Yellow colour and big size: bananas
– Green colour and small size: grapes

The task has been done

Here you did not learn anything beforehand, which means there is no train data and no target variable

In Machine Learning, this kind of learning is known as Unsupervised Learning


Use Case – 2 Characteristics
▪ The problem has the following characteristics:

1. Unlabelled learning data is available

2. Output is dynamic with respect to the input values; upon input of new values, the output might change

3. No predefined output classes. The data can only be grouped into clusters by the machine at runtime, based on its characteristics
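
A minimal clustering sketch with scikit-learn's `KMeans` shows the same idea in code. The two features (size in cm and a colour hue value) and all measurements are made up for illustration; no fruit names are given to the algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical measurements: [size_cm, colour_hue]; no labels are provided.
fruits = np.array([
    [8.0, 0.0], [7.5, 5.0], [8.2, 2.0],        # big and red
    [18.0, 60.0], [19.0, 58.0], [17.5, 62.0],  # big and yellow
    [2.0, 120.0], [1.8, 118.0], [2.2, 122.0],  # small and green
])

# Ask for three groups; the algorithm decides which fruit goes where.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(fruits)
print(cluster_ids)
```

The cluster numbers themselves are arbitrary: the algorithm only says which fruits belong together, not what each group is called.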


Unsupervised Learning
▪ Unsupervised Learning is the training of a model using information that is neither classified nor labelled
▪ This model can be used to cluster the input data in classes on the basis of their statistical properties

Example: For a basket full of vegetables, we can cluster different vegetables based upon their colour or size (Cluster - 1, Cluster - 2)


Use-Case 3:
Training a Self-Drive Car



Use-Case 3
▪ Suppose you want to train a self-driving car to follow your instructions on the road. In this case the problem statement is not defined in advance, and train or test data is not available

▪ We want the machine to learn from the events and the results of its actions

Instruction: “Stay on the same lane”
– Instruction followed: Reward (“Good Job!”)
– Instruction not followed: Penalty (“Bad Job!”)

This kind of learning is called Reinforcement Learning.


Reinforcement Learning
▪ Reinforcement Learning (RL) is learning by interacting with a space or an environment
▪ It selects its actions on the basis of its past experiences (exploitation) and also by new choices (exploration)
▪ An RL agent learns from the consequences of its actions, rather than from being taught explicitly
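
The reward and penalty loop can be sketched as a tiny epsilon-greedy agent. This is a deliberately simplified toy, not a real driving simulator: the two actions, the reward values of +1 and -1, and the `train` function are all assumptions made for this example.

```python
import random

ACTIONS = ["stay_in_lane", "leave_lane"]


def reward_for(action: str) -> float:
    """Hypothetical environment: reward for staying in lane, penalty otherwise."""
    return 1.0 if action == "stay_in_lane" else -1.0


def train(steps: int = 200, epsilon: float = 0.1, seed: int = 0) -> dict:
    rng = random.Random(seed)
    value = {action: 0.0 for action in ACTIONS}  # estimated reward per action
    count = {action: 0 for action in ACTIONS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # exploration: try a new choice
        else:
            action = max(ACTIONS, key=lambda a: value[a])  # exploitation
        reward = reward_for(action)
        count[action] += 1
        # Running average of the rewards seen so far for this action.
        value[action] += (reward - value[action]) / count[action]
    return value


learned_values = train()
best_action = max(learned_values, key=learned_values.get)
print(learned_values, best_action)
```

Nobody tells the agent which action is correct; it only sees the reward or penalty that follows each action and adjusts its estimates accordingly.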



Applications of Machine Learning



Machine Learning Applications

Siri

▪ Apple claims that the software adapts to the user’s individual preferences over time and personalizes results

Marketing and Sales

▪ The ability to capture data, analyse it and use it to personalize a shopping experience (or implement a marketing campaign) is the future of retail


Machine Learning Applications

HealthCare

▪ Machine Learning uses data from wearable devices and sensors to assess a patient’s health in real time

Financial Services

▪ The financial industry uses Machine Learning technology to identify insights in data and prevent fraud


Machine Learning Applications

Biometrics

▪ Biometrics use machine learning to identify individual credentials and prevent banking fraud

Fingerprint Optical Scanner

▪ In most cases no image of the fingerprint is actually created, only a set of data that can be used for comparison


Let’s Learn More About Supervised Learning



Supervised Learning
Supervised learning is where you have input variables (X) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output

Function: Y = F(X)

It is called Supervised Learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process


Let’s Look at the Process Flow of Supervised Learning


Process Flow: Supervised Learning

Training and Testing
▪ Historical Data is divided by Random Sampling into a Training Dataset (70%) and a Test Dataset (30%)
▪ Machine Learning on the Training Dataset produces a Statistical Model
▪ Prediction and Testing: the model predicts on the Test Dataset, which gives the Model Validation Outcome

Prediction
▪ New Data -> Model -> Predicted Outcome
▪ The model is used for predicting the outcome of a new dataset. Whenever the performance of the model degrades, the model is retrained
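
The flow above can be sketched with scikit-learn. The historical data here is synthetic (generated from an assumed relationship y = 4x + 1 plus noise), so the numbers only illustrate the 70/30 split, validation and prediction steps.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic "historical data": 100 rows, one input column.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 4.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=100)

# Random sampling into 70% training data and 30% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Training: build the statistical model from the training dataset.
model = LinearRegression().fit(X_train, y_train)

# Testing: validate the model on data it has not seen.
validation_r2 = model.score(X_test, y_test)

# Prediction: apply the validated model to new data.
new_data = np.array([[2.5], [7.0]])
predicted_outcome = model.predict(new_data)
print(len(X_train), len(X_test), validation_r2, predicted_outcome)
```

If the validation score later drops on fresh data, the same script can be rerun on the updated historical data to retrain the model.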



Supervised Learning Algorithms

▪ Linear Regression: Used to estimate real values (cost of houses, number of calls, total sales etc.) based on continuous variable(s)

▪ Logistic Regression: Used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s)

▪ Decision Tree: Used for classification problems. It works for both categorical and continuous dependent variables

▪ Random Forest: An ensemble of decision trees. It usually gives better prediction accuracy than a single decision tree

▪ Naïve Bayes Classifier: A classification technique based on Bayes’ theorem with an assumption of independence between predictors


Now, Let’s Discuss What is Linear Regression?



What is Linear Regression?
▪ Linear Regression Analysis is a powerful technique used for predicting the unknown value of a variable (Dependent Variable) from the known values of one or more other variables (Independent Variables)

▪ A Dependent Variable (DV) is the variable to be predicted or explained in a regression model

▪ An Independent Variable (IDV) is the variable related to the dependent variable in a regression equation

For example: (figure labels: Dependent Variable, Independent Variable)


Simple Linear Regression

Y = a + bX

where Y is the dependent variable, X is the independent variable, a is the Y-intercept and b is the slope of the line

▪ Y-intercept (a) is the value of the Dependent Variable (Y) when the value of the Independent Variable (X) is zero. It is the point at which the line cuts the y-axis.

▪ Slope (b) is the change in the Dependent Variable for a unit increase in the Independent Variable. It is the tangent of the angle made by the line with the x-axis.
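
To make the roles of a and b concrete, here is a small NumPy sketch. The five data points are invented so that they lie exactly on the line Y = 1 + 2X, and the slope and intercept are computed with the standard least-squares formulas.

```python
import numpy as np

# Hypothetical data lying exactly on Y = 1 + 2X.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Slope b: how much Y changes for a unit increase in X.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept a: the value of Y when X is zero.
a = y.mean() - b * x.mean()


def predict(x_new: float) -> float:
    """Evaluate the fitted line Y = a + bX."""
    return a + b * x_new


print(a, b, predict(0.0), predict(4.0) - predict(3.0))
```

Evaluating the line at X = 0 returns the intercept, and the difference between predictions one unit apart returns the slope, matching the two definitions above.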



The Regression Line
The regression line is the single line that best fits the data, in terms of having the smallest overall distance from the line to the points

The “best-fitting line” is found using the “least squares method”, which minimizes the sum of the squared vertical deviations of the points from the line.

(Figure: Fitted Points and Regression Line. The red lines show the deviations from the regression line.)
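
For reference, the least squares method can be written out using the slide's notation Y = a + bX. The index i, the number of points n and the bar notation for means are introduced here for the formula and do not appear on the slides. The method chooses a and b to minimize the sum of squared deviations S(a, b):

```latex
\[
S(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
\]
\[
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
\]
```

Here x̄ and ȳ are the means of the observed x and y values. The second line is the standard closed-form solution for a single independent variable.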



Let’s Understand the Concept of Linear Regression by Taking a Simple Scenario


Linear Regression Use Case – A Real Estate Company

A real estate company, “Prime Homes”, has a new project coming up in which they have built homes at different locations in Boston.

They have a rough idea about prices, but the actual price is not decided yet. They want prices such that the houses can be easily afforded by common people.


“I have provided you with the Boston Dataset. Analyse the data and predict the approximate prices for the houses”

“Show me the dataset”

Boston Dataset

Here is the Boston data which “Prime Homes” will use to predict the price:


Boston Dataset Description
In order to train the model, we will use the Boston dataset. The dataset has the following columns:

• CRIM - per capita crime rate by town


• ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
• INDUS - proportion of non-retail business acres per town.
• CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
• NOX - nitric oxides concentration (parts per 10 million)
• RM - average number of rooms per dwelling
• AGE - proportion of owner-occupied units built prior to 1940
• DIS - weighted distances to five Boston employment centres
• RAD - index of accessibility to radial highways
• TAX - full-value property-tax rate per $10,000
• PTRATIO - pupil-teacher ratio by town
• B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
• LSTAT - % lower status of the population
• MEDV - Median value of owner-occupied homes in $1000's



“But, where should I start predicting the price for our houses???”

“Don’t worry. Let’s start by finding the relation between the different variables provided to you in the dataset”

“First, let me introduce you to the libraries used here in Python”

Some Libraries Used in Python

▪ Scikit-Learn: Simple and efficient tools for data mining and data analysis; built on NumPy, SciPy and matplotlib; open source

▪ Seaborn: Focused on the visualization of statistical models, including heat maps and plots that depict overall distributions

▪ Pandas: Perfect tool for data wrangling, designed for quick and easy data manipulation, aggregation, and visualization

▪ Matplotlib: Enables you to make bar charts, scatter plots, line charts, histograms, pie charts, contour plots and quiver plots

▪ Numpy: Stands for Numerical Python; provides an abundance of useful features for operations on n-dimensional arrays and matrices in Python


Relation Between Dependent and Independent Variable
In order to know how these variables are related to each other, we plot them against each other

(Figure: Scatter Plot of Price against Crime Rate)

Here, we can see that with an increase in crime, the price falls. Thus, Price is the dependent variable and Crime Rate is an independent variable


Now, Let’s write the code to create this Plot



Collecting Data
Below is the code to import the data

# Import libraries necessary for this project
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter/IPython magic to show plots inline; omit it in a plain Python script
%matplotlib inline

# Load the Boston housing data from a CSV file into a DataFrame
bos1 = pd.read_csv('BostonHousing.csv')
print(bos1)


Data Wrangling
Defining Dependent and Independent Variables

# Independent variables: the first 13 columns
x = bos1.iloc[:,0:13]
# Dependent variable: median home value
y = bos1["medv"]


Relation Between Variables: Correlation
▪ “Correlation” is an important measure for checking dependencies when there are multiple variables
▪ It gives us an insight into the mutual relationship among variables
▪ For creating a correlation plot, the “Seaborn” library is used

Important Terms

• heatmap: graphical representation of data where the individual values contained in a matrix are represented as colours
• cmap: mapping from data values to colour space
• annot: bool or rectangular dataset; if True, the data value is written in each cell
• plt.tight_layout: adjusts subplot parameters so that subplots fit nicely in the figure


Analyze Data
Creating correlation plot to check relationship between variables

# Code to plot correlation
# Library used to draw the correlation heatmap
import seaborn as sns

# Creating a correlation matrix
correlations = bos1.corr()
sns.heatmap(correlations, square=True, cmap="YlGnBu")
plt.yticks(rotation=0)
plt.xticks(rotation=90)
plt.show()
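
The numbers behind such a heatmap can also be read directly from the correlation matrix. The sketch below uses a tiny made-up table with three of the Boston column names (crim, rm, medv); the values are invented for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical values, chosen so that price falls as crime rises.
toy = pd.DataFrame({
    "crim": [0.1, 0.5, 1.0, 2.0, 4.0, 8.0],
    "rm": [7.2, 6.8, 6.5, 6.1, 5.9, 5.2],
    "medv": [34.0, 30.0, 27.0, 24.0, 20.0, 12.0],
})

# Pairwise Pearson correlation between all columns.
correlations = toy.corr()

# Correlation of each feature with the target, most negative first.
medv_correlations = correlations["medv"].drop("medv").sort_values()
print(medv_correlations)
```

A value close to -1 or +1 indicates a strong linear relationship with the price; a value near 0 indicates a weak one.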


Linear Regression Model: Estimators
Important methods and attributes used while fitting a linear regression model (lm) are:

Method / Attribute    Description
lm.fit()              Fits a linear model
lm.predict()          Predicts Y using the linear model with estimated coefficients
lm.score()            Returns the coefficient of determination (R^2)
lm.coef_              Estimated coefficients
lm.intercept_         Estimated intercept
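
A short sketch of these calls on noise-free synthetic data, generated from an assumed line y = 3x + 2 so that the expected coefficient and intercept are known in advance:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data lying exactly on y = 3x + 2.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 3.0 * X[:, 0] + 2.0

lm = LinearRegression()
lm.fit(X, y)                                  # fit the linear model
predictions = lm.predict(np.array([[5.0]]))   # predict Y for a new X
r_squared = lm.score(X, y)                    # coefficient of determination

print(lm.coef_, lm.intercept_, predictions, r_squared)
```

Note that `coef_` and `intercept_` are attributes that exist only after `fit()` has been called, while `fit()`, `predict()` and `score()` are methods.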



Analyze Data
Training and Testing partitions are used to provide:
▪ Honest assessments of the performance of our predictive models
▪ Least amount of mathematical reasoning and manipulation of results

Scikit-learn provides a function called train_test_split to split the data into training and testing sets

from sklearn.model_selection import train_test_split

# testing data size is 33% of the entire data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=5)


“Before moving ahead and building a model, let me introduce you to ‘Model Fitting’”

What is Model Fitting?
Fitting a model means that you're making your algorithm learn the relationship between predictors and outcome so that you can predict the future values of the outcome

So the best fitted model has a specific set of parameters which best defines the problem at hand

Since this is a linear model with equation y = mx + c, the parameters that the model learns from the data are m and c

(Figure: Forest Land Forecast plotted against Human Population)

Types of Model Fitting: Underfitting And Overfitting
▪ Machine Learning algorithms first attempt to solve the problem of under-fitting; that is, they take a line that does not approximate the data well and make it approximate the data better

▪ The machine doesn’t know where to stop, so in trying to solve the problem it can go beyond an Appropriate fit to an Over Fit model. When we say a model overfits a dataset, we mean that it may have a low error rate on the training data but may not generalize well to the overall population of data we’re interested in
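
One way to see the difference is to fit polynomials of increasing degree to the same noisy points. In this sketch the data is synthetic (a sine curve plus noise, an assumption for illustration), and the degrees 1, 3 and 9 stand in for an underfit, a reasonable fit and a likely overfit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training and test points from the same noisy curve.
x_train = np.linspace(-1, 1, 20)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.1, size=20)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.1, size=50)


def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(degree, results[degree])
```

The training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The test error is the number to watch: when it stops improving or rises while the training error keeps falling, the model has started to overfit.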



Train Algorithm
Let’s build the model on the train data

from sklearn.linear_model import LinearRegression

# fitting our model to the training data
lm = LinearRegression()
model = lm.fit(x_train, y_train)
Testing the Algorithm



Test Algorithm

# Predict prices for the test data and plot them against the actual prices
pred_y = lm.predict(x_test)
plt.scatter(y_test, pred_y)
plt.xlabel('Y Test')
plt.ylabel('Predicted Y')
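
Besides the scatter plot, the test step is usually summarised with numeric metrics. The sketch below computes the mean squared error and R^2 with scikit-learn; because the Boston CSV is not included here, it uses synthetic data with two made-up features, so the printed values are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data: two features and a noisy target.
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(150, 2))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 5.0 + rng.normal(0, 1.0, size=150)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=5
)

lm = LinearRegression().fit(x_train, y_train)
pred_y = lm.predict(x_test)

# Average squared difference between actual and predicted values.
mse = mean_squared_error(y_test, pred_y)
# Share of the variance in y_test explained by the model.
r2 = r2_score(y_test, pred_y)
print(mse, r2)
```

`r2_score(y_test, pred_y)` gives the same value as `lm.score(x_test, y_test)`; a lower MSE and an R^2 closer to 1 indicate predictions closer to the actual values.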

Deployment
Download the complete code for Linear regression on the Boston Dataset from the LMS

Summary
▪ Definition of Machine Learning

▪ Types of Machine Learning

▪ Real World Applications of Machine Learning

▪ Supervised Learning

▪ Linear Regression Model & its implementation
