Lesson 5 - Supervised Learning: Classification
Learning Objectives
Decision Tree
Random Forest
Naïve Bayes
Kernel SVM
Classification: Example

[Figure: News articles (e.g., an Acme article, a food article, a bar article) are sorted by a classifier into categories such as Technology, Food, Sports, and Entertainment.]
Classification

[Figure: A typical classifier model workflow with input training data and output labels: training algorithms learn a classifier from the training data, which is then tested on unseen data such as the record (Jeff, Professor, 4).]
Classification: A Supervised Learning Algorithm

[Figure: (a) Training: input passes through a feature extractor; the features and their labels feed a machine learning algorithm that produces a classifier model. (b) Prediction: new input passes through the same feature extractor, and the classifier model maps the features to a label.]
Decision Tree
[Figure: Anatomy of a decision tree: the root node splits into branches/sub-trees, internal nodes test splitting attributes, and leaves assign classes.]

[Figure: Training data with attributes Tid, Refund, Marital Status, and Taxable Income, and the class label Cheat.]
Decision Tree: Example 2
Forming a decision tree to check whether the match will be played or not, based on climatic conditions.
[Figure: A decision tree on the climatic data. The root (Play = 9, Don't Play = 5) splits on Outlook (Sunny, Cloudy, Rainy); branches split further on conditions such as Windy and temperature (°C), ending in leaves with pure Play or Don't Play counts.]
Decision Tree Formation
The attribute with the highest information gain is selected as the splitting attribute. For the full sample S: [9+, 5-], the entropy is E(S) = 0.940; comparing candidate attributes such as Humidity and Windy, Outlook yields the largest gain, Gain(S, Outlook) = 0.246.
The subsets produced by Outlook are then split further according to the gains of the remaining attributes. For each possible value of Humidity, you can add a successor to the tree.
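To make the computation concrete, here is a minimal Python sketch that reproduces E(S) = 0.940 and Gain(S, Outlook) ≈ 0.246. The per-branch counts Sunny [2+, 3-], Overcast [4+, 0-], and Rainy [3+, 2-] are the standard play-tennis partition, assumed here because the slide shows only the totals.

import math

# Entropy of a set with `pos` positive and `neg` negative examples
def entropy(pos, neg):
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            e -= p * math.log2(p)
    return e

e_s = entropy(9, 5)  # whole sample S: [9+, 5-] -> 0.940

# Splitting on Outlook partitions S into Sunny [2+, 3-],
# Overcast [4+, 0-], and Rainy [3+, 2-] (assumed standard counts)
subsets = [(2, 3), (4, 0), (3, 2)]
remainder = sum((p + n) / 14 * entropy(p, n) for p, n in subsets)
print(f"E(S) = {e_s:.3f}, Gain(S, Outlook) = {e_s - remainder:.3f}")
# E(S) = 0.940, Gain(S, Outlook) = 0.247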
Which Attribute Is the Best Classifier?
[Figure: The partial tree after splitting on Outlook. The Overcast branch {D3, D7, D12, D13} is pure [4+, 0-]; the Sunny branch is split by Humidity ({D1, D2, D8} is No, {D9, D11} is Yes) and the Rainy branch by Windy ({D4, D5, D10} is Yes, {D6, D14} is No).]
Overfitting of Decision Trees
Post Pruning
A fully grown tree can overfit the training data; post pruning grows the full tree first and then removes branches that do not improve performance on held-out data.
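As a hedged illustration of post pruning, the sketch below uses scikit-learn's cost-complexity pruning, one common post-pruning strategy (the slides do not name a specific algorithm): grow the full tree, enumerate candidate pruning strengths, and keep the pruned tree that scores best on held-out data. The synthetic dataset is purely illustrative.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate pruning strengths (alphas) for the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit once per alpha and keep the tree that generalizes best
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_test, y_test),
)
print("pruned-tree held-out accuracy:", best.score(X_test, y_test))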
Classification
Topic 4: Random Forest Classifier
Bagging and Bootstrapping
[Figure: Bagging with bootstrapping: repeated bootstrap samples (N examples, M features each) are drawn with replacement from the training set.]
Decision Tree Classifier
Each bootstrap sample contributes a decision tree classifier to the ensemble.
[Figure: Random forest on an example gene-function dataset: each bootstrap sample of N examples with M features (e.g., gene expression, domain-motif, interaction, neighbor degree, expression/process similarity, tissue, centrality, location) grows its own decision tree, and the forest takes the majority vote across the trees.]
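The minimal sketch below mirrors the figure: bootstrap samples drawn with replacement each train a decision tree, and the ensemble predicts by majority vote. The synthetic dataset and the choice of 25 trees are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

rng = np.random.default_rng(1)
trees = []
for _ in range(25):
    # Bootstrapping: sample N examples with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Bagging: take the majority vote across the individual trees
votes = np.stack([t.predict(X_test) for t in trees])
forest_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("ensemble accuracy:", (forest_pred == y_test).mean())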
                        PREDICTED CLASS
                        Class=Yes    Class=No
ACTUAL     Class=Yes    a (TP)       b (FN)
CLASS      Class=No     c (FP)       d (TN)

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
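A quick numeric check of the formula, with assumed counts:

a, b, c, d = 50, 10, 5, 35              # assumed TP, FN, FP, TN counts
print((a + d) / (a + b + c + d))        # 0.85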
Limitation of Accuracy
Accuracy can be misleading when classes are imbalanced: a model that always predicts the majority class scores high accuracy while never detecting the minority class.
Given a cost matrix C(i|j), the cost of classifying a class j example as class i, accuracy generalizes to:

Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)
Computing Cost of Classification
                        PREDICTED CLASS
                        Class=Yes    Class=No
ACTUAL     Class=Yes    a            b
CLASS      Class=No     c            d

N = a + b + c + d, and Accuracy = (a + d) / N.

Cost matrix (p = cost of a correct prediction, q = cost of a wrong one):

                        PREDICTED CLASS
                        Class=Yes    Class=No
ACTUAL     Class=Yes    p            q
CLASS      Class=No     q            p

Cost = p(a + d) + q(b + c)
     = p(a + d) + q(N - a - d)
     = qN - (q - p)(a + d)
     = N[q - (q - p) · Accuracy]
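A quick check that the algebra holds, again with assumed counts and costs:

a, b, c, d = 50, 10, 5, 35     # assumed confusion-matrix counts
p, q = -1, 1                   # assumed costs: -1 per correct, +1 per wrong
N = a + b + c + d
accuracy = (a + d) / N
print(p * (a + d) + q * (b + c))          # direct cost: -70
print(N * (q - (q - p) * accuracy))       # N[q - (q - p)·Accuracy]: -70.0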
Assisted Practice
Random Forest Classifier Duration: 15 mins.
Problem Statement: Predict the survival of a horse based on various observed medical conditions. Load
the data from "horses.csv" and observe whether it contains missing values. The dataset contains many
categorical features; encode them with label encoding. Replace the missing values with the most frequent value
in each column. Fit a decision tree classifier and a random forest classifier, and observe their accuracy.
Objective: Learn to fit a decision tree and compare its accuracy with a random forest classifier.
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password
that are generated. Click on the Launch Lab button. On the page that appears, enter the username and
password in the respective fields, and click Login.
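If you want a starting point, here is a rough sketch of the steps above. The target column name "outcome" is a hypothetical placeholder; check the actual columns in horses.csv.

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("horses.csv")
print(df.isnull().sum())                  # observe the missing values

df = df.fillna(df.mode().iloc[0])         # most frequent value per column
for col in df.select_dtypes(include="object"):
    df[col] = LabelEncoder().fit_transform(df[col])   # label encoding

X, y = df.drop("outcome", axis=1), df["outcome"]      # "outcome" is assumed
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for model in (DecisionTreeClassifier(), RandomForestClassifier()):
    print(type(model).__name__, model.fit(X_train, y_train).score(X_test, y_test))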
Unassisted Practice
Random Forest Classifier Duration: 15 mins.
Problem Statement: PeerLoanKart is an NBFC (Non-Banking Financial Company) that facilitates peer-to-peer loans.
It connects people who need money (borrowers) with people who have money (investors). As an investor, you would
want to invest in people with a high probability of paying you back.
You, as an ML expert, will create a model that helps predict whether a borrower will pay back the loan.
Objective: Increase profits by up to 20%, as NPAs (non-performing assets) will be reduced by disbursing loans only to creditworthy borrowers.
Note: This practice is not graded. It is only intended for you to apply the knowledge you gained to solve real-world
problems.
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login.
Import Libraries
Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.model_selection import train_test_split
Get the Data
Code
loans = pd.read_csv('loan_borowwer_data.csv')
loans.describe()
Exploratory Data Analysis
Create a histogram of two FICO distributions on top of each other, one for
each credit.policy outcome.
Code
plt.figure(figsize=(10,6))
loans[loans['credit.policy']==1]['fico'].hist(alpha=0.5,color='blue',
bins=30,label='Credit.Policy=1')
loans[loans['credit.policy']==0]['fico'].hist(alpha=0.5,color='red',
bins=30,label='Credit.Policy=0')
plt.legend()
plt.xlabel('FICO')
Exploratory Data Analysis
Create a similar histogram of the two FICO distributions, this time with one distribution for each not.fully.paid outcome.
Code
plt.figure(figsize=(10,6))
loans[loans['not.fully.paid']==1]['fico'].hist(alpha=0.5,color='blue',
bins=30,label='not.fully.paid=1')
loans[loans['not.fully.paid']==0]['fico'].hist(alpha=0.5,color='red',
bins=30,label='not.fully.paid=0')
plt.legend()
plt.xlabel('FICO')
Exploratory Data Analysis
Create a countplot using seaborn showing the counts of loans by purpose,
with the hue defined by not.fully.paid.
Code
plt.figure(figsize=(11,7))
sns.countplot(x='purpose',hue='not.fully.paid',data=loans,palette='Set1')
Setting Up the Data
Create a list of elements containing the string "purpose". Call this list cat_feats.
Code
cat_feats = ['purpose']
Setting Up the Data
Now use pd.get_dummies(loans, columns=cat_feats, drop_first=True) to create a larger data frame that has new feature columns with dummy variables. Set this data frame as final_data.
Code
final_data = pd.get_dummies(loans,columns=cat_feats,drop_first=True)
final_data.info()
Train-Test Split
Code
X = final_data.drop('not.fully.paid',axis=1)
y = final_data['not.fully.paid']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,
random_state=101)
Training Decision Tree Model
Code
from sklearn.tree import DecisionTreeClassifier

# The original training cell was not shown on this slide; a tree with
# default settings is assumed
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)
Create predictions from the test set, and create a classification report and a confusion matrix.
Code
predictions = dtree.predict(X_test)
from sklearn.metrics import classification_report,confusion_matrix
print(classification_report(y_test,predictions))
Confusion Matrix
Code
print(confusion_matrix(y_test,predictions))
Training Random Forest Model
Code
from sklearn.ensemble import RandomForestClassifier

# The original training cell was not shown on this slide; n_estimators=100
# is an assumed setting
rfc = RandomForestClassifier(n_estimators=100)
rfc.fit(X_train, y_train)

Create predictions from the test set, and create a classification report.
Code
predictions = rfc.predict(X_test)
from sklearn.metrics import classification_report,confusion_matrix
print(classification_report(y_test,predictions))
Printing the Confusion Matrix
Code
print(confusion_matrix(y_test,predictions))
Classification
Topic 7: Naïve Bayes Classifier
Naïve Bayes Classifier and Bayes' Theorem
Naive Bayes is a classification technique based on Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B)

Note: The Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Naïve Bayes Classifier: Example
As the first step toward prediction with Naive Bayes, estimate the frequency of each attribute value for each class.
Frequency Tables

                        Play
                        Yes    No
Outlook    Sunny        3      2
           Overcast     4      0
           Rainy        3      2

                        Play
                        Yes    No
Humidity   High         3      4
           Normal       6      1

                        Play
                        Yes    No
Wind       Strong       6      2
           Weak         3      3
Building Likelihood Tables
Likelihood Table
                        Play
                        Yes    No
Outlook    Sunny        3/9    2/5    5/14
           Overcast     4/9    0/5    4/14
           Rainy        3/9    2/5    5/14
                        10/14  4/14

P(B|A) = P(Sunny|Yes) = 3/9 = 0.33
P(B)   = P(Sunny)     = 5/14 = 0.36
P(A)   = P(Yes)       = 10/14 = 0.71

P(Yes|High) = 0.33 × 0.6 / 0.5 = 0.42        P(Yes|Weak) = 0.67 × 0.64 / 0.57 = 0.75
P(No|High)  = 0.8 × 0.36 / 0.5 = 0.58        P(No|Weak)  = 0.4 × 0.36 / 0.57 = 0.25
Getting the Output
Outlook = Rainy
Humidity = High
Wind = Weak
Play = ?
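A minimal sketch that plugs the likelihoods from the tables above into Bayes' theorem for this query (following the likelihood table, the denominators are 9 "Yes" days and 5 "No" days):

# Likelihoods read from the frequency tables above
likelihood = {
    "Yes": {"Rainy": 3/9, "High": 3/9, "Weak": 3/9},
    "No":  {"Rainy": 2/5, "High": 4/5, "Weak": 3/5},
}
prior = {"Yes": 9/14, "No": 5/14}

# Unnormalized posterior: P(class) times the product of P(feature | class)
score = {c: prior[c] * likelihood[c]["Rainy"] * likelihood[c]["High"]
                     * likelihood[c]["Weak"] for c in ("Yes", "No")}
total = sum(score.values())
for c, s in score.items():
    print(c, round(s / total, 3))   # Yes ≈ 0.26, No ≈ 0.74

With these table values the "No" class scores higher, so this day would be classified as Don't Play.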
Support Vector Machines

Consider binary separation, which can be viewed as the task of separating two classes in feature space:

Separating hyperplane:  w^T x + b = 0
Decision regions:       w^T x + b > 0  and  w^T x + b < 0
Classifier:             f(x) = sign(w^T x + b)
Optimal Separation
The distance from an example x_i to the separator is:

r_i = (w^T x_i + b) / ||w||

For a support vector x_s, after rescaling w and b so that y_s(w^T x_s + b) = 1:

r = y_s(w^T x_s + b) / ||w|| = 1 / ||w||

Then the margin can be expressed through the (rescaled) w and b as:

ρ = 2r = 2 / ||w||
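To connect the formulas to code, this sketch fits a (nearly) hard-margin linear SVM on separable toy data and recovers w, b, and the margin 2/||w||. The dataset and the large C value are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=6)
clf = SVC(kernel="linear", C=1000).fit(X, y)   # large C approximates a hard margin

w, b = clf.coef_[0], clf.intercept_[0]
print("margin 2/||w|| =", 2 / np.linalg.norm(w))

# Each support vector lies at distance |w.x + b| / ||w||, about 1/||w||
print(np.abs(clf.support_vectors_ @ w + b) / np.linalg.norm(w))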
Linear SVM: Mathematically

[Figure: Scenario 2: one-dimensional data on the x-axis that no single threshold can separate. Scenario 3: the same data mapped to a higher-dimensional space via φ: x → φ(x), where it becomes linearly separable.]
The Kernel Trick
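A small sketch of the idea behind the kernel trick: concentric-circle data cannot be separated by a linear boundary in the original space, but an RBF-kernel SVM separates it almost perfectly. The dataset and parameters are illustrative assumptions.

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    score = SVC(kernel=kernel).fit(X_train, y_train).score(X_test, y_test)
    print(kernel, score)   # linear is near chance, rbf is near 1.0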
Assisted Practice
Support Vector Machines

Problem Statement: Motion Studios is the largest radio production house in Europe, with total revenue of $1B+. The company has launched a new reality show, "The Star RJ," about finding a new radio jockey who will be the star presenter on upcoming shows.
In the first round, participants have to upload their voice clips online. The clips are evaluated by experts for selection to the next round, with separate teams evaluating male and female voices.
The response to the show is unprecedented, and the company is flooded with voice clips.
You, as an ML expert, have to classify each voice as male or female so that the first level of filtration is quicker.
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that
are generated. Click on the Launch Lab button. On the page that appears, enter the username and password
in the respective fields, and click Login.
Unassisted Practice
Support Vector Machines Duration: 15 mins.
Problem Statement: Load the data from "college.csv," which contains attributes collected about private and public colleges for a particular year. Predict the private/public status of the colleges from the other attributes.
Use LabelEncoder to encode the target variable to numerical form. Split the data such that 20% is set aside for testing. Fit a linear SVM from scikit-learn and observe the accuracy. [Hint: Use LinearSVC]
Preprocess the data using StandardScaler and fit the same model again. Observe the change in accuracy.
Use scikit-learn's grid search to select the best hyperparameters for a nonlinear SVM. Identify the model with the best score and its parameters. [Hint: Refer to the model_selection module of scikit-learn]
Objective: Employ SVMs from scikit-learn for binary classification, and measure the impact of preprocessing the data and of hyperparameter search using grid search.
Note: This practice is not graded. It is only intended for you to apply the knowledge you have gained to solve real-world
problems.
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the respective
fields, and click Login.
Import the Dataset
Code
import pandas as pd
df = pd.read_csv("College.csv")
df.columns
Label Encoding
Code
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# The original cell was not shown on this slide; 'Private' as the target
# column name is an assumption about college.csv
df['Private'] = LabelEncoder().fit_transform(df['Private'])
X = df.drop('Private', axis=1).select_dtypes(include='number')
y = df['Private']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
classifier = LinearSVC()
Code
classifier.fit(X_train,y_train)
y_predict = classifier.predict(X_test)
classifier.score(X_test,y_test)
Obtain the Performance Metrics
Code
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_predict))
Fit the SVC Classifier
Code
from sklearn.svm import SVC

classifier = SVC()
classifier.fit(X_train,y_train)
classifier.score(X_test,y_test)
Preprocess the Data
Code
from sklearn.preprocessing import StandardScaler

# The scaling cell was not shown on this slide; StandardScaler follows
# the problem statement
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Code
classifier = SVC()
classifier.fit(X_train,y_train)
classifier.score(X_test,y_test)
Fitting Grid Search
Code
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, GridSearchCV

C_range = np.logspace(-2, 10, 13)
gamma_range = np.logspace(-9, 3, 13)
param_grid = dict(gamma=gamma_range, C=C_range)

# The cross-validation setup is an assumed choice
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
grid = GridSearchCV(SVC(), param_grid=param_grid, cv=cv)
grid.fit(X_train, y_train)
Getting the Best Hyperparameter
Code
# The original cell was not shown; best_params_ and best_score_ hold the
# selected hyperparameters and their cross-validated score
print("The best parameters are %s with a score of %0.2f"
      % (grid.best_params_, grid.best_score_))
a. x1
b. x2
c. x3
d. y
d. Go back to the parent node and select a different feature to split so that the y values are
not all the same at this node
The correct answer is b: Create a leaf that predicts the y value of all the data.
Lesson-End Project Duration: 20 mins.
Problem Statement: Load the kinematics dataset as measured on mobile sensors from the file
“run_or_walk.csv.” List the columns in the dataset. Let the target variable “y” be the activity, and assign all the
columns after it to “x.”
Using scikit-learn, fit a Gaussian Naive Bayes model and observe the accuracy. Generate a classification report
using scikit-learn. Refit the model once using only the acceleration values as predictors and then using only
the gyro values as predictors. Comment on the difference in accuracy between the two models.
Objective: Practice classification based on Naive Bayes algorithm. Identify the predictors that can be
influential.
Access: Click the Labs tab in the left side panel of the LMS. Copy or note the username and password that are
generated. Click the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login.
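As a non-authoritative starting point, here is a sketch of the workflow described above; the column names ("activity", "acceleration_*", "gyro_*") are assumptions about run_or_walk.csv.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

df = pd.read_csv("run_or_walk.csv")
print(df.columns)                                    # list the columns

y = df["activity"]                                   # target (assumed name)
X = df.iloc[:, df.columns.get_loc("activity") + 1:]  # all columns after it

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GaussianNB().fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
print(classification_report(y_test, model.predict(X_test)))

# Repeat with only the acceleration columns, then only the gyro columns:
# X_acc = X[[c for c in X.columns if c.startswith("acceleration")]]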
Thank You