
Machine Learning

Lesson 5: Supervised Learning–Classification

© Simplilearn. All rights reserved.


Concepts Covered

Classification: A supervised learning algorithm

Decision Tree

Random Forest

Naïve Bayes

Confusion Matrix vs Cost Matrix

Kernel SVM
Learning Objectives

By the end of this lesson, you will be able to:

Understand classification as part of supervised learning

Demonstrate different classification techniques in Python

Evaluate classification models


Classification
Topic 1: Definition of Classification
What Is Classification?

A machine learning task that identifies the class to which an instance belongs

[Diagram: sample articles (Acme Article, Food Article, Bar Article) being assigned to classes such as Technology, Sports, and Entertainment]
Classification: Example

Training a classifier model with the available training data

Training Data:

Name   Rank            Years   Tenured
Mike   Assistant Prof  3       No
Mary   Assistant Prof  7       Yes
Bill   Professor       2       Yes
Jim    Associate Prof  7       Yes
Dave   Associate Prof  6       No
Anne   Associate Prof  3       No

The classification algorithm learns a classifier (model) from this data, for example:

IF rank = "professor" OR years > 6
THEN tenured = "yes"
Classification: Example

The model classifies whether a professor is tenured or not

Testing Data:

Name     Rank            Years   Tenured
Tom      Assistant Prof  2       No
Merlisa  Assistant Prof  7       No
George   Professor       5       Yes
Joseph   Assistant Prof  7       Yes

Unseen Data: (Jeff, Professor, 4) → Tenured? Yes
Classification: Work Flow

A typical classifier workflow: features are extracted from labeled input to train the model, and from new input to predict labels

(a) Training: input → feature extractor → features (+ labels) → machine learning algorithm
(b) Prediction: input → feature extractor → features → classifier model → label
Classification: A Supervised Learning Algorithm

Classification is a supervised learning algorithm as the training data contains labels


Supervised Learning Model: training documents or images are converted to feature vectors and, together with their labels, fed to a machine learning algorithm; the resulting predictive model converts new documents or images to feature vectors and outputs the expected label.

Training data (contact lens dataset):

ID  Age             Spectacle Prescription  Astigmatic  Tear Production Rate  Class Label (Lenses)
1   Young           Myope                   No          Reduced               Noncontact
2   Young           Myope                   No          Normal                Soft contact
3   Young           Myope                   Yes         Reduced               Noncontact
4   Young           Myope                   Yes         Normal                Hard contact
5   Young           Hypermetrope            No          Reduced               Noncontact
6   Young           Hypermetrope            No          Normal                Soft contact
7   Young           Hypermetrope            Yes         Reduced               Noncontact
8   Young           Hypermetrope            Yes         Normal                Hard contact
9   Pre-presbyopic  Myope                   No          Reduced               Noncontact
10  Pre-presbyopic  Myope                   No          Normal                Soft contact
11  Pre-presbyopic  Myope                   Yes         Reduced               Noncontact
12  Pre-presbyopic  Myope                   Yes         Normal                Hard contact
13  Pre-presbyopic  Hypermetrope            No          Reduced               Noncontact
14  Pre-presbyopic  Hypermetrope            No          Normal                Soft contact
15  Pre-presbyopic  Hypermetrope            Yes         Reduced               Noncontact
16  Pre-presbyopic  Hypermetrope            Yes         Normal                Noncontact
17  Presbyopic      Myope                   No          Reduced               Noncontact
18  Presbyopic      Myope                   No          Normal                Noncontact
19  Presbyopic      Myope                   Yes         Reduced               Noncontact
20  Presbyopic      Myope                   Yes         Normal                Hard contact
Classification
Topic 2: Use Cases and Algorithms
Common use cases:

Sentiment Analysis
Fraud Detection
Face Detection
Classification Algorithms

A few of the most commonly used classification algorithms:

Decision Tree

Random Forest

Support Vector Machines (SVM)

Naïve Bayes Classifier


Classification
Topic 3: Decision Tree Classifier
Decision Tree Classifier

[Diagram: a root node splits into decision nodes (branches/sub-trees), which split further until terminal (leaf) nodes are reached]

A tree-like structure in which each internal node represents a test on an attribute

Each branch represents an outcome of the test, and each leaf node represents a class label

A path from root to leaf represents a classification rule


Decision Tree: Schematic Representation

The tree is split whenever an impure node is detected.

[Diagram: attributes a1 … a6 with successive splits X, Y, and Z]

If an impure node is detected, select the best attribute and continue splitting; if a pure node is detected, it can be classified as a leaf node.
Decision Tree: Example 1
The example below illustrates the splitting attributes for the adjacent training data.

Training Data:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Resulting tree (splitting attributes):

Refund = Yes → NO
Refund = No  → MarSt = Married           → NO
             → MarSt = Single, Divorced  → TaxInc < 80K → NO
                                         → TaxInc > 80K → YES
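As a hedged illustration (not part of the original deck), the same ten records can be fed to scikit-learn's DecisionTreeClassifier after one-hot encoding the categorical columns; the learned tree may differ in detail from the one drawn above, since scikit-learn uses binary splits, but it recovers comparable rules.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The ten training records from the slide
data = pd.DataFrame({
    "Refund":  ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "Marital": ["Single", "Married", "Single", "Married", "Divorced",
                "Married", "Divorced", "Single", "Married", "Single"],
    "Income":  [125, 100, 70, 120, 95, 60, 220, 85, 75, 90],   # in thousands
    "Cheat":   ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"],
})

X = pd.get_dummies(data[["Refund", "Marital", "Income"]])  # one-hot encode categoricals
y = data["Cheat"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))    # prints the learned splits as if/else rules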
Decision Tree: Example 2
Forming a decision tree to predict whether a match will be played, based on climatic conditions.

[Diagram: the full data set has Play = 9, Don't play = 5. Splitting on Outlook gives: Rainy (Play = 3, Don't play = 2), further split on Windy into (Play = 3, Don't play = 0) and (Play = 0, Don't play = 2); Cloudy (Play = 4, Don't play = 0); Sunny (Play = 2, Don't play = 3), further split on temperature (°C) into (Play = 2, Don't play = 0) and (Play = 0, Don't play = 3)]
Decision Tree Formation

Entropy

• Entropy measures the impurity of a collection of examples.
• It depends on the distribution of the random variable.
• In general, entropy measures the amount of information in a random variable:

  H(X) = − Σ_{i=1}^{c} p_i log2 p_i = Σ_{i=1}^{c} p_i log2 (1/p_i),   X = {1, …, c}

  for classification into c classes.

Information Gain

• Information gain is the expected reduction in entropy caused by partitioning the examples on an attribute.
• The higher the information gain, the more effective the attribute is in classifying the training data.
• Expected reduction in entropy, given attribute A:

  Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

  where Values(A) is the set of possible values for A.
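As a quick sketch of these two formulas (not from the original deck), the helper functions below compute entropy and information gain for Boolean-labelled counts; the numbers reproduce the [9+, 5−] Humidity split used on the next slide.

import math

def entropy(pos, neg):
    """Entropy of a Boolean-labelled set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                       # 0 * log(0) is treated as 0
            p = count / total
            result -= p * math.log2(p)
    return result

def information_gain(parent, children):
    """Gain(S, A) = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv))."""
    parent_total = sum(parent)
    gain = entropy(*parent)
    for pos, neg in children:
        gain -= (pos + neg) / parent_total * entropy(pos, neg)
    return gain

# S = [9+, 5-], split by Humidity into High = [3+, 4-] and Normal = [6+, 1-]
print(entropy(9, 5))                                    # ~0.940
print(information_gain((9, 5), [(3, 4), (6, 1)]))       # ~0.151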
Which Attribute Is the Best Classifier?

The attribute with the highest information gain is selected as the splitting attribute.

For S: [9+, 5−], E = 0.940:

Gain(S, Outlook)     = 0.246
Gain(S, Humidity)    = 0.151
Gain(S, Wind)        = 0.048
Gain(S, Temperature) = 0.029

Humidity splits S into High [3+, 4−] (E = 0.985) and Normal [6+, 1−] (E = 0.592);
Wind splits S into Weak [6+, 2−] (E = 0.811) and Strong [3+, 3−] (E = 1.00).

Gain(S, Humidity) = 0.940 − (7/14)(0.985) − (7/14)(0.592) = 0.151
Gain(S, Wind)     = 0.940 − (8/14)(0.811) − (6/14)(1.00)  = 0.048
Which Attribute Is the Best Classifier?

The nodes under Outlook are split further according to their gains.

Working on the Outlook = Sunny node (E = 0.970):

Gain(S_Sunny, Humidity) = 0.970 − (3/5)(0.0) − (2/5)(0.0) = 0.970
Gain(S_Sunny, Wind)     = 0.970 − (2/5)(1.0) − (3/5)(0.918) = 0.019
Gain(S_Sunny, Temp.)    = 0.970 − (2/5)(0.0) − (2/5)(1.0) − (1/5)(0.0) = 0.570

Humidity provides the best prediction for the target.

For each possible value of Humidity, you can add a successor to the tree.
Which Attribute Is the Best Classifier?

Finally, you arrive at leaf nodes with strong decisions.

[Final tree: Outlook is the root. Outlook = Overcast ({D3, D7, D12, D13}, [4+, 0−]) → Yes. Outlook = Sunny → split on Humidity: High ({D1, D2, D8}) → No, Normal ({D9, D11}) → Yes. Outlook = Rain → split on Windy: Weak ({D4, D5, D10}) → Yes, Strong ({D6, D14}) → No.]
Overfitting of Decision Trees

Overfitting occurs when the learning algorithm continues to develop hypotheses that reduce training-set error at the cost of increased test-set error.
Avoiding Overfitting of Decision Trees

Post pruning: grow the full tree first, then remove (prune) the branches that do not improve performance on held-out data.
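Scikit-learn supports post-pruning through cost-complexity pruning (the ccp_alpha parameter of DecisionTreeClassifier). A minimal sketch on synthetic data, purely illustrative:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree typically overfits the training data
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Cost-complexity pruning path: candidate alpha values for post-pruning
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
for alpha in path.ccp_alphas[::5]:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  train={pruned.score(X_train, y_train):.3f}  "
          f"test={pruned.score(X_test, y_test):.3f}")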
Classification
Topic 4: Random Forest Classifier
Bagging and Bootstrapping

Bagging: a technique for reducing the variance of an estimated prediction function; for classification, a committee of trees each casts a vote for the predicted class.

Bootstrapping: randomly draws datasets with replacement from the training data; each sample is of the same size as the original training set.
Bagging and Bootstrapping
Create bootstrap samples from the training data

[Diagram: several bootstrap samples, each with N examples and M features, drawn from the training data]
Decision Tree Classifier
Each sample contributes to a decision tree classifier

[Diagram: each bootstrap sample is used to train a decision tree; the example trees test features such as neighbor function similarity, gene expression, domain-motif, and degree to predict Interact vs. Not Interact]
Random Forest Classifier
[Diagram: many decision trees, each trained on a different bootstrap sample of the N examples and a random subset of the M features, each predicting Interact or Not Interact; the random forest takes the majority vote of the individual trees as its final prediction]
Classification
Topic 5: Performance Measures
Confusion Matrix

The confusion matrix focuses on the predictive capability of a model.

                        PREDICTED CLASS
                        Class=Yes   Class=No
ACTUAL   Class=Yes      a (TP)      b (FN)
CLASS    Class=No       c (FP)      d (TN)

a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
Accuracy Metric
Accuracy is the ratio of true positives and true negatives to the sum of true positives, true negatives, false positives, and false negatives.

                        PREDICTED CLASS
                        Class=Yes   Class=No
ACTUAL   Class=Yes      a (TP)      b (FN)
CLASS    Class=No       c (FP)      d (TN)

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
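A minimal sketch showing how the same quantities come out of scikit-learn; the actual and predicted labels below are made up for illustration (1 = Yes, 0 = No):

from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical actual and predicted labels
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

# confusion_matrix orders classes [0, 1], so ravel() returns TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)                                   # a=3, d=5, c=1, b=1
print("accuracy:", (tp + tn) / (tp + tn + fp + fn))     # 0.8
print("accuracy:", accuracy_score(y_true, y_pred))      # same value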
Limitation of Accuracy

Consider a 2-class problem:

Number of Class 0 examples = 9990
Number of Class 1 examples = 10

If the model predicts every example to be class 0, accuracy is 9990/10000 = 99.9%.

Hence, accuracy is misleading here because the model does not detect a single Class 1 example.
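A short sketch reproducing this point: a classifier that always predicts the majority class scores 99.9% accuracy on such data yet recalls none of the Class 1 examples.

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# 9990 examples of class 0 and 10 examples of class 1
y = np.array([0] * 9990 + [1] * 10)
X = np.zeros((10000, 1))                   # features are irrelevant here

majority = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = majority.predict(X)

print("accuracy:", accuracy_score(y, y_pred))          # 0.999
print("recall on class 1:", recall_score(y, y_pred))   # 0.0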
Cost Matrix

The cost matrix takes misclassification weights into account. C(i|j) is the cost of classifying a class j example as class i.

                        PREDICTED CLASS
C(i|j)                  Class=Yes     Class=No
ACTUAL   Class=Yes      C(Yes|Yes)    C(No|Yes)
CLASS    Class=No       C(Yes|No)     C(No|No)

With weights w1…w4 applied to the confusion-matrix counts a, b, c, d:

Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)
Computing Cost of Classification

Cost Matrix:

                 PREDICTED CLASS
C(i|j)           +        -
ACTUAL   +       -1       100
CLASS    -       1        0

Model M1 (Accuracy = 80%, Cost = 3910):

                 PREDICTED CLASS
                 +        -
ACTUAL   +       150      40
CLASS    -       60       250

Model M2 (Accuracy = 90%, Cost = 4255):

                 PREDICTED CLASS
                 +        -
ACTUAL   +       250      45
CLASS    -       5        200

Although M2 has the higher accuracy, M1 has the lower cost.
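The cost comparison above can be reproduced with a few lines of NumPy; the matrices below are the ones from this slide, stored as [[TP, FN], [FP, TN]]:

import numpy as np

# Cost matrix from the slide: rows = actual (+, -), columns = predicted (+, -)
cost_matrix = np.array([[-1, 100],
                        [ 1,   0]])

# Confusion counts for M1 and M2, same layout: [[TP, FN], [FP, TN]]
m1 = np.array([[150, 40], [60, 250]])
m2 = np.array([[250, 45], [ 5, 200]])

for name, cm in (("M1", m1), ("M2", m2)):
    accuracy = cm.trace() / cm.sum()
    cost = (cm * cost_matrix).sum()
    print(f"{name}: accuracy={accuracy:.0%}, cost={cost}")
# M1: accuracy=80%, cost=3910
# M2: accuracy=90%, cost=4255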
Cost vs. Accuracy

Accuracy is proportional to cost if:
1. C(Yes|No) = C(No|Yes) = q
2. C(Yes|Yes) = C(No|No) = p

With confusion-matrix counts a, b, c, d and N = a + b + c + d:

Accuracy = (a + d) / N

Cost = p(a + d) + q(b + c)
     = p(a + d) + q(N - a - d)
     = qN - (q - p)(a + d)
     = N [q - (q - p) × Accuracy]
Assisted Practice
Random Forest Classifier Duration: 15 mins.

Problem Statement: Predict the survival of a horse based on various observed medical conditions. Load
the data from “horses.csv” and observe whether it contains missing values. The dataset contains many
categorical features; replace them with label encoding. Replace the missing values by the most frequent value
in each column. Fit a decision tree classifier and random forest classifier, and observe the accuracy.

Objective: Learn to fit a decision tree and compare its accuracy with that of a random forest classifier.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password
that are generated. Click on the Launch Lab button. On the page that appears, enter the username and
password in the respective fields, and click Login.
Unassisted Practice
Random Forest Classifier Duration: 15 mins.

Problem Statement: PeerLoanKart is an NBFC (Non-Banking Financial Company) that facilitates peer-to-peer loans.
It connects people who need money (borrowers) with people who have money (investors). As an investor, you would
want to invest in people who show a profile with a high probability of paying you back.
You, as an ML expert, will create a model that helps predict whether a borrower will pay back the loan or not.

Objective: Increase profits by up to 20%, as NPAs will be reduced because loans are disbursed only to creditworthy borrowers.

Note: This practice is not graded. It is only intended for you to apply the knowledge you gained to solve real-world
problems.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login.
Import Libraries

Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.model_selection import train_test_split
Get the Data

Code

loans = pd.read_csv('loan_borowwer_data.csv')
loans.describe()
Exploratory Data Analysis
Create a histogram of two FICO distributions on top of each other, one for
each credit.policy outcome.

Code

plt.figure(figsize=(10,6))
loans[loans['credit.policy']==1]['fico'].hist(alpha=0.5,color='blue',
bins=30,label='Credit.Policy=1')
loans[loans['credit.policy']==0]['fico'].hist(alpha=0.5,color='red',
bins=30,label='Credit.Policy=0')
plt.legend()
plt.xlabel('FICO')
Exploratory Data Analysis

Create a similar figure; select the not.fully.paid column

Code

plt.figure(figsize=(10,6))
loans[loans['not.fully.paid']==1]['fico'].hist(alpha=0.5,color='blue',
bins=30,label='not.fully.paid=1')
loans[loans['not.fully.paid']==0]['fico'].hist(alpha=0.5,color='red',
bins=30,label='not.fully.paid=0')
plt.legend()
plt.xlabel('FICO')
Exploratory Data Analysis
Create a countplot using seaborn showing the counts of loans by purpose,
with the hue defined by not.fully.paid.

Code

plt.figure(figsize=(11,7))
sns.countplot(x='purpose',hue='not.fully.paid',data=loans,palette='Set1')
Setting Up the Data
Create a list containing the string 'purpose', and call this list cat_feats.

Code

cat_feats = ['purpose']
Setting Up the Data
Now use pd.get_dummies(loans, columns=cat_feats, drop_first=True) to create a new, larger dataframe that has feature columns with dummy variables. Set this dataframe as final_data.

Code

final_data = pd.get_dummies(loans,columns=cat_feats,drop_first=True)
final_data.info()
Train-Test Split

Code

X = final_data.drop('not.fully.paid',axis=1)
y = final_data['not.fully.paid']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,
random_state=101)
Training Decision Tree Model

Code

from sklearn.tree import DecisionTreeClassifier


dtree = DecisionTreeClassifier()
dtree.fit(X_train,y_train)
Evaluating Decision Tree

Create predictions from the test set, and create a classification report and a confusion matrix.

Code

predictions = dtree.predict(X_test)
from sklearn.metrics import classification_report,confusion_matrix
print(classification_report(y_test,predictions))
Confusion Matrix

Code

print(confusion_matrix(y_test,predictions))
Training Random Forest Model

Code

from sklearn.ensemble import RandomForestClassifier


rfc = RandomForestClassifier(n_estimators=600)
rfc.fit(X_train,y_train)
Evaluating Random Forest Model

Code

predictions = rfc.predict(X_test)
from sklearn.metrics import classification_report,confusion_matrix
print(classification_report(y_test,predictions))
Printing the Confusion Matrix

Code

print(confusion_matrix(y_test,predictions))
Classification
Topic 7: Naïve Bayes Classifier
Naïve Bayes Classifier and Bayes' Theorem

The Naïve Bayes classifier is a classification technique based on Bayes' theorem.

Bayes' Theorem:

P(A|B) = P(B|A) · P(A) / P(B)

where:
• P(A|B) – Posterior Probability
• P(B|A) – Likelihood
• P(A) – Class Prior Probability
• P(B) – Predictor Prior Probability

Note: The Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Naïve Bayes Classifier: Example

As the first step toward prediction using Naïve Bayes, you have to estimate the frequency of each attribute value. The counts below come from the 14-day play-tennis data used earlier, with 9 "Yes" and 5 "No" examples.

Frequency Table – Outlook:
                  Play=Yes   Play=No
Sunny             2          3
Overcast          4          0
Rainy             3          2

Frequency Table – Humidity:
                  Play=Yes   Play=No
High              3          4
Normal            6          1

Frequency Table – Wind:
                  Play=Yes   Play=No
Weak              6          2
Strong            3          3
Building Likelihood Tables

Calculating the likelihood of each attribute value.

Likelihood Table – Outlook:
                  Play=Yes   Play=No
Sunny             2/9        3/5        5/14
Overcast          4/9        0/5        4/14
Rainy             3/9        2/5        5/14
                  9/14       5/14

P(B|A) = P(Sunny|Yes) = 2/9 ≈ 0.22
P(B)   = P(Sunny)     = 5/14 ≈ 0.36
P(A)   = P(Yes)       = 9/14 ≈ 0.64

P(Yes|Sunny) = P(Sunny|Yes) · P(Yes) / P(Sunny) = (0.22 × 0.64) / 0.36 ≈ 0.40

Similarly, the likelihood of "No" given Sunny is:

P(No|Sunny) = P(Sunny|No) · P(No) / P(Sunny) = (0.60 × 0.36) / 0.36 ≈ 0.60
Building Likelihood Tables

Likelihood Table – Humidity:
                  Play=Yes   Play=No
High              3/9        4/5        7/14
Normal            6/9        1/5        7/14
                  9/14       5/14

P(Yes|High) = (0.33 × 0.64) / 0.50 ≈ 0.43
P(No|High)  = (0.80 × 0.36) / 0.50 ≈ 0.57

Likelihood Table – Wind:
                  Play=Yes   Play=No
Weak              6/9        2/5        8/14
Strong            3/9        3/5        6/14
                  9/14       5/14

P(Yes|Weak) = (0.67 × 0.64) / 0.57 ≈ 0.75
P(No|Weak)  = (0.40 × 0.36) / 0.57 ≈ 0.25
Getting the Output

For a new day with Outlook = Rainy, Humidity = High, Wind = Weak, will the match be played?

Likelihood of "Yes" = P(Outlook=Rainy|Yes) · P(Humidity=High|Yes) · P(Wind=Weak|Yes) · P(Yes)
                    = 3/9 × 3/9 × 6/9 × 9/14 ≈ 0.0476

Likelihood of "No"  = P(Outlook=Rainy|No) · P(Humidity=High|No) · P(Wind=Weak|No) · P(No)
                    = 2/5 × 4/5 × 2/5 × 5/14 ≈ 0.0457
Getting the Output

Normalizing the values:

P(Yes) = 0.0476 / (0.0476 + 0.0457) ≈ 0.51

P(No)  = 0.0457 / (0.0476 + 0.0457) ≈ 0.49

The model predicts that there is about a 51% chance that the game will be played tomorrow.
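The same calculation as a minimal sketch in Python, using the likelihood tables above (counts assume the standard 14-day play-tennis data, i.e. 9 "Yes" and 5 "No" examples):

# Conditional likelihoods read off the tables above
likelihoods_yes = {"outlook=rainy": 3/9, "humidity=high": 3/9, "wind=weak": 6/9}
likelihoods_no  = {"outlook=rainy": 2/5, "humidity=high": 4/5, "wind=weak": 2/5}
p_yes, p_no = 9/14, 5/14

score_yes = p_yes
for v in likelihoods_yes.values():
    score_yes *= v          # multiply under the naive independence assumption
score_no = p_no
for v in likelihoods_no.values():
    score_no *= v

total = score_yes + score_no
print(round(score_yes, 4), round(score_no, 4))        # ~0.0476, ~0.0457
print("P(Play=Yes):", round(score_yes / total, 2))    # ~0.51
print("P(Play=No): ", round(score_no / total, 2))     # ~0.49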
Classification
Topic 8: Support Vector Machines
Linear Separators

Binary classification can be viewed as the task of separating classes in feature space with a hyperplane:

w^T x + b = 0        (the separating hyperplane)
w^T x + b > 0 on one side, w^T x + b < 0 on the other

f(x) = sign(w^T x + b)
Optimal Separation

It’s difficult to evaluate the optimal separator.


Concept of Classification Margin

wT xi + b
𝒓 Distance from example xi to the separator ris=
w

Closest to the hyperplane are support vectors.

Margin ρ of the separator is the distance between


support vectors.
Maximizing Classification Margin

• Helps the model generalize and perform better on test data by not overfitting to the training data

• Depends only on the support vectors, ignoring the other training examples
Linear SVM: Mathematically

Let the training set {(x_i, y_i)}, i = 1..n, with x_i ∈ R^d and y_i ∈ {−1, 1}, be separated by a hyperplane with margin ρ. Then for each training example (x_i, y_i):

w^T x_i + b ≤ −ρ/2  if y_i = −1
w^T x_i + b ≥  ρ/2  if y_i = +1

or equivalently, y_i (w^T x_i + b) ≥ ρ/2.

For every support vector x_s, the inequality above is an equality. After rescaling w and b by ρ/2, the distance from each support vector x_s to the hyperplane is:

r = y_s (w^T x_s + b) / ||w|| = 1 / ||w||

Then the margin can be expressed through the (rescaled) w and b as:

ρ = 2r = 2 / ||w||
Linear SVM: Mathematically

Now you can formulate the quadratic optimization problem:

Find w and b such that ρ = 2 / ||w|| is maximized,
and for all (x_i, y_i), i = 1..n:  y_i (w^T x_i + b) ≥ 1

You can reformulate the problem as:

Find w and b such that Φ(w) = ||w||² = w^T w is minimized,
and for all (x_i, y_i), i = 1..n:  y_i (w^T x_i + b) ≥ 1
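As a hedged illustration of these quantities, SVC with a linear kernel in scikit-learn exposes w (coef_) and b (intercept_), from which the margin 2/||w|| can be read off; the toy data below is made up:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data, two classes
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C approximates the hard-margin formulation on separable data
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
margin = 2 / np.linalg.norm(w)
print("w =", w, "b =", b)
print("margin 2/||w|| =", margin)
print("support vectors:\n", clf.support_vectors_)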
Nonlinear SVMs
Scenario 1: datasets that are linearly separable with some noise

Scenario 2: datasets that are hard to separate linearly in the original space

Scenario 3: map the data to a higher-dimensional space where it becomes separable


Nonlinear SVMs: Feature Spaces

The original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:

φ: x → φ(x)
The Kernel Trick

• The linear classifier relies on the inner product between vectors: K(x_i, x_j) = x_i^T x_j
• If every data point is mapped into a high-dimensional space via some transformation φ: x → φ(x), the inner product becomes: K(x_i, x_j) = φ(x_i)^T φ(x_j)
• A kernel function is a function that is equivalent to an inner product in some feature space.
• Example: for 2-dimensional vectors x = [x1, x2], let K(x_i, x_j) = (1 + x_i^T x_j)². To show that K(x_i, x_j) = φ(x_i)^T φ(x_j):

  K(x_i, x_j) = (1 + x_i^T x_j)²
              = 1 + x_i1² x_j1² + 2 x_i1 x_j1 x_i2 x_j2 + x_i2² x_j2² + 2 x_i1 x_j1 + 2 x_i2 x_j2
              = [1, x_i1², √2 x_i1 x_i2, x_i2², √2 x_i1, √2 x_i2]^T [1, x_j1², √2 x_j1 x_j2, x_j2², √2 x_j1, √2 x_j2]
              = φ(x_i)^T φ(x_j),   where φ(x) = [1, x1², √2 x1 x2, x2², √2 x1, √2 x2]

• Thus, a kernel function implicitly maps data to a high-dimensional space, without the need to compute each φ(x) explicitly.
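A quick numerical check of this identity (the example vectors are arbitrary):

import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel K(x, z) = (1 + x.z)^2."""
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

def K(x, z):
    return (1 + x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(K(x, z))            # kernel value computed in the original 2-D space
print(phi(x) @ phi(z))    # same value via the explicit 6-D feature map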
Assisted Practice
Support Vector Machines Duration: 15 mins.

Problem Statement: Motion Studios is the largest radio production house in Europe, with total revenue of over $1B. The company has launched a new reality show, "The Star RJ," about finding a new radio jockey who will be the star presenter on upcoming shows.
In the first round, participants have to upload a voice clip online, and experts evaluate the clips for selection to the next round. Separate teams evaluate male and female voices in the first round.
The response to the show is unprecedented, and the company is flooded with voice clips.
You, as an ML expert, have to classify each voice as male or female so that the first level of filtering is quicker.

Objective: Optimize selection process.

Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that
are generated. Click on the Launch Lab button. On the page that appears, enter the username and password
in the respective fields, and click Login.
Unassisted Practice
Support Vector Machines Duration: 15 mins.

Problem Statement: Load the data from "college.csv", which contains attributes collected about private and public colleges for a particular year. Predict the private/public status of a college from the other attributes.
Use LabelEncoder to encode the target variable to numerical form. Split the data such that 20% is set aside for testing. Fit a linear SVM from scikit-learn and observe the accuracy. [Hint: use LinearSVC]
Preprocess the data using StandardScaler and fit the same model again; observe the change in accuracy.
Use scikit-learn's GridSearchCV to select the best hyperparameters for a nonlinear SVM. Identify the model with the best score and its parameters. [Hint: refer to the model_selection module of scikit-learn]

Objective: Employ an SVM from scikit-learn for binary classification, and measure the impact of preprocessing the data and of hyperparameter search using grid search.

Note: This practice is not graded. It is only intended for you to apply the knowledge you have gained to solve real-world
problems.
Access: Click on the Labs tab on the left side panel of the LMS. Copy or note the username and password that are
generated. Click on the Launch Lab button. On the page that appears, enter the username and password in the respective
fields, and click Login.
Import the Dataset

Code

import pandas as pd
df = pd.read_csv("College.csv")
df.columns
Label Encoding

Code

from sklearn.model_selection import train_test_split


from sklearn.preprocessing import LabelEncoder
X, y = df.iloc[:, 1:].values, df.iloc[:, 0].values
# encode the target class labels (Private: Yes/No) as integers
target_encoder = LabelEncoder()
y = target_encoder.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=1)
print(X_train.shape)
Fit the Linear SVC Classifier

Code

from sklearn.svm import LinearSVC,SVC


classifier = LinearSVC()

classifier.fit(X_train,y_train)
y_predict = classifier.predict(X_test)
classifier.score(X_test,y_test)
Obtain Performance Matrix

Code

from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test,y_predict))
Fit the SVC Classifier

Code

classifier = SVC()
classifier.fit(X_train,y_train)
classifier.score(X_test,y_test)
Preprocess the Data

Code

from sklearn.model_selection import GridSearchCV


from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X, y = df.iloc[:, 1:].values, df.iloc[:, 0].values
X = scaler.fit_transform(X)
target_encoder = LabelEncoder()
y = target_encoder.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,


random_state=1)
print(X_train.shape)
Refitting the SVC Model

Code

classifier = SVC()
classifier.fit(X_train,y_train)
classifier.score(X_test,y_test)
Fitting Grid Search

Code

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
C_range = np.logspace(-2, 10, 13)
gamma_range = np.logspace(-9, 3, 13)
param_grid = dict(gamma=gamma_range, C=C_range)
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
grid = GridSearchCV(SVC(), param_grid=param_grid, cv=cv)
grid.fit(X_train, y_train)
Getting the Best Hyperparameter

Code

print("The best parameters are %s with a score of %0.2f"


% (grid.best_params_, grid.best_score_))
Key Takeaways

You are now able to:

Understand classification as part of supervised learning

Demonstrate different classification techniques in Python

Evaluate classification models


Knowledge
Check



Knowledge
Check Let us train T1, a decision tree, with the data given below. Which feature will you split
at the root?
1

a. x1

b. x2

c. x3

d. y



Knowledge
Check Let us train T1, a decision tree, with the data given below. Which feature will you split
at the root?
1

a. x1

b. x2

c. x3

d. y

The correct answer is c. x3

Splitting on x3 gives the lowest classification error: only one training example (row 3, where x3 = 1 but y = −1) is misclassified, which is fewer errors than a split on any other feature.
Knowledge
Check If you are training a decision tree, and you are at a node in which all of its data has the
same y value, you should:
2

a. Find the best feature to split

b. Create a leaf that predicts the y value of all the data

c. Terminate recursions on all branches and return the current tree

d. Go back to the parent node and select a different feature to split so that the y values are
not all the same at this node



Knowledge
Check If you are training a decision tree, and you are at a node in which all of its data has the
same y value, you should:
2

a. Find the best feature to split

b. Create a leaf that predicts the y value of all the data

c. Terminate recursions on all branches and return the current tree

d. Go back to the parent node and select a different feature to split so that the y values are
not all the same at this node

The correct answer is b. Create a leaf that predicts the y value of all the data

When all examples at a node share the same y value, the node is pure, so you should create a leaf that predicts that value.
Lesson-End Project Duration: 20 mins.

Problem Statement: Load the kinematics dataset as measured on mobile sensors from the file
“run_or_walk.csv.” List the columns in the dataset. Let the target variable “y” be the activity, and assign all the
columns after it to “x.”
Using Scikit-learn, fit a Gaussian Naive Bayes model and observe the accuracy. Generate a classification report
using Scikit-learn. Refit the model once using only the acceleration values as predictors, and then once using only
the gyro values as predictors. Comment on the difference in accuracy between the two models.

Objective: Practice classification based on Naive Bayes algorithm. Identify the predictors that can be
influential.

Access: Click the Labs tab in the left side panel of the LMS. Copy or note the username and password that are
generated. Click the Launch Lab button. On the page that appears, enter the username and password in the
respective fields, and click Login.
Thank You

© Simplilearn. All rights reserved.
