
Applied Machine Learning – Module 5
Raja Vadhana P
Assistant Professor – BITS CSIS
BITS Pilani, Pilani Campus
[email protected]
Module 5 : Classification Model I (Post Mid-Semester Exam)

A Linear Classification
B Naïve Bayes Classifier
C Applications of Naïve Bayes Classifier
D Logistic Regression
E Linear Support Vector Machine
F Comparative Analysis and Applicability

Important Note : Both Saturday and Sunday session contents are placed in this same document.




LINEAR CLASSIFICATION

Generative Models
Discriminative Models
Tree Based Models



Classification Vs Other ML Technique



Classification Steps

A supervised task of assigning each object to one of a number of mutually exclusive and exhaustive categories called classes.

1. Divide the data / records into a training set and a test set
2. Derive a model for the class attribute as a function of the other important variables from the training set
3. Run the test set through the model and compare the predicted class values to validate the accuracy

GOAL : Previously unseen records should be assigned a class as accurately as possible.
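A minimal sketch of step 1 with scikit-learn's train_test_split; the tiny record set and the 70/30 split ratio are illustrative assumptions, mirroring the Tid/Attrib table on the next slide.

from sklearn.model_selection import train_test_split
import pandas as pd

# Tiny illustrative record set (assumed values)
data = pd.DataFrame({
    "Attrib1": ["Yes", "No", "No", "Yes", "No", "No"],
    "Attrib2": ["Large", "Medium", "Small", "Medium", "Large", "Medium"],
    "Attrib3": [125, 100, 70, 120, 95, 60],
    "Class":   ["No", "No", "No", "No", "Yes", "No"],
})

X = data.drop(columns=["Class"])   # the other important variables
y = data["Class"]                  # the class attribute

# Step 1: divide the records into a training set and a test set (70/30 is an assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
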
Classification Process

Training Set
Tid | Attrib1 | Attrib2 | Attrib3 | Class
1   | Yes     | Large   | 125K    | No
2   | No      | Medium  | 100K    | No
3   | No      | Small   | 70K     | No
4   | Yes     | Medium  | 120K    | No
5   | No      | Large   | 95K     | Yes
6   | No      | Medium  | 60K     | No
7   | Yes     | Large   | 220K    | No
8   | No      | Small   | 85K     | Yes
9   | No      | Medium  | 75K     | No
10  | No      | Small   | 90K     | Yes

Induction: the learning algorithm learns a model from the training set.

Test Set
Tid | Attrib1 | Attrib2 | Attrib3 | Class
11  | No      | Small   | 55K     | ?
12  | Yes     | Medium  | 80K     | ?
13  | Yes     | Large   | 110K    | ?
14  | No      | Small   | 95K     | ?
15  | No      | Large   | 67K     | ?

Deduction: the learned model is applied to the test set to predict the class.


Classification Process : 1. Construction



Classification Process : 2. Usage

• Evaluation Metrics


Classifier Types
Model Building

Generative | Discriminative | Tree Based

Generative (e.g., Naïve Bayes): model the class posterior through Bayes' theorem
  P(Y | X_1, X_2, ..., X_d) = P(X_1, X_2, ..., X_d | Y) · P(Y) / P(X_1, X_2, ..., X_d)

Discriminative (e.g., Logistic Regression): learn a decision boundary directly
  predict y = 1 if θ_0 + Σ_i θ_i x_i ≥ 0,  and y = 0 if θ_0 + Σ_i θ_i x_i < 0

Tree Based (e.g., Decision Tree): learn explicit rules, such as
  IF OUTLOOK = Overcast THEN PLAY = Yes
  ELSE IF OUTLOOK = Rain AND WIND = Strong THEN PLAY = No


Classifier Types
Output Labels

Binary | Multi Class | Multi Output | Multi Label

(Figure: example decision regions in the x1–x2 plane for each output-label type.)


Classifier Types
Output Labels

Binary vs Multi Label

(Figure: example label assignments in the x1–x2 plane for the binary and multi-label cases.)
# Multi-label example (digit images): each training instance gets two labels at once
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

y_train_large = (y_train >= 7)            # label 1: is the digit >= 7?
y_train_odd = (y_train % 2 == 1)          # label 2: is the digit odd?
y_multilabel = np.c_[y_train_large, y_train_odd]

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_multilabel)
knn_clf.predict([some_digit])

Ans: array([[False, True]])   # the digit is not large, but it is odd



NAÏVE BAYES CLASSIFICATION

Classifier Example
Smoothing Technique
Python Example



Naïve Bayes Classifier

Sky   | AirTemp | Humidity | Wind   | Forecast | EnjoySport?
Sunny | Warm    | Normal   | Strong | Same     | Yes
Sunny | Warm    | High     | Strong | Same     | No
Rainy | Cold    | High     | Strong | Change   | No
Sunny | Warm    | Normal   | Breeze | Same     | Yes
Sunny | Hot     | Normal   | Breeze | Same     | No
Rainy | Cold    | High     | Strong | Change   | No
Sunny | Warm    | High     | Strong | Change   | Yes
Rainy | Warm    | Normal   | Breeze | Same     | Yes


Naïve Bayes Classifier

• For a case of N input parameters X = {x_1, x_2, ..., x_N} and an output variable of K classes C = {c_1, c_2, ..., c_K}
• The probability of a class c_j given the input vector, based on Bayes' theorem, is:
  P(c_j | x_1, x_2, ..., x_N) = P(x_1, x_2, ..., x_N | c_j) · P(c_j) / P(x_1, x_2, ..., x_N)

• Assuming the input parameters are conditionally independent given c_j, applying conditional independence:
  P(c_j | x_1, x_2, ..., x_N)
  = P(x_1 | c_j) · P(x_2 | c_j) ··· P(x_N | c_j) · P(c_j) / P(x_1, x_2, ..., x_N)
  = P(c_j) · ∏_i P(x_i | c_j) / P(x_1, x_2, ..., x_N)
• Treatment of numerical values – idea: model each numeric attribute with a Gaussian per class, e.g.

  P(No.of.Friends = 4 | Enjoy = Yes) = (1 / √(2πσ²)) · e^(−(x − μ)² / (2σ²))

  where μ and σ² are the mean and variance of the attribute over the training rows with Enjoy = Yes.

(The EnjoySport training table from the previous slide is repeated here for reference.)
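A minimal sketch of this Gaussian treatment of numeric features using scikit-learn's GaussianNB; the tiny numeric dataset below is a made-up assumption, not taken from the slides.

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical numeric feature: number of friends present, with Enjoy labels
X = np.array([[4], [6], [5], [1], [2], [0]])   # assumed values for illustration
y = np.array([1, 1, 1, 0, 0, 0])               # 1 = Enjoy=Yes, 0 = Enjoy=No

# GaussianNB fits a per-class mean and variance for each numeric feature,
# exactly the P(x | c) = N(mu, sigma^2) idea above
clf = GaussianNB()
clf.fit(X, y)

print(clf.theta_)                   # per-class means (mu)
print(clf.var_)                     # per-class variances (named sigma_ in older scikit-learn versions)
print(clf.predict_proba([[4]]))     # P(Enjoy=No | x=4), P(Enjoy=Yes | x=4)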


Naïve Bayes Classifier

Query X = (Sky = Sunny, AirTemp = Warm, Humidity = Normal, Wind = Strong, Forecast = Change), using the first 7 rows of the EnjoySport table as training data (3 Yes, 4 No). P(X) is common to both classes and can be ignored.

P(Enjoy=Yes | X) = P(X | Enjoy=Yes) · P(Enjoy=Yes) / P(X)
                 ∝ P(X | Enjoy=Yes) · P(Enjoy=Yes)
                 = P(X | Enjoy=Yes) · (3/7)
                 = P(Sunny | Yes) · P(Warm | Yes) · P(Normal | Yes) · P(Strong | Yes) · P(Change | Yes) · (3/7)
                 = (3/3) · (3/3) · (2/3) · (2/3) · (1/3) · (3/7)
                 = 0.0635

P(Enjoy=No | X) = P(X | Enjoy=No) · P(Enjoy=No) / P(X)
                ∝ P(X | Enjoy=No) · P(Enjoy=No)
                = P(X | Enjoy=No) · (4/7)
                = P(Sunny | No) · P(Warm | No) · P(Normal | No) · P(Strong | No) · P(Change | No) · (4/7)
                = (2/4) · (1/4) · (1/4) · (3/4) · (2/4) · (4/7)
                = 0.006696

Since P(Enjoy=Yes | X) > P(Enjoy=No | X)  =>  EnjoySport = Yes

Sky   | AirTemp | Humidity | Wind   | Forecast | EnjoySport?
Sunny | Warm    | Normal   | Strong | Same     | Yes
Sunny | Warm    | High     | Strong | Same     | No
Rainy | Cold    | High     | Strong | Change   | No
Sunny | Warm    | Normal   | Breeze | Same     | Yes
Sunny | Hot     | Normal   | Breeze | Same     | No
Rainy | Cold    | High     | Strong | Change   | No
Sunny | Warm    | High     | Strong | Change   | Yes
Sunny | Warm    | Normal   | Strong | Change   | ????
Rainy | Warm    | Normal   | Breeze | Same     | ????


Naïve Bayes Classifier

Query X = (Sky = Rainy, AirTemp = Warm, Humidity = Normal, Wind = Breeze, Forecast = Same). No Yes row has Sky = Rainy, so Laplace smoothing is applied to the Sky attribute by adding one virtual row per (Sky value, class) combination: (Rainy, Yes), (Sunny, Yes), (Rainy, No), (Sunny, No).

P(Enjoy=Yes | X) = P(X | Enjoy=Yes) · P(Enjoy=Yes) / P(X)
                 ∝ P(X | Enjoy=Yes) · (3/7)
                 = P(Rainy | Yes) · P(Warm | Yes) · P(Normal | Yes) · P(Breeze | Yes) · P(Same | Yes) · (3/7)
                 = ((0+1)/(3+2)) · (3/3) · (2/3) · (1/3) · (2/3) · (3/7)

P(Enjoy=No | X) = P(X | Enjoy=No) · P(Enjoy=No) / P(X)
                ∝ P(X | Enjoy=No) · (4/7)
                = P(Rainy | No) · P(Warm | No) · P(Normal | No) · P(Breeze | No) · P(Same | No) · (4/7)
                = ((2+1)/(4+2)) · (1/4) · (1/4) · (1/4) · (2/4) · (4/7)

Sky   | AirTemp | Humidity | Wind   | Forecast | EnjoySport?
Sunny | Warm    | Normal   | Strong | Same     | Yes
Sunny | Warm    | High     | Strong | Same     | No
Rainy | Cold    | High     | Strong | Change   | No
Sunny | Warm    | Normal   | Breeze | Same     | Yes
Sunny | Hot     | Normal   | Breeze | Same     | No
Rainy | Cold    | High     | Strong | Change   | No
Sunny | Warm    | High     | Strong | Change   | Yes
Rainy | Warm    | Normal   | Breeze | Same     | ????

Virtual rows added for smoothing the Sky attribute:
Sky   | EnjoySport?
Rainy | Yes
Sunny | Yes
Rainy | No
Sunny | No


Naïve Bayes Classifier

Query X = (Sky = Rainy, AirTemp = Warm, Humidity = Normal, Wind = Breeze, Forecast = Same), with the smoothed Sky counts:

P(Enjoy=Yes | X) = P(X | Enjoy=Yes) · P(Enjoy=Yes) / P(X)
                 ∝ P(X | Enjoy=Yes) · (3/7)
                 = P(Rainy | Yes) · P(Warm | Yes) · P(Normal | Yes) · P(Breeze | Yes) · P(Same | Yes) · (3/7)
                 = (1/5) · (3/3) · (2/3) · (1/3) · (2/3) · (3/7)
                 = 0.0127

P(Enjoy=No | X) = P(X | Enjoy=No) · P(Enjoy=No) / P(X)
                ∝ P(X | Enjoy=No) · (4/7)
                = P(Rainy | No) · P(Warm | No) · P(Normal | No) · P(Breeze | No) · P(Same | No) · (4/7)
                = (3/6) · (1/4) · (1/4) · (1/4) · (2/4) · (4/7)
                = 0.0023

Since P(Enjoy=Yes | X) > P(Enjoy=No | X)  =>  EnjoySport = Yes

Sky   | AirTemp | Humidity | Wind   | Forecast | EnjoySport?
Sunny | Warm    | Normal   | Strong | Same     | Yes
Sunny | Warm    | High     | Strong | Same     | No
Rainy | Cold    | High     | Strong | Change   | No
Sunny | Warm    | Normal   | Breeze | Same     | Yes
Sunny | Hot     | Normal   | Breeze | Same     | No
Rainy | Cold    | High     | Strong | Change   | No
Sunny | Warm    | High     | Strong | Change   | Yes
Sunny | Warm    | Normal   | Strong | Change   | ????
Rainy | Warm    | Normal   | Breeze | Same     | ????


Naïve Bayes Classifier
Applications in Natural Language Processing

Table 1 – word counts per document:
Politics | Sports | Technology | Celebrity | Trendy?
10       | 0      | 0          | 5         | Yes
1        | 5      | 5          | 0         | No
5        | 6      | 10         | 0         | No
0        | 20     | 0          | 10        | Yes
????

Table 2 – word presence (binary) per document:
Politics | Sports | Technology | Celebrity | Trendy?
1        | 0      | 0          | 1         | Yes
0        | 1      | 1          | 0         | No
1        | 1      | 1          | 0         | No
0        | 1      | 0          | 1         | Yes
????

Total word counts per class (from Table 1):
Politics | Sports | Technology | Celebrity | Trendy?
10       | 20     | 0          | 15        | Yes  (total = 45)
6        | 11     | 15         | 0         | No   (total = 32)

Estimates from the binary view:          Estimates from the word-count view:
P(Trendy=Yes) = 2/4                      P(Trendy=Yes) = 2/4
P(Trendy=No)  = 2/4                      P(Trendy=No)  = 2/4
P(Politics | Trendy=Yes) = 1/2           P(Politics | Trendy=Yes) = 10/45
P(Politics | Trendy=No)  = 1/2           P(Politics | Trendy=No)  = 6/32
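A minimal sketch of the word-count view with scikit-learn's MultinomialNB; the query document's counts are an assumption, and the default alpha=1 adds Laplace smoothing, so the fitted probabilities differ slightly from the unsmoothed hand estimates above.

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Word counts per document: [Politics, Sports, Technology, Celebrity]
X = np.array([[10, 0, 0, 5],
              [1, 5, 5, 0],
              [5, 6, 10, 0],
              [0, 20, 0, 10]])
y = np.array([1, 0, 0, 1])          # 1 = Trendy, 0 = Not trendy

# MultinomialNB estimates P(word | class) from the per-class word totals
# (alpha=1 applies Laplace smoothing by default)
clf = MultinomialNB(alpha=1.0)
clf.fit(X, y)

# Hypothetical new document with some Politics and Celebrity words
print(clf.predict([[3, 0, 0, 2]]))
print(clf.predict_proba([[3, 0, 0, 2]]))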


Naïve Bayes Classifier
Another Approach of smoothing
P(Trendy=Yes | X) = P(X | Trendy=Yes) · P(Trendy=Yes) / P(X)
                  ∝ P(X | Trendy=Yes) · (2/4)
                  = P(Politics | Yes) · P(Sports | Yes) · P(Technology | Yes) · P(Celebrity | Yes) · (2/4)
                  = (1/2) · (1/2) · (0/2) · (2/2) · (2/4)     <- zero count for Technology

P(Trendy=No | X) = P(X | Trendy=No) · P(Trendy=No) / P(X)
                 ∝ P(X | Trendy=No) · (2/4)
                 = P(Politics | No) · P(Sports | No) · P(Technology | No) · P(Celebrity | No) · (2/4)
                 = (1/2) · (2/2) · (2/2) · (0/2) · (2/4)      <- zero count for Celebrity

Binary data:
Politics | Sports | Technology | Celebrity | Trendy?
1        | 0      | 0          | 1         | Yes
0        | 1      | 1          | 0         | No
1        | 1      | 1          | 0         | No
0        | 1      | 0          | 1         | Yes
????

Tech | Trendy?   (column augmented with virtual rows to remove the zero count)
0    | Yes
1    | No
1    | No
0    | Yes
1    | Yes
1    | Yes
1    | Yes
1    | Yes
1    | No
1    | No
1    | No
1    | No


Naïve Bayes Classifier
Another Approach of smoothing
With the smoothed counts:

P(Trendy=Yes | X) = P(X | Trendy=Yes) · P(Trendy=Yes) / P(X)
                  ∝ P(X | Trendy=Yes) · P(Trendy=Yes)
                  = P(Politics | Yes) · P(Sports | Yes) · P(Technology | Yes) · P(Celebrity | Yes) · P(Trendy=Yes)
                  = ((1+1)/(2+4)) · ((1+1)/(2+4)) · ((0+1)/(2+4)) · ((2+1)/(2+4)) · ((2+4)/(4+8))

P(Trendy=No | X) = P(X | Trendy=No) · P(Trendy=No) / P(X)
                 ∝ P(X | Trendy=No) · P(Trendy=No)
                 = P(Politics | No) · P(Sports | No) · P(Technology | No) · P(Celebrity | No) · P(Trendy=No)
                 = (2/6) · (3/6) · (3/6) · (1/6) · (6/12)

Binary data:
Politics | Sports | Technology | Celebrity | Trendy?
1        | 0      | 0          | 1         | Yes
0        | 1      | 1          | 0         | No
1        | 1      | 1          | 0         | No
0        | 1      | 0          | 1         | Yes
????

(Tech | Trendy? column with the added virtual rows, as on the previous slide.)


M5-#P1

Fit a Naïve Bayes classification model for the following data and verify the model performance metric.
Use the Bayes classifier to classify the new input:
• Input : {Round, Black, Small}
• Use Laplace smoothing for empty (zero-count) sets

Shape Colour Size Action


Round Green Small Reject
Square Black Big Allow
Square Brown Big Allow
Round Brown Small Reject
Square Green Big Allow
Square Brown Small Reject
Oval Green Big Reject
Oval Brown Small Allow
Oval Green Small Reject

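A hedged scikit-learn sketch for checking M5-#P1; the integer encoding and the choice of CategoricalNB are assumptions, and the exercise itself expects a hand calculation with Laplace smoothing.

import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Training data from M5-#P1: (Shape, Colour, Size) -> Action
X_raw = [["Round", "Green", "Small"], ["Square", "Black", "Big"],
         ["Square", "Brown", "Big"],  ["Round", "Brown", "Small"],
         ["Square", "Green", "Big"],  ["Square", "Brown", "Small"],
         ["Oval", "Green", "Big"],    ["Oval", "Brown", "Small"],
         ["Oval", "Green", "Small"]]
y = ["Reject", "Allow", "Allow", "Reject", "Allow", "Reject", "Reject", "Allow", "Reject"]

# Encode the categorical attributes as integers for CategoricalNB
enc = OrdinalEncoder()
X = enc.fit_transform(X_raw)

# alpha=1 applies Laplace smoothing, as the problem statement asks
clf = CategoricalNB(alpha=1.0)
clf.fit(X, y)

# Classify the new input {Round, Black, Small}
x_new = enc.transform([["Round", "Black", "Small"]])
print(clf.predict(x_new), clf.predict_proba(x_new))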


LOGISTIC REGRESSION CLASSIFICATION

Classifier Intuition
Numerical Classifier Example
Python Example (Please watch the uploaded Virtual Lab demo)



Logistic Regression
CGPA | IQ  | Job Offered
5.5  | 100 | 1
5    | 105 | 0
8    | 90  | 1
9    | 105 | 1
6    | 120 | 0
7.5  | 110 | 0
4.5  | 80  | 0
7    | 90  | 1
8.5  | 95  | ????
6    | 130 | ????

Discriminative view (Job vs CGPA): a decision boundary separates y = 1 from y = 0.

Predict y = 1 if  θ_0 + Σ_i θ_i x_i ≥ 0
Predict y = 0 if  θ_0 + Σ_i θ_i x_i < 0


Logistic Regression

Discriminative view (Job vs CGPA): the linear score θ_0 + Σ_i θ_i x_i ranges over Y = −∞ to Y = +∞, while the desired output is a probability between P = 0 and P = 1; the decision boundary (y = 0 vs y = 1) sits where the score crosses 0.


Logistic Regression

Discriminative view (Job vs CGPA):

Linear model:              Y = θ_0 + Σ_i θ_i x_i

Logit (log-odds) model:    ln( p / (1 − p) ) = θ_0 + Σ_i θ_i x_i

θ_0 + Σ_i θ_i x_i ≥ 0  ⇔  p ≥ 0.5  ⇔  CGPA ≥ 6.5 (in this example)
θ_0 + Σ_i θ_i x_i < 0  ⇔  p < 0.5  ⇔  CGPA < 6.5


Logistic Regression

• At the decision boundary, the output of logistic regression is 0.5

• h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2)
  – e.g., θ_0 = −3, θ_1 = 1, θ_2 = 1 (the two features in the figure are Tumor Size and Age)

• Predict "y = 1" if −3 + x_1 + x_2 ≥ 0, i.e. the decision boundary is the line x_1 + x_2 = 3
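A tiny sketch of this decision rule with the example weights θ = (−3, 1, 1); the two test points are assumptions.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

theta = np.array([-3.0, 1.0, 1.0])        # theta_0, theta_1, theta_2 from the example

def predict(x1, x2):
    score = theta[0] + theta[1] * x1 + theta[2] * x2
    p = sigmoid(score)                    # probability that y = 1
    return int(score >= 0), p             # predict 1 exactly when the score is >= 0

print(predict(1, 1))   # score = -1 -> (0, ~0.27)
print(predict(2, 2))   # score =  1 -> (1, ~0.73)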


Logistic Regression

• Training set: m examples, each with n features

• How to choose the parameters (feature weights)?


Logistic Regression

• Training set:
• How to choose parameters (feature weights)?

Using the squared-error cost with the sigmoid hypothesis gives a "non-convex" objective, whereas the cross-entropy cost used for logistic regression is "convex".


Logistic regression cost function (cross entropy)

If y = 1:   Cost(h_θ(x), y) = −log(h_θ(x))
            (Cost → 0 as h_θ(x) → 1, and Cost → ∞ as h_θ(x) → 0)


Logistic regression cost function

If y = 0:   Cost(h_θ(x), y) = −log(1 − h_θ(x))
            Cost = 0 if y = 0 and h_θ(x) = 0; Cost → ∞ as h_θ(x) → 1


Cost function

To fit the parameters θ : apply the Gradient Descent algorithm to minimise J(θ).

To make a prediction for a new input x :

Output : h_θ(x) = 1 / (1 + e^(−θᵀx)), interpreted as P(y = 1 | x; θ)


Gradient Descent Algorithm

J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]

Goal: min_θ J(θ)

Repeat {
    θ_j := θ_j − α · ∂J(θ)/∂θ_j
}

where  ∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) ) x_j^(i)
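A minimal NumPy sketch of this update rule (batch gradient descent for logistic regression); the learning rate, iteration count and toy dataset are arbitrary assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.3, n_iters=1000):
    """Batch gradient descent on the cross-entropy cost J(theta)."""
    m, n = X.shape
    Xb = np.c_[np.ones(m), X]            # prepend x0 = 1 for the intercept theta_0
    theta = np.zeros(n + 1)
    for _ in range(n_iters):
        h = sigmoid(Xb @ theta)          # h_theta(x) for every training example
        grad = (Xb.T @ (h - y)) / m      # (1/m) * sum (h - y) * x_j
        theta -= alpha * grad            # theta_j := theta_j - alpha * dJ/dtheta_j
    return theta

# Tiny illustrative dataset (assumed): one feature, binary label
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
print(fit_logistic(X, y))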


Logistic Regression – Fit a Model

Hyperparameters:
  Learning rate = 0.3
  Initial weights = (0.5, 0.5, 0.5)
  Regularization constant = 0

CGPA | IQ (scaled) | IQ  | Job Offered
5.5  | 6.7         | 100 | 1
5    | 7           | 105 | 0
8    | 6           | 90  | 1
9    | 7           | 105 | 1
6    | 8           | 120 | 0
7.5  | 7.3         | 110 | 0

One gradient-descent step with w0 = 0.5, w1 = 0.5, w2 = 0.5 (features: x1 = CGPA, x2 = IQ scaled):

wᵀx | h(x) | (h(x)−y)·x0 | (h(x)−y)·x1 | (h(x)−y)·x2
6.6 | ≈1   | 0           | 0           | 0
6.5 | ≈1   | 1           | 5           | 7
7.5 | ≈1   | 0           | 0           | 0
8.5 | ≈1   | 0           | 0           | 0
7.5 | ≈1   | 1           | 6           | 8
7.9 | ≈1   | 1           | 7.5         | 7.3

Learning rate × mean error term:   0.15 | 0.925 | 1.115
New weights (w − LR × mean error): 0.4 | −0.4 | −0.6 (rounded)
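A small NumPy check of this single step, reusing the sigmoid / gradient logic sketched earlier; the unrounded new weights come out ≈ (0.35, −0.42, −0.61), which the slide rounds.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training rows from the slide: x1 = CGPA, x2 = IQ (scaled), y = Job Offered
X = np.array([[5.5, 6.7], [5, 7], [8, 6], [9, 7], [6, 8], [7.5, 7.3]])
y = np.array([1, 0, 1, 1, 0, 0])

alpha = 0.3
w = np.array([0.5, 0.5, 0.5])            # (w0, w1, w2) initial weights
Xb = np.c_[np.ones(len(X)), X]           # prepend x0 = 1

h = sigmoid(Xb @ w)                      # ~[1, 1, 1, 1, 1, 1], as in the table
grad = (Xb.T @ (h - y)) / len(y)         # mean error terms per weight
w_new = w - alpha * grad

print(np.round(alpha * grad, 3))         # ~[0.15, 0.924, 1.113]
print(np.round(w_new, 2))                # ~[0.35, -0.42, -0.61]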


Logistic Regression – Inference & Interpretation

Model after training (from the slide): score = 0.4 + 0.3·CGPA − 0.45·IQ

CGPA | IQ (scaled) | IQ  | Job Offered
5.5  | 6.7         | 100 | 1
5    | 7           | 105 | 0
8    | 6           | 90  | 1
9    | 7           | 105 | 1
6    | 8           | 120 | 0
7.5  | 7.3         | 110 | 0

Predict the Job Offered outcome for a candidate with (CGPA, IQ) = (5, 6):
  h(x) = σ(0.4 + 0.3·5 − 0.45·6) = 0.31
  Y-Predicted = 0 / No

Note :
The exponential of a regression coefficient (e^(w_CGPA)) is the odds ratio associated with a one-unit increase in CGPA:
the odds of being offered a job increase by a factor of e^0.3 ≈ 1.35 for every unit increase in CGPA.
[In statsmodels: np.exp(model.params)]


Logistic regression (Classification)

• Model
  h_θ(x) = P(Y = 1 | X_1, X_2, ..., X_n) = 1 / (1 + e^(−θᵀx))

• Cost function
  J(θ) = (1/m) Σ_{i=1}^{m} Cost(h_θ(x^(i)), y^(i)),  where
  Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1,  and −log(1 − h_θ(x)) if y = 0

• Learning
  Gradient descent: Repeat { θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i) }

• Inference
  Y = h_θ(x_test) = 1 / (1 + e^(−θᵀ x_test))

Note:
• σ(t) < 0.5 when t < 0, and σ(t) ≥ 0.5 when t ≥ 0, so a logistic model predicts 1 if xᵀθ is positive and 0 if it is negative
• logit(p) = log(p / (1 − p)) is the inverse of the logistic function: if you compute the logit of the estimated probability p, you get back t. The logit is also called the log-odds


Overfitting vs Underfitting

Overfitting: fitting the data too well
  – Features are noisy / uncorrelated to the concept

Underfitting: learning too little of the true concept
  – Features don't capture the concept
  – Too much bias in the model

(Figure: example of a fitted curve on data in the unit square.)


Overfitting

• There are two main options to address the issue of overfitting:


• 1) Reduce the number of features:
– Manually select which features to keep.
– Use a model selection algorithm
• 2) Regularization
– Keep all the features, but reduce the magnitude of parameters θ
– Regularization works well when we have a lot of features, each of which contributes a bit to predicting "y"
Ways to Control Overfitting

• Regularization: add a penalty on the weight magnitudes to the loss

  Loss(S) = Σ_{i=1}^{n} Loss(ŷ_i, y_i) + α Σ_{j=1}^{#weights} |θ_j|

Note:
The hyperparameter controlling the regularization strength of a Scikit-Learn LogisticRegression model is not
alpha (as in other linear models), but its inverse, C. The higher the value of C, the less the model is
regularized.
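A minimal sketch of the C hyperparameter in scikit-learn; the tiny dataset and the C values are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny illustrative dataset (assumed): two features, binary label
X = np.array([[10, 25], [30, 80], [50, 60], [35, 10], [25, 50]], dtype=float)
y = np.array([1, 1, 0, 0, 1])

# Smaller C => stronger regularization (C is the inverse of alpha)
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    print(C, clf.coef_, clf.intercept_)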
Regularization
Ridge Regression / Tikhonov regularization



Regularization
Lasso Regression (Least Absolute Shrinkage and Selection Operator Regression)

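As a reminder (standard definitions, not taken from these slides), the two regularized costs are:

  Ridge (L2 / Tikhonov):  J(θ) = MSE(θ) + α Σ_{j=1}^{n} θ_j²
  Lasso (L1):             J(θ) = MSE(θ) + α Σ_{j=1}^{n} |θ_j|

Ridge shrinks all weights smoothly, while the L1 penalty of Lasso tends to drive some weights exactly to zero, performing feature selection.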


M5-#P2

Fit a classification model for the following data and verify the model performance metric using a
confusion matrix on the training set.
Use logistic regression with learning rate = 0.05 to predict the Buy-Preference for the new
observation (40, 60).

Risk (%) Discount Rate (%) Buy-Preference


10 25 Yes
30 80 Yes
50 60 No
35 10 No
25 50 Yes

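A hedged scikit-learn sketch for M5-#P2; note that sklearn's LogisticRegression uses its own solver rather than the plain gradient-descent / learning-rate recipe above, so this only illustrates the workflow.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Training data from M5-#P2: Risk (%), Discount Rate (%) -> Buy-Preference
X = np.array([[10, 25], [30, 80], [50, 60], [35, 10], [25, 50]], dtype=float)
y = np.array([1, 1, 0, 0, 1])          # 1 = Yes, 0 = No

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Confusion matrix on the training set
print(confusion_matrix(y, clf.predict(X)))

# Predict Buy-Preference for the new observation (40, 60)
print(clf.predict([[40, 60]]), clf.predict_proba([[40, 60]]))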


Prediction – Multi class Classification
One Vs All Strategy

One Vs All (OvA / OvR): train one binary classifier per class, each separating that class from all the others.

Class 1, Class 2, Class 3:   h_θ^(i)(x) = P(y = i | x; θ)   for i = 1, 2, 3

For an input x, predict the class i with the largest score:  max_i h_θ^(i)(x)

Note: Scikit-Learn detects when you try to use a binary classification algorithm for a multi-class classification task, and it automatically runs OvA (except for SVM classifiers, for which it uses OvO).


Prediction – Multi class Classification
One Vs One Strategy

One Vs One (OvO): train one binary classifier for every pair of classes; with N classes this gives N × (N − 1) / 2 classifiers, and the class that wins the most pairwise comparisons is predicted.

Class 1, Class 2, Class 3:   h_θ^(i)(x) = P(y = i | x; θ)   for i = 1, 2, 3

For an input x, predict:  max_i h_θ^(i)(x)
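A small sketch of selecting either strategy explicitly in scikit-learn; the iris dataset is just a convenient assumption with 3 classes.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = load_iris(return_X_y=True)      # 3 classes

ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ova.estimators_))            # 3 classifiers: one per class
print(len(ovo.estimators_))            # 3 * (3 - 1) / 2 = 3 pairwise classifiers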
Tuning the Training – Eg., Binary Class
Confusion Matrix

Refer to Chapter 3, page 102 onward, to read the interpretation of the confusion matrix for multi-class logistic regression.

from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import confusion_matrix
import numpy as np

logreg = LogisticRegression()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Cross-validated predictions on the training set
# (X_train_scaled is assumed to be X_train after feature scaling)
predicted = cross_val_predict(logreg, X_train_scaled, y_train, cv=3)

print(metrics.accuracy_score(y_train, predicted))
print(metrics.classification_report(y_train, predicted))

logreg.fit(X_train_scaled, y_train)
score = logreg.score(X_test, y_test)   # X_test should be scaled with the same scaler
print('Test Accuracy Score', score)
-----------------------------------------------------------------------------------------------------------
# Thresholding the predicted probability at 0.75 instead of the default 0.5
probs = logreg.predict_proba(X)[:, 1]
preds = np.where(probs > 0.75, 1, 0)
confusion_matrix(y, preds)



Inbuilt Solvers in Sklearn package
(for your Python Lab Inputs)

Common Experimental Observations:


Liblinear – a coordinate-descent variant; efficient for high-dimensional and smaller datasets
Lbfgs – saves memory; not well suited to very large datasets
Newton-cg – slow for large datasets
Sag – a Stochastic Average Gradient descent variant; fast for large datasets
Saga – slightly faster than Sag
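These solver names are passed through the solver argument of LogisticRegression; a minimal sketch follows (the breast-cancer dataset is an assumed example, and sag / saga converge best on scaled features).

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)   # a binary classification dataset
for solver in ("liblinear", "lbfgs", "newton-cg", "sag", "saga"):
    # max_iter is raised because sag / saga converge slowly on unscaled features
    clf = LogisticRegression(solver=solver, max_iter=10000)
    print(solver, clf.fit(X, y).score(X, y))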



Next Class Plan

Module 6
Decision Tree Classifiers
Ensemble Methods Start

