Part 5 Classification
Dr Chiradeep Mukherjee,
Associate Professor,
Department of CST and CSIT,
University of Engineering & Management Kolkata
Introduction to Classification
Examples:
i) Classifying email as SPAM / NOT SPAM
ii) Online transaction: is it FRAUDULENT? YES / NO
iii) Tumor: MALIGNANT / BENIGN

The output or TARGET takes one of two values, yi ∈ {0, 1}:
0: Negative Class (→ Benign)
1: Positive Class (→ Malignant)

Dataset (fed to the Learning Algorithm):
Tumor size (xi) | Malignant? (yi)
23              | N
20              | N
...             | ...
51              | Y

If we use a STRAIGHT LINE: hθ(x) = θᵀx
[Figure: tumor size on the horizontal axis; Malignant? plotted as 0 (No) / 1 (Yes); a straight line fitted through the points.]

A threshold of 0.5 may be used. Threshold classifier output hθ(x) at 0.5:
→ if hθ(x) ≥ 0.5, predict y = 1 (YES)
→ if hθ(x) < 0.5, predict y = 0 (NO)
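The threshold rule above can be sketched as a small Python function. The parameter values here are hypothetical, chosen only so the toy tumor dataset is classified as shown; they are not from the slides:

```python
import numpy as np

def h_theta(x, theta):
    """Linear hypothesis h_theta(x) = theta^T x."""
    return np.dot(theta, x)

def predict(x, theta, threshold=0.5):
    """Threshold classifier: 1 (YES) if h_theta(x) >= threshold, else 0 (NO)."""
    return 1 if h_theta(x, theta) >= threshold else 0

# Hypothetical parameters (assumed, for illustration only)
theta = np.array([-0.1, 0.02])   # [theta_0, theta_1]
x = np.array([1.0, 51.0])        # [x_0 = 1, x_1 = tumor size]
print(predict(x, theta))         # h = -0.1 + 0.02*51 = 0.92 >= 0.5, so predicts 1
```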
Introduction to Classification
Now:
If the tumor size becomes EXCESSIVELY LARGE, as given by point A, then the prediction hθ(x) = θᵀx IS NOT SUITABLE: refitting the straight line to accommodate point A tilts it, and some malignant points are NO LONGER PREDICTED AS YES.
[Figure: the same dataset with an outlier (point A) far to the right; the refitted straight line misclassifies several positive points.]

In logistic regression, the value of hθ(x) should lie between 0 and 1. SO WE NEED A TRANSFORMATION FUNCTION THAT CONVERTS (-∞, +∞) TO (0, 1):
-∞ → 0
+∞ → 1
hθ(x) HOW?
Solution:
Apply σ: θ0 + θ1x → hθ(x), then σ(hθ(x)) = 1 / (1 + e^(-hθ(x)))
σ stands for the Sigmoid function: σ(x) = 1 / (1 + e^(-x))
For a very high value, hθ(x) = 10000: σ(10000) = 1 / (1 + e^(-10000)) ≈ 1
For a very low value, hθ(x) = -10000: σ(-10000) = 1 / (1 + e^(10000)) ≈ 0
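A minimal sketch of the sigmoid and its saturation behaviour at the extreme values used above:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))        # 0.5: exactly between the two classes
print(sigmoid(10000))    # 1.0: very large input saturates at 1
# numpy emits an overflow warning for exp(10000) but still returns 0.0 below
print(sigmoid(-10000))   # 0.0: very negative input saturates at 0
```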
Interpretation of Hypothesis
→ hθ(x) is the ESTIMATED PROBABILITY THAT y = 1 ON INPUT x.
Example: Consider the hypothesis hθ(x) = σ(θ0 + θ1x1), with the feature vector x = [x0, x1]ᵀ = [1, tumor size]ᵀ. If hθ(x) = 70%, then TELL THE PATIENT THAT THERE IS A 70% CHANCE OF THE TUMOR BEING MALIGNANT.
Brief:
[Figure: a patient comes with tumor size x; the sigmoid σ(hθ(x)) maps it to 0.7, which is then measured against the target yi.]
Notation: We know y = 0.7. So y = hθ(x) = p(y=1 | x; θ) = 0.7. Read it as: the probability that y = 1, given x, parameterized by θ.
In general, p(y=1|x;θ) + p(y=0|x;θ) = 1, or p(y=0|x;θ) = 1 - p(y=1|x;θ).
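The probability interpretation above, sketched in Python. The parameter values are hypothetical, chosen so that the output lands near the 0.7 used in the slide's example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def estimated_probability(x, theta):
    """h_theta(x) = sigmoid(theta^T x) = p(y=1 | x; theta)."""
    return sigmoid(np.dot(theta, x))

# Hypothetical theta (assumed); x = [x0, x1] = [1, tumor size]
theta = np.array([-5.0, 0.12])
x = np.array([1.0, 48.7])

p_malignant = estimated_probability(x, theta)   # approximately 0.70
p_benign = 1.0 - p_malignant                    # p(y=0|x;theta) = 1 - p(y=1|x;theta)
```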
Numerical Problem on Classification
The dataset of pass/fail in an examination for 5 students is given below. If logistic regression is used as the classifier, the model suggested by the optimizer gives the following log-ODDS of passing the course:
log(ODDS) = -64 + 2 * hours.
i) Calculate the "probability of pass" for the student who studied 33 hours.
ii) At least how many hours should a student study to make sure of passing the course with a probability of more than 95%?
iii) Verify the solution.
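A worked sketch of the solution: at 33 hours the log-odds is -64 + 2*33 = 2, so p = σ(2) ≈ 0.88; for p > 0.95 the log-odds must exceed ln(0.95/0.05) = ln 19 ≈ 2.944, i.e. hours > (64 + 2.944)/2 ≈ 33.47:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_pass(hours):
    """Probability of passing from the model log(ODDS) = -64 + 2*hours."""
    log_odds = -64 + 2 * hours
    return sigmoid(log_odds)

# i) Probability of pass after 33 hours of study
p33 = p_pass(33)                           # sigmoid(2), about 0.88

# ii) Hours needed for p > 0.95: solve -64 + 2*h = ln(0.95/0.05)
h_min = (64 + math.log(0.95 / 0.05)) / 2   # about 33.47 hours

# iii) Verify: probability just above h_min exceeds 0.95
print(p33, h_min, p_pass(h_min + 0.01))
```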
Decision Boundary
[Figure: Class 0 and Class 1 data points plotted in the (x1, x2) plane, separated by a straight line.]
The hypothesis defines the equation of a STRAIGHT LINE (a LINEAR DECISION BOUNDARY).
Decision Boundary
Example 2: Consider the hypothesis hθ(x) = σ(θ0 + θ1x1 + θ2x2) with two features. If θ0 = 5, θ1 = -1, θ2 = 0, then draw the DECISION BOUNDARY.
Solution: Make θ0 + θ1x1 + θ2x2 = 0, i.e. 5 - x1 = 0, so x1 = 5: a vertical line in the (x1, x2) plane.

Example 3: Consider the hypothesis hθ(x) = σ(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x2²). If θ0 = -1, θ1 = 0, θ2 = 0, θ3 = 1, θ4 = 1, then draw the DECISION BOUNDARY.
Solution: -1 + x1² + x2² = 0, i.e. x1² + x2² = 1: a circle of radius 1 centered at the origin.
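The two boundaries above can be checked numerically. Note that σ(z) ≥ 0.5 exactly when z ≥ 0, so the boundary is where θᵀx = 0 (a sketch using the parameter values from Examples 2 and 3):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_ex2(x1, x2):
    """Example 2: h = sigmoid(5 - x1). Boundary is the vertical line x1 = 5."""
    return 1 if sigmoid(5 - 1 * x1 + 0 * x2) >= 0.5 else 0

def predict_ex3(x1, x2):
    """Example 3: h = sigmoid(-1 + x1**2 + x2**2). Boundary is the unit circle."""
    return 1 if sigmoid(-1 + x1**2 + x2**2) >= 0.5 else 0

print(predict_ex2(4, 0), predict_ex2(6, 0))   # left of x1=5 -> 1, right -> 0
print(predict_ex3(2, 0), predict_ex3(0, 0))   # outside circle -> 1, inside -> 0
```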
Decision Boundary-Definition
While training a classifier on a dataset using a specific classification algorithm, it is required to define a set of hyperplanes, called the Decision Boundary, that separates the data points into specific classes: it is where the algorithm switches from one class to another. On one side of a decision boundary, a data point is more likely to be labeled as class A; on the other side of the boundary, it is more likely to be labeled as class B.