Part 5 Classification

Classification

Dr Chiradeep Mukherjee,
Associate Professor,
Department of CST and CSIT,
University of Engineering & Management Kolkata
Introduction to Classification
Examples:
i) Classifying email as SPAM / NOT SPAM
ii) Online transaction, whether it is FRAUDULENT – YES / NO
iii) Tumor: MALIGNANT / BENIGN

Output or TARGET, yi ∈ {0, 1}:
0: Negative Class (→ Benign)
1: Positive Class (→ Malignant)

Dataset (fed to the Learning Algorithm):
Tumor Size, xi | Malignant?, yi
23             | N
20             | N
…              | …
51             | Y

If we use a STRAIGHT LINE: hθ(x) = θᵀx.
A threshold of 0.5 may be used on the classifier output hθ(x):
→ if hθ(x) ≥ 0.5, predict y = 1 (YES)
→ if hθ(x) < 0.5, predict y = 0 (NO)
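The threshold rule above can be sketched as follows. This is a minimal illustration, assuming made-up tumor-size data and an ordinary least-squares fit for the straight line hθ(x):

```python
import numpy as np

# Illustrative tumor-size data (values are made up for this sketch)
x = np.array([20.0, 23.0, 30.0, 45.0, 51.0])
y = np.array([0, 0, 0, 1, 1])  # 0 = benign (N), 1 = malignant (Y)

# Fit h_theta(x) = theta0 + theta1*x via least squares
X = np.column_stack([np.ones_like(x), x])  # rows of [1, x]
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(size):
    """Threshold the linear output h_theta(x) at 0.5."""
    h = theta[0] + theta[1] * size
    return 1 if h >= 0.5 else 0

print(predict(22))  # small tumor -> 0 (benign)
print(predict(50))  # large tumor -> 1 (malignant)
```

This works on well-behaved data; the next slide shows why the straight line breaks down.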
Introduction to Classification
Now: if the tumor size becomes EXCESSIVELY LARGE, as at point A, then the linear prediction hθ(x) = θᵀx IS NOT SUITABLE: the outlier tilts the fitted line, so some malignant points are NOT PREDICTED AS YES.

[Plot: Malignant? (0 = No, 1 = Yes) against tumor size, with the fitted line hθ(x) = θᵀx and the outlier point A far to the right.]

We will use the Sigmoid (logistic) function to avoid this problem:
σ(x) = 1 / (1 + e^(−x))
Another Interpretation of Classification
For linear regression, the hypothesis function is hθ(x) = θ0 + θ1x, and its value ranges over (−∞, +∞). Consider the following mapping for better understanding.

In the case of logistic regression, the hθ(x) value should lie between 0 and 1. SO WE NEED A TRANSFORMATION FUNCTION THAT CONVERTS (−∞, +∞) TO (0, 1):
−∞ → 0
+∞ → 1
HOW?

Solution: apply σ to hθ(x) = θ0 + θ1x:
σ(hθ(x)) = 1 / (1 + e^(−hθ(x)))
σ stands for the Sigmoid function: σ(x) = 1 / (1 + e^(−x)).

For a very high value hθ(x) = 10000: σ(10000) = 1 / (1 + e^(−10000)) ≈ 1.
For a very low value hθ(x) = −10000: σ(−10000) = 1 / (1 + e^(10000)) ≈ 0.
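The squashing behaviour can be checked directly. This sketch uses a numerically stable form of the sigmoid (the naive `1/(1+exp(-z))` overflows for very negative z):

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid: 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For large negative z, e^(-z) overflows; use the equivalent e^z / (1 + e^z)
    e = math.exp(z)
    return e / (1.0 + e)

print(sigmoid(10000))   # -> 1.0  (very high h_theta(x))
print(sigmoid(-10000))  # -> 0.0  (very low h_theta(x))
print(sigmoid(0))       # -> 0.5
```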
Interpretation of Hypothesis
→ hθ(x) is the ESTIMATED PROBABILITY THAT y = 1 ON INPUT x.
Example: Consider the hypothesis hθ(x) = θ0 + θ1x, where x is the tumor size. If hθ(x) = 70%, then TELL THE PATIENT THAT THERE IS A 70% CHANCE OF THE TUMOR BEING MALIGNANT.
Brief: a patient comes; measure the tumor size x; pass it through the sigmoid, σ(hθ(x)) = 1 / (1 + e^(−hθ(x))) = 0.7; measure against yi.

Notation: We know y = 0.7, so y = hθ(x) = p(y=1 | x; θ) = 0.7. Read it as: the probability that y = 1 given x, parameterized by θ.
In general, p(y=1 | x; θ) + p(y=0 | x; θ) = 1, or p(y=0 | x; θ) = 1 − p(y=1 | x; θ).
Numerical Problem on Classification
The pass/fail dataset of 5 students in an examination is given below. Logistic regression is used as the classifier, and the model suggested by the optimizer gives the following log-ODDS of passing the course:
log(ODDS) = −64 + 2·hours.
i) Calculate the "probability of pass" for the student who studied 33 hours. ii) At least how many hours should a student study to pass the course with probability more than 95%? iii) Verify the solution.

Dataset:
Hours (Studied) | Result (1 = Pass, 0 = Fail)
28              | 0
15              | 0
30              | 1
28              | 1
39              | 1

Solution: i) According to the sigmoid function, we can write:
p(y; x, θ) = 1 / (1 + e^(−z)), here z = −64 + 2·hours.
Here, hours = 33. So, z = −64 + 2·33 = −64 + 66 = +2.
We put z in p(y; x, θ) as follows:
p(y; x, θ) = 1 / (1 + e^(−2)) = 1 / (1 + 0.135) = 1 / 1.135 ≈ 0.88.
Therefore, a student who studied for 33 hours has an 88% chance of passing the course.
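Part i) can be reproduced in a few lines, plugging the fitted log-odds into the sigmoid:

```python
import math

def p_pass(hours):
    """Probability of passing, from log(ODDS) = -64 + 2*hours."""
    z = -64 + 2 * hours
    return 1 / (1 + math.exp(-z))

print(round(p_pass(33), 2))  # z = +2, so p -> 0.88
```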
Numerical Problem on Classification…contd
Solution: ii) According to the question statement, we can write:
p(y; x, θ) = 0.95.
Again, we know p(y; x, θ) = 1 / (1 + e^(−z)).
So, 0.95 = 1 / (1 + e^(−z)), or 0.95·(1 + e^(−z)) = 1, or 1 + e^(−z) = 1/0.95 = 1.0526, or e^(−z) = 1.0526 − 1 = 0.0526, or ln(e^(−z)) = ln(0.0526), or −z = −2.94, or z = 2.94.
The log-odds equation is given as log(odds) = −64 + 2·hours.
Or, z = −64 + 2·hours, or 2.94 = −64 + 2·hours, or hours = 66.94/2 = 33.47.
Therefore, a student has to study at least 33.47 hours to pass the course with probability 95%.
iii) We know z = −64 + 2·hours = −64 + 2·33.5 = −64 + 67 = 3.
So, p(y; x, θ) = 1 / (1 + e^(−3)) = 0.953.
It shows around 95% probability with 33.5 study hours. Hence, verified.
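Parts ii) and iii) amount to inverting the sigmoid (the logit) and checking the result:

```python
import math

# Part ii: invert the sigmoid to find z at p = 0.95
p = 0.95
z = math.log(p / (1 - p))      # logit: z = ln(p / (1 - p))
hours = (64 + z) / 2           # solve z = -64 + 2*hours for hours
print(round(z, 2), round(hours, 2))   # -> 2.94 33.47

# Part iii: verify at 33.5 hours
z_check = -64 + 2 * 33.5       # = 3
p_check = 1 / (1 + math.exp(-z_check))
print(round(p_check, 3))       # -> 0.953, i.e. about 95%
```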
Decision Boundary
The decision boundary helps to separate probabilities into the positive class and the negative class.
Example 1: Consider the hypothesis hθ(x) = σ(z) = σ(θᵀx) = σ(θ0 + θ1x1 + θ2x2). If θ0 = −3, θ1 = θ2 = 1, then draw the DECISION BOUNDARY.
Solution: z = −3 + x1 + x2.
Make z = 0, i.e. x1 + x2 = 3.
(Instead of a single x we take x1, x2, to indicate multivariate data.)
x1 + x2 < 3 → Class 0
x1 + x2 ≥ 3 → Class 1

[Plot: x1 vs. x2, with the line x1 + x2 = 3 as the Decision Boundary; Class 0 below the line, Class 1 above it.]

It defines the equation of a STRAIGHT LINE (a LINEAR DECISION BOUNDARY).
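The boundary rule of Example 1 in code, checking one point on each side of the line x1 + x2 = 3:

```python
# Example 1: theta = [-3, 1, 1]
# Predict class 1 when z = -3 + x1 + x2 >= 0, i.e. x1 + x2 >= 3
def predict(x1, x2):
    z = -3 + x1 + x2
    return 1 if z >= 0 else 0

print(predict(1, 1))  # x1 + x2 = 2 < 3  -> class 0
print(predict(2, 2))  # x1 + x2 = 4 >= 3 -> class 1
```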
Decision Boundary
Example 2: Consider the hypothesis hθ(x) = σ(θ0 + θ1x1 + θ2x2) with two features. If θ0 = 5, θ1 = −1, θ2 = 0, then draw the DECISION BOUNDARY.
Solution: z = 5 − x1.
Make z = 0, i.e. x1 = 5.
Predict y = 1 if z ≥ 0, i.e. 5 − x1 ≥ 0, i.e. x1 ≤ 5, and y = 0 if x1 > 5.
[Plot: x1 vs. x2, with the vertical line x1 = 5 as the boundary; predict y = 1 on one side, y = 0 on the other.]
It defines the equation of a STRAIGHT LINE (a LINEAR DECISION BOUNDARY).

Example 3: Consider the hypothesis hθ(x) = σ(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x2²). If θ0 = −1, θ1 = 0, θ2 = 0, θ3 = 1, θ4 = 1, then draw the DECISION BOUNDARY.
Solution: z = −1 + x1² + x2².
Make z = 0, i.e. x1² + x2² = 1.
Predict y = 1 if x1² + x2² ≥ 1 and y = 0 if x1² + x2² < 1.
[Plot: x1 vs. x2, with the unit circle x1² + x2² = 1 as the boundary; y = 0 inside, y = 1 outside.]
It defines the equation of a CIRCLE (a NON-LINEAR DECISION BOUNDARY).
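Example 3's circular boundary in code, checking one point inside and one outside the unit circle:

```python
# Example 3: theta = [-1, 0, 0, 1, 1]
# Predict class 1 when -1 + x1**2 + x2**2 >= 0, i.e. outside the unit circle
def predict(x1, x2):
    return 1 if x1**2 + x2**2 >= 1 else 0

print(predict(0.5, 0.5))  # x1^2 + x2^2 = 0.5, inside  -> class 0
print(predict(1.0, 1.0))  # x1^2 + x2^2 = 2.0, outside -> class 1
```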
Decision Boundary-Definition
While training a classifier on a dataset with a specific classification algorithm, it is required to define a set of hyper-planes, called the Decision Boundary, that separates the data points into specific classes: the boundary is where the algorithm switches from one class to another. On one side of a decision boundary, a data point is more likely to be labelled class A; on the other side of the boundary, it is more likely to be labelled class B.

IMPORTANCE OF DECISION BOUNDARY

A decision boundary is a surface that separates data points belonging to different class labels. Decision boundaries are not confined to just the data points we have provided; they span the entire feature space the model was trained on. The model can predict a value for any possible combination of inputs in our feature space. If the data we train on is not 'diverse', the overall topology of the model will generalize poorly to new instances. So it is important to analyse which models are best suited to a 'diverse' dataset before putting a model into production. Examining decision boundaries is a great way to learn how the training data we select affects performance and the ability of our model to generalize. Visualizing decision boundaries can illustrate how sensitive models are to each dataset, which is a great way to understand how specific algorithms work and their limitations for specific datasets.
Numerical Problem on Classification
We fit a logistic regression model to estimate the log-odds of a baby being born with low birth weight, where a few selected features have been included. The features are: i) smoking, ii) age (numeric), iii) race black (black), iv) race category except black (other), v) LWT (mother's weight in lbs).
The question is to estimate the probability p (probability of a low-birth-weight baby) for a mother who smoked during pregnancy, is aged 35, whose race is white, and whose weight (LWT) is 135 lbs. The fitted model gives the log-odds:
log(ODDS) = 0.332 + 1.05(smoke) − 0.022(age) + 1.23(black) + 0.943(other) − 0.0125(lwt)
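A worked sketch of the calculation, assuming the usual indicator encoding (smoked → smoke = 1; race white → black = 0 and other = 0) and converting the log-odds to a probability with the sigmoid:

```python
import math

# Feature values for the mother in question (indicator encoding assumed)
smoke, age, black, other, lwt = 1, 35, 0, 0, 135

# Fitted model: log-odds of a low-birth-weight baby
log_odds = 0.332 + 1.05*smoke - 0.022*age + 1.23*black + 0.943*other - 0.0125*lwt
p = 1 / (1 + math.exp(-log_odds))   # sigmoid converts log-odds to probability

print(round(log_odds, 4))  # -> -1.0755
print(round(p, 2))         # -> 0.25, i.e. about a 25% chance
```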
Why Classification is called Logistic Regression?
