Part 5 Classification

Classification

Dr Chiradeep Mukherjee,
Associate Professor,
Department of CST and CSIT,
University of Engineering & Management Kolkata
Introduction to Classification
Examples:
i) Classifying email as SPAM / NOT SPAM
ii) Online transaction, whether it is FRAUDULENT – YES / NO
iii) Tumor: MALIGNANT / BENIGN

Output or TARGET, yi ∈ {0, 1}:
0: Negative Class (→ Benign)
1: Positive Class (→ Malignant)

Dataset (fed to the Learning Algorithm):
Tumor Size, xi | Malignant?, yi
23             | N
20             | N
…              | …
51             | Y

If we use a STRAIGHT LINE: hθ(x) = θᵀx.
A threshold of 0.5 may be used on the classifier output hθ(x):
→ if hθ(x) ≥ 0.5, predict y = 1 (YES)
→ if hθ(x) < 0.5, predict y = 0 (NO)
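The threshold rule above can be sketched as follows. This is a minimal illustration, assuming made-up tumor-size data and an ordinary least-squares fit for the straight line hθ(x):

```python
import numpy as np

# Illustrative tumor-size data (values are made up for this sketch)
x = np.array([20.0, 23.0, 30.0, 45.0, 51.0])
y = np.array([0, 0, 0, 1, 1])  # 0 = benign (N), 1 = malignant (Y)

# Fit h_theta(x) = theta0 + theta1*x via least squares
X = np.column_stack([np.ones_like(x), x])  # rows of [1, x]
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(size):
    """Threshold the linear output h_theta(x) at 0.5."""
    h = theta[0] + theta[1] * size
    return 1 if h >= 0.5 else 0

print(predict(22))  # small tumor -> 0 (benign)
print(predict(50))  # large tumor -> 1 (malignant)
```

This works on well-behaved data; the next slide shows why the straight line breaks down.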
Introduction to Classification
Now: if the tumor size becomes EXCESSIVELY LARGE, as at point A, then the linear prediction hθ(x) = θᵀx IS NOT SUITABLE: the outlier tilts the fitted line, so some malignant points are NOT PREDICTED AS YES.

[Plot: Malignant? (0 = No, 1 = Yes) against tumor size, with the fitted line hθ(x) = θᵀx and the outlier point A far to the right.]

We will use the Sigmoid (logistic) function to avoid this problem:
σ(x) = 1 / (1 + e^(−x))
Another Interpretation of Classification
For linear regression, the hypothesis function is hθ(x) = θ0 + θ1x, and its value ranges over (−∞, +∞). Consider the following mapping for better understanding.

In the case of logistic regression, the hθ(x) value should lie between 0 and 1. SO WE NEED A TRANSFORMATION FUNCTION THAT CONVERTS (−∞, +∞) TO (0, 1):
−∞ → 0
+∞ → 1
HOW?

Solution: apply σ to hθ(x) = θ0 + θ1x:
σ(hθ(x)) = 1 / (1 + e^(−hθ(x)))
σ stands for the Sigmoid function: σ(x) = 1 / (1 + e^(−x)).

For a very high value hθ(x) = 10000: σ(10000) = 1 / (1 + e^(−10000)) ≈ 1.
For a very low value hθ(x) = −10000: σ(−10000) = 1 / (1 + e^(10000)) ≈ 0.
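The squashing behaviour can be checked directly. This sketch uses a numerically stable form of the sigmoid (the naive `1/(1+exp(-z))` overflows for very negative z):

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid: 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For large negative z, e^(-z) overflows; use the equivalent e^z / (1 + e^z)
    e = math.exp(z)
    return e / (1.0 + e)

print(sigmoid(10000))   # -> 1.0  (very high h_theta(x))
print(sigmoid(-10000))  # -> 0.0  (very low h_theta(x))
print(sigmoid(0))       # -> 0.5
```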
Interpretation of Hypothesis
→ hθ(x) is the ESTIMATED PROBABILITY THAT y = 1 ON INPUT x.
Example: Consider the hypothesis hθ(x) = θ0 + θ1x, where x is the tumor size. If hθ(x) = 70%, then TELL THE PATIENT THAT THERE IS A 70% CHANCE OF THE TUMOR BEING MALIGNANT.
Brief: a patient comes; measure the tumor size x; pass it through the sigmoid, σ(hθ(x)) = 1 / (1 + e^(−hθ(x))) = 0.7; measure against yi.

Notation: We know y = 0.7, so y = hθ(x) = p(y=1 | x; θ) = 0.7. Read it as: the probability that y = 1 given x, parameterized by θ.
In general, p(y=1 | x; θ) + p(y=0 | x; θ) = 1, or p(y=0 | x; θ) = 1 − p(y=1 | x; θ).
Numerical Problem on Classification
The pass/fail dataset of 5 students in an examination is given below. Logistic regression is used as the classifier, and the model suggested by the optimizer gives the following log-ODDS of passing the course:
log(ODDS) = −64 + 2·hours.
i) Calculate the "probability of pass" for the student who studied 33 hours. ii) At least how many hours should a student study to pass the course with probability more than 95%? iii) Verify the solution.

Dataset:
Hours (Studied) | Result (1 = Pass, 0 = Fail)
28              | 0
15              | 0
30              | 1
28              | 1
39              | 1

Solution: i) According to the sigmoid function, we can write:
p(y; x, θ) = 1 / (1 + e^(−z)), here z = −64 + 2·hours.
Here, hours = 33. So, z = −64 + 2·33 = −64 + 66 = +2.
We put z in p(y; x, θ) as follows:
p(y; x, θ) = 1 / (1 + e^(−2)) = 1 / (1 + 0.135) = 1 / 1.135 ≈ 0.88.
Therefore, a student who studied for 33 hours has an 88% chance of passing the course.
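Part i) can be reproduced in a few lines, plugging the fitted log-odds into the sigmoid:

```python
import math

def p_pass(hours):
    """Probability of passing, from log(ODDS) = -64 + 2*hours."""
    z = -64 + 2 * hours
    return 1 / (1 + math.exp(-z))

print(round(p_pass(33), 2))  # z = +2, so p -> 0.88
```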
Numerical Problem on Classification…contd
Solution: ii) According to the question statement, we can write:
p(y; x, θ) = 0.95.
Again, we know p(y; x, θ) = 1 / (1 + e^(−z)).
So, 0.95 = 1 / (1 + e^(−z)), or 0.95·(1 + e^(−z)) = 1, or 1 + e^(−z) = 1/0.95 = 1.0526, or e^(−z) = 1.0526 − 1 = 0.0526, or ln(e^(−z)) = ln(0.0526), or −z = −2.94, or z = 2.94.
The log-odds equation is given as log(odds) = −64 + 2·hours.
Or, z = −64 + 2·hours, or 2.94 = −64 + 2·hours, or hours = 66.94/2 = 33.47.
Therefore, a student has to study at least 33.47 hours to pass the course with probability 95%.
iii) We know z = −64 + 2·hours = −64 + 2·33.5 = −64 + 67 = 3.
So, p(y; x, θ) = 1 / (1 + e^(−3)) = 0.953.
It shows around 95% probability with 33.5 study hours. Hence, verified.
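Parts ii) and iii) amount to inverting the sigmoid (the logit) and checking the result:

```python
import math

# Part ii: invert the sigmoid to find z at p = 0.95
p = 0.95
z = math.log(p / (1 - p))      # logit: z = ln(p / (1 - p))
hours = (64 + z) / 2           # solve z = -64 + 2*hours for hours
print(round(z, 2), round(hours, 2))   # -> 2.94 33.47

# Part iii: verify at 33.5 hours
z_check = -64 + 2 * 33.5       # = 3
p_check = 1 / (1 + math.exp(-z_check))
print(round(p_check, 3))       # -> 0.953, i.e. about 95%
```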
Decision Boundary
The decision boundary helps to separate probabilities into the positive class and the negative class.
Example 1: Consider the hypothesis hθ(x) = σ(z) = σ(θᵀx) = σ(θ0 + θ1x1 + θ2x2). If θ0 = −3, θ1 = θ2 = 1, then draw the DECISION BOUNDARY.
Solution: z = −3 + x1 + x2.
Make z = 0, i.e. x1 + x2 = 3.
(Instead of a single x we take x1, x2, to indicate multivariate data.)
x1 + x2 < 3 → Class 0
x1 + x2 ≥ 3 → Class 1

[Plot: x1 vs. x2, with the line x1 + x2 = 3 as the Decision Boundary; Class 0 below the line, Class 1 above it.]

It defines the equation of a STRAIGHT LINE (a LINEAR DECISION BOUNDARY).
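The boundary rule of Example 1 in code, checking one point on each side of the line x1 + x2 = 3:

```python
# Example 1: theta = [-3, 1, 1]
# Predict class 1 when z = -3 + x1 + x2 >= 0, i.e. x1 + x2 >= 3
def predict(x1, x2):
    z = -3 + x1 + x2
    return 1 if z >= 0 else 0

print(predict(1, 1))  # x1 + x2 = 2 < 3  -> class 0
print(predict(2, 2))  # x1 + x2 = 4 >= 3 -> class 1
```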
Decision Boundary
Example 2: Consider the hypothesis hθ(x) = σ(θ0 + θ1x1 + θ2x2) with two features. If θ0 = 5, θ1 = −1, θ2 = 0, then draw the DECISION BOUNDARY.
Solution: z = 5 − x1.
Make z = 0, i.e. x1 = 5.
Predict y = 1 if z ≥ 0, i.e. 5 − x1 ≥ 0, i.e. x1 ≤ 5, and y = 0 if x1 > 5.
[Plot: x1 vs. x2, with the vertical line x1 = 5 as the boundary; predict y = 1 on one side, y = 0 on the other.]
It defines the equation of a STRAIGHT LINE (a LINEAR DECISION BOUNDARY).

Example 3: Consider the hypothesis hθ(x) = σ(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x2²). If θ0 = −1, θ1 = 0, θ2 = 0, θ3 = 1, θ4 = 1, then draw the DECISION BOUNDARY.
Solution: z = −1 + x1² + x2².
Make z = 0, i.e. x1² + x2² = 1.
Predict y = 1 if x1² + x2² ≥ 1 and y = 0 if x1² + x2² < 1.
[Plot: x1 vs. x2, with the unit circle x1² + x2² = 1 as the boundary; y = 0 inside, y = 1 outside.]
It defines the equation of a CIRCLE (a NON-LINEAR DECISION BOUNDARY).
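Example 3's circular boundary in code, checking one point inside and one outside the unit circle:

```python
# Example 3: theta = [-1, 0, 0, 1, 1]
# Predict class 1 when -1 + x1**2 + x2**2 >= 0, i.e. outside the unit circle
def predict(x1, x2):
    return 1 if x1**2 + x2**2 >= 1 else 0

print(predict(0.5, 0.5))  # x1^2 + x2^2 = 0.5, inside  -> class 0
print(predict(1.0, 1.0))  # x1^2 + x2^2 = 2.0, outside -> class 1
```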
Decision Boundary-Definition
While training a classifier on a dataset with a specific classification algorithm, it is required to define a set of hyper-planes, called the Decision Boundary, that separates the data points into specific classes: the boundary is where the algorithm switches from one class to another. On one side of a decision boundary, a data point is more likely to be labelled class A; on the other side of the boundary, it is more likely to be labelled class B.

IMPORTANCE OF DECISION BOUNDARY

A decision boundary is a surface that separates data points belonging to different class labels. Decision boundaries are not confined to just the data points we have provided; they span the entire feature space the model was trained on. The model can predict a value for any possible combination of inputs in our feature space. If the data we train on is not 'diverse', the overall topology of the model will generalize poorly to new instances. So it is important to analyse which models are best suited to a 'diverse' dataset before putting a model into production. Examining decision boundaries is a great way to learn how the training data we select affects performance and the ability of our model to generalize. Visualizing decision boundaries can illustrate how sensitive models are to each dataset, which is a great way to understand how specific algorithms work and their limitations for specific datasets.
Numerical Problem on Classification
We fit a logistic regression model to estimate the log-odds of a baby being born with low birth weight, where a few selected features have been included. The features are: i) smoking, ii) age (numeric), iii) race black (black), iv) race category except black (other), v) LWT (mother's weight in lbs).
The question is to estimate the probability p (probability of a low-birth-weight baby) for a mother who smoked during pregnancy, is aged 35, whose race is white, and whose weight (LWT) is 135 lbs. The fitted model gives the log-odds:
log(ODDS) = 0.332 + 1.05(smoke) − 0.022(age) + 1.23(black) + 0.943(other) − 0.0125(lwt)
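A worked sketch of the calculation, assuming the usual indicator encoding (smoked → smoke = 1; race white → black = 0 and other = 0) and converting the log-odds to a probability with the sigmoid:

```python
import math

# Feature values for the mother in question (indicator encoding assumed)
smoke, age, black, other, lwt = 1, 35, 0, 0, 135

# Fitted model: log-odds of a low-birth-weight baby
log_odds = 0.332 + 1.05*smoke - 0.022*age + 1.23*black + 0.943*other - 0.0125*lwt
p = 1 / (1 + math.exp(-log_odds))   # sigmoid converts log-odds to probability

print(round(log_odds, 4))  # -> -1.0755
print(round(p, 2))         # -> 0.25, i.e. about a 25% chance
```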
Why Classification is called Logistic Regression?
