
Logistic Regression

S. Sumitra
Department of Mathematics
Indian Institute of Space Science and Technology

MA613 Data Mining


Introduction

Binary Classification
Let {(x1, y1), (x2, y2), . . . , (xN, yN)} be the given data, where xi ∈ R^n and yi ∈ {1, 0}.

Data    Soleus    Gastrocnemius    yi: 1/0

x1^T    x11       x12              1
x2^T    x21       x22              0
x3^T    x31       x32              0
x4^T    x41       x42              1
x5^T    x51       x52              1


Probability Theory
Experiments

Deterministic Experiment: Combining hydrogen and oxygen in the ratio 2:1 (the outcome, water, is fixed)
Random Experiment: Tossing a coin
Basic Concepts: Random Experiment

Sample Space (Ω): The set of all outcomes of a random experiment.
For example, the outcomes of tossing a coin are Ω = {Heads, Tails}.
A subset of a sample space is called an event. An event can include one or more outcomes. For example, getting Heads is an event.
Probability Function: P : A → [0, 1], where A is the
collection of events (i.e., the powerset of Ω).
Example: The probability of the event "getting Heads" can
be denoted as P(Heads).
Random Variable

Random Variable
A random variable is a measurable function defined on a
probability space that maps outcomes from the sample
space to real numbers.
Formally, it can be expressed as:

X : Sample Space → R
Types of Random Variables

Discrete Random Variable


The range is countable.
For example, let X denote a random variable that maps
outcomes of a coin toss:

X : {Heads, Tails} → {0, 1}, X (Heads) = 0, X (Tails) = 1

Thus, the probability can be expressed as


P(X = 0) = P(Heads).
Continuous Random Variable
The range is uncountable.
An example is the height of people in a country, which can
take any value within a specified range.
Question

An unbiased die is rolled. Let X = 1 if it shows an even number, and X = 0 otherwise. What is P(X = 0)?
Attribute

An attribute of a data point is considered a random variable.
For example, the attribute Height can take values such as {165, 170, 160}.
For the attribute Rain, which indicates whether it rained (1 for rain, 0 for no rain), the values could be {1, 0, 1, 0}.
Random Vector

A random vector is defined as

X = (X1, X2, . . . , Xn)^T

where each Xi is a random variable.

Each data point xi can be represented as a random vector:

xi = (A1, A2, . . . , An)^T
Probability Distribution

A probability distribution is a mathematical description of


the probabilities of events:

P : R(X ) → [0, 1]

where R(X ) is the range of the random variable X .


It can be represented using a table or an equation,
depending on the nature of the distribution (discrete or
continuous).
Discrete Probability Distribution
A discrete distribution describes the probability of
occurrence of each value of a discrete random variable.
The range of a discrete random variable X is given by:
R(X ) = {x1 , x2 , . . . , xm }
The probability mass function (PMF) is defined as:
p(xi ) = P(X = xi ) for i = 1, 2, . . . , m
x    P(x)
1    1/6
2    1/6
3    1/6
4    1/6
5    1/6
6    1/6
Common types of discrete probability distributions include:
Bernoulli distribution, Binomial distribution, Poisson
distribution
Continuous Probability Distribution

If a random variable is continuous, its probability


distribution is described by a probability density function
(PDF).
The probability that the random variable X falls within a
certain interval [a, b] is given by:
P(a ≤ X ≤ b) = ∫_a^b p(x) dx

Common types of continuous probability distributions


include:
Normal distribution
Exponential distribution
Uniform distribution
Others such as the Gamma and Beta distributions.
Figure: y axis shows density, which indicates the proportion of the
population having a particular range of height.
Univariate & Multivariate Probability Distribution

Univariate Distribution: A probability distribution that


describes the variability of a single random variable.
Multivariate Distribution: A probability distribution that
describes the variability of a random vector, which consists
of multiple random variables.
Joint Probability Distribution

The joint probability distribution of random variables


x1 , x2 , . . . , xN is denoted as:

p(x1 , x2 , . . . , xN )

If the random variables are mutually independent, the joint


probability can be expressed as:

p(x1 , x2 , . . . , xN ) = p(x1 ) · p(x2 ) · · · p(xN )


Bernoulli Distribution

The Bernoulli distribution is a discrete probability


distribution of a random variable X that takes only two
values: 1 (success) and 0 (failure).
Success: X = 1; Failure: X = 0.
The probabilities are defined as:

P(X = 1) = p(1) = ϕ, P(X = 0) = p(0) = 1 − ϕ

The probability mass function (PMF) can be expressed as:

p(x) = ϕ^x (1 − ϕ)^(1−x), x = 0, 1
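A minimal sketch of the Bernoulli PMF in Python (NumPy assumed; the value of ϕ and the sample size are illustrative):

import numpy as np

def bernoulli_pmf(x, phi):
    # PMF p(x) = phi^x * (1 - phi)^(1 - x) for x in {0, 1}
    return phi**x * (1 - phi)**(1 - x)

phi = 0.7                                    # illustrative probability of success
print(bernoulli_pmf(1, phi))                 # p(1) = phi
print(bernoulli_pmf(0, phi))                 # p(0) = 1 - phi

rng = np.random.default_rng(0)
samples = rng.binomial(n=1, p=phi, size=10)  # Bernoulli trials (Binomial with n = 1)
print(samples)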
Logistic Regression
Formulation
Output Variable

Define a random variable Y | X such that

Y = 1 if the given data point x is in the positive class, and Y = 0 otherwise.

Y | X follows a Bernoulli distribution. Therefore,

p(Y = y | x) = ϕ^y (1 − ϕ)^(1−y), y = 0, 1

where ϕ is the probability of success, that is, the probability that the given data point belongs to the positive class.
Hypothesis Function in Logistic Regression

Let f(x) be the probability that x belongs to the positive class, that is, f(x) is the probability of success. Hence, f(x) = p(y = 1 | x). Therefore,

P(Y = y | x) = p(y | x) = f(x)^y (1 − f(x))^(1−y), y = 0, 1


Bernoulli Structures

yi | xi ∼ Bernoulli(f(xi)), i = 1, 2, . . . , N


yi | xi are:
Bernoulli structures
Independent
Need not be identically distributed as f (xi ) may be different
Odds in favour of an Event

The odds in favour of an event A = P(A) / (1 − P(A))
The log-odds is the natural logarithm of the odds. It is also known as the logit.
For logistic regression:
Odds in favour of getting the positive class: f(x) / (1 − f(x))
logit(f(x)) = log( f(x) / (1 − f(x)) ), which lies in the interval (−∞, +∞)
Modeling Using Linear Regression

Consider the data

(x1, log( f(x1) / (1 − f(x1)) )), . . . , (xN, log( f(xN) / (1 − f(xN)) ))

Apply linear regression concepts to model the data, that is, a hyperplane is modeled to predict the log odds (the logit) of the probability that Y = 1:

log( f(x) / (1 − f(x)) ) = w^T x,  w ∈ R^(n+1)
exp(w^T x) = f(x) / (1 − f(x))
f(x) = (1 − f(x)) exp(w^T x)
f(x)(1 + exp(w^T x)) = exp(w^T x)
f(x) = 1 / (1 + exp(−w^T x))

Hence

f(x) = g(w^T x) = 1 / (1 + e^(−w^T x))

where

g(t) = 1 / (1 + e^(−t)), t ∈ R

is called the logistic function or sigmoid function.
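A minimal sketch of the sigmoid g(t) and the resulting hypothesis f(x) = g(w^T x) (NumPy assumed; the weight vector w and the input x below are illustrative values, with a leading 1 standing in for the bias component):

import numpy as np

def sigmoid(t):
    # Logistic (sigmoid) function g(t) = 1 / (1 + e^(-t))
    return 1.0 / (1.0 + np.exp(-t))

w = np.array([0.5, -1.2, 2.0])   # w in R^(n+1); w[0] plays the role of the bias term
x = np.array([1.0, 0.3, 0.8])    # augmented data point x = (1, x1, ..., xn)

f_x = sigmoid(w @ x)             # f(x) = P(y = 1 | x)
print(f_x)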
Logistic Regression: Output variable

y | x is a random variable:

y | x ∼ Bernoulli(f(x)) = Bernoulli( 1 / (1 + exp(−w^T x)) )

Sigmoid function: g(t) = 1 / (1 + e^(−t))
Decision Boundary

w^T x = 0 is the decision boundary for logistic regression.

When w^T x ≥ 0, x belongs to the positive class; when w^T x < 0, x is in the negative class.
Linear Algorithm
Separable Data
Derivative of logistic function

g'(t) = d/dt [ 1 / (1 + e^(−t)) ]
      = e^(−t) / (1 + e^(−t))^2
      = ( 1 / (1 + e^(−t)) ) ( 1 − 1 / (1 + e^(−t)) )
      = g(t)(1 − g(t))
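The identity g'(t) = g(t)(1 − g(t)) can also be checked numerically with a central finite difference; a small sketch (NumPy assumed, the grid of t values is illustrative):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

t = np.linspace(-5.0, 5.0, 11)
h = 1e-6
numeric = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)   # central-difference estimate of g'(t)
analytic = sigmoid(t) * (1 - sigmoid(t))                # g(t)(1 - g(t))
print(np.max(np.abs(numeric - analytic)))               # prints a value close to zero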
Random Sample

A random sample is a subset of individuals chosen from a


larger population, where each individual has an equal
chance of being selected. This helps to ensure that the
sample is representative of the population.
Consider the study of the heights of the population of a
country. The sample space Ω is defined as the set of all
people. We choose a random sample x1 , x2 , . . . , xN from
this population.
Each sample element can be represented as xi : Ω → R,
where each xi denotes the height of the i-th individual
sampled and is expressed as a real number.
Each xi is a random variable that can take on different
values based on the heights of individuals in the
population.
Data Generation
Consider an experiment of choosing a ball randomly from an
urn three times. The urn consists of 6 blue and 4 orange balls.
Define Ai = 1 if an orange ball is chosen, and Ai = 2 if a blue
ball is chosen, for i = 1, 2, 3. Let this experiment be conducted
4 times.
First Day:
Draw Outcomes: obb, bbo, bob, obo
Corresponding Values: 122, 221, 212, 121
Result Vectors:

x1 = (1, 2, 2)^T, x2 = (2, 2, 1)^T, x3 = (2, 1, 2)^T, x4 = (1, 2, 1)^T
Second Day:
Draw Outcomes: oob, obo, ooo, boo
Corresponding Values: 112, 121, 111, 211
Result Vectors:

x1 = (1, 1, 2)^T, x2 = (1, 2, 1)^T, x3 = (1, 1, 1)^T, x4 = (2, 1, 1)^T
Here, the xi and Ai are treated as random variables.
Independent and Identically Distributed Random
Variables (iid)

A set of random variables x1, x2, . . . , xN is said to be iid if they satisfy the following:
All have the same probability distribution.
All are mutually independent.
Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution.
The parameters are chosen to maximize a likelihood function, so that the observed data are most probable under the probability distribution under consideration.
Assumption

Assume that the N training examples were generated independently. The maximum likelihood method can then be used to find the unknown parameters.
Problem
Solution

L(θ) = p(x1, x2, . . . , x10)
     = ∏_{i=1}^{10} p(xi)
     = p(x1 = 3) p(x2 = 0) p(x3 = 2) p(x4 = 1) p(x5 = 3) p(x6 = 2) p(x7 = 1) p(x8 = 0) p(x9 = 2) p(x10 = 1)

θ = arg max_θ L(θ)


Log Likelihood

Find θ that maximizes L(θ).
The logarithm function is strictly increasing on (0, ∞):
g(x) = log x, g'(x) = 1/x > 0 on (0, ∞)
Therefore arg max_θ log L(θ) = arg max_θ L(θ).
Write log L(θ) = l(θ); l(θ) is called the log likelihood.
Likelihood Function
y | x is a random variable:

y | x ∼ Bernoulli(f(x)) = Bernoulli( 1 / (1 + exp(−w^T x)) )

Given N independent random variables, the likelihood of the parameters can be written as

L(w) = p(y1 | x1, y2 | x2, . . . , yN | xN)
     = ∏_{i=1}^{N} p(yi | xi)
     = ∏_{i=1}^{N} f(xi)^yi (1 − f(xi))^(1−yi)
     = ∏_{i=1}^{N} ( 1 / (1 + exp(−w^T xi)) )^yi ( 1 − 1 / (1 + exp(−w^T xi)) )^(1−yi)
Example

L(w) = p(y1 = 1 | x1, y2 = 0 | x2, y3 = 0 | x3, y4 = 1 | x4, y5 = 1 | x5; w)
Find w that maximizes the probability of getting such a sample.
l(w)

The log likelihood l(w) is

l(w) = log L(w)
     = Σ_{i=1}^{N} ( yi log f(xi) + (1 − yi) log(1 − f(xi)) )

Find arg max_w l(w).


Parameter Estimation

To find the maximum value, we can apply the gradient ascent method. That is,

w := w + α ∇l(w)

Equating the jth component from both sides,

wj := wj + α ∂l(w)/∂wj, j = 0, 1, . . . , n

∂l(w)/∂w = ∂/∂w Σ_{i=1}^{N} ( yi log f(xi) + (1 − yi) log(1 − f(xi)) )
         = Σ_{i=1}^{N} ( yi / f(xi) − (1 − yi) / (1 − f(xi)) ) ∂f(xi)/∂w
         = Σ_{i=1}^{N} ( yi / f(xi) − (1 − yi) / (1 − f(xi)) ) f(xi)(1 − f(xi)) ∂(w^T xi)/∂w   (refer previous section)
         = Σ_{i=1}^{N} ( yi (1 − f(xi)) − (1 − yi) f(xi) ) xi
         = Σ_{i=1}^{N} ( yi − f(xi) ) xi
Algorithm 1: Updation of w using Batch Gradient Ascent
  Choose an initial w and learning parameter α
  while not converged do
      w := w + α Σ_{i=1}^{N} (yi − f(xi)) xi
  end while

Algorithm 2: Updation of w using Gradient Ascent (component-wise)
  Choose an initial w and learning parameter α
  Iterate until convergence {
      wj := wj + α Σ_{i=1}^{N} (yi − f(xi)) xij, j = 0, 1, . . . , n
  }
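A sketch of Algorithm 1 in Python (NumPy assumed; the toy data X, y, the learning rate, and the fixed iteration count used in place of an explicit convergence test are all illustrative):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def batch_gradient_ascent(X, y, alpha=0.1, n_iters=1000):
    # Maximise l(w) with the update w := w + alpha * sum_i (y_i - f(x_i)) x_i.
    # X is N x (n+1) with a leading column of ones; y holds 0/1 labels.
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        f = sigmoid(X @ w)            # f(x_i) for every training point
        w += alpha * X.T @ (y - f)    # gradient of the log-likelihood
    return w

# Illustrative toy data: two features plus a bias column
X = np.array([[1.0, 0.2, 1.5],
              [1.0, 2.0, 0.3],
              [1.0, 1.8, 0.2],
              [1.0, 0.1, 1.9]])
y = np.array([1.0, 0.0, 0.0, 1.0])
w = batch_gradient_ascent(X, y)
print(sigmoid(X @ w))                 # predicted P(y = 1 | x) for each point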
Stochastic Gradient Ascent

One point at a time


Algorithm 3: Updation of w using Stochastic Gradient Ascent
  Choose an initial w and learning parameter α
  while not converged do
      for i = 1, 2, . . . , N do
          w := w + α (yi − f(xi)) xi
      end for
      Randomly shuffle the data
  end while

Algorithm 4: Updation of w using Stochastic Gradient Ascent (component-wise)
  Choose an initial w and learning parameter α
  Iterate until convergence {
      for i = 1, 2, . . . , N (randomly shuffle the data) {
          wj := wj + α (yi − f(xi)) xij, j = 0, 1, 2, . . . , n
      }
      Randomly shuffle the data
  }
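A corresponding sketch of the stochastic update (Algorithm 3); as above, NumPy is assumed and a fixed number of passes over the data replaces the explicit convergence test:

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def stochastic_gradient_ascent(X, y, alpha=0.1, n_epochs=100, seed=0):
    # One point at a time: w := w + alpha * (y_i - f(x_i)) x_i
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    idx = np.arange(X.shape[0])
    for _ in range(n_epochs):
        for i in idx:
            w += alpha * (y[i] - sigmoid(X[i] @ w)) * X[i]
        rng.shuffle(idx)              # randomly shuffle the data after each pass
    return w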
Formulation: Minimization Problem

l(w) = log L(w)
     = Σ_{i=1}^{N} ( yi log f(xi) + (1 − yi) log(1 − f(xi)) )

J(w) = − Σ_{i=1}^{N} ( yi log f(xi) + (1 − yi) log(1 − f(xi)) )
     = − Σ_{i=1}^{N} ⟨ (yi, 1 − yi)^T, (log f(xi), log(1 − f(xi)))^T ⟩

w = arg min_w J(w)


Loss Function and Cost Function

A loss (error) function measures the discrepancy between the predicted output and the true output for a single training point, whereas a cost function measures it over the entire training data.
Cross Entropy Function

The cross entropy function is used to quantify the


difference between two probability distributions.
For discrete probability distributions, it takes two
distributions:
p(x) - the true distribution,
q(x) - the estimated distribution,
defined over the discrete variable X .
The cross entropy H(p, q) is given by:

H(p, q) = − Σ_x p(x) log(q(x))
Logistic Regression: Loss Function

L(yi, f(xi)) = − ( yi log f(xi) + (1 − yi) log(1 − f(xi)) )

J(w) = Σ_{i=1}^{N} L(yi, f(xi))

L is called the logistic loss function, cross-entropy loss, or log loss.
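A minimal sketch of the log loss and the cost J(w) in Python (NumPy assumed; the small epsilon clip is only there to avoid log(0) on illustrative inputs):

import numpy as np

def log_loss(y, f, eps=1e-12):
    # Cross-entropy loss for 0/1 labels y and predicted probabilities f
    f = np.clip(f, eps, 1 - eps)      # guard against log(0)
    return -(y * np.log(f) + (1 - y) * np.log(1 - f))

def cost(y, f):
    # J(w): sum of the per-point losses over the training data
    return np.sum(log_loss(y, f))

y = np.array([1.0, 0.0, 0.0, 1.0])
f = np.array([0.9, 0.2, 0.1, 0.7])    # illustrative predicted probabilities f(x_i)
print(cost(y, f))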
Newton’s Method
To find a solution of h(y) = 0, y ∈ R, using Newton’s method, the iteration is given by:

y := y − h(y) / h'(y)

For the maximum of the log-likelihood l(w), the critical points satisfy ∇l(w) = 0. By applying Newton’s method to this equation, we have:

w := w − H^(−1) ∇l(w)

Here, H is the Hessian matrix of size (n + 1) × (n + 1), defined as:

H = [Hkl], Hkl = ∂²l(w) / (∂wk ∂wl), k, l = 0, 1, 2, . . . , n
Newton’s Method

Newton’s method converges faster (in fewer iterations) than (batch) gradient descent as well as stochastic gradient descent. However, one iteration of Newton’s method is more expensive than one iteration of either gradient method, since the Hessian matrix has to be inverted. If n is not too large, Newton’s method is usually more effective overall. When Newton’s method is applied to maximize the logistic regression log-likelihood function, the algorithm is called Fisher scoring.
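A sketch of the Newton (Fisher scoring) update for logistic regression. Differentiating the gradient derived earlier gives the Hessian H = −X^T S X with S = diag(f(xi)(1 − f(xi))); that form is used below (NumPy assumed, and in practice a small regularisation term is often added if H becomes nearly singular):

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def newton_logistic(X, y, n_iters=10):
    # Newton's method: w := w - H^(-1) grad l(w)
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        f = sigmoid(X @ w)
        grad = X.T @ (y - f)                  # gradient of l(w)
        S = np.diag(f * (1 - f))
        H = -X.T @ S @ X                      # Hessian of l(w)
        w -= np.linalg.solve(H, grad)         # solves H d = grad, then w := w - d
    return w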
Confusion Matrix
Performance Measures: Classification
Accuracy:
  Accuracy = (No. of correctly classified data) / (Total number of data)
           = (TP + TN) / (TP + TN + FP + FN)

Sensitivity (Recall):
  Sensitivity = (No. of correctly classified positive data) / (Total number of positive data)
              = TP / (TP + FN)

Specificity:
  Specificity = (No. of correctly classified negative data) / (Total number of negative data)
              = TN / (TN + FP)

Precision:
  Precision = TP / (TP + FP)
  A good measure to check whether false positives are high.

F Measure: Harmonic mean of precision and recall:
  F Measure = 2 × (Precision × Recall) / (Precision + Recall)
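A small sketch computing these measures from predicted 0/1 labels (NumPy assumed; the label arrays are illustrative):

import numpy as np

def confusion_counts(y_true, y_pred):
    # Return TP, TN, FP, FN for 0/1 labels
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp, tn, fp, fn

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
tp, tn, fp, fn = confusion_counts(y_true, y_pred)

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)                  # recall / TPR
specificity = tn / (tn + fp)
precision   = tp / (tp + fp)
f_measure   = 2 * precision * sensitivity / (precision + sensitivity)
print(accuracy, sensitivity, specificity, precision, f_measure)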
True Positive Rate(TPR) & False Positive Rate (FPR)

TPR = Sensitivity
FPR = 1 − Specificity = FP / (TN + FP)
Receiver Operating Characteristic Curve (ROC)

The receiver operating characteristic curve (ROC) of a


binary classifier plots (1 - specificity) (FPR) on the x-axis
and sensitivity (TPR) on the y-axis.
It is a visual tool for comparing classification models,
showing the trade-off between sensitivity and specificity.
ROC curves help in choosing the best threshold for a given
model.
The points on the curve are obtained using different threshold values for classification, as sketched in the example below.
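A sketch of how the ROC points can be generated by sweeping the classification threshold over the predicted probabilities (NumPy assumed; the scores, labels, and threshold grid are illustrative, and the area is approximated with the trapezoidal rule):

import numpy as np

def roc_points(y_true, scores, thresholds):
    # FPR/TPR pairs obtained by varying the decision threshold
    fpr, tpr = [], []
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        tp = np.sum((y_true == 1) & (y_pred == 1))
        fn = np.sum((y_true == 1) & (y_pred == 0))
        fp = np.sum((y_true == 0) & (y_pred == 1))
        tn = np.sum((y_true == 0) & (y_pred == 0))
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    return np.array(fpr), np.array(tpr)

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1])   # illustrative f(x_i)
thresholds = np.linspace(1.0, 0.0, 21)                         # high to low threshold
fpr, tpr = roc_points(y_true, scores, thresholds)
auc = np.trapz(tpr, fpr)                                       # area under the ROC curve
print(auc)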
Interpretation of ROC Curve

A completely random guess will result in points along the


diagonal line from the bottom-left to the top-right corner.
Points below the diagonal are worse than random
guessing.
The accuracy of the test is higher if the ROC plot bulges
towards the upper-left corner.
Area Under the ROC Curve

The value of the area under the ROC curve lies in the
interval [0,1] and is a measure of the accuracy of the
model.
An area of 1 represents a perfect test, while an area less
than or equal to 0.5 indicates a model that is not better
than chance.
More area under the curve signifies that the model is
identifying more true positives while minimizing the number
of false positives.
Figure: ROC curves and ROC areas (CGF: 0.9403, CGPI: 0.9328, CGPII: 0.9402, CGPIII: 0.9403); x-axis: 1 − Specificity, y-axis: Sensitivity.
