06 Logistic Regression PDF
Classification
Where y is a discrete value
Develop the logistic regression algorithm to determine what class a new input
should fall into
Classification problems
Email -> spam/not spam?
Online transactions -> fraudulent?
Tumor -> Malignant/benign
Variable in these problems is Y
Y is either 0 or 1
0 = negative class (absence of something)
1 = positive class (presence of something)
Start with binary class problems
Later look at multiclass classification problem, although this is just an extension of
binary classification
How do we develop a classification algorithm?
Tumour size vs malignancy (0 or 1)
We could use linear regression
Then threshold the classifier output (i.e. anything over some value is yes, else
no)
In the tumour example, linear regression with thresholding seems to work
It does a reasonable job of stratifying the data points into one of two classes
But a single additional Yes example lying far away from the others (an outlier) can shift the fitted line and its threshold
This would lead to classifying some of the existing Yeses as Nos
Another issue with linear regression
We know y is 0 or 1
But the hypothesis can give values larger than 1 or less than 0
So logistic regression generates a value that is always between 0 and 1
Logistic regression is a classification algorithm - don't be confused
Hypothesis representation
What function is used to represent our hypothesis in classification
We want our classifier to output values between 0 and 1
When using linear regression we did hθ(x) = θ^T x
For classification hypothesis representation we do hθ(x) = g(θ^T x)
Where we define g(z)
z is a real number
g(z) = 1 / (1 + e^(-z))
This is the sigmoid function, or the logistic function
If we combine these equations we can write out the hypothesis as
hθ(x) = 1 / (1 + e^(-θ^T x))
When our hypothesis (hθ(x)) outputs a number, we treat that value as the estimated
probability that y=1 on input x
Example
If X is a feature vector with x0 = 1 (as always) and x1 = tumourSize
hθ(x) = 0.7
Tells a patient they have a 70% chance of a tumor being malignant
We can write this using the following notation
hθ(x) = P(y=1|x ; θ)
What does this mean?
Probability that y=1, given x, parameterized by θ
Since this is a binary classification task we know y = 0 or 1
So the following must be true
P(y=1|x ; θ) + P(y=0|x ; θ) = 1
P(y=0|x ; θ) = 1 - P(y=1|x ; θ)
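To make this concrete, a minimal Octave sketch of the sigmoid hypothesis (the parameter and feature values below are made up purely for illustration, not taken from the notes):

    % Sigmoid (logistic) function g(z) = 1 / (1 + e^-z), applied elementwise
    sigmoid = @(z) 1 ./ (1 + exp(-z));

    % Hypothesis h_theta(x) = g(theta' * x), read as P(y = 1 | x ; theta)
    theta = [-6; 0.05];          % illustrative parameter values
    x     = [1; 140];            % x0 = 1 as always, x1 = tumourSize (made-up value)
    h     = sigmoid(theta' * x)  % roughly 0.73, i.e. about a 73% chance that y = 1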
Decision boundary
Gives a better sense of what the hypothesis function is computing and what it looks like
One way of using the sigmoid function is:
When the probability of y being 1 is greater than 0.5 then we can predict y = 1
Else we predict y = 0
When exactly is hθ(x) greater than or equal to 0.5?
Look at the sigmoid function
g(z) is greater than or equal to 0.5 when z is greater than or equal to 0
So hθ(x) = g(θ^T x) >= 0.5 exactly when θ^T x >= 0, i.e. we predict y = 1 whenever θ^T x >= 0
With two features, hθ(x) = g(θ0 + θ1 x1 + θ2 x2)
Means we can build more complex decision boundaries by fitting complex parameters to this (relatively) simple hypothesis
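As a concrete illustration (the parameter values are assumed for the example, in the spirit of the lecture figure): if θ0 = -3, θ1 = 1 and θ2 = 1, then hθ(x) = g(-3 + x1 + x2), so we predict y = 1 whenever -3 + x1 + x2 >= 0, i.e. x1 + x2 >= 3. The straight line x1 + x2 = 3 is the decision boundary; points on one side are predicted y = 1, points on the other y = 0. The decision boundary is a property of the hypothesis and its parameters, not of the training set.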
More complex decision boundaries?
By using higher order polynomial terms, we can get even more
complex decision boundaries
Cost function for logistic regression
If we plugged the sigmoid hypothesis into the squared error cost from linear regression, J(θ) would be non-convex, with many local optima
To get around this we need a different, convex Cost() function, which means we can apply gradient descent and be confident of reaching the global minimum
For logistic regression the per-example cost is
Cost(hθ(x), y) = -log(hθ(x)) if y = 1
Cost(hθ(x), y) = -log(1 - hθ(x)) if y = 0
Giving the overall cost function J(θ) = -(1/m) Σ [ y^(i) log(hθ(x^(i))) + (1 - y^(i)) log(1 - hθ(x^(i))) ]
To make a prediction on a new input x we output hθ(x)
This result is P(y=1 | x ; θ)
Probability y = 1, given x, parameterized by θ
If you had n features, θ would be an (n+1)-dimensional column vector
Minimizing J(θ) with gradient descent gives the update θj := θj - (α/m) Σ (hθ(x^(i)) - y^(i)) xj^(i)
This update equation is the same as the linear regression rule
The only difference is that our definition for the hypothesis hθ(x) has changed
Previously, we spoke about how to monitor gradient descent to check it's working
Can do the same thing here for logistic regression
When implementing logistic regression with gradient descent, we have to update all
the θ values (θ0 to θn ) simultaneously
Could use a for loop
Better would be a vectorized implementation (a sketch is given below)
Feature scaling for gradient descent for logistic regression also applies here
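A rough sketch of the vectorized implementation mentioned above, assuming X is the m x (n+1) design matrix, y the m x 1 label vector, and alpha and num_iters are chosen by hand (these names are assumptions, not from the notes):

    sigmoid = @(z) 1 ./ (1 + exp(-z));   % logistic function, elementwise
    m = length(y);                       % number of training examples

    for iter = 1:num_iters
      h     = sigmoid(X * theta);                    % m x 1 vector of predictions
      theta = theta - (alpha / m) * (X' * (h - y));  % update theta_0 .. theta_n simultaneously
    end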
Advanced optimization
Previously we looked at gradient descent for minimizing the cost function
Here we look at more advanced approaches to minimizing the cost function for logistic regression
Good for large machine learning problems (e.g. huge feature set)
What is gradient descent actually doing?
We have some cost function J( θ), and we want to minimize it
We need to write code which can take θ as input and compute the following
J(θ)
The partial derivative of J(θ) with respect to θj (for j = 0 to j = n)
Example
Suppose we have two parameters, θ1 and θ2
The cost function here is J(θ) = (θ1 - 5)^2 + (θ2 - 5)^2
The derivative of J(θ) with respect to each θi turns out to be 2(θi - 5)
First we need to define our cost function, which should have a signature along the lines of function [jVal, gradient] = costFunction(theta) (a sketch is given below)
Input for the cost function is theta, a vector of the θ parameters
Two return values from costFunction are
jVal
The value of the cost function J(θ) itself (the un-derived cost function)
In this case jVal = (θ1 - 5)^2 + (θ2 - 5)^2
gradient
A 2 by 1 vector here; in general it has one entry per parameter
The two elements are the two partial derivative terms
Each indexed value gives the partial derivative of J(θ) with respect to θi
Where i is the index position in the gradient vector
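A sketch of such a cost function in Octave for this example (it mirrors the description above; treat it as illustrative):

    function [jVal, gradient] = costFunction(theta)
      % Value of the cost function J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2
      % (theta(1) plays the role of theta1 because Octave indexes from 1)
      jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;

      % Gradient: partial derivatives of J with respect to theta1 and theta2
      gradient = zeros(2, 1);
      gradient(1) = 2 * (theta(1) - 5);
      gradient(2) = 2 * (theta(2) - 5);
    end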
With the cost function implemented, we can call the advanced algorithm using
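A sketch of that call in Octave (the output variable names optTheta, functionVal and exitFlag are assumptions; optimset and fminunc are standard Octave functions):

    options = optimset('GradObj', 'on', 'MaxIter', 100);  % we supply the gradient; cap the iterations
    initialTheta = zeros(2, 1);
    [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);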
Here
options is a data structure giving options for the algorithm
fminunc
The Octave function that minimizes the cost function (finds the minimum of an unconstrained multivariable function)
@costFunction is a function handle (a pointer) to the costFunction to be used
For the Octave implementation
initialTheta must be a vector with at least two elements (fminunc does not work when θ is a single real number)
To apply this to logistic regression
theta is an (n+1)-dimensional column vector
Octave indexes from 1, not 0, so θ0 lives in theta(1)
Write a cost function which returns both the logistic regression cost J(θ) and its gradient (a sketch is given below)
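A sketch of what that cost function might look like in Octave, using the cross-entropy cost J(θ) from earlier (the function name, and passing X and y in via an anonymous function, are choices made here rather than prescribed by the notes):

    function [jVal, gradient] = logisticCostFunction(theta, X, y)
      m = length(y);                         % number of training examples
      h = 1 ./ (1 + exp(-(X * theta)));      % sigmoid of X*theta, an m x 1 vector

      % Cross-entropy cost: J = -(1/m) * sum( y.*log(h) + (1-y).*log(1-h) )
      jVal = -(1 / m) * sum(y .* log(h) + (1 - y) .* log(1 - h));

      % Gradient: (1/m) * X' * (h - y), an (n+1) x 1 vector
      gradient = (1 / m) * (X' * (h - y));
    end

    % Usage sketch: wrap it so fminunc sees a function of theta only
    % options      = optimset('GradObj', 'on', 'MaxIter', 400);
    % initialTheta = zeros(size(X, 2), 1);
    % [optTheta, cost] = fminunc(@(t) logisticCostFunction(t, X, y), initialTheta, options);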
Given a dataset with three classes, how do we get a learning algorithm to work?
Use one vs. all classification to make binary classification work for multiclass classification
One vs. all classification
Split the training set into three separate binary classification problems
i.e. create a new fake training set
Triangles (1) vs crosses and squares (0): train hθ^(1)(x) = P(y = 1 | x ; θ) on this set
Crosses (1) vs triangles and squares (0): train hθ^(2)(x) = P(y = 1 | x ; θ)
Squares (1) vs triangles and crosses (0): train hθ^(3)(x) = P(y = 1 | x ; θ)
Overall
Train a logistic regression classifier hθ^(i)(x) for each class i to predict the probability that y = i
On a new input x, to make a prediction, pick the class i that maximizes hθ^(i)(x)
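A rough Octave sketch of this one vs. all recipe; trainLogistic is a hypothetical helper that fits θ for one relabeled binary problem (e.g. via the fminunc approach above), and X, y, x are assumed to exist:

    sigmoid = @(z) 1 ./ (1 + exp(-z));
    numLabels = 3;                                % e.g. triangles, crosses, squares
    allTheta  = zeros(numLabels, size(X, 2));     % one row of parameters per class

    % Train one binary classifier per class: class i (1) vs everything else (0)
    for i = 1:numLabels
      allTheta(i, :) = trainLogistic(X, (y == i))';   % trainLogistic is hypothetical
    end

    % Prediction: for a new example x (row vector with x0 = 1), pick the class
    % whose classifier gives the highest estimated probability
    probs = sigmoid(allTheta * x');               % numLabels x 1 vector of probabilities
    [maxProb, prediction] = max(probs);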