
Machine Learning – WS’12

Exercise-4
Prof. Dr. Dr. Lars Schmidt-Thieme, Umer Khan
Information Systems and Machine Learning Lab (ISMLL),
University of Hildesheim

Logistic Regression:
Problem-1:

Suppose the following problem: a doctor wants to find out whether a particular antibiotic has an effect on the incidence of infection in women who have a caesarean section. He starts from a simple linear regression model:

y = β0 + β1·x + ε

Here y = 1 means that an infection occurs within the next two weeks and y = 0 that no infection occurs; x encodes the administered amount of the antibiotic. The target variable is a binomial random variable with the discrete distribution P(y = 1) = p and P(y = 0) = 1 − p.

a) Why should you not use linear regression on data with a binary outcome variable? Consider the error ε and the variance σ². What value does ε take for y = 1 and for y = 0? What value does σ² take?

b) We assume that the variance can vary for different values of x (heteroscedasticity). What method can be applied to solve this problem?

c) Suppose we use logistic regression instead. Describe the behavior of the logistic function with respect to β.
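For reference, with a single predictor x the logistic model takes the standard form below (the sheet's own equation image is not reproduced, so this standard parameterization is an assumption):

```latex
\pi(x) \;=\; P(y = 1 \mid x) \;=\; \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
```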

Problem 2:

Build a logistic regression model to predict whether a student gets admitted to a university based on ‘Exam 1 Score’ and ‘Exam 2 Score’. You have to estimate the probability that a student gets admitted. Implement ‘plotData.m’ so that the training data in ‘ex2data1.txt’ can be visualized as in the figure accompanying the exercise, with admitted and not-admitted applicants plotted as differently marked points over the two exam scores.
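As a minimal Python sketch of what ‘plotData.m’ must do (the exercise itself expects Octave; the assumed layout of ‘ex2data1.txt’ is two score columns followed by a 0/1 admission label):

```python
import numpy as np

def split_classes(X, y):
    """Split feature rows into positive (admitted) and negative examples,
    so the two classes can be plotted with different markers."""
    y = np.asarray(y)
    return X[y == 1], X[y == 0]

# Tiny synthetic stand-in for rows of ex2data1.txt: (score 1, score 2, label).
data = np.array([[34.6, 78.0, 0],
                 [60.2, 86.3, 1],
                 [79.0, 75.3, 1]])
pos, neg = split_classes(data[:, :2], data[:, 2])
# With matplotlib one would then plot, e.g.:
#   plt.plot(pos[:, 0], pos[:, 1], 'k+'); plt.plot(neg[:, 0], neg[:, 1], 'ko')
```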

Implement a Sigmoid Function:

Recall that the logistic regression hypothesis is defined as:

h_Θ(x) = g(Θᵀx)

where the function ‘g’ is the sigmoid function, defined as:

g(z) = 1 / (1 + e^(−z))

Implement ‘sigmoid.m’. Test sigmoid(x) for a few values: for very large positive values of x, the sigmoid should be close to 1, and for large negative values it should be close to 0. At x = 0 it should be exactly 0.5.
For a matrix argument, your function should apply the sigmoid to every element.
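A Python sketch of the same element-wise behavior (the graded version must of course be the Octave file ‘sigmoid.m’):

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function g(z) = 1 / (1 + exp(-z)).
    Works identically for scalars, vectors, and matrices."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))
```

For example, sigmoid(0) is exactly 0.5, and a 2×2 input yields a 2×2 output.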

Cost Function and Gradient:

Implement ‘costFunction.m’ to return the cost and the gradient of logistic regression. The cost takes the following form:

J(Θ) = (1/m) · Σ_{i=1..m} [ −y⁽ⁱ⁾ log(h_Θ(x⁽ⁱ⁾)) − (1 − y⁽ⁱ⁾) log(1 − h_Θ(x⁽ⁱ⁾)) ]

and the gradient of the cost is a vector of the same length as Θ, where the jth element (for j = 0, 1, …, n) is defined as:

∂J(Θ)/∂Θⱼ = (1/m) · Σ_{i=1..m} (h_Θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾

Test your costFunction through ‘ex2.m’ using the initial parameters ‘Θ’.
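A vectorized Python sketch of the cost and gradient above (assuming, as in the exercise, that X carries a leading column of ones for the intercept):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y):
    """Logistic-regression cost J(theta) and its gradient.
    X: (m, n+1) design matrix with a leading column of ones,
    y: (m,) vector of 0/1 labels, theta: (n+1,) parameter vector."""
    m = len(y)
    h = sigmoid(X @ theta)                                  # h_theta(x) per example
    cost = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m   # average cross-entropy
    grad = X.T @ (h - y) / m                                # one entry per theta_j
    return cost, grad
```

With Θ initialized to zeros, every h is 0.5, so the cost is log(2) ≈ 0.693 regardless of the data.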

Learning Parameters through ‘fminunc’:

‘fminunc’ is an optimization solver built into Octave that finds the minimum of an unconstrained function. Logistic regression is unconstrained since ‘Θ’ can take any real value. ‘fminunc’ takes the initial parameters ‘Θ’ and your costFunction, which returns the cost and gradient for your data (X, y). ‘ex2.m’ already contains the code to call ‘fminunc’ (also explained in the tutorial). You will notice that you do not have to specify any loop or learning rate; ‘fminunc’ handles this automatically. Finally, it returns the optimized ‘theta’ and the value of the cost function. ‘ex2.m’ will then call ‘plotDataBounary.m’ to plot a decision boundary on the training data using the learned ‘Θ’. The code for ‘plotDataBounary.m’ is already given, but you should try to understand how such boundaries can be plotted in Octave.
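For comparison only (not part of the exercise): SciPy's ‘minimize’ plays an analogous role to ‘fminunc’, and its ‘jac=True’ option mirrors fminunc's ‘GradObj’ setting by accepting an objective that returns a (cost, gradient) pair:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    # Same cost/gradient as costFunction.m, returned as a (cost, grad) pair.
    m = len(y)
    h = sigmoid(X @ theta)
    cost = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return cost, grad

# Tiny toy set with overlapping classes: intercept column plus one feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

# No loop and no learning rate: the solver picks step sizes itself,
# just as fminunc does in the Octave version of the exercise.
res = minimize(cost_and_grad, np.zeros(2), args=(X, y), jac=True, method="BFGS")
```

Afterwards ‘res.x’ holds the optimized parameters and ‘res.fun’ the final cost, corresponding to the two return values of ‘fminunc’.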

Evaluating Logistic Regression:

After learning the parameters, you can use the model to predict whether a particular student will
be admitted. For a student with an Exam 1 score of 45 and an Exam 2 score of 85, you should
expect to see an admission probability of 0.774. Complete the code in ‘predict.m’, which takes the data set and the learned ‘Θ’ as parameters and produces ‘1’ or ‘0’ for each example. Using ‘predict.m’, ‘ex2.m’ will report the training accuracy of the classifier, i.e. the percentage of examples it classifies correctly.
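A Python sketch of the thresholding that ‘predict.m’ performs, with a helper for the accuracy figure that ‘ex2.m’ reports (the 0.5 cut-off follows from predicting ‘1’ whenever the estimated probability is at least one half):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Return 1 where the predicted admission probability is >= 0.5, else 0."""
    return (sigmoid(X @ theta) >= 0.5).astype(int)

def training_accuracy(theta, X, y):
    """Percentage of training examples classified correctly."""
    return 100.0 * np.mean(predict(theta, X) == y)
```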
