
W5S01 - PM-Logistic Regression

The document provides an overview of logistic regression, a statistical method used for predicting the probability of a dichotomous outcome based on independent variables. It covers key concepts such as the logit function, maximum likelihood estimation, and multinomial logistic regression for scenarios with more than two categories. Additionally, it discusses the interpretation of results and the differences between logistic and linear regression.
Copyright © All Rights Reserved

LOGISTIC REGRESSION

W5S01

4143104 - Machine Learning

Oppir Hutapea, S.Tr.Kom.,M.Kom


([email protected])
CONTENT

01. Supervised Learning Techniques

02. Logit Function

03. Calculate logistic regression

04. Maximum Likelihood Method

05. The Likelihood Function

06. Multinomial logistic regression

07. Interpretation of the results


01. Supervised Learning Techniques

There are two categories of supervised learning: regression and classification.

Classification
output type: discrete
trying to find: a decision boundary
evaluation: accuracy

Regression
output type: continuous
trying to find: a best-fit line
evaluation: sum of squared errors
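To make the two evaluation criteria concrete, here is a minimal Python sketch. The function names and the sample values are illustrative assumptions, not part of the slides.

```python
def accuracy(y_true, y_pred):
    """Classification: share of predicted labels that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def sum_of_squared_errors(y_true, y_pred):
    """Regression: sum of squared differences between observed and fitted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Discrete outputs (e.g. buys / does not buy) are scored by accuracy.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75

# Continuous outputs (e.g. body weight in kg) are scored by the sum of squared errors.
print(sum_of_squared_errors([70.0, 82.5, 64.0], [72.0, 80.5, 64.0]))  # 8.0
```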
01. Introduction to Logistic Regression

Logistic regression is a special case of regression analysis and is used when the dependent variable is nominally scaled. Logistic regression analysis is thus the counterpart of linear regression, in which the dependent variable of the regression model must be at least interval-scaled. An example is the variable purchase decision, with the two values "buys a product" and "does not buy a product".

With logistic regression, it is now possible to explain the dependent variable or estimate the probability of
occurrence of the categories of the variable.
A type of classification algorithm
Based on linear regression, which it uses to evaluate the output and to minimize the error
Named after the function it uses to evaluate the outputs: the logit function
Logit function: the logarithm of the odds, where the odds are the chance of an event happening divided by the chance of it not happening

In the basic form of logistic regression, dichotomous variables (0 or 1) can be predicted. For this purpose, the probability of the occurrence of value 1 (= characteristic present) is estimated.
01. Introduction to Logistic Regression

In medicine, for example, a frequent application is to find out which variables have an influence on a disease. In
this case, 0 could stand for not diseased and 1 for diseased. Subsequently, the influence of age, gender and
smoking status (smoker or not) on this particular disease could be examined.

Logistic regression and probabilities


In linear regression, the independent variables (e.g., age and gender) are used to estimate the specific value of the
dependent variable (e.g., body weight).
In logistic regression, on the other hand, the dependent variable is dichotomous (0 or 1), and the probability that the value 1 occurs is estimated. Returning to the example above, this means: how likely is it that the disease is present if the person under consideration has a certain age, sex and smoking status?
02 Logit Function

The logit function is bounded on the x-axis (input values lie between 0 and 1) and unbounded on the y-axis (output values).
To classify, we need the y-axis (output values) to be bounded.
Therefore, we take the inverse of the logit function.
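The relationship between the logit and its inverse can be checked numerically. This is a minimal sketch using only Python's `math` module; the function names `logit` and `logistic` are our own labels, not from the slides.

```python
import math


def logit(p):
    """Log-odds: maps a probability p in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))


def logistic(z):
    """Inverse of the logit: maps any real z back into the interval (0, 1)."""
    return 1 / (1 + math.exp(-z))


for p in (0.1, 0.5, 0.9):
    z = logit(p)
    # Applying the inverse recovers the original probability (up to rounding).
    print(f"p={p}  logit(p)={z:+.3f}  logistic(logit(p))={logistic(z):.3f}")

# Large negative or positive inputs are squeezed towards 0 and 1.
print(logistic(-10), logistic(10))
```

The logit takes bounded inputs to unbounded outputs, and the logistic function does the reverse. This is why the inverse is the function used to produce bounded outputs for classification.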
03 Calculate logistic regression

To build a logistic regression model, the linear regression equation is used as the starting point.
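The equation itself is not shown in this copy of the slides. The version below assumes the notation used later in these slides: coefficients b1, ..., bn for the independent variables x1, ..., xn and the constant a. Under that assumption, the linear regression equation reads:

```latex
y = b_1 x_1 + b_2 x_2 + \dots + b_n x_n + a
```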

However, if a linear regression were simply calculated to solve a logistic regression problem, the resulting graph would show that values between minus and plus infinity can occur. The goal of logistic regression, however, is to estimate the probability of occurrence and not the value of the variable itself. Therefore, this equation must still be transformed.
03 Calculate logistic regression

To do this, it is necessary to restrict the value range for the prediction to the range between 0 and 1. To ensure
that only values between 0 and 1 are possible, the logistic function f is used.

The logistic model is based on the logistic function. What makes the logistic function special is that for any input between minus and plus infinity, it only takes values between 0 and 1.
The logistic function is therefore well suited to describe the probability P(y=1). Applying the logistic function to the regression equation above yields a model for P(y=1).
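The formulas are missing from this copy of the slides. Written out in the standard form, with the linear predictor b1x1 + ... + bnxn + a as the input z (our assumption about the notation), they are:

```latex
f(z) = \frac{1}{1 + e^{-z}}

P(y = 1) = f(b_1 x_1 + \dots + b_n x_n + a)
         = \frac{1}{1 + e^{-(b_1 x_1 + \dots + b_n x_n + a)}}
```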
03 Calculate logistic regression

This ensures that, no matter in which range the x values lie, only values between 0 and 1 come out. The new graph is therefore an S-shaped curve bounded by 0 and 1.

For given values of the independent variables, the model thus gives the probability that the dichotomous dependent variable y is 1. The probability that y is 0 is 1 minus this value.
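Both cases can be combined into a single expression. The sketch below writes p for P(y = 1 | x), a shorthand introduced here rather than taken from the slides:

```latex
P(y = 0 \mid x) = 1 - P(y = 1 \mid x)

P(y \mid x) = p^{\,y} \, (1 - p)^{\,1 - y}, \qquad y \in \{0, 1\}
```

For y = 1 this reduces to p, and for y = 0 it reduces to 1 - p. This combined form is what the likelihood function multiplies over all observations.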

To calculate the probability of a person being sick or not using the logistic regression for the example above, the model parameters b1, b2, b3 and a must first be determined. Once they have been determined, they are inserted into the logistic regression equation for the example above, with age, gender and smoking status as the independent variables.
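The sketch below plugs one person's values into the fitted equation. The parameter values are purely hypothetical placeholders, because the slides give no estimates. The assignment of b1, b2 and b3 to age, gender and smoking, and the 0/1 coding of gender and smoking, are also assumptions.

```python
import math

# Hypothetical parameters (NOT estimated from real data): B1 for age,
# B2 for gender (1 = male, 0 = female), B3 for smoking (1 = smoker), A = constant.
B1, B2, B3, A = 0.04, 0.3, 1.2, -4.0


def p_disease(age, gender, smoker):
    """P(y = 1 | x): probability of being diseased under the logistic model."""
    z = B1 * age + B2 * gender + B3 * smoker + A
    return 1 / (1 + math.exp(-z))


p = p_disease(age=50, gender=1, smoker=1)
print(f"P(diseased)     = {p:.3f}")
print(f"P(not diseased) = {1 - p:.3f}")
```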
04 Maximum Likelihood Method

To determine the model parameters for the logistic regression equation, the Maximum Likelihood Method is
applied. The maximum likelihood method is one of several methods used in statistics to estimate the parameters
of a mathematical model. Another well-known estimator is the least squares method, which is used in linear
regression.
05 The Likelihood Function

To understand the maximum likelihood method, we introduce the likelihood function L. L is a function of the unknown parameters in the model; in the case of logistic regression these are b1, ..., bn and a. We can therefore also write L(b1, ..., bn, a), or L(θ) if the parameters are summarized in θ. L(θ) indicates how probable it is that the observed data occur. As θ changes, the probability of observing the data as they were actually observed changes too.
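For logistic regression the likelihood can be written out explicitly. The formulation below is the standard one and assumes m independent observations (x_i, y_i), with p_i the model's probability that y_i = 1. The index i, the count m and the double subscript x_{ij} are notation introduced here.

```latex
L(\theta) = \prod_{i=1}^{m} p_i^{\,y_i} \, (1 - p_i)^{\,1 - y_i},
\qquad
p_i = \frac{1}{1 + e^{-(b_1 x_{i1} + \dots + b_n x_{in} + a)}}

\ln L(\theta) = \sum_{i=1}^{m} \Big[ y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \Big]
```

Taking the logarithm turns the product into a sum. The sum is easier to differentiate and numerically more stable, and it has its maximum at the same θ.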
05 Maximum Likelihood Estimator

The maximum likelihood estimator can be applied to the estimation of complex nonlinear as well as linear models. In the case of logistic regression, the goal is to find the parameters b1, ..., bn, a that maximize the so-called log-likelihood function ln L(θ), which is simply the logarithm of L(θ).
Different algorithms have been established over the years for this nonlinear optimization, for example stochastic gradient descent (applied to the negative log-likelihood).
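Here is a minimal sketch of fitting a one-feature logistic regression by stochastic gradient ascent on ln L(θ). This is equivalent to stochastic gradient descent on -ln L(θ). The toy data, learning rate and number of epochs are assumptions chosen for illustration. Practical implementations also add feature scaling, convergence checks and often regularization.

```python
import math
import random


def logistic(z):
    return 1 / (1 + math.exp(-z))


def log_likelihood(b, a, xs, ys):
    """ln L(b, a) = sum of y*ln(p) + (1 - y)*ln(1 - p) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b * x + a)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_sgd(xs, ys, lr=0.05, epochs=500, seed=0):
    """Estimate b and a by stochastic gradient ascent on the log-likelihood."""
    rng = random.Random(seed)
    b, a = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observations in a random order each epoch
        for i in order:
            p = logistic(b * xs[i] + a)
            # Gradient of one observation's term: (y - p) * x for b, (y - p) for a.
            b += lr * (ys[i] - p) * xs[i]
            a += lr * (ys[i] - p)
    return b, a


# Hypothetical toy data: x = an exposure measure, y = 1 if the characteristic is present.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b, a = fit_sgd(xs, ys)
print(f"b = {b:.3f}, a = {a:.3f}")
print(f"ln L at start:      {log_likelihood(0.0, 0.0, xs, ys):.3f}")
print(f"ln L after fitting: {log_likelihood(b, a, xs, ys):.3f}")
```

The toy data deliberately overlap between the two classes. With perfectly separable data, the maximum likelihood estimate would not be finite and b would keep growing.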
06 Multinomial logistic regression

As long as the dependent variable has two values (e.g. male, female), i.e. is dichotomous, binary logistic regression is used. However, if the dependent variable has more than two values, e.g. which mobility concept describes a person's journey to work (car, public transport, bicycle), multinomial logistic regression must be used.

Each value of the mobility variable (car, public transport, bicycle) is transformed into a new variable. The single variable "mobility concept" becomes three new variables:
1. car is used
2. public transport is used
3. bicycle is used

Each of these new variables then has only the two values yes or no. For example, the variable "car is used" has only the two answer options yes or no (either the car is used or it is not). Thus, the one variable "mobility concept" with three values becomes three new variables with two values each: yes and no (1 and 0). Three logistic regression models are then created for these three variables.
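The transformation described above can be sketched as follows. The per-model probabilities are made-up placeholders standing in for the outputs of three fitted binary models. The rule for combining them into one prediction (choose the category with the highest probability) is a common choice but is our assumption, not something the slides state.

```python
CATEGORIES = ["car", "public transport", "bicycle"]


def to_binary_variables(mobility):
    """Turn one categorical value into three yes/no (1/0) variables."""
    return {f"{c} is used": int(mobility == c) for c in CATEGORIES}


print(to_binary_variables("bicycle"))
# {'car is used': 0, 'public transport is used': 0, 'bicycle is used': 1}

# Hypothetical outputs of the three binary logistic regression models for one person:
# each value is that model's estimate of P(category is used).
probabilities = {"car": 0.62, "public transport": 0.30, "bicycle": 0.15}
prediction = max(probabilities, key=probabilities.get)
print(prediction)  # car
```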
07 Interpretation of the results

The relationship between the dependent and independent variables in logistic regression is not linear, so the regression coefficients cannot be interpreted in the same way as in linear regression. For this reason, odds are interpreted in logistic regression.
The odds are calculated by dividing the probability that y is "1" by the probability that y is "not 1".

This quotient can take any positive value. If it is then logarithmized, any value between minus and plus infinity is possible.
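The following is a short numeric illustration of odds and log-odds, together with the usual way a coefficient is read: raising x by one unit multiplies the odds by e^b (the odds ratio). The probabilities, the coefficient b and the values of a and x are hypothetical.

```python
import math


def odds(p):
    """Odds: probability that y = 1 divided by probability that y is not 1."""
    return p / (1 - p)


for p in (0.2, 0.5, 0.8):
    print(f"p={p}: odds={odds(p):.2f}, log-odds={math.log(odds(p)):+.2f}")

# Hypothetical coefficient b for one independent variable x.
b = 0.7
odds_ratio = math.exp(b)  # factor by which the odds change when x increases by 1
print(f"odds ratio e^b = {odds_ratio:.3f}")

# Check: odds at x + 1 divided by odds at x equals e^b, whatever a and x are.
a, x = -1.0, 2.0
p_x = 1 / (1 + math.exp(-(b * x + a)))
p_x1 = 1 / (1 + math.exp(-(b * (x + 1) + a)))
print(f"{odds(p_x1) / odds(p_x):.3f}")
```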
Q & A!!!!
