
Financial Econometrics Assignment 3

Logistic Regression
Submitted by:
Name: Shruti Gupta
Class: BBA (FIA) 2B
Roll No: 18373

Introduction
When the dependent variable is categorical, the relationship between the dependent
variable and the independent variables can be represented using a logistic regression
model. Using the logistic regression model, the value of the dependent variable can be
predicted from the values of the independent variables.
Categorical Data:
Categorical data is a type of statistical data which consists of categorical variables or
grouped data which can be converted into categorical form. Categorical data is divided into
groups according to the variables present in the data.
Examples of categorical data include:

• Gender, such as male or female
• Income level groups, such as low-income, middle-income or high-income group
• Blood type of a person: A, B, AB or O
• Race of a person
• Educational level of a person, etc.
Categorical variables can be of two types (a brief illustration follows this list):
1. Binary Variable: A categorical variable which can take exactly two values is
termed a binary variable or a dichotomous variable. Examples of binary variables are true or
false, male or female, yes or no, etc.
2. Polychotomous Variable: A categorical variable which can take more than two
possible values is called a polychotomous variable. For example, the blood type of a
person, income levels, etc.
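As a brief illustration (a hypothetical sketch, not part of the original notes, with invented column names), both types of categorical variable can be represented numerically in Python with pandas:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],   # binary variable
    "blood_type": ["A", "O", "AB", "B"],              # polychotomous variable
})

# A binary variable can be coded directly as 0/1
df["gender_binary"] = (df["gender"] == "Female").astype(int)

# A polychotomous variable gets one indicator column per category
print(pd.get_dummies(df["blood_type"]))
```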

Binary Logistic Regression:


Binary logistic regression is a statistical method for predicting binary classes. This
technique is used when the dependent variable is binary in nature, that is, it can
take only two values, such as a yes or a no. It estimates the probability of occurrence of an
event.
Some examples of Logistic regression problems can be:
• To buy or sell a stock?
• Will the car break down or not?
• Should the bank give a loan to the person or not?

Assumptions of Binary Logistic Regression:


The logistic regression model does not require some of the assumptions made under
the classical linear regression model, namely:
1. Under this model, a linear relationship is not required between the dependent and
independent variables. The relationship between the variables can be non-linear, as
the method involves a log transformation.
2. The error or residual terms are not required to be normally distributed.
3. Logistic regression does not require the variances to be homoscedastic for each level of
the independent variables, i.e. homoscedasticity is not required in the data.
4. The independent variables can be ordinal or nominal.
The logistic regression model makes the following assumptions:
1. The binary logistic regression model requires the dependent variable to be binary in
nature, i.e. taking only 2 values.
2. Under this model, the observations should be independent, i.e. the data should not
come from repeated measurements or matched data.
3. There should be no multicollinearity in the data: the independent variables should
not be correlated with one another (a sketch of one common check appears after this
list).
4. The independent variables are linearly related to the log odds. The independent
variables need not be linearly related to the dependent variable, but under the
logistic regression model there is a linear relationship between the independent
variables and the log odds.
5. The model should be fitted correctly: neither overfitting nor underfitting should
occur, and all the meaningful variables should be included in the model.
6. Logistic regression requires large sample sizes, because maximum likelihood
estimates are less powerful than ordinary least squares estimates at small sample sizes.
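Assumption 3 is commonly checked with variance inflation factors (VIFs). A minimal sketch using statsmodels, with hypothetical predictor data and invented variable names:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors, for illustration only
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "age": rng.normal(30, 5, 100),
    "weight": rng.normal(60, 8, 100),
})
X["const"] = 1.0  # VIF computation expects an intercept column

# One VIF per column; values near 1 suggest little multicollinearity,
# large values flag predictors that are strongly correlated with the rest
vifs = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(dict(zip(X.columns, vifs)))
```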

Logistic Regression Vs Linear Regression


There are significant differences between Logistic Regression and Linear Regression. The
Linear Regression models are used to estimate and solve the Regression Problems whereas
the Logistic Regression models are mainly used to estimate and solve the Classification
Problems. However, Logistic Regression Models can also be used for solving Regression
Problems.

The differences in the characteristics of Linear Regression and Logistic Regression are
illustrated below:

1. Linear regression is used to predict a continuous dependent variable from a given
set of independent variables, whereas logistic regression is used to predict a
categorical dependent variable from a given set of independent variables.
2. Linear regression is used to solve regression problems, whereas logistic regression
is used to solve classification problems.
3. Under linear regression, we predict the values of continuous variables, whereas
under logistic regression, we predict the values of categorical variables.
4. Under linear regression, we find the best-fit line in order to predict the output,
whereas under logistic regression, we find the S-curve in order to classify the samples.
5. The least squares method is used for estimation in linear regression, whereas the
maximum likelihood estimation method is used in logistic regression.
6. In linear regression, the relationship between the dependent variable and the
independent variables must be linear, whereas in logistic regression a linear
relationship between the dependent and independent variables is not required.
Limitations of Linear Regression in the case of Categorical Variables:
A linear regression model is not suitable when the dependent variable is categorical
(binary) in nature. A binary dependent variable can assume only two values, i.e. 0 or 1,
just like a dummy variable.
Under Linear Regression, the regression equation of the model is
Yi = β0 + β1Xi + µi
Under the linear regression model, the dependent and independent variables can take any
real value, which causes certain problems when analysing categorical data. These
are stated as follows:
1. In a binary classification problem, we estimate the probability of an outcome
occurring. Probability ranges between 0 and 1, where a probability of 1 means the
event is certain to happen and a probability of 0 means it is certain not to happen.
But in linear regression, we are predicting an absolute number, which can range
outside 0 and 1. Since the linear regression model produces a straight line, some
predicted values may be either less than 0 or more than 1. But a probability value
cannot be greater than 1 or less than 0. Though we can cap any value greater than 1
at 1, and any value lower than 0 at 0, in this case the analysis would not generate
results as accurate as those of a logistic regression model (a short demonstration
follows this list).
2. Since binary classification problems can only have one of two possible values (0 or 1),
the residuals or error terms will not be normally distributed about the regression
line. But under the classical linear regression model, we assume the error terms to
be normally distributed.
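To demonstrate point 1, a minimal sketch (hypothetical data, not from the assignment) comparing the predictions of the two models on a binary outcome:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Invented data: one continuous predictor, binary outcome
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 2, 200) > 5).astype(int)

linear = LinearRegression().fit(X, y)
logistic = LogisticRegression().fit(X, y)

# Evaluate both models beyond the range of the training data
grid = np.linspace(-5, 15, 5).reshape(-1, 1)
print(linear.predict(grid))               # straight line: values fall outside [0, 1]
print(logistic.predict_proba(grid)[:, 1])  # S-curve: always stays within (0, 1)
```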

Due to the above factors, Linear Regression becomes unsuitable in case the
Dependent variable is categorical.

The model can be made suitable if the following two conditions are satisfied:
• The function must always be positive
• The function must be less than 1
Sigmoid Function:

The sigmoid function, which is also called the logistic function, gives an ‘S’-shaped curve
that can take any real-valued number and map it into a value between 0 and 1. As the curve
goes to positive infinity, the predicted y approaches 1, and as the curve goes to negative
infinity, the predicted y approaches 0.

If the output of the sigmoid function is more than 0.5, we can classify the outcome as 1 or
yes, and if it is less than 0.5, we can classify it as 0 or no.
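A minimal Python sketch of the sigmoid function and the 0.5 threshold rule just described (illustrative only):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
p = sigmoid(z)
print(p)          # approaches 0 as z -> -inf and 1 as z -> +inf
print(p >= 0.5)   # classify as 1/yes when the probability is at least 0.5
```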

The linear regression model is

Yi (or p) = β0 + β1Xi

To make the value of the RHS positive and less than 1, we apply the sigmoid function to the
linear regression model.

Taking the exponent of the RHS makes its value strictly positive:

p = e^(β0 + β1Xi)

Dividing by 1 + e^(β0 + β1Xi) then keeps the value below 1:

p = e^(β0 + β1Xi)/(1 + e^(β0 + β1Xi))

After this transformation, the value of the dependent variable is limited between 0 and 1.

To overcome the residual issue, we identify a threshold probability value. If the predicted
probability is more than the threshold value, the event is predicted to happen, and if it is
less than the threshold value, the event is predicted not to happen.
This is the Logistic Regression Function which overcomes the limitations of the linear model.

Logistic Regression:

The logistic regression predicts the dependent variable using the independent variables.

The equation of the logistic regression model is

p = 1/(1 + e^−(β0 + β1Xi))

This can be written as

p = e^(β0 + β1Xi)/(1 + e^(β0 + β1Xi))

p(1 + e^(β0 + β1Xi)) = e^(β0 + β1Xi)

p + p·e^(β0 + β1Xi) = e^(β0 + β1Xi)

p = e^(β0 + β1Xi)·(1 − p)

p/(1 − p) = e^(β0 + β1Xi)

When we take the natural log on both sides, the equation becomes

ln(p/(1 − p)) = β0 + β1Xi

This is another form of the logistic regression equation.

Here p/(1 − p) represents the odds, i.e. the ratio of the probability of the event happening
to the probability of the event not happening; its natural log, ln(p/(1 − p)), is the log of odds.
Though the independent variable is not linearly related to the dependent variable, it is
linearly related to the log of odds, which makes this a linear function of the parameters.
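A small numerical check of this linearity, using hypothetical coefficient values:

```python
import numpy as np

b0, b1 = -2.0, 0.5                # invented coefficients, for illustration only
x = np.array([0.0, 2.0, 4.0, 6.0])

p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))  # logistic model probabilities
log_odds = np.log(p / (1.0 - p))          # recover the log of odds from p

print(log_odds)        # equals b0 + b1 * x: linear in x
print(b0 + b1 * x)
```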

Interpretation of the Coefficients:

Logistic Coefficients: The odds are the ratio of the probability of the event happening to
the probability of the event not happening; the log of odds is the natural log of this ratio.

The slope coefficient is interpreted as the rate of change in the "log odds" as X changes. The
coefficient is used to determine whether a change in a predictor variable makes the event
more likely or less likely. A positive coefficient makes the event more likely and negative
coefficient makes the event less likely. An estimated coefficient near 0 implies that the
effect of the predictor is small.

Log of Odds: If the β1 value is 1.6, it means that a 1 unit change in X1, while the other
independent variables are held at the same level, produces a 1.6 unit change in the log of
odds. If we take the exponential of the change in log odds, we get the odds ratio.
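To make this concrete, using the hypothetical β1 = 1.6 from the example above:

```python
import math

b1 = 1.6                     # slope coefficient from the example above
odds_ratio = math.exp(b1)    # exponentiating the change in log odds
print(odds_ratio)            # ~4.95: each unit increase in X1 multiplies
                             # the odds of the event by about 4.95
```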

Research Problem:
The research problem taken to analyse the application of Logistic regression is to predict
“whether the birthweight of an infant would be low (< 2500 g) or not”. This would be
affected by a number of factors such as the age of the mother, race of the mother, weight of
the mother, whether she smoked during pregnancy or not etc.
This is a logistic regression problem, since the dependent variable, low birthweight, is
binary in nature: it can take only two values, i.e. whether the birthweight of the infant
would be low or not.
Data: For the analysis, a sample data of 189 mothers has been taken
Dependent Variable – Low Birthweight which is labelled as ‘low’
Independent Variables –
“age” – indicates the age of the mother at the time of pregnancy
“smoke” – indicates whether the mother was a smoker or a non-smoker during pregnancy

Binary Predictor:
Here the independent (explanatory) variable is also a binary variable which can take
only 2 values. To analyse this, “smoke”, a binary independent variable, has been
taken. The logistic regression model aims to estimate whether smoking by a pregnant
woman causes low birthweight in infants or not.
Low Birthweight = β0 + β1*(Smoke) + µi
Logistic Coefficients Method:

Since the coefficient of the predictor variable is positive, it indicates that low birthweight
becomes more likely in infants when the mother smokes. The value of the coefficient is
0.7040, which implies that smoking increases the log of odds of low birthweight in infants
by 0.7040.

Odds Ratio:
From the results, we can observe that the odds ratio of the independent variable is
2.0219. This indicates that the odds of low birthweight for mothers who smoke are
almost twice the odds for non-smoking mothers. The p-value is less than 5%,
which implies that the result is statistically significant.
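A sketch of how such estimates could be reproduced in Python with statsmodels, assuming the 189-observation sample is available as a CSV file with the 0/1 columns “low” and “smoke” described above (the file name here is hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# df is assumed to hold the 189-mother sample with 0/1 columns "low" and "smoke"
df = pd.read_csv("lowbwt.csv")        # hypothetical file name

model = smf.logit("low ~ smoke", data=df).fit()
print(model.summary())                # coefficient of smoke: ~0.7040 per the text
print(np.exp(model.params))           # odds ratios; smoke: ~2.02 per the text
```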
Continuous Predictors:
Here the independent (explanatory) variable is a continuous variable which can take
any number of values. To analyse this, “age”, a continuous independent variable, has
been taken. The logistic regression model aims to estimate whether the age of the pregnant
woman affects low birthweight in infants or not.
Low Birthweight = β0 + β1*(Age) + µi

Logistic Coefficients Method:

The coefficient of a continuous predictor is the estimated change in the natural log of the
odds of the reference event for each unit increase in the predictor.
Since the coefficient of the predictor variable is negative, it indicates that low birthweight is
less likely to occur in infants as the age of the mother increases. The value of the coefficient
is -0.0511, which implies that each additional year of the mother's age decreases the log of
odds of low birthweight by 0.0511.
Odds Ratio:

From the results, we can observe that the odds ratio of the independent variable is
0.9501. An odds ratio of 0.95 indicates that each one-year increase in age is associated
with an approximately 5% decrease in the odds of low birthweight in infants.
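The relationship between the reported coefficient and odds ratio can be verified directly:

```python
import math

b1 = -0.0511                 # coefficient of "age" from the results above
odds_ratio = math.exp(b1)
print(odds_ratio)            # ~0.9501, matching the reported odds ratio
print(1 - odds_ratio)        # ~0.05: each extra year of age lowers the
                             # odds of low birthweight by about 5%
```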
This is the margin graph and margin plot of the predictions. On the vertical axis is the
probability of low birthweight in infants, and on the horizontal axis is the age of the mother.
The graphs indicate the probability of low birthweight at various levels of the age of the
mother. Along with the predicted values, the graphs depict the 95% confidence interval of
low birthweight for ages ranging between 25 and 45.
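A sketch of how such a margins-style plot could be drawn in Python (hypothetical; the assignment's own plots appear to come from a statistical package's margins output), reusing df from the earlier snippet:

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

model_age = smf.logit("low ~ age", data=df).fit()

ages = np.arange(25, 46)
X = np.column_stack([np.ones_like(ages, dtype=float), ages])
eta = X @ model_age.params.values                          # linear predictor
se = np.sqrt(np.sum((X @ model_age.cov_params().values) * X, axis=1))

prob = 1 / (1 + np.exp(-eta))                              # predicted probability
lower = 1 / (1 + np.exp(-(eta - 1.96 * se)))               # 95% CI computed on the
upper = 1 / (1 + np.exp(-(eta + 1.96 * se)))               # log-odds scale

plt.plot(ages, prob)
plt.fill_between(ages, lower, upper, alpha=0.3)
plt.xlabel("Age of the mother")
plt.ylabel("Probability of low birthweight")
plt.show()
```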

Practical Application of Logistic Regression:


The objective of Logistic Regression is to develop a mathematical equation that gives us a
score in the range of 0 to 1. This score provides the probability of the variable taking the
value 1.
Logistic regression can be used in a variety of spheres to model real-life problems:
1. Spam Detection: Spam detection is a binary classification problem where we need to
classify whether or not an email is spam. If the email is spam, we label it 1; if it is
not spam, we label it 0. In order to apply logistic regression to the spam detection
problem, the following features of the email are extracted:
• Sender of the email
• Number of typos in the email
• Occurrence of words/phrases like “offer”, “prize”, “free gift”, etc.
The resulting feature vector is then used to train a logistic classifier which emits a
score in the range 0 to 1. If the score is more than the threshold value, say 0.5, we
label the email as spam; otherwise, we don't (a minimal sketch of this pipeline
appears after this list).
2. Fraud Detection: The credit card fraud detection problem is another application of
logistic regression and is of significant importance to the banking industry, because
banks lose hundreds of millions of dollars to fraud each year. When a credit
card transaction happens, the bank makes a note of several factors: for instance, the
date of the transaction, amount, place, type of purchase, etc. Based on these factors,
they develop a logistic regression model to predict whether or not the transaction is
fraudulent.
We may label fraudulent transactions as 1 and legitimate ones as 0. Through the
logistic regression estimates, we may determine whether a transaction is likely to be
fraudulent according to the probability value.

3. Tumour Prediction: A logistic regression model may be used to identify whether a
tumour is malignant or benign. Several medical imaging techniques are used to
extract various features of tumours: for instance, the size of the tumour, the
affected body area, etc. These features are then fed to a logistic regression classifier
to identify whether the tumour is malignant or benign.

4. Subscription Prediction: Logistic regression can be used to predict whether a person
will subscribe to an OTT platform such as Netflix or Hotstar. We can label
subscribing to the platform as 1 and not subscribing as 0. The decision to subscribe
may depend on factors such as the price of the subscription, the types of shows
available, or the rating of the platform. Through this method, we can predict the
probability of a person subscribing to the OTT platform.
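Finally, a minimal, hypothetical sketch of the spam-detection pipeline from application 1, with invented feature values and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [number of typos, count of words like "offer"/"prize"/"free gift",
#           1 if the sender is unknown else 0] -- invented for illustration
X = np.array([
    [8, 5, 1],
    [0, 0, 0],
    [5, 3, 1],
    [1, 0, 0],
    [7, 4, 1],
    [0, 1, 0],
])
y = np.array([1, 0, 1, 0, 1, 0])       # 1 = spam, 0 = not spam

clf = LogisticRegression().fit(X, y)

new_email = np.array([[6, 2, 1]])
score = clf.predict_proba(new_email)[0, 1]   # probability the email is spam
print(score)
print("spam" if score > 0.5 else "not spam")  # the 0.5 threshold rule
```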
