Linear Regression and Logit

Uploaded by pra2112catprep
Copyright © All Rights Reserved

Linear Regression

• Linear Regression Models are used to identify the
relationship between a continuous dependent variable
and one or more independent variables.
• Simple Linear Regression:
• When there is only one independent variable and one
dependent variable.
• Multiple Linear Regression:
• When there is more than one independent variable.
Linear Regression
• $Y = a + b_1X_1 + b_2X_2 + \cdots + b_nX_n$
• Where,
• Y is the dependent variable
• Xs are independent variables
• a is the intercept
• b1, …, bn are slope coefficients; each bi is the expected change in Y for a
one-unit increase in Xi, holding the other Xs constant
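The equation above translates directly into code. The following is a minimal Python sketch; the slides contain no code, so the language and the function names `predict` and `fit_simple_ols` are illustrative assumptions. `predict` evaluates Y for given coefficients. `fit_simple_ols` estimates a and b for the simple (one-variable) case with the standard least-squares formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. Multiple regression needs a matrix solve, which this sketch leaves out.

```python
from typing import Sequence


def predict(a: float, b: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate Y = a + b1*X1 + ... + bn*Xn for one observation."""
    if len(b) != len(x):
        raise ValueError("need one slope coefficient per independent variable")
    return a + sum(bi * xi for bi, xi in zip(b, x))


def fit_simple_ols(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Least-squares intercept a and slope b for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Toy data lying exactly on Y = 1 + 2X, so the fit recovers a = 1 and b = 2.
a, b = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)                                    # 1.0 2.0
print(predict(1.0, [2.0, -0.5], [3.0, 4.0]))   # 1 + 6 - 2 = 5.0
```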
Logistic Regression
(Logit)
• Like linear regression, logistic regression is used to estimate the
relationship between a dependent variable and one or more independent
variables, but it is used to predict a categorical dependent variable
rather than a continuous one.
• A categorical variable can be true or false, yes or no, 1
or 0, etc.
• Logit estimates the probability of an event occurring, such as whether a
person voted or did not vote, based on a given set of independent
variables.
Logistic Regression
(Logit)
• The Logit equation is written as:
• Log Odds of Event = $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_nX_n$
• Where:
• β0 is the intercept.
• β1, β2, …, βn are the coefficients for the predictors X1, X2, …, Xn.
• The term log odds is a way of expressing the likelihood of an event (e.g., loan
defaulting, a person being employed) in a form that can be modeled linearly.
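• Odds: The ratio of the probability that the event happens to the probability
that it does not, p/(1−p), where p is the probability of the event happening.
• Log Odds: The natural logarithm of the odds. Probabilities are confined to
(0, 1), but log odds can take any real value, which is what allows them to be
modeled linearly.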
• Logistic regression predicts log odds, which can be transformed back to
probabilities for interpretation.
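Probability, odds, and log odds convert into each other in a few lines. This is a minimal Python sketch; the function names are illustrative, and `predict_probability` assumes the coefficients β0, …, βn are already known, for example from a fitted model.

```python
import math


def odds(p: float) -> float:
    """Odds of an event with probability p (0 < p < 1): p / (1 - p)."""
    return p / (1.0 - p)


def log_odds(p: float) -> float:
    """Natural logarithm of the odds, also called the logit of p."""
    return math.log(odds(p))


def to_probability(z: float) -> float:
    """Inverse of log_odds: p = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(beta0: float, betas: list[float], xs: list[float]) -> float:
    """Compute log odds = beta0 + sum(beta_i * X_i), then map it back to a probability."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return to_probability(z)


print(odds(0.5), log_odds(0.5))                   # 1.0 0.0 (even odds)
print(round(to_probability(log_odds(0.8)), 6))    # 0.8 (round trip)
```

The split into two branches in `to_probability` is a common way to avoid computing `exp` of a large positive number. Mathematically, both branches give the same value.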
Types of Logit
• Binary logistic regression:
• In this approach, the response or dependent variable is dichotomous in nature,
i.e., it has only two possible outcomes (e.g., 0 or 1).
• Within logistic regression, this is the most commonly used approach, and more
generally, it is one of the most common classifiers for binary classification.
• Example 1: Suppose that we are interested in the factors that influence whether
a political candidate wins an election.
• The outcome (response) variable is binary (0/1): win or lose.
• The predictor variables of interest are:
• the amount of time spent on the campaign,
• the amount of money spent campaigning,
• whether the candidate is an incumbent.
• Example 2: A researcher is interested in how variables, such as GRE (Graduate
Record Exam scores), GPA (grade point average) and prestige of the
undergraduate institution, affect admission into graduate school.
• The outcome variable, admit/don’t admit, is binary.
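For the binary case, the coefficients are estimated by maximizing the log-likelihood of the observed 0/1 outcomes. The Python sketch below uses plain batch gradient ascent because it is transparent. Statistical packages typically use Newton-type or quasi-Newton solvers instead, and they also report standard errors, which this sketch does not. The data are invented for illustration: one hypothetical predictor (for example, campaign effort on some scale) and a win (1) or loss (0) outcome.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X: Sequence[Sequence[float]], y: Sequence[int],
              lr: float = 0.1, epochs: int = 5000) -> list[float]:
    """Fit [beta0, beta1, ..., betan] by batch gradient ascent on the log-likelihood."""
    n = len(X)
    beta = [0.0] * (len(X[0]) + 1)            # beta[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = yi - p                      # gradient w.r.t. beta_j sums err * x_j
            grad[0] += err
            for j, v in enumerate(xi, start=1):
                grad[j] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Invented toy data (not separable, so finite maximum-likelihood estimates exist).
X_toy = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_toy = [0, 0, 1, 0, 1, 1]

beta = fit_logit(X_toy, y_toy)
print("intercept, slope:", [round(b, 3) for b in beta])
print("P(win | x=2):", round(sigmoid(beta[0] + beta[1] * 2.0), 3))
```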
Types of Logit
• Multinomial logistic regression:
• In this type of logistic regression model, the dependent variable has
three or more possible outcomes; however, these values have no
specified order.
• E.g.: movie studios want to predict what genre of film a moviegoer is
likely to see to market films more effectively. A multinomial logistic
regression model can help the studio to determine the strength of
influence a person's age, gender, and dating status may have on the
type of film that they prefer. The studio can then orient an advertising
campaign of a specific movie toward a group of people likely to go see
it.
• The marketing team of an organization can use the model to predict
the likelihood of a customer purchasing a specific product type (Basic,
Standard, or Premium) based on their age, income, and gender.
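The multinomial case can be sketched with the softmax function. Each outcome class gets its own linear score, one class is fixed as the reference (all coefficients zero), and the scores are normalized into probabilities that sum to 1. The coefficients below are hypothetical and serve only to illustrate the Basic/Standard/Premium example. The predictors (age, and income in $10k) are an assumed simplification that leaves out gender.

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn one linear score per class into probabilities that sum to 1."""
    m = max(scores.values())                  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def multinomial_probs(coefs: dict[str, list[float]], x: list[float]) -> dict[str, float]:
    """coefs[k] = [intercept, b1, ..., bn] for class k; the reference class is all zeros."""
    scores = {k: c[0] + sum(b * v for b, v in zip(c[1:], x)) for k, c in coefs.items()}
    return softmax(scores)


# Hypothetical coefficients for (age, income in $10k); "Basic" is the reference class.
coefs = {
    "Basic": [0.0, 0.0, 0.0],
    "Standard": [-1.0, 0.01, 0.2],
    "Premium": [-3.0, 0.02, 0.4],
}
probs = multinomial_probs(coefs, [35.0, 6.0])
print({k: round(p, 3) for k, p in probs.items()})
```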
Types of Logit
• Ordinal logistic regression:
• In this type of logistic regression model, the response variable
has three or more possible outcomes, but in this case these values
have a defined order.
• E.g.: grading scales from A to F or rating scales from 1 to 5.
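One common formulation for ordinal outcomes is the cumulative logit (proportional odds) model. Each cut-point θj gives P(Y ≤ j) = 1 / (1 + e^−(θj − x·β)). A single β is shared across all cut-points, and the probability of each category is the difference between consecutive cumulative probabilities. Sign conventions differ between software packages, so treat this Python code as a sketch under the assumed parameterization above. The thresholds and the coefficient are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordinal_probs(thresholds: list[float], betas: list[float], x: list[float]) -> list[float]:
    """Category probabilities under a proportional-odds (cumulative logit) model.

    Assumes P(Y <= j) = sigmoid(theta_j - x.beta) with strictly increasing thresholds.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(b * v for b, v in zip(betas, x))
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cumulative:
        probs.append(c - prev)                # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        prev = c
    return probs


# Hypothetical 1-5 rating model with one predictor: four cut-points give five categories.
print([round(p, 3) for p in ordinal_probs([-2.0, -0.5, 0.5, 2.0], [0.8], [1.0])])
```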
Some Applications of Logit
• Fraud detection: Logistic regression models can help
teams identify data anomalies, which are predictive of
fraud. Certain behaviors or characteristics may have a
higher association with fraudulent activities, which is
particularly helpful to banking and other financial
institutions in protecting their clients.
• Disease prediction: In medicine, Logit can be used to
predict the likelihood of disease or illness for a given
population. Healthcare organizations can set up
preventative care for individuals who show a higher
propensity for specific illnesses.
Some Applications of Logit
• Churn prediction: Specific behaviors may be
indicative of churn in different functions of an
organization. For example, human resources and
management teams may want to know if there are high
performers within the company who are at risk of
leaving the organization; this type of insight can prompt
conversations to understand problem areas within the
company, such as culture or compensation.
Case Study
• A leading financial institution is striving to improve its loan approval
process by better understanding the risk factors associated with loan
default. Defaulting on a loan not only causes financial losses but also
affects the institution's operational efficiency and reputation. Using
data on borrowers, the institution seeks to develop a predictive model to
identify individuals who are more likely to default on their loans. The
institution has collected data on past loans, including financial,
demographic, and loan-specific attributes of borrowers, and their loan
repayment outcomes (whether they defaulted or not). The goal is to
analyze this dataset and build a model that predicts the probability of
loan default based on borrower characteristics.
• You are required to create a logistic regression model that:
1. Identifies the key predictors of loan default.
2. Provides actionable insights into borrower profiles more likely to default.
Case Study
(Data description)
• The dataset consists of the following features:
1. Income: Annual income of the borrower.
2. Credit Score: Credit score of the borrower, reflecting their
creditworthiness.
3. Employment Status: Employment status of the borrower (0 for
unemployed, 1 for employed).
4. Debt-to-Income Ratio: Ratio of the borrower’s debt payments to
their income.
5. Loan Amount: Amount of loan requested by the borrower.
6. Age: Age of the borrower.
7. Loan Default: The target variable (1 for default, 0 for no default).
Results
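• A p-value below 0.05 indicates statistical significance at the 5% level (95% confidence level).
This means the data provide evidence that the variable's coefficient differs from zero, i.e.,
that the variable is associated with the dependent variable.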
• Income:
• Interpretation: Measures the effect of a one-unit increase in income (e.g., 1
dollar) on the log odds of loan default. The negative coefficient (−0.0002)
suggests that higher income reduces the likelihood of default.
• p-value: 0.1320. This is not statistically significant (p>0.05), meaning the
effect of income on loan default is not conclusive in this model.
• Credit Score:
• Interpretation: Measures the effect of a one-point increase in credit score on
the log odds of loan default. The negative coefficient (−0.0638) suggests
higher credit scores reduce the likelihood of default.
• p-value: 0.1020. This is close to being statistically significant but not below
the 0.05 threshold.
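Logistic regression software commonly reports a Wald z statistic for each coefficient (the coefficient divided by its standard error). It also reports a two-sided p-value from the standard normal distribution, which is then compared with the 0.05 threshold. The Python sketch below shows that calculation. The slides report only coefficients and p-values, so the coefficient and standard error in the example are hypothetical. The two-sided p-value 2·(1 − Φ(|z|)) is computed as `math.erfc(|z| / √2)`.

```python
import math


def wald_p_value(coef: float, std_err: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, using the Wald z statistic."""
    z = coef / std_err
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)), with Phi the standard normal CDF.
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """True when the p-value falls below the chosen significance level."""
    return p_value < alpha


# Hypothetical coefficient and standard error (not taken from the case study): z = 2.5.
p = wald_p_value(0.5, 0.2)
print(round(p, 4), is_significant(p))
```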
Results
• Employment Status
• Interpretation: Employment status is encoded as 0
(unemployed) and 1 (employed). The negative coefficient
(−0.2967) suggests that being employed might reduce
the likelihood of default, but the estimate is very imprecise.
• p-value: 0.9380. This indicates no significant effect of
employment status on loan default.
• Interpreting coefficients of dummy variables:
• A positive coefficient suggests that when the dummy variable is 1
(as opposed to 0), the log odds of the outcome (e.g., loan default) are
expected to increase, holding the other predictors constant.
• A negative coefficient suggests that when the dummy variable is 1
(as opposed to 0), the log odds of the outcome are expected to decrease.
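To make the dummy-variable rule concrete: switching a dummy from 0 to 1 adds its coefficient to the log odds, which multiplies the odds by e^coefficient. This Python sketch applies the reported Employment Status coefficient (−0.2967) to a hypothetical 20% default probability for an unemployed borrower. The baseline is invented because the slides do not report an intercept. Given the p-value of 0.9380, the result only illustrates the arithmetic and does not show a reliable effect.

```python
import math


def logit(p: float) -> float:
    """Log odds of probability p."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Probability corresponding to log odds z."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_when_dummy_is_one(p_when_zero: float, dummy_coef: float) -> float:
    """Holding other predictors fixed, switching the dummy 0 -> 1 adds dummy_coef to the log odds."""
    return inv_logit(logit(p_when_zero) + dummy_coef)


EMPLOYED_COEF = -0.2967   # reported coefficient for Employment Status (1 = employed)
baseline = 0.20           # hypothetical default probability for an unemployed borrower

print(round(prob_when_dummy_is_one(baseline, EMPLOYED_COEF), 3))
print(round(math.exp(EMPLOYED_COEF), 3))   # odds ratio, employed vs unemployed
```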
Results
• Debt-to-Income Ratio
• Interpretation: Reflects the effect of a one-unit increase in the debt-to-income
ratio on the log odds of default (if the ratio is recorded as a fraction, one unit
is a large change, e.g., from 0.3 to 1.3). The positive coefficient
(5.75555) suggests that higher ratios might increase default
likelihood.
• p-value: 0.6830. This is not statistically significant.
• Loan Amount
• Interpretation: Reflects the effect of a one-unit increase in loan
amount (e.g., 1 dollar) on the log odds of default. The negative
coefficient (−0.0002) suggests larger loan amounts might reduce
default likelihood.
• p-value: 0.2840. This indicates no significant effect of loan amount
on default.
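The income and loan amount coefficients are expressed per dollar, so they are easier to read on a larger scale. Because log odds are linear in the predictor, an increase of Δ dollars changes the log odds by Δ·β and multiplies the odds by e^(Δ·β). This Python sketch applies that to the reported −0.0002. Because the reported value is rounded to four decimals, the rescaled figures are only approximate. Neither coefficient is statistically significant.

```python
import math

# Reported log-odds coefficients per $1 (rounded to four decimals in the Results slides).
PER_DOLLAR_COEFS = {
    "Income": -0.0002,
    "Loan Amount": -0.0002,
}


def odds_multiplier(coef: float, delta: float) -> float:
    """Factor by which the odds of default change when the predictor rises by delta units."""
    return math.exp(coef * delta)


for name, coef in PER_DOLLAR_COEFS.items():
    for delta in (1, 1_000, 10_000):
        change = coef * delta
        print(f"{name} +${delta:,}: log odds {change:+.4f}, odds x {odds_multiplier(coef, delta):.4f}")
```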
Results
• Age
• Interpretation: Reflects the effect of a one-year increase in
age on the log odds of default. The negative coefficient
(−0.1643) suggests older individuals are less likely to
default: each additional year multiplies the odds of default
by about e^(−0.1643) ≈ 0.85.
• p-value: 0.3540. This is not statistically significant.
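The six results above can be summarized in one table. This Python sketch collects the reported coefficients and p-values and converts each coefficient to an odds ratio (e^coefficient) per one-unit increase. It flags significance at the slides' 0.05 threshold. Because the intercept is not reported, predicted probabilities cannot be computed from these numbers alone. Only relative effects are shown.

```python
import math

# Coefficients and p-values as reported in the Results slides (intercept not reported).
RESULTS = {
    "Income": (-0.0002, 0.1320),
    "Credit Score": (-0.0638, 0.1020),
    "Employment Status": (-0.2967, 0.9380),
    "Debt-to-Income Ratio": (5.75555, 0.6830),
    "Loan Amount": (-0.0002, 0.2840),
    "Age": (-0.1643, 0.3540),
}

ALPHA = 0.05


def summarize(results: dict[str, tuple[float, float]],
              alpha: float = ALPHA) -> list[tuple[str, float, float, bool]]:
    """Return (name, coefficient, odds ratio per one-unit increase, significant?) rows."""
    rows = []
    for name, (coef, p) in results.items():
        rows.append((name, coef, math.exp(coef), p < alpha))
    return rows


for name, coef, odds_ratio, sig in summarize(RESULTS):
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name:22s} coef={coef:+.4f}  OR={odds_ratio:.4f}  {direction} odds  significant={sig}")
```

None of the six predictors reaches significance at 0.05, which matches the interpretations above.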
