Week 6 Notes

Uploaded by Rama Bhushan

Introduction

• Limited dependent variable modeling: background and motivation
• OLS approach: linear probability models (LPMs)
• Issues with LPM models
• Introduction to logit/probit models
• Understanding the logit function
Introduction

• Thresholding
• Confusion/classification Matrix
• Receiver operating characteristic (ROC) curve
• Parameter interpretation
• Summary and concluding remarks
Background and Motivation
Limited Dependent Variable/Qualitative
Response Regression
Discrete choice variables, limited dependent variables, or qualitative response
variables are not suitable for modeling through linear regression models
Consider the following questions
• Why do firms choose to list their stocks on NSE vs. BSE?
• Why do some stocks pay dividends and others do not?
• What factors affect large corporate borrowers to default?
• What factors affect choices of internal vs. external financing?
Limited Dependent Variable/Qualitative
Response Regression
Credit default scoring (classification problem)
Linear Probability Model (LPM)
Linear Probability Model (LPM)

• In such models, the dependent variable is a Yes/No or 1/0 kind of variable
• First, we will examine a simple linear regression approach to deal with such models: the linear probability model (LPM)
• This is the simplest approach to dealing with binary dependent variables
• It is based on the assumption that the probability of an event (Pi) is linearly related to a set of explanatory variables x2i, x3i, …, xki
• Pi = P(yi = 1) = β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui,  i = 1, …, N
Linear Probability Model (LPM)

In such models, the actual probabilities cannot be observed; the observed values of the dependent variable are 0s and 1s
• Consider the relationship between the size of a company "i" and its ability to pay dividends
Yi = β1 + β2·Xi + ui
where Xi = market capitalization of the firm, and Yi = 1 if the dividend is paid and 0 if the dividend is not paid.
Linear Probability Model (LPM)

In such models, the actual probabilities cannot be observed; the observed values of the dependent variable are 0s and 1s
• This is called the linear probability model. The conditional expectation of Yi given Xi, i.e., E(Yi | Xi), can be interpreted as the probability that the event will occur given Xi: that is, P(Yi = 1 | Xi)
• E(Yi | Xi) = β1 + β2·Xi (assuming E(ui) = 0)
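As a sketch of how an LPM is fitted, the dividend example above can be estimated by OLS on the 0/1 outcome. The data below are hypothetical (market capitalization in $ million), not from the source; the point is that fitted "probabilities" can leave the [0, 1] interval.

```python
# Minimal sketch of a one-regressor linear probability model (LPM),
# estimated by ordinary least squares. Data are hypothetical.
def ols_slope_intercept(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # OLS slope = cov(x, y) / var(x); intercept recovered from the means
    b2 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b1 = my - b2 * mx
    return b1, b2

# x: market capitalization ($ million, made up); y: 1 if a dividend is paid
x = [10, 20, 40, 60, 80, 100]
y = [0, 0, 0, 1, 1, 1]
b1, b2 = ols_slope_intercept(x, y)
p_hat = [b1 + b2 * xi for xi in x]  # fitted "probabilities" (may fall outside [0, 1])
```

Note that the fitted value for the smallest firm is already negative, which previews the boundedness problem discussed in the next section.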
Summary
Issues with LPM
Issues with LPM

Non-normality and heteroscedasticity of error terms
• Yi has the following distribution:
E(Yi | Xi) = 0×(1 − Pi) + 1×(Pi) = Pi
• This kind of model has a number of econometric issues
• What is the nature of the errors ui = Yi − β1 − β2·Xi?
Issues with LPM

Non-normality and heteroscedasticity of error terms
• ui is not normally distributed, although in large samples this is not a problem
• The ui are heteroscedastic, i.e., their variance Var(ui) = Pi(1 − Pi) varies across observations
Issues with LPM

Nonfulfillment of 0 ≤ E(Yi | X) ≤ 1
• Yi = −0.3 + 0.012·Xi, where Xi is in million dollars
• For every $1 million increase in size, the probability that the firm will pay a dividend increases by 1.2%
• However, for X < $25 million the fitted probability is less than 0, and for X > $108 million it is more than 1
Issues with LPM

Nonfulfillment of 0 ≤ E(Yi | X) ≤ 1
• What to do: set all negative values to 0 and all those greater than 1 to 1?
• It is implausible to suggest that small firms will never pay dividends and large firms will always pay dividends
Issues with LPM

Diminishing utility of R² as a goodness-of-fit measure
• All the Y values lie on one of the lines Y = 0 or Y = 1
• The conventional LPM is not expected to fit such observations well, except in cases where all the observations are scattered closely around points A and B
• Both logit and probit approaches overcome the limitation of the LPM that it produces values less than 0 and more than 1
Introduction to Logit Model
Introduction to Logit Model

The logit (and probit) approaches overcome the limitations of the regression model by transforming it with a function so that fitted values are bounded within the (0, 1) interval
• The fitted function looks like an S-shaped curve
• The logistic function for a random variable zi is:
F(zi) = e^(zi) / (1 + e^(zi)) = 1 / (1 + e^(−zi))
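A quick numeric check (sketch) that the two forms of the logistic function above coincide and stay strictly inside (0, 1):

```python
import math

def logistic_a(z):
    # F(z) = e^z / (1 + e^z)
    return math.exp(z) / (1 + math.exp(z))

def logistic_b(z):
    # F(z) = 1 / (1 + e^-z)
    return 1 / (1 + math.exp(-z))

# identical for any z, and bounded strictly within (0, 1)
checks = [(z, logistic_a(z), logistic_b(z)) for z in (-5.0, -1.0, 0.0, 1.0, 5.0)]
```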
Introduction to Logit Model

The logit (and probit) approaches overcome the limitations of the regression model by transforming it with a function so that fitted values are bounded within the (0, 1) interval
• Here F is the cumulative logistic distribution
• The final logit model:
Pi(yi = 1) = 1 / (1 + e^(−(β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui)))
Introduction to Logit Model
Pi(yi = 1) = 1 / (1 + e^(−(β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui)))

• The model asymptotically touches 0 (z → −∞) and 1 (z → ∞)
• This model is non-linear, and hence not amenable to OLS estimation
• The model predicts a probability, e.g., the probability of bank loan default (dependent variable = y)
Introduction to Logit Model
Pi(yi = 1) = 1 / (1 + e^(−(β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui)))

• Given P(y = 1), then P(y = 0) = 1 − P(y = 1)
• Here the independent variables are x2i, x3i, x4i, x5i, and so on
• This is essentially a non-linear transformation of the model to produce consistent probability results
Understanding the Logit Function
Understanding the Logit Function
Pi(yi = 1) = 1 / (1 + e^(−(β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui)))

• Extremely low and negative values of the linear function β1 + β2·x2i + β3·x3i + ⋯ + βk·xki predict no dividend (or non-default cases) with a high probability, i.e., Pi(yi = 0)
Understanding the Logit Function
Pi(yi = 1) = 1 / (1 + e^(−(β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui)))

• Extremely high and positive values of the linear function β1 + β2·x2i + β3·x3i + ⋯ + βk·xki predict dividend payment (or default cases) with a high probability, i.e., Pi(yi = 1)
Understanding the Logit Function
Pi(yi = 1) = 1 / (1 + e^(−(β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui)))

• This can also be expressed in the form of odds:
• Odds = P(y = 1) / P(y = 0)
• Odds > 1 if y = 1 is more likely
• Odds < 1 if y = 0 is more likely
Understanding the Logit Function
Pi(yi = 1) = 1 / (1 + e^(−(β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui)))

• If we substitute the logit function into the odds equation, then
• Odds = exp(β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui), or
• ln(Odds) = β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui
• The higher this logit (or ln(Odds)) value, the higher the probability Pi(yi = 1)
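A small sketch confirming that ln(Odds) recovers the linear index, as stated above (the value of z is illustrative):

```python
import math

def logit_prob(z):
    # P(y = 1) for a linear index z
    return 1 / (1 + math.exp(-z))

z = 0.4                      # hypothetical value of b1 + b2*x2 + ...
p = logit_prob(z)
odds = p / (1 - p)           # Odds = P(y = 1) / P(y = 0)
log_odds = math.log(odds)    # recovers z exactly
```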
Thresholding
Thresholding

The outcome of the regression model is a probability
• In real life, you would want to make a binary prediction, e.g., default or no default
• For this, we may consider a threshold value "t"
• If P(Default = 1) ≥ t, then predict a default case
• If P(Default = 1) < t, then predict a non-default case
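The thresholding rule can be sketched as follows (the probabilities are illustrative):

```python
def classify(probs, t):
    # predict default (1) when P(default) >= t, otherwise non-default (0)
    return [1 if p >= t else 0 for p in probs]

probs = [0.10, 0.45, 0.55, 0.90]
preds_mid = classify(probs, 0.5)   # [0, 0, 1, 1]
preds_high = classify(probs, 0.8)  # higher t -> fewer predicted defaults: [0, 0, 0, 1]
```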
Thresholding

What value should we select for "t"? What kind of error do you prefer?
• Given a t value, one can make two types of errors: (1) predict default, but the actual outcome is non-default: a false positive; and (2) predict non-default, but the actual outcome is default: a false negative
• A large threshold (e.g., t = 0.8) gives a very small probability of predicting defaulters and, at the same time, a high probability of predicting cases as non-defaulters
Thresholding

What value should we select for "t"? What kind of error do you prefer?
• A small threshold (e.g., t = 0.1) gives a very large probability of predicting defaulters and, at the same time, a small probability of predicting cases as non-defaulters
• An aggressive bank would like to have high t values to increase the possibility of converting a loan
Thresholding

What value should we select for "t"? What kind of error do you prefer?
• A more conservative bank may choose a very low t value to select only those loan applications with a very low probability of default
• In the absence of any specific preference, t = 0.5 is the natural default value
Classification Matrix
Selecting a Threshold:
Confusion/Classification Matrix
Predicted = 0 (Non-Default)  Predicted = 1 (Default)
Actual = 0  True Negatives (TN)  False Positives (FP)
Actual = 1  False Negatives (FN)  True Positives (TP)
Let us compute two outcome measures to determine what kind of errors we are making
• Sensitivity = TP / (TP + FN) = TP rate
• Specificity = TN / (TN + FP) = TN rate
Selecting a Threshold:
Confusion/Classification Matrix
Let us compute two outcome measures to determine what kind of errors we are making
• Sensitivity = TP / (TP + FN) = TP rate
• Specificity = TN / (TN + FP) = TN rate

• A model with higher t will have lower sensitivity and higher specificity
• A model with lower t will have higher sensitivity and lower specificity
Selecting a Threshold:
Confusion/Classification Matrix
• Overall accuracy = (TN + TP) / N, where N = number of observations
• Overall error rate = (FP + FN) / N
• False negative error rate = FN / (TP + FN)
• False positive error rate = FP / (TN + FP)
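The four measures above can be computed directly from a confusion matrix; a sketch with hypothetical actual/predicted vectors:

```python
def confusion_metrics(actual, pred):
    # tabulate the four cells of the confusion/classification matrix
    tp = sum(1 for a, p in zip(actual, pred) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, pred) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, pred) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, pred) if a == 1 and p == 0)
    n = len(actual)
    return {
        "sensitivity": tp / (tp + fn),   # TP rate
        "specificity": tn / (tn + fp),   # TN rate
        "accuracy": (tp + tn) / n,
        "error_rate": (fp + fn) / n,
    }

actual = [1, 1, 0, 0, 1, 0]
pred   = [1, 0, 0, 1, 1, 0]
m = confusion_metrics(actual, pred)
```

Accuracy and the overall error rate always sum to one, since every observation is either classified correctly or not.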
Receiver Operating Characteristic
(ROC) Curve
Receiver Operating Characteristic (ROC) Curve

• True positive (TP) rate on the y-axis, i.e., the proportion of defaults correctly predicted
• False positive (FP) rate on the x-axis, i.e., the proportion of non-defaults incorrectly predicted as default cases
• The curve shows how these two measures vary with different threshold values
Receiver Operating Characteristic (ROC) Curve

• For t = 1, TP rate = 0 and FP rate = 0 → the model will not predict any default cases but will correctly predict all the non-default cases
• For t = 0, TP rate = 1 and FP rate = 1 → the model will correctly predict all the default cases but will incorrectly predict all the non-default cases as defaults
• As we move from t = 1 to t = 0, different combinations of TP and FP rates are obtained
• As we move from t = 1 to t = 0, different
combinations of TP and FP are obtained
Receiver Operating Characteristic (ROC) Curve
• The ROC curve captures the complete threshold behavior
• High threshold: high specificity and low sensitivity
• Low threshold: low specificity and high sensitivity
• Thus, it is a tradeoff between the cost of failing to detect default cases vs. incorrectly classifying non-default cases as defaulters
Receiver Operating Characteristic (ROC) Curve
• A 100% area under the curve (AUC) indicates complete accuracy, i.e., all the observations are correctly identified: TP rate = 1 and FP rate = 0
• A 50% AUC indicates random guessing, that is, TP rate = 0.5 and FP rate = 0.5
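A minimal sketch of how ROC points are traced out by sweeping the threshold. The data are hypothetical, chosen so the classifier separates the two classes perfectly, which makes the curve pass through the ideal point (FP rate 0, TP rate 1):

```python
def roc_points(actual, probs, thresholds):
    # one (FP rate, TP rate) point per threshold value
    pos = sum(actual)
    neg = len(actual) - pos
    pts = []
    for t in thresholds:
        pred = [1 if p >= t else 0 for p in probs]
        tp = sum(1 for a, q in zip(actual, pred) if a == 1 and q == 1)
        fp = sum(1 for a, q in zip(actual, pred) if a == 0 and q == 1)
        pts.append((fp / neg, tp / pos))
    return pts

actual = [0, 0, 1, 1]
probs = [0.1, 0.2, 0.8, 0.9]          # perfectly separated classes
pts = roc_points(actual, probs, [i / 10 for i in range(11)])
```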
Parameter Interpretation
Parameter Interpretation
Parameter Interpretation
Unlike the LPM, it is incorrect to state that a 1-unit increase in x2i will cause a 100·β2 % increase in the probability of yi = 1
• For the logit model, we calculate dPi/dx2i; this works out to β2·F(zi)·(1 − F(zi))
• So, a 1-unit increase in x2i will increase the probability of yi = 1 by β2·F(zi)·(1 − F(zi))
• Usually, these marginal/incremental impacts are evaluated at mean values
Parameter Interpretation
Example: Pi(yi = 1) = 1 / (1 + e^(−(β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui)))
• F(zi) = P̂i = 1 / (1 + e^(−(0.1 + 0.3·x2i − 0.6·x3i + 0.9·x4i)))
• β1 = 0.1; β2 = 0.3; β3 = −0.6; β4 = 0.9
• What is F(zi), given x̄2 = 1.6, x̄3 = 0.20, and x̄4 = 0.10?
• Marginal effect of x2i = β2·F(zi)·(1 − F(zi))
Parameter Interpretation
Example: F(zi) = P̂i = 1 / (1 + e^(−(0.1 + 0.3·x̄2 − 0.6·x̄3 + 0.9·x̄4))) = 1 / (1 + e^(−0.55)) = 0.63

• Thus, a 1-unit increase in x2i will increase the probability of yi = 1 by 0.3 × 0.63 × (1 − 0.63) ≈ 0.07
• Similarly, for x3i the effect is −0.6 × 0.63 × (1 − 0.63), and for x4i it is 0.9 × 0.63 × (1 − 0.63)
• These are also called marginal effects
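The arithmetic in this example can be verified directly (coefficients and mean values as given on the slide):

```python
import math

# the slide's example: evaluate z at the mean values of the regressors
b1, b2, b3, b4 = 0.1, 0.3, -0.6, 0.9
x2, x3, x4 = 1.6, 0.20, 0.10

z = b1 + b2 * x2 + b3 * x3 + b4 * x4   # = 0.55
F = 1 / (1 + math.exp(-z))             # ~ 0.63
me_x2 = b2 * F * (1 - F)               # ~ 0.07
me_x3 = b3 * F * (1 - F)               # negative: x3 lowers P(y = 1)
```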
Probit Model
Maximum Likelihood Estimation (MLE)
Goodness-of-Fit Measures
Probit Model

• The probit model uses the cumulative normal distribution:
F(zi) = (1/√(2π)) ∫ from −∞ to zi of e^(−z²/2) dz
• The model asymptotically touches 0 (z → −∞) and 1 (z → ∞)
• The marginal impact of a unit change in an explanatory variable x2i is given as β2·f(zi), where f is the standard normal density, β2 is the parameter attached to x2i, and zi = β1 + β2·x2i + β3·x3i + ⋯ + βk·xki + ui
• Both logit and probit models give similar results; differences may occur when the data are extremely imbalanced
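The probit CDF can be written with the error function; a quick sketch comparing it with the logit CDF shows both are S-shaped, bounded in (0, 1), and pass through 0.5 at z = 0:

```python
import math

def logit_cdf(z):
    return 1 / (1 + math.exp(-z))

def probit_cdf(z):
    # standard normal CDF, expressed via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

vals = [(probit_cdf(z), logit_cdf(z)) for z in (-3.0, 0.0, 3.0)]
```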
Maximum Likelihood Estimation (MLE) of
Logit/Probit Models
These are non-linear models and hence cannot be estimated with the simple OLS method
• They are estimated with MLE
• In MLE, parameters are chosen to maximize a log-likelihood function
• The log-likelihood function yields the parameter estimates that maximize the joint probability of the observed sample
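The MLE idea can be sketched with a toy logit log-likelihood and a crude grid search over a single slope (data and grid are hypothetical; real software maximizes over all parameters with numerical optimizers):

```python
import math

# Log-likelihood of a logit model: sum_i [ y_i*ln(p_i) + (1 - y_i)*ln(1 - p_i) ].
# MLE chooses the betas that maximize it; here we grid-search b2 with b1 = 0.
def log_likelihood(b1, b2, x, y):
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(b1 + b2 * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

x = [-2, -1, 0, 1, 2]
y = [0, 1, 0, 1, 1]                     # toy binary outcomes
grid = [b / 10 for b in range(1, 51)]   # candidate b2 values: 0.1 ... 5.0
best_b2 = max(grid, key=lambda b2: log_likelihood(0.0, b2, x, y))
```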
Goodness-of-Fit Measures

Conventional R² and adjusted-R² measures do not work well with these models
MLE aims to maximize the log-likelihood function (LLF); it does not minimize the RSS
Alternative measures include:
(1) % of yi values correctly predicted
(2) % of yi = 1 values correctly predicted + % of yi = 0 values correctly predicted
Goodness-of-Fit Measures

Conventional R² and adjusted-R² measures do not work well
(3) Pseudo-R² = 1 − LLF/LLF0, where LLF is the maximized value of the log-likelihood function for the logit or probit model, and LLF0 is the value of the log-likelihood function for a restricted model
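A one-line sketch of the pseudo-R² formula above, with hypothetical log-likelihood values:

```python
# Pseudo-R^2 = 1 - LLF / LLF0. Both log-likelihoods are negative, with
# LLF >= LLF0, so the measure lies between 0 and 1. Values are hypothetical.
def pseudo_r2(llf, llf0):
    return 1 - llf / llf0

r2 = pseudo_r2(-40.0, -65.0)   # fitted LLF = -40, restricted LLF0 = -65
```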
Summary and Concluding Remarks
Summary and Concluding Remarks

• Among supervised learning algorithms, classification is a very important tool in the finance domain, employed for applications such as credit scoring of loan applications
• Classification algorithms are very often implemented through the logit/probit class of models; these are simple yet powerful models
• These models address a number of shortcomings of linear probability models: (a) non-normality and heteroscedasticity of error terms; (b) fitted values of the dependent variable (probability) falling outside the 0–1 range; and (c) diminishing utility of conventional goodness-of-fit measures (e.g., R²)
Summary and Concluding Remarks

• Limited dependent variable models (e.g., the logit model) employ cumulative probability functions (e.g., the logistic function)
• These models, although non-linear, are very useful for modeling limited dependent variables that are probabilistic in nature
• In the case of the logit model, the logit function is essentially the log of the odds ratio
• Since the estimated variable is in the form of probabilities, a thresholding process is needed to convert these probabilities into limited outcomes (e.g., Yes/No)
Summary and Concluding Remarks

• The conventional measures of goodness-of-fit (e.g., R²) are not very useful for such models
• Instead, these models are evaluated on their ability to classify observations correctly
• For this purpose, a confusion/classification matrix is often employed
• The receiver operating characteristic (ROC) curve provides another useful tool to examine the efficiency of these models, and also facilitates the selection of threshold values
Summary and Concluding Remarks

• Unlike simple linear models, the parameter estimates are interpreted in a different manner
• Marginal effects are computed to interpret the coefficients and their relationship with the dependent variable
• Other models (e.g., the probit model) are identical in all other respects, except that a different cumulative probability function is used (the normal distribution in the case of probit)
• Since the model is non-linear in nature, OLS cannot be employed for estimation; the maximum likelihood method is typically employed to estimate these models
Thanks!
Introduction

• Application of classification algorithms to the prediction of security prices
• Revisiting the ABC case study
• Logit/Probit modeling
• Training the model and testing the model
• Model performance evaluation
• Summary and concluding remarks
Case Study: ABC Stock Price
Forecasting
Case Study: Stock Price Prediction

• Stock price prediction or stock return prediction is an attempt to determine the future value of a company based on an analysis of factors that impact its price movement
• There are a number of factors that help in predicting stock prices
• These can be macroeconomic factors like the state of the country's economy, growth rate, inflation, etc.
• There are also factors that are more specific to a stock, like profit margin, debt-to-equity ratio, sales of a company, etc.
Case Study: Stock Price Prediction

We are given stock market price data for ABC company, along with the Nifty and Sensex (market indices). We are also given data on dividend announcements and a sentiment index.

Date        Price    ABC       Sensex    Dividend Announced   Sentiment   Nifty
03-01-2007 718.15 0.079925 0.073772 0 0.048936 0.095816
04-01-2007 712.9 –0.00731 0.021562 0 –0.05504 0.009706
05-01-2007 730 0.023987 –0.02441 0 0.019135 –0.03221
06-01-2007 788.35 0.079932 0.012046 0 0.080355 0.011205
07-01-2007 851.4 0.079977 –0.0013 0 0.094038 –0.0004
10-01-2007 919.5 0.079986 0.019191 1 0.015229 0.030168
11-01-2007 880 –0.04296 –0.04025 0 –0.07217 –0.04966
12-01-2007 893.75 0.015625 0.036799 0 0.01396 0.020999
13-01-2007 875 –0.02098 –0.00845 0 0.057518 –0.01164
14-01-2007 891 0.018286 0.004858 1 0.008828 0.020714
17-01-2007 819.75 –0.07997 –0.01228 0 –0.12395 –0.00962
…… …… …… …… …… …… ……
…… …… …… …… …… …… ……
Case Study: Stock Price Prediction

• Consider a portfolio manager who has built a model for a particular stock
• The manager wants to predict whether the ABC stock price returns will go up or down in the next period
• The data start from 2007 and go till 2019, so we have approximately 13 years of data
• We have the daily returns of ABC, i.e., the change in the price of ABC, in column B. Next, we have the daily return on Sensex in column C and the daily return on Nifty in column D.
Case Study: Stock Price Prediction

• Sensex and Nifty are the two main stock indices used in India
• They are benchmark Indian stock market indices that represent the weighted average of the largest Indian companies
• Sensex represents a weighted average of the 30 largest and most actively traded Indian companies
• Similarly, Nifty represents a weighted average of the 50 largest Indian companies
Summary

The following tasks need to be performed
• Create a dummy variable that is 1 when stock prices go up and 0 when stock prices go down
• Segregate the data into test and train datasets
• Train and build the model using simple logit/probit classification algorithms, using the market index as the independent variable and the up/down dummy as the dependent variable
Summary

The following tasks need to be performed
• Evaluate the in-sample performance and out-of-sample performance of the model
• Compute the marginal effects of the independent variable
• Visualize the performance of these models using the ROC curve
• Examine the classification accuracy of the model and compare it with a similar linear probability model

Data Input and Exploration
Data Input and Exploration

• In this video, we will start with the implementation of the classification algorithms using the ABC case study data
• First, we will set the working directory; then we will read the data
• Lastly, we will create the binary response variable: '1' for positive returns and '0' for negative returns
Summary

• We started our analysis by setting the working directory
• Next, we loaded the relevant package libraries
• Then we read the data from the working directory
• Lastly, we created a new 'updown' binary response variable, which is '1' when returns are positive and '0' when returns are negative
Creation of Test and Train Datasets
Creation of Test and Train Datasets

• In this video, we will create the test and train sample datasets
• Then we will examine the distribution of our binary response
variable in 1’s and 0’s
Summary

• First, we filtered the observations after 2006 and cleaned our data
• Next, we randomly selected 80% of the observations as the training dataset and the remaining 20% as the test dataset
• Lastly, we examined the proportion of 1's and 0's in the parent dataset, test dataset, and train dataset
• The distribution of 1's and 0's is fairly similar across all three datasets
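The 80/20 split and the proportion check described above can be sketched as follows. The returns are synthetic stand-ins generated for illustration; the lecture's own dataset is not reproduced here.

```python
import random

# synthetic daily returns standing in for the ABC series
random.seed(42)
returns = [random.uniform(-0.05, 0.05) for _ in range(1000)]
updown = [1 if r > 0 else 0 for r in returns]   # the binary response

# random 80/20 split into train and test indices
idx = list(range(len(updown)))
random.shuffle(idx)
cut = int(0.8 * len(idx))
train_idx, test_idx = idx[:cut], idx[cut:]

def prop_ones(ids):
    # share of 1's (up days) among the given observation indices
    return sum(updown[i] for i in ids) / len(ids)

# proportions of 1's should be similar across parent, train, and test sets
p_all, p_train, p_test = prop_ones(idx), prop_ones(train_idx), prop_ones(test_idx)
```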
Training the Linear Probability Model
(LPM) Algorithm
Training the LPM Algorithm

• In this video, we will train an LPM algorithm with the training dataset
• Next, we will compute the classification/confusion matrix
• Finally, using the classification/confusion matrix, we will compute various performance measures, i.e., accuracy, specificity, and sensitivity
Summary

• We trained an LPM algorithm using the training dataset
• We converted the fitted values into 1's and 0's using threshold values of 0.4, 0.6, and 0.8
• Lastly, using the classification/confusion matrix, we computed three performance parameters, namely accuracy, specificity, and sensitivity
Training the Logit/Probit Algorithms
Training the Logit/Probit Algorithms

• In this video, we will train the logit/probit classification algorithms using the training dataset
• Next, we will compute the in-sample performance evaluation measures
• We will also compute the marginal effects of the independent variable on the dependent variable
• Lastly, we will evaluate and compare the performance of these algorithms on the parameters of accuracy, specificity, and sensitivity
Summary

• We trained our classification algorithms using the training dataset
• Next, we computed the pseudo-R² measure and also computed the marginal effects
• Lastly, we evaluated the performance of these algorithms on the three parameters of sensitivity, specificity, and accuracy, using the classification matrix at threshold values of 0.4, 0.6, and 0.8
• The performances of all the algorithms appear to be close to each other; this is ascribed to the fairly symmetric distribution of 1's and 0's in the training dataset
Visualizing the Performance
Visualizing the Performance

• In this video, we will compare the performance of the three trained classification algorithms (linear, logit, and probit objects) using a correlation measure and through visualization
Summary

• We computed the correlations across the fitted values of the three classification algorithms (linear, logit, and probit)
• The correlations appear to be very high
• Next, we visualized the performance of the algorithms on the parameters of accuracy, sensitivity, and specificity for the three threshold values of 0.4, 0.6, and 0.8
• While the performances of these algorithms appear to be close, the logit model appears to offer the best fit, followed by the probit and then the linear model
Receiver Operating Characteristic
(ROC) Curve
ROC Curve

• In this video, we will compare the performance of the three trained classification algorithms (linear, logit, and probit objects) with the help of the ROC curve
Summary

• We plotted the ROC curves and examined the performance of the three trained classification algorithms
• The area under the curve (AUC) appears to be nearly identical for all three algorithms; this is ascribed to the extremely high correlation across the fitted objects of these models
Defining the Objective Performance
Function
Defining the Objective Performance
Function
• In this video, we will develop a simple machine learning system that will help the computer learn how to select the best classification algorithm across a class of algorithms
• We will create a suitable user-defined performance function to analyze the performance of these algorithms
Summary

• We created an optimization function, which takes as arguments the fitted values, actual values, and simulated threshold values
• These values are employed to compute the accuracy, sensitivity, and specificity parameters through the classification matrix
• The final performance object is a simple average of these three parameters (i.e., accuracy, sensitivity, and specificity)
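The performance objective described above, a simple average of accuracy, sensitivity, and specificity at a given threshold t, can be sketched as (fitted and actual values are illustrative):

```python
def performance(fitted, actual, t):
    # classify at threshold t, then average accuracy, sensitivity, specificity
    pred = [1 if p >= t else 0 for p in fitted]
    tp = sum(1 for a, q in zip(actual, pred) if a == 1 and q == 1)
    tn = sum(1 for a, q in zip(actual, pred) if a == 0 and q == 0)
    fn = sum(1 for a, q in zip(actual, pred) if a == 1 and q == 0)
    fp = sum(1 for a, q in zip(actual, pred) if a == 0 and q == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(actual)
    return (acc + sens + spec) / 3

actual = [1, 0, 1, 0]
fitted = [0.9, 0.2, 0.6, 0.4]
score_mid = performance(fitted, actual, 0.5)    # perfect classification here -> 1.0
score_high = performance(fitted, actual, 0.95)  # everything predicted 0 -> lower score
```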


Creating Performance Objects
Creating Performance Objects

• In the previous video, we defined our performance objective function; in this video, we will simulate 1000 threshold values and calculate the performance object values for all three classification algorithms using these threshold values
Summary

• We created three performance objects for the three classification algorithms, namely logit, probit, and linear
• We simulated 1000 performance object values using our performance objective function for all three algorithms (linear, logit, and probit)
In-sample Performance Evaluation
In-sample Performance Evaluation

• In the previous video, we computed 1000 performance object


values for the three classification algorithms
• In this video we will compare the performance of these three
classification algorithms through visualization
Summary

• We plotted 1000 performance object values for our three classification algorithms, namely linear, logit, and probit
• We found that for most of the threshold values, the logit model works best, closely followed by the probit model, and lastly the linear model
• Lastly, we extracted the best-fit model and the corresponding threshold value
Out-of-Sample Prediction
Out-of-Sample Prediction

• In this video, we will start with out-of-sample prediction
• We will use the trained algorithms for our linear, logit, and probit models to predict on the test dataset
• Lastly, we will compute the correlations across the predicted values of the three algorithms
Summary

• We performed prediction on the test data using our trained algorithms for the linear, logit, and probit models
• We found that the correlations across the predicted values are very high; in fact, the correlation between the logit and probit predicted values is 99%, and their correlations with the linear model predicted values are more than 90%
• This is ascribed to the fact that the correlations across the fitted objects are very high, and the distribution of 1's and 0's is highly symmetric in our test and training datasets
Out-of-Sample Prediction: ROC Curve
Out-of-Sample Prediction: ROC Curve

• In the previous video, we performed prediction with the trained algorithms using the test dataset
• In this video, we will visualize and compare the performance of the three trained algorithms using the ROC curve, and also compute the area under the ROC curve
Summary

• We plotted ROC curves for all three classification algorithms: the linear, logit, and probit models
• The performances as per the ROC curves are quite similar, with nearly identical areas under the curve (AUC)
• This is ascribed to the high correlation across fitted objects and the symmetric nature of 1's and 0's in our test and training datasets
• In the next video, we will simulate 1000 threshold values and compute the performance object values
Out-of-Sample Prediction:
Performance object
Out-of-Sample Prediction: Performance
object
• We have already set up a performance object, which is the average of three parameters: accuracy, sensitivity, and specificity
• Using our predicted values for all three algorithms, we will compute the performance object values for the 1000 simulated threshold values
Summary

• In this video, we computed the values of our performance object using 1000 simulated threshold values for all three algorithms, i.e., linear, logit, and probit
• In the next video, using these values of the performance object, we will visualize and compare the out-of-sample performance of the three algorithms
Out-of-Sample Prediction:
Performance Evaluation and
Visualization
Out-of-Sample Prediction: Performance
Evaluation and Visualization
• In the previous video, we simulated 1000 performance object values using our trained algorithms with the test data
• In this video, using these performance object values, we will visualize and compare the performance of the three trained algorithms
Summary

• To summarize, we plotted our simulated performance object values
• For most of the threshold region, the logit model offers the best prediction, closely followed by the probit and linear models
• We also extracted the details corresponding to the best performance object value, including its threshold level
Summary and Concluding Remarks
Summary and Concluding Remarks

• ABC stock price up/down movements are modelled using logit/probit classification algorithms
• The model is trained using the training dataset and is examined on various measures of model performance evaluation
• The fitted model is examined visually as well
Summary and Concluding Remarks

• The model is tested using the test dataset, and various measures of out-of-sample fit are examined
• Marginal effects of the independent variables are computed
• The performance of this model is compared with a similar linear probability model
Thanks!
