
AMEFA – Analytical Methods for Economic and Financial Analysis
Jönköping University
Part 02 – Introduction and Repetition (2024)
Pär Sjölander
Flow Chart of Econometric Modeling
Identify Research Problem: State the research problem and explain why it is relevant to analyze.
Literature Review and Gap Identification: Examine prior studies to ensure your research is anchored in established
literature, and identify gaps that your work aims to address, thereby offering a unique contribution to the research
field.
Empirical Evaluation of Theoretical Models: Build a model to empirically test established economic theories,
selecting variables that are central to the theoretical framework. Outline the anticipated interactions between these
variables based on theoretical propositions. Transform the theoretical constructs into an econometric model, aiming
to evaluate the theory with empirical data.
Hypothesis Formulation: Create testable hypotheses defining expected relationships and hypothesizing their
statistical significance.
Data Collection: Compile data needed for analysis, ensuring its quality and relevance, whether from existing sources
or via new data collection methods like surveys or experiments.
Econometric Model Estimation: Use specific econometric methods, like regression analysis, to estimate the model.
Model Diagnostics: Conduct tests for common issues (e.g., multicollinearity, heteroskedasticity, autocorrelation,
model misspecification), applying corrective measures as needed.
Results Interpretation: Does the result make sense? Examine estimated coefficients and their statistical significance,
comparing findings with theoretical predictions and assessing their economic relevance.
Robustness Checks: Validate results through various robustness tests, such as alternative specifications, control
variables inclusion, or different estimation methods.
Conclusion and Policy Recommendations: Summarize principal findings, their broader implications, limitations, and
their contribution to existing knowledge. Discuss the research's policy relevance and directions for future inquiry.
Research Reporting: Publication to appropriate audiences.
Linear Models
• In his 1936 work "The General Theory of Employment, Interest, and Money," John Maynard Keynes emphasizes the significance of aggregate demand in the economy, suggesting that the consumption function is primarily influenced by current income.
• The marginal propensity to consume (MPC) is typically between 0 and 1, indicating that a person spends a portion of their additional income. However, some individuals and countries borrow.
• Spending_t = β0 + β1·Income_t, where the parameters β0 and β1 are unknown and must be estimated by drawing a sample from the population.
• Theoretically, in this specific case where the MPC is about 0.7, we know that β1 > 0, which means a positive association between Income and Spending.
• We focus on simple linear OLS regression for now. In practice, however, the relationship between income and spending is likely to be influenced by many other factors, such as work experience, industry, and geographical location, which are not accounted for in this simple model. It is also most likely a non-linear relationship, since Warren Buffett cannot keep increasing his consumption after reaching a certain income level.
Linear Models – what’s the point of OLS regressions?
• For example, let X and Y be income and spending, respectively. Dropping the error term and the subscripts for simplicity, a linear model of Spending (Y) in terms of Income (X) can be written as Y = β0 + β1·X (which is equal to Spending = β0 + β1·Income).
• Cross-sectional analysis is indexed by "i" (individual), i = 1, 2, …, 100 individuals: Spending_i = β0 + β1·Income_i
• Time-series analysis is indexed by "t" (time), t = 1, 2, …, 365 days: Spending_t = β0 + β1·Income_t, where the parameters β0 and β1 are unknown and must be estimated by drawing a sample from the population.
• Panel-data analysis is indexed by both "i" and "t": Spending_i,t = β0,i + β1·Income_i,t, where the parameter β0,i varies over individuals and β1 is fixed.

Research Question: How does individual income affect spending behavior? (Assume e.g. cross-sectional data.)
Test hypothesis: H0: β1 = 0 vs. e.g. H1: β1 > 0.
Intuition: Can the data support our theory and test hypothesis – or is it just a random relationship in the data?
Conclusion: If e.g. the p-value < 5%, the data reject the null hypothesis. If the model is correctly specified and the assumptions are met, there is a significant relationship between Income and Spending.
Policy implications: If e.g. β1 = 0.7, then for each extra SEK in income, 0.7 SEK is spent on consumption. The government/central bank can use this information to change taxes, subsidies/welfare, and interest rates, or to adjust the budget.
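As a minimal, hypothetical sketch (simulated cross-sectional data and the Python statsmodels package, neither of which comes from these slides), the research question and hypothesis test above could be run like this:

# Minimal sketch (assumed/simulated data): estimate Spending_i = b0 + b1*Income_i
# by OLS and test H0: b1 = 0.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 100                                                      # 100 individuals (cross-section)
income = rng.uniform(10_000, 60_000, n)                      # income in SEK
spending = 2_000 + 0.7 * income + rng.normal(0, 3_000, n)    # assumed true b1 = 0.7
df = pd.DataFrame({"spending": spending, "income": income})

fit = smf.ols("spending ~ income", data=df).fit()
print(fit.params["income"])    # estimated b1 (close to the assumed MPC of 0.7)
print(fit.pvalues["income"])   # two-sided p-value for H0: b1 = 0
print(fit.summary())           # full output: coefficients, t-tests, F-test, R^2

The reported p-value is two-sided; for the one-sided alternative H1: b1 > 0 it can be halved when the estimate is positive.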
Intro/Repetition: What is an R²?
Definition:
R², or the coefficient of determination, is the proportion of the variance in the dependent
variable that is predictable from the independent variables. It is a statistical measure that
represents the goodness of fit of a model.
y = β0 + β1·x1 + β2·x2 + ε

Statistical Interpretation:
R² measures the strength of the relationship between the model and the dependent
variable(s). A higher R² value indicates a better fit and suggests that the model’s explanatory
variables can explain a large proportion of the variance in the dependent variable.

R² values range from 0 to 1. An R² of 0 indicates that the model explains none of the variability
of the response data around its mean, while an R² of 1 indicates that the model explains all the
variability of the response data around its mean.
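As a quick numerical illustration of the definition (using the X and Y data from Example 1 further down in these slides, where the fitted values are 4, 6, 3, 7, 5), R² can be computed directly as one minus the share of unexplained variance:

# Minimal sketch: R^2 = 1 - RSS/TSS, computed by hand (Example 1 data).
import numpy as np

y = np.array([5.0, 5.0, 3.0, 8.0, 4.0])       # observed dependent variable
y_hat = np.array([4.0, 6.0, 3.0, 7.0, 5.0])   # fitted values from the estimated model

rss = np.sum((y - y_hat) ** 2)                # residual sum of squares = 4
tss = np.sum((y - y.mean()) ** 2)             # total sum of squares = 14
print(1 - rss / tss)                          # R^2 = 1 - 4/14 ≈ 0.714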
Intro/Repetition: What is a t-test in regression analysis?
Definition:
A t-test in regression analysis is used to test the null hypothesis that a specific parameter, such as a slope coefficient, is equal to zero. This test is crucial for evaluating whether individual independent variables contribute to the explanatory power of the model.
y = β0 + β1·x1 + β2·x2 + ε

H0: βj=0 vs. H1: βj≠0

Statistical Interpretations:
The test produces a t-value (with its p-value), assessing whether differences are statistically significant or just due to natural variation in the data set (random chance). If the t-statistic is large in absolute terms relative to that variation (e.g. > 1.96 at the 5% significance level), then βj is statistically significantly different from zero and we reject H0.

A significant t-statistic indicates that the predictor contributes meaningfully to the model, and that the relationship is not simply caused by natural variation.
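As a small sketch (using the slope estimate and standard error from Example 1 later in these slides), the t-statistic and its p-value can be computed by hand:

# Minimal sketch: t-statistic and two-sided p-value for H0: beta_j = 0,
# using the Example 1 estimates (beta_hat = 1, s.e. = 0.365, n - k = 3 d.o.f.).
from scipy import stats

beta_hat = 1.0
se_beta = 0.365
dof = 3                                           # n - k = 5 - 2

t_stat = beta_hat / se_beta                       # ≈ 2.74
p_value = 2 * stats.t.sf(abs(t_stat), df=dof)     # ≈ 0.07
print(t_stat, p_value)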
Intro/Repetition: What is an F-test in regression analysis?
Definition:
An F-test in regression analysis is used to test the null hypothesis that e.g. all independent
variables (when considered together) do not significantly contribute to the explanatory power
of the model. This evaluates the overall fit of the model.
y = β0 + β1·x1 + β2·x2 + ε
H0: β1=β2=0 vs. H1: at least one βj≠0

Statistical Interpretation:
The test produces an F-value (with its p-value), assessing whether the overall model is statistically significant. If the F-value is larger than the critical value from the F-distribution (e.g., F > 3.07 at the 5% significance level for a model with 2 and 120 degrees of freedom), it indicates that the overall model is statistically significant, and we reject H0.

A significant F-statistic suggests that the overall model is better at explaining the variation in
the dependent variable than a model without the independent variables. Then the
relationship does not seem to be due to random chance.
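As a minimal sketch (the numbers are only illustrative), an F-statistic reported by a regression can be compared with its critical value and p-value like this:

# Minimal sketch: F-test decision for H0: beta_1 = beta_2 = 0, assuming an
# illustrative F-statistic and 2 and 120 degrees of freedom.
from scipy import stats

f_stat = 7.5                                   # hypothetical value from a regression output
df1, df2 = 2, 120                              # numerator / denominator degrees of freedom

f_crit = stats.f.ppf(0.95, df1, df2)           # ≈ 3.07 at the 5% level
p_value = stats.f.sf(f_stat, df1, df2)
print(f_crit, p_value, f_stat > f_crit)        # reject H0 if f_stat > f_crit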
Intro/Repetition: What is a p-value?
Definition:
The p-value represents the probability of observing a test statistic as extreme as, or more
extreme than, the one obtained from the sample data, given that the null hypothesis (H0) is
true. It quantifies the evidence against H0 by indicating the likelihood of the observed data
under the assumption that H0 holds. For example, a p-value of 0.04 suggests that, assuming
the null hypothesis is true, there is a 4% chance of obtaining a result as extreme as, or more
extreme than, the one observed, solely due to random variation. This does NOT imply that H0
has a 4% probability of being true; instead, it reflects the rarity of the test statistic's value
under the null hypothesis framework.
Statistical Interpretation:
A measure of the strength of evidence against the null hypothesis. A small p-value indicates
strong evidence against the null hypothesis.
The smaller the p-value, the greater the statistical significance of the observed difference → reject H0 (for high p-values we cannot reject H0). If we reject the null hypothesis, we believe that there is a relationship between the variables.
No statistical tables are needed to decide whether to reject. The p-value can answer whether a relationship is statistically significant or just due to chance and natural variation in the variables.
At a 5% significance level, a p-value below 0.05 will indicate a statistically significant result.
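A small simulation sketch (simulated data, not from the slides) shows what the p-value controls: when H0 is actually true, a 5% significance level rejects in roughly 5% of samples purely by chance:

# Minimal sketch: with a true slope of zero, "p < 0.05" occurs in about 5% of samples.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
rejections = 0
n_sim = 2_000
for _ in range(n_sim):
    x = rng.normal(size=50)
    y = 1.0 + rng.normal(size=50)                  # true slope is 0, so H0 is true
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    if fit.pvalues[1] < 0.05:                      # p-value of the slope coefficient
        rejections += 1
print(rejections / n_sim)                          # ≈ 0.05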
CSM, TSM & panel data models
X = Income for individuals can be measured with cross-sectional, time-series, or panel data.

Cross-sectional data: a sample of individuals observed in 1 time period (for instance 22 individuals' income for the year 2020) – Xi. Here T=1, N=22, giving 22 observations.

Time-series data: a sample of observations over time for one individual (for instance 1 person's income for the years 2020, 2021, and 2022) – Xt. Here T=3, N=1, giving 3 observations*. Thus, this is the same person in the year 2020 (first row), 2021 (the middle row), and 2022 (the last row) – how his/her income changes over time.

Panel data: the same sample of individuals observed in multiple time periods (for instance 22 persons' income for the years 2020, 2021, and 2022) – Xi,t. Here N=22, T=3, giving NT=66 observations.

Note: Obviously there is more information to extract and analyze in a full panel. Therefore, later in the education, we will explore in detail how to analyze panel data. Given some assumptions, it is not uncommon that we can extract more reliable information from 66 observations compared to 22 and (especially) 3 observations. For now, we will just introduce it.
* This is a simplified example; in practice T=3 is too few observations to run a regression.
CSM, TSM & panel data models
Y = Spending for individuals can be measured with cross-sectional, time-series, or panel data.

Cross-sectional data: a sample of individuals observed in 1 time period (for instance 22 individuals' spending for the year 2020) – Yi. Here T=1, N=22, giving 22 observations.

Time-series data: a sample of observations over time for one individual (for instance 1 person's spending for the years 2020, 2021, and 2022) – Yt. Here T=3, N=1, giving 3 observations*. Thus, this is the same person in the year 2020 (first row), 2021 (the middle row), and 2022 (the last row) – how his/her spending changes over time.

Panel data: the same sample of individuals observed in multiple time periods (for instance 22 persons' spending for the years 2020, 2021, and 2022) – Yi,t. Here N=22, T=3, giving NT=66 observations.

Note: Obviously there is more information to extract and analyze in a full panel. Therefore, later in the education, we will explore in detail how to analyze panel data. Given some assumptions, it is not uncommon that we can extract more reliable information from 66 observations compared to 22 and (especially) 3 observations. For now, we will just introduce it.
* This is a simplified example; in practice T=3 is too few observations to run a regression.
CSM, TSM & panel data models
Spending = f(Income) using formulas for cross-sectional, time-series, and panel data.

• Cross-Sectional Models (CSM): income and spending for e.g. a cross-section of i = 1, 2, …, 22 individuals (N=22, T=1):
Yi = α + β·Xi + ui, or SPENi = α + β·INCi + ui

• Time-Series Models (TSM): income and spending over e.g. t = 1, 2, 3 time periods for 1 individual (N=1, T=3):
Yt = α + β·Xt + ut, or SPENt = α + β·INCt + ut

• Panel-data models: income and spending are simultaneously observed over individuals (cross-sectional units) and over several time periods (time-series data) (N=22, T=3, N·T=66). See for instance (one of many panel models) this panel fixed-effects model (FEM):
Yi,t = αi + β·Xi,t + ui,t, or SPENi,t = αi + β·INCi,t + ui,t

Note: This is just an example of a FEM. As is explained in later slides, it is also possible to replace αi (α changes for every individual) with αt (α changes for every year). β can also be allowed to change – explained later.
Note: Obviously there is more information to extract and analyze in a full panel. Therefore, later in the education, we will explore in detail how to analyze panel data. For now, we will just introduce it.
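As a minimal, hypothetical sketch (simulated panel data; the fixed-effects estimator is treated properly later in the course), the individual intercepts αi of the FEM above can be estimated by ordinary OLS with one dummy variable per individual (the least-squares dummy variable approach):

# Minimal sketch: panel fixed effects via OLS with individual dummies,
# using simulated data with N=22 individuals and T=3 periods.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_ind, n_t = 22, 3
ids = np.repeat(np.arange(n_ind), n_t)               # individual index i
alpha_i = rng.normal(10_000, 2_000, n_ind)[ids]      # individual-specific intercepts
income = rng.uniform(10_000, 60_000, n_ind * n_t)
spending = alpha_i + 0.7 * income + rng.normal(0, 1_000, n_ind * n_t)
df = pd.DataFrame({"id": ids, "income": income, "spending": spending})

# C(id) adds one dummy per individual, i.e. a separate intercept alpha_i
fit = smf.ols("spending ~ income + C(id)", data=df).fit()
print(fit.params["income"])                           # common slope beta, close to 0.7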
Linear Models
“All models are wrong, but some are useful” / George Edward Pelham Box (1919–2013).
Consequently – strictly speaking – all models we will teach you in this course are wrong, but most are useful! However, all models are simplifications of reality and hence will not (and do not even strive to) capture all elements of reality (even if some big data models, thanks to access to computing power, use extremely many variables nowadays).
Even if the simplest linear models may not always be the best, they serve as a good
approximation and a good starting point due to their interpretability and (maybe in some
situations) reduced risk of overfitting.
However, an overly simplified model may not capture the full scope of real-world
complexities, potentially resulting in underfitting.
Linear Models – trade-off over/underfitting
• All models are simplified but there is a trade-off between overfitting and underfitting.
• Overfitting: Occurs when a model learns both the true underlying patterns and random noise,
which is not replicable. This results in high accuracy on the training dataset but poor generalizability to new data.
• Underfitting: Occurs when a model, too simplistic, overlooks important data patterns, leading to
inadequate predictions due to its inability to represent the data's complexity.
Simple linear models overfit less often in
general (while an infinite polynomial will
overfit). But linear OLS can still overfit if
too many variables are included:
True model: y = α0 + α1·x1
Overfitted estimated model: ŷ = α̂0 + α̂1·x1 + α̂2·x2

True model: y = β0 + β1·x1 + β2·x2
Underfitted estimated model: ŷ = β̂0 + β̂1·x1
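A small simulation sketch (simulated data, not from the slides) of the trade-off: adding an irrelevant regressor always raises the in-sample R² a little, but on average it makes out-of-sample predictions slightly worse:

# Minimal sketch: overfitting with one irrelevant regressor (x2), averaged
# over many simulated samples; training on half the data, testing on the rest.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
mse_true_model, mse_overfitted = [], []
for _ in range(500):
    x1, x2 = rng.normal(size=60), rng.normal(size=60)     # x2 is irrelevant (true alpha2 = 0)
    y = 1.0 + 2.0 * x1 + rng.normal(size=60)              # true model uses only x1
    tr, te = slice(0, 30), slice(30, 60)
    for regressors, store in ((np.column_stack([x1]), mse_true_model),
                              (np.column_stack([x1, x2]), mse_overfitted)):
        X = sm.add_constant(regressors)
        fit = sm.OLS(y[tr], X[tr]).fit()
        store.append(np.mean((y[te] - fit.predict(X[te])) ** 2))
print(np.mean(mse_true_model), np.mean(mse_overfitted))    # overfitted model is worse on average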
Linear Models – OLS estimator properties – bias & efficiency
• UNBIASEDNESS (= perfect, high accuracy) is about an estimator's accuracy: whether it estimates the true parameter value without systematic error. An estimator β̂n is unbiased if its expected value is equal to the true parameter value for any sample size n: E(β̂n) = β.

• High EFFICIENCY = high precision (= low variance of the estimator = low standard errors) is about an estimator's precision: the degree to which repeated estimates under the same conditions yield similar results, reflected by smaller variance and standard errors. The more efficient the statistic, the more precise the statistic is as an estimator of the parameter. An estimator is an efficient estimator if, within some well-defined class of estimators (e.g. the class of linear unbiased estimators), it is the estimator with the smallest variance.

[Figure: 4 estimators A–D shown as sampling distributions, arranged by high vs. low efficiency and by high vs. low bias (unbiasedness):]
• B = unbiased & (fairly) efficient
• D = unbiased & inefficient
• A = biased & inefficient
• C = biased & (fairly) efficient
Linear Models – OLS estimator properties – consistency
What happens when n → ∞?
• CONSISTENCY – as n increases – higher accuracy/lower bias AND lower variance/higher precision/higher efficiency. All consistent estimators improve both their accuracy and precision as the sample size n increases.

Definition: "Consistency. The estimator β̂2 is a consistent estimator of β2 if the distribution f(β̂2) collapses at the point β2 into a straight line when the sample size goes to infinity. This means that, for increasing n, the variance of β̂2 goes to zero, and in cases where the estimator is biased, the bias also goes to zero." Vogelvang (2005), Econometrics, p. 65.

For consistent estimators, when increasing n the variance of β̂n goes to zero, and in cases where the estimator is biased, this bias also decreases as n increases. Thus, for consistent estimators the bias and variance typically get smaller as the sample size n increases. Therefore, as n increases, MSE(β̂n) decreases (where MSE = Bias² + Variance).

[Figure: sampling distributions of β̂n concentrating around the true population parameter β as n grows.]

Note: Unbiasedness is about the average accuracy of the estimator across different samples of the same sample size, while consistency is about the behavior of the estimator (its accuracy and precision) as the sample size becomes large.
Note: A consistent estimator is not necessarily unbiased – or even asymptotically unbiased. Also, an unbiased estimator is not necessarily consistent.
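A small simulation sketch (simulated data, not from the slides) of consistency: the OLS slope estimates concentrate around the true value as n grows, so both the spread and any small-sample deviation shrink:

# Minimal sketch: the OLS slope estimator concentrates around the true
# beta1 = 0.7 as the sample size n increases (consistency).
import numpy as np

rng = np.random.default_rng(4)
for n in (20, 200, 2_000, 20_000):
    estimates = []
    for _ in range(200):
        x = rng.normal(size=n)
        y = 2.0 + 0.7 * x + rng.normal(size=n)
        b1 = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)   # OLS slope formula
        estimates.append(b1)
    print(n, round(np.mean(estimates), 4), round(np.var(estimates), 6))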
Deterministic (Mathematical) vs. Stochastic (Statistical) Models
Roughly speaking – statistics is math with an error term.

Deterministic model (math that has no error term):
• A deterministic system is a system in which no randomness is involved in the development of future states of the system. A deterministic model will thus always produce the same output from a given starting condition or initial state.
• The relationship between the volume and weight of water is fairly exact: e.g. at 20°C (room temperature, etc.), the density is 998.2 kg/m³. Ohm's Law (V = IR, where V = voltage, I = current, and R = resistance) is "almost" exact, so there is no need for an error term under certain conditions.
• Deterministic models are less flexible and less robust since they cannot handle randomness. However, in reality, in economics and finance there is always uncertainty.
• A deterministic model allows you to calculate a future event exactly, without the involvement of randomness. If something is deterministic, you have all of the data necessary to predict (determine) the outcome with certainty, although different initial conditions will affect the output. This type of modeling is less relevant in economics and finance – but sometimes feasible in e.g. physics.

Stochastic model (probabilistic model with an error term):
• A stochastic system includes elements of randomness and unpredictability. The same initial state can lead to different outcomes due to random variables influencing the process.
• For instance, a stochastic model in economics might predict the range of possible future values for a stock price, rather than a single expected value. This is because numerous unforeseen events (war, pandemic, etc.) can influence market behavior, making it impossible to predict with absolute certainty. Income and Spending are related – but not with an exact constant of 0.7 (stochastic models have an error term that picks up all the variation that is not explained by the explanatory variables).
• A stochastic model has the capacity to handle uncertainties in the inputs. Stochastic models possess some inherent randomness – the same set of parameter values and initial conditions will lead to an ensemble of different outputs. This type of modeling is more relevant in economics and finance, since we add an error term – or, empirically, a residual.
Deterministic (Mathematical) vs. Stochastic (Statistical) Models
• A time series can be decomposed into different elements:
(i) the original data series,
(ii) the trend,
(iii) the seasonal cycle, and
(iv) a remainder – the irregular component (a residual/error component that accounts for random noise and unexplained fluctuations in the data).
[Figure: panels (i)–(iv) showing the original series and its trend, seasonal, and irregular components.]

• Find the deterministic relationship between how the variation in x affects the variation in y. Usually we are not interested in modeling (ii)–(iv); therefore we often try to "filter out" these elements and isolate the underlying pattern.

Approaches to filter out systematic patterns that would otherwise distort our regression analysis (since we are usually just interested in the underlying data-generating process of our variables of interest):

• (ii) Detrending: applying a moving average or an HP filter (Hodrick-Prescott), or fitting a model, such as a polynomial or other function, to the data and then subtracting this trend component from the original series. This smooths out short-term fluctuations, highlighting the longer-term trend in (i).

• (iii) Deseasonalizing: based on (i), create binary (0 or 1) dummy variables for each period that represents a season (e.g., months, quarters) to capture seasonal effects. Loess smoothing, Holt-Winters exponential smoothing, X-12-ARIMA, etc. are common deseasonalizing tools.

• (iv) Note: the irregular remaining random noise cannot be predicted (random walk models are sometimes used to represent the irregular component, assuming that changes are random and unpredictable) – while the other parts (ii)–(iii), and possibly (i), can be predicted.
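As a minimal sketch (a simulated monthly series, not data from the slides), the decomposition into trend, seasonal, and irregular components, and the detrended/deseasonalized series, can be obtained with statsmodels:

# Minimal sketch: decomposing a simulated monthly series into (ii) trend,
# (iii) seasonal, and (iv) irregular components.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(5)
t = np.arange(120)                                    # 10 years of monthly data
y = pd.Series(100 + 0.5 * t                           # trend
              + 10 * np.sin(2 * np.pi * t / 12)       # seasonal cycle
              + rng.normal(0, 3, t.size),             # irregular noise
              index=pd.date_range("2010-01-01", periods=t.size, freq="MS"))

result = seasonal_decompose(y, model="additive", period=12)
detrended = y - result.trend                          # (i) minus (ii)
deseasonalized = y - result.seasonal                  # (i) minus (iii)
print(result.resid.dropna().head())                   # (iv) the irregular component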
Regression Analysis - notations

X → Y
Cause → Effect
Independent → Dependent
Explanatory → Explained
Regressor → Regressand
Covariate → Outcome
Predictor → Predicted
Exposure → Response
Exogenous → Endogenous
Definition of the error term and residual

[Figure: scatter plot with a regression line – the stars are observations.]

Residuals (often denoted e or û) are observable – they are the deviations of the observed values from the estimated regression line, which is based on a sample (of various sizes; the higher the share of the population, the more reliable the sample) that we try to generalize to the features of the entire population.

Error terms (often denoted u or ε) are not observable – they are the deviations of the observed values from the true population regression line. We do not know the true population regression line (and if we knew it, there would be no point in estimating a regression line, since we would already know the true relationship for the population that the sample is intended to represent).

Therefore, these concepts are very different.*

* However, if we had access to the entire population, e.g. all salaries and all relevant explanatory variables for determining teacher salaries at JIBS, then the error terms and the residuals would be identical (given that the model is correctly specified and that we have all the observations).
Y = β0 + β1·X – this is the true population relationship.

Note: This (true) population relationship between X & Y is unknown to the researcher (otherwise, if we already knew this relationship, it would be pointless to take a sample to estimate a regression relationship that is already known).

This relationship is what we aim to estimate, but we will not be able to estimate it exactly.

[Figure: the true population regression line of Y on X.]

Regression equation for modeling the dependent variable Y based on the independent variable X.
[Figure: the true population regression line Y = β0 + β1·X, with intercept β0 and slope β1 (the change in Y per 1 unit of X); E1,…,E6 mark the (theoretically) expected values of Y (Y1,…,Y6), given 6 different values of X (X1,…,X6).]
[Figure: observed values O1,…,O6 (from a sample) of Y compared to the expected values E1,…,E6 of Y (not observable, since the true regression line is unknown), given 6 X observations.]
[Figure: given the observations O1,…,O6, the error terms u1,…,u6 are the vertical deviations of the observed values from the true population line Y = β0 + β1·X, given 6 X observations.]
[Figure: the estimated (observable) regression line with coefficients β̂0 and β̂1 added to the plot; P1,…,P6 are the predicted values of Y, compared with the expected values E1,…,E6 on the true line and the observed values O1,…,O6, given 6 X observations. Note the population's unobservable parameters β0 and β1 vs. the estimated model's estimated (observable) coefficients β̂0 and β̂1.]
[Figure: the same plot once more – compare it with the previous slide so you remember the notations, and the difference between a residual (deviation from the estimated line) and an error term (deviation from the true population line).]
Ordinary Least Squares Method
• There are many ways to estimate a regression model. The most common method is the Ordinary Least Squares (OLS) method.

• The OLS method aims at minimizing the sum of squared errors. Since the errors are unobservable, it instead minimizes the sum of squared predicted errors, called residuals (which are observable).

• With a sample of n observations (Yi, Xi), for i = 1, 2, …, n, we want

min over β0, β1 of  Σ_{i=1..n} û_i²  =  min over β0, β1 of  Σ_{i=1..n} (Yi − β0 − β1·Xi)²
Definition of the error term and residual
Minimizing the residual sum of squares is easy to remember when plotting the residuals in this way
Note that extreme observations get an excessively high weight in affecting the OLS regression's slope, since the residuals are squared.
OLS Estimators
β̂0 = Ȳ − β̂1·X̄
This formula is used to calculate the intercept. The intercept is the expected value of the dependent variable (Y) when all independent variables (X) are equal to zero (the "hat" denotes an estimate, and the "bar" denotes the arithmetic mean).
_____________________

E(Y|Xi) = Ŷi = β̂0 + β̂1·Xi
This equation represents the expected value of Y given a particular value of X (Xi). It is the equation of the regression line itself, where Ŷi is the predicted value of Y for a given Xi, β̂0 is the estimated intercept, and β̂1 is the estimated slope of the regression line. This line is the "best fit" through the data points (once we have minimized the residual sum of squares).
_____________________

ûi = Yi − Ŷi = Yi − β̂0 − β̂1·Xi
Here, ûi represents the residuals, i.e. the differences between the observed values (Yi) and the predicted values (Ŷi) from the regression. The residuals indicate how much the actual data points deviate from the estimated regression line.
_____________________

σ̂² = Σ_{i=1..n} ûi² / (n − 2)
This formula calculates the "estimated variance of the residuals" (σ̂²), which measures the dispersion of the residuals (or errors) around the regression line. It is an estimate of the variance of the error term in the population regression equation. The denominator (n − 2) is the degrees of freedom, where n is the number of observations and "2" accounts for the 2 parameters estimated (β̂0 and β̂1). Thus, the variance of the error term in the population regression is σ², an unknown parameter that we estimate with σ̂². A smaller value of σ̂² suggests that the model fits the data well, as it indicates less variability in the residuals. Conversely, a larger σ̂² might indicate a poor model fit.
Equivalent Forms of the β̂1 Estimator
(all sums run over i = 1, …, n)

β̂1 = Σ (Xi − X̄)·Yi / Σ (Xi − X̄)²

β̂1 = Σ Xi·(Yi − Ȳ) / Σ (Xi − X̄)²

β̂1 = Σ Xi·(Yi − Ȳ) / Σ Xi·(Xi − X̄)

β̂1 = (Σ Xi·Yi − n·X̄·Ȳ) / (Σ Xi² − n·X̄²) = (n·Σ Xi·Yi − Σ Xi · Σ Yi) / (n·Σ Xi² − (Σ Xi)²)

There are different views, but some argue that the last version is the easiest to use when finding the parameters based on data in tables. However, they will all give the same estimate of β̂1.
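As a quick check (using the X and Y data from Example 1 below), the equivalent formulas indeed return the same slope estimate:

# Minimal sketch: the equivalent formulas for the OLS slope all give the same
# value, illustrated with the Example 1 data from these slides.
import numpy as np

X = np.array([2.0, 4.0, 1.0, 5.0, 3.0])
Y = np.array([5.0, 5.0, 3.0, 8.0, 4.0])
n, xbar, ybar = len(X), X.mean(), Y.mean()

b1_a = np.sum((X - xbar) * Y) / np.sum((X - xbar) ** 2)
b1_b = np.sum(X * (Y - ybar)) / np.sum((X - xbar) ** 2)
b1_c = np.sum(X * (Y - ybar)) / np.sum(X * (X - xbar))
b1_d = (n * np.sum(X * Y) - X.sum() * Y.sum()) / (n * np.sum(X ** 2) - X.sum() ** 2)
print(b1_a, b1_b, b1_c, b1_d)    # all equal 1.0 for these data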
Precision of OLS Parameter Estimators
• The standard error is a measure of an estimator's precision – how much the estimated value would typically vary if we were to repeat our sampling many times. It gives us an idea of the precision of our estimate; a smaller standard error suggests a more precise estimate.
• An estimate of the "standard deviation of an estimator" is called a standard error.

s.e.(β̂1) = √( σ² / Σ (Xi − X̄)² )
The standard error of the slope coefficient (β̂1) from a regression model, where σ² represents the variance of the error terms in the model (estimated by σ̂²). The denominator is the sum of the squared differences between each Xi and the mean of X (that is, X̄). This tells us how much β̂1 would vary from sample to sample.

s.e.(β̂0) = √( σ²·Σ Xi² / (n·Σ (Xi − X̄)²) )
This is the standard error of the intercept (β̂0).

Note: The estimated variance σ̂² is usually calculated as σ̂² = Σ ûi² / (n − k), where ûi represents the residuals (the differences between the observed and predicted values), n is the number of observations, and k is the number of parameters estimated (including the intercept; it is sometimes written as n − k − 1 when the intercept is not counted in k).
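A small sketch (again using the Example 1 data below) that computes these standard errors from the formulas and checks them against statsmodels:

# Minimal sketch: standard errors of the OLS intercept and slope, by hand and
# via statsmodels, using the Example 1 data.
import numpy as np
import statsmodels.api as sm

X = np.array([2.0, 4.0, 1.0, 5.0, 3.0])
Y = np.array([5.0, 5.0, 3.0, 8.0, 4.0])
n, k = len(X), 2

b1 = np.sum((X - X.mean()) * Y) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - (b0 + b1 * X)
sigma2_hat = np.sum(resid ** 2) / (n - k)                                          # = 4/3
se_b1 = np.sqrt(sigma2_hat / np.sum((X - X.mean()) ** 2))                          # ≈ 0.365
se_b0 = np.sqrt(sigma2_hat * np.sum(X ** 2) / (n * np.sum((X - X.mean()) ** 2)))   # ≈ 1.211
print(se_b0, se_b1)

fit = sm.OLS(Y, sm.add_constant(X)).fit()
print(fit.bse)                                                                     # the same values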
Example 1: Simple Linear Regression
From y = β0 + β1·x1 + ε we obtain the estimated model ŷ = 2 + 1·x1 (with residuals e).

OLS estimation for predicting supplied quantity (Y) based on the price (X):

Item | Y | X | X² | X−X̄ | (X−X̄)² | (X−X̄)Y | Ŷ | û | û²
1 | 5 | 2 | 4 | −1 | 1 | −5 | 4 | 1 | 1
2 | 5 | 4 | 16 | 1 | 1 | 5 | 6 | −1 | 1
3 | 3 | 1 | 1 | −2 | 4 | −6 | 3 | 0 | 0
4 | 8 | 5 | 25 | 2 | 4 | 16 | 7 | 1 | 1
5 (n=5) | 4 | 3 | 9 | 0 | 0 | 0 | 5 | −1 | 1
Sum Σ | 25 | 15 | 55 | 0 | 10 | 10 | – | 0 | 4
Mean | 5 | 3 | – | – | – | – | – | – | –

β̂1 = Σ (Xi − X̄)·Yi / Σ (Xi − X̄)² = 10/10 = 1
β̂0 = Ȳ − β̂1·X̄ = 5 − (1)(3) = 2
Ŷ = β̂0 + β̂1·X = 2 + 1·X (the fitted dependent variable: the predicted values of the dependent variable based on the estimated coefficients of the independent variables)
û = Y − Ŷ (the residuals: each residual is the difference between the observed value and the predicted value for a given observation; used for calculating e.g. σ̂²)
σ̂² = Σ ûi² / (n − 2) = 4 / (5 − 2) = 4/3 ≈ 1.333 (the estimator of the variance of the error term u, i.e. MSRes: a measure of the variability of the residuals around the regression line, that is, the overall variability in the model that cannot be explained by the independent variables)

s.e.(β̂1) = √( σ̂² / Σ (Xi − X̄)² ) = √(1.333/10) = √0.1333 = 0.365
s.e.(β̂0) = √( σ̂²·Σ Xi² / (n·Σ (Xi − X̄)²) ) = √(1.333·55 / (5·10)) = 1.211

Interpretation of the slope (marginal effect): if the price (X) increases by 1 unit, then the supplied quantity of the good (Y) increases on average by β̂1 = 1 unit.
Interpretation of the intercept: if the price (X) is 0 (unrealistic), then the quantity of the supplied good (Y) is β̂0 = 2.
Note, however, that to make these interpretations our model must be statistically significant (explained later).
Example 1: Coefficient estimates – checking the calculations
* Stata: Clearing the current dataset
clear all

* Creating the data
input X Y
2 5
4 5
1 3
5 8
3 4
end

* Fit the linear regression
regress Y X

' In EViews:
' Create a new workfile for 5 cross-sections (u=undated)
wfcreate(wf=Example) u 5
' Create two vectors (filled with zeroes) for 5 cross-sections
vector(5) y_vec
vector(5) x_vec
' Fill the vectors with some values
y_vec.fill 5, 5, 3, 8, 4
x_vec.fill 2, 4, 1, 5, 3
' Declare series
series y
series x
' Transform vectors to series
mtos(y_vec,y)
mtos(x_vec,x)
' Fit the linear regression model and save it as eq1 (show it)
equation eq1.ls Y C X
show eq1

Regression output (both programs): β̂0 = 2, β̂1 = 1, s.e.(β̂0) = 1.211, and s.e.(β̂1) = 0.365.
Analyzing Sum of Squares

• The total sum of squares of the dependent variable can be decomposed as:

Σ (yi − ȳ)²  =  Σ [(yi − ŷi) + (ŷi − ȳ)]²        (Total Sum of Squares)

= Σ (yi − ŷi)² + Σ (ŷi − ȳ)² + 2·Σ (yi − ŷi)(ŷi − ȳ)

where yi − ŷi = ûi and the cross-product term equals zero, so

= Σ ûi²  +  Σ (ŷi − ȳ)²
= Residual Sum of Squares (due to error)  +  Explained Sum of Squares (due to the regression)
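As a quick numerical check of the decomposition (using the Example 1 data and fitted line), TSS = ESS + RSS:

# Minimal sketch: verifying TSS = ESS + RSS with the Example 1 data.
import numpy as np

X = np.array([2.0, 4.0, 1.0, 5.0, 3.0])
Y = np.array([5.0, 5.0, 3.0, 8.0, 4.0])
Y_hat = 2.0 + 1.0 * X                        # fitted values from Example 1

tss = np.sum((Y - Y.mean()) ** 2)            # 14
ess = np.sum((Y_hat - Y.mean()) ** 2)        # 10
rss = np.sum((Y - Y_hat) ** 2)               # 4
print(tss, ess, rss, np.isclose(tss, ess + rss))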
OLS Estimators – the logic behind the formula
Intuition of the estimator:
• The OLS estimator β̂1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²

Intuition of β̂1 (see the formula):
--- First, take a look at the covariance numerator (the covariance between X and Y): covariance is a measure used in statistics to determine the degree to which two variables, X and Y, change together. In simpler terms, it indicates whether an increase in one variable would correspond to an increase (or decrease) in the other variable. It is crucial to remember that covariance, by itself, does not tell us about the strength of the relationship or about causality.
• When Xi is greater than X̄ (the mean of X) and Yi is greater than Ȳ (the mean of Y), the product is positive, indicating that higher values of Xi coincide with higher values of Yi.
• When Xi is less than X̄ and Yi is less than Ȳ, the product is also positive, indicating that lower values of Xi coincide with lower values of Yi.
• A negative product means that X and Y move in opposite directions: when one is higher than its mean value, the other is lower than its mean value, and vice versa.
--- Secondly, take a look at the squared deviations of X in the formula's denominator:
The reason for dividing by the sum of squared deviations of X is to scale the covariance between X and Y so that the resulting slope coefficient represents the average change in Y for a one-unit change in X. β̂1 can give us marginal effects, but unless the Y and X variables are standardized, β̂1 is still scale-dependent. This division normalizes the effect of X on Y by accounting for the spread of the X values, gives appropriate weight to observations further from the mean of X, and aligns with the least squares criterion that aims to minimize the sum of squared residuals. It ensures that the scale of X does not distort the magnitude of the estimated relationship, allowing for a meaningful interpretation of the coefficient.
R² can provide the strength of the relationship between X and Y.
OLS Estimators
• In contrast to β̂1, R² (the coefficient of determination) provides a standardized measure of relationship strength in regression analysis. It shows the proportion of variance in the dependent variable explained by the independent variable(s), ranging from 0 to 1. This normalization allows for meaningful comparisons across different scales and units. R² offers a clearer picture of the relationship's strength, indicating how much one variable explains the variance in another. R² = 1 indicates that 100% of the total variation is explained by the independent variable(s).
• In essence, covariance merely suggests a general tendency of variables to move together, while R² quantifies the extent to which variation in one variable accounts for variation in another, making it a more universally interpretable statistic in statistical models.
• R² is a normalized measure of the strength of the relationship, specifically in the context of a model's explanatory power.
Coefficient of Determination (R2)
• The coefficient of determination, denoted by R² or r², is the proportion of the variance in the dependent variable that is predictable from (explainable by) the independent variable using the simple linear OLS estimated model.

TSS = Total Sum of Squares
ESS = Explained Sum of Squares (due to the regression)
RSS = Residual Sum of Squares (due to error)
TSS = ESS + RSS

R² = ESS / TSS
or
R² = ESS / TSS = (TSS − RSS) / TSS = 1 − RSS / TSS

• We have 0 ≤ R² ≤ 1, since R² is a proportion. R² = 1 means a perfect fit.
Coefficient of Determination (R2)
R² = 0 indicates that the regression line cannot explain more (of the variation in y) than the arithmetic mean of y. Thus, the regression line does not explain any of the variance and is equivalent to the mean of the dependent variable.

R² = 1 indicates that the regression line perfectly fits the data.
Correlation (Pearson, r) vs. R2 – intuition
Thus, while R² is an indicator of fit quality, β2 provides the magnitude and direction of the relationship between X and Y. The correlation coefficient r combines information about the direction and strength of the linear relationship, but does not contain information about the specific slope or about the variability in Y that the regression line explains.

- Both R² and r are related to the strength of the linear relationship between 2 variables (R² can be used for many explanatory variables, r only for one explanatory variable).
- The two figures have the same slope (the same beta), but different Pearson correlation coefficients, r = −0.98 and r = −0.76, and different R² = 0.96 and R² = 0.58. Since there is only one explanatory variable, r² = R²: (−0.98)² = 0.96 and (−0.76)² = 0.58.
- r measures the strength and direction of the linear relationship, while R² measures the strength of the relationship in terms of explained variance.
- R² is a measure of fit of the regression (and, like r, does not necessarily imply causation, that is, not necessarily a cause-and-effect relationship).
- Since r can be positive (max +1) or negative (min −1), it shows the direction of the relationship, while R² does not have a direction but is normalized between 0 (no fit) and 1 (perfect fit of the regression line).
- To the left, a stronger negative relationship between x and y (the direction can only be seen from r); to the right, a weaker negative relationship (with the same slope between x and y). There is less uncertainty to the left.

Left vs. right figure:
• The left figure has a higher Abs(r): Abs(−0.98) > Abs(−0.76), indicating a stronger linear relationship. Therefore, the p-value for r would likely be lower for the left figure, suggesting that the correlation is statistically significant.
• Moreover, because the left figure has a higher R², indicating a better fit, the F-test for the regression model is also likely to yield a lower p-value, suggesting that the overall model is statistically significant in explaining the variability of the dependent variable Y.
Slope Coefficient (β) vs. Correlation (r)
β2 = r · (sY / sX)
This formula shows that if r is held constant and the standard deviations stay the same, the slope will also remain constant. The proportion of the variances is what matters.

The sign of r indicates the direction of the relationship (positive or negative), while R² only indicates the strength of the relationship without any direction.
R² = r² when there is one explanatory variable (not for multiple regression, as r itself does not apply to the relationship between the dependent variable and more than one independent variable). There is no direct formula that relates R² to β2, because β2 also depends on the units of X and Y, while R² is a dimensionless quantity.
- A higher absolute value of the slope β2 implies a steeper line, which indicates a larger change in Y for a given change in X.
- A higher R² implies that a greater proportion of the variance in Y is explained by the model. It does not, however, tell you about the steepness of the line, only how well the line fits the data.
- The Pearson correlation coefficient r does not inform about the steepness or the slope of the regression line. The value of r indicates the strength and direction of a linear relationship between two variables, but it does not tell you how steep that relationship is.
- The steepness of the regression line is determined by a slope coefficient (e.g. β2) in the regression equation (e.g. Y = β1 + β2·X + u). β2 represents the change in the dependent variable Y for a one-unit change in the independent variable X. A higher abs(β2) means a steeper line.
- In contrast, r simply tells you whether there is a positive or negative linear relationship and how strong that relationship is.
- You could have a very strong correlation |r| (close to 1) with a relatively flat line, if the units of X are large compared to the units of Y.
- Conversely, you could have a very steep line (large β) with a lower correlation r, if there is a lot of variability around the line.
- So, while r and β2 are related (since β2 is derived from r and the standard deviations of X and Y), they tell us different things about the relationship between X and Y: r tells us about the direction and strength of the relationship, while β2 tells us about the magnitude of the change in Y when X changes.
Note: Even if R² is high and close to 1 (100%), it does not mean that the regression model is the best or only model for the data.
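A small simulation sketch (simulated data, not from the slides) of the relationships above: the OLS slope equals r·(sY/sX), and with one regressor r² equals the regression R²:

# Minimal sketch: slope = r * (s_Y / s_X), and r^2 = R^2 for a single regressor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=200)
y = 3.0 - 2.0 * x + rng.normal(size=200)

r = np.corrcoef(x, y)[0, 1]
slope_from_r = r * y.std(ddof=1) / x.std(ddof=1)

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(slope_from_r, fit.params[1])           # the same slope (about -2)
print(r ** 2, fit.rsquared)                  # the same value: r^2 = R^2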
Correlation (Pearson, r) vs R² – intuition
[Figure slides: scatter plots illustrating the strength of correlation.]
• Weak relationship = small correlation value Abs(r)
• Moderate relationship = moderate correlation value Abs(r)
• Strong relationship = large correlation value Abs(r)
Correlation (Pearson, r) vs R² – intuition
For R² (which is just the square of the correlation r) it is also simply R² = 1. But few observations will of course lead to big uncertainties, despite R² = r = 1 when n is low (β will still show the slope, though).
Note: For multiple regression, which involves more than one independent variable, if the dependent variable can be perfectly predicted without error from the independent variables, the R² will also be 100%. This would occur if the observed values lie exactly on a hyperplane when plotted in multidimensional space (where each dimension represents one of the independent variables).
Correlation (Pearson, r) vs R² – intuition
Remedies:
• Use the p-value for the correlation r (which reflects the greater uncertainty when n is low)
• (or F-test the model; for a multiple regression this also adjusts for the number of observations)
Correlation (Pearson, r) vs R² – intuition
For correlation, a p-value tells us the probability that randomly drawn dots (observations) will result in a similarly strong (or stronger) relationship.
Correlation (Pearson, r) vs R² – intuition
This means that the probability of random data creating a similarly strong, or stronger, relationship is 0.000000000001, when taking into account that there are many observations (something that R² or r does not do).
Correlation (Pearson, r) vs R² – intuition
[A sequence of figure-only slides follows, with scatter-plot examples of different correlations r and the corresponding R² values.]
Coefficient of Determination (R2) vs. Correlation (r) Correlation (r):
Some more examples: R2 can still be close to zero even if the slope coefficient (β2) of the Note: The Pearson
regression between X and Y is 0.99. This is because R2 measures the proportion of correlation coefficient r
variance in the dependent variable (Y) that is predictable from the independent variable ranges from -1 to +1:
(X), while the slope (β2) indicates the change in Y for a one-unit change in X. • r = 1 a perfect pos linear
relationship,
• r = −1 a perfect neg linear
relationship,
• r = 0 no linear relationship
Using R
between x & y Code in R (install the libraries if they are not already installed)

library(ggplot2)
library(gridExtra)

Determination coefficient (R2): # Set a seed for reproducibility


set.seed(1234586378)

• Positive 0≤R2≤1 (almost always) # Function to generate data with a specific correlation
generate_data <- function(n, rho) {
# Generate x as normal random values

• R2=1 indicates that the


x <- rnorm(n)

# Generate noise
noise <- rnorm(n)

regression predictions # Calculate y using the specified correlation


y <- rho * x + sqrt(1 - rho^2) * noise

perfectly fit the data.


return(data.frame(x, y))
}

# Set different correlation values, excluding 0

• R2=0 means that the


correlations <- c(-1, -0.9, -0.6, -0.3, 0.3, 0.6, 0.9, 1)

# Function to categorize the strength of correlation


correlation_category <- function(r) {

regression does not explain


if (r == 1 || r == -1) {
return("Perfect")
} else if (abs(r) >= 0.7) {
return("Strong")

any of the variability in the } else if (abs(r) >= 0.4) {


return("Moderate")
} else {

outcome data.
return("Weak")
}
}

# List to store ggplot objects


plots <- list()

# Generate plots for different correlations


for (rho in correlations) {

There are no exact limits for Strong, Moderate, Weak R2


data <- generate_data(100, rho)

# Calculate the actual correlation from the data

Note 1: For one independent variable the square of the


actual_r <- cor(data$x, data$y)
actual_r_squared <- actual_r^2

# More precise values for r and R^2

correlation (r) between y and x is the coef. of


r_precise <- round(actual_r, 4)
r_squared_precise <- round(actual_r_squared, 4)

# Determine the category of the correlation

determination (R2), but this is not true for multiple


category <- correlation_category(r_precise)

# Create a ggplot
p <- ggplot(data, aes(x, y)) +

regression (with many explanatory x variables).


geom_point() +
geom_smooth(method = "lm", color = "red", se = FALSE) +
ggtitle(paste("r: ", r_precise, ", R^2: ", r_squared_precise)) +
labs(subtitle = category) +

Note 2: I used a random seed in the R code so it can be


xlab("X") + ylab("Y")

# Add plot to the list


plots[[length(plots) + 1]] <- p

exactly replicated by using the Python code to the right.


}

# Arrange the plots in a grid


grid.arrange(grobs = plots, ncol = 3)
Coefficient of Determination (R²) vs. Correlation (r)

The same thing, using Python:

# R2 graphs in Python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy.stats import pearsonr

# Set a seed for reproducibility
# np.random.seed(5555568)

# Function to generate data with a specific correlation
def generate_data(n, rho):
    # Generate x as normal random values
    x = np.random.normal(size=n)
    # Generate noise
    noise = np.random.normal(size=n)
    # Calculate y using the specified correlation
    y = rho * x + np.sqrt(1 - rho**2) * noise
    return x, y

# Set different correlation values, including moderate correlations
correlations = [1, -1, 0.9, -0.9, 0.5, -0.5, 0.35, -0.35]

# Function to categorize the strength of correlation
def correlation_category(r):
    if r == 1 or r == -1:
        return "Perfect"
    elif abs(r) >= 0.7:
        return "Strong"
    elif abs(r) >= 0.35:
        return "Moderate"
    else:
        return "Weak"

# Set up the matplotlib figure and axes
fig, axes = plt.subplots(4, 2, figsize=(10, 15))  # Adjust the subplot grid as needed

# Flatten the axes array for easy iteration
axes_flat = axes.flatten()

# Generate plots for different correlations
for i, rho in enumerate(correlations):
    x, y = generate_data(100, rho)
    r, _ = pearsonr(x, y)
    r_squared = r**2

    # More precise values for r and R^2
    r_precise = round(r, 4)
    r_squared_precise = round(r_squared, 4)

    # Determine the category of the correlation
    category = correlation_category(r_precise)

    # Create a scatterplot and line of best fit
    sns.scatterplot(x=x, y=y, ax=axes_flat[i])
    sns.regplot(x=x, y=y, scatter=False, color='red', ax=axes_flat[i])

    # Set the title with r and R^2 values and category as subtitle
    axes_flat[i].set_title(f'r: {r_precise}, R^2: {r_squared_precise}\n{category}')

# Remove any extra subplots
for j in range(i + 1, len(axes_flat)):
    fig.delaxes(axes_flat[j])

# Adjust layout
plt.tight_layout()
plt.show()
Coefficient of Determination (R2)
• The better the linear regression (on the right) fits the data in comparison to the simple average
(on the left graph), the closer the value of R2 is to 1.
• The areas of the blue squares represent the squared residuals with respect to the linear regr.
• The areas of the red squares represent the squared residuals with respect to the average value.

• If the blue boxes are in total as large as the red ones, then R² = 0 – there is no value added by the regression line. The red boxes are the benchmark the regression must beat.
Coefficient of Determination (R2)

This is a rough simplification, but it shows the intuition behind R².
Coefficient of Determination (R2)

The difference between the observations and the mean.
Coefficient of Determination (R2)

The difference between the observations and the regression line gives the variation around the line. We take the difference between the variation around the mean, Var(mean), and the variation around the line, and divide by Var(mean), so we get the difference in variation for the line and the mean in relation to the variation around the mean. This tells us how much less variation there is around the line than around the mean. The line is a better indicator of the relationship between the mouse size and weight than the mean.
Coefficient of Determination (R2)
Coefficient of Determination (R2)
Coefficient of Determination (R2) vs effect size
• If all coefficients in an OLS model are statistically significant, and R² = 0.94 = 94%: the relationship between the two variables explains 94% of the variation in the data. Good!

However, if all coefficients in an OLS model are statistically significant, and R² = 0.01 = 1%: who cares if the coefficients are significant if they only explain 1% of the variation in the data? Something else must explain the remaining 99%. Not good!

• Also, the (absolute) effect size should not be too low. If the coefficients are significant and R² = 0.94, but β̂2 = 0.000001, the result is still not relevant: if we increase advertisement spending (x) by 1 USD, sales (y) increase by only 0.000001 USD (or equivalently, increasing x by 1 million increases y by 1 USD).
Coefficient of Determination (R2)
R² can quantify relationships that are more complicated (and with many explanatory variables).
• R² is suitable for both linear and nonlinear relationships.
• However, the standard Pearson correlation coefficient is only suitable for linear relationships.
Example 2: R²

Find the coefficient of determination R² for the model estimated in Example 1.

Item | Y | X | X² | X−X̄ | (X−X̄)² | (X−X̄)Y | Ŷ | û | û² | (Y−Ȳ)²
1 | 5 | 2 | 4 | −1 | 1 | −5 | 4 | 1 | 1 | 0
2 | 5 | 4 | 16 | 1 | 1 | 5 | 6 | −1 | 1 | 0
3 | 3 | 1 | 1 | −2 | 4 | −6 | 3 | 0 | 0 | 4
4 | 8 | 5 | 25 | 2 | 4 | 16 | 7 | 1 | 1 | 9
5 | 4 | 3 | 9 | 0 | 0 | 0 | 5 | −1 | 1 | 1
Sum Σ | 25 | 15 | 55 | 0 | 10 | 10 | – | 0 | 4 | 14
Mean | 5 | 3 | – | – | – | – | – | – | – | –

TSS = Total Sum of Squares = Σ (Yi − Ȳ)² = 14
RSS = Residual Sum of Squares = Σ ûi² = 4
R² = 1 − RSS/TSS = 1 − 4/14 = 0.714
Example 2: R² – Stata check

* Clearing the current dataset in Stata
clear all

* Creating the data
input X Y
2 5
4 5
1 3
5 8
3 4
end

* Fit the linear regression model and report R^2
regress Y X

* Store the R-squared value for later use
scalar R2 = e(r2)

* Generate the scatter plot with a linear fit line
twoway (scatter Y X) (lfit Y X), title("Scatter Plot with Regression Line")

* Display the graph with the R^2 in the title
graph display, title("Scatter Plot with Regression Line, R-squared=" + string(R2))

Test: H0: β = 0 vs. H1: β ≠ 0 at the 10% significance level. What is R²?
R² = 0.7143. The slope is significant at the 10% significance level, so we reject H0.
[Figure: scatter plot of Y against X with fitted values, titled "Scatter Plot with Regression Line".]
Example 2: R² – EViews check

' In EViews:
' Create a new workfile for 5 cross-sections (u=undated)
wfcreate(wf=Example) u 5
' Create two vectors (filled with zeroes) for 5 cross-sections
vector(5) y_vec
vector(5) x_vec
' Fill the vectors with some values
y_vec.fill 5, 5, 3, 8, 4
x_vec.fill 2, 4, 1, 5, 3
' Declare series
series y
series x
' Transform vectors to series
mtos(y_vec,y)
mtos(x_vec,x)
' Fit the linear regression model and save it as eq1 (show it)
equation eq1.ls Y C X
show eq1
' Group the variables (call it g1) and create the scatter plot with a linear fit line
group g1 x y
' Scatterplot the variables and include a regression line
g1.scatpair linefit(y,x)

Test: H0: β = 0 vs. H1: β ≠ 0 at the 10% significance level. What is R²?
R² = 0.7143. The slope is significant at the 10% significance level, so we reject H0.

Note: With one explanatory variable, the F-statistic is the square of the slope t-statistic. That is, (t-statistic = 2.7386132)² = 7.5 = F-statistic, with the same p-value = 0.0714.
