Part 02 - AMEFA - 2024 - Introduction and Repetition
Pär Sjölander
Flow Chart of Econometric Modeling
Identify Research Problem: Define the research question and state why it is relevant to analyze.
Literature Review and Gap Identification: Examine prior studies to ensure your research is anchored in established
literature, and identify gaps that your work aims to address, thereby offering a unique contribution to the research
field.
Empirical Evaluation of Theoretical Models: Build a model to empirically test established economic theories,
selecting variables that are central to the theoretical framework. Outline the anticipated interactions between these
variables based on theoretical propositions. Transform the theoretical constructs into an econometric model, aiming
to evaluate the theory with empirical data.
Hypothesis Formulation: Create testable hypotheses defining expected relationships and hypothesizing their
statistical significance.
Data Collection: Compile data needed for analysis, ensuring its quality and relevance, whether from existing sources
or via new data collection methods like surveys or experiments.
Econometric Model Estimation: Use specific econometric methods, like regression analysis, to estimate the model.
Model Diagnostics: Conduct tests for common issues (e.g., multicollinearity, heteroskedasticity, autocorrelation,
model misspecification), applying corrective measures as needed.
Results Interpretation: Does the result make sense? Examine estimated coefficients and their statistical significance,
comparing findings with theoretical predictions and assessing their economic relevance.
Robustness Checks: Validate results through various robustness tests, such as alternative specifications, inclusion of
control variables, or different estimation methods.
Conclusion and Policy Recommendations: Summarize principal findings, their broader implications, limitations, and
their contribution to existing knowledge. Discuss the research's policy relevance and directions for future inquiry.
Research Reporting: Publication to appropriate audiences.
Linear Models
• In his 1936 work "The General Theory of Employment, Interest, and Money," John Maynard Keynes
emphasizes the significance of aggregate demand in the economy, suggesting that the consumption
function is primarily influenced by current income.
• The MPC (marginal propensity to consume) is typically between 0 and 1, indicating that a person
spends a portion of their additional income. However, some individuals and countries borrow.
• $Spending_t = \beta_0 + \beta_1 Income_t$, where the parameters $\beta_0$ and $\beta_1$ are unknown and must be
estimated by drawing a sample from the population.
• Theoretically, in this specific case (MPC = 0.7), we know that $\beta_1 > 0$, which means a positive
association between Income and Spending.
• We focus on simple linear OLS regression for now. However, in practice, the relationship between income and spending is likely
to be influenced by many other factors, such as work experience, industry, and geographical location, which are not accounted for in this simple model. It is also most
likely a non-linear relationship, since Warren Buffett cannot spend anything more on consumption after reaching a certain income level.
Linear Models – what’s the point of OLS regressions?
• For example, let X and Y be income and spending, respectively. Dropping the error term and the subscripts
for simplicity, a linear model of Spending (Y) in terms of Income (X) can be written as $Y = \beta_0 + \beta_1 X$
(which is equal to $Spending = \beta_0 + \beta_1 Income$).
• Cross-sectional analysis can be specified by "i" (individual), i = 1, 2, …, 100 individuals:
$Spending_i = \beta_0 + \beta_1 Income_i$
• Time-series analysis can be specified by "t" (time), t = 1, 2, …, 365 days:
$Spending_t = \beta_0 + \beta_1 Income_t$, where the parameters $\beta_0$ and $\beta_1$ are unknown and must be
estimated by drawing a sample from the population.
• Panel-data analysis can be specified by "i" & "t":
$Spending_{i,t} = \beta_{0,i} + \beta_1 Income_{i,t}$, where the parameter $\beta_{0,i}$ varies over individuals and $\beta_1$ is fixed.
Research Question: How does individual income affect spending behavior? (Assume e.g. cross-sectional data.)
Test hypothesis: H0: $\beta_1 = 0$ vs. e.g. H1: $\beta_1 > 0$.
Intuition: Can the data support our theory and test hypothesis – or is it just a random relationship in the data?
Conclusion: If e.g. the p-value < 5%, the data reject the null hypothesis. If the model is correctly specified,
with assumptions met, there is a significant relationship between Income and Spending.
Policy implications: If e.g. $\beta_1 = 0.7$, for each extra SEK in income, 0.7 SEK is spent on consumption.
The government/central bank can use this information to change taxes, subsidies/welfare, or interest rates,
or to adjust the budget.
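As a minimal R sketch of this workflow on simulated data (the intercept 100, the slope 0.7, and the noise level are illustrative assumptions, not estimates from real data):

set.seed(1)
n <- 100
income <- rnorm(n, mean = 300, sd = 50)              # X: income in SEK
spending <- 100 + 0.7 * income + rnorm(n, sd = 20)   # Y: true beta1 = 0.7
fit <- lm(spending ~ income)                         # OLS estimates of beta0 and beta1
summary(fit)                                         # two-sided t-test of H0: beta1 = 0
# One-sided test H0: beta1 = 0 vs. H1: beta1 > 0 (positive estimate):
t_val <- summary(fit)$coefficients["income", "t value"]
pt(t_val, df = fit$df.residual, lower.tail = FALSE)  # one-sided p-value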
Intro/Repetition: What is an R²?
Definition:
R², or the coefficient of determination, is the proportion of the variance in the dependent
variable that is predictable from the independent variables. It is a statistical measure that
represents the goodness of fit of a model.
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$
Statistical Interpretation:
R² measures the strength of the relationship between the model and the dependent
variable(s). A higher R² value indicates a better fit and suggests that the model’s explanatory
variables can explain a large proportion of the variance in the dependent variable.
R² values range from 0 to 1. An R² of 0 indicates that the model explains none of the variability
of the response data around its mean, while an R² of 1 indicates that the model explains all the
variability of the response data around its mean.
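A minimal R sketch of this definition, computing R² from its sums of squares and comparing it with the value reported by lm() (the simulated data and coefficients are illustrative assumptions):

set.seed(1)
x1 <- rnorm(50); x2 <- rnorm(50)
y <- 1 + 2 * x1 - x2 + rnorm(50)      # illustrative simulated data
fit <- lm(y ~ x1 + x2)
rss <- sum(resid(fit)^2)              # residual sum of squares
tss <- sum((y - mean(y))^2)           # total sum of squares
1 - rss / tss                         # R^2 from its definition
summary(fit)$r.squared                # the same value, reported by lm()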
Intro/Repetition: What is a t-test in regression analysis?
Definition:
A t-test in regression analysis is used to test the null hypothesis that a specific parameter, such
as a slope coefficient, is equal to zero. This test is crucial for evaluating whether individual
independent variables contribute to the explanatory power of the model.
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$
Statistical Interpretations:
Produces a t-value (with its p-value), assessing whether differences are statistically significant – or
just due to natural variation in the data set (due to random chance). If (relative to the variation)
the t-statistic is large in absolute terms (e.g. > 1.96, at the 5% significance level), then $\beta_j$ is
statistically significantly different from zero and we reject H0.
A significant t-statistic indicates that the predictor contributes meaningfully to the model,
and that the relationship is not simply caused by natural variation.
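A minimal R sketch, assuming simulated data, showing that the reported t-value is simply the estimate divided by its standard error:

set.seed(2)
x1 <- rnorm(100); x2 <- rnorm(100)
y <- 1 + 0.5 * x1 + 0 * x2 + rnorm(100)   # x2 truly has no effect
fit <- lm(y ~ x1 + x2)
est <- coef(summary(fit))                 # Estimate, Std. Error, t value, p-value
est[, "Estimate"] / est[, "Std. Error"]   # t-statistics computed by hand
est[, "t value"]                          # identical values from lm()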
Intro/Repetition: What is an F-test in regression analysis?
Definition:
An F-test in regression analysis is used to test the null hypothesis that e.g. all independent
variables (when considered together) do not significantly contribute to the explanatory power
of the model. This evaluates the overall fit of the model.
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$
H0: $\beta_1 = \beta_2 = 0$ vs. H1: at least one $\beta_j \neq 0$
Statistical Interpretation:
Produces an F-value (with its p-value), assessing whether the overall model is statistically significant.
If the F-value is larger than the critical value from the F-distribution (e.g., F > 3.07 at the 5%
significance level for a model with 2 and 120 degrees of freedom), it indicates that the overall
model is statistically significant, and we reject H0.
A significant F-statistic suggests that the overall model is better at explaining the variation in
the dependent variable than a model without the independent variables. Then the
relationship does not seem to be due to random chance.
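A minimal R sketch, assuming simulated data with 2 regressors and 120 residual degrees of freedom, computing the F-statistic by hand and the corresponding 5% critical value:

set.seed(3)
n <- 123                                   # gives 2 and 120 degrees of freedom
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.4 * x1 + 0.3 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
rss <- sum(resid(fit)^2)                   # residual sum of squares
ess <- sum((fitted(fit) - mean(y))^2)      # explained sum of squares
k <- 2                                     # number of slope coefficients
(ess / k) / (rss / (n - k - 1))            # F-statistic by hand
summary(fit)$fstatistic                    # the same F value, with its df
qf(0.95, df1 = k, df2 = n - k - 1)         # 5% critical value, about 3.07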
Intro/Repetition: What is a p-value?
Definition:
The p-value represents the probability of observing a test statistic as extreme as, or more
extreme than, the one obtained from the sample data, given that the null hypothesis (H0) is
true. It quantifies the evidence against H0 by indicating the likelihood of the observed data
under the assumption that H0 holds. For example, a p-value of 0.04 suggests that, assuming
the null hypothesis is true, there is a 4% chance of obtaining a result as extreme as, or more
extreme than, the one observed, solely due to random variation. This does NOT imply that H0
has a 4% probability of being true; instead, it reflects the rarity of the test statistic's value
under the null hypothesis framework.
Statistical Interpretation:
A measure of the strength of evidence against the null hypothesis. A small p-value indicates
strong evidence against the null hypothesis.
The smaller the p-value, the greater the statistical significance of the observed difference.
For small p-values we reject H0 (otherwise, for high p-values, we cannot reject H0). If we reject
the null hypothesis, we believe that there is a relationship between the variables.
No statistical tables are needed to reject/not reject H0. The p-value can answer whether a
relationship is statistically significant or just due to chance and natural variation in the variables.
At a 5% significance level, a p-value below 0.05 will indicate a statistically significant result.
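A minimal R sketch of this definition, using an assumed, illustrative t-statistic and degrees of freedom:

t_obs <- 2.1                          # assumed observed t-statistic
df <- 60                              # assumed residual degrees of freedom
2 * pt(-abs(t_obs), df = df)          # two-sided p-value, about 0.04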
CSM, TSM & panel data models
X = Income for individuals can be measured by cross-sectional, time-series, and panel data.
Cross-sectional data: A sample of individuals observed in 1 time period (for instance, 22
individuals' income for the year 2020) – $X_i$. Here T = 1, N = 22: 22 observations.
Panel data: The same sample of individuals observed in multiple time periods (for instance,
22 persons' income for the years 2020, 2021, and 2022) – $X_{i,t}$. Here N = 22, T = 3: NT = 66 observations.
Given some assumptions, it is not uncommon that we can extract more reliable information from
66 observations compared to 22 and (especially) 3 observations.*
Note: Obviously there is more information to extract and analyze in a full panel. Therefore, later in
the course, we will explore in detail how to analyze panel data. Now, we will just introduce it.
*This is a simplified example; in practice, T = 3 is too few observations to run a regression.
CSM, TSM & panel data models
Y = Spending for individuals can be measured by cross-sectional, time-series, and panel data.
Cross-sectional data: A sample of individuals observed in 1 time period (for instance, 22
individuals' spending for the year 2020) – $Y_i$. Here T = 1, N = 22: 22 observations.
Panel data: The same sample of individuals observed in multiple time periods (for instance,
22 persons' spending for the years 2020, 2021, and 2022) – $Y_{i,t}$. Here N = 22, T = 3: NT = 66 observations.
Given some assumptions, it is not uncommon that we can extract more reliable information from
66 observations compared to 22 and (especially) 3 observations.*
Note: Obviously there is more information to extract and analyze in a full panel. Therefore, later in
the course, we will explore in detail how to analyze panel data. Now, we will just introduce it.
*This is a simplified example; in practice, T = 3 is too few observations to run a regression.
CSM, TSM & panel data models
Spending = f(Income) using formulas for cross-sectional, time-series, and panel data.
• Cross-Sectional Models (CSM): Income and spending for e.g. a cross-
section of i = 1, 2, …, 22 individuals (N = 22, T = 1):
Yi = α + β*Xi + ui or SPENi = α + β*INCi + ui
• Time-Series Models (TSM): Income and spending for one unit over t = 1, 2, …, T time periods (N = 1):
Yt = α + β*Xt + ut or SPENt = α + β*INCt + ut
• Panel Data Models: Income and spending for i = 1, …, N individuals over t = 1, …, T periods:
Yi,t = αi + β*Xi,t + ui,t or SPENi,t = αi + β*INCi,t + ui,t
Underfitted Estimated Model: $\hat{y} = \hat\beta_0 + \hat\beta_1 x_1$
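A minimal R sketch of the three data shapes (the sample sizes follow the slides; the income numbers are illustrative):

ids <- 1:22
cross <- data.frame(i = ids, year = 2020,
                    income = rnorm(22, 300, 50))   # CSM: N = 22, T = 1
tsdat <- data.frame(t = 1:365,
                    income = cumsum(rnorm(365)))   # TSM: N = 1, T = 365
panel <- expand.grid(i = ids, year = 2020:2022)    # panel: N = 22, T = 3
panel$income <- rnorm(nrow(panel), 300, 50)
nrow(cross); nrow(tsdat); nrow(panel)              # 22, 365, 66 rows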
Linear Models – OLS estimator properties – bias & efficiency
• UNBIASEDNESS is about an estimator's accuracy: whether it estimates the true
parameter value without systematic error. An estimator $\hat\beta_n$ is unbiased if its expected
value is equal to the true parameter value for any sample size n: $E(\hat\beta_n) = \beta$.
• An estimator is an efficient estimator if, in some well-defined class of estimators
(e.g. within the class of linear unbiased estimators), it is the estimator with the
smallest variance.
[Figure: sampling distributions of 4 estimators, arranged by high/low efficiency and high/low bias.]
4 estimators:
• B = Unbiased & (fairly) efficient
• D = Unbiased & inefficient
• A = Biased & inefficient
• C = Biased & (fairly) efficient
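A minimal Monte Carlo sketch of these two properties, assuming a true slope β = 2 and normal errors: OLS is (approximately) unbiased, while a deliberately shrunken estimator is biased but has a smaller variance:

set.seed(4)
beta <- 2
draws <- replicate(5000, {
  x <- rnorm(30); y <- beta * x + rnorm(30)
  b_ols <- coef(lm(y ~ x))[2]             # unbiased OLS slope
  c(ols = b_ols, shrunk = 0.8 * b_ols)    # second estimator is deliberately biased
})
rowMeans(draws) - beta             # bias: about 0 for OLS, about -0.4 otherwise
apply(draws, 1, var)               # variance: the biased one is smaller here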
Linear Models – OLS estimator properties – consistency
What happens when n → ∞?
• CONSISTENCY – as n increases – higher accuracy/lower bias AND
lower variance/higher precision/higher efficiency. All consistent estimators
improve both their accuracy and precision as the sample size n increases.
Definition: "Consistency. The estimator $\hat\beta_2$ is a consistent estimator of $\beta_2$ if the distribution $f_{\hat\beta_2}$
collapses at the point $\beta_2$ into a straight line when the sample size goes to infinity. This means that,
for increasing n, the variance of $\hat\beta_2$ goes to zero, and in cases where the estimator is biased,
the bias also goes to zero." Vogelvang (2005), Econometrics, p. 65.
For consistent estimators, when increasing n, the variance of $\hat\beta_n$ goes to zero, and in cases
where the estimator is biased, this bias will also decrease as n increases. Thus, for
consistent estimators, the bias and variance typically get smaller as the sample size n
increases. Therefore, as n increases, $MSE(\hat\beta_n)$ decreases (where MSE = Bias² + Variance).
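A minimal Monte Carlo sketch of consistency, assuming a true slope β = 2: the sampling variance of the OLS slope shrinks toward zero as n grows:

set.seed(5)
sim_var <- function(n) {
  var(replicate(2000, {
    x <- rnorm(n); y <- 2 * x + rnorm(n)
    coef(lm(y ~ x))[2]             # OLS slope estimate for one sample
  }))
}
sapply(c(10, 100, 1000), sim_var)  # sampling variance falls toward zero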
Deterministic (Mathematical) vs. Stochastic (Statistical) Models
A Deterministic Model allows you to calculate a future event exactly, without the involvement of randomness. If something is
deterministic, you have all of the data necessary to predict (determine) the outcome with certainty. However, different initial
conditions will affect the output. This type of modeling is less relevant in economics and finance – but sometimes feasible in e.g. physics.
A Stochastic Model has the capacity to handle uncertainties in the inputs applied. Stochastic models possess some inherent
randomness: the same set of parameter values and initial conditions will lead to an ensemble of different outputs. This type
of modeling is more relevant in economics and finance, since we add an error term – or, empirically, a residual.
• A time series can be decomposed into different elements:
(i) original data series,
(ii) trend,
(iii) seasonal cycle, and
(iv) remainder – irregular component (residual/error component that
accounts for random noise and unexplained fluctuations in the data).
[Figure: panels (i)–(iv) showing the original series, its trend, its seasonal cycle, and the remainder.]
• Find the deterministic relationship between how the variation in x
affects the variation in y. Usually we are not interested in modeling
(ii)–(iv); therefore, we often try to "filter out" these elements and then
isolate the underlying pattern.
Approaches to filter out systematic patterns that are distorting our
regression analysis (since we are usually just interested in the
underlying data-generating process of our variables of interest):
• (ii) Detrending: Applying a moving average or HP (Hodrick-Prescott)
filter, or fitting a model, such as a polynomial or other function,
to the data and then subtracting this trend component from the
original series smooths out short-term fluctuations, highlighting the
longer-term trend in (i).
• (iii) Deseasonalizing: Based on (i), create binary (0 or 1) dummy variables
for each period that represents a season (e.g., months, quarters) to
capture seasonal effects. Loess smoothing, Holt-Winters exponential
smoothing, X-12-ARIMA, etc. are common deseasonalizing tools.
Note: The remaining irregular random noise (iv) cannot be predicted
(Random Walk Models: these are sometimes used to represent the irregular
component, assuming that changes are random and unpredictable) – while
possibly the other parts (ii)–(iii), and maybe (i), can be predicted.
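A minimal R sketch of such a decomposition, using the built-in AirPassengers series and the stl() function (one of several possible tools):

dec <- stl(AirPassengers, s.window = "periodic")   # split (i) into (ii)-(iv)
plot(dec)                                          # data, seasonal, trend, remainder
deseasonalized <- AirPassengers - dec$time.series[, "seasonal"]
detrended      <- AirPassengers - dec$time.series[, "trend"]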
Regression Analysis - notations
X → Y
Cause → Effect
Independent → Dependent
Explanatory → Explained
Regressor → Regressand
Covariate → Outcome
Predictor → Predicted
Exposure → Response
Exogenous → Endogenous
Definition of the error term and residual
Error terms (often represented by e.g. u or ε) are not observable – they are the
deviations of the observed values from the true population regression line. We do
not know the true population regression line (and if we knew it, there would be no
point in estimating a regression line, since we would already know the true relationship
for the population that the sample is intended to represent). Residuals (û) are the
observable counterparts: the deviations of the observed values from the estimated regression line.
* However, if we had access to the entire population, e.g. all salaries and all relevant explanatory variables for determining teacher salaries at JIBS,
then the error terms and the residuals would be identical (given that the model is correctly specified and that we have all the observations).
[Figure: the true population relationship $Y = \beta_0 + \beta_1 X$, with expected values $E_1, \dots, E_6$ of Y lying on the line; $\beta_0$ is the intercept and $\beta_1$ the slope per 1 unit of X.]
(Theoretically) Expected values of Y (that is, $Y_1, \dots, Y_6$), given 6 different values of X ($X_1, \dots, X_6$).
[Figure: observed values $O_1, \dots, O_6$ scattered around the true population line, next to the expected values $E_1, \dots, E_6$ on the line.]
Observed values (from a sample) of Y compared to expected (not observable, since the true regression line is unknown) values of Y, given 6 X observations.
[Figure: the error terms $u_1, \dots, u_6$ shown as the vertical distances between the observations $O_1, \dots, O_6$ and the true population line.]
(Given the observations $O_1$ to $O_6$) Expected error terms $u_1, \dots, u_6$ from estimating values of Y, given 6 X observations.
[Figure: the estimated regression line $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$ with predicted values $P_1, \dots, P_6$ and residuals $\hat{u}_1, \dots, \hat{u}_6$, compared with the true line, the expected values $E_i$, and the observed values $O_i$.]
Estimated (observable) regression equation, and predicted values of Y compared to expected and observed values of Y, given 6 X observations.
Ordinary Least Squares Method
• There are many ways to estimate a regression model. The most common method is the Ordinary
Least Squares (OLS) method.
• The OLS method aims at minimizing the sum of squared errors. The errors are unobservable.
Therefore, it minimizes the sum of squared prediction errors, called residuals (which are
observable).

$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i$

Here, $\hat{u}_i$ represents the residuals, or the differences between the observed values ($Y_i$) and the predicted values ($\hat{Y}_i$) from
the regression. The residuals indicate how much the actual data points deviate from the estimated regression line.

$\hat\sigma^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n-2}$

This formula calculates the estimated variance of the residuals ($\hat\sigma^2$), which measures the dispersion of the residuals
(or errors) around the regression line. It is an estimate of the variance of the error term in the population regression equation. The denominator (n − 2) is the
degrees of freedom, where n is the number of observations, and "2" accounts for the number of parameters estimated ($\hat\beta_0$ and $\hat\beta_1$). Thus, the variance of
the error term in the population regression is σ², which is an unknown parameter that we estimate with $\hat\sigma^2$. A smaller value of $\hat\sigma^2$ suggests that the model
fits the data well, as it indicates less variability in the residuals. Conversely, a larger $\hat\sigma^2$ might indicate a poor model fit.
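A minimal R sketch, assuming simulated data, computing $\hat\sigma^2$ by hand and comparing it with the value implied by lm():

set.seed(6)
x <- rnorm(40); y <- 2 + x + rnorm(40)         # illustrative simulated data
fit <- lm(y ~ x)
u_hat <- resid(fit)                            # residuals Y - Y_hat
sum(u_hat^2) / (length(y) - 2)                 # sigma^2 hat, dividing by n - 2
summary(fit)$sigma^2                           # the same value from lm()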
Equivalent Forms of the $\hat\beta_1$ Estimator

$\hat\beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})Y_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$

$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_i (Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$

$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_i (Y_i - \bar{Y})}{\sum_{i=1}^{n} X_i (X_i - \bar{X})}$

$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}$

There are different views, but some argue that the last version is the easiest to use when finding the
parameters based on the data in tables. However, they will all give the same estimate for $\hat\beta_1$.
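A minimal R sketch verifying, on illustrative simulated data, that all four forms give the same estimate:

set.seed(7)
X <- rnorm(25); Y <- 1 + 3 * X + rnorm(25)     # illustrative simulated data
n <- length(X)
b1 <- sum((X - mean(X)) * Y) / sum((X - mean(X))^2)
b2 <- sum(X * (Y - mean(Y))) / sum((X - mean(X))^2)
b3 <- sum(X * (Y - mean(Y))) / sum(X * (X - mean(X)))
b4 <- (n * sum(X * Y) - sum(X) * sum(Y)) / (n * sum(X^2) - sum(X)^2)
c(b1, b2, b3, b4)                              # all four are identical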
Precision of OLS Parameter Estimators
• The standard error is a measure of an estimator's precision – how much the estimated value would typically vary if
we were to repeat our sampling many times. It gives us an idea of the precision of our estimate; a smaller
standard error suggests a more precise estimate.
• An estimate of the "standard deviation of an estimator" is called a standard error.

$s.e.(\hat\beta_1) = \sqrt{\frac{\sigma^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$

This is the standard error of the slope coefficient ($\hat\beta_1$) from a regression model,
where σ² represents the variance of the error terms in the model (estimated by $\hat\sigma^2$). The denominator is the sum of
the squared differences between each $X_i$ and the mean of X (that is, $\bar{X}$). This tells us how much $\hat\beta_1$ would vary from
sample to sample.

$s.e.(\hat\beta_0) = \sqrt{\frac{\sigma^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n}(X_i - \bar{X})^2}}$

This is the standard error of the intercept ($\hat\beta_0$).

Note: The estimated variance ($\hat\sigma^2$) is usually calculated as $\hat\sigma^2 = \frac{\sum_{i=1}^{n}\hat{u}_i^2}{n-k}$, where $\hat{u}_i$ represents the residuals (the differences between
the observed and predicted values), n is the number of observations, and k is the number of parameters estimated (including the
intercept; the denominator is sometimes written as n − k − 1, when the intercept is not counted in k).
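A minimal R sketch, assuming simulated data, computing both standard errors from these formulas and comparing them with the lm() output:

set.seed(8)
X <- rnorm(30); Y <- 2 + 0.5 * X + rnorm(30)   # illustrative simulated data
fit <- lm(Y ~ X)
n <- length(X)
sigma2 <- sum(resid(fit)^2) / (n - 2)          # estimated error variance
Sxx <- sum((X - mean(X))^2)
se_b1 <- sqrt(sigma2 / Sxx)                    # s.e. of the slope
se_b0 <- sqrt(sigma2 * sum(X^2) / (n * Sxx))   # s.e. of the intercept
c(se_b0, se_b1)
coef(summary(fit))[, "Std. Error"]             # the same values from lm()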
Example 1: Simple Linear Regression
From $y = \beta_0 + \beta_1 x_1 + \epsilon$ we obtain the estimated model $\hat{y} = 2 + 1 \cdot x_1$.
OLS estimation for predicting supplied quantity (Y) based on the price (X), with n = 5 observations:

| Item | Y | X | X² | X−X̄ | (X−X̄)² | (X−X̄)Y | Ŷ | û | û² |
| 1 | 5 | 2 | 4 | −1 | 1 | −5 | 4 | 1 | 1 |
| 2 | 5 | 4 | 16 | 1 | 1 | 5 | 6 | −1 | 1 |
| 3 | 3 | 1 | 1 | −2 | 4 | −6 | 3 | 0 | 0 |
| 4 | 8 | 5 | 25 | 2 | 4 | 16 | 7 | 1 | 1 |
| 5 | 4 | 3 | 9 | 0 | 0 | 0 | 5 | −1 | 1 |
| Sum Σ | 25 | 15 | 55 | 0 | 10 | 10 | – | 0 | 4 |
| Mean | 5 | 3 | – | – | – | – | – | – | – |

$\hat\beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})Y_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{10}{10} = 1$

$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = 5 - (1)(3) = 2$

$\hat{Y} = \hat\beta_0 + \hat\beta_1 X = 2 + 1 \cdot X$, and $\hat{u} = Y - \hat{Y}$

$\hat\sigma^2 = \frac{\sum_i \hat{u}_i^2}{n-2} = \frac{4}{5-2} = \frac{4}{3} = 1.33$

$\hat\sigma^2$ is the estimator of the variance of the error term u: a measure of the variability of the residuals around the
regression line, that is, the overall variability in the model that cannot be explained by the independent variables.
The $\hat{u}_i^2$ are the individual squared residuals from the regression model; each residual is the difference between the
observed value and the predicted value for a given observation, and they are used for calculating e.g. $\hat\sigma^2$.
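The worked example can be reproduced in R with the data taken from the table:

Y <- c(5, 5, 3, 8, 4)
X <- c(2, 4, 1, 5, 3)
b1 <- sum((X - mean(X)) * Y) / sum((X - mean(X))^2)   # 10/10 = 1
b0 <- mean(Y) - b1 * mean(X)                          # 5 - 1*3 = 2
u <- Y - (b0 + b1 * X)                                # residuals 1, -1, 0, 1, -1
sum(u^2) / (length(Y) - 2)                            # sigma^2 hat = 4/3 = 1.33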
• The total sum of squares of the dependent variable can be analyzed as:

$\underbrace{\sum_{i=1}^{n}(y_i - \bar{y})^2}_{\text{Total Sum of Squares}} = \sum_{i=1}^{n}\left[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\right]^2$

$= \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \underbrace{2\sum_{i=1}^{n}(y_i - \hat{y}_i)(\hat{y}_i - \bar{y})}_{\text{zero}}$

where $y_i - \hat{y}_i = \hat{u}_i$, so

$TSS = \underbrace{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}_{\text{Residual Sum of Squares (due to Error)}} + \underbrace{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}_{\text{Explained Sum of Squares (due to Regression)}}$
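A minimal R check of this decomposition, using the example data from the previous slides (TSS = 14, RSS = 4, ESS = 10):

Y <- c(5, 5, 3, 8, 4); X <- c(2, 4, 1, 5, 3)
fit <- lm(Y ~ X)
yhat <- fitted(fit)
TSS <- sum((Y - mean(Y))^2)                # 14
RSS <- sum((Y - yhat)^2)                   # 4
ESS <- sum((yhat - mean(Y))^2)             # 10
c(TSS, RSS + ESS)                          # equal: the decomposition holds
sum((Y - yhat) * (yhat - mean(Y)))         # cross term: zero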
OLS Estimators – the logic behind the formula
Intuition of estimators:
• OLS estimator: $\hat\beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$

$R^2 = \frac{ESS}{TSS}$ or $R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$

• We have $0 \le R^2 \le 1$, since $R^2$ is a proportion. $R^2 = 1$ means a perfect fit.
Coefficient of Determination (R2)
R² = 0 indicates that the regression line cannot explain more (of the variation in y)
than the arithmetic mean of y. R² = 1 indicates that the regression line perfectly fits the data.
Correlation (Pearson, r) vs. R² – intuition
• Pearson's r ranges from −1 to 1, while R² is positive: 0 ≤ R² ≤ 1 (almost always).
• With a single explanatory variable, R² is the square of the Pearson correlation r.
A runnable R sketch of the demo (the correlation targets and the category cut-offs are illustrative assumptions):

library(ggplot2)
library(gridExtra)

# Function to generate data with a specific (approximate) correlation
generate_data <- function(n, rho) {
  x <- rnorm(n)                            # generate x as normal random values
  noise <- rnorm(n)                        # generate noise
  y <- rho * x + sqrt(1 - rho^2) * noise   # mix so that cor(x, y) is about rho
  data.frame(x = x, y = y)
}

# Categorize the strength of a correlation
categorize <- function(r) {
  if (abs(r) >= 0.7) "Strong" else if (abs(r) >= 0.3) "Moderate" else "Weak"
}

set.seed(123)
plots <- lapply(c(0.9, 0.5, 0.1, -0.9), function(rho) {
  data <- generate_data(100, rho)
  r <- cor(data$x, data$y)
  # Create a ggplot; the title shows r and R^2, the subtitle the category
  ggplot(data, aes(x, y)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    labs(title = sprintf("r: %.2f, R^2: %.2f", r, r^2),
         subtitle = categorize(r))
})
grid.arrange(grobs = plots, ncol = 2)   # adjust layout: 2 x 2 grid
Coefficient of Determination (R2)
• The better the linear regression (on the right) fits the data in comparison to the simple average
(on the left graph), the closer the value of R2 is to 1.
• The areas of the blue squares represent the squared residuals with respect to the linear regr.
• The areas of the red squares represent the squared residuals with respect to the average value.
• If the areas of the red boxes are as large as those of the blue ones, then R² = 0 – there is no
value added by the regression line. The red boxes are the benchmark the regression must beat.
Coefficient of Determination (R²)
This is a rough simplification, but it shows the intuition behind R²:
[Figure: the variation of the observations around the mean, Var(mean), vs. the variation of the observations around the regression line, Var(line).]
• The difference between the observations and the mean gives the variation around the mean.
• The difference between the observations and the regression line gives the variation around the line.
• Taking the difference of these variations for the line and the mean, and dividing by Var(mean),
tells us how much less variation there is around the line vs. the variation around the mean:
R² = (Var(mean) − Var(line)) / Var(mean).
• However, if all coefficients in an OLS model are statistically significant but R² = 0.01 = 1%:
who cares if the coefficients are significant when they only explain 1% of the variation in the
data? Something else must explain the remaining 99%. Not good!
• Also, the (absolute) effect size should not be too low. If the coefficients are significant and
R² = 0.94, but the slope estimate is β̂ = 0.000001, it is still not relevant: if we increase
advertising spending (x) by 1 USD, sales (y) increase by 0.000001 USD (or equivalently,
increasing x by 1 million increases y by 1 USD).
Coefficient of Determination (R²)
R² can quantify relationships that are more complicated (and for many explanatory variables).
Extending the table from Example 1 with the squared deviations of Y from its mean:

| Item | Y | X | X² | X−X̄ | (X−X̄)² | (X−X̄)Y | Ŷ | û | û² | (Y−Ȳ)² |
| 1 | 5 | 2 | 4 | −1 | 1 | −5 | 4 | 1 | 1 | 0 |
| 2 | 5 | 4 | 16 | 1 | 1 | 5 | 6 | −1 | 1 | 0 |
| 3 | 3 | 1 | 1 | −2 | 4 | −6 | 3 | 0 | 0 | 4 |
| 4 | 8 | 5 | 25 | 2 | 4 | 16 | 7 | 1 | 1 | 9 |
| 5 | 4 | 3 | 9 | 0 | 0 | 0 | 5 | −1 | 1 | 1 |
| Sum Σ | 25 | 15 | 55 | 0 | 10 | 10 | – | 0 | 4 | 14 |
| Mean | 5 | 3 | – | – | – | – | – | – | – | – |

So TSS = Σ(yᵢ − ȳ)² = 14 and RSS = Σûᵢ² = 4, giving R² = 1 − RSS/TSS = 1 − 4/14 ≈ 0.71.
Note: With one explanatory variable, the F-statistic is the square of the slope t-statistic.
That is, (t-statistic = 2.7386132)² = 7.5 = F-statistic, with the same p-value = 0.0714.
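A minimal R check of this note, using the example data from above:

Y <- c(5, 5, 3, 8, 4); X <- c(2, 4, 1, 5, 3)
fit <- lm(Y ~ X)
t_slope <- coef(summary(fit))["X", "t value"]   # 2.7386...
F_stat <- summary(fit)$fstatistic["value"]      # 7.5
c(t_slope^2, F_stat)                            # t^2 equals F
summary(fit)$r.squared                          # 10/14, about 0.714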