Econometrics Notes
F-test
The F-test in linear regression is used to assess overall significance
of the model. It compares the fit of the estimated model with a
model that has no independent variables. The null hypothesis for
the F-test is that all coefficients of the independent variables are
zero, meaning the model has no explanatory power.
RSS, ESS, and the degrees of freedom enter the F-statistic as
F = (ESS / k) / (RSS / (n - k - 1))
where:
1. RSS is the residual sum of squares when the model is estimated with the independent variables.
2. ESS is the explained sum of squares, the portion of the total variation in the dependent variable that the independent variables account for.
3. k is the number of independent variables in the model.
4. n is the number of observations.
Steps for Conducting an F-test in Linear Regression
1. Formulate Hypotheses
Null hypothesis (H0): All coefficients are zero (model has no
explanatory power).
Alternative Hypothesis (H1): At least one coefficient is nonzero
(model is significant).
2. Calculate the F-statistic: obtain the residual sum of squares (RSS) and the explained sum of squares (ESS) from the regression output, then apply the formula above.
3. Determine Degrees of Freedom
Degrees of freedom for the numerator is k (number of coefficients
being tested).
Degrees of freedom for the denominator is n - k - 1
(total sample size minus the number of independent variables minus one).
4. Compare with Critical Value or P-value
Use the F-statistic to look up a critical value from an F-distribution
table or compare it with a significance level.
Alternatively, obtain the P-value associated with the F-statistic. If
the P-value is less than the chosen significance level (commonly
0.05), you reject the null hypothesis.
If the F-test is significant, it indicates that at least one
independent variable contributes significantly to explaining the
variability in the dependent variable.
It is crucial for assessing the overall significance of a regression
model. If the F-statistic is significant, it suggests that the model
explains a significant amount of variance in the dependent
variable, and at least one predictor variable is contributing to the
model’s explanatory power.
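To make the computation concrete, here is a minimal sketch in Python (the data are simulated placeholders, not from these notes) that builds the F-statistic from RSS, ESS, and the degrees of freedom exactly as in the steps above:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 50, 2                            # observations, independent variables
X = rng.normal(size=(n, k))
y = 1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

Xc = np.column_stack([np.ones(n), X])   # add the intercept column
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
resid = y - Xc @ beta

RSS = np.sum(resid ** 2)                # residual sum of squares
TSS = np.sum((y - y.mean()) ** 2)       # total sum of squares
ESS = TSS - RSS                         # explained sum of squares

F = (ESS / k) / (RSS / (n - k - 1))     # F-statistic
p_value = stats.f.sf(F, k, n - k - 1)   # upper-tail P-value
print(F, p_value)                       # reject H0 if p_value < 0.05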
Goodness of Fit
R-squared (R²)
R-squared (R²) is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variables in a linear regression model. It ranges from 0 to 1, where a higher R² indicates a better fit of the model to the data.
For linear regression:
R² = 1 - RSS/TSS
RSS is the residual sum of squares (the sum of the squared
differences between the observed and predicted values of the
dependent variable).
TSS is the total sum of squares, which measures the total variance
of the dependent variable.
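As a small illustration, the following Python snippet computes R² directly from RSS and TSS; the observed values and predictions are hypothetical placeholders:

import numpy as np

y = np.array([3.1, 4.2, 5.0, 6.3, 7.1])       # observed values (hypothetical)
y_hat = np.array([3.0, 4.0, 5.2, 6.1, 7.4])   # model predictions (hypothetical)

RSS = np.sum((y - y_hat) ** 2)                # residual sum of squares
TSS = np.sum((y - np.mean(y)) ** 2)           # total sum of squares
r_squared = 1 - RSS / TSS                     # R² = 1 - RSS/TSS
print(r_squared)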
Interpreting R²
1. R² = 0: The model does not explain any variability in the dependent variable.
2. R² = 1: The model perfectly explains the variability in the dependent variable.
3. 0 < R² < 1: Indicates the proportion of variability explained by the model. For example, an R² of 0.75 means that 75% of the variance in the dependent variable is explained by the independent variables.
Limitations of R²
1. R² does not indicate whether the coefficients are statistically significant.
2. It does not provide a fair basis for comparing the fit of models with different numbers of independent variables (adjusted R² is used for that).
3. A high R² does not necessarily imply a causal relationship.
4. R² may increase with the addition of irrelevant variables (overfitting).
Testing of Hypothesis
Hypothesis testing in linear regression, whether simple or
multiple, involves assessing the significance of the regression
coefficients and overall model fit.
The two primary hypotheses commonly tested are related to the
individual coefficients (slope parameters) and the overall
significance of the model.
Hypothesis Testing for Simple Linear Regression
1. Testing Individual Coefficients (β0 and β1)
Null Hypotheses
1. H0: β0 = 0 (the intercept is equal to zero)
2. H0: β1 = 0 (the slope is equal to zero)
Alternative Hypotheses
1. H1: β0 ≠ 0 (two-sided), or H1: β0 > 0 or H1: β0 < 0 (one-sided)
2. H1: β1 ≠ 0 (two-sided), or H1: β1 > 0 or H1: β1 < 0 (one-sided)
Test Statistic: The t-statistic is used for testing individual
coefficients.
Decision Rule: Reject the null hypothesis if the P-value is less than the chosen significance level (e.g., 0.05). Below is an example of hypothesis testing for simple linear regression.
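The sketch below uses Python's statsmodels on simulated data (the numbers are placeholders) to obtain the t-statistics and P-values for β0 and β1:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)   # true slope is 0.5

X = sm.add_constant(x)                     # intercept plus x
model = sm.OLS(y, X).fit()

print(model.tvalues)   # t-statistics for H0: β0 = 0 and H0: β1 = 0
print(model.pvalues)   # reject each H0 if its P-value < 0.05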
2. Testing Overall Model Significance (ANOVA)
Null Hypothesis
H0: β1 = β2 = ... = βk = 0 (none of the coefficients are significant)
Alternative Hypothesis
H1: at least one βi ≠ 0 (at least one coefficient is significant)
Test Statistic: F-statistic is used for testing overall model
significance.
Decision Rule: Reject the null hypothesis if the P-value is less
than the chosen significance level.
Practical Application Using Econometric Software
Estimating Simple Regression in EViews
Step 1: Open EViews on your computer.
Step 2: Click on File, then select New, and choose Workfile to
create a new file.
Step 3: Specify the frequency of the data for time series data or
select undated/irregular for cross-sectional data. Define the start
and end of your data set. EViews will open a new window
automatically containing a constant (c) and a residual (resid)
series.
Step 4: On the command line, type the following: genr x = 0 (press enter), then genr y = 0 (press enter).
This creates two new series named x and y with zeros for every observation. Open x and y as a group by selecting them and double-clicking with your mouse.
Step 5: Either type the data into EViews or copy/paste it from Excel. To edit the data or paste anything into EViews cells, press the Edit +/- button; after editing, press the button again to lock the data.
Step 6: Once the data is entered into EViews, estimate the regression line (to obtain alpha and beta) either by typing ls y c x (press enter) on the command line, or by clicking on Quick/Estimate Equation and writing your equation (i.e., y c x) in the new window. Note that EViews automatically chooses OLS (ordinary least squares) as the estimation method, and the sample is automatically set to the maximum possible.
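For readers without EViews, the same simple regression can be estimated in Python; this is a sketch with hypothetical x and y values standing in for the series typed into EViews:

import statsmodels.api as sm

x = [1, 2, 3, 4, 5, 6, 7, 8]                        # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]    # hypothetical data

X = sm.add_constant(x)          # the constant plays the role of c in ls y c x
results = sm.OLS(y, X).fit()    # OLS, the method EViews chooses by default
print(results.summary())        # alpha (const) and beta (x1) with t-stats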
Dummy Variables in the Linear Regression Model
Avoiding Multicollinearity
In regression analysis, including dummy variables for all categories without omitting one leads to perfect multicollinearity (the dummy variable trap). Omitting one category as the reference prevents perfect correlation among the dummy variables and the intercept.
Types of Dummy Variables
1. Binary Dummy Variables: This is the simplest type of dummy
variable and is used for a categorical variable with two categories
(binary). One category is chosen as the reference, and a single
dummy variable is created to represent the other category. The
dummy variable takes the value of 0 or 1, indicating the absence
or presence of the category.
2. Multicategory Dummy Variables: For categorical variables with
more than two categories, multiple dummy variables are created.
If a variable has k categories, k-1 dummy variables are typically
created, with one category chosen as the reference.
3. Interaction Dummy Variables: Interaction dummy variables are
used when there are potential interactions between two or more
categorical variables. These variables are created by taking the
product of the dummy variables representing the individual
categories.
Intercept Dummy Variables
1. Reference Category: One category of the categorical variable is
chosen as the reference or baseline category. This category is not
explicitly represented by a dummy variable.
2. Dummy Variables: For a categorical variable with k categories (including the reference category), k - 1 dummy variables are created. Each dummy variable represents one of the nonreference categories.
3. Intercept: The intercept in the regression equation represents
the expected value of the dependent variable when all predictor
variables (including dummy variables) are set to zero. In the
context of dummy variables, this means when the observation
belongs to the reference category.
For example, with a color variable taking the categories red, green, and blue, and red chosen as the reference:
Dependent variable = β0 + β1 × Green dummy + β2 × Blue dummy + ε
β0 represents the expected value when the color is red (the reference category).
β1 represents the change in the expected value when the color is green.
β2 represents the change in the expected value when the color is blue.
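A minimal Python sketch of this setup, with hypothetical color data and red kept as the reference category via pandas category ordering:

import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({                      # hypothetical data
    "color": ["red", "green", "blue", "red", "green", "blue", "red", "blue"],
    "y":     [5.0, 7.1, 6.2, 4.8, 7.3, 6.0, 5.2, 6.1],
})
df["color"] = pd.Categorical(df["color"], categories=["red", "green", "blue"])

# k - 1 = 2 dummies; drop_first=True omits the reference category (red)
dummies = pd.get_dummies(df["color"], drop_first=True, dtype=float)

X = sm.add_constant(dummies)
print(sm.OLS(df["y"], X).fit().params)   # const = β0, green = β1, blue = β2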
Slope Dummy Variables: These are used when the relationship between the independent variable (predictor) and the dependent variable (response) differs between the two groups.
Interaction Term: The slope dummy variable is often used in
interaction with the original independent variable. The interaction
term is the product of the slope dummy variable and the original
independent variable. This interaction term is added to the
regression model to allow for different slopes for the two groups.
Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + εi
where Yi is the dependent variable, Xi is the independent variable, Di is the slope dummy variable (1 for Group B, 0 for Group A), εi is the error term, β0 is the intercept for Group A, β1 is the slope for Group A, β2 is the difference in intercepts between Group B and Group A, and β3 is the difference in slopes between Group B and Group A.
The presence of the interaction term Xi × Di allows the slope of the
regression line to vary between the two groups.
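The following sketch estimates this slope-dummy model on simulated data (group labels and coefficients are placeholders):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
d = (rng.random(n) > 0.5).astype(float)   # Di: 1 for Group B, 0 for Group A
y = 1.0 + 0.5 * x + 0.7 * d + 0.9 * x * d + rng.normal(size=n)

# Columns match Yi = β0 + β1 Xi + β2 Di + β3 (Xi × Di) + εi
X = sm.add_constant(np.column_stack([x, d, x * d]))
print(sm.OLS(y, X).fit().params)          # β3 is the difference in slopes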
Use of Dummy Variables to Model Qualitative/Binary Variables and Structural Changes
1. Binary variables
Dummy Variable: Create a dummy variable (also known as an
indicator variable) that takes the value 1 for one category (e.g.,
female) and 0 for the other category (e.g., male).
Regression Model: Include this dummy variable in your regression
model.The coefficient associated with the dummy variable
indicates the average change in the dependent variable when
moving from one category to the other.
Dependent variable = β0 + β1 × X1 + β2 × Gender dummy + ε
β2 represents the average difference in the dependent variable between the two gender categories.
2. Structural changes
Example: Imagine you have data for a period before and after the
implementation of a new policy.
Dummy Variable: Create a dummy variable that takes the value 0
for the period before the policy change and 1 for the period after
the policy change.
Regression Model: Include this dummy variable in your regression
model to account for the structural change in the data.
Dependent variable = β0 + β1 x X1 + β2 x Policy dummy + ε
The coefficient β2 now captures the average change in the dependent variable associated with the policy change.
3. Interactions
You can create interaction terms by multiplying two (or more) dummy variables, for example, an interaction between the policy change dummy and a regional dummy.
Yi = β0 + β1 × X1 + β2 × Policy dummy + β3 × Regional dummy + β4 × (Policy dummy × Regional dummy) + ε
The interaction term (β4) captures how the effect of the policy
change differs across regions.
Other Functional Forms of Dummy Variables
1. Interaction effects
Dummy variables can be used to model interaction effects
between different categorical variables. For example, if you have
two categorical variables A and B, you can create dummy
variables for each category and an interaction term by multiplying
the two dummy variables.
Yi = β0 + β1 × Dummy_A + β2 × Dummy_B + β3 × (Dummy_A × Dummy_B) + εi
The interaction term (β3) captures the joint effect of categories A and B.
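A short sketch of a dummy-by-dummy interaction on simulated data (all names and values are placeholders):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
dummy_a = (rng.random(n) > 0.5).astype(float)   # membership in category A
dummy_b = (rng.random(n) > 0.5).astype(float)   # membership in category B
y = 2.0 + 0.6 * dummy_a + 0.4 * dummy_b + 1.2 * dummy_a * dummy_b + rng.normal(size=n)

X = sm.add_constant(np.column_stack([dummy_a, dummy_b, dummy_a * dummy_b]))
print(sm.OLS(y, X).fit().params)   # the last coefficient is β3, the joint effect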
2. Piecewise linear regression
Dummy variables can be used to model piecewise linear
relationships. This is helpful when you expect different slopes or
intercepts for different ranges of your independent variable.
Yi = β0 + β1 × X1 + β2 × Dummy_Indicator + β3 × (X1 × Dummy_Indicator) + ε
Here, Dummy_Indicator takes the value 1 for observations within a specific range and 0 otherwise.
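As an illustration, this sketch builds the range indicator and interaction term for a piecewise model; the threshold of 5 is an arbitrary placeholder:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, size=120))
indicator = (x > 5.0).astype(float)   # 1 within the upper range, 0 otherwise
y = 1.0 + 0.5 * x + 2.0 * indicator + 1.5 * x * indicator + rng.normal(size=120)

# Columns match Yi = β0 + β1 X1 + β2 Dummy_Indicator + β3 (X1 × Dummy_Indicator) + ε
X = sm.add_constant(np.column_stack([x, indicator, x * indicator]))
print(sm.OLS(y, X).fit().params)      # β3 is the change in slope past the threshold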
3. Polynomial regression
Dummy variables can be used to model polynomial relationships.
For instance, you might use dummy variables to represent
different polynomial degrees.
Yi = β0 + β1 × X + β2 × X² + β3 × Dummy_Indicator + β4 × (X × Dummy_Indicator) + β5 × (X² × Dummy_Indicator) + εi
Dummy_Indicator takes the value 1 for observations where the polynomial term is relevant and 0 otherwise.
4. Seasonal dummy variables
In time series analysis, dummy variables are often used to model seasonal effects. Each dummy variable represents a different season, and one season is omitted as the reference to avoid the dummy variable trap.
Yi = β0 + β1 × X + β2 × Dummy_Spring + β3 × Dummy_Summer + β4 × Dummy_Fall + εi
Here winter is the reference season. The coefficients associated with the seasonal dummy variables capture the average change in the dependent variable in each season relative to the reference season.
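A sketch of quarterly seasonal dummies in Python; the series here is a placeholder, and Q1 serves as the omitted reference season:

import pandas as pd
import statsmodels.api as sm

idx = pd.period_range("2020Q1", periods=16, freq="Q")
df = pd.DataFrame({"y": [float(t) for t in range(16)]}, index=idx)  # placeholder series
df["quarter"] = idx.quarter

# drop_first=True omits Q1, avoiding the dummy variable trap
dummies = pd.get_dummies(df["quarter"], prefix="q", drop_first=True, dtype=float)

X = sm.add_constant(dummies)
print(sm.OLS(df["y"], X).fit().params)   # q_2..q_4 are measured relative to Q1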
Qualitative Response Regression Models
Qualitative response regression models, also known as binary choice models or binary response models, are used when the dependent variable is categorical and takes on only two possible outcomes. The most common example is a binary outcome, such as “success” or “failure,” “yes” or “no,” or “1” or “0.”
1. Logistic regression
Logistic regression is a statistical method used for predicting the
probability of a binary outcome. It is commonly used when the
dependent variable is categorical and represents two classes,
such as 0 or 1, yes or no, true or false. Logistic regression models
the probability that an instance belongs to a particular category.
The logistic function (sigmoid function) is at the core of logistic regression, ensuring that the predicted probabilities lie between 0 and 1.
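To close the section, here is a minimal logistic regression sketch on simulated data (the coefficients are placeholders), showing that the fitted probabilities stay between 0 and 1:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))   # logistic (sigmoid) function
y = rng.binomial(1, p)                   # binary outcome (0 or 1)

X = sm.add_constant(x)
result = sm.Logit(y, X).fit(disp=0)      # maximum likelihood estimation
print(result.params)                     # estimated intercept and slope
print(result.predict(X)[:5])             # predicted probabilities in (0, 1)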