Chapter 4: Multiple Regression Analysis

4.1. Model with Two Explanatory Variables


Although a multiple regression equation can be fitted for any number of explanatory variables, the three-variable regression model is presented first for the sake of simplicity. It is characterized by one dependent variable (Y) and two explanatory variables (X1 and X2). The model is given by:

Yi = β0 + β1X1i + β2X2i + ui .................... (4.1)

where:
β0 = the intercept = the value of Y when both X1 and X2 are zero
β1 = the change in Y when X1 changes by one unit, keeping the effect of X2 constant
β2 = the change in Y when X2 changes by one unit, keeping the effect of X1 constant

4.2. Concept and Notations of Multiple Regression Models


4.2.1. Concept of Multiple Regression Models
The simple linear regression model (also called the two-variable model) was discussed extensively in the previous chapter. Such models assume that the dependent variable is influenced by only one explanatory variable. In reality, however, economic variables are influenced by several factors, so simple regression models are often unrealistic; their main virtue is that they are easy to understand. Demand and supply are good examples of this argument, since each has several determinants.
Adding more variables to the simple linear regression model leads us to multiple regression models, i.e. models in which the dependent variable (or regressand) depends on two or more explanatory variables, or regressors.
• Multiple regression is a statistical technique that can be used to analyze the relationship between a single dependent variable and several independent variables.
• The objective of multiple regression analysis is to use the independent variables whose values are known to predict the value of the single dependent variable.
The multiple linear regression (population regression function) with one dependent variable Y and k explanatory variables X1, X2, ..., Xk is given by:

Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui .................... (4.2)

where:
β0 = the intercept = the value of Yi when all X's are zero
βi = the partial slope coefficients
ui = the random (disturbance) term

In this model, for example, β1 is the amount of change in Yi when X1 changes by one unit, keeping the effect of the other variables constant. Similarly, β2 is the amount of change in Yi when X2 changes by one unit, keeping the effect of the other variables constant. The other slopes are also interpreted in the same way.

4.2.2. Assumptions of Multiple Regression Model


The assumptions of multiple linear regression model are the same as in the single explanatory variable model
developed earlier except the assumption of no perfect multicollinearity. These assumptions are:
Assumption 1: Linear regression model, or linear in the parameters.
Assumption 2: Zero mean of ui - the random variable ui has a zero mean for each value of Xi, i.e. E(ui) = 0

Assumption 3: Homoscedasticity of the random term - the random term ui has constant variance. In other words, the variance of each ui is the same for all the Xi values:

E(ui²) = σu²  (a constant)

Assumption 4: Normality of ui - the values of each ui are normally distributed: ui ~ N(0, σu²)


Assumption 5: No autocorrelation or serial independence of the random terms - the successive values of the
random terms are not strongly correlated. The values of u i are independent of the values of any other u j .

Assumption 6: Independence of ui and Xi - every disturbance term ui is independent of the explanatory variables:

Cov(ui, X1i) = Cov(ui, X2i) = 0


Assumption 7: The number of observations must be greater than the number of parameters to be estimated.
Assumption 8: No perfect multicollinearity among the X ' s - the explanatory variables are not perfectly linearly
correlated.
4.3. Estimation of Partial Regression Coefficients
In order to understand the nature of multiple regression model easily, we start our analysis with the case of two
explanatory variables, then extend this to the case of k-explanatory variables.

The process of estimating the parameters in the multiple regression model is similar with that of the simple
linear regression model. The main task is to derive the normal equations using the same procedure as the case of
simple regression. Like in the simple linear regression model case, OLS method can be used to estimate partial
regression coefficients of multiple regression models. The OLS procedure consists in so choosing the values of
the unknown parameters that the residual sum of squares is as small as possible.

The model:

Yi = β0 + β1X1i + β2X2i + ui .................... (4.3)

is a multiple regression with two explanatory variables. The expected value of the above model is called the population regression equation, i.e.

E(Yi) = β0 + β1X1i + β2X2i, since E(ui) = 0 .................... (4.4)

where β0, β1 and β2 are the population parameters; β0 is referred to as the intercept, and β1 and β2 are also sometimes known as the slopes of the regression. Note that, for example, β2 measures the effect on E(Yi) of a unit change in X2 when X1 is held constant.
Since the population regression equation is unknown to any investigator, it has to be estimated from sample data. Let us suppose that sample data have been used to estimate the population regression equation. The estimated sample regression equation is written as:

Ŷi = β̂0 + β̂1X1i + β̂2X2i .................... (4.5)

where β̂0, β̂1 and β̂2 are estimates of β0, β1 and β2, and Ŷi is the predicted value of Y.
Now it is time to state how (4.3) is estimated. Given sample observations on Y, X1 and X2, we estimate (4.3) using the method of least squares (OLS). The sample relation between Y, X1 and X2 is

Yi = β̂0 + β̂1X1i + β̂2X2i + ei .................... (4.6)

so the residual is

ei = Yi − β̂0 − β̂1X1i − β̂2X2i .................... (4.7)

Squaring both sides and taking the summation gives

Σei² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²

To obtain expressions for the least squares estimators, we partially differentiate Σei² with respect to β̂0, β̂1 and β̂2 and set the partial derivatives equal to zero:

∂Σei²/∂β̂0 = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 .................... (4.8)
∂Σei²/∂β̂1 = −2ΣX1i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 .................... (4.9)
∂Σei²/∂β̂2 = −2ΣX2i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 .................... (4.10)

Summing from 1 to n, the multiple regression equation produces three normal equations:

ΣY = nβ̂0 + β̂1ΣX1 + β̂2ΣX2 .................... (4.11)
ΣX1Y = β̂0ΣX1 + β̂1ΣX1² + β̂2ΣX1X2 .................... (4.12)
ΣX2Y = β̂0ΣX2 + β̂1ΣX1X2 + β̂2ΣX2² .................... (4.13)

From (4.11) we obtain

β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 .................... (4.14)

Substituting (4.14) into (4.12), we get:

ΣX1Y = (Ȳ − β̂1X̄1 − β̂2X̄2)ΣX1 + β̂1ΣX1² + β̂2ΣX1X2
ΣX1Y − ȲΣX1 = β̂1(ΣX1² − X̄1ΣX1) + β̂2(ΣX1X2 − X̄2ΣX1) .................... (4.15)

We know that, in deviation form (lower-case letters denote deviations from the mean),

Σxy = ΣXY − nX̄Ȳ  and  Σx² = ΣX² − (ΣX)²/n

Substituting these into (4.15), the normal equation (4.12) can be written in deviation form as follows:

Σx1y = β̂1Σx1² + β̂2Σx1x2 .................... (4.16)

Using the same procedure, if we substitute (4.14) into (4.13), we get

Σx2y = β̂1Σx1x2 + β̂2Σx2² .................... (4.17)

Let's bring (4.16) and (4.17) together:

Σx1y = β̂1Σx1² + β̂2Σx1x2 .................... (4.18)
Σx2y = β̂1Σx1x2 + β̂2Σx2² .................... (4.19)

β̂1 and β̂2 can easily be solved using matrices. We can rewrite the above two equations in matrix form as follows:

[ Σx1²    Σx1x2 ] [ β̂1 ]   [ Σx1y ]
[ Σx1x2   Σx2²  ] [ β̂2 ] = [ Σx2y ] .................... (4.20)

If we use Cramer's rule to solve the above system we obtain

β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²) .................... (4.21)
β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²) .................... (4.22)
Example: suppose the above formulas are applied to data on salary (Y, measured in thousands of dollars), years of education (X1) and years of experience (X2), and give β̂0 = −23.75, β̂1 = −0.25 and β̂2 = 5.5. The fitted regression line is then:

Ŷ = −23.75 − 0.25X1 + 5.5X2

Interpretation:
• One more year of experience, after controlling for years of education, results in a $5,500 rise in salary, on average.
• Equivalently, if we consider two persons (A and B) with the same level of education, the one with one more year of experience (A) is expected to earn a salary $5,500 higher than that of B.
• Similarly, for two people (C and D) with the same level of experience, the one with one more year of education (D) is expected to earn a salary $250 lower than that of C.
• Experience looks far more important than education (which has a negative sign).
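To make the mechanics of equations (4.14) and (4.18)-(4.22) concrete, the following short Python sketch (an added illustration, not part of the original example; the data arrays are hypothetical) solves the two normal equations in deviation form:

```python
import numpy as np

# Hypothetical sample data; any numeric arrays of equal length will do.
y  = np.array([12.0, 15.0, 14.0, 19.0, 21.0, 25.0])
x1 = np.array([ 2.0,  3.0,  4.0,  5.0,  6.0,  8.0])
x2 = np.array([ 1.0,  1.0,  2.0,  2.0,  3.0,  3.0])

# Deviations from the sample means (the lower-case x's and y in the text).
yd, x1d, x2d = y - y.mean(), x1 - x1.mean(), x2 - x2.mean()

# Matrix form (4.20): [[Sx1x1, Sx1x2], [Sx1x2, Sx2x2]] [b1, b2]' = [Sx1y, Sx2y]'
A = np.array([[x1d @ x1d, x1d @ x2d],
              [x1d @ x2d, x2d @ x2d]])
b = np.array([x1d @ yd, x2d @ yd])

b1, b2 = np.linalg.solve(A, b)                     # equivalent to Cramer's rule (4.21)-(4.22)
b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()    # equation (4.14)
print(b0, b1, b2)
```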
4.4. Variance and Standard Errors of OLS Estimators
When the data come from a sample, estimating the numerical values of the parameters is not enough; we also need to assess their precision, or statistical reliability. The standard errors are important for two main purposes: to establish confidence intervals for the parameters and to test statistical hypotheses.
As in the case of simple linear regression, the standard errors of the coefficients are vital for statistical inference about the coefficients. The standard error of a coefficient is the positive square root of the variance of the coefficient, so we start by defining the variances of the coefficients.

Variance of the intercept β̂0:

Var(β̂0) = σ̂u² [ 1/n + (X̄1²Σx2² + X̄2²Σx1² − 2X̄1X̄2Σx1x2) / (Σx1²Σx2² − (Σx1x2)²) ] .................... (4.23)

Variance of β̂1:

Var(β̂1) = σ̂u² Σx2² / (Σx1²Σx2² − (Σx1x2)²) .................... (4.24)

Variance of β̂2:

Var(β̂2) = σ̂u² Σx1² / (Σx1²Σx2² − (Σx1x2)²) .................... (4.25)

where

σ̂u² = Σei² / (n − 3) .................... (4.26)

Equation (4.26) gives the estimate of the variance of the random term. Then, the standard errors are computed as follows:

Standard error of β̂0:  SE(β̂0) = √Var(β̂0) .................... (4.27)
Standard error of β̂1:  SE(β̂1) = √Var(β̂1) .................... (4.28)
Standard error of β̂2:  SE(β̂2) = √Var(β̂2) .................... (4.29)
Note: The OLS estimators of the multiple regression model have properties which are parallel to those of the
two-variable model/simple linear regression model/.
4.5. Coefficient of Determination (R²) and Adjusted Coefficient of Determination (Adjusted R²)
The coefficient of determination (R2): measures the proportion of variation in the dependent variable Y that is
explained by the explanatory variables (or by the multiple linear regression models). As in the case of simple
linear regression, R 2 is the ratio of the explained variation to the total variation. Mathematically:
R² = ESS/TSS = Σŷ²/Σy² = 1 − Σe²/Σy² .................... (4.30)

Alternatively, R² can also be given in terms of the slope coefficients β̂1 and β̂2 as:

R² = (β̂1Σx1y + β̂2Σx2y) / Σy² .................... (4.31)

In multiple linear regression, however, every time we insert an additional explanatory variable into the model, R² increases irrespective of whether the goodness of fit of the model improves. Thus, we adjust R² as follows:

Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k) .................... (4.32)

where n is the number of observations and k is the number of parameters estimated (including the intercept).
R² (R-squared) and Adjusted R²

Similarity
Both R² (R-squared) and Adjusted R² are measures used in regression analysis to evaluate how well the
independent variables explain the variation in the dependent variable.
Differences
1. R² (R-squared): Represents the proportion of variance in the dependent variable that is predictable from the
independent variables.
• Range: 0 to 1, where 0 indicates no explanatory power and 1 indicates perfect explanatory power.
• Sensitivity: increases with the addition of more independent variables, even if they are not significant.
• R²: useful for a general understanding of the model's explanatory power.
2. Adjusted R²: adjusts R² for the number of predictors in the model. It accounts for the number of independent variables and only increases if a new variable improves the model more than would be expected by chance.
• Range: usually slightly lower than R², especially when the number of predictors is high.
• The adjusted R², however, can sometimes be negative when the goodness of fit is poor. When the adjusted value is negative, we treat it as zero and interpret it as indicating that none of the variation in the dependent variable is explained by the regressors.
• Sensitivity: provides a more accurate measure by penalizing the inclusion of non-significant variables; it adjusts for the number of predictors to prevent overfitting.
• Adjusted R²: more useful when comparing models with different numbers of predictors, as it accounts for overfitting.
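As an added illustration of equations (4.30) and (4.32) (a minimal sketch; the function names and the toy data are my own, not from the text), R² and adjusted R² can be computed as follows:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS, equation (4.30)."""
    rss = np.sum((y - y_hat) ** 2)           # residual sum of squares
    tss = np.sum((y - np.mean(y)) ** 2)      # total sum of squares
    return 1.0 - rss / tss

def adjusted_r_squared(y, y_hat, k):
    """Adjusted R^2, equation (4.32); k = number of estimated parameters."""
    n = len(y)
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (n - 1) / (n - k)

# Hypothetical example: 12 observations, 3 estimated parameters (intercept + 2 slopes)
y     = np.array([57, 43, 73, 37, 64, 48, 56, 50, 39, 43, 69, 60], dtype=float)
y_hat = y.mean() + 0.3 * (y - y.mean())      # a deliberately poor hypothetical fit
print(r_squared(y, y_hat), adjusted_r_squared(y, y_hat, k=3))
```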

4.6. Hypothesis Testing


In multiple regression models we undertake two tests of significance. The first is the significance of the individual parameters of the model; this test is the same as the tests discussed for the simple regression model. The second is the overall significance of the model.
4.6.1. Tests of individual significance
The tests concerning the individual coefficients can be done using the standard error test or the t-test. In all cases the hypothesis is stated, for each coefficient, as:

H0: βi = 0
H1: βi ≠ 0
A. Standard Error Test: Using the standard error test we can test the above hypothesis. Thus the decision rule
is based on the relationship between the numerical value of the parameter estimate and the standard error of the
same.
i. If SE(β̂i) > ½|β̂i|, we accept the null hypothesis, i.e. the estimate of βi is not statistically significant.

Conclusion: the coefficient β̂i is not statistically significant; in other words, the variable does not have a significant influence on the dependent variable.

ii. If SE(β̂i) < ½|β̂i|, we reject H0 in favour of the alternative hypothesis, meaning that the estimate of βi has a significant influence on the dependent variable.

Generalization: The smaller the standard error, the stronger is the evidence that the estimates are statistically
significant.
B. t-test
The more appropriate and formal way to test the above hypothesis is to use the t-test. As usual, we compute the t-ratios and compare them with the tabulated t-values to make a decision:

tcal = β̂i / SE(β̂i)

Decision rule: if |tcal| < ttab, accept the null hypothesis (H0);
if |tcal| > ttab, reject the null hypothesis. Rejecting H0 means that the coefficient being tested is significantly different from 0, i.e. the regressor Xi significantly affects the dependent variable Y.

4.6.2. Testing the Overall Significance of Regression Model


Testing the overall significance of the model means testing the null hypothesis that none of the explanatory variables in the model significantly determines the changes in the dependent variable. This can be stated as:

H0: β1 = β2 = 0
H1: βi ≠ 0, at least for one i.

The test statistic for this test is given by:

Fcal = [Σŷ²/(k − 1)] / [Σe²/(n − k)]

where k is the number of parameters estimated in the model (including the intercept) and n is the number of observations.
Recall, TSS = ESS + RSS, which decomposed the total sum of squares (TSS) into two components: explained
sum of squares (ESS) and residual sum of squares (RSS). A study of these components of TSS is known as the
analysis of variance (ANOVA) from the regression viewpoint.
The results of the overall significance test of a model are summarized in the analysis of variance (ANOVA)
table as follows.

Source of variation | Sum of squares | Degrees of freedom | Mean sum of squares | Fcal
Regression          | SSR = Σŷ²      | k − 1              | MSR = Σŷ²/(k − 1)   | F = MSR/MSE
Residual            | SSE = Σe²      | n − k              | MSE = Σe²/(n − k)   |
Total               | TSS = Σy²      | n − 1              |                     |

Decision rule
Reject H0 if Fcal > Fα(k − 1, n − k), where Fα(k − 1, n − k) is the critical value read from the F-distribution table at the given significance level.

Note
In practice, the most commonly used test of significance is the p-value (probability value), also known as the observed or exact level of significance, or the exact probability of committing a Type I error. More technically, the p-value is defined as the lowest significance level at which a null hypothesis can be rejected. Put differently, rather than fixing α arbitrarily at some level, one can simply report the p-value of the test statistic. Most regression software reports p-values for both the overall and the individual significance tests. If the p-value is less than or equal to 0.05 (α = 5%, i.e. a 95% confidence level; 90% is also sometimes used in the social sciences), reject the null hypothesis and conclude that the coefficient (or the model) is significant. If the p-value is greater than 0.05, do not reject the null hypothesis and conclude that it is insignificant.
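For reference, the added sketch below shows how the t- and F-statistics discussed above map to p-values in Python using scipy; the numerical inputs are hypothetical placeholders.

```python
from scipy import stats

n, k = 12, 3                      # hypothetical: 12 observations, 3 estimated parameters

# Individual significance: two-sided p-value for a t-ratio t_cal = beta_hat / SE
t_cal = 0.449
p_t = 2 * (1 - stats.t.cdf(abs(t_cal), df=n - k))
print("t-test p-value:", round(p_t, 4))

# Overall significance: p-value for F_cal with (k-1, n-k) degrees of freedom
f_cal = 0.434
p_f = 1 - stats.f.cdf(f_cal, dfn=k - 1, dfd=n - k)
print("F-test p-value:", round(p_f, 4))
```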
4.7. Other Functional Forms
If a model is non-linear in the parameters, it is a non-linear regression model, whether or not its variables enter linearly.

Some models may look nonlinear in the parameters but are inherently, or intrinsically, linear because a suitable transformation makes them linear-in-the-parameters regression models. A nonlinear regression model (NLRM) proper is one that cannot be linearized in the parameters by any suitable transformation.

Example: the Cobb–Douglas (C–D) production function
Consider the famous Cobb–Douglas production function

Yi = β0 X1i^β1 X2i^β2 e^(ui)

Taking natural logarithms of both sides, it can be represented as

ln Yi = α + β1 ln X1i + β2 ln X2i + ui,  where α = ln β0.

Thus, in this form the C–D function is intrinsically linear (linear in the parameters).
Generally, the linear regression model can take the following common functional forms (among others):

a) Linear-linear form: based on the assumption that the slope of the relationship between the independent variable and the dependent variable is constant. Such models are expressed as: Yi = β0 + β1Xi + ui
b) Log-log form: both the dependent variable and the independent variable of interest are in log form, so the coefficient is an elasticity. Such models are expressed as: ln Yi = β0 + β1 ln Xi + ui
c) Linear-log form: a variant of the linear equation in which the independent variable is expressed in logs. Such models are expressed as: Yi = β0 + β1 ln Xi + ui
d) Log-linear form: a variant in which the dependent variable is expressed in logs. Such models are expressed as: ln Yi = β0 + β1Xi + ui

The coefficients are generally interpreted as in the following table.

Model         | If X increases by | Then Y will increase by
Linear-Linear | 1 unit            | β1 units
Linear-Log    | 1%                | β1/100 units
Log-Linear    | 1 unit            | 100·β1 %
Log-Log       | 1%                | β1 %

Summary of functional forms

Model         | Form                       | Slope (dY/dX)
Linear-Linear | Yi = β0 + β1Xi + ui        | β1
Linear-Log    | Yi = β0 + β1 ln Xi + ui    | β1/Xi
Log-Linear    | ln Yi = β0 + β1Xi + ui     | β1·Yi
Log-Log       | ln Yi = β0 + β1 ln Xi + ui | β1·(Yi/Xi)
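As an added sketch of the log-log form (hypothetical data; the elasticity interpretation follows the table above), the slope of a regression of ln Y on ln X can be read directly as an elasticity:

```python
import numpy as np

# Hypothetical data: quantity demanded (q) and price (p)
p = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 5.0])
q = np.array([95.0, 80.0, 72.0, 63.0, 58.0, 47.0])

# Log-log form: ln q = b0 + b1 ln p + u, so b1 is the price elasticity of demand
b1, b0 = np.polyfit(np.log(p), np.log(q), deg=1)
print("estimated elasticity:", round(b1, 3))   # a 1% price rise changes q by about b1 %
```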

Illustration: The following table shows, for a particular country, the value of imports (Y), the level of Gross National Product (X1) measured in arbitrary units, and the price index of imported goods (X2), over a 12-year period.
Table: Data for multiple regression examples
Year 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971
Y 57 43 73 37 64 48 56 50 39 43 69 60
X1 220 215 250 241 305 258 354 321 370 375 385 385
X2 125 147 118 160 128 149 145 150 140 115 155 152
a) Estimate the coefficients of the economic relationship and fit the model.
To estimate the coefficients of the economic relationship, we compute the entries given in following Table

Table: Computations of the summary statistics for the coefficients
Year  Y  X1  X2  x1  x2  y  x1²  x2²  x1y  x2y  x1x2  y²
1960 57 220 125 -86.5833 -15.3333 3.75 7496.668 235.1101 -324.687 -57.4999 1327.608 14.0625
1961 43 215 147 -91.5833 6.6667 -10.25 8387.501 44.44489 938.7288 -68.3337 -610.558 105.0625
1962 73 250 118 -56.5833 -22.3333 19.75 3201.67 498.7763 -1117.52 -441.083 1263.692 390.0625
1963 37 241 160 -65.5833 19.6667 -16.25 4301.169 386.7791 1065.729 -319.584 -1289.81 264.0625
1964 64 305 128 -1.5833 -12.3333 10.75 2.506839 152.1103 -17.0205 -132.583 19.52731 115.5625
1965 48 258 149 -48.5833 8.6667 -5.25 2360.337 75.11169 255.0623 -45.5002 -421.057 27.5625
1966 56 354 145 47.4167 4.6667 2.75 2248.343 21.77809 130.3959 12.83343 221.2795 7.5625
1967 50 321 150 14.4167 9.6667 -3.25 207.8412 93.44509 -46.8543 -31.4168 139.3619 10.5625
1968 39 370 140 63.4167 -0.3333 -14.25 4021.678 0.111089 -903.688 4.749525 -21.1368 203.0625
1969 43 375 115 68.4167 -25.3333 -10.25 4680.845 641.7761 -701.271 259.6663 -1733.22 105.0625
1970 69 385 155 78.4167 14.6667 15.75 6149.179 215.1121 1235.063 231.0005 1150.114 248.0625
1971 60 385 152 78.4167 11.6667 6.75 6149.179 136.1119 529.3127 78.75022 914.8641 45.5625
Sum 639 3679 1684 0.0004 0.0004 0 49206.92 2500.667 1043.25 -509 960.6667 1536.25
Mean 53.25 306.5833 140.3333 0 0 0
From the above table, we can take the following summary results:

ΣY = 639,  ΣX1 = 3679,  ΣX2 = 1684,  n = 12

Ȳ = ΣY/n = 639/12 = 53.25,  X̄1 = ΣX1/n = 3679/12 = 306.5833,  X̄2 = ΣX2/n = 1684/12 = 140.3333

The summary results in deviation form are then given by:

Σx1² = 49206.92     Σx2² = 2500.667
Σx1y = 1043.25      Σx2y = −509
Σx1x2 = 960.6667    Σy² = 1536.25

The coefficients are then obtained as follows.

β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
   = [(1043.25)(2500.667) − (−509)(960.6667)] / [(49206.92)(2500.667) − (960.6667)²]
   = (2608821 + 488979.4) / (123050121 − 922880.5)
   = 3097800.4 / 122127240.5 = 0.025365

β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
   = [(−509)(49206.92) − (1043.25)(960.6667)] / [(49206.92)(2500.667) − (960.6667)²]
   = (−25046322 − 1002216) / 122127240.5
   = −26048538 / 122127240.5 = −0.21329

β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 = 53.25 − (0.025365)(306.5833) − (−0.21329)(140.3333) = 75.40512

The fitted model is then written as:  Ŷi = 75.40512 + 0.025365X1 − 0.21329X2
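The short Python sketch below (an added illustration) reproduces these hand calculations from the raw data of the illustration, using the deviation-form formulas (4.21), (4.22) and (4.14):

```python
import numpy as np

# Import data from the illustration (Y = imports, X1 = GNP, X2 = import price index)
y  = np.array([57, 43, 73, 37, 64, 48, 56, 50, 39, 43, 69, 60], dtype=float)
x1 = np.array([220, 215, 250, 241, 305, 258, 354, 321, 370, 375, 385, 385], dtype=float)
x2 = np.array([125, 147, 118, 160, 128, 149, 145, 150, 140, 115, 155, 152], dtype=float)

yd, x1d, x2d = y - y.mean(), x1 - x1.mean(), x2 - x2.mean()   # deviation form

den = (x1d @ x1d) * (x2d @ x2d) - (x1d @ x2d) ** 2
b1 = ((x1d @ yd) * (x2d @ x2d) - (x2d @ yd) * (x1d @ x2d)) / den   # (4.21)
b2 = ((x2d @ yd) * (x1d @ x1d) - (x1d @ yd) * (x1d @ x2d)) / den   # (4.22)
b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()                    # (4.14)

print(round(b0, 5), round(b1, 6), round(b2, 5))   # approx. 75.40512, 0.025365, -0.21329
```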
b) Compute the variance and standard errors of the slopes.
First, you need to compute the estimate of the variance of the random term:

σ̂u² = Σei² / (n − 3) = 1401.223 / (12 − 3) = 155.69143

Variance of β̂1:

Var(β̂1) = σ̂u² Σx2² / (Σx1²Σx2² − (Σx1x2)²) = 155.69143 × (2500.667 / 122127241) = 0.003188

Standard error of β̂1:

SE(β̂1) = √Var(β̂1) = √0.003188 = 0.056462

Variance of β̂2:

Var(β̂2) = σ̂u² Σx1² / (Σx1²Σx2² − (Σx1x2)²) = 155.69143 × (49206.92 / 122127241) = 0.0627

Standard error of β̂2:

SE(β̂2) = √Var(β̂2) = √0.0627 = 0.25046
Similarly, the standard error of the intercept is found to be 37.98177. The detail is left for you as an
exercise.
c) Calculate and interpret the coefficient of determination.
We can use the following summary results to obtain R²:

Σŷ² = 135.0262,  Σe² = 1401.223,  Σy² = 1536.25 (the sum of the above two). Then,

R² = (β̂1Σx1y + β̂2Σx2y) / Σy² = [(0.025365)(1043.25) + (−0.21329)(−509)] / 1536.25 = 0.087894

or  R² = 1 − Σe²/Σy² = 1 − 1401.223/1536.25 = 0.087894

Interpretation: only about 8.8% of the variation in imports (Y) is explained by the two explanatory variables X1 and X2.

d) Compute the adjusted R2.


Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k) = 1 − (1 − 0.087894)(12 − 1)/(12 − 3) = −0.114796
e) Construct 95% confidence interval for the true population parameters (partial regression
coefficients).[Exercise: Base your work on Simple Linear Regression]

f) Test the significance of X1 and X2 in determining the changes in Y using t-test.


The hypotheses are summarized in the following table.
Coefficient | Hypothesis              | Estimate | Std. error | Calculated t                         | Conclusion
β1          | H0: β1 = 0; H1: β1 ≠ 0  | 0.025365 | 0.056462   | tcal = 0.025365/0.056462 = 0.449249  | Do not reject H0 since tcal < ttab
β2          | H0: β2 = 0; H1: β2 ≠ 0  | −0.21329 | 0.25046    | tcal = −0.21329/0.25046 = −0.85159   | Do not reject H0 since |tcal| < ttab
The critical value (t0.05, 9 df) to be used here is 2.262. Like the standard error test, the t-test reveals that both X1 and X2 are insignificant in determining the change in Y, since the calculated t values are both less than the critical value in absolute terms.

Exercise: Test the significance of X1 and X2 in determining the changes in Y using the standard
error test.
g) Test the overall significance of the model. (Hint: use α = 0.05.)
This involves testing whether at least one of the two variables X1 and X2 determines the changes in Y. The hypothesis to be tested is given by:
H0: β1 = β2 = 0
H1: βi ≠ 0, at least for one i.
The ANOVA table for the test is given as follows:

Source of variation | Sum of squares        | Degrees of freedom  | Mean sum of squares          | Fcal
Regression          | SSR = Σŷ² = 135.0262  | k − 1 = 3 − 1 = 2   | MSR = 135.0262/2 = 67.51309  | F = MSR/MSE = 0.433634
Residual            | SSE = Σe² = 1401.223  | n − k = 12 − 3 = 9  | MSE = 1401.223/9 = 155.69143 |
Total               | TSS = Σy² = 1536.25   | n − 1 = 12 − 1 = 11 |                              |

The tabulated F value (critical value) is F0.05(2, 9) = 4.26. The calculated F value (0.4336) is less than the tabulated value, so we do not reject the null hypothesis and conclude that X1 and X2 make no significant contribution to the changes in Y.

h) Compute the F value using the R2.


Fcal = [(n − k)/(k − 1)] × [R²/(1 − R²)] = [(12 − 3)/(3 − 1)] × [0.087894/(1 − 0.087894)] = 0.433632

4.8. Dummy Variable Regression Models

There are four basic types of variables we generally encounter in empirical analysis: nominal, ordinal, interval and ratio scale variables. In the preceding sections, we have dealt with ratio scale variables. However, regression models do not deal only with ratio scale variables; they can also involve nominal and ordinal scale variables. In regression analysis, the dependent variable can be influenced by nominal variables such as sex, race, colour, geographical region, etc. Models in which all regressors are nominal (categorical) variables are called ANOVA (Analysis of Variance) models. If there is a mixture of nominal and ratio scale variables, the models are called ANCOVA (Analysis of Covariance) models.

Illustration: The following model represents the relationship between geographical location and teachers' average salary in public schools. The data were taken from 50 states for a single year. The 50 states were classified into three regions: Northeast, South and West. The regression model looks like the following:

Yi = β0 + β1D1i + β2D2i + ui

where Yi = the (average) salary of public school teachers in state i
D1i = 1 if the state is in the Northeast, 0 otherwise (i.e. in other regions of the country)
D2i = 1 if the state is in the South, 0 otherwise (i.e. in other regions of the country)
Note that the above regression model is like any multiple regression model considered previously,
except that instead of quantitative regressors, we have only qualitative (dummy) regressors.
Dummy regressors take value of 1 if the observation belongs to that particular category and 0
otherwise.
Note also that there are three categories (regions) for which we have created only two dummy variables (D1 and D2). One of the rules in dummy variable regression is that if there are m categories, we need only m − 1 dummy variables. If we suppress the intercept, we can have m dummies, but the interpretation will be a bit different.
The intercept represents the mean value of the benchmark category, i.e. the category for which we do not assign a dummy (in our case, West is the benchmark category). The coefficients of the dummy variables are called differential intercept coefficients because they tell us by how much the intercept of the category that receives the value of 1 differs from the intercept coefficient of the benchmark category.
Ŷi = 26,158.62 − 1734.47D1i − 3264.62D2i
se      = (1128.52)   (1435.95)   (1499.62)
t       = (23.18)     (−1.21)     (−2.18)
p-value = (0.000)     (0.233)     (0.0349)        R² = 0.0901

From the above fitted model, we can see that the mean salary of public-school teachers in the West is about $26,158.62. The mean salary of teachers in the Northeast is lower by $1,734.47 than that of the West, and that of teachers in the South is lower by $3,264.62. The average salaries in the latter two regions are therefore about $24,424 and $22,894, respectively.
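As an added sketch of how such a dummy-variable (ANOVA) model is estimated in practice (the data below are hypothetical, not the 50-state data referred to in the text):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical salaries (in $1000s) and regions for a handful of states
salary = np.array([24.5, 23.9, 22.8, 22.1, 26.3, 27.0, 25.8, 23.2], dtype=float)
region = np.array(["NE", "NE", "S", "S", "W", "W", "W", "S"])

# m = 3 categories, so create m - 1 = 2 dummies; "W" (West) is the benchmark
D1 = (region == "NE").astype(float)   # 1 if Northeast, 0 otherwise
D2 = (region == "S").astype(float)    # 1 if South, 0 otherwise

X = sm.add_constant(np.column_stack([D1, D2]))
res = sm.OLS(salary, X).fit()
print(res.params)   # intercept = mean of West; slopes = differential intercept coefficients
```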

Exercises
1. The following results were obtained from a sample of 12 agribusiness firms on their output (Y),
labour input (X1) and capital input (X2) measured in arbitrary units. (Hint: use 5% level of
significance).

ΣY = 753      ΣY² = 48,139     ΣYX1 = 40,830
ΣX1 = 643     ΣX1² = 34,843    ΣYX2 = 6,796
ΣX2 = 106     ΣX2² = 976       ΣX1X2 = 5,779

a) Estimate and interpret the regression coefficients.


b) Compute the average and marginal productivity of labor and capital in the firms.
c) Compute the standard errors of the estimates.
d) Compute and interpret the coefficient of multiple determination.
e) Calculate the adjusted R2.
f) Test significance of individual coefficients.
g) Test the overall significance of the model.
h) Identify the type of economies of scale (returns to scale) for the firm and advise.
2. The following is a log-transformed Cobb-Douglas Demand function for Potatoes in Ethiopia.
ln Yi = β0 + β1 ln X1i + β2 ln X2i + β3 ln X3i + β4 ln X4i + εi

where Y is per capita consumption of potato in Birr, X1 is real disposable per capita income in Birr, X2 is the retail price of potato per kg, X3 is the retail price of cabbage per kg, X4 is the retail price of cauliflower per kg, and ε is a random or error term.
a) How will you interpret the coefficients?
b) How will you state the following hypotheses?
a. Own price elasticity of demand is negative as predicted by economic theory?
b. Potato and Cabbage are unrelated commodities?
c. Potato and Cauliflower are substitutes?

Chapter 5: Econometric Problems

As discussed in Chapters 3 and 4, the estimates derived using the OLS technique, and the inferences based on those estimates, are valid only under certain conditions. These conditions amount to the regression model being "well-specified". A regression model is statistically well-specified for an estimator (say, OLS) if all the assumptions needed for optimality of the estimator are satisfied.

Econometric problems arise when the assumptions of the classical linear regression model (the OLS assumptions) are violated.
The common econometric problems are:
1. Small samples
2. Multicollinearity
3. Non-normal error terms: the assumption ui ~ N(0, σ²) fails
4. Heteroskedasticity: the variance of ui is not constant across observations
5. Autocorrelation: the error terms are correlated across observations
6. Endogeneity:
   A. Stochastic regressors and measurement errors
   B. Model specification errors:
      a. Omission of relevant variables
      b. Wrong functional form
      c. Inclusion of irrelevant variables
      d. Stability of parameters
   C. Simultaneity or reverse causality
5.1. Sample Size: Problems with Few Data Points
• Requirement for estimation: n ≥ K + 1, where K is the number of explanatory variables.
• If n is small, it may be difficult to detect violations of the assumptions.
• With small n, it is hard to detect heteroskedasticity or non-normality of the ui's even when present.
• Even if no assumption is violated, a regression with small n may not have sufficient power to reject βj = 0, even if βj ≠ 0.
• If (K + 1)/n > 0.4, it will often be difficult to fit a reliable model.
• Rule of thumb: aim to have n ≥ 6K, and ideally n ≥ 10K.

5.2. Non-normality
This assumption is not very essential if the objective is estimation only. The OLS estimators are
BLUE regardless of whether the ui are normally distributed or not.

Reasons of non-normality
Extreme values or outliers
Two or more processes overlapping
Insufficient data
Data follows some other distribution
Consequences of non-normality
Non-normality does not produce bias in the coefficient estimates, but it does have two important consequences:
• It poses problems for efficiency, i.e. the OLS standard errors are no longer the smallest.
• Standard errors can be biased, i.e. confidence intervals and significance tests may lead to wrong conclusions.
Detecting non-normality
Commonly used methods of detecting non-normality are:
1. Histogram of residuals;
2. Normal probability plot (NPP), a graphical device;
3. Anderson–Darling normality test, known as the A² statistic (MINITAB); and
4. The Jarque–Bera test.
The Jarque–Bera (JB) test of normality is an asymptotic, or large-sample, test. It is also based on the OLS residuals. The test first computes the skewness and kurtosis measures of the OLS residuals and uses the following test statistic:

JB = n [ S²/6 + (K − 3)²/24 ]

where n = sample size, S = skewness coefficient, and K = kurtosis coefficient. For a normally distributed variable, S = 0 and K = 3. Therefore, the JB test of normality is a test of the joint hypothesis that S and K are 0 and 3, respectively; in that case the value of the JB statistic is expected to be 0.
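A minimal added Python sketch of the JB statistic (with simulated residuals standing in for actual OLS residuals) follows directly from the formula above:

```python
import numpy as np
from scipy.stats import skew, kurtosis, chi2

rng = np.random.default_rng(0)
resid = rng.normal(size=200)              # simulated OLS residuals

n = len(resid)
S = skew(resid)                            # skewness coefficient
K = kurtosis(resid, fisher=False)          # kurtosis (not excess), so normality implies 3

JB = n * (S**2 / 6 + (K - 3)**2 / 24)
p_value = 1 - chi2.cdf(JB, df=2)           # JB is asymptotically chi-square with 2 df
print(JB, p_value)
```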
Remedies for non-normality
Remove outliers
Take transformations
Fit other distributions
Use non-parametric tests
5.3. Multicollinearity
ø Multicollinearity refers to a situation with a high correlation among the explanatory
variables within a multiple regression model. For the obvious reason it could never appear in
the simple regression model, since it only has one explanatory variable.
Reasons of Multicollinearity
The data collection method employed: for example, if we regress using a small sample of values from the population there may be multicollinearity, but if we take all the possible values it may not show multicollinearity.
Over-determined model: This happens when the model has more explanatory variables
than the number of observations. This could happen in medical research where there may
be a small number of patients about whom information is collected on a large number of
variables.
Wrong theoretical specification: Inappropriate construction of theory.

Consequence of Multicollinearity
1. If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite.
2. If multicollinearity is not perfect but high:
   • The regression coefficients are determinate;
   • OLS coefficient estimates are still unbiased;
   • The overall R² may nevertheless be quite high;
   • OLS coefficient estimates will have large variances; and
   • There is a high probability of accepting the null hypothesis of a zero coefficient.

Detecting Multicollinearity
Some of the most commonly used approaches are the following:

1. High R² but few significant t-ratios


This is the classical test or symptom of multicollinearity. Often if R² is high (R² > 0.8) the F-test
in most cases will reject the hypothesis that the partial slope coefficients are simultaneously
equal to zero, but the individual t-tests will show that none or very few partial slope coefficients
are statistically different from zero. So the combination of high R² with low calculated t-values
for the individual regression coefficients is an indicator of the possible presence of severe
multicollinearity.

2. High pair-wise (simple) correlation coefficients among the regressors (explanatory variables).

If the pair-wise correlation coefficients are high in absolute value, then it is highly probable that the X's are highly correlated and that multicollinearity is a potential problem.
3. VIF and Tolerance
The variance inflation factor (VIF) shows the speed with which the variances and covariances increase; it shows how the variance of an estimator is inflated by the presence of multicollinearity. For a pair of explanatory variables it is defined as:

VIF = 1 / (1 − r²)

where r is the correlation between the two explanatory variables (more generally, r² is replaced by Rj², the R² from regressing Xj on the other regressors). As r² approaches 1, the VIF approaches infinity; if there is no collinearity, the VIF equals 1. A VIF value of 10 or more indicates that multicollinearity is a severe problem. Tolerance is defined as the inverse of the VIF.
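One illustrative way to compute VIFs (an added sketch using plain numpy) is to regress each explanatory variable on the others, take the resulting Rj², and apply VIF = 1/(1 − Rj²):

```python
import numpy as np

def vif(X):
    """X: n-by-k matrix of explanatory variables. Returns one VIF per column."""
    n, k = X.shape
    out = []
    for j in range(k):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)   # auxiliary regression
        resid = xj - others @ coef
        r2_j = 1 - resid.var() / xj.var()                     # R^2 of the auxiliary regression
        out.append(1.0 / (1.0 - r2_j))
    return out

# Hypothetical regressors: x2 is almost a linear combination of x1, so VIFs are large
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.1, size=100)
print(vif(np.column_stack([x1, x2])))
```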
4. Contingency Coefficient (CC): used to detect the degree of association among dummy (categorical) explanatory variables. It measures the relationship between the row and column variables of a cross-tabulation. The value ranges between 0 and 1, with 0 indicating no association between the row and column variables and values close to 1 indicating a high degree of association between variables. As a decision criterion, if the contingency coefficient is greater than 0.75 (CC > 0.75), the dummy variables are said to be collinear.
5. Auxiliary regressions: since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one way of finding out which X variable is related to the other X variables is to regress each Xi on the remaining X variables and compute the corresponding R², which we designate Ri²; each of these regressions is called an auxiliary regression, auxiliary to the main regression of Y on the X's. One may adopt Klein's rule of thumb, which suggests that multicollinearity may be a troublesome problem only if the R² obtained from an auxiliary regression is greater than the overall R².
Remedies for Multicollinearity
1. Dropping one or more of the multicollinear variables: when faced with severe multicollinearity, one of the simplest remedies is to drop one or more of the collinear variables. Since multicollinearity is caused by correlation between the explanatory variables, once the multicollinear variables are dropped the correlation no longer exists.
2. Increase the sample size: another solution to reduce the degree of multicollinearity is to increase the size of the sample. A larger data set (often requiring new data collection) allows more accurate estimates than a small one, since a larger sample normally reduces somewhat the variance of the estimated coefficients, thereby reducing the impact of multicollinearity.
5.4. Heteroscedasticity
In the classical linear regression model, one of the basic assumptions is that the probability
distribution of the disturbance term remains same over all observations of X; i.e. the variance of
each Ui is the same for all the values of the explanatory variable. This feature of homogeneity of
variance (or constant variance) is known as homoscedasticity.

Heteroscedasticity occurs when the error term has a non-constant variance. In this case, we can think of the disturbance for each observation as being drawn from a different distribution with a different variance.
Reasons of Heteroscedasticity
There are several reasons why the variance of the error term may be variable, some of which are
as follows.
Existence of outliers might also cause heteroscedasticity.
Misspecification of a model can also be a cause for heteroscedasticity.
Incorrect data transformation and incorrect functional form are also other sources of
heteroscedasticity.
Consequences of heteroscedasticity
If the error term has non-constant variance, but all other assumptions of the classical linear
regression model are satisfied, then the consequences of using the OLS estimator to obtain
estimates of the population parameters are:
The OLS estimator is still unbiased.
The OLS estimator is inefficient;
The estimated variances and covariances of the OLS estimates are biased and inconsistent.
Hypothesis tests are not valid.
Detecting Heteroscedasticity
There are two methods of testing or detecting heteroscedasticity. These are:
I. Informal method (Graphic Method)
II. Formal method (Statistical test method)
I. Graphic Method (informal method)
The squared residuals can be plotted either against Y or against one of the explanatory variables.
If there appears any systematic pattern, heteroscedasticity might exist.
II. Formal method
A. Park Test

Park suggested a statistical test for heteroscedasticity based on the assumption that the variance of the disturbance term (σi²) is some function of the explanatory variable Xi. He suggested the functional form:

σi² = σ² Xi^β e^(vi)

which can be transformed into a linear function by taking logarithms:

ln σi² = ln σ² + β ln Xi + vi

Since σi² is not known, the squared OLS residual ei² is used in its place:

ln ei² = ln σ² + β ln Xi + vi

where vi is the stochastic disturbance term. The Park test is therefore a two-stage procedure: first run the OLS regression disregarding the heteroscedasticity question and obtain the residuals ei; then run the regression above. If β turns out to be statistically significant, this suggests that heteroscedasticity is present in the data.
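A compact added sketch of the two-stage Park test in Python (the data are simulated so that the error variance grows with X):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=200)
u = rng.normal(scale=0.5 * x, size=200)        # error variance grows with x (heteroscedastic)
y = 3 + 2 * x + u

# Stage 1: ordinary OLS, ignoring heteroscedasticity, to obtain the residuals e_i
res1 = sm.OLS(y, sm.add_constant(x)).fit()
e = res1.resid

# Stage 2: regress ln(e^2) on ln(x); a significant slope suggests heteroscedasticity
res2 = sm.OLS(np.log(e**2), sm.add_constant(np.log(x))).fit()
print(res2.params[1], res2.pvalues[1])         # slope (beta) and its p-value
```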
B. Spearman's Rank Correlation Test
Recall that:

rs = 1 − 6 [ Σdi² / (N(N² − 1)) ]

where di is the difference between the rank of Xi and the rank of the corresponding absolute residual, and N is the number of observations. A high rank correlation suggests the presence of heteroscedasticity. If there is more than one explanatory variable, compute the rank correlation coefficient for each explanatory variable separately.
Note: there are also other methods of testing for the existence of heteroscedasticity, such as the Goldfeld–Quandt test, the Breusch–Pagan test, and White's general test.
Remedial Measures of heteroscedasticity
OLS estimators are still unbiased even in the presence of heteroscedasticity. But they are not
efficient. This lack of efficiency makes the usual hypothesis testing procedure a dubious
exercise. Remedial measures are, therefore, necessary. Generally, the solution is based on some
form of transformation.
a) The Weighted Least Squares (WLS)
Given a regression model of the form

Yi = β0 + β1Xi + Ui

the weighted least squares method requires running the OLS regression on suitably transformed data. The transformation is based on the assumed form of the heteroscedasticity.
b) Other Remedies for Heteroscedasticity
Two other approaches can be adopted to remove the effect of heteroscedasticity:
• Include a previously omitted variable(s) if heteroscedasticity is suspected to be due to omitted variables.
• Redefine the variables in a way that avoids heteroscedasticity; for example, instead of total income we can use income per capita.
5.5. Autocorrelation
Autocorrelation or serial correlation refers to the case in which the error term in one time period is correlated with the error term in another time period. In the presence of autocorrelation the estimates remain linear and unbiased, but they no longer have minimum variance.
Reasons of Autocorrelation
There are several reasons why serial correlation (autocorrelation) arises. Some of these are:
A. Specification bias: This arises because of the following.
ü Exclusion of relevant variables from the regression model
ü Incorrect functional form of the model
B. Non-stationarity: When dealing with time series data, we should check whether the given
time series is stationary. A time series is stationary if its characteristics (e.g. mean, variance
and covariance) are time invariant; that is, they do not change over time. If that is not the
case, we have a non-stationary time series.
Consequences of Autocorrelation
a. OLS estimates remain unbiased;
b. OLS estimates are no longer efficient (their variances are not minimal);
c. The usual testing procedures become invalid.
Detection of Autocorrelation
There are several methods for the detection of autocorrelation. Among which commonly
used methods are:
1. Informal or graphic method.

2. Formal method
A. Run test: before going into the details of this method, let us define what a run is in this context. A run is an uninterrupted sequence of identical signs (positive or negative) of the residuals, arranged according to the values of the explanatory variable, for example:

(++++++++)(---------------)(++++++++)(----------)(++++++)

The above sequence has five runs: the first run has 8 pluses, the second 15 minuses, the third 8 pluses, the fourth 10 minuses, and the last 6 pluses. Let N1 be the number of positive residuals and N2 the number of negative residuals. Assuming that N1 > 10 and N2 > 10, the number of runs R is approximately normally distributed with

mean E(R) = 2N1N2/(N1 + N2) + 1  and
variance Var(R) = 2N1N2(2N1N2 − N1 − N2) / [(N1 + N2)²(N1 + N2 − 1)]

so a 95% confidence interval for R is E(R) ± 1.96·√Var(R).
Decision Rule: Do not reject the null hypothesis of randomness with 95% confidence if R, the
number of runs, lies in the preceding confidence interval; reject the null hypothesis if the
estimated R lies outside these limits.

B. Durbin-Watson test (d)

The most widely used test for serial correlation is the Durbin–Watson d test. This test is appropriate only for a first-order autoregressive scheme of the form

Ut = ρU(t−1) + εt  (and, in terms of the estimated residuals, et = ρe(t−1) + vt)

The test may be outlined as:

H0: ρ = 0
H1: ρ ≠ 0

This test is, however, applicable only where the underlying assumptions are met:
• The regression model includes an intercept term;
• The serial correlation is first order in nature;
• The regression does not include the lagged dependent variable as an explanatory variable; and
• There are no missing observations in the data.
The equation for the Durbin–Watson d statistic is

d = Σ(t=2 to N) (et − e(t−1))² / Σ(t=1 to N) et²

which is simply the ratio of the sum of squared differences in successive residuals to the RSS. Note that the numerator has one fewer observation than the denominator, because an observation must be used to calculate e(t−1). A great advantage of the d statistic is that it is based on the estimated residuals; thus it is often reported together with R², t, etc.
The d statistic equals zero if there is extreme positive serial correlation, two if there is no serial correlation, and four if there is extreme negative correlation.
1. Extreme positive serial correlation: d ≈ 0. In this case et ≈ e(t−1), so (et − e(t−1)) ≈ 0 and d ≈ 0.
2. Extreme negative correlation: d ≈ 4. In this case et = −e(t−1), so (et − e(t−1)) = 2et, and thus

d = Σ(2et)² / Σet² ≈ 4

3. No serial correlation: d ≈ 2. In this case

d = Σ(et − e(t−1))² / Σet² = [Σet² + Σe(t−1)² − 2Σet·e(t−1)] / Σet² ≈ 2

since Σet·e(t−1) ≈ 0 (the residuals are uncorrelated) and Σet² and Σe(t−1)², which differ in only one observation, are approximately equal.
The exact sampling or probability distribution of the d statistic is not known and, therefore, unlike the t, χ² or F tests, there are no unique critical values that lead to the acceptance or rejection of the null hypothesis. But Durbin and Watson have successfully derived upper and lower bounds such that if the computed d value lies outside these critical values, a decision can be made regarding the presence of positive or negative serial autocorrelation.
Thus

d = [Σet² + Σe(t−1)² − 2Σet·e(t−1)] / Σet² ≈ 2(1 − Σet·e(t−1)/Σe(t−1)²)

so that

d ≈ 2(1 − ρ̂),  since  ρ̂ = Σet·e(t−1) / Σe(t−1)²

But since −1 ≤ ρ ≤ 1, the above identity implies 0 ≤ d ≤ 4; the bounds of d must lie within these limits.
The decision regions of the Durbin–Watson test can be summarized as follows:
• 0 ≤ d < dL: reject H0; evidence of positive autocorrelation.
• dL ≤ d ≤ dU: zone of indecision.
• dU < d < 4 − dU: do not reject H0; no serial correlation.
• 4 − dU ≤ d ≤ 4 − dL: zone of indecision.
• 4 − dL < d ≤ 4: reject H0; evidence of negative autocorrelation.
Thus:
if ρ̂ = 0, then d = 2: no serial autocorrelation;
if ρ̂ = +1, then d = 0: evidence of positive autocorrelation;
if ρ̂ = −1, then d = 4: evidence of negative autocorrelation.
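A short added sketch of the d statistic computed directly from a set of residuals; the residuals are simulated with ρ = 0.8, so d should come out well below 2:

```python
import numpy as np

def durbin_watson(e):
    """d = sum of squared successive differences of residuals / residual sum of squares."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Simulated AR(1) residuals with rho = 0.8 (positive autocorrelation)
rng = np.random.default_rng(3)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.8 * e[t - 1] + rng.normal()

print(durbin_watson(e))            # noticeably less than 2
print(2 * (1 - 0.8))               # the approximation d ~= 2(1 - rho) gives 0.4
```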
C. Higher order tests: the Breusch 3 Godfrey (BG) test
There are various higher-order tests, including the Ljung–Box Q test (a χ² statistic), the portmanteau test, and the Breusch–Godfrey (BG) test. The BG test is a general test for autocorrelation in the sense that it allows for
a. non-stochastic regressors such as the lagged values of the regressand;
b. higher-order autoregressive schemes such as AR(1), AR (2) etc.; and
c. simple or higher-order moving averages of white noise error terms.

Remedial Measures of Autocorrelation


Different approaches are used to treat autocorrelation.
The best approach is to reformulate the regression model and re-estimate the regression
coefficients to eliminate the problem of autocorrelation. We can reformulate the model by
adding/removing a variable(s) or changing its functional form.

However, in some cases, the autocorrelation problem cannot be removed by adding/removing
the variable(s) or changing the model's functional form. In these cases, we can use the
Cochrane-Orcutt method to eliminate the autocorrelation problem.

Chapter 6: Non-linear Regression and Time Series Econometrics


6.1 Limited Dependent Variable

In the multiple regression model, we have seen that the dependent variable is quantitative while the independent variables may be qualitative, quantitative, or a mixture of the two. Often, however, we come across situations where the dependent variable is a qualitative response variable, involving binary, ordinal or nominal outcomes.

The most common binary response regression models are:

1. Linear Probability Model (LPM)
2. Logit Model
3. Probit Model
4. Tobit Model
1. Linear Probability Model (LPM)

The Linear Probability Model uses OLS to estimate the model; the coefficients, t-statistics, etc. are then interpreted in the usual way. This produces the usual linear regression line, which is fitted through the two sets of observations.

Features of the LPM

1. The dependent variable has two values, the value 1 has a probability of p and the value 0 has a
probability of (1-p).

2. This is known as the Bernoulli probability distribution. In this case the expected value of a random
variable following a Bernoulli distribution is the probability the variable equals 1.

3. Since the probability of p must lie between 0 and 1, then the expected value of the dependent variable
must also lie between 0 and 1.

Problems with LPM1


1. The error term is not normally distributed; it also follows the Bernoulli distribution.
2. The variance of the error term is heteroskedastic. The variance of the Bernoulli distribution is p(1 − p), where p is the probability of success and (1 − p) is the probability of failure.
3. The value of the R-squared statistic is limited, given the distribution of the LPM.

(Footnote 1: The Bernoulli distribution is a discrete probability distribution for a random variable that has only two possible outcomes: success, usually coded as 1, and failure, usually coded as 0.)
4. Possibly the most problematic aspect of the LPM is the non-fulfilment of the requirement that the
estimated value of the dependent variable y lies between 0 and 1.
5. One way around the problem is to assume that all values below 0 and above 1 are actually 0 or 1
respectively.
6. The final problem with the LPM is that it is a linear model and assumes that the probability of the
dependent variable equalling 1 is linearly related to the explanatory variable.

For example, suppose we have a model where the dependent variable takes the value 1 if a farmer has extension contact and 0 otherwise, regressed on the farmer's education level. The LPM assumes that the probability of contacting an extension agent rises linearly as the education level rises.

The following model of technology adoption (TA) was estimated, with extension visit (EV) and education
(ED) as the explanatory variables. Regression using OLS gives the following result.

T̂Ai = 2.79 + 0.76EVi − 0.12EDi
se   = (2.10)  (0.06)   (0.04)
R² = 0.15,  DW = 1.78
TAi = 1 if adopted, 0 if not adopted

The coefficients are interpreted as in the usual OLS model: a one-unit rise in extension contact increases the probability of technology adoption by 0.76, other things being equal. The R-squared statistic is low, but this is probably due to the LPM approach, so we would usually not put much weight on it. The t-statistics are interpreted in the usual way.

2 The Logit Model


Logit models are used whenever the dependent variable is binary (also called a dummy variable), taking the values 0 or 1. Logit regression is a nonlinear regression model that forces the predicted values to lie between 0 and 1. Logit models estimate the probability that the dependent variable equals 1 (Y = 1), i.e. the probability that some event happens.
If we assume the basic model Zi = β0 + β1Xi, we can express the probability that y = 1 as a cumulative logistic distribution function:

Pi = P(yi = 1) = 1 / (1 + e^(−Zi)) = e^(Zi) / (1 + e^(Zi))
Logit model output

Suppose the following logit regression output

Odds ratio

We can report the odds ratios rather than the coefficients by adding the option "or" after the comma (as in Stata's logit command).
3. The probit model


Like the logit model, the probit model is used to estimate the probability of a binary outcome based on one or more predictor variables. It is particularly useful when the response variable is dichotomous (e.g. yes/no, success/failure).

• Logit and probit models are basically the same; the difference is in the assumed distribution of the error terms:
  Logit – cumulative standard logistic distribution of the error terms (F);
  Probit – cumulative standard normal distribution of the error terms (Φ).
• Both models provide similar results.
• Interpretation of coefficients: the logit coefficients represent the change in the log-odds of the outcome for a one-unit change in the predictor, whereas the probit coefficients represent the change in the z-score (standard normal index) for a one-unit change in the predictor.
• Field of application: logit is more common in the social sciences and health research, whereas probit is more common in economics.

The choice between the logit and probit model

The choice between the two models is largely up to the researcher, but the two models differ in the following respects. While it is true that logit and probit models often yield similar results, logistic regression (logit) is generally preferred over probit regression for several reasons:

1. Easier interpretation: the coefficients of a logistic regression can be directly converted to odds ratios, which are more intuitive for many practitioners to interpret. Odds ratios express the relative change in the odds of the outcome for a one-unit change in the predictor variable.

• Coefficients: the coefficients in logistic regression can be explained in terms of the change in the log-odds of the outcome, which is simpler than the change in z-scores used in probit models.

2. Computational Simplicity: The logistic function used in logit models is computationally less
intensive than the normal CDF used in probit models. This makes logistic regression faster and
easier to implement, especially with large datasets.

3. Widely used and supported by software packages: logistic regression is widely implemented across statistical software packages, including Stata, SPSS, R, and Python. This broad support ensures that practitioners have access to robust and well-documented tools for their analysis.

• Standard practice: logistic regression is often the standard choice in many fields, such as epidemiology, the social sciences, and machine learning, due to its wide acceptance and familiarity.

4. Robustness: Logistic regression is less sensitive to the underlying distribution assumptions


compared to probit regression. This robustness makes it a safer choice when the exact
distribution of the error terms is unknown.

5. Practical Application: Logistic regression performs well with real-world data, where the true
distribution of the underlying data is often not perfectly normal. Its flexibility and ease of
interpretation make it a practical choice for many applied researchers.

In general, while both logit and probit models are valuable for binary outcome modeling, logistic
regression is generally preferred due to its easier interpretation, computational simplicity,
widespread use, and robustness. These advantages make it a practical and efficient choice for
many applications.

Comparison of different limited probability models

The coefficient estimates from all three models are related:

• According to Amemiya, if you multiply the coefficients from a logit model by 0.625, they are approximately the same as the probit coefficients.
• If the coefficients from the LPM are multiplied by 2.5 (and 1.25 is subtracted from the constant term), they are approximately the same as those produced by a probit model.

4. The Tobit model


Researchers sometimes encounter dependent variables that have a mixture of discrete and continuous
properties. The problem is that for some values of the outcome variable, the response has discrete
properties; for other values, it is continuous.

Censoring and truncation

Censoring is when the limit observations are in the sample (only the value of the dependent variable
is censored) and truncation is when the observations are not in the sample.

Censored sample: include consumers who consume zero quantities of a product.

Truncated sample: only include consumers who choose positive quantities of a product.

The censored sample is representative of the population (only the mean for the dependent variable is
not) because all observations are included. The truncated sample is not representative of the
population because some observations are not included.

Truncation has greater loss of information than censoring (missing observations rather than values for
the dependent variable).

Censored sample: observe people that do not work but their work hours are recorded as zero.

Truncated sample: do not observe anything about people who do not work.

A truncated sample will have fewer observations and higher mean (with censoring from below) than a
censored sample.

Because of censoring, the dependent variable y is the incompletely observed value of the latent
dependent variable y*

Example: an income of y* = 120,000 will be recorded (censored) as y = 100,000 with top-coding at 100,000.

Y is "censored" when we observe X for all observations but only know the true value of Y for a restricted range of observations. If Y = k or Y > k for all Y, then Y is "censored from below"; if Y = k or Y < k for all Y, then Y is "censored from above". Y is "truncated" when we only observe X for observations where Y is not truncated.
Example 6.1: Censoring and truncation illustrated (scatter plots of Y against X)

[Figure: No censoring or truncation — we observe the full range of Y and the full range of X.]

[Figure: Censored from above — if Y ≥ 6, we do not know its exact value. Censored from below — if Y ≤ 5, we do not know its exact value.]

[Figure: Truncated — if X < 3, we do not know the value of Y.]

6.2. Time Series Econometrics

One of the important types of data used in empirical analysis is time series data. A time series is a set of observations on a quantitative or discrete variable collected over time, usually at equal time intervals. Time series analysis is important for analysing the past behaviour of a variable in order to predict its future behaviour.

A random or stochastic process is a collection of random variables ordered in time. A type of stochastic process that has received a great deal of attention and scrutiny from time series analysts is the stationary stochastic process.

Broadly speaking, a stochastic process is said to be stationary if its mean and variance are
constant over time and the value of the covariance between the two time periods depends only
on the distance or gap or lag between the two time periods and not the actual time at which the
covariance is computed. In the time series literature, such a stochastic process is known as a
weakly stationary, or covariance stationary, or second-order stationary, or wide sense,
stochastic process. In most practical situations, this type of stationarity often suffices.

In other words, a non-stationary time series will have a time-varying mean or a time-varying
variance or both. In order to model a time series, the series has to be stationary. In practical
terms, the series is stationary if it tends to wander more or less uniformly about some fixed
level. In statistical terms, a stationary process is assumed to be in a particular state of statistical
equilibrium.
6.2.1. Objectives of time series

• To understand the past behaviour of data in order to forecast future behaviour.
• To compare actual performance with expected performance and analyse the causes of variation.
• To plan future operations.
• To evaluate current accomplishments/performance and ascertain the causes of poor performance.
• Time series analysis is useful in planning, administration, business, the social sciences and other areas of human knowledge.

6.2.2. Components of a Time Series

• Trend: the general tendency of a time series to increase, decrease or stagnate over a long period of time.

• Seasonal variation: this component explains fluctuations within a year, during the seasons, usually caused by climate and weather conditions, customs, traditional habits, etc.

• Cyclical variation: this component describes medium-term changes caused by circumstances which repeat in cycles. The duration of a cycle extends over a longer period of time.

• Irregular variation: irregular or random variations in a time series are caused by unpredictable influences which are not regular and do not repeat in a particular pattern. These variations are caused by incidences such as war, strikes, earthquakes, floods, revolutions, etc. There is no defined statistical technique for measuring random fluctuations in a time series.

6.2.3. Model of time series

Time series modeling involves analyzing time-ordered data to extract meaningful statistics and
make forecasts. Here are some popular models used in time series analysis:

1. Purely Random Process (White Noise):

• Definition: a purely random process, also known as white noise, is a time series in which each value is a completely random draw from a fixed probability distribution. Each value is independent of previous values; values fluctuate around a fixed mean (typically zero) with no long-term trend or pattern.
• Characteristics:
  o Zero mean: the average value of the series is zero.
  o Constant variance: the variance is constant over time.
  o No autocorrelation: there is no predictable pattern; past values provide no information about future values.
• Mathematical representation: Yt = εt, where εt is a white noise process with mean zero and constant variance σ².
• Example: fluctuations in day-to-day stock market returns around a mean value with no apparent trend or pattern.

2. Random Walk: a random walk is a time series in which each value is the sum of the previous value plus a random error term:

Yt = Y(t−1) + εt

It is a non-stationary process often used to model certain financial time series.
• Each value depends on the previous value plus a random step.
• Values tend to wander without a fixed mean.
• The series can exhibit long-term trends and is inherently non-stationary.
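To illustrate the difference between these first two processes (and the AR(1) model described next), here is a small added simulation sketch in Python:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
eps = rng.normal(size=T)

white_noise = eps                             # Y_t = e_t: stationary, mean 0

random_walk = np.cumsum(eps)                  # Y_t = Y_{t-1} + e_t: non-stationary

ar1 = np.zeros(T)                             # AR(1): Y_t = 0.7 * Y_{t-1} + e_t, stationary
for t in range(1, T):
    ar1[t] = 0.7 * ar1[t - 1] + eps[t]

# The random walk wanders (its variance grows with t); the other two hover around zero.
print(white_noise.std(), ar1.std(), random_walk.std())
```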

3. Autoregressive (AR) Models: an autoregressive model is a time series model in which the current value is regressed on its own past values (lags). Many AR models are stationary, meaning their mean and variance are constant over time, although this is not always the case.

AR(p) model: represents a time series that is regressed on its own past p values.
Mathematical form: Yt = c + φ1·Y(t−1) + φ2·Y(t−2) + ... + φp·Y(t−p) + εt

4. Moving Average (MA) Models:

MA(q) model: represents a time series based on past forecast errors.
Mathematical form: Yt = μ + εt + θ1·ε(t−1) + ... + θq·ε(t−q)

This indicates that the current value is based on the current and the last q forecast errors.

5. Autoregressive Moving Average (ARMA) Models:

ARMA(p, q) model: combines the AR and MA models.
Mathematical form: Yt = c + φ1·Y(t−1) + ... + φp·Y(t−p) + εt + θ1·ε(t−1) + ... + θq·ε(t−q)
This indicates that both past values and past errors influence the current value.

6. Autoregressive Integrated Moving Average (ARIMA) Models:

ARIMA(p, d, q) model: extends ARMA by including differencing to handle non-stationary data.
Mathematical form: an ARMA(p, q) model is applied to the series after it has been differenced d times.
For example, with d = 1 the data are differenced once to achieve stationarity before applying the ARMA model.

7. Seasonal ARIMA (SARIMA) Models:

SARIMA(p, d, q)(P, D, Q)[s]: extends ARIMA by including seasonal components; it incorporates seasonal autoregressive, differencing, and moving average terms with seasonal period s.

8. Vector Autoregressive (VAR) Models:

VAR(p) model: for multivariate time series, where each variable is a linear function of past values of itself and of the other variables.
Mathematical form: each variable in the system is modelled as a linear combination of lagged values of all variables in the system; for example, a bivariate VAR models two time series Y1 and Y2 jointly.

9. Exponential Smoothing Models:

Simple exponential smoothing: for data with no trend or seasonality.
Holt's linear trend model: for data with a trend.
Holt–Winters seasonal model: for data with both trend and seasonality.
Mathematical form: exponential smoothing is applied to the level, trend, and seasonal components.

10. Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Models:

GARCH(p, q): models time-varying volatility.
Mathematical form: the variance of the error terms is modelled as a function of past squared errors and past variances; for example, in a GARCH(1,1) model, σt² = ω + α·ε²(t−1) + β·σ²(t−1).
This indicates that current volatility depends on last period's squared error and on past volatility.

Each of these models has specific use cases, depending on the characteristics of the time series
data being analyzed. Choosing the right model involves understanding the underlying data
patterns and the objectives of the analysis of the time series data.
