
ECONOMETRICS

MULTIPLE REGRESSION ANALYSIS ESTIMATION (CS)

GROUP 2:

Ni Wayan Elsa Iryanti NIM: 2207511013


Kadek Dwi Agestiari NIM: 2207511014
Ni Wayan Widarbayanti NIM: 2207511045
Ning Ai Satyawati NIM: 2207511046
Hilda Nurhidayati NIM: 2207511073
Dewa Ayu Intan Adi Ari NIM: 2207511240

LECTURER
I Gusti Agung Ayu Apsari Anandari, M.S.E.

FACULTY OF ECONOMICS AND BUSINESS


UDAYANA UNIVERSITY
ECONOMETRICS 2022/2023
2023
INTRODUCTION

Multiple regression analysis is more amenable to ceteris paribus analysis because it allows
us to explicitly control for many other factors that simultaneously affect the dependent variable.
This is important both for testing economic theories and for evaluating policy effects when we
must rely on nonexperimental data. Because multiple regression models can accommodate many
explanatory variables that may be correlated, we can hope to infer causality in cases where simple
regression analysis would be misleading.
3.1 Example

Example 3.1 uses a regression equation to predict students' grade point average in college (colGPA) from their high school GPA (hsGPA) and ACT score. There are 141 students in the sample, and both GPAs are measured on a four-point scale.
The estimated regression equation is:

^colGPA = 1.29 + .453 hsGPA + .0094 ACT

Interpreting the equation:
1. Intercept (1.29): This is the prediction of college GPA if high school GPA (hsGPA) and
ACT score are both zero. However, in the real context, no one has a zero high school GPA
or ACT score when entering college, so this intercept has no direct meaning.
2. Coefficient for hsGPA (.453): This is the most important number in the equation. It
indicates a positive relationship between high school GPA and college GPA (colGPA). If we
compare two students with the same ACT score, but Student A's hsGPA is one point higher
than Student B's hsGPA, we predict that Student A will have a 0.453 higher colGPA than
Student B.
3. The coefficient of .0094 on ACT means that even a 10-point change in ACT score changes
predicted colGPA by only .094, less than one-tenth of a grade point. This suggests that, once
hsGPA is accounted for, ACT score is a weak predictor of colGPA.

If we only consider a simple regression of colGPA on ACT, the estimated equation is:

^colGPA = 2.40 + .0271 ACT
However, this equation does not allow us to compare two people with the same hsGPA. This
equation represents a different experiment. In other words, the regression equation is used to
understand how one variable (e.g., ACT or hsGPA) affects another variable (colGPA) in the
context of this data. In this case, hsGPA appears to have a greater influence than ACT on colGPA.
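To make the mechanics concrete, here is a minimal Python sketch of how both regressions could be estimated with ordinary least squares. It assumes the GPA data are available as a file named gpa1.csv with columns colGPA, hsGPA, and ACT; the file name and column names are illustrative assumptions, not part of the example.

```python
# Minimal OLS sketch (hypothetical gpa1.csv with columns colGPA, hsGPA, ACT).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("gpa1.csv")

# Multiple regression: the hsGPA slope holds ACT fixed.
multi = smf.ols("colGPA ~ hsGPA + ACT", data=df).fit()
print(multi.params)   # expect roughly 1.29, .453, .0094

# Simple regression on ACT alone answers a different question:
# it does not hold hsGPA fixed, so its ACT slope differs.
simple = smf.ols("colGPA ~ ACT", data=df).fit()
print(simple.params)
```

The formula interface makes the ceteris paribus contrast explicit: including hsGPA in the formula is exactly what "holding hsGPA fixed" means in the interpretation above.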

3.2 Example

This example uses multiple regression with data on 526 workers in WAGE1 to explain the relationship between the dependent variable (Y), log(wage), and three independent variables (X): education (years of schooling), experience (years of labor market experience), and tenure (years with the current employer). The estimated equation is:

^log(wage) = .284 + .092 educ + .0041 exper + .022 tenure

This equation estimates the impact of education, experience, and tenure on log wages.
The coefficients of the equation are as follows:

The intercept is .284

The coefficient for education is .092

The coefficient for experience is .0041

The coefficient for tenure is .022

⮚ The intercept is .284, which means that when all of the independent variables (X) equal zero, the predicted value of log(wage) is .284. The coefficients on education, experience, and tenure are .092, .0041, and .022, respectively. The value n = 526 is the number of observations used in the analysis. Because the dependent variable is in logs, the coefficients have a percentage interpretation.

⮚ The coefficient on education is .092, which means that for every one-year increase in education, holding experience and tenure constant, log wages are expected to increase by .092, or 9.2%.
⮚ The coefficient on experience is .0041, which means that for every one-year increase in experience, holding education and tenure constant, log wages are expected to increase by .0041, or 0.41%.
⮚ The coefficient on tenure is .022, which means that for every one-year increase in tenure, holding education and experience constant, log wages are expected to increase by .022, or 2.2%.

In summary, this example shows how a multiple linear regression model can be used to estimate the relationship between a dependent variable and several independent variables. Each coefficient indicates how much the dependent variable is expected to change for a one-unit increase in that independent variable, holding the other independent variables constant.
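Because the dependent variable is in logs, each coefficient is only approximately a percentage effect. The sketch below shows the usual approximation next to the exact percentage change; it assumes a hypothetical wage1.csv with columns wage, educ, exper, and tenure.

```python
# Log-level coefficients as percentage effects (hypothetical wage1.csv).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("wage1.csv")
res = smf.ols("np.log(wage) ~ educ + exper + tenure", data=df).fit()

for name, b in res.params.items():
    if name == "Intercept":
        continue
    approx = 100 * b               # the "100*beta percent" reading used above
    exact = 100 * (np.exp(b) - 1)  # exact % change for a one-unit increase
    print(f"{name}: approx {approx:.2f}%, exact {exact:.2f}%")
```

For small coefficients such as .0041 the two numbers are nearly identical, which is why the simple percentage reading is used throughout the example.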

3.3 Example

In the following regression equation, there are three variables, namely:

1. Participation Rate (prate): This is the dependent variable or the target to be predicted.
2. Plan Match Rate (mrate): This is the first independent variable. It measures the extent to
which the 401(k) plan match rate (employer contribution) affects the participation rate in
the retirement program.
3. 401(k) Plan Age: This is the second independent variable. It measures the extent to which
the age of the 401(k) plan affects the participation rate.

The estimated regression equation is:

^prate = 80.12 + 5.52 mrate + .243 age

There is an estimated coefficient for each independent variable:
- Intercept (80.12): This is the expected value of the participation rate when both independent
variables, i.e. plan match rate and 401(k) plan age, are zero. In this context, it may not have any
real interpretation as both variables are unlikely to be zero in real situations.

- Coefficient for Plan Match Rate (5.52): This regression coefficient indicates how much influence the plan match rate has on the participation rate. Every 1 unit increase in the plan match rate is expected to raise the participation rate by 5.52, provided that other factors remain constant.

- Coefficient for 401(k) Plan Age (0.243): This is the regression coefficient that indicates how
much influence 401(k) plan age has on the participation rate. In this case, every 1 unit increase in
401(k) plan age would be expected to increase the participation rate by 0.243, holding other factors
constant.

n = 1,534 indicates the number of observations used in this regression analysis.

Thus, this regression equation makes it possible to predict the participation rate in a 401(k) retirement plan from the plan match rate and the age of the plan. The plan match rate has a clear positive effect on the participation rate, while the plan's age has a smaller but still positive effect. This illustrates the importance of controlling for other variables that might affect the results in a regression analysis. Let us analyze what happens in this case:

1. Estimation with Age Controlled: In the multiple regression, the age of the 401(k) plan (age) is included as an independent variable. This means plan age is controlled for when estimating the effect of the plan match rate (mrate) on the participation rate (prate). In other words, the multiple regression estimate of the effect of mrate on prate already accounts for the role of the plan's age.
2. Estimation Without Controlling for Age: In a simple regression of prate on mrate, age is no longer controlled for. The estimate of the effect of mrate on prate is therefore cruder, since it ignores any possible influence of age. This is an important example of how control variables can affect regression results.
3. Correlation Between mrate and age: The sample correlation between mrate and age is only about .12, so while the two variables are related, the relationship is weak. This weak correlation explains why the estimates from the simple regression and the multiple regression differ relatively little.

However, when other factors might affect the dependent variable (in this case, prate), it is important to control for them as well as possible so that the regression results are more accurate. This is a basic principle of regression analysis: consider and control for variables that might affect the dependent variable so that the effect of the variable of interest can be assessed more precisely.
In this context, although the simple regression estimate without controlling for age differs from the multiple regression estimate, the difference is not large because of the weak correlation between mrate and age. In other situations, with more strongly correlated variables or larger effects, omitting a control variable can produce very different estimates.
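The relationship between the simple and multiple regression slopes is not just qualitative; it follows an exact algebraic identity. The sketch below checks it, assuming a hypothetical k401k.csv with columns prate, mrate, and age.

```python
# Omitted-variable algebra check (hypothetical k401k.csv).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("k401k.csv")

multi = smf.ols("prate ~ mrate + age", data=df).fit()   # controls for plan age
simple = smf.ols("prate ~ mrate", data=df).fit()        # ignores plan age

# simple slope = multiple slope on mrate
#              + (slope on age) * (slope from regressing age on mrate)
delta = smf.ols("age ~ mrate", data=df).fit().params["mrate"]
implied = multi.params["mrate"] + multi.params["age"] * delta
print(simple.params["mrate"], implied)  # the two numbers agree exactly
```

Because the correlation between mrate and age is weak, delta is small, and the two mrate slopes end up close together, just as the discussion above predicts.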

3.4 Example

This example reports the results of a regression predicting college grade point average (colGPA) from the variables hsGPA (high school GPA) and ACT (ACT test score).

In the regression results, the regression equation was given:

^colGPA = 1.29 + 0.453 hsGPA + 0.0094 ACT

- "n = 121" indicates that there are 121 samples or observations in this analysis.

- "R^2 = 0.176" is the coefficient of determination, which indicates how much variation in colGPA
can be explained by the regression model. In this case, R^2 is 0.176, which means that about 17.6%
of the variation in college GPA (colGPA) can be explained by the hsGPA and ACT variables that
have been included in the model. The remaining 82.4% cannot be explained by this model.

The R^2 may indeed seem low. While hsGPA and ACT provide some insight into a student's performance in college, many other factors influence college GPA, such as family background, personality, quality of high school education, and interest in attending college. The low R^2 indicates that this regression model leaves most of the variation in college GPA unexplained.

It is important to understand that in the real world, phenomena such as performance in college are
often complex and influenced by many factors that are difficult to measure or include in regression
models. Therefore, interpretation of regression analysis results should always be done with
caution, and regression results are not always able to explain all aspects of a phenomenon.
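R-squared itself is simple to compute from the residuals, which makes its meaning transparent. A short sketch, reusing the hypothetical gpa1.csv from the earlier example:

```python
# R-squared by hand: share of variation in colGPA the model explains.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("gpa1.csv")
res = smf.ols("colGPA ~ hsGPA + ACT", data=df).fit()

ssr = np.sum(res.resid ** 2)                             # sum of squared residuals
sst = np.sum((df["colGPA"] - df["colGPA"].mean()) ** 2)  # total sum of squares
print(1 - ssr / sst, res.rsquared)                       # both about 0.176
```

Seeing R-squared as 1 - SSR/SST makes clear why it is low here: most of the variation in colGPA ends up in the residuals, not in the fitted values.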
3.5 Example

Data on arrests in 1986 are in CRIME1, along with other information on 2,725 men born in California in 1960 or 1961. Each man in the sample was arrested at least once prior to 1986.

narr86: the number of times the man was arrested during 1986

pcnv: the proportion (not percentage) of prior arrests that led to a conviction

avgsen: average sentence length served for prior convictions (zero for most people)

ptime86: months spent in prison in 1986

qemp86: the number of quarters during which the man was employed in 1986 (from zero to four).

First, we estimate the model without the variable avgsen. We obtain

narr86 = .712 - .150 pcnv - .034 ptime86 - .104 qemp86

n = 2,725, R2 = .0413.
If we increase pcnv by .50 (a large increase in the probability of conviction), then, holding the
other factors fixed, Δnarr86 = -.150(.50) = -.075. This may seem odd, because an arrest cannot
change by a fraction, but the value is meaningful when applied to a large group of men: among
100 men, the predicted fall in arrests when pcnv increases by .50 is 7.5.

Longer prison terms lead to lower predicted arrests: increasing ptime86 from 0 to 12 reduces
predicted arrests by .034(12) = .408. Legal employment also lowers predicted arrests: each
additional quarter of employment reduces predicted arrests by .104, which amounts to 10.4 fewer
arrests among 100 men.

Adding the variable avgsen, the estimated equation is

narr86 = .707 - .151 pcnv + .0074 avgsen - .037 ptime86 - .103 qemp86

n = 2,725, R2 = .0422.

Adding the average sentence variable increases R2 only from .0413 to .0422, so its effect is small. The
positive coefficient on avgsen is counterintuitive: taken at face value, it says longer average sentences increase criminal activity.

The second regression's four explanatory variables, which only explain 4.2% of the variation in
arrests, may still be reliable estimates of the ceteris paribus effects of each independent variable
on narr86. However, the accuracy of these estimates depends on the size of the R2 (Relative Error
Square) and the difficulty in predicting individual outcomes with high accuracy.
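The predicted-change arithmetic in this example is worth spelling out, since it is how slope coefficients become statements about groups of people. A small sketch using only the coefficients quoted above:

```python
# Turning slopes into predicted changes (coefficients from the text).
beta_pcnv = -0.150
print(beta_pcnv * 0.50)        # pcnv up by .50: -0.075 arrests per man
print(100 * beta_pcnv * 0.50)  # among 100 men: about 7.5 fewer arrests

beta_ptime86 = -0.034
print(beta_ptime86 * 12)       # 0 -> 12 months in prison: about -0.408 arrests

beta_qemp86 = -0.104
print(100 * beta_qemp86)       # one more employed quarter, per 100 men: -10.4
```

These are exactly the -.075, 7.5, and .408 figures used in the discussion above.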

3.6 Example

This example assumes that the model log(wage) = β0 + β1educ + β2abil + u satisfies Assumptions MLR.1
through MLR.4. With this model we try to understand the effect of education on wages; the
expectation is that higher education leads to higher wages. We also want to understand the effect
of ability on wages. The goal is to measure how much education and ability each affect wages.

The parts of the equation can be explained as follows:

➢ Log(wage) is the dependent variable (y)


➢ β0 is the intercept of the regression model
➢ β1 is the coefficient measuring the effect of education (x1) on wages
➢ β2 is the coefficient measuring the effect of ability (x2) on wages
➢ u is the error term, covering factors other than education and ability that affect wages

Saying the model satisfies Assumptions MLR.1 through MLR.4 means the conditions needed for
OLS to be unbiased hold: the model is linear in its parameters (MLR.1), the data come from a
random sample (MLR.2), there is no perfect collinearity among the independent variables
(MLR.3), and the error u has zero mean conditional on the explanatory variables (MLR.4).

The data set WAGE1 contains no measure of ability. Because abil is unobserved, we estimate the
model with abil excluded, so β1 is estimated from the simple regression:

~log(wage) = .584 + .083 educ, n = 526, R² = .186

We use the symbol "~" rather than "^" to emphasize that these estimates come from an underspecified
model. Since no data on ability are available, the coefficient on education is estimated from a
simple regression of log(wage) on educ alone.

An explanation of the simple regression estimates:

● The value of .584 is the estimate of β₀, the intercept of the regression model. It is the
predicted value of the logarithm of wage when education (educ) equals zero.
● The value of .083 is the estimate or coefficient of β₁, which measures the effect of education
on the logarithm of wages.
● n is the number of observations in the sample or dataset. In this case, 526 observations are
used in the analysis.
● The coefficient of determination, or R-squared (R²), measures how much of the variation in
the dependent variable the regression explains. Here the R-squared is .186, which means
about 18.6% of the variation in the logarithm of wages is explained by education alone.

Because this result comes from a single sample, we cannot say that .083 is greater than β1; the
true return to education could be lower or higher than 8.3% (and we will never know for sure).
What we do know is that the average of the estimates across all random samples is too large:
omitting ability, which is positively correlated with both education and wages, biases the
estimated return to education upward.
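The claim that the average of the estimates across random samples is too large can be illustrated with a small Monte Carlo experiment. Everything in this sketch is invented for illustration: the true coefficients, the link between educ and abil, and the noise levels are assumptions, not estimates from WAGE1.

```python
# Monte Carlo sketch of upward omitted-variable bias (all numbers invented).
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2 = 0.06, 0.10      # assumed true effects of educ and abil
n, reps = 526, 2000
slopes = []
for _ in range(reps):
    abil = rng.normal(size=n)
    educ = 12 + 2 * abil + rng.normal(size=n)   # educ and abil move together
    logwage = 0.5 + beta1 * educ + beta2 * abil + rng.normal(scale=0.3, size=n)
    # simple regression of logwage on educ only, with abil omitted
    slopes.append(np.polyfit(educ, logwage, 1)[0])

print(np.mean(slopes), "vs true", beta1)  # the average estimate exceeds 0.06
```

Any single sample can give an estimate below the truth, but because ability is positively correlated with both education and wages, the estimates are too large on average, which is exactly the point of the example.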
3.7 Example

The variable we want to explain, y=earn98 is labor market income in 1998, the year after the job
training program (which took place in 1997).

It is known that:

1. y = earn98, labor market earnings in 1998, measured in thousands of dollars.

2. w = train, a binary indicator equal to one if the man participated in the job training program.

3. The control variables are earnings in 1996 (the year prior to the program), years of schooling
(educ), age, and marital status (married).

4. The coefficient on train in the simple regression is -2.05, so trainees are predicted to earn
about $2,050 less than non-participants.

5. The average earnings for those who did not participate is given by the intercept: $10,610.

First, we estimate the model without control variables. We obtain:

^earn98 = 10.61 - 2.05 train

The training coefficient is negative partly because participation in the job training program is
based on past labor market outcomes and is partly voluntary, so random assignment is unlikely to
be a good assumption. The control variables used are income in 1996 (the year before the
program), years of schooling (educ), age, and marital status (married). Like the training indicator,
marital status is coded as a binary variable. Adding these four variables gives the multiple
regression:

^earn98 = 4.67 + 2.41 train + .373 earn96 + .363 educ - .181 age + 2.48 married

This regression shows that controlling for pre-program income, education level, age, and marital
status produces a significantly different estimate of the training effect. As the equation shows,
workers with more education also earn more: about $363 for each additional year. The marriage
effect is roughly as large as the training effect: ceteris paribus, married men earn, on average,
about $2,480 more than their single counterparts. The control variables themselves also have
substantial predictive power for 1998 earnings.
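The pattern in this example, a binary program indicator whose coefficient moves once controls are added, is straightforward to reproduce. A sketch assuming a hypothetical jtrain98.csv with columns earn98, train, earn96, educ, age, and married (train and married coded 0/1):

```python
# Binary treatment with and without controls (hypothetical jtrain98.csv).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("jtrain98.csv")

# Without controls: just compares mean earnings of trainees vs non-trainees.
naive = smf.ols("earn98 ~ train", data=df).fit()

# With controls: the train coefficient now compares workers with the same
# pre-program earnings, schooling, age, and marital status.
ctrl = smf.ols("earn98 ~ train + earn96 + educ + age + married", data=df).fit()

print(naive.params["train"], ctrl.params["train"])
```

When participation depends on past outcomes, as it does here, the two train coefficients can differ sharply, which is why the controlled estimate is the more credible one.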
CONCLUSION

Multiple regression analysis is a statistical technique that uses several explanatory variables to predict the
outcome of a response variable. In multiple regression analysis, the parameters are estimated using the
method of ordinary least squares. The R-squared value measures how well the model fits the data: it is the
proportion of the variance of the dependent variable that is explained by the independent variables.
However, R-squared only works as intended in a simple linear regression model with one explanatory
variable; with a multiple regression made up of several independent variables, the R-squared must be
adjusted. Multiple regression analysis can be used to assess effect modification by estimating a regression
equation relating the outcome of interest to the independent variables. The multiple regression model rests
on the assumptions that there is a linear relationship between the dependent variable and the independent
variables, that the independent variables are not too highly correlated with each other, that the yi
observations are selected independently and randomly from the population, and that the residuals are
normally distributed with mean 0 and constant variance σ².
