
Home Work 1: Group Member Student Name ID Contribution

The document outlines a homework assignment involving contributions from group members and exercises related to regression analysis based on the Chris Brooks and Wooldridge texts. It covers concepts such as the population regression function, ordinary least squares estimators, hypothesis testing, and the implications of various assumptions in regression models. Additionally, it discusses the limitations of simple linear regression and the importance of accounting for confounding variables in establishing causal relationships.

HOME WORK 1

GROUP MEMBER

Student Name | ID | Contribution
Mai Anh Toàn | 31231025495 | 100%
Đậu Phương Uyên | 31231026102 | 100%
Phạm Minh Trí | 31221020391 | 100%
Phạm Trọng Tuyển | 31231021256 | 100%
Đặng Minh Tiến | 31231020840 | 100%
Đào Xuân Vũ | 31231020581 | 100%

EXERCISE OF CHAPTER 3 CHRIS BROOKS TEXT


1. (a)
Using vertical distances, instead of horizontal ones, aligns with the principle that the independent
variable X remains constant across repeated samples. The model's goal is to determine the most
appropriate value of Y for a given X based on the chosen model. In contrast, using horizontal
distances would suggest that Y is fixed, and the objective would shift to identifying the
corresponding values of X.
(b)
The vertical distances are squared before summing them to address the issue of cancellation
when calculating the deviations of points (yt ) from the fitted values (ŷt). Some points lie above
the line (yt > ŷt), while others fall below it (yt < ŷt), resulting in positive residuals for the former
and negative for the latter. If the residuals (yt - ŷt) were simply summed, their positive and
negative values would largely cancel each other out. This cancellation would allow infinitely
many lines to have an average residual of zero. Squaring the residuals ensures that every
deviation contributes positively to the loss measure, so no cancellation can occur. This method
provides the foundation for ordinary least squares, which determines the intercept and slope of
the best-fit line.
(c)
Another way to stop the positive and negative residuals from canceling each other out is to
minimize the sum of the absolute residuals. However, working with the absolute value function
is more complex than using squared terms. Squared terms are easier to differentiate, allowing for
simpler derivation of analytical formulas for the mean and variance.
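The cancellation argument in (b) and the alternative loss in (c) can be illustrated with a minimal sketch. The data points and the helper names residuals and ols_slope are hypothetical, chosen only for illustration. Every line forced through the point of sample means has residuals that sum to zero, whatever its slope, while the sum of squared residuals (SSR) separates the candidate lines.

```python
# Hypothetical data, not taken from the exercise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)


def residuals(slope):
    """Residuals from a line with the given slope passing through (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def ols_slope():
    """OLS slope: sum of cross-deviations divided by sum of squared x-deviations."""
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


for slope in (0.0, 1.0, ols_slope(), 3.0):
    res = residuals(slope)
    total = sum(res)                 # approximately zero for every slope
    ssr = sum(r ** 2 for r in res)   # smallest at the OLS slope
    sad = sum(abs(r) for r in res)   # sum of absolute residuals, as in part (c)
    print(f"slope={slope:.3f}  sum={total:+.2e}  SSR={ssr:.3f}  SAD={sad:.3f}")
```

The plain sum cannot rank the four lines, whereas SSR does, which is why the squared loss is used to pick the intercept and slope.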
2.
The population regression function (PRF) is a description of the model that is thought to be
generating the actual data and describes the true relationship between the dependent variable y
and the independent variable x in the entire population. It can be written as:
$y_t = \alpha + \beta x_t + u_t$
The PRF is theoretical because it represents the underlying "true" process (also referred to as the
data generating process - DGP) that produces the data. The disturbance term u reflects all factors
affecting y not included in the model. In some textbooks, a distinction is drawn between the PRF
and the DGP.
The sample regression function (SRF) is an estimated version of the PRF based on sample data.
It is expressed as:
$\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$
Unlike the PRF, the SRF does not include the disturbance term u, because it gives only the
fitted value of y (denoted ŷ) implied by the parameters estimated from the sample. It
predicts the average value of y for a given x. It is also possible to write
$y_t = \hat{\alpha} + \hat{\beta} x_t + \hat{u}_t$
which decomposes the observed value y into two components: the fitted value and the
residual term. The SRF is used to infer likely values of the PRF. Through sample data, the SRF
provides estimated values (α̂, β̂), which are then used to infer the true relationship in the
population.
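As a rough illustration of the SRF, the sketch below computes the OLS estimates from a small sample and splits each observation into a fitted value and a residual. The data and the helper name ols_fit are hypothetical; the formulas are the standard bivariate OLS expressions.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) for the bivariate regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical sample, used only to show the decomposition.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

alpha_hat, beta_hat = ols_fit(x, y)
fitted = [alpha_hat + beta_hat * xi for xi in x]   # y-hat from the SRF
resid = [yi - fi for yi, fi in zip(y, fitted)]     # u-hat, the residuals

for yi, fi, ri in zip(y, fitted, resid):
    print(f"y={yi:5.2f}  fitted={fi:6.3f}  residual={ri:+.3f}")
```

Each observed y equals its fitted value plus its residual, and the residuals sum to approximately zero when the regression includes an intercept.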
3.
An estimator is a mathematical formula used to calculate the coefficients that describe the
relationship between the dependent variable and one or more independent variables.
Ordinary least squares (OLS) is generally considered a dependable choice among the many
potential estimators. Under the classical assumptions, OLS has the lowest sampling variance
among all linear unbiased estimators, which is why it is regarded as the "best" estimator. It
would be possible to design an estimator with a lower sampling variance than OLS, but it
would be biased, non-linear, or both. Thus, the choice of estimator involves a trade-off
between variance and bias.

4.
1. E(u_t) = 0: the expected value of the error term u_t in the regression model is 0. In other
words, the mean of the errors across all observations is 0. This assumption implies that
the errors do not exhibit any systematic bias.
2. Var(u_t) = σ² < ∞: the variance of the error term u_t is constant and finite for all values
of the independent variable x_t (homoscedasticity). This is needed for OLS to be efficient
and for the usual standard error formulas to be valid; it does not affect unbiasedness.
3. cov(u_i, u_j) = 0 for i ≠ j: the errors u_i and u_j are not linearly correlated with
each other. In other words, the error at observation i is uncorrelated with the error at
observation j.
4. cov(u_t, x_t) = 0: there is no correlation between the error term u_t and the
corresponding independent variable x_t. This ensures that the errors in the model are
unrelated to the independent variable.
5. u_t ~ N(0, σ²): the error term u_t in the regression model follows a normal distribution with
a mean of 0 and constant variance σ².
The first four assumptions are needed to prove that the OLS estimators are
BLUE (Best Linear Unbiased Estimators), as stated by the Gauss-Markov Theorem. They ensure
that the estimators are unbiased and have the smallest variance among all linear unbiased
estimators. If they are violated, the OLS estimators may become biased or inefficient, leading to
unreliable results.
The fifth assumption states that the disturbance term ut follows a normal distribution. This
assumption is made to support statistical inference from the sample to the population. It allows
the application of statistical tests, as these rely on the normality of the disturbances. It ensures
that OLS estimates can be used to test hypotheses about the coefficients. If the disturbances do
not follow a normal distribution, statistical tests may become inaccurate or lose their
significance. Making this assumption implies that test statistics will follow a t-distribution
(provided that the other assumptions also hold).

5.
(3.39)
$y_t = \alpha + \beta x_t + u_t$
Because the model is linear in the parameters, it can be estimated using OLS.
(3.40)
$y_t = e^{\alpha} x_t^{\beta} e^{u_t} \iff \ln(y_t) = \alpha + \beta \ln(x_t) + u_t$
Let $Y_t = \ln(y_t)$ and $X_t = \ln(x_t)$:
→ $Y_t = \alpha + \beta X_t + u_t$
After taking logarithms the model is linear in the parameters, so it can be estimated using OLS.
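The log transformation used for (3.40) can be checked numerically with a small sketch. The parameter values, the data and the helper name ols_fit are assumptions made for illustration, and the disturbance is set to zero so that OLS on the logged variables recovers the parameters up to rounding error.

```python
import math


def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) for the bivariate regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat


# Assumed parameter values and regressor, for illustration only.
alpha, beta = 0.5, 1.5
x = [1.0, 2.0, 4.0, 8.0, 16.0]

# Multiplicative model y = e^alpha * x^beta with the disturbance set to zero.
y = [math.exp(alpha) * xi ** beta for xi in x]

# Taking logs of both sides gives a model that is linear in alpha and beta.
X = [math.log(xi) for xi in x]
Y = [math.log(yi) for yi in y]

alpha_hat, beta_hat = ols_fit(X, Y)
print(alpha_hat, beta_hat)
```

With a non-zero disturbance the estimates would differ from the true values by sampling error, but the regression would still be run on the logged variables in the same way.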
(3.41)
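$y_t = \alpha + \beta\gamma x_t + u_t$
Multiply the two coefficients together and let $B = \beta\gamma$:
→ $y_t = \alpha + B x_t + u_t$
Because the model is linear in the parameters α and B, it can be estimated using OLS. Only the
product βγ is obtained, however; β and γ cannot be recovered separately.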
(3.42)
$\ln(y_t) = \alpha + \beta \ln(x_t) + u_t$
Let $Y_t = \ln(y_t)$ and $X_t = \ln(x_t)$:
→ $Y_t = \alpha + \beta X_t + u_t$
Because the model is linear in the parameters, it can be estimated using OLS.
(3.43)
$y_t = \alpha + \beta x_t z_t + u_t$
Let $q_t = x_t z_t$, a new variable formed as the product of the two regressors:
→ $y_t = \alpha + \beta q_t + u_t$
Because the model is linear in the parameters, it can be estimated using OLS.
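Two points about (3.41) and (3.43) can be sketched in code. For (3.43), one workable approach is to treat the product x_t z_t as a single regressor q_t, since a coefficient must be a constant and cannot absorb z_t. For (3.41), only the product βγ is pinned down by the data. All numbers and the helper name ols_fit below are hypothetical, and the disturbance is set to zero to keep the check exact.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) for the bivariate regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat


x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [0.5, 1.5, 1.0, 2.0, 0.8]

# (3.43): regress y on the constructed variable q_t = x_t * z_t.
alpha, beta = 2.0, 0.75                       # assumed true values
q = [xi * zi for xi, zi in zip(x, z)]
y_343 = [alpha + beta * qi for qi in q]
alpha_hat, beta_hat = ols_fit(q, y_343)

# (3.41): two different (beta, gamma) pairs with the same product
# generate identical data, so OLS can only estimate B = beta * gamma.
y_a = [1.0 + 2.0 * 3.0 * xi for xi in x]      # beta=2.0, gamma=3.0
y_b = [1.0 + 1.5 * 4.0 * xi for xi in x]      # beta=1.5, gamma=4.0
_, B_hat = ols_fit(x, y_a)

print(alpha_hat, beta_hat, B_hat, y_a == y_b)
```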
6.
❖ The null hypothesis (no more risky than the market) → H0: β = 1
❖ A one-sided alternative hypothesis (more risky than the market) → H1: β > 1

The test statistic:

$\text{test stat} = \dfrac{\hat{\beta} - \beta^*}{SE(\hat{\beta})}$

β* is the value of β under the null hypothesis.

- T is the sample size, which equals 62 daily observations.
- The degrees of freedom are T - 2 = 62 - 2 = 60.
- Looking at the t-table (a one-sided test at the 5% level):
➔ critical t-value is 1.671
Since the test statistic of 2.682 is greater than 1.671 → reject the null hypothesis
→ the beta coefficient for the security is greater than one, which suggests that the security is
more risky, on average, than the market as a whole.
→ Therefore, the analyst's claim is not empirically verified.
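The decision rule above can be written out as a short sketch. The coefficient estimate and its standard error are not stated in this write-up, so the values of beta_hat and std_error below are assumed inputs, chosen because they reproduce the reported statistic of 2.682. The critical value 1.671 and the sample size of 62 come from the answer itself; the function name t_statistic is hypothetical.

```python
def t_statistic(beta_hat, beta_null, std_error):
    """Test statistic (beta_hat - beta_null) / SE(beta_hat)."""
    return (beta_hat - beta_null) / std_error


# Assumed inputs (not given in this document); see the note above.
beta_hat = 1.147
std_error = 0.0548

T = 62                    # number of daily observations
dof = T - 2               # degrees of freedom in a bivariate regression
critical_value = 1.671    # one-sided 5% value for 60 degrees of freedom

stat = t_statistic(beta_hat, 1.0, std_error)   # H0: beta = 1
reject_null = stat > critical_value            # H1: beta > 1, upper tail

print(f"t = {stat:.3f}, dof = {dof}, reject H0: {reject_null}")
```

Because the alternative is one-sided, only the upper tail matters: the null is rejected when the statistic exceeds the critical value.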

9.
In hypothesis testing for regression analysis, hypotheses are stated about the actual
(population) coefficients, not the estimated values.
Hypothesis testing in statistics is generally concerned with making inferences about
population parameters based on sample data. The estimated coefficients are derived from a
specific sample and are subject to sampling variability.
Hypothesis testing accounts for this variability using the sampling distribution of the
estimators, which allows researchers to infer properties about the actual coefficients in the population.
By testing the actual coefficients, the results can be generalized to the entire population.
The estimated coefficients are known exactly once they are calculated, so a test on them would
say nothing beyond the specific sample at hand.
Hypotheses are tested concerning the actual values of the coefficients because the goal is
to infer properties about the population, not just describe the specific sample at hand. The
estimated values serve as the basis for this inference but are not the direct focus of the hypothesis
test.
CHAPTER 2 WOOLDRIDGE TEXT 7E
Exercise 4
(i).
When cigs = 0, the predicted birth weight is 119.77 ounces.
When cigs = 20, the predicted birth weight is 109.49 ounces.
→ Compared with not smoking at all, the predicted birth weight drops by 10.28 ounces (about 8.6%)
when the mother smokes 20 cigarettes a day.
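The arithmetic in (i) can be reproduced with a few lines. The sketch uses the fitted equation that appears in part (iii), predicted birth weight = 119.77 - 0.514 cigs; the function name predicted_bwght is hypothetical.

```python
def predicted_bwght(cigs):
    """Predicted birth weight in ounces from the fitted equation in part (iii)."""
    return 119.77 - 0.514 * cigs


bw_0 = predicted_bwght(0)      # mother does not smoke
bw_20 = predicted_bwght(20)    # mother smokes 20 cigarettes a day

drop = bw_0 - bw_20                # difference in ounces
pct_drop = 100 * drop / bw_0       # difference relative to the non-smoking prediction

print(f"{bw_0:.2f} {bw_20:.2f} {drop:.2f} {pct_drop:.1f}%")
```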
(ii).
No, this regression does not necessarily capture a causal relationship. Smoking behavior and
birth weight may both be influenced by other factors that are not in the model, so the relationship
is correlational. To establish causation, a more thorough analysis that accounts for confounding variables is needed.
(iii).
When the predicted birth weight is 125 ounces, we have the equation
125 = 119.77 - 0.514 cigs
→ cigs ≈ -10.18
Comment: This result suggests that the mother would have to smoke approximately -10
cigarettes per day. This outcome is unrealistic because the number of cigarettes smoked cannot
be negative. Furthermore, a birth weight of 125 ounces is greater than 119.77 ounces, the
maximum predicted weight, which occurs at cigs = 0. Consequently, this model is unable to predict a birth
weight of 125 ounces. This demonstrates the drawbacks of attempting to forecast something as
complicated as birth weight using a simple linear regression model with just one
explanatory variable (cigarettes).
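Solving the fitted equation for cigs makes the problem in (iii) easy to see. This is a minimal sketch that inverts the equation 125 = 119.77 - 0.514 cigs for an arbitrary target; the function name cigs_for_bwght is hypothetical.

```python
INTERCEPT = 119.77   # predicted birth weight at cigs = 0, in ounces
SLOPE = -0.514       # change in predicted birth weight per cigarette per day


def cigs_for_bwght(target):
    """Number of cigarettes per day at which the fitted line predicts `target` ounces."""
    return (target - INTERCEPT) / SLOPE


cigs_needed = cigs_for_bwght(125)
feasible = cigs_needed >= 0   # a negative cigarette count is not possible

print(f"cigs = {cigs_needed:.2f}, feasible: {feasible}")
```

Any target above the intercept of 119.77 ounces gives a negative answer, because the fitted line only slopes downward from its value at cigs = 0.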
(iv). The proportion of women in the sample who did not smoke while pregnant is 0.85 (85%). Because
most of the sample has cigs = 0, and the model gives every non-smoker the same predicted birth
weight of 119.77 ounces, that prediction sits roughly in the middle of the observed birth weights
for non-smokers. The model therefore underpredicts high birth weights, which is consistent with
the finding in (iii) that it cannot produce a prediction of 125 ounces.

Exercise 5
(i). The intercept suggests that when income is zero, the predicted consumption is -$124.84. This
unrealistic prediction shows that the consumption model may not predict consumption
accurately at very low income levels. However, viewed on an annual basis, -$124.84 is
relatively close to zero.
(ii). With income = $30,000, the equation gives
Predicted consumption = -124.84 + 0.853(30,000) = $25,465.16
( iii ).
Exercise 6
(I). The coefficient on log(dist) is the elasticity of price with respect to dist: the percentage
change in housing price for a 1% change in the distance from the incinerator. Yes, the sign is
as expected: living nearer to an incinerator depresses housing prices, while living farther away
raises them.
(II). Simple regression fails to provide an unbiased estimate of the ceteris paribus elasticity of
price with respect to dist. The decision of the city to locate the incinerator farther from
higher-priced neighborhoods introduces a positive correlation between log(dist) and housing quality, an
omitted variable that influences housing prices. This violates the SLR.4 assumption, which
requires that the error term have zero conditional mean given the explanatory variable. As a result, the OLS
estimates are biased and fail to isolate the true ceteris paribus effect of dist on price.
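The bias described in (II) can be illustrated with a small simulation. Everything in the sketch is an assumption made for illustration: the parameter values, the way housing quality is generated, and the helper name simple_slope. It is not based on the data from the exercise. Quality is built to be positively correlated with log(dist) and is then left out of the regression.

```python
import random

random.seed(0)   # fixed seed so the run is reproducible

n = 5000
true_beta = 0.1        # assumed ceteris paribus elasticity of price w.r.t. dist
quality_effect = 0.5   # assumed effect of the omitted quality variable

log_dist, log_price = [], []
for _ in range(n):
    ld = random.gauss(0, 1)
    quality = 0.6 * ld + random.gauss(0, 1)   # positively correlated with log(dist)
    u = random.gauss(0, 0.5)
    log_dist.append(ld)
    log_price.append(9.0 + true_beta * ld + quality_effect * quality + u)


def simple_slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


slope = simple_slope(log_dist, log_price)   # quality is omitted here
print(f"true elasticity = {true_beta}, simple-regression slope = {slope:.3f}")
```

Under these assumptions the simple-regression slope settles near 0.1 + 0.5 × 0.6 = 0.4 rather than 0.1, because the omitted quality effect is attributed to distance.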
(III). House size, number of bathrooms, lot size, home age, and neighborhood quality (such as
school quality) are examples of factors that, as noted in (II), could be correlated with dist
and log(dist).
