Ecmt1010 Assignment

economics and statistics assessment

Uploaded by

joeymedxna

SID: 540737349

ECMT1010

1. Provide a clearly labelled scatter plot of height and weight in your 2004 sample.
Explain your choice of variables on the vertical axis and the horizontal axis of the
scatter plot. Comment on the scatter plot. [2 marks]

Vertical axis (y-axis): Weight in 2004. Weight is the dependent variable because it is the
outcome we expect to be influenced by height.

Horizontal axis (x-axis): Height. Height is the independent (explanatory) variable because
differences in height are expected to influence weight, not the reverse.

The plot suggests a positive relationship between height and weight, indicating that, in
general, taller individuals tend to weigh more. However, there seems to be some variability,
as not all taller individuals are necessarily heavier.
2. Estimate the simple regression equation for weight in 2004 and height. Test for the
statistical significance of the relationship. List your notation, the null and alternative
hypotheses, the test statistic, decision rule, and conclusion to the test. [2 marks]

The regression equation is: $\text{weight}_{2004} = -189.883 + 5.245 \times \text{height}_{\text{inches}}$

Slope: β1 = 5.245.

Intercept: β0 = −189.883.

Null and Alternative Hypotheses:

• 𝐻0 : 𝛽1 = 0 (There is no linear relationship between height and weight.)


• 𝐻𝑎 : 𝛽1 ≠ 0 (There is a linear relationship between height and weight.)

Test statistic:

Test statistic for β1: $t = \dfrac{\hat{\beta}_1}{SE(\hat{\beta}_1)}$

$\hat{\beta}_1$ = slope estimate (5.245)

$SE(\hat{\beta}_1)$ = standard error of the slope (0.420)

Decision Rule

For this two-tailed test:

If $|t| > t_{\alpha/2,\,n-2}$, we reject the null hypothesis, $H_0$.

The degrees of freedom are n − 2 = 398, where n = 400 is the sample size. The critical value
at the 5% level of significance can be found in the t-distribution table; with this many
degrees of freedom it is approximately 1.96.

Since the calculated t-value (12.499) is much larger than 1.96, we reject the null
hypothesis, $H_0$. The relationship between height and weight is statistically significant.

Conclusion: The slope estimate of 5.245 is statistically significant at the 5% level,
indicating a positive linear relationship between height and weight in 2004: taller
individuals tend to weigh more.
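
The test above can be reproduced from the two reported numbers. The following is a minimal sketch, assuming the normal quantile is an acceptable stand-in for the t critical value at 398 degrees of freedom (the answer above makes the same approximation); the variable names are illustrative. Because it divides the rounded slope by the rounded standard error, it gives roughly 12.49 and not the 12.499 reported from the unrounded output.

```python
from statistics import NormalDist

# Values reported in the regression output above.
slope = 5.245      # estimated slope (pounds per inch)
se_slope = 0.420   # standard error of the slope
n = 400            # sample size
alpha = 0.05       # significance level

# Test statistic for H0: beta1 = 0.
t_stat = slope / se_slope

# Assumption: with n - 2 = 398 degrees of freedom the t distribution is
# close to the standard normal, so the normal quantile is used here.
critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_null = abs(t_stat) > critical
print(f"t = {t_stat:.3f}, critical = {critical:.3f}, reject H0: {reject_null}")
```
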
3. Give an interpretation of the intercept and slope in the regression equation. [2 marks]

Regression Equation:
$\text{weight}_{2004} = -189.883 + 5.245 \times \text{height}_{\text{inches}}$

Intercept: −189.883. Taken literally, a person with a height of 0 inches has a predicted
weight of −189.883 pounds. This has no meaningful interpretation in this context, because a
height of 0 inches is impossible and lies far outside the range of the data.

Slope: 5.245. Each additional inch of height is associated with an increase in 2004 weight
of 5.245 pounds, on average.
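
To make the interpretation concrete, the sketch below evaluates the fitted line at two heights one inch apart. The heights of 66 and 67 inches and the function name are hypothetical choices for illustration only.

```python
def predicted_weight_lb(height_in):
    """Fitted 2004 weight in pounds for a height given in inches."""
    return -189.883 + 5.245 * height_in

# Hypothetical heights, chosen only to illustrate the slope.
w66 = predicted_weight_lb(66)
w67 = predicted_weight_lb(67)

# The gap between the two predictions equals the slope.
print(f"66 in: {w66:.3f} lb, 67 in: {w67:.3f} lb, difference: {w67 - w66:.3f} lb")

# At a height of 0 the prediction is just the intercept, which is why the
# intercept has no sensible real-world reading here.
print(f"0 in: {predicted_weight_lb(0):.3f} lb")
```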

4. Convert the height and 2004 weight data to metric using the conversions 1 pound =
0.45359237 kilograms and 1 inch = 2.54 centimetres. Estimate the regression equation
using the metric data. What do you notice when you compare the R 2 for the metric
data and non-metric data equations? How do you explain your finding? [2 marks]

Regression Equation for Metric Data:

$\text{weight}_{2004,\text{kg}} = \beta_0 + \beta_1 \times \text{height}_{\text{cm}}$

Because both variables are only rescaled, the metric estimates follow from the non-metric
ones: the intercept is −189.883 × 0.45359237 ≈ −86.13 and the slope is
5.245 × 0.45359237 / 2.54 ≈ 0.937, giving

$\text{weight}_{2004,\text{kg}} = -86.13 + 0.937 \times \text{height}_{\text{cm}}$

Compare R2 Values:

Metric data R2 = 0.2819

Non-metric data R2 = 0.2819

The R2 is identical for the metric and non-metric regressions. R2 measures the proportion of
the variation in weight that is explained by height, and a proportion does not depend on the
units of measurement. Converting pounds to kilograms and inches to centimetres multiplies
every observation by a constant, so the explained and total variation in weight are scaled by
the same factor and their ratio is unchanged. Only the intercept and slope change, because
they are expressed in the units of the data.
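
As a check on how R2 responds to a change of units, here is a small sketch that fits the same regression in pounds and inches and then in kilograms and centimetres. The data are made up for illustration and are not the assignment sample; `fit_ols` is a hypothetical helper.

```python
def fit_ols(x, y):
    """Return (intercept, slope, r_squared) for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Made-up illustrative data, NOT the assignment sample.
height_in = [60, 62, 64, 66, 68, 70, 72, 74]
weight_lb = [120, 150, 135, 160, 155, 190, 170, 205]

# Apply the conversions given in the question.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

b0_imp, b1_imp, r2_imp = fit_ols(height_in, weight_lb)
b0_met, b1_met, r2_met = fit_ols(height_cm, weight_kg)

print(f"non-metric: slope = {b1_imp:.4f}, R2 = {r2_imp:.4f}")
print(f"metric:     slope = {b1_met:.4f}, R2 = {r2_met:.4f}")
```

On these made-up numbers `r2_imp` and `r2_met` agree to floating-point precision, while the slope and intercept rescale by the conversion factors.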
5. You will see that the slope estimates from the metric data and non-metric data
regression equations are quite different. Show how the slope estimate from the metric
data can be derived exactly from the slope estimate from the non-metric data. [2 marks]

Regression model in non-metric units:

$\text{weight}_{\text{lbs}} = -189.88 + 5.24496 \times \text{height}_{\text{inches}}$

Convert height (inches to centimetres):

1 inch = 2.54 cm.

Convert weight (pounds to kilograms):

1 pound = 0.45359237 kg.

Derive the metric slope from the non-metric slope:

$\hat{\beta}_1^{\text{non-metric}} = 5.24496$ pounds per inch

Convert to kilograms per centimetre:

$\hat{\beta}_1^{\text{metric}} = 5.24496 \times \dfrac{0.45359237}{2.54} \approx 5.24496 \times 0.17858 \approx 0.9366$ kg per cm

The slope in metric units can therefore be derived exactly from the slope in non-metric units
by multiplying it by 0.45359237 / 2.54 ≈ 0.17858: the numerator converts pounds to kilograms
and the denominator converts inches to centimetres. The conversion factors are constants, so
any remaining difference between this value and the slope reported by regression software on
the metric data can only come from rounding.
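
The same conversion can be carried at full precision. A minimal sketch, using only the constants and estimates quoted above; the variable names are illustrative, and the intercept conversion is included as an additional step that the question does not ask for.

```python
LB_TO_KG = 0.45359237   # kilograms per pound (given in the question)
IN_TO_CM = 2.54         # centimetres per inch (given in the question)

slope_lb_per_in = 5.24496   # non-metric slope estimate
intercept_lb = -189.883     # non-metric intercept estimate

# Pounds per inch -> kilograms per centimetre.
factor = LB_TO_KG / IN_TO_CM
slope_kg_per_cm = slope_lb_per_in * factor

# The intercept is a weight, so only the weight conversion applies to it.
intercept_kg = intercept_lb * LB_TO_KG

print(f"factor = {factor:.5f}")
print(f"metric slope = {slope_kg_per_cm:.4f} kg per cm")
print(f"metric intercept = {intercept_kg:.2f} kg")
```
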
6. Estimate the simple regression equation for weight in 2011 and height. Explain why
the R2 for 2011 weight is lower than in the regression reported in Question 2. [2 marks]

The estimated regression equation for predicting weight in 2011 from height is:

$\text{weight}_{2011} = -201.440 + 5.588 \times \text{height}_{\text{inches}}$

Intercept (−201.440): For a person with a height of 0 inches, the predicted weight in 2011
would be −201.44 pounds, which has no meaningful interpretation.

Slope (5.588): For every additional inch in height, the predicted weight in 2011 increases by
approximately 5.588 pounds.

R2 (0.242): About 24.2% of the variation in weight in 2011 is explained by height.

This R2 is lower than the R2 of 0.2819 from the regression for weight in 2004 (Question 2),
meaning that height explains less of the variation in weight in 2011 than in 2004. The same
height measurements are used in both regressions, while weight changed between 2004 and
2011, so this suggests that weight gains or losses over the period were driven by factors
other than height.
7. Use the bootstrap to construct a 99% confidence interval for the population height
difference between females and males. Show your bootstrap distribution along with the
lower and upper bounds of the confidence interval. What does the confidence interval
imply about the null hypothesis that there is no difference in the population between
average female and male height? [2 marks]

The bootstrap analysis estimates a 99% confidence interval for the difference in average
height between males and females (male minus female). The interval is approximately
(4.84, 6.39) inches.

This means we can be 99% confident that the true difference in average height between males
and females falls within this range. Since the whole interval lies above zero, it implies
that average male height is significantly greater than average female height.

Implication for the Null Hypothesis

The null hypothesis ($H_0$) is that there is no difference between the population mean
heights of males and females (a difference of 0). Because the 99% confidence interval does
not contain zero, we reject the null hypothesis at the 1% significance level. This suggests
that there is a significant difference in average height between males and females in the
population.
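
A percentile bootstrap of this kind can be sketched as follows. This is an illustration under assumptions, not the procedure actually used for the interval above: the heights are simulated, the number of resamples and the seeds are arbitrary, and `bootstrap_diff_ci` is a hypothetical helper.

```python
import random

def bootstrap_diff_ci(group_a, group_b, level=0.99, reps=5000, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(sum(resample_a) / len(resample_a)
                     - sum(resample_b) / len(resample_b))
    diffs.sort()
    # Cut off (1 - level) / 2 of the bootstrap differences in each tail.
    tail = (1 - level) / 2
    lower = diffs[int(tail * reps)]
    upper = diffs[int((1 - tail) * reps) - 1]
    return lower, upper

# Simulated heights in inches, NOT the assignment sample.
data_rng = random.Random(1)
male_heights = [data_rng.gauss(70, 3) for _ in range(200)]
female_heights = [data_rng.gauss(64.5, 3) for _ in range(200)]

lower, upper = bootstrap_diff_ci(male_heights, female_heights)

# If zero lies outside the interval, "no difference" is rejected at 1%.
reject_no_difference = not (lower <= 0 <= upper)
print(f"99% CI: ({lower:.2f}, {upper:.2f}), reject H0: {reject_no_difference}")
```
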
8. Test the hypothesis that, on average, males have gained more weight than females
between 2004 and 2011. List your notation, the null and alternative hypotheses, the test
statistic, decision rule, and conclusion to the test. [2 marks]

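Notation: $\mu_{\text{male}}$ and $\mu_{\text{female}}$ are the population mean weight gains
between 2004 and 2011 for males and females.

Null Hypothesis (H0): There is no difference in the average weight gain between males and
females.

$H_0: \mu_{\text{male}} = \mu_{\text{female}}$

Alternative Hypothesis (Ha): Males have gained more weight on average than females.

$H_a: \mu_{\text{male}} > \mu_{\text{female}}$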

Calculate the Weight Change

The weight change for each individual is defined as:

Weight Change = Weight2011 − Weight2004

- $\bar{X}_{\text{male}}$ = 12.716 (mean weight gain for males)
- $\bar{X}_{\text{female}}$ = 10.704 (mean weight gain for females)
- $s_{\text{male}}$ = 21.501 (sample standard deviation of weight gain for males)
- $s_{\text{female}}$ = 22.388 (sample standard deviation of weight gain for females)
- $n_{\text{male}}$ = 197 (number of males)
- $n_{\text{female}}$ = 203 (number of females)

Perform the Two-Sample T-Test

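To compare the mean weight changes for males and females, we use an independent two-sample
t-test. The test statistic is calculated as follows:

$t = \dfrac{\bar{X}_{\text{male}} - \bar{X}_{\text{female}}}{\sqrt{\dfrac{s_{\text{male}}^2}{n_{\text{male}}} + \dfrac{s_{\text{female}}^2}{n_{\text{female}}}}}$

Now substitute the values into the equation:

$t = \dfrac{12.716 - 10.704}{\sqrt{\dfrac{21.501^2}{197} + \dfrac{22.388^2}{203}}} = 0.917$

The two-sided p-value associated with this statistic is 0.36, so the p-value for the
one-sided alternative is about 0.18.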

Decision rule:

At a significance level of α = 0.05:

• If p-value < α, we reject the null hypothesis.


• If p-value > α, we fail to reject the null hypothesis.
Since the p-value is greater than 0.05, we fail to reject the Null hypothesis. This means we do
not have enough evidence to conclude that males have gained more weight than females.

Conclusion:

Based on the results of the t-test, we conclude that there is no statistically significant evidence
that males gained more weight than females on average between 2004 and 2011.
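
The test statistic can be recomputed from the summary statistics. A minimal sketch, assuming the male standard deviation of 21.501 used in the substitution above and using the standard normal distribution as a large-sample approximation to the t distribution; variable names are illustrative. Under that approximation the two-sided p-value comes out close to the reported 0.36, and the one-sided p-value that matches the alternative hypothesis is half of that.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics of weight gain (pounds), as listed above.
mean_m, sd_m, n_m = 12.716, 21.501, 197   # males
mean_f, sd_f, n_f = 10.704, 22.388, 203   # females

# Standard error of the difference in means (unequal variances).
se_diff = sqrt(sd_m ** 2 / n_m + sd_f ** 2 / n_f)
t_stat = (mean_m - mean_f) / se_diff

# Assumption: with about 400 observations the normal distribution is used
# in place of the t distribution to approximate the p-values.
p_one_sided = 1 - NormalDist().cdf(t_stat)   # for Ha: mu_male > mu_female
p_two_sided = 2 * p_one_sided

alpha = 0.05
reject_null = p_one_sided < alpha
print(f"t = {t_stat:.3f}, one-sided p = {p_one_sided:.3f}, "
      f"two-sided p = {p_two_sided:.3f}, reject H0: {reject_null}")
```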

9. According to Tucker & Parker (2022) Journal of Obesity, 16% of American adults
gained more than 2% of their body weight per year over a 10-year period. To test
support for this hypothesis over the 7-year period in your sample, construct a new
variable called GAIN14PCT in your data set such that GAIN14PCT = 1 if an individual
gained more than 14% of their initial weight between 2004 and 2011, and GAIN14PCT
= 0 otherwise. What is the interpretation, in plain English, of the mean of GAIN14PCT?
[2 marks]

The mean of GAIN14PCT is 0.215, or 21.5%. Because GAIN14PCT equals 1 for individuals who
gained more than 14% of their initial weight and 0 otherwise, its mean is the proportion of
individuals in the sample who gained more than 14% of their body weight between 2004 and
2011. This is higher than the 16% reported by Tucker and Parker over a ten-year period, which
suggests that a greater proportion of this sample experienced substantial weight gain over
the 7-year period.
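
The construction of the indicator can be sketched in a few lines. The weight pairs below are made up for illustration and are not the assignment data; the variable names are hypothetical.

```python
# Made-up (weight in 2004, weight in 2011) pairs in pounds,
# NOT the assignment data.
weights = [(150, 160), (180, 210), (200, 195), (140, 165), (170, 172)]

# GAIN14PCT = 1 if the gain exceeds 14% of the initial (2004) weight.
gain14pct = [1 if (w11 - w04) / w04 > 0.14 else 0 for w04, w11 in weights]

# The mean of a 0/1 variable is the share of observations equal to 1.
share_gaining = sum(gain14pct) / len(gain14pct)

print(gain14pct)
print(f"share gaining more than 14%: {share_gaining:.2f}")
```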

10. Test support for this hypothesis over the 7-year period in your sample. List your
notation, the null and alternative hypotheses, the test statistic, decision rule, and
conclusion to the test. [2 marks]

Notation: p is the population proportion of individuals who gained more than 14% of their
initial body weight over the 7-year period.

Null Hypothesis ($H_0$): The true proportion of individuals who gained more than 14% of
their body weight is 16%.

$H_0: p = 0.16$

Alternative Hypothesis ($H_a$): The true proportion of individuals who gained more than 14%
of their body weight is different from 16%.

$H_a: p \neq 0.16$

From Question 9, we know that 21.5% of individuals in the sample gained more than 14% of
their initial weight. This gives us the sample proportion:

𝑝̂ = 0.215
Sample size: n = 400

Test Statistic

To perform a hypothesis test for proportions, we use the z-test for proportions. The formula
for the test statistic is:

$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$

Where:

• $\hat{p}$ is the sample proportion (0.215).
• $p_0$ is the hypothesized proportion (0.16).
• n is the sample size (400).

Calculate the standard error of the proportion:

$SE = \sqrt{\dfrac{p_0(1 - p_0)}{n}} = \sqrt{\dfrac{0.16 \times (1 - 0.16)}{400}} = 0.0183$

Calculate the z-statistic:

$z = \dfrac{0.215 - 0.16}{0.0183} = 3.005$

Decision Rule

We conduct a two-tailed test at the 5% significance level, α = 0.05. The critical values for
this test are ±1.96.

If ∣z∣ > 1.96, we reject the null hypothesis.

If ∣z∣ ≤ 1.96, we fail to reject the null hypothesis.

Conclusion

As the calculated z-value of 3.005 is greater than 1.96, we reject the null hypothesis
($H_0$). There is sufficient evidence to conclude that the proportion of individuals who
gained more than 14% of their initial body weight over the 7-year period is different from
16%. Tucker & Parker's figure therefore does not appear to hold for the population this
sample represents: the sample proportion of 21.5% is statistically different from the 16%
that they reported.
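
The calculation can be checked with a short script. A minimal sketch using the figures quoted above; variable names are illustrative. Keeping the standard error unrounded gives z of about 3.00, whereas the 3.005 above results from dividing by the rounded value 0.0183; the decision is the same either way.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.215   # sample proportion with GAIN14PCT = 1
p0 = 0.16       # hypothesized proportion under H0
n = 400         # sample size
alpha = 0.05    # significance level

# Standard error under the null hypothesis.
se = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# Two-tailed critical value and p-value from the standard normal.
critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = abs(z) > critical
print(f"SE = {se:.4f}, z = {z:.3f}, p-value = {p_value:.4f}, "
      f"reject H0: {reject_null}")
```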
