
Math2010 - Statistical Methods I: Stefanie Biedermann

1) The document discusses using simple linear regression to model the relationship between body fat percentage and skinfold measurements in athletes. 2) It finds that transforming both variables to logs stabilizes the variability and produces a good linear fit. 3) The regression analysis results in estimates of the intercept and slope, and confidence intervals show skinfold is a useful predictor of body fat percentage.


Math2010 - Statistical Methods I

Chapter 2 - Simple Linear Regression


(Notes 5)
Stefanie Biedermann

2.16 Bodyfat Example

Knowledge of the fat content of the human body is


physiologically and medically important. The fat content may
influence susceptibility to disease, the outcome of disease, the
effectiveness of drugs (especially anaesthetics) and the ability to
withstand adverse conditions including exposure to cold and
starvation.

In practice, fat content is difficult to measure directly. One way is
by measuring body density, which requires subjects to be
weighed underwater! For this reason, it is useful to try to relate
simpler measures such as skinfold thicknesses (which are
readily measured using calipers) to body fat content and then
use these to estimate the body fat content.

Dr R. Telford collected skinfold (the sum of four skinfold


measurements) and percent body fat measurements on 102 elite
male athletes training at the Australian Institute of Sport.

The scatterplot of the data shows that the percent body fat
increases linearly with the skinfold measurement.

From the plot we might also be concerned that the variation in
percent body fat seems to increase with the skinfold
measurement.

To take a closer look we can fit a simple linear regression model
and examine the diagnostic plots as well.

From the scatter plot there do not seem to be any unusual data
points and there are no other obvious patterns to note.

Figure: Scatterplot of bodyfat against skinfold (vertical axis: Fat; horizontal axis: Skinfold)

We can see in the residual plot that there is a fan or funnel shape
evident, giving more evidence for our concern about heteroscedasticity.

The normal probability plot of the residuals also seems to
indicate that the normality assumption is not appropriate in the
tails of the distribution.

Figure: Anscombe plot (Residuals against Fitted values) and normal probability
plot (Sample Quantiles against Theoretical Quantiles) from a linear regression of
bodyfat on skinfold

It is often the case that we can simplify the relationships


exhibited by data by transforming one or more of the variables.

In the example, we want to preserve the simple linear


relationship for the conditional mean but stabilise the variability
(i.e. make it constant) so that it is easier to describe. We
therefore try transforming both variables.

There is no theoretical argument to suggest a transformation


here but empirical experience suggests that we try the log
transformation.

The scatterplot on the log scale shows that the log


transformation of both variables preserves the linear conditional
location relationship and stabilises the conditional variability.

After examining the diagnostics, we conclude that the linear


regression model provides a good approximation to the
underlying population.

Figure: Scatterplot of log Fat against log Skinfold (top left), Anscombe plot of
Residuals against Fitted values (top right) and normal probability plot of
Sample Quantiles against Theoretical Quantiles (bottom) on the transformed scale

Let $Y_i$ denote the log percent body fat measurement and $X_i$ the
log skinfold measurement on athlete $i$, where $i = 1, \ldots, 102$.
Then we have
$E(Y_i \mid x_i) = \beta_0 + \beta_1 x_i$ and $\mathrm{Var}(Y_i \mid x_i) = \sigma^2$.

Fitting the model to the data results in estimates of the
parameters so that we can write:
$\hat{Y} = -1.25 + 0.88x$
      (0.097)  (0.025)
where the standard errors of the intercept and slope are the
numbers in parentheses.

We also obtain a variance estimate $S^2 = 0.0067$ on 100 degrees
of freedom.
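The fitting step can be sketched in a few lines of Python. This is a minimal illustration of the standard least-squares formulas, not the software used to produce the results in these notes. The function name `fit_simple_linear_regression` and the toy values are hypothetical; the toy values are NOT the athletes' data.

```python
import math

def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x.

    Returns the estimates, the variance estimate S^2 on n - 2 degrees
    of freedom, and the standard errors of the intercept and slope.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                  # residual mean square
    return {
        "b0": b0,
        "b1": b1,
        "s2": s2,
        "df": n - 2,
        "se_b0": math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx)),
        "se_b1": math.sqrt(s2 / sxx),
    }

# Hypothetical toy values on a log scale, for illustration only.
toy_log_skinfold = [3.5, 3.8, 4.0, 4.2, 4.5]
toy_log_fat = [1.9, 2.1, 2.3, 2.4, 2.8]
print(fit_simple_linear_regression(toy_log_skinfold, toy_log_fat))
```

With the 102 observations in place of the toy values, the returned `df` would be 100, matching the degrees of freedom quoted above.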

Parameter Estimates
Term        Estimate   Std Error   t Ratio   Prob>|t|
Intercept     -1.250       0.097   -12.940      0.000
logskin        0.882       0.024    35.590      0.000

Analysis of Variance
Source       DF       SS       MS    F Ratio   Prob>F
logskin       1   8.5240   8.5240     1266.3   0.0000
Residuals   100   0.6731   0.0067
Total       101   9.1971
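The entries of the analysis of variance table are linked by simple arithmetic, which the short sketch below reproduces from the reported sums of squares. The variable names are illustrative, and the coefficient of determination it prints is derived here rather than quoted from the notes.

```python
# Sums of squares and degrees of freedom as reported in the ANOVA table.
ss_regression = 8.5240
ss_residual = 0.6731
df_regression = 1
df_residual = 100

ss_total = ss_regression + ss_residual          # total sum of squares
ms_regression = ss_regression / df_regression   # regression mean square
ms_residual = ss_residual / df_residual         # residual mean square, i.e. S^2
f_ratio = ms_regression / ms_residual           # F statistic on (1, 100) df
r_squared = ss_regression / ss_total            # proportion of variation explained

print(f"SS total    = {ss_total:.4f}")
print(f"MS residual = {ms_residual:.4f}")
print(f"F ratio     = {f_ratio:.1f}")
print(f"R squared   = {r_squared:.3f}")
```

The F ratio computed from the rounded table entries differs from the reported 1266.3 only in the last digit, which is what rounding of the sums of squares would produce.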

A 95% confidence interval for the population slope parameter $\beta_1$ is
(0.83, 0.93).
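This interval has the usual form estimate ± t × standard error. The sketch below reproduces it, assuming the 97.5% quantile of the t distribution on 100 degrees of freedom is 1.984 (a value from standard tables, hard-coded here) and recovering an unrounded standard error from the reported slope estimate and t ratio. The helper name `confidence_interval` is hypothetical.

```python
T_CRIT_975_DF100 = 1.984  # assumed: 97.5% quantile of t on 100 df, from tables

def confidence_interval(estimate, std_error, t_crit):
    """Two-sided interval: estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

slope_estimate = 0.882
slope_t_ratio = 35.59
# The t ratio is estimate / standard error, so the standard error is
# recovered (to more digits than printed) by dividing the two.
slope_std_error = slope_estimate / slope_t_ratio

lower, upper = confidence_interval(slope_estimate, slope_std_error, T_CRIT_975_DF100)
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```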

A common test of interest is the test of the hypothesis that
$\beta_1 = 0$, i.e. that skinfold is not useful for predicting bodyfat.

The 95% confidence interval for $\beta_1$ does not contain zero so we
can conclude that there is evidence against the hypothesis and
that skinfold is a useful variable in the model for predicting body
fat.

A formal hypothesis test can be carried out by computing the
t-ratio 35.59 which has a p-value of zero (to 4 decimal places).
We conclude that $\beta_1$ is significantly different from zero.
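The t-ratio is the estimate minus its hypothesised value, divided by the standard error. The sketch below computes it from the rounded values 0.882 and 0.025, so the result is close to, but not exactly, the reported 35.59, which would have been computed from unrounded values. The critical value 1.984 is assumed from standard t tables for 100 degrees of freedom, and `t_statistic` is a hypothetical helper name.

```python
T_CRIT_975_DF100 = 1.984  # assumed: 97.5% quantile of t on 100 df, from tables

def t_statistic(estimate, std_error, null_value=0.0):
    """t ratio for testing H0: parameter == null_value."""
    return (estimate - null_value) / std_error

# Rounded values as quoted in the fitted equation.
t_slope = t_statistic(estimate=0.882, std_error=0.025)
reject_at_5_percent = abs(t_slope) > T_CRIT_975_DF100

print(f"t ratio (from rounded inputs) = {t_slope:.2f}")
print(f"reject H0 at the 5% level: {reject_at_5_percent}")
```

The `null_value` argument lets the same function be used for hypotheses other than zero.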

A test of the hypothesis $\beta_1 = 1$ is also of interest. This is
because:

A 95% confidence interval for the mean log body fat percentage
of all individuals with a skinfold of 70 is (2.474, 2.522).

A 95% prediction interval for the log body fat percentage of a


single male athlete with a skinfold of 70 is (2.334, 2.663).

Although in our body fat example, we have worked on the simpler


log scale, it may be useful to make predictions on the raw scale.
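One way to move to the raw scale is to exponentiate the interval endpoints, as in the sketch below. It assumes the logs are natural logs, which is consistent with the axis ranges of the plots. Because the exponential is increasing, the prediction interval transforms directly. The back-transformed confidence interval, however, describes the median (geometric mean) percent body fat rather than the arithmetic mean, assuming log body fat is roughly symmetric about its mean.

```python
import math

# Intervals on the log scale at a skinfold of 70, as reported in the notes.
ci_mean_log_fat = (2.474, 2.522)
pi_single_log_fat = (2.334, 2.663)

def back_transform(interval):
    """Exponentiate both endpoints (assumes natural logs were used)."""
    return tuple(math.exp(endpoint) for endpoint in interval)

ci_raw = back_transform(ci_mean_log_fat)     # interval for median percent body fat
pi_raw = back_transform(pi_single_log_fat)   # interval for one athlete's percent body fat

print(f"Back-transformed 95% CI: ({ci_raw[0]:.1f}, {ci_raw[1]:.1f}) percent")
print(f"Back-transformed 95% PI: ({pi_raw[0]:.1f}, {pi_raw[1]:.1f}) percent")
```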

We can also plot confidence bands across the range of the
skinfold measurements taken for the mean log body fat
percentage at any given skinfold measurement.

We can also plot prediction bands across the range of the
skinfold measurements taken for the log body fat percentage of a
single male athlete at any given skinfold measurement.

The bands are widest at the extremes and narrowest at the
centre.

Also note that the prediction band is always wider than the
confidence band.
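Both properties follow from the standard half-width formulas: the confidence band uses $S\sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$ and the prediction band adds 1 under the square root. The sketch below evaluates them on a small grid. The sample mean of log skinfold is not given in these notes, so `X_BAR = 4.0` is a HYPOTHETICAL placeholder. `SXX` is backed out from the identity regression SS = slope² × Sxx using the rounded reported values, and the critical value 1.984 is assumed from standard t tables.

```python
import math

N = 102                      # number of athletes
S2 = 0.0067                  # reported variance estimate
T_CRIT = 1.984               # assumed: 97.5% quantile of t on 100 df
SXX = 8.5240 / 0.882 ** 2    # derived: regression SS = slope^2 * Sxx
X_BAR = 4.0                  # HYPOTHETICAL mean of log skinfold (not in the notes)

def ci_half_width(x0):
    """Half-width of the 95% confidence band for the mean response at x0."""
    return T_CRIT * math.sqrt(S2 * (1 / N + (x0 - X_BAR) ** 2 / SXX))

def pi_half_width(x0):
    """Half-width of the 95% prediction band for a single response at x0."""
    return T_CRIT * math.sqrt(S2 * (1 + 1 / N + (x0 - X_BAR) ** 2 / SXX))

grid = [3.4, 3.7, 4.0, 4.3, 4.6]   # spans the plotted range of log skinfold
for x0 in grid:
    print(f"x0 = {x0:.1f}  CI half-width = {ci_half_width(x0):.3f}  "
          f"PI half-width = {pi_half_width(x0):.3f}")
```

Whatever the true value of the mean, the term $(x_0 - \bar{x})^2$ is zero there and grows on either side, which is why both bands flare towards the extremes.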

Figure: Scatterplot of log bodyfat against log skinfold with fitted regression
line, 95% confidence bands for the mean and 95% prediction bands for a single
observation

The validity of the inferences depends on the representativeness


of the sample and the validity of the model.

It is vital when collecting data to ensure that it is representative of


the population of interest and before making inferences to ensure
that the model is appropriate.

Strictly speaking, model assessment is a form of informal


inference and has an impact on other inferences but it is not
sensible to simply hope that a model is valid when empirical
verification is available.
