
Examples Correlation and Regression

The document provides examples of regression analysis and hypothesis testing. It examines the correlation between different variables like assets and profits, exercise equipment usage over time, city population and median age. It calculates correlation coefficients and tests hypotheses about slopes of regression lines to determine if correlations exist between variables.

Uploaded by

Minahil Fatima
Copyright
© All Rights Reserved

Examples: Regression Analysis and Hypothesis Testing


Significance of Correlation Coefficient
Example I
• A study of 20 worldwide financial institutions showed the correlation
between their assets and pretax profit to be .86. At the .05 significance
level, can we conclude that there is positive correlation in the
population?
• Solution:
• 𝐻0 : 𝜌 ≤ 0, 𝐻1 : 𝜌 > 0.
• Reject 𝐻0 if 𝑡 > 1.734.
• 𝑑𝑓 = 18.
• $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \dfrac{0.86\sqrt{20-2}}{\sqrt{1-(0.86)^2}} = 7.150$.
• Reject 𝐻0 . There is a positive correlation between assets and profit.
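The calculation above can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `correlation_t` is made up for illustration, and the critical value is typed in from the t table (as quoted in the solution) rather than computed.

```python
import math

def correlation_t(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Example I: r = 0.86 from a sample of n = 20 institutions.
t = correlation_t(0.86, 20)
critical = 1.734  # one-tailed, alpha = .05, df = 18 (table value quoted above)
reject_h0 = t > critical

print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

The same function applies to any of the correlation tests in these examples; only r, n, the critical value, and the direction of the comparison change.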
Example II

• A manufacturer of exercise equipment wants to study the
relationship between the number of months since the
equipment was purchased and the time, in hours, the
equipment was used last week.
a. Plot the information on a scatter diagram. Let hours of
exercise be the dependent variable. Comment on the graph.
b. Determine the correlation coefficient. Interpret.
c. At the .01 significance level, can we conclude that there is a
negative association between the variables?
Solution a. Plot the information on a scatter diagram. Let hours of
exercise be the dependent variable. Comment on the graph.
[Scatter diagram "Hours Exercised": Hours (vertical axis) against Months (horizontal axis), with fitted line y = -0.6368x + 9.9393]

• There is an inverse relationship between the variables. As the
months owned increase, the number of hours exercised decreases.
Solution b. Determine the correlation coefficient. Interpret.

n     x (Months Owned)   y (Hours Exercised)   x − x̄    y − ȳ    (x − x̄)(y − ȳ)
1     12                 4                      5.5      -1.8     -9.9
2     2                  10                     -4.5     4.2      -18.9
3     6                  8                      -0.5     2.2      -1.1
4     9                  5                      2.5      -0.8     -2.0
5     7                  5                      0.5      -0.8     -0.4
6     2                  8                      -4.5     2.2      -9.9
7     8                  3                      1.5      -2.8     -4.2
8     4                  8                      -2.5     2.2      -5.5
9     10                 2                      3.5      -3.8     -13.3
10    5                  5                      -1.5     -0.8     1.2

Sample mean            6.5            5.8            Sum of products: -64
Sample st. dev.        3.341656276    2.573367875    r = -0.82694
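The table can be reproduced directly from the ten data pairs. The sketch below assumes the sample formula r = Σ(x − x̄)(y − ȳ) / ((n − 1) s_x s_y), which is the form used later in these examples; the helper name `pearson_r` is an illustrative choice, not something from the original slides.

```python
import math

months = [12, 2, 6, 9, 7, 2, 8, 4, 10, 5]  # x: months owned
hours = [4, 10, 8, 5, 5, 8, 3, 8, 2, 5]    # y: hours exercised last week

def pearson_r(x, y):
    """Sample correlation coefficient from deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross products of deviations (the last column of the table).
    sum_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return sum_products / ((n - 1) * s_x * s_y)

r = pearson_r(months, hours)
print(f"r = {r:.5f}")
```

A value of r close to −0.83 indicates a fairly strong inverse relationship, consistent with the downward-sloping scatter diagram.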
Solution c At the .01 significance level, can we conclude that
there is a negative association between the variables?
• 𝐻0 : 𝜌 ≥ 0; 𝐻1 ∶ 𝜌 < 0.
• Reject 𝐻0 if 𝑡 < −2.896.
• $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \dfrac{-0.827\sqrt{10-2}}{\sqrt{1-(-0.827)^2}} = -4.16$
• Reject 𝐻0 .
• There is a negative association between months owned and
hours exercised.
Example III
• City planners believe that larger cities are populated by older residents.
• To investigate the relationship, data on population and median age in 10
large cities were collected.

• What is the difference between mean age and median age?
• Mean age is the average age of the people. Median age is the
point at which half the population is older and half is younger.
a. Plot these data on a scatter diagram with
median age as the dependent variable
[Scatter diagram "Median Age (in years) vs Population (in Millions)", with fitted line y = 0.2722x + 31.367]

• The median age and population are directly related.


b. Find the correlation coefficient.

City   x (Population,   y (Median Age)   x − x̄      y − ȳ    (x − x̄)(y − ȳ)
       in millions)
1      2.8333           31.5             0.36147    -0.54    -0.1951938
2      1.233            30.5             -1.23883   -1.54    1.9077982
3      2.144            30.9             -0.32783   -1.14    0.3737262
4      3.849            31.6             1.37717    -0.44    -0.6059548
5      8.214            34.2             5.74217    2.16     12.4030872
6      1.448            34.2             -1.02383   2.16     -2.2114728
7      1.513            30.7             -0.95883   -1.34    1.2848322
8      1.297            31.7             -1.17483   -0.34    0.3994422
9      1.257            32.5             -1.21483   0.46     -0.5588218
10     0.93             32.6             -1.54183   0.56     -0.8634248

Sample mean        2.47183       32.04
St. dev.           2.20713002    1.3301629
Total of products                                            11.934018

Correlation coefficient: r = 0.451659793


c. Regression Equation
• A regression analysis was performed and the resulting regression
equation is
Median Age ($\hat{y}$) = 31.4 + 0.272 Population ($x$)
(Use $\hat{y} = bx + a$, with $b = r\,\dfrac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$.)
• Interpret the meaning of the slope.
• Solution:
• The slope of 0.272 indicates that for each increase of 1 million in
the population, the median age increases on average by 0.272
year.
d. Estimate the median age for a city of 2.5
million people.
• The median age is 32.08 years, found by 31.4 + 0.272(2.5).
e. Here is a portion of the regression software
output. What does it tell you?

               Coefficients    Standard Error   t Stat   P-value
Intercept      31.36716753     0.615835268      50.93    2.4E-11
X Variable 1   0.272200139     0.190103183      1.432    0.19007

• The p-value (0.190) for the population variable is greater than,
say, .05, so the null hypothesis that this coefficient is zero
would not be rejected.
• In other words, it is possible the population coefficient is zero.
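The slope row of the software output can be reproduced from the ten city observations. The sketch below is one way to do it with the standard library only. It assumes the usual least-squares formulas (standard error of estimate s_yx = sqrt(SSE / (n − 2)) and standard error of the slope s_b = s_yx / sqrt(Σ(x − x̄)²)), which the slides do not spell out; the function name `simple_regression` is illustrative.

```python
import math

# City data from the table in part b.
population = [2.8333, 1.233, 2.144, 3.849, 8.214, 1.448, 1.513, 1.297, 1.257, 0.93]
median_age = [31.5, 30.5, 30.9, 31.6, 34.2, 34.2, 30.7, 31.7, 32.5, 32.6]

def simple_regression(x, y):
    """Least-squares fit of y on x, plus standard error and t statistic of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = ss_xy / ss_x
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and standard error of estimate.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    se_slope = s_yx / math.sqrt(ss_x)
    return intercept, slope, se_slope, slope / se_slope

intercept, slope, se_slope, t_slope = simple_regression(population, median_age)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
print(f"SE(slope) = {se_slope:.4f}, t = {t_slope:.3f}")
```

The p-value column is not reproduced here because it needs the t distribution's cumulative probabilities, which the standard library does not provide.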
f. Significance of correlation
• Using the .10 significance level, test the significance of the
correlation. Interpret the result.
• Is there a significant relationship between the two variables?
• Solution:
• 𝑯𝟎 : 𝝆 = 𝟎 𝑯𝟏 : 𝝆 ≠ 𝟎
• Reject 𝐻0 if t is not between −1.86 and 1.86.
• $df = 8$, $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \dfrac{0.452\sqrt{10-2}}{\sqrt{1-(0.452)^2}} = 1.433$
• Do not reject $H_0$. The data do not show a significant relationship
between median age and population.
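The t value of 1.433 agrees, up to rounding of r, with the t Stat of 1.432 reported for the slope in part e. In simple regression this is expected, because the test on the correlation and the test on the slope are algebraically the same. A short derivation follows, using notation that is not in the original slides: $SS_x$ and $SS_y$ are the sums of squared deviations of x and y, $s_{y\cdot x}$ is the standard error of estimate, and $s_b$ is the standard error of the slope.

```latex
\begin{align*}
b &= r\sqrt{\frac{SS_y}{SS_x}}, \qquad
s_{y\cdot x}^2 = \frac{(1-r^2)\,SS_y}{n-2}, \qquad
s_b = \frac{s_{y\cdot x}}{\sqrt{SS_x}} \\
t &= \frac{b}{s_b}
   = r\sqrt{\frac{SS_y}{SS_x}}\cdot\frac{\sqrt{SS_x}\,\sqrt{n-2}}{\sqrt{(1-r^2)\,SS_y}}
   = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
\end{align*}
```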
Example IV
• The city council of Pine Bluffs is considering increasing the number of
police in an effort to reduce crime. Before making a final decision, the
council asked the chief of police to survey other cities of similar size
to determine the relationship between the number of police and the
number of crimes reported. The chief gathered the following sample
information.
Questions
a. Which variable is the dependent variable and which is the
independent variable? Hint: Which of the following makes
better sense: Cities with more police have fewer crimes, or
cities with fewer crimes have more police? Explain your
choice.
b. Draw a scatter diagram.
c. Determine the correlation coefficient.
d. Interpret the correlation coefficient. Does it surprise you
that the correlation coefficient is negative?
Solution
• a. Either variable could be independent. In the
scatter plot, police is the independent
variable.
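• c. $n = 8$, $\sum (x - \bar{x})(y - \bar{y}) = -231.75$,
$s_x = 5.8737$, $s_y = 6.4462$
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{(n-1)\,s_x s_y} = \dfrac{-231.75}{(8-1)(5.8737)(6.4462)} = -0.8744$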
• d. Strong inverse relationship. As the number
of police increases, the crime decreases, or as
crime increases, the number of police
decreases.
Questions
• Assume the dependent variable is number of crimes.
a. Determine the regression equation.
b. Estimate the number of crimes for a city with 20 police
officers.
c. Interpret the regression equation.
Solution
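a. $b = r\,\dfrac{s_y}{s_x} = -0.8744\left(\dfrac{6.4462}{5.8737}\right) = -0.9596$
$a = \bar{y} - b\bar{x} = \dfrac{95}{8} - (-0.9596)\left(\dfrac{146}{8}\right) = 29.3877$
b. 10.1957, found by 29.3877 − 0.9596(20)
c. For each additional police officer, the number of crimes decreases by almost one, on average.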
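The whole chain for the Pine Bluffs example, from summary statistics to prediction, can be sketched in a few lines. The raw data table is not reproduced here, so the script starts from the summary values quoted in the solution (n, the sum of cross products, s_x, s_y, and the totals 146 and 95); the variable names are illustrative.

```python
# Summary statistics quoted in the solution (x = police, y = crimes).
n = 8
sum_products = -231.75       # sum of (x - x_bar)(y - y_bar)
s_x, s_y = 5.8737, 6.4462    # sample standard deviations
x_bar, y_bar = 146 / 8, 95 / 8

# Correlation coefficient from the summary statistics.
r = sum_products / ((n - 1) * s_x * s_y)

# Regression coefficients: b = r * s_y / s_x, a = y_bar - b * x_bar.
b = r * s_y / s_x
a = y_bar - b * x_bar

# Estimated number of crimes for a city with 20 police officers.
predicted_crimes = a + b * 20

print(f"r = {r:.4f}, b = {b:.4f}, a = {a:.4f}")
print(f"predicted crimes for 20 officers = {predicted_crimes:.4f}")
```

Because the script carries r and b at full precision, the last digits of a and the prediction can differ slightly from the hand calculation, which rounds b to −0.9596 first.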
Testing The Significance of Slope of the Regression Line
Testing The Significance Of The Slope
• We already showed how to find the equation of the regression
line that best fits the data, based on the least squares principle.
• The purpose of the regression equation is to quantify a linear
relationship between two variables.
• The next step is to analyze the regression equation by conducting
a test of hypothesis to see if the slope of the regression line is
different from zero.
• If we cannot demonstrate that this slope is different from zero,
then we conclude there is no merit to using the independent
variable as a predictor.
Two-Tailed Hypothesis Testing
• The null and alternative hypotheses are:
𝐻0 ∶ 𝛽 = 0 (the slope of the regression equation in the
population is zero.)
𝐻1 ∶ 𝛽 ≠ 0 (the slope of the regression equation in the
population is other than zero.)
𝛽 (the Greek letter beta) represents the population slope for the
regression equation.
• In regression analysis, 𝑏 is our computed slope based on a
sample and is an estimate of the population’s slope, identified
as 𝛽.
Conclusions and T test
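• If $H_0$ is not rejected, we cannot rule out that the population
regression line is horizontal, that is, that there is no linear
relationship between the independent variable, X, and the
dependent variable, Y.
• If $H_0$ is rejected and the alternative statement is accepted, a
significant relationship exists between the two variables.
• The t test for the slope is: $t = \dfrac{b - 0}{s_b}$ with $n - 2$ degrees of freedom,
where $s_b$ is the standard error of the slope.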
One-tailed Test
• Instead of a two-tailed test, we may prefer a one-tailed test of the form:
𝐻0 ∶ 𝛽 ≤ 0
𝐻1 ∶ 𝛽 > 0
• If we do not reject the null hypothesis, we conclude that the
slope of the regression line in the population could be zero. This
means the independent variable is of no value in improving our
estimate of the dependent variable.
• If we reject the null hypothesis and accept the alternative, we
conclude the slope of the line is greater than zero. Hence, the
independent variable is an aid in predicting the dependent
variable.
Example I
• We take the same example of salespersons' sales calls
and copiers sold.
• The test statistic follows the t distribution.
• 𝑑𝑓 = 𝑛 − 2 = 15 − 2 = 13.
• We use the .05 significance level. From Appendix B.5, the
critical value is 1.771.
• Our decision rule is to reject the null hypothesis if the value
computed from the t formula is greater than 1.771.
Decision
• The computed value of 6.205 exceeds our critical value of
1.771, so we reject the null hypothesis and accept the
alternative hypothesis.
• We conclude that the slope of the line is greater than zero.
• The independent variable, number of sales calls, is useful in
estimating copier sales.
Example II
• The owner of Haverty’s Furniture Company studied the
relationship between the amount spent on advertising in a
month and sales revenue for that month.
• The amount of sales is the dependent variable and advertising
expense is the independent variable. The regression equation in
that study was ŷ = 1.5 + 2.2𝑥 for a sample of 5 months.
• Conduct a test of hypothesis to show there is a positive
relationship between advertising and sales. From statistical
software, the standard error of the regression coefficient is 0.42.
Use the .05 significance level.
Solution
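One way to work this test through, following the same steps as the other slope examples, is sketched below. The hypotheses are taken to be H0: β ≤ 0 and H1: β > 0, since the question asks about a positive relationship. The critical value 2.353 (one-tailed, .05 level, 3 degrees of freedom) is an assumption read from a standard t table; it is not stated in the text.

```python
# Haverty's Furniture: y_hat = 1.5 + 2.2x from a sample of 5 months.
n = 5
b = 2.2          # sample slope
se_b = 0.42      # standard error of the slope, from statistical software
df = n - 2

# t statistic for H0: beta <= 0 against H1: beta > 0.
t = (b - 0) / se_b

critical = 2.353  # assumed table value: one-tailed, alpha = .05, df = 3
reject_h0 = t > critical

print(f"df = {df}, t = {t:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the computed t of about 5.24 exceeds the critical value, so H0 is rejected and the data support a positive relationship between advertising and sales.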
Example III
• The regression equation is ŷ = 29.29 − 0.96x, the sample size
is 8, and the standard error of the slope is 0.22. Use the .05
significance level.
• Can we conclude that the slope of the regression line is less
than zero?
Solution
• 𝐻0 : 𝛽 ≥ 0 𝐻1 : 𝛽 < 0
• 𝑑𝑓 = 𝑛 − 2 = 8 − 2 = 6
• Reject H0 if 𝑡 < −1.943.
• $t = \dfrac{b - 0}{s_b} = \dfrac{-0.96}{0.22} = -4.364$
• Reject 𝐻0 and conclude the slope is less than zero.
