SPSS Module-1
For Practice: Suppose we are interested in testing the effectiveness of a new type of antibiotic.
Three different types of bacteria are exposed to the drug and the survival time for a particular
bacteria culture is measured as the amount of time required to kill 50% of the cells in the petri
dish. The survival times for eight colonies of one particular bacteria culture are 1.1 hours, 1.2
hours, 1.5 hours, 1.7 hours, 1.9 hours, 1.1 hours, 1.3 hours, and 1.8 hours. Calculate the mean,
median, and mode.
SPSS Steps to Generate Output:
Click “Analyze” tab
Select “Descriptive Statistics”
Click “Frequencies”
Input your “survival_time” variable into “Variable(s)” box
On the right-hand side click “Statistics”
In the “Central Tendency” section check mean, median and mode
Click “Continue”
Click “OK”
Statistics
survival_time
N Valid 8
Missing 0
Mean 1.4500
Median 1.4000
Mode 1.10
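As a cross-check on the SPSS output above, the same three measures can be computed outside SPSS. The following is a minimal sketch using only the Python standard library; the variable name survival_time simply mirrors the SPSS variable.

```python
import statistics

# Survival times (hours) for the eight colonies in the practice question.
survival_time = [1.1, 1.2, 1.5, 1.7, 1.9, 1.1, 1.3, 1.8]

mean = statistics.mean(survival_time)
median = statistics.median(survival_time)  # average of the two middle values
mode = statistics.mode(survival_time)      # most frequent value

print(f"Mean   {mean:.4f}")
print(f"Median {median:.4f}")
print(f"Mode   {mode:.2f}")
```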
Measures of Dispersion
For Practice: Calculate the range, variance, standard deviation, and coefficient of variation for
the survival times in the previous practice question.
SPSS Steps to Generate Output:
Click “Analyze” Tab
Select “Descriptive Statistics”
Click “Frequencies”
Input your “survival_time” variable into “Variable(s)” box
On the right hand side click “Statistics”
In the “Dispersion” section check standard deviation, range, variance, minimum,
maximum and S.E. mean
Click “Continue”
Click “OK”
Statistics
survival_time
N Valid 8
Missing 0
Std. Error of Mean .11339
Std. Deviation .32071
Variance .103
Range .80
Minimum 1.10
Maximum 1.90
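The SPSS dialog above does not report the coefficient of variation, so it has to be derived from the standard deviation and the mean. A minimal sketch, assuming the usual definition CV = (standard deviation / mean) × 100%, which also reproduces the other dispersion measures:

```python
import math
import statistics

survival_time = [1.1, 1.2, 1.5, 1.7, 1.9, 1.1, 1.3, 1.8]

n = len(survival_time)
mean = statistics.mean(survival_time)
variance = statistics.variance(survival_time)  # sample variance, divisor n - 1
std_dev = statistics.stdev(survival_time)      # sample standard deviation
std_error = std_dev / math.sqrt(n)             # standard error of the mean
data_range = max(survival_time) - min(survival_time)
cv_percent = std_dev / mean * 100              # coefficient of variation, in %

print(f"Std. Error of Mean {std_error:.5f}")
print(f"Std. Deviation     {std_dev:.5f}")
print(f"Variance           {variance:.3f}")
print(f"Range              {data_range:.2f}")
print(f"Coeff. of variation {cv_percent:.2f}%")
```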
Measures of Position
For Practice: The following are the systolic blood pressures of 20 men.
150 141 90 108
158 119 156 114
95 97 145 167
144 171 132 97
163 111 186 98
1. Compute the 10th percentile (P10).
2. Compute Q1, Q2, and Q3.
3. Determine if the above data set has any outliers.
4. Determine the percentile rank of the value 163.
SPSS Steps to Generate Output:
Click “Analyze” Tab
Select “Descriptive Statistics”
Click “Frequencies”
Input your “systolic_BP” variable to “Variable(s)” box
On the right hand side click “Statistics”
In “Percentile Values” section check “Quartiles” and “Percentile(s)”
After checking “Percentile(s)”, type 10 (or whichever percentile Px you wish to compute)
in the adjacent box and click “Add”
Click “Continue”
Click “OK”
Statistics
systolic_BP
N Valid 20
Missing 0
Percentiles 10 95.2000
25 100.5000
50 136.5000
75 157.5000
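The percentiles above can be reproduced, and questions 3 and 4 answered, with a short script. This is a sketch under two assumptions: the "exclusive" quantile method (interpolation at position p(n + 1)) is used because it matches the SPSS values for this data set, and the percentile rank is taken as the percentage of values strictly below 163, which is only one of several common definitions.

```python
import statistics

systolic_bp = [150, 141, 90, 108, 158, 119, 156, 114, 95, 97,
               145, 167, 144, 171, 132, 97, 163, 111, 186, 98]

# Quartiles and the 10th percentile, interpolated at position p * (n + 1).
q1, q2, q3 = statistics.quantiles(systolic_bp, n=4, method="exclusive")
p10 = statistics.quantiles(systolic_bp, n=10, method="exclusive")[0]

# Outlier check with the 1.5 * IQR fences.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in systolic_bp if x < lower_fence or x > upper_fence]

# Percentile rank of 163 (assumed definition: percent of values below it).
below = sum(1 for x in systolic_bp if x < 163)
percentile_rank = 100 * below / len(systolic_bp)

print(f"P10 = {p10:.1f}, Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
print(f"Fences: {lower_fence} to {upper_fence}; outliers: {outliers}")
print(f"Percentile rank of 163: {percentile_rank:.0f}")
```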
For Practice: For our systolic blood pressures for 20 men, obtain the five-number summary.
The five-number summary can be used to draw another type of graph called the box-and-whisker
plot which is often called a boxplot.
To draw a boxplot,
1. Determine the lower fence, Q1 - 1.5(IQR), and the upper fence, Q3 + 1.5(IQR), where IQR = Q3 - Q1.
2. Draw vertical lines at Q1, M, and Q3. Enclose these vertical lines in a box.
3. Draw a line from Q1 to the smallest data value that is larger than the lower fence.
4. Draw a line from Q3 to the largest data value that is smaller than the upper fence.
Statistics
systolic_BP
N Valid 20
Missing 0
Mean 132.1000
Minimum 90.00
Maximum 186.00
Percentiles 25 100.5000
50 136.5000
75 157.5000
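The five-number summary and the whisker endpoints from the boxplot steps can be sketched as follows. The quantile method is again assumed to be the p(n + 1) rule that matches the SPSS output; the names whisker_low and whisker_high are illustrative only.

```python
import statistics

systolic_bp = [150, 141, 90, 108, 158, 119, 156, 114, 95, 97,
               145, 167, 144, 171, 132, 97, 163, 111, 186, 98]

q1, median, q3 = statistics.quantiles(systolic_bp, n=4, method="exclusive")
five_number = (min(systolic_bp), q1, median, q3, max(systolic_bp))

# Whiskers stop at the most extreme data values that lie inside the fences.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
whisker_low = min(x for x in systolic_bp if x > lower_fence)
whisker_high = max(x for x in systolic_bp if x < upper_fence)

print("Min, Q1, M, Q3, Max:", five_number)
print("Whiskers run from", whisker_low, "to", whisker_high)
```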
For Practice: Draw a boxplot for the systolic blood pressures of 20 men example.
The distribution shape based upon a boxplot:
1. If the median is near the center of the box and each horizontal line is approximately the same
length, then the distribution is roughly symmetric.
2. If the median is to the left of the center of the box OR the right horizontal line is substantially
longer than the left horizontal line, the distribution is skewed right.
3. If the median is to the right of the center of the box OR the left horizontal line is substantially
longer than the right horizontal line, the distribution is skewed left.
4. If one wanted to compare the underlying distributions of two different data sets, one would
create a boxplot for both data sets and plot them one on top of the other on the same
horizontal scale.
To get SPSS to generate a boxplot:
Click “Graphs” Tab
Select “Legacy Dialogs”
Click “Boxplot”
Select “Simple”
Check “Summaries of separate variables”
Click “Define”
Input your “systolic_BP” variable to “Boxes Represent”
Click “OK”
Now Your Turn: BFAHS p. 61, q. 29: Thilothammal et al. (A-19) designed a study to determine
the efficacy of BCG (bacillus Calmette-Guerin) vaccine in preventing tuberculous meningitis.
Among the data collected on each subject was a measure of nutritional status. The following table
contains the nutritional status of the 91 cases studied.
Statistics
NutritionalStatus
N Valid 91
Missing 0
Mean 74.7319
Median 73.3000
Mode 76.90
Std. Deviation 14.61530
Variance 213.607
Percentiles 70 79.4200
For Practice: Each of 15 hypertension patients was administered several drugs on different
occasions. The results of concern are for a placebo drug compared with Inderal. Each patient
first took the placebo for one month. After the month, their systolic blood pressures were
recorded. They then stopped taking the placebo and started taking 120 mg of Inderal for one
month. After the month, their blood pressures were recorded. The data presented in the
following table are the systolic blood pressures measured.
Patient Placebo Inderal
1 175 176
2 199 181
3 180 146
4 180 140
5 164 127
6 174 139
7 195 129
8 204 133
9 205 194
10 180 169
11 195 186
12 161 158
13 164 141
14 190 150
15 178 164
At the 5% level of significance, test whether the true average systolic blood pressure of those
people who took Inderal is 160.
To get SPSS to generate the output:
Click “Analyze” Tab
Select “Compare Means”
Click “One - Sample T Test”
Input your “Inderal” variable to “Test Variable(s)” box
In “Test Value” section type in 160
On the right hand side click “Options”
In “Confidence Interval Percentage” type in 95
Click “Continue”
Click “OK”
Research Question: Is the true average systolic blood pressure of those people who took Inderal
equal to 160?
Hypothesis to be tested:
H0: µBP_inderal = 160 (the true average systolic blood pressure of those people who took Inderal
is 160)
Ha: µBP_inderal ≠ 160 (the true average systolic blood pressure of those people who took Inderal is
not 160)
Hypothesis Test to be used: One-Sample T-Test.
Assumptions required to implement the hypothesis test:
1. A simple random sample is obtained
2. The population from which the sample is drawn is normally distributed OR the sample size is
greater than 29.
(Suppose for now that all required assumptions hold. An example of how to test for normality
will be provided later.)
The Significance Level: α=0.05
The Test Statistic and corresponding p-value:
One-Sample Test
Test Value = 0
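The test statistic and p-value for this one-sample t-test can be checked by hand. The sketch below uses only the standard library; since no t distribution function is available there, the two-tailed p-value is approximated by integrating the t density numerically with Simpson's rule. The helper names t_pdf and two_tailed_p are illustrative, not part of SPSS.

```python
import math
import statistics

inderal = [176, 181, 146, 140, 127, 139, 129, 133, 194,
           169, 186, 158, 141, 150, 164]
test_value = 160
alpha = 0.05

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - area)

n = len(inderal)
df = n - 1
mean = statistics.mean(inderal)
std_error = statistics.stdev(inderal) / math.sqrt(n)
t_stat = (mean - test_value) / std_error
p_value = two_tailed_p(t_stat, df)

print(f"t = {t_stat:.3f}, df = {df}, p-value (2-tailed) = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Because the p-value is far above α = 0.05, this sketch leads to not rejecting H0 for the Inderal data.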
Now Your Turn: Suppose in the previous Inderal example we wish to test:
(a) At the 4% level of significance whether the true mean systolic blood pressure after taking
the placebo is 190 mmHg.
Solution: The required SPSS output is included below.
(b) At the 1% level of significance, test whether the true mean systolic blood pressure after
taking the Inderal is greater than 150 mmHg.
(c) Create a 90% confidence interval for the true mean systolic blood pressure after taking
the Inderal. Suppose that 160 mmHg is considered to be the normal blood pressure for
the age category of the participants in the study. Based on your confidence interval, test
whether the true mean systolic blood pressure after taking inderal is 160 mmHg.
SOLUTION
(a) To get SPSS to generate the output for testing whether the true mean systolic blood pressure
after taking the placebo is 190 mmHg at α=0.04:
Click “Analyze” Tab
Select “Compare Means”
Click “One- Sample T Test”
Input your “Placebo” variable to “Test Variable(s)” box
In “Test Value” section type in 190
On the right hand side click “Options”
In “Confidence Interval Percentage” type in 96
Click “Continue”
Click “OK”
Research Question: Is the true mean systolic blood pressure of those people who took the placebo
equal to 190?
Hypothesis to be tested:
H0: µBP_placebo = 190 (the true average systolic blood pressure of those people who took the placebo
is 190)
Ha: µBP_placebo ≠ 190 (the true average systolic blood pressure of those people who took the placebo is
not 190, meaning less than or greater than 190: two-tailed test)
Hypothesis Test to be used: One-Sample T-Test.
Assumptions required to implement the hypothesis test:
1. A simple random sample is obtained
2. The population from which the sample is drawn is normally distributed OR the sample size is
greater than 29.
(Suppose for now that all required assumptions hold. An example of how to test for normality
will be provided later.)
The Significance Level: α=0.04
The Test Statistic and corresponding p-value:
One-Sample Test
(b) To get SPSS to generate the output for testing whether the true mean systolic blood pressure
after taking the Inderal is greater than 150 mmHg at α=0.01:
Click “Analyze” Tab
Select “Compare Means”
Click “One- Sample T Test”
Input your “Inderal” variable to “Test Variable(s)” box
In “Test Value” section type in 150
On the right hand side click “Options”
In “Confidence Interval Percentage” type in 99
Click “Continue”
Click “OK”
Research Question: Is the true mean systolic blood pressure after taking the Inderal greater than 150?
Hypothesis to be tested:
H0: µBP_inderal = 150 (the true mean systolic blood pressure after taking the Inderal is 150)
Ha: µBP_inderal > 150 (the true mean systolic blood pressure after taking the Inderal is greater than
150: one-tailed test)
Hypothesis Test to be used: One-Sample T-Test.
Assumptions required to implement the hypothesis test:
1. A simple random sample is obtained
2. The population from which the sample is drawn is normally distributed OR the sample size is
greater than 29.
(Suppose for now that all required assumptions hold. An example of how to test for normality
will be provided later.)
The Significance Level: α=0.01
The Test Statistic and corresponding p-value:
One-Sample Test
(c) Create a 90% confidence interval for the true mean systolic blood pressure after taking the
Inderal. Suppose that 160 mmHg is considered to be the normal blood pressure for the age
category of the participants in the study. Based on your confidence interval, test whether the true
mean systolic blood pressure after taking Inderal is 160 mmHg.
To get SPSS to generate the output:
Click “Analyze” Tab
Select “Compare Means”
Click “One- Sample T Test”
Input your “Inderal” variable to “Test Variable(s)” box
In “Test Value” section type in 160
On the right hand side click “Options”
In “Confidence Interval Percentage” type in 90
Click “Continue”
Click “OK”
Research Question: Is the true mean systolic blood pressure after taking the Inderal equal to 160?
Hypothesis to be tested:
H0: µBP_inderal = 160 (the true mean systolic blood pressure after taking the Inderal is 160)
Ha: µBP_inderal ≠ 160 (the true mean systolic blood pressure after taking the Inderal is not 160,
meaning less than or greater than 160: two-tailed test)
Hypothesis Test to be used: One-Sample T-Test.
Assumptions required to implement the hypothesis test:
1. A simple random sample is obtained
2. The population from which the sample is drawn is normally distributed OR the sample size is
greater than 29.
(Suppose for now that all required assumptions hold. An example of how to test for normality
will be provided later.)
The Significance Level: α=0.10
The Test Statistic and corresponding p-value:
One-Sample Test
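The 90% confidence interval for the true mean systolic blood pressure after taking the Inderal is
between 145 and 165 mmHg. Because 160 mmHg lies inside this interval, we do not reject H0 at α=0.10.
To get SPSS to generate the confidence interval for the mean itself, use a test value of 0:
Click “Analyze” Tab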
Select “Compare Means”
Click “One- Sample T Test”
Input your “Inderal” variable to “Test Variable(s)” box
In “Test Value” section type in 0 (zero)
On the right hand side click “Options”
In “Confidence Interval Percentage” type in 90
Click “Continue”
Click “OK”
One-Sample Test
Test Value = 0
For Practice: BFAHS p185, q. 6.4.10: In a study of factors thought to be responsible for the
adverse effects of smoking on human reproduction, cadmium level determinations (nanograms
per gram) were made on placenta tissue of a random sample of 14 mothers who were smokers
and an independent random sample of 18 nonsmoking mothers. The data is summarized below:
Nonsmokers (n = 18):
10.0 8.4 12.8 25.0 11.8 9.8 12.5 15.4 23.5
9.4 25.1 19.5 25.5 9.8 7.5 11.8 12.2 15.0
Smokers (n = 14):
30.0 30.1 15.0 24.1 30.5 17.8 16.8 14.8 13.4
28.5 17.5 14.4 12.5 20.4
1. At the α=0.10 level of significance, test the claim that the mean cadmium level is higher
among smokers than nonsmokers.
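Independent Samples Test (cadmium_level, equal variances assumed)
Levene's Test for Equality of Variances: F .461, Sig. .502
t-test for Equality of Means: t -2.468, df 30, Sig. (2-tailed) .020
Mean Difference -5.69206
Std. Error Difference 2.30647
90% Confidence Interval of the Difference: Lower -9.60675, Upper -1.77738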
2. Determine a 95% confidence interval for the true difference in the mean cadmium levels
between the two groups.
To get SPSS to generate a 95% confidence interval for the true difference in the mean cadmium
levels between the two groups, follow the same steps as in question 1, except for the confidence
interval percentage:
Click “Analyze” Tab
Select “Compare Means”
Click “Independent- Samples T Test”
Input your “cadmium_level” variable to “Test Variable(s)” box
Input your “status” variable to “Grouping Variable” box
Click “Define Groups”
In “Group 1” type in 1
In “Group 2” type in 2
Click “Continue”
On the right hand side click “Options”
In “Confidence Interval Percentage” type in 95
Click “Continue”
Click “OK”
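Independent Samples Test (cadmium_level, equal variances assumed)
Levene's Test for Equality of Variances: F .461, Sig. .502
t-test for Equality of Means: t -2.468, df 30, Sig. (2-tailed) .020
Mean Difference -5.69206
Std. Error Difference 2.30647
95% Confidence Interval of the Difference: Lower -10.40251, Upper -.98161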
Answer: The 95% confidence interval for the true difference in the mean cadmium levels between
the two groups, with equal variances assumed, is:
Lower bound: -10.403
Upper bound: -0.982
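The pooled (equal variances assumed) t statistic and the 95% interval can be recomputed from the raw data. This is a sketch, not the SPSS algorithm itself; the critical values 1.310 and 2.0423 are taken from a standard t table for 30 degrees of freedom, and the difference is taken as nonsmokers minus smokers to match the sign of the SPSS output.

```python
import math
import statistics

nonsmokers = [10.0, 8.4, 12.8, 25.0, 11.8, 9.8, 12.5, 15.4, 23.5,
              9.4, 25.1, 19.5, 25.5, 9.8, 7.5, 11.8, 12.2, 15.0]
smokers = [30.0, 30.1, 15.0, 24.1, 30.5, 17.8, 16.8, 14.8, 13.4,
           28.5, 17.5, 14.4, 12.5, 20.4]

n1, n2 = len(nonsmokers), len(smokers)
df = n1 + n2 - 2
mean_diff = statistics.mean(nonsmokers) - statistics.mean(smokers)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * statistics.variance(nonsmokers)
              + (n2 - 1) * statistics.variance(smokers)) / df
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = mean_diff / std_error

# One-tailed test at alpha = 0.10 of "smokers higher" (difference < 0).
T_CRIT_ONE_TAILED_10 = 1.310   # t table, upper 10% point, df = 30
reject_h0 = t_stat < -T_CRIT_ONE_TAILED_10

# 95% confidence interval for the difference in means.
T_CRIT_95 = 2.0423             # t table, upper 2.5% point, df = 30
ci_lower = mean_diff - T_CRIT_95 * std_error
ci_upper = mean_diff + T_CRIT_95 * std_error

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
print(f"95% CI: {ci_lower:.3f} to {ci_upper:.3f}")
```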
For Practice: Plot 80% confidence intervals for the two groups in which cadmium levels were
measured.
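Before plotting, the two 80% intervals can be computed group by group. The following sketch assumes the usual t interval, mean ± t × s/√n, with critical values (upper 10% points) read from a standard t table; the helper name mean_ci is illustrative.

```python
import math
import statistics

nonsmokers = [10.0, 8.4, 12.8, 25.0, 11.8, 9.8, 12.5, 15.4, 23.5,
              9.4, 25.1, 19.5, 25.5, 9.8, 7.5, 11.8, 12.2, 15.0]
smokers = [30.0, 30.1, 15.0, 24.1, 30.5, 17.8, 16.8, 14.8, 13.4,
           28.5, 17.5, 14.4, 12.5, 20.4]

# Upper 10% points of the t distribution, keyed by degrees of freedom.
T_CRIT_80 = {17: 1.333, 13: 1.350}

def mean_ci(sample):
    """Return the (lower, upper) 80% confidence interval for the mean."""
    n = len(sample)
    centre = statistics.mean(sample)
    half_width = T_CRIT_80[n - 1] * statistics.stdev(sample) / math.sqrt(n)
    return centre - half_width, centre + half_width

ci_nonsmokers = mean_ci(nonsmokers)
ci_smokers = mean_ci(smokers)

for label, (low, high) in (("Nonsmokers", ci_nonsmokers), ("Smokers", ci_smokers)):
    print(f"{label:<10} 80% CI: {low:.2f} to {high:.2f}")
```

In this sketch the two intervals do not overlap, which is what the error bar plot would show visually.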
Assume that the reaction times of both populations are normally distributed.
(a) If the variances of the two populations are not equal, at the α =0.10 level of significance, test
the claim that a person's reaction time is increased if texting while driving.
To get SPSS to generate the output:
Click “Analyze” Tab
Select “Compare Means”
Click “Independent- Samples T Test”
Input your “reaction_time” variable to “Test Variable(s)” box
Input your “status” variable to “Grouping Variable” box
Click “Define Groups”
In “Group 1” type in 1
In “Group 2” type in 2
Click “Continue”
On the right hand side click “Options”
In “Confidence Interval Percentage” type in 90
Click “Continue”
Click “OK”
Research Question: Is the mean reaction time if texting while driving greater than the true mean
reaction time if not texting while driving?
Let population 1 be the reaction times of those who were texting while driving.
Let population 2 be the reaction times of those who were not texting while driving.
Hypothesis to be tested:
H0: µ1 = µ2 (there is no difference in the true mean reaction times based on texting status while
driving)
Ha: µ1 > µ2 (the true mean reaction time if texting while driving is greater than the true mean
reaction time if not texting while driving)
Hypothesis Test to be used: Independent-Samples T-Test.
Assumptions required to implement the hypothesis test:
1. A simple random sample is obtained
2. The populations from which both samples are drawn are normally distributed and sample sizes
are large (n1 ≥ 30 and n2 ≥ 30).
3. The two samples are independent.
Normally we would not continue with the Independent-Samples T-Test because the second
assumption is violated. However, we will proceed as if all required assumptions are met.
The Significance Level: α=0.10
The Test Statistic and corresponding p-value:
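Independent Samples Test (reaction_time, equal variances assumed)
Levene's Test for Equality of Variances: F .618, Sig. .437
t-test for Equality of Means: t -5.968, df 32, Sig. (2-tailed) .000
Mean Difference -1.93333
Std. Error Difference .32396
90% Confidence Interval of the Difference: Lower -2.48208, Upper -1.38459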
(b) Assuming the variances of the two populations are not equal, construct an appropriate plot
(using the level of significance α=0.05) from which you could visually inspect whether it was
plausible that the two populations had the same mean. Referencing this plot, discuss why or why
not you would conclude the two populations have the same mean.
Click “Graphs” Tab
Select “Legacy Dialogs”
Click “Error Bar”
Select “Simple”
In “Data in Chart” check “Summaries for groups of cases”
Click “Define”
Input your “reaction_time” variable to “Variable”
Input “status” variable to “Category Axis”
In “Bars Represent” select “Confidence interval for mean”
Set Level at 95%
Click “OK”
Conclusion: Since the 95% confidence intervals for the true mean reaction times when texting while
driving and when not texting while driving do not overlap, it is likely that the two populations have
different means.
(c) If the variances of the two populations are not equal, determine a 99% confidence interval for
the true difference in the mean reaction times between the two groups.
To get SPSS to generate a table with a 99% confidence interval for the true difference in the
reaction times between the two groups, follow the same steps as for question (a), except for the
percentage entered in “Confidence Interval Percentage”.
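Independent Samples Test (reaction_time, equal variances assumed)
Levene's Test for Equality of Variances: F .618, Sig. .437
t-test for Equality of Means: t -5.968, df 32, Sig. (2-tailed) .000
Mean Difference -1.93333
Std. Error Difference .32396
99% Confidence Interval of the Difference: Lower -2.82048, Upper -1.04619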
99% confidence interval for the true difference in the reaction times between the two groups
assuming not equal variances is:
Lower bound I
(d) If the variances of the two populations are equal, at the α=0.10 level of significance, test the
claim that a person's reaction time is increased if texting while driving.
Use the table from the question (c) and repeat steps as in question (a).
(e) If the variances of the two populations are equal, determine a 99% confidence interval for the
true difference in the mean reaction times between the two groups.
Use the table from the question (c).
Dependent Random Samples from Two Populations
For Practice: Referring back to the Inderal example, the hope is that one’s systolic blood
pressure after taking Inderal would be lower than before taking Inderal.
1. At the α=0.10 level of significance, test the claim that one’s systolic blood pressure is lower
after taking Inderal.
To get SPSS to generate the output for two dependent samples:
Click “Analyze” Tab
Select “Compare Means”
Click “Paired- Samples T Test”
Input your “Placebo” variable to “Paired Variables” box (“Variable 1”)
Input your “Inderal” variable to “Paired Variables” box (“Variable 2”)
On the right hand side click “Options”
In “Confidence Interval Percentage” type in “90”
Click “Continue”
Click “OK”
Research Question: Is the true mean systolic blood pressure after taking Inderal lower than the
true mean systolic blood pressure after taking placebo?
Let population 1 be the systolic blood pressures after taking Inderal.
Let population 2 be the systolic blood pressures after taking Placebo.
Hypothesis to be tested:
H0: µ1= µ2 (there is no difference in the true mean systolic blood pressures based on taking either
Inderal or Placebo)
Ha: µ1 < µ2 (the true mean systolic blood pressure after taking Inderal is lower than the true mean
systolic blood pressure after taking Placebo)
Hypothesis Test to be used: Paired-Samples T-Test.
Assumptions required to implement the hypothesis test:
1. A simple random sample is obtained
2. The populations from which both samples are drawn are normally distributed.
3. The two samples are matched-pairs.
The Significance Level: α=0.10
The Test Statistic and corresponding p-value:
Paired Differences
2. Determine a 90% confidence interval for the true difference in the mean systolic blood
pressures between the two groups.
From the Paired Samples Test output used in question 1, the 90% confidence interval for the true
difference in the mean systolic blood pressures between the two groups (Placebo minus Inderal) is:
The Lower bound is 17.63
The Upper bound is 37.17
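The paired test reduces to a one-sample t-test on the per-patient differences, which can be sketched as follows. The differences are taken as Placebo minus Inderal to match the order in which the variables were paired in SPSS; the critical values 1.345 and 1.7613 come from a standard t table for 14 degrees of freedom.

```python
import math
import statistics

placebo = [175, 199, 180, 180, 164, 174, 195, 204, 205,
           180, 195, 161, 164, 190, 178]
inderal = [176, 181, 146, 140, 127, 139, 129, 133, 194,
           169, 186, 158, 141, 150, 164]

# One difference per patient (Placebo - Inderal).
diffs = [p - i for p, i in zip(placebo, inderal)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
std_error = statistics.stdev(diffs) / math.sqrt(n)
t_stat = mean_diff / std_error

# One-tailed test at alpha = 0.10: Inderal lower means differences > 0.
T_CRIT_ONE_TAILED_10 = 1.345   # t table, upper 10% point, df = 14
reject_h0 = t_stat > T_CRIT_ONE_TAILED_10

# 90% confidence interval for the mean difference.
T_CRIT_90 = 1.7613             # t table, upper 5% point, df = 14
ci_lower = mean_diff - T_CRIT_90 * std_error
ci_upper = mean_diff + T_CRIT_90 * std_error

print(f"Mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {n - 1}")
print(f"Reject H0: {reject_h0}; 90% CI: {ci_lower:.2f} to {ci_upper:.2f}")
```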
Now Your Turn: BFAHS, p252, example 7.4.1: John M. Morton et al. (A-14) examined
gallbladder function before and after fundoplication - a surgery used to stop stomach contents
from flowing back into the esophagus (reflux) - in patients with gastroesophageal reflux disease.
The authors measured gall bladder functionality by calculating the gall bladder ejection fraction
(GBEF) before and after fundoplication. These values are stored in the table below.
Patient 1 2 3 4 5 6 7 8 9 10 11 12
Pre-op % 22.0 63.3 96.0 9.2 3.1 50.0 33.0 69.0 64.0 18.8 0.0 34.0
Post-op % 63.5 91.5 59.0 37.8 10.1 19.6 41.0 87.8 86.0 55.0 88.0 40.0
The goal of fundoplication is to increase GBEF, which is measured as a percent. Does the data
support, at the 5% level of significance, that fundoplication increases GBEF functioning? You
may assume that the patients were randomly selected and that the differences in the Pre-op and
Post-op GBEF are normally distributed.
Solution:
Paired Differences
For Practice: Use Levene's Test for Equality of Variances to determine whether or not (at the
10% level of significance) the variances between the smoking and non-smoking group in the
cadmium level study are equal.
To get SPSS to generate the output:
Click “Analyze” Tab
Select “Compare Means”
Click “Independent- Samples T Test”
Input your “cadmium_level” variable to “Test Variable(s)” box
Input your “status” variable to “Grouping Variable” box
Click “Define Groups”
In “Group 1” type in 1
In “Group 2” type in 2
Click “Continue”
On the right hand side click “Options”
In “Confidence Interval Percentage” type in 90
Click “Continue”
Click “OK”
Research question: Are the variances between the smoking and non-smoking group in the
cadmium level study equal?
Hypothesis to be tested:
H0: σ1² = σ2² (variances are equal)
Ha: σ1² ≠ σ2² (variances are not equal)
Hypothesis Test to be used: Levene’s Test for Equality of Variances.
Assumptions required to implement the hypothesis test:
The Significance Level: α=0.10
The Test Statistic and corresponding p-value:
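Independent Samples Test (cadmium_level, equal variances assumed)
Levene's Test for Equality of Variances: F .461, Sig. .502
t-test for Equality of Means: t -2.468, df 30, Sig. (2-tailed) .020
Mean Difference -5.69206
Std. Error Difference 2.30647
90% Confidence Interval of the Difference: Lower -9.60675, Upper -1.77738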
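The Levene F statistic in the output can be reproduced by hand. The sketch below assumes the mean-centred form of the test (a one-way ANOVA on the absolute deviations from each group mean), which gives the F value shown above for these data; the function name levene_f is illustrative.

```python
import statistics

nonsmokers = [10.0, 8.4, 12.8, 25.0, 11.8, 9.8, 12.5, 15.4, 23.5,
              9.4, 25.1, 19.5, 25.5, 9.8, 7.5, 11.8, 12.2, 15.0]
smokers = [30.0, 30.1, 15.0, 24.1, 30.5, 17.8, 16.8, 14.8, 13.4,
           28.5, 17.5, 14.4, 12.5, 20.4]

def levene_f(*groups):
    """Levene's F: one-way ANOVA on absolute deviations from group means."""
    z_groups = []
    for g in groups:
        centre = statistics.mean(g)
        z_groups.append([abs(x - centre) for x in g])

    k = len(z_groups)
    n_total = sum(len(z) for z in z_groups)
    grand_mean = sum(sum(z) for z in z_groups) / n_total

    between = 0.0  # between-group sum of squares of the deviations
    within = 0.0   # within-group sum of squares of the deviations
    for z in z_groups:
        z_mean = statistics.mean(z)
        between += len(z) * (z_mean - grand_mean) ** 2
        within += sum((v - z_mean) ** 2 for v in z)

    return (n_total - k) / (k - 1) * between / within

f_stat = levene_f(nonsmokers, smokers)
print(f"Levene F = {f_stat:.3f}")
```

With Sig. = .502 > 0.10 in the SPSS output, the hypothesis of equal variances is not rejected.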
For Practice: Create a Q-Q Plot for the Inderal group in the Systolic Blood Pressure example.
To get SPSS to generate a Q-Q Plot:
Click “Analyze” Tab
Select “Descriptive Statistics”
Click “Q-Q Plots”
Input your “Inderal” variable to “Variables” box
Make sure that “Mean” in the “Rank Assigned to Ties” is checked
Click “OK”
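The points behind a normal Q-Q plot can also be computed directly. This sketch assumes Blom's plotting positions, (r - 3/8)/(n + 1/4), which is believed to be the SPSS default but should be checked against the dialog; the Inderal data contain no ties, so no tie handling is included.

```python
from statistics import NormalDist, mean, stdev

inderal = [176, 181, 146, 140, 127, 139, 129, 133, 194,
           169, 186, 158, 141, 150, 164]

n = len(inderal)
sample_mean = mean(inderal)
sample_sd = stdev(inderal)

# Pair each ordered observation with its expected normal value.
qq_points = []
for rank, observed in enumerate(sorted(inderal), start=1):
    proportion = (rank - 0.375) / (n + 0.25)        # Blom's formula (assumed)
    z = NormalDist().inv_cdf(proportion)            # standard normal quantile
    expected = sample_mean + sample_sd * z          # expected normal value
    qq_points.append((observed, expected))

for observed, expected in qq_points:
    print(f"{observed:>5}  {expected:8.2f}")
```

If the sample comes from a normal population, the printed pairs should fall close to a straight line when plotted.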
Shapiro-Wilk W Test for Normality
For the Placebo Systolic Blood Pressure data set, W = 0.937 with p-value=0.349.
For the Inderal Systolic Blood Pressure data set, W=0.938 with p-value=0.363.
The Decision Rule: Since for the placebo group p-value=0.349>0.05=α, do not reject H0, Placebo,
and for the Inderal group p-value=0.363>0.05=α, do not reject H0, Inderal.
The Conclusion: At the 5% level of significance, with p-value=0.349>0.05=α for the Placebo
group and p-value=0.363>0.05=α for the Inderal group, there is not enough evidence to
conclude that either sample was drawn from a population that is not normally distributed.
NOW YOUR TURN: Using the data presented in the Texting While Driving Now-Your-Turn
Scenario,
a) Generate the P-P plots required to visually inspect whether each sample was drawn from
a normally distributed population. Comment on whether, from the plots, you would
believe each population is normally distributed.
To get SPSS to generate a P-P Plot:
Click “Analyze” Tab
Select “Descriptive Statistics”
Click “P-P Plots”
Input your “texting_while_driving” variable to “Variables” box
Make sure that “Mean” in the “Rank Assigned to Ties” is checked
Click “OK”
Repeat the same steps separately for the “not_texting_while_driving” variable.
b) Generate the Q-Q plots required to visually inspect whether each sample was drawn from
a normally distributed population. Comment on whether, from the plots, you would
believe each population is normally distributed.
To get SPSS to generate a Q-Q Plot:
Click “Analyze” Tab
Select “Descriptive Statistics”
Click “Q-Q Plots”
Input your “texting_while_driving” variable to “Variables” box
Make sure that “Mean” in the “Rank Assigned to Ties” is checked
Click “OK”
Repeat the same steps separately for the “not_texting_while_driving” variable.
c) Test, at the 10% level of significance, whether each sample was drawn from a normally
distributed population. Be sure to write your solution in the format discussed in class.
Click “Analyze” Tab
Select “Descriptive Statistics”
Click “Explore”
Input your “texting_while_driving” and “not_texting_while_driving” variables to
“Dependent List” box
On the right hand side click “Statistics”
In “Confidence Interval for Mean” type in 95
Click “Continue”
On the right hand side click “Plots”
In “Boxplots” section check “None”
In “Descriptive” section uncheck everything
Check “Normality plots with tests”
Click “Continue”
In “Display” section check “Both”
Click “OK”
d) Test, at the 10% level of significance, whether both samples were drawn from
populations with the same variance. Be sure to write your solution in the format
discussed in class.
To get SPSS to generate the output:
Click “Analyze” Tab
Select “Compare Means”
Click “Independent- Samples T Test”
Input your “reaction_time” variable to “Test Variable(s)” box
Input your “status” variable to “Grouping Variable” box
Click “Define Groups”
In “Group 1” type in 1
In “Group 2” type in 2
Click “Continue”
On the right hand side click “Options”
In “Confidence Interval Percentage” type in 95
Click “Continue”
Click “OK”
Correlation and Linear Regression
To construct a scatter diagram (or scatter plot), we simply plot the points for the n cases.
For Practice: A medical researcher wants to determine if there is a linear relationship between
the costs of prescription drugs that can be administered to both humans and pets. The data
collected (in Canadian dollars) is summarized in the following table.
For Practice: For our Human versus Pet Drug Cost data, calculate Pearson's correlation
coefficient.
To get SPSS to generate Pearson’s correlation coefficient:
Correlations
                              Human    Pet
Human  Pearson Correlation    1        .952**
       N                      7        7
Pet    Pearson Correlation    .952**   1
       N                      7        7
For Practice: For the Human versus Pet Drug Cost example, determine whether there is
significant positive correlation between the independent and dependent variables at the 10%
level of significance. Assume all the required assumptions hold.
Research question: Is there a true correlation between the Human drug cost and Pet drug cost?
Hypothesis to be tested:
H0: ρ = 0
Ha: ρ ≠ 0
Note: ρ represents the population correlation
Hypothesis Test to be used: Test for Significant Correlation.
Assumptions required to implement the test: the variables x and y are linearly related; each
pair was randomly selected; and the variables must have a bivariate normal distribution.
We were told to assume all required assumptions hold.
The Significance Level: α= 0.10
The Test Statistic and corresponding p-value:
Correlations
                              Human    Pet
Human  Pearson Correlation    1        .952**
       N                      7        7
Pet    Pearson Correlation    .952**   1
       N                      7        7
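The test statistic behind the significance of r can be recovered from the reported correlation and sample size alone. A minimal sketch, assuming the standard statistic t = r√(n - 2)/√(1 - r²) with n - 2 degrees of freedom, and a critical value taken from a t table:

```python
import math

r = 0.952   # Pearson correlation from the Correlations table
n = 7       # number of drug-cost pairs

df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Two-tailed test at alpha = 0.10: upper 5% point of t with df = 5.
T_CRIT = 2.015
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Since r is rounded to three decimals in the output, the t value here is approximate.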
For Practice: For our Human versus Pet Drug Cost data, calculate Spearman's Rank Correlation
Coefficient.
To get SPSS to generate Spearman’s Rank Correlation Coefficient:
Human Pet
N 7 7
N 7 7
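Spearman's coefficient is Pearson's correlation applied to the ranks of the data. Because the drug cost table is not reproduced in this section, the sketch below uses hypothetical values (human_cost and pet_cost are invented for illustration only) and assigns average ranks to ties.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical costs, NOT the data from the handout.
human_cost = [10, 20, 30, 40, 50, 60, 70]
pet_cost = [8, 15, 12, 33, 41, 50, 62]

rho = spearman_rho(human_cost, pet_cost)
print(f"Spearman's rho = {rho:.3f}")
```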
Least-Squares Regression
For Practice: Blood pressure and age were measured for female patients. The patients
were then grouped by age and, for each of the age groups, the median blood pressure
measurement was computed. The data are summarized below:
Draw a scatter diagram, calculate Pearson’s correlation coefficient, and calculate a least-squares
regression line for the above data.
To get SPSS to generate a scatter plot:
Correlations
                                  Age_group    median_BP
Age_group  Pearson Correlation    1            .997**
           N                      5            5
median_BP  Pearson Correlation    .997**       1
           N                      5            5
Coefficients(a)
Unstandardized Coefficients    Standardized Coefficients
Inferences concerning an expected response
ANOVA(b)
Total: Sum of Squares 1647.200, df 4
Descriptive Statistics
For Practice: For the Blood Pressure and Age data in the previous example, determine the
coefficient of determination.
To calculate the coefficient of determination you need the ANOVA table. To generate it, follow the
steps described above.
ANOVA(b)
Total: Sum of Squares 1647.200, df 4
a. Predictors: (Constant), Age_group
b. Dependent Variable: median_BP
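With a single predictor, the coefficient of determination equals the square of Pearson's r, so it can be obtained from values already shown in this section. The sketch below uses the reported r = .997 and the total sum of squares from the ANOVA table; the implied regression and residual sums of squares are approximate because r is rounded.

```python
r = 0.997             # Pearson correlation between Age_group and median_BP
ss_total = 1647.200   # Total sum of squares from the ANOVA table

r_squared = r ** 2                     # coefficient of determination
ss_regression = r_squared * ss_total   # variation explained by the line
ss_residual = ss_total - ss_regression # variation left unexplained

print(f"R squared        = {r_squared:.3f}")
print(f"Regression SS    ~ {ss_regression:.1f}")
print(f"Residual SS      ~ {ss_residual:.1f}")
```

Read this way, roughly 99.4% of the variation in median blood pressure is explained by the linear relationship with age group.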
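For Practice: For the Blood Pressure and Age data, calculate the least-squares line and the
quantities that determine it.
The least-squares line is
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
where the slope is determined by:
$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
and the residual sum of squares is determined by:
$\sum (y_i - \hat{y}_i)^2$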
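The least-squares calculation can be sketched in a few lines. The age and blood pressure table is not reproduced in this section, so the x and y values below are hypothetical and serve only to show the arithmetic; the intercept uses the standard relation, intercept = ȳ - slope × x̄.

```python
# Hypothetical data, NOT the handout's age / blood pressure values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum of cross-products over sum of squared x deviations.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual sum of squares around the fitted line.
fitted = [intercept + slope * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

print(f"y-hat = {intercept:.2f} + {slope:.2f} x")
print(f"Residual sum of squares = {sse:.2f}")
```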