Measures of Association: Lesson 1 Data Analysis
Learning Objectives:
Introduction
Another major type of data analysis is concerned with determining the extent to which variables are related or associated with each other. The statistical tests available depend upon the type of data gathered for the research. Some require interval data, as in the Pearson Product-Moment Correlation (r). When data are ordinal or in ranked form, the Spearman Rank-Order Correlation, or rho, is the more appropriate formula for determining the relationship between two variables.
Correlation Analysis
Figure 1. Hypothetical scatter diagrams indicating some forms of relationship between X and Y.
If the scatter diagram indicates the presence of linear association between two variables,
the extent of their association can be measured. The most commonly used statistical measure is
the Pearson Product-Moment Correlation Coefficient, denoted by ρ and estimated by r.
This coefficient can take any value from -1 to +1, and it is a dimensionless (unitless) quantity. A positive value indicates a positive slope (a direct relationship), while a negative value indicates a negative slope (an inverse relationship).
If we ignore the sign, the size of the coefficient indicates how close the data is to a
straight line. The closer the size is to one, the more closely the data follow the straight line. A
correlation equal to +1, indicates that the points lie exactly on a line with a positive slope; a -1
correlation indicates that the points lie on a line with a negative slope. A correlation close to 0
indicates that the points are more scattered.
The square of the correlation coefficient (r) is the coefficient of determination (r2).
In the example below, the correlation coefficient of 0.95 indicates that (0.95)²(100) = 90.25% of the variation in variable Y is due to the linear relationship between X and Y, while the remaining 9.75%, called error, is due to other unexplained factors.
r = [n∑XY − (∑X)(∑Y)] / √{[n∑X² − (∑X)²][n∑Y² − (∑Y)²]}
X        Y        X²        Y²         XY
3        9        9         81         27
4        13       16        169        52
5        15       25        225        75
6        16       36        256        96
7        17       49        289        119
∑X = 25  ∑Y = 70  ∑X² = 135 ∑Y² = 1020 ∑XY = 369
Substituting the Values
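With n = 5 and the column totals above, the arithmetic behind the r = 0.95 used in the rest of this example is:

r = [5(369) − (25)(70)] / √{[5(135) − (25)²][5(1020) − (70)²]}
  = (1845 − 1750) / √[(675 − 625)(5100 − 4900)]
  = 95 / √[(50)(200)]
  = 95 / 100
  = 0.95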
In linear correlation analysis, it is important to know the extent to which the variation in the dependent variable is related to the variation in the independent variable, so that we may see the need for examining other related variables with such statistical tools as multiple or partial correlation.
The significance of the Pearson r is tested with the t-test. It is used to verify whether the value of the computed coefficient of correlation is significant, that is, whether it represents a real correlation or is merely brought about by chance.
t = r√[(n − 2) / (1 − r²)]

where n = number of samples (pairs of observations)
t = r√[(n − 2) / (1 − r²)]
  = 0.95√[(5 − 2) / (1 − (0.95)²)]
  = 0.95√[3 / (1 − 0.9025)]
  = 0.95√(3 / 0.0975)
  = 0.95(5.547)
  = 5.27
This value, with n − 2 = 3 degrees of freedom, is significant at the 5 percent level (t.05 = 3.182, df = 3). Therefore, a significant relationship exists between X and Y at the .05 level of significance.
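As a cross-check outside SPSS, a minimal Python sketch (assuming SciPy is installed; this is not part of the module's SPSS procedure) reproduces the same coefficient and two-tailed significance:

    # Verify the worked Pearson r example with SciPy
    from scipy import stats

    x = [3, 4, 5, 6, 7]
    y = [9, 13, 15, 16, 17]

    r, p = stats.pearsonr(x, y)
    print(f"r = {r:.3f}, p = {p:.3f}")  # r = 0.950, p = 0.013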
93
Pearson correlation in SPSS: From the Data Editor window, select Analyze, Correlate, Bivariate... Select the variables for which we want to calculate the Pearson r correlation and drag them to the right side of the variable list. Select "Pearson" from the window. Click on the "OK" button.
The result window will show the result of the Pearson correlation with the same set of
values used with manual computations.
Correlations
                           X        Y
X   Pearson Correlation    1.000    .950*
    Sig. (2-tailed)                 .013
    N                      5        5
Y   Pearson Correlation    .950*    1.000
    Sig. (2-tailed)        .013
    N                      5        5
*Correlation is significant at the 0.05 level (2-tailed)
Spearman Rank-Order Correlation (rho)

Computational procedure:
1. Record the rank order of the first distribution giving the highest score a rank of 1.
2. Do the same with the second distribution.
3. Get the difference between these two ranks for each individual and record in the D
column.
4. Square each of these differences and sum this column.
5. Apply the values in the formula:
rho = 1 − (6∑D²) / [n(n² − 1)]
Example:
A committee of five men and five women in a government agency evaluated ten
applicants for a vacant position in the office. They were asked to rate each applicant on the basis
of his/her qualifications and interview. The rankings of the applicants by the men and women are given in the following table:
Applicant 1 2 3 4 5 6 7 8 9 10
Men(X) 6 2 1 4 9 10 5 3 8 7
Women(Y) 5 8 4 6 3 2 7 10 1 9
Solution:
Applicant   Rx    Ry     D     D²
1           6     5      1     1
2           2     8     −6     36
3           1     4     −3     9
4           4     6     −2     4
5           9     3      6     36
6           10    2      8     64
7           5     7     −2     4
8           3     10    −7     49
9           8     1      7     49
10          7     9     −2     4
                             ∑D² = 256
Substituting the values:

rho = 1 − (6∑D²) / [n(n² − 1)]
    = 1 − 6(256) / [10(10² − 1)]
    = 1 − 1536 / [10(99)]
    = 1 − 1536 / 990
    = 1 − 1.55
    = −0.55

Note that the coefficient is negative: the men's and women's rankings are inversely related.
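A minimal Python sketch (assuming SciPy is installed) gives the same coefficient directly from the two sets of ranks:

    # Verify the worked Spearman rho example with SciPy
    from scipy import stats

    men   = [6, 2, 1, 4, 9, 10, 5, 3, 8, 7]
    women = [5, 8, 4, 6, 3, 2, 7, 10, 1, 9]

    rho, p = stats.spearmanr(men, women)
    print(f"rho = {rho:.2f}")  # rho = -0.55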
Spearman rank correlation in SPSS: To run the correlation in SPSS, select Analyze,
Correlate, Bivariate…from the Data Editor window.
The Bivariate Correlations window will appear after clicking on the Bivariate option. Select the correlation variables and move them to the right-hand side of the variable list. Then select "Spearman" from the window and click on the "OK" button.
The following table will show the Spearman rank correlation from the result window
with the same set of values used with manual computations.
Regression Analysis

Regression analysis uses the known values of one or more variables, called the independent or predictor variables, to estimate the unknown value of a quantitative variable called the dependent or response variable.
Suppose we want to find out whether the values of a variable Y depend on the values of another variable X, where Y is the dependent variable and X is the independent variable. The first step is to collect a random sample of n pairs of values of X and Y, namely (X₁, Y₁), (X₂, Y₂), ..., (Xn, Yn), and then plot these values on the X-Y plane. The plot of the basic data on the X-Y plane is called the scatter diagram. It gives us an idea of the possible relationship between X and Y. Figure 2 shows some forms of the scatter diagram. Scatter diagrams (a) and (b) show a linear relationship between X and Y, while (c) exhibits no relation at all and (d) shows a curved relation.
[Figure 2: Scatter diagrams; panels (c) no relationship and (d) nonlinear (quadratic) relationship]
The next step in regression analysis is to find a suitable function that expresses the
predicted value of Y given a value of X. The general pattern presented by the points on the
scatter diagram gives us a clue to finding the appropriate regression function. Figure 3 shows that
a linear function seems to fit the pattern of the data points.
Ŷ = a + bX
where a is called the Y-intercept of the line (the value of Ŷ when X is equal to zero), and b is the slope of the line, called the regression coefficient (the rate of change of Ŷ per unit change in X). An illustration is given in Figure 3 with b > 0.
Figure 3: Graph of Y = a + bX when b>0
The line that describes the statistical relationship between X and Y is called the
regression line.
The regression line gives an estimate of the mean value of Y, denoted by Ŷ, given the value of X. Hence the equation of the regression line is

Ŷ = a + bX

Ŷ should be distinguished from the observed data values, which we denote simply by Y. For a specific X, the resulting Ŷ is a predicted value of the dependent variable.
byx = [∑XY − (∑X)(∑Y)/N] / [∑X² − (∑X)²/N]
ayx = Ȳ − byx X̄
Data: Scores of 15 students on two entrance tests, Test 1 in English and Test 2 in Filipino.
∑X = 270,  ∑Y = 125,  ∑XY = 2740,  ∑X² = 5992,  ∑Y² = 1307,  n = 15
X̄ = ∑X/N = 270/15 = 18
Ȳ = ∑Y/N = 125/15 = 8.33
byx = [∑XY − (∑X)(∑Y)/N] / [∑X² − (∑X)²/N]
    = [2740 − (270)(125)/15] / [5992 − (270)²/15]
    = (2740 − 2250) / (5992 − 4860)
    = 490 / 1132
    = 0.43286
ayx = Ȳ − byx X̄
    = 8.33 − (0.43286)(18)
    = 8.33 − 7.7915
    = 0.5385
Ŷ = a + bX
Ŷ = 0.5385 + 0.43286X
We may now use this equation to predict a Test 2 score given a Test 1 score. For example, for a Test 1 score of 30:

Ŷ = 0.5385 + 0.43286(30) = 13.52 ≈ 13.5

The proper interpretation of 13.5 as a predicted value is that, on average, a student obtains a score of 13.5 in Test 2 if his or her score in Test 1 is 30.
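A minimal pure-Python sketch reproduces these coefficients from the summary statistics alone; the small differences from the manual figures come from rounding Ȳ to 8.33 in the hand computation:

    # Least-squares slope and intercept from the summary sums above
    n, sum_x, sum_y, sum_xy, sum_x2 = 15, 270, 125, 2740, 5992

    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x**2 / n)  # 490/1132 = 0.43286
    a = sum_y / n - b * sum_x / n                               # 0.5418 (0.5385 with rounded means)

    print(f"Y-hat = {a:.4f} + {b:.5f} X")
    print(f"Predicted Test 2 score when Test 1 = 30: {a + b * 30:.1f}")  # 13.5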
Simple Linear Regression in SPSS. To run simple linear regression in SPSS, select Analyze,
Regression, Linear... from the data editor window.
Then, specify Test 2 in the Dependent variable box and Test 1 in the Independent variable box.
*The difference between the manual computation and the SPSS output is due to rounding errors.
Chi-square (χ²)
The chi-square statistic (from the Greek letter chi, χ²) is the most commonly used method of comparing proportions. It is particularly useful in tests evaluating a relationship between nominal or ordinal data. Typical situations or settings are cases where persons, events, or objects are grouped in two or more nominal categories such as "Yes - No" responses, "Favor - Against - Undecided" responses, or classes "A, B, C, or D".
Chi-square analysis compares the observed frequencies of the responses with the
expected frequencies. It is given by the formula:
χ² = ∑ (Fo − Fe)² / Fe
Chi-square and Goodness of Fit: One Sample Case
There are research problems where responses fall in any one of a number of categories.
The data are expressed in frequencies, and the observed frequencies (Fo) are compared to the expected frequencies (Fe) on the basis of some hypothesis.
If the differences between the observed and the expected frequencies are small, χ² will be small. The greater the difference between the observed and expected frequencies under the null hypothesis, the larger χ² will be. If the differences between the observed and expected values are collectively so large that they would occur by chance with a probability of, say, 0.05 or less when the null hypothesis is true, then the null hypothesis is rejected.
Illustration:
Consider the nomination of three (3) presidential candidates of a political party: A, B, and C. The chairman wonders whether or not they will be equally popular among the members of the party. To test the hypothesis of equal preference, a random sample of 315 men was selected and interviewed on which one of the three candidates they prefer. The following are the results of the survey:
Candidates Frequency
A 98
B 115
C 102
Are you going to reject the null hypothesis that equal numbers of men in the party prefer each of the three candidates? Or are you going to accept the null hypothesis of equality of preference?
Candidate   Fo     Fe     Fo − Fe   (Fo − Fe)²   (Fo − Fe)²/Fe
A           98     105    −7        49           49/105 = 0.467
B           115    105    10        100          100/105 = 0.952
C           102    105    −3        9            9/105 = 0.086
χ² = ∑ (Fo − Fe)² / Fe = 0.467 + 0.952 + 0.086 = 1.505
In order to test the significance of the computed χ² value at a specified criterion of significance, the obtained value is referred to a table with the appropriate degrees of freedom, which equal k − 1, where k is the number of categories of the variable. In this problem, df = 3 − 1 = 2. Therefore, for the χ² to be significant at the 0.05 level, the computed value should be greater than (>) the tabular value, which is 5.991.
Summarizing: df = k − 1 = 3 − 1 = 2
Conclusion:
Since 1.505 < 5.991, do not reject Ho. There is not sufficient evidence to reject the null hypothesis that the frequencies in the population are equal.
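A minimal Python sketch (assuming SciPy) reproduces this goodness-of-fit test; when no expected frequencies are supplied, scipy.stats.chisquare assumes equal frequencies, which is exactly the hypothesis being tested:

    # One-sample chi-square goodness-of-fit test
    from scipy import stats

    observed = [98, 115, 102]            # candidates A, B, C
    chi2, p = stats.chisquare(observed)  # expected defaults to 315/3 = 105 each
    print(f"chi-square = {chi2:.3f}, p = {p:.3f}")  # 1.505, p = 0.471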
Chi-square and Goodness of Fit: One Sample Case in SPSS: From the Data Editor window, the data are entered in SPSS as shown below.
The following table shows the chi-square goodness-of-fit output from the result window:

Votes
         Observed N    Expected N    Residual
98       98            105.0         −7.0
102      102           105.0         −3.0
115      115           105.0         10.0
Total    315
Test Statistics
               Votes
Chi-square     1.505(a)
df             2
Asymp. Sig.    0.471
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 105.0.
Chi-square as a Test of Independence: Two Variable Problems
Chi-square can also be used to test the significance of the relationship between two variables when data are expressed in terms of frequencies of joint occurrence. For this kind of problem, a two-way contingency table with rows and columns is set up.
In the case of two-variable problems, the expected frequencies are those predicated on the independence of the two variables. The formula is

Fe = (row total)(column total) / N
Example:
Suppose one wants to know if there is a relationship between gender and school choice. A sample of 100 female and 100 male freshman students were asked individually for their school choice. Test the null hypothesis of no significant relationship between the students' gender and school choice at the 5% level of significance.
                  Gender
School choice     Female      Male        Total
Public            42 (C1)     65 (C3)     107
Private           58 (C2)     35 (C4)     93
Total             100         100         200
Computational Procedure:
C1 = (100 × 107) / 200 = 53.5
C2 = (100 × 93) / 200 = 46.5
C3 = (100 × 107) / 200 = 53.5
C4 = (100 × 93) / 200 = 46.5
χ² = ∑ (fo − fe)² / fe
   = (42 − 53.5)²/53.5 + (58 − 46.5)²/46.5 + (65 − 53.5)²/53.5 + (35 − 46.5)²/46.5
   = 2.472 + 2.844 + 2.472 + 2.844
   = 10.63

Therefore χ² = 10.63.
df = (r − 1)(c − 1), where r = number of rows and c = number of columns
df = (2 − 1)(2 − 1) = (1)(1) = 1
Critical value: CV = 3.841
4. Decision Rule
Since the computed value χ² = 10.63 is greater than the critical value (3.841), we reject the null hypothesis. Thus, the two variables of gender and school choice are related: females tend to prefer private schools while males tend to prefer to study in public schools.
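A minimal Python sketch (assuming SciPy) reproduces the test of independence from the 2x2 table; correction=False requests the uncorrected Pearson chi-square that matches the manual computation:

    # Chi-square test of independence for the gender x school-choice table
    from scipy import stats

    table = [[42, 65],   # Public:  Female, Male
             [58, 35]]   # Private: Female, Male

    chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
    print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")  # 10.632, df = 1, p = 0.001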
Pearson Chi-Square in SPSS: Select Analyze, Descriptive Statistics, Crosstabs... from the Data Editor window.
As you click on Crosstabs, the following window will appear. From this window, select the row variable and move it into the Row(s) box, then select the column variable and move it into the Column(s) box.
The cross tabulation table for the Chi-Square test of Independence will show the expected
value for two nominal variables.
                           Gender
                           Female    Male    Total
School choice   Public     42        65      107
                Private    58        35      93
Total                      100       100     200
The Chi-Square Tests table shows the Pearson chi-square value together with its associated significance value.
Chi-Square Tests
                               Value       df    Asymp. Sig.   Exact Sig.   Exact Sig.
                                                 (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             10.632(a)   1     0.001
Continuity Correction(b)       9.728       1     0.002
Likelihood Ratio               10.73       1     0.001
Fisher's Exact Test                                            0.002        0.001
Linear-by-Linear Association   10.579      1     0.001
N of Valid Cases               200
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 46.50.
b. Computed only for a 2x2 table
CHI-SQUARE TEST
Example
A researcher wants to determine whether gender is related to the educational attainment of the respondents. To investigate the issue, the researcher randomly selected 25 teachers and noted their gender and educational attainment. Gender is coded 1 for male and 2 for female, and Educational Attainment is coded 1 for BA/BS Degree Holder, 2 for With Units in MA/MS, and 3 for MA/MS Degree Holder. The data are shown in Table 1.
Case   Gender   EducAttainment
…
19     2        3
20     2        2
21     2        2
22     2        2
23     2        3
24     2        3
25     2        3
Click on Analyze.
Click on Descriptive Statistics.
Click on Crosstabs.
A dialog box will open, with a list of your variables in the white rectangle on the left.
Click on the independent variable, and then click on the small right arrow to the left of
the white box headed Column(s). This will move the independent variable into the
Column(s) box (in this example “EducAttainment”).
Click on the dependent variable and then click on the small right arrow to the left of the
white box headed Row(s) (see figure 1.2). This will move the dependent variable into
Row(s) box. In this example the dependent variable is “Gender”.
Figure 1.2 Entering variables in Crosstabs
Click on Statistics.
Click on the chi-square box. A tick will appear there.
Click on Continue.
Click on Cells.
In the percentages area click on Column. A tick will appear.
Click on Continue.
Click on OK.
An output window will open with the results of the analysis as follows.
Gender * EducAttainment Crosstabulation
                                        EducAttainment               Total
                                        1.00     2.00     3.00
Gender  1.00   Count                    4        2        3         9
               % within EducAttainment  80.0%    20.0%    30.0%     36.0%
        2.00   Count                    1        8        7         16
               % within EducAttainment  20.0%    80.0%    70.0%     64.0%
Total          Count                    5        10       10        25
               % within EducAttainment  100.0%   100.0%   100.0%    100.0%
Chi-square was used to determine whether the gender of the teachers is significantly related to educational attainment. Since the computed p-value (0.065), with a chi-square value of 5.469, is greater than the assigned level of significance (0.05), the hypothesis that there is no significant relationship between gender and educational attainment of the teachers is thus accepted. A non-significant chi-square indicates that, in the population from which we sampled, the data in each row are not significantly related to the data in each column.
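A minimal Python sketch (assuming SciPy) reproduces this result from the counts in the crosstabulation above (no continuity correction is applied to tables larger than 2x2):

    # Chi-square test of independence for the gender x attainment table
    from scipy import stats

    table = [[4, 2, 3],   # Gender 1 across EducAttainment 1, 2, 3
             [1, 8, 7]]   # Gender 2 across EducAttainment 1, 2, 3

    chi2, p, df, expected = stats.chi2_contingency(table)
    print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")  # 5.469, df = 2, p = 0.065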
SPSS APPLICATION – t-TEST FOR INDEPENDENT SAMPLES
Example
Figure 2.1 Independent samples t-test data entered in SPSS Data Editor
Click on Analyze
Click on Compare means
Click on Independent-Samples T Test
A dialog box will open with a list of your variables in the white rectangle on the left hand
side. Click on the dependent variable, and then click on the small black arrow to the left
of the white box headed Test Variable(s). This will place the dependent variable (in this
case Teaching Performance) in the box.
Click on the independent variable and then click on the small black arrow to the left of
the white box headed Grouping Variable. This will place the independent variable (in
this case Gender) in the box (see Figure 2.2)
Figure 2.2 Entering the test and grouping variables
Figure 2.3 Entering the levels of the independent variable
Click on Continue
Click on OK.
The output will be as follows
Group Statistics
Note the following in the Independent Samples Test output: the significance of Levene's Test for equality of variances (if this value is less than 0.05, use the t-value, df, and significance level given for equal variances not assumed); the t-value; the degrees of freedom; and the significance level, which is the probability of a difference in the means as great as the one here if the null hypothesis is true. This value is compared with the assigned level of significance to judge statistical significance.
Referring to the Independent Samples Test table again, we can see the results of the t-test. The t-value is 8.253, there are 23 degrees of freedom, and the (two-tailed) significance level is 0.000. The difference between the two groups' means is significant in this case because 0.000 is less than the assigned level of significance (0.05).
The mean teaching performance was 93.55 (SD = 3.01) for males and 83.36 (SD = 3.10) for females. The teaching performance of males and females was compared using an independent samples t-test. There was a significant difference between the teaching performance of males and females, t(23) = 8.253, p-value = 0.000.
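A minimal Python sketch (assuming SciPy) shows the equivalent test. The raw scores are not reproduced in the text, so the two lists below are hypothetical placeholders standing in for the male and female teaching-performance scores:

    # Independent samples t-test (equal variances assumed by default)
    from scipy import stats

    males   = [93, 95, 91, 96, 92, 94, 97, 93, 95, 90, 94, 96]      # hypothetical scores
    females = [83, 85, 81, 86, 82, 84, 87, 83, 85, 80, 84, 86, 82]  # hypothetical scores

    t, p = stats.ttest_ind(males, females)
    print(f"t({len(males) + len(females) - 2}) = {t:.3f}, p = {p:.4f}")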
PAIRED SAMPLE t-TEST
Case   Teaching Performance Before Training   Teaching Performance After Training
1 85 95
2 84 98
3 86 97
4 87 92
5 89 96
6 82 93
7 80 94
8 84 95
9 86 90
10 82 82
11 89 97
12 87 98
13 82 95
14 81 95
15 86 92
16 89 91
17 89 94
18 84 95
19 85 96
20 88 97
21 81 99
22 80 98
23 86 99
24 87 91
25 89 98
Step 1. Enter your data in the SPSS Data Editor.
Figure 3.1 Paired samples t-test data entered in SPSS Data Editor
Click on Analyze
Click on Compare Means
Click on Paired Samples T Test
A dialog box will appear. Click on both of the variables you want to include in the analysis, and then click on the small black arrow to the left of the white box headed Paired Variables. The variables will move into the white box, as shown in Figure 3.2.
Click OK.
Figure 3.2 Entering variables in a paired sample t-test
The output will be as follows
Paired Samples Statistics
                                      Mean    N    Std. Deviation   Std. Error Mean
Pair 1  before_TeachingPerformance    85.12   25   3.00444          0.60089
        after_TeachingPerformance     94.68   25   3.70495          0.74099
Paired Samples Test
The mean Teaching Performance was 85.12 (SD = 3.00) before the training course and 94.68 (SD = 3.70) after the training course. The difference between the mean levels of Teaching Performance before and after the training course was examined with a paired samples t-test. Teaching Performance was significantly higher after the training course than before it, t(24) = 10.347, p-value < 0.05.
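A minimal Python sketch (assuming SciPy) reproduces the paired test from the before/after scores tabulated above:

    # Paired (dependent) samples t-test on the training data
    from scipy import stats

    before = [85, 84, 86, 87, 89, 82, 80, 84, 86, 82, 89, 87, 82,
              81, 86, 89, 89, 84, 85, 88, 81, 80, 86, 87, 89]
    after  = [95, 98, 97, 92, 96, 93, 94, 95, 90, 82, 97, 98, 95,
              95, 92, 91, 94, 95, 96, 97, 99, 98, 99, 91, 98]

    t, p = stats.ttest_rel(after, before)
    print(f"t({len(before) - 1}) = {t:.3f}, p = {p:.6f}")  # t(24) = 10.35, p < 0.05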
Reading Assignment:
E-Journals/E-books:
PUP website: infotrac.galegroup.com/itweb/pup
Password: powersearch
Exercises/Written Assignments:
2. The scores on an aptitude test of ten students and the number of hours they reviewed for the test are as follows:

Number of Hours (x)   18   27   20   10   30   24   32   27   12   16
Scores (y)            68   82   77   90   78   72   94   88   60   70
3. Table 1 gives the ranks of ten applicants for a teaching position on intelligence and emotional quotients as shown by their psychological test results.
4. In the following given data, x = number of sessions attended by 15 trainees in a leadership training seminar, while y = scores obtained by the same trainees in a test given after the seminar.
x 3 2 4 5 5 6 6 7 9 7 8 5 6 3 8
y 65 50 75 70 80 85 79 88 91 87 88 70 71 63 85
References/Bibliography
Downie, N. M., & Heath, R. W. (1984). Basic Statistical Methods (5th ed.).
Field, M. N., Ryan, J. M., & Hess, R. K. (1991). Handbook of Statistical Procedures and Their Computer Applications to Education and the Behavioural Sciences. New York: Macmillan Publishing Co.
Ostle, B. (1988). Statistics in Research: Basic Concepts and Techniques for Research Workers. Ames: Iowa State University Press.