Hypothesis
Session 6
Internet Usage Data
Columns: Respondent Number, Gender, Familiarity, Internet Usage, Attitude Toward Internet, Attitude Toward Technology, Usage of Internet (Shopping), Usage of Internet (Banking)
1 1.00 7.00 14.00 7.00 6.00 1.00 1.00
2 2.00 2.00 2.00 3.00 3.00 2.00 2.00
3 2.00 3.00 3.00 4.00 3.00 1.00 2.00
4 2.00 3.00 3.00 7.00 5.00 1.00 2.00
5 1.00 7.00 13.00 7.00 7.00 1.00 1.00
6 2.00 4.00 6.00 5.00 4.00 1.00 2.00
7 2.00 2.00 2.00 4.00 5.00 2.00 2.00
8 2.00 3.00 6.00 5.00 4.00 2.00 2.00
9 2.00 3.00 6.00 6.00 4.00 1.00 2.00
10 1.00 9.00 15.00 7.00 6.00 1.00 2.00
11 2.00 4.00 3.00 4.00 3.00 2.00 2.00
12 2.00 5.00 4.00 6.00 4.00 2.00 2.00
13 1.00 6.00 9.00 6.00 5.00 2.00 1.00
14 1.00 6.00 8.00 3.00 2.00 2.00 2.00
15 1.00 6.00 5.00 5.00 4.00 1.00 2.00
16 2.00 4.00 3.00 4.00 3.00 2.00 2.00
17 1.00 6.00 9.00 5.00 3.00 1.00 1.00
18 1.00 4.00 4.00 5.00 4.00 1.00 2.00
19 1.00 7.00 14.00 6.00 6.00 1.00 1.00
20 2.00 6.00 6.00 6.00 4.00 2.00 2.00
21 1.00 6.00 9.00 4.00 2.00 2.00 2.00
22 1.00 5.00 5.00 5.00 4.00 2.00 1.00
23 2.00 3.00 2.00 4.00 2.00 2.00 2.00
24 1.00 7.00 15.00 6.00 6.00 1.00 1.00
25 2.00 6.00 6.00 5.00 3.00 1.00 2.00
26 1.00 6.00 13.00 6.00 6.00 1.00 1.00
27 2.00 5.00 4.00 5.00 5.00 1.00 1.00
28 2.00 4.00 2.00 3.00 2.00 2.00 2.00
29 1.00 4.00 4.00 5.00 3.00 1.00 2.00
30 1.00 3.00 3.00 7.00 5.00 1.00 2.00
Frequency Distribution
• In a frequency distribution, one variable is
considered at a time.
Columns: Value label, Value, Frequency (N), Percentage, Valid percentage, Cumulative percentage
[Figure: frequency histogram of Familiarity; x-axis: Familiarity (2 to 7), y-axis: Frequency]
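A minimal sketch of how the frequency table for Familiarity could be built from the data above. It assumes that the value 9 is a missing-value code (the rating is on a 7-point scale, and the later t test uses n = 29); the names `MISSING` and `rows` are illustrative only.

```python
from collections import Counter

# Familiarity column from the Internet Usage Data table (respondents 1-30).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

MISSING = 9  # assumption: 9 is a missing-value code, not a real rating

counts = Counter(familiarity)
n_total = len(familiarity)
n_valid = n_total - counts[MISSING]

rows = []
cumulative = 0.0
for value in sorted(v for v in counts if v != MISSING):
    freq = counts[value]
    pct = 100 * freq / n_total        # percentage of all cases
    valid_pct = 100 * freq / n_valid  # percentage of non-missing cases
    cumulative += valid_pct           # running total of valid percentages
    rows.append((value, freq, round(pct, 1), round(valid_pct, 1), round(cumulative, 1)))

print("Value", "Freq", "Pct", "ValidPct", "CumPct", sep="\t")
for row in rows:
    print(*row, sep="\t")
print("Missing", counts[MISSING], round(100 * counts[MISSING] / n_total, 1), sep="\t")
```

Only one variable is tabulated, which is what distinguishes a frequency distribution from a cross-tabulation.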
Statistics Associated with Frequency Distribution
Measures of Location
• Mean
• Mode
• Median
Measures of Variation
• Range
• Variance
• Standard deviation
• Coefficient of variation
Measures of Shape
• Skewness
• Kurtosis
Statistics Associated with Frequency
Distribution Measures of Location
• The mean, or average value, is the most commonly used
measure of central tendency. The mean, $\bar{X}$, is given by
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
where
$X_i$ = observed values of the variable $X$
$n$ = number of observations (sample size)
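As an illustration, the measures of location can be computed for the Familiarity ratings in the data table. This is a sketch that assumes the value 9 is a missing-value code and is excluded, which leaves 29 observations; the sample standard deviation is included because it is reused in the one-sample t test.

```python
from statistics import median, mode, stdev

# Familiarity column from the Internet Usage Data table (respondents 1-30).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

# Assumption: 9 marks a missing response on the 7-point scale.
valid = [x for x in familiarity if x != 9]

n = len(valid)
x_bar = sum(valid) / n   # mean: sum of X_i divided by n
x_median = median(valid) # middle value of the sorted ratings
x_mode = mode(valid)     # most frequent rating
s = stdev(valid)         # sample standard deviation (n - 1 in the denominator)

print(f"n = {n}, mean = {x_bar:.3f}, median = {x_median}, mode = {x_mode}, s = {s:.3f}")
```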
[Figure: symmetric distribution versus skewed distribution, with the mean, median, and mode marked]
$$z = \frac{p - \pi}{\sigma_p}$$
where
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$
Type I and Type II Errors
[Figure: standard normal curve for z = 1.88; shaded area = 0.9699, unshaded area = 0.0301]
A General Procedure for Hypothesis Testing
Step 4: Collect Data and Calculate Test Statistic
• The required data are collected and the value of
the test statistic computed.
• In our example, the value of the sample
proportion is
p = 17/30 = 0.567.
• The value of $\sigma_p$ can be determined as follows:
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{(0.40)(0.6)}{30}} = 0.089$$
A General Procedure for Hypothesis Testing
Step 4: Collect Data and Calculate Test Statistic
$$z = \frac{p - \pi}{\sigma_p} = \frac{0.567 - 0.40}{0.089} = 1.88$$
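The Step 4 arithmetic can be sketched in a few lines. The variable names are illustrative. Note that carrying full precision gives z of about 1.86; the 1.88 on the slide comes from rounding p to 0.567 and the standard error to 0.089 before dividing.

```python
from math import sqrt

n = 30          # sample size
successes = 17  # respondents counted in the sample proportion (17/30)
pi_0 = 0.40     # proportion stated under the null hypothesis

p = successes / n                        # sample proportion
sigma_p = sqrt(pi_0 * (1 - pi_0) / n)    # standard error under H0
z = (p - pi_0) / sigma_p                 # test statistic

# Same calculation with the rounded intermediate values used on the slide.
z_rounded = (round(p, 3) - pi_0) / round(sigma_p, 3)

print(f"p = {p:.3f}, sigma_p = {sigma_p:.3f}, z = {z:.2f}, z (rounded inputs) = {z_rounded:.2f}")
```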
A General Procedure for Hypothesis Testing
Step 5: Determine the Probability
(Critical Value)
• Using standard normal tables (Table 2 of the Statistical
Appendix), the probability of obtaining a z value of 1.88
can be calculated (see Figure 15.5).
• The shaded area between $-\infty$ and 1.88 is 0.9699.
Therefore, the area to the right of z = 1.88 is 1.0000 -
0.9699 = 0.0301.
• Alternatively, the critical value of z, which will give an area
to the right side of the critical value of 0.05, is between
1.64 and 1.65 and equals 1.645.
• Note that in determining the critical value of the test statistic,
the area to the right of the critical value is either $\alpha$ or $\alpha/2$.
It is $\alpha$ for a one-tail test and
$\alpha/2$ for a two-tail test.
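Instead of reading the normal table, the same areas and critical values can be obtained from the standard normal distribution in Python's standard library. This is a sketch; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

z = 1.88
area_left = std_normal.cdf(z)  # area between minus infinity and z
area_right = 1 - area_left     # upper-tail probability

alpha = 0.05
critical_one_tail = std_normal.inv_cdf(1 - alpha)      # area alpha to the right
critical_two_tail = std_normal.inv_cdf(1 - alpha / 2)  # area alpha/2 to the right

print(f"left area = {area_left:.4f}, right area = {area_right:.4f}")
print(f"one-tail critical z = {critical_one_tail:.3f}, two-tail critical z = {critical_two_tail:.3f}")
```

Since the upper-tail probability (about 0.03) is below 0.05, or equivalently 1.88 exceeds 1.645, the one-tail test rejects the null hypothesis.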
A Broad Classification of Hypothesis Tests
Hypothesis Tests
• Tests of Association
• Tests of Differences: Distributions, Means, Proportions, Median/Rankings
A Classification of Hypothesis Testing Procedures for Examining Differences
[Figure: classification tree rooted at "Hypothesis Tests"]
                        Gender
Internet Usage     Male    Female    Row Total
Light (1)             5        10           15
Heavy (2)            10         5           15
Column Total         15        15
Two Variables Cross-Tabulation
• Since two variables have been cross-classified,
percentages could be computed either columnwise,
based on column totals (Table 4), or rowwise, based on
row totals (Table 5).
Gender
No 68% 79%
No 50% 50%
No 35% 35%
                                         Income
                                 Low                 High
Eat Frequently in            Family size          Family size
Fast-Food Restaurants       Small    Large       Small    Large
Yes                           65%      65%         65%      65%
No                            35%      35%         35%      35%
Column totals                100%     100%        100%     100%
Number of respondents         250      250         250      250
Statistics Associated with
Cross-Tabulation Chi-Square
• To determine whether a systematic association exists, the
probability of obtaining a value of chi-square as large or larger
than the one calculated from the cross-tabulation is estimated.
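[Figure: chi-square distribution with the critical value of $\chi^2$ marked; "Do Not Reject H0" region below the critical value, "Reject H0" region above it]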
Statistics Associated with
Cross-Tabulation Chi-Square
• The chi-square statistic ($\chi^2$) is used to test the
statistical significance of the observed
association in a cross-tabulation.
• The expected frequency for each cell can be
calculated by using a simple formula:
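$$f_e = \frac{n_r n_c}{n}$$
where $n_r$ is the total number in the row, $n_c$ is the total number in the column, and $n$ is the total sample size. For example:
$$\frac{15 \times 15}{30} = 7.50 \qquad \frac{15 \times 15}{30} = 7.50$$
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$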
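Statistics Associated with
Cross-Tabulation Chi-Square
For the data in Table 3, the value of $\chi^2$ is
calculated as:
$$\chi^2 = \frac{(5 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(5 - 7.5)^2}{7.5} = 0.833 + 0.833 + 0.833 + 0.833 = 3.333$$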
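The chi-square calculation for the Gender by Internet Usage cross-tabulation can be sketched as follows. The code works for any table of observed counts; the variable names are illustrative.

```python
# Observed frequencies: rows are Internet usage (Light, Heavy),
# columns are gender (Male, Female).
observed = [[5, 10],
            [10, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell: (row total * column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square: sum over all cells of (fo - fe)^2 / fe
chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}")
```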
Statistics Associated with
Cross-Tabulation Chi-Square
• The chi-square distribution is a skewed distribution whose
shape depends solely on the number of degrees of freedom.
As the number of degrees of freedom increases, the chi-square
distribution becomes more symmetrical.
• Table 3 in the Statistical Appendix contains upper-tail areas of
the chi-square distribution for different degrees of freedom.
For 1 degree of freedom, the probability of exceeding a
chi-square value of 3.841 is 0.05.
• For the cross-tabulation given in Table 15.3, there are
(2-1) x (2-1) = 1 degree of freedom. The calculated chi-square statistic
had a value of 3.333. Since this is less than the critical value of
3.841, the null hypothesis of no association cannot be rejected,
indicating that the association is not statistically significant at
the 0.05 level.
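For the 1-degree-of-freedom case, the critical value and tail probability can be checked without a chi-square table. The sketch below relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable; it does not generalize to more degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

chi_square = 3.333  # calculated statistic from the cross-tabulation
alpha = 0.05

# With 1 df, P(chi-square > x) equals P(|Z| > sqrt(x)) for standard normal Z.
critical_value = std_normal.inv_cdf(1 - alpha / 2) ** 2
p_value = 2 * (1 - std_normal.cdf(sqrt(chi_square)))

reject_h0 = chi_square > critical_value

print(f"critical value = {critical_value:.3f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

The tail probability exceeds 0.05, in line with the conclusion that the association is not statistically significant at that level.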
Statistics Associated with
Cross-Tabulation Phi Coefficient
• The phi coefficient ($\phi$) is used as a measure of the
strength of association in the special case of a table with
two rows and two columns (a 2 x 2 table).
• The phi coefficient is proportional to the square root of
the chi-square statistic
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
• It takes the value of 0 when there is no association, which
would be indicated by a chi-square value of 0 as well.
When the variables are perfectly associated, phi assumes
the value of 1 and all the observations fall just on the main
or minor diagonal.
Statistics Associated with Cross-Tabulation
Contingency Coefficient
• While the phi coefficient is specific to a 2 x 2 table, the
contingency coefficient (C) can be used to assess the
strength of association in a table of any size.
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$
• The contingency coefficient varies between 0 and 1.
• The maximum value of the contingency coefficient
depends on the size of the table (number of rows and
number of columns). For this reason, it should be
used only to compare tables of the same size.
Statistics Associated with Cross-Tabulation
Cramer’s V
$$V = \sqrt{\frac{\phi^2}{\min\{(r-1),\ (c-1)\}}}$$
or
$$V = \sqrt{\frac{\chi^2/n}{\min\{(r-1),\ (c-1)\}}}$$
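The three strength-of-association measures can be compared on the 2 x 2 example, using the chi-square value of 3.333 and n = 30 from the cross-tabulation. This is a sketch with illustrative names; the resulting values are computed here, not taken from the slides.

```python
from math import sqrt

chi_square = 3.333  # from the Gender by Internet Usage cross-tabulation
n = 30              # total sample size
r, c = 2, 2         # number of rows and columns in the table

phi = sqrt(chi_square / n)                              # phi coefficient (2 x 2 tables only)
contingency = sqrt(chi_square / (chi_square + n))       # contingency coefficient C
cramers_v = sqrt((chi_square / n) / min(r - 1, c - 1))  # Cramer's V

print(f"phi = {phi:.3f}, C = {contingency:.3f}, V = {cramers_v:.3f}")
```

For a 2 x 2 table, min(r-1, c-1) = 1, so Cramer's V reduces to the phi coefficient.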
Cross-Tabulation in Practice
While conducting cross-tabulation analysis in practice, it is useful to proceed
along the following steps.
1. Test the null hypothesis that there is no association between the variables
using the chi-square statistic. If you fail to reject the null hypothesis, then
no statistically significant relationship has been found.
2. If H0 is rejected, then determine the strength of the association using an
appropriate statistic (phi-coefficient, contingency coefficient, Cramer's V,
lambda coefficient, or other statistics), as discussed earlier.
3. If H0 is rejected, interpret the pattern of the relationship by computing the
percentages in the direction of the independent variable, across the
dependent variable.
4. If the variables are treated as ordinal rather than nominal, use tau b, tau c,
or Gamma as the test statistic. If H0 is rejected, then determine the strength
of the association using the magnitude, and the direction of the relationship
using the sign of the test statistic.
Hypothesis Testing Related to Differences
• Parametric tests assume that the variables of interest are
measured on at least an interval scale.
• Nonparametric tests assume that the variables are measured
on a nominal or ordinal scale.
• These tests can be further classified based on whether one or
two or more samples are involved.
• The samples are independent if they are drawn randomly from
different populations. For the purpose of analysis, data
pertaining to different groups of respondents, e.g., males and
females, are generally treated as independent samples.
• The samples are paired when the data for the two samples
relate to the same group of respondents.
Parametric Tests
• The t statistic assumes that the variable is normally
distributed and the mean is known (or assumed to be
known) and the population variance is estimated from the
sample.
• Assume that the random variable $X$ is normally distributed,
with mean $\mu$ and unknown population variance $\sigma^2$ that is
estimated by the sample variance $s^2$.
• Then, $t = (\bar{X} - \mu)/s_{\bar{X}}$ is t distributed with $n - 1$ degrees of
freedom.
• The t distribution is similar to the normal distribution in
appearance. Both distributions are bell-shaped and
symmetric. As the number of degrees of freedom
increases, the t distribution approaches the normal
distribution.
Hypothesis Testing Using
the t Statistic
1. Formulate the null (H0) and the alternative (H1)
hypotheses.
2. Select the appropriate formula for the t statistic.
3. Select a significance level, $\alpha$, for testing H0.
Typically, the 0.05 level is selected.
4. Take one or two samples and compute the mean
and standard deviation for each sample.
5. Calculate the t statistic assuming H0 is true.
Hypothesis Testing Using
the t Statistic
6. Calculate the degrees of freedom and estimate the probability
of getting a more extreme value of the statistic from Table 4.
(Alternatively, calculate the critical value of the t statistic.)
7. If the probability computed in step 6 is smaller than the
significance level selected in step 3, reject H0. If the probability is
larger, do not reject H0. (Alternatively, if the value of the
calculated t statistic in step 5 is larger than the critical value
determined in step 6, reject H0. If the calculated value is smaller
than the critical value, do not reject H0.) Failure to reject H0 does
not necessarily imply that H0 is true. It only means that the true
state is not significantly different from that assumed by H0.
8. Express the conclusion reached by the t test in terms of the
marketing research problem.
One Sample : t Test
For the data in Table 2, suppose we wanted to test
the hypothesis that the mean familiarity rating exceeds
4.0, the neutral value on a 7-point scale. A significance
level of $\alpha = 0.05$ is selected. The hypotheses may be
formulated as:
$$H_0: \mu \le 4.0$$
$$H_1: \mu > 4.0$$
$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$
$$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{1.579}{\sqrt{29}} = \frac{1.579}{5.385} = 0.293$$
$$t = \frac{4.724 - 4.0}{0.293} = \frac{0.724}{0.293} = 2.471$$
One Sample : t Test
The degrees of freedom for the t statistic to test the
hypothesis about one mean are n - 1. In this case,
n - 1 = 29 - 1 or 28. From Table in the Statistical Appendix,
the probability of getting a more extreme value than
2.471 is less than 0.05 (Alternatively, the critical t value
for 28 degrees of freedom and a significance level of 0.05
is 1.7011, which is less than the calculated value). Hence,
the null hypothesis is rejected. The familiarity level does
exceed 4.0.
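The one-sample t test above can be sketched from the summary statistics. The critical value 1.7011 is the one quoted in the text for 28 degrees of freedom; the variable names are illustrative. With unrounded intermediate values the statistic comes out near 2.47, within rounding of the 2.471 shown.

```python
from math import sqrt

n = 29          # number of valid familiarity ratings
x_bar = 4.724   # sample mean
s = 1.579       # sample standard deviation
mu_0 = 4.0      # value of the mean under H0
t_critical = 1.7011  # one-tailed critical t, 28 df, alpha = 0.05 (from the text)

standard_error = s / sqrt(n)             # s divided by the square root of n
t_statistic = (x_bar - mu_0) / standard_error
degrees_of_freedom = n - 1

reject_h0 = t_statistic > t_critical     # one-tailed test of H1: mu > 4.0

print(f"SE = {standard_error:.3f}, t = {t_statistic:.3f}, df = {degrees_of_freedom}, reject H0: {reject_h0}")
```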
One Sample : Z Test
Note that if the population standard deviation was
assumed to be known as 1.5, rather than estimated
from the sample, a z test would be appropriate. In this
case, the value of the z statistic would be:
$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
where
$$\sigma_{\bar{X}} = \frac{1.5}{\sqrt{29}} = \frac{1.5}{5.385} = 0.279$$
and
$$z = \frac{4.724 - 4.0}{0.279} = \frac{0.724}{0.279} = 2.595$$
One Sample : Z Test
• From Table in the Statistical Appendix, the probability of
getting a more extreme value of z than 2.595 is less than
0.05. (Alternatively, the critical z value for a one-tailed
test and a significance level of 0.05 is 1.645, which is less
than the calculated value.) Therefore, the null hypothesis
is rejected, reaching the same conclusion arrived at earlier
by the t test.
• The procedure for testing a null hypothesis with respect
to a proportion was illustrated earlier in this chapter when
we introduced hypothesis testing.
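The one-sample z test can be sketched the same way, with the upper-tail probability taken from the standard normal distribution in Python's standard library. The names are illustrative, and the unrounded z (about 2.60) differs slightly from 2.595 because the text rounds the standard error to 0.279.

```python
from math import sqrt
from statistics import NormalDist

n = 29
x_bar = 4.724  # sample mean
sigma = 1.5    # population standard deviation, assumed known
mu_0 = 4.0     # value of the mean under H0
alpha = 0.05

standard_error = sigma / sqrt(n)
z_statistic = (x_bar - mu_0) / standard_error

std_normal = NormalDist()
p_value = 1 - std_normal.cdf(z_statistic)  # one-tailed (upper) probability
z_critical = std_normal.inv_cdf(1 - alpha)

print(f"SE = {standard_error:.3f}, z = {z_statistic:.3f}, p = {p_value:.4f}, critical z = {z_critical:.3f}")
```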
Two Independent Samples
Means
• In the case of means for two independent samples, the
hypotheses take the following form.
$$H_0: \mu_1 = \mu_2$$
$$H_1: \mu_1 \ne \mu_2$$
• The two populations are sampled and the means and variances
computed based on samples of sizes n1 and n2. If both
populations are found to have the same variance, a pooled
variance estimate is computed from the two sample variances
as follows:
$$s^2 = \frac{\sum_{i=1}^{n_1} (X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{i2} - \bar{X}_2)^2}{n_1 + n_2 - 2}$$
or
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Two Independent Samples
Means
The standard deviation of the test statistic can be
estimated as:
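$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$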
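A sketch of the pooled-variance t test, applied to Internet Usage split by the two Gender codes in the data table. Which code denotes which gender is not stated in the table, so the groups are simply labeled 1 and 2; the equal-variance assumption is taken as given here, and the resulting values are computed, not quoted from the slides.

```python
from math import sqrt
from statistics import mean, variance

# Gender codes and Internet Usage values for respondents 1-30.
gender = [1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1,
          2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1]
usage = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

group1 = [u for g, u in zip(gender, usage) if g == 1]
group2 = [u for g, u in zip(gender, usage) if g == 2]
n1, n2 = len(group1), len(group2)

mean1, mean2 = mean(group1), mean(group2)
var1, var2 = variance(group1), variance(group2)  # sample variances (n - 1)

# Pooled variance estimate from the two sample variances.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Standard error of the difference in means, then t under H0: mu1 = mu2.
se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_statistic = (mean1 - mean2) / se_diff
degrees_of_freedom = n1 + n2 - 2

print(f"means = {mean1:.3f}, {mean2:.3f}; pooled variance = {pooled_var:.3f}")
print(f"t = {t_statistic:.3f} with {degrees_of_freedom} df")
```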
$$H_0: \sigma_1^2 = \sigma_2^2$$
Columns: Number of Cases, Mean, Standard Deviation
15.507 0.000
t Test
Equal Variances Assumed Equal Variances Not Assumed
Two Independent Samples Proportions
$$Z = \frac{P_1 - P_2}{S_{P_1 - P_2}}$$
where
$$S_{P_1 - P_2} = \sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \quad \text{and} \quad P = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2}$$
In the example:
$$P_1 - P_2 = (11/15) - (6/15) = 0.333$$
$$Z = 0.333/0.181 = 1.84$$
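The two-proportion z statistic can be sketched from the counts 11 of 15 and 6 of 15. The standard error uses the pooled proportion P, which is the usual form for this test and reproduces the 0.181 in the example; the variable names are illustrative.

```python
from math import sqrt

x1, n1 = 11, 15  # successes and sample size, sample 1
x2, n2 = 6, 15   # successes and sample size, sample 2

p1 = x1 / n1
p2 = x2 / n2

# Pooled proportion: (n1*P1 + n2*P2) / (n1 + n2)
p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)

# Standard error of the difference between the two proportions under H0.
se_diff = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z_statistic = (p1 - p2) / se_diff

print(f"P1 - P2 = {p1 - p2:.3f}, SE = {se_diff:.3f}, Z = {z_statistic:.2f}")
```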
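Paired Samples
$$H_0: \mu_D = 0$$
$$H_1: \mu_D \ne 0$$
$$t_{n-1} = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$$
where
$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$
$$s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
$$S_{\bar{D}} = \frac{s_D}{\sqrt{n}}$$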
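A sketch of the paired-samples t test, applied to the Attitude Toward Internet and Attitude Toward Technology columns of the data table, with the difference taken as Internet minus Technology. The resulting values are computed here, not quoted from the slides, and the names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Attitude ratings for respondents 1-30 (paired by respondent).
internet = [7, 3, 4, 7, 7, 5, 4, 5, 6, 7, 4, 6, 6, 3, 5,
            4, 5, 5, 6, 6, 4, 5, 4, 6, 5, 6, 5, 3, 5, 7]
technology = [6, 3, 3, 5, 7, 4, 5, 4, 4, 6, 3, 4, 5, 2, 4,
              3, 3, 4, 6, 4, 2, 4, 2, 6, 3, 6, 5, 2, 3, 5]

differences = [a - b for a, b in zip(internet, technology)]
n = len(differences)

d_bar = mean(differences)    # mean of the paired differences
s_d = stdev(differences)     # sample standard deviation of the differences
se_d_bar = s_d / sqrt(n)     # standard error of the mean difference

# Under H0 the mean difference is 0, so t = d_bar / se_d_bar with n - 1 df.
t_statistic = (d_bar - 0) / se_d_bar
degrees_of_freedom = n - 1

print(f"mean difference = {d_bar:.3f}, s_D = {s_d:.3f}, t = {t_statistic:.3f}, df = {degrees_of_freedom}")
```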
Difference = Internet - Technology
Mean: 6.600
Standard Deviation: 4.296
Cases: 30
Male 20.93 15
Female 10.07 15
Total 30
Note
U = Mann-Whitney test statistic
W = Wilcoxon W Statistic
z = U transformed into normally distributed z statistic.
Nonparametric Tests
Paired Samples
• The Wilcoxon matched-pairs signed-ranks test analyzes
the differences between the paired observations, taking
into account the magnitude of the differences.
• It computes the differences between the pairs of
variables and ranks the absolute differences.
• The next step is to sum the positive and negative ranks.
The test statistic, z, is computed from the positive and
negative rank sums.
• Under the null hypothesis of no difference, z is a standard
normal variate with mean 0 and variance 1 for large
samples.
Nonparametric Tests Paired
Samples
• The example considered for the paired t test, whether the
respondents differed in terms of attitude toward the Internet
and attitude toward technology, is considered again. Suppose
we assume that both these variables are measured on ordinal
rather than interval scales. Accordingly, we use the Wilcoxon
test. The results are shown in Table 18.
• The sign test is not as powerful as the Wilcoxon matched-pairs
signed-ranks test as it only compares the signs of the
differences between pairs of variables without taking into
account the ranks.
• In the special case of a binary variable where the researcher
wishes to test differences in proportions, the McNemar test
can be used. Alternatively, the chi-square test can also be
used for binary variables.
Wilcoxon Matched-Pairs Signed-Rank Test
Internet with Technology
Table 18
            Cases    Mean Rank
-Ranks         23        12.72
+Ranks          1         7.50
Ties            6
Total          30
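The counts and mean ranks in Table 18 can be reproduced from the two attitude columns of the data table. This is a sketch of the signed-ranks procedure with some assumptions: differences are taken as Internet minus Technology (so the 23 cases labeled -Ranks in Table 18 correspond to Internet rated higher, presumably because the table subtracts in the other direction), zero differences are dropped as ties, and the large-sample z uses no tie correction, so it may differ somewhat from statistical software output.

```python
from math import sqrt

internet = [7, 3, 4, 7, 7, 5, 4, 5, 6, 7, 4, 6, 6, 3, 5,
            4, 5, 5, 6, 6, 4, 5, 4, 6, 5, 6, 5, 3, 5, 7]
technology = [6, 3, 3, 5, 7, 4, 5, 4, 4, 6, 3, 4, 5, 2, 4,
              3, 3, 4, 6, 4, 2, 4, 2, 6, 3, 6, 5, 2, 3, 5]

differences = [a - b for a, b in zip(internet, technology)]
nonzero = [d for d in differences if d != 0]
ties = len(differences) - len(nonzero)

# Rank the absolute differences, giving tied values their average rank.
abs_sorted = sorted(abs(d) for d in nonzero)

def average_rank(value):
    first = abs_sorted.index(value) + 1           # first rank held by this value
    last = first + abs_sorted.count(value) - 1    # last rank held by this value
    return (first + last) / 2

internet_higher = [average_rank(abs(d)) for d in nonzero if d > 0]
technology_higher = [average_rank(abs(d)) for d in nonzero if d < 0]

# Large-sample normal approximation for the sum of one set of ranks.
n = len(nonzero)
rank_sum = sum(internet_higher)
expected_sum = n * (n + 1) / 4
sd_sum = sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (rank_sum - expected_sum) / sd_sum

print(f"Internet higher: {len(internet_higher)} cases, mean rank {rank_sum / len(internet_higher):.2f}")
print(f"Technology higher: {len(technology_higher)} cases, mean rank {sum(technology_higher) / len(technology_higher):.2f}")
print(f"ties = {ties}, z (no tie correction) = {z:.3f}")
```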