
Hypothesis Testing

Session 6
Internet Usage Data
Respondent                          Internet   Attitude Toward          Usage of Internet:
Number      Gender   Familiarity    Usage      Internet   Technology    Shopping   Banking
1 1.00 7.00 14.00 7.00 6.00 1.00 1.00
2 2.00 2.00 2.00 3.00 3.00 2.00 2.00
3 2.00 3.00 3.00 4.00 3.00 1.00 2.00
4 2.00 3.00 3.00 7.00 5.00 1.00 2.00
5 1.00 7.00 13.00 7.00 7.00 1.00 1.00
6 2.00 4.00 6.00 5.00 4.00 1.00 2.00
7 2.00 2.00 2.00 4.00 5.00 2.00 2.00
8 2.00 3.00 6.00 5.00 4.00 2.00 2.00
9 2.00 3.00 6.00 6.00 4.00 1.00 2.00
10 1.00 9.00 15.00 7.00 6.00 1.00 2.00
11 2.00 4.00 3.00 4.00 3.00 2.00 2.00
12 2.00 5.00 4.00 6.00 4.00 2.00 2.00
13 1.00 6.00 9.00 6.00 5.00 2.00 1.00
14 1.00 6.00 8.00 3.00 2.00 2.00 2.00
15 1.00 6.00 5.00 5.00 4.00 1.00 2.00
16 2.00 4.00 3.00 4.00 3.00 2.00 2.00
17 1.00 6.00 9.00 5.00 3.00 1.00 1.00
18 1.00 4.00 4.00 5.00 4.00 1.00 2.00
19 1.00 7.00 14.00 6.00 6.00 1.00 1.00
20 2.00 6.00 6.00 6.00 4.00 2.00 2.00
21 1.00 6.00 9.00 4.00 2.00 2.00 2.00
22 1.00 5.00 5.00 5.00 4.00 2.00 1.00
23 2.00 3.00 2.00 4.00 2.00 2.00 2.00
24 1.00 7.00 15.00 6.00 6.00 1.00 1.00
25 2.00 6.00 6.00 5.00 3.00 1.00 2.00
26 1.00 6.00 13.00 6.00 6.00 1.00 1.00
27 2.00 5.00 4.00 5.00 5.00 1.00 1.00
28 2.00 4.00 2.00 3.00 2.00 2.00 2.00
29 1.00 4.00 4.00 5.00 3.00 1.00 2.00
30 1.00 3.00 3.00 7.00 5.00 1.00 2.00
Frequency Distribution
• In a frequency distribution, one variable is considered at a time.

• A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.
Frequency Distribution of Familiarity
with the Internet

Value label        Value   Frequency (N)   Percentage   Valid percentage   Cumulative percentage

Not so familiar    1       0               0.0          0.0                0.0
                   2       2               6.7          6.9                6.9
                   3       6               20.0         20.7               27.6
                   4       6               20.0         20.7               48.3
                   5       3               10.0         10.3               58.6
                   6       8               26.7         27.6               86.2
Very familiar      7       4               13.3         13.8               100.0
Missing            9       1               3.3

TOTAL                      30              100.0        100.0


Histogram

Figure: histogram of Familiarity (x-axis: familiarity rating, 2 to 7; y-axis: frequency, 0 to 8); the tallest bar, with frequency 8, is at a rating of 6.
Statistics Associated with Frequency
Distribution

Measures of Location
• Mean
• Mode
• Median

Measures of Variation
• Range
• Variance
• Standard deviation
• Coefficient of variation

Measures of Shape
• Skewness
• Kurtosis
Statistics Associated with Frequency
Distribution Measures of Location
• The mean, or average value, is the most commonly used measure of central tendency. The mean, X̄, is given by

X̄ = ( Σ(i=1 to n) Xi ) / n

where
Xi = observed values of the variable X
n = number of observations (sample size)

• The mode is the value that occurs most frequently. It


represents the highest peak of the distribution. The mode is a
good measure of location when the variable is inherently
categorical or has otherwise been grouped into categories.
Statistics Associated with Frequency
Distribution Measures of Location

• The median of a sample is the middle value when the data


are arranged in ascending or descending order. If the
number of data points is even, the median is usually
estimated as the midpoint between the two middle values
– by adding the two middle values and dividing their sum
by 2. The median is the 50th percentile.
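The three measures of location can be illustrated on the familiarity column of the Internet usage data shown earlier (a sketch in Python; the rating of 9 for respondent 10 is treated as the missing-value code, consistent with the frequency distribution above, leaving n = 29):

```python
# Measures of location for the familiarity ratings
# (the "9" is a missing-value code and is excluded, so n = 29).
from statistics import mean, median, mode

familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 4, 5, 6, 6, 6, 4,
               6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

x_bar = mean(familiarity)   # 4.724, the value used in the t test later
med = median(familiarity)   # 5, the middle (15th) ordered value
mod = mode(familiarity)     # 6, the most frequent rating
print(round(x_bar, 3), med, mod)
```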
Statistics Associated with Frequency
Distribution Measures of Variability

• The range measures the spread of the data. It is


simply the difference between the largest and
smallest values in the sample.
Range = Xlargest – Xsmallest.
• The interquartile range is the difference between
the 75th and 25th percentile. For a set of data points
arranged in order of magnitude, the pth percentile is
the value that has p% of the data points below it and
(100 - p)% above it.
Statistics Associated with Frequency
Distribution Measures of Variability

• The variance is the mean squared deviation from the mean.


The variance can never be negative.
• The standard deviation is the square root of the variance:

s_x = sqrt( Σ(i=1 to n) (Xi − X̄)² / (n − 1) )

• The coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage, and is a unitless measure of relative variability:

CV = (s_x / X̄) × 100%
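The measures of variation can likewise be computed for the familiarity ratings (a sketch; missing value excluded, sample formulas with n − 1 as above):

```python
from statistics import mean, variance, stdev

# Measures of variation for the familiarity ratings (n = 29).
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 4, 5, 6, 6, 6, 4,
               6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

data_range = max(familiarity) - min(familiarity)  # X_largest - X_smallest
s2 = variance(familiarity)                        # divides by n - 1
s = stdev(familiarity)                            # square root of the variance
cv = s / mean(familiarity) * 100                  # unitless, as a percentage
print(data_range, round(s, 3), round(cv, 1))
```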
Statistics Associated with Frequency
Distribution Measures of Shape
• Skewness. The tendency of the deviations from the mean to
be larger in one direction than in the other. It can be thought
of as the tendency for one tail of the distribution to be heavier
than the other.
• Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. The (excess) kurtosis of a normal distribution is zero. If the kurtosis is positive, the distribution is more peaked than a normal distribution; a negative value means that the distribution is flatter than a normal distribution.
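Both shape measures follow directly from their moment definitions (a sketch; statistical packages apply small-sample corrections, so their values differ slightly; the two sample lists are illustrative, not from the deck's data):

```python
# Skewness and excess kurtosis from the raw moment definitions.
from statistics import mean, pstdev

def skewness(xs):
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def excess_kurtosis(xs):
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * s ** 4) - 3.0

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]      # mean = median = mode = 3
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 14]  # long right tail

print(skewness(symmetric))     # 0 for a perfectly symmetric sample
print(skewness(right_skewed))  # positive: heavier right tail
```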
Skewness of a Distribution

Figure: (a) a symmetric distribution, in which the mean, median, and mode coincide; (b) a skewed distribution, in which they separate, with the mean pulled furthest toward the longer tail.
Hypothesis
• An unproven proposition or supposition that tentatively explains certain facts or phenomena; an assumption about the nature of the world
– Null hypothesis
– Alternative hypothesis
Null Hypothesis
• Statement about the status quo
• No difference
Alternative Hypothesis
• Statement that indicates the opposite of the
null hypothesis
Significance Level
• Critical probability in choosing between the
null hypothesis and the alternative
hypothesis
Significance Level
• Critical Probability
• Confidence Level
• Alpha
• Probability Level selected is typically .05 or
.01
• Too low to warrant support for the null
hypothesis
Steps Involved in Hypothesis
Testing
Formulate H0 and H1

Select Appropriate Test

Choose Level of Significance (α)

Collect Data and Calculate Test Statistic

Determine the Probability Associated with the Test Statistic (or, equivalently, determine the Critical Value of the Test Statistic, TSCR)

Compare the Probability with the Level of Significance, α (or determine whether TSCR falls into the rejection or non-rejection region)

Reject or Do Not Reject H0

Draw Research Conclusion


Choosing the Appropriate
Statistical Technique
• Type of question to be answered
• Number of variables
– Univariate
– Bivariate
– Multivariate
• Scale of measurement
PARAMETRIC NONPARAMETRIC
STATISTICS STATISTICS
t-Distribution
• Symmetrical, bell-shaped distribution
• Mean of zero; its standard deviation exceeds one but approaches one as the degrees of freedom increase
• Shape influenced by degrees of freedom
Testing a Hypothesis about a
Distribution
• Chi-Square test
• Test for significance in the analysis of
frequency distributions
• Compare observed frequencies with
expected frequencies
• “Goodness of Fit”
Degrees of Freedom
• Abbreviated d.f.
• Number of observations
• Number of constraints
A General Procedure for Hypothesis Testing
Step 2: Select an Appropriate Test
• The test statistic measures how close the sample has
come to the null hypothesis.
• The test statistic often follows a well-known
distribution, such as the normal, t, or chi-square
distribution.
• For testing a proportion, for example, the test statistic is

z = (p̂ − π) / σ_p

where

σ_p = sqrt( π(1 − π) / n )
Type I and Type II Errors in Hypothesis Testing

State of Null Hypothesis      Decision:
in the Population             Accept H0             Reject H0

H0 is true                    Correct (no error)    Type I error
H0 is false                   Type II error         Correct (no error)
Probability of z with a One-Tailed Test

Figure: standard normal curve for the one-tailed test; the shaded area to the left of z = 1.88 is 0.9699 and the unshaded area to the right is 0.0301.
A General Procedure for Hypothesis Testing
Step 4: Collect Data and Calculate Test Statistic
• The required data are collected and the value of
the test statistic computed.
• In our example, the value of the sample proportion is p̂ = 17/30 = 0.567.
• The value of σ_p can be determined as follows:

σ_p = sqrt( π(1 − π) / n ) = sqrt( (0.40)(0.60) / 30 ) = 0.089
A General Procedure for Hypothesis Testing
Step 4: Collect Data and Calculate Test Statistic

The test statistic z can be calculated as follows:

z = (p̂ − π) / σ_p = (0.567 − 0.40) / 0.089 = 1.88
A General Procedure for Hypothesis Testing
Step 5: Determine the Probability
(Critical Value)
• Using standard normal tables (Table 2 of the Statistical
Appendix), the probability of obtaining a z value of 1.88
can be calculated (see Figure 15.5).
• The shaded area between -  and 1.88 is 0.9699.
Therefore, the area to the right of z = 1.88 is 1.0000 -
0.9699 = 0.0301.
• Alternatively, the critical value of z, which will give an area
to the right side of the critical value of 0.05, is between
1.64 and 1.65 and equals 1.645.
• Note, in determining the critical value of the test statistic, the area to the right of the critical value is either α or α/2. It is α for a one-tail test and α/2 for a two-tail test.
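The proportion example above can be reproduced in a short sketch; note that the slides round σ_p to 0.089, which yields z = 1.88, while carrying full precision yields z ≈ 1.86. The decision (reject H0 at α = 0.05, one-tailed) is the same either way:

```python
import math

# One-sample proportion z test from the worked example:
# H0: pi <= 0.40, H1: pi > 0.40, alpha = 0.05, n = 30, 17 users.
p_hat = 17 / 30                            # sample proportion, 0.567
pi0 = 0.40                                 # hypothesized proportion
n = 30

sigma_p = math.sqrt(pi0 * (1 - pi0) / n)   # standard error under H0, ~0.089
z = (p_hat - pi0) / sigma_p                # ~1.86 at full precision

print(round(z, 2), z > 1.645)              # exceeds the one-tailed cutoff
```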
A Broad Classification of Hypothesis Tests

Hypothesis Tests
• Tests of Association
• Tests of Differences
  – Distributions
  – Means
  – Proportions
  – Median/Rankings
A Classification of Hypothesis Testing Procedures
for Examining Differences
Hypothesis Tests

Parametric Tests (Metric Tests)
• One Sample: t test, z test
• Two or More Samples
  – Independent Samples: two-group t test, z test
  – Paired Samples: paired t test

Non-parametric Tests (Nonmetric Tests)
• One Sample: chi-square, K-S, runs, binomial
• Two or More Samples
  – Independent Samples: chi-square, Mann-Whitney, median, K-S
  – Paired Samples: sign, Wilcoxon, McNemar, chi-square
Cross-Tabulation
• While a frequency distribution describes one variable at a
time, a cross-tabulation describes two or more variables
simultaneously.

• Cross-tabulation results in tables that reflect the joint


distribution of two or more variables with a limited number
of categories or distinct values.
Gender and Internet Usage
Table 3

                      Gender
Internet Usage     Male     Female     Row Total

Light (1)          5        10         15
Heavy (2)          10       5          15

Column Total       15       15         30
Two Variables Cross-Tabulation
• Since two variables have been cross-classified,
percentages could be computed either columnwise,
based on column totals (Table 4), or rowwise, based on
row totals (Table 5).

• The general rule is to compute the percentages in the


direction of the independent variable, across the
dependent variable. The correct way of calculating
percentages is as shown in Table 4.
Internet Usage by Gender

Gender

Internet Usage Male Female

Light 33.3% 66.7%

Heavy 66.7% 33.3%

Column total 100% 100%


Gender by Internet Usage
Internet Usage

Gender Light Heavy Total

Male 33.3% 66.7% 100.0%

Female 66.7% 33.3% 100.0%


Introduction of a Third Variable in Cross-
Tabulation
Original Two Variables

• Some association between the two variables → introduce a third variable. Possible outcomes: refined association between the two variables; no association between the two variables; no change in the initial pattern.

• No association between the two variables → introduce a third variable. Possible outcome: some association between the two variables.
Three Variables Cross-Tabulation
Refine an Initial Relationship
Purchase of Fashion Clothing by Marital Status
Table 6

Purchase of              Current Marital Status
Fashion Clothing         Married     Unmarried

High                     31%         52%
Low                      69%         48%
Column totals            100%        100%
Number of respondents    700         300
Purchase of Fashion Clothing by Marital Status and Gender
Table 7

Purchase of          Male                      Female
Fashion Clothing     Married    Not Married    Married    Not Married

High                 35%        40%            25%        60%
Low                  65%        60%            75%        40%
Column totals        100%       100%           100%       100%
Number of cases      400        120            300        180
Three Variables Cross-Tabulation
Initial Relationship was Spurious

Ownership of Expensive Automobiles by Education Level
Table 8

Own Expensive          Education
Automobile             College Degree    No College Degree

Yes                    32%               21%
No                     68%               79%
Column totals          100%              100%
Number of cases        250               750


Ownership of Expensive Automobiles by Education Level and Income Levels
Table 9

                         Income
Own                      Low Income                      High Income
Expensive                College     No College          College     No College
Automobile               Degree      Degree              Degree      Degree

Yes                      20%         20%                 40%         40%
No                       80%         80%                 60%         60%
Column totals            100%        100%                100%        100%
Number of respondents    100         700                 150         50
Three Variables Cross-Tabulation
Reveal Suppressed Association
Desire to Travel Abroad by Age
Table 10

Desire to Travel Abroad Age

Less than 45 45 or More

Yes 50% 50%

No 50% 50%

Column totals 100% 100%

Number of respondents 500 500


Desire to Travel Abroad by Age and Gender
Table 11

                     Male                   Female
Desire to            Age                    Age
Travel Abroad        <45       >=45         <45       >=45

Yes                  60%       40%          35%       65%
No                   40%       60%          65%       35%
Column totals        100%      100%         100%      100%
Number of cases      300       300          200       200
Three Variables Cross-Tabulation
No Change in Initial Relationship

Eating Frequently in Fast-Food Restaurants by Family Size
Table 12

Eat Frequently in Fast-     Family Size
Food Restaurants            Small      Large

Yes                         65%        65%
No                          35%        35%
Column totals               100%       100%
Number of cases             500        500


Eating Frequently in Fast-Food Restaurants by Family Size and Income
Table 15

                              Income
Eat Frequently in Fast-       Low                      High
Food Restaurants              Family Size              Family Size
                              Small      Large         Small      Large
Yes 65% 65% 65% 65%
No 35% 35% 35% 35%
Column totals 100% 100% 100% 100%
Number of respondents 250 250 250 250
Statistics Associated with
Cross-Tabulation Chi-Square
• To determine whether a systematic association exists, the
probability of obtaining a value of chi-square as large or larger
than the one calculated from the cross-tabulation is estimated.

• An important characteristic of the chi-square statistic is the


number of degrees of freedom (df) associated with it. That is,
df = (r - 1) x (c -1).

• The null hypothesis (H0) of no association between the two


variables will be rejected only when the calculated value of the
test statistic is greater than the critical value of the chi-square
distribution with the appropriate degrees of freedom, as
shown in Figure 8.
Chi-square Distribution
Fig. 8

Figure: chi-square distribution; the area to the left of the critical value of χ² is the "Do Not Reject H0" region, and the upper tail beyond the critical value is the "Reject H0" region.
Statistics Associated with
Cross-Tabulation Chi-Square

• The chi-square statistic (  ) is used to test the
statistical significance of the observed
association in a cross-tabulation.
• The expected frequency for each cell can be
calculated by using a simple formula:

nrnc
fe = n

where nr = total number in the row


nc = total number in the column
n = total sample size
Statistics Associated with
Cross-Tabulation Chi-Square
For the data in Table 3, the expected frequencies for the cells, going from left to right and from top to bottom, are:

(15 × 15)/30 = 7.50    (15 × 15)/30 = 7.50
(15 × 15)/30 = 7.50    (15 × 15)/30 = 7.50

Then the value of χ² is calculated as follows:

χ² = Σ over all cells of (f_o − f_e)² / f_e
Statistics Associated with
Cross-Tabulation Chi-
Square
For the data in Table 3, the value of χ² is calculated as:

χ² = (5 − 7.5)²/7.5 + (10 − 7.5)²/7.5 + (10 − 7.5)²/7.5 + (5 − 7.5)²/7.5
   = 0.833 + 0.833 + 0.833 + 0.833
   = 3.333
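This calculation can be carried out generically from the observed table of counts (a sketch, using the expected-frequency formula f_e = row total × column total / n):

```python
# Chi-square statistic for the 2 x 2 gender-by-usage table (Table 3).
observed = [[5, 10],   # light users: male, female
            [10, 5]]   # heavy users: male, female

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # expected frequency, 7.5
        chi2 += (fo - fe) ** 2 / fe

print(round(chi2, 3))  # 3.333, below the df = 1 critical value of 3.841
```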
Statistics Associated with
Cross-Tabulation Chi-Square
• The chi-square distribution is a skewed distribution whose
shape depends solely on the number of degrees of freedom.
As the number of degrees of freedom increases, the chi-square
distribution becomes more symmetrical.
• Table 3 in the Statistical Appendix contains upper-tail areas of
the chi-square distribution for different degrees of freedom.
For 1 degree of freedom the probability of exceeding a chi-
square value of 3.841 is 0.05.
• For the cross-tabulation given in Table 3, there are (2 − 1) × (2 − 1) = 1 degree of freedom. The calculated chi-square statistic had a value of 3.333. Since this is less than the critical value of 3.841, the null hypothesis of no association cannot be rejected, indicating that the association is not statistically significant at the 0.05 level.
Statistics Associated with
Cross-Tabulation Phi Coefficient
• The phi coefficient () is used as a measure of the
strength of association in the special case of a table with
two rows and two columns (a 2 x 2 table).
• The phi coefficient is proportional to the square root of the chi-square statistic:

φ = sqrt( χ² / n )
• It takes the value of 0 when there is no association, which
would be indicated by a chi-square value of 0 as well.
When the variables are perfectly associated, phi assumes
the value of 1 and all the observations fall just on the main
or minor diagonal.
Statistics Associated with Cross-Tabulation
Contingency Coefficient
• While the phi coefficient is specific to a 2 x 2 table, the
contingency coefficient (C) can be used to assess the
strength of association in a table of any size.

2
C=
2 + n
• The contingency coefficient varies between 0 and 1.
• The maximum value of the contingency coefficient
depends on the size of the table (number of rows and
number of columns). For this reason, it should be
used only to compare tables of the same size.
Statistics Associated with Cross-Tabulation
Cramer’s V

• Cramer's V is a modified version of the phi correlation coefficient, φ, and is used in tables larger than 2 x 2.

V = sqrt( φ² / min(r − 1, c − 1) )

or

V = sqrt( (χ²/n) / min(r − 1, c − 1) )
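The three association measures can be computed from the chi-square value of Table 3 (χ² = 3.333, n = 30; a sketch, and note that for a 2 x 2 table V reduces to φ):

```python
import math

# Strength-of-association measures derived from the chi-square statistic.
def phi(chi2, n):
    return math.sqrt(chi2 / n)

def contingency_c(chi2, n):
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(chi2, n, r, c):
    return math.sqrt((chi2 / n) / min(r - 1, c - 1))

chi2, n = 3.333, 30
print(round(phi(chi2, n), 3))              # 0.333
print(round(contingency_c(chi2, n), 3))    # 0.316
print(round(cramers_v(chi2, n, 2, 2), 3))  # equals phi for a 2 x 2 table
```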
Cross-Tabulation in Practice
While conducting cross-tabulation analysis in practice, it is useful to proceed
along the following steps.
1. Test the null hypothesis that there is no association between the variables
using the chi-square statistic. If you fail to reject the null hypothesis, then
there is no relationship.
2. If H0 is rejected, then determine the strength of the association using an
appropriate statistic (phi-coefficient, contingency coefficient, Cramer's V,
lambda coefficient, or other statistics), as discussed earlier.
3. If H0 is rejected, interpret the pattern of the relationship by computing the
percentages in the direction of the independent variable, across the
dependent variable.
4. If the variables are treated as ordinal rather than nominal, use tau b, tau c,
or Gamma as the test statistic. If H0 is rejected, then determine the strength
of the association using the magnitude, and the direction of the relationship
using the sign of the test statistic.
Hypothesis Testing Related to Differences
• Parametric tests assume that the variables of interest are
measured on at least an interval scale.
• Nonparametric tests assume that the variables are measured
on a nominal or ordinal scale.
• These tests can be further classified based on whether one or
two or more samples are involved.
• The samples are independent if they are drawn randomly from
different populations. For the purpose of analysis, data
pertaining to different groups of respondents, e.g., males and
females, are generally treated as independent samples.
• The samples are paired when the data for the two samples
relate to the same group of respondents.
Parametric Tests
• The t statistic assumes that the variable is normally
distributed and the mean is known (or assumed to be
known) and the population variance is estimated from the
sample.
• Assume that the random variable X is normally distributed, with mean μ and unknown population variance σ², which is estimated by the sample variance s².
• Then, t = (X̄ − μ)/s_X̄ is t distributed with n − 1 degrees of freedom.
• The t distribution is similar to the normal distribution in
appearance. Both distributions are bell-shaped and
symmetric. As the number of degrees of freedom
increases, the t distribution approaches the normal
distribution.
Hypothesis Testing Using
the t Statistic
1. Formulate the null (H0) and the alternative (H1)
hypotheses.
2. Select the appropriate formula for the t statistic.
3. Select a significance level, α, for testing H0.
Typically, the 0.05 level is selected.
4. Take one or two samples and compute the mean
and standard deviation for each sample.
5. Calculate the t statistic assuming H0 is true.
Hypothesis Testing Using
the t Statistic
6. Calculate the degrees of freedom and estimate the probability of getting a more extreme value of the statistic from Table 4 (alternatively, calculate the critical value of the t statistic).
7. If the probability computed in step 6 is smaller than the significance level selected in step 3, reject H0; if the probability is larger, do not reject H0. (Alternatively, if the value of the calculated t statistic in step 5 is larger than the critical value determined in step 6, reject H0; if the calculated value is smaller than the critical value, do not reject H0.) Failure to reject H0 does not necessarily imply that H0 is true. It only means that the true state is not significantly different from that assumed by H0.
8. Express the conclusion reached by the t test in terms of the
marketing research problem.
One Sample : t Test
For the data in Table 2, suppose we wanted to test the hypothesis that the mean familiarity rating exceeds 4.0, the neutral value on a 7-point scale. A significance level of α = 0.05 is selected. The hypotheses may be formulated as:

H0: μ ≤ 4.0
H1: μ > 4.0

t = (X̄ − μ)/s_X̄

s_X̄ = s/√n = 1.579/√29 = 1.579/5.385 = 0.293

t = (4.724 − 4.0)/0.293 = 0.724/0.293 = 2.471
One Sample : t Test
The degrees of freedom for the t statistic to test the
hypothesis about one mean are n - 1. In this case,
n - 1 = 29 - 1 or 28. From Table in the Statistical Appendix,
the probability of getting a more extreme value than
2.471 is less than 0.05 (Alternatively, the critical t value
for 28 degrees of freedom and a significance level of 0.05
is 1.7011, which is less than the calculated value). Hence,
the null hypothesis is rejected. The familiarity level does
exceed 4.0.
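The same calculation can be checked numerically (a sketch; the familiarity ratings are read off the Internet usage data, with the missing-value code 9 excluded, so n = 29):

```python
import math
from statistics import mean, stdev

# One-sample t test: is mean familiarity greater than 4.0?
familiarity = [7, 2, 3, 3, 7, 4, 2, 3, 3, 4, 5, 6, 6, 6, 4,
               6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]

n = len(familiarity)
x_bar = mean(familiarity)                   # 4.724
s_xbar = stdev(familiarity) / math.sqrt(n)  # 1.579 / 5.385 = 0.293
t = (x_bar - 4.0) / s_xbar                  # 2.471

print(round(t, 2), t > 1.7011)  # exceeds the critical t for 28 df at 0.05
```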
One Sample : Z Test
Note that if the population standard deviation was assumed to be known as 1.5, rather than estimated from the sample, a z test would be appropriate. In this case, the value of the z statistic would be:

z = (X̄ − μ)/σ_X̄

where

σ_X̄ = σ/√n = 1.5/√29 = 1.5/5.385 = 0.279

and z = (4.724 − 4.0)/0.279 = 0.724/0.279 = 2.595
One Sample : Z Test
• From Table in the Statistical Appendix, the probability of
getting a more extreme value of z than 2.595 is less than
0.05. (Alternatively, the critical z value for a one-tailed
test and a significance level of 0.05 is 1.645, which is less
than the calculated value.) Therefore, the null hypothesis
is rejected, reaching the same conclusion arrived at earlier
by the t test.
• The procedure for testing a null hypothesis with respect
to a proportion was illustrated earlier in this chapter when
we introduced hypothesis testing.
Two Independent Samples
Means
• In the case of means for two independent samples, the
hypotheses take the following form.

H : = 
0 1 2

H :  
1 1 2

• The two populations are sampled and the means and variances
computed based on samples of sizes n1 and n2. If both
populations are found to have the same variance, a pooled
variance estimate is computed from the two sample variances
as follows:

s² = [ Σ(i=1 to n1) (Xi1 − X̄1)² + Σ(i=1 to n2) (Xi2 − X̄2)² ] / (n1 + n2 − 2)

or

s² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)
Two Independent Samples
Means
The standard deviation of the test statistic can be
estimated as:

sX 1 - X 2 = s 2 (n1 + n1 )
1 2

The appropriate value of t can be calculated as:

(X 1 -X 2) - (1 - 2)
t= sX 1 - X 2

The degrees of freedom in this case are (n1 + n2 -2).
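Under the equal-variance assumption, these formulas can be applied directly to the Internet usage data (a sketch; the usage values are taken from the data table at the start of the deck):

```python
import math

# Pooled-variance t test: Internet usage of males vs females.
males = [14, 13, 15, 9, 8, 5, 9, 4, 14, 9, 5, 15, 13, 4, 3]
females = [2, 3, 3, 6, 2, 6, 6, 3, 4, 3, 6, 2, 6, 4, 2]

def ss(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(males), len(females)
s2 = (ss(males) + ss(females)) / (n1 + n2 - 2)   # pooled variance
se = math.sqrt(s2 * (1 / n1 + 1 / n2))           # std error of difference
t = (sum(males) / n1 - sum(females) / n2) / se   # ~4.49, as in Table 14

print(round(t, 3))
```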


Two Independent Samples F Test

An F test of sample variance may be performed if it is not known whether the two populations have equal variance. In this case, the hypotheses are:

H0: σ1² = σ2²

H1: σ1² ≠ σ2²


Two Independent Samples
F Statistic
The F statistic is computed from the sample variances as follows:

F(n1−1),(n2−1) = s1² / s2²

where
n1 = size of sample 1
n2 = size of sample 2
n1 − 1 = degrees of freedom for sample 1
n2 − 1 = degrees of freedom for sample 2
s1² = sample variance for sample 1
s2² = sample variance for sample 2

Using the data of Table 15.1, suppose we wanted to determine whether Internet usage was different for males as compared to females. A two-independent-samples t test was conducted. The results are presented in Table 14.

Two Independent-Samples t Tests
Table 14

Summary Statistics

          Number of Cases    Mean     Standard Error

Male      15                 9.333    1.137
Female    15                 3.867    0.435

F Test for Equality of Variances

F value    2-tail probability
15.507     0.000

t Test

Equal Variances Assumed                  Equal Variances Not Assumed
t value   Degrees of   2-tail            t value   Degrees of   2-tail
          freedom      probability                 freedom      probability
-4.492    28           0.000             -4.492    18.014       0.000

Two Independent Samples
Proportions
The case involving proportions for two independent samples is also illustrated using the data of Table 15.1, which gives the number of males and females who use the Internet for shopping. Is the proportion of respondents using the Internet for shopping the same for males and females? The null and alternative hypotheses are:

H0: π1 = π2
H1: π1 ≠ π2

A z test is used as in testing the proportion for one sample. However, in this case the test statistic is given by:

Z = (P1 − P2) / S_(P1 − P2)
Two Independent Samples Proportions

In the test statistic, the numerator is the difference between the proportions in the two samples, P1 and P2. The denominator is the standard error of the difference in the two proportions and is given by

S_(P1 − P2) = sqrt( P(1 − P)(1/n1 + 1/n2) )

where

P = (n1P1 + n2P2) / (n1 + n2)
Two Independent Samples Proportions

A significance level of α = 0.05 is selected. Given the data of Table 15.1, the test statistic can be calculated as:

P1 − P2 = (11/15) − (6/15) = 0.733 − 0.400 = 0.333

P = (15 × 0.733 + 15 × 0.4)/(15 + 15) = 0.567

S_(P1 − P2) = sqrt( 0.567 × 0.433 × (1/15 + 1/15) ) = 0.181

Z = 0.333/0.181 = 1.84
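The same numbers follow from the counts in the data (11 of 15 males and 6 of 15 females shop online; a sketch):

```python
import math

# Two-independent-samples z test on proportions: shopping by gender.
p1, n1 = 11 / 15, 15   # males: 0.733
p2, n2 = 6 / 15, 15    # females: 0.400

p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)                   # 0.567
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # 0.181
z = (p1 - p2) / se                                         # 1.84

print(round(z, 2), abs(z) < 1.96)  # below the two-tailed 0.05 cutoff
```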
Two Independent Samples
Proportions

Given a two-tail test, the area to the right of the critical value is 0.025. Hence, the critical value of the test statistic is 1.96. Since the calculated value is less than the critical value, the null hypothesis cannot be rejected. Thus, the proportion of users (0.733 for males and 0.400 for females) is not significantly different for the two samples. Note that while the difference is substantial, it is not statistically significant due to the small sample sizes (15 in each group).
Paired Samples
The difference in these cases is examined by a paired samples
t test. To compute t for paired samples, the paired difference
variable, denoted by D, is formed and its mean and variance
calculated. Then the t statistic is computed. The degrees of
freedom are n - 1, where n is the number of pairs. The relevant
formulas are:

H0:  D = 0
H1:  D  0

D - D
tn-1 = sD
continued… n
Paired Samples
Where:
n
 Di
D = i=1n
n
=1 (Di - D)2
sD = i
n-1

S
SD = n
D

In the Internet usage example (Table 15.1), a paired t


test could be used to determine if the respondents
differed in their attitude toward the Internet and
attitude toward technology. The resulting output is
shown in Table 15.15.
Paired-Samples t Test
Table 15

Variable              Number of Cases   Mean    Standard Deviation   Standard Error

Internet Attitude     30                5.167   1.234                0.225
Technology Attitude   30                4.100   1.398                0.255

Difference = Internet Attitude − Technology Attitude

Difference   Standard    Standard                  2-tail   t       Degrees of   2-tail
Mean         deviation   error       Correlation   prob.    value   freedom      probability

1.067        0.828       0.1511      0.809         0.000    7.059   29           0.000
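As a check, the t value in Table 15 follows directly from the summary figures (a sketch; mean difference 1.067, s_D = 0.828, n = 30 pairs):

```python
import math

# Paired-samples t statistic from the summary figures in Table 15.
d_bar, s_d, n = 1.067, 0.828, 30

t = d_bar / (s_d / math.sqrt(n))   # ~7.06 (Table 15 reports 7.059)

print(round(t, 2))
```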


Nonparametric Tests

Nonparametric tests are used when the independent variables are nonmetric. Like parametric tests, nonparametric tests are available for testing variables from one sample, two independent samples, or two related samples.
Nonparametric Tests
OneSometimes
Sample the researcher wants to test whether the
observations for a particular variable could reasonably
have come from a particular distribution, such as the
normal, uniform, or Poisson distribution.

The Kolmogorov-Smirnov (K-S) one-sample test is one such goodness-of-fit test. The K-S compares the cumulative distribution function for a variable with a specified distribution. Ai denotes the cumulative relative frequency for each category of the theoretical (assumed) distribution, and Oi the comparable value of the sample frequency. The K-S test is based on the maximum value of the absolute difference between Ai and Oi. The test statistic is

K = Max |Ai − Oi|
Nonparametric Tests
One Sample
• The decision to reject the null hypothesis is based on the value of K. The larger K is, the more confidence we have that H0 is false. For α = 0.05, the critical value of K for large samples (over 35) is given by 1.36/√n. Alternatively, K can be transformed into a normally distributed z statistic and its associated probability determined.
• In the context of the Internet usage example, suppose we wanted to test whether the distribution of Internet usage was normal. A K-S one-sample test is conducted, yielding the data shown in Table 15.16. The table indicates that the probability of observing a K value of 0.222, as determined by the normalized z statistic, is 0.103. Since this is more than the significance level of 0.05, the null hypothesis cannot be rejected. Hence, the distribution of Internet usage does not deviate significantly from the normal distribution.
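The K and z values in Table 16 can be reproduced from the usage data (a sketch using Python's statistics.NormalDist; the fitted normal uses the sample mean 6.6 and sample standard deviation 4.296, and the empirical CDF is compared with the fitted CDF on both sides of each step):

```python
import math
from statistics import NormalDist, mean, stdev

# K-S one-sample statistic for Internet usage against a fitted normal.
usage = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]

n = len(usage)
dist = NormalDist(mean(usage), stdev(usage))  # N(6.6, 4.296)
data = sorted(usage)

d = max(max(abs((i + 1) / n - dist.cdf(x)),   # ECDF just after the step
            abs(i / n - dist.cdf(x)))         # ECDF just before the step
        for i, x in enumerate(data))

z = d * math.sqrt(n)
print(round(d, 3), round(z, 3))  # 0.222 and 1.217, as in Table 16
```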
K-S One-Sample Test for
Normality of Internet Usage
Table 16
Test Distribution - Normal

Mean: 6.600
Standard Deviation: 4.296

Cases: 30

Most Extreme Differences


Absolute Positive Negative K-S z 2-Tailed p
0.222 0.222 -0.142 1.217 0.103
Nonparametric Tests
One Sample
• The chi-square test can also be performed on a single
variable from one sample. In this context, the chi-square
serves as a goodness-of-fit test.
• The runs test is a test of randomness for dichotomous variables. This test is conducted by determining whether the order or sequence in which observations are obtained is random.
• The binomial test is also a goodness-of-fit test for
dichotomous variables. It tests the goodness of fit of the
observed number of observations in each category to the
number expected under a specified binomial distribution.
Nonparametric Tests
Two Independent Samples
• When the difference in the location of two populations is to be
compared based on observations from two independent samples,
and the variable is measured on an ordinal scale, the Mann-Whitney
U test can be used.
• In the Mann-Whitney U test, the two samples are combined and the
cases are ranked in order of increasing size.
• The test statistic, U, is computed as the number of times a score
from sample or group 1 precedes a score from group 2.
• If the samples are from the same population, the distribution of
scores from the two groups in the rank list should be random. An
extreme value of U would indicate a nonrandom pattern, pointing
to the inequality of the two groups.
• For samples of less than 30, the exact significance level for U is
computed. For larger samples, U is transformed into a normally
distributed z statistic. This z can be corrected for ties within ranks.
Nonparametric Tests
Two Independent Samples
• We examine again the difference in the Internet usage of males and
females. This time, though, the Mann-Whitney U test is used. The
results are given in Table 15.17.
• One could also use the cross-tabulation procedure to conduct a chi-
square test. In this case, we will have a 2 x 2 table. One variable will
be used to denote the sample, and will assume the value 1 for sample 1
and the value of 2 for sample 2. The other variable will be the binary
variable of interest.
• The two-sample median test determines whether the two groups are
drawn from populations with the same median. It is not as powerful
as the Mann-Whitney U test because it merely uses the location of
each observation relative to the median, and not the rank, of each
observation.
• The Kolmogorov-Smirnov two-sample test examines whether the
two distributions are the same. It takes into account any differences
between the two distributions, including the median, dispersion, and
skewness.
Mann-Whitney U - Wilcoxon Rank
Sum W Test Internet Usage by Gender
Table 17
Gender Mean Rank Cases

Male 20.93 15
Female 10.07 15

Total 30

Corrected for ties


U W z 2-tailed p

31.000 151.000 -3.406 0.001

Note
U = Mann-Whitney test statistic
W = Wilcoxon W Statistic
z = U transformed into normally distributed z statistic.
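The U and W values in Table 17 can be reproduced from the usage data with a short sketch that assigns average ranks to ties:

```python
# Mann-Whitney U and Wilcoxon W for Internet usage by gender (Table 17).
males = [14, 13, 15, 9, 8, 5, 9, 4, 14, 9, 5, 15, 13, 4, 3]
females = [2, 3, 3, 6, 2, 6, 6, 3, 4, 3, 6, 2, 6, 4, 2]

combined = sorted(males + females)
# Average (mid) rank for each distinct value; ranks are 1-based.
rank = {v: (2 * combined.index(v) + 1 + combined.count(v)) / 2
        for v in set(combined)}

r1 = sum(rank[x] for x in males)     # rank sum, males
r2 = sum(rank[x] for x in females)   # rank sum, females (W = 151)

n1, n2 = len(males), len(females)
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 - u1
u = min(u1, u2)                      # Mann-Whitney U = 31

print(u, min(r1, r2))
```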
Nonparametric Tests
Paired Samples
• The Wilcoxon matched-pairs signed-ranks test analyzes
the differences between the paired observations, taking
into account the magnitude of the differences.
• It computes the differences between the pairs of
variables and ranks the absolute differences.
• The next step is to sum the positive and negative ranks.
The test statistic, z, is computed from the positive and
negative rank sums.
• Under the null hypothesis of no difference, z is a standard
normal variate with mean 0 and variance 1 for large
samples.
Nonparametric Tests Paired
Samples
• The example considered for the paired t test, whether the
respondents differed in terms of attitude toward the Internet
and attitude toward technology, is considered again. Suppose
we assume that both these variables are measured on ordinal
rather than interval scales. Accordingly, we use the Wilcoxon
test. The results are shown in Table 18.
• The sign test is not as powerful as the Wilcoxon matched-pairs
signed-ranks test as it only compares the signs of the
differences between pairs of variables without taking into
account the ranks.
• In the special case of a binary variable where the researcher
wishes to test differences in proportions, the McNemar test
can be used. Alternatively, the chi-square test can also be
used for binary variables.
Wilcoxon Matched-Pairs Signed-Rank Test
Internet with Technology
Table 18

(Technology - Internet) Cases Mean rank

-Ranks 23 12.72

+Ranks 1 7.50

Ties 6

Total 30

z = -4.207 2-tailed p = 0.0000
